azure-architecture-review
Azure Architecture Review Skill
This skill reviews Terraform code for Azure resources against Microsoft's Cloud Adoption Framework (CAF) and Well-Architected Framework (WAF). It analyzes .tf files to ensure your infrastructure-as-code follows Microsoft's frameworks.
When to Use This Skill
- Reviewing Terraform code - Check
.tffiles against CAF/WAF before deployment - Pull request reviews - Validate infrastructure-as-code changes
- Architecture validation - Ensure Terraform designs follow Microsoft frameworks
- Before terraform apply - Catch compliance issues early in development
- Code refactoring - Align existing Terraform with CAF/WAF best practices
- Compliance audits - Review infrastructure-as-code for governance
Applicable to ALL Azure Resources
This skill works for:
- Networking: Virtual Networks, Firewalls, VPN Gateways, Load Balancers
- Compute: Virtual Machines, App Service, AKS, Container Instances
- Storage: Storage Accounts, Managed Disks, File Shares
- Databases: SQL Database, Cosmos DB, PostgreSQL, MySQL
- Security: Key Vault, Managed Identity, Private Endpoints
- Monitoring: Log Analytics, Application Insights, Alerts
- And more: Any Azure service with Terraform support
Prerequisites
Required MCP Tools:
mcp_azure_mcp_documentation- Search Microsoft Learn documentationmcp_azure_mcp_azureterraformbestpractices- Get Terraform-specific Azure guidance
Context Needed:
- Understanding of the infrastructure being reviewed
- Business requirements (SLA, security, compliance)
- Current state (greenfield vs brownfield)
Review Methodology
Step 1: Get Terraform Best Practices (MANDATORY)
Always call this FIRST before generating or reviewing Terraform code:
# MCP Command
mcp_azure_mcp_azureterraformbestpractices
command: get
intent: "Get Azure Terraform best practices before code review"
Returns: Terraform validation workflow (validate → plan → apply), Azure-specific patterns, security defaults, naming conventions
Step 2: Search Cloud Adoption Framework Documentation
Query Pattern: "Cloud Adoption Framework {resource-type} {design-area}"
Examples:
# Networking
query: "Cloud Adoption Framework hub spoke network topology"
# Storage
query: "Cloud Adoption Framework storage account security encryption"
# Compute
query: "Cloud Adoption Framework App Service landing zone networking"
# Databases
query: "Cloud Adoption Framework Azure SQL Database security"
# Security
query: "Cloud Adoption Framework Key Vault secrets management"
Key CAF Design Areas (apply to ALL resources):
- Resource Organization - Naming, resource groups, subscriptions
- Security - Authentication, encryption, network isolation, RBAC
- Network Topology - VNet integration, private endpoints, connectivity
- Identity & Access - Managed identities, RBAC, Azure AD
- Governance - Tags, Azure Policy, diagnostics, cost tracking
Step 3: Search Well-Architected Framework Documentation
Query Pattern: "Well-Architected Framework {pillar} {service-name} {concern}"
Generic Pillar Queries:
# Reliability
query: "Well-Architected Framework reliability availability zones disaster recovery"
# Security
query: "Well-Architected Framework security encryption network access control"
# Cost Optimization
query: "Well-Architected Framework cost optimization pricing SKU right-sizing"
# Operational Excellence
query: "Well-Architected Framework operational excellence monitoring diagnostics"
# Performance Efficiency
query: "Well-Architected Framework performance efficiency scalability throughput"
Service-Specific Pattern: "{Service} Well-Architected {pillar}"
# Examples
query: "Storage Account Well-Architected security encryption"
query: "App Service Well-Architected reliability availability zones"
query: "Azure SQL Database Well-Architected reliability high availability"
query: "Key Vault Well-Architected security access policies"
Step 4: Service-Specific Guidance
Search for service-specific WAF guidance:
# Pattern: "{Azure Service} Well-Architected best practices"
query: "Azure Firewall Well-Architected best practices"
query: "VPN Gateway Well-Architected reliability"
query: "Virtual Machines Well-Architected best practices"
query: "Azure Kubernetes Service Well-Architected security"
Validation Framework
Universal Checklist (ALL Resources)
CAF Validation:
- Naming convention:
{type}-{workload}-{env}-{region}-{instance} - Required tags: environment, project, owner, cost-center, managed-by
- Resource organization: appropriate subscription/resource group
- Documentation: README with usage examples
WAF Reliability:
- Zone redundancy or appropriate SLA
- Backup/recovery strategy
- Health monitoring configured
WAF Security:
- Managed identity (not service principal)
- Private endpoint (no public access)
- Encryption at rest and in transit
- RBAC with least privilege
- No hardcoded secrets
WAF Cost:
- Right-sized SKU for workload
- Appropriate replication/redundancy level
- Cost tags for tracking
WAF Operations:
- Diagnostic settings enabled
- Log Analytics integration
- Infrastructure as Code (Terraform)
- Input validation
WAF Performance:
- SKU meets performance requirements
- Scaling configured appropriately
- Optimised for workload patterns
Note: Detailed resource-specific validation checklists are in references/REFERENCE.md
Review Output Template
Summary
- Architecture Type: [Hub-Spoke / Virtual WAN / Other]
- Environment: [Production / Non-Production]
- Overall Compliance Score: [X/100]
Framework Alignment
CAF Score: [X%] ✅ / ⚠️ / ❌
- Network Topology: [findings]
- Naming Conventions: [findings]
- Resource Organization: [findings]
- Tagging Strategy: [findings]
WAF Pillar Scores
| Pillar | Score | Status | Key Findings |
|---|---|---|---|
| Reliability | X% | ✅/⚠️/❌ | [summary] |
| Security | X% | ✅/⚠️/❌ | [summary] |
| Cost Optimization | X% | ✅/⚠️/❌ | [summary] |
| Operational Excellence | X% | ✅/⚠️/❌ | [summary] |
| Performance Efficiency | X% | ✅/⚠️/❌ | [summary] |
Recommended Actions
High Priority
- [Issue] - [Recommendation with code example]
Reference: [Link to Microsoft docs]# Current (❌) [bad code] # Recommended (✅) [good code]
Medium Priority
- [Enhancement] - [Recommendation]
Low Priority
- [Optional Enhancement] - [Recommendation]
Example: Storage Account Review
## Review Request
Review my storage account Terraform for CAF/WAF compliance.
## Steps Taken
1. ✅ Called `azureterraformbestpractices get`
2. ✅ Searched CAF: "Cloud Adoption Framework storage account security"
3. ✅ Searched WAF: "Storage Account Well-Architected security encryption"
## Overall Score: 75/100
### Framework Scores
- **CAF**: 85% ✅ (naming correct, tags present)
- **WAF Reliability**: 60% ⚠️ (LRS - not zone-redundant)
- **WAF Security**: 70% ⚠️ (public access enabled)
- **WAF Cost**: 90% ✅
- **WAF Operations**: 80% ✅
### Critical Findings
#### High Priority
**1. Public Access Enabled** (Security)
```hcl
# Current (❌)
public_network_access_enabled = true
# Recommended (✅)
public_network_access_enabled = false
resource "azurerm_private_endpoint" "storage" {
name = "pe-${var.storage_account_name}"
location = var.location
resource_group_name = var.resource_group_name
subnet_id = var.subnet_id
private_service_connection {
name = "psc-storage"
private_connection_resource_id = azurerm_storage_account.this.id
subresource_names = ["blob"]
is_manual_connection = false
}
}
Reference: Storage Account Private Endpoints
2. LRS Not Zone-Redundant (Reliability)
# Current (⚠️)
account_replication_type = "LRS" # Single zone
# Production (✅)
account_replication_type = "ZRS" # Zone-redundant
# OR
account_replication_type = "GZRS" # Geo-zone-redundant
Reference: Storage Redundancy
## Example: App Service Review
```markdown
## Review Request
Check App Service Terraform against Microsoft best practices.
## Overall Score: 82/100
### High Priority
**Zone Redundancy Not Enabled** (Reliability)
```hcl
# Current (⚠️)
sku_name = "P1v2" # Doesn't support zones
# Production (✅)
sku_name = "P1v3" # PremiumV3
zone_balancing_enabled = true # Zone redundancy
Impact: Improves availability from 99.95% → 99.99%
Reference: App Service Zone Redundancy
Medium Priority
Health Check Not Configured (Reliability)
# Add this (✅)
site_config {
health_check_path = "/health"
health_check_eviction_time_in_min = 5
}
## Tips and Best Practices
### 🔍 Search Query Optimization
**Good queries** (specific, actionable):
- "Cloud Adoption Framework {service} security best practices"
- "Well-Architected Framework {service} reliability zones"
- "{service} encryption private endpoint security baseline"
**Poor queries** (too broad):
- "Azure best practices" ❌
- "Cloud Adoption Framework" ❌
- "security" ❌
### 🎯 Resource-Agnostic Approach
The methodology is the same for ALL resources:
1. Call Terraform best practices (always first)
2. Search CAF for resource-specific guidance
3. Search WAF for each pillar
4. Generate report with scores and recommendations
**The queries change, the methodology doesn't!**
### ⚡ Common Gaps to Check
1. **Missing diagnostic settings** - Most common WAF gap
2. **No NSG flow logs** - Security and operational visibility
3. **Hardcoded values** - Should use variables
4. **No cost estimates** - Users need to understand spend
5. **Missing examples** - Modules need working examples
### 🎯 Validation Priority
1. **Security** - Always highest priority (breaches are expensive)
2. **Reliability** - Downtime impacts business
3. **Operational Excellence** - Reduces long-term costs
4. **Cost Optimization** - Balance with other pillars
5. **Performance Efficiency** - Optimize after stability
## Scoring Framework
**Overall Score Calculation**:
Total Score = (CAF Score × 0.4) + (WAF Score × 0.6) CAF Score = (Topology + Naming + Tagging + Organization) / 4 WAF Score = (Reliability + Security + Cost + OpEx + Performance) / 5
**Grade Interpretation**:
- **90-100%**: ✅ Excellent - Production ready
- **80-89%**: ✅ Good - Minor enhancements needed
- **70-79%**: ⚠️ Acceptable - Several improvements required
- **60-69%**: ⚠️ Needs Work - Significant gaps
- **Below 60%**: ❌ Not Recommended - Major revisions needed
## Additional Resources
- **Detailed Validation Checklists**: See [references/REFERENCE.md](references/REFERENCE.md) for resource-specific validation criteria
- **MCP Commands**: Full command syntax in [references/REFERENCE.md](references/REFERENCE.md)
- **Scoring Details**: Detailed pillar scoring in [references/REFERENCE.md](references/REFERENCE.md)
- **Common Findings**: Examples of fixes in [references/REFERENCE.md](references/REFERENCE.md)
## Integration with Other Skills
- **azure-verified-modules** - Learn implementation patterns from AVM
- **terraform-security-scan** - Deep security validation with tfsec/checkov
- **github-actions-terraform** - CI/CD pipeline best practices
## References
- [Azure Cloud Adoption Framework](https://learn.microsoft.com/azure/cloud-adoption-framework/)
- [Azure Well-Architected Framework](https://learn.microsoft.com/azure/well-architected/)
- [Hub-Spoke Network Topology](https://learn.microsoft.com/azure/architecture/networking/architecture/hub-spoke)
- [Azure Landing Zones](https://learn.microsoft.com/azure/cloud-adoption-framework/ready/landing-zone/)
- [Network Design Area](https://learn.microsoft.com/azure/cloud-adoption-framework/ready/landing-zone/design-area/network-topology-and-connectivity)