azure-architecture-review
Azure Architecture Review Skill
This skill reviews Terraform code for Azure resources against Microsoft's Cloud Adoption Framework (CAF) and Well-Architected Framework (WAF). It analyzes .tf files and reports where your infrastructure-as-code diverges from either framework.
When to Use This Skill
- Reviewing Terraform code - Check .tf files against CAF/WAF before deployment
- Pull request reviews - Validate infrastructure-as-code changes
- Architecture validation - Ensure Terraform designs follow Microsoft frameworks
- Before terraform apply - Catch compliance issues early in development
- Code refactoring - Align existing Terraform with CAF/WAF best practices
- Compliance audits - Review infrastructure-as-code for governance
Applicable to ALL Azure Resources
This skill works for:
- Networking: Virtual Networks, Firewalls, VPN Gateways, Load Balancers
- Compute: Virtual Machines, App Service, AKS, Container Instances
- Storage: Storage Accounts, Managed Disks, File Shares
- Databases: SQL Database, Cosmos DB, PostgreSQL, MySQL
- Security: Key Vault, Managed Identity, Private Endpoints
- Monitoring: Log Analytics, Application Insights, Alerts
- And more: Any Azure service with Terraform support
Prerequisites
Required MCP Tools:
- mcp_azure_mcp_documentation - Search Microsoft Learn documentation
- mcp_azure_mcp_azureterraformbestpractices - Get Terraform-specific Azure guidance
Context Needed:
- Understanding of the infrastructure being reviewed
- Business requirements (SLA, security, compliance)
- Current state (greenfield vs brownfield)
Review Methodology
Step 1: Get Terraform Best Practices (MANDATORY)
Always call this FIRST before generating or reviewing Terraform code:
# MCP Command
mcp_azure_mcp_azureterraformbestpractices
command: get
intent: "Get Azure Terraform best practices before code review"
Returns: Terraform validation workflow (validate → plan → apply), Azure-specific patterns, security defaults, naming conventions
Step 2: Search Cloud Adoption Framework Documentation
Query Pattern: "Cloud Adoption Framework {resource-type} {design-area}"
Examples:
# Networking
query: "Cloud Adoption Framework hub spoke network topology"
# Storage
query: "Cloud Adoption Framework storage account security encryption"
# Compute
query: "Cloud Adoption Framework App Service landing zone networking"
# Databases
query: "Cloud Adoption Framework Azure SQL Database security"
# Security
query: "Cloud Adoption Framework Key Vault secrets management"
Key CAF Design Areas (apply to ALL resources):
- Resource Organization - Naming, resource groups, subscriptions
- Security - Authentication, encryption, network isolation, RBAC
- Network Topology - VNet integration, private endpoints, connectivity
- Identity & Access - Managed identities, RBAC, Microsoft Entra ID (formerly Azure AD)
- Governance - Tags, Azure Policy, diagnostics, cost tracking
Step 3: Search Well-Architected Framework Documentation
Query Pattern: "Well-Architected Framework {pillar} {service-name} {concern}"
Generic Pillar Queries:
# Reliability
query: "Well-Architected Framework reliability availability zones disaster recovery"
# Security
query: "Well-Architected Framework security encryption network access control"
# Cost Optimization
query: "Well-Architected Framework cost optimization pricing SKU right-sizing"
# Operational Excellence
query: "Well-Architected Framework operational excellence monitoring diagnostics"
# Performance Efficiency
query: "Well-Architected Framework performance efficiency scalability throughput"
Service-Specific Pattern: "{Service} Well-Architected {pillar}"
# Examples
query: "Storage Account Well-Architected security encryption"
query: "App Service Well-Architected reliability availability zones"
query: "Azure SQL Database Well-Architected reliability high availability"
query: "Key Vault Well-Architected security access policies"
Step 4: Service-Specific Guidance
Search for service-specific WAF guidance:
# Pattern: "{Azure Service} Well-Architected best practices"
query: "Azure Firewall Well-Architected best practices"
query: "VPN Gateway Well-Architected reliability"
query: "Virtual Machines Well-Architected best practices"
query: "Azure Kubernetes Service Well-Architected security"
Validation Framework
Universal Checklist (ALL Resources)
CAF Validation:
- Naming convention: {type}-{workload}-{env}-{region}-{instance}
- Required tags: environment, project, owner, cost-center, managed-by
- Resource organization: appropriate subscription/resource group
- Documentation: README with usage examples
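The naming and tagging items above can be expressed directly in Terraform. The following is a minimal sketch, assuming an already configured azurerm provider; the variable names, default values, and tag values are hypothetical placeholders, not part of the skill.

```hcl
# Hypothetical inputs used only for illustration.
variable "workload" {
  type    = string
  default = "payments"
}

variable "environment" {
  type    = string
  default = "prod"
}

variable "region_short" {
  type    = string
  default = "uks"
}

variable "location" {
  type    = string
  default = "uksouth"
}

locals {
  # {workload}-{env}-{region}-{instance}; the {type} prefix is added per resource.
  name_suffix = "${var.workload}-${var.environment}-${var.region_short}-001"

  # The five tags the checklist requires; values are placeholders.
  required_tags = {
    environment   = var.environment
    project       = var.workload
    owner         = "platform-team"
    "cost-center" = "cc-0000"
    "managed-by"  = "terraform"
  }
}

resource "azurerm_resource_group" "this" {
  name     = "rg-${local.name_suffix}"
  location = var.location
  tags     = local.required_tags
}
```

Keeping the suffix and tag map in locals means a reviewer can check one place for both conventions instead of every resource block.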
WAF Reliability:
- Zone redundancy or appropriate SLA
- Backup/recovery strategy
- Health monitoring configured
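As one possible illustration of the reliability items, the sketch below combines zone-redundant replication with blob versioning and soft delete on a storage account. The account name and retention periods are assumptions chosen for the example; appropriate values depend on the workload's recovery requirements.

```hcl
variable "resource_group_name" { type = string }
variable "location" { type = string }

resource "azurerm_storage_account" "this" {
  name                     = "stexampleprod001" # hypothetical; must be globally unique
  resource_group_name      = var.resource_group_name
  location                 = var.location
  account_tier             = "Standard"
  account_replication_type = "ZRS" # zone-redundant within the region

  blob_properties {
    # Recovery options: keep prior versions and retain deleted data for a while.
    versioning_enabled = true

    delete_retention_policy {
      days = 14 # assumed retention period
    }

    container_delete_retention_policy {
      days = 14 # assumed retention period
    }
  }
}
```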
WAF Security:
- Managed identity (not a service principal with stored credentials)
- Private endpoint (no public access)
- Encryption at rest and in transit
- RBAC with least privilege
- No hardcoded secrets
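A short sketch of how three of these items (managed identity, least-privilege RBAC, no hardcoded secrets) might look in a configuration under review. The resource names and the choice of the built-in "Storage Blob Data Reader" role are assumptions for illustration; the right role depends on what the workload actually needs.

```hcl
variable "resource_group_name" { type = string }
variable "location" { type = string }

variable "storage_account_id" {
  type        = string
  description = "Resource ID of the storage account the workload reads from."
}

# Secrets arrive as sensitive inputs rather than literals in the code.
variable "sql_admin_password" {
  type      = string
  sensitive = true
}

# Managed identity: no client secret to store or rotate.
resource "azurerm_user_assigned_identity" "app" {
  name                = "id-app-prod-001" # hypothetical name
  resource_group_name = var.resource_group_name
  location            = var.location
}

# Least privilege: read-only data access, scoped to a single storage account.
resource "azurerm_role_assignment" "blob_reader" {
  scope                = var.storage_account_id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = azurerm_user_assigned_identity.app.principal_id
}
```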
WAF Cost:
- Right-sized SKU for workload
- Appropriate replication/redundancy level
- Cost tags for tracking
WAF Operations:
- Diagnostic settings enabled
- Log Analytics integration
- Infrastructure as Code (Terraform)
- Input validation
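The operations items can be sketched as follows: a validated input variable, a Log Analytics workspace, and a diagnostic setting that routes a resource's logs to it. All names are hypothetical, and the "allLogs" category group is an assumption that may not be offered by every resource type, in which case individual log categories would be listed instead.

```hcl
variable "environment" {
  type = string

  # Input validation: reject unexpected environment names at plan time.
  validation {
    condition     = contains(["dev", "test", "prod"], var.environment)
    error_message = "The environment value must be one of: dev, test, prod."
  }
}

variable "target_resource_id" { type = string }
variable "resource_group_name" { type = string }
variable "location" { type = string }

resource "azurerm_log_analytics_workspace" "this" {
  name                = "log-example-${var.environment}-001" # hypothetical name
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = "PerGB2018"
  retention_in_days   = 30
}

resource "azurerm_monitor_diagnostic_setting" "this" {
  name                       = "diag-to-log-analytics"
  target_resource_id         = var.target_resource_id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.this.id

  enabled_log {
    category_group = "allLogs" # assumed to be supported by the target resource
  }
}
```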
WAF Performance:
- SKU meets performance requirements
- Scaling configured appropriately
- Optimized for workload patterns
Note: Detailed resource-specific validation checklists are in references/REFERENCE.md
Review Output Template
Summary
- Architecture Type: [Hub-Spoke / Virtual WAN / Other]
- Environment: [Production / Non-Production]
- Overall Compliance Score: [X/100]
Framework Alignment
CAF Score: [X%] ✅ / ⚠️ / ❌
- Network Topology: [findings]
- Naming Conventions: [findings]
- Resource Organization: [findings]
- Tagging Strategy: [findings]
WAF Pillar Scores
| Pillar | Score | Status | Key Findings |
|---|---|---|---|
| Reliability | X% | ✅/⚠️/❌ | [summary] |
| Security | X% | ✅/⚠️/❌ | [summary] |
| Cost Optimization | X% | ✅/⚠️/❌ | [summary] |
| Operational Excellence | X% | ✅/⚠️/❌ | [summary] |
| Performance Efficiency | X% | ✅/⚠️/❌ | [summary] |
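The overall score in the summary combines the CAF and WAF scores using the weights given in the Scoring Framework section (40% CAF, 60% WAF). As a worked example, the arithmetic can be written as Terraform locals; the individual scores below are made-up inputs, not results from a real review.

```hcl
locals {
  # Hypothetical scores (0-100) for each CAF area and WAF pillar.
  caf_scores = {
    topology     = 100
    naming       = 90
    tagging      = 80
    organization = 90
  }

  waf_scores = {
    reliability = 70
    security    = 80
    cost        = 90
    opex        = 80
    performance = 80
  }

  # Simple averages, then the 40/60 weighting.
  caf_score   = sum(values(local.caf_scores)) / length(local.caf_scores)
  waf_score   = sum(values(local.waf_scores)) / length(local.waf_scores)
  total_score = local.caf_score * 0.4 + local.waf_score * 0.6
}

output "total_score" {
  value = local.total_score
}
```

By hand, the CAF average is 360 / 4 = 90 and the WAF average is 400 / 5 = 80, so the total is 0.4 × 90 + 0.6 × 80 = 84, which falls in the "Good" band.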
Recommended Actions
High Priority
- [Issue] - [Recommendation with code example]
Reference: [Link to Microsoft docs]
# Current (❌)
[bad code]
# Recommended (✅)
[good code]
Medium Priority
- [Enhancement] - [Recommendation]
Low Priority
- [Optional Enhancement] - [Recommendation]
Example: Storage Account Review
## Review Request
Review my storage account Terraform for CAF/WAF compliance.
## Steps Taken
1. ✅ Called `azureterraformbestpractices get`
2. ✅ Searched CAF: "Cloud Adoption Framework storage account security"
3. ✅ Searched WAF: "Storage Account Well-Architected security encryption"
## Overall Score: 75/100
### Framework Scores
- **CAF**: 85% ✅ (naming correct, tags present)
- **WAF Reliability**: 60% ⚠️ (LRS - not zone-redundant)
- **WAF Security**: 70% ⚠️ (public access enabled)
- **WAF Cost**: 90% ✅
- **WAF Operations**: 80% ✅
### Critical Findings
#### High Priority
**1. Public Access Enabled** (Security)
```hcl
# Current (❌)
public_network_access_enabled = true

# Recommended (✅)
public_network_access_enabled = false

resource "azurerm_private_endpoint" "storage" {
  name                = "pe-${var.storage_account_name}"
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_id

  private_service_connection {
    name                           = "psc-storage"
    private_connection_resource_id = azurerm_storage_account.this.id
    subresource_names              = ["blob"]
    is_manual_connection           = false
  }
}
```
Reference: Storage Account Private Endpoints

**2. LRS Not Zone-Redundant** (Reliability)
```hcl
# Current (⚠️)
account_replication_type = "LRS" # Locally redundant (single datacenter)
# Production (✅)
account_replication_type = "ZRS" # Zone-redundant
# OR
account_replication_type = "GZRS" # Geo-zone-redundant
```
Reference: Storage Redundancy

## Example: App Service Review
## Review Request
Check App Service Terraform against Microsoft best practices.
## Overall Score: 82/100
### High Priority
**Zone Redundancy Not Enabled** (Reliability)
```hcl
# Current (⚠️)
sku_name = "P1v2" # PremiumV2; zone_balancing_enabled not set
# Production (✅)
sku_name               = "P1v3" # PremiumV3
zone_balancing_enabled = true   # Zone redundancy
```
Impact: Improves availability from 99.95% → 99.99%
Reference: App Service Zone Redundancy
### Medium Priority
**Health Check Not Configured** (Reliability)
```hcl
# Add this (✅)
site_config {
  health_check_path                 = "/health"
  health_check_eviction_time_in_min = 5
}
```
## Tips and Best Practices
### 🔍 Search Query Optimization
**Good queries** (specific, actionable):
- "Cloud Adoption Framework {service} security best practices"
- "Well-Architected Framework {service} reliability zones"
- "{service} encryption private endpoint security baseline"
**Poor queries** (too broad):
- "Azure best practices" ❌
- "Cloud Adoption Framework" ❌
- "security" ❌
### 🎯 Resource-Agnostic Approach
The methodology is the same for ALL resources:
1. Call Terraform best practices (always first)
2. Search CAF for resource-specific guidance
3. Search WAF for each pillar
4. Generate report with scores and recommendations
**The queries change, the methodology doesn't!**
### ⚡ Common Gaps to Check
1. **Missing diagnostic settings** - Most common WAF gap
2. **No NSG flow logs** - Security and operational visibility
3. **Hardcoded values** - Should use variables
4. **No cost estimates** - Users need to understand spend
5. **Missing examples** - Modules need working examples
### 🎯 Validation Priority
1. **Security** - Always highest priority (breaches are expensive)
2. **Reliability** - Downtime impacts business
3. **Operational Excellence** - Reduces long-term costs
4. **Cost Optimization** - Balance with other pillars
5. **Performance Efficiency** - Optimize after stability
## Scoring Framework
**Overall Score Calculation**:
- Total Score = (CAF Score × 0.4) + (WAF Score × 0.6)
- CAF Score = (Topology + Naming + Tagging + Organization) / 4
- WAF Score = (Reliability + Security + Cost + OpEx + Performance) / 5
**Grade Interpretation**:
- **90-100%**: ✅ Excellent - Production ready
- **80-89%**: ✅ Good - Minor enhancements needed
- **70-79%**: ⚠️ Acceptable - Several improvements required
- **60-69%**: ⚠️ Needs Work - Significant gaps
- **Below 60%**: ❌ Not Recommended - Major revisions needed
## Additional Resources
- **Detailed Validation Checklists**: See [references/REFERENCE.md](references/REFERENCE.md) for resource-specific validation criteria
- **MCP Commands**: Full command syntax in [references/REFERENCE.md](references/REFERENCE.md)
- **Scoring Details**: Detailed pillar scoring in [references/REFERENCE.md](references/REFERENCE.md)
- **Common Findings**: Examples of fixes in [references/REFERENCE.md](references/REFERENCE.md)
## Integration with Other Skills
- **azure-verified-modules** - Learn implementation patterns from AVM
- **terraform-security-scan** - Deep security validation with tfsec/checkov
- **github-actions-terraform** - CI/CD pipeline best practices
## References
- [Azure Cloud Adoption Framework](https://learn.microsoft.com/azure/cloud-adoption-framework/)
- [Azure Well-Architected Framework](https://learn.microsoft.com/azure/well-architected/)
- [Hub-Spoke Network Topology](https://learn.microsoft.com/azure/architecture/networking/architecture/hub-spoke)
- [Azure Landing Zones](https://learn.microsoft.com/azure/cloud-adoption-framework/ready/landing-zone/)
- [Network Design Area](https://learn.microsoft.com/azure/cloud-adoption-framework/ready/landing-zone/design-area/network-topology-and-connectivity)