AWS Solution Architect

Design scalable, cost-effective AWS architectures for startups with infrastructure-as-code templates.

Trigger Terms
Workflow
Tools
Quick Start
Input Requirements
Output Formats

Trigger Terms

Use this skill when you encounter:

Category	Terms
Architecture Design	serverless architecture, AWS architecture, cloud design, microservices, three-tier
IaC Generation	CloudFormation, CDK, Terraform, infrastructure as code, deploy template
Serverless	Lambda, API Gateway, DynamoDB, Step Functions, EventBridge, AppSync
Containers	ECS, Fargate, EKS, container orchestration, Docker on AWS
Cost Optimization	reduce AWS costs, optimize spending, right-sizing, Savings Plans
Database	Aurora, RDS, DynamoDB design, database migration, data modeling
Security	IAM policies, VPC design, encryption, Cognito, WAF
CI/CD	CodePipeline, CodeBuild, CodeDeploy, GitHub Actions AWS
Monitoring	CloudWatch, X-Ray, observability, alarms, dashboards
Migration	migrate to AWS, lift and shift, replatform, DMS

Workflow

Step 1: Gather Requirements

Collect application specifications:

- Application type (web app, mobile backend, data pipeline, SaaS)
- Expected users and requests per second
- Budget constraints (monthly spend limit)
- Team size and AWS experience level
- Compliance requirements (GDPR, HIPAA, SOC 2)
- Availability requirements (SLA, RPO/RTO)
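An uptime SLA is easier to weigh against RPO/RTO when you express it as allowed downtime. Below is a minimal sketch, assuming a 30-day month. The helper name `allowed_downtime_minutes` is hypothetical and is not part of the bundled scripts.

```python
def allowed_downtime_minutes(sla_percent: float, days: float = 30.0) -> float:
    """Return the downtime budget, in minutes, for an uptime SLA over `days` days."""
    if not 0.0 < sla_percent <= 100.0:
        raise ValueError("sla_percent must be in (0, 100]")
    total_minutes = days * 24 * 60
    return total_minutes * (1.0 - sla_percent / 100.0)


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.95, 99.99):
        print(f"{sla}% uptime -> {allowed_downtime_minutes(sla):.1f} min per 30 days")
```

At 99.9% the budget is about 43 minutes per 30 days. Compare that figure with the RTO you collect in this same step.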

Step 2: Design Architecture

Run the architecture designer to get pattern recommendations:

python scripts/architecture_designer.py --input requirements.json

Select from recommended patterns:

Serverless Web: S3 + CloudFront + API Gateway + Lambda + DynamoDB
Event-Driven Microservices: EventBridge + Lambda + SQS + Step Functions
Three-Tier: ALB + ECS Fargate + Aurora + ElastiCache
GraphQL Backend: AppSync + Lambda + DynamoDB + Cognito

See references/architecture_patterns.md for detailed pattern specifications.

Step 3: Generate IaC Templates

Create infrastructure-as-code for the selected pattern:

# Serverless stack (CloudFormation)
python scripts/serverless_stack.py --app-name my-app --region us-east-1

# Output: CloudFormation YAML template ready to deploy

Step 4: Review Costs

Analyze estimated costs and optimization opportunities:

python scripts/cost_optimizer.py --resources current_setup.json --monthly-spend 2000

Output includes:

Monthly cost breakdown by service
Right-sizing recommendations
Savings Plans opportunities
Potential monthly savings

Step 5: Deploy

Deploy the generated infrastructure:

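# CloudFormation (CAPABILITY_AUTO_EXPAND is needed when the template uses the SAM transform;
# use CAPABILITY_NAMED_IAM instead of CAPABILITY_IAM if IAM resources have explicit names)
aws cloudformation create-stack \
  --stack-name my-app-stack \
  --template-body file://template.yaml \
  --capabilities CAPABILITY_IAM CAPABILITY_AUTO_EXPAND

# CDK (run `cdk bootstrap` once per account/region before the first deploy)
cdk deploy

# Terraform
terraform init && terraform apply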
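`create-stack` returns as soon as creation starts, not when it finishes. From the shell, `aws cloudformation wait stack-create-complete --stack-name my-app-stack` blocks until the stack is done.

If you drive deployment from Python instead, the minimal polling sketch below works. It assumes a callable with the shape of boto3's CloudFormation `describe_stacks`. The helper `wait_for_stack` and its status sets are illustrative and cover the common terminal states, not every possible one.

```python
import time
from typing import Callable

# Terminal states; statuses ending in _IN_PROGRESS mean the operation is still running.
SUCCESS_STATES = {"CREATE_COMPLETE", "UPDATE_COMPLETE"}
FAILURE_STATES = {"CREATE_FAILED", "ROLLBACK_COMPLETE", "ROLLBACK_FAILED",
                  "UPDATE_ROLLBACK_COMPLETE", "UPDATE_ROLLBACK_FAILED", "DELETE_FAILED"}


def wait_for_stack(describe_stacks: Callable[..., dict], stack_name: str,
                   poll_seconds: float = 15.0, max_polls: int = 120,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the stack reaches a terminal state and return the success status."""
    for _ in range(max_polls):
        response = describe_stacks(StackName=stack_name)
        status = response["Stacks"][0]["StackStatus"]
        if status in SUCCESS_STATES:
            return status
        if status in FAILURE_STATES:
            raise RuntimeError(f"{stack_name} ended in {status}")
        sleep(poll_seconds)
    raise TimeoutError(f"{stack_name} still in progress after {max_polls} polls")
```

With boto3 installed and credentials configured, call it as `wait_for_stack(boto3.client("cloudformation").describe_stacks, "my-app-stack")`. The injectable `sleep` keeps the function testable without real delays.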

Step 6: Validate

Verify deployment and set up monitoring:

# Check stack status
aws cloudformation describe-stacks --stack-name my-app-stack

# Set up CloudWatch alarms
aws cloudwatch put-metric-alarm --alarm-name high-errors ...
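The alarm command above is abbreviated. As one illustration, not the skill's generated configuration, the Python sketch below builds keyword arguments for boto3's CloudWatch `put_metric_alarm` on a Lambda function's `Errors` metric. The function name, threshold, and SNS topic ARN are placeholder assumptions, and `lambda_error_alarm_params` is a hypothetical helper.

```python
def lambda_error_alarm_params(function_name: str, threshold: int,
                              topic_arn: str, period_seconds: int = 300) -> dict:
    """Build kwargs for CloudWatch put_metric_alarm on the AWS/Lambda Errors metric."""
    return {
        "AlarmName": f"{function_name}-high-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",                 # total errors per period
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching", # no invocations should not page anyone
        "AlarmActions": [topic_arn],
    }

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(
#       **lambda_error_alarm_params("my-fn", 5, "arn:aws:sns:us-east-1:123456789012:alerts"))
```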

Tools

architecture_designer.py

Generates architecture patterns based on requirements.

python scripts/architecture_designer.py --input requirements.json --output design.json

Input: JSON with app type, scale, budget, compliance needs
Output: Recommended pattern, service stack, cost estimate, pros/cons

serverless_stack.py

Creates serverless CloudFormation templates.

python scripts/serverless_stack.py --app-name my-app --region us-east-1

Output: Production-ready CloudFormation YAML with:

API Gateway + Lambda
DynamoDB table
Cognito user pool
IAM roles with least privilege
CloudWatch logging
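The sketch below illustrates what "least privilege" typically means for the Lambda-to-DynamoDB path: item-level actions only, scoped to one table and its indexes. The actions the generated template actually grants may differ. The region, account ID, and table name are placeholders, and `dynamodb_crud_policy` is a hypothetical helper.

```python
import json


def dynamodb_crud_policy(region: str, account_id: str, table_name: str) -> dict:
    """Return an IAM policy allowing item-level CRUD on one table and its indexes."""
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Explicit actions only; no dynamodb:* wildcard.
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                    "dynamodb:DeleteItem",
                    "dynamodb:Query",
                ],
                # Scope to the table and its secondary indexes, nothing else.
                "Resource": [table_arn, f"{table_arn}/index/*"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(dynamodb_crud_policy("us-east-1", "123456789012", "my-app-table"), indent=2))
```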

cost_optimizer.py

Analyzes costs and recommends optimizations.

python scripts/cost_optimizer.py --resources inventory.json --monthly-spend 5000

Output: Recommendations for:

Idle resource removal
Instance right-sizing
Reserved capacity purchases
Storage tier transitions
NAT Gateway alternatives
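NAT Gateway cost has two parts: an hourly charge per gateway and a per-GB data-processing charge. S3 gateway VPC endpoints have neither charge, which is why they are the usual first alternative.

The sketch below estimates both parts. The $0.045 rates are the us-east-1 list prices at the time of writing and the traffic volumes are invented, so substitute your region's current pricing and your own numbers. `nat_gateway_monthly_cost` is a hypothetical helper, not the optimizer's internal formula.

```python
HOURS_PER_MONTH = 730.0  # AWS's usual monthly-hours convention


def nat_gateway_monthly_cost(gateways: int, gb_processed: float,
                             hourly_rate: float, per_gb_rate: float) -> float:
    """Hourly charge for each gateway plus the per-GB data processing charge."""
    return gateways * HOURS_PER_MONTH * hourly_rate + gb_processed * per_gb_rate


if __name__ == "__main__":
    # Assumed rates; check current regional pricing before relying on the numbers.
    hourly, per_gb = 0.045, 0.045
    before = nat_gateway_monthly_cost(2, 1000, hourly, per_gb)
    # Hypothetical change: one NAT Gateway, 800 GB of S3 traffic moved to a gateway endpoint.
    after = nat_gateway_monthly_cost(1, 200, hourly, per_gb)
    print(f"before ${before:.2f}, after ${after:.2f}, saves ${before - after:.2f}/month")
```

Dropping to a single NAT Gateway gives up per-AZ redundancy. That trade-off is what the `multi_az_required` inventory flag captures.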

Quick Start

MVP Architecture (< $100/month)

Ask: "Design a serverless MVP backend for a mobile app with 1000 users"

Result:
- Lambda + API Gateway for API
- DynamoDB pay-per-request for data
- Cognito for authentication
- S3 + CloudFront for static assets
- Estimated: $20-50/month

Scaling Architecture ($500-2000/month)

Ask: "Design a scalable architecture for a SaaS platform with 50k users"

Result:
- ECS Fargate for containerized API
- Aurora Serverless for relational data
- ElastiCache for session caching
- CloudFront for CDN
- CodePipeline for CI/CD
- Multi-AZ deployment

Cost Optimization

Ask: "Optimize my AWS setup to reduce costs by 30%. Current spend: $3000/month"

Provide: Current resource inventory (EC2, RDS, S3, etc.)

Result:
- Idle resource identification
- Right-sizing recommendations
- Savings Plans analysis
- Storage lifecycle policies
- Target savings: $900/month

IaC Generation

Ask: "Generate CloudFormation for a three-tier web app with auto-scaling"

Result:
- VPC with public/private subnets
- ALB with HTTPS
- ECS Fargate with auto-scaling
- Aurora with read replicas
- Security groups and IAM roles

Input Requirements

Provide these details for architecture design:

Requirement	Description	Example
Application type	What you're building	SaaS platform, mobile backend
Expected scale	Users, requests/sec	10k users, 100 RPS
Budget	Monthly AWS limit	$500/month max
Team context	Size, AWS experience	3 devs, intermediate
Compliance	Regulatory needs	HIPAA, GDPR, SOC 2
Availability	Uptime requirements	99.9% SLA, 1hr RPO

JSON Format:

{
  "application_type": "saas_platform",
  "expected_users": 10000,
  "requests_per_second": 100,
  "budget_monthly_usd": 500,
  "team_size": 3,
  "aws_experience": "intermediate",
  "compliance": ["SOC2"],
  "availability_sla": "99.9%"
}

Output Formats

Architecture Design

Pattern recommendation with rationale
Service stack diagram (ASCII)
Configuration specifications
Monthly cost estimate
Scaling characteristics
Trade-offs and limitations

IaC Templates

CloudFormation YAML: Production-ready SAM/CFN templates
CDK TypeScript: Type-safe infrastructure code
Terraform HCL: AWS provider configs for teams that standardize on Terraform (the generated resources are AWS-specific)

Cost Analysis

Current spend breakdown
Optimization recommendations with savings
Priority action list (high/medium/low)
Implementation checklist

Reference Documentation

Document	Contents
`references/architecture_patterns.md`	6 patterns: serverless, microservices, three-tier, data processing, GraphQL, multi-region
`references/service_selection.md`	Decision matrices for compute, database, storage, messaging
`references/best_practices.md`	Serverless design, cost optimization, security hardening, scalability

Limitations

Lambda: 15-minute execution, 10GB memory max
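API Gateway (REST): 29-second default integration timeout, 10MB payload; synchronous Lambda responses are further capped at 6MB
DynamoDB: 400KB item size; reads are eventually consistent unless strongly consistent reads are requested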
Regional availability varies by service
Some services have AWS-specific lock-in
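You can check a planned function configuration against the limits above before writing any IaC. Below is a minimal sketch using only the documented numbers plus Lambda's 128 MB memory floor. `FunctionPlan` and `limit_violations` are hypothetical names.

```python
from dataclasses import dataclass

LAMBDA_MAX_TIMEOUT_S = 900      # 15 minutes
LAMBDA_MIN_MEMORY_MB = 128
LAMBDA_MAX_MEMORY_MB = 10_240   # 10 GB
API_GATEWAY_TIMEOUT_S = 29      # default REST API integration timeout


@dataclass
class FunctionPlan:
    name: str
    timeout_s: int
    memory_mb: int
    behind_api_gateway: bool


def limit_violations(plan: FunctionPlan) -> list[str]:
    """Return human-readable problems with a planned Lambda configuration."""
    problems = []
    if plan.timeout_s > LAMBDA_MAX_TIMEOUT_S:
        problems.append(f"{plan.name}: timeout exceeds Lambda's 15-minute maximum")
    if not LAMBDA_MIN_MEMORY_MB <= plan.memory_mb <= LAMBDA_MAX_MEMORY_MB:
        problems.append(f"{plan.name}: memory must be 128-10240 MB")
    if plan.behind_api_gateway and plan.timeout_s > API_GATEWAY_TIMEOUT_S:
        problems.append(f"{plan.name}: API Gateway times out at 29 s first; "
                        "consider Step Functions, SQS, or async invocation")
    return problems
```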

Troubleshooting

Problem	Cause	Solution
Lambda cold starts exceed 500ms	Large deployment package, heavy initialization code, or no provisioned concurrency on latency-sensitive functions (VPC attachment adds far less overhead since Lambda's 2019 VPC networking improvements)	Reduce deployment package size and initialization work, enable provisioned concurrency for latency-sensitive endpoints, or move to Fargate for consistent performance
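CloudFormation stack stuck in `ROLLBACK_FAILED` or `UPDATE_ROLLBACK_FAILED`	A resource failed mid-deploy and the rollback also failed (e.g., a non-empty S3 bucket cannot be deleted)	Check CloudFormation events for the root cause and fix or remove the blocking resource; then delete the stack (failed create) or run `aws cloudformation continue-update-rollback`, optionally with `--resources-to-skip` (failed update); use `DeletionPolicy: Retain` for stateful resources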
Monthly AWS bill significantly exceeds estimate	Untagged resources, forgotten dev/staging environments, or NAT Gateway data transfer costs	Enable Cost Explorer, set up AWS Budgets with 50%/80%/100% alerts, run `cost_optimizer.py` against current inventory, and audit resources with missing tags
DynamoDB throttling errors (ProvisionedThroughputExceededException)	Read/write capacity insufficient for traffic spikes, or hot partition key	Switch to on-demand billing mode, redesign partition key for even distribution, or enable DynamoDB Auto Scaling with appropriate min/max settings
API Gateway returns 504 Gateway Timeout	Backend Lambda or integration exceeds the 29-second API Gateway limit	Optimize Lambda execution time, offload long tasks to Step Functions or SQS, increase Lambda memory (which also increases CPU), or use asynchronous invocation patterns
Cross-region replication lag causes stale reads	DynamoDB Global Tables or Aurora Global Database replication latency under heavy write load	Design for eventual consistency, route reads to the write-primary region for strong consistency, or use conflict resolution strategies documented in `references/architecture_patterns.md`
IAM permission denied errors after deployment	Least-privilege policies missing required actions, or trust policy not updated for new services	Review CloudTrail logs for denied API calls, add the specific missing actions to the IAM policy, and validate with IAM Policy Simulator before deploying
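The 50%/80%/100% budget alerts from the billing row above can be expressed as data for the AWS Budgets `create_budget` call. In the sketch below, `budget_notifications` is a hypothetical helper. The account ID, budget name, amount, and email address in the usage comment are placeholders.

```python
def budget_notifications(email: str, thresholds=(50.0, 80.0, 100.0)) -> list[dict]:
    """Build NotificationsWithSubscribers entries: one email alert per % of budget."""
    return [
        {
            "Notification": {
                "NotificationType": "ACTUAL",          # alert on actual, not forecasted, spend
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": float(pct),
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
        }
        for pct in thresholds
    ]


# Usage with boto3 (needs credentials; all identifiers below are placeholders):
#   import boto3
#   boto3.client("budgets").create_budget(
#       AccountId="123456789012",
#       Budget={"BudgetName": "monthly-aws", "BudgetLimit": {"Amount": "500", "Unit": "USD"},
#               "TimeUnit": "MONTHLY", "BudgetType": "COST"},
#       NotificationsWithSubscribers=budget_notifications("ops@example.com"),
#   )
```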

Success Criteria

Cost accuracy: Monthly AWS bill stays within 10% of the monthly cost estimate produced by architecture_designer.py.
Availability: Production workloads meet or exceed the target SLA (99.9% uptime for three-tier, 99.95% for multi-region).
Recovery time: RTO under 4 hours and RPO under 1 hour for all production architectures with disaster recovery configured.
Deployment speed: Infrastructure provisioned from generated IaC templates in under 30 minutes for serverless stacks and under 60 minutes for three-tier stacks.
Security posture: Zero critical findings in AWS Security Hub within 30 days of deployment; all resources encrypted at rest and in transit.
Scaling response: Auto-scaling responds to traffic spikes within 2 minutes, handling 10x baseline load without manual intervention.
Operational overhead: Team spends less than 4 hours per week on infrastructure operations after initial deployment.

Scope & Limitations

This skill covers:

AWS architecture design for startups and growth-stage companies (serverless, three-tier, microservices, data pipelines, IoT, multi-region patterns)
Infrastructure-as-code generation for CloudFormation (SAM), CDK (TypeScript), and Terraform (HCL)
Cost analysis, right-sizing recommendations, and Savings Plans evaluation
Service selection guidance for compute, database, storage, networking, and security

This skill does NOT cover:

Multi-cloud or hybrid-cloud architectures (Azure, GCP) -- see engineering/cloud-migration-specialist/ for cross-cloud strategies
Application-level code, business logic, or framework-specific implementation -- see engineering/senior-fullstack/ for fullstack development
Compliance audit execution or regulatory evidence collection -- see ra-qm-team/ for SOC 2, HIPAA, GDPR, and ISO compliance skills
AWS account management, organization policies, or billing disputes -- see AWS Support or engineering/ms365-tenant-manager/ for tenant administration patterns

Integration Points

Skill	Integration	Data Flow
`engineering/senior-devops`	CI/CD pipeline configuration for deploying generated IaC templates	Architecture templates flow into DevOps deployment pipelines and monitoring setup
`engineering/senior-secops`	Security hardening of generated architectures (IAM policies, WAF rules, GuardDuty)	Architecture design feeds into security review; SecOps findings feed back as architecture constraints
`ra-qm-team/soc2-compliance`	Compliance validation of AWS architectures against SOC 2 Trust Services Criteria	Architecture resource inventory feeds into compliance audit; audit findings drive architecture changes
`engineering/senior-backend`	Backend service implementation that runs on the designed AWS infrastructure	Architecture patterns define the runtime environment; backend requirements inform service selection
`engineering/tech-stack-evaluator`	Technology selection decisions that influence architecture pattern choice	Stack evaluation outputs (database, compute, messaging choices) feed into architecture requirements JSON
`c-level-advisor/cto-advisor`	Strategic infrastructure decisions, build-vs-buy, and cloud budget planning	Cost analysis from `cost_optimizer.py` informs CTO budget decisions; CTO constraints flow back as architecture requirements

Tool Reference

architecture_designer.py

Purpose: Generates architecture pattern recommendations based on application requirements. Analyzes app type, expected scale, budget, team experience, and compliance needs to recommend an AWS architecture pattern with full service configurations and cost estimates.

Usage:

from scripts.architecture_designer import ArchitectureDesigner

designer = ArchitectureDesigner(requirements)
pattern = designer.recommend_architecture_pattern()
checklist = designer.generate_service_checklist()

Constructor Parameters:

Parameter	Type	Required	Default	Description
`requirements`	`dict`	Yes	--	Dictionary containing all application requirements (see fields below)

Requirements Dictionary Fields:

Field	Type	Default	Description
`application_type`	`str`	`"web_application"`	One of: `web_application`, `mobile_backend`, `data_pipeline`, `microservices`, `saas_platform`, `iot_platform`
`expected_users`	`int`	`1000`	Expected number of users (or devices for IoT)
`requests_per_second`	`int`	`10`	Expected peak requests per second
`budget_monthly_usd`	`float`	`500`	Maximum monthly AWS budget in USD
`team_size`	`int`	`3`	Number of engineers on the team
`aws_experience`	`str`	`"beginner"`	Team AWS experience level
`compliance`	`list`	`[]`	List of compliance frameworks (e.g., `["SOC2", "HIPAA"]`)
`data_size_gb`	`int`	`10`	Expected data volume in GB
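The designer applies the defaults above itself. To preview the merged input before running it, here is a minimal sketch that overlays a requirements file on the documented defaults. `merge_requirements` and `load_requirements` are hypothetical helpers. Extra keys such as `availability_sla` from the Input Requirements JSON pass through untouched.

```python
import json
from pathlib import Path

# Defaults copied from the Requirements Dictionary Fields table.
REQUIREMENT_DEFAULTS = {
    "application_type": "web_application",
    "expected_users": 1000,
    "requests_per_second": 10,
    "budget_monthly_usd": 500,
    "team_size": 3,
    "aws_experience": "beginner",
    "compliance": [],
    "data_size_gb": 10,
}

APPLICATION_TYPES = {"web_application", "mobile_backend", "data_pipeline",
                     "microservices", "saas_platform", "iot_platform"}


def merge_requirements(raw: dict) -> dict:
    """Overlay user-supplied fields on the documented defaults and validate the app type."""
    merged = {**REQUIREMENT_DEFAULTS, **raw}
    merged["compliance"] = list(merged["compliance"])  # fresh list, never the shared default
    if merged["application_type"] not in APPLICATION_TYPES:
        raise ValueError(f"unknown application_type: {merged['application_type']!r}")
    return merged


def load_requirements(path: str) -> dict:
    """Read a requirements JSON file (such as requirements.json) and merge defaults."""
    return merge_requirements(json.loads(Path(path).read_text()))
```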

Methods:

Method	Returns	Description
`recommend_architecture_pattern()`	`dict`	Returns recommended pattern with services, cost estimate, pros/cons, and scaling characteristics
`generate_service_checklist()`	`list[dict]`	Returns phased implementation checklist (Planning, Foundation, Core Services, Security, Monitoring, CI/CD)

Example:

from scripts.architecture_designer import ArchitectureDesigner

requirements = {
    "application_type": "saas_platform",
    "expected_users": 10000,
    "requests_per_second": 100,
    "budget_monthly_usd": 500,
    "team_size": 3,
    "aws_experience": "intermediate",
    "compliance": ["SOC2"],
    "data_size_gb": 50
}

designer = ArchitectureDesigner(requirements)
result = designer.recommend_architecture_pattern()
print(result['pattern_name'])       # e.g. "Serverless Web Application"; depends on scale and budget
print(result['estimated_cost'])     # {"monthly_usd": ..., "breakdown": {...}}
print(result['services'])           # Full service stack with configurations

Output Format: Returns a dictionary with keys: pattern_name, description, use_case, services (nested service configurations), estimated_cost (with monthly_usd and breakdown), pros, cons, and scaling_characteristics.

Supported Patterns:

Serverless Web Application (< 10k users)
Modern Three-Tier Application (10k-100k users)
Multi-Region High Availability (100k+ users)
Serverless Mobile Backend (mobile app type)
Event-Driven Microservices (microservices type)
Real-Time Data Pipeline (data pipeline type)
IoT Platform (IoT type)
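The list above suggests that application type is decided first and user count second. The sketch below is a hypothetical reading of that list, not the script's actual logic, which may also weigh budget, request rate, and compliance. `sketch_pattern` is an illustrative name.

```python
# Application types with a dedicated pattern, per the Supported Patterns list.
TYPE_PATTERNS = {
    "mobile_backend": "Serverless Mobile Backend",
    "microservices": "Event-Driven Microservices",
    "data_pipeline": "Real-Time Data Pipeline",
    "iot_platform": "IoT Platform",
}


def sketch_pattern(application_type: str, expected_users: int) -> str:
    """Pick a pattern name using only app type and the listed user-count bands."""
    if application_type in TYPE_PATTERNS:
        return TYPE_PATTERNS[application_type]
    # web_application / saas_platform: choose by user-count band.
    if expected_users < 10_000:
        return "Serverless Web Application"
    if expected_users < 100_000:
        return "Modern Three-Tier Application"
    return "Multi-Region High Availability"
```

Under this reading, the 50k-user SaaS prompt in Quick Start lands on the three-tier pattern, which matches the ECS Fargate and Aurora stack shown there.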

serverless_stack.py

Purpose: Generates production-ready infrastructure-as-code templates for serverless applications. Produces CloudFormation (SAM), CDK (TypeScript), and Terraform (HCL) configurations with API Gateway, Lambda, DynamoDB, Cognito, IAM roles, and CloudWatch logging preconfigured.

Usage:

from scripts.serverless_stack import ServerlessStackGenerator

generator = ServerlessStackGenerator(app_name, requirements)
cfn_template = generator.generate_cloudformation_template()
cdk_stack = generator.generate_cdk_stack()
terraform_config = generator.generate_terraform_configuration()

Constructor Parameters:

Parameter	Type	Required	Default	Description
`app_name`	`str`	Yes	--	Application name (used for resource naming; auto-lowercased, spaces replaced with hyphens)
`requirements`	`dict`	Yes	--	Dictionary with deployment requirements (see fields below)
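The table says `app_name` is only lowercased, with spaces replaced by hyphens. Taking that description at face value, other characters would pass through unchanged. CloudFormation stack names must start with a letter, use only letters, digits, and hyphens, and be at most 128 characters, so a check after normalizing is worthwhile. `normalize_app_name` and `usable_as_stack_name` are hypothetical helpers.

```python
import re

# CloudFormation stack names: start with a letter; letters, digits, hyphens; max 128 chars.
STACK_NAME_RE = re.compile(r"[A-Za-z][-A-Za-z0-9]{0,127}")


def normalize_app_name(app_name: str) -> str:
    """Lowercase and replace spaces with hyphens, as the constructor table describes."""
    return app_name.lower().replace(" ", "-")


def usable_as_stack_name(name: str) -> bool:
    """True if `name` satisfies CloudFormation's stack-name pattern."""
    return STACK_NAME_RE.fullmatch(name) is not None
```

For example, `"my_app v2"` normalizes to `"my_app-v2"`, which still fails the check because of the underscore.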

Requirements Dictionary Fields:

Field	Type	Default	Description
`region`	`str`	`"us-east-1"`	AWS region for deployment

Methods:

Method	Returns	Description
`generate_cloudformation_template()`	`str`	YAML CloudFormation/SAM template with DynamoDB, Lambda, API Gateway, Cognito, IAM, and CloudWatch
`generate_cdk_stack()`	`str`	TypeScript CDK stack with equivalent resources
`generate_terraform_configuration()`	`str`	Terraform HCL configuration with equivalent resources

Example:

import os

from scripts.serverless_stack import ServerlessStackGenerator

generator = ServerlessStackGenerator("my-saas-app", {"region": "us-west-2"})

# Generate CloudFormation template
cfn = generator.generate_cloudformation_template()
with open("template.yaml", "w") as f:
    f.write(cfn)

# Generate CDK stack (create lib/ first; open() does not create directories)
cdk = generator.generate_cdk_stack()
os.makedirs("lib", exist_ok=True)
with open("lib/stack.ts", "w") as f:
    f.write(cdk)

# Generate Terraform config
tf = generator.generate_terraform_configuration()
with open("main.tf", "w") as f:
    f.write(tf)

Output Format: Each method returns a string containing the full IaC template. Templates include: DynamoDB table (single-table design with PK/SK), Lambda function (Node.js 18.x, 512 MB, 10s timeout), API Gateway (REST, Cognito auth, CORS, throttling), Cognito User Pool (email sign-in, optional MFA), IAM roles (least privilege), and CloudWatch log group (7-day retention). All templates output: API URL, User Pool ID, User Pool Client ID, and Table Name. Node.js 18 reached upstream end-of-life in April 2025 and is being retired as a Lambda runtime, so update the generated runtime (for example to nodejs22.x) before deploying.
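In a single-table design, the generic `PK`/`SK` attributes carry entity-prefixed values, so related items share a partition and sort together. The key convention below (`USER#`, `PROFILE`, `ORDER#<date>#<id>`) is a common pattern chosen for illustration. It is not necessarily what the generated Lambda code uses. `query_local` is a local stand-in for a DynamoDB `Query` with `KeyConditionExpression="PK = :pk AND begins_with(SK, :prefix)"`, and it assumes ASCII keys so Python's string ordering matches DynamoDB's.

```python
def user_profile_key(user_id: str) -> dict:
    """Key for a user's profile item (hypothetical USER#/PROFILE convention)."""
    return {"PK": f"USER#{user_id}", "SK": "PROFILE"}


def order_key(user_id: str, order_date: str, order_id: str) -> dict:
    """Orders share the user's partition; ISO-8601 dates make SK sort by date."""
    return {"PK": f"USER#{user_id}", "SK": f"ORDER#{order_date}#{order_id}"}


def query_local(items: list[dict], pk: str, sk_prefix: str) -> list[dict]:
    """Mimic Query on PK plus begins_with(SK): one partition, results in SK order."""
    matches = [i for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    return sorted(matches, key=lambda i: i["SK"])
```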

cost_optimizer.py

Purpose: Analyzes current AWS resource inventory and spending to generate prioritized cost optimization recommendations. Evaluates compute (EC2, Lambda), storage (S3), databases (RDS, DynamoDB), networking (NAT Gateway, VPC endpoints), and general optimizations (CloudWatch Logs, Elastic IPs, budget alerts).

Usage:

from scripts.cost_optimizer import CostOptimizer

optimizer = CostOptimizer(current_resources, monthly_spend)
analysis = optimizer.analyze_and_optimize()
checklist = optimizer.generate_optimization_checklist()

Constructor Parameters:

Parameter	Type	Required	Default	Description
`current_resources`	`dict`	Yes	--	Dictionary describing current AWS resources (see fields below)
`monthly_spend`	`float`	Yes	--	Current monthly AWS spend in USD

Resources Dictionary Fields:

Field	Type	Description
`ec2_instances`	`list[dict]`	EC2 instances with `cpu_utilization` (%), `pricing` (`"on-demand"` or `"reserved"`)
`lambda_functions`	`list[dict]`	Lambda functions with `memory_mb`, `avg_memory_used_mb`
`s3_buckets`	`list[dict]`	S3 buckets with `name`, `size_gb`, `storage_class`, `has_lifecycle_policy` (bool)
`rds_instances`	`list[dict]`	RDS instances with `name`, `connections_per_day`, `monthly_cost`, `engine`, `utilization` (%)
`dynamodb_tables`	`list[dict]`	DynamoDB tables with `name`, `billing_mode`, `read_capacity_units`, `write_capacity_units`, `utilization_percentage`
`nat_gateways`	`list[dict]`	NAT Gateway resources
`multi_az_required`	`bool`	Whether multi-AZ NAT is required
`vpc_endpoints`	`list`	Existing VPC endpoints
`s3_data_transfer_gb`	`float`	Monthly S3 data transfer volume in GB
`cloudwatch_log_groups`	`list[dict]`	Log groups with `name`, `retention_days` (`-1` for never expire), `size_gb`
`elastic_ips`	`list[dict]`	Elastic IPs with `attached` (bool)
`has_budget_alerts`	`bool`	Whether AWS Budgets are configured
`has_cost_explorer`	`bool`	Whether Cost Explorer is enabled
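Before running the full optimizer, you can scan an inventory of this shape for obvious findings. The sketch below is hypothetical: the 10% idle-CPU threshold and the wording of each flag are assumptions, and `quick_flags` is not part of `cost_optimizer.py`.

```python
def quick_flags(resources: dict, idle_cpu_pct: float = 10.0) -> list[str]:
    """Surface obvious cost findings from an inventory dict shaped like the table above."""
    flags = []
    for i, inst in enumerate(resources.get("ec2_instances", [])):
        cpu = inst.get("cpu_utilization")
        if cpu is not None and cpu < idle_cpu_pct:
            flags.append(f"ec2[{i}]: {cpu}% CPU, stop or downsize candidate")
        elif inst.get("pricing") == "on-demand":
            flags.append(f"ec2[{i}]: on-demand, check Savings Plans coverage")
    for bucket in resources.get("s3_buckets", []):
        if not bucket.get("has_lifecycle_policy", False):
            flags.append(f"s3 {bucket['name']}: no lifecycle policy")
    for group in resources.get("cloudwatch_log_groups", []):
        if group.get("retention_days") == -1:
            flags.append(f"logs {group['name']}: retention never expires")
    unattached = sum(1 for eip in resources.get("elastic_ips", []) if not eip.get("attached", True))
    if unattached:
        flags.append(f"{unattached} unattached Elastic IP(s)")
    if not resources.get("has_budget_alerts", False):
        flags.append("no AWS Budgets alerts configured")
    return flags
```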

Methods:

Method	Returns	Description
`analyze_and_optimize()`	`dict`	Full cost analysis with current spend, potential savings, optimized spend, savings percentage, recommendations list, and top 5 priority actions
`generate_optimization_checklist()`	`list[dict]`	Phased action checklist: Immediate (today), This Week, This Month, Ongoing

Example:

from scripts.cost_optimizer import CostOptimizer

resources = {
    "ec2_instances": [
        {"cpu_utilization": 5, "pricing": "on-demand"},
        {"cpu_utilization": 65, "pricing": "on-demand"}
    ],
    "s3_buckets": [
        {"name": "app-assets", "size_gb": 200, "storage_class": "STANDARD", "has_lifecycle_policy": False}
    ],
    "nat_gateways": [{"id": "nat-1"}, {"id": "nat-2"}],
    "multi_az_required": False,
    "has_budget_alerts": False,
    "has_cost_explorer": False
}

optimizer = CostOptimizer(resources, monthly_spend=3000)
result = optimizer.analyze_and_optimize()

print(f"Current spend: ${result['current_monthly_spend']}")
print(f"Potential savings: ${result['potential_monthly_savings']}")
print(f"Savings: {result['savings_percentage']}%")
for rec in result['priority_actions']:
    print(f"  [{rec['priority']}] {rec['service']}: {rec['recommendation']}")

Output Format: analyze_and_optimize() returns a dictionary with keys: current_monthly_spend (float), potential_monthly_savings (float), optimized_monthly_spend (float), savings_percentage (float), recommendations (list of dicts with service, type, issue, recommendation, potential_savings, priority), and priority_actions (top 5 high-priority recommendations sorted by savings).
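The summary fields are related by simple arithmetic. The sketch below shows one plausible way to derive them from the recommendations list. It assumes savings are additive, which overstates the total when two recommendations target the same resource, and it assumes priority values are the strings `"high"`, `"medium"`, and `"low"`. `summarize` is a hypothetical helper, not the optimizer's implementation.

```python
def summarize(current_spend: float, recommendations: list[dict], top_n: int = 5) -> dict:
    """Derive the summary fields from a recommendations list (additive-savings assumption)."""
    savings = sum(r["potential_savings"] for r in recommendations)
    savings = min(savings, current_spend)  # never report savings above current spend
    high = [r for r in recommendations if r["priority"] == "high"]
    return {
        "current_monthly_spend": current_spend,
        "potential_monthly_savings": savings,
        "optimized_monthly_spend": current_spend - savings,
        "savings_percentage": 100.0 * savings / current_spend if current_spend else 0.0,
        "recommendations": recommendations,
        "priority_actions": sorted(high, key=lambda r: r["potential_savings"], reverse=True)[:top_n],
    }
```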
