system-architecture
System Architecture Expert
When to use this Skill
Use this Skill when:
- Designing distributed systems
- Writing system design documentation
- Preparing for system design interviews
- Creating architecture diagrams
- Analyzing trade-offs between design choices
- Reviewing or improving existing system designs
System Design Framework
1. Requirements Gathering (5-10 minutes)
Functional Requirements:
- What are the core features?
- What actions can users perform?
- What are the inputs and outputs?
Non-Functional Requirements:
- Scale: How many users? How much data?
- Performance: Latency requirements? (p50, p95, p99)
- Availability: What uptime is needed? (99.9%, 99.99%)
- Consistency: Strong or eventual consistency?
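An availability target is easier to reason about as a downtime budget. The helper below is a small illustrative sketch (the function name is hypothetical) that converts an uptime percentage into allowed minutes of downtime per year, assuming a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an uptime target."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for target in (99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, 99.9% allows roughly 8.8 hours per year, while 99.99% allows roughly 53 minutes.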
Constraints:
- Budget limitations
- Technology stack constraints
- Team expertise
- Timeline
Example Questions:
- How many daily active users?
- What's the read:write ratio?
- What's the average data size?
- What's the peak load vs average load?
- Do we need real-time updates?
- Can we tolerate data loss?
2. Capacity Estimation (Back-of-the-envelope)
Calculate:
Traffic:
- DAU = 100M users
- Each user makes 10 requests/day
- QPS = 100M * 10 / 86400 ≈ 11,574 QPS
- Peak QPS = 2-3x average ≈ 30,000 QPS
Storage:
- 100M users * 1KB per user = 100GB
- With 3x replication = 300GB
- Growth: if 300GB of replicated data is added per day, 300GB * 365 days = 109.5TB/year
Bandwidth:
- QPS * average request size
- 11,574 * 10KB = 115.74MB/s
Memory/Cache:
- 80-20 rule: 20% of data gets 80% of traffic
- Cache size ≈ 20% of total data (the hot set), e.g. 20% of 100GB = 20GB
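The arithmetic above can be captured in a few lines so the assumptions are easy to change. This is a rough sketch, not a sizing tool: the function and parameter names are hypothetical, units are decimal (1KB = 1,000 bytes), and the default peak factor of 3 is simply the upper end of the 2-3x range.

```python
SECONDS_PER_DAY = 86_400


def estimate_capacity(dau, requests_per_user, bytes_per_user, request_bytes,
                      replication=3, peak_factor=3, hot_fraction=0.2):
    """Back-of-the-envelope traffic, storage, bandwidth, and cache figures."""
    avg_qps = dau * requests_per_user / SECONDS_PER_DAY
    raw_storage = dau * bytes_per_user
    return {
        "avg_qps": avg_qps,
        "peak_qps": avg_qps * peak_factor,
        "storage_bytes": raw_storage * replication,
        "bandwidth_bytes_per_s": avg_qps * request_bytes,
        "cache_bytes": raw_storage * hot_fraction,  # 80-20 rule: cache the hot 20%
    }


est = estimate_capacity(dau=100_000_000, requests_per_user=10,
                        bytes_per_user=1_000, request_bytes=10_000)
print(f"avg QPS:   {est['avg_qps']:,.0f}")
print(f"peak QPS:  {est['peak_qps']:,.0f}")
print(f"storage:   {est['storage_bytes'] / 1e9:,.0f} GB")
print(f"bandwidth: {est['bandwidth_bytes_per_s'] / 1e6:,.2f} MB/s")
print(f"cache:     {est['cache_bytes'] / 1e9:,.0f} GB")
```

With the inputs from this section, the sketch reproduces the figures above: about 11,574 QPS on average, 300 GB of replicated storage, and about 115.74 MB/s of bandwidth.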
3. High-Level Design
Core Components:
- Client Layer (Web, Mobile, Desktop)
- API Gateway / Load Balancer
- Application Servers (Business logic)
- Cache Layer (Redis, Memcached)
- Database (SQL, NoSQL, or both)
- Message Queue (Kafka, RabbitMQ)
- Object Storage (S3, GCS)
- CDN (CloudFront, Akamai)
Draw Architecture:
[Clients] → [CDN]
↓
[Load Balancer]
↓
[Application Servers]
↙ ↓ ↘
[Cache] [DB] [Queue] → [Workers]
↓
[Object Storage]
4. Database Design
SQL vs NoSQL Decision:
Use SQL when:
- ACID transactions required
- Complex queries with JOINs
- Structured data with relationships
- Examples: PostgreSQL, MySQL
Use NoSQL when:
- Massive scale (horizontal scaling)
- Flexible schema
- High write throughput
- Examples: Cassandra, DynamoDB, MongoDB
Sharding Strategy:
- Hash-based: user_id % num_shards
- Range-based: Users 1-100M on shard 1
- Geographic: US users on US shard
- Consistent hashing: For even distribution
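To illustrate why consistent hashing limits data movement when shards are added, here is a minimal hash ring with virtual nodes. It is a sketch under assumptions: the class name, shard labels, MD5 as the hash function, and 100 virtual nodes per shard are all illustrative choices, not a prescribed implementation.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, shards, vnodes=100):
        # Each shard is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{shard}#{i}"), shard) for shard in shards for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, key: str) -> str:
        # Walk clockwise to the first point after the key's hash, wrapping around.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._ring[idx][1]


keys = [f"user-{i}" for i in range(1000)]
before = HashRing(["shard-0", "shard-1", "shard-2", "shard-3"])
after = HashRing(["shard-0", "shard-1", "shard-2", "shard-3", "shard-4"])
moved = [k for k in keys if before.shard_for(k) != after.shard_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding a shard")
```

Every key that moves lands on the new shard, and only a fraction of keys move at all. With plain `hash % num_shards`, going from 4 to 5 shards would remap most keys.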
Schema Design:
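-- Example: URL Shortener (PostgreSQL syntax)
CREATE TABLE urls (
    id BIGSERIAL PRIMARY KEY,
    short_url VARCHAR(10) UNIQUE NOT NULL,  -- UNIQUE already creates an index
    long_url TEXT NOT NULL,
    user_id BIGINT,
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP,
    click_count INT DEFAULT 0
);
-- PostgreSQL does not accept inline INDEX clauses; create secondary indexes separately
CREATE INDEX idx_urls_user_id ON urls (user_id);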
5. Deep Dive Components
Caching Strategy:
- Cache-Aside: App reads from cache, loads from DB on miss
- Write-Through: Write to cache and DB together
- Write-Behind: Write to cache, async write to DB
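As a concrete illustration of the first strategy, the sketch below shows cache-aside reads with invalidation on write. Plain dictionaries stand in for the cache and the database, and the class name and `db_reads` counter are hypothetical additions for demonstration only.

```python
class CacheAside:
    def __init__(self, db: dict):
        self.db = db        # stand-in for the database
        self.cache = {}     # stand-in for Redis or Memcached
        self.db_reads = 0   # counts cache misses that reached the database

    def get(self, key):
        if key in self.cache:
            return self.cache[key]        # cache hit
        self.db_reads += 1                # cache miss: load from the database
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value       # populate the cache for later reads
        return value

    def put(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)         # invalidate so the next read reloads


store = CacheAside({"user:1": "Ada"})
store.get("user:1")   # miss, loads from the database
store.get("user:1")   # hit, served from the cache
print("database reads:", store.db_reads)
```

Invalidating on write rather than updating the cache is one common choice; write-through would instead update both in the same operation.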
Eviction Policies:
- LRU (Least Recently Used) - Most common
- LFU (Least Frequently Used)
- TTL (Time To Live)
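LRU eviction is simple to sketch with an ordered dictionary. The class below is a minimal, single-threaded illustration (the name `LRUCache` and its methods are hypothetical); production caches such as Redis use approximations of LRU rather than an exact ordering.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" becomes the most recently used entry
cache.put("c", 3)   # evicts "b", the least recently used entry
print(cache.get("b"), cache.get("a"), cache.get("c"))
```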
Load Balancing:
- Round Robin: Simple, equal distribution
- Least Connections: Route to least busy server
- Consistent Hashing: Minimize redistribution
- Weighted: Based on server capacity
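The first two algorithms can be sketched in a few lines each. The class names and server labels below are hypothetical, and the connection counts are tracked in-process purely for illustration; a real load balancer would track them from live connections.

```python
import itertools


class RoundRobin:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)  # each server in turn, wrapping around


class LeastConnections:
    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Choose the server with the fewest active connections (ties: first listed).
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


rr = RoundRobin(["app-1", "app-2", "app-3"])
print([rr.pick() for _ in range(4)])

lc = LeastConnections(["app-1", "app-2"])
first = lc.acquire()
second = lc.acquire()
lc.release(first)
print(lc.acquire())  # goes back to the server that just freed a connection
```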
Message Queue Patterns:
- Pub/Sub: One-to-many (notifications)
- Work Queue: Task distribution (job processing)
- Fan-out: Broadcast to multiple queues
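The difference between pub/sub and a work queue comes down to who receives each message. The in-memory broker below is only a sketch to show that contrast; the class and method names are hypothetical and do not correspond to the Kafka or RabbitMQ APIs.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of handler callables
        self._tasks = deque()                  # shared work queue

    # Pub/Sub: every subscriber of a topic receives every message.
    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers[topic]:
            handler(message)

    # Work queue: each task is handed to exactly one worker.
    def enqueue(self, task):
        self._tasks.append(task)

    def dequeue(self):
        return self._tasks.popleft() if self._tasks else None


broker = InMemoryBroker()
email_log, push_log = [], []
broker.subscribe("order.created", email_log.append)
broker.subscribe("order.created", push_log.append)
broker.publish("order.created", "order-42")   # both subscribers see the message

broker.enqueue("resize-image-1")
broker.enqueue("resize-image-2")
worker_a = broker.dequeue()   # each task goes to a single worker
worker_b = broker.dequeue()
print(email_log, push_log, worker_a, worker_b)
```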
6. Scalability Patterns
Horizontal Scaling:
- Add more servers
- Use load balancers
- Stateless application servers
- Session stored in cache/DB
Vertical Scaling:
- Add more CPU/RAM to servers
- Limited by hardware
- Simpler but has limits
Microservices:
Monolith:
[Single App] → [DB]
Microservices:
[User Service] → [User DB]
[Post Service] → [Post DB]
[Feed Service] → [Feed DB]
Benefits:
- Independent scaling
- Technology flexibility
- Fault isolation
Drawbacks:
- Increased complexity
- Network latency
- Distributed transactions
7. Reliability & Availability
Replication:
- Master-Slave: One writer, multiple readers
- Master-Master: Multiple writers (conflict resolution needed)
- Multi-region: Geographic redundancy
Failover:
- Active-Passive: Standby server takes over
- Active-Active: Both servers handle traffic
Rate Limiting:
- Token bucket algorithm
- Leaky bucket algorithm
- Fixed window counter
- Sliding window log
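As an example of the first algorithm, here is a minimal single-process token bucket. The class name and the injectable `clock` parameter are assumptions made so the behavior is easy to test; a distributed limiter would keep this state in a shared store instead.

```python
import time


class TokenBucket:
    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill based on elapsed time, never exceeding the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


fake_time = [0.0]  # manual clock so the example does not need to sleep
bucket = TokenBucket(rate=1, capacity=2, clock=lambda: fake_time[0])
burst = [bucket.allow() for _ in range(3)]   # two allowed, then the bucket is empty
fake_time[0] = 1.0                           # one second later, one token is refilled
after_wait = bucket.allow()
print(burst, after_wait)
```

The capacity controls how large a burst is tolerated, while the rate sets the sustained throughput.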
Circuit Breaker:
States:
Closed → Normal operation
Open → Reject requests immediately
Half-Open → Test if service recovered
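The three states can be sketched as a small wrapper around a function call. This is an illustrative, single-threaded version: the class names, the failure threshold, and the reset timeout are hypothetical parameters, and real libraries add features such as per-exception filtering and metrics.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0
        self.state = "closed"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open")  # reject immediately
            self.state = "half_open"  # timeout elapsed: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, opens the circuit.
            if self.state == "half_open" or self._failures >= self.failure_threshold:
                self.state = "open"
                self._opened_at = self._clock()
            raise
        self._failures = 0
        self.state = "closed"  # success closes the circuit again
        return result


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0)
print(breaker.call(lambda: "ok"), breaker.state)
```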
8. Common System Design Patterns
Content Delivery:
- Use CDN for static assets
- Geo-distributed edge servers
- Cache at edge locations
Data Consistency:
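- Strong Consistency: Every read reflects the latest write (the model ACID databases typically provide)
- Eventual Consistency: Replicas converge over time, so reads may briefly return stale data (BASE)
- CAP Theorem: Of Consistency, Availability, and Partition Tolerance, only two can be guaranteed at once; since network partitions cannot be ruled out, the practical choice during a partition is Consistency vs Availability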
API Design:
RESTful:
GET /api/users/{id}
POST /api/users
PUT /api/users/{id}
DELETE /api/users/{id}
GraphQL:
query {
user(id: "123") {
name
posts {
title
}
}
}
9. System Design Template
Use this structure (based on system_design/00_template.md):
# {System Name}
## 1. Requirements
### Functional
- [List core features]
### Non-Functional
- Scale: [Users, QPS, Data]
- Performance: [Latency requirements]
- Availability: [Uptime target]
## 2. Capacity Estimation
- Traffic: [QPS calculations]
- Storage: [Data size, growth]
- Bandwidth: [Network requirements]
## 3. API Design
[endpoint] - [description]
## 4. High-Level Architecture
[Diagram]
## 5. Database Schema
[Tables and relationships]
## 6. Detailed Design
### Component 1
[Deep dive]
### Component 2
[Deep dive]
## 7. Scalability
[How to scale each component]
## 8. Trade-offs
[Decisions and alternatives]
10. Real-World Examples
Reference case studies in system_design/:
- Netflix: Video streaming, recommendation
- Twitter: Timeline, tweet storage, trending
- Uber: Real-time matching, location tracking
- Instagram: Image storage, feed generation
- WhatsApp: Message delivery, presence
Common Patterns:
- News Feed: Fan-out on write vs fan-out on read
- Rate Limiter: Token bucket with Redis
- URL Shortener: Base62 encoding, hash collision handling
- Chat System: WebSocket, message queue
- Notification: Push notification service, APNs/FCM
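For the URL shortener pattern, Base62 encoding of a numeric ID is a common way to produce short codes without collisions. The sketch below assumes the ID comes from an auto-incrementing database key; the alphabet ordering and function names are illustrative choices.

```python
import string

# 10 digits + 26 lowercase + 26 uppercase letters = 62 symbols
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(number: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number:
        number, remainder = divmod(number, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def base62_decode(code: str) -> int:
    """Decode a Base62 string back into the original integer."""
    number = 0
    for ch in code:
        number = number * 62 + ALPHABET.index(ch)
    return number


short_code = base62_encode(125)
print(short_code, base62_decode(short_code))
```

Seven Base62 characters cover 62^7, about 3.5 trillion, distinct IDs, so sequential IDs avoid the collision handling that hashing the long URL would require.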
Interview Tips
Time Management:
- Requirements: 10%
- High-level design: 25%
- Deep dive: 50%
- Wrap up: 15%
Communication:
- Think out loud
- Ask clarifying questions
- Discuss trade-offs
- Acknowledge limitations
What interviewers look for:
- Problem-solving approach
- Technical depth
- Trade-off analysis
- Scale awareness
- Communication skills
Common Mistakes to Avoid
- Jumping to solution without requirements
- Over-engineering simple problems
- Under-estimating scale requirements
- Ignoring single points of failure
- Not considering monitoring/alerting
- Forgetting about data consistency
- Missing security considerations
Project Context
- Templates in system_design/00_template.md
- Case studies in system_design/*.md
- Reference materials in doc/system_design/
- Follow the established documentation pattern