system-architecture
System Architecture
When to Use
Activate this skill when:
- Designing a new module, service, or major feature that requires structural decisions
- Choosing between architectural approaches (e.g., where to place logic, how to structure data flow)
- Planning database schema changes or refactoring existing schema
- Making frontend state management decisions (server state vs client state, context vs store)
- Evaluating technology trade-offs for a new capability
- Creating or reviewing Architecture Decision Records (ADRs)
- Setting up a new project or major subsystem from scratch
Input: If plan.md exists (from project-planner), read it for context about the feature scope and affected modules. Otherwise, work from the user's request directly.
Output: Write architecture decisions to architecture.md and create ADRs in docs/adr/ADR-NNN-<title>.md. Tell the user: "Architecture written to architecture.md. Run /api-design-patterns for API contracts or /task-decomposition for implementation tasks."
Do NOT use this skill for:
- Writing implementation code (use python-backend-expert or react-frontend-expert)
- API contract design or endpoint specifications (use api-design-patterns)
- Testing patterns or strategies (use pytest-patterns or react-testing-patterns)
- Deployment or infrastructure decisions (use docker-best-practices or deployment-pipeline)
Instructions
Project Layer Architecture
The standard Python/React full-stack architecture follows a layered pattern with strict dependency direction.
Backend Layers (FastAPI)
HTTP Request
↓
┌─────────────────────┐
│ Routers (routes/)   │  ← HTTP concerns: request parsing, response formatting, status codes
│                     │    Uses: Depends() for injection, Pydantic schemas for validation
├─────────────────────┤
│ Services            │  ← Business logic: orchestration, validation rules, domain operations
│ (services/)         │    No HTTP awareness. Raises domain exceptions, not HTTPException.
├─────────────────────┤
│ Repositories        │  ← Data access: queries, CRUD operations, database interactions
│ (repositories/)     │    No business logic. Returns model instances or None.
├─────────────────────┤
│ Models (models/)    │  ← SQLAlchemy ORM models: table definitions, relationships, indexes
│ Schemas (schemas/)  │  ← Pydantic v2 models: request/response contracts, validation
└─────────────────────┘
↓
Database
Dependency direction rules:
- Routers depend on Services (never on Repositories directly)
- Services depend on Repositories (never on Routers)
- Repositories depend on Models (never on Services)
- Schemas are shared across layers but define no dependencies themselves
- Never skip layers: no direct database access from routes
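The direction rules above can be checked mechanically. The following is a minimal sketch using only the standard library; it assumes the layer packages are named routes, services, and repositories as in the diagram, and the names FORBIDDEN, imported_packages, and layer_violations are hypothetical helpers, not part of any existing tool.

```python
import ast

# Forbidden imports per layer, taken from the dependency direction rules.
# Assumption: package names match the directory names in the layer diagram.
FORBIDDEN = {
    "routes": {"repositories"},
    "services": {"routes"},
    "repositories": {"services"},
}


def imported_packages(source: str) -> set[str]:
    """Return every package segment and imported name found in import statements."""
    segments: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                segments.update(alias.name.split("."))
        elif isinstance(node, ast.ImportFrom):
            if node.module:
                segments.update(node.module.split("."))
            # Covers forms such as "from app import repositories".
            segments.update(alias.name for alias in node.names)
    return segments


def layer_violations(layer: str, source: str) -> set[str]:
    """Return the forbidden layers that a module belonging to `layer` imports."""
    return FORBIDDEN.get(layer, set()) & imported_packages(source)
```

A check like this could run in CI over each file under routes/, services/, and repositories/, failing the build when layer_violations returns a non-empty set.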
Dependency injection pattern:
# Router depends on Service via Depends()
@router.post("/users", response_model=UserResponse)
async def create_user(
    data: UserCreate,
    service: UserService = Depends(get_user_service),
) -> UserResponse:
    return await service.create_user(data)

# Service depends on Repository via constructor injection
class UserService:
    def __init__(self, repo: UserRepository) -> None:
        self.repo = repo

# Repository depends on AsyncSession via Depends()
class UserRepository:
    def __init__(self, session: AsyncSession) -> None:
        self.session = session
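One practical payoff of constructor injection is that a service can be exercised without a database. The sketch below is illustrative only: InMemoryUserRepository, DuplicateEmailError, and the get_by_email/add methods are hypothetical names, and the simplified UserService takes an email string rather than a Pydantic schema.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id: int
    email: str


class DuplicateEmailError(Exception):
    """Domain exception: raised by the service, carries no HTTP details."""


class InMemoryUserRepository:
    """Hypothetical test double exposing the async methods a real repository might."""

    def __init__(self) -> None:
        self._users: dict[str, User] = {}

    async def get_by_email(self, email: str) -> Optional[User]:
        return self._users.get(email)

    async def add(self, email: str) -> User:
        user = User(id=len(self._users) + 1, email=email)
        self._users[email] = user
        return user


class UserService:
    def __init__(self, repo: InMemoryUserRepository) -> None:
        self.repo = repo

    async def create_user(self, email: str) -> User:
        # The business rule lives here, not in the router or the repository.
        if await self.repo.get_by_email(email) is not None:
            raise DuplicateEmailError(email)
        return await self.repo.add(email)
```

Because the service only sees the repository it was handed, swapping the fake for a SQLAlchemy-backed implementation should not require changes to the business rule.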
Frontend Layers (React/TypeScript)
┌─────────────────────┐
│ Pages (pages/)      │  ← Route-level components: data fetching, layout composition
├─────────────────────┤
│ Layouts             │  ← Page structure: navigation, sidebars, content areas
│ (layouts/)          │
├─────────────────────┤
│ Features            │  ← Domain-specific: UserProfile, OrderList, ChatPanel
│ (features/)         │    Composed from shared components + hooks
├─────────────────────┤
│ Shared Components   │  ← Reusable UI: Button, Modal, Table, Form, Input
│ (components/)       │    No business logic. Configurable via props.
├─────────────────────┤
│ Hooks (hooks/)      │  ← Custom hooks: useAuth, usePagination, useDebounce
│ API (api/)          │  ← API client functions, TanStack Query configurations
├─────────────────────┤
│ Types (types/)      │  ← Shared TypeScript interfaces and type definitions
└─────────────────────┘
Component dependency direction:
- Pages import Features and Layouts
- Features import Shared Components and Hooks
- Shared Components import only other Shared Components and Types
- Hooks import API functions and Types
- API functions import Types only
Decision Framework
When facing architectural decisions, follow this structured process:
Step 1: Define the Problem
- What capability is needed?
- What are the non-functional requirements? (performance, scalability, maintainability)
- What constraints exist? (team size, timeline, existing infrastructure)
Step 2: Identify Options
- List 2-3 viable architectural approaches
- For each option, document:
  - How it works (brief technical description)
  - Advantages
  - Disadvantages
  - Risks
Step 3: Evaluate Against Criteria
| Criterion | Weight | Description |
|---|---|---|
| Maintainability | High | Can the team understand, modify, and debug this easily? |
| Testability | High | Can each component be tested in isolation? |
| Performance | Medium | Does it meet latency and throughput requirements? |
| Team familiarity | Medium | Does the team have experience with this approach? |
| Operational cost | Low | What are the infrastructure and maintenance costs? |
| Future flexibility | Low | How easily can this evolve as requirements change? |
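To make the weighted comparison concrete, the criteria table can be turned into a small scoring helper. This is a sketch under stated assumptions: the numeric mapping High=3, Medium=2, Low=1 and the 1-5 scoring scale are not defined by this skill, and the option scores are illustrative placeholders rather than measurements.

```python
# Assumed numeric mapping for the High/Medium/Low weights in the table.
WEIGHT_VALUE = {"High": 3, "Medium": 2, "Low": 1}

CRITERIA = {
    "Maintainability": "High",
    "Testability": "High",
    "Performance": "Medium",
    "Team familiarity": "Medium",
    "Operational cost": "Low",
    "Future flexibility": "Low",
}


def weighted_score(scores: dict[str, int]) -> float:
    """Weighted average of 1-5 scores; every criterion must be scored."""
    missing = CRITERIA.keys() - scores.keys()
    if missing:
        raise ValueError(f"unscored criteria: {sorted(missing)}")
    total_weight = sum(WEIGHT_VALUE[w] for w in CRITERIA.values())
    weighted = sum(WEIGHT_VALUE[CRITERIA[name]] * scores[name] for name in CRITERIA)
    return weighted / total_weight


# Illustrative scores for two hypothetical options.
option_a = {"Maintainability": 4, "Testability": 5, "Performance": 3,
            "Team familiarity": 4, "Operational cost": 2, "Future flexibility": 3}
option_b = {"Maintainability": 3, "Testability": 3, "Performance": 5,
            "Team familiarity": 2, "Operational cost": 4, "Future flexibility": 5}
```

With these placeholder numbers option_a scores 46/12 and option_b scores 41/12, so option_a ranks higher despite weaker performance and flexibility. The score is an input to the decision, not a substitute for the written rationale in the ADR.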
Step 4: Decide and Document
- Choose the option that best satisfies the weighted criteria
- Document the decision in an ADR (see references/architecture-decision-record-template.md)
- Record what was NOT chosen and why; this context is valuable for future decisions
Step 5: Communicate
- Share the ADR with the team
- Identify any migration or rollout steps needed
- Flag reversibility: is this a one-way door or a two-way door?
Database Schema Design
Design Principles
- Start normalized (3NF): denormalize only for proven performance bottlenecks, not speculation
- One migration per logical change: each Alembic migration should represent a single, coherent schema modification
- Always include downgrade: every migration must have a working downgrade() function
- Index strategically:
  - Primary keys (automatic)
  - Foreign keys (always)
  - Columns in WHERE clauses of frequent queries
  - Composite indexes for multi-column lookups
  - Partial indexes for filtered queries (e.g., WHERE is_active = true)
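The partial-index guidance can be tried out with nothing but the standard library. The sketch below uses an in-memory SQLite database purely to stay self-contained (the production database is assumed to be something else, such as PostgreSQL, which supports the same CREATE INDEX ... WHERE form); the table layout and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        is_active INTEGER NOT NULL DEFAULT 1
    );
    -- Partial index: only rows with is_active = 1 are indexed.
    CREATE INDEX ix_users_active_email ON users (email) WHERE is_active = 1;
    """
)


def query_plan(sql: str) -> str:
    """Return the planner's description of how `sql` would be executed."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# The filter repeats the index predicate, so the partial index is usable.
active_plan = query_plan(
    "SELECT id FROM users WHERE is_active = 1 AND email = 'a@example.com'"
)
# Without the predicate the planner cannot use the partial index.
any_plan = query_plan("SELECT id FROM users WHERE email = 'a@example.com'")
```

The takeaway is that a partial index only helps queries whose WHERE clause implies the index predicate. In SQLAlchemy on PostgreSQL, the equivalent is an Index declared with the postgresql_where argument.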
SQLAlchemy 2.0 Async Patterns
from datetime import datetime

from sqlalchemy import String, func
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship


class Base(DeclarativeBase):
    pass


# Model definition with Mapped types (SQLAlchemy 2.0 style)
class User(Base):
    __tablename__ = "users"

    id: Mapped[int] = mapped_column(primary_key=True)
    email: Mapped[str] = mapped_column(String(255), unique=True, index=True)
    is_active: Mapped[bool] = mapped_column(default=True)
    created_at: Mapped[datetime] = mapped_column(server_default=func.now())

    # Relationships: ALWAYS use eager loading with async
    posts: Mapped[list["Post"]] = relationship(
        back_populates="author",
        lazy="selectin",  # or "joined"; never the default lazy="select" with async
    )
Async session rules:
- One AsyncSession per request; never share across concurrent tasks
- Use the async with context manager for automatic cleanup
- Map session boundaries to transaction boundaries
- Use selectin or joined loading; implicit lazy loading is incompatible with asyncio
- Use run_sync() only as a last resort for legacy code
Migration Planning
- Schema change → Generate migration: alembic revision --autogenerate -m "description"
- Review generated migration: verify column types, indexes, constraints
- Test upgrade: alembic upgrade head
- Test downgrade: alembic downgrade -1
- Test data preservation: ensure existing data survives the round-trip
Frontend Architecture
State Management Decision Tree
Is the data from the server?
├── YES → Use TanStack Query (useQuery, useMutation)
│ Configure staleTime, gcTime, query keys
│
└── NO → Is it needed across multiple components?
├── YES → Is it complex with actions/reducers?
│ ├── YES → Use Zustand store
│ └── NO → Use React Context
│
└── NO → Use useState / useReducer locally
TanStack Query conventions:
- Query keys: [resource, ...identifiers] (e.g., ["users", userId], ["posts", { page, limit }])
- Use the queryOptions() factory to centralize key + fn definitions; this prevents copy-paste key errors
- Set staleTime based on data freshness needs (default 0 is too aggressive for most cases)
- Invalidate with invalidateQueries() after mutations; never manual refetch()
- Handle all states: isPending, isError, data
Component design rules:
- Props for configuration, hooks for data
- Lift state only as high as needed — no premature context creation
- Keep components under 200 lines — extract sub-components or custom hooks when larger
- Use children and composition over deep prop drilling
Routing Structure
Organize routes to mirror the URL structure:
src/
├── pages/
│ ├── HomePage.tsx → /
│ ├── LoginPage.tsx → /login
│ ├── users/
│ │ ├── UserListPage.tsx → /users
│ │ └── UserDetailPage.tsx → /users/:id
│ └── settings/
│ └── SettingsPage.tsx → /settings
Cross-Cutting Concerns
Authentication Flow
Login Request
↓
Backend: Validate credentials → Generate JWT (access + refresh tokens)
↓
Frontend: Keep access token in memory; refresh token arrives as an httpOnly cookie set by the backend
↓
API Calls: Attach access token via Authorization header
↓
Token Expired: Use refresh token to obtain new access token
↓
Refresh Failed: Redirect to login
Architecture decisions for auth:
- Access tokens: short-lived (15-30 min), stored in memory (not localStorage)
- Refresh tokens: longer-lived (7-30 days), stored in httpOnly cookie
- Backend: FastAPI Depends() chain for token validation → user extraction → permission check
- Frontend: Auth context providing user, login(), logout(), isAuthenticated
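The short-lived access token described above can be sketched with the standard library alone. This hand-rolled HS256 token is for illustration only; a maintained JWT library would normally be used instead, and the function names, the claim set (sub, exp), and the now parameter (added for deterministic testing) are assumptions rather than part of this skill.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_access_token(user_id: str, secret: bytes, ttl_seconds: int = 15 * 60,
                       now: Optional[float] = None) -> str:
    """Create an HS256-signed token that expires ttl_seconds after issue."""
    issued_at = int(time.time() if now is None else now)
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": user_id, "exp": issued_at + ttl_seconds}).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"


def verify_access_token(token: str, secret: bytes,
                        now: Optional[float] = None) -> Optional[str]:
    """Return the user id, or None if the token is malformed, tampered with, or expired."""
    try:
        header, payload, signature = token.split(".")
        expected = hmac.new(secret, f"{header}.{payload}".encode("ascii"),
                            hashlib.sha256).digest()
        # Constant-time comparison; the payload is only parsed after this passes.
        if not hmac.compare_digest(_b64url_decode(signature), expected):
            return None
        claims = json.loads(_b64url_decode(payload))
    except ValueError:
        return None
    current = time.time() if now is None else now
    return claims["sub"] if claims.get("exp", 0) > current else None
```

In the Depends() chain, a function like verify_access_token would sit at the token validation step, with user extraction and permission checks layered on top of its result.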
Error Handling Strategy
Errors should be handled at the appropriate layer:
| Layer | Error Type | Action |
|---|---|---|
| Router | HTTPException | Return HTTP error response with status code |
| Service | Domain exceptions | Raise custom exceptions (e.g., UserNotFoundError) |
| Repository | Database exceptions | Catch and re-raise as domain exceptions or let propagate |
| Frontend | API errors | Display user-friendly messages, retry where appropriate |
Backend exception hierarchy:
class AppError(Exception):
    """Base application error."""

class NotFoundError(AppError):
    """Resource not found."""

class ConflictError(AppError):
    """Resource conflict (duplicate, version mismatch)."""

class ValidationError(AppError):
    """Business rule violation."""
An application-level exception handler, registered alongside the routers, maps domain exceptions to HTTP responses:
from fastapi import Request
from fastapi.responses import JSONResponse

@app.exception_handler(NotFoundError)
async def not_found_handler(request: Request, exc: NotFoundError):
    return JSONResponse(status_code=404, content={"detail": str(exc)})
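Rather than writing one handler per exception class, the mapping can live in a single table. The sketch below is framework-independent; the status codes for ConflictError (409) and ValidationError (422) are an assumed convention, and status_for, STATUS_BY_ERROR, and UserNotFoundError are hypothetical names used for illustration.

```python
class AppError(Exception):
    """Base application error."""


class NotFoundError(AppError):
    """Resource not found."""


class ConflictError(AppError):
    """Resource conflict (duplicate, version mismatch)."""


class ValidationError(AppError):
    """Business rule violation."""


class UserNotFoundError(NotFoundError):
    """Example of a concrete domain exception raised by a service."""


# Assumed convention; adjust the codes to the project's API contract.
STATUS_BY_ERROR = {
    NotFoundError: 404,
    ConflictError: 409,
    ValidationError: 422,
}


def status_for(exc: AppError) -> int:
    """Walk the exception's MRO so subclasses inherit their parent's status code."""
    for cls in type(exc).__mro__:
        if cls in STATUS_BY_ERROR:
            return STATUS_BY_ERROR[cls]
    # Unmapped application errors are treated as server errors.
    return 500
```

A single handler registered for AppError could then call status_for(exc), which keeps the services free of HTTP details while new domain exceptions pick up a status code by inheritance.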
Logging Architecture
Backend (structlog):
- Structured JSON logs in production
- Human-readable console in development
- Bind request context (request_id, user_id) at middleware level
- Log at service layer (business events), not repository layer (too noisy)
- Use log levels: DEBUG (development only), INFO (business events), WARNING (recoverable issues), ERROR (failures requiring attention)
Frontend:
- console.* in development
- Structured error reporting to backend or Sentry in production
- Log user actions for debugging, not for analytics
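The backend rule of binding request context once at the middleware level can be illustrated without structlog. The following stand-in uses the standard logging and contextvars modules; it is a sketch of the idea, not structlog's API, and request_id_var, JsonFormatter, and build_logger are hypothetical names.

```python
import contextvars
import io
import json
import logging

# Request-scoped context; a middleware would set this once per request.
request_id_var = contextvars.ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including the bound request id."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "event": record.getMessage(),
                "request_id": request_id_var.get(),
            }
        )


def build_logger(stream) -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("app.services")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)  # DEBUG records are dropped, as in production
    logger.propagate = False
    return logger


stream = io.StringIO()
log = build_logger(stream)

token = request_id_var.set("req-123")  # what the middleware would do on entry
log.info("user_created")               # what the service layer would log
request_id_var.reset(token)            # what the middleware would do on exit
log.debug("cache_probe")               # below INFO, so not emitted
```

The service code never passes the request id around explicitly; it is attached at format time from the context the middleware bound.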
Configuration Management
Backend (pydantic-settings):
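from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    database_url: str  # required: startup fails if missing
    redis_url: str = "redis://localhost:6379"
    jwt_secret: str  # required: never give secrets a default
    debug: bool = False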
Frontend (environment variables):
- VITE_API_URL for API base URL
- Build-time injection via Vite's import.meta.env
- No secrets in frontend environment variables (VITE_-prefixed values are embedded in the client bundle)
Output Files
architecture.md
Write the architecture document to architecture.md at the project root:
# Architecture: [Feature/System Name]
## Overview
[1-2 sentence summary of the architectural approach]
## Layer Structure
[Backend and frontend layer descriptions from this skill's patterns]
## Key Decisions
[Summary of decisions made, with links to ADRs]
## Database Schema
[Entity descriptions, relationships, key indexes]
## Cross-Cutting Concerns
[Auth, error handling, logging approach]
## Next Steps
- Run `/api-design-patterns` to define API contracts
- Run `/task-decomposition` to create implementation tasks
ADRs
For each significant decision, create an ADR in docs/adr/:
# ADR-NNN: [Decision Title]
## Status
Accepted | Proposed | Superseded
## Context
[Why this decision is needed]
## Decision
[What we decided]
## Consequences
[Positive and negative outcomes]
Number ADRs sequentially (ADR-001, ADR-002, etc.).
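Sequential numbering is easy to get wrong by hand when several ADRs are in flight. A minimal sketch of a helper that proposes the next file name follows; the lowercase hyphenated slug format and the next_adr_path name are assumptions, since this skill only fixes the ADR-NNN-<title>.md shape.

```python
import re
from pathlib import PurePosixPath

ADR_NAME = re.compile(r"^ADR-(\d{3})-.+\.md$")


def next_adr_path(existing: list[str], title: str) -> PurePosixPath:
    """Return docs/adr/ADR-NNN-<slug>.md using the highest existing number plus one."""
    numbers = [int(m.group(1)) for name in existing if (m := ADR_NAME.match(name))]
    # Assumed slug convention: lowercase, non-alphanumerics collapsed to hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    number = max(numbers, default=0) + 1
    return PurePosixPath("docs/adr") / f"ADR-{number:03d}-{slug}.md"
```

The existing argument would typically be the file names currently in docs/adr/; files that do not match the ADR pattern are ignored.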
Examples
Architecture Decision: Real-Time Notifications
Problem: The application needs real-time notifications for users (new messages, status updates).
Options evaluated:
| Option | Pros | Cons |
|---|---|---|
| WebSocket | True bidirectional, low latency | Complex connection management, harder to scale |
| Server-Sent Events (SSE) | Simple, HTTP-based, auto-reconnect | Unidirectional (server→client only), limited browser connections |
| Polling | Simplest implementation, works everywhere | Higher latency, unnecessary server load |
Decision: WebSocket for this use case.
Rationale: Notifications require low latency and the system will eventually need bidirectional communication (typing indicators, presence). SSE would work for notifications alone but would require a separate solution for future bidirectional needs. Polling introduces unacceptable latency for real-time UX.
Architecture:
- Backend: FastAPI WebSocket endpoint with ConnectionManager class
- Frontend: Custom useWebSocket hook with automatic reconnection
- Scaling: Redis pub/sub for multi-instance message distribution
- Persistence: Store notifications in database for offline users
- Fallback: REST endpoint for notification history and initial load
See references/architecture-decision-record-template.md for the full ADR format.
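The ConnectionManager mentioned in the architecture list could look roughly like the sketch below. It is framework-agnostic and covers a single process only (the Redis pub/sub fan-out is out of scope); the TextSocket protocol, the per-user grouping, and the policy of dropping a socket after a failed send are all assumptions made for illustration.

```python
import asyncio
from collections import defaultdict
from typing import Protocol


class TextSocket(Protocol):
    """Minimal interface assumed here; FastAPI's WebSocket provides send_text()."""

    async def send_text(self, data: str) -> None: ...


class ConnectionManager:
    """Tracks open sockets per user and fans messages out to them."""

    def __init__(self) -> None:
        self._sockets: defaultdict[str, set[TextSocket]] = defaultdict(set)

    def connect(self, user_id: str, socket: TextSocket) -> None:
        self._sockets[user_id].add(socket)

    def disconnect(self, user_id: str, socket: TextSocket) -> None:
        self._sockets[user_id].discard(socket)
        if not self._sockets[user_id]:
            del self._sockets[user_id]

    async def notify(self, user_id: str, message: str) -> int:
        """Send to every open socket of the user; return how many received it."""
        delivered = 0
        # Iterate over a copy because failed sockets are removed during the loop.
        for socket in list(self._sockets.get(user_id, ())):
            try:
                await socket.send_text(message)
                delivered += 1
            except Exception:
                # Assumed policy: a failed send means the socket is dead, so drop it.
                self.disconnect(user_id, socket)
        return delivered
```

A return value of 0 signals that the user is offline on this instance, which is where the persistence rule applies: the notification is stored in the database and served later through the REST fallback.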
Edge Cases
Monolith vs Microservices
Default to a modular monolith for teams smaller than 10 developers. A modular monolith provides:
- Clear module boundaries without network overhead
- Shared database with module-specific schemas
- Easy refactoring and code navigation
- Simple deployment and debugging
Consider microservices only when:
- Independent scaling is required for specific components
- Different modules need different technology stacks
- Team size exceeds 10 and ownership boundaries are clear
- Deployment independence is a business requirement
Migration path: Design module boundaries in the monolith as if they were services (no direct cross-module database access, communicate via service interfaces). This makes extraction to microservices straightforward when needed.
When to Break the Layer Pattern
The strict Router → Service → Repository pattern should be followed for standard CRUD operations. Acceptable exceptions:
- Background tasks: May call services directly without going through a router
- Event handlers: Domain event listeners may call services from any context
- CLI commands: Management scripts may access services or repositories directly
- Migrations: Data migrations may access models directly (no service/repo layer needed)
- Health checks: May access the database directly for simple connectivity verification
In all cases, business logic should still live in the service layer — these exceptions are about the entry point, not about bypassing business rules.
Evolving Architecture
When the architecture needs to change:
- Write an ADR documenting the motivation and the proposed change
- Identify all affected modules and their dependencies
- Plan an incremental migration — never big-bang rewrites
- Maintain backward compatibility during transition (strangler fig pattern)
- Set a deadline for completing the migration and removing legacy code