System Architecture

When to Use

Activate this skill when:

Designing a new module, service, or major feature that requires structural decisions
Choosing between architectural approaches (e.g., where to place logic, how to structure data flow)
Planning database schema changes or refactoring existing schema
Making frontend state management decisions (server state vs client state, context vs store)
Evaluating technology trade-offs for a new capability
Creating or reviewing Architecture Decision Records (ADRs)
Setting up a new project or major subsystem from scratch

Input: If plan.md exists (from project-planner), read it for context about the feature scope and affected modules. Otherwise, work from the user's request directly.

Output: Write architecture decisions to architecture.md and create ADRs in docs/adr/ADR-NNN-<title>.md. Tell the user: "Architecture written to architecture.md. Run /api-design-patterns for API contracts or /task-decomposition for implementation tasks."

Do NOT use this skill for:

Writing implementation code (use python-backend-expert or react-frontend-expert)
API contract design or endpoint specifications (use api-design-patterns)
Testing patterns or strategies (use pytest-patterns or react-testing-patterns)
Deployment or infrastructure decisions (use docker-best-practices or deployment-pipeline)

Instructions

Project Layer Architecture

The standard Python/React full-stack architecture follows a layered pattern with strict dependency direction.

Backend Layers (FastAPI)

HTTP Request
    ↓
┌─────────────────────┐
│   Routers (routes/)  │  ← HTTP concerns: request parsing, response formatting, status codes
│                      │     Uses: Depends() for injection, Pydantic schemas for validation
├─────────────────────┤
│   Services           │  ← Business logic: orchestration, validation rules, domain operations
│   (services/)        │     No HTTP awareness. Raises domain exceptions, not HTTPException.
├─────────────────────┤
│   Repositories       │  ← Data access: queries, CRUD operations, database interactions
│   (repositories/)    │     No business logic. Returns model instances or None.
├─────────────────────┤
│   Models (models/)   │  ← SQLAlchemy ORM models: table definitions, relationships, indexes
│   Schemas (schemas/) │  ← Pydantic v2 models: request/response contracts, validation
└─────────────────────┘
    ↓
Database

Dependency direction rules:

Routers depend on Services (never on Repositories directly)
Services depend on Repositories (never on Routers)
Repositories depend on Models (never on Services)
Schemas are shared across layers but define no dependencies themselves
Never skip layers: no direct database access from routes

Dependency injection pattern:

# Router depends on Service via Depends()
@router.post("/users", response_model=UserResponse)
async def create_user(
    data: UserCreate,
    service: UserService = Depends(get_user_service),
) -> UserResponse:
    return await service.create_user(data)

# Service depends on Repository via constructor injection
class UserService:
    def __init__(self, repo: UserRepository) -> None:
        self.repo = repo

# Repository depends on AsyncSession via Depends()
class UserRepository:
    def __init__(self, session: AsyncSession) -> None:
        self.session = session

Frontend Layers (React/TypeScript)

┌─────────────────────┐
│   Pages (pages/)     │  ← Route-level components: data fetching, layout composition
├─────────────────────┤
│   Layouts            │  ← Page structure: navigation, sidebars, content areas
│   (layouts/)         │
├─────────────────────┤
│   Features           │  ← Domain-specific: UserProfile, OrderList, ChatPanel
│   (features/)        │     Composed from shared components + hooks
├─────────────────────┤
│   Shared Components  │  ← Reusable UI: Button, Modal, Table, Form, Input
│   (components/)      │     No business logic. Configurable via props.
├─────────────────────┤
│   Hooks (hooks/)     │  ← Custom hooks: useAuth, usePagination, useDebounce
│   API (api/)         │  ← API client functions, TanStack Query configurations
├─────────────────────┤
│   Types (types/)     │  ← Shared TypeScript interfaces and type definitions
└─────────────────────┘

Component dependency direction:

Pages import Features and Layouts
Features import Shared Components and Hooks
Shared Components import only other Shared Components and Types
Hooks import API functions and Types
API functions import Types only

Decision Framework

When facing architectural decisions, follow this structured process:

Step 1: Define the Problem

What capability is needed?
What are the non-functional requirements? (performance, scalability, maintainability)
What constraints exist? (team size, timeline, existing infrastructure)

Step 2: Identify Options

List 2-3 viable architectural approaches
For each option, document:
- How it works (brief technical description)
- Advantages
- Disadvantages
- Risks

Step 3: Evaluate Against Criteria

Criterion	Weight	Description
Maintainability	High	Can the team understand, modify, and debug this easily?
Testability	High	Can each component be tested in isolation?
Performance	Medium	Does it meet latency and throughput requirements?
Team familiarity	Medium	Does the team have experience with this approach?
Operational cost	Low	What are the infrastructure and maintenance costs?
Future flexibility	Low	How easily can this evolve as requirements change?

Step 4: Decide and Document

Choose the option that best satisfies the weighted criteria
Document the decision in an ADR (see references/architecture-decision-record-template.md)
Record what was NOT chosen and why — this context is valuable for future decisions

Step 5: Communicate

Share the ADR with the team
Identify any migration or rollout steps needed
Flag reversibility: is this a one-way door or a two-way door?

Database Schema Design

Design Principles

Start normalized (3NF) — Denormalize only for proven performance bottlenecks, not speculation
One migration per logical change — Each Alembic migration should represent a single, coherent schema modification
Always include downgrade — Every migration must have a working downgrade() function
Index strategically:
- Primary keys (automatic)
- Foreign keys (always)
- Columns in WHERE clauses of frequent queries
- Composite indexes for multi-column lookups
- Partial indexes for filtered queries (e.g., WHERE is_active = true)

SQLAlchemy 2.0 Async Patterns

# Model definition with Mapped types (SQLAlchemy 2.0 style)
class User(Base):
    __tablename__ = "users"

    id: Mapped[int] = mapped_column(primary_key=True)
    email: Mapped[str] = mapped_column(String(255), unique=True, index=True)
    is_active: Mapped[bool] = mapped_column(default=True)
    created_at: Mapped[datetime] = mapped_column(server_default=func.now())

    # Relationships: ALWAYS use eager loading with async
    posts: Mapped[list["Post"]] = relationship(
        back_populates="author",
        lazy="selectin",  # or "joined" — NEVER "lazy" with async
    )

Async session rules:

One AsyncSession per request — never share across concurrent tasks
Use async with context manager for automatic cleanup
Map session boundaries to transaction boundaries
Use selectin or joined loading — lazy loading is incompatible with asyncio
Use run_sync() only as a last resort for legacy code

Migration Planning

Schema change → Generate migration: alembic revision --autogenerate -m "description"
Review generated migration — verify column types, indexes, constraints
Test upgrade: alembic upgrade head
Test downgrade: alembic downgrade -1
Test data preservation: ensure existing data survives the round-trip

Frontend Architecture

State Management Decision Tree

Is the data from the server?
├── YES → Use TanStack Query (useQuery, useMutation)
│         Configure staleTime, gcTime, query keys
│
└── NO → Is it needed across multiple components?
         ├── YES → Is it complex with actions/reducers?
         │         ├── YES → Use Zustand store
         │         └── NO  → Use React Context
         │
         └── NO → Use useState / useReducer locally

TanStack Query conventions:

Query keys: [resource, ...identifiers] (e.g., ["users", userId], ["posts", { page, limit }])
Use queryOptions() factory to centralize key + fn definitions — prevents copy-paste key errors
Set staleTime based on data freshness needs (default 0 is too aggressive for most cases)
Invalidate with invalidateQueries() after mutations — never manual refetch()
Handle all states: isPending, isError, data

Component design rules:

Props for configuration, hooks for data
Lift state only as high as needed — no premature context creation
Keep components under 200 lines — extract sub-components or custom hooks when larger
Use children and composition over deep prop drilling

Routing Structure

Organize routes to mirror the URL structure:

src/
├── pages/
│   ├── HomePage.tsx           → /
│   ├── LoginPage.tsx          → /login
│   ├── users/
│   │   ├── UserListPage.tsx   → /users
│   │   └── UserDetailPage.tsx → /users/:id
│   └── settings/
│       └── SettingsPage.tsx   → /settings

Cross-Cutting Concerns

Authentication Flow

Login Request
    ↓
Backend: Validate credentials → Generate JWT (access + refresh tokens)
    ↓
Frontend: Store access token in memory, refresh token in httpOnly cookie
    ↓
API Calls: Attach access token via Authorization header
    ↓
Token Expired: Use refresh token to obtain new access token
    ↓
Refresh Failed: Redirect to login

Architecture decisions for auth:

Access tokens: short-lived (15-30 min), stored in memory (not localStorage)
Refresh tokens: longer-lived (7-30 days), stored in httpOnly cookie
Backend: FastAPI Depends() chain for token validation → user extraction → permission check
Frontend: Auth context providing user, login(), logout(), isAuthenticated

Error Handling Strategy

Errors should be handled at the appropriate layer:

Layer	Error Type	Action
Router	`HTTPException`	Return HTTP error response with status code
Service	Domain exceptions	Raise custom exceptions (e.g., `UserNotFoundError`)
Repository	Database exceptions	Catch and re-raise as domain exceptions or let propagate
Frontend	API errors	Display user-friendly messages, retry where appropriate

Backend exception hierarchy:

class AppError(Exception):
    """Base application error."""

class NotFoundError(AppError):
    """Resource not found."""

class ConflictError(AppError):
    """Resource conflict (duplicate, version mismatch)."""

class ValidationError(AppError):
    """Business rule violation."""

Router-level exception handler maps domain exceptions to HTTP responses:

@app.exception_handler(NotFoundError)
async def not_found_handler(request: Request, exc: NotFoundError):
    return JSONResponse(status_code=404, content={"detail": str(exc)})

Logging Architecture

Backend (structlog):

Structured JSON logs in production
Human-readable console in development
Bind request context (request_id, user_id) at middleware level
Log at service layer (business events), not repository layer (too noisy)
Use log levels: DEBUG (development only), INFO (business events), WARNING (recoverable issues), ERROR (failures requiring attention)

Frontend:

console.* in development
Structured error reporting to backend or Sentry in production
Log user actions for debugging, not for analytics

Configuration Management

Backend (pydantic-settings):

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    database_url: str
    redis_url: str = "redis://localhost:6379"
    jwt_secret: str
    debug: bool = False

Frontend (environment variables):

VITE_API_URL for API base URL
Build-time injection via Vite's import.meta.env
No secrets in frontend environment variables

Output Files

architecture.md

Write the architecture document to architecture.md at the project root:

# Architecture: [Feature/System Name]

## Overview
[1-2 sentence summary of the architectural approach]

## Layer Structure
[Backend and frontend layer descriptions from this skill's patterns]

## Key Decisions
[Summary of decisions made, with links to ADRs]

## Database Schema
[Entity descriptions, relationships, key indexes]

## Cross-Cutting Concerns
[Auth, error handling, logging approach]

## Next Steps
- Run `/api-design-patterns` to define API contracts
- Run `/task-decomposition` to create implementation tasks

ADRs

For each significant decision, create an ADR in docs/adr/:

# ADR-NNN: [Decision Title]

## Status
Accepted | Proposed | Superseded

## Context
[Why this decision is needed]

## Decision
[What we decided]

## Consequences
[Positive and negative outcomes]

Number ADRs sequentially (ADR-001, ADR-002, etc.).

Examples

Architecture Decision: Real-Time Notifications

Problem: The application needs real-time notifications for users (new messages, status updates).

Options evaluated:

Option	Pros	Cons
WebSocket	True bidirectional, low latency	Complex connection management, harder to scale
Server-Sent Events (SSE)	Simple, HTTP-based, auto-reconnect	Unidirectional (server→client only), limited browser connections
Polling	Simplest implementation, works everywhere	Higher latency, unnecessary server load

Decision: WebSocket for this use case.

Rationale: Notifications require low latency and the system will eventually need bidirectional communication (typing indicators, presence). SSE would work for notifications alone but would require a separate solution for future bidirectional needs. Polling introduces unacceptable latency for real-time UX.

Architecture:

Backend: FastAPI WebSocket endpoint with ConnectionManager class
Frontend: Custom useWebSocket hook with automatic reconnection
Scaling: Redis pub/sub for multi-instance message distribution
Persistence: Store notifications in database for offline users
Fallback: REST endpoint for notification history and initial load

See references/architecture-decision-record-template.md for the full ADR format.

Edge Cases

Monolith vs Microservices

Default to modular monolith for teams smaller than 10 developers. A modular monolith provides:

Clear module boundaries without network overhead
Shared database with module-specific schemas
Easy refactoring and code navigation
Simple deployment and debugging

Consider microservices only when:

Independent scaling is required for specific components
Different modules need different technology stacks
Team size exceeds 10 and ownership boundaries are clear
Deployment independence is a business requirement

Migration path: Design module boundaries in the monolith as if they were services (no direct cross-module database access, communicate via service interfaces). This makes extraction to microservices straightforward when needed.

When to Break the Layer Pattern

The strict Router → Service → Repository pattern should be followed for standard CRUD operations. Acceptable exceptions:

Background tasks: May call services directly without going through a router
Event handlers: Domain event listeners may call services from any context
CLI commands: Management scripts may access services or repositories directly
Migrations: Data migrations may access models directly (no service/repo layer needed)
Health checks: May access the database directly for simple connectivity verification

In all cases, business logic should still live in the service layer — these exceptions are about the entry point, not about bypassing business rules.

Evolving Architecture

When the architecture needs to change:

Write an ADR documenting the motivation and the proposed change
Identify all affected modules and their dependencies
Plan an incremental migration — never big-bang rewrites
Maintain backward compatibility during transition (strangler fig pattern)
Set a deadline for completing the migration and removing legacy code

system-architecture