AI Architecture Patterns Skill
Purpose
Provide expert guidance on selecting and implementing enterprise AI architecture patterns for production systems. The patterns in this skill come from real-world deployments and the AI Architect Academy.
When to Use
- Designing new AI systems
- Evaluating architecture options
- Selecting patterns for specific use cases
- Understanding tradeoffs between approaches
- Getting implementation guidance
Core Patterns Library
1. AI Gateway Pattern
Problem: Multiple AI services with inconsistent interfaces, no centralized security, and limited observability create management complexity and security risks.
Solution: Deploy a centralized AI gateway that provides unified authentication, rate limiting, request/response logging, and model routing for all AI services.
Key Components:
- API Gateway (Kong, AWS API Gateway, OCI API Gateway)
- Authentication Service (OAuth2, API Keys)
- Rate Limiter (Redis-based)
- Request Logger (OpenTelemetry)
- Model Router
When to Use:
- Multiple AI providers in your stack
- Need centralized security controls
- Want unified logging and monitoring
- Cost allocation across teams
When NOT to Use:
- Single AI provider with simple use case
- Ultra-low latency requirements (<10ms)
- Early prototyping
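The gateway's request path (authenticate, rate-limit, route, log) can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: `AIGateway`, `TokenBucket`, and the status codes returned are hypothetical names and choices made for this example, the rate-limit state lives in a dict where the pattern above suggests Redis, and the "providers" are plain callables standing in for real model APIs.

```python
import time


class TokenBucket:
    """Token-bucket limiter for one client (assumption: in-memory, not Redis)."""

    def __init__(self, capacity, refill_per_sec, now):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.updated = now

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class AIGateway:
    """Hypothetical gateway: auth -> rate limit -> model routing -> logging."""

    def __init__(self, api_keys, routes, capacity=2, refill_per_sec=1.0):
        self.api_keys = api_keys          # api key -> team name
        self.routes = routes              # model name -> provider callable
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.buckets = {}                 # team name -> TokenBucket
        self.log = []                     # request log, usable for cost allocation

    def handle(self, api_key, model, prompt, now=None):
        now = time.monotonic() if now is None else now
        team = self.api_keys.get(api_key)
        if team is None:
            return self._record(now, None, model, 401, "invalid API key")
        if team not in self.buckets:
            self.buckets[team] = TokenBucket(self.capacity, self.refill_per_sec, now)
        if not self.buckets[team].allow(now):
            return self._record(now, team, model, 429, "rate limit exceeded")
        provider = self.routes.get(model)
        if provider is None:
            return self._record(now, team, model, 404, "unknown model")
        return self._record(now, team, model, 200, provider(prompt))

    def _record(self, now, team, model, status, body):
        # Every request is logged, including rejected ones.
        self.log.append({"time": now, "team": team, "model": model, "status": status})
        return {"status": status, "body": body}
```

Because every request passes through `handle`, the log carries a team name per call, which is what makes per-team cost allocation possible.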
2. RAG Production Pattern
Problem: LLMs hallucinate and lack access to enterprise-specific knowledge, making them unreliable for business-critical applications.
Solution: Implement a RAG pipeline with document ingestion, chunking, embedding, vector storage, retrieval, and augmented generation with source citations.
Key Components:
- Document Ingestion Pipeline
- Text Chunking Service (semantic, fixed, hybrid)
- Embedding Model (OpenAI, Cohere, Local)
- Vector Database (Pinecone, Weaviate, pgvector)
- Retrieval Service with Reranking
- LLM with RAG prompt template
When to Use:
- Customer support knowledge base
- Internal document Q&A
- Legal/compliance document analysis
- Technical documentation assistants
When NOT to Use:
- General creative writing
- Real-time or frequently changing data
- Very small document corpus (<100 docs)
Implementation Tips:
- Start with fixed-size chunks (512-1024 tokens)
- Add metadata extraction for filtering
- Implement hybrid search (keyword + semantic)
- Use reranking for improved precision
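Two of the tips above, fixed-size chunking and hybrid search, can be sketched as follows. This is an illustration under stated assumptions: the overlap between chunks is an addition not mentioned above, tokens are represented as a plain list, and reciprocal rank fusion (RRF) is used here as one common way to merge a keyword ranking with a semantic ranking. The function names and the constant `k=60` are choices made for this example.

```python
def chunk_fixed(tokens, size=512, overlap=64):
    """Split a token list into fixed-size chunks with overlapping windows."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) per list it appears in; ties are
    broken by document id so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For instance, fusing a keyword ranking `["a", "b", "c"]` with a semantic ranking `["b", "d", "a"]` puts `b` first, since it ranks highly in both lists. A reranker would then be applied to the top of the fused list.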
3. Multi-Agent Orchestration Pattern
Problem: Complex tasks require multiple specialized capabilities that exceed what a single LLM prompt can handle reliably.
Solution: Decompose complex workflows into specialized agents with an orchestrator that coordinates task distribution, handoffs, and result aggregation.
Key Components:
- Orchestrator Agent (workflow coordinator)
- Specialized Worker Agents (domain experts)
- Task Queue (for async processing)
- State Management (context preservation)
- Handoff Protocol (agent-to-agent communication)
- Result Aggregator
When to Use:
- Complex workflows with 5+ distinct steps
- Tasks requiring different expertise
- Autonomous systems
- Workflows with branching logic
When NOT to Use:
- Simple single-step tasks
- When cost is the primary constraint
- High-volume, low-complexity operations
Frameworks:
- LangGraph (graph-based orchestration)
- Claude Agent SDK
- AutoGen / CrewAI
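Independent of any of the frameworks above, the core loop of this pattern can be sketched with the standard library. This is a simplified, synchronous illustration: `Task`, `Orchestrator`, and the worker signature `(payload, state) -> (result, follow_up_tasks)` are hypothetical, and real workers would call an LLM instead of formatting strings.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    agent: str     # name of the worker that should handle this step
    payload: str


@dataclass
class Orchestrator:
    workers: dict                                   # agent name -> worker callable
    state: dict = field(default_factory=dict)       # shared context across agents
    results: list = field(default_factory=list)     # (agent, result) in run order

    def run(self, initial_tasks, max_steps=20):
        queue = deque(initial_tasks)                # task queue
        steps = 0
        while queue and steps < max_steps:          # max_steps guards against loops
            task = queue.popleft()
            result, handoffs = self.workers[task.agent](task.payload, self.state)
            self.results.append((task.agent, result))
            queue.extend(handoffs)                  # handoff: a worker enqueues follow-ups
            steps += 1
        return self.aggregate()

    def aggregate(self):
        # Result aggregator: here simply a labeled transcript.
        return "\n".join(f"[{agent}] {result}" for agent, result in self.results)


def researcher(payload, state):
    state["notes"] = f"notes on {payload}"
    return state["notes"], [Task("writer", payload)]


def writer(payload, state):
    return f"draft about {payload} using {state['notes']}", []
```

Calling `Orchestrator({"researcher": researcher, "writer": writer}).run([Task("researcher", "RAG")])` runs the researcher, hands off to the writer, and returns both results in order. The `max_steps` cap is one simple way to bound cost when agents can hand work back and forth.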
4. MCP Server Architecture
Problem: With N agents and M tools, point-to-point integration requires N × M custom integrations, because each AI agent needs its own code for each tool.
Solution: Implement MCP (Model Context Protocol) servers that provide standardized interfaces for tools, resources, and prompts, so each tool is integrated once and any MCP-capable agent can use it.
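The integration arithmetic can be made concrete with a short sketch. It assumes the idealized case where each agent implements one MCP client and each tool is wrapped by one MCP server; `integration_counts` is a hypothetical helper written for this example.

```python
def integration_counts(n_agents: int, m_tools: int) -> dict:
    """Compare point-to-point integrations with a shared-protocol setup."""
    return {
        # Every agent needs custom code for every tool.
        "point_to_point": n_agents * m_tools,
        # Assumption: one client per agent plus one server per tool.
        "with_mcp": n_agents + m_tools,
    }
```

With 5 agents and 20 tools this gives 100 point-to-point integrations versus 25 protocol implementations, and the gap widens as either number grows.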
Key Components:
- MCP Server (Node.js or Python)
- Tool Definitions (JSON Schema)
- Resource Providers
- Prompt Templates
- Transport Layer (stdio, Streamable HTTP; HTTP+SSE in older servers)
When to Use:
- Building tools for multiple AI agents
- Creating reusable integrations
- Claude Code environments
- Enterprise tool standardization
Implementation:
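// Requires an ES module context (top-level await) and the zod package.
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

const server = new McpServer({
  name: 'my-mcp-server',
  version: '1.0.0'
});

// Register a tool: name, description, input schema (zod), handler.
server.tool(
  'search',
  'Search documents',
  { query: z.string() },
  async ({ query }) => {
    // Implementation: replace this placeholder with a real search backend.
    return { content: [{ type: 'text', text: `Results for: ${query}` }] };
  }
);

// Expose the server over the stdio transport.
await server.connect(new StdioServerTransport());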
5. LLMOps Pipeline Pattern
Problem: LLM applications lack mature DevOps practices, leading to unpredictable quality and difficult rollbacks.
Solution: Implement prompt versioning, automated evaluation, staged deployments, and continuous monitoring.
Key Components:
- Prompt Version Control (Git)
- Evaluation Dataset
- Automated Eval Pipeline (e.g., Promptfoo)
- Deployment Orchestrator
- Monitoring Dashboard
- Rollback Mechanism
When to Use:
- Production LLM applications
- Teams with multiple prompt engineers
- Regulated industries
- High-stakes AI applications
Evaluation Metrics:
- Accuracy (vs golden answers)
- Latency (p50, p95, p99)
- Cost per request
- User satisfaction scores
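The quantitative metrics above can be computed from a batch of evaluation records and used as a deployment gate. The sketch below makes several assumptions: accuracy is exact match against the golden answer after trimming and lowercasing (real pipelines often use semantic or model-graded scoring), percentiles use the nearest-rank method, and the record fields and gate thresholds are hypothetical. User satisfaction is left out because it is collected after deployment.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0-100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def evaluate(records):
    """records: dicts with 'output', 'golden', 'latency_ms', 'cost_usd'."""
    n = len(records)
    latencies = [r["latency_ms"] for r in records]
    correct = sum(
        r["output"].strip().lower() == r["golden"].strip().lower() for r in records
    )
    return {
        "accuracy": correct / n,
        "latency_p50": percentile(latencies, 50),
        "latency_p95": percentile(latencies, 95),
        "latency_p99": percentile(latencies, 99),
        "cost_per_request": sum(r["cost_usd"] for r in records) / n,
    }


def passes_gate(metrics, baseline, max_accuracy_drop=0.02, max_p95_ms=2000):
    """Block promotion if accuracy regresses or p95 latency is too high."""
    return (
        metrics["accuracy"] >= baseline["accuracy"] - max_accuracy_drop
        and metrics["latency_p95"] <= max_p95_ms
    )
```

Running `evaluate` on each candidate prompt version and comparing against the current production baseline with `passes_gate` gives a simple promote-or-rollback signal for staged deployments.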
6. Vector Database Selection Framework
Problem: Many vector database options exist, each with different tradeoffs, and a wrong choice leads to expensive migrations.
Solution: Use a structured decision framework that evaluates scale, features, operations, and cost.
Selection Matrix:
| Scale | Recommendation |
|---|---|
| <1M vectors | pgvector (simple), Chroma (prototyping) |
| 1-100M vectors | Weaviate, Qdrant (self-hosted) |
| 100M+ vectors | Pinecone, Milvus (managed) |
Key Considerations:
- Hybrid search support
- Metadata filtering
- Multi-tenancy
- Backup/restore
- Managed vs self-hosted
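The scale column of the selection matrix can be expressed as a small lookup. This sketch covers scale only; the other considerations listed above still need to be weighed by hand. The function name is hypothetical, and treating exactly 100M vectors as part of the middle tier is an assumption, since the matrix leaves that boundary open.

```python
def recommend_vector_db(num_vectors: int) -> list[str]:
    """Return the candidates the selection matrix lists for a given scale."""
    if num_vectors < 0:
        raise ValueError("num_vectors must be non-negative")
    if num_vectors < 1_000_000:
        return ["pgvector", "Chroma"]
    if num_vectors <= 100_000_000:   # assumption: 100M itself stays in this tier
        return ["Weaviate", "Qdrant"]
    return ["Pinecone", "Milvus"]
```

A shortlist from this function is a starting point for the feature and operations comparison, not a final answer.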
7. AI Center of Excellence Framework
Problem: Scattered AI initiatives across the organization lead to duplicated effort, inconsistent quality, and security gaps.
Solution: Establish centralized governance with standardized patterns, reusable components, and shared infrastructure.
Key Components:
- Pattern Library (this skill!)
- Governance Framework
- Shared Infrastructure
- Training Program
- Review Board
- Metrics Dashboard
Governance Areas:
- Model selection criteria
- Security standards
- Cost controls
- Ethical guidelines
- Incident response
8. Security & Governance Pattern
Problem: AI introduces new security vectors: prompt injection, data leakage, model manipulation.
Solution: Implement AI-specific security controls including guardrails, PII handling, and audit logging.
Key Controls:
- Input Guardrails (prompt injection detection)
- Output Guardrails (content filtering)
- PII Detection & Redaction
- Audit Logging
- Access Control
- Compliance Reporting
Guardrails Implementation:
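from guardrails import Guard
from pydantic import BaseModel

class SafeResponse(BaseModel):  # illustrative output schema
    answer: str

guard = Guard.from_pydantic(output_class=SafeResponse)
# on_fail policies such as "reask" are attached to validators, not to this call;
# call signatures vary by Guardrails version, and `llm` is your own client.
response = guard(
    llm.invoke,
    prompt=user_input,
    num_reasks=1,  # retry once when validation fails
)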
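Three of the controls listed above (input guardrails, PII redaction, and audit logging) can also be sketched without any library. This is a deliberately naive illustration: the regular expressions are hypothetical examples that catch only a few obvious cases, and production systems typically rely on dedicated classifiers and PII detection services instead of hand-written patterns.

```python
import re

# Assumption: a tiny sample of injection phrasings, far from exhaustive.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"reveal (your |the )?system prompt", re.I),
]
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # US-style SSN format only


def redact_pii(text):
    """Replace emails and SSN-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)


def check_input(user_input, audit_log):
    """Return (allowed, redacted_text) and append an audit record."""
    flagged = any(p.search(user_input) for p in INJECTION_PATTERNS)
    safe_text = redact_pii(user_input)
    # Only the redacted text is logged, so the audit trail holds no raw PII.
    audit_log.append({"blocked": flagged, "input": safe_text})
    return (not flagged), safe_text
```

Redacting before logging matters here: an audit log that stores raw prompts becomes a data-leakage risk of its own.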
Pattern Selection Decision Tree
START: What type of AI system?
│
├── Document/Knowledge Q&A
│   └── → RAG Production Pattern
│       ├── Need multiple models? → + AI Gateway
│       └── Sensitive data? → + Security & Governance
│
├── Autonomous Agents
│   └── → Multi-Agent Orchestration
│       ├── Many tools? → + MCP Servers
│       └── Production deployment? → + LLMOps
│
├── Enterprise AI Platform
│   └── → AI Gateway + AI CoE Framework
│       ├── Cost concerns? → + Cost Optimization
│       └── Compliance? → + Security & Governance
│
└── Content Generation
    └── → AI Gateway + LLMOps
        └── Quality critical? → + Evaluation Pipeline
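The decision tree can be encoded as data so that it is easy to query or test. In this sketch the system-type keys and flag names (`document_qa`, `multiple_models`, and so on) are hypothetical identifiers chosen for the example; the pattern names are copied from the tree.

```python
# Each entry: (base patterns, {flag name: pattern added when the flag is true}).
DECISION_TREE = {
    "document_qa": (
        ["RAG Production Pattern"],
        {"multiple_models": "AI Gateway", "sensitive_data": "Security & Governance"},
    ),
    "autonomous_agents": (
        ["Multi-Agent Orchestration"],
        {"many_tools": "MCP Servers", "production": "LLMOps"},
    ),
    "enterprise_platform": (
        ["AI Gateway", "AI CoE Framework"],
        {"cost_concerns": "Cost Optimization", "compliance": "Security & Governance"},
    ),
    "content_generation": (
        ["AI Gateway", "LLMOps"],
        {"quality_critical": "Evaluation Pipeline"},
    ),
}


def select_patterns(system_type, **flags):
    """Walk one branch of the tree and return the patterns it selects."""
    if system_type not in DECISION_TREE:
        raise KeyError(f"unknown system type: {system_type}")
    base, optional = DECISION_TREE[system_type]
    # Add optional patterns in tree order for every flag that is set.
    return base + [pattern for flag, pattern in optional.items() if flags.get(flag)]
```

For example, `select_patterns("autonomous_agents", many_tools=True, production=True)` returns the multi-agent pattern followed by MCP Servers and LLMOps, matching the second branch of the tree.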
Pattern Combinations Matrix
| Use Case | Primary | Secondary | Tertiary |
|---|---|---|---|
| Customer Support Bot | RAG | AI Gateway | Security |
| Code Assistant | Multi-Agent | MCP Servers | LLMOps |
| Document Intelligence | RAG | Vector DB | AI Gateway |
| Enterprise AI Platform | AI Gateway | AI CoE | Security |
| Research Assistant | RAG | Multi-Agent | LLMOps |
Cloud Provider Mapping
AWS
- AI Gateway: API Gateway + Lambda
- RAG: Bedrock + OpenSearch
- Vector DB: OpenSearch, Aurora pgvector
GCP
- AI Gateway: Cloud Endpoints + Cloud Functions
- RAG: Vertex AI + Vector Search (formerly Matching Engine)
- Vector DB: Vertex AI Vector Search, AlloyDB
Azure
- AI Gateway: API Management + Functions
- RAG: Azure OpenAI + AI Search
- Vector DB: AI Search, Cosmos DB
OCI
- AI Gateway: API Gateway + Functions
- RAG: OCI GenAI + OpenSearch
- Vector DB: OpenSearch, PostgreSQL
Resources
- GitHub: https://github.com/frankxai/ai-architect-academy
- Patterns Library: /01-design-patterns
- Learning Paths: /02-learning-paths
- Templates: /AI CoE Templates
Related Skills
- mcp-architecture: MCP server development
- claude-sdk: Agent development with Claude
- langgraph-patterns: Graph-based agent workflows
- oci-services-expert: Oracle Cloud guidance
Part of the AI Architect Academy by FrankX.AI