context-engineering
Context Engineering: Optimizing AI Context Windows
Master the art of context engineering for AI applications - optimizing prompts, managing tokens, and designing effective context strategies.
Triggers
Use this skill when:
- Optimizing LLM prompts for better results
- Managing context window limits
- Implementing RAG (Retrieval Augmented Generation)
- Designing AI application architectures
- Reducing token costs while maintaining quality
- Keywords: context, prompt, tokens, RAG, context window, prompt engineering, token budget, retrieval, embedding
Core Concepts
Context Window Anatomy
┌─────────────────────────────────────────────────────┐
│ CONTEXT WINDOW │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ SYSTEM PROMPT (Fixed) │ │
│ │ - Identity & role │ │
│ │ - Behavioral rules │ │
│ │ - Output format │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ RETRIEVED CONTEXT (Dynamic) │ │
│ │ - Relevant documents │ │
│ │ - Code snippets │ │
│ │ - Reference data │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ CONVERSATION HISTORY (Growing) │ │
│ │ - Previous messages │ │
│ │ - Tool results │ │
│ │ - Intermediate outputs │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ CURRENT INPUT (Variable) │ │
│ │ - User query │ │
│ │ - Inline context │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ OUTPUT SPACE (Reserved) │ │
│ │ - max_tokens allocation │ │
│ └──────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
Token Budget Planning
| Component | Typical Allocation | Notes |
|---|---|---|
| System Prompt | 500-2000 tokens | Keep stable |
| Retrieved Context | 2000-10000 tokens | Scale with need |
| History | 1000-5000 tokens | Compress over time |
| Current Input | 100-1000 tokens | User controlled |
| Output Reserve | 1000-4096 tokens | Task dependent |
Pattern 1: System Prompt Design
Layered System Prompt
# System Prompt Structure
## Layer 1: Identity (Always First)
You are [role description]. Your purpose is [primary function].
## Layer 2: Capabilities
You have access to:
- [Capability 1]
- [Capability 2]
## Layer 3: Behavioral Rules
ALWAYS:
- [Rule 1]
- [Rule 2]
NEVER:
- [Constraint 1]
- [Constraint 2]
## Layer 4: Output Format
When responding:
- [Format guideline 1]
- [Format guideline 2]
## Layer 5: Context Hints (Dynamic)
Current context: [injected at runtime]
Compression Techniques
Before (verbose):
You are a helpful AI assistant that specializes in helping users with
coding tasks. When a user asks you to write code, you should first
understand what they're trying to accomplish, then write clean and
well-documented code that follows best practices.
After (compressed):
Role: Coding assistant
Process: Understand task -> Write clean, documented, best-practice code
Pattern 2: Dynamic Context Injection
Context Template System
def build_context(task: str, retrieved_docs: list, history: list) -> str:
template = """# Task
{task}
# Relevant Context
{context}
# Conversation History
{history}
# Instructions
Respond based on the context provided. If information is missing, say so.
"""
return template.format(
task=task,
context=format_docs(retrieved_docs),
history=format_history(history)
)
def format_docs(docs: list, max_tokens: int = 5000) -> str:
formatted = []
current_tokens = 0
for doc in sorted(docs, key=lambda d: d.relevance, reverse=True):
doc_tokens = count_tokens(doc.content)
if current_tokens + doc_tokens > max_tokens:
break
formatted.append(f"## {doc.title}\n{doc.content}")
current_tokens += doc_tokens
return "\n\n".join(formatted)
Priority-Based Inclusion
class ContextPriority:
CRITICAL = 1 # Always include
HIGH = 2 # Include if space
MEDIUM = 3 # Include if plenty of space
LOW = 4 # Include only if necessary
def select_context(items: list, budget: int) -> list:
selected = []
remaining = budget
# Sort by priority, then relevance
sorted_items = sorted(items, key=lambda x: (x.priority, -x.relevance))
for item in sorted_items:
tokens = count_tokens(item.content)
if tokens <= remaining:
selected.append(item)
remaining -= tokens
elif item.priority == ContextPriority.CRITICAL:
# Summarize critical items if they don't fit
summary = summarize(item.content, remaining)
selected.append(item._replace(content=summary))
break
return selected
Pattern 3: Conversation Summarization
Rolling Summary
class ConversationManager:
def __init__(self, max_history_tokens: int = 3000):
self.messages = []
self.summary = ""
self.max_tokens = max_history_tokens
def add_message(self, role: str, content: str):
self.messages.append({"role": role, "content": content})
self._maybe_summarize()
def _maybe_summarize(self):
total_tokens = sum(count_tokens(m["content"]) for m in self.messages)
if total_tokens > self.max_tokens:
# Keep last N messages
keep_recent = 4
to_summarize = self.messages[:-keep_recent]
recent = self.messages[-keep_recent:]
# Summarize older messages
new_summary = self._create_summary(to_summarize)
self.summary = f"{self.summary}\n{new_summary}".strip()
self.messages = recent
def get_context(self) -> str:
parts = []
if self.summary:
parts.append(f"[Previous conversation summary: {self.summary}]")
parts.extend([f"{m['role']}: {m['content']}" for m in self.messages])
return "\n\n".join(parts)
Hierarchical Memory
MEMORY LEVELS
Level 1: Working Memory (Current Context)
- Last few exchanges
- Current task details
- Active tool results
Level 2: Session Memory (Summarized)
- Earlier conversation summary
- Key decisions made
- Important context established
Level 3: Long-term Memory (Retrieved)
- Past session summaries
- User preferences
- Project knowledge
Pattern 4: RAG (Retrieval Augmented Generation)
Basic RAG Pipeline
class RAGPipeline:
def __init__(self, embedder, vector_store, llm):
self.embedder = embedder
self.store = vector_store
self.llm = llm
def query(self, question: str, k: int = 5) -> str:
# 1. Embed query
query_embedding = self.embedder.embed(question)
# 2. Retrieve relevant docs
docs = self.store.similarity_search(query_embedding, k=k)
# 3. Build context
context = self._build_context(question, docs)
# 4. Generate response
return self.llm.generate(context)
def _build_context(self, question: str, docs: list) -> str:
doc_text = "\n\n".join([
f"Source: {d.metadata.get('source', 'unknown')}\n{d.content}"
for d in docs
])
return f"""Based on the following context, answer the question.
Context:
{doc_text}
Question: {question}
Answer:"""
Advanced RAG Techniques
Hybrid Search:
def hybrid_search(query: str, k: int = 5):
# Semantic search
semantic_results = vector_search(query, k=k*2)
# Keyword search
keyword_results = bm25_search(query, k=k*2)
# Combine and dedupe
combined = merge_results(semantic_results, keyword_results)
# Rerank
return rerank(query, combined, k=k)
Query Expansion:
def expand_query(original_query: str) -> list:
expansion_prompt = f"""Generate 3 alternative phrasings for this query:
{original_query}
Return as JSON list."""
alternatives = llm.generate(expansion_prompt)
return [original_query] + json.loads(alternatives)
Pattern 5: Token Optimization
Techniques
| Technique | Savings | Trade-off |
|---|---|---|
| Abbreviations | 10-20% | Readability |
| Remove examples | 20-40% | Clarity |
| Bullet points | 15-25% | Formatting |
| Summarization | 50-80% | Detail loss |
| Selective inclusion | Variable | Coverage |
Implementation
def optimize_context(content: str, target_tokens: int) -> str:
current_tokens = count_tokens(content)
if current_tokens <= target_tokens:
return content
# Try progressive compression
strategies = [
remove_redundant_whitespace,
abbreviate_common_terms,
remove_examples,
extract_key_points,
aggressive_summarize
]
for strategy in strategies:
content = strategy(content)
if count_tokens(content) <= target_tokens:
return content
# Last resort: truncate
return truncate_to_tokens(content, target_tokens)
Pattern 6: Context Window Monitoring
Token Tracking
class TokenTracker:
def __init__(self, model: str):
self.model = model
self.limit = get_context_limit(model)
self.usage = {
"system": 0,
"context": 0,
"history": 0,
"input": 0,
"reserved": 4096 # For output
}
def update(self, component: str, content: str):
self.usage[component] = count_tokens(content)
@property
def available(self) -> int:
used = sum(self.usage.values())
return self.limit - used
@property
def utilization(self) -> float:
return sum(self.usage.values()) / self.limit
def can_add(self, content: str) -> bool:
return count_tokens(content) <= self.available
def report(self) -> str:
return f"""Token Usage:
- System: {self.usage['system']}
- Context: {self.usage['context']}
- History: {self.usage['history']}
- Input: {self.usage['input']}
- Reserved: {self.usage['reserved']}
- Available: {self.available}
- Utilization: {self.utilization:.1%}"""
Best Practices
System Prompts
- Front-load important instructions: Models attend more to beginning
- Use clear structure: Headers, bullets, consistent formatting
- Be specific: Vague instructions get vague results
- Test variations: Small changes can have big impacts
- Version control: Track what works
Context Selection
- Relevance over recency: Most relevant, not most recent
- Diversity: Include different perspectives
- Source attribution: Help model cite correctly
- Chunking strategy: Match chunk size to use case
- Metadata inclusion: Add context about context
Token Management
- Reserve output space: Don't fill entire context
- Monitor utilization: Track across sessions
- Compress proactively: Before hitting limits
- Cache summaries: Don't re-summarize repeatedly
- Profile costs: Know your token spend
Quick Reference
Token Counts (Approximate)
| Content Type | Tokens/Item |
|---|---|
| English word | 1.3 |
| Code line | 10-15 |
| Paragraph | 50-100 |
| Page of text | 500-750 |
| JSON object | 20-50 |
Model Context Limits
| Model | Context Limit |
|---|---|
| Claude Opus 4.5 | 200K |
| Claude Sonnet 4 | 200K |
| Claude Haiku 3.5 | 200K |
| GPT-4 Turbo | 128K |
| GPT-4o | 128K |
Notes
- Context engineering is iterative - test and refine
- Different tasks need different context strategies
- Monitor both quality and cost
- Cache aggressively where possible
- Document your context architecture
More from housegarofalo/claude-code-base
mqtt-iot
Configure MQTT brokers (Mosquitto, EMQX) for IoT messaging, device communication, and smart home integration. Manage topics, QoS levels, authentication, and bridging. Use when setting up IoT messaging, smart home communication, or device-to-cloud connectivity. (project)
22devops-engineer-agent
Infrastructure and DevOps specialist. Manages Docker, Kubernetes, CI/CD pipelines, and cloud deployments. Expert in GitHub Actions, Azure DevOps, Terraform, and container orchestration. Use for deployment automation, infrastructure setup, or CI/CD optimization.
6postgresql
Design, optimize, and manage PostgreSQL databases. Covers indexing, pgvector for AI embeddings, JSON operations, full-text search, and query optimization. Use when working with PostgreSQL, database design, or building data-intensive applications.
6home-assistant
Ultimate Home Assistant skill - complete administration, wireless protocols (Zigbee/ZHA/Z2M, Z-Wave JS, Thread, Matter), ESPHome device building, advanced troubleshooting, performance optimization, security hardening, custom integration development, and professional dashboard design. Covers configuration, REST API, automation debugging, database optimization, SSL/TLS, Jinja2 templating, and HACS custom cards. Use for any HA task.
6testing
Comprehensive testing skill covering unit, integration, and E2E testing with pytest, Jest, Cypress, and Playwright. Use for writing tests, improving coverage, debugging test failures, and setting up testing infrastructure.
5react-typescript
Build modern React applications with TypeScript. Covers React 18+ patterns, hooks, component architecture, state management (Zustand, Redux Toolkit), server components, and best practices. Use for React development, TypeScript integration, component design, and frontend architecture.
5