DSPy-Code Skill
Specialized AI assistant for building LLM applications with DSPy
When to Use This Skill
Activate dspy-code for:
Development Tasks
- Creating DSPy modules, signatures, and pipelines
- Building RAG (Retrieval-Augmented Generation) systems
- Implementing multi-hop reasoning and complex workflows
- Designing typed outputs with Pydantic schemas
- Creating agents with tool use (ReAct patterns)
- Writing custom metrics and evaluation functions
Optimization Tasks
- Running prompt optimization with GEPA
- Compiling modules with BootstrapFewShot, MIPRO, COPRO
- Hyperparameter tuning and grid search
- A/B testing optimized vs unoptimized modules
- Statistical significance testing
- Performance benchmarking
Project Management
- Initializing new DSPy projects
- Connecting to existing workspaces
- Generating demos from templates
- Validating DSPy code for correctness
- Exporting to Python, JSON, YAML formats
- Session tracking and history
Learning & Exploration
- Understanding DSPy patterns and anti-patterns
- Choosing the right predictor for your task
- Selecting optimal optimizers based on data size
- Learning about 10 predictors, 11 optimizers, 4 adapters
- Exploring 12 production-ready module templates
Key Principle
Use dspy-code for ALL DSPy-related development
DSPy is fundamentally different from traditional prompt engineering:
- Programming, not prompting - Write declarative modules, not brittle prompts
- Signatures define contracts - Clear input/output specifications
- Optimizers tune automatically - No manual prompt engineering
- Composition over monoliths - Build complex programs from simple modules
Core Capabilities
1. Deep DSPy Knowledge
10 Predictor Types:
- `Predict` - Basic predictor
- `ChainOfThought` - CoT reasoning
- `ChainOfThoughtWithHint` - CoT with hints
- `ProgramOfThought` - Code execution for math
- `ReAct` - Reasoning + Acting for agents
- `MultiChainComparison` - Compare multiple chains
- `Retrieve` - Document retrieval
- `TypedPredictor` - Type-constrained outputs
- `Ensemble` - Multiple predictor voting
- `majority` - Majority voting aggregation
11 Optimizer Types:
- `BootstrapFewShot` - Example-based (10-50 examples, ⚡⚡⚡ fast)
- `BootstrapFewShotWithRandomSearch` - Hyperparameter tuning (50+, ⚡⚡)
- `BootstrapFewShotWithOptuna` - Optuna integration (50+, ⚡⚡)
- `COPRO` - Prompt optimization (50+, ⚡⚡, ⭐⭐⭐⭐)
- `MIPRO` - Multi-stage instruction (100+, ⚡, ⭐⭐⭐⭐⭐)
- `MIPROv2` - Enhanced MIPRO (200+, ⚡, ⭐⭐⭐⭐⭐)
- `BetterTogether` - Collaborative optimization (100+, ⚡⚡)
- `Ensemble` - Ensemble methods (100+, ⚡, ⭐⭐⭐⭐)
- `KNNFewShot` - KNN-based selection (100+, ⚡⚡, ⭐⭐⭐⭐)
- `LabeledFewShot` - Labeled examples (50+, ⚡⚡⚡)
- `SignatureOptimizer` - Signature tuning (100+, ⚡⚡)
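As a rough illustration, the data-size thresholds above can be expressed as a simple selection rule (a sketch only; the skill's actual suggestion logic may differ):
def suggest_optimizer(n_examples: int) -> str:
    # Thresholds mirror the example counts listed above.
    if n_examples < 50:
        return "BootstrapFewShot"  # fast, works from ~10 examples
    if n_examples < 100:
        return "COPRO"             # prompt optimization from 50+
    if n_examples < 200:
        return "MIPRO"             # multi-stage instruction from 100+
    return "MIPROv2"               # strongest option at 200+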
4 Adapter Types:
- `ChatAdapter` - Chat model integration
- `JSONAdapter` - JSON output formatting
- `FunctionAdapter` - Function calling
- `ImageAdapter` - Image input handling
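A minimal sketch of plugging an adapter in globally (DSPy 2.5+ accepts an adapter in its settings; the OpenAI-style client here follows the usage elsewhere in this document):
import dspy

# Sketch: route predictor calls through the JSON adapter so outputs are
# parsed from structured JSON rather than free text (DSPy 2.5+).
lm = dspy.OpenAI(model="gpt-3.5-turbo")
dspy.settings.configure(lm=lm, adapter=dspy.JSONAdapter())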
Built-in Metrics:
- Accuracy (classification tasks)
- F1 Score (multi-label classification)
- ROUGE-L (text generation quality)
- BLEU (translation quality)
- Exact Match (strict comparison)
- Semantic Similarity (embedding-based)
- Custom metrics (user-defined)
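Custom metrics are plain Python functions with the `(example, prediction, trace=None)` signature DSPy expects. A minimal sketch of a strict exact-match metric:
def exact_match(example, prediction, trace=None):
    # Return a bool (or a float score); DSPy averages this over the dataset.
    return example.answer.strip().lower() == prediction.answer.strip().lower()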
2. GEPA Integration
Genetic-Evolutionary Prompt Architecture for automatic prompt optimization:
from dspy.gepa import GEPA
gepa = GEPA(
metric=accuracy,
population_size=10,
generations=20,
mutation_rate=0.3,
crossover_rate=0.7
)
result = gepa.optimize(
seed_prompt="question -> answer",
training_examples=trainset,
budget=100 # Max LLM calls
)
GEPA Workflow:
- Initialize population with prompt variants
- Evaluate each variant on training data
- Select best-performing prompts
- Crossover and mutate to create new variants
- Repeat for N generations
- Return optimized prompt
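The loop below is an illustrative sketch of that workflow, not the library's internals; `evaluate`, `mutate`, and `crossover` are supplied by the caller:
import random

def evolve(population, evaluate, mutate, crossover,
           generations=20, mutation_rate=0.3, crossover_rate=0.7):
    # Illustrative evolutionary loop: keep the top half as parents, then
    # refill the population with crossed-over and mutated children.
    size = len(population)
    for _ in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)
        parents = ranked[: max(2, size // 2)]
        children = []
        while len(parents) + len(children) < size:
            a, b = random.sample(parents, 2)
            child = crossover(a, b) if random.random() < crossover_rate else a
            if random.random() < mutation_rate:
                child = mutate(child)
            children.append(child)
        population = parents + children
    return max(population, key=evaluate)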
When to use GEPA:
- Prompt engineering bottleneck
- Need automatic optimization
- Have 50+ training examples
- Want to explore prompt space systematically
3. Session Management
Track development across multiple sessions:
session = {
'id': 'session_123',
'workspace': '/path/to/project',
'created_at': '2024-01-15T10:30:00Z',
'modules': [...],
'optimizers': [...],
'datasets': [...],
'metrics': [...]
}
Session features:
- Workspace tracking
- Module registry
- Optimizer history
- Dataset versioning
- Metric tracking
- Export/import state
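Since the session state is plain JSON-serializable data, export/import can be as simple as the following sketch (`save_session` and `load_session` are hypothetical helpers, not part of the skill's API):
import json

def save_session(session: dict, path: str) -> None:
    # Persist session state (workspace, modules, history) to disk.
    with open(path, "w") as f:
        json.dump(session, f, indent=2)

def load_session(path: str) -> dict:
    # Restore a previously exported session.
    with open(path) as f:
        return json.load(f)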
4. Codebase RAG Indexing
Index existing DSPy codebases for contextual assistance:
interface CodebaseIndex {
workspace: string;
indexed_at: string;
modules: Array<{
path: string;
name: string;
signature?: string;
type: string;
}>;
signatures: Array<{
path: string;
definition: string;
}>;
metrics: Array<{
path: string;
name: string;
type: MetricType;
}>;
}
Indexing enables:
- Fast module discovery
- Signature lookups
- Metric finding
- Pattern detection
- Dependency analysis
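A sketch of how module discovery might work, using Python's ast module to find dspy.Module subclasses (the traversal details are an assumption; the record fields follow the CodebaseIndex interface above):
import ast
import pathlib

def discover_modules(workspace: str) -> list[dict]:
    # Walk the workspace and record every class whose bases mention "Module".
    modules = []
    for path in pathlib.Path(workspace).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text())
        except SyntaxError:
            continue  # skip files that don't parse
        for node in ast.walk(tree):
            if isinstance(node, ast.ClassDef):
                bases = [ast.unparse(base) for base in node.bases]
                if any("Module" in base for base in bases):
                    modules.append({"path": str(path), "name": node.name, "type": "module"})
    return modules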
Two-Phase Workflow
Phase 1: Development
Goal: Build working DSPy modules
┌──────────────┐
│ /init │ Initialize project structure
└──────┬───────┘
│
▼
┌──────────────┐
│ Design │ Define signatures and modules
└──────┬───────┘
│
▼
┌──────────────┐
│ Implement │ Write forward() methods
└──────┬───────┘
│
▼
┌──────────────┐
│ /validate │ Check correctness
└──────────────┘
Commands:
- `/init <project_name>` - Create new DSPy project
- `/connect` - Connect to existing workspace
- `/demo <template>` - Generate demo from 12 templates
- `/validate <file>` - Validate module structure and signatures
Development checklist:
- Signatures defined with clear inputs/outputs
- Modules composed from predictors
- Forward methods implemented
- Type hints added where appropriate
- Unit tests written
- Validation passed
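Taken together, the checklist items look like this minimal skeleton (a sketch using a class-based signature, which DSPy 2.5+ supports alongside the string form):
import dspy

class AnswerQuestion(dspy.Signature):
    """Answer the question concisely."""
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

class QA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predict = dspy.Predict(AnswerQuestion)

    def forward(self, question: str) -> dspy.Prediction:
        return self.predict(question=question)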
Phase 2: Optimization
Goal: Optimize modules for production
┌──────────────┐
│ Data │ Prepare training/dev/test sets
└──────┬───────┘
│
▼
┌──────────────┐
│ Metric │ Define evaluation function
└──────┬───────┘
│
▼
┌──────────────┐
│ /optimize │ Compile with optimizer
└──────┬───────┘
│
▼
┌──────────────┐
│ Evaluate │ Test on dev/test sets
└──────┬───────┘
│
▼
┌──────────────┐
│ /export │ Save optimized program
└──────────────┘
Commands:
- `/optimize <module>` - Run full optimization workflow
- `/evaluate <module>` - Evaluate on dev/test sets
- `/export <format>` - Export to Python/JSON/YAML
Optimization checklist:
- Training data prepared (10+ examples)
- Metric function defined
- Optimizer selected based on data size
- Compilation completed successfully
- Dev set evaluation performed
- A/B test against baseline
- Optimized program saved
- Production deployment planned
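The A/B test step from this checklist can reuse DSPy's Evaluate harness (assuming a baseline `module`, a compiled `optimized_module`, and the `devset` and `accuracy` metric from the examples below):
from dspy.evaluate import Evaluate

# Compare baseline vs. optimized on the same dev set and metric.
evaluator = Evaluate(devset=devset, metric=accuracy, num_threads=4)
baseline_score = evaluator(module)
optimized_score = evaluator(optimized_module)
print(f"Baseline: {baseline_score}  Optimized: {optimized_score}")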
Command Reference
/init <project_name>
Initialize new DSPy project with structure:
project_name/
├── modules/ # DSPy modules
├── data/ # Training/dev/test datasets
├── metrics/ # Custom metrics
├── optimized/ # Saved optimized programs
├── tests/ # Unit tests
└── config.py # Configuration
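A hypothetical config.py for this layout, mirroring the LM setup used in Example 1 later in this document:
# config.py -- hypothetical contents; swap in your own model and API settings.
import dspy

def configure():
    lm = dspy.OpenAI(model="gpt-3.5-turbo")
    dspy.settings.configure(lm=lm)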
Options:
- `--template <name>` - Use template (qa, rag, multi-hop, agent)
- `--lm <model>` - Set language model (gpt-3.5-turbo, gpt-4, claude-3, etc.)
- `--retrieval` - Include retrieval setup
/connect
Connect to existing DSPy workspace:
- Indexes codebase for RAG
- Discovers modules, signatures, metrics
- Loads configuration
- Resumes session
/demo <template>
Generate demo from 12 templates:
- `simple-qa` - Basic question answering
- `rag` - Retrieval-augmented generation
- `multi-hop` - Multi-step reasoning
- `typed-output` - Structured data extraction
- `classification` - Multi-class classification
- `agent` - ReAct agent with tools
- `ensemble` - Multiple predictor voting
- `self-refining` - Iterative refinement
- `hinted-qa` - Guided reasoning
- `program-of-thought` - Code generation
- `chatbot` - Multi-turn conversation
- `data-pipeline` - ETL workflow
Options:
- `--with-optimization` - Include optimization example
- `--with-tests` - Include unit tests
- `--output <path>` - Custom output path
/optimize <module>
Run complete optimization workflow:
Steps:
- Load module from file
- Prompt for training data
- Prompt for metric function
- Suggest optimizer based on data size
- Run compilation
- Evaluate on dev set
- Display results and improvements
- Save optimized program
Options:
- `--optimizer <type>` - Force optimizer (bootstrap, mipro, copro, etc.)
- `--budget <N>` - Max optimization budget
- `--metric <name>` - Use specific metric
- `--no-save` - Don't save optimized program
- `--use-gepa` - Enable GEPA optimization
/validate <file>
Validate DSPy code for correctness:
Checks:
- Signature format validity
- Forward method presence
- Type annotations
- Module composition
- Metric function signature
- Dataset format
- Optimizer configuration
Returns:
{
valid: boolean;
errors: string[]; // Critical issues
warnings: string[]; // Non-critical issues
suggestions: string[]; // Improvement suggestions
}
/export <format>
Export module to target format:
Formats:
- `python` - Python code with comments
- `json` - JSON configuration
- `yaml` - YAML configuration
- `markdown` - Documentation
Options:
- `--include-examples` - Include usage examples
- `--include-tests` - Include test code
- `--minify` - Remove comments and whitespace
Version-Aware Guidance
DSPy 2.5+ (Recommended)
Full feature support:
- All 10 predictors
- All 11 optimizers
- All 4 adapters
- GEPA integration
- Typed outputs with Pydantic
- Advanced RAG features
DSPy 2.4
Legacy support:
- Most predictors available
- Optimizer syntax differences
- Limited GEPA support
- Basic RAG features
Migration guide:
# DSPy 2.4 (old)
optimizer = dspy.BootstrapFewShot(
metric=metric,
max_bootstrapped_demos=4
)
# DSPy 2.5+ (new)
optimizer = dspy.BootstrapFewShot(
metric=metric,
max_bootstrapped_demos=4,
max_labeled_demos=8 # New parameter
)
DSPy 2.0-2.3
Basic support:
- Core predictors only
- Limited optimizer options
- No GEPA
- Manual RAG setup
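When in doubt about which feature tier applies, check the installed version at runtime:
import dspy

# dspy exposes __version__; gate 2.5+ features (typed outputs, GEPA) on it.
print(dspy.__version__)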
Usage Examples
Example 1: Build Simple QA Module
User: Create a question-answering module with chain-of-thought reasoning
Assistant (using dspy-code):
import dspy
class QAModule(dspy.Module):
    """Question answering with chain of thought."""

    def __init__(self):
        super().__init__()
        self.qa = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.qa(question=question)
# Usage
lm = dspy.OpenAI(model="gpt-3.5-turbo")
dspy.settings.configure(lm=lm)
qa = QAModule()
result = qa(question="What is DSPy?")
print(result.answer)
Next steps: Use /validate to check, then /optimize with training data.
Example 2: Optimize RAG Pipeline
User: I have a RAG module and 100 training examples. How do I optimize it?
Assistant (using dspy-code):
# 1. Define metric
def accuracy(example, prediction, trace=None):
    return example.answer.lower() in prediction.answer.lower()

# 2. Prepare data (assumes `data` is an iterable of dicts with question/answer keys)
trainset = [
    dspy.Example(question=item["question"], answer=item["answer"]).with_inputs("question")
    for item in data
]
# 3. Choose optimizer (MIPRO for 100+ examples)
optimizer = dspy.MIPRO(
metric=accuracy,
prompt_model=dspy.OpenAI("gpt-4"),
task_model=dspy.OpenAI("gpt-3.5-turbo"),
num_trials=20,
minibatch_size=50
)
# 4. Compile
optimized_rag = optimizer.compile(rag_module, trainset=trainset)
# 5. Evaluate
from dspy.evaluate import Evaluate
evaluator = Evaluate(devset=devset, metric=accuracy)
score = evaluator(optimized_rag)
print(f"Accuracy: {score:.2%}")
# 6. Save
optimized_rag.save('rag_optimized.json')
Expected improvement: +30-50% accuracy with 100 examples and MIPRO.
Example 3: Use GEPA for Prompt Optimization
User: I want to automatically find the best prompt for my task
Assistant (using dspy-code):
from dspy.gepa import GEPA
# 1. Define task
seed_prompt = "question -> answer"
training_examples = trainset[:50]
# 2. Configure GEPA
gepa = GEPA(
metric=accuracy,
population_size=10,
generations=20,
mutation_rate=0.3,
crossover_rate=0.7
)
# 3. Optimize
result = gepa.optimize(
seed_prompt=seed_prompt,
training_examples=training_examples,
budget=100 # Max 100 LLM calls
)
# 4. Use optimized prompt
print(f"Best prompt: {result.best_prompt}")
print(f"Score: {result.best_score:.2%}")
# 5. Create module with optimized prompt
class OptimizedQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.qa = dspy.ChainOfThought(result.best_prompt)

    def forward(self, question):
        return self.qa(question=question)
GEPA benefits: Explores prompt space automatically, no manual engineering needed.
Best Practices
1. Start Simple
Begin with basic signatures and predictors:
# Good: Start simple
self.qa = dspy.ChainOfThought("question -> answer")
# Bad: Overengineering
self.qa = dspy.Ensemble([
dspy.ChainOfThought(...),
dspy.ProgramOfThought(...),
dspy.ReAct(...)
]) # Too complex for iteration
2. Optimize Early
Run optimization on small datasets before scaling:
# Iterate quickly with 10 examples
quick_optimizer = dspy.BootstrapFewShot(metric=accuracy)
quick_test = quick_optimizer.compile(module, trainset=trainset[:10])
# Then scale to full dataset
full_optimizer = dspy.MIPRO(metric=accuracy)
production = full_optimizer.compile(module, trainset=full_trainset)
3. Measure Everything
Track metrics throughout development:
# Log all predictions (log_prediction is a user-supplied helper)
def predict_with_logging(module, **inputs):
    prediction = module(**inputs)
    log_prediction(inputs, prediction, timestamp=datetime.now())
    return prediction
4. Version Control
Save and track optimized programs:
# Save with version
version = "v1.2.3"
optimized.save(f'models/qa_{version}.json')
# Track performance
performance_log = {
'version': version,
'dev_score': dev_score,
'test_score': test_score,
'optimizer': 'MIPRO',
'timestamp': datetime.now().isoformat()
}
save_performance_log(performance_log)
5. Modular Design
Keep modules focused and composable:
# Good: Single responsibility
class Retriever(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=3)  # illustrative retriever choice
    def forward(self, query):
        return self.retrieve(query)

class Generator(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought("context, question -> answer")
    def forward(self, context, question):
        return self.generate(context=context, question=question)

class RAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retriever = Retriever()
        self.generator = Generator()
    def forward(self, question):
        context = self.retriever(query=question)
        return self.generator(context=context, question=question)
6. Test Thoroughly
Unit test modules before optimization:
import unittest

class TestQAModule(unittest.TestCase):
    def setUp(self):
        self.qa = QAModule()

    def test_basic_question(self):
        result = self.qa(question="What is 2+2?")
        self.assertIsNotNone(result.answer)

    def test_complex_question(self):
        result = self.qa(question="Explain quantum computing")
        self.assertTrue(len(result.answer) > 50)
Troubleshooting
Issue: Low improvement after optimization
Solutions:
- Increase training data size (aim for 50-200 examples)
- Try a different optimizer (MIPRO or COPRO for better quality)
- Improve metric function (ensure it captures task requirements)
- Add more demonstrations (`max_bootstrapped_demos`)
- Use stronger teacher model (GPT-4 for optimization)
Issue: Optimization too slow
Solutions:
- Reduce `num_trials` or `budget`
- Use smaller training set for iteration
- Enable parallel evaluation (`num_threads=4`)
- Use faster base model (gpt-3.5-turbo)
- Cache predictions to avoid redundant calls
Issue: Module validation fails
Solutions:
- Check signature format: `"input1, input2 -> output1, output2"`
- Ensure forward() method exists and returns a `Prediction`
- Add type hints: `def forward(self, input: str) -> dspy.Prediction`
- Verify module inheritance: `class MyModule(dspy.Module)`
- Check that all predictors are initialized in `__init__()`
Issue: GEPA not improving prompts
Solutions:
- Increase `population_size` (try 15-20)
- Run more `generations` (try 30-50)
- Adjust mutation rate (0.4-0.5 for exploration)
- Provide more training examples (50+ recommended)
- Ensure metric function is accurate and informative
Integration with Claude Code
This skill provides:
- Contextual assistance - Deep DSPy knowledge available in chat
- Code generation - Generate modules from templates
- Validation - Check DSPy code for correctness
- Optimization guidance - Recommend optimizers and configurations
- Workflow management - Track sessions and progress
- Export utilities - Convert to multiple formats
When to use in conversation:
- "Create a RAG module with MIPRO optimization"
- "Validate my DSPy code"
- "What optimizer should I use for 50 examples?"
- "Generate a demo of multi-hop reasoning"
- "Export this module to JSON"
- "How do I use GEPA for prompt optimization?"
Resources
- DSPy Documentation: https://dspy-docs.vercel.app
- GitHub: https://github.com/stanfordnlp/dspy
- Paper: "DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines"
- Examples: https://github.com/stanfordnlp/dspy/tree/main/examples
- Codebase: /Users/mikhail/Downloads/architect/dspy-code-codebase
Skill Version: 1.0.0 | Last Updated: 2025-12-02 | Compatible with: DSPy 2.4+