dspy-gepa
DSPy GEPA — Generate, Evaluate, Propose, Apply
GEPA is a DSPy-powered tool for evaluating, optimizing, and generating skill scenarios.
Quick Start
Requires Python 3.10+ with dspy (published on PyPI as dspy-ai), pyyaml, and jsonschema:
pip install dspy-ai pyyaml jsonschema
Generate New Scenarios
Point GEPA at an existing skill to generate new test scenarios:
python scripts/gepa.py generate \
--skill-description "Creates FastAPI routers with CRUD endpoints" \
--skill-name fastapi-router-py \
--num-scenarios 5 \
--output tests/scenarios/fastapi-router-py/generated.yaml
Or expand an existing scenario file with more variations:
python scripts/gepa.py generate \
--scenarios tests/scenarios/fastapi-router-py/scenarios.yaml \
--num-scenarios 3 \
--output new-scenarios.yaml
Evaluate Scenarios
Score a DSPy program against scenario patterns:
python scripts/gepa_evaluate.py \
--scenarios tests/scenarios/fastapi-router-py/scenarios.yaml
Full GEPA Loop
Evaluate baseline → optimize → evaluate optimized → save:
python scripts/gepa.py optimize \
--scenarios tests/scenarios/fastapi-router-py/scenarios.yaml \
--output optimized_program.json
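The loop above (evaluate baseline, optimize, evaluate optimized, save) can be sketched in plain Python. This is a minimal illustration, not the code in scripts/gepa.py: the names evaluate and run_loop, the dataset keys "input" and "expected", and the report fields are all hypothetical, and the "optimizer" here is a stand-in that simply picks the best of a few candidate programs, whereas the real tool would delegate that step to a DSPy optimizer.

```python
import json
from typing import Callable

# Hypothetical types: a "program" maps a prompt to generated text,
# and a metric scores one output against one dataset example.
Program = Callable[[str], str]
Metric = Callable[[dict, str], float]


def evaluate(program: Program, dataset: list[dict], metric: Metric) -> float:
    """Average metric score of a program over the dataset."""
    scores = [metric(example, program(example["input"])) for example in dataset]
    return sum(scores) / len(scores) if scores else 0.0


def run_loop(baseline: Program, candidates: dict[str, Program],
             dataset: list[dict], metric: Metric) -> dict:
    """Evaluate baseline, pick the best candidate, and report both scores."""
    baseline_score = evaluate(baseline, dataset, metric)
    selected, best_score = "baseline", baseline_score
    # Stand-in for the optimize step: keep whichever candidate scores highest.
    for name, program in candidates.items():
        score = evaluate(program, dataset, metric)
        if score > best_score:
            selected, best_score = name, score
    return {
        "baseline_score": baseline_score,
        "optimized_score": best_score,
        "selected": selected,
    }


def contains_expected(example: dict, output: str) -> float:
    # Toy metric: 1.0 when the expected substring appears in the output.
    return 1.0 if example["expected"] in output else 0.0


def baseline_program(prompt: str) -> str:
    return f"# TODO: router for {prompt}"


def router_program(prompt: str) -> str:
    return f"router = APIRouter()  # {prompt}"


dataset = [
    {"input": "items", "expected": "APIRouter"},
    {"input": "users", "expected": "APIRouter"},
]
report = run_loop(baseline_program, {"with-router": router_program},
                  dataset, contains_expected)
# The "save" step is reduced to serializing the report; a real run would
# write the optimized program to the path given by --output.
report_json = json.dumps(report, indent=2)
```

Reporting the baseline score next to the optimized score makes it easy to see whether the optimization step actually helped on the given scenarios.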
Convert Scenarios to Dataset
Convert a scenario file into a JSON dataset:
python scripts/scenario_to_dataset.py \
--scenarios tests/scenarios/fastapi-router-py/scenarios.yaml \
--output dataset.json
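As a rough illustration of what such a conversion might involve, the sketch below flattens scenario entries into dataset records. The scenario keys (name, prompt, expected_patterns) and the record layout are assumptions made for this example; the actual schema used by scripts/scenario_to_dataset.py is not shown here and may differ.

```python
import json


def scenarios_to_dataset(scenarios: list[dict]) -> list[dict]:
    """Flatten scenario entries into input/expected records (hypothetical schema)."""
    dataset = []
    for scenario in scenarios:
        dataset.append({
            "id": scenario["name"],
            "input": scenario["prompt"],
            # Scenarios without patterns get an empty list rather than a missing key.
            "expected_patterns": scenario.get("expected_patterns", []),
        })
    return dataset


# In practice these entries would come from yaml.safe_load() on the scenario
# file; they are inlined here so the sketch runs without extra dependencies.
scenarios = [
    {
        "name": "basic-crud",
        "prompt": "Create a router for items",
        "expected_patterns": ["APIRouter"],
    },
    {"name": "auth", "prompt": "Add an authenticated endpoint"},
]
dataset = scenarios_to_dataset(scenarios)
dataset_json = json.dumps(dataset, indent=2)
```

Giving every record the same keys keeps downstream scoring code free of per-record special cases.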
Architecture
See references/gepa-architecture.md for the full GEPA loop design and DSPy mapping.
Metrics
See references/metrics.md for pattern-matching scoring details.
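For orientation, a pattern-matching score could look something like the sketch below: each scenario lists regular expressions that should or should not appear in the generated output, and the score is the fraction of checks that pass. The Scenario fields and the pattern_score function are hypothetical names chosen for this example; references/metrics.md remains the authority on how the real metric works.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Scenario:
    # Hypothetical schema: the real scenario files may use different keys.
    name: str
    expected_patterns: list[str] = field(default_factory=list)
    forbidden_patterns: list[str] = field(default_factory=list)


def pattern_score(scenario: Scenario, output: str) -> float:
    """Fraction of pattern checks that pass, in the range 0.0 to 1.0."""
    # An expected pattern passes when it is found in the output.
    checks = [bool(re.search(p, output)) for p in scenario.expected_patterns]
    # A forbidden pattern passes when it is absent from the output.
    checks += [not re.search(p, output) for p in scenario.forbidden_patterns]
    # With no patterns there is nothing to fail, so the output scores 1.0.
    return sum(checks) / len(checks) if checks else 1.0


scenario = Scenario(
    name="basic-crud-router",
    expected_patterns=[r"APIRouter\(", r"@router\.get\(", r"response_model="],
    forbidden_patterns=[r"print\("],
)
generated = (
    "router = APIRouter()\n"
    "@router.get('/items')\n"
    "def list_items():\n"
    "    return []\n"
)
score = pattern_score(scenario, generated)  # 3 of 4 checks pass
```

In this example the output is missing response_model=, so three of the four checks pass and the score is 0.75.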
Example Output
See examples/sample-run.md for a complete CLI session with output.