Scenario Skill
Author and manage holdout scenarios for behavioral validation. Scenarios
define what the system should do in narrative form, with measurable
acceptance vectors and satisfaction scoring. They live in .agents/holdout/
so implementing agents cannot see them during development.
Quick Start
```
# Initialize holdout directory
/scenario init

# Add a scenario from a description
/scenario add "user can authenticate with valid credentials"

# List all scenarios
/scenario list

# Validate scenarios against the schema
/scenario validate
```
Execution Steps
Step 1: Initialize Holdout Directory
```bash
ao scenario init
```
Creates .agents/holdout/ with a README.md explaining holdout isolation
rules. If the directory already exists, this is a no-op.
The README makes clear:
- Implementing agents MUST NOT read `.agents/holdout/`
- Only evaluator agents and humans should author scenarios
- Hook enforcement prevents implementing agents from accessing holdout files
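A minimal Python sketch of the idempotent init behaviour follows. It models only what is described above: creating the directory and its README, and doing nothing if the directory already exists. The helper name `init_holdout` and the README wording are hypothetical, and hook installation is not modeled.

```python
from pathlib import Path

# Hypothetical README wording; the file written by `ao scenario init` may differ.
README_TEXT = """# Holdout scenarios

- Implementing agents MUST NOT read .agents/holdout/
- Only evaluator agents and humans should author scenarios
- Hook enforcement prevents implementing agents from accessing holdout files
"""


def init_holdout(repo_root: Path) -> bool:
    """Create .agents/holdout/README.md under repo_root.

    Returns True if the directory was created, False if it already existed
    (the no-op case). Hook installation is not modeled here.
    """
    holdout = repo_root / ".agents" / "holdout"
    if holdout.is_dir():
        return False  # already initialized: do nothing
    holdout.mkdir(parents=True)
    (holdout / "README.md").write_text(README_TEXT, encoding="utf-8")
    return True
```

Running it a second time returns `False` and leaves existing files untouched.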
Step 2: Author Scenarios
Provide a narrative description and the skill generates a schema-compliant JSON scenario file.
```bash
ao scenario add "user can authenticate with valid credentials"
```
The skill will:
- Generate an ID (`s-YYYY-MM-DD-NNN`)
- Prompt for or infer the narrative, expected outcome, and acceptance vectors
- Set the default satisfaction threshold (0.8)
- Write to `.agents/holdout/s-YYYY-MM-DD-NNN.json`
You can also author scenarios manually by writing JSON that conforms to
`schemas/scenario.v1.schema.json`. See the Scenario Schema Reference.
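One way to produce the next ID is to scan the IDs already used for the same day. In the sketch below, `next_scenario_id` is a hypothetical helper and not part of `ao`. It assumes `NNN` is a zero-padded counter that restarts at `001` each day. The source only documents the `s-YYYY-MM-DD-NNN` shape.

```python
import re
from datetime import date
from typing import Iterable

# Matches the documented s-YYYY-MM-DD-NNN format.
ID_PATTERN = re.compile(r"^s-(\d{4}-\d{2}-\d{2})-(\d{3})$")


def next_scenario_id(existing_ids: Iterable[str], today: date) -> str:
    """Return the next unused s-YYYY-MM-DD-NNN id for `today`.

    ASSUMPTION: NNN is a zero-padded counter that restarts at 001 each day.
    """
    day = today.isoformat()  # e.g. "2025-01-15"
    used = []
    for existing in existing_ids:
        m = ID_PATTERN.match(existing)
        if m and m.group(1) == day:
            used.append(int(m.group(2)))
    return f"s-{day}-{max(used, default=0) + 1:03d}"
```

Because each file is named after its ID, you could collect existing IDs from file names, for example `[p.stem for p in Path(".agents/holdout").glob("*.json")]`.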
Step 3: Validate Scenarios
```bash
ao scenario validate
```
Validates every .json file in .agents/holdout/ against
schemas/scenario.v1.schema.json. Reports:
- Schema violations (missing fields, wrong types)
- Duplicate IDs
- Stale scenarios (`status` is `"active"` but `date` is more than 90 days old)
- Acceptance vectors with no check command
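Apart from schema validation, the checks listed above depend on dates or span several files, so they can be sketched over scenario dicts that have already been parsed. The field names `id`, `status`, `date` (ISO `YYYY-MM-DD`), `acceptance_vectors` and `check` are hypothetical stand-ins. The real names are defined in `schemas/scenario.v1.schema.json`. The JSON Schema pass itself is omitted.

```python
from collections import Counter
from datetime import date, timedelta
from typing import List

STALE_AFTER = timedelta(days=90)


def lint_scenarios(scenarios: List[dict], today: date) -> List[str]:
    """Report duplicate IDs, stale active scenarios, and vectors without checks.

    Field names are hypothetical; missing fields are left to schema validation.
    """
    problems: List[str] = []
    for sid, count in Counter(s.get("id") for s in scenarios).items():
        if count > 1:
            problems.append(f"duplicate id {sid} ({count} files)")
    for s in scenarios:
        sid = s.get("id", "<no id>")
        if s.get("status") == "active" and "date" in s:
            if today - date.fromisoformat(s["date"]) > STALE_AFTER:
                problems.append(f"{sid}: active but older than 90 days")
        for i, vector in enumerate(s.get("acceptance_vectors", [])):
            if not vector.get("check"):
                problems.append(f"{sid}: acceptance vector {i} has no check command")
    return problems
```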
Step 4: List Scenarios
```bash
ao scenario list
```
Displays all scenarios with:
- ID, goal, status, source, date
- Satisfaction threshold
- Count of acceptance vectors
Filter options:
```bash
ao scenario list --status active
ao scenario list --status draft
ao scenario list --status retired
```
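The listing and `--status` filter can be sketched as below. `list_rows` is a hypothetical helper that builds the displayed columns from parsed scenario dicts. Its field names, including `satisfaction_threshold`, are assumptions rather than confirmed schema names.

```python
from typing import List, Optional, Tuple


def list_rows(scenarios: List[dict], status: Optional[str] = None) -> List[Tuple]:
    """Return one display row per scenario, optionally filtered by status.

    Row layout: (id, goal, status, source, date, threshold, vector_count).
    Field names are hypothetical stand-ins for the real schema fields.
    """
    rows = []
    for s in scenarios:
        if status is not None and s.get("status") != status:
            continue
        rows.append((
            s.get("id"),
            s.get("goal"),
            s.get("status"),
            s.get("source"),
            s.get("date"),
            s.get("satisfaction_threshold"),
            len(s.get("acceptance_vectors", [])),
        ))
    # IDs embed the creation date, so sorting by ID is also chronological.
    return sorted(rows, key=lambda row: row[0] or "")
```

Sorting by ID needs no separate date parsing, because the `s-YYYY-MM-DD-NNN` format sorts lexically in date order.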
Step 5: Integration with Validation
Scenarios are consumed by STEP 1.8 in the /validation skill. During
validation, the evaluator agent:
- Loads all active scenarios from `.agents/holdout/`
- Runs each acceptance vector's check command
- Computes a satisfaction score per scenario (0.0-1.0)
- Aggregates into an overall holdout score
- Fails the validation gate if any scenario falls below its threshold
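A rough sketch of the evaluator loop follows, under several assumptions the source does not confirm:

- Each vector scores 1.0 when its check command exits 0 and 0.0 otherwise. This matches the "Score always 0.0" troubleshooting entry, but the real scoring may be fractional.
- Vectors carry an optional hypothetical `weight` that defaults to 1.0.
- The overall holdout score is the unweighted mean of the scenario scores.

The `runner` parameter lets you swap in a stub instead of a shell.

```python
import subprocess
from typing import Callable, Dict, List, Tuple


def run_check(command: str, timeout: float = 300.0) -> float:
    """Score one acceptance vector by running its check command in a shell.

    ASSUMPTION: exit code 0 scores 1.0 and anything else (including a
    timeout) scores 0.0; the real evaluator may produce fractional scores.
    """
    try:
        result = subprocess.run(command, shell=True, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0


def evaluate_holdout(
    scenarios: List[dict],
    runner: Callable[[str], float] = run_check,
    default_threshold: float = 0.8,
) -> Tuple[float, Dict[str, float], bool]:
    """Return (overall_score, per_scenario_scores, gate_passed).

    Field names ("id", "status", "acceptance_vectors", "check", "weight",
    "satisfaction_threshold") are hypothetical stand-ins for the schema.
    """
    scores: Dict[str, float] = {}
    below_threshold: List[str] = []
    for s in scenarios:
        if s.get("status") != "active":
            continue  # draft, blocked, and retired scenarios are skipped
        vectors = s.get("acceptance_vectors", [])
        total_weight = sum(v.get("weight", 1.0) for v in vectors)
        weighted = sum(v.get("weight", 1.0) * runner(v["check"]) for v in vectors)
        score = weighted / total_weight if total_weight else 0.0
        scores[s["id"]] = score
        if score < s.get("satisfaction_threshold", default_threshold):
            below_threshold.append(s["id"])
    # ASSUMPTION: the overall holdout score is the unweighted mean of scenario scores.
    overall = sum(scores.values()) / len(scores) if scores else 0.0
    return overall, scores, not below_threshold
```

The gate fails as soon as any single active scenario falls below its own threshold, even if the overall mean looks healthy.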
Key Rules
Holdout Isolation
Scenarios are holdout data. The implementing agent must never see them. This prevents the agent from overfitting to specific test cases instead of building correct general behavior.
- Scenarios live in `.agents/holdout/`, which is outside the codebase
- A hook enforces that implementing agents cannot read holdout files
- Only evaluator agents, humans, or the `/validation` skill access scenarios
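This document does not describe how the hook receives requested paths or how it tells the agent runtime to deny access. The sketch below therefore shows only the path check such a hook would need. `is_holdout_path` is a hypothetical helper. It resolves paths first so that `..` tricks cannot get around the check.

```python
from pathlib import Path


def is_holdout_path(requested: str, repo_root: str) -> bool:
    """Return True if `requested` resolves to .agents/holdout/ or anything under it.

    resolve() collapses ".." segments (and follows existing symlinks), so a
    path such as "src/../.agents/holdout/x.json" is still caught.
    """
    holdout = (Path(repo_root) / ".agents" / "holdout").resolve()
    target = (Path(repo_root) / requested).resolve()
    return target == holdout or holdout in target.parents
```

A simple prefix test on the raw string would wrongly match a sibling such as `.agents/holdout-notes.md` and miss `src/../.agents/holdout`. Comparing resolved parents avoids both problems.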
Satisfaction Scoring
Scenarios use continuous satisfaction scoring (0.0-1.0), not boolean pass/fail. This enables:
- Partial credit for incomplete implementations
- Trend tracking across iterations
- Threshold tuning per scenario based on criticality
Each acceptance vector produces a score, and the scenario's overall score is the weighted average across all vectors.
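Written out, let $s_{ij}$ be the score of vector $i$ in scenario $j$, $w_{ij}$ its weight, and $\tau_j$ the scenario's satisfaction threshold. The text says only "weighted average", so where the weights come from is an assumption. With equal weights the formula reduces to a plain mean.

```latex
S_j = \frac{\sum_{i=1}^{n_j} w_{ij}\, s_{ij}}{\sum_{i=1}^{n_j} w_{ij}},
\qquad s_{ij} \in [0, 1],\; w_{ij} > 0,
\qquad \text{scenario } j \text{ passes} \iff S_j \ge \tau_j

% Example: w = (2, 1), s = (1.0, 0.5), \tau = 0.8 (the default threshold)
S = \frac{2 \cdot 1.0 + 1 \cdot 0.5}{2 + 1} = \frac{2.5}{3} \approx 0.833 \ge 0.8
```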
Authorship Rules
- Scenarios should be written by humans or by evaluator agents
- The implementing agent MUST NOT author its own scenarios
- The `source` field tracks provenance: `human`, `agent`, or `prod-telemetry`
- When an evaluator agent writes scenarios, it should operate in a separate session with no access to implementation details
Scenario Lifecycle
| Status | Meaning |
|---|---|
| `active` | Scenario is evaluated during validation |
| `retired` | Scenario passed consistently; kept for reference |
| `blocked` | Scenario cannot be evaluated (missing dependency) |
| `draft` | Scenario is incomplete; not yet evaluated |
Reference Documents
- Scenario Schema Reference -- full field documentation and example JSON for the scenario schema
Troubleshooting
| Problem | Cause | Fix |
|---|---|---|
| `validate` reports missing fields | Schema version mismatch | Check that the `version` field matches the schema expectation |
| Scenario not picked up by validation | Status is not `active` | Set `"status": "active"` in the JSON |
| Implementing agent read holdout files | Hook not installed | Run `ao scenario init` to verify hook setup |
| Duplicate ID error | Two scenarios share an ID | Rename one using s-YYYY-MM-DD-NNN format |
| Stale scenario warning | Active scenario older than 90 days | Review and retire or refresh the scenario |
| Score always 0.0 | Check command returns non-zero | Debug the check command independently |
See Also
- `/validation` -- consumes scenarios at STEP 1.8 for holdout evaluation
- `/council` -- multi-model review can generate scenario suggestions
- `/vibe` -- code quality validation (complementary to behavioral scenarios)