Scenario Skill

Author and manage holdout scenarios for behavioral validation. Scenarios define what the system should do in narrative form, with measurable acceptance vectors and satisfaction scoring. They live in .agents/holdout/ so implementing agents cannot see them during development.

Quick Start

# Initialize holdout directory
/scenario init

# Add a scenario from a description
/scenario add "user can authenticate with valid credentials"

# List all active scenarios
/scenario list

# Validate scenarios against the schema
/scenario validate

Execution Steps

Step 1: Initialize Holdout Directory

ao scenario init

Creates .agents/holdout/ with a README.md explaining holdout isolation rules. If the directory already exists, this is a no-op.

The README makes clear:

  • Implementing agents MUST NOT read .agents/holdout/
  • Only evaluator agents and humans should author scenarios
  • Hook enforcement prevents implementing agents from accessing holdout files

Step 2: Author Scenarios

Provide a narrative description and the skill generates a schema-compliant JSON scenario file.

ao scenario add "user can authenticate with valid credentials"

The skill will:

  1. Generate an ID (s-YYYY-MM-DD-NNN)
  2. Prompt for or infer the narrative, expected outcome, and acceptance vectors
  3. Set default satisfaction threshold (0.8)
  4. Write to .agents/holdout/s-YYYY-MM-DD-NNN.json
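
As a rough illustration of the ID and default-threshold steps above, the sketch below generates the next s-YYYY-MM-DD-NNN ID and assembles a scenario record. The function names and every JSON key are assumptions made for illustration; the authoritative field names are whatever schemas/scenario.v1.schema.json defines. The counter starting at 001 and the initial draft status are also assumptions.

```python
import re
from datetime import date

# Matches IDs of the form s-YYYY-MM-DD-NNN.
ID_PATTERN = re.compile(r"^s-(\d{4}-\d{2}-\d{2})-(\d{3})$")


def next_scenario_id(existing_ids, today):
    """Return the next unused s-YYYY-MM-DD-NNN ID for `today`."""
    day = today.isoformat()
    matches = (ID_PATTERN.match(i) for i in existing_ids)
    used = [int(m.group(2)) for m in matches if m and m.group(1) == day]
    # Assumption: the counter starts at 001 and restarts each day.
    next_number = max(used, default=0) + 1
    return f"s-{day}-{next_number:03d}"


def build_scenario(scenario_id, goal, today):
    """Assemble a scenario record; every key name here is hypothetical."""
    return {
        "id": scenario_id,
        "goal": goal,
        "narrative": "",
        "expected_outcome": "",
        "satisfaction_threshold": 0.8,  # default threshold from the steps above
        "status": "draft",              # assumed starting status
        "source": "human",
        "date": today.isoformat(),
        "acceptance_vectors": [],
    }


scenario = build_scenario(
    next_scenario_id(["s-2025-01-02-001"], date(2025, 1, 2)),
    "user can authenticate with valid credentials",
    date(2025, 1, 2),
)
print(scenario["id"])  # s-2025-01-02-002
```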

You can also author scenarios manually by writing JSON that conforms to schemas/scenario.v1.schema.json. See Scenario Schema Reference.

Step 3: Validate Scenarios

ao scenario validate

Validates every .json file in .agents/holdout/ against schemas/scenario.v1.schema.json. Reports:

  • Schema violations (missing fields, wrong types)
  • Duplicate IDs
  • Stale scenarios (status is "active" but the date is more than 90 days old)
  • Acceptance vectors with no check command
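
The three non-schema checks in this list could be approximated as follows. This is only a sketch of the idea, not the skill's actual validator: the function name lint_scenarios and the keys id, status, date, acceptance_vectors, and check are hypothetical stand-ins for whatever the schema really uses, and dates are assumed to be ISO formatted.

```python
from collections import Counter
from datetime import date, timedelta

STALE_AFTER = timedelta(days=90)


def lint_scenarios(scenarios, today):
    """Return a list of human-readable problems found in `scenarios`."""
    problems = []

    # Duplicate IDs: any ID that appears more than once.
    counts = Counter(s["id"] for s in scenarios)
    for scenario_id, count in sorted(counts.items()):
        if count > 1:
            problems.append(f"duplicate id: {scenario_id}")

    for s in scenarios:
        # Stale: still active but dated more than 90 days ago.
        age = today - date.fromisoformat(s["date"])
        if s["status"] == "active" and age > STALE_AFTER:
            problems.append(f"stale: {s['id']}")

        # Acceptance vectors that have no check command.
        for index, vector in enumerate(s.get("acceptance_vectors", [])):
            if not vector.get("check"):
                problems.append(f"no check command: {s['id']} vector {index}")

    return problems
```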

Step 4: List Scenarios

ao scenario list

Displays all scenarios with:

  • ID, goal, status, source, date
  • Satisfaction threshold
  • Count of acceptance vectors

Filter options:

ao scenario list --status active
ao scenario list --status draft
ao scenario list --status retired

Step 5: Integration with Validation

Scenarios are consumed by STEP 1.8 in the /validation skill. During validation, the evaluator agent:

  1. Loads all active scenarios from .agents/holdout/
  2. Runs each acceptance vector's check command
  3. Computes a satisfaction score per scenario (0.0-1.0)
  4. Aggregates into an overall holdout score
  5. Fails the validation gate if any scenario falls below its threshold
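
A minimal sketch of this evaluation loop is shown below, under several assumptions that the skill does not spell out: a check command scores 1.0 when it exits with status 0 and 0.0 otherwise, a scenario's score is a plain mean of its vector scores (the skill itself describes a weighted average), and the overall holdout score is the mean of the scenario scores. The names run_check and evaluate_holdout and all dictionary keys are hypothetical.

```python
import subprocess


def run_check(command):
    """Run one check command; exit status 0 maps to 1.0, anything else to 0.0."""
    # shell=True is used only because checks are assumed to be shell strings.
    result = subprocess.run(command, shell=True, capture_output=True)
    return 1.0 if result.returncode == 0 else 0.0


def evaluate_holdout(scenarios, score_vector=run_check):
    """Return (overall_score, failed_ids, gate_passed) for active scenarios."""
    scores = {}
    failed = []
    for s in scenarios:
        if s["status"] != "active":
            continue  # draft, blocked, and retired scenarios are skipped
        vector_scores = [score_vector(v["check"]) for v in s["acceptance_vectors"]]
        score = sum(vector_scores) / len(vector_scores) if vector_scores else 0.0
        scores[s["id"]] = score
        if score < s["satisfaction_threshold"]:
            failed.append(s["id"])

    overall = sum(scores.values()) / len(scores) if scores else 0.0
    # The gate fails if any single scenario is below its own threshold.
    return overall, failed, not failed
```

Passing a custom score_vector makes it possible to exercise the gate logic without executing real commands.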

Key Rules

Holdout Isolation

Scenarios are holdout data. The implementing agent must never see them. This prevents the agent from overfitting to specific test cases instead of building correct general behavior.

  • Scenarios live in .agents/holdout/, which is outside the codebase
  • A hook enforces that implementing agents cannot read holdout files
  • Only evaluator agents, humans, or the /validation skill access scenarios
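
The skill does not describe how its hook is implemented. As one possible shape for the core test such a hook would need, the sketch below decides whether a requested path falls inside .agents/holdout/ after resolving relative segments. The function name is_holdout_path and its parameters are hypothetical.

```python
from pathlib import Path

HOLDOUT_DIR = Path(".agents/holdout")


def is_holdout_path(candidate, repo_root="."):
    """Return True if `candidate` is the holdout directory or lies inside it."""
    root = Path(repo_root).resolve()
    holdout = (root / HOLDOUT_DIR).resolve()
    # Resolving first defeats tricks such as "src/../.agents/holdout/x.json".
    target = (root / candidate).resolve()
    return target == holdout or holdout in target.parents


print(is_holdout_path(".agents/holdout/s-2025-01-02-001.json"))  # True
print(is_holdout_path("src/main.py"))                            # False
```

Comparing resolved paths rather than raw strings also avoids false matches on sibling names that merely share the prefix, such as .agents/holdout-notes.txt.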

Satisfaction Scoring

Scenarios use continuous satisfaction scoring (0.0-1.0), not boolean pass/fail. This enables:

  • Partial credit for incomplete implementations
  • Trend tracking across iterations
  • Threshold tuning per scenario based on criticality

Each acceptance vector produces a score, and the scenario's overall score is the weighted average across all vectors.
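
To make the weighted average concrete, here is a small sketch. The keys score and weight are assumptions, as is the default weight of 1.0 when a vector does not specify one.

```python
def scenario_score(vectors):
    """Weighted average of per-vector scores, each expected in [0.0, 1.0]."""
    total_weight = sum(v.get("weight", 1.0) for v in vectors)
    if total_weight <= 0:
        raise ValueError("at least one vector with positive weight is required")
    weighted_sum = sum(v["score"] * v.get("weight", 1.0) for v in vectors)
    return weighted_sum / total_weight


vectors = [
    {"score": 1.0, "weight": 2.0},  # critical path fully satisfied
    {"score": 0.5, "weight": 1.0},  # partial credit
    {"score": 0.0, "weight": 1.0},  # not implemented yet
]
score = scenario_score(vectors)
print(score)         # 0.625
print(score >= 0.8)  # False: below the default threshold
```

Here (1.0 × 2 + 0.5 × 1 + 0.0 × 1) / 4 = 0.625, so the scenario earns partial credit but still falls short of the default 0.8 threshold.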

Authorship Rules

  • Scenarios should be written by humans or by evaluator agents
  • The implementing agent MUST NOT author its own scenarios
  • The source field tracks provenance: human, agent, or prod-telemetry
  • When an evaluator agent writes scenarios, it should operate in a separate session with no access to implementation details

Scenario Lifecycle

| Status | Meaning |
| --- | --- |
| active | Scenario is evaluated during validation |
| retired | Scenario passed consistently; kept for reference |
| blocked | Scenario cannot be evaluated (missing dependency) |
| draft | Scenario is incomplete; not yet evaluated |

Reference Documents

Troubleshooting

| Problem | Cause | Fix |
| --- | --- | --- |
| validate reports missing fields | Schema version mismatch | Check version field matches schema expectation |
| Scenario not picked up by validation | Status is not active | Set "status": "active" in the JSON |
| Implementing agent read holdout | Hook not installed | Run ao scenario init to verify hook setup |
| Duplicate ID error | Two scenarios share an ID | Rename one using s-YYYY-MM-DD-NNN format |
| Stale scenario warning | Active scenario older than 90 days | Review and retire or refresh the scenario |
| Score always 0.0 | Check command returns non-zero | Debug the check command independently |

See Also

  • /validation -- consumes scenarios at STEP 1.8 for holdout evaluation
  • /council -- multi-model review can generate scenario suggestions
  • /vibe -- code quality validation (complementary to behavioral scenarios)

Repository: boshu2/agentops