# Apastra Scaffold
Quickly generate new PromptOps files. All generated files follow the apastra schemas and will pass validation.
## When to Use
Use this skill when you want to:
- Create a new prompt spec for a new use case
- Add test cases for an existing prompt
- Create an evaluator for a new scoring rule
- Set up a new suite tying everything together
## Scaffolding a Prompt Spec
When asked to create a new prompt (e.g., "scaffold a prompt for email classification"):
Create `promptops/prompts/<id>.yaml`:
```yaml
id: <kebab-case-id>
variables:
  <var_name>:
    type: string
template: |
  <The actual prompt text with {{var_name}} placeholders>
output_contract:
  type: object
  properties:
    <output_field>:
      type: string
metadata:
  author: <user or team name>
  intent: <what this prompt does>
  tags:
    - <relevant-tags>
```
### Rules for Prompt Specs
- `id` is required and must be unique across all prompt specs
- `id` should be kebab-case and include a version suffix (e.g., `classify-email-v1`)
- `variables` is required — defines the input schema as a map of variable names to JSON Schema type objects
- `template` is required — the prompt text with `{{variable}}` placeholders
- `output_contract` is optional but recommended — defines the expected output structure
- `metadata` is optional — use it for organization and discovery
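Filling in the template for the email-classification example used throughout this document, a minimal spec might look like the following (the `id`, author, and tag values are illustrative, not prescribed):

```yaml
id: classify-email-v1
variables:
  email:
    type: string
template: |
  Classify the following email into one of these categories: spam, support, sales, personal.
  Respond with JSON: {"category": "<category>", "confidence": <0-1>}
  Email: {{email}}
output_contract:
  type: object
  properties:
    category:
      type: string
    confidence:
      type: number
metadata:
  author: example-team
  intent: Classify incoming emails into routing categories
  tags:
    - classification
```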
## Scaffolding a Dataset
When asked to create test cases (e.g., "create test cases for the email classifier"):
Create `promptops/datasets/<id>.jsonl` — one JSON object per line:
{"case_id": "<unique-case-id>", "inputs": {"<var>": "<value>"}, "expected_outputs": {"<field>": "<expected>"}, "metadata": {"tags": ["<tag>"]}}
### Rules for Datasets
- Use `.jsonl` format (one JSON object per line, NOT a JSON array)
- `case_id` is required and must be unique within the dataset
- `inputs` is required — keys must match the prompt spec's `variables`
- `expected_outputs` is optional — used by evaluators for checking
- `metadata` is optional — useful for tagging difficulty, domain, etc.
- Aim for 5-10 cases in a smoke dataset, 50+ in a regression dataset
- Include edge cases: empty inputs, very long inputs, adversarial inputs
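A hypothetical smoke dataset for the email classifier, following the rules above (case ids, emails, and tags are illustrative; note the empty-input edge case with no `expected_outputs`):

```jsonl
{"case_id": "obvious-spam", "inputs": {"email": "CONGRATULATIONS! You've won $1,000,000! Click here NOW!"}, "expected_outputs": {"category": "spam"}, "metadata": {"tags": ["easy"]}}
{"case_id": "support-request", "inputs": {"email": "Hi, I'm having trouble logging in. My password reset isn't working."}, "expected_outputs": {"category": "support"}, "metadata": {"tags": ["easy"]}}
{"case_id": "empty-input", "inputs": {"email": ""}, "metadata": {"tags": ["edge-case"]}}
```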
## Scaffolding an Evaluator
When asked to create a scoring rule (e.g., "create an evaluator that checks for JSON output"):
Create `promptops/evaluators/<id>.yaml`:
```yaml
id: <evaluator-id>
type: <deterministic | schema | judge>
metrics:
  - <metric-name>
description: <what this evaluator checks>
config:
  <evaluator-specific configuration>
```
### Evaluator Types
`deterministic` — rule-based checks:
```yaml
id: keyword-check
type: deterministic
metrics:
  - keyword_recall
description: Checks if output contains expected keywords.
config:
  match_field: should_contain
  case_sensitive: false
```
`schema` — validates output structure:
```yaml
id: json-output-valid
type: schema
metrics:
  - schema_valid
description: Validates that model output is valid JSON matching the output contract.
config:
  schema:
    type: object
    required: ["category", "confidence"]
    properties:
      category:
        type: string
      confidence:
        type: number
```
`judge` — AI-graded evaluation:
```yaml
id: quality-judge
type: judge
metrics:
  - coherence
  - relevance
description: Uses AI judgment to score output quality.
config:
  rubric: |
    Score the output on two dimensions (0-1 each):
    - coherence: Is the text well-structured and readable?
    - relevance: Does the output address the input query?
  model: default
```
### Rules for Evaluators
- `id` is required and must be unique
- `type` is required — must be one of: `deterministic`, `schema`, `judge`
- `metrics` is required — array of metric names this evaluator produces (minimum 1)
- For `judge` evaluators: always version the rubric — changing it changes what the metric means
## Scaffolding a Suite
When asked to create a test suite (e.g., "create a smoke suite for the email classifier"):
Create `promptops/suites/<id>.yaml`:
```yaml
id: <suite-id>
name: <Human Readable Name>
description: <what this suite tests>
datasets:
  - <dataset-id>
evaluators:
  - <evaluator-id>
model_matrix:
  - default
trials: 1
thresholds:
  <metric>: <minimum-score>
```
### Suite Tiers (Recommended)
| Tier | When to Run | Cases | Trials |
|---|---|---|---|
| Smoke | Every prompt edit | 5-10 | 1 |
| Regression | Before merging | 20-50 | 3 |
| Full | Nightly / on-demand | 50+ | 5 |
| Release | Before shipping | 100+ | 5 |
### Rules for Suites
- `id` is required and must be unique
- `name` is required — human-readable
- `datasets` is required — at least one dataset reference
- `evaluators` is required — at least one evaluator reference
- `model_matrix` is required — at least one model identifier
- Use `"default"` in `model_matrix` to test against the current IDE agent's model
## Full Scaffold Example
When asked something like "scaffold a complete PromptOps setup for sentiment analysis," create all four files:
- `promptops/prompts/sentiment-v1.yaml`
- `promptops/datasets/sentiment-smoke.jsonl`
- `promptops/evaluators/sentiment-accuracy.yaml`
- `promptops/suites/sentiment-smoke.yaml`
## Scaffolding a Quick Eval (Single File)
For rapid iteration, scaffold a single quick eval file instead of four separate files.
When asked to create a quick eval (e.g., "scaffold a quick eval for email classification"):
Create `promptops/evals/<id>.yaml`:
```yaml
id: classify-email-quick
prompt: |
  Classify the following email into one of these categories: spam, support, sales, personal.
  Respond with JSON: {"category": "<category>", "confidence": <0-1>}
  Email: {{email}}
cases:
  - id: obvious-spam
    inputs:
      email: "CONGRATULATIONS! You've won $1,000,000! Click here NOW!"
    assert:
      - type: is-json
      - type: contains
        value: "spam"
  - id: support-request
    inputs:
      email: "Hi, I'm having trouble logging in. My password reset isn't working."
    assert:
      - type: is-json
      - type: contains-any
        value: ["support", "help"]
  - id: personal-email
    inputs:
      email: "Hey! Want to grab lunch on Friday?"
    assert:
      - type: is-json
      - type: contains
        value: "personal"
thresholds:
  pass_rate: 1.0
```
### When to Use Quick Eval vs Full Suite
- Quick eval: 1-5 test cases, simple assertions, rapid iteration
- Full suite: 10+ cases, reusable evaluators, baseline tracking, regression detection
## Scaffolding a Dataset with Inline Assertions
When the user wants inline assertions on their test cases (skipping the evaluator file), include `assert` arrays directly in the JSONL:
{"case_id": "case-1", "inputs": {"text": "Hello"}, "assert": [{"type": "contains", "value": "Bonjour"}, {"type": "not-contains", "value": "error"}]}
{"case_id": "case-2", "inputs": {"text": ""}, "assert": [{"type": "regex", "value": ".*"}]}
Inline assertions work alongside separate evaluator files — they complement each other. Use inline assertions for per-case checks and evaluator files for suite-wide scoring rules.
### Available Assertion Types
**Deterministic:** `equals`, `contains`, `icontains`, `contains-any`, `contains-all`, `regex`, `starts-with`, `is-json`, `contains-json`, `is-valid-json-schema`.

**Model-assisted:** `similar`, `llm-rubric`, `factuality`, `answer-relevance`.

**Performance:** `latency`, `cost`.

Negate any type with the `not-` prefix (e.g., `not-contains`, `not-is-json`).
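As a sketch, a quick-eval case mixing a positive and a negated assertion (the case id and email text are hypothetical; `not-contains` inverts the base `contains` check):

```yaml
- id: not-spam
  inputs:
    email: "Quarterly report attached, let me know if the numbers look right."
  assert:
    - type: is-json
    - type: not-contains
      value: "spam"
```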