Apastra Getting Started

Set up prompt versioning and evaluation in any project. No CI, no cloud, no framework — just files and your IDE agent.

What Is Apastra?

Apastra treats AI prompts as versioned software assets. Prompts, test cases, and scoring rules are files in your repo. Your IDE agent runs evaluations, compares results against baselines, and catches regressions — all locally.

Quick Setup

1. Create the promptops directory

mkdir -p promptops/prompts promptops/datasets promptops/evaluators promptops/suites promptops/schemas promptops/policies derived-index/baselines derived-index/regressions

2. Create your first prompt spec

Create promptops/prompts/summarize.yaml:

id: summarize-v1
variables:
  text:
    type: string
  max_length:
    type: string
template: |
  Summarize the following text in {{max_length}} or fewer words.
  Be concise and capture the key points.

  Text: {{text}}
output_contract:
  type: object
  properties:
    summary:
      type: string
metadata:
  author: your-name
  intent: text-summarization
  tags:
    - summarization
    - core

A prompt spec has:

id: stable identifier (never rename it; create a new version instead)
variables: the inputs the prompt template expects, with JSON Schema types
template: the actual prompt text with {{variable}} placeholders
output_contract (optional): JSON Schema describing expected output structure
metadata (optional): tags, author, intent for organization
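To make the role of the {{variable}} placeholders concrete, here is a minimal Python sketch of how a template could be filled from a set of inputs. This is not Apastra's renderer (the guide does not specify one); the function name render_template and the choice to raise on a missing variable are assumptions made for illustration.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, inputs: dict[str, str]) -> str:
    """Replace each {{name}} placeholder with the matching input value.

    Hypothetical helper: raising on a missing variable is an assumption,
    not documented Apastra behavior.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in inputs:
            raise KeyError(f"missing input for variable: {name}")
        return str(inputs[name])

    return PLACEHOLDER.sub(substitute, template)


template = (
    "Summarize the following text in {{max_length}} or fewer words.\n"
    "\n"
    "Text: {{text}}"
)
rendered = render_template(
    template, {"max_length": "20", "text": "The quick brown fox."}
)
print(rendered)
```

Failing loudly on a missing variable is one reasonable design choice here, since a silently unfilled placeholder would otherwise reach the model as literal text.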

3. Create test cases

Create promptops/datasets/summarize-smoke.jsonl — one JSON object per line:

{"case_id": "short-article", "inputs": {"text": "The quick brown fox jumps over the lazy dog. This sentence contains every letter of the alphabet and is commonly used for typing practice.", "max_length": "20"}, "expected_outputs": {"should_contain": ["fox", "dog"]}}
{"case_id": "technical-paragraph", "inputs": {"text": "Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. It focuses on developing algorithms that can access data and use it to learn for themselves.", "max_length": "30"}, "expected_outputs": {"should_contain": ["machine learning", "algorithms"]}}
{"case_id": "empty-edge-case", "inputs": {"text": "", "max_length": "10"}, "expected_outputs": {"should_contain": []}}
{"case_id": "long-document", "inputs": {"text": "Climate change refers to long-term shifts in temperatures and weather patterns. These shifts may be natural, but since the 1800s, human activities have been the main driver of climate change, primarily due to the burning of fossil fuels like coal, oil, and gas, which produces heat-trapping gases.", "max_length": "25"}, "expected_outputs": {"should_contain": ["climate"]}}
{"case_id": "multi-topic", "inputs": {"text": "Python is a programming language. JavaScript runs in browsers. Rust focuses on memory safety. Go was created at Google for systems programming.", "max_length": "20"}, "expected_outputs": {"should_contain": ["programming"]}}

Each case has:

case_id: stable identifier for tracking results across runs
inputs: values for the prompt template variables
expected_outputs (optional): values for evaluators to check against
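A dataset in this format can be read one line at a time. The sketch below parses JSONL lines and checks the fields listed above. The helper name load_cases is hypothetical, and treating case_id and inputs as required and case_id as unique within a file are assumptions inferred from the field descriptions, not rules stated by Apastra.

```python
import json


def load_cases(lines):
    """Parse JSONL lines into case dicts and check the expected fields.

    Assumptions: case_id and inputs are required, and case_id values
    must be unique within one dataset file.
    """
    cases = []
    seen_ids = set()
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between cases
        case = json.loads(line)
        for field in ("case_id", "inputs"):
            if field not in case:
                raise ValueError(f"line {line_number}: missing field '{field}'")
        if case["case_id"] in seen_ids:
            raise ValueError(f"line {line_number}: duplicate case_id")
        seen_ids.add(case["case_id"])
        cases.append(case)
    return cases


sample = [
    '{"case_id": "short-article", "inputs": {"text": "The quick brown fox.", '
    '"max_length": "20"}, "expected_outputs": {"should_contain": ["fox"]}}',
    '{"case_id": "empty-edge-case", "inputs": {"text": "", "max_length": "10"}}',
]
cases = load_cases(sample)
print([case["case_id"] for case in cases])
```

To check a real file, the same function could be called with an open file handle, for example `load_cases(open("promptops/datasets/summarize-smoke.jsonl"))`.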

4. Create an evaluator

Create promptops/evaluators/contains-keywords.yaml:

id: contains-keywords
type: deterministic
metrics:
  - keyword_recall
description: Checks if the model output contains expected keywords from the test case.
config:
  match_field: should_contain
  case_sensitive: false

Evaluator types:

deterministic: exact match, substring check, regex, keyword recall
schema: validates output against a JSON Schema
judge: uses another AI model to grade output (version the judge prompt!)
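As an illustration of the deterministic type, the keyword_recall metric used by contains-keywords can be read as the fraction of expected keywords that appear in the model output. The sketch below is one plausible interpretation, not Apastra's implementation; in particular, returning 1.0 when no keywords are expected (as in the empty-edge-case test case) is an assumption.

```python
def keyword_recall(
    output: str, keywords: list[str], case_sensitive: bool = False
) -> float:
    """Return the fraction of expected keywords found in the output.

    Assumption: an empty keyword list scores 1.0, because there is
    nothing the output could have missed.
    """
    if not keywords:
        return 1.0
    haystack = output if case_sensitive else output.lower()
    hits = 0
    for keyword in keywords:
        needle = keyword if case_sensitive else keyword.lower()
        if needle in haystack:
            hits += 1
    return hits / len(keywords)


# Both keywords are present once case is ignored.
print(keyword_recall("A fox jumps over a lazy Dog.", ["fox", "dog"]))
# Only one of the two keywords is present.
print(keyword_recall("Machine learning lets systems learn.",
                     ["machine learning", "algorithms"]))
```

The case_sensitive parameter mirrors the case_sensitive: false setting in the evaluator config above.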

5. Create a smoke suite

Create promptops/suites/summarize-smoke.yaml:

id: summarize-smoke
name: Summarize Smoke Suite
description: Quick sanity check for the summarization prompt.
datasets:
  - summarize-smoke
evaluators:
  - contains-keywords
model_matrix:
  - default
trials: 1
thresholds:
  keyword_recall: 0.6

A suite ties everything together:

datasets: which test case files to use
evaluators: which scoring rules to apply
model_matrix: which models to test against (use "default" for your IDE agent's model)
thresholds: minimum scores to pass
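The thresholds field can be thought of as a pass/fail gate over aggregated metric scores. The following sketch assumes that per-case scores are averaged before comparison; the guide does not say how Apastra aggregates, so both the averaging and the name check_thresholds are illustrative assumptions.

```python
from statistics import mean


def check_thresholds(per_case_scores, thresholds):
    """Compare the mean of each metric against its minimum score.

    Assumption: metrics are aggregated by a simple mean across cases.
    """
    results = {}
    for metric, minimum in thresholds.items():
        values = [scores[metric] for scores in per_case_scores if metric in scores]
        average = mean(values) if values else 0.0
        results[metric] = {
            "score": average,
            "minimum": minimum,
            "passed": average >= minimum,
        }
    return results


# Hypothetical per-case scores for a five-case dataset.
scores = [
    {"keyword_recall": 1.0},
    {"keyword_recall": 0.5},
    {"keyword_recall": 1.0},
    {"keyword_recall": 1.0},
    {"keyword_recall": 0.0},
]
report = check_thresholds(scores, {"keyword_recall": 0.6})
print(report)
```

With these made-up scores the mean is 0.7, which clears the 0.6 threshold from the suite even though one case scored zero. A stricter gate would need a per-case minimum as well.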

6. Run your first evaluation

Use the eval skill:

Tell your agent: "Use the eval skill to run the summarize-smoke suite"

If the eval skill is already installed, your agent knows how to run the suite without further instructions.

Alternative: Quick Eval (Single File)

If you want to skip creating four separate files, use a quick eval instead. Create promptops/evals/summarize-quick.yaml (the promptops/evals directory is not created in step 1, so create it first):

id: summarize-quick
prompt: "Summarize in {{max_length}} words: {{text}}"
cases:
  - id: short
    inputs: { text: "The fox jumps over the dog.", max_length: "10" }
    assert:
      - type: icontains
        value: "fox"
thresholds:
  pass_rate: 1.0

Then tell your agent: "Run the summarize-quick eval". This is the fastest way to test a prompt.
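To show how the inline assertions and the pass_rate threshold fit together, here is a hedged Python sketch. It assumes icontains means a case-insensitive substring check and that a case passes only when all of its assertions pass; neither is confirmed by this guide. The outputs dictionary stands in for model responses and is made up for the example.

```python
def icontains(output: str, value: str) -> bool:
    """Case-insensitive substring check (assumed meaning of 'icontains')."""
    return value.lower() in output.lower()


# Hypothetical registry mapping assertion types to checker functions.
ASSERTIONS = {"icontains": icontains}


def run_quick_eval(cases, outputs, thresholds):
    """Score each case and compare the pass rate to the threshold.

    Assumption: a case passes only if every one of its assertions passes.
    """
    passed = []
    for case in cases:
        output = outputs[case["id"]]
        checks = [ASSERTIONS[a["type"]](output, a["value"]) for a in case["assert"]]
        passed.append(all(checks))
    rate = sum(passed) / len(passed) if passed else 0.0
    return {"pass_rate": rate, "passed": rate >= thresholds["pass_rate"]}


cases = [{"id": "short", "assert": [{"type": "icontains", "value": "fox"}]}]
outputs = {"short": "A Fox jumps over a dog."}  # stand-in for a model response
result = run_quick_eval(cases, outputs, {"pass_rate": 1.0})
print(result)
```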

File Structure

After setup, your project should look like:

promptops/
├── prompts/
│   └── summarize.yaml          # Prompt specs (source of truth)
├── datasets/
│   └── summarize-smoke.jsonl   # Test cases (with optional inline assertions)
├── evaluators/
│   └── contains-keywords.yaml  # Scoring rules (optional if using inline assertions)
├── evals/
│   └── summarize-quick.yaml    # Quick eval files (prompt + cases + assertions)
├── suites/
│   └── summarize-smoke.yaml    # Test configurations
├── schemas/                    # JSON schemas (from apastra)
└── policies/                   # Regression policies
derived-index/
├── baselines/                  # Known-good scorecards
└── regressions/                # Regression reports

Next Steps

Install the eval skill to run evaluations
Install the baseline skill to establish your first baseline
Install the scaffold skill to quickly generate new prompt specs
Install the validate skill to check file formatting
Upgrade to CI: Ready to automate pull request gating and releases with GitHub Actions? Install the setup-ci skill (npx skills add BintzGavin/apastra/skills/setup-ci).

Checklist

Created promptops/ directory structure
Created at least one prompt spec in promptops/prompts/
Created at least one dataset in promptops/datasets/
Created at least one evaluator in promptops/evaluators/
Created at least one suite in promptops/suites/
Ran first evaluation using the eval skill
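The file-based items in this checklist can be verified with a short script. The sketch below only checks that each directory holds at least one file with the extension used in this guide; it does not validate file contents, and the helper name checklist_status is hypothetical. The demo runs against a temporary directory so it is safe to try anywhere.

```python
import tempfile
from pathlib import Path

# Directory and file pattern for each checklist item, as used in this guide.
EXPECTED = {
    "prompt spec": ("promptops/prompts", "*.yaml"),
    "dataset": ("promptops/datasets", "*.jsonl"),
    "evaluator": ("promptops/evaluators", "*.yaml"),
    "suite": ("promptops/suites", "*.yaml"),
}


def checklist_status(root: Path) -> dict[str, bool]:
    """Report which checklist items have at least one matching file."""
    status = {}
    for label, (directory, pattern) in EXPECTED.items():
        folder = root / directory
        status[label] = folder.is_dir() and any(folder.glob(pattern))
    return status


# Demo: a temporary project that contains only a prompt spec.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    prompts = project / "promptops" / "prompts"
    prompts.mkdir(parents=True)
    (prompts / "summarize.yaml").write_text("id: summarize-v1\n")
    status = checklist_status(project)

print(status)
```

To check a real project, call `checklist_status(Path("."))` from the repository root.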
