Optimize Runbook

You are analyzing previous Jetty workflow runs to identify patterns and propose targeted improvements to a local runbook. The goal is to produce specific, evidence-backed changes — not generic advice.

Cross-Agent Compatibility

This skill uses AskUserQuestion for interactive choices. If running in an environment where AskUserQuestion is not available, replace each call with a direct question in your text output.


Mode dispatch — READ THIS FIRST

Before Step 1, check whether you are in headless mode:

echo "${JETTY_OPTIMIZE:-0}"
  • If the value is 1, follow ONLY the "Headless Mode" section below. Do not execute Steps 1–7.
  • Otherwise, follow Steps 1–7 as normal (interactive flow with the user).

Headless mode runs inside a Jetty Optimize sandbox where the inputs have been pre-loaded to known file paths and there is no user on the other end to answer AskUserQuestion prompts mid-skill. The user's interaction (if any) comes after you finish the initial pass, in the form of follow-up turns.


Headless Mode (when JETTY_OPTIMIZE=1)

H1. Verify pre-loaded inputs

Confirm the three inputs are on disk. The trajectory metadata is at /app/trajectory.json (Mise also symlinks it under /app/trajectory/trajectory.json and the canonical /app/trajectory/{trajectory_id}.json — any of these works).

ls -la /app/RUNBOOK.md /app/baseline-RUNBOOK.md /app/trajectory.json

If any of those files is missing, emit the H3 JSON payload with patterns:["precheck failed: <which file>"] and proposed_changes:[], then stop. Never exit silently — Spot relies on the JSON to surface the failure to the user.
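A minimal sketch of that precheck (hypothetical helper; the `unknown` trajectory_id is a placeholder, since the id can't be read when trajectory.json itself is missing):

```shell
# Hypothetical helper: verify the pre-loaded inputs exist; on the first
# missing file, emit the H3 failure payload so Spot can surface it,
# then signal the caller to stop. Never exit silently.
precheck_inputs() {
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      printf '{"type":"optimize_analysis","trajectory_id":"unknown","patterns":["precheck failed: %s missing"],"proposed_changes":[]}\n' "$f"
      return 1
    fi
  done
  return 0
}
```

Usage: `precheck_inputs /app/RUNBOOK.md /app/baseline-RUNBOOK.md /app/trajectory.json` — on a non-zero return, stop after the JSON line has been printed.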

Read the runbook and the trajectory:

  • /app/RUNBOOK.md — the runbook to edit (your working copy)
  • /app/baseline-RUNBOOK.md — read-only snapshot for comparison
  • /app/trajectory.json — the one trajectory you are analyzing
  • /app/trajectory/ — step output assets (images, documents, uploaded inputs)

Do NOT call the Jetty trajectory list/fetch API. Do NOT use AskUserQuestion anywhere in headless mode.

H2. Analyze

From trajectory.json, extract and hold in memory:

  • trajectory_id, status, total duration, iteration counts per step
  • Each step's inputs, outputs, errors, duration
  • Any labels already applied
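As a sketch of that extraction with jq (the field names here — `duration_seconds`, per-step `iterations` — are assumptions for illustration; check the real trajectory.json for the actual schema):

```shell
# Illustrative sample only; the real data lives at /app/trajectory.json.
cat > trajectory.sample.json <<'EOF'
{"trajectory_id":"t-123","status":"completed","duration_seconds":842,
 "steps":{"fetch_data":{"status":"completed","iterations":1},
          "render_report":{"status":"completed","iterations":3}}}
EOF
# Top-level summary line
jq -r '"\(.trajectory_id): \(.status), \(.duration_seconds)s"' trajectory.sample.json
# Per-step iteration counts (steps are keyed by name, not indexed)
jq -r '.steps | to_entries[] | "\(.key): \(.value.iterations) iteration(s)"' trajectory.sample.json
```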

Stay-alive cue. Mise's stream-tail heartbeat keys off log writes from your CLI. Long quiet stretches (e.g. >20s of pure thinking after a batch of Reads) used to starve the heartbeat and silently kill the activity mid-turn — your edits would still land on disk, but Spot would lose the trail. The newer Mise build heartbeats from a separate timer, so this is no longer load-bearing, but still print a one-line status before transitioning between phases (e.g. "Reads complete; analyzing patterns now."). It costs nothing, keeps the SSE stream fresh, and gives the user a visible breadcrumb when the inter-tool gap is long.

Special case — degenerate baseline. If the trajectory's top-level status is anything other than completed, OR every step has status of null / failed / no outputs, the baseline never produced real evidence (e.g. the original agent failed at startup with Not logged in, an authentication error, or a quota issue). In that case, skip the six pattern lenses and go straight to H3 with patterns:["baseline trajectory produced no usable evidence: <one-line cause>"] and proposed_changes:[]. Do not fabricate patterns from log noise.

Otherwise, apply the same six pattern lenses as Step 4 of the interactive flow, against this single trajectory:

  1. Consistent failures (evaluation criteria below threshold)
  2. Iteration waste (retry-heavy steps)
  3. Timeout / long-duration bottlenecks
  4. Divergent agent behavior vs. the runbook's stated expectations
  5. Missing guardrails (errors that the runbook's evaluation section doesn't catch)
  6. Score plateaus (rubric-only — iterations that didn't improve)

For each pattern you find, plan one concrete edit to /app/RUNBOOK.md. Edits must be specific (exact before/after text) and cite the trajectory step(s) that motivated them.

H3. Emit the structured analysis payload (REQUIRED — always)

Before editing any file, and before any other terminal action, print a single-line JSON object on its own line with this exact shape (Spot parses this for the diff-gutter citations):

{"type":"optimize_analysis","trajectory_id":"<id>","patterns":["<short description>","..."],"proposed_changes":[{"section":"<runbook section name>","before":"<exact current text>","after":"<exact replacement>","citations":["<step_name>","..."]},"..."]}

Rules:

  • One JSON object, one line, no pretty-printing.
  • citations must reference concrete step names or event ids from the trajectory. An empty citations array is a bug — don't emit one.
  • Always emit this payload, even on the no-evidence path or precheck failure. When there's nothing to propose, emit "proposed_changes": [] and put the reason in "patterns" (one short string, e.g. "baseline trajectory failed at startup: Not logged in"). Stopping without emitting the payload leaves the user staring at a dead UI.
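One way to guarantee the one-line, correctly escaped form is to assemble the payload with `jq -nc` rather than hand-writing the JSON (the section, text, and step names below are hypothetical):

```shell
# Build the payload programmatically; jq handles quoting/escaping, and
# the -c flag guarantees a single compact line.
payload="$(jq -nc \
  --arg id "t-123" \
  --arg pat "render step needed 3 iterations in every run" \
  '{type: "optimize_analysis",
    trajectory_id: $id,
    patterns: [$pat],
    proposed_changes: [{
      section: "Step 3: Render",
      before: "Render the report.",
      after: "Render the report, then verify the output file exists.",
      citations: ["render_report"]
    }]}')"
printf '%s\n' "$payload"
```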

After the JSON line, print a short human-readable summary in prose (Spot will render it in the chat pane alongside the diff). Even when there's nothing to apply, the prose paragraph tells the user what you saw and what to do next (e.g. "Re-run the baseline with credentials configured, then start a new optimize session.").

H4. Apply each change

For every item in proposed_changes, use the Edit tool against /app/RUNBOOK.md to replace the before text with the after text. Then bump the runbook version in the frontmatter (patch increment — e.g. 1.0.0 → 1.0.1). If no version field exists, add version: "1.0.1".
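The patch increment itself is simple string surgery; a sketch, assuming a semver-style X.Y.Z value with numeric fields:

```shell
# Increment the patch component of a semver-style version string.
bump_patch() {
  printf '%s\n' "$1" | awk -F. '{printf "%d.%d.%d\n", $1, $2, $3 + 1}'
}
bump_patch 1.0.0   # prints 1.0.1
```

Apply the new value to the frontmatter via the Edit tool as described above; don't rewrite the file wholesale.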

Do not edit /app/baseline-RUNBOOK.md — it stays pristine for the diff view.

H5. Stop and wait

Emit a one-paragraph summary (counts of patterns and edits applied). Then stop. The user may send follow-up turns asking for refinements (e.g. "also tighten step 3's iteration budget"). Apply those incrementally via Edit and wait again. Do not call the skill's subsequent steps; the interactive flow's Step 6–7 "apply changes" prompts are not applicable in headless mode.

H6. What's different from the interactive flow

| | Interactive | Headless |
|---|---|---|
| Runbook location | Discovered via ls RUNBOOK*.md | Always /app/RUNBOOK.md |
| Trajectory source | Jetty API (list + fetch) | /app/trajectory.json |
| Trajectory count | N (user-chosen) | Always 1 |
| AskUserQuestion | Used | Never used |
| Apply-changes gate | User picks "Apply all" / "Let me choose" / "Save as report" | Auto-apply all |
| Version bump | After apply | After apply |
| Output artifact | Modified RUNBOOK.md + optional ./runbook-optimization-report.md | Modified /app/RUNBOOK.md only |
| Analysis JSON emission | Not emitted | Required (§H3) |

Step 1: Identify the Runbook

  1. Look for runbook files in the current directory:
ls -la RUNBOOK*.md 2>/dev/null
  2. If multiple runbooks are found, use AskUserQuestion:

    • Header: "Runbook"
    • Question: "Multiple runbooks found. Which one do you want to optimize?"
    • Options: list each filename
  3. Read the chosen runbook with the Read tool. Extract from frontmatter:

    • version, evaluation (programmatic or rubric)
    • agent, model, snapshot (if present)
  4. Parse the evaluation section:

    • Programmatic: extract the PASS/PARTIAL/FAIL criteria table
    • Rubric: extract the rubric table (criteria, score descriptions, pass threshold)
  5. Identify the collection and task name. Check the skill argument first, then look in the runbook for Jetty API references. If not found, use AskUserQuestion:

    • Header: "Collection/Task"
    • Question: "Which Jetty collection and task does this runbook run as? (format: collection/task_name)"
    • Options:
      • "I'll type it" / "Let me enter the collection and task name"
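The Read tool is the canonical path for step 3; for a quick shell-side check of a frontmatter field, a sketch that assumes simple `key: value` lines between the opening and closing `---` fences:

```shell
# Pull a single frontmatter field (hypothetical helper; assumes flat
# `key: value` lines, optionally double-quoted, between --- fences).
frontmatter_field() {
  awk -v key="$1" '
    BEGIN { k = key ":" }
    NR==1 && $0=="---" { inF=1; next }
    inF && $0=="---"   { exit }
    inF && $1==k       { sub(/^[^:]*:[[:space:]]*/, ""); gsub(/"/, ""); print }
  ' "$2"
}

cat > sample-RUNBOOK.md <<'EOF'
---
version: "1.0.0"
evaluation: rubric
---
EOF
frontmatter_field version sample-RUNBOOK.md   # prints 1.0.0
```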

Step 2: Fetch Trajectories

Parse the skill argument for trajectory IDs or --last N. If not provided:

Use AskUserQuestion:

  • Header: "Trajectories"
  • Question: "How many recent runs should I analyze?"
  • Options:
    • "Last 5 runs" / "Analyze the 5 most recent trajectories"
    • "Last 10 runs" / "Analyze the 10 most recent trajectories"
    • "Specific IDs" / "I'll paste trajectory IDs"

Fetch the trajectory list:

TOKEN="$(cat ~/.config/jetty/token)"
COLLECTION="the-collection"
TASK="the-task"
LIMIT=5   # or 10, or the count the user chose
curl -s -H "Authorization: Bearer $TOKEN" \
  "https://flows-api.jetty.io/api/v1/db/trajectories/$COLLECTION/$TASK?limit=$LIMIT"

Parse the response — format is {"trajectories": [...], "total": N}.
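A sketch of that parse with jq, against an inlined sample response of the same envelope shape:

```shell
# Sample envelope for illustration; the real bytes come from curl above.
resp='{"trajectories":[{"trajectory_id":"t-1","status":"completed"},{"trajectory_id":"t-2","status":"failed"}],"total":2}'
# One id + status per line
printf '%s' "$resp" | jq -r '.trajectories[] | "\(.trajectory_id)\t\(.status)"'
```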

For each trajectory, fetch full details:

TOKEN="$(cat ~/.config/jetty/token)"
curl -s -H "Authorization: Bearer $TOKEN" \
  "https://flows-api.jetty.io/api/v1/db/trajectory/$COLLECTION/$TASK/$TRAJECTORY_ID"

Extract and record for each:

  • Status: completed / failed / timed_out
  • Duration: total execution time
  • Step outputs: iterate over .steps object keys
  • Errors: any error messages in failed steps
  • Labels: any quality labels applied

Download output files where available (validation_report.json, summary.md):

TOKEN="$(cat ~/.config/jetty/token)"
curl -s -H "Authorization: Bearer $TOKEN" \
  "https://flows-api.jetty.io/api/v1/file/$FILE_PATH"

Step 3: Build Analysis Summary

Create and display a summary table:

| # | Trajectory ID | Status | Duration | Iterations | Score/Result | Key Issue |
|---|---------------|--------|----------|------------|-------------|-----------|

Fill in from trajectory data. Present to the user.


Step 4: Pattern Analysis

Analyze trajectories against the runbook for these patterns:

4a: Consistent Failures

Evaluation criteria scoring below threshold across multiple runs.

  • Rubric: criteria scoring < 4 in more than half of runs
  • Programmatic: stages showing FAIL/PARTIAL across runs

4b: Iteration Waste

Steps that consistently need 2-3 retry rounds. Predictable first-attempt failures that could be prevented with better instructions or templates.
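As a sketch of this lens over one trajectory (the per-step `iterations` field is an assumption about the trajectory schema):

```shell
# Flag steps that needed 2+ attempts; across several trajectories, steps
# that recur here are iteration-waste candidates.
sample='{"steps":{"fetch":{"iterations":1},"validate":{"iterations":2},"render":{"iterations":3}}}'
printf '%s' "$sample" \
  | jq -r '.steps | to_entries[] | select(.value.iterations >= 2) | "\(.key): \(.value.iterations) attempts"'
```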

4c: Timeout Patterns

Runs that timed out or took disproportionately long. Which steps are the bottlenecks?

4d: Divergent Agent Behavior

Cases where the agent interpreted instructions differently across runs. Structurally different outputs suggesting ambiguous instructions.

4e: Missing Guardrails

Errors not caught by evaluation criteria. Environment setup issues (wrong versions, missing tools).

4f: Score Plateaus (rubric only)

Criteria that iterate but don't improve — suggesting the Common Fixes table lacks actionable guidance.

Present each pattern found with supporting evidence (trajectory IDs, scores, error messages).


Step 5: Generate Proposed Changes

For each pattern, propose a specific change to the RUNBOOK.md:

## Proposed Changes

### Change 1: {Brief title} (addresses: {pattern})

**Section:** {Which runbook section}
**Current:**
> {Exact current text}

**Proposed:**
> {Replacement text}

**Evidence:** {Trajectory IDs, scores, errors that support this change}

Guidelines:

  • Changes must be specific — quote exact sections, provide exact replacements
  • Changes must be evidence-backed — cite trajectories, scores, or errors
  • Prefer additive changes (add a Common Fix, add a tip, strengthen descriptions)
  • If frontmatter fields are missing (agent, model, snapshot), propose adding them
  • Don't fabricate evidence — only cite patterns actually observed

Step 6: Apply Changes

Use AskUserQuestion:

  • Header: "Apply Changes"
  • Question: "I found {N} proposed improvements. Which should I apply?"
  • Options:
    • "Apply all" / "Apply all {N} changes to the runbook"
    • "Let me choose" / "I'll approve each change individually"
    • "Save as report" / "Don't modify the runbook — save analysis to a file"

If "Apply all": Apply each change using Edit. Bump the version (patch increment).

If "Let me choose": For each change, ask approve/skip/modify.

If "Save as report": Write to ./runbook-optimization-report.md.


Step 7: Summary

## Optimization Summary

- **Runbook**: {filename}
- **Trajectories analyzed**: {count}
- **Patterns identified**: {count}
- **Changes applied**: {count} / {total}
- **Version**: {old} → {new}

### Recommended next steps
- Run the updated runbook 2-3 times to verify improvements
- Run `/jetty optimize-runbook` again after new runs to measure progress

Important Notes

  • Read the token from file: TOKEN="$(cat ~/.config/jetty/token)" at the start of each bash block.
  • URL: Use flows-api.jetty.io for API calls. Never jetty.io.
  • Trajectories shape: {"trajectories": [...]} — access via .trajectories[].
  • Steps are objects: keyed by name, not indexed.
  • Minimum trajectories: Works with 1+, but 3+ gives better patterns.
  • Don't fabricate: Only report patterns actually observed in the data.