vs-autopilot
Autopilot
Goal in, shippable branch out. No hand-holding in between.
Autopilot takes a plan (or generates one if you don't have one) and runs seven phases autonomously:
- Roast — load and follow the grill-me skill to stress-test the plan
- Fix — apply grill-me findings directly to the plan
- Execute — create branch, add debug instrumentation, implement with TDD + parallel subagents
- Review — run roast-my-code on the diff (code reuse, quality, efficiency, roast)
- QA — browser-based testing if it's a web app (debug logs provide runtime evidence)
- Cleanup — remove debug instrumentation, verify everything still passes
- Handoff — present results, suggest /vs-ship-it
Every intermediate decision is auto-resolved. The user sees the finished result.
Decision Principles
These auto-answer every question that would normally go to the user:
- Completeness — ship the whole thing. Pick the approach that covers more edge cases.
- Pragmatic — if two options fix the same thing, pick the simpler one. 5 seconds choosing, not 5 minutes.
- DRY — duplicates existing functionality? Reject. Reuse what exists.
- Explicit over clever — 10-line obvious fix > 200-line abstraction.
- Bias toward action — move forward. Flag concerns in the decision log but don't block.
- Match the codebase — follow existing patterns, naming, and structure. Don't introduce new conventions.
Circuit Breaker
Autopilot runs fully autonomously, with one exception:
If the roast phase produces a NOT_READY verdict (score below 60) with unresolved high-severity issues, stop and present the findings to the user. The plan needs human judgment before execution — autopilot is not the right tool for a fundamentally broken plan.
For READY or READY_WITH_RISKS verdicts: continue autonomously.
Phase 0: Setup
Step 1: Read context
- Read CLAUDE.md for project-specific commands (test, build, lint).
- Read the plan file the user pointed to (or the most recent plan in conversation).
- Run `git status` and `git diff` to understand the current state.
- Note the project's test command, build command, and lint command from CLAUDE.md. If not found, search for them in package.json, Makefile, or equivalent. If still not found, note "unknown" — do not guess.
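The package.json fallback can be sketched as a plain grep over the scripts block — the file contents here are hypothetical, and a real project might keep its commands in a Makefile instead:

```shell
# Sketch: fall back to package.json when CLAUDE.md lists no commands.
# The scripts below are hypothetical; note "build" is deliberately absent.
TMP=$(mktemp -d)
cat > "$TMP/package.json" <<'EOF'
{ "scripts": { "test": "vitest run", "lint": "eslint ." } }
EOF
for key in test build lint; do
  if grep -q "\"$key\":" "$TMP/package.json" 2>/dev/null; then
    echo "$key: found in package.json"
  else
    echo "$key: unknown"   # do not guess — record it as unknown
  fi
done
```

A missing key is reported as "unknown" rather than invented, matching the rule above.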
Step 1a: Auto-generate plan if missing
If no plan was provided (no plan file, no plan in conversation context, no clear implementation spec — just a feature request, bug description, or vague goal):
- Launch a subagent with `EnterPlanMode` to create the plan autonomously. The subagent receives:
- The user's original request/goal
- CLAUDE.md project context
- Current codebase state (git status, relevant files)
- This directive: "Create a concrete implementation plan. Break it into discrete steps ordered by dependency. Include file paths, function names, and test strategy. Do not ask the user any questions — make every decision yourself using the codebase as evidence. When done, call ExitPlanMode."
- The subagent explores the codebase, designs the approach, and produces a structured plan — all without user interaction.
- Once the subagent returns, use its plan output as the implementation plan for the rest of the pipeline.
- Log this in the decision log: "No plan provided — auto-generated via plan-mode subagent."
This replaces the need for the user to run /vs-brainstorm or write a plan manually.
The roast phase (Phase 1) still stress-tests whatever plan comes out, so a weak
auto-generated plan gets caught by the circuit breaker.
Step 1b: Validate guardrails
Dry-run each guardrail command to confirm it works before starting execution:
# Test each detected command — we only care if it runs without "not found" errors
[test command] 2>&1 | head -5
[typecheck command] --noEmit 2>&1 | head -5
[lint command] 2>&1 | head -5
If any command fails due to missing dependencies: install them now.
Common fixes: bun install, npm install, pip install -r requirements.txt.
Do not proceed to Step 2 until all guardrail commands execute without
"command not found" or "module not found" errors.
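The distinction matters: a failing test suite is still a working guardrail, while a missing binary or module is not. A minimal sketch of that check (the error strings matched are an assumption about common runtimes):

```shell
# Sketch: a guardrail is "broken" only on missing-binary/missing-module errors;
# an ordinary failing suite still counts as a working guardrail.
check_guardrail() {
  out=$("$@" 2>&1 | head -5)
  case "$out" in
    *"command not found"*|*"Cannot find module"*|*"ModuleNotFoundError"*) echo broken ;;
    *) echo ok ;;
  esac
}
check_guardrail echo "2 tests failed"                         # prints "ok" — suite ran
check_guardrail sh -c 'echo "sh: vitest: command not found"'  # prints "broken" — simulated missing binary
```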
Step 2: Create branch
If not already on a feature branch, create one:
# Detect git user prefix from git config or CLAUDE.md conventions
GIT_USER=$(git config user.name 2>/dev/null | tr '[:upper:]' '[:lower:]' | tr ' ' '-')
Derive a branch name from the plan topic: {user-prefix}/{plan-topic}.
Use the branch prefix convention from CLAUDE.md if one exists.
git checkout -b <branch-name>
If already on a feature branch (not main/master/develop), stay on it.
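The derivation above can be sketched end to end — the slugging rules and the topic "Add Rate Limiting" are assumptions for illustration, not a prescribed format:

```shell
# Sketch: build {user-prefix}/{plan-topic}. "dev" is a hypothetical fallback
# prefix for when git config has no user.name.
GIT_USER=$(git config user.name 2>/dev/null | tr '[:upper:]' '[:lower:]' | tr ' ' '-')
TOPIC="Add Rate Limiting"   # hypothetical plan topic
SLUG=$(printf '%s' "$TOPIC" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-')
BRANCH="${GIT_USER:-dev}/${SLUG%-}"
echo "$BRANCH"
# then: git checkout -b "$BRANCH"
```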
Step 3: Extract plan steps
Break the plan into discrete implementation steps. Each step should be:
- A single logical change (one file or a few tightly coupled files)
- Independently committable
- Ordered by dependency (foundations first, features on top)
List the steps and move on. Do not ask for confirmation.
Phase 1: Roast the Plan
Load the grill-me skill from disk:
GRILL_PATH=""
for p in ~/.claude/skills/grill-me/SKILL.md ~/.claude/skills/*/grill-me/SKILL.md; do
[ -f "$p" ] && GRILL_PATH="$p" && break
done
echo "GRILL_PATH=${GRILL_PATH:-not_found}"
If found: read the file and follow its full methodology. If not found: run a lightweight adversarial review yourself (premise challenge, assumptions, feasibility, edge cases — same dimensions, less ceremony).
Auto-decision overrides for grill-me
Grill-me is interactive by design. In autopilot mode, every interactive point is auto-resolved:
- Step 0.5 (Idea Sharpening): Skip — autopilot assumes the plan is already shaped.
- Step 1 (Initial Assessment): Run the assessment. Do not wait for user acknowledgment.
- Step 2 (Premise Challenge): Run it. Auto-decide: accept premises that are supported by evidence in the codebase or plan. Challenge premises that contradict what you found in pre-scan. For each challenged premise, apply decision principle #1 (completeness) — pick the interpretation that covers more ground.
- Step 3 (Dimension Grilling): For each question grill-me would ask the user:
- Apply the 6 decision principles to pick an answer.
- Log the question, your answer, and which principle drove it.
- Do not wait for user input. Do not present options.
- Step 4 (Outside Voice): Skip — the roast findings are sufficient for autopilot. Subagent overhead is not worth it when the fix phase follows immediately.
- Step 5 (Report): Produce the report. Do not persist to disk — the findings feed directly into Phase 2.
Circuit breaker check
After the roast completes, check the verdict:
- NOT_READY (score < 60, unresolved high-severity): Stop. Present findings to the user. Explain what needs human judgment. Do not proceed to Phase 2.
- READY or READY_WITH_RISKS: Continue autonomously.
Phase 2: Fix the Plan
Take every finding from Phase 1 and apply it to the plan:
- High severity — must be addressed. Modify the plan to resolve each one.
- Medium severity — address if the fix is clear and under ~5 lines of plan change. Otherwise note it as a known risk and continue.
- Low severity — note in the decision log, do not modify the plan.
- Unresolved items — apply decision principles to pick a resolution. Log it.
After fixing, re-extract plan steps if the structure changed (new steps, removed steps, reordered dependencies).
Write the updated plan back to the plan file (or note the changes in the decision log if there is no plan file on disk).
Emit a short transition summary:
Phase 1-2 complete. Roast score: [N]/100. Fixed [X] high, [Y] medium issues. [Z] items noted as known risks. Proceeding to execution with [N] steps.
Phase 3: Execute
Implement the fixed plan. Use parallel subagents when possible.
Step 0: Load TDD and Debug skills
TDD_PATH=""
for p in ~/.claude/skills/tdd/SKILL.md ~/.claude/skills/*/tdd/SKILL.md; do
[ -f "$p" ] && TDD_PATH="$p" && break
done
DEBUG_PATH=""
for p in ~/.claude/skills/debug-mode/SKILL.md ~/.claude/skills/*/debug-mode/SKILL.md; do
[ -f "$p" ] && DEBUG_PATH="$p" && break
done
echo "TDD_PATH=${TDD_PATH:-not_found}"
echo "DEBUG_PATH=${DEBUG_PATH:-not_found}"
If TDD skill found: read it. Workers will follow TDD discipline (test first, then implement). If not found: workers write tests after implementation as a fallback.
If debug skill found: read its instrumentation approach for Phase 3 step 0b below.
Step 0b: Add debug instrumentation (when modifying existing code)
Skip this step if the plan only creates new files. Instrumentation is for observing existing behavior during modification — there's nothing to observe in code that doesn't exist yet.
If the plan modifies existing functions or modules, instrument the codebase for observability during execution:
- Identify the modules/files the plan touches.
- Add lightweight logging at key boundaries — function entry/exit for new or modified functions, error paths, and integration points.
- Use the project's existing logging pattern (console.log, logger.debug, logging.debug, etc.).
Search the codebase for the existing pattern before adding logs:
# Find the project's logging convention
grep -r "console\.\|logger\.\|logging\." --include="*.ts" --include="*.js" --include="*.py" -l | head -5
- Wrap all debug instrumentation in a marker comment so it can be found and removed later:
// #region autopilot-debug
console.log('[autopilot] functionName entry', { arg1, arg2 });
// #endregion autopilot-debug
- Commit the instrumentation separately:
chore: add autopilot debug instrumentation
This instrumentation serves two purposes:
- During execution: when guardrails fail, the logs help diagnose why
- During QA: runtime evidence for browser-based testing
The instrumentation is removed in Phase 6 (cleanup) after everything passes.
Step 1: Build dependency graph
Group plan steps into layers based on dependencies:
Layer 0: [steps with no dependencies — can all run in parallel]
Layer 1: [steps that depend on Layer 0]
Layer 2: [steps that depend on Layer 1]
...
If all steps are independent (no shared files, no import dependencies between them), they are all Layer 0 — maximum parallelism.
If the plan is small (3 or fewer steps) or all steps touch the same files: skip parallelism, execute sequentially on the current branch.
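The layering above is a topological sort of the step dependency graph. As a rough sketch, the standard `tsort` utility produces a valid sequential order from "prerequisite dependent" pairs (step names here are hypothetical); grouping into parallel layers then means batching steps whose prerequisites have all completed:

```shell
# Hypothetical dependency pairs: schema precedes api and migrations;
# api precedes ui-form. tsort emits one valid sequential ordering.
tsort <<'EOF'
schema api
schema migrations
api ui-form
EOF
```

In this example `schema` is Layer 0, `api` and `migrations` form Layer 1, and `ui-form` is Layer 2 — `tsort` flattens that into one dependency-respecting sequence, which is exactly the sequential-fallback order.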
Step 2: Execute layers
For each layer, launch subagents in parallel. Each subagent gets:
- The overall plan context (one-liner summary, not the full plan)
- Its specific step(s) to implement
- Codebase conventions from CLAUDE.md
- The guardrail commands detected in Phase 0
- These worker instructions:
Implement the assigned step using TDD:
1. Write a failing test that defines the expected behavior.
Match existing test patterns in the project. Run it — verify it fails.
(If no test infrastructure exists for this area, skip to step 2 and note it.)
2. Write the minimum implementation to make the test pass.
3. Run guardrails: [type check command], [test command], [lint command]
4. If guardrails fail: read the error. Check autopilot-debug logs for context.
Fix and re-run. Max 2 retries. If still failing after retries, use the
debug skill's hypothesis approach: generate 3 hypotheses, investigate each.
5. Commit with a descriptive message (not "autopilot step N").
Stage specific files only — never `git add .` or `git add -A`.
6. Report: list files changed, tests written, guardrail results (pass/fail),
and any issues you could not resolve.
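The staging rule in step 5 can be demonstrated in a throwaway repo — the file paths and commit message are hypothetical:

```shell
# Sketch: stage the step's files explicitly; an unrelated scratch file must
# stay out of the commit (which `git add .` would have swept in).
set -e
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.email "autopilot@example.com"
git config user.name  "autopilot"
mkdir -p src
echo "export {}" > src/rate-limit.ts
echo "scratch"   > notes.txt        # unrelated file
git add src/rate-limit.ts           # specific path, never `git add .`
git commit -qm "feat: add rate limiter skeleton"
git status --porcelain              # prints "?? notes.txt" — still untracked
```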
Sequential fallback: If the host does not support subagents, or if all steps have dependencies, execute sequentially yourself — same guardrail gate after each step.
Layer transitions: Wait for all subagents in a layer to complete before starting the next layer. If a subagent in Layer N fails, assess whether Layer N+1 steps depend on it — if yes, execute those sequentially yourself with the fix. If no, continue the next layer in parallel.
Step 3: Pipeline review while executing
As soon as a layer completes, kick off review on that layer's diff in the background while the next layer executes:
- Launch a background subagent to review the completed layer's changes (code reuse, quality, efficiency — same as Phase 4 roast-my-code logic).
- The review subagent reports findings but does NOT apply fixes yet — fixes happen in Phase 4 after all execution is done, to avoid conflicts.
This means review runs concurrently with execution of later layers. For single-layer plans, review runs after execution (no pipelining benefit).
Step 4: Final validation
After all layers complete, run the full validation suite:
- Type check
- Full test suite
- Build
Fix any integration issues introduced by combining parallel work.
Phase 4: Review
Collect findings from pipelined review subagents (launched during Phase 3). If no pipelined reviews ran (sequential execution or no subagent support), run the full review now.
Load the roast-my-code skill from disk if available:
ROAST_PATH=""
for p in ~/.claude/skills/roast-my-code/SKILL.md ~/.claude/skills/*/roast-my-code/SKILL.md; do
[ -f "$p" ] && ROAST_PATH="$p" && break
done
echo "ROAST_PATH=${ROAST_PATH:-not_found}"
If found: read it and follow its full two-pass methodology:
- Pass 1 (Simplify): 3 agents (reuse, quality, efficiency) — auto-fix. No override needed, this pass is non-interactive by design.
- Pass 2 (Roast + Codex): 2 agents (roast, codex review) — findings only.
If not found: run a lightweight self-review yourself covering the same dimensions on the branch diff.
Auto-decision override for Pass 2
The skill normally waits for the user to pick which sins to fix. In autopilot mode, auto-select option b) Critical + serious and apply immediately. Do not wait.
Applying fixes
Merge findings from all review sources (pipelined + final). Deduplicate. Then for each finding:
- Apply the fix.
- Re-run guardrails (tsc, tests, lint). If the fix breaks something, revert it. Execution code takes priority over review polish.
- Commit review fixes separately:
refactor: [description of cleanup]
Phase 5: QA (conditional — web apps only)
Detect if this is a web app:
# Check for web indicators
HAS_WEB=false
[ -f "next.config.js" ] || [ -f "next.config.ts" ] || [ -f "next.config.mjs" ] && HAS_WEB=true
[ -f "vite.config.ts" ] || [ -f "vite.config.js" ] && HAS_WEB=true
[ -f "angular.json" ] && HAS_WEB=true
grep -q '"start"' package.json 2>/dev/null && grep -qE '"(react|vue|svelte|next|nuxt|angular)"' package.json 2>/dev/null && HAS_WEB=true
echo "HAS_WEB=$HAS_WEB"
If not a web app: skip Phase 5, proceed to handoff.
If web app: load the QA skill from disk if available:
QA_PATH=""
for p in ~/.claude/skills/qa/SKILL.md ~/.claude/skills/*/qa/SKILL.md; do
[ -f "$p" ] && QA_PATH="$p" && break
done
echo "QA_PATH=${QA_PATH:-not_found}"
If found: read it and follow its methodology in diff-aware mode — only test pages affected by the branch diff, not the full app.
If not found: skip QA. Do not attempt browser testing without the QA skill — it
requires agent-browser setup and structured methodology.
Auto-decision overrides for QA
- Tier: Standard (critical + high + medium).
- Clean working tree check: skip — autopilot already committed everything.
- Fix loop: follow it. Auto-decide all triage. Commit each fix atomically. If a fix causes regression, revert and mark as deferred.
- WTF self-regulation: honor it. If WTF > 20%, stop fixing and log remaining issues for the handoff.
- Re-run guardrails after QA fixes (tsc, tests, lint). QA fixes must not break what the execution phase built.
Phase 6: Cleanup
If no debug instrumentation was added in Phase 3 (plan only created new files), skip to Phase 7.
Remove debug instrumentation added in Phase 3 Step 0b.
# Find all autopilot-debug regions
grep -rn "autopilot-debug" --include="*.ts" --include="*.tsx" --include="*.js" --include="*.jsx" --include="*.py" .
Remove every #region autopilot-debug / #endregion autopilot-debug block and the
code between them. Verify no functional code was accidentally wrapped in a debug region.
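Because every region is delimited by the same marker pair, a sed range delete removes them mechanically — a minimal sketch on a hypothetical instrumented file:

```shell
# Sketch: delete every #region/#endregion autopilot-debug block, inclusive.
# The TypeScript contents are a hypothetical example; -i.bak keeps a backup.
F=$(mktemp)
cat > "$F" <<'EOF'
export function add(a: number, b: number) {
  // #region autopilot-debug
  console.log('[autopilot] add entry', { a, b });
  // #endregion autopilot-debug
  return a + b;
}
EOF
sed -i.bak '/#region autopilot-debug/,/#endregion autopilot-debug/d' "$F"
cat "$F"   # the function survives with the debug lines gone
```

After the sweep, the verification step above (re-grep for `autopilot-debug`, then run the full guardrail suite) confirms nothing functional was inside a region.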
After removal:
- Run the full guardrail suite (tsc, tests, lint, build). The code must pass without the debug logs — they were observability only.
- If anything breaks after removing debug logs, something depended on a side effect of the logging (rare but possible). Investigate and fix.
- Commit:
chore: remove autopilot debug instrumentation
Phase 7: Handoff
The handoff summary is the user's only window into what autopilot decided. You MUST include every section below. Do not abbreviate or skip sections.
Present the result to the user:
## Autopilot Complete
### Branch
`{branch-name}` — [N] commits
### Pipeline
| Phase | Result |
|-------|--------|
| Roast | [N]/100, [X] issues fixed |
| Execute | [N] steps, guardrails pass/fail |
| Review | [N] found, [M] fixed |
| QA | skipped / [N]/100 health |
### Decision Log
| # | Phase | Decision | Principle | Rationale |
|---|-------|----------|-----------|-----------|
| 1 | ... | ... | ... | ... |
Every auto-resolved decision MUST appear here. If no decisions were logged
during execution, that is a bug — go back and reconstruct the log from
the work you did.
### Final Guardrails
- Types: pass/fail
- Tests: pass/fail ([N] passed, [M] failed)
- Build: pass/fail
### Flagged for human review
[Anything borderline or debatable — or "None"]
Suggest next step based on results:
- All green → /vs-ship-it
- Guardrail failures → list what's broken, recommend fixing
- QA deferred issues → note them for future work
Important Rules
- Never ask the user anything during Phases 1-6. The only exception is the circuit breaker (NOT_READY verdict in Phase 1).
- Log every decision. No silent auto-decisions. The decision log is how the user audits what happened while they were away.
- TDD by default. Every implementation step writes the failing test first. Skip only if no test infrastructure exists for that area — and note it in the log.
- Debug instrumentation is temporary. When added (Phase 3, existing code only), removed at Phase 6. Never ship debug logs.
- Do not skip guardrails. If a project has no test/lint/build commands, note it and continue — but never skip a guardrail that exists.
- Atomic commits. One commit per logical step. Never one giant commit at the end.
- Match the codebase. Autopilot follows existing patterns, not its own preferences. If the codebase uses callbacks, don't switch to async/await. If it uses classes, don't switch to functions. Read before writing.
- Do not expand scope. Implement exactly what the plan says. If you notice something that should be done but isn't in the plan, log it in the handoff — do not implement it.
Workflow
Prev: /vs-grill-me (stress-tested plan) | /vs-rfc-research (approved RFC) | none (autopilot auto-generates a plan)
Next: /vs-ship-it (create PR) | /vs-roast-my-code (manual review if not satisfied)
More from vltansky/vladstack
vs-fix
Autonomous bug fix pipeline. Investigates root cause, reproduces with a failing test, fixes, verifies, reviews, and hands back a clean branch. Use when the user says 'fix', 'fix this bug', 'fix this', 'something is broken', 'this doesn't work', or describes a bug to fix. Unlike /debug-mode (investigation only), /fix goes end-to-end: find it, prove it, fix it, verify it.
vs-tdd
Test-driven development loop. Write failing test first, then implement to make it pass. Use when the user says 'tdd', 'test first', 'write the test first', 'failing test', 'red green refactor', or for any bug fix where the fix should be proven by a test. Also use when autopilot or other skills need test-first execution.
vs-qa
Systematically QA test a web application and fix bugs found. Runs browser-based testing, iteratively fixes bugs in source code, commits each fix atomically, and re-verifies. Use when asked to 'qa', 'QA', 'test this site', 'find bugs', 'test and fix', or 'fix what's broken'. Three tiers: Quick (critical/high only), Standard (+medium, default), Exhaustive (+cosmetic). Produces before/after health scores, fix evidence, and a ship-readiness summary. For report-only mode, use /qa-only.
vs-debug-mode
Systematic root-cause debugging with optional runtime log server for frontend/UI bugs. Hypothesis-driven investigation. Use when the user says 'debug', 'debug this', 'investigate', 'why is this broken', 'root cause', 'trace this bug', 'figure out why', 'this doesn't work', 'unexpected behavior', 'UI not updating', 'state is wrong', 'value is null/undefined', 'click doesn't work'. Also use instead of adding console.log — this skill collects logs automatically.
vs-rfc-research
Research a technical topic and produce an RFC document backed by real code evidence from GitHub. Use when user says 'write an RFC', 'RFC research', 'create RFC for', 'technical proposal', 'design doc', 'investigate X', 'research X and write a proposal', 'architecture decision record', 'ADR', or needs a structured technical decision document with prior art analysis.
vs-roast-my-code
Code review: auto-fix (reuse, quality, efficiency), then roast + Codex cross-model review. Use when the user says 'review', 'review this', 'code review', 'simplify', 'clean up my code', 'roast my code', 'roast this', 'tear this apart', 'be brutal', 'savage review', 'how bad is this', 'rate my code', or wants post-change quality feedback.