vs-autopilot
Autopilot
Goal in, shippable branch out. No hand-holding in between.
Autopilot takes a plan (or generates one if you don't have one) and runs seven phases autonomously:
- Roast — load and follow the grill-me skill to stress-test the plan
- Fix — apply grill-me findings directly to the plan
- Execute — create branch, add debug instrumentation, implement with TDD + parallel subagents
- Review — run roast-my-code on the diff (code reuse, quality, efficiency, roast)
- QA — browser-based testing if it's a web app (debug logs provide runtime evidence)
- Cleanup — remove debug instrumentation, verify everything still passes
- Handoff — present results, suggest /vs-ship-it
Every intermediate decision is auto-resolved. The user sees the finished result.
Decision Principles
These auto-answer every question that would normally go to the user:
- Completeness — ship the whole thing. Pick the approach that covers more edge cases.
- Pragmatic — if two options fix the same thing, pick the simpler one. 5 seconds choosing, not 5 minutes.
- DRY — duplicates existing functionality? Reject. Reuse what exists.
- Explicit over clever — 10-line obvious fix > 200-line abstraction.
- Bias toward action — move forward. Flag concerns in the decision log but don't block.
- Match the codebase — follow existing patterns, naming, and structure. Don't introduce new conventions.
Circuit Breaker
Autopilot runs fully autonomously, with one exception:
If the roast phase produces a NOT_READY verdict (score below 60) with unresolved high-severity issues, stop and present the findings to the user. The plan needs human judgment before execution — autopilot is not the right tool for a fundamentally broken plan.
For READY or READY_WITH_RISKS verdicts: continue autonomously.
Phase 0: Setup
Step 1: Read context
- Read CLAUDE.md for project-specific commands (test, build, lint).
- Read the plan file the user pointed to (or the most recent plan in conversation).
- Run `git status` and `git diff` to understand the current state.
- Note the project's test command, build command, and lint command from CLAUDE.md. If not found, search for them in package.json, Makefile, or equivalent. If still not found, note "unknown" — do not guess.
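The package.json fallback can be sketched as a plain grep over the scripts block — the file contents here are hypothetical, and a real project might keep its commands in a Makefile instead:

```shell
# Sketch: fall back to package.json when CLAUDE.md lists no commands.
# The scripts below are hypothetical; note "build" is deliberately absent.
TMP=$(mktemp -d)
cat > "$TMP/package.json" <<'EOF'
{ "scripts": { "test": "vitest run", "lint": "eslint ." } }
EOF
for key in test build lint; do
  if grep -q "\"$key\":" "$TMP/package.json" 2>/dev/null; then
    echo "$key: found in package.json"
  else
    echo "$key: unknown"   # do not guess — record it as unknown
  fi
done
```

A missing key is reported as "unknown" rather than invented, matching the rule above.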
Step 1a: Auto-generate plan if missing
If no plan was provided (no plan file, no plan in conversation context, no clear implementation spec — just a feature request, bug description, or vague goal):
- Launch a subagent with `EnterPlanMode` to create the plan autonomously. The subagent receives:
- The user's original request/goal
- CLAUDE.md project context
- Current codebase state (git status, relevant files)
- This directive: "Create a concrete implementation plan. Break it into discrete steps ordered by dependency. Include file paths, function names, and test strategy. Do not ask the user any questions — make every decision yourself using the codebase as evidence. When done, call ExitPlanMode."
- The subagent explores the codebase, designs the approach, and produces a structured plan — all without user interaction.
- Once the subagent returns, use its plan output as the implementation plan for the rest of the pipeline.
- Log this in the decision log: "No plan provided — auto-generated via plan-mode subagent."
This replaces the need for the user to run /vs-brainstorm or write a plan manually.
The roast phase (Phase 1) still stress-tests whatever plan comes out, so a weak
auto-generated plan gets caught by the circuit breaker.
Step 1b: Validate guardrails
Dry-run each guardrail command to confirm it works before starting execution:
# Test each detected command — we only care if it runs without "not found" errors
[test command] 2>&1 | head -5
[typecheck command] --noEmit 2>&1 | head -5
[lint command] 2>&1 | head -5
If any command fails due to missing dependencies: install them now.
Common fixes: bun install, npm install, pip install -r requirements.txt.
Do not proceed to Step 2 until all guardrail commands execute without
"command not found" or "module not found" errors.
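The distinction matters: a failing test suite is still a working guardrail, while a missing binary or module is not. A minimal sketch of that check (the error strings matched are an assumption about common runtimes):

```shell
# Sketch: a guardrail is "broken" only on missing-binary/missing-module errors;
# an ordinary failing suite still counts as a working guardrail.
check_guardrail() {
  out=$("$@" 2>&1 | head -5)
  case "$out" in
    *"command not found"*|*"Cannot find module"*|*"ModuleNotFoundError"*) echo broken ;;
    *) echo ok ;;
  esac
}
check_guardrail echo "2 tests failed"                         # prints "ok" — suite ran
check_guardrail sh -c 'echo "sh: vitest: command not found"'  # prints "broken" — simulated missing binary
```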
Step 2: Create branch
If not already on a feature branch, create one:
# Detect git user prefix from git config or CLAUDE.md conventions
GIT_USER=$(git config user.name 2>/dev/null | tr '[:upper:]' '[:lower:]' | tr ' ' '-')
Derive a branch name from the plan topic: {user-prefix}/{plan-topic}.
Use the branch prefix convention from CLAUDE.md if one exists.
git checkout -b <branch-name>
If already on a feature branch (not main/master/develop), stay on it.
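The derivation above can be sketched end to end — the slugging rules and the topic "Add Rate Limiting" are assumptions for illustration, not a prescribed format:

```shell
# Sketch: build {user-prefix}/{plan-topic}. "dev" is a hypothetical fallback
# prefix for when git config has no user.name.
GIT_USER=$(git config user.name 2>/dev/null | tr '[:upper:]' '[:lower:]' | tr ' ' '-')
TOPIC="Add Rate Limiting"   # hypothetical plan topic
SLUG=$(printf '%s' "$TOPIC" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-')
BRANCH="${GIT_USER:-dev}/${SLUG%-}"
echo "$BRANCH"
# then: git checkout -b "$BRANCH"
```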
Step 3: Extract plan steps
Break the plan into discrete implementation steps. Each step should be:
- A single logical change (one file or a few tightly coupled files)
- Independently committable
- Ordered by dependency (foundations first, features on top)
List the steps and move on. Do not ask for confirmation.
Phase 1: Roast the Plan
Load the grill-me skill from disk:
GRILL_PATH=""
for p in ~/.claude/skills/grill-me/SKILL.md ~/.claude/skills/*/grill-me/SKILL.md; do
[ -f "$p" ] && GRILL_PATH="$p" && break
done
echo "GRILL_PATH=${GRILL_PATH:-not_found}"
If found: read the file and follow its full methodology. If not found: run a lightweight adversarial review yourself (premise challenge, assumptions, feasibility, edge cases — same dimensions, less ceremony).
Auto-decision overrides for grill-me
Grill-me is interactive by design. In autopilot mode, every interactive point is auto-resolved:
- Step 0.5 (Idea Sharpening): Skip — autopilot assumes the plan is already shaped.
- Step 1 (Initial Assessment): Run the assessment. Do not wait for user acknowledgment.
- Step 2 (Premise Challenge): Run it. Auto-decide: accept premises that are supported by evidence in the codebase or plan. Challenge premises that contradict what you found in pre-scan. For each challenged premise, apply decision principle #1 (completeness) — pick the interpretation that covers more ground.
- Step 3 (Dimension Grilling): For each question grill-me would ask the user:
- Apply the 6 decision principles to pick an answer.
- Log the question, your answer, and which principle drove it.
- Do not wait for user input. Do not present options.
- Step 4 (Outside Voice): Skip — the roast findings are sufficient for autopilot. Subagent overhead is not worth it when the fix phase follows immediately.
- Step 5 (Report): Produce the report. Do not persist to disk — the findings feed directly into Phase 2.
Circuit breaker check
After the roast completes, check the verdict:
- NOT_READY (score < 60, unresolved high-severity): Stop. Present findings to the user. Explain what needs human judgment. Do not proceed to Phase 2.
- READY or READY_WITH_RISKS: Continue autonomously.
Phase 2: Fix the Plan
Take every finding from Phase 1 and apply it to the plan:
- High severity — must be addressed. Modify the plan to resolve each one.
- Medium severity — address if the fix is clear and under ~5 lines of plan change. Otherwise note it as a known risk and continue.
- Low severity — note in the decision log, do not modify the plan.
- Unresolved items — apply decision principles to pick a resolution. Log it.
After fixing, re-extract plan steps if the structure changed (new steps, removed steps, reordered dependencies).
Write the updated plan back to the plan file (or note the changes in the decision log if there is no plan file on disk).
Emit a short transition summary:
Phase 1-2 complete. Roast score: [N]/100. Fixed [X] high, [Y] medium issues. [Z] items noted as known risks. Proceeding to execution with [N] steps.
Phase 3: Execute
Implement the fixed plan. Use parallel subagents when possible.
Step 0: Load TDD and Debug skills
TDD_PATH=""
for p in ~/.claude/skills/tdd/SKILL.md ~/.claude/skills/*/tdd/SKILL.md; do
[ -f "$p" ] && TDD_PATH="$p" && break
done
DEBUG_PATH=""
for p in ~/.claude/skills/debug-mode/SKILL.md ~/.claude/skills/*/debug-mode/SKILL.md; do
[ -f "$p" ] && DEBUG_PATH="$p" && break
done
echo "TDD_PATH=${TDD_PATH:-not_found}"
echo "DEBUG_PATH=${DEBUG_PATH:-not_found}"
If TDD skill found: read it. Workers will follow TDD discipline (test first, then implement). If not found: workers write tests after implementation as a fallback.
If debug skill found: read its instrumentation approach for Phase 3 step 0b below.
Step 0b: Add debug instrumentation (when modifying existing code)
Skip this step if the plan only creates new files. Instrumentation is for observing existing behavior during modification — there's nothing to observe in code that doesn't exist yet.
If the plan modifies existing functions or modules, instrument the codebase for observability during execution:
- Identify the modules/files the plan touches.
- Add lightweight logging at key boundaries — function entry/exit for new or modified functions, error paths, and integration points.
- Use the project's existing logging pattern (console.log, logger.debug, logging.debug, etc.).
Search the codebase for the existing pattern before adding logs:
# Find the project's logging convention
grep -r "console\.\|logger\.\|logging\." --include="*.ts" --include="*.js" --include="*.py" -l | head -5
- Wrap all debug instrumentation in a marker comment so it can be found and removed later:
// #region autopilot-debug
console.log('[autopilot] functionName entry', { arg1, arg2 });
// #endregion autopilot-debug
- Commit the instrumentation separately:
chore: add autopilot debug instrumentation
This instrumentation serves two purposes:
- During execution: when guardrails fail, the logs help diagnose why
- During QA: runtime evidence for browser-based testing
The instrumentation is removed in Phase 6 (cleanup) after everything passes.
Step 1: Build dependency graph
Group plan steps into layers based on dependencies:
Layer 0: [steps with no dependencies — can all run in parallel]
Layer 1: [steps that depend on Layer 0]
Layer 2: [steps that depend on Layer 1]
...
If all steps are independent (no shared files, no import dependencies between them), they are all Layer 0 — maximum parallelism.
If the plan is small (3 or fewer steps) or all steps touch the same files: skip parallelism, execute sequentially on the current branch.
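The layering above is a topological sort of the step dependency graph. As a rough sketch, the standard `tsort` utility produces a valid sequential order from "prerequisite dependent" pairs (step names here are hypothetical); grouping into parallel layers then means batching steps whose prerequisites have all completed:

```shell
# Hypothetical dependency pairs: schema precedes api and migrations;
# api precedes ui-form. tsort emits one valid sequential ordering.
tsort <<'EOF'
schema api
schema migrations
api ui-form
EOF
```

In this example `schema` is Layer 0, `api` and `migrations` form Layer 1, and `ui-form` is Layer 2 — `tsort` flattens that into one dependency-respecting sequence, which is exactly the sequential-fallback order.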
Step 2: Execute layers
For each layer, launch subagents in parallel. Each subagent gets:
- The overall plan context (one-liner summary, not the full plan)
- Its specific step(s) to implement
- Codebase conventions from CLAUDE.md
- The guardrail commands detected in Phase 0
- These worker instructions:
Implement the assigned step using TDD:
1. Write a failing test that defines the expected behavior.
Match existing test patterns in the project. Run it — verify it fails.
(If no test infrastructure exists for this area, skip to step 2 and note it.)
2. Write the minimum implementation to make the test pass.
3. Run guardrails: [type check command], [test command], [lint command]
4. If guardrails fail: read the error. Check autopilot-debug logs for context.
Fix and re-run. Max 2 retries. If still failing after retries, use the
debug skill's hypothesis approach: generate 3 hypotheses, investigate each.
5. Commit with a descriptive message (not "autopilot step N").
Stage specific files only — never `git add .` or `git add -A`.
6. Report: list files changed, tests written, guardrail results (pass/fail),
and any issues you could not resolve.
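The staging rule in step 5 can be demonstrated in a throwaway repo — the file paths and commit message are hypothetical:

```shell
# Sketch: stage the step's files explicitly; an unrelated scratch file must
# stay out of the commit (which `git add .` would have swept in).
set -e
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.email "autopilot@example.com"
git config user.name  "autopilot"
mkdir -p src
echo "export {}" > src/rate-limit.ts
echo "scratch"   > notes.txt        # unrelated file
git add src/rate-limit.ts           # specific path, never `git add .`
git commit -qm "feat: add rate limiter skeleton"
git status --porcelain              # prints "?? notes.txt" — still untracked
```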
Sequential fallback: If the host does not support subagents, or if all steps have dependencies, execute sequentially yourself — same guardrail gate after each step.
Layer transitions: Wait for all subagents in a layer to complete before starting the next layer. If a subagent in Layer N fails, assess whether Layer N+1 steps depend on it — if yes, execute those sequentially yourself with the fix. If no, continue the next layer in parallel.
Step 3: Pipeline review while executing
As soon as a layer completes, kick off review on that layer's diff in the background while the next layer executes:
- Launch a background subagent to review the completed layer's changes (code reuse, quality, efficiency — same as Phase 4 roast-my-code logic).
- The review subagent reports findings but does NOT apply fixes yet — fixes happen in Phase 4 after all execution is done, to avoid conflicts.
This means review runs concurrently with execution of later layers. For single-layer plans, review runs after execution (no pipelining benefit).
Step 4: Final validation
After all layers complete, run the full validation suite:
- Type check
- Full test suite
- Build
Fix any integration issues introduced by combining parallel work.
Phase 4: Review
Collect findings from pipelined review subagents (launched during Phase 3). If no pipelined reviews ran (sequential execution or no subagent support), run the full review now.
Load the roast-my-code skill from disk if available:
ROAST_PATH=""
for p in ~/.claude/skills/roast-my-code/SKILL.md ~/.claude/skills/*/roast-my-code/SKILL.md; do
[ -f "$p" ] && ROAST_PATH="$p" && break
done
echo "ROAST_PATH=${ROAST_PATH:-not_found}"
If found: read it and follow its full two-pass methodology:
- Pass 1 (Simplify): 3 agents (reuse, quality, efficiency) — auto-fix. No override needed, this pass is non-interactive by design.
- Pass 2 (Roast + Codex): 2 agents (roast, codex review) — findings only.
If not found: run a lightweight self-review yourself covering the same dimensions on the branch diff.
Auto-decision override for Pass 2
The skill normally waits for the user to pick which sins to fix. In autopilot mode, auto-select option b) Critical + serious and apply immediately. Do not wait.
Applying fixes
Merge findings from all review sources (pipelined + final). Deduplicate. Then for each finding:
- Apply the fix.
- Re-run guardrails (tsc, tests, lint). If the fix breaks something, revert it. Execution code takes priority over review polish.
- Commit review fixes separately:
refactor: [description of cleanup]
Phase 5: QA (conditional — web apps only)
Detect if this is a web app:
# Check for web indicators
HAS_WEB=false
[ -f "next.config.js" ] || [ -f "next.config.ts" ] || [ -f "next.config.mjs" ] && HAS_WEB=true
[ -f "vite.config.ts" ] || [ -f "vite.config.js" ] && HAS_WEB=true
[ -f "angular.json" ] && HAS_WEB=true
grep -q '"start"' package.json 2>/dev/null && grep -qE '"(react|vue|svelte|next|nuxt|angular)"' package.json 2>/dev/null && HAS_WEB=true
echo "HAS_WEB=$HAS_WEB"
If not a web app: skip Phase 5, proceed to handoff.
If web app: load the QA skill from disk if available:
QA_PATH=""
for p in ~/.claude/skills/qa/SKILL.md ~/.claude/skills/*/qa/SKILL.md; do
[ -f "$p" ] && QA_PATH="$p" && break
done
echo "QA_PATH=${QA_PATH:-not_found}"
If found: read it and follow its methodology in diff-aware mode — only test pages affected by the branch diff, not the full app.
If not found: skip QA. Do not attempt browser testing without the QA skill — it
requires agent-browser setup and structured methodology.
Auto-decision overrides for QA
- Tier: Standard (critical + high + medium).
- Clean working tree check: skip — autopilot already committed everything.
- Fix loop: follow it. Auto-decide all triage. Commit each fix atomically. If a fix causes regression, revert and mark as deferred.
- WTF self-regulation: honor it. If WTF > 20%, stop fixing and log remaining issues for the handoff.
- Re-run guardrails after QA fixes (tsc, tests, lint). QA fixes must not break what the execution phase built.
Phase 6: Cleanup
If no debug instrumentation was added in Phase 3 (plan only created new files), skip to Phase 7.
Remove debug instrumentation added in Phase 3 Step 0b.
# Find all autopilot-debug regions
grep -rn "autopilot-debug" --include="*.ts" --include="*.tsx" --include="*.js" --include="*.jsx" --include="*.py" .
Remove every #region autopilot-debug / #endregion autopilot-debug block and the
code between them. Verify no functional code was accidentally wrapped in a debug region.
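Because every region is delimited by the same marker pair, a sed range delete removes them mechanically — a minimal sketch on a hypothetical instrumented file:

```shell
# Sketch: delete every #region/#endregion autopilot-debug block, inclusive.
# The TypeScript contents are a hypothetical example; -i.bak keeps a backup.
F=$(mktemp)
cat > "$F" <<'EOF'
export function add(a: number, b: number) {
  // #region autopilot-debug
  console.log('[autopilot] add entry', { a, b });
  // #endregion autopilot-debug
  return a + b;
}
EOF
sed -i.bak '/#region autopilot-debug/,/#endregion autopilot-debug/d' "$F"
cat "$F"   # the function survives with the debug lines gone
```

After the sweep, the verification step above (re-grep for `autopilot-debug`, then run the full guardrail suite) confirms nothing functional was inside a region.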
After removal:
- Run the full guardrail suite (tsc, tests, lint, build). The code must pass without the debug logs — they were observability only.
- If anything breaks after removing debug logs, something depended on a side effect of the logging (rare but possible). Investigate and fix.
- Commit:
chore: remove autopilot debug instrumentation
Phase 7: Handoff
The handoff summary is the user's only window into what autopilot decided. You MUST include every section below. Do not abbreviate or skip sections.
Present the result to the user:
## Autopilot Complete
### Branch
`{branch-name}` — [N] commits
### Pipeline
| Phase | Result |
|-------|--------|
| Roast | [N]/100, [X] issues fixed |
| Execute | [N] steps, guardrails pass/fail |
| Review | [N] found, [M] fixed |
| QA | skipped / [N]/100 health |
### Decision Log
| # | Phase | Decision | Principle | Rationale |
|---|-------|----------|-----------|-----------|
| 1 | ... | ... | ... | ... |
Every auto-resolved decision MUST appear here. If no decisions were logged
during execution, that is a bug — go back and reconstruct the log from
the work you did.
### Final Guardrails
- Types: pass/fail
- Tests: pass/fail ([N] passed, [M] failed)
- Build: pass/fail
### Flagged for human review
[Anything borderline or debatable — or "None"]
Suggest next step based on results:
- All green → /vs-ship-it
- Guardrail failures → list what's broken, recommend fixing
- QA deferred issues → note them for future work
Important Rules
- Never ask the user anything during Phases 1-6. The only exception is the circuit breaker (NOT_READY verdict in Phase 1).
- Log every decision. No silent auto-decisions. The decision log is how the user audits what happened while they were away.
- TDD by default. Every implementation step writes the failing test first. Skip only if no test infrastructure exists for that area — and note it in the log.
- Debug instrumentation is temporary. When added (Phase 3, existing code only), removed at Phase 6. Never ship debug logs.
- Do not skip guardrails. If a project has no test/lint/build commands, note it and continue — but never skip a guardrail that exists.
- Atomic commits. One commit per logical step. Never one giant commit at the end.
- Match the codebase. Autopilot follows existing patterns, not its own preferences. If the codebase uses callbacks, don't switch to async/await. If it uses classes, don't switch to functions. Read before writing.
- Do not expand scope. Implement exactly what the plan says. If you notice something that should be done but isn't in the plan, log it in the handoff — do not implement it.
Workflow
Prev: /vs-grill-me (stress-tested plan) | /vs-rfc-research (approved RFC) | none (autopilot auto-generates a plan)
Next: /vs-ship-it (create PR) | /vs-roast-my-code (manual review if not satisfied)
More from vltansky/vladstack
vs-fix
Autonomous bug fix pipeline. Investigates root cause, reproduces with a failing test, fixes, verifies, reviews, and hands back a clean branch. Use when the user says 'fix', 'fix this bug', 'fix this', 'something is broken', 'this doesn't work', or describes a bug to fix. Unlike /debug-mode (investigation only), /fix goes end-to-end: find it, prove it, fix it, verify it.
vs-tdd
Test-driven development loop. Write failing test first, then implement to make it pass. Use when the user says 'tdd', 'test first', 'write the test first', 'failing test', 'red green refactor', or for any bug fix where the fix should be proven by a test. Also use when autopilot or other skills need test-first execution.
vs-qa
Systematically QA test a web application and fix bugs found. Runs browser-based testing, iteratively fixes bugs in source code, commits each fix atomically, and re-verifies. Use when asked to 'qa', 'QA', 'test this site', 'find bugs', 'test and fix', or 'fix what's broken'. Three tiers: Quick (critical/high only), Standard (+medium, default), Exhaustive (+cosmetic). Produces before/after health scores, fix evidence, and a ship-readiness summary. For report-only mode, use /qa-only.
vs-debug-mode
Systematic root-cause debugging with optional runtime log server for frontend/UI bugs. Hypothesis-driven investigation. Use when the user says 'debug', 'debug this', 'investigate', 'why is this broken', 'root cause', 'trace this bug', 'figure out why', 'this doesn't work', 'unexpected behavior', 'UI not updating', 'state is wrong', 'value is null/undefined', 'click doesn't work'. Also use instead of adding console.log — this skill collects logs automatically.
vs-rfc-research
Research a technical topic and produce an RFC document backed by real code evidence from GitHub. Use when user says 'write an RFC', 'RFC research', 'create RFC for', 'technical proposal', 'design doc', 'investigate X', 'research X and write a proposal', 'architecture decision record', 'ADR', or needs a structured technical decision document with prior art analysis.
vs-roast-my-code
Code review: auto-fix (reuse, quality, efficiency), then roast + Codex cross-model review. Use when the user says 'review', 'review this', 'code review', 'simplify', 'clean up my code', 'roast my code', 'roast this', 'tear this apart', 'be brutal', 'savage review', 'how bad is this', 'rate my code', or wants post-change quality feedback.