swain-test
swain-test
Automated two-phase test gate. Phase 1 runs deterministic integration tests via swain-test.sh. Phase 2 runs agent-driven smoke tests that verify observable behavior against spec acceptance criteria, changed skills, and project-declared smoke items.
The gate is invoked by swain-sync before committing and by swain-release before cutting a release. Agents may also invoke it directly when they want to verify work before declaring completion.
Inputs
--artifacts SPEC-NNN[,SPEC-NNN...]— optional comma-separated list of artifact IDs the agent considers relevant to the current work. The gate resolves these to file paths and uses them to drive Phase 2 spec-derived verification.
If the agent is unsure which artifacts apply, they should pass the IDs from recent commits, the active focus lane, and any spec tags on tickets claimed during the work.
Phase 1 — Integration tests (script-driven)
- Invoke the integration script:
REPO_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || pwd)" bash "$REPO_ROOT/.agents/bin/swain-test.sh" --artifacts "$ARTIFACT_IDS" - Read the script's exit code:
- exit 1 — integration tests failed. Report the failure output to the operator. Fix the issue, re-stage any edits, and re-invoke the gate from Phase 1. Do not proceed to Phase 2.
- exit 0 — integration tests passed or were skipped. Read stdout and proceed to Phase 2.
- Parse the script's stdout. It emits these sections:
## INTEGRATION— always present. Shows the command, status, duration, and (on failure) output tail.## ARTIFACTS— present when--artifactswas passed and IDs resolved. Lists spec paths.## SKILLS— present when files underskills/,.agents/skills/, or.claude/skills/changed in the branch. Lists changed skill files.## SMOKE— present when.agents/testing.jsondeclares smoke items.## FALLBACK— present only when none of the above context sections appeared.
Phase 2 — Smoke tests (agent-executed)
Execute the sections in the order below. Skip any section that was not emitted.
Step 1 — Spec-derived verification (## ARTIFACTS)
For each spec path listed:
- Read the spec file.
- Locate the Acceptance Criteria section.
- For each acceptance criterion, exercise the relevant component and observe the outcome. Write a short evidence line per criterion: what was done, what was observed, pass or fail.
- If a criterion cannot be exercised (e.g., external dependency unavailable), record that explicitly rather than silently skipping.
Step 2 — Behavioral verification (## SKILLS)
When ## SKILLS shows detected: true:
- For each changed skill file, dispatch a subagent with a representative prompt that should trigger the skill. The haiku model is recommended for cost; the choice is not required.
- Observe whether the skill activates and produces the expected output.
- Record: the prompt used, whether activation was confirmed, a short summary of the output.
The goal is not exhaustive coverage — one representative prompt per changed skill is enough to catch structural regressions (broken frontmatter, missing required sections, wrong file path).
Step 3 — Standing smoke tests (## SMOKE)
Each smoke entry in .agents/testing.json is a declarative instruction. Execute each as an agentic task and record the result.
Step 4 — Fallback (## FALLBACK)
If ## ARTIFACTS, ## SKILLS, and ## SMOKE were all absent, follow the fallback instruction:
- Describe what changed in the working tree.
- Stand up the affected component (run the script, start the server, invoke the function).
- Exercise the happy path.
- Report what was observed.
Failure handling
- Any Phase 2 failure — fix the issue, re-stage, and re-invoke the full gate from Phase 1. Never resume Phase 2 mid-flight; the Phase 1 re-run confirms that fixes did not break integration tests.
- Two failed full-gate attempts — do not attempt a third fix. Escalate to the operator with a structured report: every failure observed, every fix attempted, the observable state after each attempt. The two-strike rule prevents runaway repair loops that consume tokens without converging.
- Operator override — the operator can bypass the gate by stating a reason. Record the override verbatim in the evidence summary as
operator override: <reason>and in the commit message when the work is committed.
Evidence summary
After the gate passes (or is overridden), produce a structured evidence summary for downstream consumers (SPEC-226 evidence recording, retros, commit messages). Include:
- Phase 1 status (PASS / FAIL / SKIP) and command.
- For each Phase 2 step executed: which section drove it, what was done, what was observed, pass or fail.
- Any overrides and their reasons.
Example flow
Agent: invokes swain-test.sh --artifacts SPEC-220
Script: exits 0, emits ## INTEGRATION (PASS), ## ARTIFACTS (SPEC-220 path), ## SKILLS (detected: true)
Agent (Step 1): reads SPEC-220, exercises each AC, records evidence.
Agent (Step 2): dispatches haiku subagent with a prompt that should trigger the changed skill; confirms activation.
Agent (Step 4): no ## FALLBACK section emitted; step skipped.
Agent: writes evidence summary.
More from cristoslc/swain
swain-do
Task tracking and implementation execution for swain projects. Invoke whenever a SPEC needs an implementation plan, the user asks what to work on next, wants to check or update task status, claim or close tasks, manage dependencies, abandon work, bookmark context, or record a decision. Also invoked by swain-design after creating a SPEC that's ready for implementation. Tracks SPECs and SPIKEs — not EPICs, VISIONs, or JOURNEYs directly (those get decomposed into SPECs first). Triggers also on: 'bookmark', 'remember where I am', 'record decision'.
125swain-update
Update swain skills to the latest version. Use when the user says 'update swain', 'upgrade swain', 'pull latest swain', 'reinstall swain', 'refresh skills', or wants to update their swain skills installation. Uses npx to pull the latest swain release from GitHub, with a git-clone fallback, then invokes swain-doctor to reconcile governance and validate project health.
122swain-release
Cut a release — detect versioning context, generate a changelog from conventional commits, bump versions, create a git tag, and optionally squash-merge to a release branch. Use when the user says "release", "cut a release", "tag a release", "bump the version", "create a changelog", "ship a version", "publish", or any variation of shipping/publishing a version. This skill is intentionally generic and works across any repo — it infers context from git history and project structure rather than assuming a specific setup. Supports the trunk+release branch model (ADR-013) when a `release` branch exists.
122swain-design
Create, validate, and transition documentation artifacts (Vision, Initiative, Epic, Spec, Spike, ADR, Persona, Runbook, Design, Journey) through lifecycle phases. Handles spec writing, feature planning, epic creation, initiative creation, ADR drafting, research spikes, persona definition, runbook creation, design capture, architecture docs, phase transitions, implementation planning, cross-reference validation, and audits. Also invoke to update frontmatter fields, re-parent an artifact under a different epic or initiative, or set priority on a Vision or Initiative. Chains into swain-do for implementation tracking on SPEC; decomposes EPIC/VISION/INITIATIVE/JOURNEY into children first.
122swain
Meta-router for swain skills. Invoke when the user explicitly asks swain to do something — not merely when they mention the project by name. Routes to the matching swain-* sub-skill — only load the one that matches. If the user's intent matches multiple rows, pick the most specific match. Sub-skills that are not installed will gracefully no-op.
119swain-search
Trove collection and normalization for swain-design artifacts. Collects sources from the web, local files, and media (video/audio), normalizes them to markdown, and caches them in reusable troves. Use when researching a topic for a spike, ADR, vision, or any artifact that needs structured research. Also use to refresh stale troves or extend existing ones with new sources. Triggers on: 'research X', 'gather sources for', 'compile research on', 'search for sources about', 'refresh the trove', 'find existing research on X', or when swain-design needs research inputs for a spike or ADR.
113