run-automated-tests
Skill: Run Automated Tests
Purpose
Determine how a target repository expects automated tests to be executed (commands, frameworks, prerequisites, and scope), then run the best matching test suite(s) with a safety-first interaction policy.
Core Objective
Primary Goal: Produce test execution results with evidence-based command selection and safety guardrails.
Success Criteria (ALL must be met):
- ✅ Test plan discovered: Evidence sources identified (docs, CI configs, or build manifests)
- ✅ Commands selected: Appropriate test commands chosen based on mode (fast/ci/full) and constraints
- ✅ User confirmation obtained: Approval received before installing dependencies, using network, or starting services
- ✅ Tests executed: Commands run with captured output and exit codes
- ✅ Results summarized: Test Plan Summary produced with evidence, commands, execution status, and failures (if any)
Acceptance Test: Can a developer reproduce the test execution by following the Test Plan Summary without additional context?
Scope Boundaries
This skill handles:
- Discovering test commands from repository evidence (docs, CI, build manifests)
- Selecting appropriate test commands based on mode and constraints
- Executing tests with safety guardrails and user confirmation
- Summarizing test results with evidence and failure diagnostics
This skill does NOT handle:
- Test quality assessment or coverage analysis (use `review-testing`)
- Fixing failing tests or debugging test failures (use `run-repair-loop`)
- Writing new tests or test infrastructure (use development skills)
- Reviewing test code for best practices (use `review-testing`)
Handoff point: When tests complete (pass or fail), hand off to run-repair-loop for fixing failures or review-testing for quality assessment.
Use Cases
- You cloned a repo and want the correct test command without guessing.
- A repo has multiple test layers (unit/integration/e2e) and you need a safe default run plan.
- CI is failing and you want to reproduce locally by running the same commands used in workflows.
Behavior
- Establish scope and constraints (ask if ambiguous)
  - If the user did not specify, default to a fast, local, non-destructive run:
    - Unit tests only, no external services, no Docker, no network-dependent setup.
  - Ask the user to choose a mode if needed:
    - `fast`: unit tests only, minimal setup.
    - `ci`: mirror CI workflow commands as closely as possible.
    - `full`: include integration/e2e tests and service dependencies.
  - Ask whether Docker is allowed, whether network access is allowed, and whether installing dependencies is allowed.
- Discover the test plan (evidence-based)
  - Read these sources in order; stop early if a clear, explicit test command is found:
    - `README.md`, `CONTRIBUTING.md`, `TESTING.md`, `docs/testing*`, `Makefile`
    - CI configs: `.github/workflows/*.yml`, `.gitlab-ci.yml`, `azure-pipelines.yml`, `Jenkinsfile`
    - Build manifests: `package.json`, `pyproject.toml`, `setup.cfg`, `tox.ini`, `go.mod`, `pom.xml`, `build.gradle*`, `*.csproj`, `Cargo.toml`
  - Identify:
    - Primary test entrypoints (`npm test`, `pnpm test`, `yarn test`, `pytest`, `tox`, `go test`, `dotnet test`, `mvn test`, `gradle test`, `cargo test`, etc.)
    - Test layers and markers (unit vs integration vs e2e)
    - Environment prerequisites (DB, Redis, Docker Compose, required env vars, secrets)
    - How CI sets up dependencies (services, caches, artifacts)
  - Prefer explicit instructions found in docs or CI over heuristics.
- Select an execution plan
  - If `ci` mode: derive the run sequence from the repo's CI workflow steps (closest match).
  - If `fast` mode: pick the most direct unit-test command with the fewest prerequisites.
  - If multiple stacks exist (e.g., backend + frontend), propose running each stack separately in a deterministic order.
  - If the plan requires dependency installation or service startup, request confirmation before proceeding.
- Execute with guardrails
  - Always print the exact commands you will run before running them.
  - Use a working directory rooted at the target repo (default `.`).
  - Capture and summarize failures:
    - First failing command and exit code
    - The most relevant error excerpt
    - Next actions (missing toolchain, missing env var, service not running, etc.)
  - Avoid destructive operations:
    - Do not run `rm -rf`, `git clean -fdx`, `docker system prune`, or database drop/migrate commands without explicit user approval.
  - If the repo requires secrets, do not ask the user to paste secrets into chat. Prefer `.env` files, secret managers, or documented local dev flows.
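The source order under "Discover the test plan" can be sketched as a small lookup. This is a minimal sketch rather than part of the skill itself: `EVIDENCE_SOURCES` and `find_evidence` are hypothetical names, and the function only locates candidate files in priority order. Reading them for an explicit test command, and stopping early once one is found, is still left to the agent.

```python
from pathlib import Path

# Source groups in the priority order listed above.
# Patterns are globs relative to the repository root.
EVIDENCE_SOURCES = [
    ("docs", ["README.md", "CONTRIBUTING.md", "TESTING.md", "docs/testing*", "Makefile"]),
    ("ci", [".github/workflows/*.yml", ".gitlab-ci.yml", "azure-pipelines.yml", "Jenkinsfile"]),
    ("manifest", ["package.json", "pyproject.toml", "setup.cfg", "tox.ini", "go.mod",
                  "pom.xml", "build.gradle*", "*.csproj", "Cargo.toml"]),
]


def find_evidence(repo_root="."):
    """Return (group, relative_path) pairs for evidence files that exist, in priority order."""
    root = Path(repo_root)
    found = []
    for group, patterns in EVIDENCE_SOURCES:
        for pattern in patterns:
            # sorted() keeps the result deterministic when a glob matches several files.
            for path in sorted(root.glob(pattern)):
                if path.is_file():
                    found.append((group, path.relative_to(root).as_posix()))
    return found
```

The returned paths can be copied straight into the "Evidence" part of the Test Plan Summary.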
Input & Output
Input
- Target repository path (default `.`).
- Mode: `fast` (default), `ci`, or `full`.
- Constraints: allow dependency install (yes/no), allow network (yes/no), allow Docker (yes/no).
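As an illustration, the inputs above could be carried in a small record. `RunRequest` and its field names are hypothetical, and defaulting every constraint to "no" is an assumption chosen to match the rule that nothing is installed or started without confirmation.

```python
from dataclasses import dataclass

VALID_MODES = ("fast", "ci", "full")


@dataclass(frozen=True)
class RunRequest:
    """Hypothetical container for the skill's inputs; names are illustrative only."""
    repo_path: str = "."          # target repository path
    mode: str = "fast"            # fast (default), ci, or full
    allow_install: bool = False   # assumed default: ask before installing dependencies
    allow_network: bool = False   # assumed default: ask before using the network
    allow_docker: bool = False    # assumed default: ask before starting Docker

    def __post_init__(self):
        # Reject anything other than the three documented modes.
        if self.mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {VALID_MODES}, got {self.mode!r}")
```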
Output
- A short "Test Plan Summary" containing:
  - Evidence: which files/paths informed the plan
  - Chosen commands (in order)
  - Assumptions and prerequisites
  - What was executed and what was skipped (and why)
- Command transcript snippets sufficient to debug failures (do not dump extremely long logs unless asked).
Restrictions
Hard Boundaries
- Do not invent test commands when evidence exists (prefer docs/CI).
- Do not install dependencies, run Docker, or start external services without confirmation.
- Do not modify repository files unless the user explicitly requests it (exception: generating a report file if the user asked for artifacts).
- Do not exfiltrate secrets; do not request sensitive credentials in chat.
Skill Boundaries (Avoid Overlap)
Do NOT do these (other skills handle them):
- Test quality assessment: Evaluating test coverage, test design, or testing best practices → Use `review-testing`
- Fixing test failures: Debugging failing tests, repairing broken test code, or investigating root causes → Use `run-repair-loop`
- Writing tests: Creating new test cases, test infrastructure, or test frameworks → Use development/implementation skills
- Code review: Reviewing test code for quality, maintainability, or best practices → Use `review-testing`
- Repository analysis: Comprehensive codebase structure analysis or architecture review → Use `review-codebase`
When to stop and hand off:
- Tests fail and user asks "why?" or "how to fix?" → Hand off to `run-repair-loop` for debugging and repair
- User asks "are these tests good?" or "what's our coverage?" → Hand off to `review-testing` for quality assessment
- User asks "can you write tests for X?" → Hand off to development workflow for test implementation
- Tests pass and user asks "what should we test next?" → Hand off to `review-testing` for test strategy recommendations
Self-Check
Core Success Criteria (ALL must be met)
- Test plan discovered: Evidence sources identified (docs, CI configs, or build manifests)
- Commands selected: Appropriate test commands chosen based on mode (fast/ci/full) and constraints
- User confirmation obtained: Approval received before installing dependencies, using network, or starting services
- Tests executed: Commands run with captured output and exit codes
- Results summarized: Test Plan Summary produced with evidence, commands, execution status, and failures (if any)
Process Quality Checks
- Evidence-based selection: Did I identify at least one authoritative test instruction source (doc file, CI workflow, or build manifest)?
- Safety guardrails applied: Did I ask for confirmation before any action that installs dependencies, uses network, starts Docker/services, or changes state?
- Commands printed: Did I print the exact commands before running them?
- Failures diagnosed: If tests failed, did I provide the first failing command, exit code, and likely root cause category?
- No destructive operations: Did I avoid running destructive commands (`rm -rf`, `git clean`, `docker system prune`, database drops) without explicit approval?
- No secret exfiltration: Did I avoid requesting sensitive credentials in chat and prefer `.env` files or documented local dev flows?
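The "no destructive operations" check can be approximated with a pattern screen run before each command. The sketch below is deliberately loose and will miss variants (for example `rm -r -f` with separate flags); `DESTRUCTIVE_PATTERNS` and `needs_explicit_approval` are hypothetical names, and the database pattern in particular is an assumed heuristic, since drop and migrate commands differ by toolchain.

```python
import re

# Patterns for the destructive commands named in this document.
DESTRUCTIVE_PATTERNS = [
    # rm with both -r and -f in one flag group, in either order (e.g. -rf, -fr)
    re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+"),
    re.compile(r"\bgit\s+clean\b"),
    re.compile(r"\bdocker\s+system\s+prune\b"),
    # Assumed heuristic for database drop/migrate commands
    re.compile(r"\b(dropdb|migrate|drop\s+(database|table))\b", re.IGNORECASE),
]


def needs_explicit_approval(command: str) -> bool:
    """Return True if the command matches any destructive pattern."""
    return any(pattern.search(command) for pattern in DESTRUCTIVE_PATTERNS)
```

A match should pause the run and ask the user; it is not proof that the command is unsafe, and a non-match is not proof that it is safe.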
Acceptance Test
Can a developer reproduce the test execution by following the Test Plan Summary without additional context?
If NO: Test Plan Summary is incomplete. Add missing evidence, commands, or prerequisites.
If YES: Skill execution is complete. Proceed to handoff if needed.
Examples
Example 1: JavaScript repo with package.json
User: "Run tests for this repo."
Agent:
- Inspect `package.json` scripts and `.github/workflows/*`.
- Decide mode `fast` and propose:
  - `npm test` (or `pnpm test` / `yarn test` if the repo standardizes on it)
- Ask: "Install dependencies (`npm ci`) and allow network?"
- Run:
  - `npm ci`
  - `npm test`
- Summarize results and point to failing test output if any.
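The example does not say how to tell which package manager a repo standardizes on. One common signal, assumed here, is the lockfile committed at the repo root. `LOCKFILES` and `pick_test_command` are hypothetical names, and explicit instructions in docs or CI should still win over this heuristic.

```python
from pathlib import Path

# Default lockfile names written by each package manager, checked in this order.
LOCKFILES = [
    ("pnpm-lock.yaml", "pnpm"),
    ("yarn.lock", "yarn"),
    ("package-lock.json", "npm"),
]


def pick_test_command(repo_root="."):
    """Guess the test command from the lockfile present; fall back to npm."""
    root = Path(repo_root)
    for lockfile, manager in LOCKFILES:
        if (root / lockfile).is_file():
            return f"{manager} test"
    return "npm test"
```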
Example 2 (Edge case): Monorepo with integration tests requiring Docker
User: "Mirror CI locally."
Agent:
- Parse `.github/workflows/ci.yml` and identify separate jobs:
  - backend unit tests
  - frontend tests
  - integration tests with `docker compose`
- Ask for confirmation:
  - allow Docker
  - allow network
  - which jobs to run (all vs only failing job)
- Execute in a controlled order:
  - install deps per job
  - run unit tests first
  - bring up services for integration tests
- If integration tests fail, summarize:
  - service health / port conflicts
  - missing env vars
  - how CI config differs from local
Appendix: Output contract
Each skill execution MUST produce a Test Plan Summary in this exact JSON format:
```json
{
  "test_plan_summary": {
    "mode": "fast | ci | full",
    "evidence": ["path/to/source1", "path/to/source2"],
    "commands": [
      {"command": "npm test", "purpose": "run unit tests", "order": 1}
    ],
    "prerequisites": ["npm ci", "Docker running"],
    "executed": ["npm ci", "npm test"],
    "skipped": ["integration tests - require Docker"],
    "result": {
      "status": "passed | failed | blocked",
      "exit_code": 0,
      "first_failure": {
        "command": "npm test",
        "exit_code": 1,
        "error_excerpt": "FAIL src/utils.test.js"
      }
    }
  }
}
```
| Element | Type | Description |
|---|---|---|
| `mode` | string | Selected mode: fast, ci, or full |
| `evidence` | array | Source files that informed the test plan |
| `commands` | array | Selected test commands with purpose and order |
| `prerequisites` | array | Required setup steps |
| `executed` | array | Commands actually run |
| `skipped` | array | Commands skipped and reason |
| `result.status` | string | passed, failed, or blocked |
| `result.exit_code` | number | Exit code of test command |
| `result.first_failure` | object | First failure details (if any) |
This schema enables Agent consumption without prose parsing.
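A consumer might sanity-check a summary against the table before relying on it. The following is a minimal sketch, not a full schema validator: `check_summary` is a hypothetical helper, it treats `result.first_failure` as optional, and it assumes a real summary carries one concrete value for `mode` and `result.status` rather than the `|`-separated alternatives shown in the template.

```python
import json

LIST_FIELDS = ("evidence", "commands", "prerequisites", "executed", "skipped")
MODES = {"fast", "ci", "full"}
STATUSES = {"passed", "failed", "blocked"}


def check_summary(raw: str) -> list:
    """Return a list of problems in a Test Plan Summary; an empty list means the shape looks right."""
    document = json.loads(raw)
    summary = document.get("test_plan_summary") if isinstance(document, dict) else None
    if not isinstance(summary, dict):
        return ["missing 'test_plan_summary' object"]

    problems = []
    if summary.get("mode") not in MODES:
        problems.append("mode must be one of fast, ci, full")
    for field in LIST_FIELDS:
        if not isinstance(summary.get(field), list):
            problems.append(f"{field} must be an array")

    result = summary.get("result")
    if not isinstance(result, dict):
        problems.append("result must be an object")
    else:
        if result.get("status") not in STATUSES:
            problems.append("result.status must be one of passed, failed, blocked")
        if not isinstance(result.get("exit_code"), int):
            problems.append("result.exit_code must be a number")
    return problems
```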