# Agent Integration Testing

## Overview
This skill guides the creation and autonomous execution of verifiable integration test specifications. It ensures that tests are actionable by agents, properly documented, and systematically executed by subagents to validate features or fix failures.
## Core Process
1. **Investigate Codebase Area**
   - By default, investigate the entire codebase to understand the context.
   - If the user specifies a feature area, use `glob` and `grep` to narrow the investigation.
2. **Write Test Specification**
   - Create a test spec file at `./tests/<name>.md`.
   - Use `<name> = "integration"` unless the user specifies a particular feature area.
   - The document MUST include a Prerequisites section at the top detailing any setup needed before the tests become runnable (e.g., environment variables, database seeding, background services).
3. **Define Verifiable Tests**
   - Each test must be written in plain English.
   - Include clear steps to reproduce.
   - Include a set of expectations.
   - CRITICAL: Every expectation must be strictly verifiable by an agent using available tools (e.g., shell commands, HTTP requests, reading file outputs). If a test cannot be verified by an agent, it is invalid and must be rewritten or removed.
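In practice, "strictly verifiable" means each expectation maps to a concrete tool invocation with an observable result. A minimal Python sketch of such a check (the helper name and the example command are illustrative, not part of any required API):

```python
import subprocess

def verify_expectation(command: str, expected_output: str) -> bool:
    """Run a shell command and check its stdout for an expected value.

    Any expectation that cannot be phrased this concretely (a command
    plus an observable result) is not agent-verifiable and should be
    rewritten or removed.
    """
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.returncode == 0 and expected_output in result.stdout

# Example: a file/output-state check, rather than "the output looks right"
print(verify_expectation("echo test@example.com", "test@example.com"))  # → True
```

Each expectation in the spec should reduce to a call like this; if it cannot, rewrite the expectation.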
4. **Execute Tests via Subagents**
   - Spawn subagents (using the `Task` tool or `@mention` subagent system) to run each individual test.
   - Each subagent must follow the prerequisites, execute the steps, and validate the outcomes against the expectations.
   - Collect the results (Pass/Fail and logs) from the subagents.
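Dispatching one subagent per test presupposes splitting the spec file into individual tests. A minimal sketch, assuming the `## Test N: ...` heading convention used in this skill's example spec:

```python
import re

def split_tests(spec_markdown: str) -> dict[str, str]:
    """Split a test-spec markdown file on '## Test N: ...' headings,
    yielding one body per test so each can go to its own subagent."""
    parts = re.split(r"^## (Test \d+: .+)$", spec_markdown, flags=re.MULTILINE)
    # parts = [preamble, heading1, body1, heading2, body2, ...]
    return {parts[i]: parts[i + 1].strip() for i in range(1, len(parts), 2)}

spec = """# Auth Integration Tests

## Prerequisites
- Start the server

## Test 1: User Registration
Steps and expectations here.

## Test 2: Login
More steps.
"""
tests = split_tests(spec)
print(list(tests))  # → ['Test 1: User Registration', 'Test 2: Login']
```

The preamble (title and Prerequisites) is deliberately excluded from the per-test bodies; pass it to every subagent alongside its assigned test.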
5. **Fix Failures (Optional)**
   - If the user explicitly requests that failures be fixed, spawn another subagent (e.g., the `SWE` or `BUILDER` agent) to investigate and fix any noted failures.
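The Pass/Fail results collected in step 4 can be summarized mechanically before deciding whether fixes are warranted. A minimal sketch (the result structure is hypothetical):

```python
def summarize(results: dict[str, bool]) -> str:
    """Render subagent Pass/Fail results as a markdown summary."""
    lines = ["| Test | Result |", "|---|---|"]
    for name, passed in results.items():
        lines.append(f"| {name} | {'Pass' if passed else 'FAIL'} |")
    failed = [name for name, passed in results.items() if not passed]
    lines.append(f"\n{len(results) - len(failed)}/{len(results)} passed.")
    return "\n".join(lines)

print(summarize({"Test 1: User Registration": True, "Test 2: Login": False}))
```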
## Quick Reference
| Action | Pattern / Command |
|---|---|
| Test File Location | `./tests/<name>.md` (default: `integration.md`) |
| Prerequisites | Must be documented at the top of the test file |
| Test Format | Plain English, repro steps, verifiable expectations |
| Execution | Spawn one subagent per test or test suite |
| Fixing | Spawn `SWE`/`BUILDER` subagent if requested by the user |
## Red Flags - STOP and Start Over
- **Unverifiable Tests:** "Verify the UI looks nice" or "Check if the animation is smooth." Agents cannot verify visual aesthetics without specific tools. **Fix:** Rewrite to check DOM elements, network responses, or file states.
- **Missing Prerequisites:** Subagents failing because the server wasn't started. **Fix:** Ensure the Prerequisites section explicitly defines the commands to start dependencies.
- **Executing Tests Manually:** Running tests in the main conversation thread instead of spawning subagents. **Fix:** Dispatch parallel subagents for isolated execution.
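As an illustration of the first fix, "verify the UI looks nice" can be rewritten as a concrete DOM-state check. A minimal sketch (the element ids are hypothetical):

```python
import re

def has_element(html: str, element_id: str) -> bool:
    """Check that rendered HTML contains an element with the given id —
    a verifiable stand-in for an aesthetic judgment."""
    return re.search(rf'id="{re.escape(element_id)}"', html) is not None

html = '<div id="login-form"><button id="submit">Sign in</button></div>'
print(has_element(html, "login-form"))  # → True
```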
## Example Test Specification (`./tests/auth-integration.md`)
```markdown
# Auth Integration Tests

## Prerequisites

- Start the test database: `docker compose up -d db`
- Run migrations: `npm run migrate`
- Start the server in the background: `npm run start:test &`

## Test 1: User Registration

**Steps:**

1. Send a POST request to `/api/register` with payload `{"email": "test@example.com", "password": "pass"}`.

**Expectations:**

1. The HTTP response status must be `201 Created`.
2. A subsequent database query via `sqlite3 test.db "SELECT email FROM users WHERE email='test@example.com';"` must return the email.
```