quality-assurance
Quality Assurance
Quality assurance is a delivery system, not a phase. Reconstruct intended behavior, choose the cheapest evidence that can prove or falsify it, then wire the same verification into repeatable local and CI workflows.
In command examples below, <skill-dir> means the installed quality-assurance skill directory and <repo-root> means the target repository root.
Start Here
- Run
python <skill-dir>/scripts/qa-scan.py <repo-root>when the bundled scanner is available; otherwise perform the same stack and CI inventory manually. - Preserve and read the full failure artifact set before changing code: stack traces, failing assertions, screenshots, traces, query logs, retry logs, seeds, and the first bad CI step.
- Reconstruct the intended behavior and the cheapest proof that can falsify or confirm it.
- Reuse repo commands from
Makefile,package.json,pyproject.toml,tox.ini,noxfile.py,justfile,Taskfile.yml, or CI config before inventing new ones. - Read repo-local instructions before deciding whether tests may be run, which suites are mandatory, or how evidence must be reported.
- Load only the reference files that match the task, and state the proof command before making any success claim.
Operating Rules
- Evidence before claims. Do not say fixed, passing, or complete without fresh command output.
- Reproduce before repair. A regression test is part of the fix whenever the repo and task permit it.
- Read the full artifact before editing. The first failing step, root-cause frame, slow query, or browser trace usually matters more than the last summary line.
- Use the lowest-fidelity test that can actually prove the behavior. Escalate only when cheaper layers cannot prove it.
- Mock boundaries, not business logic.
- Frontend QA must prove user-visible state transitions, not just that markup rendered.
- Do not delete, weaken, or silently skip existing tests without explicit sign-off from the user or repo owners.
- Review comments are technical claims to evaluate, not social cues to obey.
- Flaky tests are bugs. Quarantine is temporary containment, not completion.
- Coverage is a lagging indicator. Use it to find blind spots, not to justify weak tests.
- CI-only failures usually mean environment, ordering, timing, data, or cache assumptions were hidden locally. Debug those assumptions directly.
- At scale, speed comes from suite architecture, hermetic setup, sharding, disciplined test selection, and high-signal artifacts.
QA Router
Repo and stack detection
Use scripts/qa-scan.py. It detects likely languages, frameworks, test runners, linters, and CI providers, then suggests which references to load and which commands probably matter.
Code review and review feedback
Read references/code-review.md for:
- review output format
- severity taxonomy
- self-review before requesting review
- receiving feedback and pushing back with evidence
Test strategy and regression design
Read references/test-strategy.md for:
- test type selection
- red-green-refactor and regression rules
- mocking, fixtures, and data strategy
- coverage interpretation
Backend-heavy QA
Read references/backend-testing.md for:
- APIs, services, jobs, queues, migrations, and contracts
- common backend stack patterns
- database and concurrency concerns
Frontend-heavy QA
Read references/frontend-testing.md for:
- component, integration, browser, accessibility, and visual testing
- async UI control
- provider and fixture setup
- network, storage, and time handling
- flake repair and incremental test workflow
Failure triage and debugging
Read references/debugging.md for:
- failing tests
- CI-only failures
- flaky tests
- performance and observability-led debugging
CI/CD and quality gates
Read references/ci-cd.md for:
- local-to-CI parity
- pipeline staging
- caching, sharding, artifacts, and branch protection
- provider patterns for common CI systems
Suite scaling and monorepos
Read references/suite-architecture.md for:
- ownership
- test selection
- quarantine policy
- monorepo and large-suite design
Completion and release verification
Read references/verification.md before saying something is fixed, asking for merge, or treating a release as ready.
Anti-pattern sweep
Read references/anti-patterns.md for fast smell detection across review, testing, debugging, and CI.
Standard Loops
Review loop
- Reconstruct intended behavior from the issue, PR description, diff, or failing report.
- Review highest-risk paths first: correctness, data integrity, auth, concurrency, performance, and user-visible regressions.
- Emit findings with severity, impact, and concrete file or command evidence.
- Propose the smallest safe fix or the precise follow-up question needed to unblock.
- Verify changed behavior with focused commands.
Bug-fix loop
- Reproduce.
- Isolate the smallest failing case.
- Add or identify a failing regression test.
- Fix the root cause, not just the symptom.
- Run the focused proof command, then broader regression commands.
Frontend verification loop
- Choose the correct test layer: unit, component, integration, browser, or visual.
- Render through realistic providers and control network, time, storage, viewport, locale, and feature flags explicitly.
- If browser state is unclear, inspect the rendered DOM, screenshot, console, or trace before automating more actions.
- Assert loading, empty, error, success, retry, disabled, and optimistic states when they matter.
- Verify accessible names, keyboard flow, and focus behavior for user-facing changes.
- Run the smallest proof first, then broaden only when necessary.
Test-authoring loop
- Decide which layer owns the behavior.
- Build data with factories, builders, or fixtures instead of ad hoc duplication.
- Assert observable outcomes.
- Remove timing, order, and environment sensitivity.
- For large scopes, work incrementally: one file or behavior slice at a time, verify, then continue.
- Wire the command into local scripts and CI if it protects a critical behavior.
CI hardening loop
- Inventory commands already trusted locally.
- Split fast gates from slow gates.
- Parallelize only isolated jobs.
- Cache dependencies and reusable artifacts.
- Publish logs and artifacts that make failures diagnosable.
- Enforce merge protection only on stable, high-signal jobs.
Helper Scripts
scripts/qa-scan.py: detect stack, runners, CI providers, and likely QA commands.scripts/qa-check.sh: run lint, type, and test commands across common Python, JS, Ruby, and Go repos.scripts/coverage-report.sh: run coverage with configurable thresholds across common runners.
Skill Orchestration
- Use
agentic-developmentwhen repo orientation, architecture choice, or the code-change path itself is the bottleneck. - Use
gh-fix-ciwhen GitHub Actions failures need log retrieval and implementation. - Use security, browser, visual, performance, or cloud-specific skills when the QA problem depends on those systems.
- Use repo-specific build, deploy, or observability skills when the failure depends on that tooling.
Exit Criteria
Do not stop on "likely fixed". Stop on reproduced failure, root-cause explanation, regression protection, fresh verification output, and a clear statement of residual risk if verification is partial.