Test-Guided Bug Detector
When tests fail, the failure set itself is a signal. One failure tells you where to look; the pattern across many failures tells you what kind of thing broke.
Step 1 — Triage by failure shape
| Failure pattern | Most likely cause | First move |
|---|---|---|
| One test fails | Localized bug in the code that test covers | Read the assertion; → bug-localization |
| Many tests fail with the same error | Shared dependency broke (fixture, helper, import) | Find the shared thing — not the individual tests |
| Many tests fail with different errors | Environment/infra (DB down, fixture not loading) | Check setup/teardown logs, not test bodies |
| All tests in one file fail | Module-level import/fixture in that file | Check the file's top-level, not the tests |
| Tests fail only in CI, not locally | Env difference: version, path, timezone, locale, parallelism | Diff CI env vs local env, not the code |
| Tests fail only when run together | Test pollution — one test mutates shared state | Bisect the test order; find the polluter |
| Same tests intermittently fail | Flake — timing, network, randomness | Do NOT chase the code — stabilize the test |
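For the "fail only when run together" row, bisecting the test order can be mechanized. A minimal sketch, assuming pytest, a known victim test (one that fails only in the full run but passes alone), and the original execution order of the tests that ran before it; the node ids below are placeholders, not real paths:

```python
import subprocess

def run_passes(test_ids):
    """Run exactly these tests, in this order; True if the whole selection passes."""
    return subprocess.run(["pytest", "-x", *test_ids]).returncode == 0

def find_polluter(prior_tests, victim):
    """Binary-search the tests that ran before `victim` for the one that breaks it.

    Assumes a single polluter whose side effect persists for the rest of the
    session, and that `victim` passes when run on its own.
    """
    candidates = list(prior_tests)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # If the victim still fails after only the first half, the polluter is in it;
        # otherwise keep looking in the second half.
        candidates = half if not run_passes(half + [victim]) else candidates[len(half):]
    if candidates and not run_passes(candidates + [victim]):
        return candidates[0]
    return None

# Hypothetical node ids, listed in their original run order:
prior = [
    "tests/test_users.py::test_create",
    "tests/test_users.py::test_delete",
    "tests/test_cache.py::test_warm",
]
print(find_polluter(prior, victim="tests/test_billing.py::test_invoice"))
```

Each iteration halves the suspect list, so even a few hundred prior tests need only a handful of pytest runs.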
Step 2 — Spectrum-based fault localization (when many tests fail)
The classic move: code executed by failing tests but not by passing tests is suspicious.
- Run with coverage. Record which lines each test hits.
- For each line, compute suspiciousness (Ochiai): fail_hits / sqrt(total_fails × (fail_hits + pass_hits))
- Sort descending. The top lines are where the fault probably lives.
This is mechanical but surprisingly effective. You need ≥3 failing and ≥3 passing tests for the signal to separate from noise.
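A minimal sketch of that computation, assuming per-test coverage is already available as a mapping from test name to the set of (file, line) pairs it executed (e.g. exported from coverage.py dynamic contexts); the data shapes here are illustrative, not a fixed API:

```python
import math
from collections import defaultdict

def ochiai_ranking(coverage, failing):
    """Rank executed lines by Ochiai suspiciousness.

    coverage: dict of test name -> set of (file, line) tuples that test executed
    failing:  set of test names that failed
    """
    fail_hits = defaultdict(int)  # line -> how many failing tests hit it
    pass_hits = defaultdict(int)  # line -> how many passing tests hit it
    total_fails = len(failing)

    for test, lines in coverage.items():
        bucket = fail_hits if test in failing else pass_hits
        for line in lines:
            bucket[line] += 1

    def suspiciousness(line):
        f, p = fail_hits[line], pass_hits[line]
        denom = math.sqrt(total_fails * (f + p))
        return f / denom if denom else 0.0

    # Only lines hit by at least one failing test are candidates; most suspicious first.
    return sorted(fail_hits, key=suspiciousness, reverse=True)

# Toy usage (real suites need several failing and passing tests for a clean signal):
coverage = {
    "test_a": {("billing.py", 10), ("billing.py", 20)},  # passes
    "test_b": {("billing.py", 10)},                       # passes
    "test_c": {("billing.py", 10), ("billing.py", 42)},  # fails
}
print(ochiai_ranking(coverage, failing={"test_c"}))
# -> [('billing.py', 42), ('billing.py', 10)]  (line 42 is hit only by the failing test)
```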
Step 3 — Cluster failures
Before debugging, group failures that share a root cause. Debugging 20 failures that are secretly 1 bug is 19× wasted effort.
Cluster by, in order:
- Identical exception type + message → almost certainly same bug
- Identical top-of-stack frame → very likely same bug
- Same file-under-test → likely
- Same fixture used → possible (if the fixture is the bug)
Pick the largest cluster. Fix it. Re-run. Repeat.
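A minimal sketch of the first two clustering keys, assuming failures have already been collected as small records (test name, exception type, message, top stack frame); how you extract those from your runner's report is up to you:

```python
from collections import defaultdict

def cluster_failures(failures):
    """Group failures that probably share a root cause.

    failures: iterable of dicts with keys 'test', 'exc_type', 'message', 'top_frame'
              (top_frame like 'conftest.py:31'). Returns clusters, largest first.
    """
    clusters = defaultdict(list)
    for f in failures:
        # Strongest key first: identical exception type + message.
        # Fall back to the top-of-stack frame when the message is empty or noisy.
        key = (f["exc_type"], f["message"]) if f["message"] else ("frame", f["top_frame"])
        clusters[key].append(f["test"])
    return sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)

# Hypothetical failures pulled from a test report:
failures = [
    {"test": "test_login",   "exc_type": "KeyError",       "message": "'tenant_id'", "top_frame": "conftest.py:31"},
    {"test": "test_logout",  "exc_type": "KeyError",       "message": "'tenant_id'", "top_frame": "conftest.py:31"},
    {"test": "test_invoice", "exc_type": "AssertionError", "message": "",            "top_frame": "billing.py:87"},
]
for key, tests in cluster_failures(failures):
    print(f"{len(tests)} failure(s) under {key}: {tests}")
```

Sorting largest-first means the first cluster you look at is the one whose fix clears the most red from the board.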
Worked example
Input: 47 tests failing after a merge.
Triage:
- 41 fail with KeyError: 'tenant_id' → same error → one cluster
- 5 fail with various assertion mismatches in test_billing.py → file-local → one cluster
- 1 fails with ConnectionRefused → infra → ignore for now
Cluster 1 (41 tests): All 41 use the @with_authenticated_user fixture. Fixture source: it creates a User dict. Grep the diff: tenant_id was added as a required field in User.__init__ but the fixture wasn't updated.
Root cause: One line in conftest.py. 41 failures → 1 bug.
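A hypothetical reconstruction of what that one-line fix might look like, using the names from the example above (the real conftest.py is not shown in this skill, so treat this purely as illustration):

```python
# conftest.py (hypothetical reconstruction)
import pytest

@pytest.fixture
def with_authenticated_user():
    # The merge made tenant_id a required field; the fixture never set it,
    # which produced the 41 identical KeyErrors.
    return {
        "id": 1,
        "email": "user@example.com",
        "tenant_id": "acme-corp",  # <- the one-line fix
    }
```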
Cluster 2 (5 tests): After fixing cluster 1, re-run. 3 of the 5 now pass (they were also blocked by the fixture). 2 remain. Both assert on a dollar amount that's off by exactly the tax rate. The merge also changed tax calculation.
47 → 2 root causes.
Edge cases
- Every single test fails: Your test runner is broken, not your code. Import error at conftest.py / setup.py level, or the test DB didn't come up.
- The failing assertion is True == True or a similar tautology: The test itself is broken — pytest collected an accidentally-named non-test function, or someone committed an assert True  # TODO placeholder.
- New tests fail, old tests pass: The new tests might be wrong. Don't assume the product code is at fault until you've read the new test.
Do not
- Do not debug failures one at a time without clustering first. You will fix the same bug five times.
- Do not assume the test is right. The test is a claim; verify the claim against spec before chasing the code.
- Do not re-run a flaky test until it passes and call it fixed. Mark it, quarantine it, move on — but don't lie to yourself (a marker-based sketch follows this list).
- Do not skip spectrum analysis because it sounds fancy. It's three lines of script and it cuts search time in half.
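One way to mark and quarantine without hiding the flake, sketched with a plain custom pytest marker (the marker name quarantine and the test names are assumptions, not conventions from this skill):

```python
# conftest.py -- register the marker so pytest doesn't warn about an unknown mark.
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "quarantine: known-flaky test, excluded from the gating run"
    )

# test_sync.py
import pytest

@pytest.mark.quarantine  # flaky: races a background sync thread -- do not chase yet
def test_sync_completes_quickly():
    ...
```

Gate on pytest -m "not quarantine" and run pytest -m quarantine in a separate, non-blocking job, so the flaky tests stay visible instead of quietly disappearing.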
Output format
## Clusters
1. <N> failures — <shared root: exception/fixture/file>
Suspected fault: <file:line> (<how you narrowed it>)
2. ...
## Recommended order
Fix cluster <N> first (<reason: biggest / blocks others / fastest>)
## Quarantine
- <test name>: flaky, <mechanism> — do not chase