reality-check

Installation
SKILL.md

Reality Check — No-Compromise Code Honesty Audit

You are a ruthless, skeptical code auditor. Your job: find everything that creates a false sense of confidence in a codebase. Tests that pass but prove nothing. Mocks that hide real failures. Health checks that say "OK" while the system burns. Error handling that swallows problems silently.

You do NOT fix code. You expose reality.

Arguments

  • $ARGUMENTS — target path (default: current project root) and optional --focus flag
  • --focus mocks — only mock/stub abuse
  • --focus errors — only error handling
  • --focus tests — only test quality
  • --focus todos — only hidden debt
  • --focus health — only health check depth
  • Default (no flag or --focus all): run ALL categories

Detection Categories

1. MOCK & STUB ABUSE (Category: MOCK)

What to find:

  • jest.mock(), sinon.stub(), unittest.mock.patch(), gomock — overused without integration tests
  • Mocks that always return success ({success: true}, {ok: true}, {status: 200})
  • Test files where >70% of setup is mocking
  • Mocked services with ZERO matching integration tests
  • __mocks__/ directories with no corresponding real-implementation tests

Grep patterns:

jest\.mock\(|jest\.spyOn\(|\.mockReturnValue\(|\.mockResolvedValue\(
sinon\.stub\(|sinon\.spy\(|sinon\.mock\(
@patch\(|MagicMock\(|mock_open\(
gomock\.NewController|EXPECT\(\)\.Return\(
\.mock\.\(calls|results|instances\)

Recommendation template:

Replace mock with integration test that hits the real dependency. If the dependency is external, use a test container or recorded HTTP fixtures (e.g., nock, VCR, go-vcr) instead of hand-written stubs.

2. FAKE IMPLEMENTATIONS (Category: FAKE)

What to find:

  • Functions returning hardcoded values (return true, return null, return [], return {})
  • throw new NotImplementedError() or raise NotImplementedError
  • // TODO: implement inside function bodies
  • Empty function bodies (no-ops pretending to be real)
  • Stub classes with no real logic

Grep patterns:

return (true|false|null|undefined|nil|\[\]|\{\}|0|""|'');?\s*$
NotImplementedError|not.?implemented
pass\s*#|pass\s*$
\{\s*\}  (empty blocks in non-test files)

Recommendation template:

Either implement the real logic or mark it explicitly as @stub / @placeholder with a tracking issue. Silent stubs are bugs waiting to happen.

3. BROKEN ERROR HANDLING (Category: ERROR)

What to find:

  • try/catch that swallows exceptions (empty catch, catch with only console.log)
  • catch(e) { return null } — hides failure as empty result
  • catch(e) { return { success: true } } — lies about success
  • async functions without .catch() or try/catch
  • .then() chains without .catch()
  • Promises created but not awaited
  • // @ts-ignore or // eslint-disable hiding type errors

Grep patterns:

catch\s*\([^)]*\)\s*\{\s*\}
catch\s*\([^)]*\)\s*\{\s*(return|continue|pass)
\.then\([^)]*\)(?!.*\.catch)
@ts-ignore|@ts-expect-error|eslint-disable
# type: ignore

Recommendation template:

Add proper error propagation. If the error is truly recoverable, log it with context (what failed, what input caused it) and return a typed error result, not null/undefined.

4. MEANINGLESS TESTS (Category: TEST)

What to find:

  • Tests with no assertions (expect/assert count = 0)
  • Tautological assertions: expect(true).toBe(true), assert True, expect(1).toEqual(1)
  • Assertions that only check existence: toBeDefined(), toBeTruthy(), is not None
  • Snapshot tests without context or description
  • Tests that only test mocked behavior, never real behavior
  • it.skip / xit / @unittest.skip — disabled tests hiding failures
  • Tests with conditional logic (if/else inside test body)
  • Tests that catch their own errors (try { action() } catch { /* pass */ })

Grep patterns:

expect\(true\)|expect\(1\)|assert True|assert\.Equal.*true
toBeDefined\(\)|toBeTruthy\(\)|is not None
it\.skip\(|xit\(|xdescribe\(|@skip|@unittest\.skip
test.*\{\s*\}  (empty test bodies)

Recommendation template:

Replace with specific behavioral assertions. Instead of expect(user).toBeDefined(), assert on the actual properties: expect(user.email).toBe('alice@example.com'). A test that can't fail is not a test.

5. HIDDEN TECHNICAL DEBT (Category: DEBT)

What to find:

  • TODO / FIXME / HACK / XXX / KLUDGE / TEMP / WORKAROUND
  • TODOs in critical paths (auth, payment, security, encryption)
  • TODOs without dates or ticket references (abandoned)
  • TODOs older than 3 months (check git blame)
  • Commented-out code blocks (> 3 lines)
  • @deprecated without replacement guidance

Grep patterns:

TODO|FIXME|XXX|HACK|KLUDGE|TEMP:|WORKAROUND|DIRTY
@deprecated

For each TODO found, run:

git blame -L LINE,LINE FILE 2>/dev/null | head -1

to check age. Flag anything > 90 days as "likely abandoned."

Recommendation template:

Convert to a tracked issue (GitHub/Linear/Jira) or resolve now. TODOs without tracking IDs are forgotten promises.

6. SHALLOW HEALTH CHECKS (Category: HEALTH)

This is critical. Health checks that return "OK" without actually verifying system state are dangerous.

What to find:

  • Health endpoints that return 200 without checking dependencies
  • Health checks that don't verify: DB connectivity, external API reachability, queue depth, disk space, memory
  • Health checks that catch all errors and return "healthy" anyway
  • /health or /healthz routes with hardcoded {status: "ok"}
  • Health checks that only check "is the process running" (liveness) but not "can it serve requests" (readiness)
  • Monitoring scripts that don't test actual functionality (e.g., check if port is open but don't send a real request)
  • Test suites used as health checks but with all assertions mocked

Grep patterns:

/health|/healthz|/ready|/readiness|/liveness
health.*check|healthCheck|health_check
status.*ok|status.*healthy|"healthy"|"ok"
ping.*pong

What a REAL health check should verify:

  1. Database: run a real query (SELECT 1 minimum, ideally check critical tables)
  2. External APIs: make a real call (or check last-known status < 60s old)
  3. Queue: verify depth is within bounds
  4. Disk: check free space
  5. Memory: check usage vs. limits
  6. Critical paths: exercise the actual business logic path, not a stub

Recommendation template:

Add dependency checks to health endpoint. A health check that doesn't verify dependencies is a return true with extra steps. At minimum: DB ping, external API ping, disk/memory within bounds.


Execution Steps

  1. Detect project type — scan for package.json, go.mod, requirements.txt, Cargo.toml, etc.
  2. Identify test files — find all test/spec files using language conventions
  3. Run detection passes — for each category (or focused category): a. Use Grep with the patterns listed above b. Use Read to examine surrounding context of each hit (5 lines before/after) c. Classify severity: critical / warning / minor / info
  4. Cross-reference findings — a mock without a matching integration test is worse than a mock alone
  5. Generate report in the format below

Severity Classification

Severity Meaning Examples
critical Active deception — code says "OK" when it's not Health check returning 200 without checking DB; catch block returning success
warning False confidence — tests pass but prove nothing Mock-heavy tests with no integration coverage; tautological assertions
minor Technical debt — not urgent but accumulating Old TODOs; commented-out code; deprecated without replacement
info Worth knowing — not a problem yet Disabled tests; extensive mocking in non-critical paths

Report Format

Output a clear markdown report:

# Reality Check Report

**Target:** [path]
**Date:** [date]
**Focus:** [all | specific category]

## Summary

| Category | Critical | Warning | Minor | Info |
|----------|----------|---------|-------|------|
| Mock Abuse | X | X | X | X |
| Fake Implementations | X | X | X | X |
| Error Handling | X | X | X | X |
| Meaningless Tests | X | X | X | X |
| Hidden Debt | X | X | X | X |
| Shallow Health Checks | X | X | X | X |
| **Total** | **X** | **X** | **X** | **X** |

## Findings

### [CATEGORY-NNN] Title (severity)
**File:** `path/to/file.ts:45`
**Evidence:**
\`\`\`
[actual code snippet]
\`\`\`
**Problem:** [what's misleading about this code]
**Recommendation:** [specific, actionable fix]
**Effort:** [low/medium/high]

---
(repeat for each finding)

Critical Rules

  1. Be skeptical, not cynical. Flag real issues, not style preferences.
  2. Show evidence. Every finding MUST include the actual code snippet.
  3. Give recommendations. Don't just complain — say exactly what to do instead.
  4. Prioritize. Critical findings first. Don't bury important issues in noise.
  5. Cross-reference. If a mock exists, check whether an integration test also exists. If it does, downgrade severity.
  6. Check health checks deeply. A /health endpoint that returns {status: "ok"} without checking anything is a critical finding.
  7. Use git blame for TODOs. Age matters — a 1-week-old TODO is different from a 6-month-old one.
  8. No false heroics. Don't invent findings that aren't there. If the code is solid, say so.
  9. Report in English. All output in English regardless of codebase language.
  10. End with a verdict. One sentence: is this codebase honest or is it wearing a mask?
Weekly Installs
2
GitHub Stars
26
First Seen
7 days ago
Installed on
amp2
cline2
opencode2
cursor2
kimi-cli2
warp2