fpf-evidence
SKILL.md
What this skill IS
This is testing against reality, not documentation. You are running an experiment — defining predictions, executing tests, interpreting results. The evidence record is the lab notebook, not a post-hoc report.
Work through the template as the test procedure. Define predictions BEFORE running tests. Record what actually happened. Interpret honestly.
Feedback loop
If evidence refutes the prime hypothesis or reveals new constraints:
- Update the PROB-* card (feed back to problem factory)
- Consider: does the problem need reframing?
- Create new ANOM-* if evidence reveals unexpected behavior
If evidence corroborates:
- Note the scope limitations in G (ClaimScope)
- Set valid_until (REQUIRED — expiry date or "perpetual" with justification)
Output
.fpf/evidence/EVID-${CLAUDE_SESSION_ID}--<slug>.md
Constraints (quality bar)
- C1: Claim under test stated in one sentence
- C2: Predictions derived BEFORE testing — what should be true, what would falsify
- C3: Harness is the smallest credible check (test, typecheck, benchmark, log)
- C4: Commands and outputs recorded verbatim
- C5: Result labeled: corroborated | refuted | inconclusive
- C6: F-G-R filled: F (formality), G (ClaimScope — set-valued, enumerate specific contexts), R (reliability 0-1)
- C7: valid_until REQUIRED — set expiry date (YYYY-MM-DD) for time/version-dependent claims, or "perpetual" with justification for universal truths. Stop gate blocks if missing.
- C8: If refuted: MUST update PROB-* or create new ANOM-* (feedback to problem factory). Stop gate blocks session end if refuted evidence exists without problem reframing.
Format
# Evidence Record
- **ID:** EVID-... **Claim:** ... **Created:** YYYY-MM-DD
- **F:** informal|structured|formalizable **G:** {contexts} **R:** 0-1
- **DesignRunTag:** design-time|run-time
## Predictions (defined before testing)
- If true: ...
- Would falsify: ...
## Harness
- Type: (test|benchmark|log|typecheck|manual)
- Lane: TA (theory assessment) | LA (lab assessment) | VA (verification/production assessment)
- Rationale: why this is the smallest credible check
## Commands + outputs
\`\`\`
(exact commands and raw output)
\`\`\`
## Interpretation
- Result: corroborated|refuted|inconclusive
- valid_until: ...
- Feedback to problem factory: (if refuted, what changes in PROB-*?)
Weekly Installs
2
Repository
m0n0x41d/princi…ude-codeGitHub Stars
3
First Seen
13 days ago
Security Audits
Installed on
mcpjam2
roo2
antigravity2
junie2
codebuddy2
zencoder2