evidence-analysis
Evidence Analysis
This skill handles CF-06: Evidence Sufficiency and CF-07: Decision Framing from the release-decision framework.
CF-06 and CF-07 are handled together because they represent a continuous decision: first determine if evidence is sufficient, then frame what the evidence says.
When to Activate
- Data is being collected and the user wants to know whether to decide now
- The user is impatient to interpret weak or early evidence
- Results exist and a go/no-go decision is needed
- Project stage is
measuringordeciding
On Entry — Read Current State
Before doing any work, read the project from the database using the project-sync skill's get-project command.
Check these fields:
| Field | Purpose |
|---|---|
primaryMetric |
The metric that decides the outcome |
guardrails |
Metrics that must not degrade |
hypothesis |
The causal claim being tested |
stage |
Current lifecycle position |
experiments |
Existing experiment records and their status |
- If
primaryMetricis empty → redirect tomeasurement-design - If
stageisdeciding→ a decision may already exist; check experiment records before re-analyzing - If experiment records already have a
decisionfield → may only need to review, not re-decide
Decision Actions
Evidence sufficiency check (CF-06 first)
Before interpreting results, confirm:
- Simultaneous? — Are both variants measured over the same time window?
- Sufficient volume? — Sample per variant ≥
minimumSamplein the experiment record. If below this floor, the Gaussian approximation is unreliable — do not interpret P(win) or risk values yet. - Risk has had a chance to converge? — Read the experiment's
analysisResultand check thatrisk[trt]andrisk[ctrl]are not both still very high (> 0.02). If both are high, the posterior is still wide — more data is needed regardless of what P(win) shows. - Clean window? — Were there external events (promotions, outages, holidays) that could contaminate the data?
- Instrumentation verified? — Are events firing correctly for both variants?
- SRM check passed? —
analysisResultincludes a χ² SRM check. If it flags an imbalance (p < 0.01), do not interpret metric results until the traffic split issue is resolved.
If any check fails, the right move is NOT to decide — it is to wait, fix, or extend.
Decision framing (CF-07)
Once evidence is sufficient, read the experiment's analysisResult and frame the outcome using exactly one of these categories:
- CONTINUE — Primary metric P(win) ≥ 95% and risk[trt] is low. Guardrail P(win) all > 20%. Proceed with planned expansion.
- PAUSE — Primary metric P(win) 80–95%, or a guardrail P(win) ≤ 20%, or SRM check failed. Signal exists but is not clean enough to expand. Investigate before proceeding.
- ROLLBACK CANDIDATE — A guardrail P(win) ≤ 5%, or primary metric P(win) ≤ 5%. Evidence of harm. Flag should be reverted.
- INCONCLUSIVE — Sample below validity floor, or risk[trt] and risk[ctrl] both still high, or primary metric P(win) 20–80% after a full observation window. Extend window or revisit instrumentation.
See references/decision-framing-guide.md for how to write each category's decision statement and what counts as "low" for risk values.
Produce the decision artifact
Write a structured decision statement with:
- The recommendation category
- The evidence that supports it (numbers, not vague descriptions)
- The link back to the original hypothesis
- The explicit next action
Operating Rules
- Do not let urgency substitute for evidence
- "Not enough data" is a valid and honest decision frame — do not dress it up when the real issue is impatience
- Separate "we don't know yet" from "we know it's harmful"
- Hand off to
learning-captureimmediately after the decision is made
Persist State
After completing work, use the project-sync skill to persist state to the database:
update-state— save--lastAction "Decision: <category>"set-stage— set todecidingupsert-experiment— save--decision <category> --decisionSummary "plain-language action" --decisionReason "technical rationale with data"add-activity— record what happened, e.g.--type decision --title "Decision: <category>"
Reference Files
- references/decision-framing-guide.md — CONTINUE/PAUSE/ROLLBACK CANDIDATE/INCONCLUSIVE language, decision statement template, common framing mistakes
- references/tool-featbit-abtesting.md — FeatBit experiment dashboard, reading per-variant results, confidence interpretation