measurement-design
Measurement Design
This skill handles CF-05: Measurement Discipline from the release-decision framework.
Its job is to ensure every hypothesis has exactly one primary metric, a small set of guardrails, and an event schema that can produce evidence for the decision.
When to Activate
- Hypothesis exists but no primary metric is defined
- User names multiple competing success metrics
- User mixes goals, proxies, and diagnostics together
- Instrumentation does not exist for the desired outcome
- Project `primaryMetric` field is empty or contains a list
On Entry — Read Current State
Before doing any work, read the project from the database using the project-sync skill's get-project command.
Check these fields:
| Field | Purpose |
|---|---|
| `hypothesis` | Confirms causal claim exists |
| `primaryMetric` | Current metric definition (may be empty) |
| `guardrails` | Existing guardrail metrics |
| `stage` | Current lifecycle position |
- If `hypothesis` is empty → redirect to `hypothesis-design`
- If `primaryMetric` is already defined and instrumentation is confirmed → may not need this skill
- If `guardrails` already exist → build on existing rather than overwriting
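The entry checks above can be sketched as a small routing function. This is a minimal illustration, not the skill's actual implementation: the field names `hypothesis`, `primaryMetric`, and `guardrails` come from the table, while the dict shape, the `instrumentation_confirmed` parameter, and the returned labels are assumptions made for the example.

```python
from typing import Any


def route_on_entry(project: dict[str, Any], instrumentation_confirmed: bool) -> str:
    """Return a (hypothetical) label for the next step, given the project record."""
    hypothesis = project.get("hypothesis")
    primary = project.get("primaryMetric")
    guardrails = project.get("guardrails") or []

    # No causal claim yet: measurement design cannot start.
    if not hypothesis:
        return "redirect:hypothesis-design"

    # A list (or an empty value) in primaryMetric does not count as defined.
    primary_is_single = isinstance(primary, str) and primary.strip() != ""
    if primary_is_single and instrumentation_confirmed:
        return "skip:measurement-design"

    # Existing guardrails are extended rather than overwritten.
    if guardrails:
        return "run:extend-existing-guardrails"
    return "run:define-from-scratch"
```

Treating a list-valued `primaryMetric` as "not defined" mirrors the activation rule that a list of metrics still needs this skill.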
Core Principle
One primary metric decides the outcome. Everything else is a guardrail or diagnostic.
If two metrics both "decide" success, the decision is not yet sharp enough. Return to hypothesis-design.
Decision Actions
Define the primary metric
Ask: "If this experiment ran for 2 weeks and you had to make a go/no-go decision with ONE number, what would that number be?"
The answer is the primary metric. One only.
Define guardrails (2–3 maximum)
Ask: "What other metrics would concern you if they degraded significantly, even if the primary metric improved?"
Common guardrails:
- Error rate / p99 latency for the candidate variant
- User satisfaction score or support ticket volume
- A downstream conversion step after the primary metric
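One way to keep the "one primary metric, at most three guardrails" rule from eroding is to encode it in a small data structure. The sketch below is illustrative only: `MetricPlan`, `MAX_GUARDRAILS`, and the metric names in the usage example are hypothetical and not part of the framework.

```python
from dataclasses import dataclass

MAX_GUARDRAILS = 3  # the skill allows 2-3 guardrails at most


@dataclass(frozen=True)
class MetricPlan:
    """Hypothetical container: one deciding metric plus a few guardrails."""

    primary_metric: str
    guardrails: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Reject empty values and lists: exactly one metric decides the outcome.
        if not isinstance(self.primary_metric, str) or not self.primary_metric.strip():
            raise ValueError("exactly one primary metric is required")
        if len(self.guardrails) > MAX_GUARDRAILS:
            raise ValueError(f"at most {MAX_GUARDRAILS} guardrails are allowed")
        # A metric cannot both decide success and act as a guardrail.
        if self.primary_metric in self.guardrails:
            raise ValueError("the primary metric cannot also be a guardrail")


# Example with made-up metric names.
plan = MetricPlan(
    primary_metric="checkout_conversion_rate",
    guardrails=("p99_latency_ms", "support_ticket_volume"),
)
```

Passing a list as `primary_metric` raises `ValueError`, which is the code-level equivalent of pushing back on a list of success metrics.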
Design the event
For each metric, define:
- Event name (what action fires it)
- Required properties (user_key, session_id, relevant context)
- Where in the user journey it fires
- Whether it needs to be associated with a flag evaluation for experiment analysis
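The four items above map naturally onto a small event specification. The following is a sketch under stated assumptions: `EventSpec`, its field names, and the `checkout_completed` example (including the `cart_value` property) are hypothetical. Only `user_key` and `session_id` are taken from the list above.

```python
from dataclasses import dataclass

# Properties the skill lists as required on every event.
BASE_PROPERTIES = ("user_key", "session_id")


@dataclass(frozen=True)
class EventSpec:
    """Hypothetical description of one event backing a metric."""

    name: str                       # event name
    fires_when: str                 # the action that fires it
    journey_step: str               # where in the user journey it fires
    required_properties: tuple[str, ...] = BASE_PROPERTIES
    linked_to_flag_evaluation: bool = False  # needed for experiment analysis?

    def missing_properties(self, payload: dict) -> list[str]:
        """List required properties that are absent or empty in a payload."""
        return [p for p in self.required_properties if payload.get(p) in (None, "")]


# Illustrative event for a made-up checkout metric.
checkout_completed = EventSpec(
    name="checkout_completed",
    fires_when="order confirmation is returned to the client",
    journey_step="checkout",
    required_properties=BASE_PROPERTIES + ("cart_value",),
    linked_to_flag_evaluation=True,
)
```

Calling `missing_properties` on sample payloads during review is a cheap way to spot events that cannot support the metric before any traffic is exposed.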
Verify instrumentation completeness
Check: can the current codebase emit this event? If not, instrumentation must be built before exposure begins.
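The completeness check reduces to a set difference between the events the metrics need and the events the codebase can currently emit. A minimal sketch, assuming both sets have already been collected by hand or by a code search; the function names and the event names in the usage example are hypothetical.

```python
def instrumentation_gaps(required_events: set[str], emitted_events: set[str]) -> list[str]:
    """Return required events that the codebase cannot emit yet, sorted by name."""
    return sorted(required_events - emitted_events)


def may_start_exposure(required_events: set[str], emitted_events: set[str]) -> bool:
    """Exposure may begin only when every required event is instrumented."""
    return not instrumentation_gaps(required_events, emitted_events)


# Illustrative event names only.
required = {"checkout_completed", "checkout_error", "support_ticket_opened"}
emitted = {"checkout_completed", "page_viewed"}
gaps = instrumentation_gaps(required, emitted)
```

Here `gaps` names the two events that must be built before exposure begins, and `may_start_exposure(required, emitted)` is `False`.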
Operating Rules
- Do not allow exposure to start without confirmed instrumentation
- One primary metric only — push back on lists
- Guardrails protect against harm; they do not define success and should not be optimized for.
- When multiple experiments share a flag or user pool, verify traffic allocation strategy before calculating sample size. Sequential experiments get full traffic; concurrent experiments with mutual exclusion get only their slice. An underpowered experiment due to traffic splitting is worse than waiting for a sequential slot. See `reversible-exposure-control` → multi-experiment-traffic.md for patterns.
- Hand off to `reversible-exposure-control` when instrumentation is complete
- Hand off to `evidence-analysis` once data collection is underway
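The traffic-allocation rule above can be made concrete with simple arithmetic: the smaller the slice an experiment receives, the longer it takes to reach the required sample. The sketch below is a rough estimate, assuming an even split across variants and that each eligible user is counted once. The function name and all numbers are made up for illustration.

```python
import math


def days_to_reach_sample(
    required_per_variant: int,
    variants: int,
    daily_eligible_users: int,
    traffic_share: float,
) -> int:
    """Estimate the days needed to collect the required sample for all variants."""
    if not 0 < traffic_share <= 1:
        raise ValueError("traffic_share must be in (0, 1]")
    # Users entering this experiment per day after the traffic split.
    daily_in_experiment = daily_eligible_users * traffic_share
    total_needed = required_per_variant * variants
    return math.ceil(total_needed / daily_in_experiment)


# Illustrative numbers: 10,000 users per variant, 2 variants, 4,000 eligible users per day.
sequential_days = days_to_reach_sample(10_000, 2, 4_000, traffic_share=1.0)   # full traffic
concurrent_days = days_to_reach_sample(10_000, 2, 4_000, traffic_share=0.25)  # one of four slices
```

With these inputs the sequential run needs 5 days and the quarter-slice run needs 20, so a concurrent experiment stopped at the sequential duration would have collected only a quarter of its required sample.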
Persist State
After completing work, use the project-sync skill to persist state to the database:
- `update-state`: save `--primaryMetric "<metric>"` and `--guardrails "<guardrail list>"`
- `set-stage`: set to `measuring`
- `add-activity`: record what happened, e.g. `--type stage_update --title "Metrics defined"`
Reference Files
- references/event-schema-design.md — vendor-agnostic event design principles, naming conventions, metric-to-event mapping, anti-patterns
- references/tool-featbit-sdk.md — FeatBit SDK track() usage, experiment event association, sendToExperiment