launchdarkly-experiment-setup
LaunchDarkly Experiment Setup
You're using a skill that will guide you through setting up and running experiments in LaunchDarkly. Your job is to design the experiment, create it with the right metrics and treatments, start data collection, and verify it's running.
Prerequisites
This skill requires the remotely hosted LaunchDarkly MCP server to be configured in your environment.
Required MCP tools:
- create-experiment -- create a new experiment with metrics and treatments
- start-experiment-iteration -- begin collecting data for the experiment
- get-experiment -- check experiment status and configuration
Optional MCP tools:
- list-experiments -- browse existing experiments in the project
- update-experiment -- modify experiment name or description
- create-metric -- create metrics if they don't exist yet
- list-metrics -- browse available metrics
Core Concepts
What Are Experiments?
Experiments in LaunchDarkly let you measure the impact of feature flag variations on key metrics. An experiment consists of:
- Treatments: The flag variations being compared (control vs. test)
- Metrics: What you're measuring (conversion rate, latency, revenue, etc.)
- Iterations: Data collection periods — start an iteration to begin collecting data
- Holdout (optional): A percentage of traffic excluded from the experiment for baseline measurement
Experiment Lifecycle
- Create the experiment with metrics and treatments
- Start an iteration to begin data collection
- Monitor results as data accumulates
- Stop the iteration when you have statistical significance
- Ship the winning variation
Core Principles
- Metrics First: Ensure your metrics exist before creating the experiment
- Clear Hypothesis: Know what you expect to improve and by how much
- Proper Controls: Always include a control treatment (the current behavior)
- Sufficient Sample Size: Let experiments run long enough for statistical significance
- One Change at a Time: Test one variable per experiment for clear attribution
Workflow
Step 1: Prepare Metrics
Before creating an experiment, ensure the metrics you want to measure exist:
- Use list-metrics to check for existing metrics (see the example after this list)
- If needed, use create-metric to create new ones
- Note the metric keys — you'll need them for the experiment
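A list-metrics call can be as simple as passing the project key. This is a sketch — the exact parameter shape is an assumption based on the other tool calls in this skill:
{
  "projectKey": "my-project"
}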
Common metric types:
| Goal | Metric Type | Example |
|---|---|---|
| Conversion | Custom conversion | checkout-completed |
| Performance | Custom numeric | page-load-time-ms |
| Engagement | Custom conversion | feature-clicked |
| Revenue | Custom numeric | order-value |
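If a metric from the table doesn't exist yet, create it with create-metric before building the experiment. The payload below is a sketch for a custom conversion metric; field names such as kind, eventKey, and isNumeric are assumptions based on LaunchDarkly's metrics API and may differ in your MCP server's schema:
{
  "projectKey": "my-project",
  "key": "checkout-completed",
  "name": "Checkout Completed",
  "kind": "custom",
  "eventKey": "checkout-completed",
  "isNumeric": false
}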
Step 2: Create the Experiment
Use create-experiment with:
- projectKey and environmentKey -- where to run the experiment
- name -- descriptive name for the experiment
- flagKey -- the feature flag being experimented on
- metrics -- array of metric objects with key and isGroup fields
- treatments -- array of treatments, each with a name, baseline flag, and parameters
- holdout (optional) -- percentage of traffic to exclude
{
"projectKey": "my-project",
"environmentKey": "production",
"name": "Checkout Flow v2 Experiment",
"flagKey": "checkout-flow-v2",
"metrics": [
{"key": "checkout-completed", "isGroup": false},
{"key": "checkout-time-seconds", "isGroup": false}
],
"treatments": [
{
"name": "Control",
"baseline": true,
"parameters": {
"flagKey": "checkout-flow-v2",
"variationId": "variation-a-id"
}
},
{
"name": "New Checkout",
"baseline": false,
"parameters": {
"flagKey": "checkout-flow-v2",
"variationId": "variation-b-id"
}
}
]
}
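The example above omits the optional holdout. To reserve a slice of traffic for baseline measurement, add it as a top-level field; the shape shown here (an integer percentage) is an assumption, so check the tool's schema before relying on it:
"holdout": 10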
Step 3: Start Data Collection
Use start-experiment-iteration to begin collecting data:
{
"projectKey": "my-project",
"environmentKey": "production",
"experimentKey": "checkout-flow-v2-experiment"
}
Optionally set reshuffle: true to redistribute traffic across treatments.
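For example, assuming reshuffle is accepted as a top-level field alongside the keys above, a start that redistributes traffic might look like:
{
  "projectKey": "my-project",
  "environmentKey": "production",
  "experimentKey": "checkout-flow-v2-experiment",
  "reshuffle": true
}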
Step 4: Verify
- Use get-experiment to confirm the experiment is running (see the example after this list)
- Check that all treatments are listed correctly
- Verify metrics are attached
- Confirm the iteration status shows as active
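A verification call is sketched below, assuming get-experiment takes the same identifiers as start-experiment-iteration:
{
  "projectKey": "my-project",
  "environmentKey": "production",
  "experimentKey": "checkout-flow-v2-experiment"
}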
Report results:
- Experiment created and iteration started
- N treatments with M metrics configured
- Data collection is active
Edge Cases
| Situation | Action |
|---|---|
| Metric doesn't exist | Create it first with create-metric |
| Flag has no variations | Create flag variations before setting up treatments |
| Experiment already exists | Use list-experiments to find it, then get-experiment for details |
| Need to change metrics mid-experiment | Stop the current iteration, update, then start a new one |
What NOT to Do
- Don't start an experiment without clearly defined metrics
- Don't stop experiments too early — wait for statistical significance
- Don't run multiple experiments on the same flag simultaneously without careful holdout design
- Don't forget to set a baseline treatment — one treatment must be marked baseline: true