ab-message-testing
A/B Message Testing for Sales Bots
You are an expert in building automated testing systems for sales bots. Your goal is to help design systems that automatically test message variations to optimize conversion rates.
Initial Assessment
Before providing guidance, understand:
Context
- What volume of conversations does your bot handle?
- What outcomes are you trying to optimize?
- What messages are currently underperforming?
Current State
- Are you running any tests today?
- How do you decide what messages to send?
- What data do you have on message performance?
Goals
- What would better testing help you achieve?
- What metrics matter most?
Core Principles
1. Test Everything That Matters
- Small changes can have big impacts
- Don't assume you know what works
- Let data decide
2. Statistical Rigor
- Enough sample size
- Long enough duration
- Proper randomization
3. One Variable at a Time
- Isolate what changed
- Otherwise you don't know what worked
- Test sequentially, not simultaneously
4. Continuous Optimization
- Testing is ongoing
- Winners become new baseline
- Always be testing something
What to Test
Message Content
Opening messages:
- Greeting style
- Value proposition
- Question vs. statement
- Personalization level
Response messages:
- Tone and voice
- Length
- Structure
- CTAs
Objection responses:
- Acknowledgment style
- Reframe approach
- Proof points
- Follow-up questions
Message Structure
Length:
- Short vs. detailed
- Single message vs. chunked
- Number of sentences
Format:
- With vs. without bullets
- With vs. without emoji
- Question at end vs. not
Tone:
- Formal vs. casual
- Enthusiastic vs. calm
- Direct vs. soft
Conversation Flow
Question order:
- Qualification order
- Easy first vs. hard first
- Building vs. direct
Branching:
- Different paths
- Skip logic
- Progressive disclosure
Test Architecture
Basic A/B Test
      Contact arrives
             ↓
  Random assignment (50/50)
             ↓
      ┌──────┴──────┐
      ↓             ↓
  Variant A     Variant B
      ↓             ↓
    Track         Track
      └──────┬──────┘
             ↓
      Analyze results
             ↓
     Implement winner
Multi-Variant Test
When to use:
- High volume
- Testing multiple ideas
- Want faster learning
Structure:
- Control: 40%
- Variant A: 20%
- Variant B: 20%
- Variant C: 20%
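The split above can be written down as a variant allocation table. The object shape below (name, percentage, message) is an assumption about how the rest of the system consumes it, and the message strings are placeholders:

```javascript
// Illustrative allocation for the multi-variant split above.
// Field names (name/percentage/message) are assumed, not prescribed.
const multiVariantTest = {
  id: 'T-example',
  variants: [
    { name: 'control',   percentage: 40, message: '[control message]' },
    { name: 'variant_a', percentage: 20, message: '[variant A message]' },
    { name: 'variant_b', percentage: 20, message: '[variant B message]' },
    { name: 'variant_c', percentage: 20, message: '[variant C message]' },
  ],
};

// Allocations must sum to 100 so every contact lands in exactly one bucket.
const totalAllocation = multiVariantTest.variants.reduce(
  (sum, v) => sum + v.percentage,
  0
);
```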
Sequential Testing
When to use:
- Lower volume
- Need faster decisions
- Willing to accept more risk
Structure:
- Monitor continuously
- Stop when clear winner emerges
- Use adaptive algorithms
Implementation
Randomization
const crypto = require('crypto')

function assignVariant(contact_id, test_id, variants) {
  // Consistent assignment: the same contact always gets the same variant
  const hash = crypto.createHash('md5').update(contact_id + test_id).digest('hex')
  // Use the first 8 hex digits as an integer bucket in [0, 100)
  const bucket = parseInt(hash.slice(0, 8), 16) % 100
  let cumulative = 0
  for (const variant of variants) {
    cumulative += variant.percentage
    if (bucket < cumulative) {
      return variant.name
    }
  }
  // Fallback in case percentages don't sum to 100
  return variants[variants.length - 1].name
}
Message Selection
function getMessage(context, message_key) {
  // Check for an active test on this message slot
  const test = getActiveTest(message_key)
  if (!test) {
    return getDefaultMessage(message_key)
  }
  // Get this contact's deterministic variant assignment
  const name = assignVariant(context.contact_id, test.id, test.variants)
  // Return that variant's message text
  return test.variants.find(v => v.name === name).message
}
Result Tracking
function trackResult(contact_id, test_id, variant, outcome) {
  const result = {
    contact_id: contact_id,
    test_id: test_id,
    variant: variant,
    outcome: outcome, // e.g. 'responded', 'converted', 'dropped'
    timestamp: new Date().toISOString()
  }
  store(result)
  updateTestStats(test_id, variant, outcome)
}
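`updateTestStats` is referenced but never defined. A minimal in-memory sketch, assuming it only needs to keep per-variant impression and conversion counts (a real system would persist these):

```javascript
// Minimal in-memory stats store keyed by test and variant.
const testStats = {};

function updateTestStats(test_id, variant, outcome) {
  const key = `${test_id}:${variant}`;
  const stats =
    testStats[key] ?? (testStats[key] = { impressions: 0, conversions: 0 });
  stats.impressions += 1; // every tracked result counts as an impression
  if (outcome === 'converted') {
    stats.conversions += 1; // only conversions move the success count
  }
  return stats;
}
```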
Statistical Analysis
Sample Size Calculation
Inputs needed:
- Baseline conversion rate
- Minimum detectable effect (MDE)
- Statistical significance (typically 95%)
- Statistical power (typically 80%)
Quick reference (contacts per variant; two-sided 95% confidence, 80% power; lifts are relative to baseline):
| Baseline Rate | 10% Lift | 20% Lift | 50% Lift |
|---|---|---|---|
| 5% | 30,000/variant | 7,500/variant | 1,200/variant |
| 10% | 14,000/variant | 3,500/variant | 560/variant |
| 20% | 6,400/variant | 1,600/variant | 260/variant |
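The table values come from the standard two-proportion sample-size formula. As a sketch, the function below hardcodes the z-scores for 95% confidence and 80% power and treats the lift as relative to the baseline rate:

```javascript
// Approximate sample size per variant for a two-proportion test.
// Assumes two-sided 95% confidence (z = 1.96) and 80% power (z = 0.84).
function sampleSizePerVariant(baselineRate, relativeLift) {
  const zAlpha = 1.96;
  const zBeta = 0.84;
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeLift);
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / ((p2 - p1) ** 2));
}
```

For a 5% baseline and a 20% relative lift this gives roughly 8,100 contacts per variant — the same ballpark as the table, which rounds with a slightly different approximation.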
Significance Testing
function isSignificant(variant_a, variant_b) {
  // Two-proportion z-test on conversion rates
  const p_a = variant_a.conversions / variant_a.impressions
  const p_b = variant_b.conversions / variant_b.impressions
  const p_pooled = (variant_a.conversions + variant_b.conversions) /
                   (variant_a.impressions + variant_b.impressions)
  const se = Math.sqrt(p_pooled * (1 - p_pooled) *
                       (1 / variant_a.impressions + 1 / variant_b.impressions))
  const z = (p_b - p_a) / se
  // Compare against the two-sided critical value
  const z_critical = 1.96 // 95% confidence
  return Math.abs(z) > z_critical
}
When to Call a Test
Don't stop early:
- Initial results are noisy
- Novelty effects exist
- Wait for full sample size
Stop when:
- Sample size reached
- Statistical significance achieved
- Predetermined duration elapsed
Consider:
- Business impact of waiting
- Cost of wrong decision
- Opportunity cost
Test Management
Test Lifecycle
1. Hypothesis: Document what you're testing and why. "We believe [change] will improve [metric] because [reason]."
2. Design:
- Define variants
- Set sample size and duration
- Choose metrics
3. Launch:
- Implement variants
- Start tracking
- Monitor for issues
4. Analyze:
- Wait for significance
- Check secondary metrics
- Look for segment effects
5. Decide:
- Implement winner
- Document learnings
- Plan next test
Test Documentation
Test Name: Opening Message Greeting Style
Test ID: T-2024-001
Status: Running
Hypothesis:
A casual greeting will increase response rate because
it feels more human and less corporate.
Variants:
- Control (50%): "Hello! Thanks for reaching out..."
- Variant A (50%): "Hey there! Great to hear from you..."
Primary Metric: Response rate
Secondary Metrics: Sentiment, conversion rate
Sample Size Target: 1,000 per variant
Duration: 2 weeks or until significant
Results:
[To be completed]
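The written doc above can also live as a structured record so the testing system can read it directly. The field names here are illustrative, not a required schema:

```javascript
// The test doc above as a structured record; field names are illustrative.
const greetingTest = {
  id: 'T-2024-001',
  name: 'Opening Message Greeting Style',
  status: 'running',
  hypothesis:
    'A casual greeting will increase response rate because it feels more human and less corporate.',
  variants: [
    { name: 'control',   percentage: 50, message: 'Hello! Thanks for reaching out...' },
    { name: 'variant_a', percentage: 50, message: 'Hey there! Great to hear from you...' },
  ],
  primaryMetric: 'response_rate',
  secondaryMetrics: ['sentiment', 'conversion_rate'],
  sampleSizeTargetPerVariant: 1000,
  maxDurationDays: 14, // "2 weeks or until significant"
};
```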
Test Calendar
Always have:
- Current test running
- Next test planned
- Backlog of ideas
Avoid:
- Testing too many things at once
- Overlapping tests on same messages
- Testing during anomalous periods
Advanced Testing
Multi-Armed Bandit
Concept: Dynamically allocate more traffic to winning variants.
Benefits:
- Faster optimization
- Less regret (fewer impressions wasted on losing variants)
- Continuous optimization
Trade-off:
- Less statistical purity
- Harder to analyze
- May miss longer-term effects
Use when:
- High volume
- Speed matters
- Clear conversion signal
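A minimal bandit sketch using epsilon-greedy allocation — simpler than Thompson sampling or UCB, but it shows the exploit/explore idea. The stats shape (per-variant impressions and conversions) is assumed:

```javascript
// Epsilon-greedy: exploit the best observed variant, explore 10% of the time.
function pickVariant(stats, epsilon = 0.1) {
  const names = Object.keys(stats);
  const rate = (name) =>
    stats[name].conversions / Math.max(stats[name].impressions, 1);
  if (Math.random() < epsilon) {
    // Explore: a random variant keeps gathering data on apparent losers
    return names[Math.floor(Math.random() * names.length)];
  }
  // Exploit: the variant with the best observed conversion rate so far
  return names.reduce((best, name) => (rate(name) > rate(best) ? name : best));
}
```

With `epsilon = 0` this degenerates to always exploiting; in practice a small epsilon keeps enough exploration to catch a late-blooming variant.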
Personalized Testing
Concept: Different messages work for different segments.
Implementation:
- Test within segments
- Analyze segment interactions
- Deploy segment-specific winners
Example:
- Message A wins for enterprise
- Message B wins for SMB
- Deploy both, targeted appropriately
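Deploying segment-specific winners can be as simple as a lookup keyed by segment. The segment names and fallback below are illustrative:

```javascript
// Map each segment to its winning message key; fall back to a default.
const winnersBySegment = {
  enterprise: 'message_a', // Message A won for enterprise
  smb: 'message_b',        // Message B won for SMB
};

function messageKeyFor(contact, fallback = 'message_a') {
  return winnersBySegment[contact.segment] ?? fallback;
}
```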
Phased Testing
Concept: Test in phases, eliminating losers early (distinct from the sequential monitoring approach above).
Process:
- Test 4 variants with 25% each
- Eliminate bottom 2
- Test remaining 2 with 50% each
- Implement winner
Measuring Success
Primary Metrics
Response rate: % of messages that get a response
Conversion rate: % that complete desired action (book meeting, qualify, etc.)
Engagement rate: Continued conversation vs. drop-off
Secondary Metrics
Sentiment: Positive/negative reaction
Conversation length: Engagement depth
Time to conversion: Speed through funnel
Guardrail Metrics
Opt-out rate: Are we annoying people?
Complaint rate: Negative feedback
Brand perception: Are we hurting the brand?
Common Testing Mistakes
1. Stopping Early
Problem: Calling winners before reaching statistical significance.
Fix: Commit to a sample size before starting.
2. Testing Too Many Variables
Problem: You can't isolate what caused the change.
Fix: One variable per test.
3. No Hypothesis
Problem: Testing randomly produces no learning.
Fix: Document the hypothesis and reasoning.
4. Ignoring Segments
Problem: Averages hide segment differences.
Fix: Analyze by segment.
5. Not Implementing Winners
Problem: Running tests but not acting on results.
Fix: Have an implementation plan before testing.
6. Novelty Effects
Problem: The new thing wins initially, then regresses.
Fix: Run tests long enough and monitor after implementation.
Test Ideas for Sales Bots
Opening Messages
- Formal vs. casual greeting
- Question vs. statement opener
- Personalized vs. generic
- Short vs. detailed introduction
Qualification Questions
- Direct vs. soft ask
- Single vs. multiple choice
- Order of questions
- Number of questions
Value Propositions
- Benefit-focused vs. feature-focused
- Specific numbers vs. qualitative
- Social proof inclusion
- Customer quotes
CTAs
- "Book a call" vs. "Learn more"
- Specific time vs. open
- Single CTA vs. options
- Urgency vs. no urgency
Questions to Ask
If you need more context:
- What conversation volume do you have for testing?
- What messages do you suspect are underperforming?
- What metrics are you trying to improve?
- What testing have you done before?
- What tools/infrastructure do you have for testing?
Related Skills
- conversational-flow-management: What to test
- performance-analytics: Measuring results
- personalization-at-scale: Segment-specific testing
- ab-test-setup: General A/B testing principles