mobile-verification
Mobile Verification Skill
Comprehensive testing workflow with pass@k metrics for Android development reliability.
Philosophy
Single test runs lie.
A test that passes once might fail tomorrow. Verification loops run tests multiple times to reveal:
- Flaky tests (timing issues, async problems)
- Intermittent failures (resource contention)
- Reliability trends (improving vs degrading)
Pass@k Explained
Pass@k = proportion of the k iterations of a test that passed
Pass@3(test) = runs_passed / 3
testLogin(): ✓✓✓ → Pass@3 = 3/3 = 1.0 (100%)
testLogout(): ✓✓✗ → Pass@3 = 2/3 = 0.67 (67%)
testRefresh(): ✗✗✗ → Pass@3 = 0/3 = 0.0 (0%)
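The calculation above can be sketched in a few lines of Kotlin. This is a minimal illustration, not the skill's actual implementation; passAtK is a hypothetical helper name, and each Boolean stands for one run of the same test.

```kotlin
// Hypothetical helper: fraction of runs that passed, as defined in this document.
// Each element of `runs` is the outcome of one iteration of the same test.
fun passAtK(runs: List<Boolean>): Double {
    require(runs.isNotEmpty()) { "At least one run is required" }
    return runs.count { it }.toDouble() / runs.size
}
```

For example, passAtK(listOf(true, true, false)) evaluates to 2/3, matching the testLogout() row.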
Verification Levels
Quick Verification (k=2)
Purpose: Fast feedback during development
Usage: /mobile-verify --k=2
Time: ~2 minutes
When: After small changes, before commit
Standard Verification (k=3)
Purpose: Standard confidence level
Usage: /mobile-verify --k=3
Time: ~5 minutes
When: Before push, after feature complete
Thorough Verification (k=5)
Purpose: High confidence, flaky detection
Usage: /mobile-verify --k=5
Time: ~10 minutes
When: Before release, after refactor
Release Verification (k=10)
Purpose: Maximum confidence
Usage: /mobile-verify --k=10
Time: ~20 minutes
When: Production release, critical bugs
Test Type Strategies
Unit Tests (JUnit)
Characteristics:
- Fast: ~1-2 seconds per test
- Isolated: No Android dependencies
- Reliable: An individual test should reach Pass@k = 1.0
Target Pass@k: ≥ 0.95 (95%) across the suite
Common Flaky Causes:
- Async operations without proper waiting
- Date/time dependencies
- Random data generation
- Static state leakage
Fix Strategies:
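import kotlinx.coroutines.test.advanceUntilIdle
import kotlinx.coroutines.test.runTest
import org.junit.Assert.assertTrue
import org.junit.Test

// Bad: Flaky. The assertion can run before loadData() has finished its coroutine work.
@Test
fun testLoadData() {
    viewModel.loadData()
    assertTrue(viewModel.state.value is Loaded)
}

// Good: Stable. runTest plus advanceUntilIdle() lets pending coroutines finish first.
// Assumes the ViewModel's coroutines run on a test dispatcher (e.g. via Dispatchers.setMain).
@Test
fun testLoadData() = runTest {
    viewModel.loadData()
    advanceUntilIdle()
    assertTrue(viewModel.state.value is Loaded)
}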
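The same idea applies to the date/time and random-data causes listed above: pass the clock and the random source in, so the test controls them. What follows is a minimal sketch in plain Kotlin; TokenService and its members are hypothetical names used only for illustration.

```kotlin
import java.time.Clock
import java.time.Duration
import java.time.Instant
import java.time.ZoneOffset
import kotlin.random.Random

// Hypothetical class under test: depends on "now" and on random data.
// Both are injected so a test can pin them to known values.
class TokenService(
    private val clock: Clock = Clock.systemUTC(),
    private val random: Random = Random.Default,
) {
    fun isExpired(issuedAt: Instant, ttl: Duration): Boolean =
        Instant.now(clock).isAfter(issuedAt.plus(ttl))

    fun newToken(): Int = random.nextInt(0, 1_000_000)
}

// In a test: a fixed clock and a seeded Random make every run identical.
val fixedNow: Instant = Instant.parse("2024-01-01T00:00:00Z")
val service = TokenService(
    clock = Clock.fixed(fixedNow, ZoneOffset.UTC),
    random = Random(42),
)
```

With these defaults, production code keeps using the system clock and the default random source, while tests get the same answer on every iteration.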
UI Tests (Espresso)
Characteristics:
- Slow: ~5-10 seconds per test
- Device-dependent: Need emulator/device
- Fragile: UI changes break tests
Target Pass@k: ≥ 0.80 (80%)
Common Flaky Causes:
- Idling resource not registered
- Animation interference
- Screen rotation
- Network timeouts
Fix Strategies:
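// Register idling resources. IdlingResource is an interface, not an annotation,
// so the instance is registered with IdlingRegistry around each test.
val countingIdlingResource = CountingIdlingResource("api")
@Before
fun registerIdlingResource() { IdlingRegistry.getInstance().register(countingIdlingResource) }
@After
fun unregisterIdlingResource() { IdlingRegistry.getInstance().unregister(countingIdlingResource) }
// Disable animations. DisableAnimationsRule is a project-defined TestRule, not part of Espresso.
@get:Rule
val disableAnimationsRule = DisableAnimationsRule()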
Compose Tests
Characteristics:
- Fast: ~1-3 seconds per test
- UI-level: Tests Composable behavior
- Modern: Uses Compose Testing framework
Target Pass@k: ≥ 0.90 (90%)
Common Flaky Causes:
- Recomposition timing
- State hoisting issues
- Animation interference
Fix Strategies:
@Composable
fun TestComposable(content: @Composable () -> Unit) {
    CompositionLocalProvider(
        LocalInspectionMode provides true
    ) {
        content()
    }
}
Verification Workflow
During Development
# 1. Write test
# 2. Quick verify
/mobile-verify --class=NewTest --k=2
# 3. Fix if fails
# 4. Standard verify
/mobile-verify --class=NewTest --k=3
Before Commit
# Verify the first changed module (assumes each module is a top-level directory)
/mobile-verify --module=$(git diff --name-only | cut -d/ -f1 | sort -u | head -1) --k=2
Before Push
# Full verification
/mobile-verify --k=3
Before Release
# Thorough verification with flaky detection
/mobile-verify --k=5 --flaky
Interpreting Results
Pass@k Scores
| Score | Meaning | Action |
|---|---|---|
| 1.0 | Perfect | Celebrate |
| 0.8 to <1.0 | Excellent | Monitor |
| 0.6 to <0.8 | Good | Investigate |
| 0.4 to <0.6 | Fair | Fix needed |
| <0.4 | Poor | Block release |
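The table can be turned into a small lookup. The sketch below is illustrative only: Verdict and interpret are hypothetical names, and it assumes a score that lands between two listed ranges belongs to the lower band.

```kotlin
// Hypothetical mapping from a Pass@k score to the table's meaning and action.
// Assumption: a score between two listed ranges falls into the lower band.
data class Verdict(val meaning: String, val action: String)

fun interpret(score: Double): Verdict {
    require(score in 0.0..1.0) { "Pass@k must be between 0.0 and 1.0" }
    return when {
        score >= 1.0 -> Verdict("Perfect", "Celebrate")
        score >= 0.8 -> Verdict("Excellent", "Monitor")
        score >= 0.6 -> Verdict("Good", "Investigate")
        score >= 0.4 -> Verdict("Fair", "Fix needed")
        else -> Verdict("Poor", "Block release")
    }
}
```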
Trends
Track pass@k over time:
Week 1: Pass@3 = 0.85
Week 2: Pass@3 = 0.87 ↗ Improving
Week 3: Pass@3 = 0.82 ↘ Degraded - investigate!
Week 4: Pass@3 = 0.88 ↗ Recovered
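A week-over-week comparison like the one above could be automated roughly as follows. This is a sketch under simple assumptions: trendLabels is a hypothetical helper that compares each week only with the one before it, so it labels Week 4 as "Improving" instead of "Recovered".

```kotlin
// Hypothetical helper: label each week's Pass@k relative to the previous week.
// The first entry has no predecessor, so it gets a "Baseline" label.
fun trendLabels(weekly: List<Double>): List<String> =
    weekly.mapIndexed { i, score ->
        when {
            i == 0 -> "Baseline"
            score > weekly[i - 1] -> "Improving"
            score < weekly[i - 1] -> "Degraded"
            else -> "Unchanged"
        }
    }
```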
Flaky Test Patterns
| Pattern | Likely Cause |
|---|---|
| Fails on iteration 1 only | Cold start issue |
| Fails randomly | Async timing |
| Fails on specific iteration | Resource leak |
| Fails in parallel only | Shared state |
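Some of these patterns can be spotted directly from one test's sequence of results. The following is a rough first-pass classifier, not part of the skill itself; classifyRuns is a hypothetical name, and the labels it returns are only hints for where to look.

```kotlin
// Hypothetical first-pass classifier for one test's k results (true = passed).
// It only covers what a single sequence of runs can show; "fails in parallel
// only" needs a comparison between serial and parallel runs and is not handled.
fun classifyRuns(runs: List<Boolean>): String {
    require(runs.isNotEmpty()) { "At least one run is required" }
    val failed = runs.indices.filter { !runs[it] }
    return when {
        failed.isEmpty() -> "Stable"
        failed.size == runs.size -> "Consistently failing (not flaky)"
        failed == listOf(0) -> "Fails on iteration 1 only: likely cold start issue"
        else -> "Intermittent: compare failing iterations across sessions"
    }
}
```

Telling "fails randomly" apart from "fails on specific iteration" requires comparing the failing positions from several verification sessions, which this sketch leaves to the reader.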
Fixing Flaky Tests
Step 1: Identify Pattern
/mobile-verify --flaky --k=10
Look for patterns in failures.
Step 2: Add Diagnostics
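@Test
fun flakyTest() = runTest {
    val startTime = System.currentTimeMillis()
    // ... test code ...
    val duration = System.currentTimeMillis() - startTime
    // println works in both local and instrumented tests; android.util.Log is
    // not available in local JVM unit tests unless it is mocked.
    println("Test duration: $duration ms") // Check for timing issues
}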
Step 3: Apply Fix
Common fixes:
- Add advanceUntilIdle() for coroutines
- Add IdlingResource for network
- Disable animations for UI tests
- Use @UiThreadTest for main thread work
- Add explicit waits for async operations
Step 4: Verify Fix
/mobile-verify --class=FixedTest --k=5
Target: Pass@5 = 1.0
Integration
With Checkpoints
Create checkpoint before verification:
/mobile-checkpoint save pre-verify
/mobile-verify --k=3
With Memory
Track pass@k in memory:
{
"test-coverage": {
"passAt3": 0.87,
"trend": "improving",
"flakyTests": []
}
}
With Instincts
Learn testing patterns:
{
"id": "test-coroutine-async",
"description": "Always use runTest + advanceUntilIdle for ViewModel tests",
"confidence": 0.95
}
Thresholds by Context
| Context | Pass@k Threshold | Rationale |
|---|---|---|
| Unit tests | 0.95 | Should be deterministic |
| UI tests | 0.80 | More fragile, device-dependent |
| Compose tests | 0.90 | Better than Espresso, more stable |
| Integration tests | 0.70 | Complex, more variables |
| E2E tests | 0.60 | Full system, many variables |
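These thresholds lend themselves to a simple gate. A minimal sketch, with the threshold values copied from the table; TestContext and meetsThreshold are hypothetical names, not part of the skill.

```kotlin
// Thresholds copied from the table above; the enum and function names are hypothetical.
enum class TestContext(val threshold: Double) {
    UNIT(0.95), UI(0.80), COMPOSE(0.90), INTEGRATION(0.70), E2E(0.60)
}

// True when the measured Pass@k meets or exceeds the threshold for its context.
fun meetsThreshold(context: TestContext, passAtK: Double): Boolean =
    passAtK >= context.threshold
```

A score of 0.87 would pass for UI tests but fail for unit tests, which is the point of keeping thresholds per context.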
Best Practices
- Start High, Go Low: Use k=5 for investigation, k=3 for routine
- Fix Flaky Fast: Don't tolerate flaky tests
- Track Trends: Monitor pass@k over time
- Context Matters: UI tests can have lower thresholds than unit
- Block Release: Failed verification should block releases
Remember: A test that sometimes passes is worse than no test at all. It gives false confidence.