# Compare Test Case

## Quick Start
You'll typically receive a test case identifier and two branches. Follow these steps:
- Run `tuist test case show <id-or-identifier> --json` to get the test case metrics.
- Run `tuist test case run list <identifier> --json` to see runs across branches.
- Compare behavior between base and head branches.
- Inspect failures with `tuist test case run show <run-id> --json`.
- Summarize findings with root cause analysis.
## Step 1: Resolve the Test Case

### By ID or dashboard URL

```sh
tuist test case show <test-case-id> --json
```

### By identifier (Module/Suite/TestCase)

```sh
tuist test case show Module/Suite/TestCase --json
```

### If no test case is provided

Discover flaky or failing tests to investigate:

```sh
tuist test case list --flaky --json --page-size 10
```
Key fields from the response:

- `id` -- unique identifier for subsequent commands
- `name`, `module_name`, `suite_name` -- the test identity
- `reliability_rate` -- percentage of successful runs
- `flakiness_rate` -- percentage of flaky runs in the last 30 days
- `total_runs` / `failed_runs` -- volume context
- `is_flaky` / `is_quarantined` -- current flags
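As a quick sanity check of those fields, the show output can be parsed and turned into the `Module/Suite/TestCase` identity used elsewhere in this skill. The field names below come from the list above; the surrounding JSON shape and the sample values are assumptions for illustration, not the actual tuist output.

```python
import json

# Hypothetical payload from `tuist test case show <id> --json`; only the
# field names are taken from the docs, the shape and values are made up.
raw = """
{
  "id": "tc_123",
  "name": "test_login_with_expired_token",
  "module_name": "AuthModuleTests",
  "suite_name": "LoginTests",
  "reliability_rate": 85.0,
  "flakiness_rate": 10.0,
  "total_runs": 20,
  "failed_runs": 3,
  "is_flaky": true,
  "is_quarantined": false
}
"""

case = json.loads(raw)

# Rebuild the Module/Suite/TestCase identifier accepted by `tuist test case show`.
identity = f"{case['module_name']}/{case['suite_name']}/{case['name']}"
print(identity)
print(f"reliability {case['reliability_rate']:.0f}%, flaky={case['is_flaky']}")
```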
## Step 2: Get Runs on Each Branch

List test case runs filtered by the test case, and look at the `git_branch` field:

```sh
tuist test case run list <identifier> --json --page-size 20
```
Separate runs by branch. For each branch, compute:

- Pass rate: `passed_runs / total_runs * 100`
- Average duration
- Flaky run count
- Most recent status
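The grouping above can be sketched as a small helper. The run field names (`git_branch`, `status`, `duration`, `is_flaky`) and the newest-first ordering are assumptions about the run-list JSON, not a documented schema:

```python
from collections import defaultdict

def branch_metrics(runs):
    """Group run records by git_branch and compute the Step 2 numbers:
    pass rate, average duration, flaky count, most recent status."""
    by_branch = defaultdict(list)
    for run in runs:
        by_branch[run["git_branch"]].append(run)

    metrics = {}
    for branch, items in by_branch.items():
        passed = sum(1 for r in items if r["status"] == "success")
        metrics[branch] = {
            "pass_rate": passed / len(items) * 100,
            "avg_duration": sum(r["duration"] for r in items) / len(items),
            "flaky_runs": sum(1 for r in items if r.get("is_flaky")),
            "last_status": items[0]["status"],  # assumes newest-first ordering
        }
    return metrics

# Fabricated runs for illustration; real data comes from
# `tuist test case run list <identifier> --json`.
runs = [
    {"git_branch": "main", "status": "success", "duration": 0.3, "is_flaky": False},
    {"git_branch": "main", "status": "success", "duration": 0.3, "is_flaky": False},
    {"git_branch": "feature/x", "status": "failure", "duration": 0.5, "is_flaky": True},
    {"git_branch": "feature/x", "status": "success", "duration": 0.5, "is_flaky": False},
]
m = branch_metrics(runs)
print(m["main"]["pass_rate"])       # 100.0
print(m["feature/x"]["pass_rate"])  # 50.0
```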
### Defaults

- If no base branch is provided, use the project's default branch (usually `main`).
- If no head branch is provided, detect the current git branch.
## Step 3: Compare Branch Behavior
| Metric | Base branch | Head branch | Verdict |
|---|---|---|---|
| Pass rate | e.g. 100% | e.g. 60% | REGRESSION |
| Avg duration | e.g. 0.5s | e.g. 2.1s | REGRESSION |
| Flaky runs | 0 | 3 | NEW FLAKINESS |
| Last status | success | failure | REGRESSION |
Classify the change:
- Newly failing: 100% pass rate on base, <100% on head
- Newly flaky: No flaky runs on base, flaky runs on head
- Duration regression: >50% increase in average duration
- Fixed: Failing on base, passing on head
- Stable: Same behavior on both branches
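The classification rules can be expressed as one function over the per-branch metrics. This is a minimal sketch, assuming metric dicts with `pass_rate` (percent), `avg_duration` (seconds), and `flaky_runs` (count); the rule ordering (flakiness checked first) is a judgment call, not part of the skill:

```python
def classify(base, head):
    """Apply the Step 3 rules to base/head metric dicts."""
    if base["flaky_runs"] == 0 and head["flaky_runs"] > 0:
        return "NEWLY FLAKY"            # no flaky runs on base, flaky runs on head
    if base["pass_rate"] == 100 and head["pass_rate"] < 100:
        return "NEWLY FAILING"          # 100% pass on base, <100% on head
    if base["pass_rate"] < 100 and head["pass_rate"] == 100:
        return "FIXED"                  # failing on base, passing on head
    if head["avg_duration"] > base["avg_duration"] * 1.5:
        return "DURATION REGRESSION"    # >50% increase in average duration
    return "STABLE"

base = {"pass_rate": 100.0, "avg_duration": 0.5, "flaky_runs": 0}
head = {"pass_rate": 60.0, "avg_duration": 0.5, "flaky_runs": 3}
print(classify(base, head))  # NEWLY FLAKY
```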
## Step 4: Inspect Failures

For each failing run on the head branch:

```sh
tuist test case run show <test-case-run-id> --json
```
Examine:

- `failures[].message` -- the assertion or error message
- `failures[].path` -- source file path
- `failures[].line_number` -- exact line of failure
- `failures[].issue_type` -- type of issue
- `repetitions` -- shows retry behavior (e.g., pass-fail-pass means flaky)
- `crash_report` -- crash data if the test runner crashed
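The pass-fail-pass heuristic for `repetitions` is simple enough to sketch. This assumes each repetition record carries a `status` field, which is an assumption about the run-show JSON rather than a documented shape:

```python
def is_intermittent(repetitions):
    """A run whose repetitions mix passes and failures (e.g. pass-fail-pass)
    points at flakiness rather than a deterministic break."""
    statuses = {r["status"] for r in repetitions}
    return "success" in statuses and "failure" in statuses

# Fabricated repetition records for illustration.
reps = [{"status": "success"}, {"status": "failure"}, {"status": "success"}]
print(is_intermittent(reps))  # True
```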
## Step 5: Identify Root Cause
Based on the comparison:
### Newly failing
- Check commits between base and head branches for changes to the test file or the code under test.
- Look at the failure message for clues about what changed.
### Newly flaky
- Common patterns: timing/async issues, shared state, environment dependencies.
- Check if `repetitions` shows intermittent pass/fail patterns.
- See the fix-flaky-tests skill for detailed flaky test analysis patterns.
### Duration regression
- Check if setup/teardown time increased.
- Check if the test is doing more work (new assertions, larger data sets).
- Check if a dependency became slower.
## Summary Format
Produce a summary with:
- Test case info: Name, module, suite, overall reliability.
- Base branch behavior: Pass rate, avg duration, flaky count.
- Head branch behavior: Pass rate, avg duration, flaky count.
- Verdict: What changed and classification.
- Root cause: Hypothesis based on failure analysis.
- Recommendations: Specific file paths, line numbers, and fix suggestions.
Example:

```
Test Case Comparison: AuthModuleTests/LoginTests/test_login_with_expired_token
Overall reliability: 85% (was 100% before head branch)

Base (main):
  Pass rate: 100% (15/15 runs)
  Avg duration: 0.3s
  Flaky: No

Head (feature/auth-refactor):
  Pass rate: 60% (3/5 runs)
  Avg duration: 0.5s
  Flaky: Yes (2 flaky runs)

Verdict: NEWLY FLAKY -- test was stable on main but intermittently fails on feature branch

Root cause: The auth refactor introduced an async token refresh that races with the
test's synchronous assertion. Failures show "Expected status 401, got nil" at
Tests/AuthModuleTests/LoginTests.swift:42, suggesting the response arrives before
the token refresh completes.

Recommendations:
- Add an await/expectation before the assertion at LoginTests.swift:42
- Consider mocking the token refresh to make the test deterministic
```
## Done Checklist
- Resolved the test case identity
- Gathered runs on both branches
- Compared pass rates, durations, and flakiness
- Inspected failure details for failing runs
- Identified root cause with file paths and line numbers
- Provided actionable fix recommendations