Regression Consistency Checker

Check whether a new version of a repository preserves the behavior observed by tests on the old version.

Workflow

1. Prepare Versions

Set up old version:

# Tag or note the old version
git tag old-version

# Or checkout specific commit
git checkout <old-commit-hash>

Set up new version:

# Tag the new version
git tag new-version

# Or checkout new commit
git checkout <new-commit-hash>

Ensure clean environment:

  • Same dependencies installed
  • Same test configuration
  • Same environment variables
  • Deterministic test execution (fix random seeds, mock time)
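Seed pinning can be done in code before the suite runs. A minimal sketch, assuming plain `random`-based nondeterminism (the helper name is illustrative, not part of any framework):

```python
import os
import random

def make_deterministic(seed: int = 1234) -> None:
    """Pin the random seed; export PYTHONHASHSEED so child processes agree too."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)

make_deterministic()
first = [random.random() for _ in range(3)]
make_deterministic()
second = [random.random() for _ in range(3)]
assert first == second  # two "runs" produce identical values
```

For NumPy, time, or network nondeterminism you would extend this with the corresponding seeding or mocking calls.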

2. Run Tests on Old Version

Capture baseline results:

# Python (pytest with JSON report; requires the pytest-json-report plugin)
git checkout old-version
pytest --json-report --json-report-file=old_results.json

# JavaScript (Jest with JSON report)
git checkout old-version
npm test -- --json --outputFile=old_results.json

# Run multiple times to check stability
pytest --json-report --json-report-file=old_results_1.json
pytest --json-report --json-report-file=old_results_2.json
# Compare the two reports to confirm the runs are deterministic

Verify baseline stability:

  • All tests should pass (or document known failures)
  • Results should be consistent across runs
  • No flaky tests
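Stability can be checked mechanically by diffing per-test outcomes across two runs. A sketch, assuming the pytest-json-report layout (a top-level `tests` list with `nodeid` and `outcome` fields):

```python
import json

def run_outcomes(path: str) -> dict:
    """Map test nodeid -> outcome from a pytest-json-report file."""
    with open(path) as f:
        report = json.load(f)
    return {t["nodeid"]: t["outcome"] for t in report.get("tests", [])}

def is_stable(path_a: str, path_b: str) -> bool:
    """True when both runs produced identical per-test outcomes."""
    return run_outcomes(path_a) == run_outcomes(path_b)
```

Any test appearing in `is_stable`'s diff across identical runs is flaky and should be fixed or excluded before comparing versions.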

3. Run Tests on New Version

Capture new results:

# Python
git checkout new-version
pytest --json-report --json-report-file=new_results.json

# JavaScript
git checkout new-version
npm test -- --json --outputFile=new_results.json

Note any immediate failures:

  • Tests that now fail
  • New errors or exceptions
  • Changed behavior

4. Compare Results

Use comparison script:

python scripts/compare_results.py old_results.json new_results.json

# With custom tolerance for floats
python scripts/compare_results.py old_results.json new_results.json --tolerance 0.001

# Save detailed report
python scripts/compare_results.py old_results.json new_results.json --output regression_report.json

Script detects:

  • 🔴 Critical: Tests that passed now fail, missing tests
  • 🟠 High: Different outputs for same inputs
  • 🟡 Medium: Different exception types
  • 🔵 Low: Changed error messages
  • ✅ Improvements: Tests that now pass, bug fixes
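The critical and improvement buckets above boil down to outcome transitions between runs. A minimal sketch of that classification (a simplification of what the full script does):

```python
def categorize(old: dict, new: dict) -> dict:
    """Classify outcome transitions between two runs (test id -> outcome maps)."""
    report = {"critical": [], "missing": [], "improved": [], "unchanged": []}
    for test, outcome in old.items():
        if test not in new:
            report["missing"].append(test)      # critical: test disappeared
        elif outcome == "passed" and new[test] != "passed":
            report["critical"].append(test)     # pass -> fail regression
        elif outcome != "passed" and new[test] == "passed":
            report["improved"].append(test)     # test now passes
        else:
            report["unchanged"].append(test)
    return report
```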

5. Analyze Regressions

For each regression, determine:

Is it a true regression?

  • Unintended behavior change
  • Bug introduced
  • Performance degradation
  • Breaking change

Or is it expected?

  • Intentional behavior change
  • Bug fix that changes output
  • Improved error handling
  • Refactoring with equivalent behavior

Review strategies in detection_strategies.md.

6. Investigate Root Causes

For critical regressions:

# Find commits that caused regression
git bisect start
git bisect bad new-version
git bisect good old-version
# Let git automatically test each candidate commit
git bisect run pytest path/to/failing_test.py
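`git bisect run` classifies each commit by the command's exit status: 0 means good, 125 means skip, and any other code from 1 to 127 means bad. A wrapper script (the file name and install command are illustrative) lets you skip commits that fail to build instead of wrongly marking them bad:

```shell
# Write a wrapper so unbuildable commits are skipped rather than marked bad.
cat > bisect_test.sh <<'EOF'
#!/bin/sh
pip install -e . >/dev/null 2>&1 || exit 125   # cannot build: tell bisect to skip
exec pytest path/to/failing_test.py            # exit 0 = good, nonzero = bad
EOF
chmod +x bisect_test.sh
# Then drive the bisect with: git bisect run ./bisect_test.sh
```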

For output differences:

  • Compare function inputs/outputs
  • Check for changed algorithms
  • Verify data transformations
  • Review calculation logic

For exception changes:

  • Check error handling code
  • Verify exception types
  • Review validation logic
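Under the severity scheme above, a changed exception type is medium severity and a changed message is low. A sketch of that check (the function name is illustrative):

```python
def classify_exception_change(old_exc: BaseException, new_exc: BaseException) -> str:
    """Map an exception difference onto the severity levels used here."""
    if type(old_exc) is not type(new_exc):
        return "medium"   # different exception type
    if str(old_exc) != str(new_exc):
        return "low"      # same type, changed message
    return "unchanged"
```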

7. Document Findings

Create regression report:

REGRESSION ANALYSIS REPORT
==========================

Version Comparison: v1.0.0 → v1.1.0
Date: 2024-01-15
Tests Run: 156

SUMMARY
-------
Critical Regressions: 2
High Severity: 5
Medium Severity: 3
Low Severity: 8
Improvements: 4
Unchanged: 134

CRITICAL REGRESSIONS
--------------------
1. test_user_authentication
   - Status: PASS → FAIL
   - Error: KeyError: 'user_id'
   - Root Cause: Removed field from response
   - Action: Restore field or update API contract

2. test_payment_processing
   - Status: PASS → FAIL
   - Error: AssertionError: expected 100.00, got 100.01
   - Root Cause: Rounding change in calculation
   - Action: Fix rounding logic

HIGH SEVERITY REGRESSIONS
--------------------------
1. test_data_export
   - Output changed: CSV format → JSON format
   - Impact: Breaking change for consumers
   - Action: Maintain backward compatibility

[... continue for all regressions ...]

EXPECTED CHANGES
----------------
1. test_error_messages
   - Error messages now include more context
   - Intentional improvement
   - Action: Update baseline

RECOMMENDATIONS
---------------
1. Fix critical regressions before release
2. Review high severity changes with team
3. Document breaking changes in changelog
4. Update tests for intentional changes

8. Fix or Accept Changes

Fix true regressions:

# Fix the code
git checkout new-version
# Make fixes
git commit -m "Fix: regression in user authentication"

# Re-run tests
pytest --json-report --json-report-file=fixed_results.json

# Verify fix
python scripts/compare_results.py old_results.json fixed_results.json

Accept intentional changes:

# Update baseline
cp new_results.json baseline_results.json

# Document in changelog
echo "- Changed: CSV export now returns JSON" >> CHANGELOG.md

Quick Reference

Regression Types

Output Regressions:

  • Function returns different values
  • Data format changes
  • Calculation differences

Exception Regressions:

  • New exceptions raised
  • Different exception types
  • Changed error messages

State Regressions:

  • Different database state
  • Different files created
  • Different side effects

Performance Regressions:

  • Slower execution
  • Higher memory usage
  • More API calls
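Slower execution can be flagged with a small timing harness (a sketch; the 2x threshold and repeat counts are illustrative):

```python
import timeit

def is_perf_regression(old_fn, new_fn, threshold: float = 2.0, number: int = 100) -> bool:
    """Flag new_fn as a regression when it is over `threshold` times slower.

    Uses best-of-3 timings to damp scheduler noise.
    """
    old_t = min(timeit.repeat(old_fn, number=number, repeat=3))
    new_t = min(timeit.repeat(new_fn, number=number, repeat=3))
    return new_t > threshold * old_t
```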

Severity Levels

Critical (block release):

  • Test passed → failed
  • Data corruption
  • Security issues
  • Crashes

High (fix before release):

  • Wrong outputs
  • Breaking API changes
  • Major performance degradation (>2x)

Medium (review and decide):

  • Minor output changes
  • Moderate performance degradation (50-100%)
  • Changed error messages

Low (document):

  • Cosmetic changes
  • Minor performance changes (<50%)
  • Log message changes

Comparison Strategies

Exact comparison:

old_output == new_output

Approximate comparison (floats):

abs(old_output - new_output) < tolerance

Structural comparison (ignore fields):

# Ignore timestamps, IDs
compare_ignoring_fields(old, new, ['timestamp', 'id'])

Semantic comparison (order-independent):

# Compare as sets (requires hashable elements; ignores duplicates as well as order)
set(old_list) == set(new_list)
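These strategies combine into a few lines of Python. Note that `compare_ignoring_fields` is not a standard library function; a minimal sketch:

```python
import math

def compare_ignoring_fields(old: dict, new: dict, ignore: list) -> bool:
    """Structural comparison: drop volatile fields, then compare exactly."""
    strip = lambda d: {k: v for k, v in d.items() if k not in ignore}
    return strip(old) == strip(new)

# Approximate comparison for floats
assert math.isclose(100.00, 100.0005, abs_tol=0.001)

# Structural comparison: timestamps differ, payload matches
old = {"total": 100.0, "timestamp": "2024-01-15T10:00:00"}
new = {"total": 100.0, "timestamp": "2024-01-16T09:30:00"}
assert compare_ignoring_fields(old, new, ["timestamp"])
```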

Helper Script

The compare_results.py script automates comparison:

# Basic comparison
python scripts/compare_results.py old_results.json new_results.json

# Custom float tolerance
python scripts/compare_results.py old_results.json new_results.json --tolerance 0.001

# Save detailed report
python scripts/compare_results.py old_results.json new_results.json --output report.json

Supported formats:

  • pytest JSON report
  • Jest JSON report
  • Generic JSON format

Output includes:

  • Categorized regressions by severity
  • Specific test failures
  • Output diffs
  • Exception changes
  • Improvements

Best Practices

Ensure deterministic tests:

  • Fix random seeds
  • Mock current time
  • Mock external APIs
  • Sort non-deterministic outputs
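Sorting non-deterministic outputs can be done with a small normalizer applied to both sides before comparison (a sketch; the recursion and the `repr` sort key are one possible convention):

```python
def normalize(value):
    """Recursively sort order-insensitive collections for stable comparison."""
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in sorted(value.items())}
    if isinstance(value, (list, set)):
        return sorted((normalize(v) for v in value), key=repr)
    return value
```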

Run multiple times:

  • Verify baseline stability
  • Catch flaky tests
  • Ensure reproducibility

Isolate changes:

  • Test one change at a time
  • Use git bisect for root cause
  • Compare specific commits

Document expectations:

  • Maintain changelog
  • Note intentional changes
  • Update test baselines

Automate checks:

  • Run in CI/CD pipeline
  • Block on critical regressions
  • Generate reports automatically