Regression Consistency Checker

Check whether a new version of a repository preserves the behavior observed by tests on the old version.

Workflow

1. Prepare Versions

Set up old version:

# Tag or note the old version
git tag old-version

# Or checkout specific commit
git checkout <old-commit-hash>

Set up new version:

# Tag the new version
git tag new-version

# Or checkout new commit
git checkout <new-commit-hash>

Ensure clean environment:

  • Same dependencies installed
  • Same test configuration
  • Same environment variables
  • Deterministic test execution (fix random seeds, mock time)
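Seed pinning can be done in code before the suite runs. A minimal sketch, assuming plain `random`-based nondeterminism (the helper name is illustrative, not part of any framework):

```python
import os
import random

def make_deterministic(seed: int = 1234) -> None:
    """Pin the random seed; export PYTHONHASHSEED so child processes agree too."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)

make_deterministic()
first = [random.random() for _ in range(3)]
make_deterministic()
second = [random.random() for _ in range(3)]
assert first == second  # two "runs" produce identical values
```

For NumPy, time, or network nondeterminism you would extend this with the corresponding seeding or mocking calls.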

2. Run Tests on Old Version

Capture baseline results:

# Python (pytest with JSON report; requires the pytest-json-report plugin)
git checkout old-version
pytest --json-report --json-report-file=old_results.json

# JavaScript (Jest with JSON report)
git checkout old-version
npm test -- --json --outputFile=old_results.json

# Run multiple times to check stability
pytest --json-report --json-report-file=old_results_1.json
pytest --json-report --json-report-file=old_results_2.json
# Compare the two reports to confirm the runs are deterministic

Verify baseline stability:

  • All tests should pass (or document known failures)
  • Results should be consistent across runs
  • No flaky tests
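Stability can be checked mechanically by diffing per-test outcomes across two runs. A sketch, assuming the pytest-json-report layout (a top-level `tests` list with `nodeid` and `outcome` fields):

```python
import json

def run_outcomes(path: str) -> dict:
    """Map test nodeid -> outcome from a pytest-json-report file."""
    with open(path) as f:
        report = json.load(f)
    return {t["nodeid"]: t["outcome"] for t in report.get("tests", [])}

def is_stable(path_a: str, path_b: str) -> bool:
    """True when both runs produced identical per-test outcomes."""
    return run_outcomes(path_a) == run_outcomes(path_b)
```

Any test appearing in `is_stable`'s diff across identical runs is flaky and should be fixed or excluded before comparing versions.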

3. Run Tests on New Version

Capture new results:

# Python
git checkout new-version
pytest --json-report --json-report-file=new_results.json

# JavaScript
git checkout new-version
npm test -- --json --outputFile=new_results.json

Note any immediate failures:

  • Tests that now fail
  • New errors or exceptions
  • Changed behavior

4. Compare Results

Use comparison script:

python scripts/compare_results.py old_results.json new_results.json

# With custom tolerance for floats
python scripts/compare_results.py old_results.json new_results.json --tolerance 0.001

# Save detailed report
python scripts/compare_results.py old_results.json new_results.json --output regression_report.json

Script detects:

  • 🔴 Critical: Tests that passed now fail, missing tests
  • 🟠 High: Different outputs for same inputs
  • 🟡 Medium: Different exception types
  • 🔵 Low: Changed error messages
  • ✅ Improvements: Tests that now pass, bug fixes
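The critical and improvement buckets above boil down to outcome transitions between runs. A minimal sketch of that classification (a simplification of what the full script does):

```python
def categorize(old: dict, new: dict) -> dict:
    """Classify outcome transitions between two runs (test id -> outcome maps)."""
    report = {"critical": [], "missing": [], "improved": [], "unchanged": []}
    for test, outcome in old.items():
        if test not in new:
            report["missing"].append(test)      # critical: test disappeared
        elif outcome == "passed" and new[test] != "passed":
            report["critical"].append(test)     # pass -> fail regression
        elif outcome != "passed" and new[test] == "passed":
            report["improved"].append(test)     # test now passes
        else:
            report["unchanged"].append(test)
    return report
```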

5. Analyze Regressions

For each regression, determine:

Is it a true regression?

  • Unintended behavior change
  • Bug introduced
  • Performance degradation
  • Breaking change

Or is it expected?

  • Intentional behavior change
  • Bug fix that changes output
  • Improved error handling
  • Refactoring with equivalent behavior

Review strategies in detection_strategies.md.

6. Investigate Root Causes

For critical regressions:

# Find commits that caused regression
git bisect start
git bisect bad new-version
git bisect good old-version
# Let git automatically test each candidate commit
git bisect run pytest path/to/failing_test.py
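`git bisect run` classifies each commit by the command's exit status: 0 means good, 125 means skip, and any other code from 1 to 127 means bad. A wrapper script (the file name and install command are illustrative) lets you skip commits that fail to build instead of wrongly marking them bad:

```shell
# Write a wrapper so unbuildable commits are skipped rather than marked bad.
cat > bisect_test.sh <<'EOF'
#!/bin/sh
pip install -e . >/dev/null 2>&1 || exit 125   # cannot build: tell bisect to skip
exec pytest path/to/failing_test.py            # exit 0 = good, nonzero = bad
EOF
chmod +x bisect_test.sh
# Then drive the bisect with: git bisect run ./bisect_test.sh
```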

For output differences:

  • Compare function inputs/outputs
  • Check for changed algorithms
  • Verify data transformations
  • Review calculation logic

For exception changes:

  • Check error handling code
  • Verify exception types
  • Review validation logic
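Under the severity scheme above, a changed exception type is medium severity and a changed message is low. A sketch of that check (the function name is illustrative):

```python
def classify_exception_change(old_exc: BaseException, new_exc: BaseException) -> str:
    """Map an exception difference onto the severity levels used here."""
    if type(old_exc) is not type(new_exc):
        return "medium"   # different exception type
    if str(old_exc) != str(new_exc):
        return "low"      # same type, changed message
    return "unchanged"
```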

7. Document Findings

Create regression report:

REGRESSION ANALYSIS REPORT
==========================

Version Comparison: v1.0.0 → v1.1.0
Date: 2024-01-15
Tests Run: 156

SUMMARY
-------
Critical Regressions: 2
High Severity: 5
Medium Severity: 3
Low Severity: 8
Improvements: 4
Unchanged: 134

CRITICAL REGRESSIONS
--------------------
1. test_user_authentication
   - Status: PASS → FAIL
   - Error: KeyError: 'user_id'
   - Root Cause: Removed field from response
   - Action: Restore field or update API contract

2. test_payment_processing
   - Status: PASS → FAIL
   - Error: AssertionError: expected 100.00, got 100.01
   - Root Cause: Rounding change in calculation
   - Action: Fix rounding logic

HIGH SEVERITY REGRESSIONS
--------------------------
1. test_data_export
   - Output changed: CSV format → JSON format
   - Impact: Breaking change for consumers
   - Action: Maintain backward compatibility

[... continue for all regressions ...]

EXPECTED CHANGES
----------------
1. test_error_messages
   - Error messages now include more context
   - Intentional improvement
   - Action: Update baseline

RECOMMENDATIONS
---------------
1. Fix critical regressions before release
2. Review high severity changes with team
3. Document breaking changes in changelog
4. Update tests for intentional changes

8. Fix or Accept Changes

Fix true regressions:

# Fix the code
git checkout new-version
# Make fixes
git commit -m "Fix: regression in user authentication"

# Re-run tests
pytest --json-report --json-report-file=fixed_results.json

# Verify fix
python scripts/compare_results.py old_results.json fixed_results.json

Accept intentional changes:

# Update baseline
cp new_results.json baseline_results.json

# Document in changelog
echo "- Changed: CSV export now returns JSON" >> CHANGELOG.md

Quick Reference

Regression Types

Output Regressions:

  • Function returns different values
  • Data format changes
  • Calculation differences

Exception Regressions:

  • New exceptions raised
  • Different exception types
  • Changed error messages

State Regressions:

  • Different database state
  • Different files created
  • Different side effects

Performance Regressions:

  • Slower execution
  • Higher memory usage
  • More API calls
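Slower execution can be flagged with a small timing harness (a sketch; the 2x threshold and repeat counts are illustrative):

```python
import timeit

def is_perf_regression(old_fn, new_fn, threshold: float = 2.0, number: int = 100) -> bool:
    """Flag new_fn as a regression when it is over `threshold` times slower.

    Uses best-of-3 timings to damp scheduler noise.
    """
    old_t = min(timeit.repeat(old_fn, number=number, repeat=3))
    new_t = min(timeit.repeat(new_fn, number=number, repeat=3))
    return new_t > threshold * old_t
```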

Severity Levels

Critical (block release):

  • Test passed → failed
  • Data corruption
  • Security issues
  • Crashes

High (fix before release):

  • Wrong outputs
  • Breaking API changes
  • Major performance degradation (>2x)

Medium (review and decide):

  • Minor output changes
  • Moderate performance degradation (50-100%)
  • Changed error messages

Low (document):

  • Cosmetic changes
  • Minor performance changes (<50%)
  • Log message changes

Comparison Strategies

Exact comparison:

old_output == new_output

Approximate comparison (floats):

abs(old_output - new_output) < tolerance

Structural comparison (ignore fields):

# Ignore timestamps, IDs
compare_ignoring_fields(old, new, ['timestamp', 'id'])

Semantic comparison (order-independent):

# Compare as sets (requires hashable elements; ignores duplicates as well as order)
set(old_list) == set(new_list)
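These strategies combine into a few lines of Python. Note that `compare_ignoring_fields` is not a standard library function; a minimal sketch:

```python
import math

def compare_ignoring_fields(old: dict, new: dict, ignore: list) -> bool:
    """Structural comparison: drop volatile fields, then compare exactly."""
    strip = lambda d: {k: v for k, v in d.items() if k not in ignore}
    return strip(old) == strip(new)

# Approximate comparison for floats
assert math.isclose(100.00, 100.0005, abs_tol=0.001)

# Structural comparison: timestamps differ, payload matches
old = {"total": 100.0, "timestamp": "2024-01-15T10:00:00"}
new = {"total": 100.0, "timestamp": "2024-01-16T09:30:00"}
assert compare_ignoring_fields(old, new, ["timestamp"])
```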

Helper Script

The compare_results.py script automates comparison:

# Basic comparison
python scripts/compare_results.py old_results.json new_results.json

# Custom float tolerance
python scripts/compare_results.py old_results.json new_results.json --tolerance 0.001

# Save detailed report
python scripts/compare_results.py old_results.json new_results.json --output report.json

Supported formats:

  • pytest JSON report
  • Jest JSON report
  • Generic JSON format

Output includes:

  • Categorized regressions by severity
  • Specific test failures
  • Output diffs
  • Exception changes
  • Improvements

Best Practices

Ensure deterministic tests:

  • Fix random seeds
  • Mock current time
  • Mock external APIs
  • Sort non-deterministic outputs
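Sorting non-deterministic outputs can be done with a small normalizer applied to both sides before comparison (a sketch; the recursion and the `repr` sort key are one possible convention):

```python
def normalize(value):
    """Recursively sort order-insensitive collections for stable comparison."""
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in sorted(value.items())}
    if isinstance(value, (list, set)):
        return sorted((normalize(v) for v in value), key=repr)
    return value
```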

Run multiple times:

  • Verify baseline stability
  • Catch flaky tests
  • Ensure reproducibility

Isolate changes:

  • Test one change at a time
  • Use git bisect for root cause
  • Compare specific commits

Document expectations:

  • Maintain changelog
  • Note intentional changes
  • Update test baselines

Automate checks:

  • Run in CI/CD pipeline
  • Block on critical regressions
  • Generate reports automatically