counterexample-generator
Counterexample Generator
Systematically generate counterexamples that demonstrate why verification fails. Provides concrete input values, execution traces, and diagnostic information to help understand and fix specification or implementation issues.
Core Capabilities
1. Precondition Violation Detection
Generate inputs that violate preconditions:
- Invalid parameter values - Out-of-range inputs
- Null/undefined references - Missing required objects
- State violations - Invalid object states
- Type mismatches - Incorrect types
- Constraint violations - Broken business rules
2. Postcondition Failure Analysis
Find executions where postconditions fail:
- Return value violations - Wrong return values
- State inconsistencies - Incorrect final states
- Invariant breaks - Invariants violated after execution
- Side effect errors - Unexpected modifications
- Exception failures - Wrong or missing exceptions
3. Invariant Violation Discovery
Identify cases where invariants break:
- Class invariant violations - Object consistency broken
- Loop invariant violations - Invariant not maintained
- Data structure violations - Consistency broken
- Temporal violations - Time-based properties fail
- Concurrency violations - Race conditions exposed
4. Execution Trace Generation
Produce detailed execution paths:
- Step-by-step traces - Line-by-line execution
- State snapshots - Variable values at each step
- Branch decisions - Which paths taken
- Call stacks - Function call hierarchy
- Symbolic execution - Constraint-based paths
Counterexample Generation Workflow
Step 1: Identify the Failure
Understand what verification failed:
From verification tools:
Verification failed: Postcondition violated
Function: withdraw
Line: 15
Property: account.balance == old(account.balance) - amount
From assertions:
AssertionError: assert result >= 0
in function: sqrt
with input: x = -1
From test failures:
Test failed: test_binary_search
Expected: 2
Actual: -1
Analysis questions:
- What property failed?
- Where did it fail?
- What was being verified?
- What are the inputs/state?
Step 2: Analyze the Specification
Understand what should be true:
Example specification:
def withdraw(account: Account, amount: float) -> float:
"""
Preconditions:
- account is not None
- amount > 0
- account.balance >= amount
Postconditions:
- account.balance == old(account.balance) - amount
- result == account.balance
- account.balance >= 0
"""
Identify constraints:
- Precondition:
account.balance >= amount - Postcondition:
account.balance == old(account.balance) - amount - Invariant:
account.balance >= 0
Step 3: Generate Counterexample Inputs
Find concrete values that cause failure:
Manual generation:
# Try to violate postcondition
# If amount = 100, old balance = 50
# Then: 50 - 100 = -50 (violates balance >= 0 invariant!)
counterexample = {
'account': Account(balance=50),
'amount': 100
}
Systematic generation:
- Start with boundary values
- Try edge cases
- Use constraint solving
- Employ random testing
- Apply mutation testing
Step 4: Execute and Trace
Run the counterexample and capture details:
Execution trace:
Initial state:
account.balance = 50
amount = 100
Step 1: Enter withdraw function
Precondition check: account.balance >= amount?
50 >= 100 → False
Should raise error, but code doesn't check!
Step 2: Execute: account.balance -= amount
account.balance = 50 - 100 = -50
Step 3: Return account.balance
result = -50
Postcondition check:
account.balance == old(account.balance) - amount?
-50 == 50 - 100 → True (passes)
account.balance >= 0?
-50 >= 0 → False (FAILS!)
COUNTEREXAMPLE FOUND:
Invariant violation: balance became negative
Step 5: Present the Counterexample
Format for clarity and debugging:
Counterexample report:
## Counterexample: Invariant Violation in withdraw()
### Failed Property
Invariant: account.balance >= 0
### Inputs
- account.balance = 50
- amount = 100
### Execution Trace
1. Initial: account.balance = 50
2. Check: balance >= amount → 50 >= 100 → False
(Missing precondition enforcement!)
3. Execute: balance -= amount → balance = -50
4. Return: -50
### Why It Fails
The function doesn't enforce the precondition that
`account.balance >= amount`. When amount > balance,
the withdrawal proceeds anyway, violating the
invariant that balance must be non-negative.
### Root Cause
Missing validation:
```python
if account.balance < amount:
raise InsufficientFundsError()
Fix
Add precondition check before withdrawal.
## Counterexample Patterns
### Pattern 1: Boundary Value Violation
**Specification:**
```python
def sqrt(x: float) -> float:
"""
Precondition: x >= 0
Postcondition: result >= 0 and abs(result * result - x) < 1e-10
"""
return x ** 0.5
Verification failure:
Postcondition violated when x < 0
Counterexample:
# Counterexample
input: x = -1
# Execution:
result = (-1) ** 0.5 = (1.5707963267948966e-308+6.123233995736766e-17j)
# Returns complex number, not float!
# Why it fails:
# Python returns complex for sqrt of negative
# Violates postcondition: result should be float
# Trace:
Initial state: x = -1
Step 1: Compute (-1) ** 0.5
Step 2: Result is complex number
Step 3: Return complex (type violation)
# Fix needed:
if x < 0:
raise ValueError("Cannot compute sqrt of negative number")
Pattern 2: Off-By-One Error
Specification:
def binary_search(arr: List[int], target: int) -> int:
"""
Precondition: arr is sorted
Postcondition:
If result >= 0: arr[result] == target
If result == -1: target not in arr
"""
left, right = 0, len(arr) - 1
while left <= right:
mid = (left + right) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
left = mid # BUG: should be mid + 1
else:
right = mid - 1
return -1
Counterexample:
# Counterexample
input:
arr = [1, 2, 3, 4, 5]
target = 5
# Execution trace:
Iteration 1:
left = 0, right = 4, mid = 2
arr[2] = 3 < 5
left = 2 # Bug! Infinite loop starts
Iteration 2:
left = 2, right = 4, mid = 3
arr[3] = 4 < 5
left = 3
Iteration 3:
left = 3, right = 4, mid = 3
arr[3] = 4 < 5
left = 3 # Stuck! left doesn't advance
... (infinite loop)
# Why it fails:
Setting left = mid instead of left = mid + 1
causes infinite loop when target is in right half
# Minimal counterexample:
arr = [1, 2], target = 2
→ Infinite loop
Pattern 3: Invariant Violation
Specification:
class Stack:
"""
Invariant:
- 0 <= size <= capacity
- size == len(items)
"""
def __init__(self, capacity):
self.capacity = capacity
self.items = []
self.size = 0
def push(self, item):
self.items.append(item)
self.size += 1 # BUG: doesn't check capacity
Counterexample:
# Counterexample
Initial state:
capacity = 2
items = []
size = 0
Operations:
push(1) → items = [1], size = 1 ✓
push(2) → items = [1, 2], size = 2 ✓
push(3) → items = [1, 2, 3], size = 3 ✗
# Invariant violation:
size > capacity
3 > 2 → FAILS
# Why it fails:
No check that size < capacity before push
# Fix:
def push(self, item):
if self.size >= self.capacity:
raise StackOverflowError()
self.items.append(item)
self.size += 1
Pattern 4: State Inconsistency
Specification:
class BankAccount:
"""
Invariant: balance >= 0
"""
def __init__(self, balance):
self.balance = balance
def transfer_to(self, other, amount):
"""
Postcondition:
self.balance == old(self.balance) - amount
other.balance == old(other.balance) + amount
"""
self.balance -= amount
# BUG: Missing other.balance += amount
Counterexample:
# Counterexample
Initial state:
account1.balance = 100
account2.balance = 50
Operation:
account1.transfer_to(account2, 30)
# Execution trace:
Step 1: account1.balance -= 30
account1.balance = 70
Step 2: [Missing code!]
account2.balance unchanged = 50
Final state:
account1.balance = 70
account2.balance = 50
# Postcondition check:
account2.balance == old(account2.balance) + amount?
50 == 50 + 30?
50 == 80? → FALSE
# Money disappeared:
old total = 100 + 50 = 150
new total = 70 + 50 = 120
Lost: 30
# Counterexample demonstrates:
Money is lost, violates conservation invariant
Pattern 5: Race Condition
Specification:
class Counter:
"""
Thread-safe counter
Invariant: value equals number of increments
"""
def __init__(self):
self.value = 0
def increment(self):
# BUG: Not atomic
temp = self.value
temp += 1
self.value = temp
Counterexample:
# Counterexample (concurrent execution)
Initial state:
value = 0
Thread 1: Thread 2:
temp1 = value (0)
temp2 = value (0)
temp1 += 1 (1)
temp2 += 1 (1)
value = temp1 (1)
value = temp2 (1)
Final state:
value = 1
# Expected:
2 increments → value should be 2
# Actual:
value = 1
# Why it fails:
Non-atomic read-modify-write
Lost update: Thread 2 overwrites Thread 1
# Interleaving trace:
T1: Read value=0
T2: Read value=0
T1: Compute 0+1=1
T2: Compute 0+1=1
T1: Write value=1
T2: Write value=1 (overwrites!)
Result: value=1 (should be 2)
Pattern 6: Loop Invariant Violation
Specification:
def find_max(arr: List[int]) -> int:
"""
Loop invariant:
max_val is the maximum of arr[0:i]
Postcondition:
result is maximum of all elements
"""
max_val = arr[0]
for i in range(len(arr)): # BUG: should be range(1, len(arr))
if arr[i] > max_val:
max_val = arr[i]
return max_val
Counterexample:
# Counterexample
input: arr = [5, 3, 8, 2]
# Execution trace:
Initial: max_val = 5, i = 0
Iteration 1 (i=0):
arr[0] > max_val?
5 > 5? → False
max_val remains 5
Iteration 2 (i=1):
arr[1] > max_val?
3 > 5? → False
max_val remains 5
Iteration 3 (i=2):
arr[2] > max_val?
8 > 5? → True
max_val = 8
Iteration 4 (i=3):
arr[3] > max_val?
2 > 8? → False
max_val remains 8
Result: 8 ✓ (correct, but inefficient)
# Better counterexample showing actual bug:
input: arr = [1, 2, 3, 5, 4]
If we compare with itself at i=0:
Loop invariant violated on first iteration
Compares element with itself unnecessarily
# Real bug revealed with:
input: arr = [] # Empty array
Execution:
max_val = arr[0] → IndexError!
# Counterexample demonstrates:
Missing check for empty array
Pattern 7: Arithmetic Overflow
Specification:
def midpoint(left: int, right: int) -> int:
"""
Postcondition: left <= result <= right
"""
return (left + right) // 2 # BUG: Overflow possible
Counterexample:
# Counterexample (in languages with fixed-size integers)
input:
left = 2_000_000_000
right = 2_000_000_000
# Execution:
Step 1: left + right = 4_000_000_000
(Overflow in 32-bit signed integer! Max = 2_147_483_647)
Wraps to negative: -294_967_296
Step 2: -294_967_296 // 2 = -147_483_648
Result: -147_483_648
# Postcondition check:
left <= result <= right?
2_000_000_000 <= -147_483_648 <= 2_000_000_000?
FALSE
# Why it fails:
Integer overflow in addition
# Fix:
return left + (right - left) // 2
Counterexample Generation Techniques
Technique 1: Boundary Value Analysis
Test at boundaries of valid ranges:
# For specification: 0 <= x <= 100
Test values:
- Minimum: 0
- Just above minimum: 1
- Just below maximum: 99
- Maximum: 100
- Below minimum: -1 (counterexample)
- Above maximum: 101 (counterexample)
Technique 2: Constraint Solving
Use SMT solvers to find satisfying inputs:
# Specification:
# Precondition: x > 0 and y > 0
# Postcondition: result > x and result > y
# Find counterexample where postcondition fails:
# We want: NOT (result > x and result > y)
# Which is: result <= x OR result <= y
# SMT constraint:
# x > 0 AND y > 0 AND (result <= x OR result <= y)
# Solver finds:
x = 10, y = 5
result = 7 # 7 <= 10, so postcondition fails
Technique 3: Symbolic Execution
Execute code symbolically to find violations:
def abs_diff(a, b):
if a > b:
return a - b
else:
return b - a
# Symbolic execution:
Path 1: a > b
Constraint: a > b
Return: a - b
Path 2: a <= b
Constraint: a <= b
Return: b - a
# Check postcondition: result >= 0
Path 1: a - b >= 0? (a > b, so yes)
Path 2: b - a >= 0? (a <= b, so yes)
# No counterexample found (correct!)
Technique 4: Mutation Testing
Mutate code and find tests that don't catch it:
# Original:
def is_even(n):
return n % 2 == 0
# Mutant:
def is_even(n):
return n % 2 == 1 # Mutation: == to ==
# Counterexample for specification:
input: n = 4
expected: True (even)
actual: False (mutant says odd)
# This shows missing test case
Counterexample Report Template
## Counterexample Report
### Failed Property
[Precondition/Postcondition/Invariant that failed]
### Location
File: [filename]
Function: [function name]
Line: [line number]
### Counterexample Inputs
[Concrete input values that trigger the failure]
### Expected Behavior
[What should happen according to specification]
### Actual Behavior
[What actually happened]
### Execution Trace
[Step-by-step execution showing the failure]
### Root Cause
[Why the failure occurs]
### Suggested Fix
[How to fix the issue]
### Minimal Example
[Simplest input that demonstrates the problem]
Example Complete Report
## Counterexample Report
### Failed Property
Postcondition: result >= 0
### Location
File: math_utils.py
Function: sqrt
Line: 12
### Counterexample Inputs
```python
x = -4
Expected Behavior
According to specification:
- Should raise ValueError for x < 0
- Should only return non-negative float values
Actual Behavior
result = sqrt(-4)
# Returns: (1.2246467991473532e-16+2j)
# Type: <class 'complex'>
Function returns a complex number instead of raising an exception.
Execution Trace
1. Function called with x = -4
2. Precondition check: x >= 0? → False (should fail here!)
3. No explicit check implemented
4. Compute: (-4) ** 0.5
5. Python evaluates to complex: 2j
6. Return complex number
7. Postcondition: result >= 0? → Type error (can't compare complex to int)
Root Cause
Missing precondition enforcement. The function assumes input validation but doesn't implement it. Python's ** operator returns complex numbers for negative bases with fractional exponents.
Suggested Fix
def sqrt(x: float) -> float:
if x < 0:
raise ValueError(f"Cannot compute sqrt of negative number: {x}")
return x ** 0.5
Minimal Example
Simplest failing input:
sqrt(-1) # Any negative number fails
Additional Test Cases
Based on this counterexample, add tests:
def test_sqrt_negative():
with pytest.raises(ValueError):
sqrt(-1)
def test_sqrt_zero():
assert sqrt(0) == 0
def test_sqrt_positive():
assert abs(sqrt(4) - 2.0) < 1e-10
## Best Practices
1. **Start simple** - Find minimal counterexamples first
2. **Be concrete** - Use specific values, not symbolic
3. **Show execution** - Provide step-by-step traces
4. **Explain clearly** - Make the failure obvious
5. **Suggest fixes** - Show how to resolve the issue
6. **Test the fix** - Verify the counterexample is resolved
7. **Generalize** - Identify similar potential failures
8. **Document** - Record counterexamples for regression testing
9. **Automate** - Use tools to generate counterexamples
10. **Verify minimal** - Ensure counterexample is simplest possible
## Tools and Techniques
### Static Analysis Tools
- **Dafny** - Auto-generates counterexamples for failed proofs
- **Frama-C** - C verification with counterexample generation
- **Z3/SMT solvers** - Constraint-based counterexample finding
- **CBMC** - Bounded model checking for C
### Dynamic Analysis
- **Property-based testing** - Hypothesis, QuickCheck
- **Fuzzing** - AFL, LibFuzzer
- **Concolic execution** - KLEE, Symbolic PathFinder
- **Mutation testing** - Mutmut, PIT
### Manual Techniques
- Boundary value analysis
- Equivalence partitioning
- State transition testing
- Path coverage analysis
## Common Counterexample Scenarios
### Scenario 1: Missing Validation
```python
# Bug: No input validation
def divide(a, b):
return a / b
# Counterexample: b = 0 → ZeroDivisionError
Scenario 2: Wrong Boundary
# Bug: Off-by-one in condition
if x <= 100: # Should be x < 100
# Counterexample: x = 100 (incorrectly included)
Scenario 3: Type Confusion
# Bug: Assumes integer
def double(x):
return x * 2
# Counterexample: x = "hello" → "hellohello" (string, not number)
Scenario 4: Unhandled Case
# Bug: Missing else
if x > 0:
return 1
elif x < 0:
return -1
# Missing: x == 0
# Counterexample: x = 0 → None (should return 0)
Scenario 5: Resource Leak
# Bug: File not closed on exception
def read_file(path):
f = open(path)
data = f.read()
# Missing: f.close()
return data
# Counterexample: Exception during read → file not closed