directed-test-input-generator
Directed Test Input Generator
Generate test inputs that target specific code paths and hard-to-reach behaviors using program analysis, coverage feedback, and LLM-driven semantic understanding.
Overview
Directed test input generation combines multiple techniques to create test inputs that explore specific execution paths:
- Path Analysis: Extract control flow paths and their constraints
- Constraint Solving: Generate inputs satisfying path conditions
- Coverage Guidance: Use coverage feedback to iteratively reach new paths
- LLM Semantic Understanding: Leverage code understanding for meaningful inputs
Quick Start
Basic Workflow
# 1. Analyze code to extract paths
from scripts.path_analyzer import analyze_code_paths
paths = analyze_code_paths(source_code)
# 2. Generate inputs for each path
from scripts.input_generator import generate_test_suite
test_suite = generate_test_suite(paths)
# 3. Execute tests and measure coverage
for path_id, test_data in test_suite.items():
result = execute_test(function, test_data['inputs'])
verify_coverage(result, test_data['target_line'])
Core Techniques
1. Path Analysis and Extraction
Extract execution paths and their constraints from code:
from scripts.path_analyzer import analyze_code_paths, print_paths
source = """
def validate_age(age, country):
if age < 0:
raise ValueError("Invalid age")
if age < 18:
return "minor"
if age >= 65 and country == "US":
return "senior_us"
return "adult"
"""
paths = analyze_code_paths(source)
print_paths(paths)
# Output:
# Path #0: exception handler (ValueError) (line 3)
# Conditions:
# - age < 0
#
# Path #1: if branch (line 5)
# Conditions:
# - age >= 0
# - age < 18
#
# Path #2: if branch (line 7)
# Conditions:
# - age >= 0
# - age >= 18
# - age >= 65
# - country == US
2. Constraint-Based Input Generation
Generate inputs that satisfy specific path constraints:
from scripts.input_generator import TestInputGenerator
generator = TestInputGenerator()
# Generate input for path: age >= 65 AND country == "US"
constraints = {
"age": ["age >= 65"],
"country": ["country == US"]
}
inputs = generator.generate_for_path(constraints)
# Result: {"age": 65, "country": "US"}
3. Edge Case Generation
Generate boundary values systematically:
from scripts.input_generator import EdgeCaseGenerator
# Generate edge cases for integer type
edge_cases = EdgeCaseGenerator.generate_edge_cases(int)
# [0, -1, 1, -2147483648, 2147483647, ...]
# Generate boundary values around a threshold
boundaries = EdgeCaseGenerator.generate_boundary_values(">", 65)
# [64, 65, 66, 67] - values around the boundary
4. Coverage-Guided Generation
Iteratively generate inputs to maximize coverage:
def coverage_guided_testing(function, max_iterations=100):
"""
Generate test inputs guided by coverage feedback.
"""
covered_lines = set()
test_corpus = []
# Start with initial inputs
current_inputs = generate_seed_inputs(function)
for i in range(max_iterations):
# Execute and measure coverage
new_coverage = execute_with_coverage(function, current_inputs)
if new_coverage - covered_lines:
# New coverage reached - save inputs
test_corpus.append(current_inputs)
covered_lines.update(new_coverage)
# Mutate inputs to explore new paths
current_inputs = mutate_toward_uncovered(current_inputs, covered_lines)
return test_corpus
See coverage_strategies.md for detailed coverage-guided strategies.
5. LLM-Driven Semantic Generation
Use LLM understanding of code semantics to generate meaningful inputs:
# Example: LLM generates semantically appropriate inputs
function_code = """
def book_flight(passenger_age, departure_date, destination_country):
# ... booking logic
"""
# LLM understands parameter semantics and generates realistic inputs:
llm_generated = [
{
"passenger_age": 35, # Adult
"departure_date": "2026-06-15", # Future date
"destination_country": "US" # Valid country code
},
{
"passenger_age": 8, # Child passenger
"departure_date": "2026-07-20",
"destination_country": "UK"
},
{
"passenger_age": 72, # Senior citizen
"departure_date": "2026-08-10",
"destination_country": "CA"
}
]
See llm_patterns.md for comprehensive LLM-driven generation patterns.
Common Use Cases
Use Case 1: Target Specific Branch
Generate input to reach an uncovered branch:
# Target: age > 100 branch
def check_age(age):
if age > 100:
return "exceptionally_old" # Want to test this
return "normal"
# Analyze paths
paths = analyze_code_paths(check_age_source)
target_path = paths[0] # age > 100 path
# Generate input
generator = TestInputGenerator()
constraints = target_path.get_constraints()
test_input = generator.generate_for_path(constraints)
# Result: {"age": 101}
# Test
assert check_age(**test_input) == "exceptionally_old"
Use Case 2: Systematic Boundary Testing
Test all boundaries in a function:
def calculate_discount(age, is_premium):
if age < 18:
return 0.0
elif age < 65:
return 0.1 if is_premium else 0.05
else:
return 0.2 if is_premium else 0.15
# Generate boundary test suite
boundaries = [
{"age": 17, "is_premium": False}, # Just below 18
{"age": 18, "is_premium": False}, # Exactly 18
{"age": 19, "is_premium": True}, # Just above 18
{"age": 64, "is_premium": False}, # Just below 65
{"age": 65, "is_premium": True}, # Exactly 65
{"age": 66, "is_premium": False}, # Just above 65
]
for inputs in boundaries:
result = calculate_discount(**inputs)
print(f"{inputs} -> {result}")
Use Case 3: Coverage-Driven Exploration
Maximize code coverage through iterative generation:
def complex_function(x, y, z):
if x > 10:
if y < 5:
if z == 0:
return "path_A" # Hard to reach
elif x < 0:
if y > 10:
return "path_B"
return "default"
# Coverage-guided approach
covered = set()
test_inputs = []
# Iteration 1: Try random input
input_1 = {"x": 5, "y": 3, "z": 1}
coverage_1 = execute_with_coverage(complex_function, input_1)
# Covers: default path
# Iteration 2: Mutate toward uncovered (x > 10)
input_2 = {"x": 11, "y": 7, "z": 1}
coverage_2 = execute_with_coverage(complex_function, input_2)
# Covers: x > 10 but not y < 5
# Iteration 3: Refine toward (y < 5)
input_3 = {"x": 11, "y": 4, "z": 1}
coverage_3 = execute_with_coverage(complex_function, input_3)
# Covers: x > 10 and y < 5 but not z == 0
# Iteration 4: Target (z == 0)
input_4 = {"x": 11, "y": 4, "z": 0}
result = complex_function(**input_4)
# SUCCESS: Reached "path_A"
Advanced Strategies
Hybrid Approach: Combining Techniques
def hybrid_test_generation(function_source):
"""
Combine multiple techniques for comprehensive coverage.
"""
# 1. Symbolic analysis - extract paths
paths = analyze_code_paths(function_source)
# 2. Constraint solving - generate initial inputs
symbolic_inputs = [generate_for_path(p.get_constraints()) for p in paths]
# 3. Edge case generation - add boundary values
edge_inputs = generate_edge_cases_for_function(function_source)
# 4. LLM semantic generation - add realistic scenarios
llm_inputs = query_llm_for_realistic_inputs(function_source)
# 5. Coverage-guided refinement - fill gaps
all_inputs = symbolic_inputs + edge_inputs + llm_inputs
coverage = measure_coverage(all_inputs)
# 6. Mutate to reach remaining uncovered paths
refined_inputs = coverage_guided_mutation(all_inputs, coverage)
return refined_inputs
Branch Distance Minimization
Guide input generation toward uncovered branches:
def minimize_branch_distance(target_condition, current_input):
"""
Adjust input to get closer to satisfying target condition.
Example:
Target: x > 100
Current: x = 50
Distance: 100 - 50 + 1 = 51
New input: x = 101 (distance = 0)
"""
variable = target_condition.variable
operator = target_condition.operator
threshold = target_condition.value
if operator == ">":
return {**current_input, variable: threshold + 1}
elif operator == "<":
return {**current_input, variable: threshold - 1}
elif operator == ">=":
return {**current_input, variable: threshold}
elif operator == "<=":
return {**current_input, variable: threshold}
elif operator == "==":
return {**current_input, variable: threshold}
return current_input
Reference Documentation
Detailed Coverage Strategies
See coverage_strategies.md for:
- Branch distance minimization
- Gradient-based input adjustment
- Symbolic constraint solving
- Mutation-based fuzzing
- Hybrid coverage-guided + LLM approaches
LLM-Driven Patterns
See llm_patterns.md for:
- Semantic input generation
- Path-directed generation with LLM
- Domain knowledge integration
- Adversarial input generation
- Property-based test generation
- Multi-step scenario generation
Best Practices
1. Start with Symbolic Analysis
Extract paths first to understand what needs to be tested:
paths = analyze_code_paths(source)
print(f"Found {len(paths)} paths to cover")
2. Generate Diverse Inputs
Combine multiple generation strategies:
- Constraint solving for known paths
- Edge cases for boundaries
- LLM for semantic realism
- Fuzzing for unexpected cases
3. Use Coverage Feedback
Monitor coverage and adjust strategy:
if coverage_improvement < 0.01:
# Switch from symbolic to fuzzing
switch_to_mutation_based()
4. Validate Generated Inputs
Always check that generated inputs are valid:
def validate_input(inputs, function_signature):
required_params = get_parameters(function_signature)
assert all(p in inputs for p in required_params)
5. Prioritize Hard-to-Reach Paths
Focus on paths requiring specific conditions:
# Prioritize deeply nested conditions
priority_paths = [p for p in paths if len(p.conditions) > 3]
Tools Reference
Scripts
path_analyzer.py - Extract control flow paths and constraints from Python code
python scripts/path_analyzer.py < your_code.py
input_generator.py - Generate test inputs satisfying path constraints
python scripts/input_generator.py --paths paths.json
Example Workflow
from scripts.path_analyzer import analyze_code_paths
from scripts.input_generator import generate_test_suite
# 1. Analyze code
source = open("my_module.py").read()
paths = analyze_code_paths(source)
# 2. Generate inputs
test_suite = generate_test_suite(paths)
# 3. Run tests
for path_id, test_data in test_suite.items():
print(f"Testing path {path_id}: {test_data['description']}")
result = my_function(**test_data['inputs'])
print(f" Result: {result}")