Directed Test Input Generator

Generate test inputs that target specific code paths and hard-to-reach behaviors using program analysis, coverage feedback, and LLM-driven semantic understanding.

Overview

Directed test input generation combines multiple techniques to create test inputs that explore specific execution paths:

Path Analysis: Extract control flow paths and their constraints
Constraint Solving: Generate inputs satisfying path conditions
Coverage Guidance: Use coverage feedback to iteratively reach new paths
LLM Semantic Understanding: Leverage code understanding for meaningful inputs

Quick Start

Basic Workflow

# 1. Analyze code to extract paths
from scripts.path_analyzer import analyze_code_paths

paths = analyze_code_paths(source_code)

# 2. Generate inputs for each path
from scripts.input_generator import generate_test_suite

test_suite = generate_test_suite(paths)

# 3. Execute tests and measure coverage
for path_id, test_data in test_suite.items():
    result = execute_test(function, test_data['inputs'])
    verify_coverage(result, test_data['target_line'])

Core Techniques

1. Path Analysis and Extraction

Extract execution paths and their constraints from code:

from scripts.path_analyzer import analyze_code_paths, print_paths

source = """
def validate_age(age, country):
    if age < 0:
        raise ValueError("Invalid age")
    if age < 18:
        return "minor"
    if age >= 65 and country == "US":
        return "senior_us"
    return "adult"
"""

paths = analyze_code_paths(source)
print_paths(paths)

# Output:
# Path #0: exception handler (ValueError) (line 3)
#   Conditions:
#     - age < 0
#
# Path #1: if branch (line 5)
#   Conditions:
#     - age >= 0
#     - age < 18
#
# Path #2: if branch (line 7)
#   Conditions:
#     - age >= 0
#     - age >= 18
#     - age >= 65
#     - country == US

2. Constraint-Based Input Generation

Generate inputs that satisfy specific path constraints:

from scripts.input_generator import TestInputGenerator

generator = TestInputGenerator()

# Generate input for path: age >= 65 AND country == "US"
constraints = {
    "age": ["age >= 65"],
    "country": ["country == US"]
}

inputs = generator.generate_for_path(constraints)
# Result: {"age": 65, "country": "US"}

3. Edge Case Generation

Generate boundary values systematically:

from scripts.input_generator import EdgeCaseGenerator

# Generate edge cases for integer type
edge_cases = EdgeCaseGenerator.generate_edge_cases(int)
# [0, -1, 1, -2147483648, 2147483647, ...]

# Generate boundary values around a threshold
boundaries = EdgeCaseGenerator.generate_boundary_values(">", 65)
# [64, 65, 66, 67] - values around the boundary

4. Coverage-Guided Generation

Iteratively generate inputs to maximize coverage:

def coverage_guided_testing(function, max_iterations=100):
    """
    Generate test inputs guided by coverage feedback.
    """
    covered_lines = set()
    test_corpus = []

    # Start with initial inputs
    current_inputs = generate_seed_inputs(function)

    for i in range(max_iterations):
        # Execute and measure coverage
        new_coverage = execute_with_coverage(function, current_inputs)

        if new_coverage - covered_lines:
            # New coverage reached - save inputs
            test_corpus.append(current_inputs)
            covered_lines.update(new_coverage)

        # Mutate inputs to explore new paths
        current_inputs = mutate_toward_uncovered(current_inputs, covered_lines)

    return test_corpus

See coverage_strategies.md for detailed coverage-guided strategies.

5. LLM-Driven Semantic Generation

Use LLM understanding of code semantics to generate meaningful inputs:

# Example: LLM generates semantically appropriate inputs
function_code = """
def book_flight(passenger_age, departure_date, destination_country):
    # ... booking logic
"""

# LLM understands parameter semantics and generates realistic inputs:
llm_generated = [
    {
        "passenger_age": 35,           # Adult
        "departure_date": "2026-06-15",  # Future date
        "destination_country": "US"      # Valid country code
    },
    {
        "passenger_age": 8,            # Child passenger
        "departure_date": "2026-07-20",
        "destination_country": "UK"
    },
    {
        "passenger_age": 72,           # Senior citizen
        "departure_date": "2026-08-10",
        "destination_country": "CA"
    }
]

See llm_patterns.md for comprehensive LLM-driven generation patterns.

Common Use Cases

Use Case 1: Target Specific Branch

Generate input to reach an uncovered branch:

# Target: age > 100 branch
def check_age(age):
    if age > 100:
        return "exceptionally_old"  # Want to test this
    return "normal"

# Analyze paths
paths = analyze_code_paths(check_age_source)
target_path = paths[0]  # age > 100 path

# Generate input
generator = TestInputGenerator()
constraints = target_path.get_constraints()
test_input = generator.generate_for_path(constraints)
# Result: {"age": 101}

# Test
assert check_age(**test_input) == "exceptionally_old"

Use Case 2: Systematic Boundary Testing

Test all boundaries in a function:

def calculate_discount(age, is_premium):
    if age < 18:
        return 0.0
    elif age < 65:
        return 0.1 if is_premium else 0.05
    else:
        return 0.2 if is_premium else 0.15

# Generate boundary test suite
boundaries = [
    {"age": 17, "is_premium": False},  # Just below 18
    {"age": 18, "is_premium": False},  # Exactly 18
    {"age": 19, "is_premium": True},   # Just above 18
    {"age": 64, "is_premium": False},  # Just below 65
    {"age": 65, "is_premium": True},   # Exactly 65
    {"age": 66, "is_premium": False},  # Just above 65
]

for inputs in boundaries:
    result = calculate_discount(**inputs)
    print(f"{inputs} -> {result}")

Use Case 3: Coverage-Driven Exploration

Maximize code coverage through iterative generation:

def complex_function(x, y, z):
    if x > 10:
        if y < 5:
            if z == 0:
                return "path_A"  # Hard to reach
    elif x < 0:
        if y > 10:
            return "path_B"
    return "default"

# Coverage-guided approach
covered = set()
test_inputs = []

# Iteration 1: Try random input
input_1 = {"x": 5, "y": 3, "z": 1}
coverage_1 = execute_with_coverage(complex_function, input_1)
# Covers: default path

# Iteration 2: Mutate toward uncovered (x > 10)
input_2 = {"x": 11, "y": 7, "z": 1}
coverage_2 = execute_with_coverage(complex_function, input_2)
# Covers: x > 10 but not y < 5

# Iteration 3: Refine toward (y < 5)
input_3 = {"x": 11, "y": 4, "z": 1}
coverage_3 = execute_with_coverage(complex_function, input_3)
# Covers: x > 10 and y < 5 but not z == 0

# Iteration 4: Target (z == 0)
input_4 = {"x": 11, "y": 4, "z": 0}
result = complex_function(**input_4)
# SUCCESS: Reached "path_A"

Advanced Strategies

Hybrid Approach: Combining Techniques

def hybrid_test_generation(function_source):
    """
    Combine multiple techniques for comprehensive coverage.
    """
    # 1. Symbolic analysis - extract paths
    paths = analyze_code_paths(function_source)

    # 2. Constraint solving - generate initial inputs
    symbolic_inputs = [generate_for_path(p.get_constraints()) for p in paths]

    # 3. Edge case generation - add boundary values
    edge_inputs = generate_edge_cases_for_function(function_source)

    # 4. LLM semantic generation - add realistic scenarios
    llm_inputs = query_llm_for_realistic_inputs(function_source)

    # 5. Coverage-guided refinement - fill gaps
    all_inputs = symbolic_inputs + edge_inputs + llm_inputs
    coverage = measure_coverage(all_inputs)

    # 6. Mutate to reach remaining uncovered paths
    refined_inputs = coverage_guided_mutation(all_inputs, coverage)

    return refined_inputs

Branch Distance Minimization

Guide input generation toward uncovered branches:

def minimize_branch_distance(target_condition, current_input):
    """
    Adjust input to get closer to satisfying target condition.

    Example:
        Target: x > 100
        Current: x = 50
        Distance: 100 - 50 + 1 = 51

        New input: x = 101 (distance = 0)
    """
    variable = target_condition.variable
    operator = target_condition.operator
    threshold = target_condition.value

    if operator == ">":
        return {**current_input, variable: threshold + 1}
    elif operator == "<":
        return {**current_input, variable: threshold - 1}
    elif operator == ">=":
        return {**current_input, variable: threshold}
    elif operator == "<=":
        return {**current_input, variable: threshold}
    elif operator == "==":
        return {**current_input, variable: threshold}

    return current_input

Reference Documentation

Detailed Coverage Strategies

See coverage_strategies.md for:

Branch distance minimization
Gradient-based input adjustment
Symbolic constraint solving
Mutation-based fuzzing
Hybrid coverage-guided + LLM approaches

LLM-Driven Patterns

See llm_patterns.md for:

Semantic input generation
Path-directed generation with LLM
Domain knowledge integration
Adversarial input generation
Property-based test generation
Multi-step scenario generation

Best Practices

1. Start with Symbolic Analysis

Extract paths first to understand what needs to be tested:

paths = analyze_code_paths(source)
print(f"Found {len(paths)} paths to cover")

2. Generate Diverse Inputs

Combine multiple generation strategies:

Constraint solving for known paths
Edge cases for boundaries
LLM for semantic realism
Fuzzing for unexpected cases

3. Use Coverage Feedback

Monitor coverage and adjust strategy:

if coverage_improvement < 0.01:
    # Switch from symbolic to fuzzing
    switch_to_mutation_based()

4. Validate Generated Inputs

Always check that generated inputs are valid:

def validate_input(inputs, function_signature):
    required_params = get_parameters(function_signature)
    assert all(p in inputs for p in required_params)

5. Prioritize Hard-to-Reach Paths

Focus on paths requiring specific conditions:

# Prioritize deeply nested conditions
priority_paths = [p for p in paths if len(p.conditions) > 3]

Tools Reference

Scripts

path_analyzer.py - Extract control flow paths and constraints from Python code

python scripts/path_analyzer.py < your_code.py

input_generator.py - Generate test inputs satisfying path constraints

python scripts/input_generator.py --paths paths.json

Example Workflow

from scripts.path_analyzer import analyze_code_paths
from scripts.input_generator import generate_test_suite

# 1. Analyze code
source = open("my_module.py").read()
paths = analyze_code_paths(source)

# 2. Generate inputs
test_suite = generate_test_suite(paths)

# 3. Run tests
for path_id, test_data in test_suite.items():
    print(f"Testing path {path_id}: {test_data['description']}")
    result = my_function(**test_data['inputs'])
    print(f"  Result: {result}")

directed-test-input-generator