fuzz
Fuzz Input Generation
Generate intelligent, context-aware fuzz test inputs by analyzing input parsing code. Produces boundary values, type confusion inputs, encoding edge cases, format-specific attacks, and injection payloads tailored to the specific parser and data types in scope. Output is structured JSON test case sets ready for integration with test harnesses.
Supported Flags
Read ../../shared/schemas/flags.md for the full flag specification.
| Flag | Fuzz Behavior |
|---|---|
--scope |
Identifies which input handlers to generate fuzz inputs for. Default changed. |
--depth quick |
Standard boundary values and common injection strings only. |
--depth standard |
Context-aware inputs based on code analysis of the parser. |
--depth deep |
Standard + format-specific attacks, encoding mutations, and chained payloads. |
--depth expert |
Deep + adversarial inputs designed to bypass specific validation logic found in code. |
--severity |
Generate inputs targeting vulnerabilities at or above this severity. |
--format |
Default json. Use text for human-readable listing. |
Workflow
Step 1: Identify Input Handlers
Locate input parsing and processing code in scope:
- API endpoint handlers: Functions that read request body, query params, headers.
- File parsers: Functions that parse uploaded files, config files, data imports.
- CLI argument parsers: Argument parsing with
argparse,commander,cobra,clap. - Message consumers: Functions processing messages from queues, WebSockets, SSE.
- Deserialization points: JSON.parse, XML parsing, YAML loading, protobuf decoding.
- Database query builders: Functions constructing queries from user input.
For each handler, identify:
- Expected input type (string, number, array, object, file).
- Validation rules (regex, schema, type checks, length limits).
- How the input is used downstream (SQL, shell, HTML, file path, URL, regex).
Step 2: Analyze Input Constraints
Read the code to understand what the parser expects and what it guards against:
- Type expectations: What types does the code assume? Where are type coercions?
- Length limits: Are there explicit length checks? What happens at max length?
- Character restrictions: Are certain characters filtered or escaped? Which ones?
- Format requirements: Does the input need to match a pattern (email, URL, date)?
- Range constraints: Numeric bounds, enum values, allowed file extensions.
- Nested structure: How deep can objects/arrays nest? Are there recursion limits?
Step 3: Generate Boundary Value Inputs
For each input field, generate boundary value test cases:
| Input Type | Boundary Values |
|---|---|
| String | Empty "", single char "a", max length, max length + 1, unicode BOM, null bytes "\x00" |
| Number | 0, -1, MAX_INT, MIN_INT, MAX_INT+1, NaN, Infinity, -Infinity, float precision edge cases |
| Array | Empty [], single element, very large array (10000+), nested arrays, mixed types |
| Object | Empty {}, deeply nested (100+ levels), circular reference attempt, prototype keys |
| Boolean | true, false, 0, 1, "", "false", null, undefined |
| Date | Epoch 0, negative timestamp, far future, invalid dates (Feb 30), timezone edge cases |
| File | Empty file, 0-byte, huge file, wrong extension, polyglot file, symlink |
Step 4: Generate Type Confusion Inputs
Inputs designed to exploit type coercion and type assumption bugs:
Generate inputs that send the wrong type: string where number expected, array where string expected, object with toString override, deeply nested arrays, null where required, boolean where string expected, numeric string where number expected, and prototype/constructor pollution objects (__proto__, constructor.prototype).
Step 5: Generate Encoding Edge Cases
Inputs exploiting encoding and character set handling:
- Unicode: Normalization forms (NFC, NFD, NFKC, NFKD), homoglyphs, right-to-left override, zero-width characters.
- URL encoding: Double encoding (
%2527), mixed encoding, overlong UTF-8. - HTML entities: Named (
&), numeric (&), hex (&), surrogate pairs. - Null bytes: Mid-string null bytes for truncation attacks.
- Line endings:
\r\n,\r,\n,\x0b,\x0c,\x85,\u2028,\u2029. - Case mapping: Turkish locale
I/idotless variants, Germanß/SS.
Step 6: Generate Context-Aware Injection Payloads
Based on how the input is used downstream (identified in Step 1), generate targeted payloads:
| Sink Context | Payload Category |
|---|---|
| SQL query | SQL injection: UNION, boolean blind, time blind, stacked queries, comment-based |
| Shell command | Command injection: semicolons, pipes, backticks, $(), newlines |
| HTML output | XSS: script tags, event handlers, SVG/MathML, template injection |
| File path | Path traversal: ../, null bytes, long paths, reserved names (CON, NUL) |
| URL construction | SSRF: localhost variants, IPv6, DNS rebinding, scheme confusion |
| Regex input | ReDoS: catastrophic backtracking patterns, exponential quantifiers |
| XML parser | XXE: external entity, parameter entity, SSRF via DTD |
| LDAP query | LDAP injection: wildcards, boolean operators, null bytes |
| Header value | Header injection: CRLF, response splitting |
| JSON parser | JSON interoperability: duplicate keys, large numbers, deep nesting |
Step 7: Generate Format-Specific Attacks
At --depth deep and above, generate inputs targeting specific file/data formats:
- JSON: Duplicate keys (parser-dependent behavior), comments, trailing commas, BOM prefix.
- XML: Billion laughs, quadratic blowup, external entities, CDATA abuse.
- YAML: Anchor bombs, merge keys, tag deserialization (
!!python/object). - CSV: Formula injection (
=CMD()), field separator in values, newlines in quoted fields. - JWT: Algorithm none, key confusion (RS256/HS256), expired but valid signature.
- GraphQL: Deep nesting, alias flooding, batch query abuse, introspection.
- Multipart: Boundary manipulation, filename traversal, content-type mismatch.
Step 8: Output Test Case Sets
Organize all generated inputs into structured JSON test case sets:
{
"target": {
"file": "src/api/users.ts",
"function": "createUser",
"input_field": "email",
"expected_type": "string",
"downstream_use": ["sql_query", "html_email"]
},
"generated_at": "2026-02-14T10:30:00Z",
"total_cases": 85,
"test_cases": [
{
"id": "FUZZ-001",
"category": "boundary",
"label": "empty_string",
"input": "",
"expected_behavior": "validation_error",
"targets_cwe": "CWE-20"
},
{
"id": "FUZZ-002",
"category": "injection_sql",
"label": "union_select",
"input": "test@test.com' UNION SELECT * FROM users--",
"expected_behavior": "parameterized_query_prevents_injection",
"targets_cwe": "CWE-89"
}
]
}
Write test case files to .appsec/fuzz/ organized by target.
Output Format
Fuzz inputs are not findings themselves but may reference CWEs they target.
Finding ID prefix: FUZZ (e.g., FUZZ-001) for test case identification.
metadata.tool:"fuzz"
If fuzz testing reveals an actual vulnerability (input causes unexpected behavior), emit a finding using ../../shared/schemas/findings.md.
Pragmatism Notes
- Generate inputs relevant to the actual technology. Do not generate SQL injection payloads for code that never touches a database.
- Respect the
--depthflag. Quick depth should produce 10-20 inputs. Expert depth can produce hundreds. - Label each input clearly so testers understand what it targets and what behavior to expect.
- Mark intentionally dangerous inputs (e.g., billion laughs XML) with a warning about resource consumption.
- These are test inputs, not exploit code. Frame output as defensive testing material.
- If the code already has strong validation visible in the source, generate inputs that specifically test the validation boundaries.
- Include both inputs that should be rejected (malicious) and inputs that should be accepted (edge case valid) to test for false positives in validation.