Encoding Bypass Anti-Pattern

Severity: High

Summary

Encoding bypass evades security checks via alternate encodings. Occurs when validation happens before decoding/normalization. Encoded payload appears safe but becomes malicious after processing. Bypasses WAFs, input filters, enables XSS and SQL injection.

The Anti-Pattern

Flawed order of operations: Validate then Decode/Normalize. Security checks run on encoded data, application later uses decoded version, re-introducing the vulnerability.

BAD Code Example

# VULNERABLE: Validation happens before Unicode normalization.
import unicodedata

def is_safe_username(username):
    # This check is flawed because it doesn't account for Unicode variants.
    if '<' in username or '>' in username:
        return False
    return True

def create_user_profile(username):
    if not is_safe_username(username):
        raise ValueError("Invalid characters in username.")

    # The application later normalizes the username for display or storage.
    # The full-width less-than sign '＜' (U+FF1C) was not caught by the check.
    # It gets normalized into the standard '<' (U+003C), enabling XSS.
    normalized_username = unicodedata.normalize('NFKC', username)

    # This will render the malicious script tag.
    return f"<div>Welcome, {normalized_username}</div>"

# Attacker's input: '＜script＞alert(1)＜/script＞'
# is_safe_username returns True.
# The normalized output becomes '<div>Welcome, <script>alert(1)</script></div>'

GOOD Code Example

# SECURE: Normalize then validate.
import unicodedata

def is_safe_username(username):
    # This check is now effective because it runs on the canonical form of the input.
    if '<' in username or '>' in username:
        return False
    return True

def create_user_profile(username):
    # First, normalize the input to its canonical form.
    normalized_username = unicodedata.normalize('NFKC', username)

    # Then, perform the security validation on the normalized data.
    if not is_safe_username(normalized_username):
        raise ValueError("Invalid characters in username.")

    # Now it's safe to use the normalized username.
    return f"<div>Welcome, {normalized_username}</div>"

JavaScript/Node.js Examples

BAD:

// VULNERABLE: Validation before URL decoding in path traversal
const express = require('express');
const fs = require('fs');
const path = require('path');

app.get('/file/:filename', (req, res) => {
    const filename = req.params.filename;

    // Check for path traversal - but filename is still encoded
    if (filename.includes('..')) {
        return res.status(400).send('Invalid filename');
    }

    // Express automatically decodes URL parameters
    // Attack: filename = "..%2F..%2Fetc%2Fpasswd"
    // After decoding: "../../etc/passwd" - bypasses the check
    const filePath = path.join('/uploads', filename);
    res.sendFile(filePath);
});

// Attack payload: GET /file/..%252F..%252Fetc%252Fpasswd
// Double encoding: %252F becomes %2F, then becomes /

GOOD:

// SECURE: Decode then validate
const express = require('express');
const fs = require('fs');
const path = require('path');

app.get('/file/:filename', (req, res) => {
    // Express already decoded once, but check for double encoding
    let filename = decodeURIComponent(req.params.filename);

    // Normalize to canonical form
    filename = path.normalize(filename);

    // Now validate the normalized path
    if (filename.includes('..') || path.isAbsolute(filename)) {
        return res.status(400).send('Invalid filename');
    }

    // Safe to use
    const filePath = path.join('/uploads', filename);
    res.sendFile(filePath);
});

Java Examples

BAD:

// VULNERABLE: SQL injection via URL decoding bypass
import java.net.URLDecoder;
import java.sql.*;

public void searchUser(String encodedQuery) {
    // Validate before decoding
    if (encodedQuery.contains("'") || encodedQuery.contains("--")) {
        throw new SecurityException("Invalid characters");
    }

    // Decode after validation
    String query = URLDecoder.decode(encodedQuery, "UTF-8");

    // Attack: encodedQuery = "admin%27%20OR%20%271%27%3D%271"
    // After decode: "admin' OR '1'='1" - bypasses the check
    String sql = "SELECT * FROM users WHERE name = '" + query + "'";
    Statement stmt = connection.createStatement();
    ResultSet rs = stmt.executeQuery(sql);
}

GOOD:

// SECURE: Decode then validate (but use parameterized queries)
import java.net.URLDecoder;
import java.sql.*;
import java.util.regex.Pattern;

public void searchUser(String encodedQuery) {
    // Decode to canonical form first
    String query = URLDecoder.decode(encodedQuery, "UTF-8");

    // Validate the decoded form
    if (!Pattern.matches("^[a-zA-Z0-9_]+$", query)) {
        throw new SecurityException("Invalid characters");
    }

    // Use parameterized query (best practice)
    String sql = "SELECT * FROM users WHERE name = ?";
    PreparedStatement stmt = connection.prepareStatement(sql);
    stmt.setString(1, query);
    ResultSet rs = stmt.executeQuery();
}

Detection

Python:

Validation before unicodedata.normalize()
Input checks before urllib.parse.unquote()
Regex patterns before string normalization
HTML entity validation before html.unescape()

JavaScript/Node.js:

Validation before decodeURIComponent()
Path checks before path.normalize()
Express middleware order (validate before decode)
Input checks before Buffer.from(input, 'base64')

Java:

Validation before URLDecoder.decode()
Security checks before Normalizer.normalize()
Input validation before StringEscapeUtils.unescapeHtml()
Path validation before Paths.get().normalize()

PHP:

Validation before urldecode()
Input checks before html_entity_decode()
Path validation before realpath()

Search Patterns:

Grep: normalize\(|decode\(|unescape\(|URLDecoder|decodeURIComponent
Look for validation logic (if statements, regex) before these functions
Check for double decoding: multiple decode calls in sequence
Review web framework routing (automatic decoding may occur)

Common Encoding Bypass Techniques:

URL encoding: %3c for <, %2e%2e%2f for ../
Double URL encoding: %253c for < (decoded twice)
Unicode variants: ＜ (U+FF1C) for <
HTML entities: < or < for <
Unicode escapes: \u003c for <
Mixed encoding: %u003c or %c0%bc for <
Path traversal: ..%2f, ..%5c, %2e%2e/

Prevention

Normalize/decode before validation: Always bring data to its simplest, canonical form before performing any security checks on it.
Use parameterized queries (for SQL) and other safe APIs that handle encoding internally. This is the best defense against injection attacks.
Enforce strict character encoding for all input (e.g., reject any data that is not valid UTF-8).
Be aware of implicit decoding performed by your web framework or libraries and ensure your validation logic runs after it.
Canonicalize paths and URLs before validating them to prevent path traversal attacks.

Testing for Encoding Bypass

Manual Testing:

Test URL encoding: %3cscript%3e, ..%2f..%2f
Test double encoding: %253cscript%253e, ..%252f..%252f
Test Unicode variants: ＜script＞, ．．／
Test HTML entities: <script>, <script>
Test mixed encoding: %u003cscript%u003e
Verify filters catch all encoding variants

Automated Testing:

Static Analysis: Semgrep, CodeQL to detect validation-before-decode patterns
DAST: Burp Suite Intruder with encoding payloads, OWASP ZAP fuzzer
Payload Lists: SecLists encoding bypass payloads
Custom Scripts: Automated encoding variant generation

Example Test:

# Test that validation occurs after normalization
def test_encoding_bypass_prevention():
    # Unicode variant of '<script>'
    malicious_input = "＜script＞alert(1)＜/script＞"

    try:
        create_user_profile(malicious_input)
        assert False, "Should reject encoded malicious input"
    except ValueError:
        pass  # Expected

    # Double URL encoding
    encoded_input = "%253cscript%253e"
    try:
        search_user(encoded_input)
        assert False, "Should reject double-encoded input"
    except SecurityException:
        pass  # Expected

Burp Suite Test:

# Intruder payload positions
GET /file/§..%2f..%2fetc%2fpasswd§

# Payload list (encoding variants)
..%2f..%2f
..%252f..%252f
．．／．．／
%2e%2e%2f%2e%2e%2f

Remediation Steps

Identify decoding operations - Use detection patterns to find decode/normalize functions
Trace data flow - Follow user input from entry to security validation
Check validation order - Verify decode/normalize happens before validation
Reverse order if needed - Move normalization before security checks
Add missing normalization - Insert decode/normalize if absent
Test with encoding variants - Use payload list from Testing section
Verify canonicalization - Ensure all paths/URLs are normalized
Review framework behavior - Check for automatic decoding in web framework

Related Security Patterns & Anti-Patterns

SQL Injection Anti-Pattern: A common goal of encoding bypass attacks.
Cross-Site Scripting (XSS) Anti-Pattern: Often enabled by bypassing filters with encoded payloads.
Path Traversal Anti-Pattern: Can be achieved by using encoded representations of ../.
Unicode Security Anti-Pattern: A collection of issues related to handling Unicode securely.

encoding-bypass-anti-pattern

Encoding Bypass Anti-Pattern

Summary

The Anti-Pattern

BAD Code Example

GOOD Code Example

JavaScript/Node.js Examples

Java Examples

Detection

Prevention

Testing for Encoding Bypass

Remediation Steps

Related Security Patterns & Anti-Patterns

References

More from igbuend/grimbard

tikz

latex

pgfplots

biblatex

ethical-hacking-ethics

amsmath