Encoding Bypass Anti-Pattern
Severity: High
Summary
An encoding bypass evades a security check by delivering the payload in an alternate encoding. It occurs when validation runs before decoding or normalization: the encoded payload looks safe to the check but becomes malicious once the application decodes it. The technique bypasses WAFs and input filters and enables attacks such as XSS and SQL injection.
The Anti-Pattern
The flaw is the order of operations: validate first, then decode or normalize. The security check runs on the encoded data, but the application later uses the decoded version, which re-introduces the vulnerability the check was meant to prevent.
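The ordering problem can be sketched in a few lines of Python, using urllib.parse.unquote as the decoding step. The function names contains_traversal, validate_then_decode, and decode_then_validate are hypothetical and exist only for this illustration.

```python
from urllib.parse import unquote


def contains_traversal(value: str) -> bool:
    """Naive filter: flag any value containing a parent-directory sequence."""
    return ".." in value


def validate_then_decode(raw: str) -> str:
    # Flawed order: the check only ever sees the percent-encoded text.
    if contains_traversal(raw):
        raise ValueError("rejected")
    return unquote(raw)


def decode_then_validate(raw: str) -> str:
    # Correct order: the check sees the same string the application will use.
    decoded = unquote(raw)
    if contains_traversal(decoded):
        raise ValueError("rejected")
    return decoded


payload = "%2e%2e%2fetc%2fpasswd"
leaked = validate_then_decode(payload)  # passes the filter, then decodes to a traversal path
```

The same filter is used in both functions; only the position of the decode step changes, which is the whole difference between the vulnerable and the safe variant.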
BAD Code Example
# VULNERABLE: Validation happens before Unicode normalization.
import unicodedata

def is_safe_username(username):
    # This check is flawed because it doesn't account for Unicode variants.
    if '<' in username or '>' in username:
        return False
    return True

def create_user_profile(username):
    if not is_safe_username(username):
        raise ValueError("Invalid characters in username.")
    # The application later normalizes the username for display or storage.
    # The full-width less-than sign '＜' (U+FF1C) was not caught by the check.
    # It gets normalized into the standard '<' (U+003C), enabling XSS.
    normalized_username = unicodedata.normalize('NFKC', username)
    # This will render the malicious script tag.
    return f"<div>Welcome, {normalized_username}</div>"

# Attacker's input: '＜script＞alert(1)＜/script＞'
# is_safe_username returns True.
# The normalized output becomes '<div>Welcome, <script>alert(1)</script></div>'
GOOD Code Example
# SECURE: Normalize then validate.
import unicodedata

def is_safe_username(username):
    # This check is now effective because it runs on the canonical form of the input.
    if '<' in username or '>' in username:
        return False
    return True

def create_user_profile(username):
    # First, normalize the input to its canonical form.
    normalized_username = unicodedata.normalize('NFKC', username)
    # Then, perform the security validation on the normalized data.
    if not is_safe_username(normalized_username):
        raise ValueError("Invalid characters in username.")
    # Now it's safe to use the normalized username.
    return f"<div>Welcome, {normalized_username}</div>"
JavaScript/Node.js Examples
BAD:
// VULNERABLE: Validation before URL decoding in path traversal
const express = require('express');
const path = require('path');
const app = express();

app.get('/file/:filename', (req, res) => {
  // Check for path traversal - but the raw URL is still percent-encoded
  if (req.originalUrl.includes('..')) {
    return res.status(400).send('Invalid filename');
  }
  // Express automatically decodes URL parameters
  // Attack: GET /file/%2e%2e%2f%2e%2e%2fetc%2fpasswd
  // After decoding: "../../etc/passwd" - bypasses the check
  const filename = req.params.filename;
  const filePath = path.join('/uploads', filename);
  res.sendFile(filePath);
});
// Double encoding variant: GET /file/%252e%252e%252fetc%252fpasswd
// %252e becomes %2e after one decode, then . if any later layer decodes again
GOOD:
// SECURE: Decode then validate
const express = require('express');
const path = require('path');
const app = express();

app.get('/file/:filename', (req, res) => {
  let filename;
  try {
    // Express already decoded once, but check for double encoding
    filename = decodeURIComponent(req.params.filename);
  } catch (err) {
    // decodeURIComponent throws URIError on malformed percent-encoding
    return res.status(400).send('Invalid filename');
  }
  // Normalize to canonical form
  filename = path.normalize(filename);
  // Now validate the normalized path
  if (filename.includes('..') || path.isAbsolute(filename)) {
    return res.status(400).send('Invalid filename');
  }
  // Safe to use
  const filePath = path.join('/uploads', filename);
  res.sendFile(filePath);
});
Java Examples
BAD:
// VULNERABLE: SQL injection via URL decoding bypass
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.sql.*;

// Assumes a `connection` field holding an open java.sql.Connection.
public void searchUser(String encodedQuery) throws SQLException {
    // Validate before decoding
    if (encodedQuery.contains("'") || encodedQuery.contains("--")) {
        throw new SecurityException("Invalid characters");
    }
    // Decode after validation (the Charset overload requires Java 10+)
    String query = URLDecoder.decode(encodedQuery, StandardCharsets.UTF_8);
    // Attack: encodedQuery = "admin%27%20OR%20%271%27%3D%271"
    // After decode: "admin' OR '1'='1" - bypasses the check
    String sql = "SELECT * FROM users WHERE name = '" + query + "'";
    Statement stmt = connection.createStatement();
    ResultSet rs = stmt.executeQuery(sql);
}
GOOD:
// SECURE: Decode then validate (but use parameterized queries)
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.sql.*;
import java.util.regex.Pattern;

// Assumes a `connection` field holding an open java.sql.Connection.
public void searchUser(String encodedQuery) throws SQLException {
    // Decode to canonical form first (the Charset overload requires Java 10+)
    String query = URLDecoder.decode(encodedQuery, StandardCharsets.UTF_8);
    // Validate the decoded form
    if (!Pattern.matches("^[a-zA-Z0-9_]+$", query)) {
        throw new SecurityException("Invalid characters");
    }
    // Use parameterized query (best practice)
    String sql = "SELECT * FROM users WHERE name = ?";
    PreparedStatement stmt = connection.prepareStatement(sql);
    stmt.setString(1, query);
    ResultSet rs = stmt.executeQuery();
}
Detection
Python:
- Validation before unicodedata.normalize()
- Input checks before urllib.parse.unquote()
- Regex patterns before string normalization
- HTML entity validation before html.unescape()
JavaScript/Node.js:
- Validation before decodeURIComponent()
- Path checks before path.normalize()
- Express middleware order (validate before decode)
- Input checks before Buffer.from(input, 'base64')
Java:
- Validation before URLDecoder.decode()
- Security checks before Normalizer.normalize()
- Input validation before StringEscapeUtils.unescapeHtml()
- Path validation before Paths.get().normalize()
PHP:
- Validation before urldecode()
- Input checks before html_entity_decode()
- Path validation before realpath()
Search Patterns:
- Grep: normalize\(|decode\(|unescape\(|URLDecoder|decodeURIComponent
- Look for validation logic (if statements, regex) before these functions
- Check for double decoding: multiple decode calls in sequence
- Review web framework routing (automatic decoding may occur)
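As a rough illustration of the grep step, the sketch below applies the same regular expression to a source string and reports the matching lines. The helper name find_decode_calls and the sample snippet are hypothetical. The result is only a list of places to review by hand: deciding whether validation actually precedes each call still requires reading the surrounding code.

```python
import re

# The search pattern from the list above, compiled once.
DECODE_CALL = re.compile(
    r"normalize\(|decode\(|unescape\(|URLDecoder|decodeURIComponent"
)


def find_decode_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line that decodes or normalizes."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if DECODE_CALL.search(line)
    ]


# Hypothetical snippet in which the check on line 1 precedes the normalize call.
sample = '''\
if "<" in username:
    raise ValueError("bad")
name = unicodedata.normalize("NFKC", username)
'''
hits = find_decode_calls(sample)
```

Each hit marks a line where a reviewer would then look upward for validation logic that runs on the not-yet-decoded value.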
Common Encoding Bypass Techniques:
- URL encoding: %3c for <, %2e%2e%2f for ../
- Double URL encoding: %253c for < (decoded twice)
- Unicode variants: ＜ (U+FF1C) for <
- HTML entities: &lt; or &#60; for <
- Unicode escapes: \u003c for <
- Mixed encoding: %u003c or %c0%bc for <
- Path traversal: ..%2f, ..%5c, %2e%2e/
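To make the list concrete, the following sketch decodes several of these representations with Python's standard library and shows that each collapses to the same '<' character. The dictionary keys are labels chosen for this example. The overlong UTF-8 form %c0%bc is handled separately because Python's strict UTF-8 decoder rejects it rather than mapping it to '<'; decoders in other stacks may be more lenient.

```python
import html
import unicodedata
from urllib.parse import unquote, unquote_to_bytes

# Each entry applies the decoder that a downstream component might run.
decoded = {
    "url": unquote("%3c"),
    "double_url": unquote(unquote("%253c")),
    "fullwidth": unicodedata.normalize("NFKC", "\uff1c"),
    "html_named": html.unescape("&lt;"),
    "html_numeric": html.unescape("&#60;"),
    "unicode_escape": "\\u003c".encode("ascii").decode("unicode_escape"),
}

# Overlong UTF-8 encoding of '<': invalid, so strict decoding raises.
try:
    unquote_to_bytes("%c0%bc").decode("utf-8")
    overlong_rejected = False
except UnicodeDecodeError:
    overlong_rejected = True
```

A filter that looks only for the literal '<' sees none of the inputs above as dangerous, yet every entry in decoded ends up as '<' once the matching decoder has run.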
Prevention
- Normalize/decode before validation: Always bring data to its simplest, canonical form before performing any security checks on it.
- Use parameterized queries (for SQL) and other safe APIs that handle encoding internally. This is the best defense against injection attacks.
- Enforce strict character encoding for all input (e.g., reject any data that is not valid UTF-8).
- Be aware of implicit decoding performed by your web framework or libraries and ensure your validation logic runs after it.
- Canonicalize paths and URLs before validating them to prevent path traversal attacks.
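The points above can be combined into a small canonicalization step that runs before any validation. This is a sketch under stated assumptions, not a complete input-handling layer: the names canonicalize and resolve_under, the MAX_DECODE_ROUNDS limit of 3, and the /srv/uploads base directory are all hypothetical choices for illustration.

```python
import unicodedata
from pathlib import Path
from urllib.parse import unquote

MAX_DECODE_ROUNDS = 3  # assumed limit; input still changing after this is rejected


def canonicalize(raw: bytes) -> str:
    """Bring untrusted bytes to one canonical string before any security check."""
    # Strict decoding rejects invalid or overlong UTF-8 sequences.
    text = raw.decode("utf-8", errors="strict")
    for _ in range(MAX_DECODE_ROUNDS):
        decoded = unquote(text, errors="strict")
        if decoded == text:
            break  # stable: no percent-encoding left to decode
        text = decoded
    else:
        raise ValueError("input is percent-encoded too many times")
    # NFKC folds compatibility characters such as full-width '<' to ASCII.
    return unicodedata.normalize("NFKC", text)


def resolve_under(base: Path, user_path: str) -> Path:
    """Resolve user_path below base, rejecting anything that escapes it."""
    root = base.resolve()
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise ValueError("path escapes the base directory")
    return candidate


username = canonicalize(b"%253cscript%253e")  # double-encoded input, fully decoded
```

Validation (for example an allow-list check) would then run on the value returned by canonicalize, and file access would go through resolve_under, so the checks and the application see the same string.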
Testing for Encoding Bypass
Manual Testing:
- Test URL encoding: %3cscript%3e, ..%2f..%2f
- Test double encoding: %253cscript%253e, ..%252f..%252f
- Test Unicode variants: ＜script＞, ．．／
- Test HTML entities: &lt;script&gt;, &#60;script&#62;
- Test mixed encoding: %u003cscript%u003e
- Verify filters catch all encoding variants
Automated Testing:
- Static Analysis: Semgrep, CodeQL to detect validation-before-decode patterns
- DAST: Burp Suite Intruder with encoding payloads, OWASP ZAP fuzzer
- Payload Lists: SecLists encoding bypass payloads
- Custom Scripts: Automated encoding variant generation
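A custom variant generator might look like the sketch below. It is a minimal illustration covering four of the encodings discussed in this document; the function names percent_encode_all, to_fullwidth, and encoding_variants are hypothetical, and a real payload list such as SecLists covers far more cases.

```python
# Offset between ASCII '!'..'~' and the full-width forms U+FF01..U+FF5E.
FULLWIDTH_OFFSET = 0xFEE0


def percent_encode_all(text: str) -> str:
    """Percent-encode every UTF-8 byte, including characters that are normally safe."""
    return "".join(f"%{byte:02x}" for byte in text.encode("utf-8"))


def to_fullwidth(text: str) -> str:
    """Replace printable ASCII characters with their full-width Unicode variants."""
    return "".join(
        chr(ord(char) + FULLWIDTH_OFFSET) if "!" <= char <= "~" else char
        for char in text
    )


def encoding_variants(payload: str) -> dict[str, str]:
    """Return a few encoded forms of payload, keyed by technique name."""
    single = percent_encode_all(payload)
    return {
        "url": single,
        "double_url": single.replace("%", "%25"),
        "fullwidth": to_fullwidth(payload),
        "html_numeric": "".join(f"&#{ord(char)};" for char in payload),
    }


variants = encoding_variants("<")
```

Feeding each generated variant through the application's input path, and asserting that all of them are rejected, gives a regression test for the validation order.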
Example Test:
# Test that validation occurs after normalization
# search_user and SecurityException are assumed Python equivalents of the
# Java example; they are not defined in the Python code shown earlier.
def test_encoding_bypass_prevention():
    # Unicode variant of '<script>' using full-width '＜' (U+FF1C) and '＞' (U+FF1E)
    malicious_input = "\uff1cscript\uff1ealert(1)\uff1c/script\uff1e"
    try:
        create_user_profile(malicious_input)
        assert False, "Should reject encoded malicious input"
    except ValueError:
        pass  # Expected

    # Double URL encoding
    encoded_input = "%253cscript%253e"
    try:
        search_user(encoded_input)
        assert False, "Should reject double-encoded input"
    except SecurityException:
        pass  # Expected
Burp Suite Test:
# Intruder payload positions
GET /file/§..%2f..%2fetc%2fpasswd§
# Payload list (encoding variants)
..%2f..%2f
..%252f..%252f
../../
%2e%2e%2f%2e%2e%2f
Remediation Steps
- Identify decoding operations - Use detection patterns to find decode/normalize functions
- Trace data flow - Follow user input from entry to security validation
- Check validation order - Verify decode/normalize happens before validation
- Reverse order if needed - Move normalization before security checks
- Add missing normalization - Insert decode/normalize if absent
- Test with encoding variants - Use payload list from Testing section
- Verify canonicalization - Ensure all paths/URLs are normalized
- Review framework behavior - Check for automatic decoding in web framework
Related Security Patterns & Anti-Patterns
- SQL Injection Anti-Pattern: A common goal of encoding bypass attacks.
- Cross-Site Scripting (XSS) Anti-Pattern: Often enabled by bypassing filters with encoded payloads.
- Path Traversal Anti-Pattern: Can be achieved by using encoded representations of ../.
- Unicode Security Anti-Pattern: A collection of issues related to handling Unicode securely.
Repository: igbuend/grimbard
First Seen: Feb 19, 2026