break-filter-js-from-html
Break Filter JS From HTML
Overview
This skill provides a systematic methodology for analyzing and bypassing HTML sanitization filters that attempt to prevent JavaScript execution. The focus is on understanding filter mechanics deeply before attempting bypasses, and on robust verification of solutions.
When to Use This Skill
- Analyzing HTML sanitization filters to find bypass vectors
- CTF challenges involving XSS filter evasion
- Authorized security testing of web application input sanitization
- Understanding parser differentials between server-side parsers and browsers
Phase 1: Environment and Filter Analysis
Before attempting any bypass, thoroughly understand the test environment and filter mechanics.
Environment Reconnaissance
- Identify all relevant file locations - Locate the filter implementation, test harness, and any configuration files
- Understand the test verification process - Determine how success is measured (browser alert, DOM inspection, etc.)
- Verify path dependencies - Check if tests expect files at specific paths; create symlinks or copies if needed
- Document the execution flow - Trace how input flows from your payload through the filter to the browser
Filter Mechanism Analysis
Examine the filter code to understand:
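- Parsing library used - Different parsers and sanitizers (BeautifulSoup, DOMPurify, html-sanitizer, etc.) behave differently; BeautifulSoup's result also depends on which underlying parser it is configured with (html.parser, lxml, or html5lib)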
- What elements are removed - Script tags, iframes, objects, embeds, etc.
- What attributes are stripped - Event handlers (on*), href with javascript:, etc.
- Processing order - Does the filter run once or recursively? Are there multiple passes?
- Output encoding - Is the output HTML-encoded, or passed through raw?
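One way to answer the processing-order question empirically is to apply the filter repeatedly and count how many passes it takes for the output to stop changing. The sketch below assumes the filter can be wrapped as a Python function from string to string; `toy_single_pass_filter` is a hypothetical stand-in, not the filter under test.

```python
import re
from typing import Callable


def toy_single_pass_filter(markup: str) -> str:
    """Hypothetical stand-in: strips <script ...> and </script> tags in one pass."""
    return re.sub(r"</?script[^>]*>", "", markup, flags=re.IGNORECASE)


def passes_until_stable(
    filter_fn: Callable[[str], str], markup: str, limit: int = 10
) -> int:
    """Count how many applications of filter_fn change the text before it settles."""
    passes = 0
    current = markup
    while passes < limit:
        filtered = filter_fn(current)
        if filtered == current:
            break  # fixed point reached: another pass changes nothing
        current = filtered
        passes += 1
    return passes
```

A result greater than 1 for some input means a single run of the filter is not a fixed point for that input, which suggests the filter does not loop until no more matches remain.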
Create a Filter Output Test
Before running browser tests, create a quick method to see the filter's output directly:
# Example: Check what the filter outputs for a given input
echo '<script>alert(1)</script>' > /tmp/test.html && python filter.py /tmp/test.html && cat /tmp/test.html
This allows rapid iteration without slow browser-based testing.
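The same check can be wrapped in a small Python helper so that many inputs can be tried in a loop. This is a minimal sketch that assumes, like the command above, that the filter takes a file path as its last argument and rewrites that file in place; `run_filter` and `DEFAULT_FILTER_CMD` are illustrative names.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Sequence

# Assumption: the filter is invoked as "python filter.py <path>".
DEFAULT_FILTER_CMD = [sys.executable, "filter.py"]


def run_filter(payload: str, filter_cmd: Sequence[str] = DEFAULT_FILTER_CMD) -> str:
    """Write payload to a temp file, run the filter on it, return the file's new contents."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "test.html"
        target.write_text(payload, encoding="utf-8")
        # check=True surfaces a non-zero exit instead of hiding a silent failure.
        subprocess.run([*filter_cmd, str(target)], check=True, timeout=30)
        return target.read_text(encoding="utf-8")
```

Calling `run_filter("<script>alert(1)</script>")` from the directory containing `filter.py` would print-ready return the filtered markup; each call uses a fresh temporary file, so earlier attempts cannot leave stale state behind.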
Phase 2: Bypass Strategy Selection
Based on the filter analysis, select appropriate bypass strategies. Order these by likelihood of success given the specific filter.
Parser Differential Exploits
Parser differentials occur when the server-side filter parses HTML differently than browsers. This is often the most effective approach for library-based filters.
Key concept: The filter's parser may interpret certain HTML constructs differently than browsers, allowing tags that appear "safe" to the filter to execute JavaScript in browsers.
Elements that commonly cause parser differentials:
- <noscript> - Parsed differently with/without JavaScript enabled
- <template> - Content may not be parsed as HTML by some libraries
- <textarea> and <title> - RCDATA parsing contexts
- Comments and CDATA sections
- Malformed or nested tags
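To see what a server-side parser actually makes of an input, it can help to dump the token stream and compare it with the DOM the browser builds from the same markup. The sketch below uses Python's standard `html.parser` purely as a stand-in; the filter under test may use a different parser, and results can differ between parser libraries and versions.

```python
from html.parser import HTMLParser


class TokenDump(HTMLParser):
    """Records the events the parser emits, in order."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.events: list = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_comment(self, data):
        self.events.append(("comment", data))


def dump_tokens(markup: str) -> list:
    """Return the list of parser events for markup."""
    parser = TokenDump()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return parser.events
```

Any place where this event list disagrees with the element tree shown in the browser's developer tools is a candidate differential worth investigating.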
Encoding and Obfuscation
- HTML entity encoding (decimal, hex, named entities)
- Unicode normalization issues
- Double encoding
- Null bytes and other special characters
- Case variations (if filter is case-sensitive)
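When checking encoding-related behavior, it is useful to see every form a value takes as entity layers are decoded, since the filter and the browser may compare different forms. The following sketch uses the standard library's `html.unescape`; `decode_layers` and `canonical` are hypothetical helper names, and the filter under test may decode differently.

```python
import html


def decode_layers(value: str, max_rounds: int = 5) -> list[str]:
    """Return each distinct form of value as HTML entities are decoded repeatedly."""
    forms = [value]
    for _ in range(max_rounds):
        decoded = html.unescape(forms[-1])
        if decoded == forms[-1]:
            break  # nothing left to decode
        forms.append(decoded)
    return forms


def canonical(value: str) -> str:
    """Fully decoded, lower-cased form, for case-insensitive comparison."""
    return decode_layers(value)[-1].lower()
```

A list with more than two entries indicates double encoding: a filter that decodes once and a consumer that decodes twice would see different strings.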
DOM Clobbering and Indirect Execution
- Creating elements that shadow built-in properties
- Exploiting existing JavaScript that reads from DOM
- CSS-based attacks (if JavaScript reads computed styles)
Lesser-Known Vectors
- SVG with embedded scripts or event handlers
- MathML elements
- XML processing instructions (if XHTML mode)
- Data URIs in appropriate contexts
Phase 3: Systematic Testing
Testing Methodology
- Test filter output first - Before browser testing, verify the filter passes your payload through
- Use a minimal payload - Start with the simplest possible XSS (alert(1)) before complex payloads
- Document each attempt - Record what was tried, filter output, and browser result
- Understand failures - When a technique fails, determine if it was filtered or if the browser didn't execute it
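Documenting attempts is easier when each one is appended to a structured log. The sketch below writes one JSON object per line; the `Attempt` fields and the file layout are illustrative choices, not part of any existing harness.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Attempt:
    technique: str
    payload: str
    filter_output: str
    browser_result: str  # e.g. "executed", "not executed", "not tested"
    notes: str = ""


def log_attempt(attempt: Attempt, log_path: Path) -> None:
    """Append one attempt to a JSON Lines file."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(attempt)) + "\n")


def load_attempts(log_path: Path) -> list[Attempt]:
    """Read all recorded attempts back, oldest first."""
    with log_path.open(encoding="utf-8") as fh:
        return [Attempt(**json.loads(line)) for line in fh if line.strip()]
```

Keeping `filter_output` and `browser_result` as separate fields makes it clear afterwards whether a failure happened at the filter or in the browser.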
Efficient Iteration Pattern
1. Hypothesize a bypass based on filter analysis
2. Test against filter directly (fast)
3. If filter passes payload through, test in browser
4. If browser doesn't execute, investigate why
5. If filter blocks, analyze how and adjust approach
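The loop above can be sketched as a triage function that only spends browser time on payloads that survive the filter. Here `filter_fn` and `browser_check` are hypothetical callables supplied by the tester, and the substring `marker` test is a deliberately crude survival check that may need refining for a real filter.

```python
from typing import Callable, Dict, Iterable, Optional


def triage(
    payloads: Iterable[str],
    filter_fn: Callable[[str], str],
    marker: str,
    browser_check: Optional[Callable[[str], bool]] = None,
) -> Dict[str, str]:
    """Classify each payload, running the slow browser check only for filter survivors."""
    results: Dict[str, str] = {}
    for payload in payloads:
        output = filter_fn(payload)  # fast step
        if marker not in output:
            results[payload] = "blocked by filter"
        elif browser_check is None:
            results[payload] = "survived filter"
        elif browser_check(output):  # slow step, reached only by survivors
            results[payload] = "executed"
        else:
            results[payload] = "survived filter, not executed"
    return results
```

The two non-success outcomes correspond to steps 4 and 5: "survived filter, not executed" calls for investigating the browser side, while "blocked by filter" calls for re-reading the filter code.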
Avoid These Inefficiencies
- Running slow browser tests for payloads that don't survive the filter
- Moving to new techniques without understanding why previous ones failed
- Trying browser-incompatible techniques (e.g., deprecated HTML features)
Phase 4: Verification
Robust Solution Verification
A single passing test is insufficient. Verify solutions thoroughly:
- Run multiple times - Ensure the solution works consistently, not just once
- Test filter idempotency - Run the filtered output through the filter again and confirm the payload survives the second pass unchanged
- Check for timing issues - Browser-based tests may have race conditions
- Verify in isolation - Test the filtered HTML directly in a browser outside the test harness
- Document exact steps - Record the precise sequence to reproduce the successful bypass
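The first two checks in this list are easy to automate. The sketch below assumes the filter is available as a string-to-string function and that the end-to-end test can be wrapped as a zero-argument callable returning True on success; both wrappers are assumptions about the harness.

```python
from typing import Callable


def survives_second_pass(filter_fn: Callable[[str], str], payload: str) -> bool:
    """True if filtering the already-filtered output leaves it unchanged."""
    once = filter_fn(payload)
    return filter_fn(once) == once


def passes_consistently(check: Callable[[], bool], runs: int = 5) -> bool:
    """True only if check() succeeds on every one of `runs` consecutive calls."""
    return all(check() for _ in range(runs))
```

`passes_consistently` stops at the first failing run, so a flaky result is reported as soon as it appears rather than after all runs complete.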
Before Declaring Success
- Confirm the test passes multiple consecutive runs
- Verify no pending file modifications could invalidate the solution
- Ensure the solution doesn't depend on test environment quirks
- Check that the final state of all files is correct
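One way to confirm that no file changed between the last passing run and the final state is to compare content hashes. This is a minimal sketch; which files to track is left to the tester, and `snapshot` and `changed_files` are illustrative names.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def snapshot(paths: Iterable[str]) -> Dict[str, str]:
    """Map each path to the SHA-256 hex digest of its current contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def changed_files(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Paths whose digest differs, or which appear in only one snapshot."""
    return sorted(
        p for p in before.keys() | after.keys() if before.get(p) != after.get(p)
    )
```

Taking a snapshot right after the passing run and another just before declaring success should yield an empty list from `changed_files`.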
Common Pitfalls
Environment Issues
- Path mismatches - Test harnesses may expect files at specific locations different from where you found them
- Stale state - Previous failed attempts may leave files in unexpected states
- Permission issues - Filters may fail silently if they can't write output files
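A short preflight check can catch the path and permission problems above before any payload is tested. The sketch below only reports problems and changes nothing; the lists of required files and writable directories are inputs the tester supplies from reading the harness.

```python
import os
from pathlib import Path
from typing import Iterable, List


def preflight(required_files: Iterable[str], writable_dirs: Iterable[str]) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems: List[str] = []
    for name in required_files:
        if not Path(name).is_file():
            problems.append(f"missing file: {name}")
    for name in writable_dirs:
        if not Path(name).is_dir():
            problems.append(f"missing directory: {name}")
        elif not os.access(name, os.W_OK):
            problems.append(f"not writable: {name}")
    return problems
```

Running this at the start of each session, with the paths the test harness expects, makes a path mismatch visible immediately instead of as a confusing test failure later.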
Analysis Mistakes
- Assuming filter behavior - Always verify by reading the code; don't guess what's filtered
- Ignoring processing order - A filter that removes <script> then <iframe> may be bypassed differently than one that does it in reverse
- Missing recursive filtering - Some filters process until no more matches; others run once
Testing Mistakes
- Browser-specific payloads - Techniques that work in one browser may fail in another
- Deprecated HTML - Many classic XSS vectors no longer work in modern browsers
- Premature complexity - Effort spent getting a complex payload through is wasted if a simpler one works
Verification Mistakes
- Single test run - Flaky tests can pass once then fail
- Modifying files after success - Any changes after a successful test may invalidate it
- Ignoring test harness quirks - The test may measure success differently than expected