Break Filter JS From HTML

Overview

This skill provides a systematic methodology for analyzing and bypassing HTML sanitization filters that attempt to prevent JavaScript execution. The focus is on understanding filter mechanics deeply before attempting bypasses, and on robust verification of solutions.

When to Use This Skill

Analyzing HTML sanitization filters to find bypass vectors
CTF challenges involving XSS filter evasion
Authorized security testing of web application input sanitization
Understanding parser differentials between server-side parsers and browsers

Phase 1: Environment and Filter Analysis

Before attempting any bypass, thoroughly understand the test environment and filter mechanics.

Environment Reconnaissance

Identify all relevant file locations - Locate the filter implementation, test harness, and any configuration files
Understand the test verification process - Determine how success is measured (browser alert, DOM inspection, etc.)
Verify path dependencies - Check if tests expect files at specific paths; create symlinks or copies if needed
Document the execution flow - Trace how input flows from your payload through the filter to the browser

Filter Mechanism Analysis

Examine the filter code to understand:

Parsing library used - Different parsers (BeautifulSoup, DOMPurify, html-sanitizer, etc.) have different behaviors
What elements are removed - Script tags, iframes, objects, embeds, etc.
What attributes are stripped - Event handlers (on*), href with javascript:, etc.
Processing order - Does the filter run once or recursively? Are there multiple passes?
Output encoding - Is the output HTML-encoded, or passed through raw?

Create a Filter Output Test

Before running browser tests, create a quick method to see the filter's output directly:

# Example: Check what the filter outputs for a given input
echo '<script>alert(1)</script>' > /tmp/test.html && python filter.py /tmp/test.html && cat /tmp/test.html

This allows rapid iteration without slow browser-based testing.

Phase 2: Bypass Strategy Selection

Based on the filter analysis, select appropriate bypass strategies. Order these by likelihood of success given the specific filter.

Parser Differential Exploits

Parser differentials occur when the server-side filter parses HTML differently than browsers. This is often the most effective approach for library-based filters.

Key concept: The filter's parser may interpret certain HTML constructs differently than browsers, allowing tags that appear "safe" to the filter to execute JavaScript in browsers.

Elements that commonly cause parser differentials:

<noscript> - Parsed differently with/without JavaScript enabled
<template> - Content may not be parsed as HTML by some libraries
<textarea> and <title> - RCDATA parsing contexts
Comments and CDATA sections
Malformed or nested tags

Encoding and Obfuscation

HTML entity encoding (decimal, hex, named entities)
Unicode normalization issues
Double encoding
Null bytes and other special characters
Case variations (if filter is case-sensitive)

DOM Clobbering and Indirect Execution

Creating elements that shadow built-in properties
Exploiting existing JavaScript that reads from DOM
CSS-based attacks (if JavaScript reads computed styles)

Lesser-Known Vectors

SVG with embedded scripts or event handlers
MathML elements
XML processing instructions (if XHTML mode)
Data URIs in appropriate contexts

Phase 3: Systematic Testing

Testing Methodology

Test filter output first - Before browser testing, verify the filter passes your payload through
Use a minimal payload - Start with the simplest possible XSS (alert(1)) before complex payloads
Document each attempt - Record what was tried, filter output, and browser result
Understand failures - When a technique fails, determine if it was filtered or if the browser didn't execute it

Efficient Iteration Pattern

1. Hypothesize a bypass based on filter analysis
2. Test against filter directly (fast)
3. If filter passes payload through, test in browser
4. If browser doesn't execute, investigate why
5. If filter blocks, analyze how and adjust approach

Avoid These Inefficiencies

Running slow browser tests for payloads that don't survive the filter
Moving to new techniques without understanding why previous ones failed
Trying browser-incompatible techniques (e.g., deprecated HTML features)

Phase 4: Verification

Robust Solution Verification

A single passing test is insufficient. Verify solutions thoroughly:

Run multiple times - Ensure the solution works consistently, not just once
Test filter idempotency - Run the filtered output through the filter again to ensure it still works
Check for timing issues - Browser-based tests may have race conditions
Verify in isolation - Test the filtered HTML directly in a browser outside the test harness
Document exact steps - Record the precise sequence to reproduce the successful bypass

Before Declaring Success

Confirm the test passes multiple consecutive runs
Verify no pending file modifications could invalidate the solution
Ensure the solution doesn't depend on test environment quirks
Check that the final state of all files is correct

Common Pitfalls

Environment Issues

Path mismatches - Test harnesses may expect files at specific locations different from where you found them
Stale state - Previous failed attempts may leave files in unexpected states
Permission issues - Filters may fail silently if they can't write output files

Analysis Mistakes

Assuming filter behavior - Always verify by reading the code; don't guess what's filtered
Ignoring processing order - A filter that removes <script> then <iframe> may be bypassed differently than one that does it in reverse
Missing recursive filtering - Some filters process until no more matches; others run once

Testing Mistakes

Browser-specific payloads - Techniques that work in one browser may fail in another
Deprecated HTML - Many classic XSS vectors no longer work in modern browsers
Premature optimization - Getting a complex payload through is worthless if a simpler one works

Verification Mistakes

Single test run - Flaky tests can pass once then fail
Modifying files after success - Any changes after a successful test may invalidate it
Ignoring test harness quirks - The test may measure success differently than expected

break-filter-js-from-html