skills/mukul975/anthropic-cybersecurity-skills/deobfuscating-javascript-malware

deobfuscating-javascript-malware

SKILL.md

Deobfuscating JavaScript Malware

When to Use

  • Investigating a phishing page with obfuscated JavaScript that performs credential harvesting or redirect
  • Analyzing a web skimmer (Magecart-style) injected into an e-commerce site
  • Deobfuscating a JavaScript dropper that downloads and executes second-stage malware
  • Examining malicious email attachments containing HTML files with embedded obfuscated scripts
  • Analyzing browser exploit kits that use heavy JavaScript obfuscation to hide exploit delivery

Do not use for obfuscated JavaScript that is merely minified production code; use a standard beautifier instead.

Prerequisites

  • Node.js 18+ installed for executing and debugging JavaScript in a controlled environment
  • Python 3.8+ with jsbeautifier library for code formatting
  • Browser developer tools (Chrome DevTools) for controlled execution in an isolated browser
  • CyberChef (https://gchq.github.io/CyberChef/) for encoding/decoding operations
  • de4js or JStillery for automated JavaScript deobfuscation
  • Isolated analysis VM with no access to production systems or sensitive data

Workflow

Step 1: Safely Extract and Examine the Obfuscated Script

Isolate the malicious JavaScript without executing it:

# Extract JavaScript from HTML file
python3 << 'PYEOF'
from html.parser import HTMLParser

class ScriptExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []
        self.current = ""

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.current = ""

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
            if self.current.strip():
                self.scripts.append(self.current)

    def handle_data(self, data):
        if self.in_script:
            self.current += data

with open("malicious_page.html") as f:
    parser = ScriptExtractor()
    parser.feed(f.read())

for i, script in enumerate(parser.scripts):
    with open(f"script_{i}.js", "w") as f:
        f.write(script)
    print(f"Extracted script_{i}.js ({len(script)} bytes)")
PYEOF

# Beautify the extracted JavaScript
npx js-beautify script_0.js -o script_0_pretty.js

Step 2: Identify Obfuscation Techniques

Categorize the obfuscation methods used:

Common JavaScript Obfuscation Techniques:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
String Encoding:
  - Hex encoding:          "\x68\x65\x6c\x6c\x6f" -> "hello"
  - Unicode escapes:       "\u0068\u0065\u006c\u006c\u006f" -> "hello"
  - Base64:                atob("aGVsbG8=") -> "hello"
  - charCodeAt/fromCharCode: String.fromCharCode(104,101,108,108,111)
  - Array-based lookup:    var _0x1234 = ["hello","world"]; _0x1234[0]

Eval Chains:
  - eval(atob("..."))
  - eval(unescape("..."))
  - new Function("return " + decoded)()
  - document.write("<script>" + decoded + "</script>")
  - setTimeout(decoded, 0)

Control Flow:
  - Switch-case dispatcher with shuffled case order
  - Opaque predicates (always-true/false conditions)
  - Dead code insertion
  - Variable name mangling (_0x4a3b, _0xab12)

Anti-Analysis:
  - Debugger traps: setInterval(function(){debugger;}, 100)
  - Console detection: overriding console.log
  - Timing checks: performance.now() deltas
  - DevTools detection: window.outerWidth - window.innerWidth > 100

Step 3: Remove Anti-Analysis Protections

Neutralize anti-debugging and anti-analysis traps:

// Remove debugger traps before analysis
// Replace in the obfuscated script:

// Before:
setInterval(function() { debugger; }, 100);

// After (neutralized):
setInterval(function() { /* debugger removed */ }, 100);

// Neutralize DevTools detection
// Before:
if (window.outerWidth - window.innerWidth > 160) { window.location = "about:blank"; }

// After:
if (false) { window.location = "about:blank"; }

// Neutralize timing checks
// Override performance.now to return consistent values
const originalNow = performance.now;
performance.now = function() { return 0; };

Step 4: Decode String Obfuscation Layers

Progressively decode encoded strings:

# Python script to decode common JS obfuscation patterns
import re
import base64
import urllib.parse

def decode_hex_strings(code):
    """Replace \\xNN sequences with ASCII characters"""
    def hex_replace(match):
        hex_str = match.group(0)
        try:
            return bytes.fromhex(hex_str.replace("\\x", "")).decode("ascii")
        except:
            return hex_str
    return re.sub(r'(?:\\x[0-9a-fA-F]{2})+', hex_replace, code)

def decode_unicode_escapes(code):
    """Replace \\uNNNN sequences with characters"""
    def unicode_replace(match):
        return chr(int(match.group(1), 16))
    return re.sub(r'\\u([0-9a-fA-F]{4})', unicode_replace, code)

def decode_charcode_arrays(code):
    """Resolve String.fromCharCode calls"""
    def charcode_replace(match):
        codes = [int(c.strip()) for c in match.group(1).split(",")]
        return '"' + "".join(chr(c) for c in codes) + '"'
    return re.sub(r'String\.fromCharCode\(([0-9,\s]+)\)', charcode_replace, code)

def decode_base64_strings(code):
    """Resolve atob() calls with static strings"""
    def atob_replace(match):
        try:
            decoded = base64.b64decode(match.group(1)).decode("utf-8")
            return f'"{decoded}"'
        except:
            return match.group(0)
    return re.sub(r'atob\(["\']([A-Za-z0-9+/=]+)["\']\)', atob_replace, code)

# Apply all decoders
with open("script_0.js") as f:
    code = f.read()

code = decode_hex_strings(code)
code = decode_unicode_escapes(code)
code = decode_charcode_arrays(code)
code = decode_base64_strings(code)

with open("script_0_decoded.js", "w") as f:
    f.write(code)
print("Decoded strings written to script_0_decoded.js")

Step 5: Resolve Eval Chains Safely

Unwrap eval/Function constructor chains without executing:

// Node.js script to safely resolve eval chains
// Run in isolated environment: node --experimental-vm-modules deobfuscate.js

const vm = require('vm');

// Create sandboxed context with logging
const sandbox = {
    eval: function(code) {
        console.log("=== EVAL INTERCEPTED ===");
        console.log(code.substring(0, 500));
        console.log("========================");
        return code; // Return the code instead of executing it
    },
    document: {
        write: function(html) {
            console.log("=== DOCUMENT.WRITE INTERCEPTED ===");
            console.log(html.substring(0, 500));
        },
        getElementById: function() { return { innerHTML: "" }; }
    },
    window: { location: { href: "" } },
    atob: function(s) { return Buffer.from(s, 'base64').toString(); },
    unescape: unescape,
    setTimeout: function(fn) { if (typeof fn === 'string') console.log("TIMEOUT CODE:", fn); },
    console: console,
    String: String,
    Array: Array,
    parseInt: parseInt,
    RegExp: RegExp,
};

const context = vm.createContext(sandbox);

// Load and execute the obfuscated script in sandbox
const fs = require('fs');
const code = fs.readFileSync('script_0.js', 'utf8');

try {
    vm.runInContext(code, context, { timeout: 5000 });
} catch(e) {
    console.log("Execution error (expected):", e.message);
}

Step 6: Analyze the Deobfuscated Payload

Examine the revealed malicious logic:

Deobfuscated Malware Categories and IOC Extraction:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Credential Harvester:
  - Form action URLs (exfiltration endpoints)
  - XMLHttpRequest/fetch destinations
  - Targeted input field names (username, password, cc_number)

Web Skimmer (Magecart):
  - Payment form overlay injection
  - Card data exfiltration URLs
  - Keylogger event listeners (onkeypress, oninput)

Redirect Script:
  - Destination URLs in location.href assignments
  - Conditional redirects based on user-agent or referrer
  - Cloaking logic (show benign content to bots)

Exploit Kit Landing:
  - Browser/plugin version checks
  - Exploit payload URLs
  - Shellcode embedded as arrays or encoded strings

Key Concepts

Term Definition
Eval Chain Nested layers of eval(), Function(), or document.write() calls that each decode one layer of obfuscation before passing to the next
String Array Rotation Obfuscation technique storing all strings in a shuffled array and accessing them by computed index to hide string literals
Dead Code Insertion Adding non-functional code blocks that never execute to increase analysis complexity and confuse pattern matching
Opaque Predicate Conditional expression whose outcome is predetermined but difficult to determine statically; used to obscure control flow
Anti-Debugging JavaScript techniques to detect and thwart browser DevTools or debugger usage including debugger statements and timing checks
Web Skimmer Malicious JavaScript injected into e-commerce sites to steal payment card data from checkout forms (Magecart attack)

Tools & Systems

  • CyberChef: GCHQ's web-based tool for encoding/decoding transformations useful for unwinding multi-layer obfuscation
  • de4js: Online JavaScript deobfuscator supporting common obfuscation tools (obfuscator.io, JScrambler)
  • Node.js VM Module: Sandboxed JavaScript execution environment for safely evaluating obfuscated code with intercepted APIs
  • Chrome DevTools: Browser developer tools for stepping through JavaScript execution with breakpoints and console access
  • JSDetox: JavaScript malware analysis tool providing execution emulation and deobfuscation

Common Scenarios

Scenario: Deobfuscating a Magecart Web Skimmer

Context: A compromised e-commerce site has obfuscated JavaScript injected into its checkout page. The script needs deobfuscation to identify the data exfiltration endpoint and determine what customer data was stolen.

Approach:

  1. Extract the injected script from the page source (often appended to a legitimate JS file or loaded from an external domain)
  2. Beautify the code and identify the obfuscation technique (typically string array + rotation + hex encoding)
  3. Decode string encoding layers (hex -> Unicode -> base64) using the Python decoder script
  4. Resolve the string array by evaluating the array definition and rotation function
  5. Identify the form targeting logic (querySelector for payment form fields)
  6. Extract the exfiltration URL from the XMLHttpRequest or fetch call
  7. Document stolen data fields and exfiltration endpoint for incident response

Pitfalls:

  • Executing obfuscated scripts on a connected system (the script may phone home during analysis)
  • Not removing anti-debugging traps before using browser DevTools (infinite debugger loops)
  • Missing additional obfuscation layers loaded dynamically from external URLs
  • Overlooking base64-encoded inline images or data URIs that may contain additional scripts

Output Format

JAVASCRIPT MALWARE DEOBFUSCATION REPORT
=========================================
Source:           checkout.js (injected into example-shop.com)
Obfuscation:      obfuscator.io (string array + rotation + hex encoding)
Layers Removed:   3

OBFUSCATION TECHNIQUES IDENTIFIED
[1] String array with 247 entries, rotated by 0x1a3
[2] Hex-encoded string references (\x68\x65\x6c\x6c\x6f)
[3] Base64-wrapped eval chain (2 layers)
[4] Anti-debugging: setInterval debugger trap

DEOBFUSCATED FUNCTIONALITY
Type:             Magecart Payment Card Skimmer
Target Forms:     input[name*="card"], input[name*="cc_"]
Data Captured:    Card number, expiration, CVV, cardholder name
Exfil Method:     POST via XMLHttpRequest
Exfil URL:        hxxps://analytics-cdn[.]com/collect
Exfil Format:     JSON { "cn": card_number, "exp": expiry, "cv": cvv }
Trigger:          Form submit event on checkout page

EXTRACTED IOCs
Domains:          analytics-cdn[.]com
IPs:              185.220.101[.]42
URLs:             hxxps://analytics-cdn[.]com/collect
                  hxxps://analytics-cdn[.]com/gate.js
Weekly Installs
5
GitHub Stars
2.4K
First Seen
4 days ago
Installed on
opencode5
gemini-cli5
github-copilot5
codex5
kimi-cli5
amp5