skills/mukul975/anthropic-cybersecurity-skills/reverse-engineering-malware-with-ghidra

reverse-engineering-malware-with-ghidra

SKILL.md

Reverse Engineering Malware with Ghidra

When to Use

  • Static and dynamic analysis have identified suspicious functionality that requires deeper code-level understanding
  • You need to reverse engineer C2 communication protocols, encryption algorithms, or custom obfuscation
  • Understanding the exact exploit mechanism or vulnerability targeted by a malware sample
  • Extracting hardcoded configuration data (C2 addresses, encryption keys, campaign IDs) embedded in compiled code
  • Developing precise YARA rules or detection signatures based on unique code patterns

Do not use for initial triage of unknown samples; perform static analysis with PEStudio and behavioral analysis with Cuckoo first.

Prerequisites

  • Ghidra 11.x installed (download from https://ghidra-sre.org/) with JDK 17+
  • Analysis VM isolated from production network (Windows or Linux host)
  • Familiarity with x86/x64 assembly language and Windows API conventions
  • PDB symbol files for Windows system DLLs to improve decompilation accuracy
  • Ghidra scripts repository (ghidra_scripts) for automated analysis tasks
  • Secondary reference: IDA Free or Binary Ninja for cross-validation of analysis results

Workflow

Step 1: Create Project and Import Binary

Set up a Ghidra project and import the malware sample:

1. Launch Ghidra: ghidraRun (Linux) or ghidraRun.bat (Windows)
2. File -> New Project -> Non-Shared Project -> Select directory
3. File -> Import File -> Select malware binary
4. Ghidra auto-detects format (PE, ELF, Mach-O) and architecture
5. Accept default import options (or specify base address if known)
6. Double-click imported file to open in CodeBrowser
7. When prompted, run Auto Analysis with default analyzers enabled

Headless analysis for automation:

# Run Ghidra headless analysis with decompiler
/opt/ghidra/support/analyzeHeadless /tmp/ghidra_project MalwareProject \
  -import suspect.exe \
  -postScript ExportDecompilation.py \
  -scriptPath /opt/ghidra/scripts/ \
  -deleteProject

Step 2: Identify Key Functions and Entry Points

Navigate the binary to locate critical code sections:

Navigation Strategy:
━━━━━━━━━━━━━━━━━━━
1. Start at entry point (OEP) - follow execution from _start/WinMain
2. Check Symbol Tree for imported functions (Window -> Symbol Tree)
3. Search for cross-references to suspicious APIs:
   - VirtualAlloc/VirtualAllocEx (memory allocation for injection)
   - CreateRemoteThread (remote thread injection)
   - CryptEncrypt/CryptDecrypt (encryption operations)
   - InternetOpen/HttpSendRequest (C2 communication)
   - RegSetValueEx (persistence via registry)
4. Use Search -> For Strings to find embedded URLs, IPs, and paths
5. Check the Functions window sorted by size (large functions often contain core logic)

Ghidra keyboard shortcuts for efficient navigation:

G         - Go to address
Ctrl+E    - Search for strings
X         - Show cross-references to current location
Ctrl+Shift+F - Search memory for byte patterns
L         - Rename label/function
;         - Add comment
T         - Retype variable
Ctrl+L    - Retype return value

Step 3: Analyze Decompiled Code

Use Ghidra's decompiler to understand function logic:

// Example: Ghidra decompiler output for a decryption routine
// Analyst renames variables and adds types for clarity

void decrypt_config(BYTE *encrypted_data, int data_len, BYTE *key, int key_len) {
    // XOR decryption with rolling key
    for (int i = 0; i < data_len; i++) {
        encrypted_data[i] = encrypted_data[i] ^ key[i % key_len];
    }
    return;
}

// Analyst actions in Ghidra:
// 1. Right-click parameters -> Retype to correct types (BYTE*, int)
// 2. Right-click variables -> Rename to meaningful names
// 3. Add comments explaining the algorithm
// 4. Set function signature to propagate types to callers

Step 4: Trace C2 Communication Logic

Follow the network communication code path:

Analysis Steps for C2 Protocol Reverse Engineering:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Find InternetOpenA/WinHttpOpen call -> trace to wrapper function
2. Follow data flow from encrypted config -> URL construction
3. Identify HTTP method (GET/POST), headers, and body format
4. Locate response parsing logic (JSON parsing, custom binary protocol)
5. Map the C2 command dispatcher (switch/case or jump table)
6. Document the command set (download, execute, exfiltrate, update, uninstall)

Ghidra Script for extracting C2 configuration:

# Ghidra Python script: extract_c2_config.py
# Run via Script Manager in Ghidra

from ghidra.program.model.data import StringDataType
from ghidra.program.model.symbol import SourceType

# Search for XOR decryption patterns
listing = currentProgram.getListing()
memory = currentProgram.getMemory()

# Find references to InternetOpenA
symbol_table = currentProgram.getSymbolTable()
for symbol in symbol_table.getExternalSymbols():
    if "InternetOpen" in symbol.getName():
        refs = getReferencesTo(symbol.getAddress())
        for ref in refs:
            print("C2 init at: {}".format(ref.getFromAddress()))

Step 5: Analyze Encryption and Obfuscation

Identify and document cryptographic routines:

Common Malware Encryption Patterns:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
XOR Cipher:     Loop with XOR operation, often single-byte or rolling key
RC4:            Two loops (KSA + PRGA), 256-byte S-box initialization
AES:            Look for S-box constants (0x63, 0x7C, 0x77...) or calls to CryptEncrypt
Base64:         Lookup table with A-Za-z0-9+/= characters
Custom:         Combination of arithmetic operations (ADD, SUB, ROL, ROR with XOR)

Identification Tips:
- Search for constants: AES S-box, CRC32 table, MD5 init values
- Look for loop structures operating on byte arrays
- Check for Windows Crypto API usage (CryptAcquireContext -> CryptCreateHash -> CryptEncrypt)
- FindCrypt Ghidra plugin automatically identifies crypto constants

Step 6: Document Findings and Create Detection Signatures

Produce actionable intelligence from reverse engineering:

# Generate YARA rule from unique code patterns found in Ghidra
cat << 'EOF' > malware_family_x.yar
rule MalwareFamilyX_Decryptor {
    meta:
        description = "Detects MalwareX decryption routine"
        author = "analyst"
        date = "2025-09-15"
    strings:
        // XOR decryption loop with hardcoded key
        $decrypt = { 8A 04 0E 32 04 0F 88 04 0E 41 3B CA 7C F3 }
        // C2 URL pattern after decryption
        $c2_pattern = "/gate.php?id=" ascii
    condition:
        uint16(0) == 0x5A4D and $decrypt and $c2_pattern
}
EOF

Key Concepts

Term Definition
Disassembly Converting machine code bytes into human-readable assembly language instructions; Ghidra's Listing view shows disassembled code
Decompilation Lifting assembly code to pseudo-C representation for easier analysis; Ghidra's Decompile window provides this view
Cross-Reference (XREF) Reference showing where a function or data address is called from or used; essential for tracing code execution flow
Control Flow Graph (CFG) Visual representation of all possible execution paths through a function; reveals branching logic and loops
Original Entry Point (OEP) The actual start address of the malware code after unpacking; packers redirect execution through an unpacking stub first
Function Signature The return type, name, and parameter types of a function; applying correct signatures improves decompiler output quality
Ghidra Script Python or Java automation script executed within Ghidra to perform batch analysis, pattern searching, or data extraction

Tools & Systems

  • Ghidra: NSA's open-source software reverse engineering suite with disassembler, decompiler, and scripting support for multiple architectures
  • IDA Pro/Free: Industry-standard interactive disassembler; IDA Free provides x86/x64 cloud-based decompilation
  • Binary Ninja: Commercial reverse engineering platform with modern UI and extensive API for plugin development
  • x64dbg: Open-source x64/x32 debugger for Windows used alongside Ghidra for dynamic debugging of malware
  • FindCrypt (Ghidra Plugin): Plugin that identifies cryptographic constants and algorithms in binary code

Common Scenarios

Scenario: Reversing Custom C2 Protocol

Context: Behavioral analysis shows encrypted traffic to an external IP on a non-standard port. Network signatures cannot detect variants because the protocol is proprietary. Deep reverse engineering is needed to understand the protocol structure.

Approach:

  1. Import the unpacked sample into Ghidra and run full auto-analysis
  2. Locate socket/WinHTTP API calls and trace backwards to the calling function
  3. Identify the encryption routine called before data is sent (follow data flow from send/HttpSendRequest)
  4. Reverse the encryption (XOR key extraction, RC4 key derivation, AES key location)
  5. Map the command structure by analyzing the response parsing function (switch/case on command IDs)
  6. Document the protocol format (header structure, command bytes, encryption method)
  7. Create a protocol decoder script for network monitoring tools

Pitfalls:

  • Not running the full auto-analysis before starting manual analysis (missing function boundaries and type propagation)
  • Ignoring indirect calls through function pointers or vtables (use cross-references to data holding function addresses)
  • Spending time on library code that Ghidra's Function ID (FID) or FLIRT signatures should have identified
  • Not saving Ghidra project progress frequently (analysis state can be lost on crashes)

Output Format

REVERSE ENGINEERING ANALYSIS REPORT
=====================================
Sample:           unpacked_payload.exe
SHA-256:          abc123def456...
Architecture:     x86 (32-bit PE)
Ghidra Project:   MalwareX_Analysis

FUNCTION MAP
0x00401000  main()              - Entry point, initializes config
0x00401200  decrypt_config()    - XOR decryption with 16-byte key
0x00401400  init_c2()           - WinHTTP initialization, URL construction
0x00401800  c2_beacon()         - HTTP POST beacon with system info
0x00401C00  cmd_dispatcher()    - Switch on 12 command codes
0x00402000  inject_process()    - Process hollowing into svchost.exe
0x00402400  persist_registry()  - HKCU Run key persistence
0x00402800  exfil_data()        - File collection and encrypted upload

C2 PROTOCOL
Method:           HTTPS POST to /gate.php
Encryption:       RC4 with derived key (MD5 of bot_id + campaign_key)
Bot ID Format:    MD5(hostname + username + volume_serial)
Beacon Interval:  60 seconds with 10% jitter
Command Set:
  0x01 - Download and execute file
  0x02 - Execute shell command
  0x03 - Upload file to C2
  0x04 - Update configuration
  0x05 - Uninstall and remove traces

ENCRYPTION DETAILS
Algorithm:        RC4
Key Derivation:   MD5(bot_id + "campaign_2025_q3")
Hardcoded Seed:   "campaign_2025_q3" at offset 0x00405A00

EXTRACTED IOCs
C2 URLs:          hxxps://update.malicious[.]com/gate.php
                  hxxps://backup.evil[.]net/gate.php (failover)
Campaign ID:      campaign_2025_q3
RC4 Key Material: [see encryption details above]
Weekly Installs
1
GitHub Stars
1.3K
First Seen
2 days ago
Installed on
zencoder1
amp1
cline1
opencode1
cursor1
kimi-cli1