extracting-iocs-from-malware-samples
Extracting IOCs from Malware Samples
When to Use
- A malware analysis (static or dynamic) is complete and actionable indicators need to be extracted for defense teams
- Building blocklists for firewalls, proxies, and DNS sinkholes from analyzed samples
- Creating YARA rules, Snort/Suricata signatures, or SIEM detection content from malware artifacts
- Contributing to threat intelligence sharing platforms (MISP, OTX, ThreatConnect)
- Tracking malware campaigns by correlating IOCs across multiple samples
Do not use for IOCs from unverified sources without validation; false positives in blocklists can disrupt legitimate business operations.
Prerequisites
- Python 3.8+ with
iocextract,pefile,yara-pythonlibraries installed - Completed malware analysis report (static analysis, dynamic analysis, or reverse engineering)
- Access to PCAP files, memory dumps, or sandbox reports from the analysis
- MISP instance or STIX/TAXII server for structured IOC sharing
- VirusTotal API key for IOC enrichment and validation
- CyberChef for decoding obfuscated indicators
Workflow
Step 1: Extract File-Based IOCs
Compute hashes and identify file metadata indicators:
# Generate all standard hashes
md5sum malware_sample.exe
sha1sum malware_sample.exe
sha256sum malware_sample.exe
# Generate ssdeep fuzzy hash for similarity matching
ssdeep malware_sample.exe
# Generate imphash (import hash) for PE files
python3 -c "
import pefile
pe = pefile.PE('malware_sample.exe')
print(f'Imphash: {pe.get_imphash()}')
"
# Generate TLSH (Trend Micro Locality Sensitive Hash)
python3 -c "
import tlsh
with open('malware_sample.exe', 'rb') as f:
h = tlsh.hash(f.read())
print(f'TLSH: {h}')
"
# Compile file metadata IOCs
python3 << 'PYEOF'
import pefile
import os
import hashlib
import datetime
pe = pefile.PE("malware_sample.exe")
print("FILE IOCs:")
with open("malware_sample.exe", "rb") as f:
data = f.read()
print(f" MD5: {hashlib.md5(data).hexdigest()}")
print(f" SHA-1: {hashlib.sha1(data).hexdigest()}")
print(f" SHA-256: {hashlib.sha256(data).hexdigest()}")
print(f" File Size: {len(data)} bytes")
ts = pe.FILE_HEADER.TimeDateStamp
print(f" Compile: {datetime.datetime.utcfromtimestamp(ts)} UTC")
print(f" Imphash: {pe.get_imphash()}")
PYEOF
Step 2: Extract Network IOCs
Pull network indicators from strings, PCAP, and sandbox reports:
# Extract network IOCs from strings
import re
with open("malware_sample.exe", "rb") as f:
data = f.read()
# Extract ASCII and Unicode strings
ascii_strings = re.findall(b'[ -~]{4,}', data)
unicode_strings = re.findall(b'(?:[ -~]\x00){4,}', data)
all_strings = [s.decode('ascii', errors='ignore') for s in ascii_strings]
all_strings += [s.decode('utf-16-le', errors='ignore') for s in unicode_strings]
# IP addresses (excluding private ranges for C2 indicators)
ip_pattern = re.compile(r'\b(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\b')
ips = set()
for s in all_strings:
for ip in ip_pattern.findall(s):
# Filter out private/reserved ranges
octets = [int(o) for o in ip.split('.')]
if octets[0] not in [10, 127, 0] and not (octets[0] == 172 and 16 <= octets[1] <= 31) and not (octets[0] == 192 and octets[1] == 168):
ips.add(ip)
# Domain names
domain_pattern = re.compile(r'\b[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z]{2,})+\b')
domains = set()
for s in all_strings:
for d in domain_pattern.findall(s):
if not d.endswith(('.dll', '.exe', '.sys', '.com.au')):
domains.add(d)
# URLs
url_pattern = re.compile(r'https?://[^\s<>"{}|\\^`\[\]]+')
urls = set()
for s in all_strings:
for u in url_pattern.findall(s):
urls.add(u)
print("NETWORK IOCs:")
print(f" IPs: {ips}")
print(f" Domains: {domains}")
print(f" URLs: {urls}")
Step 3: Extract Host-Based IOCs
Identify file paths, registry keys, mutexes, and services:
# Extract host-based IOCs from sandbox report
import json
with open("cuckoo_report.json") as f:
report = json.load(f)
print("HOST IOCs:")
# File paths created or modified
print("\nFile Paths:")
for f in report["behavior"]["summary"].get("files", []):
if any(p in f.lower() for p in ["temp", "appdata", "system32", "programdata"]):
print(f" [DROPPED] {f}")
# Registry keys for persistence
print("\nRegistry Keys:")
for key in report["behavior"]["summary"].get("write_keys", []):
if any(p in key.lower() for p in ["run", "service", "startup", "shell"]):
print(f" [PERSIST] {key}")
# Mutexes (unique to malware family)
print("\nMutexes:")
for mutex in report["behavior"]["summary"].get("mutexes", []):
if mutex not in ["Local\\!IETld!Mutex", "RasPbFile"]: # Filter known Windows mutexes
print(f" [MUTEX] {mutex}")
# Created services
print("\nServices:")
for svc in report["behavior"]["summary"].get("started_services", []):
print(f" [SERVICE] {svc}")
Step 4: Extract Network IOCs from PCAP
Parse network captures for additional indicators:
# Extract DNS queries from PCAP
tshark -r capture.pcap -T fields -e dns.qry.name -Y "dns.flags.response == 0" | sort -u
# Extract HTTP hosts and URLs
tshark -r capture.pcap -T fields -e http.host -e http.request.uri -Y "http.request" | sort -u
# Extract TLS server names (SNI)
tshark -r capture.pcap -T fields -e tls.handshake.extensions_server_name -Y "tls.handshake.type == 1" | sort -u
# Extract JA3 hashes
tshark -r capture.pcap -T fields -e tls.handshake.ja3 -Y "tls.handshake.type == 1" | sort -u
# Extract unique destination IPs
tshark -r capture.pcap -T fields -e ip.dst -Y "ip.src == 10.0.2.15" | sort -u
# Extract User-Agent strings
tshark -r capture.pcap -T fields -e http.user_agent -Y "http.user_agent" | sort -u
Step 5: Defang and Validate IOCs
Defang indicators for safe sharing and validate against threat intelligence:
# Defang IOCs for safe sharing
def defang_ip(ip):
return ip.replace(".", "[.]")
def defang_url(url):
return url.replace("http", "hxxp").replace(".", "[.]")
def defang_domain(domain):
return domain.replace(".", "[.]")
# Validate IOCs against VirusTotal
import requests
VT_API_KEY = "your_api_key"
def check_vt_ip(ip):
resp = requests.get(f"https://www.virustotal.com/api/v3/ip_addresses/{ip}",
headers={"x-apikey": VT_API_KEY})
data = resp.json()
stats = data["data"]["attributes"]["last_analysis_stats"]
return stats["malicious"]
def check_vt_domain(domain):
resp = requests.get(f"https://www.virustotal.com/api/v3/domains/{domain}",
headers={"x-apikey": VT_API_KEY})
data = resp.json()
stats = data["data"]["attributes"]["last_analysis_stats"]
return stats["malicious"]
# Validate each IOC
for ip in ips:
detections = check_vt_ip(ip)
print(f" {defang_ip(ip)} - VT: {detections} detections")
Step 6: Export IOCs in Standard Formats
Generate structured IOC outputs for sharing and ingestion:
# Export as STIX 2.1 bundle
from stix2 import Indicator, Bundle, Malware, Relationship
import datetime
indicators = []
# File hash indicator
indicators.append(Indicator(
name="Malware SHA-256 Hash",
pattern=f"[file:hashes.'SHA-256' = '{sha256_hash}']",
pattern_type="stix",
valid_from=datetime.datetime.now(datetime.timezone.utc),
labels=["malicious-activity"]
))
# IP indicator
for ip in ips:
indicators.append(Indicator(
name=f"C2 IP Address {ip}",
pattern=f"[ipv4-addr:value = '{ip}']",
pattern_type="stix",
valid_from=datetime.datetime.now(datetime.timezone.utc),
labels=["malicious-activity"]
))
# Domain indicator
for domain in domains:
indicators.append(Indicator(
name=f"C2 Domain {domain}",
pattern=f"[domain-name:value = '{domain}']",
pattern_type="stix",
valid_from=datetime.datetime.now(datetime.timezone.utc),
labels=["malicious-activity"]
))
bundle = Bundle(objects=indicators)
with open("iocs_stix.json", "w") as f:
f.write(bundle.serialize(pretty=True))
# Export as CSV for SIEM ingestion
import csv
with open("iocs.csv", "w", newline="") as f:
writer = csv.writer(f)
writer.writerow(["type", "value", "context", "confidence"])
writer.writerow(["sha256", sha256_hash, "malware_sample", "high"])
for ip in ips:
writer.writerow(["ipv4", ip, "c2_server", "high"])
for domain in domains:
writer.writerow(["domain", domain, "c2_domain", "high"])
for url in urls:
writer.writerow(["url", url, "c2_url", "high"])
Key Concepts
| Term | Definition |
|---|---|
| IOC (Indicator of Compromise) | Forensic artifact observed in a network or system that indicates a potential intrusion: hashes, IPs, domains, file paths, registry keys |
| Defanging | Modifying IOCs to prevent accidental activation (e.g., replacing dots with [.] in URLs and IPs for safe sharing in reports) |
| Imphash | MD5 hash of the import table functions in a PE file; samples from the same malware family often share the same imphash |
| STIX/TAXII | Structured Threat Information Expression / Trusted Automated Exchange; standards for encoding and transmitting threat intelligence |
| JA3/JA3S | TLS client/server fingerprint based on ClientHello/ServerHello parameters; identifies specific malware families by their TLS implementation |
| Fuzzy Hashing (ssdeep) | Context-triggered piecewise hashing that identifies similar files even with minor modifications; useful for malware variant detection |
| MISP | Malware Information Sharing Platform; open-source threat intelligence platform for collecting, storing, and sharing IOCs |
Tools & Systems
- iocextract (Python): Automated IOC extraction library supporting IPs, URLs, domains, hashes, and YARA rules from text
- MISP: Open-source threat intelligence sharing platform for structured IOC management and distribution
- CyberChef: Web-based tool for decoding, decrypting, and transforming data useful for deobfuscating encoded IOCs
- tshark: Command-line network protocol analyzer for extracting network IOCs from PCAP files
- VirusTotal: Online service for validating and enriching IOCs with community detection results and threat intelligence
Common Scenarios
Scenario: Building a Comprehensive IOC Package from a Ransomware Sample
Context: A ransomware incident requires rapid IOC extraction for blocking across the enterprise while the full investigation continues. Multiple data sources are available: the sample binary, PCAP from network monitoring, and a Cuckoo sandbox report.
Approach:
- Compute all file hashes (MD5, SHA-1, SHA-256, imphash, ssdeep) for the ransomware binary and any dropped files
- Extract network IOCs from strings in the binary (hardcoded C2 addresses)
- Parse the PCAP for DNS queries, HTTP requests, and TLS SNI fields
- Extract host IOCs from the sandbox report (file paths, registry keys, mutexes, ransom note filenames)
- Validate all network IOCs against VirusTotal to confirm malicious status and check for known associations
- Defang all indicators and compile into STIX 2.1 format for sharing and CSV for SIEM ingestion
- Submit to MISP event for organizational and community sharing
Pitfalls:
- Including IP addresses of legitimate CDNs or cloud services without validating context (e.g., AWS IPs used for hosting, not inherently malicious)
- Not defanging URLs and IPs in reports, leading to accidental clicks or DNS resolution
- Extracting strings from packed binaries (IOCs from packed samples are unreliable; unpack first)
- Forgetting to include dropped file hashes (the initial dropper and the final payload are separate IOCs)
Output Format
IOC EXTRACTION REPORT
======================
Sample: ransomware.exe
Analysis Date: 2025-09-15
Analyst: [Name]
FILE INDICATORS
SHA-256: e3b0c44298fc1c149afbf4c8996fb924...
SHA-1: da39a3ee5e6b4b0d3255bfef95601890afd80709
MD5: d41d8cd98f00b204e9800998ecf8427e
Imphash: a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6
ssdeep: 3072:kJh3bN7fY+aUkJh3bN7fY+aU:kJh3R7aUkJh3R7aU
NETWORK INDICATORS
C2 IPs: 185.220.101[.]42, 91.215.85[.]17
C2 Domains: update.malicious[.]com, backup.evil[.]net
C2 URLs: hxxps://update.malicious[.]com/gate.php
hxxps://backup.evil[.]net/gate.php
JA3 Hash: a0e9f5d64349fb13191bc781f81f42e1
User-Agent: Mozilla/5.0 (compatible; MSIE 10.0)
HOST INDICATORS
File Paths: C:\Users\Public\svchost.exe
C:\Users\%USER%\AppData\Local\Temp\payload.dll
C:\Users\%USER%\Desktop\README_DECRYPT.txt
Registry Keys: HKCU\Software\Microsoft\Windows\CurrentVersion\Run\WindowsUpdate
Mutexes: Global\CryptLocker_2025_Q3
Services: FakeWindowsUpdate
CONFIDENCE ASSESSMENT
High Confidence: SHA-256, C2 IPs (validated via VT), Mutexes
Medium Confidence: Domains (could be compromised legitimate sites)
Low Confidence: User-Agent (common string, high false positive risk)
EXPORT FILES
stix_bundle.json - STIX 2.1 format for TIP ingestion
iocs.csv - Flat CSV for SIEM blocklist import
yara_rule.yar - YARA detection rule