yara-authoring
YARA Authoring Skill
Overview
This skill implements Trail of Bits' YARA authoring methodology for the agent-studio framework. YARA-X is the Rust-based successor to legacy YARA, offering improved performance, safety, and new features. This skill teaches you to think and act like an expert YARA author, producing detection rules that are precise, efficient, and maintainable.
Source repository: https://github.com/trailofbits/skills
License: CC-BY-SA-4.0
Target: YARA-X (with legacy YARA compatibility guidance)
When to Use
- When creating detection rules for malware samples
- When building threat hunting rules for IOC identification
- When converting legacy YARA rules to YARA-X format
- When optimizing existing rules for performance and accuracy
- When reviewing YARA rules for quality and false positive rates
- When building rule sets for automated scanning pipelines
Iron Laws
- EVERY RULE MUST HAVE EFFICIENT ATOMS AND PASS LINTING: a rule without efficient atoms degrades scanner performance across the entire rule set; always run yr check and yr debug atoms before deployment.
- NEVER write rules without testing against both positive and negative samples: false positives on clean files are as harmful as missed detections; validate FP rate before deploying.
- ALWAYS include complete metadata (author, date, description, reference, hash): rules without metadata are unauditable and unmaintainable in enterprise rule sets.
- NEVER use single-byte atoms or patterns starting with common bytes (0x00, 0xFF, 0x90): these generate massive false positive rates and degrade the entire YARA scanning pipeline.
- ALWAYS use the YARA-X toolchain (yr) by default: legacy yara/yarac tooling lacks memory safety, performance optimizations, and modern module support; use YARA-X unless backward compatibility is explicitly required.
YARA-X vs Legacy YARA
Key Differences
| Feature | Legacy YARA | YARA-X |
|---|---|---|
| Language | C | Rust |
| Safety | Manual memory management | Memory-safe |
| Performance | Good | Better (parallelism) |
| Modules | PE, ELF, math, etc. | Same + new modules |
| Syntax | YARA syntax | Compatible + extensions |
| Toolchain | yara, yarac | yr CLI |
YARA-X CLI Commands
# Scan a file
yr scan rule.yar target_file
# Check rule syntax
yr check rule.yar
# View rule atoms (for efficiency analysis)
yr debug atoms rule.yar
# Format a rule
yr fmt rule.yar
Rule Structure
Standard Template
import "pe"
import "math"
rule MalwareFamily_Variant : tag1 tag2 {
    meta:
        author = "analyst-name"
        date = "2026-02-21"
        description = "Detects MalwareFamily variant based on [specific indicators]"
        reference = "https://example.com/analysis-report"
        hash = "sha256-of-sample"
        tlp = "WHITE"
        score = 75
    strings:
        // Unique byte sequences from the malware
        $hex_pattern1 = { 48 8B 05 ?? ?? ?? ?? 48 89 45 F0 }
        $hex_pattern2 = { E8 ?? ?? ?? ?? 85 C0 74 ?? }
        // String indicators
        $str_mutex = "Global\\MalwareMutex_v2" ascii wide
        $str_c2 = "https://evil.example.com/gate.php" ascii
        $str_useragent = "Mozilla/5.0 (compatible; MalBot/1.0)" ascii
        // Encoded/obfuscated patterns
        $b64_config = "aHR0cHM6Ly9ldmlsLmV4YW1wbGUuY29t" ascii // base64
    condition:
        uint16(0) == 0x5A4D and // MZ header (PE file)
        filesize < 5MB and
        (
            2 of ($hex_*) or
            ($str_mutex and 1 of ($str_c2, $str_useragent)) or
            $b64_config
        )
}
Metadata Fields (Required)
| Field | Purpose | Example |
|---|---|---|
| author | Who wrote the rule | "Trail of Bits" |
| date | When rule was created | "2026-02-21" |
| description | What the rule detects | "Detects XYZ malware loader" |
| reference | Source analysis/report | "https://..." |
| hash | Sample hash for validation | "sha256:abc123..." |
| tlp | Traffic Light Protocol | "WHITE", "GREEN", "AMBER", "RED" |
| score | Confidence (0-100) | 75 |
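The required fields in the table can be checked mechanically before a rule goes to review. The following is a minimal sketch in POSIX shell and is not part of the YARA-X toolchain: the helper name `missing_meta` is hypothetical, and the check is a plain text match for `field =` lines anywhere in the file, not a real YARA parse, so it cannot tell which rule a field belongs to when a file holds several rules.

```shell
#!/bin/sh
# Required metadata fields, taken from the table above.
REQUIRED_META="author date description reference hash tlp score"

# missing_meta FILE
# Prints, one per line, each required field with no `field = ...` line in FILE.
# Hypothetical helper: a text match only, it does not parse YARA syntax.
missing_meta() {
    rule_file=$1
    for field in $REQUIRED_META; do
        if ! grep -Eq "^[[:space:]]*${field}[[:space:]]*=" "$rule_file"; then
            echo "$field"
        fi
    done
}
```

An empty result means every field name appears at least once; any printed name is a field to add before deployment.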
String Pattern Best Practices
Hex Patterns
// GOOD: Specific bytes with targeted wildcards
$good = { 48 8B 05 ?? ?? ?? ?? 48 89 45 F0 }
// BAD: Too many wildcards (poor atoms)
$bad = { ?? ?? ?? ?? 48 ?? ?? ?? ?? ?? }
// GOOD: Use jumps for variable-length gaps
$jump = { 48 8B 05 [4-8] 48 89 45 }
// GOOD: Use alternations for variant bytes
$alt = { 48 (8B | 89) 05 ?? ?? ?? ?? }
Text Strings
// Case-insensitive matching
$str1 = "CreateRemoteThread" ascii nocase
// Wide strings (UTF-16)
$str2 = "cmd.exe" ascii wide
// Full-word matching (avoid substring false positives)
$str3 = "evil" ascii fullword
Regular Expressions
// Use sparingly - regex is slower than literal strings
$re1 = /https?:\/\/[a-z0-9\-\.]+\.(xyz|top|club)\//
// Prefer hex patterns over regex for binary content
// WRONG: $re2 = /\x48\x8B\x05/
// RIGHT: $hex2 = { 48 8B 05 }
Atom Analysis
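Atoms are short fixed byte sequences that YARA extracts from each pattern and searches for first; the full pattern is verified only at positions where one of its atoms is found. Longer, rarer atoms produce fewer candidate positions to verify, so efficient atoms mean fast scanning.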
How to Check Atoms
# View atoms for a rule
yr debug atoms rule.yar
# Good output: unique 4+ byte atoms
# Atom: 48 89 45 F0 (from $hex_pattern1)
# Atom: CreateRemoteThread (from $str1)
# Bad output: short or common atoms
# Atom: 00 00 (too common, will match everything)
Atom Quality Guidelines
| Atom Length | Quality | Action |
|---|---|---|
| 1-2 bytes | Poor | Rewrite pattern with more specific bytes |
| 3 bytes | Acceptable | Consider extending if possible |
| 4+ bytes | Good | Ideal for efficient scanning |
| Common bytes (00, FF, 90) | Poor | Avoid patterns starting with common bytes |
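The table can be applied as a simple decision rule. The sketch below is a hypothetical helper, `atom_quality`, that classifies one atom written as space-separated hex bytes; that input format is an assumption made for illustration and is not the output format of yr debug atoms, which this document does not specify. It only encodes the length thresholds and the common-first-byte rule from the table.

```shell
#!/bin/sh
# atom_quality "48 8B 05"
# Classifies one atom (space-separated hex bytes, fixed bytes only) using the
# thresholds from the table above. Hypothetical helper for illustration.
atom_quality() {
    atom=$(printf '%s' "$1" | tr 'a-f' 'A-F')
    set -- $atom                      # one positional parameter per byte
    case "$1" in
        00|FF|90) echo "poor"; return 0 ;;   # starts with a common byte
    esac
    if [ "$#" -le 2 ]; then
        echo "poor"
    elif [ "$#" -eq 3 ]; then
        echo "acceptable"
    else
        echo "good"
    fi
}
```

For example, `atom_quality "48 89 45 F0"` reports a good atom, while `atom_quality "90 90 90 90"` is rated poor despite its length because it starts with a common byte.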
Condition Logic
Performance-Ordered Conditions
Place cheap checks first to enable short-circuit evaluation:
condition:
    // 1. File type check (instant)
    uint16(0) == 0x5A4D and
    // 2. File size check (instant)
    filesize < 10MB and
    // 3. Simple string matches (fast)
    $str_mutex and
    // 4. Complex conditions (slower)
    2 of ($hex_*) and
    // 5. Module calls (slowest)
    pe.imports("kernel32.dll", "VirtualAllocEx")
Common Condition Patterns
// At least N of a set
2 of ($indicator_*)
// All of a set
all of ($required_*)
// Any of a set
any of ($optional_*)
// String at specific offset
$mz at 0
// String in specific range
$header in (0..1024)
// Count-based
#suspicious_call > 5
Rule Categories
Category 1: Malware Family Detection
Targets specific malware families with high-confidence indicators.
rule APT_Backdoor_SilentMoon {
    meta:
        description = "Detects SilentMoon backdoor used by APT group"
        score = 90
    strings:
        $config_marker = { 53 4D 43 46 47 } // "SMCFG"
        $decrypt_routine = { 31 C0 8A 04 08 34 ?? 88 04 08 41 }
    condition:
        uint16(0) == 0x5A4D and
        $config_marker and
        $decrypt_routine
}
Category 2: Technique Detection
Targets specific attack techniques regardless of malware family.
rule TECHNIQUE_ProcessHollowing {
    meta:
        description = "Detects process hollowing technique indicators"
        score = 60
    strings:
        $api1 = "NtUnmapViewOfSection" ascii
        $api2 = "WriteProcessMemory" ascii
        $api3 = "SetThreadContext" ascii
        $api4 = "ResumeThread" ascii
    condition:
        uint16(0) == 0x5A4D and
        3 of ($api*)
}
Category 3: Packer/Obfuscator Detection
Identifies packed or obfuscated executables.
rule PACKER_UPX {
    meta:
        description = "Detects UPX packed executables"
        score = 30
    strings:
        $upx0 = "UPX0" ascii
        $upx1 = "UPX1" ascii
        $upx2 = "UPX!" ascii
    condition:
        uint16(0) == 0x5A4D and
        2 of ($upx*)
}
Common Pitfalls
- Over-broad rules: Too many wildcards = too many false positives. Be specific.
- Under-tested rules: Always test against known-clean files to measure FP rate.
- Missing metadata: Rules without metadata are unmaintainable. Always include all required fields.
- Ignoring atoms: A rule with poor atoms slows down the entire scanning pipeline.
- Hardcoded offsets: Use in (range) instead of exact offsets when possible; variants shift bytes.
- Legacy syntax: Use YARA-X features and the yr toolchain, not legacy yara/yarac.
Linting Checklist
Before deploying any rule:
- Rule compiles without errors: yr check rule.yar
- Rule has efficient atoms: yr debug atoms rule.yar
- All required metadata fields present
- Tested against target sample (true positive confirmed)
- Tested against clean file corpus (false positive rate acceptable)
- Condition logic is performance-ordered (cheap checks first)
- No overly broad wildcard patterns
- Rule follows naming convention: CATEGORY_FamilyName_Variant
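The mechanical items on this checklist can be chained into one gate. The sketch below is an illustration under stated assumptions: the function names `bad_rule_names` and `lint_rule` are hypothetical, the yr subcommands are used exactly as listed in this document and are assumed to exit non-zero on failure, and the naming regex (an uppercase category followed by one or more underscore-separated parts) is one reading of CATEGORY_FamilyName_Variant chosen so that names such as PACKER_UPX pass.

```shell
#!/bin/sh
# bad_rule_names FILE
# Prints rule identifiers that do not look like CATEGORY_Name[_Variant].
# The regex is an assumed interpretation of the naming convention.
bad_rule_names() {
    sed -n -E 's/^[[:space:]]*((private|global)[[:space:]]+)*rule[[:space:]]+([A-Za-z_][A-Za-z0-9_]*).*/\3/p' "$1" |
        grep -Ev '^[A-Z][A-Z0-9]*(_[A-Za-z0-9]+)+$'
}

# lint_rule FILE
# Returns 0 only if the rule compiles, atom analysis runs, and names conform.
lint_rule() {
    rule_file=$1
    yr check "$rule_file" || return 1          # checklist: compiles without errors
    yr debug atoms "$rule_file" || return 1    # checklist: review the printed atoms
    bad=$(bad_rule_names "$rule_file" || true) # grep exits 1 when nothing is bad
    if [ -n "$bad" ]; then
        echo "naming convention violations: $bad" >&2
        return 1
    fi
    return 0
}
```

Calling `lint_rule rule.yar` covers only the automatable items; the atom output still needs a human read, and sample testing and condition ordering remain manual checks.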
Integration with Agent-Studio
Recommended Workflow
- Analyze malware sample with binary-analysis-patterns or memory-forensics
- Extract indicators and patterns
- Use yara-authoring to create detection rules
- Lint and atom-analyze rules
- Test rules against known samples and clean corpus
- Use variant-analysis to find similar samples for rule tuning
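The testing step in this workflow comes down to counting hits in two directories: known samples, where every file should match, and a clean corpus, where none should. A minimal sketch follows, with loud assumptions: `count_hits` is a hypothetical helper, the directory names in the usage comment are placeholders, and the match detection assumes yr scan prints one line per match that begins with the rule identifier followed by a space. Verify that against the output of your yr version before relying on the numbers.

```shell
#!/bin/sh
# count_hits RULE_FILE RULE_NAME DIR
# Prints how many regular files directly under DIR trigger RULE_NAME.
# Assumption: `yr scan` prints "<rule identifier> <path>" for each match.
count_hits() {
    rule_file=$1
    rule_name=$2
    dir=$3
    hits=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        if yr scan "$rule_file" "$f" 2>/dev/null | grep -q "^$rule_name "; then
            hits=$((hits + 1))
        fi
    done
    echo "$hits"
}

# Usage (placeholder paths):
#   tp=$(count_hits rule.yar PACKER_UPX samples/known)   # expect: every file
#   fp=$(count_hits rule.yar PACKER_UPX samples/clean)   # expect: 0
```

Scanning one file per invocation keeps the sketch simple at the cost of recompiling the rule each time; for a large clean corpus, a single directory scan is likely to be much faster.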
Complementary Skills
| Skill | Relationship |
|---|---|
| binary-analysis-patterns | Extract indicators from malware for rule authoring |
| memory-forensics | Extract memory artifacts for memory-scanning rules |
| variant-analysis | Find malware variants to tune rule coverage |
| static-analysis | Automated analysis to complement YARA detection |
| protocol-reverse-engineering | Extract network signatures for YARA rules |
Anti-Patterns
| Anti-Pattern | Why It Fails | Correct Approach |
|---|---|---|
| Over-broad wildcards (?? ?? ?? ??) | Poor atoms cause rule to run against every file byte; massive performance degradation | Use at least 4 consecutive fixed bytes; scope wildcards to specific positions |
| Skipping atom analysis | Invisible performance sink; rule may have 1-byte atoms causing false positives | Always run yr debug atoms rule.yar before deployment |
| Missing metadata fields | Rules become unauditable; cannot trace origin, sample, or analyst | Always include: author, date, description, reference, hash, tlp, score |
| Conditions before file type checks | Expensive string matching runs on non-matching file types | Place uint16(0) == 0x5A4D (or equivalent) first in every condition |
| Using nocase on short strings | Short case-insensitive patterns match everywhere in arbitrary data | Reserve nocase for strings >= 8 bytes; use exact case for shorter patterns |
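The last anti-pattern in the table is easy to catch with a text scan. The sketch below is a hypothetical helper, `short_nocase`, that lists string identifiers combining nocase with a text literal shorter than 8 bytes. It is line-based and makes simplifying assumptions: one string definition per line, no escaped quotes inside the literal, and ASCII content so that character count equals byte count.

```shell
#!/bin/sh
# short_nocase FILE
# Prints identifiers of text strings that use `nocase` on a literal shorter
# than 8 bytes. Hypothetical, line-based check; it does not parse YARA.
short_nocase() {
    awk '
        /^[ \t]*\$[A-Za-z0-9_]*[ \t]*=[ \t]*"/ {
            rest = substr($0, index($0, "\"") + 1)  # text after the opening quote
            close_at = index(rest, "\"")            # position of the closing quote
            if (close_at == 0) next
            literal = substr(rest, 1, close_at - 1)
            modifiers = substr(rest, close_at + 1)
            sub(/\/\/.*/, "", modifiers)            # ignore a trailing // comment
            if (modifiers ~ /(^|[ \t])nocase([ \t]|$)/ && length(literal) < 8)
                print $1
        }
    ' "$1"
}
```

Each identifier printed is a candidate for either dropping nocase or extending the literal with surrounding context.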
Memory Protocol
Before starting: Check for existing YARA rules in the project for naming conventions and pattern reuse.
During authoring: Write rules incrementally, testing each against the target sample. Document atom analysis results.
After completion: Record effective patterns, atom quality metrics, and false positive rates to .claude/context/memory/learnings.md for improving future rule authoring.
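Recording results is easier to keep up when it is a single command. The following sketch appends one dated entry to the learnings file named above. The function name `record_learning`, the `LEARNINGS_FILE` override, and the entry layout (a heading plus three bullets) are all assumptions for illustration; the protocol only specifies the file path and what to record.

```shell
#!/bin/sh
# record_learning RULE_NAME PATTERNS ATOM_NOTES FP_RATE
# Appends one dated entry to the learnings file. The entry layout is assumed.
record_learning() {
    rule_name=$1
    patterns=$2
    atom_notes=$3
    fp_rate=$4
    file=${LEARNINGS_FILE:-.claude/context/memory/learnings.md}
    mkdir -p "$(dirname "$file")"
    {
        printf '\n## %s (%s)\n' "$rule_name" "$(date +%Y-%m-%d)"
        printf -- '- Effective patterns: %s\n' "$patterns"
        printf -- '- Atom quality: %s\n' "$atom_notes"
        printf -- '- False positive rate: %s\n' "$fp_rate"
    } >> "$file"
}
```

Because entries are only ever appended, earlier results stay available for comparison when a rule is retuned.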