LLVM IR and Tooling
Purpose
Guide agents through the LLVM IR pipeline: generating IR, running optimisation passes with opt, lowering to assembly with llc, and inspecting IR for debugging or performance work.
Triggers
- "Show me the LLVM IR for this function"
- "How do I run an LLVM optimisation pass?"
- "What does this LLVM IR instruction mean?"
- "How do I write a custom LLVM pass?"
- "Why isn't auto-vectorisation happening in LLVM?"
Workflow
1. Generate LLVM IR
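# Emit textual IR (.ll); -disable-O0-optnone stops clang marking functions optnone, which would make later opt passes skip them
clang -O0 -Xclang -disable-O0-optnone -emit-llvm -S src.c -o src.ll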
# Emit bitcode (.bc)
clang -O2 -emit-llvm -c src.c -o src.bc
# Disassemble bitcode to text
llvm-dis src.bc -o src.ll
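The commands above assume a `src.c` exists. As a minimal sketch, the following hypothetical file gives the pipeline something small to work on: a local accumulator and a loop. The function name `sum_to` is purely illustrative, and the IR described in the comment is what Clang typically emits, not a guarantee for every version.

```c
/* src.c: hypothetical input for the commands above. */

/* Sum the integers 1..n (returns 0 for n <= 0).
 * At -O0, `total` and `i` typically appear in the IR as alloca stack
 * slots accessed via load/store; once mem2reg has run they become SSA
 * values merged by phi nodes at the loop header. */
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; ++i)
        total += i;
    return total;
}
```

The file defines no `main`, so compile it with `-S` or `-c` as in the commands above rather than linking it.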
2. Run optimisation passes with opt
# Apply specific passes, in the order listed
opt -passes='mem2reg,instcombine,simplifycfg' src.ll -S -o out.ll
# Standard optimisation pipelines
opt -passes='default<O2>' src.ll -S -o out.ll
opt -passes='default<O3>' src.ll -S -o out.ll
# List available passes
opt --print-passes 2>&1 | less
# Print IR before and after a pass
opt -passes='instcombine' --print-before=instcombine --print-after=instcombine src.ll -S -o out.ll 2>&1 | less
3. Lower IR to assembly with llc
# Compile IR to object file
llc -filetype=obj src.ll -o src.o
# Compile to assembly (Intel syntax; x86 targets only)
llc -filetype=asm --x86-asm-syntax=intel src.ll -o src.s
# Target a specific CPU
llc -mcpu=skylake -mattr=+avx2 src.ll -o src.s
# Show available targets
llc --version
4. Inspect IR
Key IR constructs to understand:
| Construct | Meaning |
|---|---|
| `alloca` | Stack allocation (pre-SSA; `mem2reg` promotes to registers) |
| `load`/`store` | Memory access |
| `getelementptr` (GEP) | Pointer arithmetic / field access |
| `phi` | SSA φ-node: merges values from predecessor blocks |
| `call`/`invoke` | Function call (`invoke` has exception edges) |
| `icmp`/`fcmp` | Integer/float comparison |
| `br` | Branch (conditional or unconditional) |
| `ret` | Return |
| `bitcast` | Reinterpret bits (no-op in codegen) |
| `ptrtoint`/`inttoptr` | Pointer↔integer (avoid where possible) |
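To connect the table to source code, here is a small C sketch whose constructs usually map onto several of the entries above. The names (`struct Point`, `get_y`, `clamp_low`) are hypothetical, and the IR noted in the comments is what Clang typically emits; exact output varies with the LLVM version and optimisation level.

```c
/* Hypothetical examples; the IR named in comments is indicative only. */

struct Point {
    int x;
    int y;
};

/* Field access through a pointer: typically a getelementptr that
 * computes the address of `y`, followed by a load. */
int get_y(const struct Point *p)
{
    return p->y;
}

/* Two paths assign the same variable: the comparison becomes an icmp,
 * the if/else a conditional br, and after mem2reg the value of `r` at
 * the return is a phi (later passes may fold it into a select). */
int clamp_low(int v, int lo)
{
    int r;
    if (v < lo)
        r = lo;
    else
        r = v;
    return r;
}
```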
5. Key passes
| Pass | Effect |
|---|---|
| `mem2reg` | Promote `alloca` to SSA registers |
| `instcombine` | Instruction combining / peephole |
| `simplifycfg` | CFG cleanup, dead block removal |
| `loop-vectorize` | Auto-vectorisation |
| `slp-vectorize` | Superword-level parallelism (straight-line vectorisation) |
| `inline` | Function inlining |
| `gvn` | Global value numbering (common subexpression elimination) |
| `licm` | Loop-invariant code motion |
| `loop-unroll` | Loop unrolling |
| `argpromotion` | Promote pointer args to values |
| `sroa` | Scalar Replacement of Aggregates |
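As an illustration of what two of these passes look for, the sketch below shows a pair of hypothetical C functions. Whether a given pass fires depends on the LLVM version and the surrounding pipeline, so treat the comments as expectations to check with opt rather than guarantees.

```c
/* Hypothetical inputs for experimenting with individual passes. */

/* `a + b` does not change inside the loop, so licm can hoist it out of
 * the loop body once the function is in SSA form (run mem2reg or sroa
 * first, otherwise the operands are still loads from stack slots). */
void scale(int *out, const int *in, int n, int a, int b)
{
    for (int i = 0; i < n; ++i)
        out[i] = in[i] * (a + b);
}

/* `x * y` is written twice; gvn (or a similar redundancy-elimination
 * pass) can compute it once and reuse the result. */
int twice_product(int x, int y)
{
    return (x * y) + (x * y);
}
```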
6. Debugging missed optimisations
# Why was a loop not vectorised?
clang -O2 -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize src.c
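# Dump the middle-end pass pipeline (new pass manager)
opt -passes='default<O2>' -print-pipeline-passes -disable-output src.ll
# Dump the codegen pass structure (still run by the legacy pass manager)
clang -O2 -c -mllvm -debug-pass=Structure src.c -o /dev/null 2>&1 | less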
# Print IR after each pass (very verbose)
opt -passes='default<O2>' -print-after-all src.ll -S 2>&1 | less
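A common case these remarks explain is a floating-point reduction. In the hypothetical sketch below, the integer loop is usually vectorised at -O2, while the float loop is usually reported as missed: vectorising it would reorder the additions, which changes rounding unless relaxed floating-point semantics (for example `-ffast-math`) are enabled. The function names are illustrative, and the exact remark text depends on the Clang version and target.

```c
/* Hypothetical pair of reductions for comparing vectoriser remarks. */

/* Integer reduction: the additions can be regrouped freely, so the
 * loop vectoriser can usually split the sum across vector lanes. */
int sum_int(const int *a, int n)
{
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Float reduction: strict IEEE semantics fix the order of additions,
 * so without relaxed FP flags this loop is typically left scalar and
 * shows up under -Rpass-missed=loop-vectorize. */
float sum_float(const float *a, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
}
```

The file defines no `main`, so add `-c` to the clang remark command when trying it on this example.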
7. Useful llvm tools
| Tool | Purpose |
|---|---|
| `llvm-dis` | Bitcode → textual IR |
| `llvm-as` | Textual IR → bitcode |
| `llvm-link` | Link multiple bitcode files |
| `llvm-lto` | Standalone LTO |
| `llvm-nm` | Symbols in bitcode/object |
| `llvm-objdump` | Disassemble objects |
| `llvm-profdata` | Merge/show PGO profiles |
| `llvm-cov` | Coverage reporting |
| `llvm-mca` | Machine code analyser (throughput/latency) |
For binutils equivalents, see skills/binaries/binutils.
Related skills
- Use `skills/compilers/clang` for source-level Clang flags
- Use `skills/binaries/linkers-lto` for LTO at link time
- Use `skills/profilers/linux-perf` combined with `llvm-mca` for micro-architectural analysis
Repository
mohitmishra786/low-level-dev-skills