Linux perf
Purpose
Guide agents through perf for CPU profiling: sampling, hardware counter measurement, hotspot identification, and integration with flamegraph generation.
Triggers
- "Which function is consuming the most CPU?"
- "How do I measure cache misses / IPC?"
- "How do I use `perf` to find hotspots?"
- "How do I generate a flamegraph from perf data?"
- "perf shows `[unknown]` or `[kernel]` frames"
Workflow
1. Prerequisites
```bash
# Install
sudo apt install linux-perf                                  # Debian
sudo apt install linux-tools-common linux-tools-$(uname -r)  # Ubuntu (must match the running kernel)
sudo dnf install perf                                        # Fedora/RHEL

# Check permissions
# Unprivileged access is governed by kernel.perf_event_paranoid;
# kernel profiling needs root (or CAP_PERFMON) or a level <= 1
cat /proc/sys/kernel/perf_event_paranoid
# 2 = user space only, 1 = user + kernel, 0 = also system-wide (-a) CPU events, -1 = no restrictions
# Some distros ship 3 or higher, which blocks unprivileged use entirely

# Temporarily lower (until reboot)
sudo sysctl -w kernel.perf_event_paranoid=1

# Persistent
echo 'kernel.perf_event_paranoid=1' | sudo tee /etc/sysctl.d/99-perf.conf
sudo sysctl -p /etc/sysctl.d/99-perf.conf
```
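The paranoid levels above can be turned into a quick check before profiling. A minimal sketch, assuming Bash; `paranoid_meaning` is a hypothetical helper, and levels above 2 are treated as distro-specific lockdowns (Debian-style patches), which may not match every kernel.

```bash
#!/usr/bin/env bash
# Sketch: describe what an unprivileged user (no CAP_PERFMON / CAP_SYS_ADMIN)
# may profile at a given kernel.perf_event_paranoid level.
# Assumption: levels > 2 come from distro patches that block unprivileged perf.
paranoid_meaning() {
  local level=$1
  case "$level" in
    -1) echo "no restrictions" ;;
    0)  echo "user + kernel, including system-wide (-a) events; raw tracepoints still restricted" ;;
    1)  echo "user + kernel profiling of your own processes; no system-wide (-a)" ;;
    2)  echo "user-space only, your own processes" ;;
    *)
      # 10# forces base 10 so values like "08" are not parsed as octal
      if [[ $level =~ ^[0-9]+$ ]] && (( 10#$level > 2 )); then
        echo "unprivileged use blocked (distro-specific level)"
      else
        echo "unknown level: $level" >&2
        return 1
      fi
      ;;
  esac
}
```

Typical use: `paranoid_meaning "$(cat /proc/sys/kernel/perf_event_paranoid)"`.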
Compile the target with debug symbols (for symbol and source-line lookup) and frame pointers (for call stacks):
```bash
gcc -g -O2 -fno-omit-frame-pointer -o prog main.c
# -fno-omit-frame-pointer: needed for the default frame-pointer unwinding used by perf record -g
# Alternative: drop it and record with --call-graph=dwarf, which unwinds from DWARF CFI
```
2. perf stat — quick counters
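```bash
# Default counter set (task-clock, context switches, cycles, instructions, branches, ...)
perf stat ./prog
# Specific events (branches is included so the branch-miss rate can be computed)
perf stat -e cache-misses,cache-references,instructions,cycles,branches,branch-misses ./prog
# Repeat 5 runs; perf reports the mean and run-to-run variation
perf stat -r 5 ./prog
# Attach to an existing process for 10 seconds
perf stat -p 12345 sleep 10
```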
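Interpret perf stat output (rough rules of thumb, not hard thresholds):
- IPC (instructions / cycles) < 1.0: likely memory-bound or otherwise stalled pipeline
- cache-miss rate (cache-misses / cache-references) > 5%: significant cache pressure
- branch-miss rate (branch-misses / branches) > 5%: branch predictor struggling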
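perf stat already prints IPC and miss percentages next to the counters, but applying the rules of thumb above in a script is easier from CSV output. A sketch assuming Bash and awk, and that `perf stat -x, -o stat.csv ...` was run without `-I`, `-A` or `--per-*` options, so field 1 is the counter value and field 3 the event name. Hybrid-CPU names such as `cpu_core/cycles/` are not handled, and `perf_ratios` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: derive IPC and miss rates from `perf stat -x,` CSV output.
# Reads the files given as arguments, or stdin when none are given.
perf_ratios() {
  awk -F, '
    # Skip comment/blank lines and counters perf could not read (<not counted> etc.)
    /^#/ || NF < 3 || $1 !~ /^[0-9.]+$/ { next }
    {
      ev = $3
      sub(/:.*/, "", ev)            # drop modifiers such as :u or :k
      v[ev] = $1
    }
    END {
      if (v["cycles"] > 0)
        printf "IPC %.2f\n", v["instructions"] / v["cycles"]
      if (v["cache-references"] > 0)
        printf "cache-miss %.1f%%\n", 100 * v["cache-misses"] / v["cache-references"]
      # "branches" is an alias of "branch-instructions"; accept either name
      br = (("branches" in v) ? v["branches"] : v["branch-instructions"]) + 0
      if (br > 0)
        printf "branch-miss %.1f%%\n", 100 * v["branch-misses"] / br
    }' "$@"
}
```

Typical use: `perf stat -x, -o stat.csv -e cycles,instructions,cache-references,cache-misses,branches,branch-misses ./prog`, then `perf_ratios stat.csv`.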
3. perf record — sampling
```bash
# Default: sample the cycles event at 4000 Hz; -g records call stacks via frame pointers
perf record -g ./prog
# Specify frequency (999 rather than 1000 avoids sampling in lockstep with periodic work)
perf record -F 999 -g ./prog
# Specific event
perf record -e cache-misses -g ./prog
# Attach to a running process for 30 seconds
perf record -F 999 -g -p 12345 sleep 30
# Off-CPU view: trace context switches system-wide (needs root; counts switches, not time blocked)
perf record -e sched:sched_switch -ag sleep 10
# DWARF call graphs (for binaries built without frame pointers)
perf record -F 999 --call-graph=dwarf ./prog
# Save to a named file
perf record -o myapp.perf.data -g ./prog
```
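How many samples a run yields is simple arithmetic, and it explains why very short programs produce near-empty reports. A back-of-the-envelope sketch, assuming every profiled thread stays on-CPU for the whole run (blocked time produces no `cycles` samples) and that the kernel does not throttle the rate; `expected_samples` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Rough estimate: samples ≈ FREQ_HZ × on-CPU milliseconds / 1000 × busy threads.
# Integer arithmetic; assumes CPU-bound threads and no sample-rate throttling.
expected_samples() {
  local freq_hz=$1 runtime_ms=$2 threads=${3:-1}
  echo $(( freq_hz * runtime_ms * threads / 1000 ))
}
```

At `-F 4000`, a program that runs for 5 ms yields about 20 samples (`expected_samples 4000 5`), too few for a meaningful profile; a 30 s run at `-F 999` yields roughly 30,000.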
4. perf report — interactive analysis
```bash
perf report                         # reads ./perf.data
perf report -i myapp.perf.data
perf report --no-children           # self time only (not cumulative)
perf report --sort comm,dso,sym     # sort by fields
perf report --stdio                 # non-interactive text output
```
Navigation in the TUI:
- `Enter`: expand a symbol
- `a`: annotate (show assembly with hit counts)
- `s`: inside the annotate view, toggle source lines (needs debug info)
- `d`: filter by DSO (library)
- `t`: filter by thread
- `?`: help
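For scripting or quick summaries outside the TUI, the `--stdio` output can be trimmed to its top entries. A sketch assuming the report was generated with call-chain display disabled and sorted by symbol only, e.g. `perf report --stdio --no-children --sort sym -g none > report.txt`, so each data row is a percentage followed by `[.]` (user) or `[k]` (kernel) and the symbol name. Column layout differs with other `--sort` keys and across perf versions, and `top_symbols` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: print the N hottest rows of `perf report --stdio` text read from stdin.
# Relies on perf having already sorted rows by overhead (descending).
top_symbols() {
  local n=${1:-10}
  awk '
    /^#/ { next }                   # header and comment lines
    $1 ~ /^[0-9.]+%$/ {
      pct = $1
      $1 = ""                       # drop the percentage column
      sub(/^ +/, "")                # trim the separator left behind
      print pct, $0
    }' | head -n "$n"
}
```

Typical use: `top_symbols 5 < report.txt`.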
5. perf annotate — hot instructions
```bash
# Show assembly with hit percentages
perf annotate sym_name
# From perf report: press 'a' on a symbol
# Or directly, as plain text:
perf annotate -i perf.data --symbol=hot_function --stdio
```
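A high hit count on a load such as `mov` or `vmovdqa` often points to a cache miss on that load. Because of sampling skid, the cost can also land on the instruction just after the one that actually stalled.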
6. perf top — live profiling
```bash
# Live view, like top but per function
sudo perf top -g
# Restrict to one process
sudo perf top -p 12345
```
7. Feed into flamegraphs
```bash
# Convert the recorded samples (perf.data, recorded with -g) to text
perf script > out.perf
# Use Brendan Gregg's FlameGraph tools
git clone https://github.com/brendangregg/FlameGraph
./FlameGraph/stackcollapse-perf.pl out.perf > out.folded
./FlameGraph/flamegraph.pl out.folded > flamegraph.svg
# Open flamegraph.svg in a browser
```
See skills/profilers/flamegraphs for reading flamegraphs and interpreting results.
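The folded file is plain text: one unique stack per line, frames joined by `;`, then a space and that stack's sample count. Summing counts by the last frame gives self samples per function, a text cross-check of the widest top edges in the flamegraph. A sketch assuming Bash, awk and the `stackcollapse-perf.pl` output format; `leaf_totals` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: self (leaf) sample totals from folded stacks ("f1;f2;...;leaf COUNT").
# Reads the files given as arguments, or stdin; prints "COUNT function", largest first.
leaf_totals() {
  awk '
    NF < 2 { next }                    # skip blank or malformed lines
    {
      count = $NF
      stack = $0
      sub(/ [0-9]+$/, "", stack)       # strip the trailing " COUNT"
      n = split(stack, frames, ";")
      self[frames[n]] += count         # last frame = function on-CPU when sampled
    }
    END { for (f in self) print self[f], f }
  ' "$@" | sort -rn
}
```

Typical use: `leaf_totals out.folded | head`.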
8. Common issues
| Problem | Cause | Fix |
|---|---|---|
| `Permission denied` | `perf_event_paranoid` too high | Lower the paranoid level or run with `sudo` |
| `[unknown]` frames | Missing frame pointers or debug info | Recompile with `-fno-omit-frame-pointer` or use `--call-graph=dwarf` |
| `[kernel]` everywhere | Kernel symbols not visible | Use `sudo perf record`; install `linux-image-$(uname -r)-dbgsym` |
| No kallsyms | Kernel symbols unavailable | `echo 0 \| …` |
| Empty report for short program | Program exits too fast | Use `-F 9999` or profile a longer-running workload |
| DWARF unwinding slow | Large stack dump copied per sample | Limit with `--call-graph dwarf,512` (smaller dumps truncate deep stacks) |
9. Useful events
```
# List all available events
perf list

# Common hardware events
cycles
instructions
cache-references
cache-misses
branch-instructions
branch-misses
stalled-cycles-frontend
stalled-cycles-backend

# Software events
context-switches
cpu-migrations
page-faults

# Tracepoints (requires root)
sched:sched_switch
syscalls:sys_enter_read
```
For a counter reference and interpretation guide, see references/events.md.
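Not every event exists on every CPU (the `stalled-cycles-*` events, for instance, are missing on some processors), so it can help to filter a wish list against `perf list` before building an `-e` argument. A sketch assuming each `perf list` entry begins with the event name, possibly followed by `OR alias`; the exact layout varies between perf versions, and `supported_events` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: print the requested events found in saved `perf list` output,
# comma-joined for `perf stat -e`. Usage: supported_events LIST_FILE EVENT...
supported_events() {
  local list_file=$1; shift
  local ev out=()
  for ev in "$@"; do
    # Exact field match, so "misses" does not match "cache-misses"
    if awk -v e="$ev" '{ for (i = 1; i <= NF; i++) if ($i == e) { found = 1; exit } }
                       END { exit !found }' "$list_file"; then
      out+=("$ev")
    fi
  done
  local IFS=,
  echo "${out[*]}"
}
```

Typical use: `perf list > events.txt`, then `perf stat -e "$(supported_events events.txt cycles instructions stalled-cycles-frontend)" ./prog`. An empty result means none of the requested events were found.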
Related skills
- Use `skills/profilers/flamegraphs` for SVG flamegraph generation and reading
- Use `skills/profilers/valgrind` for cache simulation and memory profiling
- Use `skills/compilers/gcc` or `skills/compilers/clang` for PGO from perf data (AutoFDO)
Repository: mohitmishra786/low-level-dev-skills