assembly-x86
x86-64 Assembly
Purpose
Guide agents through x86-64 assembly: reading compiler output, understanding the ABI, writing inline asm, and common patterns.
Triggers
- "How do I read the assembly GCC generated?"
- "What are the x86-64 registers?"
- "What is the calling convention on Linux/macOS?"
- "How do I write inline assembly in C?"
- "How do I use SSE/AVX intrinsics?"
- "This assembly uses
%rsp/%rbp— what does it mean?"
Workflow
1. Generate and read assembly
# AT&T syntax (GCC default)
gcc -S -O2 -fverbose-asm foo.c -o foo.s
# Intel syntax
gcc -S -masm=intel -O2 foo.c -o foo.s
# From GDB
(gdb) disassemble /s main # with source
(gdb) x/20i $rip
# From objdump
objdump -d -M intel -S prog # Intel + source (needs -g)
2. x86-64 registers
| 64-bit | 32-bit | 16-bit | 8-bit high | 8-bit low | Purpose |
|---|---|---|---|---|---|
%rax |
%eax |
%ax |
%ah |
%al |
Return value / accumulator |
%rbx |
%ebx |
%bx |
%bh |
%bl |
Callee-saved |
%rcx |
%ecx |
%cx |
%ch |
%cl |
4th arg / count |
%rdx |
%edx |
%dx |
%dh |
%dl |
3rd arg / 2nd return |
%rsi |
%esi |
%si |
— | %sil |
2nd arg |
%rdi |
%edi |
%di |
— | %dil |
1st arg |
%rbp |
%ebp |
%bp |
— | %bpl |
Frame pointer (callee-saved) |
%rsp |
%esp |
%sp |
— | %spl |
Stack pointer |
%r8–%r11 |
%r8d–%r11d |
%r8w–%r11w |
— | %r8b–%r11b |
5th–8th args / caller-saved |
%r12–%r15 |
%r12d–%r15d |
%r12w–%r15w |
— | %r12b–%r15b |
Callee-saved |
%rip |
Instruction pointer | ||||
%rflags |
%eflags |
Status flags | |||
%xmm0–%xmm7 |
FP/SIMD args and return | ||||
%xmm8–%xmm15 |
Caller-saved SIMD | ||||
%ymm0–%ymm15 |
AVX 256-bit | ||||
%zmm0–%zmm31 |
AVX-512 512-bit |
3. System V AMD64 ABI (Linux, macOS, FreeBSD)
Integer/pointer argument registers (in order):
%rdi, %rsi, %rdx, %rcx, %r8, %r9
Floating-point argument registers:
%xmm0–%xmm7
Return values:
- Integer:
%rax(low),%rdx(high if 128-bit) - Float:
%xmm0(low),%xmm1(high)
Caller-saved (scratch): %rax, %rcx, %rdx, %rsi, %rdi, %r8–%r11, %xmm0–%xmm15
Callee-saved (must preserve): %rbx, %rbp, %r12–%r15
Stack: 16-byte aligned before call; call pushes 8 bytes → 16-byte aligned at function entry after prologue.
Red zone: 128 bytes below %rsp may be used by leaf functions without adjusting %rsp. Not available in kernel/signal handlers.
4. Common instruction patterns
| Pattern | Meaning |
|---|---|
mov %rdi, %rax |
Copy rdi to rax |
mov (%rdi), %rax |
Load 8 bytes from address in rdi |
mov %rax, 8(%rdi) |
Store rax to rdi+8 |
lea 8(%rdi), %rax |
Load effective address rdi+8 into rax (no memory access) |
push %rbx |
Push rbx; rsp -= 8 |
pop %rbx |
Pop into rbx; rsp += 8 |
call foo |
Push return addr; jmp foo |
ret |
Pop return addr; jmp to it |
xor %eax, %eax |
Zero rax (smaller encoding than mov $0, %rax) |
test %rax, %rax |
Set ZF if rax == 0 (cheaper than cmp $0, %rax) |
cmp $5, %rdi |
Set flags for rdi - 5 |
jl label |
Jump if signed less than |
5. AT&T vs Intel syntax
| Feature | AT&T | Intel |
|---|---|---|
| Operand order | source, dest | dest, source |
| Register prefix | %rax |
rax |
| Immediate prefix | $42 |
42 |
| Memory operand | 8(%rdi) |
[rdi+8] |
| Size suffix | movl, movq |
— (inferred) |
GCC emits AT&T by default. Use -masm=intel for Intel syntax.
6. Inline assembly (GCC extended asm)
// Basic: increment a register
int x = 5;
__asm__ volatile (
"incl %0"
: "=r"(x) // outputs: =r means write-only register
: "0"(x) // inputs: 0 means same as output 0
: // clobbers: none
);
// CPUID example
uint32_t eax, ebx, ecx, edx;
__asm__ volatile (
"cpuid"
: "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
: "a"(1) // input: leaf 1
);
// Atomic increment
static inline int atomic_inc(volatile int *p) {
int ret;
__asm__ volatile (
"lock; xaddl %0, %1"
: "=r"(ret), "+m"(*p)
: "0"(1)
: "memory"
);
return ret + 1;
}
Constraint codes:
"r"— any general register"m"— memory operand"i"— immediate integer"a","b","c","d"— specific registers (%rax, %rbx, %rcx, %rdx)"="prefix — output (write-only)"+"prefix — read-write"memory"clobber — tells compiler memory may be modified (barrier)
7. SSE/AVX intrinsics (preferred over inline asm)
#include <immintrin.h> // includes all x86 SIMD headers
// Add 8 floats at once with AVX
__m256 a = _mm256_loadu_ps(arr_a); // load 8 floats (unaligned)
__m256 b = _mm256_loadu_ps(arr_b);
__m256 c = _mm256_add_ps(a, b);
_mm256_storeu_ps(result, c);
Check CPU support at compile time: -mavx2 or -march=native.
Check at runtime: __builtin_cpu_supports("avx2").
For a full register and instruction reference, see references/reference.md.
Related skills
- Use
skills/low-level-programming/assembly-armfor AArch64/ARM assembly - Use
skills/compilers/gccfor-S -masm=intelflag details - Use
skills/debuggers/gdbfor stepping through assembly (si,ni,x/i)