assembly-x86

SKILL.md

x86-64 Assembly

Purpose

Guide agents through x86-64 assembly: reading compiler output, understanding the ABI, writing inline asm, and common patterns.

Triggers

  • "How do I read the assembly GCC generated?"
  • "What are the x86-64 registers?"
  • "What is the calling convention on Linux/macOS?"
  • "How do I write inline assembly in C?"
  • "How do I use SSE/AVX intrinsics?"
  • "This assembly uses %rsp / %rbp — what does it mean?"

Workflow

1. Generate and read assembly

# AT&T syntax (GCC default)
gcc -S -O2 -fverbose-asm foo.c -o foo.s

# Intel syntax
gcc -S -masm=intel -O2 foo.c -o foo.s

# From GDB
(gdb) disassemble /s main    # with source
(gdb) x/20i $rip

# From objdump
objdump -d -M intel -S prog  # Intel + source (needs -g)

2. x86-64 registers

64-bit 32-bit 16-bit 8-bit high 8-bit low Purpose
%rax %eax %ax %ah %al Return value / accumulator
%rbx %ebx %bx %bh %bl Callee-saved
%rcx %ecx %cx %ch %cl 4th arg / count
%rdx %edx %dx %dh %dl 3rd arg / 2nd return
%rsi %esi %si %sil 2nd arg
%rdi %edi %di %dil 1st arg
%rbp %ebp %bp %bpl Frame pointer (callee-saved)
%rsp %esp %sp %spl Stack pointer
%r8%r11 %r8d%r11d %r8w%r11w %r8b%r11b 5th–8th args / caller-saved
%r12%r15 %r12d%r15d %r12w%r15w %r12b%r15b Callee-saved
%rip Instruction pointer
%rflags %eflags Status flags
%xmm0%xmm7 FP/SIMD args and return
%xmm8%xmm15 Caller-saved SIMD
%ymm0%ymm15 AVX 256-bit
%zmm0%zmm31 AVX-512 512-bit

3. System V AMD64 ABI (Linux, macOS, FreeBSD)

Integer/pointer argument registers (in order): %rdi, %rsi, %rdx, %rcx, %r8, %r9

Floating-point argument registers: %xmm0%xmm7

Return values:

  • Integer: %rax (low), %rdx (high if 128-bit)
  • Float: %xmm0 (low), %xmm1 (high)

Caller-saved (scratch): %rax, %rcx, %rdx, %rsi, %rdi, %r8–%r11, %xmm0–%xmm15

Callee-saved (must preserve): %rbx, %rbp, %r12–%r15

Stack: 16-byte aligned before call; call pushes 8 bytes → 16-byte aligned at function entry after prologue.

Red zone: 128 bytes below %rsp may be used by leaf functions without adjusting %rsp. Not available in kernel/signal handlers.

4. Common instruction patterns

Pattern Meaning
mov %rdi, %rax Copy rdi to rax
mov (%rdi), %rax Load 8 bytes from address in rdi
mov %rax, 8(%rdi) Store rax to rdi+8
lea 8(%rdi), %rax Load effective address rdi+8 into rax (no memory access)
push %rbx Push rbx; rsp -= 8
pop %rbx Pop into rbx; rsp += 8
call foo Push return addr; jmp foo
ret Pop return addr; jmp to it
xor %eax, %eax Zero rax (smaller encoding than mov $0, %rax)
test %rax, %rax Set ZF if rax == 0 (cheaper than cmp $0, %rax)
cmp $5, %rdi Set flags for rdi - 5
jl label Jump if signed less than

5. AT&T vs Intel syntax

Feature AT&T Intel
Operand order source, dest dest, source
Register prefix %rax rax
Immediate prefix $42 42
Memory operand 8(%rdi) [rdi+8]
Size suffix movl, movq — (inferred)

GCC emits AT&T by default. Use -masm=intel for Intel syntax.

6. Inline assembly (GCC extended asm)

// Basic: increment a register
int x = 5;
__asm__ volatile (
    "incl %0"
    : "=r"(x)   // outputs: =r means write-only register
    : "0"(x)    // inputs: 0 means same as output 0
    : // clobbers: none
);

// CPUID example
uint32_t eax, ebx, ecx, edx;
__asm__ volatile (
    "cpuid"
    : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
    : "a"(1)    // input: leaf 1
);

// Atomic increment
static inline int atomic_inc(volatile int *p) {
    int ret;
    __asm__ volatile (
        "lock; xaddl %0, %1"
        : "=r"(ret), "+m"(*p)
        : "0"(1)
        : "memory"
    );
    return ret + 1;
}

Constraint codes:

  • "r" — any general register
  • "m" — memory operand
  • "i" — immediate integer
  • "a", "b", "c", "d" — specific registers (%rax, %rbx, %rcx, %rdx)
  • "=" prefix — output (write-only)
  • "+" prefix — read-write
  • "memory" clobber — tells compiler memory may be modified (barrier)

7. SSE/AVX intrinsics (preferred over inline asm)

#include <immintrin.h>   // includes all x86 SIMD headers

// Add 8 floats at once with AVX
__m256 a = _mm256_loadu_ps(arr_a);   // load 8 floats (unaligned)
__m256 b = _mm256_loadu_ps(arr_b);
__m256 c = _mm256_add_ps(a, b);
_mm256_storeu_ps(result, c);

Check CPU support at compile time: -mavx2 or -march=native. Check at runtime: __builtin_cpu_supports("avx2").

For a full register and instruction reference, see references/reference.md.

Related skills

  • Use skills/low-level-programming/assembly-arm for AArch64/ARM assembly
  • Use skills/compilers/gcc for -S -masm=intel flag details
  • Use skills/debuggers/gdb for stepping through assembly (si, ni, x/i)
Weekly Installs
2
First Seen
Today
Installed on
mcpjam2
claude-code2
replit2
junie2
windsurf2
zencoder2