assembly-riscv
RISC-V Assembly
Purpose
Guide agents through RISC-V assembly programming: RV32/RV64 instruction sets, register naming and calling conventions (psABI), ISA extension naming, inline assembly with GCC/Clang, compressed (RVC) instructions, and QEMU-based simulation with GDB remote debugging.
Triggers
- "How do I write RISC-V assembly?"
- "What are the RISC-V calling convention registers?"
- "How do I use inline asm for RISC-V in C?"
- "What do RISC-V extension letters mean (IMAFD)?"
- "How do I simulate RISC-V with QEMU?"
- "How do I debug RISC-V code with GDB?"
Workflow
1. Register file and calling convention
RISC-V has 32 integer registers (x0–x31) with ABI names:
| Register | ABI name | Role | Saved by |
|---|---|---|---|
| x0 | zero | Hard-wired zero | — |
| x1 | ra | Return address | Caller |
| x2 | sp | Stack pointer | Callee |
| x3 | gp | Global pointer | — |
| x4 | tp | Thread pointer | — |
| x5–x7 | t0–t2 | Temporaries | Caller |
| x8 | s0/fp | Saved register / frame pointer | Callee |
| x9 | s1 | Saved register | Callee |
| x10–x11 | a0–a1 | Arguments / return values | Caller |
| x12–x17 | a2–a7 | Arguments | Caller |
| x18–x27 | s2–s11 | Saved registers | Callee |
| x28–x31 | t3–t6 | Temporaries | Caller |
Floating-point registers (F and D extensions): f0–f31. fa0–fa7 carry floating-point arguments, and fa0–fa1 carry return values.
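The integer register table above can be captured as a small lookup, which is convenient when decoding register numbers from a disassembly or a trap frame. This is a minimal sketch: the names riscv_abi_names, riscv_abi_name, and riscv_is_callee_saved are hypothetical helpers for illustration, not part of any toolchain.

```c
#include <stddef.h>

/* ABI names for x0..x31 in register-number order (taken from the table above). */
static const char *const riscv_abi_names[32] = {
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0",   "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6",   "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8",   "s9", "s10", "s11", "t3", "t4", "t5", "t6",
};

/* Hypothetical helper: ABI name of register xN, or NULL if N is out of range. */
static const char *riscv_abi_name(int n) {
    return (n >= 0 && n < 32) ? riscv_abi_names[n] : NULL;
}

/* Hypothetical helper: 1 if xN must be preserved by the callee
 * (sp, s0/fp, s1, s2-s11), 0 otherwise. */
static int riscv_is_callee_saved(int n) {
    return n == 2 || n == 8 || n == 9 || (n >= 18 && n <= 27);
}
```

For example, riscv_abi_name(10) yields "a0", and riscv_is_callee_saved(10) is 0 because argument registers may be clobbered by the callee.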
2. Basic instructions
# Arithmetic (R and I type)
add a0, a1, a2 # a0 = a1 + a2
sub a0, a1, a2 # a0 = a1 - a2
addi a0, a1, 42 # a0 = a1 + 42 (immediate)
mul a0, a1, a2 # a0 = a1 * a2 (M extension)
div a0, a1, a2 # signed divide (M extension)
rem a0, a1, a2 # remainder (M extension)
# Logical
and a0, a1, a2 # bitwise AND
or a0, a1, a2 # bitwise OR
xor a0, a1, a2 # bitwise XOR
sll a0, a1, a2 # shift left logical
srl a0, a1, a2 # shift right logical (unsigned)
sra a0, a1, a2 # shift right arithmetic (signed)
# Load / store
lw a0, 0(sp) # load word (32-bit)
ld a0, 0(sp) # load doubleword (64-bit, RV64)
lh a0, 4(sp) # load halfword (sign-extended)
lbu a0, 8(sp) # load byte (zero-extended)
sw a0, 0(sp) # store word
sd a0, 0(sp) # store doubleword (RV64)
# Branches (compare and branch)
beq a0, a1, label # branch if equal
bne a0, a1, label # branch if not equal
blt a0, a1, label # branch if less than (signed)
bltu a0, a1, label # branch if less than (unsigned)
bge a0, a1, label # branch if ≥ (signed)
# Jumps
j label # unconditional jump (pseudoinstruction: jal x0, label)
jal ra, func # jump and link (call)
jalr zero, ra, 0 # return to caller (pseudoinstruction: ret)
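The signed and unsigned variants above (lh vs lbu, sra vs srl) differ only in how the upper bits of the destination are filled. A rough C model of that behaviour on RV64 is sketched below; the model_* function names are hypothetical and exist only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Models lh: load 16 bits and sign-extend to the 64-bit register width. */
static int64_t model_lh(const void *addr) {
    int16_t v;
    memcpy(&v, addr, sizeof v);
    return (int64_t)v;
}

/* Models lbu: load 8 bits and zero-extend. */
static int64_t model_lbu(const void *addr) {
    uint8_t v;
    memcpy(&v, addr, sizeof v);
    return (int64_t)v;
}

/* Models srl on RV64: zeros shift in from the left.
 * Only the low 6 bits of the shift amount are used. */
static uint64_t model_srl(uint64_t x, uint64_t shamt) {
    return x >> (shamt & 63);
}

/* Models sra on RV64: copies of the sign bit shift in from the left.
 * Written without relying on C's implementation-defined signed shift. */
static int64_t model_sra(int64_t x, uint64_t shamt) {
    unsigned n = (unsigned)(shamt & 63);
    uint64_t ux = (uint64_t)x;
    return (int64_t)(x < 0 ? ~(~ux >> n) : ux >> n);
}
```

With the two bytes 0xFF 0xFF in memory, model_lh returns -1 while model_lbu returns 255, which mirrors what lh and lbu would leave in a0.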
3. Minimal function (psABI calling convention)
.section .text
.global add_numbers
# int add_numbers(int a, int b); — a in a0, b in a1, return in a0
add_numbers:
add a0, a0, a1 # result = a + b
ret # return (jalr zero, ra, 0)
.global factorial
# long factorial(int n); — n in a0
factorial:
addi sp, sp, -16 # allocate stack frame
sd ra, 8(sp) # save return address (RV64)
sd s0, 0(sp) # save s0 (callee-saved)
mv s0, a0 # s0 = n
li a0, 1 # default return 1
blez s0, .done # if n <= 0, return 1
addi a0, s0, -1 # a0 = n - 1
call factorial # recursive call: factorial(n-1)
mul a0, a0, s0 # a0 = result * n
.done:
ld ra, 8(sp) # restore ra
ld s0, 0(sp) # restore s0
addi sp, sp, 16 # deallocate
ret
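When checking the two routines above under an emulator, it can help to have a C reference with the same behaviour, including the "return 1 for n <= 0" case. The sketch below is one way to write it; the _ref names are hypothetical, and the commented prototypes show how a C caller would presumably declare the assembly symbols.

```c
/* Prototypes a C caller would use for the assembly versions:
 *   int  add_numbers(int a, int b);
 *   long factorial(int n);
 */

/* Reference for add_numbers: a arrives in a0, b in a1, result returns in a0. */
static int add_numbers_ref(int a, int b) {
    return a + b;
}

/* Reference for factorial: mirrors the assembly, returning 1 when n <= 0. */
static long factorial_ref(int n) {
    if (n <= 0)
        return 1;
    return factorial_ref(n - 1) * (long)n;
}
```

Comparing factorial(n) against factorial_ref(n) for a handful of small n is a cheap check that ra and s0 are saved and restored correctly across the recursive call.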
4. ISA extension naming
RISC-V extensions are combined as a string after the base ISA:
| Letter | Extension | Description |
|---|---|---|
| I | Integer | Base 32/64-bit integer (RV32I, RV64I) |
| M | Multiply | Integer multiply and divide |
| A | Atomic | Atomic memory operations (lr/sc, AMOs) |
| F | Float | Single-precision float |
| D | Double | Double-precision float |
| C | Compressed | 16-bit compressed instructions |
| G | General | Shorthand for IMAFD plus Zicsr and Zifencei |
| V | Vector | Vector instructions (vector-length-agnostic data parallelism) |
| Zicsr | CSR | Control/status register access |
| Zifencei | Fence.i | Instruction-fetch fence |
| Zba/Zbb/Zbc/Zbs | Bit manipulation | Address generation (Zba), basic bit ops (Zbb), carry-less multiply (Zbc), single-bit ops (Zbs); B = Zba + Zbb + Zbs |
| Ztso | TSO | Total Store Ordering memory model |
Common targets:
- Embedded: rv32imac (no floating point; with atomics and compressed)
- Linux app: rv64gc (full general + compressed)
- High performance: rv64gcv (rv64gc + vector)
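To make the naming scheme concrete, the sketch below decodes the single-letter part of a target string such as rv64gc. It is a simplified illustration, not a toolchain API: parse_march and the EXT macro are hypothetical, only lowercase strings without version numbers are accepted, and multi-letter extensions (Z*, S*, X*) are skipped rather than decoded.

```c
#include <stdint.h>
#include <string.h>

/* Bit for a single-letter extension: bit 0 = 'a', ..., bit 25 = 'z'. */
#define EXT(c) (1u << ((c) - 'a'))

/* Hypothetical helper: parse a lowercase -march style string.
 * Returns 0 on success, storing XLEN (32 or 64) in *xlen and a bitmask
 * of single-letter extensions in *ext. Returns -1 on malformed input.
 * Parsing stops at the first '_' or multi-letter prefix (z, s, x). */
static int parse_march(const char *s, int *xlen, uint32_t *ext) {
    if (strncmp(s, "rv32", 4) == 0)
        *xlen = 32;
    else if (strncmp(s, "rv64", 4) == 0)
        *xlen = 64;
    else
        return -1;

    *ext = 0;
    for (s += 4; *s != '\0'; s++) {
        char c = *s;
        if (c == '_' || c == 'z' || c == 's' || c == 'x')
            break;                      /* multi-letter extensions not decoded */
        if (c < 'a' || c > 'z')
            return -1;                  /* digits and uppercase not handled */
        if (c == 'g')                   /* G expands to IMAFD (plus Zicsr, Zifencei) */
            *ext |= EXT('i') | EXT('m') | EXT('a') | EXT('f') | EXT('d');
        else
            *ext |= EXT(c);
    }
    return 0;
}
```

After parse_march("rv32imac", &xlen, &ext), xlen is 32 and ext & EXT('f') is zero, matching the "no floating point" description of that embedded target.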
5. Inline assembly (GCC/Clang)
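#include <stdint.h>
// Read the cycle counter CSR. On RV32, rdcycle returns only the low 32 bits.
static inline uint64_t read_cycle(void) {
    uint64_t val;
    asm volatile ("rdcycle %0" : "=r"(val));
    return val;
}
// Atomic swap: stores new_val to *ptr and returns the previous value
static inline int atomic_swap(int *ptr, int new_val) {
    int old;
    asm volatile (
        "amoswap.w.aqrl %0, %2, (%1)"
        : "=r"(old)
        : "r"(ptr), "r"(new_val)
        : "memory"
    );
    return old;
}
// Full memory fence: orders earlier reads/writes before later reads/writes
static inline void memory_fence(void) {
    asm volatile ("fence rw, rw" ::: "memory");
}
// CSR read by name (GNU statement expression; unsigned long is XLEN bits wide on RV32 and RV64)
#define csr_read(csr) ({ \
    unsigned long _v; \
    asm volatile ("csrr %0, " #csr : "=r"(_v)); \
    _v; \
})
// Usage, inside a function (mstatus is accessible only in machine mode):
unsigned long mstatus = csr_read(mstatus);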
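On RV32 a single rdcycle yields only the low 32 bits of the counter, so a 64-bit reading has to be assembled from rdcycleh and rdcycle. The sketch below shows one common pattern, selected at compile time through the predefined __riscv and __riscv_xlen macros. The names combine_hi_lo and read_cycle64 are hypothetical, and the non-RISC-V branch is a stub included only so the file still builds on a development host.

```c
#include <stdint.h>

/* Combine two 32-bit halves into one 64-bit value. */
static inline uint64_t combine_hi_lo(uint32_t hi, uint32_t lo) {
    return ((uint64_t)hi << 32) | lo;
}

#if defined(__riscv) && __riscv_xlen == 32
/* RV32: read high, low, high again; retry if the low half wrapped in between. */
static inline uint64_t read_cycle64(void) {
    uint32_t hi, lo, hi2;
    do {
        __asm__ volatile ("rdcycleh %0" : "=r"(hi));
        __asm__ volatile ("rdcycle  %0" : "=r"(lo));
        __asm__ volatile ("rdcycleh %0" : "=r"(hi2));
    } while (hi != hi2);
    return combine_hi_lo(hi, lo);
}
#elif defined(__riscv)
/* RV64: one rdcycle returns the full 64-bit counter. */
static inline uint64_t read_cycle64(void) {
    uint64_t v;
    __asm__ volatile ("rdcycle %0" : "=r"(v));
    return v;
}
#else
/* Assumption: host build without a cycle CSR; returns 0 as a placeholder. */
static inline uint64_t read_cycle64(void) {
    return 0;
}
#endif
```

Whether rdcycle is permitted from user mode depends on the platform and kernel configuration, so the RISC-V branches may trap on some systems.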
6. Compressed instructions (RVC)
RVC replaces common 32-bit instructions with 16-bit versions when:
- Registers are in x8–x15 (for most c. versions, which use 3-bit register fields)
- The immediate fits in the smaller field
- The instruction matches one of the specific compressible patterns
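A decoder can tell compressed from standard instructions by the two lowest bits of the first 16-bit parcel: anything other than 0b11 is a 16-bit encoding. The sketch below illustrates that rule together with the x8–x15 register restriction; riscv_insn_length and rvc_reg_ok are hypothetical names, and encodings longer than 32 bits are deliberately ignored.

```c
#include <stdint.h>

/* Instruction length in bytes implied by the first 16-bit parcel:
 * 2 when the low two bits are not 0b11 (compressed), otherwise 4.
 * Encodings of 48 bits and longer are not handled in this sketch. */
static int riscv_insn_length(uint16_t first_parcel) {
    return (first_parcel & 0x3) != 0x3 ? 2 : 4;
}

/* 1 if xN is one of the eight registers (x8-x15, i.e. s0-s1 and a0-a5)
 * reachable through the 3-bit register fields of most compressed formats. */
static int rvc_reg_ok(int n) {
    return n >= 8 && n <= 15;
}
```

For instance, 0x8082 (c.jr ra, the compressed form of ret) has low bits 0b10 and is reported as 2 bytes, while the low parcel 0x8067 of the 32-bit ret encoding 0x00008067 is reported as 4.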
# Enable C extension in GCC
riscv64-linux-gnu-gcc -march=rv64gc prog.c -o prog
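# Check if compressed instructions were generated
# (-M no-aliases makes objdump print the c.* mnemonics instead of their expanded aliases)
riscv64-linux-gnu-objdump -d -M no-aliases prog | grep "c\."
# c.addi, c.ld, c.sw, c.j, etc.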
# Disable compressed (for debugging or targets without C)
riscv64-linux-gnu-gcc -march=rv64g prog.c -o prog
7. QEMU simulation and GDB
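# Install QEMU RISC-V (Debian/Ubuntu: qemu-system-riscv64 typically ships in qemu-system-misc;
# package names vary by release, so check with apt-cache search qemu-system)
apt-get install qemu-user qemu-system-misc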
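# User-mode emulation (run RV64 Linux binary on x86 host)
# For dynamically linked binaries, point QEMU at the target sysroot,
# e.g. qemu-riscv64 -L /usr/riscv64-linux-gnu ./prog
qemu-riscv64 ./prog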
# System emulation (full bare-metal VM)
qemu-system-riscv64 \
-machine virt \
-nographic \
-kernel firmware.elf \
-gdb tcp::1234 \
-S # start paused
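# GDB remote session (second terminal). Load symbols from the same ELF given to -kernel.
# gdb-multiarch is an alternative if riscv64-linux-gnu-gdb is not packaged.
# The load step is optional here: QEMU has already placed the -kernel image in memory.
riscv64-linux-gnu-gdb firmware.elf
(gdb) target remote :1234
(gdb) load
(gdb) break main
(gdb) continue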
For the RISC-V psABI calling convention details, see references/riscv-abi.md.
Related skills
- Use skills/low-level-programming/assembly-arm for AArch64 comparison
- Use skills/low-level-programming/assembly-x86 for x86-64 assembly
- Use skills/embedded/openocd-jtag for real hardware RISC-V debugging
- Use skills/compilers/cross-gcc for RISC-V cross-compilation setup