# zfs
## Identity

- Kernel module: `zfs.ko` (loaded via `modprobe zfs`; auto-loaded on most distros)
- Main CLI tools: `zpool` (pool management), `zfs` (dataset/snapshot management), `zdb` (low-level diagnostics)
- Config: `/etc/zfs/` (import cache, key files), `/etc/zfs/zpool.cache` (auto-import list)
- Services: `zfs-import-cache.service`, `zfs-mount.service`, `zfs-share.service` (systemd units)
- Logs: `journalctl -u zfs-import-cache` / `journalctl -u zfs-mount` / `dmesg | grep -i zfs`
- Distro install: `apt install zfsutils-linux` (Debian/Ubuntu) / `dnf install zfs` after adding the OpenZFS repo (RHEL/Fedora)
- Version check: `zpool version` / `zfs version`

## Key Operations

| Operation | Command |
|---|---|
| Pool status (all pools) | zpool status |
| Pool status (one pool) | zpool status <pool> |
| Pool list (size/free/health) | zpool list |
| Create mirror pool | zpool create <pool> mirror <dev1> <dev2> |
| Create RAID-Z1 pool | zpool create <pool> raidz <dev1> <dev2> <dev3> |
| Destroy pool | zpool destroy <pool> |
| Export pool (safe removal) | zpool export <pool> |
| Import pool | zpool import <pool> |
| Import pool (search path) | zpool import -d /dev/disk/by-id <pool> |
| Create dataset | zfs create <pool>/<name> |
| List datasets | zfs list |
| List datasets (recursive) | zfs list -r <pool> |
| Set property | zfs set compression=lz4 <pool>/<dataset> |
| Get property | zfs get compression <pool>/<dataset> |
| Get all properties | zfs get all <pool>/<dataset> |
| Create snapshot | zfs snapshot <pool>/<dataset>@<snapname> |
| List snapshots | zfs list -t snapshot |
| Destroy snapshot | zfs destroy <pool>/<dataset>@<snapname> |
| Rollback to snapshot | zfs rollback <pool>/<dataset>@<snapname> |
| Send snapshot (local) | zfs send <pool>/<dataset>@<snap> \| zfs receive <pool2>/<dest> |
| Send snapshot (remote SSH) | zfs send <pool>/<dataset>@<snap> \| ssh host zfs receive <pool>/<dest> |
| Send incremental | zfs send -i @<prev> <pool>/<dataset>@<snap> \| zfs receive <pool2>/<dest> |
| Start scrub | zpool scrub <pool> |
| Scrub status | zpool status <pool> (shows scrub progress and last result) |
| Replace failed disk | zpool replace <pool> <old-dev> <new-dev> |
| Resilver status | zpool status <pool> (shows resilver progress) |
| Pool I/O stats | zpool iostat -v <pool> 1 |
| List with custom cols | zfs list -o name,used,avail,refer,compression,compressratio |
| Mount dataset | zfs mount <pool>/<dataset> |
| Unmount dataset | zfs unmount <pool>/<dataset> |
| Upgrade pool features | zpool upgrade <pool> |
| Upgrade all pools | zpool upgrade -a |
| Add vdev to pool | zpool add <pool> mirror <dev1> <dev2> |
| Online expand after resize | zpool online -e <pool> <dev> |
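
A typical sequence chains several of the commands above. A minimal sketch, assuming placeholder pool names (`tank`, `backup`) and illustrative device paths:

```bash
# Create a mirrored pool, a compressed dataset, and a snapshot, then replicate.
zpool create tank mirror /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2
zfs create tank/data
zfs set compression=lz4 tank/data
zfs snapshot tank/data@2024-01-01
zfs send tank/data@2024-01-01 | zfs receive backup/data   # assumes a pool named "backup" already exists
# Later: send only the blocks changed since the previous snapshot.
zfs snapshot tank/data@2024-02-01
zfs send -i @2024-01-01 tank/data@2024-02-01 | zfs receive backup/data
```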

## Expected State

- All pools report `ONLINE` under `zpool status` — `DEGRADED` means redundancy is lost, `FAULTED` means data may be at risk.
- No checksum or read errors in `zpool status` output (`errors: No known data errors`).
- All datasets mounted at expected mountpoints: `zfs mount` shows no unmounted datasets that should be online.
- Scrub completed within the last 30 days with zero errors.
- ARC hit rate above 80% under normal workloads: `arc_summary` or `/proc/spl/kstat/zfs/arcstats`.
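
To verify the scrub expectation, the `scan:` line of `zpool status` records the last scrub result and completion date. A minimal check, with `tank` as a placeholder pool name:

```bash
# Print the last scrub/resilver result and completion date (the "scan:" line).
zpool status tank | grep -A1 'scan:'
```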

## Health Checks

- `zpool status` — all pools ONLINE, no errors, scrub date and result visible
- `zpool list` — verify free space; pools above 80% capacity show increasing fragmentation
- `zfs mount` — lists currently mounted datasets; cross-check against `zfs list`
- `awk '/^hits|^misses/ {sum+=$3} END {print "ARC total accesses:", sum}' /proc/spl/kstat/zfs/arcstats` — then compute hit rate: `hits / (hits + misses)`
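
A minimal sketch that turns the `hits` and `misses` counters into a percentage (values are cumulative since module load; `arc_summary` gives a friendlier report):

```bash
# Compute the ARC hit rate from the kstat counters.
awk '$1 == "hits"   { hits = $3 }
     $1 == "misses" { misses = $3 }
     END { printf "ARC hit rate: %.1f%%\n", 100 * hits / (hits + misses) }' \
  /proc/spl/kstat/zfs/arcstats
```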

## Common Failures

| Symptom | Likely cause | Check / Fix |
|---|---|---|
| Pool status shows DEGRADED | One or more disks failed | zpool status -v <pool> to identify failed device; replace with zpool replace |
| Pool status shows FAULTED | Too many disk failures for redundancy level | Restore from backup; RAIDZ1 cannot survive 2 simultaneous disk failures |
| Checksum errors without disk failure | Bit rot, bad cables, flaky controller | zpool scrub to assess scope; check cables and HBA; replace suspect disk |
| cannot import pool: no such pool in the system | Pool was exported or cache file missing | zpool import -d /dev/disk/by-id to search by path; or zpool import after attaching all disks |
| cannot import pool: host ID mismatch | Pool moved from another machine, hostid differs | Override with zpool import -f <pool> — verify disks are no longer in use on the original host first |
| Dataset full, but pool shows free space | Dataset has a quota set | zfs get quota <pool>/<dataset>; raise or remove with zfs set quota=none |
| Pool nearly full, cannot free space | Snapshots holding referenced blocks | zfs list -t snapshot -o name,used,refer to find large snapshots; destroy old ones |
| cannot destroy snapshot: dataset is busy | A clone depends on the snapshot | zfs list -t all -o name,origin to find clones; destroy clone first, then snapshot |
| zfs send fails with incremental mismatch | Intermediate snapshots were destroyed | Must restart with a full send; the incremental base snapshot must exist on both source and destination |
| zfs receive errors with cannot receive incremental stream | Destination has diverged (rollbacks or manual snapshots) | zfs rollback on destination to match the base snapshot, then re-send |
| ARC consuming all available RAM | Expected behavior — ARC is a cache | Limit if needed: echo <bytes> > /sys/module/zfs/parameters/zfs_arc_max; persist in /etc/modprobe.d/zfs.conf |
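
A sketch of the ARC cap from the last row, run as root; the 8 GiB value (8589934592 bytes) is purely illustrative:

```bash
# Runtime cap: takes effect immediately, lost on reboot.
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max

# Persistent cap: applied when the zfs module loads at boot.
echo "options zfs zfs_arc_max=8589934592" >> /etc/modprobe.d/zfs.conf
```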

## Pain Points

- ARC is not a memory leak: ZFS ARC uses all available RAM by design — it releases memory to other processes on demand via the kernel's memory pressure mechanism. Only cap it if RAM pressure is causing issues.
- Deduplication is memory-expensive: The deduplication table (DDT) requires roughly 300–500 bytes of RAM per unique block. A 10 TB deduplicated pool can require 5–30 GB of RAM just for the DDT. Use `compression=lz4` or `compression=zstd` instead — similar space savings at negligible cost.
- Cannot shrink a vdev: Once a vdev is added to a pool, its device count is permanent. You cannot remove a RAIDZ vdev (only mirrors can be removed in recent OpenZFS versions). Plan pool layout before creation.
- Pool feature flags are one-way upgrades: `zpool upgrade` enables new features, but the pool then requires the same or newer OpenZFS version to import. Never upgrade if the pool might need to be read by an older system.
- ECC RAM — required or recommended: ZFS does not require ECC RAM; it runs fine without it. ECC protects against RAM bit errors corrupting data before ZFS writes it. On servers with large datasets and high write rates, ECC is strongly recommended. On a home NAS it is a judgment call.
- L2ARC and SLOG devices: L2ARC (SSD read cache) rarely helps unless your working set is larger than RAM. SLOG (separate ZFS Intent Log) only speeds up synchronous writes — databases, NFS with sync enabled. Cheap SSDs as SLOG devices are dangerous: a failed SLOG can cause a pool to need recovery. Use enterprise or power-loss-protected SSDs for SLOG.
- `recordsize` matters for databases: The default `recordsize=128K` is good for large sequential files. PostgreSQL and MySQL benefit from `recordsize=16K` or `recordsize=8K` to match their page sizes. Set it before writing data — changing `recordsize` applies to new writes only (see the example after this list).
- Snapshot space accounting: `zfs list -t snapshot` shows `USED` for each snapshot, but this only reflects blocks that are unique to that snapshot. Deleting a snapshot transfers its unique blocks to the next newer snapshot; space is not freed until the last snapshot holding those blocks is destroyed.
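
A sketch of the `recordsize` tuning mentioned above, using a hypothetical PostgreSQL dataset; the property is set at creation time, before any data is written:

```bash
# Create a dataset matched to PostgreSQL's 8K page size; names are illustrative.
zfs create -o recordsize=8K -o compression=lz4 tank/pgdata
zfs get recordsize,compression tank/pgdata   # verify the properties
```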

## References

See references/ for:

- `zfs-properties.md` — pool and dataset property reference organized by category
- `common-patterns.md` — pool creation, snapshots, send/receive, encryption, and tuning examples
- `docs.md` — official documentation and man page links

## Related skills