tar-compression
tar and compression
tar basics — create, extract, list
tar -cvf archive.tar dir/ # create .tar (no compression)
tar -czvf archive.tar.gz dir/ # create .tar.gz (gzip)
tar -cjvf archive.tar.bz2 dir/ # create .tar.bz2 (bzip2)
tar -cJvf archive.tar.xz dir/ # create .tar.xz (xz)
tar --zstd -cvf archive.tar.zst dir/ # create .tar.zst (zstd)
tar -xvf archive.tar # extract (auto-detects compression)
tar -xzvf archive.tar.gz # explicit gzip
tar -xjvf archive.tar.bz2 # explicit bzip2
tar -xJvf archive.tar.xz # explicit xz
Modern tar auto-detects compression on extraction, so tar -xvf works for all formats.
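For instance, the same extract invocation handles any of the archives created above (assuming a GNU tar or bsdtar build with the relevant compression libraries):

```bash
tar -xvf archive.tar.gz    # gzip detected automatically
tar -xvf archive.tar.zst   # zstd detected too, no --zstd flag needed
```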
tar flags reference
| Flag | Meaning |
|---|---|
| -c | Create archive |
| -x | Extract archive |
| -t | List contents (don't extract) |
| -r | Append files to existing archive (uncompressed .tar only) |
| -u | Update: append only files newer than what's in the archive |
| -v | Verbose (show files) |
| -f | Next argument is the filename; must come last before the filename |
| -z | gzip compression |
| -j | bzip2 compression |
| -J | xz compression |
| --zstd | zstd compression |
| -C | Change to directory before operating |
| -p | Preserve permissions |
| -h | Follow symlinks (archive the target, not the link) |
-f must be the last flag in a cluster, immediately before the archive name. tar -cfv archive.tar misbehaves: f grabs v as its argument, so tar writes an archive literally named v.
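A quick demonstration of the ordering rule (filenames are placeholders):

```bash
tar -czvf archive.tar.gz dir/   # correct: f sits directly before the filename
tar -czfv archive.tar.gz dir/   # wrong: f takes 'v' as the archive name
```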
Append, update, exclude
tar -rvf archive.tar newfile.txt # append (uncompressed .tar only)
tar -uvf archive.tar dir/ # update only newer files
tar -czvf archive.tar.gz dir/ --exclude='*.log' --exclude='.git'
tar -czvf archive.tar.gz dir/ --exclude-vcs # skip .git, .svn, .hg
tar -czvf archive.tar.gz dir/ --exclude-vcs-ignores # also honor .gitignore
tar -czvf archive.tar.gz dir/ -X exclude.txt # patterns from file
Archive inspection and partial extraction
# Inspect without extracting
tar -tzvf archive.tar.gz # list all files with sizes
tar -tzvf archive.tar.gz | grep pattern # find specific files
tar -tzvf archive.tar.gz --wildcards '*.conf' # list matching files (GNU tar)
# Extract specific files or directories
tar -xzvf archive.tar.gz path/to/file.txt # extract one file
tar -xzvf archive.tar.gz dir/subdir/ # extract one directory
tar -xzvf archive.tar.gz --wildcards '*.conf' # extract by pattern (GNU tar)
tar -xzvf archive.tar.gz -C /target/dir/ # extract to specific directory
tar -xzvf archive.tar.gz --strip-components=1 # strip top-level directory
Preserving permissions, ownership, symlinks
# Full preservation (system backups): permissions and ownership are always
# stored at create time; -p and --same-owner take effect at extraction
tar -czvf archive.tar.gz dir/
# Follow symlinks instead of storing links
tar -chzvf archive.tar.gz dir/
# Restore with ownership (requires root)
tar -xpzvf archive.tar.gz --same-owner -C /restore/
# Compare archive to filesystem (check what changed)
tar -dvf archive.tar dir/
When run as root, both GNU tar and bsdtar (macOS) restore ownership by default; --same-owner just makes the intent explicit.
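If that root default is not what you want (say, restoring into a scratch directory), GNU tar's --no-same-owner hands the files to the invoking user; a small sketch with a hypothetical target path:

```bash
tar -xzf archive.tar.gz --no-same-owner -C /tmp/scratch/   # files owned by you, even as root
```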
Compression algorithms compared
| Tool | Flag | Ext | Speed | Ratio | Parallel tool |
|---|---|---|---|---|---|
| gzip | -z | .tar.gz | Fast | Good | pigz |
| bzip2 | -j | .tar.bz2 | Slow | Better | pbzip2 |
| xz | -J | .tar.xz | Very slow | Best | pxz / pixz |
| zstd | --zstd | .tar.zst | Very fast | Better | built-in -T |
| lz4 | --use-compress-program=lz4 | .tar.lz4 | Fastest | Lower | built-in |
Rough benchmarks (1 GB mixed data, single core):
| Tool | Compress | Decompress | Compressed size |
|---|---|---|---|
| lz4 | ~2s | ~0.5s | ~55% of original |
| zstd -1 | ~3s | ~1s | ~42% |
| gzip -6 | ~12s | ~3s | ~36% |
| zstd -19 | ~90s | ~1s | ~30% |
| xz -6 | ~120s | ~8s | ~28% |
Rule of thumb: quick backup → gzip/zstd. Source distribution → xz. General purpose → zstd. Real-time → lz4. bzip2 → legacy, skip it.
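Mapped to commands (the -I shorthand for --use-compress-program is GNU tar):

```bash
tar -czf quick.tar.gz dir/            # quick backup: gzip
tar --zstd -cf general.tar.zst dir/   # general purpose: zstd
tar -cJf release.tar.xz src/          # source distribution: xz
tar -I lz4 -cf rt.tar.lz4 dir/        # real-time: lz4
```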
gzip / gunzip
gzip file.txt # compress to file.txt.gz, removes original
gzip -k file.txt # keep original
gunzip file.txt.gz # decompress
gzip -9 file.txt # max compression (1=fast, 9=best)
gzip -l file.txt.gz # show compression ratio
zcat file.txt.gz # decompress to stdout
zstd (modern compression)
zstd file.txt # compress -> file.txt.zst (default level 3)
zstd -d file.txt.zst # decompress
zstd -19 file.txt # max standard compression (1-19)
zstd --ultra -22 file.txt # ultra mode (20-22, needs --ultra)
zstd --fast file.txt # speed over ratio
zstd -T0 file.txt # use all CPU cores
zstd --long file.txt # larger window for better ratio on large files
Dictionary compression (many small similar files)
zstd --train samples/* -o mydict.zst # train from samples
zstd -D mydict.zst file.txt -o file.txt.zst # compress with dictionary
zstd -d -D mydict.zst file.txt.zst # decompress with dictionary
Improves ratio 2-5x on small, structurally similar files (logs, JSON, configs).
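A minimal end-to-end sketch, assuming a hypothetical configs/ directory of similar JSON files:

```bash
zstd --train configs/*.json -o config.dict    # build the dictionary once
zstd -D config.dict configs/*.json            # each file compresses to file.json.zst
zstd -d -D config.dict configs/app.json.zst   # the same dictionary is needed to decompress
```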
Streaming
mysqldump mydb | zstd -T0 > dump.sql.zst
zstd -d < dump.sql.zst | mysql mydb
tar -cf - dir/ | pv | zstd -T0 > archive.tar.zst
Parallel compression (pigz, pbzip2, pxz)
# pigz — parallel gzip (drop-in replacement)
tar -cf - dir/ | pigz -p 8 > archive.tar.gz
pigz -d archive.tar.gz
# pbzip2 — parallel bzip2
tar -cf - dir/ | pbzip2 -p8 > archive.tar.bz2
# pxz — parallel xz
tar -cf - dir/ | pxz -T 8 > archive.tar.xz
# tar integration via --use-compress-program
tar -cf archive.tar.gz --use-compress-program='pigz -9' dir/
tar -xf archive.tar.gz --use-compress-program='pigz -d' -C /target/
zip / unzip
zip archive.zip file1 file2 # create
zip -r archive.zip dir/ # recursive (directories)
zip -r archive.zip dir/ -x '*.log' # exclude pattern
unzip archive.zip # extract to current dir
unzip archive.zip -d /target/dir/ # extract to specific dir
unzip archive.zip file.txt # extract one file
unzip -o archive.zip # overwrite without prompting
unzip -l archive.zip # list files and sizes
zipinfo archive.zip # detailed info
zip -u archive.zip updated-file.txt # update changed files in archive
Password-protected zip
zip -e archive.zip file1 file2 # prompt for password
unzip -P 'mypass' archive.zip # extract with password
zip's encryption (ZipCrypto) is weak. Use 7z with AES-256 for real security.
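The AES-256 alternative (covered in detail in the 7z section below):

```bash
7z a -p -mhe=on secure.7z file1 file2   # prompts for a password; -mhe=on also hides filenames
```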
Split zip archives
zip -r -s 100m archive.zip dir/ # split at 100MB chunks
zip -s 0 archive.zip --out combined.zip # merge splits before extracting
unzip combined.zip
7z operations (p7zip / 7-Zip)
7z a archive.7z dir/ # create .7z
7z a archive.7z dir/ -mx=9 # ultra compression
7z a archive.7z dir/ -ms=on # solid mode (better compression)
7z a archive.7z dir/ -p'Pass' -mhe=on # AES-256 encrypt contents + filenames
7z a -v100m archive.7z dir/ # split into 100MB volumes
7z x archive.7z # extract preserving structure
7z e archive.7z # extract flat (no dirs)
7z x archive.7z -o/target/dir/ # extract to directory
7z l archive.7z # list contents
7z t archive.7z # test integrity
7z reads/writes: .7z, .zip, .tar, .gz, .bz2, .xz. Extracts .rar.
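For example (assuming the archives exist):

```bash
7z x archive.rar       # extract a rar (read-only support)
7z a archive.zip dir/  # output format is inferred from the extension
```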
Piping tar over SSH
# Local to remote
tar -czvf - dir/ | ssh user@host 'tar -xzvf - -C /target/'
# Remote to local
ssh user@host 'tar -czvf - /remote/dir/' | tar -xzvf - -C /local/dir/
# With zstd for faster transfer
tar -cf - dir/ | zstd -T0 | ssh user@host 'zstd -d | tar -xf - -C /target/'
# With progress and bandwidth limit
tar -cf - dir/ | pv -L 10m | ssh user@host 'tar -xf - -C /target/'
Incremental backups with tar
# Level 0: full backup (creates snapshot file)
tar -czvf backup-full.tar.gz \
--listed-incremental=/var/backups/snapshot.snar dir/
# Level 1: incremental (only changes since last backup)
tar -czvf backup-inc-$(date +%Y%m%d).tar.gz \
--listed-incremental=/var/backups/snapshot.snar dir/
# Restore: apply full, then each incremental in order
tar -xzvf backup-full.tar.gz -C /restore/ --listed-incremental=/dev/null
tar -xzvf backup-inc-20260410.tar.gz -C /restore/ --listed-incremental=/dev/null
--listed-incremental=/dev/null during restore tells tar to extract everything. GNU tar only.
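A minimal rotation sketch built on this, with hypothetical paths: full backup on Sundays, incrementals on other days. Deleting the snapshot file is what forces the next run back to level 0.

```bash
SNAR=/var/backups/snapshot.snar
if [ "$(date +%u)" = 7 ]; then
    rm -f "$SNAR"   # no snapshot file -> tar performs a full (level 0) backup
fi
tar -czf "backup-$(date +%Y%m%d).tar.gz" --listed-incremental="$SNAR" dir/
```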
Split large archives
tar -czvf - dir/ | split -b 100m - archive.tar.gz.part-
cat archive.tar.gz.part-* | tar -xzvf -
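When the parts travel over an unreliable channel, a checksum pass before reassembly is cheap (filenames as above):

```bash
sha256sum archive.tar.gz.part-* > parts.sha256   # on the sending side
sha256sum -c parts.sha256                        # on the receiving side, before cat
```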
Cross-platform: macOS vs GNU tar
| Feature | GNU tar (Linux) | BSD tar (macOS) |
|---|---|---|
| --wildcards | Required for patterns | Default behavior |
| --exclude-vcs-ignores | Supported | Not supported |
| --zstd | Supported | Use --use-compress-program |
| --listed-incremental | Supported | Not supported |
| Extended attributes | --xattrs | Stored by default |
# macOS: install GNU tar for full feature set
brew install gnu-tar # use as 'gtar'
# macOS: zstd with BSD tar
tar -cf - dir/ | zstd > archive.tar.zst
# macOS: avoid resource fork ._* files
COPYFILE_DISABLE=1 tar -czvf archive.tar.gz dir/
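With gnu-tar installed, the GNU-only flags from this page work verbatim under gtar, e.g.:

```bash
gtar --zstd -cvf archive.tar.zst dir/
gtar -czvf backup.tar.gz --listed-incremental=snapshot.snar dir/
```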
Compression for specific use cases
# Logs (repetitive text, compresses 90%+)
gzip -9 app.log
find /var/log -name '*.log' -mtime +7 -exec gzip {} \;
zstd --train logs/*.log -o log-dict.zst && zstd -D log-dict.zst -19 app.log
# Database dumps
mysqldump mydb | zstd -T0 > mydb.sql.zst
pg_dump mydb | gzip -9 > mydb.sql.gz
pg_dump -Fc mydb > mydb.dump # pg custom format (built-in compression)
# Source code distribution
tar -cJvf project.tar.xz --exclude-vcs project/
git archive --format=tar.gz HEAD > project.tar.gz
# Disk images
dd if=/dev/sda bs=4M status=progress | zstd -T0 > disk.img.zst
zstd -d < disk.img.zst | dd of=/dev/sda bs=4M status=progress
qemu-img convert -c -O qcow2 disk.raw disk.qcow2
Common patterns
# Backup with timestamp
tar -czvf "backup-$(date +%Y%m%d-%H%M%S).tar.gz" /path/to/dir/
# Dry run — see what would be extracted
tar -tzvf archive.tar.gz | head -20
# Find largest files in archive
tar -tzvf archive.tar.gz | sort -k3 -n -r | head -20
# Create archive from file list
tar -czvf archive.tar.gz -T filelist.txt
# Create archive excluding large files
find dir/ -type f -size -10M | tar -czvf small-files.tar.gz -T -
# Verify archive integrity
gzip -t archive.tar.gz && echo "OK" || echo "CORRUPT"
xz -t archive.tar.xz
zstd -t archive.tar.zst
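To sweep a whole directory, find's -exec doubles as a test; a small sketch assuming a hypothetical backups/ tree:

```bash
# prints only the archives whose gzip stream tests clean
find backups/ -name '*.tar.gz' -exec gzip -t {} \; -print
```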