tar-compression
tar and compression
tar basics — create, extract, list
tar -cvf archive.tar dir/ # create .tar (no compression)
tar -czvf archive.tar.gz dir/ # create .tar.gz (gzip)
tar -cjvf archive.tar.bz2 dir/ # create .tar.bz2 (bzip2)
tar -cJvf archive.tar.xz dir/ # create .tar.xz (xz)
tar --zstd -cvf archive.tar.zst dir/ # create .tar.zst (zstd)
tar -xvf archive.tar # extract (auto-detects compression)
tar -xzvf archive.tar.gz # explicit gzip
tar -xjvf archive.tar.bz2 # explicit bzip2
tar -xJvf archive.tar.xz # explicit xz
Modern tar auto-detects compression on extraction, so tar -xvf works for all formats.
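For instance, the same extract invocation handles any of the archives created above (assuming a GNU tar or bsdtar build with the relevant compression libraries):

```bash
tar -xvf archive.tar.gz    # gzip detected automatically
tar -xvf archive.tar.zst   # zstd detected too, no --zstd flag needed
```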
tar flags reference
| Flag | Meaning |
|---|---|
| -c | Create archive |
| -x | Extract archive |
| -t | List contents (don't extract) |
| -r | Append files to existing archive (uncompressed .tar only) |
| -u | Update: append only files newer than what's in the archive |
| -v | Verbose (show files) |
| -f | Next argument is the filename; must come last before the filename |
| -z | gzip compression |
| -j | bzip2 compression |
| -J | xz compression |
| --zstd | zstd compression |
| -C | Change to directory before operating |
| -p | Preserve permissions |
| -h | Follow symlinks (archive the target, not the link) |
-f must be the last flag in a cluster, immediately before the archive name. tar -cfv archive.tar misbehaves: f grabs v as its argument, so tar writes an archive literally named v.
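A quick demonstration of the ordering rule (filenames are placeholders):

```bash
tar -czvf archive.tar.gz dir/   # correct: f sits directly before the filename
tar -czfv archive.tar.gz dir/   # wrong: f takes 'v' as the archive name
```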
Append, update, exclude
tar -rvf archive.tar newfile.txt # append (uncompressed .tar only)
tar -uvf archive.tar dir/ # update only newer files
tar -czvf archive.tar.gz dir/ --exclude='*.log' --exclude='.git'
tar -czvf archive.tar.gz dir/ --exclude-vcs # skip .git, .svn, .hg
tar -czvf archive.tar.gz dir/ --exclude-vcs-ignores # also honor .gitignore
tar -czvf archive.tar.gz dir/ -X exclude.txt # patterns from file
Archive inspection and partial extraction
# Inspect without extracting
tar -tzvf archive.tar.gz # list all files with sizes
tar -tzvf archive.tar.gz | grep pattern # find specific files
tar -tzvf archive.tar.gz --wildcards '*.conf' # list matching files (GNU tar)
# Extract specific files or directories
tar -xzvf archive.tar.gz path/to/file.txt # extract one file
tar -xzvf archive.tar.gz dir/subdir/ # extract one directory
tar -xzvf archive.tar.gz --wildcards '*.conf' # extract by pattern (GNU tar)
tar -xzvf archive.tar.gz -C /target/dir/ # extract to specific directory
tar -xzvf archive.tar.gz --strip-components=1 # strip top-level directory
Preserving permissions, ownership, symlinks
# Full preservation (system backups): permissions and ownership are always
# stored at create time; -p and --same-owner take effect at extraction
tar -czvf archive.tar.gz dir/
# Follow symlinks instead of storing links
tar -chzvf archive.tar.gz dir/
# Restore with ownership (requires root)
tar -xpzvf archive.tar.gz --same-owner -C /restore/
# Compare archive to filesystem (check what changed)
tar -dvf archive.tar dir/
When run as root, both GNU tar and bsdtar (macOS) restore ownership by default; --same-owner just makes the intent explicit.
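If that root default is not what you want (say, restoring into a scratch directory), GNU tar's --no-same-owner hands the files to the invoking user; a small sketch with a hypothetical target path:

```bash
tar -xzf archive.tar.gz --no-same-owner -C /tmp/scratch/   # files owned by you, even as root
```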
Compression algorithms compared
| Tool | Flag | Ext | Speed | Ratio | Parallel tool |
|---|---|---|---|---|---|
| gzip | -z | .tar.gz | Fast | Good | pigz |
| bzip2 | -j | .tar.bz2 | Slow | Better | pbzip2 |
| xz | -J | .tar.xz | Very slow | Best | pxz / pixz |
| zstd | --zstd | .tar.zst | Very fast | Better | built-in -T |
| lz4 | --use-compress-program=lz4 | .tar.lz4 | Fastest | Lower | built-in |
Rough benchmarks (1 GB mixed data, single core):
| Tool | Compress | Decompress | Compressed size |
|---|---|---|---|
| lz4 | ~2s | ~0.5s | ~55% of original |
| zstd -1 | ~3s | ~1s | ~42% |
| gzip -6 | ~12s | ~3s | ~36% |
| zstd -19 | ~90s | ~1s | ~30% |
| xz -6 | ~120s | ~8s | ~28% |
Rule of thumb: quick backup → gzip/zstd. Source distribution → xz. General purpose → zstd. Real-time → lz4. bzip2 → legacy, skip it.
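Mapped to commands (the -I shorthand for --use-compress-program is GNU tar):

```bash
tar -czf quick.tar.gz dir/            # quick backup: gzip
tar --zstd -cf general.tar.zst dir/   # general purpose: zstd
tar -cJf release.tar.xz src/          # source distribution: xz
tar -I lz4 -cf rt.tar.lz4 dir/        # real-time: lz4
```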
gzip / gunzip
gzip file.txt # compress to file.txt.gz, removes original
gzip -k file.txt # keep original
gunzip file.txt.gz # decompress
gzip -9 file.txt # max compression (1=fast, 9=best)
gzip -l file.txt.gz # show compression ratio
zcat file.txt.gz # decompress to stdout
zstd (modern compression)
zstd file.txt # compress -> file.txt.zst (default level 3)
zstd -d file.txt.zst # decompress
zstd -19 file.txt # max standard compression (1-19)
zstd --ultra -22 file.txt # ultra mode (20-22, needs --ultra)
zstd --fast file.txt # speed over ratio
zstd -T0 file.txt # use all CPU cores
zstd --long file.txt # larger window for better ratio on large files
Dictionary compression (many small similar files)
zstd --train samples/* -o mydict.zst # train from samples
zstd -D mydict.zst file.txt -o file.txt.zst # compress with dictionary
zstd -d -D mydict.zst file.txt.zst # decompress with dictionary
Improves ratio 2-5x on small, structurally similar files (logs, JSON, configs).
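A minimal end-to-end sketch, assuming a hypothetical configs/ directory of similar JSON files:

```bash
zstd --train configs/*.json -o config.dict    # build the dictionary once
zstd -D config.dict configs/*.json            # each file compresses to file.json.zst
zstd -d -D config.dict configs/app.json.zst   # the same dictionary is needed to decompress
```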
Streaming
mysqldump mydb | zstd -T0 > dump.sql.zst
zstd -d < dump.sql.zst | mysql mydb
tar -cf - dir/ | pv | zstd -T0 > archive.tar.zst
Parallel compression (pigz, pbzip2, pxz)
# pigz — parallel gzip (drop-in replacement)
tar -cf - dir/ | pigz -p 8 > archive.tar.gz
pigz -d archive.tar.gz
# pbzip2 — parallel bzip2
tar -cf - dir/ | pbzip2 -p8 > archive.tar.bz2
# pxz — parallel xz
tar -cf - dir/ | pxz -T 8 > archive.tar.xz
# tar integration via --use-compress-program
tar -cf archive.tar.gz --use-compress-program='pigz -9' dir/
tar -xf archive.tar.gz --use-compress-program='pigz -d' -C /target/
zip / unzip
zip archive.zip file1 file2 # create
zip -r archive.zip dir/ # recursive (directories)
zip -r archive.zip dir/ -x '*.log' # exclude pattern
unzip archive.zip # extract to current dir
unzip archive.zip -d /target/dir/ # extract to specific dir
unzip archive.zip file.txt # extract one file
unzip -o archive.zip # overwrite without prompting
unzip -l archive.zip # list files and sizes
zipinfo archive.zip # detailed info
zip -u archive.zip updated-file.txt # update changed files in archive
Password-protected zip
zip -e archive.zip file1 file2 # prompt for password
unzip -P 'mypass' archive.zip # extract with password
zip's encryption (ZipCrypto) is weak. Use 7z with AES-256 for real security.
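The AES-256 alternative (covered in detail in the 7z section below):

```bash
7z a -p -mhe=on secure.7z file1 file2   # prompts for a password; -mhe=on also hides filenames
```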
Split zip archives
zip -r -s 100m archive.zip dir/ # split at 100MB chunks
zip -s 0 archive.zip --out combined.zip # merge splits before extracting
unzip combined.zip
7z operations (p7zip / 7-Zip)
7z a archive.7z dir/ # create .7z
7z a archive.7z dir/ -mx=9 # ultra compression
7z a archive.7z dir/ -ms=on # solid mode (better compression)
7z a archive.7z dir/ -p'Pass' -mhe=on # AES-256 encrypt contents + filenames
7z a -v100m archive.7z dir/ # split into 100MB volumes
7z x archive.7z # extract preserving structure
7z e archive.7z # extract flat (no dirs)
7z x archive.7z -o/target/dir/ # extract to directory
7z l archive.7z # list contents
7z t archive.7z # test integrity
7z reads/writes: .7z, .zip, .tar, .gz, .bz2, .xz. Extracts .rar.
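For example (assuming the archives exist):

```bash
7z x archive.rar       # extract a rar (read-only support)
7z a archive.zip dir/  # output format is inferred from the extension
```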
Piping tar over SSH
# Local to remote
tar -czvf - dir/ | ssh user@host 'tar -xzvf - -C /target/'
# Remote to local
ssh user@host 'tar -czvf - /remote/dir/' | tar -xzvf - -C /local/dir/
# With zstd for faster transfer
tar -cf - dir/ | zstd -T0 | ssh user@host 'zstd -d | tar -xf - -C /target/'
# With progress and bandwidth limit
tar -cf - dir/ | pv -L 10m | ssh user@host 'tar -xf - -C /target/'
Incremental backups with tar
# Level 0: full backup (creates snapshot file)
tar -czvf backup-full.tar.gz \
--listed-incremental=/var/backups/snapshot.snar dir/
# Level 1: incremental (only changes since last backup)
tar -czvf backup-inc-$(date +%Y%m%d).tar.gz \
--listed-incremental=/var/backups/snapshot.snar dir/
# Restore: apply full, then each incremental in order
tar -xzvf backup-full.tar.gz -C /restore/ --listed-incremental=/dev/null
tar -xzvf backup-inc-20260410.tar.gz -C /restore/ --listed-incremental=/dev/null
--listed-incremental=/dev/null during restore tells tar to extract everything. GNU tar only.
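A minimal rotation sketch built on this, with hypothetical paths: full backup on Sundays, incrementals on other days. Deleting the snapshot file is what forces the next run back to level 0.

```bash
SNAR=/var/backups/snapshot.snar
if [ "$(date +%u)" = 7 ]; then
    rm -f "$SNAR"   # no snapshot file -> tar performs a full (level 0) backup
fi
tar -czf "backup-$(date +%Y%m%d).tar.gz" --listed-incremental="$SNAR" dir/
```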
Split large archives
tar -czvf - dir/ | split -b 100m - archive.tar.gz.part-
cat archive.tar.gz.part-* | tar -xzvf -
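When the parts travel over an unreliable channel, a checksum pass before reassembly is cheap (filenames as above):

```bash
sha256sum archive.tar.gz.part-* > parts.sha256   # on the sending side
sha256sum -c parts.sha256                        # on the receiving side, before cat
```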
Cross-platform: macOS vs GNU tar
| Feature | GNU tar (Linux) | BSD tar (macOS) |
|---|---|---|
| --wildcards | Required for patterns | Default behavior |
| --exclude-vcs-ignores | Supported | Not supported |
| --zstd | Supported | Use --use-compress-program |
| --listed-incremental | Supported | Not supported |
| Extended attributes | --xattrs | Stored by default |
# macOS: install GNU tar for full feature set
brew install gnu-tar # use as 'gtar'
# macOS: zstd with BSD tar
tar -cf - dir/ | zstd > archive.tar.zst
# macOS: avoid resource fork ._* files
COPYFILE_DISABLE=1 tar -czvf archive.tar.gz dir/
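With gnu-tar installed, the GNU-only flags from this page work verbatim under gtar, e.g.:

```bash
gtar --zstd -cvf archive.tar.zst dir/
gtar -czvf backup.tar.gz --listed-incremental=snapshot.snar dir/
```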
Compression for specific use cases
# Logs (repetitive text, compresses 90%+)
gzip -9 app.log
find /var/log -name '*.log' -mtime +7 -exec gzip {} \;
zstd --train logs/*.log -o log-dict.zst && zstd -D log-dict.zst -19 app.log
# Database dumps
mysqldump mydb | zstd -T0 > mydb.sql.zst
pg_dump mydb | gzip -9 > mydb.sql.gz
pg_dump -Fc mydb > mydb.dump # pg custom format (built-in compression)
# Source code distribution
tar -cJvf project.tar.xz --exclude-vcs project/
git archive --format=tar.gz HEAD > project.tar.gz
# Disk images
dd if=/dev/sda bs=4M status=progress | zstd -T0 > disk.img.zst
zstd -d < disk.img.zst | dd of=/dev/sda bs=4M status=progress
qemu-img convert -c -O qcow2 disk.raw disk.qcow2
Common patterns
# Backup with timestamp
tar -czvf "backup-$(date +%Y%m%d-%H%M%S).tar.gz" /path/to/dir/
# Dry run — see what would be extracted
tar -tzvf archive.tar.gz | head -20
# Find largest files in archive
tar -tzvf archive.tar.gz | sort -k3 -n -r | head -20
# Create archive from file list
tar -czvf archive.tar.gz -T filelist.txt
# Create archive excluding large files
find dir/ -type f -size -10M | tar -czvf small-files.tar.gz -T -
# Verify archive integrity
gzip -t archive.tar.gz && echo "OK" || echo "CORRUPT"
xz -t archive.tar.xz
zstd -t archive.tar.zst
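To sweep a whole directory, find's -exec doubles as a test; a small sketch assuming a hypothetical backups/ tree:

```bash
# prints only the archives whose gzip stream tests clean
find backups/ -name '*.tar.gz' -exec gzip -t {} \; -print
```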