files-buddy
Files Buddy
Safe filesystem organization and cleanup. Delegates to best-in-class CLI tools for deduplication, renaming, archiving, and analysis. Cloud drives (iCloud, Google Drive, Dropbox, OneDrive) are first-class citizens with auto-detection and adjusted safety.
Scope: File organization, cleanup, renaming, deduplication, archiving, and analysis. NOT for shell script generation (shell-scripter), CI/CD pipelines (devops-engineer), or database work (database-architect).
Canonical Vocabulary
| Term | Definition |
|---|---|
| dry-run | Preview via tool's native mode (fclones group, f2 default, organize sim, detox -n) |
| manifest | JSON log at ~/.files-buddy/manifests/ enabling undo and discovery |
| blast radius | Total file count and size affected by an operation |
| protected path | Hard-blocked (system) or escalated-confirmation directory |
| trash | Reversible deletion via gomi / OS trash / .files-buddy-trash/ |
| scope pin | Hard boundary = user-referenced directory only |
| material drift | Filesystem changed >10% between preview and execution |
| batch | Non-overlapping operations; rollback unit |
| intent contract | User-confirmed description (not individual file list) |
| hardlink cluster | Files sharing same inode — NOT duplicates |
| tool delegation | Invoking CLI tool via subprocess; prefer over reimplementation |
| fallback | Python stdlib when CLI tool is not installed |
| evicted file | Cloud placeholder; must materialize before size/hash analysis |
| cloud-safe | Operations adjusted for sync implications |
| conflict copy | Duplicate from sync conflict (e.g., file (1).txt) |
| dashboard | Self-contained HTML visualization opened in browser |
Dispatch
| $ARGUMENTS | Mode | Destructive? |
|---|---|---|
| `organize <path>` | organize | Yes (moves) |
| `clean <path>` | clean | Yes (trash) |
| `audit <path>` | audit | No (read-only) |
| `rename <path> <pattern>` | rename | Yes (renames) |
| `flatten <path>` | flatten | Yes (moves) |
| `archive <path>` | archive | Yes (moves) |
| `sanitize <path>` | sanitize | Yes (renames) |
| `find <path> <query>` | find | No (read-only) |
| `watch <path> [rules]` | watch | Yes (moves) |
| `undo <manifest>` | undo | Yes (restores) |
| `dashboard [path]` | dashboard | No (writes report) |
| Empty or unrecognized | — | Show mode menu |
Auto-Detection Heuristic
- "sort", "organize", "tidy" + path -> organize
- "duplicates", "clean", "dedup", "lint" + path -> clean
- "how big", "usage", "analyze", "scan" + path -> audit
- "rename", "batch rename" + pattern -> rename
- "flatten", "collapse" + path -> flatten
- "archive", "compress", "old files" + path -> archive
- "fix names", "sanitize", "encoding" + path -> sanitize
- "find", "search", "where is" + query -> find
- "watch", "auto-organize", "monitor" + path -> watch
- "undo", "reverse", "restore" + manifest -> undo
- "dashboard", "visualize", "report" -> dashboard
- Ambiguous -> ask which mode
Structural Constraints
- Operation whitelist: move, rename, copy, trash, mkdir. NEVER `rm`, `chmod`, `chown`.
- Scope pinning: boundary = user-referenced directory only.
- Hard-blocked paths: never operate on filesystem roots or OS-managed system directories; use `references/protected-paths.md` as the canonical path list and validation source.
- Symlink resolution: `os.path.realpath()` before plan generation. Cycles detected (max 40 hops). `.git/` always excluded.
- Cloud-safe: NEVER auto-delete cloud files. Materialize evicted files before analysis.
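The scope-pinning and symlink-resolution constraints can be sketched as follows. This is a minimal illustration, not the skill's actual validation code: the function names `is_within_scope` and `is_git_internal` are hypothetical, and the canonical hard-blocked list in `references/protected-paths.md` is not reproduced here.

```python
import os


def is_within_scope(candidate: str, scope_root: str) -> bool:
    """Return True if candidate resolves to a path inside scope_root."""
    # Resolve symlinks on both sides first, so a link that points
    # outside the pinned directory is treated as outside.
    real_candidate = os.path.realpath(candidate)
    real_root = os.path.realpath(scope_root)
    try:
        return os.path.commonpath([real_candidate, real_root]) == real_root
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


def is_git_internal(path: str) -> bool:
    """Return True if any path component is a .git directory."""
    return ".git" in os.path.normpath(path).split(os.sep)
```

Comparing resolved paths with `os.path.commonpath` avoids the false positives of a plain string-prefix check (for example `/data/photos-old` versus `/data/photos`).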
Escalated-Confirmation Paths
~/.ssh, ~/.gnupg, ~/.aws, ~/.config, ~/.kube, ~/.local/share/keyrings, any directory containing .env
Require full preview + explicit path naming + warning before any operation.
Tiered Friction Model
| Tier | Trigger | Friction |
|---|---|---|
| Low | Rename, move within parent, <10 files | Inline plan, [y/N] |
| Medium | Cross-dir move, 10-100 files, archive | Summary preview, confirmation |
| High | Any trash, recursive, 100+ files, 1 GB+ | Full preview, blast radius, type "yes" |
| Critical | Escalated paths, cloud directories | Full preview + path naming + warning |
AI-initiated ops bias one tier higher. Cloud ops always at least Medium.
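One way to read the tier table is as a small decision function. The sketch below is an interpretation under stated assumptions: all parameter names are hypothetical, exactly 100 files is treated as High, "1 GB" is taken as 10**9 bytes, and cloud directories follow the Critical row of the table (which also satisfies the "at least Medium" floor).

```python
from enum import IntEnum

GB = 10 ** 9  # assumption: decimal gigabyte


class Tier(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def friction_tier(
    *,
    file_count: int,
    total_bytes: int = 0,
    uses_trash: bool = False,
    recursive: bool = False,
    cross_directory: bool = False,
    is_archive: bool = False,
    escalated_path: bool = False,
    cloud: bool = False,
    ai_initiated: bool = False,
) -> Tier:
    """Map operation traits to a friction tier per the table above."""
    if escalated_path or cloud:
        return Tier.CRITICAL
    if uses_trash or recursive or file_count >= 100 or total_bytes >= GB:
        tier = Tier.HIGH
    elif cross_directory or is_archive or file_count >= 10:
        tier = Tier.MEDIUM
    else:
        tier = Tier.LOW
    if ai_initiated:
        # AI-initiated operations are biased one tier higher, capped at Critical.
        tier = Tier(min(tier + 1, Tier.CRITICAL))
    return tier
```

For example, `friction_tier(file_count=3)` yields `Tier.LOW`, while the same call with `ai_initiated=True` yields `Tier.MEDIUM`.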
Pre-Flight Checks
Run before every mode:
- Path resolution: reject hard-blocked paths; escalate friction for escalated-confirmation paths
- Scope boundary: confirm scope pin to user-referenced directory
- Cloud detection: check `~/Library/CloudStorage/*`, `~/Library/Mobile Documents/com~apple~CloudDocs`; tag cloud-synced dirs, adjust behavior
- Symlink inventory: flag escapes outside scope, detect cycles
- Tool availability: `command -v fd fclones rmlint f2 dust erd gomi ouch zstd b3sum detox convmv rclone pueue bat watchexec organize 2>/dev/null`; report missing, suggest install
- Permission check: flag restricted files (`stat` for read/write access)
- Disk space: verify free space >= estimated operation size
- Git awareness: detect `.gitignore`, warn before moving tracked files
- Case sensitivity: detect APFS case-insensitive volumes, flag rename collisions
- Eviction check: materialize placeholder files before analysis (`brctl download` for iCloud, access for GDrive stream)
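The tool-availability check has a direct Python equivalent in `shutil.which`, which performs the same PATH lookup as `command -v`. A minimal sketch follows; the helper name `check_tools` is hypothetical, and the tool list is copied from the command above.

```python
import shutil

# Executable names taken from the `command -v` check above.
TOOLS = (
    "fd", "fclones", "rmlint", "f2", "dust", "erd", "gomi", "ouch", "zstd",
    "b3sum", "detox", "convmv", "rclone", "pueue", "bat", "watchexec", "organize",
)


def check_tools(names=TOOLS):
    """Split tool names into (available, missing) using a PATH lookup."""
    available, missing = {}, []
    for name in names:
        location = shutil.which(name)
        if location:
            available[name] = location
        else:
            missing.append(name)
    return available, missing
```

The `missing` list is what a mode would report back, together with the install command from `references/tool-integrations.md`, before falling back to the Python stdlib path.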
Mode: organize
Sort files by type, date, project, or custom rules using organize-tool.
- Generate organize-tool YAML config from user intent. Read `references/organization-strategies.md`
- Run `organize sim <config>` (dry-run); parse output, present preview
- Show blast radius: file count, size, destination structure as tree
- On confirmation, run `organize run <config>` with manifest logging
- Report: operations completed, manifest path, `open <dest>` to verify
- Fallback: `shutil.move` with manual rule matching if organize-tool not installed
Mode: clean
Remove duplicates, lint filesystem, trash temp files.
- Read `references/duplicate-detection.md`. Run `fclones group --format json <path>` for duplicates
- Run `rmlint -o json:<tmpfile> <path>` for empty dirs, broken symlinks, orphan files
- For cloud dirs: use `rclone dedupe --dry-run <remote>:path` (Google Drive duplicate filenames)
- Present grouped findings: duplicates (with sizes), lint issues, reclaimable space
- On confirmation, trash selected items via `gomi` (never `rm`). Log to manifest
- Fallback: `hashlib` + `os.walk` for dedup; `os.listdir` for empty dirs
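The `hashlib` + `os.walk` fallback for duplicate detection might look like the sketch below. It assumes SHA-256 as the digest (BLAKE3 is not in the Python standard library), skips zero-byte files, and counts each inode once so that hardlink clusters are not reported as duplicates. The function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return sorted lists of paths under root that share identical content."""
    by_size = defaultdict(list)
    seen_inodes = set()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # .git/ always excluded
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            st = os.stat(path)
            if st.st_size == 0:
                continue  # zero-byte files are not useful duplicates
            inode_key = (st.st_dev, st.st_ino)
            if inode_key in seen_inodes:
                continue  # same inode: hardlink cluster, not a duplicate
            seen_inodes.add(inode_key)
            by_size[st.st_size].append(path)
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Grouping by size first means only files that could plausibly match are ever hashed.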
Mode: audit
Analyze disk usage and find issues. Strictly read-only.
- Run `dust -j <path>` for disk usage summary (JSON output)
- Run `erd -l -s rsize <path>` for directory tree with sizes
- Run `rmlint -o json:<tmpfile> <path>` for lint issues (empty dirs, broken symlinks)
- For cloud dirs: `rclone size <remote>:path` + `rclone lsjson` (materialize evicted files first)
- Present: top space consumers, file type distribution, stale files (>1yr untouched), issues
- Offer transitions: "Clean duplicates?" / "Archive old files?"
- Fallback: `os.stat` + `os.walk` for sizes; `pathlib` for file listing
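A rough sketch of the read-only stdlib fallback is shown below. It assumes "untouched" means the modification time (`st_mtime`) is older than the cutoff; the function name `audit` and the shape of the returned dictionary are illustrative only.

```python
import os
import time
from collections import Counter


def audit(root, stale_days=365, now=None):
    """Collect total size, bytes per extension, and stale files. Read-only."""
    now = time.time() if now is None else now
    cutoff = now - stale_days * 86400
    total_bytes = 0
    bytes_by_ext = Counter()
    stale = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not follow links out of the audited tree
            try:
                st = os.stat(path)
            except OSError:
                continue  # unreadable or vanished; a real audit would report it
            total_bytes += st.st_size
            ext = os.path.splitext(name)[1].lower() or "(none)"
            bytes_by_ext[ext] += st.st_size
            if st.st_mtime < cutoff:
                stale.append(path)
    return {
        "total_bytes": total_bytes,
        "bytes_by_ext": dict(bytes_by_ext),
        "stale": sorted(stale),
    }
```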
Mode: rename
Batch rename with regex, EXIF, ID3 templates using f2.
- Read `references/rename-patterns.md`. Construct `f2` command from user pattern
- Run `f2 <flags> <path>` (dry-run by default); parse rename table
- Present before/after diff: `- old-ugly-name_FINAL_v2.pdf` / `+ 2024-01-project-report.pdf`
- On confirmation, run `f2 -x <flags>` (execute). Log to manifest
- For undo: `f2 -u` using f2's native undo support
- Fallback: `re.sub` + `os.rename` with collision detection
Mode: flatten
Collapse nested directories to a single level.
- Inventory target dir: build tree, count files, detect naming collisions
- Plan flat destination with collision resolution (append `-1`, `-2`, etc.)
- Present preview: tree before vs flat list after, collision resolutions
- On confirmation, `shutil.move` each file. Log to manifest
- Clean up empty directories (bottom-up traversal)
- No CLI tool dependency: Python stdlib only
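The planning step can be sketched as a pure function that only computes a source-to-destination mapping, leaving the actual `shutil.move` calls for after confirmation. Two assumptions are made here: the numeric suffix goes before the extension (`a-1.txt`), and names are compared case-insensitively so the plan is also safe on case-insensitive volumes. The name `plan_flatten` is hypothetical.

```python
import os


def plan_flatten(root):
    """Map every nested file to a unique name directly under root (no moves)."""
    # Names already present at the top level, compared case-insensitively.
    taken = {name.casefold() for name in os.listdir(root)}
    plan = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Sort for a deterministic plan; .git/ is always excluded.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        if dirpath == root:
            continue  # top-level files are already flat
        for name in sorted(filenames):
            stem, ext = os.path.splitext(name)
            candidate, n = name, 0
            while candidate.casefold() in taken:
                n += 1
                candidate = f"{stem}-{n}{ext}"
            taken.add(candidate.casefold())
            plan[os.path.join(dirpath, name)] = os.path.join(root, candidate)
    return plan
```

Because the plan is plain data, it can feed both the preview and the manifest before any file is moved.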
Mode: archive
Compress old or unused files.
- Identify candidates: files untouched >N days (user-specified or default 365)
- Group by directory or type for archive bundles
- Run `ouch compress <files> <archive.tar.zst>` (or `zstd` for single files)
- Present preview: files to archive, compressed size estimate, destination
- On confirmation, compress and optionally move originals to trash. Log to manifest
- For cloud: `rclone move <local> <remote>:archive/` for cloud archiving
- Fallback: `tarfile` + `gzip` from Python stdlib
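When `ouch` is unavailable, the stdlib fallback could be as small as the sketch below. It writes a gzip-compressed tar and then re-opens it to list the members, so the originals are only considered for trashing once the archive is known to be readable. The helper name `archive_files` and its `base_dir` parameter are assumptions for illustration.

```python
import os
import tarfile


def archive_files(paths, archive_path, base_dir):
    """Write paths into a .tar.gz, storing names relative to base_dir."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            arcname = os.path.relpath(path, base_dir)
            tar.add(path, arcname=arcname, recursive=False)
    # Re-open the archive to confirm it can be read back.
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

This function never touches the originals; moving them to trash remains a separate, confirmed step.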
Mode: sanitize
Fix filenames: remove special characters, fix encoding, normalize Unicode.
- Run `detox -n <path>` (dry-run) for character cleanup preview
- Run `convmv -f <from> -t utf-8 --nfc <path>` for encoding normalization preview
- Present before/after rename table with changes highlighted
- On confirmation, run `detox <path>` and `convmv --notest -f <from> -t utf-8 --nfc <path>`. Log to manifest
- Fallback: `re.sub` for character cleanup; `unicodedata.normalize` for NFC/NFD
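The `re.sub` + `unicodedata.normalize` fallback might look like the following. The cleanup rules here (replace anything other than word characters, dots, and hyphens with an underscore) are an illustrative choice and do not claim to match `detox` output exactly; `sanitize_name` is a hypothetical helper.

```python
import re
import unicodedata

# Runs of characters that are not word characters, dots, or hyphens.
_UNSAFE = re.compile(r"[^\w.\-]+")


def sanitize_name(name: str) -> str:
    """Return an NFC-normalized filename with unsafe characters replaced."""
    name = unicodedata.normalize("NFC", name)   # compose decomposed accents
    name = _UNSAFE.sub("_", name)               # spaces, brackets, etc. -> "_"
    name = re.sub(r"_+", "_", name)             # collapse repeated underscores
    name = re.sub(r"_+(?=\.)", "", name)        # no underscore right before a dot
    return name.strip("_") or "_"
```

For instance, `sanitize_name("my file (1).txt")` produces `my_file_1.txt`. A real run would still need the same collision check as any other rename before applying the result.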
Mode: find
Smart file search with rich output. Strictly read-only.
- Translate natural language to `fd` flags: "large PDFs" -> `fd -e pdf --size +10m <path>`
- Run `fd` command, pipe matches through `bat` for syntax-highlighted preview
- Present results as markdown table: name, size, modified date, path
- Offer transitions: "Found 42 PDFs -> organize them?" / "Found duplicates -> clean them?"
- For cloud dirs: `rclone lsjson <remote>:path --recursive` with `--include` / `--exclude`
- Fallback: `os.walk` with `fnmatch` filtering
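A minimal sketch of the `os.walk` + `fnmatch` fallback follows. It assumes case-insensitive pattern matching and skips hidden files and directories, in line with the dotfile rule elsewhere in this document; the function name and parameters are hypothetical.

```python
import fnmatch
import os


def find_files(root, pattern="*", min_bytes=0):
    """Return sorted (path, size, mtime) tuples for matching files. Read-only."""
    matches = []
    wanted = pattern.lower()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories (this also covers .git/).
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if name.startswith("."):
                continue
            if not fnmatch.fnmatchcase(name.lower(), wanted):
                continue
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # broken symlink or vanished file
            if st.st_size >= min_bytes:
                matches.append((path, st.st_size, st.st_mtime))
    return sorted(matches)
```

Under these assumptions, "large PDFs" would translate to something like `find_files(path, "*.pdf", min_bytes=10 * 1024 * 1024)`.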
Mode: watch
Auto-organize files on creation using watchexec + organize-tool.
- Generate organize-tool YAML config from user rules. Read `references/organization-strategies.md`
- Start watcher: `watchexec -e jpg,png,pdf -- organize run <config>` (background)
- Register in `~/.files-buddy/watchers.json` for persistence
- Log all auto-organized files to manifest
- Subcommands: `watch start <path>`, `watch stop <id>`, `watch list`, `watch status`
- Requires: watchexec + organize-tool (no fallback; notify user to install)
Mode: undo
Reverse a previous operation using manifest records.
- If no manifest specified, run `manifest-manager.py list` to show recent operations
- Load manifest, validate paths are within `~/.files-buddy/manifests/`
- Verify completed file ops include recorded BLAKE3 hashes and that current files match manifest records; abort on mismatch or missing integrity metadata
- Reverse operations in reverse order. Restore cloud-tagged manifests from `.files-buddy-trash/`; otherwise restore from gomi / OS trash or `.files-buddy-trash/` fallback
- Mark manifest status = `undone`. Report restored files
- Fallback: `mv` from `.files-buddy-trash/` if gomi unavailable
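The path validation and reverse-order steps above can be sketched as follows. The manifest field names used here (`operations`, `status`, `source`, `dest`, `hash`) are hypothetical stand-ins, since the real schema lives in `references/safety-workflow.md`. Hash comparison itself is not shown because it relies on `b3sum`.

```python
import json
import os

MANIFEST_DIR = os.path.expanduser("~/.files-buddy/manifests")


def load_manifest(path, manifest_dir=MANIFEST_DIR):
    """Load a manifest only if it resolves to a file inside manifest_dir."""
    real = os.path.realpath(path)
    root = os.path.realpath(manifest_dir)
    if os.path.commonpath([real, root]) != root:
        raise ValueError(f"manifest outside {manifest_dir}: {path}")
    with open(real, encoding="utf-8") as fh:
        return json.load(fh)


def plan_undo(manifest):
    """Return (current, original) moves for completed ops, newest first."""
    steps = []
    for op in reversed(manifest["operations"]):
        if op.get("status") != "completed":
            continue  # failed ops changed nothing, so there is nothing to reverse
        if not op.get("hash"):
            # Missing integrity metadata: abort rather than guess.
            raise ValueError(f"missing hash for {op.get('dest')}")
        steps.append((op["dest"], op["source"]))
    return steps
```

Reversing the list matters when operations depend on each other, for example a directory created first and files moved into it afterwards.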
Mode: dashboard
Open a visual HTML dashboard. Analysis stays read-only, but rendering writes a local report file.
- Run audit analysis (dust, rmlint, fd) to collect data for target path
- Generate JSON: disk usage treemap, file type distribution, duplicates, large files, stale files, operation history from manifests
- Run `uv run python skills/files-buddy/scripts/dashboard-renderer.py --data <json-file|-> --output <path> --open`
- Default dashboard path is `~/.{gemini|copilot|codex|claude}/files-buddy/{YYYY-MM-DD}-dashboard.html` unless `--output` is provided
- Opens in default browser. Includes a self-contained treemap, sortable tables, dark/light theme
- Fallback: Print summary tables in terminal if browser unavailable
Dry-Run Preview Protocol
Every destructive mode produces a preview before execution:
- Tool native dry-run: `fclones group` (no `--delete`), `f2` (default), `organize sim`, `detox -n`, `ouch compress --dry-run`
- Preview table: Source, destination, operation type, size
- Blast radius summary: Total files, total size, directories affected
- Risk indicators: 🔴 Critical, 🟠 High, 🟡 Medium, 🟢 Low
- Confirmation gate: Tier-appropriate friction (see Tiered Friction Model)
Execution Protocol
Transaction-batched execution for all destructive modes:
- Manifest: atomic write (`.tmp` -> `os.rename()`) at `~/.files-buddy/manifests/{ts}-{uuid8}-{mode}.json` via `manifest-manager.py create`
- Batches: dirs (parallel) -> files (parallel non-overlapping, `pueue` for cloud) -> empty dir cleanup (sequential)
- TOCTOU check: if material drift >10% between preview and execution, halt and re-preview
- State tracking: runtime may track preview / in-progress state internally, but manifest rows are appended once per finalized operation, as `completed` or `failed`
- Trash: local paths use gomi -> OS trash -> `.files-buddy-trash/`; cloud-tagged paths use `.files-buddy-trash/` only. NEVER `rm`
- Metadata: finalized file ops record source, dest, type, timestamp, BLAKE3 hash (b3sum), size, st_mode, st_ino
- Cloud batch: `pueue` with `parallel 2` + delays to avoid API rate limiting
- Failure: roll back current batch, preserve prior completed batches, manifest status = `partial`
- Report: operations completed, manifest path, undo command
- Notify: desktop notification on completion if operation took >10s (`osascript` / `notify-send`)
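Two of the steps above, the atomic manifest write and the drift check, lend themselves to a short sketch. Several details are assumptions rather than the behavior of `manifest-manager.py`: the timestamp format, the payload shape, and the definition of drift as the fraction of previewed files whose size or modification time changed (or that vanished).

```python
import json
import os
import time
import uuid


def write_manifest(manifest_dir, mode, payload):
    """Write payload to {ts}-{uuid8}-{mode}.json via a .tmp file and rename."""
    os.makedirs(manifest_dir, exist_ok=True)
    ts = time.strftime("%Y%m%dT%H%M%S")  # assumed timestamp format
    name = f"{ts}-{uuid.uuid4().hex[:8]}-{mode}.json"
    final_path = os.path.join(manifest_dir, name)
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, indent=2)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes are on disk before the rename
    os.rename(tmp_path, final_path)  # readers see the old state or the full file
    return final_path


def take_snapshot(paths):
    """Record (size, mtime_ns) per path at preview time."""
    snapshot = {}
    for path in paths:
        st = os.stat(path)
        snapshot[path] = (st.st_size, st.st_mtime_ns)
    return snapshot


def drift_ratio(snapshot):
    """Fraction of snapshotted files that changed or vanished since preview."""
    if not snapshot:
        return 0.0
    changed = 0
    for path, recorded in snapshot.items():
        try:
            st = os.stat(path)
        except OSError:
            changed += 1
            continue
        if (st.st_size, st.st_mtime_ns) != recorded:
            changed += 1
    return changed / len(snapshot)
```

With these helpers, execution would halt and re-preview whenever `drift_ratio(snapshot) > 0.10`.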
Scaling Strategy
| Scope | Strategy |
|---|---|
| <100 files | Direct operation, full preview |
| 100-1,000 files | Batched preview (summary + sample), pueue for parallelism |
| 1,000-10,000 files | Sampling preview (10%), batched execution, progress tracking |
| 10,000+ files | Sampling preview (1%), pueue queued batches, parallel subagents |
Cloud directories: halve the parallelism, double the batch delays.
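The scaling table can be read as a lookup keyed on file count. In the sketch below the lower bound of each row is treated as inclusive for the larger tier (an assumption, since the ranges in the table touch), and the sample fraction for the 100-1,000 row is left as `None` because the table does not specify one. Function names are hypothetical.

```python
def scaling_strategy(file_count):
    """Return the preview style and sample fraction for a given file count."""
    if file_count < 100:
        return {"preview": "full", "sample_fraction": 1.0}
    if file_count < 1_000:
        # The table calls for "summary + sample" without giving a fraction.
        return {"preview": "summary + sample", "sample_fraction": None}
    if file_count < 10_000:
        return {"preview": "sampling", "sample_fraction": 0.10}
    return {"preview": "sampling", "sample_fraction": 0.01}


def cloud_adjust(parallelism, batch_delay_s):
    """Halve parallelism (never below 1) and double the batch delay."""
    return max(1, parallelism // 2), batch_delay_s * 2
```

For example, `cloud_adjust(4, 1.5)` gives `(2, 3.0)`.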
Reference Files
Load ONE reference at a time. Do not preload all references into context.
| File | Content | Read When |
|---|---|---|
| `references/tool-integrations.md` | CLI interfaces for 20+ tools: fd, fclones, rmlint, f2, dust, erdtree, gomi, ouch, zstd, b3sum, detox, convmv, rclone, pueue, bat, watchexec, organize-tool, czkawka_cli. Install commands per platform. Output parsing. | Pre-flight tool detection |
| `references/cloud-drives.md` | Detection paths (macOS CloudStorage, iCloud Mobile Documents), brctl/fileproviderctl, evicted file handling, rclone backends, Google Drive dedupe, conflict copies, rate limiting | Pre-flight cloud detection |
| `references/organization-strategies.md` | organize-tool YAML templates, extension-to-category mapping, date grouping, project detection, collision handling | Organize mode, watch mode |
| `references/protected-paths.md` | Hard-blocked paths (macOS + Linux), escalated paths, validation algorithm, `.git/` exclusion | Pre-flight checks |
| `references/duplicate-detection.md` | fclones JSON parsing, rmlint lint types, rclone dedupe modes, hardlink detection, zero-byte exclusion, NFC/NFD gotchas | Clean mode |
| `references/rename-patterns.md` | f2 patterns (regex, EXIF `{xt.make}`, ID3 `{id3.artist}`, hash `{hash.blake3}`), CSV batch, conflicts, f2 undo | Rename mode |
| `references/safety-workflow.md` | Manifest schema, atomic writes, trash hierarchy, TOCTOU drift, cloud safety rules, permission restoration, corruption recovery | Undo mode, execution protocol |
| Script | When to Run |
|---|---|
| `scripts/manifest-manager.py` | Create, list, search, validate, and close manifests (all destructive modes) |
| `scripts/dashboard-renderer.py` | Inject JSON data into HTML template, open browser (dashboard mode) |
| Template | When to Render |
|---|---|
| `templates/dashboard.html` | After audit analysis: inject data JSON into `<script id="data">` tag |
Critical Rules
- Never delete without confirmation; tier determines friction level
- Never use `rm`: local paths go to gomi / OS trash or `.files-buddy-trash/` fallback; cloud-tagged paths stage only to `.files-buddy-trash/`
- Never operate outside scope-pinned directory
- Never modify hard-blocked paths
- Never follow symlinks outside operation boundary
- Always create manifest (atomic write) before first operation
- Always show blast radius before destructive operations
- Respect `.gitignore`: never move tracked files without confirmation
- Audit and find modes are strictly read-only; dashboard only writes a local report file
- Prefer CLI tool delegation over reimplementation
- Exclude dotfiles by default; `.git/` always excluded
- Warn about restricted-permission files
- Undo must verify BLAKE3 hashes for completed file ops; abort on mismatch or missing hash metadata
- Load ONE reference at a time; do not preload all references
- Operation whitelist: move, rename, copy, trash, mkdir
- TOCTOU: re-validate filesystem state before each batch
- Record st_mode and st_ino in manifest metadata
- Validate manifest paths on undo: reject paths outside `~/.files-buddy/manifests/`
- Check tool availability at mode start: fall back to Python, report install command
- NEVER auto-delete cloud-synced files: deletion syncs to all devices
- Materialize evicted files before size or hash analysis
- Rate-limit cloud batch ops: use pueue with throttled parallelism