# manage-assets
Repos don't get slow from code. They get slow from binaries — a PDF committed last year, a 400 MB SQLite file a junior engineer checked in, a node_modules/ that snuck past .gitignore, a dist/ directory nobody bothered to exclude. A single 200 MB blob in git history turns git clone into a coffee break for every new collaborator, forever.
This skill surfaces that bloat. It is diagnosis-only — it never deletes a file, never rewrites history, never runs git filter-repo, never migrates to LFS. When the operator approves a finding, the skill hands off: refactor-verify for delete-from-history operations (it owns the verification discipline), manage-secrets-env if a leaked credential turns up inside a blob, fight-repo-rot if the asset is unused.
What this skill is: a sorted list of what's making the repo heavy, with provenance and a proposed fix owner.
What this skill is not: a history-rewriting tool, an LFS migration executor, or a dead-code detector (that's fight-repo-rot). It surfaces bloat; it does not remove bloat.
## State assumptions — before acting
Before starting the procedure, write an explicit Assumptions block. Don't pick silently between interpretations; surface the choice. If any assumption is wrong or ambiguous, pause and ask — do not proceed on a guess.
Required block:
Assumptions:
- Public clones: <none known | public repo with active clones/forks (history rewrite requires coordination)>
- git-lfs: <installed + initialized | available but uninitialized | not installed>
- Requested action: <diagnosis only (default) | destructive hand-off to refactor-verify for history rewrite or LFS migration>
- Secret-shaped blob: <none | FOUND in history — hand off to audit-security, do not auto-propose removal>
Typical items for this skill:
- Whether the repo has public clones or forks (history rewrites require coordination with all of them)
- Whether `git lfs` is installed and initialized
- Whether the operator wants diagnosis-only (default) or is asking for a destructive action (history rewrite / LFS migration) — the skill itself is diagnosis-only and hands off destructive work to refactor-verify
Stop-and-ask triggers:
- History rewrite proposed on a repo with known public clones — require explicit acknowledgment of coordination burden before hand-off
- A "large file" is actually in active use — don't recommend removal, recommend LFS migration instead
Silent picks are the most common failure mode: the skill runs, produces plausible output, and the operator doesn't notice the wrong interpretation was chosen. The Assumptions block is cheap insurance.
## When to trigger
- "my repo is huge" / "why is this so big"
- "git clone is slow"
- "I committed a big file"
- "should I use LFS"
- "delete a big file from history"
- "hundreds of MB of images"
- "DB file in git"
- before open-sourcing (clone speed becomes a first impression)
- during a storage quota warning on the host (GitHub, GitLab free tiers)
- when `git gc` warns about loose objects or pack size
## Primary category: large files in the working tree
The simplest and highest-signal check. What files currently tracked by git exceed a size threshold? These are the first targets because they are the easiest to diagnose — the file still exists, you can open it, you can tell what it is.
# Every tracked file larger than 1 MB, sorted biggest first (du -b needs GNU coreutils)
git ls-files -z | xargs -0 du -b 2>/dev/null \
  | awk '$1 > 1048576' | sort -rn | head -50
# Portable alternative: wc -c over git-tracked files only (works with BSD/macOS tools)
git ls-files | while IFS= read -r f; do
  [ -f "$f" ] && printf '%s\t%s\n' "$(wc -c <"$f" | tr -d ' ')" "$f"
done | awk '$1 > 1048576' | sort -rn | head -50
Thresholds the skill uses by default:
| Size | Severity | Why |
|---|---|---|
| > 100 MB | 🔴 CRITICAL | GitHub hard limit — host will reject the push. Must fix. |
| > 50 MB | 🔴 HIGH | GitHub warning threshold. git clone is painful on slow connections. |
| > 10 MB | 🟡 MEDIUM | Worth knowing about. Almost never source code; usually a binary artifact. |
| > 1 MB | 🟢 LOW | Normal for images, lockfiles, some PDFs. Only flag if suspicious. |
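The thresholds above can be sketched as a small helper that turns a byte count into a severity label. This is a minimal sketch, not the skill's actual implementation: the function name `severity_for_size` and the `NONE` label are hypothetical, and it assumes 1 MB means 1048576 bytes.

```shell
# Map a size in bytes to the default severity from the threshold table.
# Assumption: 1 MB = 1048576 bytes. "NONE" is a made-up label for files
# under the lowest threshold.
severity_for_size() {
  bytes=$1
  mb=1048576
  if [ "$bytes" -gt $((100 * mb)) ]; then
    echo CRITICAL
  elif [ "$bytes" -gt $((50 * mb)) ]; then
    echo HIGH
  elif [ "$bytes" -gt $((10 * mb)) ]; then
    echo MEDIUM
  elif [ "$bytes" -gt "$mb" ]; then
    echo LOW
  else
    echo NONE
  fi
}
```

Each `size<TAB>path` line from a size listing can be fed through it, for example `severity_for_size "$size"` inside a `while read` loop.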
For every hit, classify by file type:
- Source-like (large JSON, large CSV, large SQL) — usually fine, but consider whether it belongs in LFS or a separate data repo.
- Binary artifacts (`.zip`, `.tar.gz`, `.exe`, `.dmg`, `.dll`, `.so`, compiled `.a`, built `dist/`, `node_modules/`) — almost always a mistake. Flag with HIGH confidence.
- Media (images, video, audio, PDFs, fonts) — may be intentional. Flag for LFS consideration if repeated or over 10 MB.
- Databases (`.sqlite`, `.db`, `.mdb`, `.realm`) — almost never belong in git. Flag as CRITICAL regardless of size.
- Secrets-shaped (`.pem`, `.key`, `.p12`, `id_rsa`, `*.credentials.json`) — incident. Hand off to `manage-secrets-env` immediately.
## Secondary category: large blobs in git history
A file currently in the working tree is visible. A file deleted from the working tree but still in git history is invisible to ls and git ls-files, but every git clone still downloads it. This is the single most common cause of an unexpectedly huge repo.
# Every blob in the repo, ranked by uncompressed size (swap in %(objectsize:disk) for packed on-disk size)
git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | awk '$1=="blob" {size=$3; sha=$2; sub(/^[^ ]+ [^ ]+ [^ ]+ ?/, ""); print size, sha, $0}' \
  | sort -rn | head -30
# Which commit(s) introduced the blob? Useful for context
git log --all --oneline --find-object=<sha>
Findings here are tagged with an extra field: reachable in HEAD (still in the working tree) vs historical only (deleted from HEAD but preserved in history). Historical-only blobs require history rewriting to remove — which is destructive and breaks every existing clone.
Never rewrite history as part of this skill. If the operator approves a historical-blob removal, hand off to refactor-verify with the specific blob SHA and a removal plan. Force-push to a shared branch is a coordination problem, not a technical one.
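The "reachable in HEAD" versus "historical only" tag can be sketched as a read-only filter. This is a hedged sketch with a hypothetical function name, `tag_reachability`. It assumes the ranked blob list arrives on stdin as `<size> <sha> <path>` lines, and that a file holding one blob SHA per line has been produced from HEAD, for example with `git ls-tree -r HEAD | awk '{print $3}' > head_blobs.txt`.

```shell
# Usage: <ranked blob lines> | tag_reachability head_blobs.txt
# stdin:  "<size> <sha> <path>" per line (assumed format)
# $1:     file with one blob SHA per line, taken from the HEAD tree
# stdout: the same lines prefixed with a reachability tag
tag_reachability() {
  awk -v head_file="$1" '
    BEGIN {
      # Load every blob SHA that is present in HEAD.
      while ((getline sha < head_file) > 0) in_head[sha] = 1
    }
    {
      tag = ($2 in in_head) ? "reachable-in-HEAD" : "historical-only"
      print tag "\t" $0
    }
  '
}
```

The tag is decided by content, not by path: a blob that was renamed but still exists in HEAD counts as reachable.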
## Secondary category: LFS migration candidates
Files that should probably have been in Git LFS from day one. Rule of thumb: any file that is both binary and over 10 MB, or any file over 50 MB.
# Is LFS already installed / configured?
git lfs env 2>/dev/null | head
git lfs ls-files 2>/dev/null | wc -l
grep -E 'filter=lfs' .gitattributes 2>/dev/null
Report:
- LFS already configured — list which patterns are tracked and whether any large files in the working tree match none of them (escape hatches).
- LFS not configured, large binaries present — list the files and propose LFS patterns (e.g., `*.psd filter=lfs diff=lfs merge=lfs -text`).
- Historical blobs that should have been LFS — note that migration (`git lfs migrate import`) rewrites history, with the same warning as above.
The skill never runs git lfs migrate. It produces the list and the proposed .gitattributes additions; the operator decides.
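The rule of thumb and the proposed `.gitattributes` lines can be sketched as two small helpers. Both names (`is_lfs_candidate`, `lfs_attr_line`) are hypothetical. The sketch assumes 1 MB means 1048576 bytes and that the caller has already decided whether a file is binary. Neither helper touches the repository; they only print or return a status.

```shell
# Usage: is_lfs_candidate <size-in-bytes> <binary|text>
# Exit status 0 when the rule of thumb applies:
# any file over 50 MB, or a binary file over 10 MB.
is_lfs_candidate() {
  bytes=$1
  kind=$2
  mb=1048576
  if [ "$bytes" -gt $((50 * mb)) ]; then
    return 0
  fi
  [ "$kind" = binary ] && [ "$bytes" -gt $((10 * mb)) ]
}

# Print a proposed .gitattributes line for one file extension.
lfs_attr_line() {
  printf '*.%s filter=lfs diff=lfs merge=lfs -text\n' "$1"
}
```

For example, `is_lfs_candidate 44040192 binary && lfs_attr_line mp4` prints a proposal for a 42 MB video and leaves the decision to the operator.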
## Secondary category: asset directory growth
Some directories are expected to grow — assets/, public/, docs/images/, fixtures/. The problem is not the directory; it's the lack of a policy. This audit answers: is any directory growing without bound?
# Disk usage per top-level directory, largest first
du -sh ./*/ 2>/dev/null | sort -rh | head -20
# Directories with more than 100 files
find . -type d -not -path '*/.git/*' -exec sh -c '
count=$(find "$1" -maxdepth 1 -type f 2>/dev/null | wc -l)
[ "$count" -gt 100 ] && printf "%s\t%s\n" "$count" "$1"
' _ {} \; | sort -rn | head -20
Flag:
- Any top-level directory over 100 MB that is not `node_modules/` / `.venv/` / `target/` (those should be gitignored).
- Any directory with more than 500 files that isn't obviously a code directory.
- `dist/` / `build/` / `out/` / `.next/` / `__pycache__/` / `.pytest_cache/` tracked in git (should be in `.gitignore`).
Hand off .gitignore gaps to manage-secrets-env (which owns the default-safe .gitignore template).
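The "tracked build directory" check can be sketched as a filter over `git ls-files` output. The function name `tracked_build_dirs` is hypothetical, and the directory names are the ones listed above. Treating every directory called `target` or `out` as build output is an assumption that will not hold in every project.

```shell
# stdin:  one tracked path per line, e.g. from `git ls-files`
# stdout: "<count>\t<dir>" for each build/cache directory that has tracked files
tracked_build_dirs() {
  awk -F/ '
    BEGIN {
      n = split("dist build out .next __pycache__ .pytest_cache node_modules .venv target", names, " ")
      for (i = 1; i <= n; i++) is_build[names[i]] = 1
    }
    {
      # Walk the directory components only (the last field is the file name)
      # and stop at the first component that looks like build output.
      prefix = ""
      for (i = 1; i < NF; i++) {
        prefix = prefix $i "/"
        if ($i in is_build) { count[prefix]++; break }
      }
    }
    END { for (d in count) printf "%d\t%s\n", count[d], d }
  ' | sort -rn
}
```

Run it as `git ls-files | tracked_build_dirs`; any output line is a `.gitignore` gap to hand off.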
## Secondary category: duplicate binaries
Same file content under different paths is bloat twice over — you're paying storage for both copies, and you're guaranteed to edit one and forget the other. Detect by hash:
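# Hash every tracked file, group by content (sha1sum is GNU coreutils; on macOS use `shasum`)
git ls-files -z | xargs -0 sha1sum 2>/dev/null \
  | sort | awk '{
      # sha1sum prints a 40-char hash plus a 2-char separator, so the path starts at column 43
      if ($1 == prev) { print prev, prev_path; print $1, substr($0, 43); }
      prev = $1; prev_path = substr($0, 43);
    }' | sort -u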
Report clusters with 2+ files sharing the same SHA-1. Classify:
- Identical assets in multiple locations (same logo in `public/` and `src/assets/`) — candidate for a single-source-of-truth move, but not always wrong (build systems sometimes duplicate).
- Duplicate lockfiles or config — usually a bug.
- Identical test fixtures — candidate for a shared `fixtures/` directory.
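An alternative sketch avoids re-hashing files by reusing the blob SHAs git already stores. It reads `git ls-files -s` output, whose lines have the form `<mode> <sha> <stage><TAB><path>`, and prints every path whose blob is shared with another path. The function name `duplicate_blobs` is hypothetical. Skipping symlinks and submodules by mode is a design choice of this sketch, not something the text above prescribes.

```shell
# stdin:  output of `git ls-files -s` ("<mode> <sha> <stage>\t<path>")
# stdout: "<sha>\t<path>" for every regular file whose blob SHA is shared
duplicate_blobs() {
  awk -F'\t' '
    {
      split($1, meta, " ")             # meta[1] = mode, meta[2] = blob SHA
      if (meta[1] !~ /^100/) next      # skip symlinks (120000) and submodules (160000)
      sha = meta[2]
      seen[sha]++
      paths[sha] = paths[sha] sha "\t" $2 "\n"
    }
    END { for (sha in seen) if (seen[sha] > 1) printf "%s", paths[sha] }
  ' | sort
}
```

Run it as `git ls-files -s | duplicate_blobs`. Empty files all share one blob, so a cluster of zero-byte files (for example several empty `__init__.py`) is expected noise rather than bloat.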
## Output — prioritized diagnosis
# Repo bloat report — <date>
## Stats
- Total repo size on disk: <N MB>
- .git directory size: <N MB>
- Working tree size: <N MB>
- Tracked files: <N>
- Files > 10 MB: <N>
- Files > 50 MB: <N>
- Historical blobs > 10 MB: <N>
## Critical (🔴)
- `data/users.sqlite` — 340 MB, tracked in HEAD. Database files never belong in git. **Fix with:** `refactor-verify` to delete from HEAD and history; `manage-secrets-env` to add `*.sqlite` to `.gitignore`.
- Historical blob `abc123...` — 180 MB, was `backup.tar.gz`, deleted in commit `def456` but still in history. **Fix with:** `refactor-verify` (rewrite history, coordinate force-push).
## High (🔴)
- `public/demo.mp4` — 42 MB, tracked in HEAD. Good LFS candidate. **Fix with:** LFS migration via `.gitattributes`, hand off to `refactor-verify` for the migration commit.
- `assets/fonts/` — 38 MB across 12 font files. Consider LFS or a CDN.
## Medium (🟡)
- `docs/images/` — 87 MB across 240 PNGs. Growing without a compression policy. **Fix with:** establish a max-size per image, batch-compress existing ones (not a skill job — the operator handles).
## LFS status
- LFS configured: no / yes
- If yes: patterns tracked = <list>
- Files that should be in LFS but aren't: <count>
## Proposed `.gitattributes` (if LFS migration is approved)
```gitattributes
*.mp4 filter=lfs diff=lfs merge=lfs -text
*.psd filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
```
## `.gitignore` gaps found
- `dist/` tracked (should be ignored) — hand off to `manage-secrets-env`
- `.next/` tracked (should be ignored) — hand off to `manage-secrets-env`
## Hand-off summary
- Delete from HEAD and history (<N> items) → `refactor-verify`
- LFS migration (<N> items) → `refactor-verify` with proposed `.gitattributes`
- `.gitignore` gaps (<N> items) → `manage-secrets-env`
- Unused assets that can be deleted outright (<N> items) → `fight-repo-rot` to confirm unused, then `refactor-verify` to delete
## What this report did NOT do
Pure diagnosis. No files deleted, no history rewritten, no LFS migration run. Approve any item and it gets handed to the specialist that owns the fix.
## Things not to do
- **Never rewrite git history.** `git filter-repo`, `git filter-branch`, `bfg`, `git lfs migrate import` — none of these run from this skill. All of them destroy history and break every existing clone. They are always a `refactor-verify` handoff with operator confirmation and a coordinated force-push plan.
- **Never delete files.** Even a simple `git rm` of a 400 MB SQLite is not this skill's job. Diagnosis only. Flag it, hand it off.
- **Don't chase small wins.** A 1 MB file in a 10 GB repo is noise. Prioritize by size, not by count.
- **Don't confuse large with dead.** A file being big does not mean it's unused. A big asset might be actively referenced by the build. For "is this used" questions, that's `fight-repo-rot`.
- **Don't assume LFS is always the answer.** LFS costs money on most hosts and has its own operational overhead. Small teams with occasional large files often do fine with "just don't commit it." Recommend LFS only when the pattern is recurring.
- **Don't expand the diagnosis scope beyond what was asked.** This skill never edits; the scope-creep risk is in findings. If the operator asked about a specific large directory, don't hand back an all-repo blob scan with LFS recommendations for every binary. Adjacent findings go in a short "also worth looking at" footer — not the main bloat report.
## Sweep mode — read-only audit
This skill is **already diagnosis-only**: it never edits, regardless of how it's invoked. When the umbrella runs it with `sweep=read-only`, the analysis is the same as in a direct invocation and every approval is still handed off to the right specialist; only the report is shortened.
The report format is the one above; the sweep just feeds the top-line stats and any CRITICAL / HIGH findings into the umbrella's synthesis step.
How to tell: if the task context includes `sweep=read-only`, shorten the report to stats + CRITICAL + HIGH only, skip the LFS proposed `.gitattributes`, and defer the operator-approval dialog to the umbrella's synthesis step.
## Harsh mode — no hedging
When the task context contains the `tone=harsh` marker (usually set by the `/vibesubin harsh` umbrella invocation, but can also come from direct requests like *"don't sugarcoat"* / *"brutal review"* / *"매운 맛"*), switch output rules:
- **Lead with the biggest file.** First line of the report is the worst offender, with exact size and the consequence. *"`data/users.sqlite` is 340 MB and is in every single clone of this repo, forever, until you rewrite history."* Not *"large SQLite file detected in the repo."*
- **No softening words.** Drop *"might want to consider"*, *"could be a candidate for"*, *"probably worth"*. Replace with direct verbs: *"delete this from history"*, *"move this to LFS now"*, *"this file should never have been committed"*.
- **Consequence framing on every finding.** Balanced mode says *"42 MB video tracked in git"*. Harsh mode says *"every `git clone` of this repo downloads this 42 MB video, even though nobody on the team needs it locally."*
- **LFS recommendations are directive, not suggestive.** *"Migrate `*.psd` to LFS before the next commit"* — not *"you might want to set up LFS for Photoshop files."*
- **No *"a few polish items"* closures.** If a repo has any file over 50 MB, the verdict line does not end on a positive note. *"Clean this up before you open-source — the first clone will take fifteen minutes on a hotel WiFi."*
- **Historical blobs get urgency language.** *"This 180 MB blob is no longer in the working tree but every past and future clone still downloads it. Fix with a history rewrite or stop recommending this repo to anyone on mobile."*
Harsh mode does not invent findings, exaggerate sizes, or become rude. Every harsh statement must cite the same `git rev-list` / `du` / `git ls-files` output the balanced version would cite. The change is framing, not substance.
## Layperson mode — plain-language translation
When the task context contains `explain=layperson` (from `/vibesubin explain`, `/vibesubin easy`, *"쉽게 설명해줘"*, *"일반인도 이해되게"*, *"explain like I'm non-technical"*, *"非開発者でも分かるように"*, *"用通俗的话解释"*), add a plain-language layer to every finding this skill emits. Combines freely with `tone=harsh`. Full rules at `/plugins/vibesubin/skills/vibesubin/references/layperson-translation.md`.
### Three dimensions per finding
Every finding gets three questions answered in plain language, in the operator's language (Korean / English / Japanese / Chinese):
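- **왜 이것을 해야 하나요? / Why should you do this?** — *"When heavy binaries (images, DB dumps, dist folders, archives) pile up in the repository, anyone cloning it for the first time downloads 1 GB, and CI gets slower on every run."*
- **왜 중요한 작업인가요? / Why is it an important task?** — *"A large repository affects open-sourcing, onboarding new teammates, and CI speed alike. Once a big file is in git history, taking it out later requires a risky operation called a history rewrite."*
- **그래서 무엇을 하나요? / So what gets done?** — *"It scans both the working tree and git history, then reports files over 10 MB, files that belong in LFS, asset-folder growth trends, and duplicate binaries. Deletion is handed to refactor-verify, manage-secrets-env, or fight-repo-rot; this skill touches nothing."*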
### Severity translation
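- CRITICAL → *"Right now: a single binary over 100 MB, or a secret-shaped binary"*
- HIGH → *"Within this week: several large files that belong in LFS"*
- MEDIUM → *"Asset folders are growing steadily; left alone, this worsens to HIGH"*
- LOW → *"A few duplicate binaries: worth tidying, but not urgent"*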
### Box format
Wrap each finding in the box format from the shared reference. Header uses urgency phrase and the finding number. Footer names the hand-off skill.
### What does NOT change
Findings, counts, file:line references, evidence, and severity are identical to balanced/harsh output. Only the wrapping and dimension annotations are added.
## Hand-offs
- **Delete from HEAD only** (file is currently tracked, no history rewrite needed) → `refactor-verify` with `git rm` plus `.gitignore` addition
- **Delete from history** (large blob, deleted or not) → `refactor-verify` with `git filter-repo` plan and force-push coordination
- **LFS migration** → `refactor-verify` with proposed `.gitattributes` and migration steps; requires operator approval before touching history
- **`.gitignore` gaps** (build directories tracked, `node_modules/` tracked, etc.) → `manage-secrets-env` owns the default-safe `.gitignore` template
- **Asset is unused** (leftover from a prior feature) → `fight-repo-rot` to confirm unused, then `refactor-verify` to delete
- **Secret found inside a blob** (API key embedded in a committed binary, credentials in a ZIP) → `manage-secrets-env` and `audit-security` immediately; rotate first, delete second
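The routing above can be sketched as a lookup from finding kind to owning skill. The function name `handoff_for` and the kind labels (`delete-from-head`, `gitignore-gap`, and so on) are hypothetical; only the skill names come from the list. Where two skills are involved, they are printed in the order the list gives them.

```shell
# Usage: handoff_for <finding-kind>
# Prints the skill (or skills, in order) that own the fix.
# The kind labels are made up for this sketch.
handoff_for() {
  case "$1" in
    delete-from-head|delete-from-history|lfs-migration)
      echo refactor-verify ;;
    gitignore-gap)
      echo manage-secrets-env ;;
    unused-asset)
      echo "fight-repo-rot refactor-verify" ;;
    secret-in-blob)
      echo "manage-secrets-env audit-security" ;;
    *)
      echo unknown
      return 1 ;;
  esac
}
```

An unknown kind returns a non-zero status so a caller can stop and ask instead of guessing an owner.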
## Details and tools
The full methodology is inlined in this `SKILL.md`. Tools worth knowing (install-on-demand):
- `git-sizer` — GitHub's official repo-size analyzer, flags path lengths, blob size distribution, tree depth
- `git-filter-repo` — the current recommended tool for history rewriting (replaces `git filter-branch`)
- `bfg-repo-cleaner` — older, Java-based; still useful for specific patterns
- `git lfs` — official LFS CLI; `git lfs migrate import` for bulk migration (history-rewriting)
- `du` / `find` — universal primitives, always available
- `scc` / `tokei` — not for bloat but for separating source LOC from "everything else"
Scripts and references are intentionally minimal — the primary deliverable is the diagnostic report, and the analysis lives in the single SKILL.md so nothing drifts out of sync.