vector-db-cleanup


VDB Cleanup Agent

Role

You remove stale (orphaned) chunks from the ChromaDB vector store. A chunk is stale when its source file no longer exists on disk. Running this after files are deleted or renamed keeps the vector index accurate and stops searches from returning chunks that point to files that are gone.

This is a write (delete) operation. Always dry-run first.
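The stale-chunk rule above reduces to a filesystem existence check per source path. A minimal sketch, assuming the chunk IDs and their source paths have already been read from the store as (chunk_id, source_path) pairs. With the ChromaDB Python client these could come from `collection.get(include=["metadatas"])`, provided each chunk's metadata carries a `source` key (a hypothetical field name, not confirmed by this skill). This is an illustration, not cleanup.py's actual logic.

```python
import os
from collections.abc import Iterable


def find_orphans(chunks: Iterable[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the (chunk_id, source_path) pairs whose source file is missing.

    Assumption: `chunks` was built from store metadata; the field that holds
    the source path is hypothetical. Relative paths resolve against the
    current working directory.
    """
    missing: dict[str, bool] = {}  # cache so each path is checked only once
    orphans = []
    for chunk_id, source in chunks:
        if source not in missing:
            missing[source] = not os.path.exists(source)
        if missing[source]:
            orphans.append((chunk_id, source))
    return orphans
```

Caching per path matters because a single deleted file usually has many chunks.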

When to Run

  • After deleting or renaming files that were previously ingested
  • After a major refactor that moved directories
  • When query.py returns results pointing to non-existent files
  • Periodically as housekeeping

Prerequisites

Verify the server is running

Check the heartbeat endpoint. If the server is not up, or for first-time setup (dependencies and profile configuration), see ../../SKILL.md.

curl -sf http://127.0.0.1:8110/api/v1/heartbeat
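For an agent that prefers to check from Python rather than shell out to curl, a rough standard-library equivalent of `curl -sf` might look like this. It is a sketch: the URL and port are copied from the command above, and the 2-second timeout is an assumed value.

```python
import urllib.request

HEARTBEAT_URL = "http://127.0.0.1:8110/api/v1/heartbeat"


def server_is_up(url: str = HEARTBEAT_URL, timeout: float = 2.0) -> bool:
    """Return True if the heartbeat endpoint answers without an HTTP error."""
    try:
        # urlopen raises HTTPError for 4xx/5xx responses, much like curl -f.
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except OSError:
        # URLError, HTTPError, refused connections and socket timeouts
        # are all OSError subclasses.
        return False
```

Calling `server_is_up()` before step 1 avoids running a dry run against a server that is not there.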

Execution Protocol

1. Dry run -- show what will be removed

python3 ./scripts/cleanup.py \
  --profile knowledge --dry-run

Report: "Found N orphaned chunks from X deleted files: [list of paths]"
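The report line above can be produced mechanically from the orphan list. A small sketch, assuming the orphans are available as (chunk_id, source_path) pairs; how they are collected is up to cleanup.py.

```python
def dry_run_report(orphans: list[tuple[str, str]]) -> str:
    """Format orphans as: Found N orphaned chunks from X deleted files: [...]."""
    paths = sorted({source for _, source in orphans})  # unique deleted files
    return (f"Found {len(orphans)} orphaned chunks from {len(paths)} "
            f"deleted files: [{', '.join(paths)}]")
```

For example, three chunks spread across `x.md` and `y.md` yield `Found 3 orphaned chunks from 2 deleted files: [x.md, y.md]`.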

2. Apply -- only after the user confirms

python3 ./scripts/cleanup.py \
  --profile knowledge --apply
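The confirm-then-apply rule can be enforced in code with a simple gate. A minimal sketch, not cleanup.py's implementation: `delete_batch` is a hypothetical callback standing in for the store's delete call (with the ChromaDB client it could be `lambda b: collection.delete(ids=b)`), and the default batch size of 500 is an assumption.

```python
from typing import Callable, Sequence


def apply_cleanup(ids: Sequence[str],
                  confirmed: bool,
                  delete_batch: Callable[[list[str]], None],
                  batch_size: int = 500) -> int:
    """Delete `ids` in batches, refusing unless the user has confirmed.

    Returns the number of ids handed to `delete_batch`.
    """
    if not confirmed:
        raise RuntimeError("Refusing to delete: show the dry run and get "
                           "user confirmation first.")
    removed = 0
    for start in range(0, len(ids), batch_size):
        batch = list(ids[start:start + batch_size])
        delete_batch(batch)  # hypothetical hook into the vector store
        removed += len(batch)
    return removed
```

Batching keeps each delete request bounded when a large refactor orphans thousands of chunks.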

3. Verify store integrity (optional)

python3 ./scripts/vector_consistency_check.py \
  --profile knowledge
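What vector_consistency_check.py checks is not documented here. One simple, independent sanity check is sketched below, assuming you recorded the chunk count before applying (for example with ChromaDB's `collection.count()`), recorded it again afterward, and re-ran the dry run to count any remaining orphans. The function name and inputs are hypothetical.

```python
def check_removal(count_before: int, count_after: int, removed: int,
                  orphans_remaining: int) -> list[str]:
    """Return a list of problems; an empty list means the cleanup adds up."""
    problems = []
    expected = count_before - removed
    if count_after != expected:
        problems.append(
            f"expected {expected} chunks after cleanup, found {count_after}")
    if orphans_remaining:
        problems.append(f"{orphans_remaining} orphaned chunks still present")
    return problems
```

A mismatch in the first check suggests something other than the cleanup changed the collection in between, or some deletes did not take effect.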

4. Smoke-test that search still works

python3 ./scripts/query.py \
  "test query" --profile knowledge --limit 3
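If the smoke test needs to be scripted rather than checked by eye, one option is to run the same command and treat a zero exit code plus non-empty output as a pass. This is a sketch under the assumption that query.py prints its hits to stdout and exits non-zero on failure; neither behaviour is documented in this skill, and the 60-second timeout is arbitrary.

```python
import subprocess
import sys

# The step 4 command, run with the current interpreter.
QUERY_CMD = [sys.executable, "./scripts/query.py", "test query",
             "--profile", "knowledge", "--limit", "3"]


def smoke_test(cmd: list[str], timeout: float = 60.0) -> bool:
    """Pass if the command exits 0 and prints something to stdout."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # Missing interpreter/script path, or the query hung.
        return False
    return result.returncode == 0 and bool(result.stdout.strip())
```

For example, `smoke_test(QUERY_CMD)` runs the step 4 command from the repository root.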

Rules

  • Always dry-run first. Never apply without showing the user what will be deleted.
  • Never delete from .vector_data/ directly -- always use cleanup.py.
  • Never read .sqlite3 files with raw shell tools (cat, head, and so on) -- the binary output will corrupt the agent's context.
  • Source Transparency Declaration: state which profile was cleaned and how many chunks were removed.