Codebase Indexer Skill

Purpose

This skill builds a local semantic index of a codebase. Once indexed, the agent can run fast semantic searches over millions of lines of code without blowing up the context window.


Step 1: Confirmation Popup (MANDATORY, always do this first)

Before doing ANYTHING else, use the ask_user_input_v0 tool to ask the user:

question: "πŸ—‚οΈ Ready to index your codebase? This will scan all files and build
a local semantic search index. It may take a minute depending on project size."
type: single_select
options:
  - "YES! Let's index it 🚀"
  - "Not yet"
  • If the user picks "Not yet" → stop immediately and say something friendly, such as "No worries! Come back when you're ready." Do not proceed further.
  • If the user picks "YES! Let's index it 🚀" → continue to Step 2.

Step 2: Determine the Codebase Path

Ask the user (in plain text, not a widget) for the root path of the codebase they want to index. If the current working directory is clearly the project root (e.g. there's a package.json, pyproject.toml, Cargo.toml, or .git in it), suggest that path as the default and confirm with the user before using it.

Store the confirmed path as CODEBASE_PATH.
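A minimal sketch of the root check described above, assuming the marker files listed in this step are the only signals. `looks_like_project_root` and `suggest_codebase_path` are hypothetical helpers, not part of this skill's scripts.

```python
from pathlib import Path
from typing import Optional

# Hypothetical helpers; not part of this skill's scripts.
# Files or directories whose presence suggests a project root (from Step 2).
ROOT_MARKERS = ("package.json", "pyproject.toml", "Cargo.toml", ".git")


def looks_like_project_root(path: str) -> bool:
    """Return True if `path` directly contains any of the root markers."""
    root = Path(path)
    return any((root / marker).exists() for marker in ROOT_MARKERS)


def suggest_codebase_path(cwd: str) -> Optional[str]:
    """Return cwd (resolved) as a suggested default, or None if it doesn't look like a root.

    A suggestion is only a default: the user must still confirm it.
    """
    return str(Path(cwd).resolve()) if looks_like_project_root(cwd) else None
```

Calling `suggest_codebase_path(os.getcwd())` gives the default to offer; a `None` result means asking for the path with no suggestion.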


Step 3: Install Dependencies

Run the following to ensure required packages are available:

pip install chromadb sentence-transformers --break-system-packages -q

If the install fails, report the error to the user and stop.
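A sketch of checking which of the two packages are already importable before installing, so the install can be skipped when nothing is missing. It assumes the usual module names `chromadb` and `sentence_transformers`; `missing_packages` and `install` are hypothetical helpers.

```python
import importlib.util
import subprocess
import sys

# Hypothetical helpers; not part of this skill's scripts.
# pip package name -> importable module name
REQUIRED = {"chromadb": "chromadb", "sentence-transformers": "sentence_transformers"}


def missing_packages(required: dict = REQUIRED) -> list:
    """Return the pip names of packages whose modules cannot be found."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


def install(packages: list) -> int:
    """Install `packages` with the same pip flags as above; return pip's exit code."""
    if not packages:
        return 0  # nothing to do
    cmd = [sys.executable, "-m", "pip", "install", *packages,
           "--break-system-packages", "-q"]
    return subprocess.run(cmd).returncode
```

Running pip as `sys.executable -m pip` installs into the same interpreter that will later run the scripts. A non-zero return code maps to "report the error and stop"; note that older pip releases do not recognize `--break-system-packages`, which is one way this step can fail.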


Step 4: Run the Indexer

Execute the indexer script:

python3 /path/to/skill/scripts/index.py --path "<CODEBASE_PATH>"

Replace /path/to/skill/ with the actual location of this skill's folder (the directory that contains scripts/).

The script will:

  • Walk the directory tree (respecting .gitignore and skipping common noise dirs)
  • Chunk each file into meaningful segments
  • Embed each chunk using a local embedding model (all-MiniLM-L6-v2)
  • Persist everything to a ChromaDB database at ~/.codebase-index/<project-name>/
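A stdlib-only sketch of the first two bullets: walking the tree while pruning the noise directories named under Important Rules, skipping files that look binary, and cutting each file into overlapping line windows. This is an assumption about how `index.py` might work, not its actual code: it does not parse `.gitignore`, and the 40-line window with a 10-line overlap is an arbitrary choice.

```python
import os
from pathlib import Path
from typing import Iterator

# Hypothetical sketch; not this skill's index.py.
# Names skipped anywhere in the tree (from the Important Rules list).
# ".env" is usually a file but is sometimes a virtualenv directory, so it is
# checked for both.
SKIP_NAMES = {"node_modules", ".git", "__pycache__", "dist", "build", ".next", "venv", ".env"}


def is_binary(path: Path, probe: int = 8192) -> bool:
    """Heuristic: a NUL byte in the first `probe` bytes means 'binary'.

    Unreadable files are treated as binary so they are skipped.
    """
    try:
        with open(path, "rb") as f:
            return b"\x00" in f.read(probe)
    except OSError:
        return True


def iter_source_files(root: str) -> Iterator[Path]:
    """Yield indexable files under `root`, pruning skipped directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_NAMES]
        for name in filenames:
            path = Path(dirpath) / name
            if name in SKIP_NAMES or is_binary(path):
                continue
            yield path


def chunk_lines(text: str, size: int = 40, overlap: int = 10) -> list:
    """Split text into windows of `size` lines that overlap by `overlap` lines."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    lines = text.splitlines()
    step = size - overlap
    chunks = []
    for start in range(0, len(lines), step):
        chunks.append("\n".join(lines[start:start + size]))
        if start + size >= len(lines):
            break  # this window already reached the end of the file
    return chunks
```

Fixed line windows are the simplest strategy; splitting on function or class boundaries usually yields more meaningful segments, at the cost of a language-aware parser. The overlap keeps code that straddles a window boundary searchable from either side.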

While the script runs, let the user know it's working. It prints progress to stdout, so you can relay updates.

If the script exits with a non-zero code, show the error and stop.
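One way to run this step from Python while relaying progress: stream the script's stdout line by line and keep the lines so the summary can be parsed afterwards. `run_and_stream` is a hypothetical helper; stderr is merged into stdout so error messages reach the user too.

```python
import subprocess
import sys

# Hypothetical helper; not part of this skill's scripts.


def run_and_stream(cmd: list) -> tuple:
    """Run `cmd`, echo each output line as it arrives, and return (exit_code, lines)."""
    lines = []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          text=True, bufsize=1) as proc:
        for line in proc.stdout:
            line = line.rstrip("\n")
            print(line, flush=True)  # relay progress to the user
            lines.append(line)
    # Leaving the `with` block waits for the process, so returncode is set.
    return proc.returncode, lines
```

For this step the command would be `[sys.executable, "-u", "/path/to/skill/scripts/index.py", "--path", CODEBASE_PATH]`, with the placeholder path replaced as described above. The `-u` flag makes the child Python unbuffered; without it, progress lines written to a pipe may arrive in bursts rather than as they are printed. A non-zero exit code means showing the captured output and stopping.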


Step 5: Confirm Success

When the script finishes, it prints a JSON summary line like:

{"status": "done", "files": 142, "chunks": 891, "db_path": "~/.codebase-index/my-project"}

Parse this and report back to the user in a friendly way, e.g.:

✅ Done! Indexed 142 files into 891 chunks. The index is saved at ~/.codebase-index/my-project and ready to search.
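A sketch of the parsing in this step, assuming the summary is the last output line that decodes to a JSON object with a `status` key (the exact output contract of `index.py` is not specified here). `find_summary`, `format_summary`, and `resolve_db_path` are hypothetical helpers.

```python
import json
import os
from typing import Optional

# Hypothetical helpers; not part of this skill's scripts.


def find_summary(lines: list) -> Optional[dict]:
    """Return the last line that decodes to a JSON object containing "status", if any."""
    for line in reversed(lines):
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            continue  # ordinary progress output
        if isinstance(data, dict) and "status" in data:
            return data
    return None


def format_summary(summary: dict) -> str:
    """Build the friendly success message shown to the user."""
    if summary.get("status") != "done":
        raise ValueError(f"Unexpected status: {summary.get('status')!r}")
    return (f"✅ Done! Indexed {summary['files']} files into {summary['chunks']} chunks. "
            f"The index is saved at {summary['db_path']} and ready to search.")


def resolve_db_path(summary: dict) -> str:
    """Expand the leading ~ in db_path so it can be passed on as an absolute path."""
    return os.path.expanduser(summary["db_path"])
```

Scanning from the end tolerates any progress lines printed before the summary; a `None` result means the script finished without a recognizable summary, which is worth reporting rather than guessing.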


Step 6: How to Search the Index (after indexing)

Whenever you need to find relevant code during a task, run the search script, passing the db_path reported in Step 5 as <DB_PATH>:

python3 /path/to/skill/scripts/search.py --db "<DB_PATH>" --query "<your query>" --results 5

This returns the top N most semantically relevant code chunks as JSON, where N is the value of --results. Read them and use their content to answer the user's question. Always prefer searching the index over reading entire files.
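To make "most semantically relevant" concrete: the query and every chunk are embedded as vectors, and chunks are ranked by how close their vectors are to the query's. The sketch below ranks toy vectors by cosine similarity using only the standard library. It illustrates the idea and is not `search.py` itself: the 3-dimensional vectors and chunk ids are made up (all-MiniLM-L6-v2 produces 384-dimensional embeddings), and the distance ChromaDB actually uses depends on how the collection is configured.

```python
import math

# Illustration only; hypothetical chunk ids and toy embeddings.


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n(query: list, chunks: dict, n: int = 5) -> list:
    """Return (chunk_id, score) pairs for the n chunks most similar to the query, best first."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


chunks = {
    "auth.py:1-40": [0.9, 0.1, 0.0],
    "db.py:1-40": [0.1, 0.9, 0.2],
    "auth.py:31-70": [0.8, 0.2, 0.1],
}
print([chunk_id for chunk_id, _ in top_n([1.0, 0.0, 0.0], chunks, n=2)])
# → ['auth.py:1-40', 'auth.py:31-70']
```

A real index avoids scoring every chunk on each query; ChromaDB uses an approximate nearest-neighbour index for that, which is what keeps search fast as the codebase grows.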


Important Rules

  • Always show the confirmation popup first. Never skip Step 1.
  • Never index without the user's explicit YES.
  • The index persists between sessions: if one already exists for the same path, the indexer updates it (adding new or changed files and skipping unchanged ones).
  • Respect .gitignore. Never index node_modules, .git, __pycache__, dist, build, .next, venv, .env, or binary files.
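  • The embedding model runs locally, so no code from the project leaves the machine. (sentence-transformers downloads the model weights once, on first use, if they are not already cached.)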
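The update rule above (add new or changed files, skip unchanged ones) can be implemented by remembering a content hash per file. This is a hedged sketch of that idea, not the indexer's confirmed mechanism: `file_digest`, `plan_update`, and the stored path-to-hash dictionary are hypothetical, and the "remove" bucket for files that have disappeared is an extra assumption the rules above do not mention.

```python
import hashlib
from pathlib import Path

# Hypothetical sketch; not this skill's index.py.


def file_digest(path: Path) -> str:
    """SHA-256 of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_update(files: list, stored: dict) -> dict:
    """Compare current files with previously stored digests (path -> sha256).

    Returns the paths to embed (new or changed), skip (unchanged), and
    remove (in the old index but no longer present).
    """
    current = {str(p): file_digest(p) for p in files}
    plan = {"embed": [], "skip": [], "remove": []}
    for path, digest in current.items():
        plan["skip" if stored.get(path) == digest else "embed"].append(path)
    plan["remove"] = [p for p in stored if p not in current]
    return plan
```

Hashing content rather than comparing modification times means a file that was touched but not edited is still skipped.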