Codebase Indexer Skill

Purpose

This skill builds a local semantic index of a codebase. Once the codebase is indexed, the agent can run fast semantic searches across even very large projects and pull in only the most relevant code chunks, instead of reading whole files into the context window.

Step 1 — Confirmation Popup (MANDATORY, always do this first)

Before doing ANYTHING else, use the ask_user_input_v0 tool to ask the user:

question: "🗂️ Ready to index your codebase? This will scan all files and build
a local semantic search index. It may take a minute depending on project size."
type: single_select
options:
  - "YES! Let's index it 🚀"
  - "Not yet"

If the user picks "Not yet" → stop immediately, say something friendly like "No worries! Come back when you're ready." Do not proceed further.
If the user picks "YES! Let's index it 🚀" → continue to Step 2.
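
The rule above is fail-closed: anything other than the explicit YES option stops the skill. Here is a minimal sketch of that decision. It assumes the tool hands back the selected option's label as a plain string, which is an assumption because the actual response shape of ask_user_input_v0 is not described here. should_index and the option constants are hypothetical names.

```python
# Hypothetical constants mirroring the two options offered in Step 1.
YES_OPTION = "YES! Let's index it 🚀"
NOT_YET_OPTION = "Not yet"


def should_index(answer: str) -> bool:
    """Return True only for the explicit YES option; any other answer means stop."""
    # Assumes the tool returns the chosen option label as a string.
    return answer.strip() == YES_OPTION
```

Comparing against the exact label, rather than checking whether the answer contains "yes", keeps the gate strict. Free-text or unexpected answers never trigger indexing.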

Step 2 — Determine the Codebase Path

Ask the user (in plain text, not a widget) for the root path of the codebase they want to index. If the current working directory is clearly the project root (e.g. there's a package.json, pyproject.toml, Cargo.toml, or .git in it), suggest that path as the default and confirm with the user before using it.

Store the confirmed path as CODEBASE_PATH.
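
The root check in Step 2 amounts to looking for any of the listed marker files or directories in the current directory. This is a minimal sketch. suggest_codebase_path is a hypothetical helper, and it only proposes a default. The user must still confirm that default before it becomes CODEBASE_PATH.

```python
from pathlib import Path
from typing import Optional

# Marker files/directories named in Step 2 that suggest a project root.
ROOT_MARKERS = ("package.json", "pyproject.toml", "Cargo.toml", ".git")


def suggest_codebase_path(cwd: Optional[str] = None) -> Optional[str]:
    """Return the (resolved) directory as a suggested default if it has a root marker.

    Returns None when no marker is present; the caller then simply asks for a path.
    """
    here = Path(cwd) if cwd is not None else Path.cwd()
    if any((here / marker).exists() for marker in ROOT_MARKERS):
        return str(here.resolve())
    return None
```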

Step 3 — Install Dependencies

Run the following to make sure the required packages are installed for the same python3 interpreter that runs the skill's scripts:

python3 -m pip install chromadb sentence-transformers --break-system-packages -q

If the install fails, report the error to the user and stop.
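
pip's exit status is the main signal. It can also help to confirm that the packages can be imported by the interpreter that will run the scripts. This is a minimal sketch using importlib.util.find_spec, which locates a module without importing it. missing_modules is a hypothetical helper. Note that the pip package sentence-transformers is imported as sentence_transformers.

```python
import importlib.util

# Import names (not pip names) of the packages installed in Step 3.
REQUIRED_MODULES = ("chromadb", "sentence_transformers")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the subset of modules that the current interpreter cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]
```

If missing_modules() returns a non-empty list, report it to the user and stop, just as you would for a failed install.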

Step 4 — Run the Indexer

Execute the indexer script:

python3 /path/to/skill/scripts/index.py --path "<CODEBASE_PATH>"

Replace /path/to/skill/ with the actual location of this skill's directory (the folder that contains scripts/).

The script will:

Walk the directory tree (respecting .gitignore and skipping common noise dirs)
Chunk each file into meaningful segments
Embed each chunk using a local embedding model (all-MiniLM-L6-v2)
Persist everything to a ChromaDB database at ~/.codebase-index/<project-name>/
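
The walk and chunk stages listed above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the actual contents of index.py:

- The skip list is copied from this skill's Important Rules.
- Binary files are detected with a simple NUL-byte heuristic.
- .gitignore is not parsed.
- Chunking uses fixed windows of lines with a small overlap. The real script's "meaningful segments" may follow function or class boundaries instead.

iter_source_files and chunk_lines are hypothetical names.

```python
import os
from pathlib import Path

# Noise directories and files named in this skill's Important Rules.
SKIP_DIRS = {"node_modules", ".git", "__pycache__", "dist", "build", ".next", "venv"}
SKIP_FILES = {".env"}


def looks_binary(path: Path, sniff_bytes: int = 8192) -> bool:
    """Heuristic: treat a file as binary if its first bytes contain a NUL byte."""
    with path.open("rb") as f:
        return b"\0" in f.read(sniff_bytes)


def iter_source_files(root: str):
    """Yield text files under root, skipping noise dirs, .env, and binaries.

    Note: this sketch does NOT parse .gitignore.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = Path(dirpath) / name
            # is_file() is False for broken symlinks, which open() would reject.
            if name in SKIP_FILES or not path.is_file() or looks_binary(path):
                continue
            yield path


def chunk_lines(text: str, max_lines: int = 40, overlap: int = 5):
    """Split text into overlapping windows of lines.

    Returns (start_line, end_line, chunk_text) tuples; line numbers are 1-based, inclusive.
    """
    if not 0 <= overlap < max_lines:
        raise ValueError("overlap must be in [0, max_lines)")
    lines = text.splitlines()
    step = max_lines - overlap
    chunks = []
    for start in range(0, len(lines), step):
        window = lines[start:start + max_lines]
        chunks.append((start + 1, start + len(window), "\n".join(window)))
        if start + max_lines >= len(lines):
            break  # this window already reached the end of the file
    return chunks
```

Each (start_line, end_line, chunk_text) tuple would then be embedded and stored alongside its file path, so search results can point back to exact line ranges. The overlap makes it more likely that code straddling a window boundary appears intact in some chunk.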

While it runs, let the user know it's working. The script prints progress to stdout so you can relay updates.

If the script exits with a non-zero code, show the error and stop.
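
If you drive the indexer from Python rather than a shell, the sketch below shows one way to relay progress and stop on failure. It assumes, as Step 5 describes, that the JSON summary is the last non-empty line on stdout. run_and_relay is a hypothetical helper. stderr is captured in a temporary file so that a chatty stderr cannot fill a pipe and stall the child process.

```python
import subprocess
import tempfile


def run_and_relay(cmd):
    """Run cmd, echo its stdout line by line, and return the last non-empty line.

    Raises RuntimeError with the captured stderr if the exit code is non-zero.
    """
    last_line = ""
    with tempfile.TemporaryFile(mode="w+") as err:
        with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=err, text=True) as proc:
            for raw in proc.stdout:
                line = raw.rstrip("\n")
                print(line, flush=True)  # relay progress as it arrives
                if line.strip():
                    last_line = line
        # Leaving the Popen block closes the pipe and waits, so returncode is set.
        if proc.returncode != 0:
            err.seek(0)
            raise RuntimeError(f"exit code {proc.returncode}: {err.read().strip()}")
    return last_line
```

For example: summary_line = run_and_relay(["python3", "-u", "/path/to/skill/scripts/index.py", "--path", CODEBASE_PATH]). The -u flag disables Python's output buffering. Without it, progress lines written to a pipe can arrive in bursts instead of as they are printed.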

Step 5 — Confirm Success

When the script finishes, it prints a JSON summary line like:

{"status": "done", "files": 142, "chunks": 891, "db_path": "~/.codebase-index/my-project"}

Parse this and report back to the user in a friendly way, e.g.:

✅ Done! Indexed 142 files across 891 chunks. The index is saved at ~/.codebase-index/my-project and ready to search.
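
Parsing the summary and producing the message above takes only the standard json module. This sketch assumes the summary always carries the four keys shown (status, files, chunks, db_path). format_summary is a hypothetical name. Treating any status other than "done" as an error is a choice made here, not something the skill specifies.

```python
import json


def format_summary(summary_line: str) -> str:
    """Turn the indexer's JSON summary line into the user-facing success message."""
    data = json.loads(summary_line)  # raises json.JSONDecodeError (a ValueError) on bad input
    if data.get("status") != "done":
        raise ValueError(f"unexpected status: {data.get('status')!r}")
    return (
        f"✅ Done! Indexed {data['files']} files across {data['chunks']} chunks. "
        f"The index is saved at {data['db_path']} and ready to search."
    )
```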

Step 6 — How to Search the Index (after indexing)

Whenever you need to find relevant code during a task, use the search script:

python3 /path/to/skill/scripts/search.py --db "<DB_PATH>" --query "<your query>" --results 5

Use the db_path value from the Step 5 summary as <DB_PATH>. The script returns the N most semantically relevant code chunks as JSON, where N is the --results value. Read them and use their content to answer the user's question. Always prefer searching the index over reading entire files.
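
The db_path in the Step 5 summary starts with ~. A shell does not expand ~ inside double quotes, so pasting that path verbatim into --db "<DB_PATH>" passes a literal ~ unless search.py expands it itself. That behavior is not documented here. One way to sidestep the problem is to build the argument list in Python and expand the path first. build_search_cmd is a hypothetical helper.

```python
import os


def build_search_cmd(script_path, db_path, query, results=5):
    """Build argv for search.py, expanding a leading "~" in db_path.

    An argv list (run without a shell) needs no quoting for the query, but
    it also gets no tilde expansion, so os.path.expanduser does it here.
    """
    return [
        "python3", script_path,
        "--db", os.path.expanduser(db_path),
        "--query", query,
        "--results", str(results),
    ]
```

The resulting list can be passed directly to subprocess.run(..., capture_output=True, text=True), and its stdout parsed with json.loads.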

Important Rules

Always show the confirmation popup first. Never skip Step 1.
Never index without the user's explicit YES.
The index persists between sessions — if one already exists at the same path, the indexer will update it (add new/changed files, skip unchanged ones).
Respect .gitignore. Never index node_modules, .git, __pycache__, dist, build, .next, venv, .env, or binary files.
The embedding model runs locally, so no source code leaves the machine. The model weights are downloaded once on first use unless they are already cached.
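
The incremental update described in the rules above needs some way to tell changed files from unchanged ones. The skill does not say how the indexer does this; modification times and content hashes are both common approaches. The sketch below assumes content hashing with SHA-256 and a saved map of path to digest from the previous run. plan_update and file_digest are hypothetical names, and files deleted since the last run are not handled.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, read in 64 KiB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def plan_update(paths, previous):
    """Compare current file digests with those saved by the previous run.

    previous maps str(path) -> digest. Returns (to_index, unchanged, current),
    where current is the digest map to save for the next run.
    """
    to_index, unchanged, current = [], [], {}
    for path in paths:
        digest = file_digest(path)
        current[str(path)] = digest
        if previous.get(str(path)) == digest:
            unchanged.append(path)
        else:
            to_index.append(path)  # new file, or contents changed
    return to_index, unchanged, current
```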

vm-codebase-indexer