searching-codebases

Searching Codebases

Find code in any codebase by pattern or concept. One entry point, two search strategies, automatic routing.

Prerequisites

uv tool install ripgrep

tree-sitting (for structural context expansion) installs automatically when the --expand flag is used.

Primary Command

SKILL_DIR=/mnt/skills/user/searching-codebases

python3 $SKILL_DIR/scripts/search.py SOURCE "query1" ["query2" ...] [OPTIONS]

SOURCE is any of:

Local directory path
GitHub URL (downloads tarball automatically)
uploads (uses /mnt/user-data/uploads/)
project (uses /mnt/project/)
Path to a .zip or .tar.gz archive
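The SOURCE forms listed above can be told apart with a few string checks. The sketch below is only an illustration of that dispatch, not the logic in scripts/resolve.py: the function name `classify_source`, the `ALIASES` table, and the returned labels are hypothetical, and only the two fixed paths come from the list above.

```python
from urllib.parse import urlparse

# Hypothetical alias table; the two paths are the ones documented above.
ALIASES = {
    "uploads": "/mnt/user-data/uploads/",
    "project": "/mnt/project/",
}
ARCHIVE_SUFFIXES = (".zip", ".tar.gz")


def classify_source(source: str) -> tuple[str, str]:
    """Return a (kind, location) pair for a SOURCE argument (illustrative labels)."""
    if source in ALIASES:
        return ("alias", ALIASES[source])
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc in ("github.com", "www.github.com"):
        return ("github", source)
    if source.lower().endswith(ARCHIVE_SUFFIXES):
        return ("archive", source)
    # Anything else is treated as a local directory path.
    return ("directory", source)
```

Checking the aliases first keeps a local directory that happens to be named uploads from being reachable by the bare word; passing ./uploads would select the directory instead.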

Search Modes

Regex mode (patterns, identifiers, literal text):

python3 $SKILL_DIR/scripts/search.py ./repo "def handle_error"
python3 $SKILL_DIR/scripts/search.py ./repo "class.*Exception" --regex
python3 $SKILL_DIR/scripts/search.py ./repo "TODO|FIXME|HACK"

Semantic mode (concepts, natural language):

python3 $SKILL_DIR/scripts/search.py ./repo "retry logic with backoff" --semantic
python3 $SKILL_DIR/scripts/search.py ./repo "authentication flow"
python3 $SKILL_DIR/scripts/search.py ./repo "error handling strategy"

Auto-detection: short queries and code-like tokens → regex. Multi-word natural language → semantic. Override with --regex or --semantic.
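One way to picture the routing rule is as a small heuristic. The sketch below is an assumption about what such a router could look like, not the code in scripts/search.py: `route_query`, `CODE_CHARS`, and `CODE_KEYWORDS` are hypothetical names, and the exact signals the real router uses are not documented here.

```python
import re
from typing import Optional

# Characters that suggest regex syntax or an identifier rather than prose (assumed set).
CODE_CHARS = re.compile(r"[_.*+?|()\[\]{}\\^$:=<>]")
# Leading keywords that mark a multi-word query as code (assumed set).
CODE_KEYWORDS = {"def", "class", "import", "return", "function", "fn", "func"}


def route_query(query: str, forced: Optional[str] = None) -> str:
    """Pick 'regex' or 'semantic' for one query; `forced` mimics --regex / --semantic."""
    if forced in ("regex", "semantic"):
        return forced
    words = query.split()
    if len(words) <= 1:
        return "regex"      # a single token is treated as a short query
    if CODE_CHARS.search(query):
        return "regex"      # regex metacharacters or identifier punctuation
    if words[0] in CODE_KEYWORDS:
        return "regex"      # e.g. "def retry"
    return "semantic"       # multi-word natural language
```

Under these assumed rules the example queries in this section route the same way the text describes: "def handle_error" and "TODO|FIXME|HACK" go to regex, while "authentication flow" goes to semantic.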

Options

--regex / --semantic: Force search mode
--expand: Return full function bodies via tree-sitting AST context
--benchmark: Compare indexed regex vs brute-force ripgrep
--branch NAME: Git branch for GitHub URLs (default: main)
--skip DIRS: Comma-separated directories to skip
--json: Machine-readable output
-v: Show index stats and query routing decisions

How It Works

Regex search builds a sparse n-gram inverted index over all files. Each query is decomposed into literal fragments, which are looked up in the index to identify candidate files (typically cutting the set of files to scan by 90-99%); the candidates are then verified with ripgrep. Frequency-weighted n-grams make rare character sequences more selective.
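The index-then-verify idea can be sketched in a few lines. This is a simplified stand-in, not the algorithm in scripts/ngram_index.py: it uses plain trigrams instead of frequency-weighted sparse n-grams, verifies with Python's `re` module instead of ripgrep, and falls back to scanning every file when the pattern contains alternation, groups, classes, or escapes. All function names are hypothetical.

```python
import re
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """All overlapping 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each trigram to the set of file paths that contain it."""
    index = defaultdict(set)
    for path, content in files.items():
        for gram in trigrams(content):
            index[gram].add(path)
    return index


def literal_fragments(pattern: str) -> list[str]:
    """Literal runs (3+ chars) that every match must contain."""
    if re.search(r"[|()\[\]{}\\]", pattern):
        return []  # too complex for this sketch: no narrowing
    # A character followed by * or ? is optional, so it is dropped with its quantifier.
    parts = re.split(r".[*?]|[.+^$*?]", pattern)
    return [part for part in parts if len(part) >= 3]


def candidate_files(index, files, pattern: str) -> set[str]:
    """Intersect posting lists for every trigram of every required fragment."""
    candidates = set(files)
    for fragment in literal_fragments(pattern):
        for gram in trigrams(fragment):
            candidates &= index.get(gram, set())
    return candidates


def search(files: dict[str, str], index, pattern: str) -> list[str]:
    """Run the real regex only on candidate files."""
    regex = re.compile(pattern)
    return sorted(p for p in candidate_files(index, files, pattern) if regex.search(files[p]))
```

The index can only rule files out, never confirm a match, which is why the verification pass is still needed: a file may contain every trigram of a fragment without containing the fragment itself.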

Semantic search builds a TF-IDF index over code chunks (functions, classes, structural entries). Chunks are ranked by cosine similarity to the query.
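TF-IDF ranking with cosine similarity can be illustrated without any dependencies. The sketch below uses only the standard library rather than scikit-learn, so its weighting details are an assumption and may differ from scripts/code_rag.py; the chunk names and helper functions are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Alphabetic runs only, so retry_with_backoff yields retry, with, backoff.
    return [token.lower() for token in re.findall(r"[A-Za-z]+", text)]


def build_tfidf(chunks: dict[str, str]):
    """Return (idf, vectors) where vectors maps chunk name to {term: weight}."""
    counts = {name: Counter(tokenize(text)) for name, text in chunks.items()}
    n = len(counts)
    df = Counter(term for c in counts.values() for term in c)
    # Smoothed inverse document frequency: rarer terms get larger weights.
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = {name: {t: k * idf[t] for t, k in c.items()} for name, c in counts.items()}
    return idf, vectors


def cosine(a: dict, b: dict) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def rank(query: str, idf: dict, vectors: dict) -> list[str]:
    """Chunk names ordered by similarity to the query; non-matching chunks are dropped."""
    q_counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: k * idf[t] for t, k in q_counts.items()}
    scored = [(cosine(q_vec, vec), name) for name, vec in vectors.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]
```

Because the match is on shared vocabulary, a query only finds chunks whose identifiers or comments use at least one of its words; terms unseen at index time are ignored.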

Context expansion (--expand) uses tree-sitting's AST cache to identify function/class boundaries, returning complete structural units rather than line fragments. On first use, tree-sitting scans the repo (~700ms for 250 files); subsequent expansions are sub-millisecond.
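The effect of expansion, turning a matching line into its enclosing function or class, can be shown with Python's built-in `ast` module. This is a stand-in for illustration only: it handles Python source alone and does not use tree-sitting or its API, and `parse_cached` and `expand_to_enclosing` are hypothetical names. The cache mimics the parse-once, reuse-afterwards behaviour described above.

```python
import ast
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_cached(source: str) -> ast.Module:
    """Parse once per distinct source text; later calls reuse the tree."""
    return ast.parse(source)


def expand_to_enclosing(source: str, line: int) -> str:
    """Return the innermost function or class containing a 1-based line.

    Falls back to the single line when it sits at module level.
    """
    best = None
    for node in ast.walk(parse_cached(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.lineno <= line <= node.end_lineno:
                # Among enclosing nodes, the one that starts latest is the innermost.
                if best is None or node.lineno >= best.lineno:
                    best = node
    lines = source.splitlines()
    if best is None:
        return lines[line - 1]
    return "\n".join(lines[best.lineno - 1:best.end_lineno])
```

Given a regex hit on a line inside a method, this returns the whole method rather than the class around it, which matches the "complete structural unit" idea at the smallest useful granularity.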

Small codebases (< 20 files) skip indexing entirely — direct ripgrep is faster when there's nothing to narrow.

Mixed Queries

Multiple queries can use different modes in a single invocation. Each query is auto-routed independently, and indexes are built once per mode:

python3 $SKILL_DIR/scripts/search.py ./repo \
  "class.*Error" \
  "error recovery strategy" \
  "def retry"

Dependencies

tree-sitting: Provides AST-based context expansion for --expand. Not required — search works without it, just with less structural context in results.
ripgrep: Required for regex verification. Install via uv tool install ripgrep.
scikit-learn: Required for semantic mode. Installs automatically.

When to Use

Known target: "where is the retry logic?", "find all error handlers"
Pattern matching: regex across large codebases with indexed speedup
Concept search: "authentication flow", "database connection pooling"
Cross-reference: find all callers/users of a specific function

When NOT to Use

First encounter: "what does this repo do?" → use exploring-codebases
Repos under ~10 files: just read them directly
Exact symbol lookup: find_symbol('ClassName') via tree-sitting is simpler
Structural overview: use tree-sitting's tree_overview() / dir_overview()

Files

scripts/search.py — Entry point, query routing, output formatting
scripts/resolve.py — Input source resolution (GitHub, uploads, archives)
scripts/context.py — tree-sitting-based AST context expansion
scripts/ngram_index.py — Sparse n-gram inverted index, regex decomposition
scripts/sparse_ngrams.py — Core n-gram algorithms, frequency weights
scripts/code_rag.py — TF-IDF semantic search over code chunks

Repository

oaustegard/claude-skills

GitHub Stars

119

First Seen

Mar 29, 2026

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Warn