repo-search
Repo Search & Summarisation
Semantic search across a directory of documents using ChromaDB vector embeddings. Supports markdown, PDF, DOCX, and XLSX files. Retrieves relevant chunks without loading entire files into context. Designed for use with a "second brain" or personal knowledge base, but works with any collection of documents.
Prerequisites
- Python virtual environment set up (run setup.sh if not done)
- Index built (run ingest if no .vectordb/ directory exists)
First-Time Setup
# Set up Python environment (one-time)
~/.claude/skills/repo-search/setup.sh
# Build the index (run from brain repo root)
~/.claude/skills/repo-search/.venv/bin/python ~/.claude/skills/repo-search/ingest.py /path/to/your/markdown-repo --verbose
Rebuild Index (after adding/changing files)
# Incremental update (only changed files)
~/.claude/skills/repo-search/.venv/bin/python ~/.claude/skills/repo-search/ingest.py /path/to/your/markdown-repo
# Full rebuild
~/.claude/skills/repo-search/.venv/bin/python ~/.claude/skills/repo-search/ingest.py /path/to/your/markdown-repo --force --verbose
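How ingest.py decides which files have changed is not described here. One common approach, sketched below purely as an assumption, is to keep a manifest of content hashes and compare it against the files on disk. The names `build_manifest` and `plan_update` are hypothetical and are not part of the skill.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each markdown file under root to a SHA-256 digest of its bytes."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*.md"))
        if p.is_file()
    }


def plan_update(current, previous):
    """Compare two manifests; return (changed_or_new, removed) path lists."""
    changed = sorted(p for p, digest in current.items() if previous.get(p) != digest)
    removed = sorted(p for p in previous if p not in current)
    return changed, removed


# Hypothetical manifests from two successive runs.
previous = {"a.md": "111", "b.md": "222", "c.md": "333"}
current = {"a.md": "111", "b.md": "999", "d.md": "444"}
changed, removed = plan_update(current, previous)
print(changed, removed)
```

In this sketch only the `changed` files would need re-embedding, while the `removed` list corresponds to the orphaned chunks that the prune command cleans up.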
Search Operations
Semantic Search (default)
Find content semantically related to a query:
# Basic search (returns top 10 chunks)
~/.claude/skills/repo-search/.venv/bin/python ~/.claude/skills/repo-search/query.py --db-path /path/to/your/markdown-repo/.vectordb search "query text here"
# More results
~/.claude/skills/repo-search/.venv/bin/python ~/.claude/skills/repo-search/query.py --db-path /path/to/your/markdown-repo/.vectordb search "query text here" -k 20
# Filter by area
~/.claude/skills/repo-search/.venv/bin/python ~/.claude/skills/repo-search/query.py --db-path /path/to/your/markdown-repo/.vectordb search "query text" --area finance
# JSON output (for programmatic use)
~/.claude/skills/repo-search/.venv/bin/python ~/.claude/skills/repo-search/query.py --db-path /path/to/your/markdown-repo/.vectordb -f json search "query text" -k 5
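Because every command repeats the same long paths, a small wrapper can help when calling the search from a script. The sketch below only assembles the flags shown above (`--db-path`, `-f json`, `search`, `-k`, `--area`). The helper names are hypothetical, and it assumes that query.py prints nothing but JSON to stdout in JSON mode, which is not confirmed here.

```python
import json
import subprocess
from pathlib import Path

SKILL_DIR = Path.home() / ".claude" / "skills" / "repo-search"


def build_search_command(db_path, query, k=10, area=None):
    """Assemble the argv for a JSON-formatted search."""
    cmd = [
        str(SKILL_DIR / ".venv" / "bin" / "python"),
        str(SKILL_DIR / "query.py"),
        "--db-path", str(db_path),
        "-f", "json",
        "search", query,
        "-k", str(k),
    ]
    if area:
        cmd += ["--area", area]
    return cmd


def run_search(db_path, query, **options):
    """Run the search and parse stdout as JSON (assumed output format)."""
    result = subprocess.run(
        build_search_command(db_path, query, **options),
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


cmd = build_search_command("/path/to/your/markdown-repo/.vectordb", "query text", k=5, area="finance")
print(" ".join(cmd))
```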
Hybrid Search (vector + keyword)
Combines semantic similarity with BM25 keyword matching for better precision, especially with exact terms, names, or acronyms:
# Hybrid search (recommended for most queries)
~/.claude/skills/repo-search/.venv/bin/python ~/.claude/skills/repo-search/query.py --db-path /path/to/your/markdown-repo/.vectordb search "query text" --mode hybrid
# Keyword-only search (BM25)
~/.claude/skills/repo-search/.venv/bin/python ~/.claude/skills/repo-search/query.py --db-path /path/to/your/markdown-repo/.vectordb search "exact phrase" --mode keyword
Search modes: semantic (default), hybrid (vector + BM25 via Reciprocal Rank Fusion), keyword (BM25 only).
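To illustrate how Reciprocal Rank Fusion merges the two result lists, here is a minimal sketch. It is not the code in query.py: the function name and chunk IDs are hypothetical, and the constant `k = 60` is the value commonly used with RRF, assumed here rather than taken from the skill.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of chunk IDs; each appearance adds 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


semantic = ["c3", "c1", "c7"]  # hypothetical vector-search ranking
keyword = ["c1", "c9", "c3"]   # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([semantic, keyword])
print(fused)  # ['c1', 'c3', 'c9', 'c7']
```

Chunks that appear in both lists rise to the top, which is why hybrid mode tends to help with exact terms that the embedding alone ranks poorly.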
Browse by Area
Retrieve all chunks for an area (useful for summarisation):
~/.claude/skills/repo-search/.venv/bin/python ~/.claude/skills/repo-search/query.py --db-path /path/to/your/markdown-repo/.vectordb area finance
~/.claude/skills/repo-search/.venv/bin/python ~/.claude/skills/repo-search/query.py --db-path /path/to/your/markdown-repo/.vectordb area health -k 100
Browse by File
Get all chunks for a specific file:
~/.claude/skills/repo-search/.venv/bin/python ~/.claude/skills/repo-search/query.py --db-path /path/to/your/markdown-repo/.vectordb file "areas/finance/index.md"
Date Range Query
Retrieve chunks within a date range (for timelines):
~/.claude/skills/repo-search/.venv/bin/python ~/.claude/skills/repo-search/query.py --db-path /path/to/your/markdown-repo/.vectordb date-range 2025-01-01 2025-12-31
Database Info
# Statistics
~/.claude/skills/repo-search/.venv/bin/python ~/.claude/skills/repo-search/query.py --db-path /path/to/your/markdown-repo/.vectordb stats
# List all indexed files
~/.claude/skills/repo-search/.venv/bin/python ~/.claude/skills/repo-search/query.py --db-path /path/to/your/markdown-repo/.vectordb list
# Prune orphaned chunks (for files deleted from disk)
~/.claude/skills/repo-search/.venv/bin/python ~/.claude/skills/repo-search/query.py --db-path /path/to/your/markdown-repo/.vectordb prune /path/to/your/markdown-repo
Named Collections
Use --collection to manage separate indexes for different corpora:
# Ingest into a named collection
~/.claude/skills/repo-search/.venv/bin/python ~/.claude/skills/repo-search/ingest.py /path/to/work-docs --collection work
# Search a named collection
~/.claude/skills/repo-search/.venv/bin/python ~/.claude/skills/repo-search/query.py --db-path /path/to/work-docs/.vectordb --collection work search "query"
Default collection name is brain.
Summarisation Workflow
For large aggregation tasks (timelines, domain summaries, cross-cutting analysis):
- Retrieve relevant chunks using search or area/date-range queries with JSON output
- Batch chunks into manageable groups (by file, date, or topic)
- Summarise each batch using Claude
- Synthesise batch summaries into final output
Example workflow for "summarise my financial position":
# Step 1: Get all finance chunks as JSON
~/.claude/skills/repo-search/.venv/bin/python ~/.claude/skills/repo-search/query.py --db-path /path/to/your/markdown-repo/.vectordb -f json area finance -k 100
# Step 2: Read the JSON output and synthesise with Claude
# (Claude does this step naturally after reading the chunks)
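The batching step can be sketched as follows. The JSON schema is not documented here, so the `file` and `text` field names are assumptions, as are the `batch_by_file` helper and the character budget.

```python
import json
from collections import defaultdict


def batch_by_file(chunks, max_chars=8000):
    """Group chunk texts by source file, splitting groups that exceed max_chars."""
    by_file = defaultdict(list)
    for chunk in chunks:
        by_file[chunk["file"]].append(chunk["text"])  # assumed field names

    batches = []
    for path, texts in by_file.items():
        current, size = [], 0
        for text in texts:
            # Start a new batch when adding this text would exceed the budget.
            if current and size + len(text) > max_chars:
                batches.append((path, current))
                current, size = [], 0
            current.append(text)
            size += len(text)
        if current:
            batches.append((path, current))
    return batches


# Hypothetical output in the assumed shape.
raw = '[{"file": "a.md", "text": "aaaa"}, {"file": "a.md", "text": "bbbb"}, {"file": "b.md", "text": "cc"}]'
batches = batch_by_file(json.loads(raw), max_chars=6)
print(batches)
```

Each batch can then be summarised on its own before the batch summaries are combined into the final output.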
Available Areas
The brain is organised into these areas:
- areas → business, technical, health, relationships, finance, philosophy, mental, career, income
- projects → Active initiatives
- decisions → Decision logs
- resources → Reference material
- reviews → Daily/weekly/monthly reflections
- outputs → Finished content
- docs → Plans and design documents
Chunking & Embedding Details
- Markdown: Heading-aware chunking (respects #, ##, ### boundaries). Each chunk is enriched with its heading chain (e.g. [Title > Section > Subsection]) and document title for better embedding context.
- PDF: Page-aware chunking at 1000 chars default.
- DOCX: Paragraph-aware chunking at 1500 chars default.
- XLSX: Row-group chunking at 2000 chars default with sheet names preserved.
- Embedding model: all-MiniLM-L6-v2 (ChromaDB default). Model name is stored in collection metadata.
- BM25 index: Built automatically during ingestion for hybrid search support.
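As an illustration of the heading-aware chunking described for Markdown, the sketch below splits on #, ##, and ### headings and prefixes each chunk with its heading chain. It is a simplified assumption of how such a chunker might work, not the code in ingest.py: it enforces no size limit and does not skip `#` lines inside code blocks. The name `chunk_markdown` is hypothetical.

```python
import re

HEADING = re.compile(r"^(#{1,3})\s+(.*\S)\s*$")


def chunk_markdown(text):
    """Split markdown at #/##/### headings, prefixing each chunk with its heading chain."""
    chain = []   # stack of (level, title) for the current position
    body = []    # lines collected since the last heading
    chunks = []

    def flush():
        content = "\n".join(body).strip()
        if content:
            prefix = " > ".join(title for _, title in chain)
            chunks.append(f"[{prefix}]\n{content}" if prefix else content)
        body.clear()

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or a deeper level before descending.
            while chain and chain[-1][0] >= level:
                chain.pop()
            chain.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks


doc = "# Title\nintro\n## Section\nbody one\n### Sub\ndeep\n## Other\nlast"
chunks = chunk_markdown(doc)
for chunk in chunks:
    print(chunk, end="\n\n")
```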
Error Handling
- "Database not found": Run the ingest script first
- "No results": Try broader query terms, remove area filter, increase -k, or try --mode hybrid
- Stale results: Re-run ingest to pick up file changes (incremental, fast)
- Orphaned chunks: Use prune command to remove chunks for deleted files
- Slow first query: ChromaDB loads the embedding model on first use (~10-20s), subsequent queries are fast
- "Failed to extract": The file may be corrupted or password-protected; check stderr for details