osgrep - Semantic Code Search Skill
Overview
osgrep transforms code search from exact string matching to semantic understanding. It uses neural embeddings (Granite + ColBERT) with hybrid ranking (Vector + BM25 + RRF) to find code by meaning, not just tokens.
Core principle: grep finds strings, osgrep finds concepts.
When to Use This Skill
Activate osgrep when:
- User asks to "find" or "search for" concepts (not exact strings)
- Exploring unfamiliar codebases
- Finding similar implementations across files/languages
- Questions like "how does the code handle X?"
- Architectural understanding ("show me all API endpoints")
- Pattern discovery ("find retry logic examples")
DO NOT use osgrep when:
- User provides exact string/regex pattern → use rg or grep
- Searching for specific variable/function names → use rg
- Simple token search → use rg
- User explicitly requests grep/ripgrep → respect their choice
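The two lists above can be approximated by a small routing helper. This is only a sketch: choose_tool is a hypothetical function, and its heuristics (regex metacharacters or a single token suggest lexical search, a multi-word phrase suggests semantic search) are assumptions drawn from the rules above, not osgrep behavior.

```shell
# Hypothetical helper: suggest a search tool from the shape of the query.
choose_tool() {
  if printf '%s' "$1" | grep -q '[][\\^$*+?(){}|]'; then
    echo rg        # regex metacharacters or a code fragment: lexical search
  elif printf '%s' "$1" | grep -q ' '; then
    echo osgrep    # multi-word phrase: likely a conceptual query
  else
    echo rg        # single token, e.g. an exact identifier
  fi
}

choose_tool "JWT token validation"
choose_tool "getUserById"
```

An explicit user request for grep or ripgrep should still override whatever the helper suggests.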
Core Concepts
Semantic vs Lexical Search
# Lexical (grep/rg) - exact tokens
rg "authenticate" # Finds: authenticate, authenticated, authenticates (literal substring only; misses "authentication")
# Semantic (osgrep) - conceptual meaning
osgrep "user login verification"
# Finds: authenticate(), verifyCredentials(), checkUserSession(),
# validateToken(), isAuthorized() - all semantically similar
Hybrid Architecture
Query → [Vector Search] + [BM25 Full-Text] → [RRF Fusion] → [ColBERT Rerank] → Results
(semantic) (lexical) (rank merge) (precision)
Why hybrid:
- Vector search: Captures semantic similarity ("auth" ≈ "login" ≈ "credential")
- BM25: Ensures exact tokens aren't missed ("JWT", "bcrypt")
- RRF: Merges rankings without score normalization
- ColBERT: Refines top results with token-level late interaction (each query token is matched against its most similar document token)
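To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion in shell. It assumes two ranked lists with one result ID per line, best first. The function name rrf_merge, the sample file names, and the constant k=60 (a commonly used default) are illustrative assumptions, not osgrep's actual implementation or settings.

```shell
# Sketch of Reciprocal Rank Fusion (RRF); not osgrep's actual code.
# Each input file is a ranked list: one result ID per line, best first.
# fused_score(d) = sum over lists of 1 / (k + rank of d in that list)
rrf_merge() {
  # $1, $2: ranked list files; $3: smoothing constant k (assumed default 60)
  awk -v k="${3:-60}" '
    { score[$0] += 1 / (k + FNR) }   # FNR restarts at 1 for each file, so it is the rank
    END { for (d in score) printf "%.6f %s\n", score[d], d }
  ' "$1" "$2" | LC_ALL=C sort -k1,1nr -k2,2
}

# Demo with made-up rankings (hypothetical file names).
tmp=$(mktemp -d)
printf '%s\n' auth.ts session.ts jwt.ts > "$tmp/vector.txt"   # semantic ranking
printf '%s\n' jwt.ts auth.ts crypto.ts  > "$tmp/bm25.txt"     # lexical ranking
rrf_merge "$tmp/vector.txt" "$tmp/bm25.txt"
rm -rf "$tmp"
```

Because only rank positions enter the formula, the vector and BM25 scores never need to be put on a common scale. In the demo, auth.ts comes out first because it sits near the top of both lists.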
Relevance Scoring
0.9-1.0: Nearly identical (duplicates)
0.7-0.9: Highly relevant ✓ (trust these)
0.5-0.7: Moderately relevant (check manually)
0.3-0.5: Weakly relevant (likely noise)
0.0-0.3: False positive (ignore)
Best practice: Filter for scores > 0.6 in automated workflows.
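The bands and the threshold above can be applied mechanically. The sketch below uses two hypothetical helpers, score_band and keep_confident. It assumes input lines of the form "path score", which may not match the real --scores layout, and it assigns boundary values to the higher band because the listed ranges overlap at their edges.

```shell
# Label a score with the bands listed above.
# Boundary values go to the higher band (an assumption).
score_band() {
  awk -v s="$1" 'BEGIN {
    if      (s >= 0.9) print "nearly identical"
    else if (s >= 0.7) print "highly relevant"
    else if (s >= 0.5) print "moderately relevant"
    else if (s >= 0.3) print "weakly relevant"
    else               print "false positive"
  }'
}

# Keep only paths whose score exceeds a threshold (default 0.6).
# Assumes "path score" lines on stdin; the real output format may differ.
keep_confident() {
  awk -v min="${1:-0.6}" '$2 + 0 > min + 0 { print $1 }'
}

score_band 0.82
printf '%s\n' 'src/auth.ts 0.82' 'src/util.ts 0.41' 'src/jwt.ts 0.65' | keep_confident
```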
Command Reference
Basic Search
osgrep "<conceptual query>" [path] [options]
# Examples
osgrep "JWT token validation"
osgrep "error handling for database connections" src/
osgrep "rate limiting middleware" --max-count 20
Server Mode (Recommended)
# Start server with live indexing
osgrep serve --port 4444 &
# All queries now use auto-updated index
osgrep "query"
Indexing
osgrep index # Index current directory
osgrep list # Show all indexed projects
osgrep doctor # Check installation health
Output Formats
osgrep "query" # Default: snippets with context
osgrep "query" --content # Full chunk content
osgrep "query" --compact # File paths only (for piping)
osgrep "query" --scores # Show relevance scores
osgrep "query" --json # JSON output (for scripting)
osgrep "query" --plain # No ANSI colors
Result Control
osgrep "query" --max-count 5 # Total results (default: 10)
osgrep "query" --per-file 3 # Matches per file (default: 1)
osgrep "query" --sync # Sync index before search
Integration with Claude Code
Prefer --compact for File Lists
# When Claude needs file paths for further processing
files=$(osgrep "authentication logic" --compact)
echo "$files" | xargs cat # Read files
echo "$files" | xargs rg "specific_token" # Further filtering
Use --json for Structured Output
# When Claude needs to parse results programmatically
osgrep "error handling" --json |
jq -r '.results[] | select(.score > 0.7) | .path' |
sort -u
Use --scores for Confidence
# When Claude needs to assess result relevance
osgrep "database migration" --scores |
awk '$2 > 0.65 {print $1}' # Only high-confidence matches
Combine with Other Tools
# osgrep → rg pipeline
osgrep "API endpoint" --compact | xargs rg "POST|PUT"
# osgrep → ast-grep pipeline
osgrep "validation function" --compact | xargs ast-grep --pattern 'validate($$$)'
# osgrep → fzf pipeline (interactive)
osgrep "component definition" --compact | fzf --preview 'bat {}'
Common Workflows
1. Exploratory Search (Learning Codebase)
# Start broad
osgrep "authentication system" --max-count 20 --scores
# Identify key files
osgrep "authentication system" --compact | sort -u
# Deep dive specific aspects
osgrep "JWT token generation" --content
osgrep "password hashing" --content
osgrep "session management" --content
2. Finding Similar Implementations
# Find pattern variations
osgrep "rate limiting with token bucket" --max-count 15
# Cross-language search
cd ../python-service && osgrep "rate limiting middleware"
cd ../go-service && osgrep "rate limiting middleware"
3. Debugging (Find Error Sources)
# Semantic error search
osgrep "connection timeout error handling" --scores
# Filter high-confidence results
osgrep "connection timeout error handling" --scores |
awk '$2 > 0.7' |
cut -d: -f1 |
sort -u |
xargs bat
4. API Discovery
# Find endpoints
osgrep "REST API endpoint handlers" --max-count 30
# Filter by HTTP method
osgrep "POST API endpoint" --compact | xargs rg "POST"
# Find middleware
osgrep "authentication middleware" --content
5. Security Audit
# Find anti-patterns
osgrep "SQL injection vulnerability" --scores
osgrep "hardcoded secrets" --scores
osgrep "plaintext password" --scores
# Verify good patterns exist
osgrep "parameterized SQL queries" --max-count 10
osgrep "bcrypt password hashing" --max-count 10
6. Refactoring Prep (Impact Analysis)
# Find old pattern
osgrep "legacy cookie authentication" --compact > legacy.txt
# Find new pattern
osgrep "JWT token authentication" --compact > modern.txt
# Identify migration candidates
comm -23 <(sort legacy.txt) <(sort modern.txt)
7. Testing Gap Analysis
# Find tests
osgrep "unit test for authentication" --compact | sort > tested.txt
# Find implementations
osgrep "authentication logic" --compact | sort > all.txt
# Find untested code
comm -13 tested.txt all.txt
Query Optimization
Effective Queries (Conceptual + Specific)
# Good: Concept + context
osgrep "JWT authentication with refresh token rotation"
osgrep "error handling for database connection timeouts"
osgrep "input validation using zod schemas"
osgrep "pagination with cursor-based approach"
Ineffective Queries (Too Vague or Too Specific)
# Too vague
osgrep "function" # Use rg instead
osgrep "error" # Too broad
# Too specific (exact token)
osgrep "getUserById" # Use rg for exact names
osgrep "import { something }" # Use rg for imports
Query Expansion Strategy
# Instead of single term
osgrep "cache" # ✗ Vague
# Expand to concept
osgrep "caching strategy with TTL expiration" # ✓ Clear intent
# Add domain context
osgrep "Redis caching with automatic invalidation" # ✓ Very specific
Performance Considerations
Query Speed
Fast (<200ms):
- osgrep "query" --max-count 3 --compact
- osgrep "query" specific/path/ --max-count 5
Medium (~400ms):
- osgrep "query" # Default: 10 results
- osgrep "query" --scores
Slow (>600ms):
- osgrep "query" --max-count 50
- osgrep "query" --content --max-count 20
Index Size
Small (<1k files): ~50-100MB, index time ~10-30s
Medium (1k-5k files): ~100-500MB, index time ~1-3min
Large (5k-10k files): ~500MB-2GB, index time ~5-15min
Monorepo (>10k files): ~2-10GB, index time ~15-60min
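To judge which tier a project falls into before indexing, the table can be turned into a lookup. index_tier is a hypothetical helper, not an osgrep command, and the figures it prints are simply the estimates from the table above.

```shell
# Map a file count to the size tiers above (estimates copied from the table).
index_tier() {
  if   [ "$1" -lt 1000 ];  then echo "small: ~50-100MB, ~10-30s"
  elif [ "$1" -le 5000 ];  then echo "medium: ~100-500MB, ~1-3min"
  elif [ "$1" -le 10000 ]; then echo "large: ~500MB-2GB, ~5-15min"
  else                          echo "monorepo: ~2-10GB, ~15-60min"
  fi
}

index_tier 800
index_tier 12000
```

In a Git repository, `index_tier "$(git ls-files | wc -l | tr -d ' ')"` gives a rough input, on the assumption that the tracked files approximate what osgrep would index.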
Optimization Tips
# Use server mode for active development (auto-indexing)
osgrep serve &
# Limit results for speed
osgrep "query" --max-count 5
# Target specific paths
osgrep "query" src/api/
# Use --compact to skip rendering
osgrep "query" --compact
Best Practices for Claude Code
1. Always Check Index Status
# Before first search in a project
osgrep doctor # Verify installation
osgrep list # Check if project is indexed
# If not indexed
osgrep index
2. Start with Server Mode
# At project start
osgrep serve --port 4444 &
# Subsequent searches use live index
osgrep "query"
3. Use Appropriate Output Format
# For file lists → --compact
osgrep "concept" --compact
# For confidence assessment → --scores
osgrep "concept" --scores
# For scripting → --json
osgrep "concept" --json | jq ...
# For reading → default or --content
osgrep "concept"
osgrep "concept" --content
4. Filter by Relevance
# Only high-confidence results
osgrep "query" --scores | awk '$2 > 0.7'
# JSON filtering
osgrep "query" --json |
jq -r '.results[] | select(.score > 0.7) | .path'
5. Combine with Traditional Tools
# Semantic → lexical pipeline
osgrep "API endpoint" --compact | xargs rg "router\.(get|post)"
# Semantic → syntax pipeline
osgrep "validation" --compact | xargs ast-grep --pattern 'validate($$$)'
6. Handle No Results Gracefully
# If no results, try broader query
results=$(osgrep "very specific query" --compact)
if [[ -z "$results" ]]; then
results=$(osgrep "broader concept" --compact)
fi
# Or sync and retry
osgrep "query" --sync
7. Use Per-File Limits Strategically
# Finding entry points (one per file)
osgrep "main application initialization" --per-file 1
# Deep exploration (many per file)
osgrep "error handling patterns" --per-file 5
Limitations and Workarounds
Limitation: No Boolean Operators
# No AND/OR/NOT in queries
osgrep "auth AND JWT" # ✗ Doesn't work
# Workaround: Multiple queries + set operations
comm -12 \
<(osgrep "authentication" --compact | sort) \
<(osgrep "JWT" --compact | sort)
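The same set-operation idea extends to OR and NOT. The sketch below uses hypothetical helpers (paths_and, paths_or, paths_not) over files that hold one path per line, such as saved --compact output. It uses grep -Fx for whole-line matching instead of comm, so the second list does not need to be sorted.

```shell
# Hypothetical helpers: boolean combinations of two result lists.
# $1 and $2 are files with one path per line.
paths_and() { sort -u "$1" | grep -Fxf "$2"; }    # in both lists
paths_or()  { sort -u "$1" "$2"; }                # in either list
paths_not() { sort -u "$1" | grep -Fxvf "$2"; }   # in $1 but not in $2

# Demo with made-up result lists (hypothetical paths).
tmp=$(mktemp -d)
printf '%s\n' src/auth.ts src/jwt.ts src/session.ts > "$tmp/auth.txt"
printf '%s\n' src/jwt.ts src/token.ts               > "$tmp/jwt.txt"
paths_and "$tmp/auth.txt" "$tmp/jwt.txt"
paths_not "$tmp/auth.txt" "$tmp/jwt.txt"
rm -rf "$tmp"
```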
Limitation: No Regex
# Cannot use regex patterns
osgrep "user_\d+" # ✗ Doesn't work
# Workaround: osgrep then grep
osgrep "user identifier" --compact | xargs rg "user_\d+"
Limitation: English-Optimized
# Less effective on non-English code
osgrep "用户认证" # May not work well
# Workaround: Use English equivalent
osgrep "user authentication"
Limitation: Cold Start Delay
# First query after boot ~1-2s (model loading)
# Workaround: Use server mode (models stay loaded)
osgrep serve &
Troubleshooting
No Results or Low Scores
# Check if index exists
osgrep list
# Reindex if stale
osgrep index
# Try broader query
osgrep "broader concept" --max-count 20
# Check scores to diagnose
osgrep "query" --scores
Server Issues
# Check if server is running
lsof -i :4444
# Start server if not running
osgrep serve &
# Check installation health
osgrep doctor
Index Corruption
# Clear and rebuild
cd /path/to/project
rm -rf ~/.osgrep/data/"$(basename "$(pwd)")"  # quoted so paths with spaces are handled safely
osgrep index
Examples for Claude Code
Example 1: Architecture Exploration
# User: "Explain how the authentication system works"
# Step 1: Find auth components
osgrep "authentication system" --compact > auth_files.txt
# Step 2: Get detailed content
head -5 auth_files.txt | xargs -I{} osgrep "authentication flow" {} --content  # one path per call, matching: osgrep "<query>" [path] [options]
# Step 3: Find related components
osgrep "JWT token management" --content
osgrep "session storage" --content
osgrep "password validation" --content
Example 2: Bug Investigation
# User: "Why are database connections timing out?"
# Step 1: Find timeout handling
osgrep "database connection timeout" --scores --max-count 20
# Step 2: Filter high-confidence matches
osgrep "database connection timeout" --scores |
awk '$2 > 0.7' |
cut -d: -f1 |
sort -u > relevant_files.txt
# Step 3: Check retry logic
xargs -I{} osgrep "retry connection" {} --content < relevant_files.txt  # one path per call
Example 3: API Endpoint Discovery
# User: "List all POST endpoints"
# Step 1: Find API handlers
osgrep "API endpoint handler" --compact | sort -u > endpoints.txt
# Step 2: Filter by HTTP method
cat endpoints.txt | xargs rg "POST" | cut -d: -f1 | sort -u
# Step 3: Get endpoint details
osgrep "POST endpoint implementation" --content --max-count 10
Example 4: Security Review
# User: "Audit the code for security issues"
# Step 1: Find potential vulnerabilities
osgrep "SQL injection" --scores > sql_issues.txt
osgrep "hardcoded credentials" --scores > cred_issues.txt
osgrep "plaintext password" --scores > pwd_issues.txt
# Step 2: Check for proper patterns
osgrep "parameterized SQL query" --max-count 10
osgrep "environment variable configuration" --max-count 10
osgrep "bcrypt password hashing" --max-count 10
# Step 3: Generate report
cat sql_issues.txt cred_issues.txt pwd_issues.txt |
awk '$2 > 0.6 {print $0}'
Example 5: Refactoring Analysis
# User: "Help me refactor legacy authentication to JWT"
# Step 1: Find legacy pattern
osgrep "session cookie authentication" --compact | sort > legacy.txt
# Step 2: Find modern pattern examples
osgrep "JWT token authentication" --compact | sort > modern.txt
# Step 3: Identify migration files
comm -23 legacy.txt modern.txt > needs_migration.txt
# Step 4: Analyze migration complexity
wc -l needs_migration.txt
cat needs_migration.txt | xargs wc -l | sort -n
References
See the codebase documentation:
- Core principles: ../osgrep-codebase/principles/semantic-search.md
- Ranking details: ../osgrep-codebase/principles/hybrid-ranking.md
- Workflow templates: ../osgrep-codebase/templates/search-workflow.md
- Type definitions: ../osgrep-codebase/types/core.ts
- Cheatsheet: ./assets/cheatsheet.md
- Configuration: ./references/configuration.md
- Search patterns: ./references/search-patterns.md
Scripts
- search-validator.sh: Validate search result relevance
- See the ./scripts/ directory
Version
- osgrep: 0.4.15
- Models: Granite Embedding (30M) + osgrep-colbert (Q8)
- Storage: LanceDB with Tree-sitter parsing
Quick Start for Claude Code
# 1. Check installation
osgrep doctor
# 2. Index project (if not already)
cd /path/to/project
osgrep list | grep -qF "$(pwd)" || osgrep index  # -F: match the path literally, not as a regex
# 3. Start server (recommended)
osgrep serve --port 4444 &
# 4. Search semantically
osgrep "user authentication logic" --scores
# 5. Use with other tools
osgrep "API endpoint" --compact | xargs rg "router"
Key Principle: Use osgrep for semantic exploration, grep/rg for lexical precision. Combine both for powerful code understanding workflows.
Repository: zpankz/mcp-skillset
First Seen: Jan 26, 2026