# Code Search Assistant
There are four kinds of code search. Picking the wrong one wastes time or misses results. The skill is matching the question to the tool.
## Question → tool
| Question shape | Tool | Why |
|---|---|---|
| "Where is `FooBar` defined?" | Text grep (`rg -w`) | Exact symbol — fast, precise |
| "Where is `FooBar` used?" | Text grep + filter, or LSP "find references" | Same symbol, many hits |
| "What calls this function, transitively?" | Call graph walk | Grep finds direct calls; you need the tree |
| "Where do we validate email addresses?" | Semantic / fuzzy search | Concept, not symbol — no single keyword |
| "Find all places that cast then dereference" | AST / structural query | Syntactic pattern, not a string |
| "What's the code path from HTTP to DB?" | Dataflow / taint trace | Cross-function, value-following |
## Text search — do it right
Grep is fast but dumb. Make it less dumb:
| Trick | Example |
|---|---|
| Word boundaries | `rg -w foo` — matches `foo`, not `foobar` |
| File type filter | `rg -t py foo` — only Python files |
| Definition vs use | `rg '^(def \|class )foo'` — anchors to definition syntax |
| Multi-line pattern | `rg -U 'if.*\n.*return None'` |
| Exclude vendored/generated | `rg foo -g '!vendor/' -g '!*.pb.go'` |
| Case-insensitive for NL concepts | `rg -i 'email.*valid'` |
False-positive pruning: comments, strings, tests. `rg foo | rg -v test_ | rg -v '^.*#'` — crude but works. Or exclude test directories with globs (`-g '!tests/'`) if the language has conventions.
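Once the pruning rules grow past one or two `-v` filters, a post-filter script scales better than a longer pipeline. A minimal sketch in Python, assuming ripgrep is installed (`rg --json` emits one JSON event per line; the test/comment heuristics here are exactly as crude as the chain above):

```python
# Sketch: post-filter rg hits instead of chaining `rg -v`.
import json
import subprocess

def grep_pruned(pattern: str, path: str = ".") -> list[tuple[str, int, str]]:
    """Run rg, then drop hits in test files and full-line comments."""
    proc = subprocess.run(
        ["rg", "--json", pattern, path],
        capture_output=True, text=True, check=False,  # rg exits 1 on no matches
    )
    hits = []
    for raw in proc.stdout.splitlines():
        event = json.loads(raw)
        if event["type"] != "match":  # skip begin/end/summary events
            continue
        file = event["data"]["path"]["text"]
        line = event["data"]["lines"]["text"]
        if "test" in file or line.lstrip().startswith("#"):
            continue  # crude, same spirit as `rg -v test_ | rg -v '^.*#'`
        hits.append((file, event["data"]["line_number"], line.rstrip()))
    return hits
```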
## Structural search — when grep lies
Grep finds text. AST search finds structure. You need AST when:
- The pattern has nesting: "a `return` inside a `for` inside a `try`."
- The pattern is semantic: "function calls where the 2nd arg is a string literal."
- The pattern spans lines in ways regex can't track.
Tools: `semgrep` (pattern syntax looks like code with holes), `ast-grep`, language-specific (Python's `ast` module, `clang-query`).
Example semgrep pattern — find SQL built by concatenation:
```yaml
pattern: |
  $CURSOR.execute($X + $Y)
```
Grep for `execute` gives you thousands of hits. The pattern gives you the dangerous ones.
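If semgrep isn't available and the target language is Python, the stdlib `ast` module can express the same shape. A sketch, not a drop-in replacement (it only catches `+` concatenation in the first argument, not f-strings or `%` formatting):

```python
# Sketch: find .execute(X + Y) calls, the AST analogue of the semgrep
# pattern above. Misses f-strings, .format(), and % formatting.
import ast
import sys

def find_concat_execute(source: str, filename: str = "<src>") -> list[int]:
    """Return line numbers of execute() calls whose first arg is a BinOp +."""
    found = []
    for node in ast.walk(ast.parse(source, filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and node.args
            and isinstance(node.args[0], ast.BinOp)
            and isinstance(node.args[0].op, ast.Add)
        ):
            found.append(node.lineno)
    return found

if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as f:
            for lineno in find_concat_execute(f.read(), path):
                print(f"{path}:{lineno}: execute() with concatenated SQL")
```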
## Call graph — the transitive question
"What eventually calls dangerous_write?" Grep finds direct callers. For the full tree:
- Find direct callers of `dangerous_write`.
- For each, find their callers.
- Repeat until you hit entry points (`main`, route handlers, tests).
LSP "call hierarchy" does this in IDEs. Manually: breadth-first, dedupe visited functions. Output is a tree, not a list.
## Semantic search — the fuzzy question
"Where do we handle session expiry?" — no single symbol. The code might say timeout, ttl, expires_at, staleness, max_age. Semantic search embeds code and query, ranks by meaning.
When you don't have a semantic search index, approximate:
- Brainstorm synonyms: `expir`, `timeout`, `ttl`, `stale`, `max_age`.
- Grep for each, union results.
- Rank by proximity to other clue words (`session`, `auth`, `cookie`).
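A sketch of that approximation, assuming ripgrep is installed. The synonym and clue lists are the ones above; the five-line proximity window is an arbitrary choice:

```python
# Sketch: grep each synonym, union the hits, rank by clue words nearby.
import subprocess

SYNONYMS = ["expir", "timeout", "ttl", "stale", "max_age"]
CLUES = ["session", "auth", "cookie"]

def fuzzy_search(path: str = ".") -> list[tuple[int, str, int]]:
    hits: set[tuple[str, int]] = set()
    for term in SYNONYMS:
        out = subprocess.run(
            ["rg", "-i", "-n", "-H", "--no-heading", term, path],
            capture_output=True, text=True,
        ).stdout
        for line in out.splitlines():
            file, lineno, _ = line.split(":", 2)
            hits.add((file, int(lineno)))
    ranked = []
    for file, lineno in hits:
        lines = open(file, encoding="utf-8", errors="replace").read().splitlines()
        window = " ".join(lines[max(0, lineno - 6) : lineno + 5]).lower()
        score = sum(clue in window for clue in CLUES)  # proximity ranking
        ranked.append((score, file, lineno))
    return sorted(ranked, key=lambda t: -t[0])  # highest clue score first
```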
## Worked example — a real question
Q: "Where in this Django app do we actually write to the orders table?"
Wrong first move: grep `orders` — 847 hits, mostly templates and tests.
Right sequence:
- Find the model. `rg 'class.*Model.*orders' -t py` or `rg "db_table.*orders"` → `Order` in `models/order.py`.
- Find writes. ORM writes are `.save()`, `.create()`, `.update()`, `.delete()`, `.bulk_create()`. But those are on any model — need to narrow.
- Structural: `rg 'Order\.(objects\.)?(create|update|bulk)' -t py` + `rg 'order\.save\(\)'` (instance-level, harder — `order` could be any variable name).
- Cross-reference: find functions that take an `Order` and call `.save()`. `rg 'def.*order.*:' -A 20 | rg save`.
- Raw SQL escape hatch: `rg 'INSERT INTO orders|UPDATE orders' -i` — catches anyone bypassing the ORM.
Result: 6 write sites. 4 through the ORM (service layer), 1 in a migration, 1 raw SQL in a management command (flagged — why is this bypassing the ORM?).
## Do not
- Do not grep when the question is transitive. "Who calls X" (direct) is grep. "Who eventually calls X" is a graph walk.
- Do not trust grep for "all usages" in dynamically-typed languages. `getattr(obj, 'foo')()` won't match `rg 'foo\('`. Know your language's reflection escape hatches.
- Do not semantic-search when you have an exact symbol. It's slower and less precise than grep.
- Do not present 847 hits. Filter, rank, group. "Here are 847 matches" is not an answer.
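One way to honor that last rule: group hits by top-level directory, show counts, and truncate each group. A sketch; the `(path, line)` hit shape and the cutoff are arbitrary choices:

```python
# Sketch: turn a flat hit list into a grouped, ranked summary.
from collections import defaultdict

def group_hits(hits: list[tuple[str, int]], per_group: int = 3) -> str:
    groups: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for file, lineno in hits:
        top = file.split("/", 1)[0] if "/" in file else "."
        groups[top].append((file, lineno))
    out = []
    # Biggest groups first; a handful of examples each, then a count.
    for top, members in sorted(groups.items(), key=lambda kv: -len(kv[1])):
        out.append(f"{top}/ ({len(members)} hits)")
        for file, lineno in members[:per_group]:
            out.append(f"  {file}:{lineno}")
        if len(members) > per_group:
            out.append(f"  ... and {len(members) - per_group} more")
    return "\n".join(out)
```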
## Output format
```markdown
## Query
<what was asked>

## Search strategy
<text | AST | call-graph | semantic> — <why this one>

## Searches run
1. <command / pattern> → <N> hits
2. <refinement> → <M> hits
...

## Results (ranked)
| Location | Snippet | Relevance |
| -------- | ------- | --------- |

## Notes
<known blind spots — reflection, generated code, dynamic dispatch>
```