Search Engine Setup
Overview
This skill helps AI agents implement production-quality search in applications. It covers index design with custom analyzers, database-to-index sync pipelines, search APIs with faceting and highlights, autocomplete, and relevance tuning based on real query data.
Instructions
Index Design (Elasticsearch)
- Map source database columns to Elasticsearch field types:
  - Text columns users search → text, with a custom analyzer
  - Enum/category columns for filtering → keyword
  - Numeric columns for range filters → integer, float
  - Boolean flags → boolean
  - Dates → date
  - Fields for autocomplete → completion
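- Custom analyzer template for product/content search. Apply edge n-grams at index time only, and use a plain analyzer at query time. Otherwise the query "laptop" is split into grams such as "lap", which also matches "lapel". preserve_original keeps words shorter than min_gram (such as "tv") searchable:
{ "settings": { "analysis": {
  "analyzer": {
    "content_analyzer": { "tokenizer": "standard", "filter": ["lowercase", "synonym_filter", "edge_ngram_filter"] },
    "content_search_analyzer": { "tokenizer": "standard", "filter": ["lowercase", "synonym_filter"] }
  },
  "filter": {
    "synonym_filter": { "type": "synonym", "synonyms_path": "synonyms.txt" },
    "edge_ngram_filter": { "type": "edge_ngram", "min_gram": 3, "max_gram": 15, "preserve_original": true }
  }
} } }
  Map each searched text field with "analyzer": "content_analyzer" and "search_analyzer": "content_search_analyzer".
- Boost fields by search importance at query time (for example "name^5" in a multi_match): title/name (3-5x), tags (2x), description (1x).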
- Always add a suggest field of type completion for typeahead.
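The column-to-field-type rules above can be expressed as a small lookup that produces the mappings section of an index-creation request. This is a sketch under assumptions:

- The column descriptors ({ name, sqlType, searchable }) and the SQL type names in TYPE_MAP are hypothetical.
- An analyzer named content_analyzer is assumed to be defined in the index settings.

```javascript
// Sketch only: the column descriptors ({ name, sqlType, searchable }) and the
// SQL type names in TYPE_MAP are hypothetical; adapt them to your schema.
const TYPE_MAP = {
  enum: "keyword",     // category-like columns used for filters and facets
  varchar: "keyword",  // strings that are filtered on but not full-text searched
  integer: "integer",  // numeric range filters
  real: "float",
  boolean: "boolean",
  timestamp: "date",
};

// Map one source column to an Elasticsearch field mapping.
function fieldMappingFor(column) {
  // Searched text gets the custom analyzer (assumed defined in the index settings).
  if (column.searchable) return { type: "text", analyzer: "content_analyzer" };
  const type = TYPE_MAP[column.sqlType];
  if (!type) throw new Error(`No mapping rule for SQL type "${column.sqlType}"`);
  return { type };
}

// Build the "mappings" section of an index-creation request.
function buildMappings(columns) {
  const properties = {};
  for (const column of columns) properties[column.name] = fieldMappingFor(column);
  properties.suggest = { type: "completion" }; // always add a typeahead field
  return { mappings: { properties } };
}
```

For example, buildMappings([{ name: "title", sqlType: "varchar", searchable: true }, { name: "category", sqlType: "enum" }]) yields three fields: a text field for title, a keyword field for category, and the completion field suggest. Throwing on unknown types forces each new column to get an explicit rule instead of relying on dynamic mapping.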
Index Design (Algolia)
- Set searchableAttributes in priority order: ["name", "category", "description"].
- Set attributesForFaceting. Wrap attributes that are only filtered on (not displayed as facets) in filterOnly(), for example "filterOnly(price_cents)".
- Configure customRanking: ["desc(popularity)", "desc(rating)"].
- Keep typo tolerance enabled (it is on by default) and set minWordSizefor1Typo: 3.
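The function below shows what minWordSizefor1Typo: 3 changes by modeling the documented rule:

- A query word accepts one typo once its length reaches minWordSizefor1Typo.
- It accepts two typos once its length reaches minWordSizefor2Typos.

Algolia's defaults are 4 and 8. This is an illustrative model with a hypothetical function name, not Algolia's implementation.

```javascript
// Illustrative model of Algolia's word-length typo thresholds (not Algolia code).
// Defaults mirror Algolia's documented defaults: 4 for one typo, 8 for two.
function typosAllowed(word, { minWordSizefor1Typo = 4, minWordSizefor2Typos = 8 } = {}) {
  const length = [...word].length; // count code points, not UTF-16 units
  if (length >= minWordSizefor2Typos) return 2;
  if (length >= minWordSizefor1Typo) return 1;
  return 0;
}
```

With the defaults, a 3-letter word such as "cam" must be spelled exactly. With minWordSizefor1Typo: 3 it tolerates one typo, which helps catalogs with many short product terms.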
Sync Pipeline
- Full re-index: On first run or manual trigger, paginate through all source records (1000 per batch), transform to index documents, bulk insert.
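- Incremental sync: Every 10 seconds, poll for rows with updated_at >= last_sync_time minus a small overlap window, or use database triggers/CDC. The overlap matters because a transaction that commits late can carry an earlier updated_at than rows already synced. Because writes are upserts, re-reading rows in the overlap is harmless.
- Deletions: Track soft-deleted records and remove them from the index when detected.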
- Idempotency: Use source record ID as document ID. Upsert, never blind insert.
- Error handling: Log failed documents, continue batch. Retry failures in next cycle.
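The batching, idempotency and retry rules above map onto the Elasticsearch _bulk API's newline-delimited format. This is a sketch under assumptions:

- The index name "products" and the row shape ({ id, deleted_at, ...fields }) are hypothetical.
- The HTTP call itself is omitted.
- An "index" action with an explicit _id creates or replaces the document, which gives the upsert behavior described above.

```javascript
// Sketch of one sync cycle's bulk payload. Index name and row shape are hypothetical.
const BATCH_SIZE = 1000;

// Split rows into batches. In production you would page through the database
// instead (for example keyset pagination by id) rather than load everything.
function* batches(rows, size = BATCH_SIZE) {
  for (let i = 0; i < rows.length; i += size) yield rows.slice(i, i + size);
}

// Build an NDJSON _bulk body. The source ID is the document _id, so re-running
// a batch overwrites the same documents instead of creating duplicates.
function toBulkBody(rows, index = "products") {
  const lines = [];
  for (const row of rows) {
    const _id = String(row.id);
    if (row.deleted_at) {
      // Soft-deleted in the source: remove from the index.
      lines.push(JSON.stringify({ delete: { _index: index, _id } }));
    } else {
      const { deleted_at: _deletedAt, ...doc } = row; // drop the sync-only column
      lines.push(JSON.stringify({ index: { _index: index, _id } }));
      lines.push(JSON.stringify(doc));
    }
  }
  return lines.join("\n") + "\n"; // the bulk API requires a trailing newline
}

// Collect IDs of failed items from a bulk response so the next cycle can retry them.
function failedIds(bulkResponse) {
  if (!bulkResponse.errors) return [];
  return bulkResponse.items
    .map((item) => Object.values(item)[0]) // each item is keyed by its action type
    .filter((result) => result.error)
    .map((result) => result._id);
}
```

A failed document is logged and skipped without aborting the batch; its ID simply goes into the next cycle's work list.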
Search API
Build an endpoint that accepts:
- q: full-text query string
- Filter params: category, brand, min_price, max_price, rating, in_stock
- sort: relevance (default), price_asc, price_desc, newest, rating
- page/per_page or cursor-based pagination
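Parsing and validating these parameters before building a query keeps bad input out of Elasticsearch. This sketch makes several assumptions that do not come from this guide:

- Prices arrive in dollars and are converted to the index's integer price_cents.
- per_page defaults to 20 and is capped at 100.
- Unknown sort values fall back to relevance.

```javascript
// Sketch of query-string parsing for the search endpoint (assumed conventions noted above).
const SORTS = new Set(["relevance", "price_asc", "price_desc", "newest", "rating"]);

// Returns a finite number, or undefined for missing or non-numeric input.
const toNumber = (v) =>
  v === undefined || v === "" || !Number.isFinite(Number(v)) ? undefined : Number(v);

// Dollars to integer cents, matching the price_cents field in the index.
const toCents = (v) => {
  const n = toNumber(v);
  return n === undefined ? undefined : Math.round(n * 100);
};

function parseSearchParams(query) {
  const page = Math.max(1, Number.parseInt(query.page, 10) || 1);
  const perPage = Math.min(100, Math.max(1, Number.parseInt(query.per_page, 10) || 20));
  return {
    q: (query.q ?? "").trim(),
    category: query.category,
    brand: query.brand,
    minPriceCents: toCents(query.min_price),
    maxPriceCents: toCents(query.max_price),
    rating: toNumber(query.rating),
    inStock: query.in_stock === "true",
    sort: SORTS.has(query.sort) ? query.sort : "relevance",
    // Offset pagination; cursor-based paging would use search_after instead.
    from: (page - 1) * perPage,
    size: perPage,
  };
}
```

Capping per_page bounds the cost of a single request, and whitelisting sort values means user input never becomes a raw field name.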
Query construction (Elasticsearch). Here "q" stands for the user's query string, and the filter values are examples:
{
"query": {
"bool": {
"must": [{ "multi_match": { "query": "q", "fields": ["name^5", "description"], "fuzziness": "AUTO" }}],
"filter": [
{ "term": { "category": "electronics" }},
{ "range": { "price_cents": { "gte": 2000, "lte": 10000 }}}
],
"should": [{ "term": { "in_stock": { "value": true, "boost": 2 }}}]
}
},
"highlight": { "fields": { "name": {}, "description": {} }},
"aggs": {
"categories": { "terms": { "field": "category", "size": 20 }},
"brands": { "terms": { "field": "brand", "size": 20 }},
"price_ranges": { "range": { "field": "price_cents", "ranges": [
{ "to": 2500 }, { "from": 2500, "to": 10000 }, { "from": 10000 }
]}}
}
}
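The request body above can be assembled from parsed parameters so that only the filters the user supplied are added. This is a sketch under assumptions:

- The params shape is hypothetical.
- The created_at field used for the "newest" sort is hypothetical.
- rating is treated as a minimum rating.
- A match_all query is used when q is empty.

```javascript
// Sketch: build an Elasticsearch search body from parsed params
// ({ q, category, brand, minPriceCents, maxPriceCents, rating, inStock, sort, from, size }).
const SORT_CLAUSES = {
  relevance: ["_score"],
  price_asc: [{ price_cents: "asc" }],
  price_desc: [{ price_cents: "desc" }],
  newest: [{ created_at: "desc" }], // hypothetical timestamp field
  rating: [{ rating: "desc" }],
};

function buildSearchBody(params) {
  const filter = [];
  if (params.category) filter.push({ term: { category: params.category } });
  if (params.brand) filter.push({ term: { brand: params.brand } });
  if (params.minPriceCents !== undefined || params.maxPriceCents !== undefined) {
    const range = {};
    if (params.minPriceCents !== undefined) range.gte = params.minPriceCents;
    if (params.maxPriceCents !== undefined) range.lte = params.maxPriceCents;
    filter.push({ range: { price_cents: range } });
  }
  if (params.rating !== undefined) filter.push({ range: { rating: { gte: params.rating } } });
  // Exclude out-of-stock items only when the user asks; otherwise just rank in-stock higher.
  if (params.inStock) filter.push({ term: { in_stock: true } });

  const must = params.q
    ? [{ multi_match: { query: params.q, fields: ["name^5", "description"], fuzziness: "AUTO" } }]
    : [{ match_all: {} }];

  return {
    from: params.from ?? 0,
    size: params.size ?? 20,
    query: { bool: { must, filter, should: [{ term: { in_stock: { value: true, boost: 2 } } }] } },
    sort: SORT_CLAUSES[params.sort] ?? SORT_CLAUSES.relevance,
    highlight: { fields: { name: {}, description: {} } },
    aggs: {
      categories: { terms: { field: "category", size: 20 } },
      brands: { terms: { field: "brand", size: 20 } },
    },
  };
}
```

Filters sit in the bool filter context, so they do not affect scoring and can be cached. The should clause only adjusts ranking.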
Autocomplete
- Use completion suggester for prefix-based typeahead (fastest).
- Return top 5 suggestions with category context.
- Add "did you mean" using phrase suggester for low-result queries.
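The typeahead and "did you mean" requests can be built as plain request bodies. This sketch assumes the suggest completion field with a category context named "tag", as in Example 1. Several details are hypothetical:

- the suggester names typeahead and did_you_mean
- the 3-hit threshold
- the default phrase field "name"

A phrase suggester usually gives better corrections on a shingle (word n-gram) subfield than on a plain text field.

```javascript
// Sketch of suggest request bodies (names and threshold are assumptions, see above).
function completionRequest(prefix, tag) {
  const completion = { field: "suggest", size: 5, skip_duplicates: true };
  if (tag) completion.contexts = { tag: [tag] }; // restrict to one category
  return { suggest: { typeahead: { prefix, completion } } };
}

const LOW_RESULT_THRESHOLD = 3;

// Ask for a "did you mean" correction only when the main search returned few hits.
function didYouMeanRequest(queryText, totalHits, field = "name") {
  if (totalHits >= LOW_RESULT_THRESHOLD) return null;
  return {
    size: 0, // only the suggestion is needed, not documents
    suggest: {
      text: queryText,
      did_you_mean: { phrase: { field, size: 1 } },
    },
  };
}
```

Requesting the phrase suggestion only for low-result queries keeps the common path to a single round trip.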
Relevance Tuning
Analyze search logs to improve quality:
- Zero-result queries: Check for misspellings → add synonyms. Check for missing data → flag content gaps.
- Low CTR queries: Top results don't match intent → adjust boost weights or add synonyms.
- Position bias: If users consistently click result #3+, the ranking formula needs tuning.
- Apply changes iteratively: synonyms first, then boost adjustments, then custom scoring.
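The three log signals above can be computed in one pass over the search log. This is a sketch under assumptions:

- The log entry shape is hypothetical: { query, resultCount, clickedPosition }, where clickedPosition is 1-based or null.
- The default thresholds (minSearches, lowCtr) are also hypothetical.
- An average clicked position of 3 or worse stands in for "users consistently click result #3+".

```javascript
// Sketch of search-log analysis (entry shape and thresholds are hypothetical).
function analyzeSearchLog(entries, { minSearches = 20, lowCtr = 0.1 } = {}) {
  const byQuery = new Map();
  for (const e of entries) {
    const q = e.query.trim().toLowerCase(); // group trivially different spellings
    const s = byQuery.get(q) ?? { searches: 0, zeroResults: 0, clicks: 0, positionSum: 0 };
    s.searches += 1;
    if (e.resultCount === 0) s.zeroResults += 1;
    if (e.clickedPosition != null) {
      s.clicks += 1;
      s.positionSum += e.clickedPosition;
    }
    byQuery.set(q, s);
  }

  const zeroResult = [];      // candidates for synonyms or content gaps
  const lowClickThrough = []; // results likely miss intent
  const deepClicks = [];      // users keep clicking result #3 or lower
  for (const [query, s] of byQuery) {
    if (s.zeroResults > 0) zeroResult.push({ query, count: s.zeroResults });
    if (s.searches >= minSearches && s.clicks / s.searches < lowCtr) lowClickThrough.push(query);
    if (s.clicks > 0 && s.positionSum / s.clicks >= 3) deepClicks.push(query);
  }
  zeroResult.sort((a, b) => b.count - a.count); // most frequent first
  return { zeroResult, lowClickThrough, deepClicks };
}
```

Working through zeroResult from the top gives the synonym-first order recommended above.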
Examples
Example 1 — Blog search index
Input: "Set up search for a blog with 10K articles."
Output:
{
"mappings": {
"properties": {
"title": { "type": "text", "analyzer": "content_analyzer", "boost": 5.0 },
"body": { "type": "text", "analyzer": "content_analyzer" },
"author": { "type": "keyword" },
"tags": { "type": "keyword" },
"published_at": { "type": "date" },
"suggest": { "type": "completion", "contexts": [{ "name": "tag", "type": "category" }] }
}
}
}
Example 2 — Algolia configuration for an e-commerce store
Input: "Configure Algolia for a store with products."
Output:
index.setSettings({
searchableAttributes: ['name', 'brand', 'category', 'description'],
attributesForFaceting: ['category', 'brand', 'filterOnly(price_cents)', 'rating'],
customRanking: ['desc(sales_count)', 'desc(rating)'],
typoTolerance: true,
minWordSizefor1Typo: 3,
minWordSizefor2Typos: 6,
hitsPerPage: 20,
snippetEllipsisText: '…',
attributesToSnippet: ['description:30'],
});
Guidelines
- Choose Elasticsearch for control, Algolia for speed to market. Elasticsearch gives full tuning power; Algolia is faster to set up but costs more at scale.
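- Never search the primary database. Always sync to a dedicated search index. SQL LIKE '%term%' cannot use a standard index, does not rank results, and does not scale.
- Fuzziness AUTO is almost always correct. It allows 1 typo for words of 3-5 characters and 2 typos for words of 6 or more; words of 1-2 characters must match exactly.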
- Synonyms are the highest-ROI tuning. Most zero-result queries are fixed by adding 10-20 synonym pairs.
- Monitor query performance. Set an alert if p95 search latency exceeds 200ms.
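Two of the numeric guidelines above can be checked by hand:

- The fuzziness rule models the length thresholds of fuzziness AUTO, which is shorthand for AUTO:3,6.
- The latency rule computes p95 with the nearest-rank method and compares it to the 200 ms budget.

This is an illustration, not Elasticsearch code, and the function names are hypothetical. Your monitoring system may interpolate percentiles differently.

```javascript
// fuzziness "AUTO" (AUTO:3,6): exact match below 3 characters,
// 1 edit for 3-5 characters, 2 edits for 6 or more.
function autoFuzziness(term, low = 3, high = 6) {
  const length = [...term].length;
  if (length < low) return 0;
  if (length < high) return 1;
  return 2;
}

// p95 by nearest rank: the smallest sample such that at least 95% of samples
// are less than or equal to it.
function p95(samplesMs) {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b); // numeric sort
  const rank = Math.ceil(0.95 * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Alert when p95 latency exceeds the budget (200 ms per the guideline).
function shouldAlert(samplesMs, budgetMs = 200) {
  return p95(samplesMs) > budgetMs;
}
```

For latencies of 1 to 100 ms, p95 is 95 ms and no alert fires.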
Repository: terminalskills/skills