chromadb-integration-skills
ChromaDB Integration Skills
Purpose: This skill teaches agents how to integrate ChromaDB for semantic search, persistent storage, and pattern matching across ANY domain - research, code, trading, legal, documentation, and more.
Critical Use Case: When agents need to work with large datasets (1000+ items), perform semantic search, maintain persistent knowledge, or learn from historical patterns, ChromaDB keeps the data outside the context window and uses vector-based retrieval to return only the relevant items.
Used By: All agent types - researchers, developers, traders, legal analysts, documentation writers, QA testers, etc.
When to Use ChromaDB Integration
Use ChromaDB when:
- Large Datasets: Working with 1000+ items (documents, code files, bugs, trades, contracts, etc.)
- Semantic Search: Finding items by meaning, not just keywords
- Persistent Memory: Knowledge needs to survive across sessions, days, months
- Pattern Matching: Identifying similar historical cases/patterns for decision-making
- Cross-Session Learning: Building institutional knowledge over time
- Token Limits: Data too large to fit in context window (100K+ tokens)
- Aggregation: Combining results from multiple queries/sources
Core ChromaDB Concepts
Collections
Definition: Named vector databases storing documents with embeddings and metadata
Naming Strategy:
- Domain-based: {domain}_{purpose}_{identifier}
- Examples:
  - Research: research_prior_art_blockchain_2024, research_literature_ml_transformers
  - Code: codebase_api_endpoints, codebase_bug_patterns_auth
  - Trading: backtest_results_sma_strategy, market_conditions_spy_2024
  - Legal: case_law_patent_eligibility, contracts_saas_clauses
  - Documentation: api_docs_v2, architecture_decisions_2024
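The naming strategy above can be sketched as a small helper. This is only an illustration: `buildCollectionName` is a hypothetical name, and the 3-63 character limit is an assumption based on ChromaDB's historical naming rules, so check the limits of the version in use.

```javascript
// Hypothetical helper: builds "{domain}_{purpose}_{identifier}" collection names.
function buildCollectionName(domain, purpose, identifier) {
  const slug = (part) =>
    String(part)
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "_") // collapse anything non-alphanumeric into "_"
      .replace(/^_+|_+$/g, "");    // trim leading/trailing underscores

  // Empty parts are dropped so the name never contains "__"
  const name = [domain, purpose, identifier].map(slug).filter(Boolean).join("_");

  // Assumed limit (3-63 characters); verify against your ChromaDB version.
  if (name.length < 3 || name.length > 63) {
    throw new Error(`Collection name "${name}" must be 3-63 characters long`);
  }
  return name;
}

console.log(buildCollectionName("research", "prior art", "Blockchain 2024"));
// research_prior_art_blockchain_2024
```

Normalizing names in one place keeps prefix-based lookups (for example, filtering collections by a shared prefix) predictable.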
Documents
Definition: Text content to be searched semantically
Best Practices:
- Chunk Size: 200-500 words optimal (too small = context loss, too large = poor granularity)
- Content Format: Title + summary + key details (e.g., "Patent US10123456 - Blockchain Authentication. Abstract: A method for...")
- Deduplication: Use unique IDs to prevent duplicate storage
Metadata
Definition: Structured data for filtering, not semantic search
Strategy:
{
// Temporal filters
"date": "2024-11-14",
"year": 2024,
"month": 11,
// Categorical filters
"type": "bug_report",
"category": "authentication",
"severity": "high",
// Numeric filters
"citations": 42,
"price": 150.25,
"performance_score": 0.87,
// Source tracking
"source": "github_issue",
"author": "kim-asplund",
"url": "https://..."
}
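Raw records rarely match this flat shape. The sketch below shows one way to coerce an arbitrary object into filterable metadata, assuming ChromaDB accepts only strings, numbers, and booleans as metadata values (check the version in use). `toScalarMetadata` is a hypothetical helper, not part of ChromaDB.

```javascript
// Hypothetical helper: coerces a raw record into flat, scalar-only metadata.
function toScalarMetadata(raw) {
  const clean = {};
  for (const [key, value] of Object.entries(raw)) {
    if (value === null || value === undefined) continue; // drop missing values
    if (Array.isArray(value)) {
      clean[key] = value.join(", ");                     // lists become one string
    } else if (value instanceof Date) {
      clean[key] = value.toISOString();                  // dates become ISO strings
    } else if (typeof value === "number") {
      if (Number.isFinite(value)) clean[key] = value;    // skip NaN and Infinity
    } else if (typeof value === "string" || typeof value === "boolean") {
      clean[key] = value;
    } else {
      clean[key] = JSON.stringify(value);                // nested objects become JSON text
    }
  }
  return clean;
}

console.log(toScalarMetadata({
  year: 2024,
  authors: ["A. Author", "B. Author"],
  reviewer: null,
  flags: { archived: false },
}));
```

Joined strings and JSON text can still be read back after retrieval, but they are no longer useful for range filters, so keep anything you want to filter on as a top-level number, string, or boolean.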
Embeddings
Definition: Vector representations enabling semantic similarity
How It Works:
- ChromaDB automatically generates embeddings from document text
- Similar meanings → similar vectors → close in vector space
- Distance metrics (cosine, euclidean) measure similarity
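To make the two distance metrics concrete, here is a small sketch on hand-written toy vectors. The vectors are made up for illustration; real embeddings have hundreds of dimensions and come from the collection's embedding function.

```javascript
// Dot product of two equal-length vectors
function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Cosine distance: 1 - cos(angle). Ignores vector length, compares direction only.
function cosineDistance(a, b) {
  return 1 - dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Euclidean distance: straight-line distance, sensitive to vector length.
function euclideanDistance(a, b) {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

const short = [1, 0];      // toy "embedding"
const long = [2, 0];       // same direction, twice the length
const unrelated = [0, 1];  // orthogonal direction

console.log(cosineDistance(short, long));         // same direction, so distance 0
console.log(euclideanDistance(short, long));      // lengths differ, so distance 1
console.log(cosineDistance(short, unrelated));    // orthogonal, so distance 1
```

Because the metrics produce different ranges, any distance threshold only makes sense for the metric the collection was created with.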
Universal ChromaDB Workflow
Phase 1: Collection Design
// Step 1: Design collection strategy based on agent type
const collectionStrategy = {
research_agent: "One collection per research topic/question",
code_agent: "Collections by codebase module/feature",
trading_agent: "Collections by strategy/timeframe/symbol",
legal_agent: "Collections by practice area/jurisdiction",
documentation_agent: "Collections by project/version"
};
// Step 2: Create collection with descriptive metadata
mcp__chroma__create_collection({
collection_name: "{domain}_{purpose}_{identifier}",
embedding_function_name: "default", // Uses sentence transformers
metadata: {
created_date: "2024-11-14",
domain: "research|code|trading|legal|docs",
purpose: "Descriptive purpose",
total_items: 0, // Will update
last_updated: "2024-11-14"
}
});
Phase 2: Data Ingestion
// Step 1: Batch data collection (minimize API calls)
const items = collectAllItems(); // From API, files, database, etc.
// Step 2: Transform to ChromaDB format
const documents = items.map(item => formatDocument(item));
const ids = items.map(item => item.id || generateUniqueId());
const metadatas = items.map(item => extractMetadata(item));
// Step 3: Batch insert (documents are stored as given; split long texts into chunks before this step)
mcp__chroma__add_documents({
collection_name: collectionName,
documents: documents,
ids: ids,
metadatas: metadatas
});
// Step 4: Update collection metadata
mcp__chroma__modify_collection({
collection_name: collectionName,
new_metadata: {
...existingMetadata,
total_items: items.length,
last_updated: new Date().toISOString()
}
});
Phase 3: Semantic Search
// Step 1: Formulate semantic query (natural language works!)
const query = "authentication failures in production environment";
// Step 2: Execute semantic search with filters
const results = mcp__chroma__query_documents({
collection_name: collectionName,
query_texts: [query],
n_results: 20,
where: {
"$and": [
{ "environment": "production" },
{ "severity": { "$in": ["high", "critical"] } },
{ "year": { "$gte": 2024 } } // range operators compare numbers, so filter on the numeric year field
]
},
include: ["documents", "metadatas", "distances"]
});
// Step 3: Filter by semantic similarity (distance threshold)
const highlyRelevant = results.ids[0].filter((id, idx) =>
results.distances[0][idx] < 0.3 // Adjust threshold based on use case
);
// Step 4: Retrieve full details if needed
const fullDetails = mcp__chroma__get_documents({
collection_name: collectionName,
ids: highlyRelevant,
include: ["documents", "metadatas"]
});
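Query results come back as parallel arrays, one inner array per query text, which makes index bookkeeping easy to get wrong. The sketch below zips them into row objects before filtering. `toRows` and `withinDistance` are hypothetical helpers, and the sample result is hand-written to mirror the shape used above rather than real query output.

```javascript
// Hypothetical helper: zips the parallel result arrays for one query into rows.
function toRows(results, queryIndex = 0) {
  const ids = results.ids?.[queryIndex] ?? [];
  return ids.map((id, idx) => ({
    id,
    distance: results.distances?.[queryIndex]?.[idx],
    document: results.documents?.[queryIndex]?.[idx], // undefined if not included
    metadata: results.metadatas?.[queryIndex]?.[idx],
  }));
}

// Keeps only rows closer than the given distance threshold.
function withinDistance(results, maxDistance, queryIndex = 0) {
  return toRows(results, queryIndex).filter((row) => row.distance < maxDistance);
}

// Hand-written sample in the same shape as a query response
const sample = {
  ids: [["bug_1", "bug_2", "bug_3"]],
  distances: [[0.12, 0.28, 0.61]],
  metadatas: [[{ severity: "high" }, { severity: "critical" }, { severity: "high" }]],
};

console.log(withinDistance(sample, 0.3).map((row) => row.id)); // bug_1 and bug_2 pass
```

Working with rows keeps each id next to its distance and metadata, so a second lookup by id is only needed when documents were excluded from the original query.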
Phase 4: Pattern Matching
// Cross-collection pattern detection
const allCollections = mcp__chroma__list_collections();
const relevantCollections = allCollections.filter(c =>
c.startsWith(collectionPrefix)
);
const patterns = [];
for (const collection of relevantCollections) {
const matches = mcp__chroma__query_documents({
collection_name: collection,
query_texts: [patternQuery],
n_results: 10,
where: { "outcome": "success" } // Only successful cases
});
if (matches.ids[0].length > 0) {
patterns.push({
collection: collection,
matches: matches,
success_rate: calculateSuccessRate(matches)
});
}
}
// Identify best pattern
const bestPattern = patterns.sort((a, b) =>
b.success_rate - a.success_rate
)[0];
Use Case Templates
Template 1: Research Agent - Literature Review
Problem: Store 1000+ research papers, find semantically similar work
// Collection: research_literature_{topic}
const papers = fetchPapersFromAPI("machine learning transformers");
mcp__chroma__create_collection({
collection_name: "research_literature_ml_transformers",
metadata: { topic: "ML Transformers", papers_count: 0 }
});
// Store papers with rich metadata
papers.forEach(paper => {
mcp__chroma__add_documents({
collection_name: "research_literature_ml_transformers",
documents: [`${paper.title}. ${paper.abstract}`],
ids: [paper.doi || paper.id],
metadatas: [{
title: paper.title,
authors: paper.authors.join(", "),
year: paper.year,
citations: paper.citation_count,
venue: paper.venue,
url: paper.url
}]
});
});
// Semantic search: "Find papers about attention mechanisms for vision"
const relevant = mcp__chroma__query_documents({
collection_name: "research_literature_ml_transformers",
query_texts: ["attention mechanisms computer vision"],
n_results: 20,
where: { "$and": [{ "year": { "$gte": 2020 } }, { "citations": { "$gte": 50 } }] } // multiple conditions are combined with $and
});
Benefits: No token limits, semantic discovery, citation filtering, persistent library
Template 2: Code Agent - Bug Pattern Recognition
Problem: Store bug reports, identify similar issues, suggest solutions
// Collection: codebase_bug_patterns_{module}
const bugs = fetchAllGitHubIssues("is:issue label:bug");
mcp__chroma__create_collection({
collection_name: "codebase_bug_patterns_auth",
metadata: { module: "authentication", total_bugs: 0 }
});
// Store bugs with solutions
bugs.forEach(bug => {
mcp__chroma__add_documents({
collection_name: "codebase_bug_patterns_auth",
documents: [`Bug #${bug.number}: ${bug.title}. ${bug.body}`],
ids: [`bug_${bug.number}`],
metadatas: [{
number: bug.number,
title: bug.title,
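severity: bug.labels.find(l => l.startsWith("severity:"))?.split(":")[1] ?? "unknown", // avoid undefined metadata values
status: bug.state,
solution: bug.resolution || "No solution yet",
created_at: bug.created_at,
resolved_at: bug.closed_at ?? "", // open bugs have no close date; avoid null metadata values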
url: bug.html_url
}]
});
});
// New bug arrives - find similar historical bugs
const newBugDescription = "User login fails with 401 error after password reset";
const similarBugs = mcp__chroma__query_documents({
collection_name: "codebase_bug_patterns_auth",
query_texts: [newBugDescription],
n_results: 10,
where: { "$and": [{ "status": "closed" }, { "solution": { "$ne": "No solution yet" } }] } // multiple conditions are combined with $and
});
// Extract solution from most similar resolved bug
const suggestedSolution = similarBugs.metadatas[0][0].solution;
Benefits: Instant bug pattern matching, solution reuse, similar issue detection
Template 3: Trading Agent - Backtest Results Database
Problem: Store 10,000+ backtest results, identify optimal parameter patterns
// Collection: backtest_results_{strategy_name}
const backtests = runParameterSweep(strategyCode, parameterRanges);
mcp__chroma__create_collection({
collection_name: "backtest_results_sma_crossover",
metadata: { strategy: "SMA Crossover", total_backtests: 0 }
});
// Store each backtest with parameters + results
backtests.forEach(backtest => {
const description = `
SMA Crossover strategy with fast=${backtest.params.fast_period},
slow=${backtest.params.slow_period}, stop_loss=${backtest.params.stop_loss}.
Market conditions: ${backtest.market_regime}, volatility=${backtest.avg_volatility}.
`;
mcp__chroma__add_documents({
collection_name: "backtest_results_sma_crossover",
documents: [description],
ids: [`backtest_${backtest.id}`],
metadatas: [{
fast_period: backtest.params.fast_period,
slow_period: backtest.params.slow_period,
stop_loss: backtest.params.stop_loss,
sharpe_ratio: backtest.sharpe_ratio,
max_drawdown: backtest.max_drawdown,
win_rate: backtest.win_rate,
total_return: backtest.total_return,
market_regime: backtest.market_regime,
symbol: backtest.symbol,
timeframe: backtest.timeframe,
start_date: backtest.start_date,
end_date: backtest.end_date
}]
});
});
// Find optimal parameters for current market conditions
const currentMarket = analyzeCurrentMarket();
const marketDescription = `
Market regime: ${currentMarket.regime}, volatility: ${currentMarket.volatility},
trend strength: ${currentMarket.trend_strength}
`;
const optimalBacktests = mcp__chroma__query_documents({
collection_name: "backtest_results_sma_crossover",
query_texts: [marketDescription],
n_results: 20,
where: {
"$and": [
{ "sharpe_ratio": { "$gte": 1.5 } },
{ "max_drawdown": { "$gte": -0.15 } }, // assumes drawdown is stored as a negative fraction: keep losses no worse than 15%
{ "symbol": currentMarket.symbol }
]
}
});
// Take the parameter set of the closest match (results are ordered by semantic distance, not by Sharpe ratio)
const bestParams = optimalBacktests.metadatas[0][0];
Benefits: Parameter optimization, market regime matching, performance pattern discovery
Template 4: Documentation Agent - Style Guide Enforcement
Problem: Store API documentation examples, ensure consistent style
// Collection: api_docs_{project_version}
const existingDocs = parseAllApiDocs("./docs/api/");
mcp__chroma__create_collection({
collection_name: "api_docs_v2",
metadata: { version: "2.0", total_endpoints: 0 }
});
// Store documentation with style metadata
existingDocs.forEach(doc => {
mcp__chroma__add_documents({
collection_name: "api_docs_v2",
documents: [doc.fullContent],
ids: [doc.endpoint],
metadatas: [{
endpoint: doc.endpoint,
method: doc.method,
category: doc.category,
style_score: doc.styleScore, // Computed during ingestion
has_examples: doc.examples.length > 0,
has_error_codes: doc.errorCodes.length > 0,
last_updated: doc.lastModified
}]
});
});
// New endpoint documented - find similar endpoints for style consistency
const newEndpoint = "POST /api/v2/users/{id}/preferences";
const similarEndpoints = mcp__chroma__query_documents({
collection_name: "api_docs_v2",
query_texts: [`${newEndpoint} user preferences update`],
n_results: 5,
where: {
"$and": [
{ "method": "POST" },
{ "style_score": { "$gte": 0.9 } },
{ "has_examples": true }
]
}
});
// Use similar endpoint as template
const template = similarEndpoints.documents[0][0];
Benefits: Style consistency, template discovery, automated quality checks
Template 5: QA Testing Agent - Test Pattern Library
Problem: Store test cases, identify gaps, suggest new tests
// Collection: test_cases_{module}
const existingTests = parseTestFiles("./tests/");
mcp__chroma__create_collection({
collection_name: "test_cases_authentication",
metadata: { module: "authentication", total_tests: 0 }
});
// Store test cases with coverage metadata
existingTests.forEach(test => {
mcp__chroma__add_documents({
collection_name: "test_cases_authentication",
documents: [`${test.description}. Covers: ${test.coveredScenarios.join(", ")}`],
ids: [test.id],
metadatas: [{
test_type: test.type, // "unit", "integration", "e2e"
file_path: test.filePath,
line_number: test.lineNumber,
last_run: test.lastRun,
status: test.lastStatus,
execution_time_ms: test.executionTime,
assertions: test.assertionCount
}]
});
});
// New feature added - identify missing test coverage
const newFeature = "Password reset with 2FA verification";
const existingCoverage = mcp__chroma__query_documents({
collection_name: "test_cases_authentication",
query_texts: [newFeature],
n_results: 10
});
// If distance > 0.5, probably not covered
const isCovered = existingCoverage.distances[0][0] < 0.5;
if (!isCovered) {
// Suggest test cases based on similar features
const similarFeatures = mcp__chroma__query_documents({
collection_name: "test_cases_authentication",
query_texts: ["password reset", "2FA verification"],
n_results: 5
});
// Use similar tests as templates (one result list per query text, so flatten them)
const testTemplates = similarFeatures.documents.flat();
}
Benefits: Coverage gap detection, test template discovery, pattern-based test generation
Advanced Patterns
Pattern 1: Multi-Collection Aggregation
Use Case: Search across multiple related collections simultaneously
// Example: Search all research topics for a cross-cutting concept
const researchCollections = mcp__chroma__list_collections();
const topicCollections = researchCollections.filter(c =>
c.startsWith("research_literature_")
);
const crossTopicResults = [];
for (const collection of topicCollections) {
const results = mcp__chroma__query_documents({
collection_name: collection,
query_texts: ["transfer learning"],
n_results: 10
});
crossTopicResults.push({
topic: collection.replace("research_literature_", ""),
papers: results
});
}
// Aggregate and rank by relevance across topics
const allPapers = crossTopicResults.flatMap(r =>
r.papers.ids[0].map((id, idx) => ({
id: id,
topic: r.topic,
distance: r.papers.distances[0][idx],
metadata: r.papers.metadatas[0][idx]
}))
);
const rankedPapers = allPapers.sort((a, b) => a.distance - b.distance);
Pattern 2: Hierarchical Collections
Use Case: Parent-child relationship between collections
// Parent: codebase_architecture_decisions
// Children: codebase_architecture_decisions_{year}
// Create parent collection with aggregated data
mcp__chroma__create_collection({
collection_name: "codebase_architecture_decisions",
metadata: { type: "parent", child_collections: "" } // metadata values are scalars, so child names go in a comma-separated string
});
// Create child collections by year
[2022, 2023, 2024].forEach(year => {
mcp__chroma__create_collection({
collection_name: `codebase_architecture_decisions_${year}`,
metadata: { type: "child", parent: "codebase_architecture_decisions", year }
});
});
// Query strategy: Try child first (faster), fallback to parent
const queryYear = 2024;
let results = mcp__chroma__query_documents({
collection_name: `codebase_architecture_decisions_${queryYear}`,
query_texts: [query],
n_results: 10
});
if (results.ids[0].length < 5) {
// Not enough results in child, query parent
results = mcp__chroma__query_documents({
collection_name: "codebase_architecture_decisions",
query_texts: [query],
n_results: 10
});
}
Pattern 3: Temporal Decay
Use Case: Prioritize recent items while keeping historical context
// Store items with temporal metadata
mcp__chroma__add_documents({
collection_name: collectionName,
documents: documents,
ids: ids,
metadatas: metadatas.map(m => ({
...m,
timestamp: Date.now(),
age_days: 0 // Will be updated
}))
});
// Query with temporal boost
const results = mcp__chroma__query_documents({
collection_name: collectionName,
query_texts: [query],
n_results: 50 // Get more results for re-ranking
});
// Re-rank with temporal decay
const now = Date.now();
const rankedResults = results.ids[0].map((id, idx) => {
const ageDays = (now - results.metadatas[0][idx].timestamp) / (1000 * 60 * 60 * 24);
const decayFactor = Math.exp(-ageDays / 30); // Falls to 1/e after 30 days (half-life ≈ 21 days)
const semanticScore = 1 - results.distances[0][idx]; // Assumes cosine-style distances; rescale for other metrics
const combinedScore = semanticScore * 0.7 + decayFactor * 0.3;
return {
id: id,
semantic_score: semanticScore,
decay_factor: decayFactor,
combined_score: combinedScore,
metadata: results.metadatas[0][idx]
};
}).sort((a, b) => b.combined_score - a.combined_score);
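The factor `Math.exp(-ageDays / 30)` drops to 1/e (about 0.37) after 30 days, which corresponds to a half-life of 30 × ln 2 ≈ 21 days. If an exact 30-day half-life is intended, a power-of-one-half form is one option. The sketch below is illustrative: `decayFromHalfLife` and `combinedScore` are hypothetical names, and the 0.7/0.3 weighting is carried over from the example above.

```javascript
// Decay factor that is exactly 0.5 when ageDays equals halfLifeDays.
function decayFromHalfLife(ageDays, halfLifeDays) {
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Blends semantic closeness with recency. Assumes distances where 0 means
// identical and values near 1 mean unrelated (e.g. cosine distance).
function combinedScore(distance, ageDays, { halfLifeDays = 30, semanticWeight = 0.7 } = {}) {
  const semanticScore = 1 - distance;
  const recency = decayFromHalfLife(ageDays, halfLifeDays);
  return semanticWeight * semanticScore + (1 - semanticWeight) * recency;
}

console.log(decayFromHalfLife(30, 30));            // 0.5: one half-life elapsed
console.log(Math.exp(-30 / 30).toFixed(3));        // 0.368: the exponential form after 30 days
console.log(combinedScore(0.2, 0).toFixed(2));     // fresh item, close match
console.log(combinedScore(0.2, 90).toFixed(2));    // same match, three half-lives old
```

Either form works for re-ranking; what matters is that the comment and the intended half-life agree.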
Performance Optimization
Batching Strategy
// BAD: One document at a time (slow)
for (const item of items) {
mcp__chroma__add_documents({
collection_name: collectionName,
documents: [item.document],
ids: [item.id],
metadatas: [item.metadata]
});
}
// GOOD: Batch insert (one call per 100 items instead of one call per item)
const BATCH_SIZE = 100;
for (let i = 0; i < items.length; i += BATCH_SIZE) {
const batch = items.slice(i, i + BATCH_SIZE);
mcp__chroma__add_documents({
collection_name: collectionName,
documents: batch.map(item => item.document),
ids: batch.map(item => item.id),
metadatas: batch.map(item => item.metadata)
});
}
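The batching loop can be factored into two small pure functions, which makes the slicing logic testable without a running ChromaDB. This is a sketch: `toBatches` and `toAddPayload` are hypothetical helpers, and the payload fields simply mirror the `add_documents` calls shown above.

```javascript
// Splits an array into consecutive batches of at most batchSize items.
function toBatches(items, batchSize = 100) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Converts one batch of { id, document, metadata } items into the
// parallel-array payload shape used by the add_documents calls above.
function toAddPayload(collectionName, batch) {
  return {
    collection_name: collectionName,
    documents: batch.map((item) => item.document),
    ids: batch.map((item) => item.id),
    metadatas: batch.map((item) => item.metadata),
  };
}

// Demo data only: 250 synthetic items
const demoItems = Array.from({ length: 250 }, (_, i) => ({
  id: `doc_${i}`,
  document: `Document number ${i}`,
  metadata: { index: i },
}));

const payloads = toBatches(demoItems, 100).map((b) => toAddPayload("demo_collection", b));
console.log(payloads.map((p) => p.ids.length)); // three payloads: 100, 100, 50
```

Each payload can then be passed to the insert call in turn, optionally wrapped in retry logic.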
Caching Strategy
// Check collection exists before creating
const existingCollections = mcp__chroma__list_collections();
if (!existingCollections.includes(collectionName)) {
mcp__chroma__create_collection({ collection_name: collectionName });
}
// Check document exists before adding
const existing = mcp__chroma__get_documents({
collection_name: collectionName,
ids: [documentId]
});
if (!existing.ids || existing.ids.length === 0) {
// Document doesn't exist, add it
mcp__chroma__add_documents({ ... });
} else {
// Document exists, update instead
mcp__chroma__update_documents({ ... });
}
Query Optimization
// Use metadata filters to reduce search space
const results = mcp__chroma__query_documents({
collection_name: collectionName,
query_texts: [query],
n_results: 20,
where: {
// Pre-filter with metadata (faster than post-filtering semantic results)
"$and": [
{ "year": { "$gte": 2024 } }, // range operators compare numbers, so filter on a numeric field
{ "category": { "$in": ["high_priority", "critical"] } }
]
}
});
// Only include what you need
const minimalResults = mcp__chroma__query_documents({
collection_name: collectionName,
query_texts: [query],
n_results: 10,
include: ["metadatas", "distances"] // Exclude documents if not needed
});
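ChromaDB's `where` filter expects a single top-level field or operator, so several conditions are normally combined under `$and`. The sketch below builds such a filter from a plain object of conditions. `buildWhere` is a hypothetical helper, and the single-top-level-key rule is stated as an assumption to verify against the ChromaDB version in use.

```javascript
// Hypothetical helper: turns { field: condition, ... } into a where filter.
// 0 conditions -> undefined (no filter), 1 -> the condition itself,
// 2 or more -> wrapped in a single top-level $and.
function buildWhere(filters) {
  const clauses = Object.entries(filters)
    .filter(([, condition]) => condition !== undefined) // skip unset filters
    .map(([field, condition]) => ({ [field]: condition }));

  if (clauses.length === 0) return undefined;
  if (clauses.length === 1) return clauses[0];
  return { $and: clauses };
}

console.log(JSON.stringify(buildWhere({
  year: { $gte: 2024 },
  category: { $in: ["high_priority", "critical"] },
  severity: undefined, // optional filter that was not provided
})));
// {"$and":[{"year":{"$gte":2024}},{"category":{"$in":["high_priority","critical"]}}]}
```

Building filters this way lets optional conditions be added or dropped without hand-editing the nesting.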
Success Criteria
ChromaDB integration is SUCCESSFUL when:
- ✅ Collections Created: Meaningful naming, appropriate metadata
- ✅ Data Ingested: Batched efficiently, deduplicated
- ✅ Semantic Search Works: Returns relevant results (distance < 0.4)
- ✅ Metadata Filters Applied: Correctly scopes search space
- ✅ Performance Optimized: Batching, caching, minimal queries
- ✅ Cross-Collection Queries: When appropriate for use case
- ✅ Persistent Knowledge: Data survives across sessions
- ✅ Pattern Matching: Identifies similar historical cases
- ✅ Token Limits Eliminated: Handles 1000+ items without context overflow
Skill Version: 1.0
Created: 2025-11-14
Purpose: Teach universal ChromaDB integration patterns for all agent types
Target Quality: 65/70
Dependencies: ChromaDB MCP (mcp__chroma__*)
Universal: Works for research, code, trading, legal, documentation, QA, and all other domains
🔴 Error Handling & Resilience (Priority 1)
Critical for Production: Prevent data loss, handle failures gracefully
Retry with Exponential Backoff
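// Promise-based delay helper used between attempts
const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));
async function retryWithBackoff(operation, maxRetries = 3) {
for (let attempt = 0; attempt < maxRetries; attempt++) {
try {
return await operation();
} catch (error) {
if (attempt === maxRetries - 1) throw error; // Out of attempts: surface the last error
const delay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s, ... (with 3 attempts, only the 1s and 2s waits occur)
await sleep(delay);
}
}
}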
Get-or-Create Pattern
function getOrCreateCollection(collectionName, metadata = {}) {
const collections = mcp__chroma__list_collections();
if (collections.includes(collectionName)) {
return { created: false, collection_name: collectionName };
}
mcp__chroma__create_collection({
collection_name: collectionName,
embedding_function_name: "default",
metadata: metadata
});
return { created: true, collection_name: collectionName };
}
Document Validation
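function validateDocument(document, id, metadata) {
if (typeof document !== 'string' || document.trim() === '') {
throw new Error('Document must be a non-empty string');
}
// Type check first so non-string IDs fail with a clear message
if (typeof id !== 'string' || id === '' || id.includes(' ')) {
throw new Error('ID must be a non-empty string without spaces');
}
// Arrays pass a plain typeof check, so reject them explicitly
if (metadata != null && (typeof metadata !== 'object' || Array.isArray(metadata))) {
throw new Error('Metadata must be a plain object');
}
}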
See full patterns: Load chromadb-error-handling sub-skill
🧪 Testing Patterns (Priority 1)
Essential for Quality: Ensure ChromaDB integrations work correctly
Unit Tests (Mock ChromaDB)
// Mock ChromaDB for fast unit tests
class MockChromaDB {
constructor() {
this.collections = {};
}
create_collection({ collection_name, metadata }) {
this.collections[collection_name] = {
documents: [], ids: [], metadatas: [], metadata
};
}
query_documents({ collection_name, query_texts, n_results }) {
const collection = this.collections[collection_name];
return {
ids: [collection.ids.slice(0, n_results)],
distances: [collection.ids.slice(0, n_results).map(() => 0.2)]
};
}
}
Integration Tests (Real ChromaDB)
describe('Semantic Search Integration', () => {
test('returns relevant documents', async () => {
await chromaClient.add({
collection_name: testCollection,
documents: [
'Machine learning uses neural networks',
'Python is a programming language'
],
ids: ['doc1', 'doc2']
});
const results = await chromaClient.query({
collection_name: testCollection,
query_texts: ['neural networks deep learning'],
n_results: 2
});
expect(results.ids[0]).toContain('doc1');
expect(results.distances[0][0]).toBeLessThan(0.4);
});
});
See full patterns: Load chromadb-testing-patterns sub-skill
🔐 Security & Privacy (Priority 2)
Critical for Compliance: Protect PII, sanitize data
PII Redaction
function redactPII(text) {
let redacted = text;
// Email redaction
redacted = redacted.replace(
/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
'[EMAIL_REDACTED]'
);
// Phone redaction
redacted = redacted.replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE_REDACTED]');
// SSN redaction
redacted = redacted.replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN_REDACTED]');
return redacted;
}
Secret Sanitization
function redactSecrets(text) {
let redacted = text;
// GitHub tokens
redacted = redacted.replace(/ghp_[a-zA-Z0-9]{36}/g, '[GITHUB_TOKEN]');
// AWS keys
redacted = redacted.replace(/AKIA[0-9A-Z]{16}/g, '[AWS_KEY]');
// API keys
redacted = redacted.replace(
/api[_-]?key['\"]?\s*[:=]\s*['\"]?([a-zA-Z0-9_-]{20,})/gi,
'api_key: [REDACTED]'
);
return redacted;
}
Access Control
const collectionPermissions = {
'research_confidential': ['research_team', 'admin'],
'customer_pii': ['support_team', 'admin']
};
function checkAccess(collectionName, userRole) {
const allowedRoles = collectionPermissions[collectionName] || ['admin'];
if (!allowedRoles.includes(userRole)) {
throw new Error(`Access denied for role '${userRole}'`);
}
}
See full patterns: Load chromadb-security-patterns sub-skill
🗄️ Data Lifecycle Management (Priority 2)
Sustain Production: Version collections, archive old data
Schema Versioning
// Semantic versioning for collections
function createVersionedCollection(domain, purpose, version = 'v1') {
const collectionName = `${domain}_${purpose}_${version}`;
mcp__chroma__create_collection({
collection_name: collectionName,
metadata: {
version: version,
created_at: new Date().toISOString(),
schema_version: '1.0',
retention_days: 730, // 2 years
lifecycle_stage: 'active'
}
});
return collectionName;
}
Schema Migration
async function migrateCollectionSchema(oldCollection, newVersion) {
const newCollection = `${oldCollection}_v${newVersion}`;
// Create new collection
await mcp__chroma__create_collection({
collection_name: newCollection,
metadata: { migrated_from: oldCollection, version: newVersion }
});
// Copy all documents with transformed metadata
const allDocs = await mcp__chroma__get_documents({
collection_name: oldCollection,
limit: 100000
});
const transformedMetadatas = allDocs.metadatas.map(transformMetadata);
await mcp__chroma__add_documents({
collection_name: newCollection,
documents: allDocs.documents,
ids: allDocs.ids,
metadatas: transformedMetadatas
});
// Mark old as deprecated
await mcp__chroma__modify_collection({
collection_name: oldCollection,
new_metadata: { lifecycle_stage: 'deprecated', replacement: newCollection }
});
}
Retention Enforcement
async function enforceRetentionPolicies() {
const collections = await mcp__chroma__list_collections();
for (const collectionName of collections) {
const info = await mcp__chroma__get_collection_info({ collection_name: collectionName });
const retentionDays = info.metadata?.retention_days ?? 730; // Default: 2 years
const ageDays = calculateAgeDays(info.metadata.created_at);
if (ageDays > retentionDays) {
await archiveCollection(collectionName); // Backup first
await mcp__chroma__delete_collection({ collection_name: collectionName });
}
}
}
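The retention check above relies on a `calculateAgeDays` helper that is not defined in this skill. One possible implementation is sketched below, assuming `created_at` is an ISO 8601 timestamp as written by `createVersionedCollection`. `isExpired` is a hypothetical convenience wrapper, and the injectable `now` parameter exists only to make the logic testable.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days elapsed between an ISO 8601 timestamp and `now`.
function calculateAgeDays(createdAt, now = new Date()) {
  const created = new Date(createdAt);
  if (Number.isNaN(created.getTime())) {
    throw new Error(`Invalid created_at timestamp: ${createdAt}`);
  }
  return Math.floor((now.getTime() - created.getTime()) / MS_PER_DAY);
}

// True when a collection's age exceeds its retention period
// (730 days if the metadata does not specify one).
function isExpired(metadata, now = new Date()) {
  const retentionDays = metadata.retention_days ?? 730;
  return calculateAgeDays(metadata.created_at, now) > retentionDays;
}

const fixedNow = new Date("2024-11-14T00:00:00Z");
console.log(calculateAgeDays("2024-10-15T00:00:00Z", fixedNow)); // 30
console.log(isExpired({ created_at: "2022-01-01T00:00:00Z" }, fixedNow)); // true: older than 730 days
```

Throwing on an unparseable timestamp is a deliberate choice here: silently treating it as age 0 would keep a collection forever, and treating it as very old would delete it.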
See full patterns: Load chromadb-lifecycle-management sub-skill
🐛 Debugging & Troubleshooting (Priority 1)
Common Issues
Issue: No results returned
- Cause: Distance threshold too strict, wrong collection
- Fix: Increase threshold (0.3 → 0.5), verify collection name
- Debug: Check results.distances[0] values
Issue: Poor semantic matches
- Cause: Document chunking too large/small
- Fix: Optimal chunk size 200-500 words
- Debug: Review document length, split long documents
Issue: Slow queries
- Cause: Large collection without metadata filters
- Fix: Add metadata pre-filters (where clause)
- Debug: Check collection size, add filters
Distance Threshold Guide
| Distance | Similarity | Use Case |
|---|---|---|
| < 0.2 | Almost exact | Duplicate detection |
| 0.2-0.3 | Very similar | High precision search |
| 0.3-0.5 | Moderately similar | Balanced search |
| 0.5-0.7 | Weakly similar | Broad exploration |
| > 0.7 | Different topics | Not relevant |
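The table can be applied in code as a simple lookup. The sketch below is illustrative: `classifyDistance` is a hypothetical helper, the table leaves the exact boundary values (0.2, 0.3, 0.5, 0.7) ambiguous so the choice of which band they fall into is an assumption, and the bands only make sense for the distance metric they were tuned on.

```javascript
// Maps a distance to the similarity bands from the table above.
// Boundary handling is an assumption: each lower bound is inclusive,
// and 0.7 itself still counts as "weakly similar".
function classifyDistance(distance) {
  if (distance < 0.2) return "almost exact";
  if (distance < 0.3) return "very similar";
  if (distance < 0.5) return "moderately similar";
  if (distance <= 0.7) return "weakly similar";
  return "different topics";
}

for (const d of [0.05, 0.25, 0.4, 0.6, 0.9]) {
  console.log(d, classifyDistance(d));
}
```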
Antipatterns to Avoid
❌ Storing entire files as single document
- Loses granularity, poor search relevance
- ✅ Fix: Chunk into 200-500 word sections
❌ No metadata filters on large collections
- Slow queries, high latency
- ✅ Fix: Always filter by date, category, type
❌ Not deduplicating documents
- Wasted storage, duplicate results
- ✅ Fix: Check existence before adding
❌ Ignoring connection failures
- Data loss, silent failures
- ✅ Fix: Implement retry logic, fallback
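The chunking fix above (200-500 word sections) can be sketched as a simple word-window splitter. This is a minimal illustration, not part of ChromaDB: `chunkByWords` and `toChunkRecords` are hypothetical helper names, the 400-word default is just a point inside the recommended range, and splitting on whitespace ignores sentence and paragraph boundaries.

```javascript
// Splits text into consecutive chunks of at most maxWords words.
function chunkByWords(text, maxWords = 400) {
  const words = text.trim().split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}

// Builds one { id, document } record per chunk. Deterministic IDs
// ("<docId>_chunk_<n>") mean re-ingesting the same file reuses the same IDs.
function toChunkRecords(docId, title, text, maxWords = 400) {
  return chunkByWords(text, maxWords).map((chunk, idx) => ({
    id: `${docId}_chunk_${idx}`,
    document: `${title}. ${chunk}`, // repeat the title so each chunk keeps context
  }));
}

// Demo text only: 1000 synthetic words
const longText = Array.from({ length: 1000 }, (_, i) => `word${i}`).join(" ");
const records = toChunkRecords("design_doc", "Auth Service Design", longText);
console.log(records.map((r) => r.id)); // three chunk IDs: _chunk_0, _chunk_1, _chunk_2
```

Prefixing each chunk with the title follows the "title + summary + key details" content format, so a chunk from the middle of a file still carries enough context to match a query.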
📚 Sub-Skill Reference
Load targeted sub-skills for deep dives:
- chromadb-error-handling: Retry patterns, validation, circuit breakers (~150 lines)
- chromadb-testing-patterns: Unit/integration tests, mocking, fixtures (~120 lines)
- chromadb-security-patterns: PII redaction, access control, GDPR compliance (~90 lines)
- chromadb-lifecycle-management: Versioning, migration, archival, retention (~100 lines)
Usage: Skill({ skill: "chromadb-error-handling" })
Updated Success Criteria
ChromaDB integration is PRODUCTION-READY when:
Core Functionality (Original):
- ✅ Collections created with meaningful naming
- ✅ Data ingested efficiently (batching)
- ✅ Semantic search returns relevant results
- ✅ Metadata filters applied correctly
- ✅ Performance optimized
Production Readiness (New):
- ✅ Error Handling: Retry logic, validation, graceful degradation
- ✅ Testing: Unit tests (mocked), integration tests (real ChromaDB)
- ✅ Security: PII redacted, secrets sanitized, access control
- ✅ Lifecycle: Versioning strategy, retention policies, archival
Quality Score: 65/70 → 85/100 (with all enhancements)
Skill Version: 2.0
Updated: 2025-11-14
Enhancements: Error handling, testing, security, lifecycle management
Quality Score: 85/100 (Production-Ready)
Dependencies: ChromaDB MCP (mcp__chroma__*)
Sub-Skills: 4 modular sub-skills for targeted loading