knowledge-base-manager
Knowledge Base Manager
Overview
Provides a structured methodology for selecting, designing, and governing knowledge bases. Covers architecture decisions (document-based vs entity-based vs hybrid), content curation, quality metrics, versioning strategies, and maintenance governance. Use when choosing a KB architecture, establishing curation workflows, or building governance processes for organizational knowledge.
When NOT to use: Static documentation suffices, fewer than 50 FAQ items cover all questions, or no maintenance resources are available. For implementing retrieval pipelines (chunking, embeddings, vector stores), use the rag-implementer skill. For implementing knowledge graphs (ontology, entity extraction, graph databases), use the knowledge-graph-builder skill.
Quick Reference
| Aspect | Options | Key Considerations |
|---|---|---|
| Architecture | Document-based (RAG), Entity-based (Graph), Hybrid | Match to query patterns; start simple, add complexity when needed |
| Document-based | Vector DB (Pinecone, Weaviate, pgvector) | Best for docs, FAQs, manuals; semantic search; easy to add content |
| Entity-based | Graph DB (Neo4j, ArangoDB) | Best for org charts, catalogs, networks; relationship traversal |
| Hybrid | Both + linking layer | Enterprise, medical, legal; combined queries; highest complexity |
| When to skip KB | Static docs, <50 FAQ items | No maintenance resources, information never changes |
| Implementation | 6 phases | Audit, Curation, Storage, Quality, Versioning, Governance |
| Accuracy target | >90% on test questions | Create 100+ test questions with known correct answers |
| Coverage target | >80% questions answerable | Validate against real user queries continuously |
| Freshness target | <30 days average age | Automated freshness monitoring + scheduled updates |
| Consistency target | >95% conflict-free | Deduplication + single source of truth |
| Query latency | <100ms median | Caching and optimization for common access patterns |
| Storage tech | pgvector, Pinecone, Weaviate, Chroma | pgvector for existing Postgres; Pinecone for managed scale |
| Index types | HNSW, IVFFlat | HNSW for recall; IVFFlat for frequently rebuilt indexes |
| Ingestion pipeline | Load, clean, chunk, embed, store | Chunk at semantic boundaries; 512 tokens max; 10-15% overlap |
| Deduplication | Content hashing, semantic similarity | Hash for exact dupes; cosine similarity >0.95 for semantic dupes |
| Quality testing | Recall@K, MRR, accuracy sampling | 100+ test questions; measure recall@10 >0.8 and MRR >0.7 |
| Drift detection | Embedding distribution monitoring | Track the shift in the mean embedding; alert when it exceeds 0.1 |
| Versioning | Snapshot, Event-sourced, Git-style | Snapshot for simple; event-sourced for audit; git-style for teams |
| Maintenance | Daily, Weekly, Monthly, Quarterly | Establish schedule from day 1; monitor errors and user feedback |
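The quality-testing row above names recall@K and MRR with targets of recall@10 >0.8 and MRR >0.7. The following is a minimal sketch of how those two metrics could be computed over a test set, assuming each test question yields a ranked list of retrieved document IDs and a set of known relevant IDs. The function names, the `test_results` data, and the document IDs are hypothetical and only illustrate the arithmetic.

```python
from typing import Sequence


def recall_at_k(retrieved: Sequence[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the relevant document IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(results: list[tuple[Sequence[str], set[str]]], k: int = 10) -> dict[str, float]:
    """Average recall@k and MRR over (retrieved_ids, relevant_ids) pairs."""
    n = len(results)
    if n == 0:
        return {"recall_at_k": 0.0, "mrr": 0.0}
    return {
        "recall_at_k": sum(recall_at_k(r, rel, k) for r, rel in results) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in results) / n,
    }


# Hypothetical test set: two questions with known relevant document IDs.
test_results = [
    (["d3", "d1", "d7"], {"d1"}),        # first relevant hit at rank 2
    (["d2", "d9", "d4"], {"d2", "d4"}),  # both relevant docs retrieved
]
scores = evaluate(test_results, k=10)

# Compare against the targets from the quick-reference table.
meets_targets = scores["recall_at_k"] > 0.8 and scores["mrr"] > 0.7
```

In practice the test set would hold the 100+ curated questions mentioned in the table, and the retrieved lists would come from whichever storage backend is in use.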
Common Mistakes
| Mistake | Correct Pattern |
|---|---|
| Ingesting raw data without curation or normalization | Curate, clean, and deduplicate before ingesting; quality over quantity |
| Skipping version control for KB content | Implement versioning from day one with rollback and audit trail |
| Building a KB without validating against user questions | Start with user research and test against real queries for >90% accuracy |
| Choosing hybrid architecture when document-based suffices | Match architecture to actual query patterns; start simple, add complexity when needed |
| Launching without freshness monitoring or update schedules | Set up automated freshness checks and scheduled content reviews |
| No provenance tracking on knowledge entries | Always track source URL, timestamp, author, and confidence score |
| Duplicate information across sources | Establish single source of truth; merge similar entries with conflict resolution rules |
| Perfectionism delaying launch | Launch at 80% coverage and iterate based on real usage data |
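Several of the patterns above (provenance tracking, exact-duplicate removal, freshness checks) can be sketched together in a few lines. This is an illustrative outline under stated assumptions, not a prescribed schema: the `KBEntry` class, its field names, the 0.0-1.0 confidence scale, and the "keep the most recently updated copy" conflict rule are all hypothetical choices. Semantic deduplication via cosine similarity is not covered here because it needs an embedding model.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class KBEntry:
    """A knowledge entry with the provenance fields named in the table."""
    content: str
    source_url: str
    author: str
    updated_at: datetime
    confidence: float  # assumed scale: 0.0 (unverified) to 1.0 (authoritative)

    @property
    def content_hash(self) -> str:
        # Normalize case and whitespace so trivially different copies collide.
        normalized = " ".join(self.content.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(entries: list[KBEntry]) -> list[KBEntry]:
    """Keep one entry per content hash, preferring the most recently updated."""
    newest: dict[str, KBEntry] = {}
    for entry in entries:
        kept = newest.get(entry.content_hash)
        if kept is None or entry.updated_at > kept.updated_at:
            newest[entry.content_hash] = entry
    return list(newest.values())


def stale_entries(entries: list[KBEntry], now: datetime,
                  max_age: timedelta = timedelta(days=30)) -> list[KBEntry]:
    """Entries whose last update is older than max_age."""
    return [e for e in entries if now - e.updated_at > max_age]


def average_age_days(entries: list[KBEntry], now: datetime) -> float:
    """Mean entry age in days, for comparison with the freshness target."""
    if not entries:
        return 0.0
    total = sum((now - e.updated_at).total_seconds() for e in entries)
    return total / len(entries) / 86400


# Hypothetical data: the first two entries differ only in case and spacing.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
entries = [
    KBEntry("Reset your password from Settings.", "https://example.com/kb/reset",
            "alice", now - timedelta(days=5), 0.9),
    KBEntry("reset your  password from settings.", "https://example.com/wiki/reset",
            "bob", now - timedelta(days=60), 0.7),
    KBEntry("Invoices are issued monthly.", "https://example.com/kb/billing",
            "carol", now - timedelta(days=45), 0.8),
]
unique = deduplicate(entries)       # drops the older duplicate
stale = stale_entries(unique, now)  # entries due for review
```

Running the freshness check on a schedule and routing `stale` to a content owner is one way to meet the "scheduled content reviews" pattern; a real deployment may prefer a different conflict rule, such as keeping the higher-confidence source.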
Delegation
- Audit existing knowledge sources and classify content types: Use the `Explore` agent to inventory documents, assess quality, and identify gaps
- Implement end-to-end KB pipeline with storage and retrieval: Use the `Task` agent to deploy database, configure search, and run quality checks
- Design KB architecture and governance model: Use the `Plan` agent to select between document-based, entity-based, or hybrid approaches

For implementing document retrieval pipelines (chunking, embeddings, vector stores, hybrid search), use the `rag-implementer` skill. For implementing knowledge graphs (ontology design, entity extraction, graph databases), use the `knowledge-graph-builder` skill.
References
- Architecture and Types -- KB types, decision framework, knowledge classification
- Curation and Ingestion -- extraction, cleaning, deduplication, provenance tracking
- Storage and Retrieval -- database selection, interfaces, technology stacks
- Quality Control -- metrics, validation strategies, continuous monitoring
- Versioning -- snapshot, event-sourced, and git-style approaches
- Governance -- maintenance schedules, roles, change processes