knowledge-graph-builder
Knowledge Graph Builder
Overview
Knowledge graphs make implicit relationships explicit, enabling AI systems to reason about connections, verify facts, and reduce hallucinations. They combine structured entity-relationship modeling with semantic search, so retrieval can follow explicit relationships as well as match on meaning.
When to use: Complex entity relationships central to the domain, verifying AI-generated facts against structured knowledge, semantic search combined with relationship traversal, recommendation systems, fraud detection, or pattern recognition.
When NOT to use: Simple tabular data (use a relational database), purely document-based search with no relationships (use the rag-implementer skill), read-heavy workloads with no traversal needs, or when the team lacks graph modeling expertise. For KB architecture selection and governance, use the knowledge-base-manager skill.
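As a rough illustration of the fact-verification use case, the sketch below checks a claimed triple against a small in-memory fact list. The `Fact` type, the `verify` helper, the status labels, the confidence values, and the source names are all assumptions made for this example; they are not part of any particular graph database API.

```python
from typing import NamedTuple, Optional, Tuple


class Fact(NamedTuple):
    subject: str
    relation: str
    obj: str
    confidence: float  # 0.0-1.0, illustrative values only
    source: str        # hypothetical provenance label


# Tiny stand-in for a knowledge graph; a real system would query a graph store.
KNOWN_FACTS = [
    Fact("Ada Lovelace", "COLLABORATED_WITH", "Charles Babbage", 0.95, "curated-import"),
    Fact("Ada Lovelace", "BORN_IN", "London", 0.60, "web-extraction"),
]


def verify(subject: str, relation: str, obj: str,
           min_confidence: float = 0.8) -> Tuple[str, Optional[str]]:
    """Return (status, source) for a claimed triple.

    status is "supported", "low-confidence", or "unsupported".
    """
    matches = [
        f for f in KNOWN_FACTS
        if (f.subject, f.relation, f.obj) == (subject, relation, obj)
    ]
    if not matches:
        # No matching edge: the graph cannot ground this claim.
        return "unsupported", None
    best = max(matches, key=lambda f: f.confidence)
    status = "supported" if best.confidence >= min_confidence else "low-confidence"
    return status, best.source


claim_status = verify("Ada Lovelace", "COLLABORATED_WITH", "Charles Babbage")
```

A caller could pass "unsupported" or "low-confidence" results back to the generating model as a signal to retract or qualify the claim.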
Quick Reference
| Pattern | Approach | Key Points |
|---|---|---|
| Ontology first | Define entity types, relationships, properties before ingesting data | Changing schema later is expensive; validate with domain experts |
| Entity resolution | Deduplicate aggressively during extraction | "Apple Inc" = "Apple" = "Apple Computer" must resolve to one entity |
| Confidence scoring | Attach 0.0-1.0 score + source to every relationship | Enables filtering by reliability, critical for AI grounding |
| Hybrid architecture | Graph traversal (structured) + vector search (semantic) | Vector finds candidates, graph expands context via relationships |
| Incremental build | Core entities first, validate against target queries, then expand | Avoid building the full graph before testing with real queries |
| Database selection | Neo4j (general), Neptune (AWS managed), ArangoDB (multi-model), TigerGraph (massive scale) | Match database to scale, infrastructure, and query complexity |
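The first three patterns in the table can be sketched together in plain Python: an ontology that limits which relationships are allowed, alias-based entity resolution, and a confidence score plus source on every relationship. This is a minimal in-memory sketch, not a production design. The `ONTOLOGY` set, the `ALIASES` table, the `Graph` class, and the "Example Audio Co" entity are hypothetical, and real entity resolution usually needs fuzzy or embedding-based matching rather than a fixed lookup.

```python
from dataclasses import dataclass, field

# Hypothetical ontology: allowed (subject type, relation, object type) patterns.
ONTOLOGY = {
    ("Company", "ACQUIRED", "Company"),
    ("Person", "WORKS_AT", "Company"),
}

# Hypothetical alias table mapping normalized surface forms to one canonical name.
ALIASES = {
    "apple": "Apple Inc",
    "apple inc": "Apple Inc",
    "apple computer": "Apple Inc",
}


def resolve(name: str) -> str:
    """Map a surface form to its canonical entity name (unchanged if unknown)."""
    key = " ".join(name.lower().replace(".", "").split())
    return ALIASES.get(key, name.strip())


@dataclass(frozen=True)
class Relationship:
    subject: str
    relation: str
    obj: str
    confidence: float  # 0.0-1.0
    source: str


@dataclass
class Graph:
    entity_types: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_entity(self, name: str, entity_type: str) -> str:
        canonical = resolve(name)  # deduplicate at ingestion time
        self.entity_types[canonical] = entity_type
        return canonical

    def add_relationship(self, subject, relation, obj, confidence, source):
        s, o = resolve(subject), resolve(obj)
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be in [0.0, 1.0]")
        pattern = (self.entity_types.get(s), relation, self.entity_types.get(o))
        if pattern not in ONTOLOGY:
            raise ValueError(f"relationship {pattern} is not allowed by the ontology")
        rel = Relationship(s, relation, o, confidence, source)
        self.relationships.append(rel)
        return rel

    def reliable(self, min_confidence: float) -> list:
        """Relationships at or above the given confidence threshold."""
        return [r for r in self.relationships if r.confidence >= min_confidence]


graph = Graph()
for surface in ["Apple Inc", "Apple", "Apple Computer"]:
    graph.add_entity(surface, "Company")  # all three collapse to one entity
graph.add_entity("Example Audio Co", "Company")  # made-up entity

# Same claim from two made-up sources with different reliability.
graph.add_relationship("Apple", "ACQUIRED", "Example Audio Co", 0.95, "press-release")
graph.add_relationship("Apple Computer", "ACQUIRED", "Example Audio Co", 0.40, "forum-post")
```

After these calls the graph holds two entities rather than four, and `graph.reliable(0.8)` keeps only the relationship from the higher-confidence source.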
Common Mistakes
| Mistake | Correct Pattern |
|---|---|
| Ingesting entities before designing the ontology | Define and validate the ontology with domain experts first; changing later is expensive |
| Skipping entity resolution and deduplication | Deduplicate aggressively so "Apple Inc", "Apple", and "Apple Computer" resolve to one entity |
| Omitting confidence scores on relationships | Attach a 0.0-1.0 confidence score and source to every relationship |
| Using only graph traversal without vector search | Implement hybrid architecture combining graph traversal with semantic vector search |
| Building the full graph before validating with real queries | Start with core entities, test against target queries, then expand incrementally |
| Choosing a database before understanding scale requirements | Evaluate query patterns, data volume, and infrastructure constraints before selecting |
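To make the hybrid pattern concrete, the sketch below uses vector similarity to pick seed entities and then expands context by walking graph edges. Everything here is a toy assumption: the 3-dimensional embeddings, the `EDGES` adjacency list, and the `hybrid_search` function are invented for illustration, and a real system would use an embedding model, a vector index, and a graph database instead.

```python
import math

# Toy 3-dimensional embeddings (made up for illustration).
EMBEDDINGS = {
    "graph database": [0.9, 0.1, 0.0],
    "vector index": [0.1, 0.9, 0.0],
    "billing system": [0.0, 0.1, 0.9],
}

# Adjacency list: entity -> list of (relation, neighbor). Also made up.
EDGES = {
    "graph database": [("SUPPORTS", "cypher queries"), ("USED_BY", "fraud detection")],
    "vector index": [("USED_BY", "semantic search")],
    "billing system": [("FEEDS", "fraud detection")],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_candidates(query_vec, top_k):
    """Semantic step: rank entities by similarity to the query vector."""
    ranked = sorted(
        EMBEDDINGS,
        key=lambda name: cosine(query_vec, EMBEDDINGS[name]),
        reverse=True,
    )
    return ranked[:top_k]


def expand(seeds, hops):
    """Structured step: breadth-first walk collecting triples within `hops`."""
    seen = set(seeds)
    frontier = list(seeds)
    triples = []
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for relation, neighbor in EDGES.get(node, []):
                triples.append((node, relation, neighbor))
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return triples


def hybrid_search(query_vec, top_k=1, hops=1):
    seeds = vector_candidates(query_vec, top_k)
    return {"seeds": seeds, "context": expand(seeds, hops)}


result = hybrid_search([0.8, 0.2, 0.0])
```

Keeping `top_k` and `hops` small is a deliberate choice in this sketch: the vector step narrows the search to a few candidates, and the graph step adds only nearby relationships, so the returned context stays bounded.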
Delegation
- Extract entities and relationships from unstructured text: Use the Task agent to run NER pipelines and build relationship triples
- Evaluate graph database options for project requirements: Use the Explore agent to compare Neo4j, Neptune, ArangoDB, and TigerGraph against scale and query needs
- Design ontology and hybrid architecture for a new domain: Use the Plan agent to define entity types, relationship schemas, and graph-vector integration strategy
- For hybrid KG+RAG systems, delegate to the rag-implementer skill
- For knowledge-graph-powered agent workflows, delegate to the agent-patterns skill
References
- Ontology Design — Entity types, relationships, properties, RDF schema, validation
- Database Selection — Neo4j, Neptune, ArangoDB, TigerGraph comparison and setup
- Entity Extraction — NER pipeline, relationship extraction, LLM-based extraction
- Hybrid Architecture — Graph + vector integration, hybrid search implementation
- Query Patterns — Cypher queries, API design, common traversal patterns
- AI Integration — KG-RAG, hallucination detection, grounded response generation