Knowledge Graph Builder
This skill provides guidance for designing knowledge graphs that capture entities, relationships, and semantic meaning so that the data can be queried by traversal and reasoned over.
Core Competencies
- Graph Modeling: Entity-relationship design for graphs
- Query Languages: Cypher (Neo4j), SPARQL (RDF), Gremlin
- Ontology Design: Schema, taxonomies, semantic relationships
- Graph Algorithms: Pathfinding, centrality, community detection
Knowledge Graph Fundamentals
What Makes a Knowledge Graph
Knowledge Graph = Entities + Relationships + Schema + Semantics
Traditional Database:       Knowledge Graph:
┌────────────────────┐      ┌─────────────────────────────┐
│ Tables with rows   │      │ (Person)──KNOWS──▶(Person)  │
│ Foreign keys       │  vs  │     │                       │
│ JOIN operations    │      │  WORKS_AT                   │
│                    │      │     ▼                       │
└────────────────────┘      │ (Company)──IN──▶(Industry)  │
                            └─────────────────────────────┘
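The four ingredients above (entities, relationships, schema, semantics) can be illustrated with a small in-memory sketch. This is not how Neo4j or any RDF store works internally; the `Graph`, `Node`, `Edge`, and `SCHEMA` names are hypothetical and exist only for illustration. The schema is modeled as a set of allowed (source label, relationship type, target label) triples, and edges that violate it are rejected.

```python
from dataclasses import dataclass, field

# Schema: allowed (source label, relationship type, target label) triples.
# These triples mirror the diagram; they are illustrative assumptions.
SCHEMA = {
    ("Person", "KNOWS", "Person"),
    ("Person", "WORKS_AT", "Company"),
    ("Company", "IN", "Industry"),
}


@dataclass
class Node:
    node_id: str
    labels: frozenset
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    src: str
    rel_type: str
    dst: str
    props: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = {}  # node_id -> Node
        self.edges = []  # list of Edge, each typed and directed

    def add_node(self, node_id, labels, **props):
        self.nodes[node_id] = Node(node_id, frozenset(labels), props)

    def add_edge(self, src, rel_type, dst, **props):
        s, d = self.nodes[src], self.nodes[dst]
        # An edge is valid if any label pair of its endpoints is allowed by the schema.
        allowed = any(
            (sl, rel_type, dl) in SCHEMA for sl in s.labels for dl in d.labels
        )
        if not allowed:
            raise ValueError(f"schema forbids {rel_type} from {src} to {dst}")
        self.edges.append(Edge(src, rel_type, dst, props))

    def neighbors(self, src, rel_type):
        # Follow outgoing edges of one type from a node.
        return [e.dst for e in self.edges if e.src == src and e.rel_type == rel_type]


g = Graph()
g.add_node("p001", ["Person"], name="Alice Chen")
g.add_node("p002", ["Person"], name="Bob Smith")
g.add_node("c001", ["Company"], name="Acme Corp")
g.add_edge("p001", "KNOWS", "p002")
g.add_edge("p001", "WORKS_AT", "c001", role="Engineer")
```

Calling `g.neighbors("p001", "WORKS_AT")` follows one typed edge, which is the in-memory analogue of a single-hop traversal rather than a JOIN.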
When to Use Knowledge Graphs
| Use Case | Why Graphs Excel |
|---|---|
| Recommendation systems | Traverse connections to find related items |
| Fraud detection | Identify suspicious relationship patterns |
| Knowledge management | Connect concepts and infer relationships |
| Master data management | Unify entities across systems |
| Root cause analysis | Follow causal chains through dependencies |
Graph Data Modeling
Entity Design
Identify core entities (nodes):
// Person entity with properties
CREATE (p:Person {
  id: 'p001',
  name: 'Alice Chen',
  email: 'alice@example.com',
  created_at: datetime()
})
// Multiple labels for categorization
CREATE (c:Organization:Company:TechCompany {
  id: 'c001',
  name: 'Acme Corp',
  founded: 2010
})
Relationship Design
Model connections with typed, directed edges:
// Simple relationship
(person)-[:WORKS_AT]->(company)
// Relationship with properties
(person)-[:WORKS_AT {
  role: 'Engineer',
  start_date: date('2020-01-15'),
  department: 'Engineering'
}]->(company)
// Temporal relationships
(person)-[:EMPLOYED_BY {
  from: date('2018-01-01'),
  to: date('2020-12-31')
}]->(company1)
(person)-[:EMPLOYED_BY {
  from: date('2021-01-01')
}]->(company2)
Common Relationship Patterns
Hierarchical:  (Child)──IS_CHILD_OF──▶(Parent)
               (Employee)──REPORTS_TO──▶(Manager)
Associative:   (Person)──KNOWS──▶(Person)
               (Document)──REFERENCES──▶(Document)
Temporal:      (Event)──PRECEDES──▶(Event)
               (Version)──SUPERSEDES──▶(Version)
Categorical:   (Product)──BELONGS_TO──▶(Category)
               (Concept)──IS_A──▶(Category)
Spatial:       (Location)──NEAR──▶(Location)
               (Region)──CONTAINS──▶(City)
Schema Definition
// Node constraints
CREATE CONSTRAINT person_id IF NOT EXISTS
FOR (p:Person) REQUIRE p.id IS UNIQUE;
CREATE CONSTRAINT company_id IF NOT EXISTS
FOR (c:Company) REQUIRE c.id IS UNIQUE;
// Property existence (requires Neo4j Enterprise Edition)
CREATE CONSTRAINT person_name IF NOT EXISTS
FOR (p:Person) REQUIRE p.name IS NOT NULL;
// Indexes for query performance
CREATE INDEX person_name_idx IF NOT EXISTS
FOR (p:Person) ON (p.name);
CREATE INDEX company_industry_idx IF NOT EXISTS
FOR (c:Company) ON (c.industry);
Cypher Query Patterns
Basic Traversal
// Find all colleagues (people who work at the same company)
MATCH (person:Person {name: 'Alice Chen'})-[:WORKS_AT]->(company)
      <-[:WORKS_AT]-(colleague:Person)
WHERE colleague <> person
RETURN colleague.name, company.name
// Variable-length paths (1-3 hops)
MATCH path = (start:Person)-[:KNOWS*1..3]->(end:Person)
WHERE start.name = 'Alice Chen' AND end.name = 'Bob Smith'
RETURN path, length(path) AS hops
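To make the colleague traversal concrete, here is a minimal Python sketch of the same two-hop pattern (person to company, then company back to other people). The `works_at` edge list and the `colleagues` helper are hypothetical names used only for illustration; a graph database performs this through its own traversal machinery and indexes.

```python
def colleagues(works_at, person):
    """Return sorted (colleague, company) pairs for people sharing a company.

    works_at: list of (person, company) pairs, i.e. WORKS_AT edges.
    """
    # Hop 1: companies the person works at.
    my_companies = {c for p, c in works_at if p == person}
    # Hop 2: other people with a WORKS_AT edge into one of those companies.
    return sorted(
        (p, c) for p, c in works_at if c in my_companies and p != person
    )


works_at = [
    ("Alice Chen", "Acme Corp"),
    ("Bob Smith", "Acme Corp"),
    ("Carol Diaz", "Globex"),
    ("Dan Evans", "Acme Corp"),
]
result = colleagues(works_at, "Alice Chen")
```

The `p != person` filter plays the role of `WHERE colleague <> person` in the Cypher query.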
Aggregation
// Count relationships
MATCH (p:Person)-[:WORKS_AT]->(c:Company)
RETURN c.name, count(p) as employee_count
ORDER BY employee_count DESC
// Collect into lists
MATCH (p:Person)-[:HAS_SKILL]->(s:Skill)
RETURN p.name, collect(s.name) as skills
Recommendations
// "People you may know" - friends of friends
MATCH (me:Person {id: $userId})-[:KNOWS]-(friend)-[:KNOWS]-(suggestion)
WHERE NOT (me)-[:KNOWS]-(suggestion) AND me <> suggestion
RETURN suggestion.name, count(DISTINCT friend) as mutual_friends
ORDER BY mutual_friends DESC
LIMIT 10
// Content-based: similar interests
MATCH (me:Person {id: $userId})-[:INTERESTED_IN]->(topic)
      <-[:INTERESTED_IN]-(similar:Person)
WHERE me <> similar
WITH similar, count(topic) as shared_interests
ORDER BY shared_interests DESC
RETURN similar.name, shared_interests
LIMIT 10
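The friends-of-friends query can be read as: walk two KNOWS hops, drop people already known, and rank the rest by how many distinct friends lead to them. A minimal Python sketch of that logic follows, assuming an undirected friendship graph stored as a symmetric adjacency dict; `suggest_friends` and the sample data are hypothetical.

```python
from collections import Counter


def suggest_friends(knows, me, limit=10):
    """Rank non-friends by number of mutual friends.

    knows: dict mapping person -> set of friends (assumed symmetric).
    """
    mine = knows.get(me, set())
    mutual = Counter()
    for friend in mine:
        for candidate in knows.get(friend, set()):
            # Skip myself and people I already know.
            if candidate != me and candidate not in mine:
                mutual[candidate] += 1
    return mutual.most_common(limit)


knows = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}
suggestions = suggest_friends(knows, "alice")
```

Because each friend is visited once, every candidate is counted at most once per friend, which corresponds to counting distinct friends in the query.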
Path Analysis
// Shortest path
MATCH path = shortestPath(
  (start:Person {name: 'Alice'})-[:KNOWS*]-(end:Person {name: 'Bob'})
)
RETURN path, length(path)
// All shortest paths
MATCH path = allShortestPaths(
  (start:Person)-[:KNOWS*]-(end:Person)
)
WHERE start.name = 'Alice' AND end.name = 'Bob'
RETURN path
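On an unweighted graph, a shortest path is what breadth-first search finds. The sketch below shows the idea in plain Python; it is an illustration of the concept, not a description of how Neo4j implements `shortestPath`. The `adj` adjacency dict and the `shortest_path` function are hypothetical.

```python
from collections import deque


def shortest_path(adj, start, goal):
    """Return one shortest path as a list of nodes, or None if unreachable.

    adj: dict mapping node -> iterable of neighbours (undirected edges
    must be listed in both directions).
    """
    if start == goal:
        return [start]
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt in parent:
                continue
            parent[nxt] = node
            if nxt == goal:
                # Walk parent links back to the start, then reverse.
                path = [goal]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


adj = {
    "alice": ["carol"],
    "carol": ["alice", "dave"],
    "dave": ["carol", "bob"],
    "bob": ["dave"],
}
path = shortest_path(adj, "alice", "bob")
hops = len(path) - 1  # like length(path) in Cypher: relationships, not nodes
```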
Graph Algorithms
Centrality Measures
| Algorithm | Purpose | Use Case |
|---|---|---|
| Degree | Connection count | Find popular nodes |
| Betweenness | Bridge detection | Find brokers/bottlenecks |
| PageRank | Influence propagation | Rank importance |
| Closeness | Average distance | Find well-connected nodes |
// Using Neo4j Graph Data Science (assumes a graph named 'myGraph' has already been projected into the GDS graph catalog)
CALL gds.pageRank.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10
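For intuition about what a PageRank score represents, here is a small power-iteration sketch in plain Python. It is a textbook formulation under stated assumptions (damping factor 0.85, a fixed number of iterations, scores normalized to sum to 1); the GDS implementation may differ in normalization, convergence criteria, and defaults. The `pagerank` function and the `links` data are hypothetical.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    out_links: dict mapping node -> list of nodes it points to.
    Returns a dict of scores that sum to 1.
    """
    nodes = set(out_links) | {t for ts in out_links.values() for t in ts}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v, targets in out_links.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank


links = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
top = max(ranks, key=ranks.get)
```

In this toy graph, `c` receives links from both `a` and `b`, so it ends up with the highest score, while `b`, which nobody links to, keeps only the baseline share.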
Community Detection
// Louvain for community detection
CALL gds.louvain.stream('myGraph')
YIELD nodeId, communityId
RETURN communityId, collect(gds.util.asNode(nodeId).name) as members
ORDER BY size(members) DESC
Knowledge Graph Patterns
Entity Resolution
// Find potential duplicates
MATCH (p1:Person), (p2:Person)
WHERE p1.id < p2.id
AND (p1.email = p2.email
OR (p1.name = p2.name AND p1.birth_date = p2.birth_date))
RETURN p1, p2
// Merge duplicates
MATCH (p1:Person {id: 'keep'}), (p2:Person {id: 'duplicate'})
CALL apoc.refactor.mergeNodes([p1, p2], {
  properties: 'combine',
  mergeRels: true
})
YIELD node
RETURN node
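The duplicate-finding query compares every pair of Person nodes, which grows quadratically with the number of people. A common alternative is blocking: group records by a matching key first, then pair up only records within the same group. The sketch below applies the same two rules as the query (same email, or same name and birth date) in plain Python; `find_duplicates` and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_duplicates(people):
    """Return sorted (id_a, id_b) pairs that share an email,
    or share both name and birth date. Missing values never match."""
    buckets = defaultdict(list)
    for p in people:
        if p.get("email"):
            buckets[("email", p["email"])].append(p["id"])
        if p.get("name") and p.get("birth_date"):
            buckets[("name_dob", p["name"], p["birth_date"])].append(p["id"])
    pairs = set()
    for ids in buckets.values():
        # sorted() keeps id_a < id_b, like `p1.id < p2.id` in the query.
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return sorted(pairs)


people = [
    {"id": "p1", "email": "a@example.com", "name": "Alice", "birth_date": "1990-01-01"},
    {"id": "p2", "email": "a@example.com", "name": "A. Chen", "birth_date": None},
    {"id": "p3", "email": "alice2@example.com", "name": "Alice", "birth_date": "1990-01-01"},
    {"id": "p4", "email": None, "name": "Bob", "birth_date": None},
]
candidates = find_duplicates(people)
```

The output is a list of candidate pairs only; deciding which node to keep and merging them remains a separate step.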
Semantic Layering
┌─────────────────────────────────────────────────────┐
│  Instance Layer                                     │
│    (Alice)──KNOWS──▶(Bob)                           │
│    (Alice)──WORKS_AT──▶(Acme)                       │
├─────────────────────────────────────────────────────┤
│  Schema Layer                                       │
│    (:Person)──CAN_KNOW──▶(:Person)                  │
│    (:Person)──CAN_WORK_AT──▶(:Company)              │
├─────────────────────────────────────────────────────┤
│  Ontology Layer                                     │
│    (Person)──IS_A──▶(Agent)                         │
│    (Company)──IS_A──▶(Organization)                 │
└─────────────────────────────────────────────────────┘
Temporal Modeling
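// State over time (bind the person first; an unbound variable in CREATE would create a new, empty node)
MATCH (person:Person {id: $personId})
CREATE (person)-[:HAS_STATE {
  valid_from: date('2020-01-01'),
  valid_to: date('2020-12-31')
}]->(state:PersonState {
  status: 'employed',
  salary: 80000
})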
// Query state at point in time
MATCH (p:Person {id: $personId})-[r:HAS_STATE]->(s)
WHERE r.valid_from <= date($queryDate)
AND (r.valid_to IS NULL OR r.valid_to >= date($queryDate))
RETURN s
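The point-in-time filter is an interval check: a state applies when it started on or before the query date and either has no end date or ends on or after it. A minimal Python sketch of that predicate, using hypothetical `states_at` and sample data:

```python
from datetime import date


def states_at(states, when):
    """Return the states whose validity interval contains `when`.

    A valid_to of None means the state is still open-ended.
    """
    return [
        s
        for s in states
        if s["valid_from"] <= when
        and (s["valid_to"] is None or s["valid_to"] >= when)
    ]


states = [
    {"status": "employed", "salary": 80000,
     "valid_from": date(2020, 1, 1), "valid_to": date(2020, 12, 31)},
    {"status": "employed", "salary": 95000,
     "valid_from": date(2021, 1, 1), "valid_to": None},
]
mid_2020 = states_at(states, date(2020, 6, 1))
today_like = states_at(states, date(2021, 3, 1))
```

Both bounds are inclusive here, matching the `<=` and `>=` comparisons in the query, so a state that ends on the query date still counts.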
Best Practices
Modeling Guidelines
- Prefer relationships over properties when the connection has meaning
- Use specific relationship types (:MANAGES, not :RELATED_TO)
- Model for your queries - understand access patterns first
- Keep properties atomic - no arrays for searchable data
- Version nodes, not graphs - temporal properties on relationships
Performance Tips
- Index properties used in WHERE clauses
- Use parameters ($userId) not string concatenation
- Limit variable-length paths (*1..5 not *)
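- Inspect query plans with EXPLAIN (shows the plan without running the query) and PROFILE (runs the query and reports actual rows and db hits)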
- Consider relationship direction in traversals
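Two of the tips above (parameters instead of string concatenation, and bounded variable-length paths) can be combined in application code. The sketch below builds a query string plus a parameter dict without touching a database. It assumes hop bounds must appear as literals in the query text, so the bound is validated as a small integer before being interpolated, while user-supplied values go only into the parameter dict. `bounded_knows_query` and its cap of 5 hops are hypothetical.

```python
def bounded_knows_query(user_id, max_hops=3):
    """Return (query, params) for a bounded KNOWS traversal.

    user_id is passed as a parameter, never interpolated into the text.
    max_hops is interpolated, so it is strictly validated first.
    """
    if type(max_hops) is not int or not 1 <= max_hops <= 5:
        raise ValueError("max_hops must be an integer between 1 and 5")
    query = (
        f"MATCH (me:Person {{id: $userId}})-[:KNOWS*1..{max_hops}]-(other:Person) "
        "RETURN DISTINCT other.name AS name LIMIT 25"
    )
    return query, {"userId": user_id}


# Even a hostile-looking id never ends up inside the query text.
query, params = bounded_knows_query("p001' OR 1=1 //", max_hops=3)
```

The returned pair can then be handed to whatever driver call the application uses to run parameterized queries.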
References
- references/cypher-patterns.md - Advanced Cypher query examples
- references/graph-modeling.md - Entity and relationship design patterns
- references/graph-algorithms.md - Algorithm selection and configuration
Repository: 4444j99/a-i--skills