skills/lyndonkl/claude/data-schema-knowledge-modeling

data-schema-knowledge-modeling

Installation
SKILL.md

Data Schema & Knowledge Modeling

Table of Contents

Overview

This skill formally defines entities, attributes, relationships, constraints, and cardinality to produce unambiguous data models that prevent inconsistencies and enable correct implementation.

Quick example: E-commerce schema:

  • Entities: User, Product, Order, Cart, Payment
  • Relationships: User has many Orders, Order contains many Products (via OrderItems), User has one Cart
  • Constraints: Email must be unique, Order total matches sum of OrderItems, Payment amount equals Order total
  • Result: Unambiguous model that prevents data inconsistencies

Workflow

Copy this checklist and track your progress:

Data Schema & Knowledge Modeling Progress:
- [ ] Step 1: Gather domain requirements and scope
- [ ] Step 2: Identify entities and attributes
- [ ] Step 3: Define relationships and cardinality
- [ ] Step 4: Specify constraints and invariants
- [ ] Step 5: Validate and document the model

Step 1: Gather domain requirements and scope

Ask user for domain description, core use cases (what queries/operations will this support), existing data (if migration/integration), performance/scale requirements, and technology constraints (SQL vs NoSQL vs graph database). Understanding use cases shapes the model - OLTP vs OLAP vs graph traversal require different designs. See Schema Types for guidance.

Step 2: Identify entities and attributes

Extract nouns from requirements (those are candidate entities). For each entity, list attributes with types and nullability. Use resources/template.md for systematic entity identification. Verify each entity represents a distinct concept with independent lifecycle. Document entity purpose and examples.

Step 3: Define relationships and cardinality

Map connections between entities (one-to-one, one-to-many, many-to-many). For many-to-many, identify junction tables/entities. Specify relationship directionality and optionality (can X exist without Y?). Use resources/methodology.md for complex relationship patterns like hierarchies, polymorphic associations, and temporal relationships.

Step 4: Specify constraints and invariants

Define uniqueness constraints, foreign key relationships, check constraints, and business rules. Document domain invariants (rules that must hold true at all times). Identify derived/computed attributes vs stored. Use resources/methodology.md for advanced constraint patterns and validation strategies.

Step 5: Validate and document the model

Create data-schema-knowledge-modeling.md file with complete schema definition. Validate against use cases - can the schema support required queries/operations? Check for normalization (eliminate redundancy) or denormalization (optimize for specific queries). Self-assess using resources/evaluators/rubric_data_schema_knowledge_modeling.json. Minimum standard: Average score ≥ 3.5.

Schema Types

Choose based on use case and technology:

Relational (SQL) Schema

  • Best for: Transactional systems (OLTP), strong consistency, complex queries with joins
  • Pattern: Normalized tables, foreign keys, ACID transactions
  • Example use cases: E-commerce orders, banking transactions, HR systems
  • Key decision: Normalization level (3NF for consistency vs denormalized for read performance)

Document/NoSQL Schema

  • Best for: Flexible/evolving structure, high write throughput, denormalized reads
  • Pattern: Nested documents, embedded relationships, no joins
  • Example use cases: Content management, user profiles, event logs
  • Key decision: Embed vs reference (embed for 1-to-few, reference for 1-to-many)

Graph Schema (Ontology)

  • Best for: Complex relationships, traversal queries, semantic reasoning, knowledge graphs
  • Pattern: Nodes (entities), edges (relationships), properties on both
  • Example use cases: Social networks, fraud detection, recommendation engines, scientific research
  • Key decision: Property graph vs RDF triples

Event/Time-Series Schema

  • Best for: Audit logs, metrics, IoT data, append-only data
  • Pattern: Immutable events, time-based partitioning, aggregation tables
  • Example use cases: User activity tracking, monitoring, financial transactions
  • Key decision: Raw events vs pre-aggregated summaries

Dimensional (Data Warehouse) Schema

  • Best for: Analytics (OLAP), aggregations, historical reporting
  • Pattern: Fact tables + dimension tables (star/snowflake schema)
  • Example use cases: Business intelligence, sales analytics, customer 360
  • Key decision: Star schema (denormalized) vs snowflake (normalized dimensions)

Common Patterns

Pattern: Entity Lifecycle Modeling Track entity state changes explicitly. Example: Order (draft → pending → confirmed → shipped → delivered → completed/cancelled). Include status field, timestamps for each state, and transitions table if history needed.

Pattern: Soft Deletes Never physically delete records - add deletedAt timestamp. Allows data recovery, audit compliance, and referential integrity. Filter WHERE deletedAt IS NULL in queries.

Pattern: Polymorphic Associations Entity relates to multiple types. Example: Comment can be on Post or Photo. Options: (1) separate foreign keys (commentableType + commentableId), (2) junction tables per type, (3) single table inheritance.

Pattern: Temporal/Historical Data Track changes over time. Options: (1) Effective/expiry dates per record, (2) separate history table, (3) event sourcing (store all changes as events). Choose based on query patterns.

Pattern: Multi-tenancy Isolate data per customer. Options: (1) Separate databases (strong isolation), (2) Shared schema with tenantId column (efficient), (3) Separate schemas in same DB (balance). Add tenantId to all queries if shared.

Pattern: Hierarchies Model trees/nested structures. Options: (1) Adjacency list (parentId), (2) Nested sets (left/right values), (3) Path enumeration (materialized path), (4) Closure table (all ancestor-descendant pairs). Trade-offs between read/write performance.

Guardrails

✓ Do:

  • Start with use cases - schema serves queries/operations
  • Normalize first, then denormalize for specific performance needs
  • Document all constraints and invariants explicitly
  • Use meaningful, consistent naming conventions
  • Consider future evolution - design for extensibility
  • Validate model against ALL required use cases
  • Model the real world accurately (don't force fit to technology)

✗ Don't:

  • Design schema in isolation from use cases
  • Premature optimization (denormalize before measuring)
  • Skip constraint definitions (leads to data corruption)
  • Use generic names (data, value, thing) - be specific
  • Ignore cardinality and nullability
  • Model implementation details in domain entities
  • Forget about data migration path from existing systems
  • Create circular dependencies between entities

Quick Reference

Resources:

  • resources/template.md - Structured process for entity identification, relationship mapping, and constraint definition
  • resources/methodology.md - Advanced patterns: temporal modeling, graph ontologies, schema evolution, normalization strategies
  • resources/examples/ - Worked examples showing complete schema designs with validation
  • resources/evaluators/rubric_data_schema_knowledge_modeling.json - Quality assessment before delivery

When to choose which resource:

  • Simple domain (< 10 entities) → Start with template
  • Complex domain or graph/ontology → Study methodology for advanced patterns
  • Need to see examples → Review examples folder
  • Before delivering to user → Always validate with rubric

Expected deliverable: data-schema-knowledge-modeling.md file containing: domain description, complete entity definitions with attributes and types, relationship mappings with cardinality, constraint specifications, diagram (ERD/graph visualization), validation against use cases, and implementation notes.

Common schema notations:

  • ERD (Entity-Relationship Diagram): Visual representation of entities and relationships
  • UML Class Diagram: Object-oriented view with inheritance and associations
  • Graph Diagram: Nodes and edges for graph databases
  • JSON Schema: API/document structure with validation rules
  • SQL DDL: Executable CREATE TABLE statements
  • Ontology (OWL/RDF): Semantic web knowledge representation
Weekly Installs
84
Repository
lyndonkl/claude
GitHub Stars
85
First Seen
Today