file-schema-analysis
File Schema Analysis Expert
⚠️ MANDATORY COMPLIANCE ⚠️
CRITICAL: The 5-step workflow outlined in this document MUST be followed in exact order for EVERY schema analysis. Skipping steps or deviating from the procedure will result in incomplete and unreliable analysis. This is non-negotiable.
File Structure
- SKILL.md (this file): Main instructions and MANDATORY workflow
- examples.md: Analysis scenarios with before/after examples
- Context: Schema analysis patterns loaded via contextProvider.getDomainIndex("schema"). See ContextProvider Interface.
- Memory: Project-specific memory accessed via memoryStore.getSkillMemory("file-schema-analysis", "{project-name}"). See MemoryStore Interface.
- templates/: analysis_report.md, schema_visualization.md
- scripts/: Helper utilities for schema extraction
Interface References
- Context: Loaded via ContextProvider Interface
- Memory: Accessed via MemoryStore Interface
- Schemas: Validated against context_metadata.schema.json and memory_entry.schema.json
Analysis Focus Areas
File schema analysis evaluates 7 critical dimensions:
- Structure Detection: Format identification, field hierarchy, nesting levels
- Type System: Data types, constraints, validation rules, formats
- Relationships: References, dependencies, composition patterns
- Validation: Required vs optional fields, constraints, patterns
- Documentation: Inline comments, descriptions, examples
- Evolution: Version tracking, compatibility analysis, breaking changes
- Quality: Anti-patterns, best practices, performance implications
MANDATORY WORKFLOW (MUST FOLLOW EXACTLY)
⚠️ STEP 1: Identify Target Files (REQUIRED)
YOU MUST:
- Ask the user which files to analyze:
  - Specific file paths (e.g., schema/user.json, api.proto)
  - Directory patterns (e.g., schemas/*.json, **/*.proto)
  - File format types (JSON Schema, Protobuf, GraphQL, etc.)
- If not specified, scan the current directory for common schema files:
  - JSON Schema: *.schema.json, schemas/*.json
  - Protobuf: *.proto
  - GraphQL: *.graphql, *.gql, schema.graphql
  - OpenAPI: openapi.yaml, swagger.json, api.yaml
  - Avro: *.avsc
  - XML Schema: *.xsd
- Verify files exist and are readable
- Identify file format for each file (by extension and content inspection)
DO NOT PROCEED WITHOUT IDENTIFYING TARGET FILES
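The default scan in Step 1 can be sketched as a small glob-based helper. This is a minimal illustration, not part of the skill's scripts/ directory: the names `SCHEMA_PATTERNS`, `find_schema_files`, and `_is_readable` are hypothetical, and the patterns are copied from the list above.

```python
from pathlib import Path

# Glob patterns per format, copied from the Step 1 list.
SCHEMA_PATTERNS = {
    "JSON Schema": ["*.schema.json", "schemas/*.json"],
    "Protobuf": ["*.proto"],
    "GraphQL": ["*.graphql", "*.gql", "schema.graphql"],
    "OpenAPI": ["openapi.yaml", "swagger.json", "api.yaml"],
    "Avro": ["*.avsc"],
    "XML Schema": ["*.xsd"],
}


def find_schema_files(root="."):
    """Return {format: sorted list of readable files} found under root."""
    root_path = Path(root)
    found = {}
    for fmt, patterns in SCHEMA_PATTERNS.items():
        matches = set()  # a set removes files matched by two patterns
        for pattern in patterns:
            for path in root_path.glob(pattern):
                # Keep only regular files that can actually be opened.
                if path.is_file() and _is_readable(path):
                    matches.add(path)
        if matches:
            found[fmt] = sorted(matches)
    return found


def _is_readable(path):
    """Return True if the file can be opened for reading."""
    try:
        with path.open("rb"):
            return True
    except OSError:
        return False
```

Formats with no matching files are left out of the result, so an empty dictionary signals that the user must be asked for explicit paths.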
⚠️ STEP 2: Load Project Memory & Context (REQUIRED)
YOU MUST:
- CHECK PROJECT MEMORY FIRST:
  - Identify the project name from the repository root or ask the user
  - Use memoryStore.getSkillMemory("file-schema-analysis", "{project-name}") to load existing project memory. See MemoryStore Interface.
  - If memory exists, review previously analyzed schemas, patterns, and project-specific context
  - If no memory exists, you will create it later in this process
- USE CONTEXT INDEXES FOR EFFICIENT LOADING:
  - Use contextProvider.getDomainIndex("schema") to discover available schema context files. See ContextProvider Interface.
  - Based on the file formats identified in Step 1, use contextProvider.getConditionalContext("schema", detection) to load relevant files
  - Always load common_patterns.md via contextProvider.getAlwaysLoadFiles("schema")
  - If analyzing security-sensitive schemas, use contextProvider.getCrossDomainContext("schema", {"security": true})
- Ask clarifying questions in Socratic format:
  - What is the purpose of these schema files?
  - Are you planning a migration or evolution of the schema?
  - What documentation output format do you prefer?
  - Do you have any specific concerns (validation, performance, compatibility)?
  - Which target platforms or systems consume these schemas?
DO NOT PROCEED WITHOUT COMPLETING THIS STEP
⚠️ STEP 3: Read and Parse Schema Files (REQUIRED)
YOU MUST:
- READ each identified file using the view tool
- Detect format if not already determined:
  - Check file extension
  - Inspect content structure (JSON, XML, Protobuf syntax, GraphQL SDL, YAML)
  - Identify schema definition keywords ($schema, message, type, etc.)
- Extract metadata:
  - Schema version (if present)
  - Namespace/package information
  - Comments and documentation
  - Examples or default values
- Verify readability: Ensure files are valid and parseable
DO NOT PROCEED WITHOUT READING THE FILES
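Format detection by extension and then by content, as described in Step 3, might look like the sketch below. The function name `detect_format`, the `EXTENSION_FORMATS` table, and the specific content heuristics are assumptions for illustration; real files may need stricter checks.

```python
import json
from pathlib import Path

# Extensions that identify a format on their own.
EXTENSION_FORMATS = {
    ".proto": "Protobuf",
    ".graphql": "GraphQL",
    ".gql": "GraphQL",
    ".avsc": "Avro",
    ".xsd": "XML Schema",
}


def detect_format(path, text=None):
    """Guess a schema format from the extension, then from the content."""
    path = Path(path)
    by_extension = EXTENSION_FORMATS.get(path.suffix.lower())
    if by_extension:
        return by_extension
    if text is None:
        text = path.read_text(encoding="utf-8")
    stripped = text.lstrip()
    if stripped.startswith("{"):
        # JSON document: tell OpenAPI and JSON Schema apart by keywords.
        try:
            doc = json.loads(text)
        except json.JSONDecodeError:
            return "unknown"  # not parseable, so not a valid schema file
        if "openapi" in doc or "swagger" in doc:
            return "OpenAPI"
        if "$schema" in doc or "properties" in doc:
            return "JSON Schema"
        return "unknown"
    if stripped.startswith("<"):
        # XML document: look for the usual XSD root element prefixes.
        if "<xs:schema" in stripped or "<xsd:schema" in stripped:
            return "XML Schema"
        return "unknown"
    # YAML OpenAPI documents declare a top-level "openapi:" or "swagger:" key.
    for line in text.splitlines():
        if line.startswith(("openapi:", "swagger:")):
            return "OpenAPI"
    return "unknown"
```

Returning "unknown" instead of guessing keeps the later analysis steps from reporting on a file whose format was never confirmed.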
⚠️ STEP 4: Analyze Schema Structure (REQUIRED)
YOU MUST perform deep analysis covering ALL these aspects:
4.1 Structure Analysis
- Identify all entities/types/messages: Top-level definitions
- Map field hierarchy: Nested structures, composition patterns
- Detect relationships: References ($ref, foreign keys), dependencies
- Analyze cardinality: Single values, arrays, maps, repeated fields
4.2 Type System Analysis
- Extract field types: Primitives, complex types, custom types
- Identify constraints:
  - Required vs optional fields
  - Validation rules (min/max, patterns, enums)
  - Default values
  - Formats (email, date, UUID, etc.)
- Check for polymorphism: Union types, discriminators, oneOf/anyOf
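For JSON Schema input, the type and constraint extraction above can be sketched as a recursive walk that produces one row per field. This is an illustrative sketch only: `build_field_catalog` and `CONSTRAINT_KEYS` are hypothetical names, the walk follows only `properties` (not `$ref`, `oneOf`, or `anyOf`), and other formats such as Protobuf need their own parsers.

```python
# Standard JSON Schema keywords treated as constraints in this sketch.
CONSTRAINT_KEYS = (
    "minimum", "maximum", "minLength", "maxLength",
    "pattern", "enum", "format", "default",
)


def build_field_catalog(schema, prefix=""):
    """Flatten a JSON Schema object into rows of field metadata."""
    rows = []
    required = set(schema.get("required", []))
    for name, spec in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        # Copy only the constraint keywords that are actually present.
        constraints = {key: spec[key] for key in CONSTRAINT_KEYS if key in spec}
        rows.append({
            "path": path,
            "type": spec.get("type", "any"),
            "required": name in required,
            "constraints": constraints,
        })
        # Recurse into nested objects, extending the dotted path.
        if spec.get("type") == "object":
            rows.extend(build_field_catalog(spec, prefix=f"{path}."))
    return rows
```

Each row carries only what the schema states, which helps keep the field catalog free of inferred or invented attributes.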
4.3 Validation Rules
- Field-level validations: Length, range, pattern, format
- Cross-field validations: Dependencies, conditional requirements
- Business rules: Custom constraints, check constraints
- Error conditions: What makes data invalid?
4.4 Documentation Extraction
- Inline documentation: Comments, description fields
- Examples: Sample valid data
- Deprecated fields: Marked for removal
- Custom extensions: Vendor-specific additions
4.5 Evolution and Versioning
- Version identification: Current schema version
- Change history: If documented in comments or separate files
- Compatibility markers: Breaking vs non-breaking changes
- Migration notes: How to upgrade from previous versions
4.6 Quality Assessment
- Check for anti-patterns:
  - Overly deep nesting (6+ levels)
  - Ambiguous field names
  - Missing validation on critical fields
  - Inconsistent naming conventions
  - Lack of documentation
- Identify best practices:
  - Clear, descriptive names
  - Appropriate use of types and constraints
  - Good documentation coverage
  - Logical structure organization
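Two of the anti-pattern checks above, nesting depth and naming consistency, are mechanical enough to sketch in code. The helper names below are hypothetical and the sketch assumes JSON Schema input nested through `properties`; the 6-level threshold comes from the list above, while the naming classification is a simplified assumption.

```python
def max_nesting_depth(schema, depth=1):
    """Depth of the deepest object level reachable through 'properties'."""
    deepest = depth
    for spec in schema.get("properties", {}).values():
        if isinstance(spec, dict) and "properties" in spec:
            deepest = max(deepest, max_nesting_depth(spec, depth + 1))
    return deepest


def naming_style(name):
    """Classify a field name; single lowercase words fit either style."""
    if "_" in name:
        return "snake_case"
    if name != name.lower():
        return "camelCase" if name[0].islower() else "PascalCase"
    return "lowercase"


def _all_field_names(schema):
    """Yield every property name at every nesting level."""
    for name, spec in schema.get("properties", {}).items():
        yield name
        if isinstance(spec, dict):
            yield from _all_field_names(spec)


def quality_findings(schema, deep_threshold=6):
    """Return human-readable findings for depth and naming checks."""
    findings = []
    depth = max_nesting_depth(schema)
    if depth >= deep_threshold:
        findings.append(f"nesting depth {depth} reaches the {deep_threshold}-level threshold")
    # Ignore plain lowercase names, which are valid in both conventions.
    styles = {naming_style(n) for n in _all_field_names(schema)} - {"lowercase"}
    if len(styles) > 1:
        findings.append("inconsistent naming: " + ", ".join(sorted(styles)))
    return findings
```

Checks such as ambiguous names or missing validation on critical fields depend on domain knowledge and are not covered by this sketch.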
4.7 Cross-References
- External dependencies: Imported schemas, shared definitions
- Internal references: Reused types, common patterns
- Circular references: Potential issues
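Circular references among internal definitions can be found with a depth-first search over the `$ref` graph. The sketch below assumes a JSON Schema whose shared types live under a single key (`definitions` by default, `$defs` in newer drafts) and ignores references to external files; `collect_refs` and `find_reference_cycles` are hypothetical names.

```python
def collect_refs(node):
    """Yield every '$ref' string found anywhere inside a schema fragment."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from collect_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_refs(item)


def find_reference_cycles(schema, defs_key="definitions"):
    """Return cycles as lists of definition names, e.g. ['A', 'B', 'A']."""
    prefix = f"#/{defs_key}/"
    # Build an adjacency list: definition name -> names it references.
    graph = {
        name: sorted({r[len(prefix):] for r in collect_refs(body) if r.startswith(prefix)})
        for name, body in schema.get(defs_key, {}).items()
    }
    cycles = []
    state = {}  # name -> "visiting" while on the DFS stack, "done" afterwards

    def visit(name, stack):
        state[name] = "visiting"
        stack.append(name)
        for target in graph.get(name, []):
            if state.get(target) == "visiting":
                # Back edge: the cycle is the stack from target onwards.
                cycles.append(stack[stack.index(target):] + [target])
            elif target not in state:
                visit(target, stack)
        stack.pop()
        state[name] = "done"

    for name in graph:
        if name not in state:
            visit(name, [])
    return cycles
```

A self-reference such as a linked-list node is reported as a cycle too; whether that is a problem or an intended recursive type is a judgment for the report.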
USE THE TEMPLATES in the templates/ directory to structure your analysis
DO NOT PROCEED WITHOUT COMPREHENSIVE ANALYSIS
⚠️ STEP 5: Generate Analysis Report & Update Memory (REQUIRED)
YOU MUST:
- Generate comprehensive analysis report using the template from templates/analysis_report.md:
  - Executive summary
  - File inventory
  - Field catalog (all fields with types, constraints, descriptions)
  - Relationship diagram (ASCII or Mermaid)
  - Validation rules summary
  - Quality assessment
  - Recommendations for improvement
  - Version history (if available)
- Create schema visualization using templates/schema_visualization.md:
  - Entity-relationship diagram
  - Type hierarchy
  - Dependency graph
- UPDATE PROJECT MEMORY:
  - Use memoryStore.update(layer="skill-specific", skill="file-schema-analysis", project="{project-name}", ...) to store:
    - Discovered patterns
    - Schema conventions and naming patterns
    - Format preferences
    - Dependencies and relationships
    - Metadata for future reference
  - Timestamps and staleness tracking are handled automatically by MemoryStore. See MemoryStore Interface.
- Provide actionable recommendations:
  - Suggest improvements for quality issues
  - Identify missing validations
  - Recommend documentation additions
  - Highlight breaking changes if comparing versions
MEMORY UPDATE IS MANDATORY - DO NOT SKIP
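For the dependency graph requested in Step 5, one option is to emit Mermaid flowchart text from a mapping of type names to the types they reference. The function name and input shape below are assumptions for illustration, and the sketch expects type names that are already valid Mermaid node identifiers (plain alphanumeric names).

```python
def to_mermaid_dependency_graph(dependencies):
    """Render {type: [types it references]} as Mermaid flowchart text."""
    lines = ["graph TD"]
    for source in sorted(dependencies):
        targets = sorted(set(dependencies[source]))
        if not targets:
            # Emit isolated types too, so nothing disappears from the diagram.
            lines.append(f"    {source}")
        for target in targets:
            lines.append(f"    {source} --> {target}")
    return "\n".join(lines)
```

Sorting the names keeps the output stable between runs, which makes diagram changes easier to review when schemas evolve.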
Output Requirements
Analysis Report Must Include:
- Schema Inventory
  - File paths and formats
  - Version information
  - Size and complexity metrics
- Field Catalog
  - Complete list of all fields/attributes
  - Types and constraints
  - Required/optional status
  - Default values
  - Documentation/descriptions
- Visual Representations
  - Entity-relationship diagram
  - Type hierarchy diagram
  - Dependency graph
- Validation Summary
  - All validation rules
  - Business constraints
  - Format requirements
- Quality Report
  - Anti-patterns found
  - Best practices followed
  - Recommendations
- Evolution Analysis (if multiple versions)
  - Changes between versions
  - Breaking changes
  - Migration guide
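When two versions of a JSON Schema are available, the evolution analysis can start from a simple property-level diff. The sketch below is an assumption-laden illustration: `diff_schemas` is a hypothetical name, it compares only top-level `properties` and `required`, and it classifies changes from the point of view of producers that still send data shaped like the old version.

```python
def diff_schemas(old, new):
    """Compare the top-level properties of two JSON Schema versions."""
    old_props, new_props = old.get("properties", {}), new.get("properties", {})
    old_req, new_req = set(old.get("required", [])), set(new.get("required", []))
    breaking, compatible = [], []

    for name in sorted(old_props.keys() - new_props.keys()):
        breaking.append(f"removed field '{name}'")

    for name in sorted(new_props.keys() - old_props.keys()):
        if name in new_req:
            breaking.append(f"added required field '{name}'")
        else:
            compatible.append(f"added optional field '{name}'")

    for name in sorted(old_props.keys() & new_props.keys()):
        old_type = old_props[name].get("type")
        new_type = new_props[name].get("type")
        if old_type != new_type:
            breaking.append(f"field '{name}' changed type from {old_type} to {new_type}")
        if name in new_req and name not in old_req:
            breaking.append(f"field '{name}' became required")

    return {"breaking": breaking, "compatible": compatible}
```

Constraint changes (tighter `pattern`, narrower `enum`) and nested objects are not compared here and would need to be added before relying on the result for a migration guide.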
Socratic Prompting Guidelines
When interacting with users, ask clarifying questions such as:
Understanding Intent:
- "What decisions are you trying to make with this schema analysis?"
- "Are you planning to modify, migrate, or document these schemas?"
- "Do you need this for technical documentation or API contracts?"
Scope Definition:
- "Should I analyze all schema files or focus on specific ones?"
- "Are there related schemas in other systems I should be aware of?"
- "Do you need cross-version comparison?"
Output Preferences:
- "What format would you prefer for the analysis report (Markdown, JSON, HTML)?"
- "Do you need visual diagrams (Mermaid, PlantUML, ASCII)?"
- "Should I prioritize depth (complete details) or breadth (overview)?"
Context Understanding:
- "What systems or languages consume these schemas?"
- "Are there known issues or pain points with the current schema?"
- "Any compliance or regulatory requirements for the data structure?"
Quality Standards
Your analysis MUST:
- ✅ Be 100% accurate to the actual schema definition
- ✅ Cover all fields and types without omission
- ✅ Identify all validation rules and constraints
- ✅ Extract all documentation present in the files
- ✅ Detect format-specific features correctly
- ✅ Provide actionable recommendations
- ✅ Use templates for consistent output
- ✅ Update project memory for future reference
Your analysis MUST NOT:
- ❌ Hallucinate fields or types not in the schema
- ❌ Miss required or optional markers
- ❌ Ignore validation constraints
- ❌ Skip documentation extraction
- ❌ Provide generic recommendations without analysis basis
Integration with Other Skills
Combine with:
- database-schema-analysis: For schemas stored in database systems
- python-code-review: When analyzing Python Pydantic models or dataclasses
- dotnet-code-review: When analyzing C# data models
- generate-python-unit-tests: To create schema validation tests
Version History
- v1.1.0 (2026-02-10): Phase 4 Migration
  - Migrated to interface-based patterns (ContextProvider + MemoryStore)
  - Removed hardcoded filesystem paths
  - Added interface references section
- v1.0.0 (2025-02-06): Initial release
  - Support for JSON Schema, Protobuf, GraphQL, OpenAPI, Avro, XML/XSD
  - Comprehensive analysis workflow
  - Template-based reporting
  - Project memory integration
Last Updated: 2025-02-06
Maintained by: The Forge