enhance-docs
enhance-docs
Analyze documentation for readability, structure, and RAG optimization.
Parse Arguments
const args = '$ARGUMENTS'.split(' ').filter(Boolean);
const targetPath = args.find(a => !a.startsWith('--')) || '.';
const fix = args.includes('--fix');
const aiMode = args.includes('--ai');
Documentation Locations
| Type | Location | Purpose |
|---|---|---|
| User docs | docs/*.md, README.md |
Human-readable guides |
| Agent docs | agent-docs/*.md |
AI reference material |
| Project memory | CLAUDE.md, AGENTS.md |
AI context/instructions |
Optimization Modes
AI-Only Mode (--ai)
For agent-docs and RAG-optimized documentation:
- Aggressive token reduction
- Dense information packing
- Self-contained sections for retrieval
- Optimal chunking boundaries
Both Mode (--both, default)
For user-facing documentation:
- Balance readability with AI-friendliness
- Clear structure for both humans and retrievers
Workflow
- Discover - Find all .md files
- Parse - Extract structure and content
- Check - Run pattern checks based on mode
- Report - Generate markdown output
- Fix - Apply auto-fixes if --fix
Detection Patterns
1. Link Validation (HIGH)
- Broken anchor links (
[text](#missing-anchor)) - Links to non-existent files
- Malformed link syntax
2. Structure Validation (HIGH)
Heading hierarchy:
- No jumps (H1 → H3 without H2)
- Single H1 per document
- Code blocks with language tags
Position-aware content (based on "lost in the middle" research):
- Critical info at START or END of document
- Supporting details in MIDDLE
- Flag important content buried in middle sections
Recommended structure:
1. Overview/Purpose (START - high attention)
2. Quick Start / TL;DR
3. Detailed Content
4. Reference / API
5. Summary / Key Points (END - high attention)
3. Token Efficiency (HIGH - AI Mode)
Token estimation: characters / 4 or words * 1.3
Unnecessary prose:
- "In this document..."
- "As you can see..."
- "Let's explore..."
- "It's important to note that..."
Verbose phrases:
| Verbose | Concise |
|---|---|
| "in order to" | "to" |
| "due to the fact that" | "because" |
| "has the ability to" | "can" |
| "at this point in time" | "now" |
| "for the purpose of" | "for" |
| "in the event that" | "if" |
Target: ~1500 tokens for project memory files, flexible for reference docs.
4. RAG Optimization (MEDIUM - AI Mode)
Chunk size guidelines:
| Size | Issue |
|---|---|
| >1000 tokens | Too long, split into subtopics |
| <50 tokens | Too short, merge with related content |
| 200-500 tokens | Optimal for retrieval |
Semantic boundaries:
- Single topic per section
- Self-contained sections (avoid "It", "This" at section start)
- Clear section titles that describe content
Context anchors:
# Bad - ambiguous start
## Configuration
It requires several settings...
# Good - self-contained
## Configuration
The plugin configuration requires several settings...
5. Information Density (MEDIUM - AI Mode)
Prefer tables over prose:
# Bad - verbose
The function accepts a path parameter which is required,
a limit parameter which defaults to 10, and an optional
format parameter.
# Good - dense
| Param | Required | Default | Description |
|-------|----------|---------|-------------|
| path | Yes | - | File path |
| limit | No | 10 | Max results |
| format | No | json | Output format |
Prefer lists over paragraphs for sequential items.
Use code blocks for examples, commands, configurations.
6. Cross-Reference Quality (MEDIUM)
- Internal links should use relative paths
- External links should be stable (avoid commit hashes)
- Reference sections should point to canonical sources
7. Balance Suggestions (MEDIUM - Both Mode)
- Missing section headers in long content (>500 words without heading)
- Important information buried late in document
- Missing TL;DR or summary for long documents
Auto-Fixes
| Issue | Fix |
|---|---|
| Inconsistent headings | H1 → H3 becomes H1 → H2 |
| Verbose phrases | Replace with concise alternatives |
| Missing code language | Add based on content detection |
Output Format
## Documentation Analysis: {name}
**File**: {path}
**Mode**: {AI-only | Both}
**Tokens**: ~{count}
| Certainty | Count |
|-----------|-------|
| HIGH | {n} |
| MEDIUM | {n} |
### Link Issues
| Line | Issue | Fix | Certainty |
### Structure Issues
| Line | Issue | Fix | Certainty |
### Efficiency Issues [AI mode]
| Line | Issue | Fix | Certainty |
### RAG Issues [AI mode]
| Line | Issue | Fix | Certainty |
Pattern Statistics
| Category | Patterns | Mode | Certainty |
|---|---|---|---|
| Links | 3 | shared | HIGH |
| Structure | 4 | shared | HIGH |
| Token Efficiency | 3 | ai | HIGH |
| RAG Optimization | 3 | ai | MEDIUM |
| Information Density | 2 | ai | MEDIUM |
| Cross-Reference | 2 | shared | MEDIUM |
| Balance | 3 | both | MEDIUM |
| Total | 20 | - | - |
RAG Chunking
<bad_example>
## Installation
[2000+ tokens of mixed content covering install, config, and usage]
</bad_example> <good_example>
## Installation
[400 tokens - installation only]
## Configuration
[300 tokens - config only]
## Usage
[400 tokens - usage only]
</good_example>
Position-Aware Content
<bad_example>
## Introduction
[Long background...]
## History
[More context...]
## Critical Setup Steps
[Important info buried in middle]
</bad_example> <good_example>
## Quick Start (Critical)
[Important setup steps at START]
## Background
[Supporting context in middle]
## Reference
[Details...]
## Key Reminders
[Critical points repeated at END]
</good_example>
Tables vs Prose
<bad_example>
The API accepts three parameters. The first is `query` which is required.
The second is `limit` which defaults to 10. The third is `format`.
</bad_example> <good_example>
| Param | Required | Default |
|-------|----------|---------|
| query | Yes | - |
| limit | No | 10 |
| format | No | json |
</good_example>
References
agent-docs/CONTEXT-OPTIMIZATION-REFERENCE.md- Token budgeting, position awareness, chunkingagent-docs/PROMPT-ENGINEERING-REFERENCE.md- Structure, information density
Constraints
- Auto-fix only HIGH certainty issues
- Preserve original tone and style
- Balance AI optimization with human readability (default mode)
- Don't remove content, only restructure or condense