Documentation Drift Detector

The agent detects documentation drift by mapping code directories to their docs, comparing git modification histories, extracting Python function signatures via AST, validating every markdown link and anchor, and scoring freshness on a weighted 0-100 scale. All four CLI tools use the Python standard library only.

Quick Start

# 1. Run full drift analysis on a repository
python scripts/drift_analyzer.py /path/to/repo

# 2. Score documentation freshness
python scripts/doc_staleness_scorer.py /path/to/repo

# 3. Validate API docs against Python source
python scripts/api_doc_validator.py /path/to/repo/src /path/to/repo/docs/api.md

# 4. Check all markdown links
python scripts/link_checker.py /path/to/repo

# JSON output for any tool
python scripts/drift_analyzer.py /path/to/repo --json

# Set failure threshold for CI
python scripts/doc_staleness_scorer.py /path/to/repo --threshold 60

All tools support --help for full usage details.

Core Workflows

Workflow 1: Full Drift Analysis

Scan all documentation against code changes since each doc was last updated. This is the primary entry point for understanding the overall drift state of a repository.

# Basic analysis
python scripts/drift_analyzer.py /path/to/repo

# Analyze with custom doc patterns
python scripts/drift_analyzer.py /path/to/repo --doc-patterns "*.md,*.rst,*.txt"

# JSON output for tooling
python scripts/drift_analyzer.py /path/to/repo --json

# Only show high-severity drift
python scripts/drift_analyzer.py /path/to/repo --min-severity high

# Analyze specific directory
python scripts/drift_analyzer.py /path/to/repo --scope src/

What it does:

Discovers all documentation files in the repo
For each doc, identifies the code directories it describes (via path proximity and content references)
Compares the doc's last-modified date against the git history of its associated code
Identifies specific changes (renamed files, moved directories, changed function signatures)
Classifies each drift instance by category and severity
Generates an actionable report with specific file:line references

Output example:

Documentation Drift Report
==========================
Repository: /path/to/repo
Scan date:  2026-03-18
Docs found: 12
Drifted:    5

HIGH SEVERITY:
  docs/api.md (last updated: 2026-01-15)
    - 23 code files changed since doc update
    - 4 functions renamed in src/handlers/
    - 2 new modules undocumented
    Category: Factual + Structural
    Recommendation: Manual update required

MEDIUM SEVERITY:
  README.md (last updated: 2026-02-28)
    - Installation section references removed dependency
    - Version string outdated (says 1.8.0, current 2.0.0)
    Category: Factual + Temporal
    Recommendation: Auto-fixable (version), Manual (installation)

Workflow 2: API Documentation Validation

Check that API documentation accurately reflects the actual function signatures, class definitions, and module structure in your Python source code.

# Validate API docs against source
python scripts/api_doc_validator.py /path/to/src /path/to/docs/api.md

# Scan entire docs directory
python scripts/api_doc_validator.py /path/to/src /path/to/docs/ --recursive

# JSON output
python scripts/api_doc_validator.py /path/to/src /path/to/docs/api.md --json

# Include private methods in validation
python scripts/api_doc_validator.py /path/to/src /path/to/docs/ --include-private

What it detects:

Functions/classes present in code but missing from docs
Functions/classes documented but no longer in code (removed or renamed)
Parameter mismatches (missing params, wrong types, wrong defaults)
Deprecated items still documented as current
Return type mismatches
Module-level docstring drift

How it works:

The tool uses Python's ast module to parse source files and extract function signatures, class definitions, decorators, and docstrings. It then parses the markdown documentation looking for function/class references, parameter lists, and code blocks. Mismatches are reported with exact locations in both source and documentation.

Workflow 3: README Health Check

Validate README sections against the actual project state. This combines drift analysis, link checking, and completeness scoring into a single README-focused report.

# Check README health
python scripts/doc_staleness_scorer.py /path/to/repo --readme-focus

# Check with custom sections
python scripts/doc_staleness_scorer.py /path/to/repo --required-sections "Installation,Usage,API,Contributing,License"

Validates:

Required sections are present (Installation, Usage, API Reference, Contributing, License)
Version strings match package version (package.json, setup.py, pyproject.toml)
File references in README actually exist
Badge URLs are well-formed
Code examples reference existing files/functions
Table of contents matches actual headings

Workflow 4: Link Integrity Audit

Check every link in every markdown file -- local file references, anchors, cross-document links, and optionally external URLs.

# Check all markdown links
python scripts/link_checker.py /path/to/repo

# Include external URL checks (slower, makes HTTP requests)
python scripts/link_checker.py /path/to/repo --check-external

# Check specific file
python scripts/link_checker.py /path/to/repo/README.md

# JSON output
python scripts/link_checker.py /path/to/repo --json

# Only show broken links
python scripts/link_checker.py /path/to/repo --broken-only

What it checks:

Local file references ([link](path/to/file.md)) -- does the file exist?
Anchor references ([link](#section-name)) -- does the heading exist?
Cross-document anchors ([link](other.md#section)) -- does the file and heading exist?
Relative path correctness (catches ../ errors)
Case sensitivity issues (common on Linux but silent on macOS)
Image references -- do referenced images exist?
Duplicate anchors that would cause ambiguous links

Workflow 5: Continuous Doc Monitoring

Integrate documentation drift detection into your CI/CD pipeline for ongoing monitoring.

GitHub Actions example:

name: Documentation Drift Check
on:
  pull_request:
    branches: [main, dev]
  push:
    branches: [main]

jobs:
  doc-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for git log analysis

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Run drift analysis
        run: python engineering/doc-drift-detector/scripts/drift_analyzer.py . --json > drift-report.json

      - name: Check staleness score
        run: python engineering/doc-drift-detector/scripts/doc_staleness_scorer.py . --threshold 50

      - name: Validate API docs
        run: python engineering/doc-drift-detector/scripts/api_doc_validator.py src/ docs/api.md

      - name: Check links
        run: python engineering/doc-drift-detector/scripts/link_checker.py .

      - name: Upload drift report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: drift-report
          path: drift-report.json

Pre-commit hook:

#!/bin/bash
# .git/hooks/pre-commit
# Fail commit if docs are severely stale
python engineering/doc-drift-detector/scripts/doc_staleness_scorer.py . --threshold 30 --quiet
if [ $? -ne 0 ]; then
    echo "Documentation is critically stale. Update docs before committing."
    exit 1
fi

Tools

Tool	Purpose	Lines	Key Feature
`drift_analyzer.py`	Full drift analysis between code and docs	~550	Git history comparison with code-to-doc mapping
`doc_staleness_scorer.py`	Score documentation freshness 0-100	~450	Weighted multi-dimensional scoring
`api_doc_validator.py`	Validate API docs against Python source	~400	AST-based signature extraction and comparison
`link_checker.py`	Audit all markdown links and anchors	~400	Local file, anchor, and cross-document validation

All tools:

Python 3.8+ standard library only
Support --json for machine-readable output
Support --help for usage details
Use non-zero exit codes on failure (CI/CD compatible)
Work on any OS (Windows, macOS, Linux)

Staleness Scoring

Documentation freshness is scored on a 0-100 scale where 100 = perfectly current. The score is a weighted combination of five dimensions:

Dimension	Weight	What It Measures
Last Updated	20%	How recently the doc file was modified relative to its associated code
Code-Doc Alignment	30%	Whether documented items (functions, classes, files) still exist and match
Link Health	15%	Percentage of links that resolve correctly
Completeness	20%	Whether expected sections are present and non-empty
Accuracy	15%	Whether version strings, file paths, and other verifiable facts are correct

Score interpretation:

Score	Label	Action
90-100	Excellent	No action needed
70-89	Good	Minor updates recommended
50-69	Stale	Updates needed before next release
30-49	Critical	Immediate attention required
0-29	Abandoned	Full rewrite likely needed

Customization:

# Override default weights
python scripts/doc_staleness_scorer.py /path/to/repo \
  --weight-updated 0.25 \
  --weight-alignment 0.25 \
  --weight-links 0.15 \
  --weight-completeness 0.20 \
  --weight-accuracy 0.15

# Set staleness thresholds
python scripts/doc_staleness_scorer.py /path/to/repo --threshold 60

Drift Categories

Every detected drift instance is classified into one or more categories:

Structural Drift

Missing or misorganized sections. A README lacks an Installation section. An API doc is missing an entire module. A CHANGELOG has no entries for the latest version.

Detection: Compare actual document headings against expected headings for that document type.

Factual Drift

Incorrect information. A function signature in the docs has the wrong parameters. An installation command references a removed package. A configuration example uses deprecated options.

Detection: Cross-reference documented facts against code analysis (AST parsing, file existence, git tags).

Referential Drift

Broken references. A link points to a file that was moved. An anchor references a heading that was renamed. An image path is wrong.

Detection: Link checker validates every reference against the filesystem and document structure.

Temporal Drift

Outdated time-sensitive content. Version strings are old. "Last updated" dates are stale. "Coming soon" items that shipped months ago. Roadmap items past their target date.

Detection: Extract version strings and dates, compare against git tags, package manifests, and current date.

Semantic Drift

Technically accurate but misleading. A description says "simple REST API" when the project now has GraphQL, gRPC, and WebSocket endpoints. The architecture overview omits a major new subsystem.

Detection: Compare document topic coverage against code directory structure and file counts. Flag when code complexity has grown significantly but documentation scope has not.

Auto-Fix vs Manual-Fix Classification

Not all drift can be fixed programmatically. The tools classify each issue:

Auto-Fixable (safe to automate)

Version string updates -- replace old version with current from package manifest
Date updates -- update "last modified" timestamps
Broken local links -- suggest correct path when file was moved (git log tracks renames)
Missing table of contents entries -- generate from actual headings
Removed file references -- flag for deletion or suggest replacement

Manual-Fix Required (needs human judgment)

Architectural description changes -- requires understanding intent
API usage examples -- new examples need domain context
Migration guides -- require understanding of breaking changes
Getting started rewrites -- narrative flow needs human touch
Security documentation updates -- compliance implications require review

Semi-Automated (template + human review)

New function documentation -- generate skeleton from AST, human fills description
Changelog entries -- generate from git commits, human edits for clarity
README section additions -- provide template, human adds content

The drift report marks each issue with [AUTO], [MANUAL], or [SEMI] tags.

Integration Points

With CI/CD Pipelines

All tools return non-zero exit codes when issues are found:

Exit 0: No issues (or all within threshold)
Exit 1: Issues found exceeding threshold
Exit 2: Tool error (invalid arguments, missing files)

With Code Review

Add drift analysis to PR checks. When a PR modifies code in src/, automatically check whether docs in docs/ need updates. The drift analyzer can scope its analysis to only changed directories.

With Documentation Generators

Pair with tools like Sphinx, MkDocs, or mdBook. Run API validation after doc generation to ensure the generated docs match source. Run link checker on the built output.

With Release Processes

Add staleness scoring to release checklists. Block releases if documentation score falls below threshold. Generate drift reports as release artifacts.

With Other Skills

code-reviewer -- include doc drift in PR review reports
senior-devops -- integrate into deployment pipelines
senior-qa -- documentation quality as part of QA checklist

Reference Guides

Guide	Description
Documentation Standards	README structure, API docs, changelogs, ADRs, docs-as-code
Drift Prevention Guide	Coupling strategies, CI gates, review checklists, prevention patterns

Assets

Asset	Description
Drift Report Template	Template for drift analysis reports
Sample Drift Data	Sample JSON for testing and demonstration

Anti-Patterns

Ignoring drift until release -- run drift analysis in CI on every PR, not as a release-day scramble
Treating all drift as equal -- factual drift (wrong function signatures) is critical; temporal drift (stale dates) is cosmetic; prioritize by category
Manual-only doc updates -- use [AUTO] fixes for version strings and broken links; reserve human effort for semantic and architectural drift
Shallow clone in CI -- fetch-depth: 1 breaks git history comparison; always use fetch-depth: 0 for drift analysis
Skipping link checks on internal docs -- cross-document anchor references break silently on refactors; run link_checker.py on every markdown change

Troubleshooting

Problem	Cause	Solution
`drift_analyzer.py` reports zero docs found	Repository has non-standard doc extensions or docs are in ignored directories (e.g., `node_modules`, `dist`)	Use `--doc-patterns ".md,.rst,*.txt"` to explicitly specify extensions
Staleness scores are unexpectedly low	Docs reference files that were reorganized or moved to new directories	Run `link_checker.py` first to identify broken references, fix them, then re-score
API validator finds no source signatures	Source path points to a non-Python directory or all functions are `_`-prefixed private	Verify `source_path` contains `.py` files; add `--include-private` if the API surface uses private names
Link checker flags valid anchors as broken	Heading text contains special characters, inline code, or emoji that alter the slug	Compare the expected slug (lowercase, special chars stripped, spaces to hyphens) against the actual heading text
Git history comparison shows no changes	Shallow clone lacks full commit history (common in CI)	Clone with `fetch-depth: 0` or pass `--scope` to narrow the analysis window
External URL checks hang or time out	Target servers are slow or block automated HEAD requests	Omit `--check-external` for local-only validation, or run external checks in a separate non-blocking job
Drift report marks everything as `[MANUAL]`	Most detected drift is semantic or architectural, not auto-fixable	This is expected for large refactors; focus on `[AUTO]` and `[SEMI]` items first, then triage `[MANUAL]` items by severity

Success Criteria

Zero stale docs older than 90 days -- every documentation file has been updated within the last 90 days relative to its associated code changes
Aggregate staleness score above 80/100 -- the repository-wide freshness score stays in the "Good" or "Excellent" range
Link integrity above 99% -- fewer than 1% of internal links (file references, anchors, cross-document links) are broken
API doc coverage above 95% -- at least 95% of public functions and classes have corresponding entries in API documentation
Zero high-severity drift issues in CI -- pull requests with high or critical drift are blocked before merge
Version string accuracy at 100% -- every version reference in documentation matches the current release tag or package manifest
Drift report turnaround under 60 seconds -- full drift analysis completes in under one minute for repositories with up to 500 documentation files

Scope & Limitations

Covers:

Detection of documentation drift against git history for any git repository
AST-based validation of Python API documentation (function signatures, class definitions, parameters, return types)
Internal link validation including local files, markdown anchors, cross-document anchors, images, and case-sensitivity checks
Multi-dimensional staleness scoring with configurable weights and CI/CD threshold enforcement

Does NOT cover:

Non-Python source code API validation -- the AST-based validator only parses Python; for TypeScript, Go, Rust, or Java APIs, use language-specific doc generators and pair with the link checker
External URL uptime monitoring -- --check-external performs one-shot HEAD requests but does not provide continuous monitoring; use the senior-devops skill for uptime dashboards
Automatic documentation rewriting -- tools classify issues as [AUTO], [SEMI], or [MANUAL] but do not generate replacement text; use the code-reviewer skill for AI-assisted doc suggestions
Content quality or readability assessment -- staleness scoring measures freshness and structural completeness, not prose quality; see the standards/communication library for writing guidelines

Integration Points

Skill	Integration	Data Flow
code-reviewer	Include drift report in PR review comments	`drift_analyzer.py --json` output feeds into review checklists as a documentation health section
senior-devops	Add staleness gate to CI/CD pipelines	`doc_staleness_scorer.py --threshold 50` returns exit code 1 on failure, blocking deploys
senior-qa	Documentation quality as part of QA acceptance	`link_checker.py --json` output merges into QA dashboards alongside test coverage metrics
senior-fullstack	Validate generated project docs post-scaffold	Run `api_doc_validator.py` against scaffolded `docs/` directory to confirm generated API docs match source
senior-secops	Audit security documentation currency	`drift_analyzer.py --scope security/` detects when security docs fall behind policy changes
senior-architect	Architecture decision record (ADR) freshness	`doc_staleness_scorer.py --required-sections "Status,Context,Decision,Consequences"` validates ADR completeness

Tool Reference

drift_analyzer.py

Purpose: Scan a git repository for documentation that has fallen out of sync with code. Maps documentation files to their associated code directories, compares git modification dates, detects renamed files, version string drift, broken references, and structural gaps. Classifies every issue by category, severity, and fix type.

Usage:

python scripts/drift_analyzer.py <repo_path> [options]

Parameters:

Flag	Type	Default	Description
`repo_path`	positional	(required)	Path to the git repository to analyze
`--json`	flag	off	Output the full drift report as JSON
`--min-severity`	choice	`low`	Minimum severity to include in report. Choices: `critical`, `high`, `medium`, `low`, `info`
`--scope`	string	`""` (all)	Limit code analysis to a subdirectory (e.g., `src/`)
`--doc-patterns`	string	`.md,.rst,.txt,.adoc`	Comma-separated file patterns for documentation discovery

Example:

python scripts/drift_analyzer.py /path/to/repo --min-severity medium --scope src/ --json

Output Formats:

Human-readable (default): Grouped by severity with [AUTO]/[SEMI]/[MANUAL] fix-type tags, category labels, and a fix-type summary
JSON (--json): Structured object with repository, scan_date, summary (counts by severity, category, fix type), and issues array

Exit Codes: 0 = no high/critical issues, 1 = high or critical issues found, 2 = tool error (invalid path, not a git repo)

doc_staleness_scorer.py

Purpose: Score documentation freshness on a weighted 0-100 scale across five dimensions: last updated, code-doc alignment, link health, completeness, and accuracy. Supports CI/CD threshold gates and README-focused analysis.

Usage:

python scripts/doc_staleness_scorer.py <repo_path> [options]

Parameters:

Flag	Type	Default	Description
`repo_path`	positional	(required)	Path to the git repository to score
`--json`	flag	off	Output the full scoring report as JSON
`--threshold`	float	(none)	Fail with exit code 1 if aggregate score falls below this value
`--readme-focus`	flag	off	Only score README files (filenames starting with `readme`)
`--required-sections`	string	`Installation,Usage,API,Contributing,License`	Comma-separated section names for completeness scoring
`--quiet`	flag	off	Only print the aggregate score number (no report)
`--weight-updated`	float	`0.20`	Weight for the "last updated" dimension
`--weight-alignment`	float	`0.30`	Weight for the "code-doc alignment" dimension
`--weight-links`	float	`0.15`	Weight for the "link health" dimension
`--weight-completeness`	float	`0.20`	Weight for the "completeness" dimension
`--weight-accuracy`	float	`0.15`	Weight for the "accuracy" dimension

Example:

python scripts/doc_staleness_scorer.py /path/to/repo --threshold 60 --readme-focus --quiet

Output Formats:

Human-readable (default): Aggregate score with label, per-file score table sorted worst-first, and dimension breakdown with ASCII bars for the bottom 5 files
JSON (--json): Structured object with aggregate_score, aggregate_label, total_documents, and documents array (each with total_score, label, and per-dimension scores/details)
Quiet (--quiet): Single line with the aggregate score (e.g., 72.3)

Exit Codes: 0 = score above threshold (or no threshold set), 1 = score below threshold, 2 = tool error

api_doc_validator.py

Purpose: Extract function and class signatures from Python source files using the ast module and compare them against API documentation in markdown files. Detects undocumented items, phantom documentation for removed code, parameter mismatches, and deprecated items.

Usage:

python scripts/api_doc_validator.py <source_path> <doc_path> [options]

Parameters:

Flag	Type	Default	Description
`source_path`	positional	(required)	Path to a Python source file or directory
`doc_path`	positional	(required)	Path to API documentation file (`.md`) or directory
`--json`	flag	off	Output the validation report as JSON
`--recursive`	flag	off	Recursively scan the doc directory for markdown files
`--include-private`	flag	off	Include `_`-prefixed private functions and classes in validation

Example:

python scripts/api_doc_validator.py /path/to/src /path/to/docs/ --recursive --include-private --json

Output Formats:

Human-readable (default): Summary counts (source signatures, documented items, issues), then issues grouped by severity with type tags, source/doc file locations, and a summary-by-type table
JSON (--json): Structured object with summary (counts by type and severity) and issues array (each with type, severity, name, file/line references, and description)

Exit Codes: 0 = no high-severity issues, 1 = high-severity issues found (e.g., documented items missing from source), 2 = tool error

link_checker.py

Purpose: Scan markdown files for every link type (local files, anchors, cross-document anchors, images, HTML links, reference-style links) and validate them against the filesystem and document headings. Optionally validates external URLs via HTTP HEAD requests. Also detects duplicate heading anchors.

Usage:

python scripts/link_checker.py <path> [options]

Parameters:

Flag	Type	Default	Description
`path`	positional	(required)	File or directory to check (single `.md` file or directory for recursive scan)
`--json`	flag	off	Output the link check report as JSON
`--broken-only`	flag	off	Only show broken links in the report (omit valid links from output)
`--check-external`	flag	off	Also validate external URLs via HTTP HEAD requests (slower, makes network requests)

Example:

python scripts/link_checker.py /path/to/repo --broken-only --json

Output Formats:

Human-readable (default): Summary counts (total, valid, broken, skipped, duplicate anchors), broken links grouped by source file with line numbers and error messages, duplicate anchor list, and link-type breakdown table
JSON (--json): Structured object with summary (counts), broken_links array (each with source file, line, text, target, type, error), duplicate_anchors map, and optionally all_links (when --broken-only is not set)

Exit Codes: 0 = no broken links and no duplicate anchors, 1 = broken links or duplicate anchors found, 2 = tool error

Last Updated: 2026-03-18 Version: 2.0.0 Tools: 4 Python CLI tools, 0 external dependencies Compatibility: Python 3.8+, any OS, any git repository