document-scanning

SKILL.md

Document Scanning

Supported File Types

Extension Type Sub-Agent
.docx Word document word-accessibility
.xlsx Excel workbook excel-accessibility
.pptx PowerPoint presentation powerpoint-accessibility
.pdf PDF document pdf-accessibility

File Discovery Commands

PowerShell (Windows)

# Non-recursive scan
Get-ChildItem -Path "<folder>" -File -Include *.docx,*.xlsx,*.pptx,*.pdf

# Recursive scan
Get-ChildItem -Path "<folder>" -File -Include *.docx,*.xlsx,*.pptx,*.pdf -Recurse |
  Where-Object { $_.Name -notlike '~$*' -and $_.Name -notlike '*.tmp' -and $_.Name -notlike '*.bak' } |
  Where-Object { $_.FullName -notmatch '[\\/](\.git|node_modules|__pycache__|\.vscode)[\\/]' }

Bash (macOS/Linux)

# Non-recursive scan
find "<folder>" -maxdepth 1 -type f \( -name "*.docx" -o -name "*.xlsx" -o -name "*.pptx" -o -name "*.pdf" \) ! -name "~\$*"

# Recursive scan
find "<folder>" -type f \( -name "*.docx" -o -name "*.xlsx" -o -name "*.pptx" -o -name "*.pdf" \) \
  ! -name "~\$*" ! -name "*.tmp" ! -name "*.bak" \
  ! -path "*/.git/*" ! -path "*/node_modules/*" ! -path "*/__pycache__/*" ! -path "*/.vscode/*"

Delta Detection

Git-based

# Files changed since last commit
git diff --name-only HEAD~1 HEAD -- '*.docx' '*.xlsx' '*.pptx' '*.pdf'

# Files changed since a specific tag
git diff --name-only <tag> HEAD -- '*.docx' '*.xlsx' '*.pptx' '*.pdf'

# Files changed in the last N days
git log --since="N days ago" --name-only --diff-filter=ACMR --pretty="" -- '*.docx' '*.xlsx' '*.pptx' '*.pdf' | sort -u

Timestamp-based (PowerShell)

# Files modified since a specific date
Get-ChildItem -Path "<folder>" -File -Include *.docx,*.xlsx,*.pptx,*.pdf -Recurse |
  Where-Object { $_.LastWriteTime -gt [datetime]"2025-01-01" }

Files to Skip

Always exclude these patterns during scanning:

  • ~$* - Office lock/temp files (created when a document is open)
  • *.tmp - Temporary files
  • *.bak - Backup files
  • Files inside .git/, node_modules/, .vscode/, __pycache__/ directories

Scan Configuration Files

File Purpose
.a11y-office-config.json Rule enable/disable for Word, Excel, PowerPoint
.a11y-pdf-config.json Rule enable/disable for PDF scanning

Scan Profiles

Profile Rules Severities Use Case
Strict All Error, Warning, Tip Public-facing, legally required documents
Moderate All Error, Warning Most organizations
Minimal All Error only Triaging large document libraries

Context Passing Format

When delegating to a sub-agent, always provide this context block:

## Document Scan Context
- **File:** [full path]
- **Scan Profile:** [strict | moderate | minimal]
- **Severity Filter:** [error, warning, tip]
- **Disabled Rules:** [list or "none"]
- **User Notes:** [any specifics]
- **Part of Batch:** [yes/no - if yes, indicate X of Y]
Weekly Installs
20
GitHub Stars
185
First Seen
Feb 24, 2026
Installed on
opencode20
gemini-cli20
github-copilot20
codex20
kimi-cli20
amp20