document-scanning
Document Scanning
Supported File Types
| Extension | Type | Sub-Agent |
|---|---|---|
| .docx | Word document | word-accessibility |
| .xlsx | Excel workbook | excel-accessibility |
| .pptx | PowerPoint presentation | powerpoint-accessibility |
| PDF document | pdf-accessibility |
File Discovery Commands
PowerShell (Windows)
# Non-recursive scan
Get-ChildItem -Path "<folder>" -File -Include *.docx,*.xlsx,*.pptx,*.pdf
# Recursive scan
Get-ChildItem -Path "<folder>" -File -Include *.docx,*.xlsx,*.pptx,*.pdf -Recurse |
Where-Object { $_.Name -notlike '~$*' -and $_.Name -notlike '*.tmp' -and $_.Name -notlike '*.bak' } |
Where-Object { $_.FullName -notmatch '[\\/](\.git|node_modules|__pycache__|\.vscode)[\\/]' }
Bash (macOS)
# Non-recursive scan
find "<folder>" -maxdepth 1 -type f \( -name "*.docx" -o -name "*.xlsx" -o -name "*.pptx" -o -name "*.pdf" \) ! -name "~\$*"
# Recursive scan
find "<folder>" -type f \( -name "*.docx" -o -name "*.xlsx" -o -name "*.pptx" -o -name "*.pdf" \) \
! -name "~\$*" ! -name "*.tmp" ! -name "*.bak" \
! -path "*/.git/*" ! -path "*/node_modules/*" ! -path "*/__pycache__/*" ! -path "*/.vscode/*"
Delta Detection
Git-based
# Files changed since last commit
git diff --name-only HEAD~1 HEAD -- '*.docx' '*.xlsx' '*.pptx' '*.pdf'
# Files changed since a specific tag
git diff --name-only <tag> HEAD -- '*.docx' '*.xlsx' '*.pptx' '*.pdf'
# Files changed in the last N days
git log --since="N days ago" --name-only --diff-filter=ACMR --pretty="" -- '*.docx' '*.xlsx' '*.pptx' '*.pdf' | sort -u
Timestamp-based (PowerShell)
# Files modified since a specific date
Get-ChildItem -Path "<folder>" -File -Include *.docx,*.xlsx,*.pptx,*.pdf -Recurse |
Where-Object { $_.LastWriteTime -gt [datetime]"2025-01-01" }
Files to Skip
Always exclude these patterns during scanning:
~$*- Office lock/temp files (created when a document is open)*.tmp- Temporary files*.bak- Backup files- Files inside
.git/,node_modules/,.vscode/,__pycache__/directories
Scan Configuration Files
| File | Purpose |
|---|---|
.a11y-office-config.json |
Rule enable/disable for Word, Excel, PowerPoint |
.a11y-pdf-config.json |
Rule enable/disable for PDF scanning |
Scan Profiles
| Profile | Rules | Severities | Use Case |
|---|---|---|---|
| Strict | All | Error, Warning, Tip | Public-facing, legally required documents |
| Moderate | All | Error, Warning | Most organizations |
| Minimal | All | Error only | Triaging large document libraries |
Context Passing Format
When delegating to a sub-agent, always provide this context block:
## Document Scan Context
- **File:** [full path]
- **Scan Profile:** [strict | moderate | minimal]
- **Severity Filter:** [error, warning, tip]
- **Disabled Rules:** [list or "none"]
- **User Notes:** [any specifics]
- **Part of Batch:** [yes/no - if yes, indicate X of Y]
More from taylorarndt/a11y-agent-team
framework-accessibility
Framework-specific accessibility patterns and fix templates for React, Vue, Angular, Svelte, Next.js, and Tailwind CSS.
28github-analytics-scoring
Scoring formulas and analytical frameworks for GitHub workflow agents. Covers repository health scoring (0-100, A-F grades), priority scoring for issues/PRs/discussions, confidence levels for analytics findings, delta tracking (Fixed/New/Persistent/Regressed), velocity metrics, contributor metrics, bottleneck detection, and trend classification. Use when computing scores, tracking remediation progress, building prioritized dashboards, or detecting workflow bottlenecks.
25github-scanning
GitHub data collection patterns for workflow agents. Covers search query construction by intent, date range handling, repository scope narrowing, preferences.md integration, cross-repo intelligence, parallel stream collection model, and auto-recovery for empty results. Use when building agents that search GitHub for issues, PRs, discussions, releases, security alerts, or CI status.
22accessibility-rules
Cross-format document accessibility rule reference with WCAG 2.2 mapping. Use when looking up accessibility rules for Word (DOCX-*), Excel (XLSX-*), PowerPoint (PPTX-*), or PDF (PDFUA.*, PDFBP.*, PDFQ.*) documents, or when mapping findings to WCAG success criteria for compliance reporting.
21github-workflow-standards
Core standards for all GitHub workflow agents. Covers authentication, smart defaults, repository discovery, dual MD+HTML output, screen-reader-compliant HTML accessibility standards, safety rules, progress announcements, parallel execution, and output quality. Apply when building any GitHub workflow agent - issues, PRs, briefings, analytics, community reports, team management.
20web-scanning
Web content discovery, URL crawling, and page inventory for accessibility audits. Use when scanning web pages, crawling sites for audit scope, or building page inventories for multi-page audits.
20