skills/dalehurley/phpbot/word-documents

word-documents

SKILL.md

Word Document Processing

Overview

Comprehensive toolkit for working with Word documents (.docx). Covers the full lifecycle: analyzing, reviewing, formatting, converting, comparing, creating, and merging documents. Built for professional workflows in legal, compliance, regulatory, and corporate environments.

For advanced code examples and detailed library reference, see reference.md. For predefined formatting standards (legal, academic, corporate), see standards.md.

Quick Reference

Task Script Command
Analyze document analyze.py python3 scripts/analyze.py report.docx
List comments comments.py python3 scripts/comments.py list report.docx
Add comment comments.py python3 scripts/comments.py add report.docx -t "search text" -c "Comment"
List tracked changes track_changes.py python3 scripts/track_changes.py list report.docx
Accept all changes track_changes.py python3 scripts/track_changes.py accept report.docx -o clean.docx
Apply formatting format.py python3 scripts/format.py report.docx -s legal -o formatted.docx
Convert MD to DOCX convert.py python3 scripts/convert.py notes.md -f md -t docx -o report.docx
Convert DOCX to PDF convert.py python3 scripts/convert.py report.docx -f docx -t pdf -o report.pdf
Compare documents compare.py python3 scripts/compare.py original.docx revised.docx
Create document create.py python3 scripts/create.py -i content.md -o report.docx
Merge documents merge.py python3 scripts/merge.py doc1.docx doc2.docx -o combined.docx

When to Use

Use this skill when the user asks to:

  • Analyze a Word document (structure, metadata, styles, word count)
  • Extract or list comments from a document
  • Add review comments to specific text
  • List, accept, or reject tracked changes
  • Apply formatting standards (legal, academic, corporate, regulatory)
  • Convert between Markdown, DOCX, PDF, and HTML formats
  • Compare two versions of a document
  • Create a new Word document from text, Markdown, or a JSON specification
  • Merge multiple Word documents into one
  • Generate a redline or diff between document versions
  • Clean up formatting or normalize styles

Input Parameters

Parameter Required Description Example
file_path Yes Path to the .docx file /Users/me/Documents/contract.docx
output_path No Path for output file ./output.docx
standard No Formatting standard to apply legal, academic, corporate
format No Target format for conversion docx, md, pdf, html

Procedures

1. Analyze a Document

Extract text, metadata, structure, styles, comments, and tracked changes.

# Full analysis as JSON
python3 scripts/analyze.py "/path/to/document.docx"

# Text only
python3 scripts/analyze.py "/path/to/document.docx" --mode text

# Metadata only
python3 scripts/analyze.py "/path/to/document.docx" --mode metadata

# Comments only
python3 scripts/analyze.py "/path/to/document.docx" --mode comments

# Tracked changes only
python3 scripts/analyze.py "/path/to/document.docx" --mode changes

# Structure (headings hierarchy)
python3 scripts/analyze.py "/path/to/document.docx" --mode structure

# Save analysis to file
python3 scripts/analyze.py "/path/to/document.docx" -o analysis.json

2. Manage Comments

List, add, remove, or export comments.

# List all comments
python3 scripts/comments.py list "/path/to/document.docx"

# Add a comment on text matching "indemnification clause"
python3 scripts/comments.py add "/path/to/document.docx" \
  -t "indemnification clause" \
  -c "This clause needs review by senior counsel" \
  -a "Review Bot" \
  -o reviewed.docx

# Remove all comments
python3 scripts/comments.py remove "/path/to/document.docx" -o clean.docx

# Remove comment by index
python3 scripts/comments.py remove "/path/to/document.docx" --index 0 -o clean.docx

# Export comments to JSON
python3 scripts/comments.py export "/path/to/document.docx" -o comments.json

# Export comments to CSV
python3 scripts/comments.py export "/path/to/document.docx" -o comments.csv --format csv

3. Track Changes

List, accept, or reject tracked changes; generate a summary.

# List all tracked changes
python3 scripts/track_changes.py list "/path/to/document.docx"

# Accept all changes (produce clean document)
python3 scripts/track_changes.py accept "/path/to/document.docx" -o accepted.docx

# Reject all changes (revert to original)
python3 scripts/track_changes.py reject "/path/to/document.docx" -o reverted.docx

# Accept changes by specific author
python3 scripts/track_changes.py accept "/path/to/document.docx" --author "Jane Doe" -o partial.docx

# Generate human-readable summary of all changes
python3 scripts/track_changes.py summary "/path/to/document.docx"

4. Apply Formatting Standards

Normalize styles, fonts, spacing, and headings to a named standard.

# Apply legal formatting standard
python3 scripts/format.py "/path/to/document.docx" -s legal -o formatted.docx

# Apply academic formatting (APA)
python3 scripts/format.py "/path/to/document.docx" -s academic -o formatted.docx

# Apply corporate standard
python3 scripts/format.py "/path/to/document.docx" -s corporate -o formatted.docx

# Apply regulatory/compliance standard
python3 scripts/format.py "/path/to/document.docx" -s regulatory -o formatted.docx

# Apply custom standard from JSON file
python3 scripts/format.py "/path/to/document.docx" --custom-standard my_standard.json -o formatted.docx

# Normalize heading hierarchy only
python3 scripts/format.py "/path/to/document.docx" --fix-headings -o formatted.docx

# Clean up direct formatting (convert to named styles)
python3 scripts/format.py "/path/to/document.docx" --clean-direct-formatting -o formatted.docx

5. Convert Between Formats

Convert documents between Markdown, DOCX, PDF, and HTML using pandoc.

# Markdown to DOCX
python3 scripts/convert.py notes.md -f md -t docx -o report.docx

# DOCX to Markdown
python3 scripts/convert.py report.docx -f docx -t md -o report.md

# DOCX to PDF
python3 scripts/convert.py report.docx -f docx -t pdf -o report.pdf

# HTML to DOCX
python3 scripts/convert.py page.html -f html -t docx -o document.docx

# DOCX to HTML
python3 scripts/convert.py report.docx -f docx -t html -o report.html

# Markdown to DOCX with reference template for styling
python3 scripts/convert.py notes.md -f md -t docx -o report.docx --reference template.docx

# DOCX to Markdown preserving track changes
python3 scripts/convert.py report.docx -f docx -t md -o report.md --track-changes all

6. Compare Documents

Compare two versions of a document and produce a diff or redline.

# Compare and output diff as Markdown
python3 scripts/compare.py original.docx revised.docx

# Compare and save diff report as JSON
python3 scripts/compare.py original.docx revised.docx -o diff.json --format json

# Compare and produce a redline DOCX with tracked changes
python3 scripts/compare.py original.docx revised.docx -o redline.docx --format docx

# Word-level diff (more granular)
python3 scripts/compare.py original.docx revised.docx --granularity word

7. Create a New Document

Create a DOCX from text, Markdown, or a structured JSON specification.

# Create from Markdown file
python3 scripts/create.py -i content.md -o report.docx

# Create from plain text
python3 scripts/create.py -i notes.txt -o document.docx

# Create from JSON specification
python3 scripts/create.py -i spec.json -o report.docx

# Create with a template for styling
python3 scripts/create.py -i content.md -o report.docx --template template.docx

# Create with headers and footers
python3 scripts/create.py -i content.md -o report.docx \
  --header "Confidential" --footer "Page {page}"

JSON Specification Format:

{
  "title": "Quarterly Report",
  "metadata": {
    "author": "Jane Doe",
    "subject": "Q4 2025 Financial Review"
  },
  "sections": [
    {
      "heading": "Executive Summary",
      "level": 1,
      "content": "This report covers the fourth quarter..."
    },
    {
      "heading": "Revenue",
      "level": 2,
      "content": "Total revenue reached $2.5M...",
      "table": {
        "headers": ["Quarter", "Revenue", "Growth"],
        "rows": [
          ["Q3", "$2.1M", "5%"],
          ["Q4", "$2.5M", "19%"]
        ]
      }
    }
  ]
}

8. Merge Documents

Combine multiple DOCX files into a single document.

# Merge two or more documents
python3 scripts/merge.py doc1.docx doc2.docx doc3.docx -o combined.docx

# Merge with page breaks between documents
python3 scripts/merge.py doc1.docx doc2.docx -o combined.docx --page-breaks

# Merge using first document as style master
python3 scripts/merge.py master.docx appendix.docx -o full.docx --style-from first

Bundled Scripts

Script Type Description
scripts/analyze.py Python Analyze document structure, metadata, styles, comments, tracked changes
scripts/comments.py Python List, add, remove, export comments with author/date/position
scripts/track_changes.py Python List, accept, reject tracked changes; generate change summary
scripts/format.py Python Apply formatting standards (legal, academic, corporate, regulatory)
scripts/convert.py Python Convert between MD, DOCX, PDF, HTML formats via pandoc
scripts/compare.py Python Compare two documents, produce diff report or redline DOCX
scripts/create.py Python Create new DOCX from text, Markdown, or JSON specification
scripts/merge.py Python Merge multiple DOCX files into one

Examples

Example requests that trigger this skill:

analyze this word document and show me the comments
add a review comment on the indemnification clause
accept all tracked changes in this contract
format this document to legal standards
convert my meeting notes from markdown to word
compare the original contract with the revised version
create a word document from this markdown content
merge these three word documents into one
show me all the tracked changes in this agreement
apply corporate formatting to the quarterly report
export all comments from this document to a spreadsheet
generate a redline between the two contract versions
Weekly Installs
4
GitHub Stars
1
First Seen
14 days ago
Installed on
gemini-cli4
github-copilot4
codex4
amp4
cline4
kimi-cli4