docx-review
CLI tool for Word document review: tracked changes, comments, read, diff, and git integration. Built on Microsoft's Open XML SDK, so the tracked changes and comments it writes are 100% compatible with Word.
Install
brew tap drpedapati/tap
brew install sciclaw-docx-review
Binary: /opt/homebrew/bin/docx-review (12MB, self-contained, no runtime)
Verify: docx-review --version
Workflow Choice (Critical)
Creating a NEW manuscript/docx with clean output
Use pandoc from Markdown, not docx-review edit manifests:
pandoc manuscript.md -o manuscript.docx
sciClaw auto-applies its bundled NIH reference template for DOCX output unless --reference-doc is provided.
Editing an EXISTING document with visible review markup
Use docx-review when tracked changes/comments are explicitly desired:
docx-review input.docx edits.json -o reviewed.docx --json
Anti-pattern to avoid
Do not use docx-review placeholder replacement manifests to create first-draft manuscripts. That workflow intentionally produces tracked changes and can make a fresh document appear as a markup-heavy revision file.
Modes
Edit: Apply tracked changes and comments
Takes a .docx + JSON manifest, produces a reviewed .docx with proper OOXML markup.
docx-review input.docx edits.json -o reviewed.docx
docx-review input.docx edits.json -o reviewed.docx --json # structured output
docx-review input.docx edits.json --dry-run --json # validate without modifying
cat edits.json | docx-review input.docx -o reviewed.docx # stdin pipe
docx-review input.docx edits.json -o reviewed.docx --author "Dr. Smith"
Read: Extract document content as JSON
docx-review input.docx --read --json
Returns: paragraphs (with styles), tracked changes (type/text/author/date), comments (anchor text/content/author), metadata (title/author/word count/revision), and summary statistics.
Diff: Semantic comparison of two documents
docx-review --diff old.docx new.docx
docx-review --diff old.docx new.docx --json
Detects: text changes (word-level), formatting (bold/italic/font/color), comment modifications, tracked change differences, metadata changes, structural additions/removals.
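As a rough illustration of what a word-level text comparison reports, the sketch below uses Python's standard difflib on whitespace-split words. It is not docx-review's diff algorithm, and word_diff is a hypothetical helper name; it only shows the kind of change spans such a comparison produces.

```python
import difflib


def word_diff(old: str, new: str) -> list[tuple[str, str, str]]:
    """Return (operation, old_words, new_words) for every changed span.

    Illustration only: this is NOT how docx-review computes its diff.
    """
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only replace / delete / insert spans
            changes.append(
                (tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
            )
    return changes
```

Comparing on words rather than characters keeps a reported change aligned with how a human reviewer reads an edit.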
Git: Textconv driver for meaningful Word diffs
docx-review --textconv document.docx # normalized text output
docx-review --git-setup # print .gitattributes/.gitconfig instructions
JSON Manifest Format
This is the edit contract. Build this JSON, pass it to docx-review.
{
  "author": "Reviewer Name",
  "changes": [
    { "type": "replace", "find": "exact text in document", "replace": "new text" },
    { "type": "delete", "find": "exact text to delete" },
    { "type": "insert_after", "anchor": "exact anchor text", "text": "text to insert after" },
    { "type": "insert_before", "anchor": "exact anchor text", "text": "text to insert before" }
  ],
  "comments": [
    { "anchor": "exact text to attach comment to", "text": "Comment content" }
  ]
}
Change types
| Type | Fields | Result in Word |
|---|---|---|
| replace | find, replace | Red strikethrough old + blue new text |
| delete | find | Red strikethrough |
| insert_after | anchor, text | Blue inserted text after anchor |
| insert_before | anchor, text | Blue inserted text before anchor |
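The change types listed above can be turned into a quick pre-flight check before calling the tool. The following is a minimal sketch, assuming the four change types and field names shown in this section are the complete contract; manifest_shape_errors and REQUIRED_FIELDS are hypothetical names, not part of docx-review.

```python
# Required string fields per change type, as listed in this section.
REQUIRED_FIELDS = {
    "replace": ("find", "replace"),
    "delete": ("find",),
    "insert_after": ("anchor", "text"),
    "insert_before": ("anchor", "text"),
}


def manifest_shape_errors(manifest: dict) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks fine."""
    errors = []
    for i, change in enumerate(manifest.get("changes", [])):
        kind = change.get("type")
        if kind not in REQUIRED_FIELDS:
            errors.append(f"changes[{i}]: unknown type {kind!r}")
            continue
        for field in REQUIRED_FIELDS[kind]:
            value = change.get(field)
            if not isinstance(value, str):
                errors.append(f"changes[{i}] ({kind}): missing {field!r}")
            elif field in ("find", "anchor") and not value:
                # An empty search string can never identify a location.
                errors.append(f"changes[{i}] ({kind}): empty {field!r}")
    for i, comment in enumerate(manifest.get("comments", [])):
        for field in ("anchor", "text"):
            if not isinstance(comment.get(field), str) or not comment[field]:
                errors.append(f"comments[{i}]: missing or empty {field!r}")
    return errors
```

This only checks structure. Whether each find or anchor string actually occurs in the document is still a job for --dry-run.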
Critical rules for find and anchor text
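- Must be exact copy-paste from the document. The tool does ordinal string matching, so the text is compared character for character: case, spacing, and punctuation (including curly versus straight quotes) all have to match.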
- Include enough context for uniqueness — 15+ words when the phrase is common.
- First occurrence wins. The tool replaces/anchors at the first match only.
- Use --dry-run --json to validate all matches before applying.
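The uniqueness and first-match rules above can be checked roughly before a dry run by counting how often each find or anchor string occurs in the extracted text. This is a sketch under assumptions: document_text is plain text you obtained yourself (for example via --read or pandoc), match_counts is a hypothetical helper, and counts on extracted text only approximate what the tool sees inside the .docx, so --dry-run remains the authoritative check.

```python
def match_counts(manifest: dict, document_text: str) -> list[tuple[str, int]]:
    """Count exact, case-sensitive occurrences of every find/anchor string.

    0 means the edit would likely fail; more than 1 means the string is
    ambiguous and only the first occurrence would be edited.
    """
    targets = [c.get("find") or c.get("anchor") for c in manifest.get("changes", [])]
    targets += [c.get("anchor") for c in manifest.get("comments", [])]
    # str.count performs a plain, non-overlapping substring count.
    return [(t, document_text.count(t)) for t in targets if t]
```

Any entry whose count is not exactly 1 is a candidate for adding more surrounding context to the search string.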
JSON Output (--json)
{
  "input": "paper.docx",
  "output": "paper_reviewed.docx",
  "author": "Dr. Smith",
  "changes_attempted": 5,
  "changes_succeeded": 5,
  "comments_attempted": 3,
  "comments_succeeded": 3,
  "success": true,
  "results": [
    { "index": 0, "type": "comment", "success": true, "message": "Comment added" },
    { "index": 0, "type": "replace", "success": true, "message": "Replaced" }
  ]
}
Exit code 0 = all succeeded. Exit code 1 = at least one failed (partial success possible).
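Because partial success is possible, a script consuming this report usually wants a one-line summary plus the list of entries that failed. The sketch below assumes the report has exactly the field names shown in the sample above; summarize_report is a hypothetical helper name.

```python
import json


def summarize_report(report_json: str) -> tuple[str, list[dict]]:
    """Return a short summary string and the result entries that failed."""
    report = json.loads(report_json)
    summary = (
        f"changes {report['changes_succeeded']}/{report['changes_attempted']}, "
        f"comments {report['comments_succeeded']}/{report['comments_attempted']}"
    )
    # Each entry in "results" carries its own success flag and message.
    failed = [r for r in report.get("results", []) if not r.get("success", False)]
    return summary, failed
```

The index and type of each failed entry point back to the corresponding item in the manifest's changes or comments array.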
Workflow: AI-Assisted Document Revision
Standard pattern for using docx-review with AI-generated edits:
Step 1: Extract text
docx-review manuscript.docx --read --json > doc_content.json
Or use pandoc for markdown extraction:
pandoc manuscript.docx -t markdown -o manuscript.md
Step 2: Generate the manifest
Feed the extracted text + instructions to the AI. Request output as a docx-review JSON manifest.
Use this system context when prompting for manifest generation:
Generate a JSON edit manifest for docx-review. Output format:
{
  "author": "...",
  "changes": [{"type": "replace|delete|insert_after|insert_before", ...}],
  "comments": [{"anchor": "...", "text": "..."}]
}
CRITICAL: "find" and "anchor" values must be EXACT text from the document.
Include 15+ words of surrounding context for uniqueness. First match wins.
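Model replies often wrap the requested JSON in explanatory prose or a code fence, so it can help to extract and parse the manifest before saving it. The following is a minimal sketch, assuming the reply contains a single JSON object; extract_manifest is a hypothetical helper and is not part of docx-review.

```python
import json
import re


def extract_manifest(reply: str) -> dict:
    """Parse the JSON object spanning the first '{' to the last '}' in a reply."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    manifest = json.loads(match.group(0))  # raises ValueError on invalid JSON
    if not (manifest.get("changes") or manifest.get("comments")):
        raise ValueError("manifest contains no changes or comments")
    return manifest
```

Write the returned dictionary to manifest.json with json.dump before moving on to validation.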
Step 3: Validate with dry run
docx-review manuscript.docx manifest.json --dry-run --json
Check for failures. If any edits fail ("success": false), fix the manifest (usually the find/anchor text doesn't match exactly) and retry.
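The validate-and-retry step can be scripted. This sketch wraps the dry-run command from this step in Python; it assumes the JSON report is written to standard output (not confirmed here), and dry_run plus the injectable run parameter are hypothetical conveniences for testing.

```python
import json
import subprocess


def dry_run_command(docx_path: str, manifest_path: str) -> list[str]:
    """Build the dry-run invocation shown in Step 3."""
    return ["docx-review", docx_path, manifest_path, "--dry-run", "--json"]


def dry_run(docx_path: str, manifest_path: str, run=subprocess.run):
    """Return (ok, failures) for a dry run without modifying the document."""
    proc = run(dry_run_command(docx_path, manifest_path),
               capture_output=True, text=True)
    report = json.loads(proc.stdout)  # assumption: report arrives on stdout
    failures = [r for r in report.get("results", []) if not r.get("success")]
    # Exit code 0 means every edit matched; 1 means at least one failed.
    return proc.returncode == 0 and not failures, failures
```

A caller would fix the find or anchor text for each entry in failures and call dry_run again until ok is true.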
Step 4: Apply
docx-review manuscript.docx manifest.json -o manuscript_reviewed.docx --json
Step 5: Verify (optional)
docx-review manuscript_reviewed.docx --read --json | jq '.summary'
docx-review --diff manuscript.docx manuscript_reviewed.docx
Workflow: Peer Review Response
For addressing reviewer comments on a manuscript:
- Extract manuscript text (--read --json or pandoc)
- Build a manifest addressing each reviewer point: use replace for text changes and comments to explain changes to the author
- Dry-run validate
- Apply edits
- The output .docx has tracked changes the author can review in Word
Workflow: Proofreading
- Extract text
- Generate manifest with grammar/style fixes as replace changes and suggestions as comments
- Validate + apply
- Author opens in Word, accepts/rejects each change individually
Workflow: Template Population with Explicit Revision Trail (Advanced)
Only use this workflow when the user explicitly wants visible tracked changes during template population (for audit/review history):
- Start from a template-backed .docx
- Read it with --read --json to identify exact placeholders
- Build a manifest that replaces placeholders with real content
- Apply with docx-review (tracked insertions/deletions are expected)
- Review and accept/reject changes in Word
For normal first-draft generation, use pandoc manuscript.md -o manuscript.docx instead.
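For the advanced template workflow, the manifest is simply one replace change per placeholder. The sketch below assumes placeholders are literal strings such as "{{TITLE}}" (a hypothetical syntax; use whatever text your template actually contains) and that an empty comments array is acceptable, which mirrors the manifest shape shown earlier but is not confirmed. placeholder_manifest is a hypothetical helper name.

```python
def placeholder_manifest(values: dict[str, str], author: str) -> dict:
    """Build a manifest with one tracked replace per placeholder string."""
    return {
        "author": author,
        "changes": [
            {"type": "replace", "find": placeholder, "replace": content}
            for placeholder, content in values.items()
        ],
        "comments": [],  # assumption: an empty list is accepted
    }
```

Since only the first occurrence of each find string is replaced, a placeholder that appears several times in the template needs distinct surrounding context or repeated runs.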
Key behaviors
- Comments applied first, then tracked changes. Ensures anchors resolve before XML is modified.
- Formatting preserved. RunProperties cloned from source runs onto both deleted and inserted text.
- Multi-run text matching. Text spanning multiple XML <w:r> elements (common in previously edited documents) is found and handled correctly.
- Everything untouched is preserved. Images, charts, bibliographies, footnotes, cross-references, styles, headers/footers survive intact.
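To see why multi-run matching matters, consider a phrase that Word has split across several runs. One general way to handle this is to search the concatenated run text and map the match back to per-run slices. The sketch below illustrates that idea only; it is not docx-review's implementation, and locate_across_runs is a hypothetical name.

```python
def locate_across_runs(runs: list[str], needle: str) -> list[tuple[int, int, int]]:
    """Return (run_index, start, end) slices covering the first match of needle.

    Illustration only: runs stands in for the text of consecutive <w:r> elements.
    """
    if not needle:
        return []
    joined = "".join(runs)
    pos = joined.find(needle)  # first occurrence only, exact comparison
    if pos < 0:
        return []
    end = pos + len(needle)
    spans, offset = [], 0
    for index, run in enumerate(runs):
        run_start, run_end = offset, offset + len(run)
        lo, hi = max(pos, run_start), min(end, run_end)
        if lo < hi:  # this run overlaps the matched range
            spans.append((index, lo - run_start, hi - run_start))
        offset = run_end
    return spans
```

Each returned slice identifies the part of one run that belongs to the match, which is the information needed to split runs and wrap only the matched text in revision markup.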
Read mode output structure
For programmatic processing of --read --json output, see skill/references/read-schema.md.
Companion tools
The Open XML SDK ecosystem:
| Tool | Install | Purpose |
|---|---|---|
| pptx-review | brew install drpedapati/tools/pptx-review | PowerPoint read/edit |
| xlsx-review | brew install drpedapati/tools/xlsx-review | Excel read/edit |
Same architecture: .NET 8, Open XML SDK, single binary, JSON in/out.