docx

Originally fromanthropics/skills
SKILL.md

DOCX Creation, Editing, and Analysis

Overview

A .docx file is a ZIP archive containing XML files.

Quick Reference

Task Approach
Read/analyze content pandoc or unpack for raw XML
Create new document Use docx-js -- see Creating New Documents below
Edit existing document Unpack -> edit XML -> repack -- see Editing Existing Documents below
Visual review Convert DOCX -> PDF -> PNGs via soffice + pdftoppm or scripts/render_docx.py

Converting .doc to .docx

python scripts/office/soffice.py --headless --convert-to docx document.doc

Reading Content

pandoc --track-changes=all document.docx -o output.md
python scripts/office/unpack.py document.docx unpacked/

Converting to Images

python scripts/office/soffice.py --headless --convert-to pdf document.docx
pdftoppm -jpeg -r 150 document.pdf page

Or use the bundled render script:

python scripts/render_docx.py /path/to/file.docx --output_dir /tmp/docx_pages

Accepting Tracked Changes

python scripts/accept_changes.py input.docx output.docx

Creating New Documents

Generate .docx files with JavaScript, then validate. Install: npm install -g docx

Setup

const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
        Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
        TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
        VerticalAlign, PageNumber, PageBreak } = require('docx');

const doc = new Document({ sections: [{ children: [/* content */] }] });
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));

Validation

After creating the file, validate it. If validation fails, unpack, fix the XML, and repack.

python scripts/office/validate.py doc.docx

Page Size

// CRITICAL: docx-js defaults to A4, not US Letter
// Always set page size explicitly
sections: [{
  properties: {
    page: {
      size: {
        width: 12240,   // 8.5 inches in DXA
        height: 15840   // 11 inches in DXA
      },
      margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 }
    }
  },
  children: [/* content */]
}]
Paper Width Height Content Width (1" margins)
US Letter 12,240 15,840 9,360
A4 (default) 11,906 16,838 9,026

Landscape: pass portrait dimensions and let docx-js swap:

size: { width: 12240, height: 15840, orientation: PageOrientation.LANDSCAPE }

Styles (Override Built-in Headings)

Use Arial as default font. Keep titles black for readability.

const doc = new Document({
  styles: {
    default: { document: { run: { font: "Arial", size: 24 } } },
    paragraphStyles: [
      { id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 32, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } },
      { id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 28, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } },
    ]
  },
  sections: [{ children: [
    new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("Title")] }),
  ]}]
});

Lists (NEVER use unicode bullets)

// WRONG
new Paragraph({ children: [new TextRun("\u2022 Item")] })

// CORRECT - use numbering config
const doc = new Document({
  numbering: {
    config: [
      { reference: "bullets",
        levels: [{ level: 0, format: LevelFormat.BULLET, text: "\u2022", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
      { reference: "numbers",
        levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
    ]
  },
  sections: [{ children: [
    new Paragraph({ numbering: { reference: "bullets", level: 0 }, children: [new TextRun("Item")] }),
  ]}]
});

Tables

CRITICAL: Set both columnWidths on the table AND width on each cell.

const border = { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" };
const borders = { top: border, bottom: border, left: border, right: border };

new Table({
  width: { size: 9360, type: WidthType.DXA },
  columnWidths: [4680, 4680],
  rows: [
    new TableRow({
      children: [
        new TableCell({
          borders,
          width: { size: 4680, type: WidthType.DXA },
          shading: { fill: "D5E8F0", type: ShadingType.CLEAR }, // CLEAR not SOLID
          margins: { top: 80, bottom: 80, left: 120, right: 120 },
          children: [new Paragraph({ children: [new TextRun("Cell")] })]
        })
      ]
    })
  ]
})

Always use WidthType.DXA -- never WidthType.PERCENTAGE (breaks in Google Docs).

Images

new Paragraph({
  children: [new ImageRun({
    type: "png", // Required: png, jpg, jpeg, gif, bmp, svg
    data: fs.readFileSync("image.png"),
    transformation: { width: 200, height: 150 },
    altText: { title: "Title", description: "Desc", name: "Name" }
  })]
})

Page Breaks

new Paragraph({ children: [new PageBreak()] })
// Or: new Paragraph({ pageBreakBefore: true, children: [new TextRun("New page")] })

Table of Contents

new TableOfContents("Table of Contents", { hyperlink: true, headingStyleRange: "1-3" })

Headers/Footers

sections: [{
  headers: {
    default: new Header({ children: [new Paragraph({ children: [new TextRun("Header")] })] })
  },
  footers: {
    default: new Footer({ children: [new Paragraph({
      children: [new TextRun("Page "), new TextRun({ children: [PageNumber.CURRENT] })]
    })] })
  },
  children: [/* content */]
}]

Critical Rules for docx-js

  • Set page size explicitly (defaults to A4)
  • Landscape: pass portrait dimensions, set orientation: PageOrientation.LANDSCAPE
  • Never use \n -- use separate Paragraph elements
  • Never use unicode bullets -- use LevelFormat.BULLET
  • PageBreak must be in Paragraph
  • ImageRun requires type
  • Always use WidthType.DXA for tables
  • Tables need dual widths: columnWidths + cell width
  • Use ShadingType.CLEAR, never SOLID
  • TOC requires HeadingLevel only
  • Override built-in styles with exact IDs: "Heading1", "Heading2", etc.
  • Include outlineLevel for TOC (0 for H1, 1 for H2)

Editing Existing Documents

Follow all 3 steps in order.

Step 1: Unpack

python scripts/office/unpack.py document.docx unpacked/

Step 2: Edit XML

Edit files in unpacked/word/. See XML Reference below.

Use the Edit tool directly for string replacement. Do not write Python scripts.

Use smart quotes for new content:

<w:t>Here&#x2019;s a quote: &#x201C;Hello&#x201D;</w:t>
Entity Character
&#x2018; ' (left single)
&#x2019; ' (right single / apostrophe)
&#x201C; " (left double)
&#x201D; " (right double)

Adding comments: Use comment.py:

python scripts/comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
python scripts/comment.py unpacked/ 1 "Reply text" --parent 0
python scripts/comment.py unpacked/ 0 "Text" --author "Kortix Agent"

Then add markers to document.xml (see Comments in XML Reference).

Step 3: Pack

python scripts/office/pack.py unpacked/ output.docx --original document.docx

Common Pitfalls

  • Replace entire <w:r> elements when adding tracked changes
  • Preserve <w:rPr> formatting -- copy original run's formatting into tracked change runs

XML Reference

Schema Compliance

  • Element order in <w:pPr>: <w:pStyle>, <w:numPr>, <w:spacing>, <w:ind>, <w:jc>, <w:rPr> last
  • Whitespace: Add xml:space="preserve" to <w:t> with leading/trailing spaces
  • RSIDs: Must be 8-digit hex (e.g., 00AB1234)

Tracked Changes

Insertion:

<w:ins w:id="1" w:author="Kortix Agent" w:date="2025-01-01T00:00:00Z">
  <w:r><w:t>inserted text</w:t></w:r>
</w:ins>

Deletion:

<w:del w:id="2" w:author="Kortix Agent" w:date="2025-01-01T00:00:00Z">
  <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>

Inside <w:del>: use <w:delText> instead of <w:t>.

Minimal edits -- only mark what changes:

<w:r><w:t>The term is </w:t></w:r>
<w:del w:id="1" w:author="Kortix Agent" w:date="...">
  <w:r><w:delText>30</w:delText></w:r>
</w:del>
<w:ins w:id="2" w:author="Kortix Agent" w:date="...">
  <w:r><w:t>60</w:t></w:r>
</w:ins>
<w:r><w:t> days.</w:t></w:r>

Deleting entire paragraphs -- also mark paragraph mark as deleted:

<w:p>
  <w:pPr>
    <w:rPr>
      <w:del w:id="1" w:author="Kortix Agent" w:date="2025-01-01T00:00:00Z"/>
    </w:rPr>
  </w:pPr>
  <w:del w:id="2" w:author="Kortix Agent" w:date="2025-01-01T00:00:00Z">
    <w:r><w:delText>Entire paragraph content...</w:delText></w:r>
  </w:del>
</w:p>

Rejecting another author's insertion:

<w:ins w:author="Jane" w:id="5">
  <w:del w:author="Kortix Agent" w:id="10">
    <w:r><w:delText>their inserted text</w:delText></w:r>
  </w:del>
</w:ins>

Restoring another author's deletion:

<w:del w:author="Jane" w:id="5">
  <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
<w:ins w:author="Kortix Agent" w:id="10">
  <w:r><w:t>deleted text</w:t></w:r>
</w:ins>

Comments

Markers are direct children of w:p, never inside w:r:

<w:commentRangeStart w:id="0"/>
<w:r><w:t>commented text</w:t></w:r>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>

Images

  1. Add image file to word/media/
  2. Add relationship to word/_rels/document.xml.rels
  3. Add content type to [Content_Types].xml
  4. Reference in document.xml with <w:drawing>

Visual Review Workflow

  1. Convert DOCX -> PDF -> PNGs
  2. Use scripts/render_docx.py (requires pdf2image and Poppler)
  3. After each meaningful change, re-render and inspect
  4. If visual review is not possible, extract text with python-docx as fallback

Quality Expectations

  • Deliver client-ready documents: consistent typography, spacing, margins, clear hierarchy
  • Avoid formatting defects: clipped/overlapping text, broken tables, unreadable characters
  • Use ASCII hyphens only. Avoid U+2011 and other Unicode dashes
  • Re-render and inspect every page at 100% zoom before final delivery

Dependencies

  • pandoc: Text extraction
  • docx: npm install -g docx (new documents)
  • LibreOffice: PDF conversion (via scripts/office/soffice.py)
  • Poppler: pdftoppm for images
  • python-docx: Reading/analysis (pip install python-docx)
  • pdf2image: Rendering (pip install pdf2image)
Weekly Installs
3
GitHub Stars
2
First Seen
12 days ago
Installed on
cline3
gemini-cli3
github-copilot3
codex3
kimi-cli3
cursor3