docx

Installation

SKILL.md

DOCX Creation, Editing, and Analysis

Overview

A .docx file is a ZIP archive containing XML files.

Quick Reference

Task	Approach
Read/analyze content	`pandoc` or unpack for raw XML
Create new document	Use `docx-js` -- see Creating New Documents below
Edit existing document	Unpack -> edit XML -> repack -- see Editing Existing Documents below
Visual review	Convert DOCX -> PDF -> PNGs via `soffice` + `pdftoppm` or `scripts/render_docx.py`

Converting .doc to .docx

python scripts/office/soffice.py --headless --convert-to docx document.doc

Reading Content

pandoc --track-changes=all document.docx -o output.md
python scripts/office/unpack.py document.docx unpacked/

Converting to Images

python scripts/office/soffice.py --headless --convert-to pdf document.docx
pdftoppm -jpeg -r 150 document.pdf page

Or use the bundled render script:

python scripts/render_docx.py /path/to/file.docx --output_dir /tmp/docx_pages

Accepting Tracked Changes

python scripts/accept_changes.py input.docx output.docx

Creating New Documents

Generate .docx files with JavaScript, then validate. Install: npm install -g docx

Setup

const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
        Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
        TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
        VerticalAlign, PageNumber, PageBreak } = require('docx');

const doc = new Document({ sections: [{ children: [/* content */] }] });
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));

Validation

After creating the file, validate it. If validation fails, unpack, fix the XML, and repack.

python scripts/office/validate.py doc.docx

Page Size

// CRITICAL: docx-js defaults to A4, not US Letter
// Always set page size explicitly
sections: [{
  properties: {
    page: {
      size: {
        width: 12240,   // 8.5 inches in DXA
        height: 15840   // 11 inches in DXA
      },
      margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 }
    }
  },
  children: [/* content */]
}]

Paper	Width	Height	Content Width (1" margins)
US Letter	12,240	15,840	9,360
A4 (default)	11,906	16,838	9,026

Landscape: pass portrait dimensions and let docx-js swap:

size: { width: 12240, height: 15840, orientation: PageOrientation.LANDSCAPE }

Styles (Override Built-in Headings)

Use Arial as default font. Keep titles black for readability.

const doc = new Document({
  styles: {
    default: { document: { run: { font: "Arial", size: 24 } } },
    paragraphStyles: [
      { id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 32, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } },
      { id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 28, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } },
    ]
  },
  sections: [{ children: [
    new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("Title")] }),
  ]}]
});

Lists (NEVER use unicode bullets)

// WRONG
new Paragraph({ children: [new TextRun("\u2022 Item")] })

// CORRECT - use numbering config
const doc = new Document({
  numbering: {
    config: [
      { reference: "bullets",
        levels: [{ level: 0, format: LevelFormat.BULLET, text: "\u2022", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
      { reference: "numbers",
        levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
    ]
  },
  sections: [{ children: [
    new Paragraph({ numbering: { reference: "bullets", level: 0 }, children: [new TextRun("Item")] }),
  ]}]
});

Tables

CRITICAL: Set both columnWidths on the table AND width on each cell.

const border = { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" };
const borders = { top: border, bottom: border, left: border, right: border };

new Table({
  width: { size: 9360, type: WidthType.DXA },
  columnWidths: [4680, 4680],
  rows: [
    new TableRow({
      children: [
        new TableCell({
          borders,
          width: { size: 4680, type: WidthType.DXA },
          shading: { fill: "D5E8F0", type: ShadingType.CLEAR }, // CLEAR not SOLID
          margins: { top: 80, bottom: 80, left: 120, right: 120 },
          children: [new Paragraph({ children: [new TextRun("Cell")] })]
        })
      ]
    })
  ]
})

Always use WidthType.DXA -- never WidthType.PERCENTAGE (breaks in Google Docs).

Images

new Paragraph({
  children: [new ImageRun({
    type: "png", // Required: png, jpg, jpeg, gif, bmp, svg
    data: fs.readFileSync("image.png"),
    transformation: { width: 200, height: 150 },
    altText: { title: "Title", description: "Desc", name: "Name" }
  })]
})

Page Breaks

new Paragraph({ children: [new PageBreak()] })
// Or: new Paragraph({ pageBreakBefore: true, children: [new TextRun("New page")] })

new TableOfContents("Table of Contents", { hyperlink: true, headingStyleRange: "1-3" })

Headers/Footers

sections: [{
  headers: {
    default: new Header({ children: [new Paragraph({ children: [new TextRun("Header")] })] })
  },
  footers: {
    default: new Footer({ children: [new Paragraph({
      children: [new TextRun("Page "), new TextRun({ children: [PageNumber.CURRENT] })]
    })] })
  },
  children: [/* content */]
}]

Critical Rules for docx-js

Set page size explicitly (defaults to A4)
Landscape: pass portrait dimensions, set orientation: PageOrientation.LANDSCAPE
Never use \n -- use separate Paragraph elements
Never use unicode bullets -- use LevelFormat.BULLET
PageBreak must be in Paragraph
ImageRun requires type
Always use WidthType.DXA for tables
Tables need dual widths: columnWidths + cell width
Use ShadingType.CLEAR, never SOLID
TOC requires HeadingLevel only
Override built-in styles with exact IDs: "Heading1", "Heading2", etc.
Include outlineLevel for TOC (0 for H1, 1 for H2)

Editing Existing Documents

Follow all 3 steps in order.

Step 1: Unpack

python scripts/office/unpack.py document.docx unpacked/

Step 2: Edit XML

Edit files in unpacked/word/. See XML Reference below.

Use the Edit tool directly for string replacement. Do not write Python scripts.

Use smart quotes for new content:

<w:t>Here&#x2019;s a quote: &#x201C;Hello&#x201D;</w:t>

Entity	Character
`‘`	' (left single)
`’`	' (right single / apostrophe)
`“`	" (left double)
`”`	" (right double)

Adding comments: Use comment.py:

python scripts/comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
python scripts/comment.py unpacked/ 1 "Reply text" --parent 0
python scripts/comment.py unpacked/ 0 "Text" --author "Kortix Agent"

Then add markers to document.xml (see Comments in XML Reference).

Step 3: Pack

python scripts/office/pack.py unpacked/ output.docx --original document.docx

Common Pitfalls

Replace entire <w:r> elements when adding tracked changes
Preserve <w:rPr> formatting -- copy original run's formatting into tracked change runs

XML Reference

Schema Compliance

Element order in <w:pPr>: <w:pStyle>, <w:numPr>, <w:spacing>, <w:ind>, <w:jc>, <w:rPr> last
Whitespace: Add xml:space="preserve" to <w:t> with leading/trailing spaces
RSIDs: Must be 8-digit hex (e.g., 00AB1234)

Tracked Changes

Insertion:

<w:ins w:id="1" w:author="Kortix Agent" w:date="2025-01-01T00:00:00Z">
  <w:r><w:t>inserted text</w:t></w:r>
</w:ins>

Deletion:

<w:del w:id="2" w:author="Kortix Agent" w:date="2025-01-01T00:00:00Z">
  <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>

Inside <w:del>: use <w:delText> instead of <w:t>.

Minimal edits -- only mark what changes:

<w:r><w:t>The term is </w:t></w:r>
<w:del w:id="1" w:author="Kortix Agent" w:date="...">
  <w:r><w:delText>30</w:delText></w:r>
</w:del>
<w:ins w:id="2" w:author="Kortix Agent" w:date="...">
  <w:r><w:t>60</w:t></w:r>
</w:ins>
<w:r><w:t> days.</w:t></w:r>

Deleting entire paragraphs -- also mark paragraph mark as deleted:

<w:p>
  <w:pPr>
    <w:rPr>
      <w:del w:id="1" w:author="Kortix Agent" w:date="2025-01-01T00:00:00Z"/>
    </w:rPr>
  </w:pPr>
  <w:del w:id="2" w:author="Kortix Agent" w:date="2025-01-01T00:00:00Z">
    <w:r><w:delText>Entire paragraph content...</w:delText></w:r>
  </w:del>
</w:p>

Rejecting another author's insertion:

<w:ins w:author="Jane" w:id="5">
  <w:del w:author="Kortix Agent" w:id="10">
    <w:r><w:delText>their inserted text</w:delText></w:r>
  </w:del>
</w:ins>

Restoring another author's deletion:

<w:del w:author="Jane" w:id="5">
  <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
<w:ins w:author="Kortix Agent" w:id="10">
  <w:r><w:t>deleted text</w:t></w:r>
</w:ins>

Comments

Markers are direct children of w:p, never inside w:r:

<w:commentRangeStart w:id="0"/>
<w:r><w:t>commented text</w:t></w:r>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>

Images

Add image file to word/media/
Add relationship to word/_rels/document.xml.rels
Add content type to [Content_Types].xml
Reference in document.xml with <w:drawing>

Visual Review Workflow

Convert DOCX -> PDF -> PNGs
Use scripts/render_docx.py (requires pdf2image and Poppler)
After each meaningful change, re-render and inspect
If visual review is not possible, extract text with python-docx as fallback

Quality Expectations

Deliver client-ready documents: consistent typography, spacing, margins, clear hierarchy
Avoid formatting defects: clipped/overlapping text, broken tables, unreadable characters
Use ASCII hyphens only. Avoid U+2011 and other Unicode dashes
Re-render and inspect every page at 100% zoom before final delivery

Dependencies

pandoc: Text extraction
docx: npm install -g docx (new documents)
LibreOffice: PDF conversion (via scripts/office/soffice.py)
Poppler: pdftoppm for images
python-docx: Reading/analysis (pip install python-docx)
pdf2image: Rendering (pip install pdf2image)

Related skills

More from kortix-ai/kortix-registry

Installs

Repository

kortix-ai/korti…registry

GitHub Stars

First Seen

Mar 3, 2026

Security Audits

Gen Agent Trust HubFail

SocketWarn

SnykPass

docx

DOCX Creation, Editing, and Analysis

Overview

Quick Reference

Converting .doc to .docx

Reading Content

Converting to Images

Accepting Tracked Changes

Creating New Documents

Setup

Validation

Page Size

Styles (Override Built-in Headings)

Lists (NEVER use unicode bullets)

Tables

Images

Page Breaks

Table of Contents

Headers/Footers

Critical Rules for docx-js

Editing Existing Documents

Step 1: Unpack

Step 2: Edit XML

Step 3: Pack

Common Pitfalls

XML Reference

Schema Compliance

Tracked Changes

Comments

Images

Visual Review Workflow

Quality Expectations

Dependencies

More from kortix-ai/kortix-registry

openalex-paper-search

legal-writer

paper-creator

deep-research

memory-context-management

domain-research