document-docx
SKILL.md
Document DOCX Skill - Quick Reference
This skill enables creation, editing, and analysis of .docx files for reports, contracts, proposals, documentation, and template-driven outputs.
Modern best practices (2026):
- Prefer templates + styles over manual formatting.
- Treat
.docxas the editable source; treat PDF as a release artifact. - If distributing externally, include basic accessibility hygiene (headings, table headers, alt text).
Quick Reference
| Task | Tool/Library | Language | When to Use |
|---|---|---|---|
| Create DOCX | python-docx | Python | Reports, contracts, proposals |
| Create DOCX | docx | Node.js | Server-side document generation |
| Convert to HTML | mammoth.js | Node.js | Web display, content extraction |
| Parse DOCX | python-docx | Python | Extract text, tables, metadata |
| Template fill | docxtpl | Python | Mail merge, template-based generation |
| Review workflow | Word compare, comments/highlights | Any | Human review without OOXML surgery |
| Tracked changes | OOXML inspection, docx4j/OpenXML SDK/Aspose | Any | True redlines or parsing tracked changes |
Tool Selection
- Prefer
docxtplwhen non-developers must edit layout/design in Word. - Prefer
python-docxfor structural edits (paragraphs/tables/headers/footers) when formatting complexity is moderate. - Prefer
docx(Node.js) for server-side generation in TypeScript-heavy stacks. - Prefer
mammothfor text-first extraction or DOCX-to-HTML (best effort; may drop some layout fidelity).
Known Limits (Plan Around These)
.doc(legacy) is not supported by these libraries; convert to.docxfirst (e.g., LibreOffice).python-docxcannot reliably create true tracked changes; use Word compare or specialized OOXML tooling.- Tables of Contents and many fields are placeholders until opened/updated in Word.
Core Operations
Create Document (Python - python-docx)
from docx import Document
from docx.shared import Inches, Pt
from docx.enum.text import WD_ALIGN_PARAGRAPH
doc = Document()
# Title
title = doc.add_heading('Document Title', 0)
title.alignment = WD_ALIGN_PARAGRAPH.CENTER
# Paragraph with formatting
para = doc.add_paragraph()
run = para.add_run('Bold and ')
run.bold = True
run = para.add_run('italic text.')
run.italic = True
# Table
table = doc.add_table(rows=3, cols=3)
table.style = 'Table Grid'
for i, row in enumerate(table.rows):
for j, cell in enumerate(row.cells):
cell.text = f'Row {i+1}, Col {j+1}'
# Image
doc.add_picture('image.png', width=Inches(4))
# Save
doc.save('output.docx')
Create Document (Node.js - docx)
import { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell } from 'docx';
import * as fs from 'fs';
const doc = new Document({
sections: [{
properties: {},
children: [
new Paragraph({
children: [
new TextRun({ text: 'Bold text', bold: true }),
new TextRun({ text: ' and normal text.' }),
],
}),
new Table({
rows: [
new TableRow({
children: [
new TableCell({ children: [new Paragraph('Cell 1')] }),
new TableCell({ children: [new Paragraph('Cell 2')] }),
],
}),
],
}),
],
}],
});
Packer.toBuffer(doc).then((buffer) => {
fs.writeFileSync('output.docx', buffer);
});
Template-Based Generation (Python - docxtpl)
from docxtpl import DocxTemplate
doc = DocxTemplate('template.docx')
context = {
'company_name': 'Acme Corp',
'date': '2025-01-15',
'items': [
{'name': 'Widget A', 'price': 100},
{'name': 'Widget B', 'price': 200},
]
}
doc.render(context)
doc.save('filled_template.docx')
Extract Content (Python - python-docx)
from docx import Document
doc = Document('input.docx')
# Extract all text
full_text = []
for para in doc.paragraphs:
full_text.append(para.text)
# Extract tables
for table in doc.tables:
for row in table.rows:
row_data = [cell.text for cell in row.cells]
print(row_data)
Styling Reference
| Element | Python Method | Node.js Class |
|---|---|---|
| Heading 1 | add_heading(text, 1) |
HeadingLevel.HEADING_1 |
| Bold | run.bold = True |
TextRun({ bold: true }) |
| Italic | run.italic = True |
TextRun({ italics: true }) |
| Font size | run.font.size = Pt(12) |
TextRun({ size: 24 }) (half-points) |
| Alignment | WD_ALIGN_PARAGRAPH.CENTER |
AlignmentType.CENTER |
| Page break | doc.add_page_break() |
new PageBreak() |
Do / Avoid (Dec 2025)
Do
- Use consistent heading levels and a table of contents for long docs.
- Capture decisions and action items with owners and due dates.
- Store docs in a versioned, searchable system.
Avoid
- Manual formatting instead of styles (breaks consistency).
- Docs with no owner or review cadence (stale quickly).
- Copy/pasting without updating definitions and links.
Output Quality Checklist
- Structure: consistent heading hierarchy, styles, and (when needed) an auto-generated table of contents.
- Decisions: decisions/actions captured with owner + due date (not buried in prose).
- Versioning: doc ID + version + change summary; review cadence defined.
- Accessibility hygiene: headings/reading order are correct; table headers are marked; alt text for non-decorative images.
- Reuse: use
assets/doc-template-pack.mdfor decision logs and recurring doc types.
Optional: AI / Automation
Use only when explicitly requested and policy-compliant.
- Summarize meeting notes into decisions/actions; humans verify accuracy.
- Draft first-pass docs from outlines; do not invent facts or quotes.
Navigation
Resources
- references/docx-patterns.md - Advanced formatting, styles, headers/footers
- references/template-workflows.md - Mail merge, batch generation
- references/tracked-changes.md - Tracked changes: what is feasible, and what is not
- references/accessibility-compliance.md - WCAG 2.2 AA, reading order, alt text, EU EAA
- references/cross-platform-compatibility.md - Rendering across Word, Google Docs, LibreOffice
- references/document-automation-pipelines.md - CI/CD batch generation, quality gates
- data/sources.json - Library documentation links
Scripts
scripts/docx_inspect_ooxml.py- Dependency-free OOXML inspection (including tracked changes signals)scripts/docx_extract.py- Extract text/tables to JSON (requirespython-docx)scripts/docx_render_template.py- Render adocxtpltemplate (requiresdocxtpl)scripts/docx_to_html.mjs- Convert.docxto HTML (requiresmammoth)
Templates
- assets/report-template.md - Standard report structure
- assets/contract-template.md - Legal document structure
- assets/doc-template-pack.md - Decision log, meeting notes, changelog templates
Related Skills
- ../document-pdf/SKILL.md - PDF generation and conversion
- ../docs-codebase/SKILL.md - Technical writing patterns
Fact-Checking
- Use web search/web fetch to verify current external facts, versions, pricing, deadlines, regulations, or platform behavior before final answers.
- Prefer primary sources; report source links and dates for volatile information.
- If web access is unavailable, state the limitation and mark guidance as unverified.
Weekly Installs
123
Repository
vasilyu1983/ai-…s-publicGitHub Stars
42
First Seen
Jan 23, 2026
Security Audits
Installed on
opencode103
gemini-cli98
cursor97
codex94
github-copilot89
claude-code81