oma-hwp

Installation
SKILL.md

HWP Skill - HWP / HWPX / HWPML to Markdown Conversion

Scheduling

Goal

Convert Korean HWP-family documents into readable Markdown or structured JSON while preserving document structure for LLM context, RAG, government-document review, or enterprise document processing.

Intent signature

  • User asks to convert, parse, read, extract, or transform .hwp, .hwpx, or .hwpml.
  • User mentions Korean word processor files, Hangul documents, government forms, or "한글 파일".
  • User needs headings, tables, nested tables, lists, images, footnotes, or hyperlinks extracted from HWP-family files.

When to use

  • Converting Korean HWP documents (.hwp, .hwpx, .hwpml) to Markdown
  • Preparing Korean government/enterprise documents for LLM context or RAG
  • Extracting structured content (tables, headings, lists, images) from HWP
  • User says "convert this HWP", "parse hwpx", "HWP to markdown", "한글 파일"

When NOT to use

  • PDF files -> use oma-pdf (OCR + Tagged PDF specialization)
  • XLSX / DOCX files -> currently out of scope (may be covered by a future oma-docs)
  • Generating or editing HWP documents -> out of scope
  • Already-text files -> use Read tool directly

Expected inputs

  • input_path: .hwp, .hwpx, or .hwpml file path
  • output_path or output_dir: optional explicit output target
  • format: optional output format, default markdown
  • page_range: optional page or section range
  • kordoc_version: optional pinned kordoc version

Expected outputs

  • Markdown output next to the input file or in the requested directory
  • Optional JSON output when requested
  • Post-processed Markdown with flattened GFM tables and stripped Private Use Area glyphs by default
  • A short report with output path, detected source format, and conversion issues

Dependencies

  • bun and bunx
  • bunx kordoc@latest or configured pinned kordoc version
  • resources/flatten-tables.ts for Markdown cleanup
  • Local filesystem access to input and output paths

Control-flow features

  • Branches by file extension, output target, format, page range, encryption/DRM state, and post-processing requirements
  • Calls external CLI tools through bunx and bun run
  • Reads local HWP-family files and writes local Markdown or JSON output
  • Routes non-HWP inputs to other skills instead of stretching this skill's scope

Structural Flow

Entry

  1. Confirm the input path exists.
  2. Confirm the extension is .hwp, .hwpx, or .hwpml.
  3. Resolve output path or directory and default filename.
  4. Check that bun is available.

Scenes

  1. PREPARE: Validate path, extension, size, output target, and requested format.
  2. ACQUIRE: Detect source format and runtime availability.
  3. ACT: Run kordoc with explicit output target and requested options.
  4. VERIFY: Post-process Markdown and inspect structure for headings, tables, lists, images, and footnotes.
  5. FINALIZE: Report output path, source format, and any conversion limitations.

Transitions

  • If the input is .pdf, stop and route to oma-pdf.
  • If the input is .xlsx or .docx, explain that this skill does not advertise those formats.
  • If bun is unavailable, stop and ask the user to install Bun.
  • If Markdown is produced, run resources/flatten-tables.ts unless the caller explicitly needs HTML tables or PUA glyphs preserved.
  • If output is empty or garbled, consult resources/troubleshooting.md.

Failure and recovery

Failure Recovery
bun or bunx unavailable Ask user to install Bun
Unsupported or mismatched format Check extension and magic bytes, then route or stop
Encrypted or DRM-locked document Report limitation and request an accessible copy when needed
Empty Markdown output Treat as possible scanned-image content and recommend OCR outside this skill
Complex merged tables Accept flattened Markdown or HTML fallback as best effort
Stale kordoc cache Use bunx kordoc@latest or configured pinned version

Exit

  • Success: output file exists and structure is readable after post-processing.
  • Partial success: output exists with explicitly reported table, glyph, encryption, or fidelity limitations.
  • Failure: no reliable output is produced and the blocking cause is reported.

Logical Operations

Actions

Action SSL primitive Evidence
Validate file path and extension VALIDATE Input preflight in execution protocol
Check runtime availability VALIDATE bun --version
Select output target and format SELECT Output behavior and config
Run converter CALL_TOOL bunx kordoc@latest
Write output artifact WRITE Markdown or JSON output
Flatten tables and strip PUA glyphs CALL_TOOL resources/flatten-tables.ts
Inspect extraction quality VALIDATE Verification step
Report result NOTIFY Final user-facing summary

Tools and instruments

  • kordoc: primary HWP-family conversion CLI
  • flatten-tables.ts: post-processing for GFM tables and Hancom PUA cleanup
  • bun / bunx: runtime and CLI executor

Canonical command path

bunx kordoc@latest "{input_path}" -o "{output_path}"
bun run ".agents/skills/oma-hwp/resources/flatten-tables.ts" "{output_path}"

For batch conversion, use an explicit output directory:

bunx kordoc@latest "{input_pattern}" -d "{output_dir}"

Resource scope

Scope Resource target
LOCAL_FS Input HWP-family files and generated outputs
PROCESS bunx kordoc and bun run subprocesses
MEMORY Format decisions, validation notes, and final report

Preconditions

  • Input file exists and is readable.
  • Output location is writable or can be created.
  • bun is installed.
  • kordoc can parse the document or fail with a reportable error.

Effects and side effects

  • Creates Markdown or JSON output files.
  • May flatten merged-cell tables, trading cell fidelity for Markdown compatibility.
  • Strips Private Use Area characters by default because they render as blanks without Hancom fonts.
  • Does not intentionally modify the source HWP-family document.

Guardrails

  1. Always pass @latest or an explicit pinned version to avoid stale bunx cache.
  2. Always pass an explicit output target when the user expects a file.
  3. Do not add custom security defenses around kordoc's ZIP, XML, SSRF, or XSS defenses.
  4. Report missing tables, garbled text, empty output, encrypted segments, and best-effort DRM extraction.
  5. Keep full CLI details in resources/execution-protocol.md and troubleshooting branches in resources/troubleshooting.md.

Supported Formats

Format Extension Notes
HWP 5.x binary .hwp Full support (incl. DRM-locked via kordoc's rhwp-algorithm port)
HWPX .hwpx Full support incl. nested tables, merged cells
HWPML .hwp (XML variant) Auto-detected by signature

kordoc also parses PDF / XLSX / DOCX. Those are intentionally outside this skill's scope — see "When NOT to use".

References

  • Execution protocol: resources/execution-protocol.md
  • Troubleshooting: resources/troubleshooting.md
  • Configuration: config/hwp-config.yaml
  • Upstream: https://github.com/chrisryugj/kordoc
  • Related: ../oma-pdf/SKILL.md (use for .pdf inputs)
Related skills

More from first-fluke/oh-my-agent

Installs
7
GitHub Stars
914
First Seen
Apr 18, 2026