oma-hwp
HWP Skill - HWP / HWPX / HWPML to Markdown Conversion
Scheduling
Goal
Convert Korean HWP-family documents into readable Markdown or structured JSON while preserving document structure for LLM context, RAG, government-document review, or enterprise document processing.
Intent signature
- User asks to convert, parse, read, extract, or transform .hwp, .hwpx, or .hwpml.
- User mentions Korean word processor files, Hangul documents, government forms, or "한글 파일" ("Hangul file").
- User needs headings, tables, nested tables, lists, images, footnotes, or hyperlinks extracted from HWP-family files.
When to use
- Converting Korean HWP documents (.hwp, .hwpx, .hwpml) to Markdown
- Preparing Korean government/enterprise documents for LLM context or RAG
- Extracting structured content (tables, headings, lists, images) from HWP
- User says "convert this HWP", "parse hwpx", "HWP to markdown", "한글 파일"
When NOT to use
- PDF files -> use oma-pdf (OCR + Tagged PDF specialization)
- XLSX / DOCX files -> currently out of scope (may be covered by a future oma-docs)
- Generating or editing HWP documents -> out of scope
- Already-text files -> use Read tool directly
Expected inputs
- input_path: .hwp, .hwpx, or .hwpml file path
- output_path or output_dir: optional explicit output target
- format: optional output format, default markdown
- page_range: optional page or section range
- kordoc_version: optional pinned kordoc version
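The inputs above could be modeled roughly as follows. This is a minimal sketch, not the skill's actual schema: the field names come from the list, while the `"json"` format value, the `"latest"` version default, and the `resolveInput` helper are assumptions made for illustration.

```typescript
// Hypothetical input shape; field names follow the "Expected inputs" list.
// The "json" value is an assumption based on the optional JSON output.
type OutputFormat = "markdown" | "json";

interface HwpSkillInput {
  input_path: string;
  output_path?: string;
  output_dir?: string;
  format?: OutputFormat;
  page_range?: string;
  kordoc_version?: string;
}

interface ResolvedInput {
  input_path: string;
  output_path?: string;
  output_dir?: string;
  format: OutputFormat;
  page_range?: string;
  kordoc_version: string;
}

// Fill in the documented default (format = markdown) and an assumed
// default of "latest" for the kordoc version.
function resolveInput(raw: HwpSkillInput): ResolvedInput {
  if (raw.input_path.trim() === "") {
    throw new Error("input_path is required");
  }
  return {
    ...raw,
    format: raw.format ?? "markdown",
    kordoc_version: raw.kordoc_version ?? "latest",
  };
}
```

Resolving defaults once, up front, keeps later steps free of repeated "is this set?" checks.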
Expected outputs
- Markdown output next to the input file or in the requested directory
- Optional JSON output when requested
- Post-processed Markdown with flattened GFM tables and stripped Private Use Area glyphs by default
- A short report with output path, detected source format, and conversion issues
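One way to derive the default output location is sketched below. The naming rule (same base name with a `.md` or `.json` extension) is an assumption; the source only states that output lands next to the input file or in the requested directory. `defaultOutputPath` is a hypothetical helper.

```typescript
// Hypothetical helper: place the output next to the input, or inside
// outputDir when one is given. The ".md" / ".json" naming is assumed.
function defaultOutputPath(
  inputPath: string,
  format: "markdown" | "json" = "markdown",
  outputDir?: string,
): string {
  const normalized = inputPath.replace(/\\/g, "/");
  const slash = normalized.lastIndexOf("/");
  const dir = slash >= 0 ? normalized.slice(0, slash) : ".";
  const file = normalized.slice(slash + 1);
  const dot = file.lastIndexOf(".");
  const stem = dot > 0 ? file.slice(0, dot) : file;
  const ext = format === "json" ? ".json" : ".md";
  // Drop trailing slashes so the join below never doubles them.
  const targetDir = (outputDir ?? dir).replace(/\\/g, "/").replace(/\/+$/, "");
  return `${targetDir}/${stem}${ext}`;
}
```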
Dependencies
- bun and bunx
- bunx kordoc@latest or configured pinned kordoc version
- resources/flatten-tables.ts for Markdown cleanup
- Local filesystem access to input and output paths
Control-flow features
- Branches by file extension, output target, format, page range, encryption/DRM state, and post-processing requirements
- Calls external CLI tools through bunx and bun run
- Reads local HWP-family files and writes local Markdown or JSON output
- Routes non-HWP inputs to other skills instead of stretching this skill's scope
Structural Flow
Entry
- Confirm the input path exists.
- Confirm the extension is .hwp, .hwpx, or .hwpml.
- Resolve output path or directory and default filename.
- Check that bun is available.
Scenes
- PREPARE: Validate path, extension, size, output target, and requested format.
- ACQUIRE: Detect source format and runtime availability.
- ACT: Run kordoc with explicit output target and requested options.
- VERIFY: Post-process Markdown and inspect structure for headings, tables, lists, images, and footnotes.
- FINALIZE: Report output path, source format, and any conversion limitations.
Transitions
- If the input is .pdf, stop and route to oma-pdf.
- If the input is .xlsx or .docx, explain that this skill does not advertise those formats.
- If bun is unavailable, stop and ask the user to install Bun.
- If Markdown is produced, run resources/flatten-tables.ts unless the caller explicitly needs HTML tables or PUA glyphs preserved.
- If output is empty or garbled, consult resources/troubleshooting.md.
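The extension-based transitions can be sketched as a small routing function. This is an illustrative sketch only; the `Route` type and `routeByExtension` name are hypothetical and not part of the skill's resources.

```typescript
// Hypothetical routing result for the extension-based transitions.
type Route =
  | { kind: "convert" }
  | { kind: "route"; skill: "oma-pdf" }
  | { kind: "out_of_scope"; reason: string }
  | { kind: "unsupported"; reason: string };

// Lowercased extension of the final path segment, or "" when there is none.
function extensionOf(path: string): string {
  const file = path.replace(/\\/g, "/").split("/").pop() ?? "";
  const dot = file.lastIndexOf(".");
  return dot > 0 ? file.slice(dot).toLowerCase() : "";
}

function routeByExtension(inputPath: string): Route {
  const ext = extensionOf(inputPath);
  switch (ext) {
    case ".hwp":
    case ".hwpx":
    case ".hwpml":
      return { kind: "convert" };
    case ".pdf":
      return { kind: "route", skill: "oma-pdf" };
    case ".xlsx":
    case ".docx":
      return { kind: "out_of_scope", reason: `${ext} is not advertised by this skill` };
    default:
      return { kind: "unsupported", reason: `unrecognized extension "${ext}"` };
  }
}
```

Returning a tagged result instead of throwing lets the caller decide how to phrase the stop or hand-off message.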
Failure and recovery
| Failure | Recovery |
|---|---|
| bun or bunx unavailable | Ask user to install Bun |
| Unsupported or mismatched format | Check extension and magic bytes, then route or stop |
| Encrypted or DRM-locked document | Report limitation and request an accessible copy when needed |
| Empty Markdown output | Treat as possible scanned-image content and recommend OCR outside this skill |
| Complex merged tables | Accept flattened Markdown or HTML fallback as best effort |
| Stale kordoc cache | Use bunx kordoc@latest or configured pinned version |
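The "check extension and magic bytes" recovery could look roughly like the sketch below. It assumes that HWP 5.x files use the OLE compound-file container, that HWPX files are ZIP archives, and that HWPML files are XML with an `<HWPML` root element. Other formats share the OLE and ZIP containers, so the result is only a coarse hint. `detectFormat` is a hypothetical helper and does not describe how kordoc detects formats.

```typescript
type DetectedFormat = "hwp5" | "hwpx" | "hwpml" | "unknown";

// Well-known container signatures (also used by non-HWP formats).
const OLE_MAGIC = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1];
const ZIP_MAGIC = [0x50, 0x4b, 0x03, 0x04];

function hasPrefix(bytes: Uint8Array, magic: number[]): boolean {
  return bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
}

// Inspect the first bytes of a file and guess the HWP-family format.
function detectFormat(head: Uint8Array): DetectedFormat {
  if (hasPrefix(head, OLE_MAGIC)) return "hwp5";
  if (hasPrefix(head, ZIP_MAGIC)) return "hwpx";

  // Skip a UTF-8 byte order mark, then read the head as ASCII text.
  const start = hasPrefix(head, [0xef, 0xbb, 0xbf]) ? 3 : 0;
  let text = "";
  for (let i = start; i < Math.min(head.length, start + 256); i++) {
    text += String.fromCharCode(head[i]);
  }
  // Assumption: HWPML documents carry an <HWPML root element near the top.
  return text.includes("<HWPML") ? "hwpml" : "unknown";
}
```

A mismatch between the detected container and the file extension is a reasonable trigger for the "route or stop" branch.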
Exit
- Success: output file exists and structure is readable after post-processing.
- Partial success: output exists with explicitly reported table, glyph, encryption, or fidelity limitations.
- Failure: no reliable output is produced and the blocking cause is reported.
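The three exit states can be expressed as a small classifier. This is a sketch under assumptions: the `ConversionResult` fields are hypothetical, and treating empty output as a failure is one reading of "no reliable output is produced".

```typescript
type Outcome = "success" | "partial_success" | "failure";

// Hypothetical summary of a finished conversion run.
interface ConversionResult {
  outputExists: boolean;
  outputChars: number; // length of the post-processed output
  issues: string[];    // reported table, glyph, encryption, or fidelity limitations
}

function classifyOutcome(result: ConversionResult): Outcome {
  // No file, or an empty one, is not reliable output.
  if (!result.outputExists || result.outputChars === 0) return "failure";
  // Output exists; any reported limitation downgrades it to partial success.
  return result.issues.length > 0 ? "partial_success" : "success";
}
```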
Logical Operations
Actions
| Action | SSL primitive | Evidence |
|---|---|---|
| Validate file path and extension | VALIDATE | Input preflight in execution protocol |
| Check runtime availability | VALIDATE | bun --version |
| Select output target and format | SELECT | Output behavior and config |
| Run converter | CALL_TOOL | bunx kordoc@latest |
| Write output artifact | WRITE | Markdown or JSON output |
| Flatten tables and strip PUA glyphs | CALL_TOOL | resources/flatten-tables.ts |
| Inspect extraction quality | VALIDATE | Verification step |
| Report result | NOTIFY | Final user-facing summary |
Tools and instruments
- kordoc: primary HWP-family conversion CLI
- flatten-tables.ts: post-processing for GFM tables and Hancom PUA cleanup
- bun / bunx: runtime and CLI executor
Canonical command path
bunx kordoc@latest "{input_path}" -o "{output_path}"
bun run ".agents/skills/oma-hwp/resources/flatten-tables.ts" "{output_path}"
For batch conversion, use an explicit output directory:
bunx kordoc@latest "{input_pattern}" -d "{output_dir}"
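When these commands are launched from code, building them as argument arrays avoids shell-quoting problems with paths that contain spaces or Korean characters. The sketch below only constructs the arguments and does not spawn anything. The `-o` and `-d` flags and the script path are taken from the commands above; the helper names and the `KordocCommand` shape are hypothetical.

```typescript
interface KordocCommand {
  command: string;
  args: string[];
}

// Always produce an explicit version specifier to sidestep a stale bunx cache.
function kordocPackage(version?: string): string {
  const pinned = version?.trim();
  return `kordoc@${pinned ? pinned : "latest"}`;
}

function buildConvertCommand(inputPath: string, outputPath: string, version?: string): KordocCommand {
  return { command: "bunx", args: [kordocPackage(version), inputPath, "-o", outputPath] };
}

function buildBatchCommand(inputPattern: string, outputDir: string, version?: string): KordocCommand {
  return { command: "bunx", args: [kordocPackage(version), inputPattern, "-d", outputDir] };
}

function buildFlattenCommand(outputPath: string): KordocCommand {
  return {
    command: "bun",
    args: ["run", ".agents/skills/oma-hwp/resources/flatten-tables.ts", outputPath],
  };
}
```

The resulting `command` and `args` pair can be handed to whichever process-spawning API the host environment provides.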
Resource scope
| Scope | Resource target |
|---|---|
| LOCAL_FS | Input HWP-family files and generated outputs |
| PROCESS | bunx kordoc and bun run subprocesses |
| MEMORY | Format decisions, validation notes, and final report |
Preconditions
- Input file exists and is readable.
- Output location is writable or can be created.
- bun is installed.
- kordoc can parse the document or fail with a reportable error.
Effects and side effects
- Creates Markdown or JSON output files.
- May flatten merged-cell tables, trading cell fidelity for Markdown compatibility.
- Strips Private Use Area characters by default because they render as blanks without Hancom fonts.
- Does not intentionally modify the source HWP-family document.
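The Private Use Area stripping described above can be illustrated with a short function. This is a minimal sketch and not the contents of resources/flatten-tables.ts: it removes code points in the three Unicode private-use ranges (U+E000 to U+F8FF, plus planes 15 and 16) and counts the removals so they can be mentioned in the final report. `stripPrivateUse` is a hypothetical name.

```typescript
// BMP private-use range, plus planes 15 and 16 matched as surrogate pairs
// (high surrogates U+DB80..U+DBFF cover U+F0000..U+10FFFF).
const PRIVATE_USE = /[\uE000-\uF8FF]|[\uDB80-\uDBFF][\uDC00-\uDFFF]/g;

function stripPrivateUse(text: string): { text: string; removed: number } {
  let removed = 0;
  const cleaned = text.replace(PRIVATE_USE, () => {
    removed += 1; // one count per removed code point
    return "";
  });
  return { text: cleaned, removed };
}
```

Ordinary Hangul and emoji fall outside these ranges, so they pass through unchanged.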
Guardrails
- Always pass @latest or an explicit pinned version to avoid stale bunx cache.
- Always pass an explicit output target when the user expects a file.
- Do not add custom security layers on top of kordoc's ZIP, XML, SSRF, and XSS defenses.
- Report missing tables, garbled text, empty output, encrypted segments, and best-effort DRM extraction.
- Keep full CLI details in resources/execution-protocol.md and troubleshooting branches in resources/troubleshooting.md.
Supported Formats
| Format | Extension | Notes |
|---|---|---|
| HWP 5.x binary | .hwp | Full support (incl. DRM-locked via kordoc's rhwp-algorithm port) |
| HWPX | .hwpx | Full support incl. nested tables, merged cells |
| HWPML | .hwp (XML variant) | Auto-detected by signature |
kordoc also parses PDF, XLSX, and DOCX. Those formats are intentionally outside this skill's scope; see "When NOT to use".
References
- Execution protocol: resources/execution-protocol.md
- Troubleshooting: resources/troubleshooting.md
- Configuration: config/hwp-config.yaml
- Upstream: https://github.com/chrisryugj/kordoc
- Related: ../oma-pdf/SKILL.md (use for .pdf inputs)