oma-hwp
HWP Skill - HWP / HWPX / HWPML to Markdown Conversion
Scheduling
Goal
Convert Korean HWP-family documents into readable Markdown or structured JSON while preserving document structure for LLM context, RAG, government-document review, or enterprise document processing.
Intent signature
- User asks to convert, parse, read, extract, or transform .hwp, .hwpx, or .hwpml files.
- User mentions Korean word processor files, Hangul documents, government forms, or "한글 파일" ("Hangul file").
- User needs headings, tables, nested tables, lists, images, footnotes, or hyperlinks extracted from HWP-family files.
When to use
- Converting Korean HWP documents (.hwp, .hwpx, .hwpml) to Markdown
- Preparing Korean government/enterprise documents for LLM context or RAG
- Extracting structured content (tables, headings, lists, images) from HWP
- User says "convert this HWP", "parse hwpx", "HWP to markdown", or "한글 파일" ("Hangul file")
When NOT to use
- PDF files -> use oma-pdf (OCR + Tagged PDF specialization)
- XLSX / DOCX files -> currently out of scope (may be covered by a future oma-docs skill)
- Generating or editing HWP documents -> out of scope
- Files that are already plain text -> use the Read tool directly
Expected inputs
- input_path: .hwp, .hwpx, or .hwpml file path
- output_path or output_dir: optional explicit output target
- format: optional output format, default markdown
- page_range: optional page or section range
- kordoc_version: optional pinned kordoc version
Expected outputs
- Markdown output next to the input file or in the requested directory
- Optional JSON output when requested
- Post-processed Markdown with flattened GFM tables and stripped Private Use Area glyphs by default
- A short report with output path, detected source format, and conversion issues
Dependencies
- bun and bunx
- bunx kordoc@latest or a configured pinned kordoc version
- resources/flatten-tables.ts for Markdown cleanup
- Local filesystem access to input and output paths
Control-flow features
- Branches by file extension, output target, format, page range, encryption/DRM state, and post-processing requirements
- Calls external CLI tools through bunx and bun run
- Reads local HWP-family files and writes local Markdown or JSON output
- Routes non-HWP inputs to other skills instead of stretching this skill's scope
Structural Flow
Entry
- Confirm the input path exists.
- Confirm the extension is .hwp, .hwpx, or .hwpml.
- Resolve the output path or directory and the default filename.
- Check that bun is available.
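The extension check and default-filename step can be sketched as a small path helper. This is an illustration only: resolveOutputPath is a hypothetical name (not part of kordoc or this skill's resources), and the rule "write <basename>.md or <basename>.json next to the input unless an output directory is given" is an assumption drawn from the Expected outputs section, not kordoc's documented default.

```typescript
import * as path from "node:path";

// Extensions this skill accepts, compared case-insensitively.
const HWP_EXTENSIONS = new Set([".hwp", ".hwpx", ".hwpml"]);

// Hypothetical helper (not part of kordoc): derive the output file for one input.
// Assumption: the default output sits next to the input with its extension swapped.
function resolveOutputPath(
  inputPath: string,
  format: "markdown" | "json" = "markdown",
  outputDir?: string,
): string {
  const ext = path.extname(inputPath);
  if (!HWP_EXTENSIONS.has(ext.toLowerCase())) {
    throw new Error(`Unsupported extension: ${ext || "(none)"}`);
  }
  const outExt = format === "json" ? ".json" : ".md";
  const base = path.basename(inputPath, ext);
  // An explicit output directory wins; otherwise write beside the input.
  const dir = outputDir ?? path.dirname(inputPath);
  return path.join(dir, base + outExt);
}
```

On a POSIX system, resolveOutputPath("/docs/report.hwpx") yields "/docs/report.md", and a .pdf input throws so it can be routed elsewhere.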
Scenes
- PREPARE: Validate path, extension, size, output target, and requested format.
- ACQUIRE: Detect source format and runtime availability.
- ACT: Run kordoc with an explicit output target and the requested options.
- VERIFY: Post-process Markdown and inspect structure for headings, tables, lists, images, and footnotes.
- FINALIZE: Report output path, source format, and any conversion limitations.
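The VERIFY scene's structure inspection could start from a rough line-based count like the one below. It is a heuristic sketch, not a Markdown parser, and it assumes GFM-style output (pipe tables, ![alt](src) images, [^n]: footnote definitions); inspectMarkdown and StructureReport are hypothetical names, not part of kordoc or flatten-tables.ts.

```typescript
// Hypothetical structure summary for the VERIFY scene.
// Counts are heuristic line-based matches, not a full Markdown parse.
interface StructureReport {
  headings: number;
  tableRows: number;
  listItems: number;
  images: number;
  footnotes: number;
  isEmpty: boolean;
}

function inspectMarkdown(md: string): StructureReport {
  const lines = md.split(/\r?\n/);
  const count = (re: RegExp) => lines.filter((l) => re.test(l)).length;
  return {
    headings: count(/^#{1,6}\s+\S/), // ATX headings
    tableRows: count(/^\s*\|.*\|\s*$/), // pipe-table rows, including the delimiter row
    listItems: count(/^\s*(?:[-*+]|\d+[.)])\s+\S/), // bullet and ordered items
    images: (md.match(/!\[[^\]]*\]\([^)]*\)/g) ?? []).length,
    footnotes: count(/^\[\^[^\]]+\]:/), // footnote definitions only
    isEmpty: md.trim().length === 0,
  };
}
```

An isEmpty result corresponds to the "Empty Markdown output" failure case, and a document expected to contain tables that reports tableRows of 0 is worth flagging in the final report.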
Transitions
- If the input is .pdf, stop and route to oma-pdf.
- If the input is .xlsx or .docx, explain that this skill does not advertise those formats.
- If bun is unavailable, stop and ask the user to install Bun.
- If Markdown is produced, run resources/flatten-tables.ts unless the caller explicitly needs HTML tables or PUA glyphs preserved.
- If output is empty or garbled, consult resources/troubleshooting.md.
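The extension-based transitions above amount to a small routing table. The sketch below encodes them as a discriminated union; routeByExtension and the Route shape are hypothetical illustrations, and the catch-all branch for other extensions is an assumption, since the transitions only name .pdf, .xlsx, and .docx.

```typescript
// Hypothetical routing result for the extension-based transitions.
type Route =
  | { kind: "convert"; ext: string }
  | { kind: "route"; skill: "oma-pdf" }
  | { kind: "out_of_scope"; reason: string };

function routeByExtension(inputPath: string): Route {
  // Take the final extension only; a dot inside a directory name is ignored.
  const match = /\.([^./\\]+)$/.exec(inputPath);
  const ext = match ? "." + match[1].toLowerCase() : "";
  switch (ext) {
    case ".hwp":
    case ".hwpx":
    case ".hwpml":
      return { kind: "convert", ext };
    case ".pdf":
      return { kind: "route", skill: "oma-pdf" };
    case ".xlsx":
    case ".docx":
      return { kind: "out_of_scope", reason: `${ext} is not advertised by this skill` };
    default:
      // Assumption: any other extension is treated as unsupported.
      return { kind: "out_of_scope", reason: `unsupported extension ${ext || "(none)"}` };
  }
}
```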
Failure and recovery
| Failure | Recovery |
|---|---|
| bun or bunx unavailable | Ask the user to install Bun |
| Unsupported or mismatched format | Check extension and magic bytes, then route or stop |
| Encrypted or DRM-locked document | Report limitation and request an accessible copy when needed |
| Empty Markdown output | Treat as possible scanned-image content and recommend OCR outside this skill |
| Complex merged tables | Accept flattened Markdown or HTML fallback as best effort |
| Stale kordoc cache | Use bunx kordoc@latest or configured pinned version |
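The "check extension and magic bytes" recovery step can be sketched with well-known container signatures: OLE2 compound files start with D0 CF 11 E0 A1 B1 1A E1, ZIP archives with "PK\x03\x04", PDFs with "%PDF-", and XML text usually with "<?xml". A signature identifies only the container, not the document type (legacy .doc files are also OLE2, and DOCX/XLSX are also ZIP), so this is a mismatch check rather than proof of HWP content. The helper names and the per-extension expectations are assumptions based on the Supported Formats table, and the XML check assumes UTF-8 text with an optional byte-order mark.

```typescript
type Container = "ole2" | "zip" | "xml" | "pdf" | "unknown";

// Well-known leading byte signatures.
const SIGNATURES: Array<[Container, number[]]> = [
  ["ole2", [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]],
  ["zip", [0x50, 0x4b, 0x03, 0x04]],
  ["pdf", [0x25, 0x50, 0x44, 0x46, 0x2d]], // "%PDF-"
  ["xml", [0x3c, 0x3f, 0x78, 0x6d, 0x6c]], // "<?xml"
];

function hasPrefix(buf: Uint8Array, sig: number[], offset: number): boolean {
  return buf.length >= offset + sig.length && sig.every((b, i) => buf[offset + i] === b);
}

// Hypothetical sniffer over the first bytes of a file.
function sniffContainer(head: Uint8Array): Container {
  // Skip a UTF-8 byte-order mark if present (relevant for XML text).
  const offset = hasPrefix(head, [0xef, 0xbb, 0xbf], 0) ? 3 : 0;
  for (const [name, sig] of SIGNATURES) {
    if (hasPrefix(head, sig, offset)) return name;
  }
  return "unknown";
}

// Assumed container per extension; .hwp may hold binary HWP 5.x or HWPML XML.
const EXPECTED: Record<string, Container[]> = {
  ".hwp": ["ole2", "xml"],
  ".hwpx": ["zip"],
  ".hwpml": ["xml"],
};

function extensionMatches(ext: string, head: Uint8Array): boolean {
  const allowed = EXPECTED[ext.toLowerCase()];
  return allowed !== undefined && allowed.indexOf(sniffContainer(head)) !== -1;
}
```

A false result from extensionMatches is the cue to route or stop rather than hand the file to kordoc under the wrong assumption.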
Exit
- Success: output file exists and structure is readable after post-processing.
- Partial success: output exists with explicitly reported table, glyph, encryption, or fidelity limitations.
- Failure: no reliable output is produced and the blocking cause is reported.
Logical Operations
Actions
| Action | SSL primitive | Evidence |
|---|---|---|
| Validate file path and extension | VALIDATE | Input preflight in execution protocol |
| Check runtime availability | VALIDATE | bun --version |
| Select output target and format | SELECT | Output behavior and config |
| Run converter | CALL_TOOL | bunx kordoc@latest |
| Write output artifact | WRITE | Markdown or JSON output |
| Flatten tables and strip PUA glyphs | CALL_TOOL | resources/flatten-tables.ts |
| Inspect extraction quality | VALIDATE | Verification step |
| Report result | NOTIFY | Final user-facing summary |
Tools and instruments
- kordoc: primary HWP-family conversion CLI
- flatten-tables.ts: post-processing for GFM tables and Hancom PUA cleanup
- bun / bunx: runtime and CLI executor
Canonical command path
bunx kordoc@latest "{input_path}" -o "{output_path}"
bun run ".agents/skills/oma-hwp/resources/flatten-tables.ts" "{output_path}"
For batch conversion, use an explicit output directory:
bunx kordoc@latest "{input_pattern}" -d "{output_dir}"
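When a wrapper script drives the conversion, the guardrails (always a version tag, always an explicit output target) can be enforced while assembling the arguments. The sketch uses only the -o and -d flags shown in the commands above; buildKordocArgs and KordocOptions are hypothetical names, and "1.2.3" in the usage note is a placeholder version, not a known kordoc release.

```typescript
// Hypothetical options for one kordoc invocation.
interface KordocOptions {
  input: string; // file path or pattern
  outputPath?: string; // single-file target (-o)
  outputDir?: string; // batch target (-d)
  kordocVersion?: string; // pinned version; defaults to "latest"
}

// Builds the argument list that follows "bunx" in the canonical command.
function buildKordocArgs(opts: KordocOptions): string[] {
  if (!opts.outputPath && !opts.outputDir) {
    throw new Error("Pass an explicit output target (-o or -d)");
  }
  if (opts.outputPath && opts.outputDir) {
    throw new Error("Use either -o or -d, not both");
  }
  // Always carry a version tag so bunx does not reuse a stale cached kordoc.
  const version = opts.kordocVersion?.trim() || "latest";
  const args = [`kordoc@${version}`, opts.input];
  if (opts.outputPath) args.push("-o", opts.outputPath);
  else args.push("-d", opts.outputDir as string);
  return args;
}
```

For example, buildKordocArgs({ input: "*.hwpx", outputDir: "out", kordocVersion: "1.2.3" }) returns ["kordoc@1.2.3", "*.hwpx", "-d", "out"]. Passing an array (for instance to Bun.spawn(["bunx", ...args])) rather than a single shell string also avoids quoting problems with Korean or space-containing filenames.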
Resource scope
| Scope | Resource target |
|---|---|
| LOCAL_FS | Input HWP-family files and generated outputs |
| PROCESS | bunx kordoc and bun run subprocesses |
| MEMORY | Format decisions, validation notes, and final report |
Preconditions
- Input file exists and is readable.
- Output location is writable or can be created.
- bun is installed.
- kordoc can parse the document or fail with a reportable error.
Effects and side effects
- Creates Markdown or JSON output files.
- May flatten merged-cell tables, trading cell fidelity for Markdown compatibility.
- Strips Private Use Area characters by default because they render as blanks without Hancom fonts.
- Does not intentionally modify the source HWP-family document.
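The PUA-stripping part of post-processing can be sketched as a plain character deletion. This does not reproduce resources/flatten-tables.ts, whose source is not shown here; stripPrivateUse is a hypothetical name, and treating "strip" as outright deletion (rather than mapping glyphs to standard characters) is an assumption. The pattern avoids the regex u flag by matching supplementary-plane PUA code points as surrogate pairs.

```typescript
// Matches Unicode Private Use Area code points in a UTF-16 string:
// - BMP PUA: U+E000..U+F8FF
// - Supplementary PUA planes 15-16 (U+F0000..U+10FFFF) as surrogate pairs.
//   The pair range also covers the four plane-15/16 noncharacters (accepted over-match).
const PUA_PATTERN = /[\uE000-\uF8FF]|[\uDB80-\uDBFF][\uDC00-\uDFFF]/g;

// Hypothetical helper: delete PUA glyphs and count them for the final report.
function stripPrivateUse(text: string): { text: string; removed: number } {
  let removed = 0;
  const cleaned = text.replace(PUA_PATTERN, () => {
    removed += 1;
    return "";
  });
  return { text: cleaned, removed };
}
```

Returning the count makes it easy to report how many glyphs were dropped, which matters when a caller later decides that PUA glyphs should have been preserved.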
Guardrails
- Always pass @latest or an explicit pinned version to avoid a stale bunx cache.
- Always pass an explicit output target when the user expects a file.
- Do not layer custom security defenses on top of kordoc's built-in ZIP, XML, SSRF, and XSS protections.
- Report missing tables, garbled text, empty output, encrypted segments, and best-effort DRM extraction.
- Keep full CLI details in resources/execution-protocol.md and troubleshooting branches in resources/troubleshooting.md.
Supported Formats
| Format | Extension | Notes |
|---|---|---|
| HWP 5.x binary | .hwp | Full support (incl. DRM-locked via kordoc's rhwp-algorithm port) |
| HWPX | .hwpx | Full support, incl. nested tables and merged cells |
| HWPML | .hwp (XML variant) | Auto-detected by signature |
kordoc also parses PDF / XLSX / DOCX. Those formats are intentionally outside this skill's scope; see "When NOT to use".
References
- Execution protocol: resources/execution-protocol.md
- Troubleshooting: resources/troubleshooting.md
- Configuration: config/hwp-config.yaml
- Upstream: https://github.com/chrisryugj/kordoc
- Related: ../oma-pdf/SKILL.md (use for .pdf inputs)