oma-pdf
PDF Skill - PDF to Markdown Conversion
Scheduling
Goal
Convert PDF files into structured Markdown or another requested extraction format while preserving readable document structure for LLM context, RAG, or downstream review.
Intent signature
- User asks to convert, parse, read, extract, or transform a PDF.
- User needs PDF text, headings, lists, tables, or images prepared for AI consumption.
- User mentions "PDF to markdown", "parse PDF", "read this PDF", or equivalent wording.
When to use
- Converting PDF documents to Markdown for LLM context or RAG
- Extracting structured content such as tables, headings, lists, images, footnotes, or hyperlinks
- Preparing PDF data for AI consumption
- Checking whether a PDF has a text layer before choosing OCR
When NOT to use
- Generating or creating PDFs -> use document-generation tools
- Editing existing PDFs -> out of scope
- Reading an already-text file -> use direct file reading
- Processing HWP, HWPX, DOCX, XLSX, or slide decks -> use the matching document skill
Expected inputs
- `input_path`: PDF file or folder path
- `output_dir`: optional target directory
- `format`: optional output format, default `markdown`
- `ocr_languages`: optional OCR language list for scanned or image-based PDFs
- `extraction_options`: optional flags for tagged structure, image extraction, or hybrid conversion
Expected outputs
- Markdown, text, JSON, HTML, or combined extraction output
- Normalized Markdown when Markdown is produced
- A short report with output path, page count, and conversion issues
Dependencies
- `uvx opendataloader-pdf` for standard conversion
- `uvx opendataloader-pdf-hybrid` for OCR or hybrid conversion
- `uvx mdformat` for Markdown normalization
- Local filesystem access to input and output paths
- Optional OCR runtime via the hybrid server
Control-flow features
- Branches on text-layer quality, tagged PDF availability, scan/OCR needs, and user-requested output format
- Calls external CLI tools through `uvx`
- Reads local files and writes local extraction outputs
- Uses a hybrid server only when OCR or complex extraction needs justify it
Structural Flow
Entry
- Confirm that the input path exists and is a PDF file, PDF folder, or supported batch input.
- Check file size and warn when the input is large enough to risk slow conversion or memory pressure.
- Resolve `output_dir` and the expected output filename.
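The entry checks above can be sketched as a small preflight function. This is a minimal illustration, not the skill's actual implementation: the `preflight` name, the 50 MiB `LARGE_FILE_BYTES` threshold, the default output folder, and the `<stem>.md` output naming are all assumptions, since the source does not specify them.

```python
from pathlib import Path
from typing import Optional

# Hypothetical threshold; the source does not say what counts as "large".
LARGE_FILE_BYTES = 50 * 1024 * 1024


def preflight(input_path: str, output_dir: Optional[str] = None) -> dict:
    """Validate the input and resolve the output location before conversion."""
    src = Path(input_path)
    if not src.exists():
        raise FileNotFoundError(f"Input path does not exist: {src}")

    if src.is_dir():
        # Folder input: collect the PDF files directly inside it.
        pdfs = sorted(p for p in src.iterdir() if p.suffix.lower() == ".pdf")
        if not pdfs:
            raise ValueError(f"No PDF files found in folder: {src}")
    elif src.suffix.lower() == ".pdf":
        pdfs = [src]
    else:
        raise ValueError(f"Not a PDF file: {src}")

    # Assumption: default to the input's own folder when output_dir is omitted.
    out_dir = Path(output_dir) if output_dir else (src if src.is_dir() else src.parent)

    warnings = [
        f"{p.name} is large ({p.stat().st_size} bytes); conversion may be slow"
        for p in pdfs
        if p.stat().st_size > LARGE_FILE_BYTES
    ]
    # Assumption: one Markdown file per PDF, named after the PDF's stem.
    outputs = [out_dir / f"{p.stem}.md" for p in pdfs]
    return {"pdfs": pdfs, "output_dir": out_dir, "outputs": outputs, "warnings": warnings}
```

Raising early on a missing or non-PDF path keeps later stages from launching `uvx` on input that cannot succeed.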
Scenes
- PREPARE: Validate the input path, output target, and requested extraction options.
- ACQUIRE: Assess whether the PDF has a readable text layer by extracting a text preview.
- ACT: Convert using standard mode, tagged-structure mode, or hybrid OCR mode.
- VERIFY: Run `mdformat` for Markdown output and inspect the result for readable structure.
- FINALIZE: Report output path, page count, format, and any extraction quality issues.
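The ACQUIRE scene depends on judging whether a text preview is readable. The source does not define that test, so the following is one possible heuristic under stated assumptions: the `looks_readable` name and both thresholds (`min_chars`, `min_ratio`) are hypothetical and would need tuning against real documents.

```python
def looks_readable(preview: str, min_chars: int = 40, min_ratio: float = 0.7) -> bool:
    """Heuristic check that an extracted text preview resembles real prose.

    Returns False when there is too little text (likely a scanned PDF) or
    when most characters are neither alphanumeric, whitespace, nor common
    punctuation (likely a broken text layer).
    """
    text = preview.strip()
    if len(text) < min_chars:
        return False  # little or no text layer

    plausible = sum(
        ch.isalnum() or ch.isspace() or ch in ".,;:!?'\"()-"
        for ch in text
    )
    return plausible / len(text) >= min_ratio
```

Because `str.isalnum` is Unicode-aware, non-Latin scripts such as Korean count as plausible text, while replacement characters and symbol noise do not.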
Transitions
- If the preview text is readable, use standard conversion.
- If the PDF is tagged and standard output is garbled, retry with `--use-struct-tree`.
- If the PDF is scanned or image-based, start or reuse the hybrid OCR server and convert with hybrid mode.
- If conversion fails because the PDF is encrypted, stop and ask for the password or an unlocked copy.
- If conversion hits memory or size limits, process smaller page ranges or batches.
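The mode-selection part of these transitions can be written as a small pure function. This is a sketch only: `Mode` and `choose_mode` are illustrative names, and the fallback to hybrid mode for garbled untagged PDFs follows the recovery table rather than an explicit transition rule. Encrypted inputs and memory limits are deliberately left out, since they stop or split the job instead of choosing a mode.

```python
from enum import Enum


class Mode(Enum):
    STANDARD = "standard"
    TAGGED = "tagged"    # retry with --use-struct-tree
    HYBRID = "hybrid"    # OCR via the hybrid server


def choose_mode(preview_readable: bool, is_tagged: bool,
                standard_garbled: bool = False) -> Mode:
    """Pick a conversion mode from the text-layer probe and any prior attempt."""
    if not preview_readable:
        # No usable text layer: treat as scanned or image-based.
        return Mode.HYBRID
    if standard_garbled:
        # Standard conversion already ran and produced garbled output.
        return Mode.TAGGED if is_tagged else Mode.HYBRID
    return Mode.STANDARD
```

Keeping the decision free of I/O makes each branch easy to check in isolation before any converter is invoked.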
Failure and recovery
| Failure | Recovery |
|---|---|
| `uvx` unavailable | Ask user to install `uv` before conversion |
| Password-protected PDF | Ask for password or unlocked PDF |
| Garbled output | Retry with tagged structure or hybrid mode |
| Missing tables | Retry with hybrid mode for complex or borderless tables |
| OCR language mismatch | Retry with explicit OCR languages, for example ko,en |
| Large file or memory pressure | Split into page ranges or batch smaller inputs |
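For the last recovery row, splitting a document into page ranges is simple arithmetic. The helper below is a hypothetical sketch that only computes 1-based inclusive ranges. How a range is passed to the converter is not stated in the source, so that step is left out.

```python
from typing import List, Tuple


def page_ranges(page_count: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split pages 1..page_count into inclusive (first, last) chunks."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (first, min(first + chunk_size - 1, page_count))
        for first in range(1, page_count + 1, chunk_size)
    ]
```

For example, `page_ranges(10, 4)` yields `[(1, 4), (5, 8), (9, 10)]`, so the final chunk is shorter when the page count is not a multiple of the chunk size.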
Exit
- Success: output file exists, Markdown is formatted when applicable, and extracted structure is readable.
- Partial success: output exists but quality issues are reported explicitly.
- Failure: no reliable output is produced and the blocking cause is reported.
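The three exit states can be derived from two observations: whether the output file exists and whether any quality issues were recorded. The sketch below assumes a hypothetical `classify_exit` helper and string labels. It does not check Markdown formatting or structural readability, which the source also requires for full success.

```python
from pathlib import Path
from typing import Sequence


def classify_exit(output_path: str, issues: Sequence[str]) -> str:
    """Map the conversion result onto success, partial_success, or failure."""
    if not Path(output_path).is_file():
        # No output artifact: nothing reliable was produced.
        return "failure"
    # Output exists; any recorded issue downgrades the result.
    return "partial_success" if issues else "success"
```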
Logical Operations
Actions
| Action | SSL primitive | Evidence |
|---|---|---|
| Validate path and options | `VALIDATE` | Input preflight in execution protocol |
| Probe text layer | `READ` | Text preview extraction |
| Choose conversion strategy | `SELECT` | Standard, tagged, or hybrid mode decision |
| Run converter | `CALL_TOOL` | `uvx opendataloader-pdf` |
| Start OCR server | `CALL_TOOL` | `uvx opendataloader-pdf-hybrid` |
| Write output artifact | `WRITE` | Markdown, text, JSON, or HTML output |
| Normalize Markdown | `CALL_TOOL` | `uvx mdformat` |
| Inspect extraction quality | `VALIDATE` | Structure/readability verification |
| Report result | `NOTIFY` | Final user-facing summary |
Tools and instruments
- `opendataloader-pdf`: primary PDF extraction CLI
- `opendataloader-pdf-hybrid`: hybrid OCR and complex extraction path
- `mdformat`: Markdown normalization
- Filesystem commands such as `file`, `wc`, or `pdfinfo` may be used for preflight when available
Canonical command path
uvx opendataloader-pdf "{input_path}" --format markdown --output-dir "{output_dir}"
uvx mdformat "{output_path}"
For scanned/image-based PDFs, start OCR first and then convert through hybrid mode:
uvx opendataloader-pdf-hybrid --port 5002 --force-ocr --ocr-lang "{languages}"
uvx opendataloader-pdf --hybrid docling-fast "{input_path}" --format markdown --output-dir "{output_dir}"
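When these commands are driven from a script, building each one as an argument list avoids quoting problems with paths that contain spaces. The builders below are a sketch that mirrors only the flags shown above. The function names are hypothetical and nothing here is part of the tools themselves.

```python
import shlex
from typing import List


def standard_cmd(input_path: str, output_dir: str) -> List[str]:
    """Standard conversion to Markdown."""
    return ["uvx", "opendataloader-pdf", input_path,
            "--format", "markdown", "--output-dir", output_dir]


def hybrid_server_cmd(languages: str, port: int = 5002) -> List[str]:
    """Start the hybrid OCR server; languages is a list such as 'ko,en'."""
    return ["uvx", "opendataloader-pdf-hybrid", "--port", str(port),
            "--force-ocr", "--ocr-lang", languages]


def hybrid_convert_cmd(input_path: str, output_dir: str) -> List[str]:
    """Convert through hybrid mode once the OCR server is running."""
    return ["uvx", "opendataloader-pdf", "--hybrid", "docling-fast", input_path,
            "--format", "markdown", "--output-dir", output_dir]


def mdformat_cmd(output_path: str) -> List[str]:
    """Normalize the generated Markdown file."""
    return ["uvx", "mdformat", output_path]


if __name__ == "__main__":
    # Print shell-safe forms for inspection; nothing is executed here.
    for cmd in (standard_cmd("my report.pdf", "out"),
                mdformat_cmd("out/my report.md")):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` so that a non-zero exit status raises instead of being silently ignored.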
Resource scope
| Scope | Resource target |
|---|---|
| `LOCAL_FS` | Input PDFs and generated output files |
| `PROCESS` | `uvx` subprocesses and optional hybrid server |
| `MEMORY` | Extracted previews and validation notes |
| `OTHER` | OCR model/runtime behavior inside hybrid mode |
Preconditions
- The input PDF path exists and is readable.
- The output location is writable or can be created.
- Required CLIs are available through `uvx`.
- OCR is only attempted when hybrid mode is available or can be started.
Effects and side effects
- Creates or overwrites extraction output depending on configuration and user intent.
- May start a local hybrid OCR server on the configured port.
- May consume significant CPU, memory, or time for large or scanned PDFs.
- Does not intentionally modify the source PDF.
Guardrails
- Do not invent missing content when extraction is incomplete.
- Always report garbled text, missing tables, OCR uncertainty, or partial extraction.
- Prefer standard conversion first when the text layer is readable.
- Use OCR only when the PDF is scanned, image-based, or standard extraction quality is insufficient.
- Keep detailed command sequences in `resources/execution-protocol.md` rather than duplicating every variant here.
References
- Execution protocol: `resources/execution-protocol.md`
- Configuration: `config/pdf-config.yaml`
- Context loading: `../_shared/core/context-loading.md`
- Quality principles: `../_shared/core/quality-principles.md`