extract-slide-text
Extract slide text from PDF
Run the extract_slide_text.py script to extract the text content of each PDF page into a structured markdown file:
uv run .agents/skills/extract-slide-text/extract_slide_text.py <pdf_path> <output_path> [images_dir]
Arguments
- pdf_path (required): Path to the PDF file.
- output_path (required): Path to write the output markdown file.
- images_dir (optional): Path to the slide images directory. Used to generate correct relative image references. Defaults to slide_images/.
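As an illustration of the argument contract above, here is a minimal argparse sketch. It is not the script's actual source: the function name build_parser and the sample file names are hypothetical, and only the three argument names and the slide_images/ default come from the description above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented arguments."""
    parser = argparse.ArgumentParser(
        prog="extract_slide_text.py",
        description="Extract the text of each PDF page into a markdown file.",
    )
    # Two required positional arguments.
    parser.add_argument("pdf_path", help="Path to the PDF file.")
    parser.add_argument("output_path", help="Path to write the output markdown file.")
    # Optional third positional; nargs="?" lets it be omitted.
    parser.add_argument(
        "images_dir",
        nargs="?",
        default="slide_images/",
        help="Path to the slide images directory.",
    )
    return parser


# Example: omitting images_dir falls back to the documented default.
args = build_parser().parse_args(["talk.pdf", "slide_text.md"])
print(args.pdf_path, args.output_path, args.images_dir)
```

Passing a third value (for example `["talk.pdf", "slide_text.md", "img"]`) would override the default directory.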
Output format
A markdown file with one section per slide:
## Slide 1

\```
Extracted text content from slide 1
\```
## Slide 2

\```
Extracted text content from slide 2
\```
Pages with no extractable text (e.g., full-bleed images) show (no extractable text).
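The output format can be sketched in a few lines of Python. This is an assumption-laden illustration, not the script's real implementation: the helper names (split_pages, format_slides, extract) are hypothetical, it relies on pdftotext ending each page with a form feed character, it assumes the placeholder is written inside the code fence, and it leaves out the image references that images_dir is used for.

```python
import subprocess

FENCE = "`" * 3  # a markdown code fence, built here to keep this example readable
PLACEHOLDER = "(no extractable text)"


def split_pages(raw: str) -> list:
    """Split pdftotext output into one stripped string per page."""
    pages = raw.split("\f")
    # The last page also ends with a form feed, which leaves an empty tail.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return [page.strip() for page in pages]


def format_slides(pages: list) -> str:
    """Render pages as '## Slide N' sections, each with a fenced text block."""
    sections = []
    for number, text in enumerate(pages, start=1):
        # Assumption: empty pages get the placeholder inside the fence.
        body = text if text else PLACEHOLDER
        sections.append(f"## Slide {number}\n\n{FENCE}\n{body}\n{FENCE}\n")
    return "\n".join(sections)


def extract(pdf_path: str) -> str:
    """Run pdftotext (writing to stdout via '-') and format its output."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        check=True,
        capture_output=True,
        text=True,
    )
    return format_slides(split_pages(result.stdout))
```

Keeping the formatting separate from the subprocess call means the page-to-markdown step can be exercised with plain strings, without Poppler installed.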
Why this matters
PDF text extraction is deterministic: it reads the text embedded in the PDF and produces ground-truth slide content without relying on vision models. This prevents embedded screenshots or demo captures from being misidentified as actual slide content, a common failure mode of image-only slide analysis.
Prerequisites
Poppler utilities must be installed (provides the pdftotext command):
- macOS: brew install poppler
- Ubuntu: apt-get install poppler-utils
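A quick way to confirm the prerequisite before running the script is to look for pdftotext on the PATH. The sketch below is only a convenience check; the function names find_pdftotext and install_hint are hypothetical, and the hints simply repeat the two install commands listed above.

```python
import shutil
import sys
from typing import Optional


def find_pdftotext() -> Optional[str]:
    """Return the full path to pdftotext, or None if it is not on PATH."""
    return shutil.which("pdftotext")


def install_hint(platform_name: str) -> str:
    """Map a sys.platform value to one of the install commands listed above."""
    if platform_name == "darwin":
        return "brew install poppler"
    # Assumption: any other platform is treated as Ubuntu; other distros differ.
    return "apt-get install poppler-utils"


if __name__ == "__main__":
    path = find_pdftotext()
    if path is None:
        print("pdftotext not found. Try:", install_hint(sys.platform))
    else:
        print("pdftotext found at", path)
```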