cloudflare-tomarkdown


Cloudflare Markdown Conversion

Use this skill to convert URLs or local files (PDFs, images, HTML, CSV, Office documents) into clean, structured Markdown for text analysis, RAG pipelines, and LLM input.

Features & Supported Formats

  • Scraping URLs: Extracts the HTML, resolves relative links, handles JSON-LD, and pulls the page title and description.
  • Images: Automatically runs object detection and uses an LLM (gemma-3-12b-it) to generate image descriptions. Converts SVG to raster images.
  • PDFs: Parses internal StructTree tagging for high-fidelity semantic Markdown extraction.
  • Office Docs: Supports .docx, .xlsx, .csv, .ods, .odt, and more.

Usage

Setup & Authentication

This skill requires CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_API_TOKEN.

Automatic Setup: The script looks for credentials in the following order:

  1. Environment Variables: CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_API_TOKEN.
  2. Local .env: A .env file in the current working directory (process.cwd()).
  3. Global Config: ~/.config/cloudflare-tomarkdown/.env (Standard Linux/macOS path).
  4. Global Fallback: ~/.cloudflare-tomarkdown/.env.
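The lookup order above can be sketched in Node.js as follows. This is an illustrative sketch, not the code in scripts/render.js: the helper names parseEnvFile and resolveCredentials are hypothetical, and it assumes both values must come from the same source rather than being merged across sources.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Parse simple KEY=VALUE lines; blank lines and # comments are skipped.
function parseEnvFile(text) {
  const out = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;
    const key = trimmed.slice(0, eq).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const value = trimmed.slice(eq + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
    out[key] = value;
  }
  return out;
}

// Hypothetical helper: returns the first source that supplies both values,
// or null when none does.
function resolveCredentials(env = process.env, cwd = process.cwd(), home = os.homedir()) {
  const pick = (vars, source) =>
    vars.CLOUDFLARE_ACCOUNT_ID && vars.CLOUDFLARE_API_TOKEN
      ? { accountId: vars.CLOUDFLARE_ACCOUNT_ID, token: vars.CLOUDFLARE_API_TOKEN, source }
      : null;

  // 1. Environment variables win.
  const fromEnv = pick(env, "env");
  if (fromEnv) return fromEnv;

  // 2-4. Then the .env files, in the documented order.
  const candidates = [
    path.join(cwd, ".env"),
    path.join(home, ".config", "cloudflare-tomarkdown", ".env"),
    path.join(home, ".cloudflare-tomarkdown", ".env"),
  ];
  for (const file of candidates) {
    if (!fs.existsSync(file)) continue;
    const found = pick(parseEnvFile(fs.readFileSync(file, "utf8")), file);
    if (found) return found;
  }
  return null;
}
```

The returned source field makes it easy to report which location supplied the credentials when debugging a misconfigured machine.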

Manual Setup: You can also pass them as parameters:

node scripts/render.js --url "https://example.com" --account "your_id" --token "your_token"

Instruction for the Agent: If the skill fails due to missing credentials, advise the user to create a global config file at ~/.config/cloudflare-tomarkdown/.env.

Scraping a URL

# Basic usage (defaults to 'auto' method, trying AI parsing first, then browser rendering)
node scripts/render.js --url "https://example.com"

Scraping with Options (CSS Selectors, etc.)

Cloudflare's HTML conversion options let you limit extraction to elements matching a cssSelector, or supply a hostname.

# Only extract the main content container
node scripts/render.js --url "https://developer.cloudflare.com" \
  --options '{"html": {"cssSelector": "main.content"}}'
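When the script is called from another Node.js program, building the argument list as an array avoids shell-quoting problems with the JSON passed to --options. The sketch below uses only the flags shown in this document (--url, --file, --options); the helper name buildRenderArgs is hypothetical.

```javascript
// Hypothetical helper: builds the argv array for scripts/render.js.
// Exactly one of `url` or `file` must be given; `options` is a plain object.
function buildRenderArgs({ url, file, options } = {}) {
  if (Boolean(url) === Boolean(file)) {
    throw new Error("Pass exactly one of url or file");
  }
  const args = ["scripts/render.js", url ? "--url" : "--file", url || file];
  if (options) {
    // JSON.stringify produces the same JSON the shell examples pass by hand.
    args.push("--options", JSON.stringify(options));
  }
  return args;
}

// Usage (runs the script without a shell, so no quoting is needed):
//   const { execFileSync } = require("node:child_process");
//   const markdown = execFileSync("node", buildRenderArgs({
//     url: "https://developer.cloudflare.com",
//     options: { html: { cssSelector: "main.content" } },
//   }), { encoding: "utf8" });
```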

Converting a Local File (PDFs, Images, Office Docs)

node scripts/render.js --file "report.pdf"

Converting Images with Language Options

Image descriptions are generated via AI. You can specify a desired output language for the description (en, it, de, es, fr, pt).

node scripts/render.js --file "cat.jpeg" \
  --options '{"image": {"descriptionLanguage": "es"}}'

Advanced Options for JS-Heavy Sites

If a site relies on heavy client-side JavaScript rendering or redirects, use the browser method with specific wait conditions.

# Wait for network to be idle before extracting content
node scripts/render.js --url "https://complex-site.com" --wait "networkidle2"

# Wait for a specific element to appear (e.g. price or main content)
node scripts/render.js --url "https://shop.com/prod" --selector ".product-price"

# Increase timeout for slow pages (in milliseconds)
node scripts/render.js --url "https://slow-site.com" --timeout 60000

Valid --wait options are: load, domcontentloaded (default), networkidle0, and networkidle2.
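As a rough illustration of how these values might be validated before a request is sent, the sketch below checks --wait against the four documented names and coerces --timeout to a positive integer. The helper parseWaitArgs is hypothetical, and the descriptions are the meanings Puppeteer gives the same names, which are assumed here to carry over to Browser Rendering.

```javascript
// The four documented --wait values, with Puppeteer's meaning for each
// (assumed, not confirmed by this document, to apply to Browser Rendering).
const WAIT_OPTIONS = {
  load: "the window load event has fired",
  domcontentloaded: "the DOMContentLoaded event has fired",
  networkidle0: "no network connections for at least 500 ms",
  networkidle2: "at most 2 network connections for at least 500 ms",
};

// Hypothetical helper: validates --wait and --timeout values.
function parseWaitArgs({ wait = "domcontentloaded", timeout } = {}) {
  if (!Object.prototype.hasOwnProperty.call(WAIT_OPTIONS, wait)) {
    const valid = Object.keys(WAIT_OPTIONS).join(", ");
    throw new Error(`Invalid --wait "${wait}"; expected one of: ${valid}`);
  }
  const result = { wait };
  if (timeout !== undefined) {
    // CLI values arrive as strings, so coerce before checking.
    const ms = Number(timeout);
    if (!Number.isInteger(ms) || ms <= 0) {
      throw new Error(`Invalid --timeout "${timeout}"; expected milliseconds`);
    }
    result.timeout = ms;
  }
  return result;
}
```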

How It Works

With --method auto, the script tries two rendering paths in order:

  1. Workers AI tomarkdown (Primary): Ideal for documents, standard web pages, extracting JSON-LD structured data, and resolving standard HTML features. Uses multipart form data.
  2. Browser Rendering API (Fallback): If the page relies on complex JavaScript (e.g. Single Page Apps) and the AI path cannot see the content, the Browser Rendering engine opens a real headless browser for accurate conversion.
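The primary-then-fallback flow can be sketched as below. This is an assumption-laden illustration rather than the script's actual logic: convertWithAI and convertWithBrowser are hypothetical injected functions, and treating output shorter than minLength characters as "cannot see the content" is an invented heuristic.

```javascript
// Hypothetical orchestration of --method auto.
// convertWithAI / convertWithBrowser: async (url) => markdown string.
async function convertAuto(url, { convertWithAI, convertWithBrowser, minLength = 200 }) {
  try {
    const markdown = await convertWithAI(url);
    // Assumed heuristic: very short output suggests the content is
    // rendered client-side and was not visible to the AI path.
    if (typeof markdown === "string" && markdown.trim().length >= minLength) {
      return { method: "ai", markdown };
    }
  } catch (err) {
    // A failed primary attempt falls through to the browser path.
  }
  return { method: "browser", markdown: await convertWithBrowser(url) };
}
```

Injecting the two converters keeps the fallback logic testable without network access or credentials.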

Calling the REST API Directly (Advanced)

If you'd prefer not to use scripts/render.js, here is the curl equivalent for a local file using the tomarkdown REST API:

curl https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/ai/tomarkdown \
  -X POST \
  -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
  -F "files=@document.pdf" \
  -F 'conversionOptions={"pdf":{"metadata":false}}'

Note: For URLs, fetch the source to a local file with curl first, then upload it as files=@<temp.html>. The tomarkdown REST API does not directly ingest a --data url="https...".
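The same fetch-then-upload flow can be sketched in Node.js (18 or later, which provides global fetch, FormData, and Blob). The endpoint and form field names are taken from the curl example above; the helper names, the page.html filename, and the text/html MIME type are assumptions made for illustration.

```javascript
// Hypothetical helper: builds the multipart request without sending it.
function buildToMarkdownRequest({ accountId, token, bytes, filename, mimeType, conversionOptions }) {
  const form = new FormData();
  form.append("files", new Blob([bytes], { type: mimeType }), filename);
  if (conversionOptions) {
    form.append("conversionOptions", JSON.stringify(conversionOptions));
  }
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${encodeURIComponent(accountId)}/ai/tomarkdown`,
    // No Content-Type header: fetch sets the multipart boundary itself.
    init: { method: "POST", headers: { Authorization: `Bearer ${token}` }, body: form },
  };
}

// Hypothetical helper: downloads a page, then uploads it as an HTML file.
async function convertUrl(pageUrl, { accountId, token, conversionOptions }) {
  const page = await fetch(pageUrl);
  if (!page.ok) throw new Error(`Fetching ${pageUrl} failed: HTTP ${page.status}`);
  const bytes = new Uint8Array(await page.arrayBuffer());
  const { url, init } = buildToMarkdownRequest({
    accountId,
    token,
    bytes,
    filename: "page.html", // assumed name; the extension marks it as HTML
    mimeType: "text/html",
    conversionOptions,
  });
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`tomarkdown request failed: HTTP ${res.status}`);
  return res.json(); // parsed API response, returned as-is
}
```

Splitting request construction from sending lets the request shape be checked without network access or real credentials.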

First Seen
Mar 8, 2026