Cloudflare Markdown Conversion

Use this skill to convert URLs or local files (PDFs, Images, HTML, CSV, Office docs) into clean, structured Markdown for text analysis, RAG, and LLMs.

Features & Supported Formats

Scraping URLs: Extracts HTML, resolves relative links, handles JSON-LD, extracts title/description.
Images: Automatically runs object-detection and uses an LLM (gemma-3-12b-it) to generate image descriptions. Converts SVG to raster.
PDFs: Parses internal StructTree tagging for high-fidelity semantic Markdown extraction.
Office Docs: Supports .docx, .xlsx, .csv, .ods, .odt, and more.

Usage

Setup & Authentication

This skill requires CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_API_TOKEN.

Automatic Setup: The script looks for credentials in the following order:

Environment Variables: CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_API_TOKEN.
Local .env: A .env file in the current working directory (process.cwd()).
Global Config: ~/.config/cloudflare-tomarkdown/.env (Standard Linux/macOS path).
Global Fallback: ~/.cloudflare-tomarkdown/.env.

Manual Setup: You can also pass them as parameters:

node scripts/render.js --url "https://example.com" --account "your_id" --token "your_token"

Instruction for the Agent: If the skill fails due to missing credentials, advise the user to create a global config file at ~/.config/cloudflare-tomarkdown/.env.

Scraping a URL

# Basic usage (defaults to 'auto' method, trying AI parsing first, then browser rendering)
node scripts/render.js --url "https://example.com"

Scraping with Options (CSS Selectors, etc.)

Cloudflare allows filtering elements using cssSelector or providing a hostname.

# Only extract the main content container
node scripts/render.js --url "https://developer.cloudflare.com" \
  --options '{"html": {"cssSelector": "main.content"}}'

Converting a Local File (PDFs, Images, Office Docs)

node scripts/render.js --file "report.pdf"

Converting Images with Language Options

Image descriptions are generated via AI. You can specify a desired output language for the description (en, it, de, es, fr, pt).

node scripts/render.js --file "cat.jpeg" \
  --options '{"image": {"descriptionLanguage": "es"}}'

Advanced Options for JS-Heavy Sites

If a site requires complex JavaScript rendering or redirects, use the browser method with specific wait conditions.

# Wait for network to be idle before extracting content
node scripts/render.js --url "https://complex-site.com" --wait "networkidle2"

# Wait for a specific element to appear (e.g. price or main content)
node scripts/render.js --url "https://shop.com/prod" --selector ".product-price"

# Increase timeout for slow pages (in milliseconds)
node scripts/render.js --url "https://slow-site.com" --timeout 60000

Valid --wait options are: load, domcontentloaded (default), networkidle0, and networkidle2.

How It Works Intelligently

The --method auto capability tests two separate rendering paths:

Workers AI tomarkdown (Primary): Ideal for documents, standard web pages, extracting JSON-LD structured data, and resolving standard HTML features. Uses multipart form data.
Browser Rendering API (Fallback): If the page uses complex JavaScript (e.g. Single Page Apps) and the AI path cannot see the content, the Browser Rendering engine opens a headless real browser for accurate conversion.

Calling the REST API Directly (Advanced)

If you'd prefer not to use scripts/render.js, here is the curl equivalent for a local file using the tomarkdown REST API:

curl https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/ai/tomarkdown \
  -X POST \
  -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
  -F "files=@document.pdf" \
  -F 'conversionOptions={"pdf":{"metadata":false}}'

Note: For URLs, you should use curl to fetch the source to a local file first before uploading it as files=@<temp.html>. The tomarkdown REST API does not directly ingest a --data url="https...".

cloudflare-tomarkdown