baoyu-url-to-markdown
URL to Markdown
Fetches any URL via baoyu-fetch CLI (Chrome CDP + site-specific adapters) and converts it to clean markdown.
User Input Tools
When this skill prompts the user, follow this tool-selection rule (priority order):
- Prefer built-in user-input tools exposed by the current agent runtime, e.g., AskUserQuestion, request_user_input, clarify, ask_user, or any equivalent.
- Fallback: if no such tool exists, emit a numbered plain-text message and ask the user to reply with the chosen number/answer for each question.
- Batching: if the tool supports multiple questions per call, combine all applicable questions into a single call; if only single-question, ask them one at a time in priority order.
Concrete AskUserQuestion references below are examples — substitute the local equivalent in other runtimes.
CLI Setup
Important: The CLI source is vendored in {baseDir}/scripts/lib. scripts/package.json installs only third-party runtime dependencies.
Agent Execution Instructions:
- Determine this SKILL.md file's directory path as {baseDir}
- Resolve the ${BUN} runtime: if bun is installed → bun; else suggest installing Bun
- If {baseDir}/scripts/node_modules does not exist, run ${BUN} install --cwd {baseDir}/scripts
- ${READER} = {baseDir}/scripts/baoyu-fetch
- Replace all ${READER} in this document with the resolved value
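The setup steps above can be sketched as a few shell functions. This is a minimal sketch, not the skill's own tooling: the function names and the base_dir parameter (standing in for {baseDir}) are hypothetical, and only the bun install --cwd command and the scripts/baoyu-fetch path come from the steps themselves.

```shell
# Sketch of the setup steps. base_dir stands in for {baseDir};
# the function names are hypothetical and not part of the skill.

# Print the Bun command if it is on PATH; otherwise suggest installing it.
resolve_bun() {
  if command -v bun >/dev/null 2>&1; then
    echo "bun"
  else
    echo "Bun not found; install Bun before running this skill" >&2
    return 1
  fi
}

# Succeed when {baseDir}/scripts/node_modules is missing.
needs_install() {
  [ ! -d "$1/scripts/node_modules" ]
}

# Print the value that ${READER} resolves to.
reader_path() {
  printf '%s\n' "${1%/}/scripts/baoyu-fetch"
}

# Run the full sequence: resolve Bun, install dependencies once, print ${READER}.
setup_reader() {
  local base_dir="${1%/}"
  local bun
  bun="$(resolve_bun)" || return 1
  if needs_install "$base_dir"; then
    "$bun" install --cwd "$base_dir/scripts" >&2 || return 1
  fi
  reader_path "$base_dir"
}
```

Calling setup_reader with the skill directory prints the reader path on stdout, so it can be captured with READER="$(setup_reader /path/to/skill)". Install output goes to stderr to keep the captured value clean.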
Preferences (EXTEND.md)
Check EXTEND.md in priority order — the first one found wins:
| Priority | Path | Scope |
|---|---|---|
| 1 | .baoyu-skills/baoyu-url-to-markdown/EXTEND.md | Project |
| 2 | ${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-url-to-markdown/EXTEND.md | XDG |
| 3 | $HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md | User home |
| Result | Action |
|---|---|
| Found | Read, parse, apply settings |
| Not found | MUST run first-time setup (see below) — do NOT silently create defaults |
EXTEND.md supports: download media by default, default output directory.
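The priority lookup can be sketched as a single function that returns the first EXTEND.md it finds. The function name and the project_root parameter are hypothetical; the three candidate paths are the ones listed in the priority table.

```shell
# Print the first EXTEND.md found in priority order; return 1 if none exists.
# project_root is a hypothetical parameter and defaults to the current directory.
find_extend_md() {
  local project_root="${1:-.}"
  local skill="baoyu-url-to-markdown"
  local candidate
  for candidate in \
    "$project_root/.baoyu-skills/$skill/EXTEND.md" \
    "${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/$skill/EXTEND.md" \
    "$HOME/.baoyu-skills/$skill/EXTEND.md"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

A non-zero return here corresponds to the "Not found" row: the agent should start first-time setup rather than write defaults.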
First-Time Setup ⛔ BLOCKING
When EXTEND.md is not found, you MUST use AskUserQuestion to gather preferences before creating EXTEND.md. NEVER create EXTEND.md with silent defaults. Generation is BLOCKED until setup completes. Batch all three questions into a single call:
- Q1, Media (header "Media"): "How to handle images and videos in pages?"
  - "Ask each time (Recommended)": prompt after each save
  - "Always download": download to local imgs/ and videos/
  - "Never download": keep remote URLs
- Q2, Output (header "Output"): "Default output directory?"
  - "url-to-markdown (Recommended)": save to ./url-to-markdown/{domain}/{slug}.md
  - User may pick "Other" and type a custom path
- Q3, Save (header "Save"): "Where to save preferences?"
  - "User (Recommended)": ~/.baoyu-skills/ (all projects)
  - "Project": .baoyu-skills/ (this project only)
After answers, write EXTEND.md, confirm "Preferences saved to [path]", then continue.
Full template: references/config/first-time-setup.md.
Supported Keys
| Key | Default | Values | Description |
|---|---|---|---|
| download_media | ask | ask / 1 / 0 | ask = prompt each time, 1 = always, 0 = never |
| default_output_dir | empty | path or empty | Default output directory (empty = ./url-to-markdown/) |
EXTEND.md → CLI mapping:
| EXTEND.md key | CLI argument | Notes |
|---|---|---|
| download_media: 1 | --download-media | Requires --output to be set |
| default_output_dir: ./posts/ | Agent constructs --output ./posts/{domain}/{slug}.md | Agent generates path, not a direct flag |
Value priority: CLI arguments → EXTEND.md → skill defaults.
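The priority chain and the download_media mapping might look like the sketch below. It assumes EXTEND.md stores settings as plain "key: value" lines, which is inferred from the mapping table and not confirmed by the full template; the function names are hypothetical.

```shell
# Read a "key: value" line from EXTEND.md (line format is an assumption).
# $1 = file, $2 = key; prints the trimmed value, or nothing if the key is absent.
extend_value() {
  sed -n "s/^$2:[[:space:]]*//p" "$1" | head -n 1 | sed 's/[[:space:]]*$//'
}

# Resolve the media mode: CLI choice first, then EXTEND.md, then the default "ask".
# $1 = CLI override (may be empty), $2 = EXTEND.md path (may be empty)
media_mode() {
  local value=""
  if [ -n "$1" ]; then
    echo "$1"
    return 0
  fi
  if [ -n "$2" ] && [ -f "$2" ]; then
    value="$(extend_value "$2" download_media)"
  fi
  if [ -n "$value" ]; then
    echo "$value"
  else
    echo "ask"
  fi
}

# Print --download-media only when the mode is 1 and an output path is set,
# because the flag requires --output.
# $1 = media mode, $2 = output path
media_flag() {
  if [ "$1" = "1" ] && [ -n "$2" ]; then
    echo "--download-media"
  fi
}
```

A mode of ask produces no flag here; in that case the agent prompts the user after saving, as described for Q1.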
Usage
# Default: headless capture, markdown to stdout
${READER} <url>
# Save to file
${READER} <url> --output article.md
# Save with media download
${READER} <url> --output article.md --download-media
# Wait for interaction (login/CAPTCHA) — auto-detect and continue
${READER} <url> --wait-for interaction --output article.md
# Wait for interaction — manual control (Enter to continue)
${READER} <url> --wait-for force --output article.md
# JSON output
${READER} <url> --format json --output article.json
# Force specific adapter
${READER} <url> --adapter youtube --output transcript.md
Options
| Option | Description |
|---|---|
| <url> | URL to fetch |
| --output <path> | Output file path (default: stdout) |
| --format <type> | Output format: markdown (default) or json |
| --json | Shorthand for --format json |
| --adapter <name> | Force adapter: x, youtube, hn, or generic (default: auto-detect) |
| --headless | Force headless Chrome (no visible window) |
| --wait-for <mode> | Interaction wait mode: none (default), interaction, or force |
| --wait-for-interaction | Alias for --wait-for interaction |
| --wait-for-login | Alias for --wait-for interaction |
| --timeout <ms> | Page load timeout (default: 30000) |
| --interaction-timeout <ms> | Login/CAPTCHA wait timeout (default: 600000 = 10 min) |
| --interaction-poll-interval <ms> | Poll interval for interaction checks (default: 1500) |
| --download-media | Download images/videos to local imgs/ and videos/, rewrite markdown links. Requires --output |
| --media-dir <dir> | Base directory for downloaded media (default: same as --output directory) |
| --cdp-url <url> | Reuse existing Chrome DevTools Protocol endpoint |
| --browser-path <path> | Custom Chrome/Chromium binary path |
| --chrome-profile-dir <path> | Chrome user data directory (default: BAOYU_CHROME_PROFILE_DIR env or ./baoyu-skills/chrome-profile) |
| --debug-dir <dir> | Write debug artifacts (document.json, markdown.md, page.html, network.json) |
Agent Quality Gate
CRITICAL: treat default headless capture as provisional. Some sites render differently in headless mode and can silently return low-quality content without failing the CLI.
After every headless run, inspect the saved markdown. See references/quality-gate.md for the full checklist, recovery workflow, and capture-mode table. Read it whenever a run looks suspicious or the user asks about login/CAPTCHA handling.
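As a first-pass filter before reading the saved file, a crude size check can flag obviously empty captures. This is only an illustrative heuristic: the function name and the five-line threshold are hypothetical, and the actual criteria live in references/quality-gate.md.

```shell
# Return 0 (suspicious) when the file is missing, empty, or has fewer than
# min_lines non-blank lines. The threshold is a hypothetical default and
# does not replace the checklist in references/quality-gate.md.
looks_suspicious() {
  local file="$1"
  local min_lines="${2:-5}"
  local count
  [ -s "$file" ] || return 0
  count="$(grep -c '[^[:space:]]' "$file" || true)"
  [ "$count" -lt "$min_lines" ]
}
```

When the check fires, one option is to rerun the same URL with --wait-for interaction; a file that passes still needs the manual inspection described above.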
Output Path Generation
The agent must construct the output file path — baoyu-fetch does not auto-generate paths.
Algorithm:
- Determine base directory from EXTEND.md default_output_dir or default ./url-to-markdown/
- Extract domain from URL (e.g., example.com)
- Generate slug from URL path or page title (kebab-case, 2-6 words)
- Construct: {base_dir}/{domain}/{slug}/{slug}.md (each URL gets its own directory so media files stay isolated)
- Conflict resolution: append timestamp {slug}-YYYYMMDD-HHMMSS/{slug}-YYYYMMDD-HHMMSS.md
Pass the constructed path to --output. Media files (--download-media) are saved into subdirectories next to the markdown file, keeping each URL's assets self-contained.
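The algorithm above can be approximated in shell as follows. This is a sketch under assumptions: the function names are hypothetical, the slug is derived from the last URL path segment only (no page-title lookup, no minimum word count), and the "index" fallback for URLs without a path is an illustrative choice.

```shell
# Extract the host: strip scheme, then path/query/fragment, userinfo, and port.
url_domain() {
  printf '%s\n' "$1" \
    | sed -E 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||; s|[/?#].*$||; s|^[^@]*@||; s|:[0-9]+$||'
}

# Build a kebab-case slug from the last path segment, capped at six words.
url_slug() {
  local path
  local last
  path="$(printf '%s\n' "$1" \
    | sed -E 's|^[a-zA-Z][a-zA-Z0-9+.-]*://[^/]*||; s|[?#].*$||; s|/+$||')"
  last="${path##*/}"
  printf '%s\n' "$last" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/\.[a-z0-9]+$//; s/[^a-z0-9]+/-/g; s/^-+//; s/-+$//' \
    | cut -d- -f1-6
}

# Print {base_dir}/{domain}/{slug}/{slug}.md, adding a timestamp on conflict.
# $1 = URL, $2 = base directory (defaults to ./url-to-markdown)
output_path() {
  local base="${2:-./url-to-markdown}"
  local domain
  local slug
  local dir
  base="${base%/}"
  domain="$(url_domain "$1")"
  slug="$(url_slug "$1")"
  if [ -z "$slug" ]; then
    slug="index"   # hypothetical fallback for URLs with no path
  fi
  dir="$base/$domain/$slug"
  if [ -e "$dir" ]; then
    slug="$slug-$(date +%Y%m%d-%H%M%S)"
    dir="$base/$domain/$slug"
  fi
  printf '%s/%s.md\n' "$dir" "$slug"
}
```

The printed path is what gets passed to --output. The function does not create the directory itself.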
Adapters & Media
See references/adapters.md for the adapter catalog (X, YouTube, Hacker News, generic), per-adapter notes, the media download flow (ask / always / never), and the JSON output schema. Read it before answering adapter-specific questions or handling media prompts.
Environment Variables
| Variable | Description |
|---|---|
| BAOYU_CHROME_PROFILE_DIR | Chrome user data directory (can also use --chrome-profile-dir) |
Troubleshooting: Chrome not found → use --browser-path. Timeout → increase --timeout. Login/CAPTCHA → --wait-for interaction. Debug → --debug-dir to inspect captured HTML and network logs.
Extension Support
Custom configurations via EXTEND.md. See Preferences section above for paths and supported keys.