llms.txt Crawler Skill

This skill fetches a website's llms.txt file and crawls every page listed in it. llms.txt is a proposed convention that lets a site publish an LLM-friendly index of its content.

Overview

The llms.txt file typically follows this format:

# Site Name

## Section Name

- [Page Title](https://example.com/page.md): Description of the page
- [Another Page](https://example.com/another.md): Another description

This skill parses these files and downloads all linked content.
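To make the format concrete, here is a minimal parsing sketch. It is not taken from the skill's crawl.js; the function name `parseLlmsTxt` and the shape of the returned entries are assumptions chosen for illustration, and the real script may parse the file differently.

```javascript
// Hypothetical helper, not the skill's actual parser.
// Extracts "- [Title](url): description" entries and tracks the "## Section" they fall under.
function parseLlmsTxt(text) {
  // Bullet marker, [title], (url), then an optional ": description".
  const linkPattern = /^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)\s*(?::\s*(.*))?$/;
  const entries = [];
  let section = null;

  for (const line of text.split(/\r?\n/)) {
    const heading = line.match(/^##\s+(.+?)\s*$/);
    if (heading) {
      section = heading[1];
      continue;
    }
    const link = line.match(linkPattern);
    if (link) {
      entries.push({
        section,
        title: link[1],
        url: link[2],
        description: link[3] ? link[3].trim() : '',
      });
    }
  }
  return entries;
}

const sample = [
  '# Site Name',
  '',
  '## Section Name',
  '',
  '- [Page Title](https://example.com/page.md): Description of the page',
  '- [Another Page](https://example.com/another.md): Another description',
].join('\n');

console.log(parseLlmsTxt(sample));
```

Lines that are neither a section heading nor a link bullet are ignored, so the site title and any free-form text pass through without error.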

Usage

Basic Usage

Run the crawl script with a target URL:

cd /path/to/skills/llms-txt-crawler/scripts
npm install  # First time only
node crawl.js --url https://example.com

Command Line Options

| Option | Short | Description | Default |
|---|---|---|---|
| --url | -u | Base URL of the site with llms.txt | Required |
| --output | -o | Output directory for crawled files | ./output |
| --format | -f | Output format: md, json, or txt | md |
| --delay | -d | Delay between requests in milliseconds | 500 |
| --concurrent | -c | Maximum concurrent requests | 3 |
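
The option table can be read as the following sketch, which uses Node's built-in `util.parseArgs` (Node 18.3 or later). This is an illustration only: `parseCrawlOptions` is a hypothetical name, and the actual crawl.js may use a different argument parser or validation rules.

```javascript
const { parseArgs } = require('node:util');

// Hypothetical re-creation of the option table above, not the skill's own code.
function parseCrawlOptions(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      url: { type: 'string', short: 'u' },
      output: { type: 'string', short: 'o' },
      format: { type: 'string', short: 'f' },
      delay: { type: 'string', short: 'd' },
      concurrent: { type: 'string', short: 'c' },
    },
  });

  if (!values.url) throw new Error('--url is required');

  const format = values.format ?? 'md';
  if (!['md', 'json', 'txt'].includes(format)) {
    throw new Error(`--format must be md, json, or txt (got "${format}")`);
  }

  // Numeric options arrive as strings; convert and apply the documented defaults.
  const delay = Number(values.delay ?? 500);
  const concurrent = Number(values.concurrent ?? 3);
  if (!Number.isInteger(delay) || delay < 0) throw new Error('--delay must be a non-negative integer');
  if (!Number.isInteger(concurrent) || concurrent < 1) throw new Error('--concurrent must be a positive integer');

  return {
    url: values.url,
    output: values.output ?? './output',
    format,
    delay,
    concurrent,
  };
}

console.log(parseCrawlOptions(['--url', 'https://example.com', '-d', '1000', '-c', '2']));
```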

Examples

Crawl agentskills.io documentation:

node crawl.js --url https://agentskills.io --output ./agentskills-docs

Crawl with custom rate limiting:

node crawl.js --url https://example.com --delay 1000 --concurrent 2

Output as JSON:

node crawl.js --url https://example.com --format json

Output Structure

The script creates the following output structure:

output/
├── llms.txt              # Original llms.txt file
├── index.json            # Metadata about all crawled pages
└── pages/
    ├── page-1.md
    ├── page-2.md
    └── ...

Error Handling

  • Network errors: Retries up to 3 times with exponential backoff
  • Rate limiting: Respects delay settings between requests
  • Missing pages: Logs warnings but continues crawling other pages
  • Invalid URLs: Skips and logs invalid URLs
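
The retry behavior described above might look roughly like the sketch below. It assumes that "retries up to 3 times" means one initial attempt plus three retries, and that the wait doubles after each failure starting from a base delay; `withRetry`, `backoffDelays`, and the 500 ms base are hypothetical and not confirmed by the skill's source.

```javascript
// Hypothetical sketch of retry with exponential backoff; not the skill's actual code.

// Wait times before each retry: base, 2 * base, 4 * base, ...
function backoffDelays(retries, baseMs) {
  return Array.from({ length: retries }, (_, i) => baseMs * 2 ** i);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs task once, then retries up to `retries` more times on failure.
async function withRetry(task, { retries = 3, baseMs = 500 } = {}) {
  const delays = backoffDelays(retries, baseMs);
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < retries) await sleep(delays[attempt]);
    }
  }
  throw lastError; // every attempt failed
}

// Usage: a task that fails twice with a transient error, then succeeds.
let calls = 0;
withRetry(async () => {
  calls += 1;
  if (calls < 3) throw new Error('transient network error');
  return 'ok';
}, { baseMs: 10 }).then((result) => {
  console.log(result, 'after', calls, 'attempts'); // prints "ok after 3 attempts"
});
```

A caller that wants the "log and continue" behavior for missing pages would catch the final error from `withRetry`, log a warning, and move on to the next URL.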

Integration Tips

When using this skill in an agent workflow:

  1. Run the crawler to download the content
  2. Read index.json for metadata about all crawled pages
  3. Use the downloaded markdown files for context or analysis
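
As a rough sketch of the last step, the snippet below loads the downloaded pages into memory so an agent can use them as context. It assumes the default md format and the output layout shown under Output Structure; `loadPages` is a hypothetical helper, and the schema of index.json is not documented here, so the sketch reads the pages directory directly instead.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: reads every .md file under <outputDir>/pages, sorted by name.
function loadPages(outputDir) {
  const pagesDir = path.join(outputDir, 'pages');
  return fs
    .readdirSync(pagesDir)
    .filter((name) => name.endsWith('.md'))
    .sort()
    .map((name) => ({
      file: name,
      content: fs.readFileSync(path.join(pagesDir, name), 'utf8'),
    }));
}

// Usage: only runs if a crawl has already produced ./output/pages.
if (fs.existsSync('./output/pages')) {
  for (const page of loadPages('./output')) {
    console.log(`${page.file}: ${page.content.length} characters`);
  }
}
```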

Repository: agykit/agykit
First Seen: Jan 30, 2026