Katana web crawling
Katana is a fast crawler/spider from ProjectDiscovery, aimed at automation pipelines (URLs in → discovered endpoints out). Official docs and flags: repository README and katana -h.
Scope and ethics
Use only on systems you own or are explicitly authorized to test (contract, bug bounty program rules, internal env). Crawl gently: set concurrency, rate limits, and depth to reduce load. Misuse can violate law and terms of service—you are responsible for your actions (tool ships with that warning).
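A conservative starting point for the "crawl gently" advice could look like the sketch below. The wrapper name `gentle_crawl` is hypothetical, and the values (depth 2, 5 fetchers, 10 requests per second, a 5 minute cap) are illustrative assumptions rather than upstream recommendations; tune them to the rules of the engagement.

```shell
# gentle_crawl is a hypothetical wrapper; the values below are illustrative
# assumptions, not upstream defaults or recommendations.
gentle_crawl() {
  # -d 2   shallow crawl depth
  # -c 5   few parallel fetchers
  # -rl 10 at most 10 requests per second
  # -ct 5m stop the crawl after five minutes
  katana -u "$1" -d 2 -c 5 -rl 10 -ct 5m
}

# Usage (authorized targets only):
#   gentle_crawl https://example.com
```

Keeping the limits in one wrapper makes it harder to launch an unthrottled crawl by accident.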
Installation
Go (requires Go 1.25+ per upstream; verify current README if install fails):
CGO_ENABLED=1 go install github.com/projectdiscovery/katana/cmd/katana@latest
Docker:
docker pull projectdiscovery/katana:latest
docker run projectdiscovery/katana:latest -u https://example.com
Headless in Docker often needs -system-chrome and Chrome/Chromium available—see upstream Docker section.
Input
- Single/multiple URLs: `-u https://a.com` or comma-separated URLs
- File: `-list urls.txt`
- STDIN: `echo https://example.com | katana` or `cat domains | httpx | katana`
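When seeds come from a hand-edited file, it may help to normalize the file before passing it to `-list`. The sketch below is one way to do that with standard tools. The helper name `clean_seeds` and the file names are hypothetical, and whether katana itself tolerates comments or blank lines in a list file is not stated here, so the cleanup is a precaution.

```shell
# clean_seeds is a hypothetical helper: it trims surrounding whitespace,
# drops blank lines and lines starting with "#", and removes duplicates
# while preserving the original order.
clean_seeds() {
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" |
    awk 'NF && !/^#/ && !seen[$0]++'
}

# Usage:
#   clean_seeds urls.txt > seeds.txt
#   katana -list seeds.txt
```

Only whole-line comments are removed, so URLs that contain a `#` fragment are left intact.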
Modes
| Mode | When |
|---|---|
| Standard (default) | Fast; uses Go HTTP client; no full JS/DOM render, so it may miss post-render routes |
| Headless (`-headless`) | Browser context; better for JS-heavy apps; optional `-system-chrome` |
To find more endpoints, enable JavaScript file parsing with `-js-crawl` (`-jc`). `-jsluice` adds jsluice-based parsing of JavaScript files at a higher memory cost.
Flags to know first
| Flag | Purpose |
|---|---|
| `-d`, `-depth` | Max crawl depth (default 3) |
| `-c`, `-concurrency` | Parallel fetchers |
| `-rl`, `-rate-limit` | Max requests per second |
| `-ct`, `-crawl-duration` | Cap total crawl time (e.g. 5m) |
| `-cs` / `-cos` | In-scope / out-of-scope URL regex |
| `-ns` | Disable default host scope if you need cross-host (use carefully) |
| `-iqp` | Ignore same path with different query strings |
| `-filter-similar` | Reduce near-duplicate paths (confirm the short alias in `katana -h`; `-fs` is the short form of `-field-scope`) |
| `-kf`, `-known-files` | robots.txt / sitemap.xml etc. (min depth 3 for full coverage per docs) |
| `-j`, `-jsonl` | JSONL output for scripting |
| `-o`, `-output` | Write to file |
| `-sr`, `-store-response` | Store HTTP requests and responses for review (uses disk space) |
| `-proxy` | HTTP/SOCKS5 proxy |
| `-H` | Extra headers (auth, cookies) in `header:value` format |
Run katana -h for the full list (filters, form fill, tech detect, TLS options, etc.).
Minimal examples
katana -u https://example.com -d 2 -silent
katana -u https://example.com -jsonl -o endpoints.jsonl
katana -list seeds.txt -d 3 -cs '.*\.example\.com.*' -rl 30 -jsonl
Headless (JS-heavy target):
katana -u https://example.com -headless -d 2
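The `-cs` value in the examples above is a regular expression, so it can be worth trying a pattern against a few sample URLs before crawling. The sketch below uses `grep -E` as a stand-in: katana is written in Go, and its regex dialect is not identical to POSIX ERE, so treat the result as an approximation that holds for simple patterns like these. The helper names and sample URLs are hypothetical.

```shell
# in_scope is a hypothetical helper: it filters URLs on stdin with a regex.
# grep -E (POSIX ERE) only approximates the Go regex engine katana uses.
in_scope() {
  grep -E -- "$1"
}

# Hypothetical sample URLs: a subdomain, the apex domain, and an unrelated
# host that merely mentions the domain in its query string.
sample_urls() {
  printf '%s\n' \
    'https://app.example.com/login' \
    'https://example.com/' \
    'https://evil.test/?next=.example.com'
}

# Loose pattern: matches any URL containing ".example.com", so it keeps the
# evil.test URL (query-string match) and drops the apex example.com URL.
sample_urls | in_scope '.*\.example\.com.*'

# Tighter pattern anchored on scheme and host: keeps example.com and its
# subdomains, and drops the evil.test URL.
sample_urls | in_scope '^https?://([^/]+\.)?example\.com([:/]|$)'
```

If the tighter pattern suits the target, it could be passed as the `-cs` value. This assumes katana applies the pattern to the full URL, which is worth confirming with a small test crawl.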
Pipelines
Common pattern: resolve live HTTP first, then crawl:
cat domains.txt | httpx -silent | katana -jsonl -o crawl.jsonl
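After a pipeline run, it can be useful to confirm that the crawl stayed on the hosts you expected. The sketch below assumes katana's plain output (one URL per line, without `-jsonl`) was saved to a hypothetical file `crawl.txt`, and counts discovered URLs per host using only `awk` and `sort`. The helper name `hosts_summary` is also hypothetical.

```shell
# hosts_summary is a hypothetical helper: given a file with one URL per line,
# it prints "<count> <host>" sorted by count (descending), then host.
hosts_summary() {
  # Splitting "https://host/path" on "/" yields: "https:", "", "host", ...
  awk -F/ '$1 ~ /^https?:$/ { count[$3]++ }
           END { for (h in count) print count[h], h }' "$1" |
    sort -k1,1nr -k2,2
}

# Usage:
#   cat domains.txt | httpx -silent | katana -silent -o crawl.txt
#   hosts_summary crawl.txt
```

An unexpected host in the summary is a cue to revisit the scope flags before the next run.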
Combine with other PD tools (naabu, nuclei, etc.) only in authorized assessments.
Troubleshooting
- `CGO_ENABLED=1` required for `go install` per README.
- Headless failures: try `-system-chrome`, ensure Chrome/Chromium installed, or use Docker image with documented Chrome setup.
- Health check: `-health-check` / `-hc`.
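Before digging into headless failures, a quick preflight can rule out missing binaries. The sketch below only checks what is on `PATH`. The helper names are hypothetical, and the candidate browser binary names (`google-chrome`, `chromium`, `chromium-browser`) are assumptions that vary by operating system and package.

```shell
# first_available is a hypothetical helper: it prints the first argument
# that resolves to a command on PATH, or returns non-zero if none does.
first_available() {
  for name in "$@"; do
    if command -v "$name" >/dev/null 2>&1; then
      printf '%s\n' "$name"
      return 0
    fi
  done
  return 1
}

# preflight reports missing prerequisites; it prints nothing when both
# katana and a candidate browser binary are found.
preflight() {
  first_available katana >/dev/null ||
    echo "katana not found in PATH"
  # Browser binary names are assumptions; adjust for your system.
  first_available google-chrome chromium chromium-browser >/dev/null ||
    echo "no Chrome/Chromium binary found on PATH"
}

# Usage:
#   preflight && katana -hc
```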
References
- Source and releases: github.com/projectdiscovery/katana