blockchain-spider-toolkit
BlockchainSpider — on-chain data collection toolkit
Reference skill. This repository does not vendor BlockchainSpider; read the upstream README and docs for install, spiders, and options.
- Repository: github.com/wuzhy1ng/BlockchainSpider (MIT license)
- Stack: Python, Scrapy spiders, CSV/JSON outputs under `./data` by default (paths per project config).
What it is for
Typical capabilities described in the project (confirm against current docs):
| Area | Examples |
|---|---|
| Transfer subgraph | Money-flow graph centered on a source address or transaction (e.g. txs.blockscan-style crawls). |
| EVM transactions | Block ranges, latest listener, receipts, logs, token transfers; multiple EVM-compatible providers. |
| Solana | Slot ranges or live streams via JSON-RPC providers. |
| Labels | Optional plugins (e.g. label-oriented crawls)—scope and ethics depend on source and law. |
Academic background appears in project references (e.g. TRacer / transaction semantics papers—see repo Reference section).
Prerequisites
- Python environment and `pip install -r requirements.txt` from the cloned repo.
- RPC / indexer API endpoints you are authorized to use (respect ToS, rate limits, and billing).
- API keys for third-party explorers (Etherscan-class APIs, etc.) must be supplied by you—never commit keys or paste live keys into chats.
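One way to keep keys out of files and chat logs is to read them from the environment at run time. The sketch below is illustrative only: the helper names `require_env` and `mask`, and the variable name `ETHERSCAN_API_KEY` mentioned in the comment, are assumptions for this example and are not part of BlockchainSpider.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`; raise if unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set {name} in your environment before running a spider")
    return value


def mask(secret: str, keep: int = 4) -> str:
    """Mask a secret for log output, keeping only the last `keep` characters."""
    if len(secret) <= keep:
        return "*" * len(secret)
    return "*" * (len(secret) - keep) + secret[-keep:]


# Hypothetical usage: the variable name is an assumption, choose your own.
# api_key = require_env("ETHERSCAN_API_KEY")
# print("using key", mask(api_key))
```

Logging only the masked form makes it less likely that a live key ends up in shell history, CI logs, or a pasted transcript.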
Example command shapes (placeholders only)
Upstream examples use `scrapy crawl <spider> -a ...`. Illustrative patterns (replace placeholders):
```bash
# EVM transfer / subgraph style (example spider name from upstream docs)
scrapy crawl txs.blockscan -a source=<ADDRESS> -a apikeys=<YOUR_ETHERSCAN_API_KEY> -a endpoint=<ETHERSCAN_COMPATIBLE_API_URL>

# EVM blocks / transactions over a range
scrapy crawl trans.block.evm -a start_blk=<N> -a end_blk=<M> -a providers=<YOUR_ETH_HTTP_RPC_URL>

# Solana slot range
scrapy crawl trans.block.solana -a start_slot=<S1> -a end_slot=<S2> -a providers=<YOUR_SOLANA_JSON_RPC_URL>
```
Exact spider names and arguments change with releases—always copy from the current README.
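If you drive these crawls from a script, the command shape can be assembled as an argument list rather than a hand-edited string. This is a minimal sketch, not upstream tooling: `build_crawl_command` is a hypothetical helper, and the spider name and argument names are copied from the placeholder examples in this section, so confirm them against the current README before use.

```python
import shlex


def build_crawl_command(spider: str, **spider_args: str) -> list[str]:
    """Build an argv list of the form: scrapy crawl <spider> -a key=value ..."""
    argv = ["scrapy", "crawl", spider]
    for key, value in spider_args.items():
        # Scrapy passes each `-a NAME=VALUE` pair to the spider as an argument.
        argv += ["-a", f"{key}={value}"]
    return argv


# Placeholder values only; replace them with real block numbers and an
# RPC URL you are authorized to use.
cmd = build_crawl_command(
    "trans.block.evm",
    start_blk="<N>",
    end_blk="<M>",
    providers="<YOUR_ETH_HTTP_RPC_URL>",
)
print(shlex.join(cmd))  # shell-quoted form, useful for review before running
```

Passing the list to `subprocess.run(cmd, check=True, cwd=<path to the cloned repo>)` avoids shell interpolation of values such as URLs that contain `&` or `?`.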
How to combine with blockint
| Task | Skill |
|---|---|
| High-level analytics / AML context | blockchain-analytics-operations |
| Solana forensic tracing methodology | solana-tracing-specialist |
| Multi-chain clustering | cross-chain-clustering-techniques-agent |
| Web surface crawling (HTTP), not chain RPC | katana-web-crawling |
Guardrails
- Lawful use only — comply with sanctions, privacy, and computer misuse rules in your jurisdiction; do not use spiders to harass or dox.
- Darknet / sensitive label sources — some demo commands in upstream docs point to Tor or sensitive data sources; obtain legal and security approval before running.
- Do not store or share API keys, customer identifiers, or non-public investigation exports in public repos.
- Outputs are raw or heuristic—validate critical facts against primary chain data.
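For the validation guardrail, one lightweight habit is to pull a small random sample of exported rows and check each by hand against a primary source such as your own node or a block explorer. The sketch below assumes a CSV export with a header row; the function name `sample_rows` is hypothetical, and the actual file names and columns depend on the spider and release you run.

```python
import csv
import random
from pathlib import Path


def sample_rows(csv_path, k=5, seed=0):
    """Return up to k randomly chosen rows (as dicts) from a CSV export."""
    with Path(csv_path).open(newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
    if len(rows) <= k:
        return rows
    # A fixed seed keeps the spot-check sample reproducible across reviewers.
    return random.Random(seed).sample(rows, k)
```

A fixed seed lets a second reviewer reproduce the same sample; change it to draw a fresh one.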
Related research codebase
- mots-transaction-semantics: MoTS (WWW 2023 “Know Your Transactions”). Upstream notes that MoTS was merged into BlockchainSpider; use the MoTS skill for legacy spider names (`blocks.eth`, `blocks.semantic.eth`, `labels.action`) and the bundled PDF.
Goal: a stable pointer and safe usage framing for BlockchainSpider inside blockint workflows.