Crab Catch — Research Skill Orchestration
Prerequisites
CLAWBOT must run these commands at session start (not just read them):
which agent-browser || npm install -g agent-browser
agent-browser install
Data Sources
CLAWBOT is the research driver — responsible for analysis, judgment, iteration, and report generation. The APIs, scripts, and tools below are data-fetching methods only.
API Base URL: https://crab-skill.opsat.io
Signature Authentication
All API requests except /api/health require Crab signature headers.
- Run `node skills/scripts/crab-sign.js` once at session start to get the headers JSON. (The first run auto-generates credentials; a cached signature is reused if it is still valid within 24h.)
- Store the output and attach these four headers to all subsequent API requests: `X-Crab-Timestamp`, `X-Crab-Signature`, `X-Crab-Key`, `X-Crab-Address`. There is no need to re-run the script for each request.
- Only re-run with `--refresh` if the API returns `auth_expired`.
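The header rules above can be sketched as a small request builder. This is a minimal sketch, not the skill's actual client: it assumes the script output is a flat JSON object keyed by header name and that parameters travel as a query string, neither of which the source confirms. `buildRequest` is a hypothetical helper name.

```javascript
// Assumption: crab-sign.js prints a flat JSON object keyed by header name.
const API_BASE = "https://crab-skill.opsat.io";
const REQUIRED_HEADERS = [
  "X-Crab-Timestamp",
  "X-Crab-Signature",
  "X-Crab-Key",
  "X-Crab-Address",
];

// Build the URL and headers for one API call (hypothetical helper).
// Assumption: parameters are sent as a query string.
function buildRequest(path, params, signedHeaders) {
  const url = new URL(path, API_BASE);
  for (const [key, value] of Object.entries(params || {})) {
    url.searchParams.set(key, String(value));
  }
  // /api/health is the only endpoint that needs no signature headers.
  if (path === "/api/health") {
    return { url: url.toString(), headers: {} };
  }
  const headers = {};
  for (const name of REQUIRED_HEADERS) {
    if (!signedHeaders || !signedHeaders[name]) {
      throw new Error(`Missing signature header: ${name}`);
    }
    headers[name] = signedHeaders[name];
  }
  return { url: url.toString(), headers };
}
```

The stored headers object is passed unchanged to every call, which matches the rule that the script runs once per session rather than once per request.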
Twitter & Social Data (see twitter-analysis/SKILL.md for full params)
Profile & content — who are they, what do they say:
| Research goal | Endpoint | Key params |
|---|---|---|
| User profile & stats | /api/twitter/user | username |
| User's original posts | /api/twitter/tweets | username, product |
| User's posts + replies | /api/twitter/replies | username, product |
| Deleted tweets | /api/twitter/deleted-tweets | username |
| Tweet long-form article | /api/readx/tweet-article | tweet_id |
Engagement & spread — how is a tweet being received:
| Research goal | Endpoint | Key params |
|---|---|---|
| Full reply thread under a tweet | /api/readx/tweet-detail-conversation-v2 | tweet_id, cursor |
| Who quoted this tweet (KOL amplification) | /api/readx/tweet-quotes | tweet_id |
| Who retweeted (spread network) | /api/readx/tweet-retweeters | tweet_id |
| Who liked (supporter profile) | /api/readx/tweet-favoriters | tweet_id |
| Tweet detail with views/source | /api/readx/tweet-detail-v2 | tweet_id |
| Batch fetch multiple tweets | /api/readx/tweet-results-by-ids | tweet_ids |
Relationships & credibility — who follows/endorses who:
| Research goal | Endpoint | Key params |
|---|---|---|
| KOL followers of project | /api/twitter/kol-followers | username |
| Verified (blue-check) followers | /api/readx/user-verified-followers | user_id |
| Who the project follows (inner circle) | /api/readx/following-light | username |
| Follower list | /api/readx/followers-light | username |
| Mutual follow / relationship check | /api/readx/friendships-show | source_screen_name, target_screen_name |
| Follow/unfollow events over time | /api/twitter/follower-events | username, isFollow |
Search & discovery — find discussions, mentions, risk signals:
| Research goal | Endpoint | Key params |
|---|---|---|
| Structured search (filters) | /api/twitter/search | keywords, fromUser, mentionUser, minLikes, minReplies... |
| Advanced search (Twitter syntax) | /api/readx/search2 | q (e.g. "project" min_faves:100 -filter:replies) |
AI-powered comment analysis (see gork-analysis/SKILL.md):
| Research goal | Endpoint | Key params |
|---|---|---|
| Deep insight from tweet comments | /api/gork/analyze | prompt (include tweet URL + question) |
Gork vs conversation-v2: Use `conversation-v2` as the primary comment source (fast, raw data). Use `gork/analyze` only when reply threads need deeper AI interpretation (mixed sentiment, technical debates). Limit: max 2 Gork calls per research.
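One way to enforce the two-call limit is a small budget counter that is checked before each Gork request. This is only a sketch; `createGorkBudget` is a hypothetical helper, not part of the skill's scripts.

```javascript
const MAX_GORK_CALLS = 2; // limit stated above

// Hypothetical helper: tracks how many Gork calls one research run has used.
function createGorkBudget(limit = MAX_GORK_CALLS) {
  let used = 0;
  return {
    // Consumes one call and returns true if budget remains, otherwise false.
    tryConsume() {
      if (used >= limit) return false;
      used += 1;
      return true;
    },
    remaining() {
      return limit - used;
    },
  };
}
```

When `tryConsume()` returns false, the caller falls back to the raw conversation-v2 thread instead of calling /api/gork/analyze.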
GitHub Code (see github-analysis/SKILL.md)
Local script skills/scripts/github_analyze.js — no external API.
convertToMarkdown(url, options) or analyzeRepository(url, options).
On-chain Data (see onchain-audit/SKILL.md)
Binance API (onchain) — address + chainName (uppercase: BSC/ETHEREUM/BASE/SOLANA):
| Endpoint | Description |
|---|---|
| /api/onchain/audit | Contract audit (Binance + Bitget dual-source) |
| /api/onchain/token-info | Token metadata and market dynamics |
| /api/onchain/wallet | Wallet positions (BSC/BASE/SOLANA only) |
| /api/onchain/token-search | Token search (requires keyword) |
Bitget API (onchain-2) — chain + contract (lowercase: bnb/eth/base/sol):
| Endpoint | Description |
|---|---|
| /api/onchain-2/token-info | Token details |
| /api/onchain-2/token-price | Token price |
| /api/onchain-2/tx-info | Transaction statistics |
| /api/onchain-2/liquidity | Liquidity pool info |
| /api/onchain-2/security-audit | Security audit |
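Because the two sources name chains and parameters differently, a small translation step helps when querying both for the same contract. The sketch below assumes the two chain lists correspond in the order given (BSC to bnb, ETHEREUM to eth, BASE to base, SOLANA to sol); the source implies this pairing but does not state it. `toOnchainParams` is a hypothetical helper.

```javascript
// Assumption: the uppercase and lowercase chain lists pair up in order.
const CHAIN_IDS = {
  BSC: "bnb",
  ETHEREUM: "eth",
  BASE: "base",
  SOLANA: "sol",
};

// Hypothetical helper: produce the parameter objects for both sources.
function toOnchainParams(chainName, address) {
  const upper = String(chainName).toUpperCase();
  if (!Object.prototype.hasOwnProperty.call(CHAIN_IDS, upper)) {
    throw new Error(`Unsupported chain: ${chainName}`);
  }
  return {
    // Binance endpoints (/api/onchain/*): address + chainName (uppercase)
    binance: { address, chainName: upper },
    // Bitget endpoints (/api/onchain-2/*): chain + contract (lowercase)
    bitget: { chain: CHAIN_IDS[upper], contract: address },
  };
}
```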
Website Content (see agent-browser/SKILL.md)
CLAWBOT uses agent-browser CLI to open and inspect websites.
Primary method for fetching web page content — no API proxy needed.
Others
| Endpoint | Method | Description |
|---|---|---|
| /api/health | GET | Health check |
Language Preference
Output language matches the user's input language; default Chinese (zh-CN). Raw API data (usernames, tickers, addresses, code) stays in original form.
Orchestration Flow
User provides URL / Ticker / contract address + research intent
│
▼
Step 1 — Parse input, initialize entity queue
Extract all entities from user input:
Twitter links, GitHub repos, contract addresses, tickers, chain
Aggregator URLs → extract entities from path (see rules below)
Initialize entity queue:
entity_queue = [{ entity, depth: 0 } for each extracted entity]
processed = set()
MAX_DEPTH = 2 # prevent infinite recursion
│
▼
Step 2 — Collect raw intelligence (entity-driven loop)
Goal: maximize information density. Gather everything, filter later.
Every data source may discover NEW entities — feed them back into the queue.
┌──────────────────────────────────────────────────────────────┐
│ While entity_queue is not empty: │
│ │
│ { entity, depth } = queue.pop() │
│ if entity in processed: skip │
│ if depth > MAX_DEPTH: note in findings, do NOT process │
│ processed.add(entity) │
│ │
│ Route by entity type: │
│ URL → 2a. Website exploration │
│ Twitter → 2b. Social data collection │
│ GitHub → 2c. Code analysis │
│ Contract → 2d. On-chain analysis │
│ Ticker → 2d. On-chain token-search first │
│ │
│ After each source returns: │
│ Extract new entities from results │
│ Add to queue with depth: current_depth + 1 │
│ (see "Entity Discovery Rules" below) │
└──────────────────────────────────────────────────────────────┘
--- 2a. Website exploration ---
For clawhub.ai URLs: extract owner/repo → route to 2c (skip browser)
For other URLs — use agent-browser CLI:
# Open & orient
agent-browser open <url>
agent-browser wait --load networkidle
agent-browser get title # confirm page loaded
agent-browser get url # detect redirects
# 1. Landing page
agent-browser snapshot -i
agent-browser scroll down 2000
agent-browser snapshot -c
agent-browser screenshot --full
→ Extract: headline, key numbers, partner logos, CTA text
# 2. Docs / Whitepaper
agent-browser find text "Docs" click
agent-browser wait --load networkidle
agent-browser snapshot -c
agent-browser screenshot --full
agent-browser pdf docs.pdf
→ Look for: token distribution, vesting, supply mechanics
agent-browser back
# 3. Team / About
agent-browser find text "Team" click
agent-browser wait --load networkidle
agent-browser snapshot -c
agent-browser screenshot --full
→ Extract: names, titles, LinkedIn/Twitter links
→ Red flag: stock photos, no real identities, fully anonymous
agent-browser back
# 4. App / DApp (if exists)
agent-browser find text "App" click
agent-browser wait --load networkidle
agent-browser snapshot -i
→ DApp checks: UI renders? Real values or zeros? Core functions present?
→ Do NOT click "Connect Wallet" or sign anything
agent-browser screenshot --full
agent-browser back
# 5. Tokenomics
agent-browser find text "Token" click
agent-browser wait --load networkidle
agent-browser snapshot -c
agent-browser screenshot --full
→ Look for: distribution chart, unlock schedule, tax rates, contract addr
agent-browser back
# 6. Footer
agent-browser scroll down 9999
agent-browser snapshot -c
→ Extract: social links, legal info, company registration
agent-browser close
# Security check (during browsing):
SSL/TLS, domain age, redirect chains, suspicious popups,
wallet-connect on landing, obfuscated scripts
Flag issues; do NOT interact with suspicious wallet prompts
# Fallback
Page blank / Cloudflare → retry: agent-browser open <url> --headed
Geo-restricted → note in report, try alternative URLs
No website → skip, flag as risk signal
★ Extract new entities → add to queue (see Entity Discovery Rules below)
--- 2b. Social data collection ---
Layer 1 — Basic profile & content (parallel):
- /api/twitter/user → profile, follower count, account age
- /api/twitter/tweets (product:"Top") → highest engagement posts
- /api/twitter/tweets (product:"Latest") → most recent posts
- /api/twitter/replies → who they interact with
- /api/twitter/kol-followers → credible followers
- /api/twitter/deleted-tweets → removed content
- /api/twitter/follower-events → follow/unfollow patterns
Layer 2 — Deep engagement analysis (after Layer 1 returns tweets):
Pick the 2 most valuable tweets (highest engagement + most controversial):
For each tweet:
- /api/readx/tweet-detail-v2 → views, source, engagement metrics
(judge real reach: high views + low engagement = bot-inflated?)
- /api/readx/tweet-detail-conversation-v2 → full reply thread
(direct data, faster than Gork — use this as primary comment source)
- /api/readx/tweet-quotes → who quoted it (KOL amplification signal)
- /api/readx/tweet-retweeters → spread network
If tweets contain long-form content (Twitter Articles):
- /api/readx/tweet-article → extract full article text
(founders often publish roadmaps, postmortems, or announcements as articles)
If deleted-tweets returned tweet IDs:
- /api/readx/tweet-results-by-ids → batch fetch deleted tweet snapshots
(retrieve what was said before deletion — high-value intelligence)
Gork analysis (supplementary, max 2 calls):
Only use /api/gork/analyze when reply threads need deeper interpretation
(e.g. mixed sentiment, technical debates, insider claims that need synthesis).
If conversation-v2 already provides clear signal, skip Gork to save time.
Layer 3 — Relationship & reach analysis:
- /api/readx/following-light → who does the project follow back?
(reveals inner circle, partner accounts, team alt accounts)
- /api/readx/user-verified-followers → verified/blue-check followers
(requires `user_id` = `rest_id` from /api/twitter/user response, NOT username)
- /api/readx/friendships-show → verify relationships between
team members, KOLs, and project account (mutual follows?)
Layer 4 — Broader search & discovery:
- /api/twitter/search keywords:{project_name} → who's talking about it
- /api/twitter/search mentionUser:{username} → who mentions the project
- /api/readx/search2 q:"{project_name} min_faves:100" → high-impact discussions
- /api/readx/search2 q:"{project_name} scam OR rug OR hack" → risk signals
- /api/twitter/search keywords:{project_name} minReplies:20 → controversial threads
★ Extract new entities → add to queue (see Entity Discovery Rules below)
--- 2c. Code analysis ---
github-analysis → analyzeRepository / convertToMarkdown
Focus: tech stack, commit activity, code completeness, risk points
★ Extract new entities → add to queue (see Entity Discovery Rules below)
--- 2d. On-chain analysis ---
Binance + Bitget dual-source (parallel):
audit, token-info, wallet, liquidity, tx-info, security-audit
Cross-verify between sources when possible.
★ Extract new entities → add to queue (see Entity Discovery Rules below)
│
▼
Step 3 — Cross-reference & verify claims
Goal: find contradictions — they are the most valuable signals.
If verification needs data not yet collected, go back to Step 2 to fetch it.
Compare data across sources:
- Twitter says X vs website says Y vs on-chain shows Z
- Claimed team → search GitHub commit history, LinkedIn, past projects
- Claimed partnerships → check counterparty's official channels
- Claimed TVL/volume → compare with on-chain data
- Claimed audit → verify firm + report link existence and date
Website claims vs Code/On-chain verification:
| Website/Docs claim | Verify with | How |
|--------------------|-------------|-----|
| "Decentralized" | On-chain: ownership | Contract has pause/mint/blacklist? Owner is EOA or multisig? |
| "Audited by X" | Website + GitHub | Link valid? Deployed code matches audited version? |
| "Max supply N" | Code: mint function | Contract has uncapped mint()? Owner can mint? |
| "Deflationary/Burn"| Code: burn mechanism | burn() exists? Actually called on-chain? |
| "Locked liquidity" | On-chain: LP lock | Lock contract verified? Duration? Amount? |
| "Governance/DAO" | On-chain: governance | Proposals exist? Real votes or single-wallet? |
| "Open source" | GitHub: repo | Repo public? Code matches deployed bytecode? |
| "Multi-chain" | On-chain per chain | Contracts actually deployed on claimed chains? |
| "Partnerships" | Partner's channels | Partner acknowledges? Or one-sided claim? |
Priority: verify claims that affect user funds first (audit, liquidity, ownership).
If a claim cannot be verified with existing data → fetch missing data (Step 2).
Mark each claim: ✅ Verified / ⚠️ Unverified / ❌ Contradicted
│
▼
Step 4 — Deep dig (hypothesis-driven)
Goal: follow high-value leads that emerged from Steps 2-3.
For each significant finding, ask "what does this imply?":
- Team member found → trigger team member analysis (see below)
- Contract is upgradable → who holds the proxy admin? Is it a multisig?
- Large holder detected → where did their tokens come from? Deployer?
- Deleted tweets found → what did they say? Why deleted? Timing?
- GitHub inactive → is the project abandoned or is code closed-source?
- TVL mismatch → organic demand or incentivized/fake liquidity?
Team member analysis:
When a team member (founder, co-founder, CTO, etc.) is identified from
any source, their Twitter handle is already queued in Step 2 and will
be processed by 2b (Layer 1-4) automatically. Step 4 adds ADDITIONAL
analysis that goes beyond what the entity queue covers:
1. Cross-source identity verification:
- Does the Twitter profile match the website Team page claims?
- Does GitHub commit history match claimed expertise?
- /api/readx/friendships-show → do all team members follow each other?
(if they don't, are they really a team?)
2. History & reputation check:
- /api/readx/search2 q:"{name} founder OR CEO OR CTO" → past projects
- Did those projects succeed or fail/rug?
- On-chain (if wallet address known or linked):
→ /api/onchain/wallet → what tokens do they hold?
→ Check if their wallet deployed other contracts (pattern?)
→ Check fund flow between team wallet and project deployer
3. Red flags to synthesize:
- Account created same time as project (sockpuppet?)
- No history before this project (fabricated identity?)
- Past association with failed/rugged projects
- Identity claims don't match across sources
- Team members don't follow each other (fake team?)
- Following list is mostly bots or empty accounts
Proactive exploration patterns (only for NEW leads from Steps 2-3,
do NOT repeat searches already done in Step 2b Layer 4):
- Search for specific controversies discovered in Step 3:
/api/readx/search2 q:"{project} + {specific controversy keyword}"
- Search for team members' other projects and outcomes:
/api/readx/search2 q:"{member_name} founder OR CEO OR CTO"
- Check if contract deployer has deployed other tokens (pattern?)
- Look for on-chain connections between team wallets and exchanges
- Use Gork for deep interpretation when search results are ambiguous:
prompt = "discussions about {project} + {specific finding}"
If new high-value leads emerge → loop back to Step 2 (respecting MAX_DEPTH).
Stop when: no new high-value leads, or sufficient to form a judgment.
─── END OF DATA COLLECTION PHASE ───
Everything above is about gathering and verifying raw intelligence.
Everything below is about analysis and report generation.
│
▼
Step 5 — Distill & prioritize findings
Goal: compress raw intelligence into high-density insights.
From all collected data, select only what matters:
- Rank findings by impact (deal-breaker > important > nice-to-know)
- Discard noise: routine data that confirms nothing special
- Highlight contradictions and anomalies — these are the story
- Connect dots: A + B together imply C (CLAWBOT's analytical value)
- Identify information gaps: what couldn't be verified and why
- Reconstruct project timeline from all time-stamped data
This step is pure analysis — no new data fetching.
│
▼
Step 6 — Produce final research report
Write report using distilled findings from Step 5.
Use `REPORT_TEMPLATE.md` as the report structure.
Report should read as curated intelligence, not a data dump.
Language follows user input. Inline citations for all evidence.
Entity Discovery Rules
During Step 2, every data source may reveal new entities. Extract and queue them
with depth: current_depth + 1:
From website (2a):
| Found in | Entity type | Example | Action |
|---|---|---|---|
| Team / About page | Twitter handle | @john_dev | → queue as Twitter entity |
| Tokenomics page | Contract address | 0x1234... | → queue as Contract entity |
| Footer / Links | GitHub repo | github.com/org/repo | → queue as GitHub entity |
| Docs / Partners | Partner names | "partnered with X" | → note for search in Layer 4 |
From social data (2b):
| Found in | Entity type | Example | Action |
|---|---|---|---|
| Bio / tweets | Twitter handle | co-founder @jane | → queue as Twitter entity |
| Tweets | Contract address | CA: 0x5678... | → queue as Contract entity |
| Tweets | GitHub link | github.com/org/repo | → queue as GitHub entity |
| Reply threads (conversation-v2) | Person mention | insider says @whale_x | → queue as Twitter entity |
| Quote tweets | KOL handle | @kol quoted with commentary | → note who amplifies + stance |
| Following list | Inner circle account | project follows @alt_account | → queue as Twitter entity |
| KOL followers | Notable followers | @vitalik follows | → note for cross-reference |
| Deleted tweets | Tweet IDs | deleted tweet 123456 | → fetch via tweet-results-by-ids |
From code (2c):
| Found in | Entity type | Example | Action |
|---|---|---|---|
| Commit authors | GitHub/Twitter handle | author: dev123 | → note for cross-ref |
| Source code | Hardcoded address | admin = 0xABCD | → queue as Contract entity |
| Dependencies | Related repos | import from org/lib | → note for reference |
From on-chain (2d):
| Found in | Entity type | Example | Action |
|---|---|---|---|
| Contract data | Deployer address | deployer of contract | → check other deployments |
| Contract data | Admin / proxy | proxy admin, timelock | → queue as Contract entity |
| Token holders | Large holders | top 10 wallets | → note for pattern analysis |
| Liquidity | LP provider | LP creator address | → compare with deployer (insider?) |
Depth control:
- Depth 0: entities from user input
- Depth 1: entities discovered from depth-0 results (team members, mentioned contracts)
- Depth 2: entities discovered from depth-1 results (max depth, only follow high-value leads)
- Beyond depth 2: do NOT queue, only note in findings for manual follow-up
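The queue and depth rules can be sketched as a breadth-first loop. This is a simplified, synchronous illustration: `collect` is a hypothetical callback standing in for steps 2a to 2d (in practice those calls would be asynchronous), and the queue is read first-in-first-out so that shallower entities are handled before deeper ones, which is a design choice of this sketch.

```javascript
const MAX_DEPTH = 2; // prevent infinite recursion

// `collect(entity, depth)` is a hypothetical callback that returns the
// entities newly discovered while processing `entity`.
function runEntityQueue(seedEntities, collect) {
  const queue = seedEntities.map((entity) => ({ entity, depth: 0 }));
  const processed = new Set();
  const notedOnly = new Set(); // beyond MAX_DEPTH: note, do not process

  while (queue.length > 0) {
    const { entity, depth } = queue.shift();
    if (processed.has(entity)) continue;
    if (depth > MAX_DEPTH) {
      notedOnly.add(entity);
      continue;
    }
    processed.add(entity);
    for (const found of collect(entity, depth)) {
      queue.push({ entity: found, depth: depth + 1 });
    }
  }
  return { processed: [...processed], notedOnly: [...notedOnly] };
}
```

Entities in `notedOnly` correspond to the "note in findings for manual follow-up" rule. The sketch does not model the "only follow high-value leads" judgment at depth 2, which stays with CLAWBOT.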
Failure Handling
| Failure type | Action |
|---|---|
| Timeout / 502 / 503 / 504 | Retry once after 3s (/api/gork/analyze: allow 120s before timeout) |
| 429 (rate limit) | Retry once after Retry-After or 10s |
| 401 / 403 / 400 | Do not retry; skip |
| Other errors | Do not retry; skip |
On failure: skip that data source, continue with remaining sources. Include a Data Coverage note in the report listing available/unavailable sources. Omit sections with no data; never halt the entire workflow for a single failure.
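The table above reduces to a small decision function. In this sketch `status` is either an HTTP status code or the string "timeout", `attempt` counts retries already made, and the default timeout is supplied by the caller because the source only specifies the 120s allowance for /api/gork/analyze. Both function names are hypothetical.

```javascript
// Decide whether a failed request should be retried, and after how long.
// attempt = number of retries already made (0 on the first failure).
function retryPolicy(status, attempt, retryAfterSeconds) {
  if (attempt >= 1) return { retry: false, delayMs: 0 }; // retry once at most
  if (status === "timeout" || [502, 503, 504].includes(status)) {
    return { retry: true, delayMs: 3000 };
  }
  if (status === 429) {
    // Use Retry-After when it is a positive number, else fall back to 10s.
    const seconds =
      Number.isFinite(retryAfterSeconds) && retryAfterSeconds > 0
        ? retryAfterSeconds
        : 10;
    return { retry: true, delayMs: seconds * 1000 };
  }
  // 400 / 401 / 403 and all other errors: skip this data source.
  return { retry: false, delayMs: 0 };
}

// Gork analysis is slow, so it gets 120s; other endpoints use the
// caller's default (not specified in this document).
function timeoutMs(path, defaultMs) {
  return path === "/api/gork/analyze" ? 120000 : defaultMs;
}
```

A `retry: false` result means the source is skipped and listed as unavailable in the Data Coverage note; the workflow itself continues.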
Entity Extraction Rules
| Entity Type | Identification |
|---|---|
| Twitter profile | x.com/{username} or twitter.com/{username} |
| Twitter post | x.com/{username}/status/{id} |
| GitHub repo | github.com/{owner}/{repo} |
| EVM contract | 0x + 40 hex chars |
| Solana address | base58 32–44 chars + contextual keywords (see below) |
| Ticker | $XXX or ticker/symbol/token: XXX |
| Chain attribution | URL domain / path keywords / page text keywords |
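As an illustration, several rows of the table can be expressed as regular expressions. This is a rough sketch rather than the skill's actual parser: the ticker length bounds (2 to 10 characters, starting with a letter) are an assumption, only the `$XXX` ticker form is covered, and Solana addresses and chain attribution are left out because they depend on context.

```javascript
// Each entry: [entity type, pattern, function mapping a match to a value].
const PATTERNS = [
  ["twitter_post", /\b(?:x|twitter)\.com\/(\w+)\/status\/(\d+)/g,
    (m) => `${m[1]}/status/${m[2]}`],
  // The lookahead keeps post URLs from also counting as profiles.
  ["twitter_profile", /\b(?:x|twitter)\.com\/(\w+)(?!\w*\/status\/)/g,
    (m) => m[1]],
  ["github_repo", /\bgithub\.com\/([\w.-]+)\/([\w.-]+)/g,
    (m) => `${m[1]}/${m[2]}`.replace(/\.+$/, "")],
  // 0x + exactly 40 hex chars; longer hex strings (tx hashes) do not match.
  ["evm_contract", /\b0x[0-9a-fA-F]{40}\b/g, (m) => m[0]],
  // Assumption: tickers start with a letter and are 2-10 characters long.
  ["ticker", /\$([A-Za-z][A-Za-z0-9]{1,9})\b/g, (m) => m[1].toUpperCase()],
];

function extractEntities(text) {
  const found = [];
  for (const [type, regex, toValue] of PATTERNS) {
    for (const match of text.matchAll(regex)) {
      found.push({ type, value: toValue(match) });
    }
  }
  return found;
}
```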
Solana Address Contextual Keywords
A base58 string is only identified as a Solana address when at least one contextual keyword is present in surrounding text, URL, or page content:
solana, sol, raydium, jupiter, orca, meteora, marinade,
tensor, magic eden, jito, pump.fun, moonshot, birdeye,
solscan, solana.fm, solanabeach, spl token, program id
If no keyword is found, flag as "unresolved address".
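The contextual rule can be sketched as follows. Matching keywords as whole tokens (so that "sol" does not fire inside "solution") is an assumption of this sketch; the source only says a keyword must be present. `classifyBase58` is a hypothetical helper.

```javascript
const SOLANA_KEYWORDS = [
  "solana", "sol", "raydium", "jupiter", "orca", "meteora", "marinade",
  "tensor", "magic eden", "jito", "pump.fun", "moonshot", "birdeye",
  "solscan", "solana.fm", "solanabeach", "spl token", "program id",
];

// Base58 alphabet (no 0, O, I, l), 32 to 44 characters.
const BASE58_ADDRESS = /^[1-9A-HJ-NP-Za-km-z]{32,44}$/;

// Assumption: keywords must appear as standalone tokens in the context.
const KEYWORD_PATTERNS = SOLANA_KEYWORDS.map((kw) => {
  const escaped = kw.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`(^|[^a-z0-9])${escaped}([^a-z0-9]|$)`);
});

// context = surrounding text, URL, or page content for the candidate.
function classifyBase58(candidate, context) {
  if (!BASE58_ADDRESS.test(candidate)) return "not_an_address";
  const haystack = String(context || "").toLowerCase();
  const hit = KEYWORD_PATTERNS.some((pattern) => pattern.test(haystack));
  return hit ? "solana_address" : "unresolved_address";
}
```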
Aggregator URL Parsing
These URLs are parsed for entities from the path (not treated as official sites):
| Platform | Path format | Parsed result |
|---|---|---|
| clawhub.ai | /owner/repo | → repo (owner/repo) — use github-analysis directly, skip agent-browser |
| dexscreener.com | /chain/address | → contract + chain |
| dextools.io | /app/chain/pair/address | → contract + chain |
| pump.fun | /address | → Solana contract |
| gmgn.ai | /chain/address | → contract + chain |
| birdeye.so | /token/address | → contract |
| defined.fi | /chain/address | → contract + chain |
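The path formats in the table can be parsed with the standard `URL` class. This sketch follows the table literally; in particular it reads the dextools.io format as four segments with the chain second and the address fourth, which is an interpretation rather than something the source spells out. `parseAggregatorUrl` is a hypothetical helper.

```javascript
// Returns a parsed entity, or null when the URL is not a known aggregator
// or its path does not have the expected shape.
function parseAggregatorUrl(rawUrl) {
  const url = new URL(rawUrl);
  const host = url.hostname.replace(/^www\./, "");
  const parts = url.pathname.split("/").filter(Boolean);

  switch (host) {
    case "clawhub.ai": // /owner/repo
      return parts.length >= 2
        ? { type: "repo", repo: `${parts[0]}/${parts[1]}` }
        : null;
    case "dexscreener.com": // /chain/address
    case "gmgn.ai":
    case "defined.fi":
      return parts.length >= 2
        ? { type: "contract", chain: parts[0], address: parts[1] }
        : null;
    case "dextools.io": // /app/chain/pair/address (assumed segment order)
      return parts.length >= 4
        ? { type: "contract", chain: parts[1], address: parts[3] }
        : null;
    case "pump.fun": // /address, always Solana
      return parts.length >= 1
        ? { type: "contract", chain: "solana", address: parts[0] }
        : null;
    case "birdeye.so": // /token/address, chain not in the path
      return parts.length >= 2 && parts[0] === "token"
        ? { type: "contract", chain: null, address: parts[1] }
        : null;
    default:
      return null; // not an aggregator: treat as a normal website
  }
}
```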
Data Display Rules
- API latency / performance metrics: If the data was not successfully fetched or the request returned an error, do not display API latency or performance data in the report. Only show latency data when it was actually measured successfully.
- Skip any metric that returned an error or timed out — leave it out entirely rather than showing "N/A" or error messages.
Local Memory & Report Storage
After generating the final research report, store a copy locally:
- Save the report as PDF to `~/.crab-catch/reports/{project_name}_{YYYY-MM-DD}.pdf`
- Maintain an index file `~/.crab-catch/reports/index.json` with entries: `{ "project": "name", "date": "YYYY-MM-DD", "file": "filename.pdf", "entry": "original user input" }`
- This allows past research to be retrieved in future sessions.
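A sketch of building one index entry is shown below. Replacing whitespace and filesystem-unsafe characters in the project name and using the UTC date are assumptions of this sketch; the source only fixes the filename pattern and the entry fields. `buildReportEntry` is a hypothetical helper, and writing the PDF and updating index.json are left to the caller.

```javascript
// Build the index entry for a finished report.
function buildReportEntry(projectName, date, userInput) {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD (UTC)
  // Assumption: unsafe filename characters and whitespace become "_".
  const safeName = projectName.trim().replace(/[\/\\:*?"<>|\s]+/g, "_");
  return {
    project: projectName,
    date: day,
    file: `${safeName}_${day}.pdf`,
    entry: userInput,
  };
}
```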
Report Output
Use REPORT_TEMPLATE.md as the report structure, with the following constraints:
Section constraints
Must keep — always present, fixed order, follow template format:
- Header (project name + timestamp)
- 📌 Basic Information
- 🧠 Core Findings (with Executive Summary)
- 📝 Conclusion & Verdict
- 📂 References
Default keep — included by default; user can request to skip:
- 🛡️ Verification & Cross-Reference (Claim / Contradictions / Gaps)
- ⚠️ Risk Warning
Data-dependent — include if data available, skip entire subsection if not:
- 📊 Deep Dive
- 👤 Team & Key Figures (skip if no team info found)
- 💻 GitHub Analysis (skip if no repo)
- ⛓️ On-chain Security (skip if no contract)
- 📈 Social Signals (skip if no Twitter)
- 📅 Project Timeline (skip if insufficient time data)
Free — table row count, description text, signal count are flexible.
Formatting rules
- Inline citations: `[[N]](url)` after every evidence claim
- Numbers: K / M / B format; prices: `$` prefix
- Highlight high-risk signals (honeypot, high tax, upgradable contracts)
- Include Data Coverage note when sources were unavailable
- Append DYOR disclaimer
- Output language matches user input; default Chinese (zh-CN)
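The number rule can be illustrated with a small formatter. The rounding choices here (two decimals for values of 1 or more, four significant digits below 1) are assumptions of this sketch; the source only specifies the K / M / B suffixes and the `$` prefix.

```javascript
// Round for display and drop trailing zeros.
function trimNumber(n) {
  const abs = Math.abs(n);
  const rounded = abs > 0 && abs < 1 ? n.toPrecision(4) : n.toFixed(2);
  return Number(rounded).toString();
}

// 1500 -> "1.5K", 1234567 -> "1.23M", 2500000000 -> "2.5B"
function formatCompact(value) {
  const units = [[1e9, "B"], [1e6, "M"], [1e3, "K"]];
  const abs = Math.abs(value);
  for (const [size, suffix] of units) {
    if (abs >= size) return `${trimNumber(value / size)}${suffix}`;
  }
  return trimNumber(value);
}

// Prices carry a "$" prefix.
function formatPrice(value) {
  return `$${formatCompact(value)}`;
}
```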