Crab Catch — Research Skill Orchestration

Prerequisites

CLAWBOT must run these commands at session start (not just read them):

which agent-browser || npm install -g agent-browser
agent-browser install

Data Sources

CLAWBOT is the research driver — responsible for analysis, judgment, iteration, and report generation. The APIs, scripts, and tools below are data-fetching methods only.

API Base URL: https://crab-skill.opsat.io

Signature Authentication

All API requests except /api/health require Crab signature headers.

Run `node skills/scripts/crab-sign.js` once at session start to get the headers JSON. (The first run auto-generates credentials; a cached signature is reused while it is still valid, i.e. within 24h.)
Store the output and attach these four headers to all subsequent API requests: X-Crab-Timestamp, X-Crab-Signature, X-Crab-Key, X-Crab-Address. There is no need to re-run the script for each request.
Re-run with `--refresh` only if the API returns auth_expired.
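
A minimal sketch of reusing the stored signature for every request. It assumes the JSON printed by crab-sign.js is a flat object keyed by the four header names, and that parameters travel as query-string values; neither is specified in this document, so check the per-skill SKILL.md files. `buildRequest` is a hypothetical helper, not part of the skill scripts.

```javascript
// Hypothetical helper: attach the cached Crab signature headers to a request.
// Assumption: crab-sign.js prints a flat JSON object keyed by header name.
// Assumption: parameters are sent as query-string values.
const API_BASE = "https://crab-skill.opsat.io";
const REQUIRED_HEADERS = [
  "X-Crab-Timestamp",
  "X-Crab-Signature",
  "X-Crab-Key",
  "X-Crab-Address",
];

function buildRequest(path, params, signedHeaders) {
  const url = new URL(path, API_BASE);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  // /api/health is the only endpoint that needs no signature.
  if (path === "/api/health") {
    return { url: url.toString(), headers: {} };
  }
  const missing = REQUIRED_HEADERS.filter((name) => !signedHeaders[name]);
  if (missing.length > 0) {
    throw new Error(`Missing signature headers: ${missing.join(", ")}`);
  }
  const headers = {};
  for (const name of REQUIRED_HEADERS) headers[name] = signedHeaders[name];
  return { url: url.toString(), headers };
}
```

The returned object can be handed to whatever HTTP client the session uses; failing early on a missing header avoids spending a request that would be rejected anyway.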

Twitter & Social Data (see `twitter-analysis/SKILL.md` for full params)

Profile & content — who are they, what do they say:

Research goal	Endpoint	Key params
User profile & stats	`/api/twitter/user`	`username`
User's original posts	`/api/twitter/tweets`	`username`, `product`
User's posts + replies	`/api/twitter/replies`	`username`, `product`
Deleted tweets	`/api/twitter/deleted-tweets`	`username`
Tweet long-form article	`/api/readx/tweet-article`	`tweet_id`

Engagement & spread — how is a tweet being received:

Research goal	Endpoint	Key params
Full reply thread under a tweet	`/api/readx/tweet-detail-conversation-v2`	`tweet_id`, `cursor`
Who quoted this tweet (KOL amplification)	`/api/readx/tweet-quotes`	`tweet_id`
Who retweeted (spread network)	`/api/readx/tweet-retweeters`	`tweet_id`
Who liked (supporter profile)	`/api/readx/tweet-favoriters`	`tweet_id`
Tweet detail with views/source	`/api/readx/tweet-detail-v2`	`tweet_id`
Batch fetch multiple tweets	`/api/readx/tweet-results-by-ids`	`tweet_ids`

Relationships & credibility — who follows/endorses who:

Research goal	Endpoint	Key params
KOL followers of project	`/api/twitter/kol-followers`	`username`
Verified (blue-check) followers	`/api/readx/user-verified-followers`	`user_id`
Who the project follows (inner circle)	`/api/readx/following-light`	`username`
Follower list	`/api/readx/followers-light`	`username`
Mutual follow / relationship check	`/api/readx/friendships-show`	`source_screen_name`, `target_screen_name`
Follow/unfollow events over time	`/api/twitter/follower-events`	`username`, `isFollow`

Search & discovery — find discussions, mentions, risk signals:

Research goal	Endpoint	Key params
Structured search (filters)	`/api/twitter/search`	`keywords`, `fromUser`, `mentionUser`, `minLikes`, `minReplies`...
Advanced search (Twitter syntax)	`/api/readx/search2`	`q` (e.g. `"project" min_faves:100 -filter:replies`)

AI-powered comment analysis (see `gork-analysis/SKILL.md`):

Research goal	Endpoint	Key params
Deep insight from tweet comments	`/api/gork/analyze`	`prompt` (include tweet URL + question)

Gork vs conversation-v2: use conversation-v2 as the primary comment source (fast, raw data). Use gork/analyze only when reply threads need deeper AI interpretation (mixed sentiment, technical debates). Limit: at most 2 Gork calls per research task.

GitHub Code (see `github-analysis/SKILL.md`)

Local script `skills/scripts/github_analyze.js` (no external API). Call `convertToMarkdown(url, options)` or `analyzeRepository(url, options)`.

On-chain Data (see `onchain-audit/SKILL.md`)

Binance API (onchain) — address + chainName (uppercase: BSC/ETHEREUM/BASE/SOLANA):

Endpoint	Description
`/api/onchain/audit`	Contract audit (Binance + Bitget dual-source)
`/api/onchain/token-info`	Token metadata and market dynamics
`/api/onchain/wallet`	Wallet positions (BSC/BASE/SOLANA only)
`/api/onchain/token-search`	Token search (requires `keyword`)

Bitget API (onchain-2) — chain + contract (lowercase: bnb/eth/base/sol):

Endpoint	Description
`/api/onchain-2/token-info`	Token details
`/api/onchain-2/token-price`	Token price
`/api/onchain-2/tx-info`	Transaction statistics
`/api/onchain-2/liquidity`	Liquidity pool info
`/api/onchain-2/security-audit`	Security audit
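
The two sources name the same chains and parameters differently, so a small translation step helps when querying both in parallel. The sketch below assumes the two chain lists correspond in order (BSC to bnb, ETHEREUM to eth, BASE to base, SOLANA to sol); that pairing is inferred from the lists above, not stated. `dualSourceParams` is a hypothetical helper.

```javascript
// Assumption: the uppercase and lowercase chain lists correspond in order.
const CHAIN_ALIASES = {
  BSC: "bnb",
  ETHEREUM: "eth",
  BASE: "base",
  SOLANA: "sol",
};

// Build the parameter sets for both sources from one (chain, address) pair.
function dualSourceParams(chainName, address) {
  const upper = chainName.toUpperCase();
  const bitgetChain = CHAIN_ALIASES[upper];
  if (!bitgetChain) throw new Error(`Unsupported chain: ${chainName}`);
  return {
    binance: { address, chainName: upper }, // for /api/onchain/*
    bitget: { chain: bitgetChain, contract: address }, // for /api/onchain-2/*
  };
}
```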

Website Content (see `agent-browser/SKILL.md`)

CLAWBOT uses the agent-browser CLI to open and inspect websites. It is the primary method for fetching web page content; no API proxy is needed.

Others

Endpoint	Method	Description
`/api/health`	GET	Health check

Language Preference

Output language matches the user's input language; default Chinese (zh-CN). Raw API data (usernames, tickers, addresses, code) stays in original form.

Orchestration Flow

User provides URL / Ticker / contract address + research intent
  │
  ▼
Step 1 — Parse input, initialize entity queue
  Extract all entities from user input:
    Twitter links, GitHub repos, contract addresses, tickers, chain
    Aggregator URLs → extract entities from path (see rules below)

  Initialize entity queue:
    entity_queue = [{ entity, depth: 0 } for each extracted entity]
    processed    = set()
    MAX_DEPTH    = 2   # prevent infinite recursion
  │
  ▼
Step 2 — Collect raw intelligence (entity-driven loop)
  Goal: maximize information density. Gather everything, filter later.
  Every data source may discover NEW entities — feed them back into the queue.

  ┌──────────────────────────────────────────────────────────────┐
  │ While entity_queue is not empty:                             │
  │                                                              │
  │   { entity, depth } = entity_queue.pop()                     │
  │   if entity in processed: skip                               │
  │   if depth > MAX_DEPTH: note in findings, do NOT process     │
  │   processed.add(entity)                                      │
  │                                                              │
  │   Route by entity type:                                      │
  │     URL      → 2a. Website exploration                       │
  │     Twitter  → 2b. Social data collection                    │
  │     GitHub   → 2c. Code analysis                             │
  │     Contract → 2d. On-chain analysis                         │
  │     Ticker   → 2d. On-chain token-search first               │
  │                                                              │
  │   After each source returns:                                 │
  │     Extract new entities from results                        │
  │     Add to queue with depth: current_depth + 1               │
  │     (see "Entity Discovery Rules" below)                     │
  └──────────────────────────────────────────────────────────────┘

  --- 2a. Website exploration ---

  For clawhub.ai URLs: extract owner/repo → route to 2c (skip browser)
  For other URLs — use agent-browser CLI:

      # Open & orient
      agent-browser open <url>
      agent-browser wait --load networkidle
      agent-browser get title                    # confirm page loaded
      agent-browser get url                      # detect redirects

      # 1. Landing page
      agent-browser snapshot -i
      agent-browser scroll down 2000
      agent-browser snapshot -c
      agent-browser screenshot --full
      → Extract: headline, key numbers, partner logos, CTA text

      # 2. Docs / Whitepaper
      agent-browser find text "Docs" click
      agent-browser wait --load networkidle
      agent-browser snapshot -c
      agent-browser screenshot --full
      agent-browser pdf docs.pdf
      → Look for: token distribution, vesting, supply mechanics
      agent-browser back

      # 3. Team / About
      agent-browser find text "Team" click
      agent-browser wait --load networkidle
      agent-browser snapshot -c
      agent-browser screenshot --full
      → Extract: names, titles, LinkedIn/Twitter links
      → Red flag: stock photos, no real identities, fully anonymous
      agent-browser back

      # 4. App / DApp (if exists)
      agent-browser find text "App" click
      agent-browser wait --load networkidle
      agent-browser snapshot -i
      → DApp checks: UI renders? Real values or zeros? Core functions present?
      → Do NOT click "Connect Wallet" or sign anything
      agent-browser screenshot --full
      agent-browser back

      # 5. Tokenomics
      agent-browser find text "Token" click
      agent-browser wait --load networkidle
      agent-browser snapshot -c
      agent-browser screenshot --full
      → Look for: distribution chart, unlock schedule, tax rates, contract addr
      agent-browser back

      # 6. Footer
      agent-browser scroll down 9999
      agent-browser snapshot -c
      → Extract: social links, legal info, company registration

      agent-browser close

      # Security check (during browsing):
      SSL/TLS, domain age, redirect chains, suspicious popups,
      wallet-connect on landing, obfuscated scripts
      Flag issues; do NOT interact with suspicious wallet prompts

      # Fallback
      Page blank / Cloudflare → retry: agent-browser open <url> --headed
      Geo-restricted → note in report, try alternative URLs
      No website → skip, flag as risk signal

      ★ Extract new entities → add to queue (see Entity Discovery Rules below)

  --- 2b. Social data collection ---

  Layer 1 — Basic profile & content (parallel):
    - /api/twitter/user → profile, follower count, account age
    - /api/twitter/tweets (product:"Top") → highest engagement posts
    - /api/twitter/tweets (product:"Latest") → most recent posts
    - /api/twitter/replies → who they interact with
    - /api/twitter/kol-followers → credible followers
    - /api/twitter/deleted-tweets → removed content
    - /api/twitter/follower-events → follow/unfollow patterns

  Layer 2 — Deep engagement analysis (after Layer 1 returns tweets):
    Pick the 2 most valuable tweets (highest engagement + most controversial):

    For each tweet:
      - /api/readx/tweet-detail-v2 → views, source, engagement metrics
        (judge real reach: high views + low engagement = bot-inflated?)
      - /api/readx/tweet-detail-conversation-v2 → full reply thread
        (direct data, faster than Gork — use this as primary comment source)
      - /api/readx/tweet-quotes → who quoted it (KOL amplification signal)
      - /api/readx/tweet-retweeters → spread network

    If tweets contain long-form content (Twitter Articles):
      - /api/readx/tweet-article → extract full article text
        (founders often publish roadmaps, postmortems, or announcements as articles)

    If deleted-tweets returned tweet IDs:
      - /api/readx/tweet-results-by-ids → batch fetch deleted tweet snapshots
        (retrieve what was said before deletion — high-value intelligence)

    Gork analysis (supplementary, max 2 calls):
      Only use /api/gork/analyze when reply threads need deeper interpretation
      (e.g. mixed sentiment, technical debates, insider claims that need synthesis).
      If conversation-v2 already provides clear signal, skip Gork to save time.

  Layer 3 — Relationship & reach analysis:
    - /api/readx/following-light → who does the project follow?
      (reveals inner circle, partner accounts, team alt accounts)
    - /api/readx/user-verified-followers → verified/blue-check followers
      (requires `user_id` = `rest_id` from /api/twitter/user response, NOT username)
    - /api/readx/friendships-show → verify relationships between
      team members, KOLs, and project account (mutual follows?)

  Layer 4 — Broader search & discovery:
    - /api/twitter/search keywords:{project_name} → who's talking about it
    - /api/twitter/search mentionUser:{username} → who mentions the project
    - /api/readx/search2 q:"{project_name} min_faves:100" → high-impact discussions
    - /api/readx/search2 q:"{project_name} scam OR rug OR hack" → risk signals
    - /api/twitter/search keywords:{project_name} minReplies:20 → controversial threads

  ★ Extract new entities → add to queue (see Entity Discovery Rules below)

  --- 2c. Code analysis ---

  github-analysis → analyzeRepository / convertToMarkdown
  Focus: tech stack, commit activity, code completeness, risk points

  ★ Extract new entities → add to queue (see Entity Discovery Rules below)

  --- 2d. On-chain analysis ---

  Binance + Bitget dual-source (parallel):
    audit, token-info, wallet, liquidity, tx-info, security-audit
  Cross-verify between sources when possible.

  ★ Extract new entities → add to queue (see Entity Discovery Rules below)

  │
  ▼
Step 3 — Cross-reference & verify claims
  Goal: find contradictions — they are the most valuable signals.
  If verification needs data not yet collected, go back to Step 2 to fetch it.

  Compare data across sources:
    - Twitter says X vs website says Y vs on-chain shows Z
    - Claimed team → search GitHub commit history, LinkedIn, past projects
    - Claimed partnerships → check counterparty's official channels
    - Claimed TVL/volume → compare with on-chain data
    - Claimed audit → verify firm + report link existence and date

  Website claims vs Code/On-chain verification:

    | Website/Docs claim | Verify with | How |
    |--------------------|-------------|-----|
    | "Decentralized"    | On-chain: ownership | Contract has pause/mint/blacklist? Owner is EOA or multisig? |
    | "Audited by X"     | Website + GitHub | Link valid? Deployed code matches audited version? |
    | "Max supply N"     | Code: mint function | Contract has uncapped mint()? Owner can mint? |
    | "Deflationary/Burn"| Code: burn mechanism | burn() exists? Actually called on-chain? |
    | "Locked liquidity" | On-chain: LP lock | Lock contract verified? Duration? Amount? |
    | "Governance/DAO"   | On-chain: governance | Proposals exist? Real votes or single-wallet? |
    | "Open source"      | GitHub: repo | Repo public? Code matches deployed bytecode? |
    | "Multi-chain"      | On-chain per chain | Contracts actually deployed on claimed chains? |
    | "Partnerships"     | Partner's channels | Partner acknowledges? Or one-sided claim? |

    Priority: verify claims that affect user funds first (audit, liquidity, ownership).
    If a claim cannot be verified with existing data → fetch missing data (Step 2).

  Mark each claim: ✅ Verified / ⚠️ Unverified / ❌ Contradicted
  │
  ▼
Step 4 — Deep dig (hypothesis-driven)
  Goal: follow high-value leads that emerged from Steps 2-3.

  For each significant finding, ask "what does this imply?":
    - Team member found → trigger team member analysis (see below)
    - Contract is upgradable → who holds the proxy admin? Is it a multisig?
    - Large holder detected → where did their tokens come from? Deployer?
    - Deleted tweets found → what did they say? Why deleted? Timing?
    - GitHub inactive → is the project abandoned or is code closed-source?
    - TVL mismatch → organic demand or incentivized/fake liquidity?

  Team member analysis:
    When a team member (founder, co-founder, CTO, etc.) is identified from
    any source, their Twitter handle is already queued in Step 2 and will
    be processed by 2b (Layer 1-4) automatically. Step 4 adds ADDITIONAL
    analysis that goes beyond what the entity queue covers:

    1. Cross-source identity verification:
       - Does the Twitter profile match the website Team page claims?
       - Does GitHub commit history match claimed expertise?
       - /api/readx/friendships-show → do all team members follow each other?
         (if they don't, are they really a team?)

    2. History & reputation check:
       - /api/readx/search2 q:"{name} founder OR CEO OR CTO" → past projects
       - Did those projects succeed or fail/rug?
       - On-chain (if wallet address known or linked):
         → /api/onchain/wallet → what tokens do they hold?
         → Check if their wallet deployed other contracts (pattern?)
         → Check fund flow between team wallet and project deployer

    3. Red flags to synthesize:
       - Account created same time as project (sockpuppet?)
       - No history before this project (fabricated identity?)
       - Past association with failed/rugged projects
       - Identity claims don't match across sources
       - Team members don't follow each other (fake team?)
       - Following list is mostly bots or empty accounts

  Proactive exploration patterns (only for NEW leads from Steps 2-3,
    do NOT repeat searches already done in Step 2b Layer 4):
    - Search for specific controversies discovered in Step 3:
        /api/readx/search2 q:"{project} + {specific controversy keyword}"
    - Search for team members' other projects and outcomes:
        /api/readx/search2 q:"{member_name} founder OR CEO OR CTO"
    - Check if contract deployer has deployed other tokens (pattern?)
    - Look for on-chain connections between team wallets and exchanges
    - Use Gork for deep interpretation when search results are ambiguous:
        prompt = "discussions about {project} + {specific finding}"

  If new high-value leads emerge → loop back to Step 2 (respecting MAX_DEPTH).
  Stop when: no new high-value leads, or sufficient to form a judgment.

  ─── END OF DATA COLLECTION PHASE ───
  Everything above is about gathering and verifying raw intelligence.
  Everything below is about analysis and report generation.
  │
  ▼
Step 5 — Distill & prioritize findings
  Goal: compress raw intelligence into high-density insights.

  From all collected data, select only what matters:
    - Rank findings by impact (deal-breaker > important > nice-to-know)
    - Discard noise: routine data that confirms nothing special
    - Highlight contradictions and anomalies — these are the story
    - Connect dots: A + B together imply C (CLAWBOT's analytical value)
    - Identify information gaps: what couldn't be verified and why
    - Reconstruct project timeline from all time-stamped data

  This step is pure analysis — no new data fetching.
  │
  ▼
Step 6 — Produce final research report
  Write report using distilled findings from Step 5.
  Use `REPORT_TEMPLATE.md` as the report structure.
  Report should read as curated intelligence, not a data dump.
  Language follows user input. Inline citations for all evidence.

Entity Discovery Rules

During Step 2, every data source may reveal new entities. Extract and queue them with depth: current_depth + 1:

From website (2a):

Found in	Entity type	Example	Action
Team / About page	Twitter handle	`@john_dev`	→ queue as Twitter entity
Tokenomics page	Contract address	`0x1234...`	→ queue as Contract entity
Footer / Links	GitHub repo	`github.com/org/repo`	→ queue as GitHub entity
Docs / Partners	Partner names	`"partnered with X"`	→ note for search in 2b Layer 4

From social data (2b):

Found in	Entity type	Example	Action
Bio / tweets	Twitter handle	`co-founder @jane`	→ queue as Twitter entity
Tweets	Contract address	`CA: 0x5678...`	→ queue as Contract entity
Tweets	GitHub link	`github.com/org/repo`	→ queue as GitHub entity
Reply threads (conversation-v2)	Person mention	`insider says @whale_x`	→ queue as Twitter entity
Quote tweets	KOL handle	`@kol quoted with commentary`	→ note who amplifies + stance
Following list	Inner circle account	`project follows @alt_account`	→ queue as Twitter entity
KOL followers	Notable followers	`@vitalik follows`	→ note for cross-reference
Deleted tweets	Tweet IDs	`deleted tweet 123456`	→ fetch via tweet-results-by-ids

From code (2c):

Found in	Entity type	Example	Action
Commit authors	GitHub/Twitter handle	`author: dev123`	→ note for cross-ref
Source code	Hardcoded address	`admin = 0xABCD`	→ queue as Contract entity
Dependencies	Related repos	`import from org/lib`	→ note for reference

From on-chain (2d):

Found in	Entity type	Example	Action
Contract data	Deployer address	deployer of contract	→ check other deployments
Contract data	Admin / proxy	proxy admin, timelock	→ queue as Contract entity
Token holders	Large holders	top 10 wallets	→ note for pattern analysis
Liquidity	LP provider	LP creator address	→ compare with deployer (insider?)

Depth control:

Depth 0: entities from user input
Depth 1: entities discovered from depth-0 results (team members, mentioned contracts)
Depth 2: entities discovered from depth-1 results (max depth, only follow high-value leads)
Beyond depth 2: do NOT queue, only note in findings for manual follow-up
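
The queue loop and depth control can be sketched as follows. `collect` is a hypothetical stand-in for steps 2a to 2d: given an entity, it returns whatever new entities that source revealed. The sketch processes the queue first-in-first-out, an assumption made here so that each entity is first seen at its shallowest depth; the flow above does not fix an order.

```javascript
const MAX_DEPTH = 2; // same limit as in Step 1

// collect(entity) is a hypothetical callback standing in for steps 2a-2d.
// It returns an array of entities discovered while processing `entity`.
function runEntityQueue(seedEntities, collect) {
  const queue = seedEntities.map((entity) => ({ entity, depth: 0 }));
  const processed = new Set();
  const deferred = new Set(); // beyond MAX_DEPTH: noted for manual follow-up

  while (queue.length > 0) {
    const { entity, depth } = queue.shift(); // FIFO: shallow entities first
    if (processed.has(entity)) continue;
    if (depth > MAX_DEPTH) {
      deferred.add(entity); // note in findings, do not process
      continue;
    }
    processed.add(entity);
    for (const found of collect(entity)) {
      if (!processed.has(found)) {
        queue.push({ entity: found, depth: depth + 1 });
      }
    }
  }
  return { processed: [...processed], deferred: [...deferred] };
}
```

With a chain of discoveries a → b → c → d, the first three are processed at depths 0, 1 and 2, while d lands in `deferred`. The `processed` set also makes cycles (a → b → a) terminate.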

Failure Handling

Failure type	Action
Timeout / 502 / 503 / 504	Retry once after 3s (`/api/gork/analyze`: allow 120s before timeout)
429 (rate limit)	Retry once after `Retry-After` or 10s
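401 / 403 / 400	Do not retry; skip (exception: if the API returns auth_expired, refresh the signature as described under Signature Authentication)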
Other errors	Do not retry; skip

On failure: skip that data source, continue with remaining sources. Include a Data Coverage note in the report listing available/unavailable sources. Omit sections with no data; never halt the entire workflow for a single failure.
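
The retry table can be expressed as a small decision function. This is a sketch under assumptions: the `outcome` shape (`status`, `timedOut`, `retryAfterSeconds`) is hypothetical, and `Retry-After` is assumed to have been parsed into seconds already (the header may also carry an HTTP date, which is not handled here).

```javascript
const GORK_TIMEOUT_MS = 120_000; // /api/gork/analyze is allowed 120s

// The default timeout for other endpoints is not specified in this document,
// so the caller supplies it.
function timeoutFor(path, defaultTimeoutMs) {
  return path === "/api/gork/analyze" ? GORK_TIMEOUT_MS : defaultTimeoutMs;
}

// outcome: { status?: number, timedOut?: boolean, retryAfterSeconds?: number }
// attempt: number of retries already made for this request (0 on first failure)
function retryDecision(outcome, attempt) {
  if (attempt >= 1) return { retry: false }; // every rule allows one retry at most
  if (outcome.timedOut || [502, 503, 504].includes(outcome.status)) {
    return { retry: true, delayMs: 3000 };
  }
  if (outcome.status === 429) {
    const seconds = Number.isFinite(outcome.retryAfterSeconds)
      ? outcome.retryAfterSeconds
      : 10;
    return { retry: true, delayMs: seconds * 1000 };
  }
  return { retry: false }; // 400 / 401 / 403 and all other errors: skip
}
```

When `retry` is false, the caller skips that data source and records it for the Data Coverage note rather than halting the workflow.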

Entity Extraction Rules

Entity Type	Identification
Twitter profile	`x.com/{username}` or `twitter.com/{username}`
Twitter post	`x.com/{username}/status/{id}`
GitHub repo	`github.com/{owner}/{repo}`
EVM contract	`0x` + 40 hex chars
Solana address	base58 32–44 chars + contextual keywords (see below)
Ticker	`$XXX` or `ticker/symbol/token: XXX`
Chain attribution	URL domain / path keywords / page text keywords

Solana Address Contextual Keywords

A base58 string is only identified as a Solana address when at least one contextual keyword is present in surrounding text, URL, or page content:

solana, sol, raydium, jupiter, orca, meteora, marinade, tensor, magic eden, jito, pump.fun, moonshot, birdeye, solscan, solana.fm, solanabeach, spl token, program id

If no keyword is found, flag as "unresolved address".
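
A regex-based sketch of these extraction rules. The patterns are illustrative assumptions, not the skill's actual implementation: handles are capped at 15 characters, tickers at 2 to 10 characters, keywords are matched as whole words, and chain attribution is left out. Reserved paths such as `x.com/home` would be picked up as profiles and need filtering in practice.

```javascript
// Contextual keywords required before a base58 string counts as Solana.
const SOLANA_KEYWORDS = [
  "solana", "sol", "raydium", "jupiter", "orca", "meteora", "marinade",
  "tensor", "magic eden", "jito", "pump.fun", "moonshot", "birdeye",
  "solscan", "solana.fm", "solanabeach", "spl token", "program id",
];
const escapeRegex = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
const SOLANA_CONTEXT = new RegExp(
  `\\b(?:${SOLANA_KEYWORDS.map(escapeRegex).join("|")})\\b`,
  "i"
);

function extractEntities(text) {
  const all = (re) => [...text.matchAll(re)];
  const result = {
    twitterProfiles: [], twitterPosts: [], githubRepos: [], evmContracts: [],
    solanaAddresses: [], unresolvedAddresses: [], tickers: [],
  };

  // x.com/{username} or twitter.com/{username}, optionally /status/{id}.
  // The lookbehind keeps hosts like "dropbox.com" from matching "x.com".
  const twitter = /(?<![\w-])(?:x|twitter)\.com\/([A-Za-z0-9_]{1,15})(?:\/status\/(\d+))?/g;
  for (const m of all(twitter)) {
    if (m[2]) result.twitterPosts.push({ username: m[1], id: m[2] });
    else result.twitterProfiles.push(m[1]);
  }
  for (const m of all(/github\.com\/([\w.-]+)\/([\w.-]+)/g)) {
    result.githubRepos.push(`${m[1]}/${m[2]}`);
  }
  // EVM contract: 0x + exactly 40 hex chars (longer hex strings do not match).
  result.evmContracts = all(/\b0x[0-9a-fA-F]{40}\b/g).map((m) => m[0]);

  // Base58 candidates count as Solana only when a contextual keyword is present.
  const hasSolanaContext = SOLANA_CONTEXT.test(text);
  for (const m of all(/\b[1-9A-HJ-NP-Za-km-z]{32,44}\b/g)) {
    const bucket = hasSolanaContext ? result.solanaAddresses : result.unresolvedAddresses;
    bucket.push(m[0]);
  }
  // Tickers: $XXX or "ticker/symbol/token: XXX".
  for (const m of all(/\$([A-Za-z][A-Za-z0-9]{1,9})\b/g)) {
    result.tickers.push(m[1].toUpperCase());
  }
  for (const m of all(/\b(?:ticker|symbol|token)\s*:\s*([A-Za-z0-9]{2,10})\b/gi)) {
    result.tickers.push(m[1].toUpperCase());
  }
  return result;
}
```

The keyword test runs over the whole input for simplicity; a stricter version would look only at text near each candidate address.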

Aggregator URL Parsing

These URLs are parsed for entities from the path (not treated as official sites):

Platform	Path format	Parsed result
clawhub.ai	`/owner/repo`	→ repo (owner/repo) — use `github-analysis` directly, skip agent-browser
dexscreener.com	`/chain/address`	→ contract + chain
dextools.io	`/app/chain/pair/address`	→ contract + chain
pump.fun	`/address`	→ Solana contract
gmgn.ai	`/chain/address`	→ contract + chain
birdeye.so	`/token/address`	→ contract
defined.fi	`/chain/address`	→ contract + chain
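
The path formats in the table translate into a simple host switch. The sketch below follows the table literally; real URLs on these platforms may carry extra path segments, so treat the segment positions as assumptions to verify against live links. `parseAggregatorUrl` is a hypothetical helper.

```javascript
// Parse an aggregator URL into an entity, following the table above literally.
// Returns null for unknown hosts or paths that do not fit the expected format.
function parseAggregatorUrl(input) {
  const url = new URL(input);
  const host = url.hostname.replace(/^www\./, "");
  const parts = url.pathname.split("/").filter(Boolean);

  switch (host) {
    case "clawhub.ai": // /owner/repo -> handled by github-analysis
      return parts.length >= 2
        ? { type: "repo", repo: `${parts[0]}/${parts[1]}` }
        : null;
    case "dexscreener.com":
    case "gmgn.ai":
    case "defined.fi": // /chain/address
      return parts.length >= 2
        ? { type: "contract", chain: parts[0], address: parts[1] }
        : null;
    case "dextools.io": // /app/chain/pair/address
      return parts.length >= 4 && parts[0] === "app"
        ? { type: "contract", chain: parts[1], address: parts[3] }
        : null;
    case "pump.fun": // /address, always Solana
      return parts.length >= 1
        ? { type: "contract", chain: "solana", address: parts[0] }
        : null;
    case "birdeye.so": // /token/address, chain not given by the path
      return parts.length >= 2 && parts[0] === "token"
        ? { type: "contract", address: parts[1] }
        : null;
    default:
      return null; // not an aggregator: treat as a normal website (2a)
  }
}
```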

Data Display Rules

API latency / performance metrics: show latency or performance data only when it was actually measured on a successful request. If the fetch failed or returned an error, leave it out of the report.
Omit any metric that returned an error or timed out; do not show "N/A" or error messages in its place.

Local Memory & Report Storage

After generating the final research report, store a copy locally:

Save the report as PDF to ~/.crab-catch/reports/{project_name}_{YYYY-MM-DD}.pdf

Maintain an index file ~/.crab-catch/reports/index.json with entries:

{ "project": "name", "date": "YYYY-MM-DD", "file": "filename.pdf", "entry": "original user input" }

This allows past research to be retrieved in future sessions.
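
A sketch of the naming and indexing logic, kept free of file I/O so it can be checked in isolation. Two details are assumptions, not requirements from this document: the date is taken in UTC, and characters outside letters, digits, `_`, `.` and `-` in the project name are replaced with `_`. The helper names are hypothetical.

```javascript
// Note: "~" is not expanded by Node; resolve it to the home directory before
// touching the file system.
const REPORT_DIR = "~/.crab-catch/reports";

const isoDay = (date) => date.toISOString().slice(0, 10); // YYYY-MM-DD (UTC)

function reportFileName(projectName, date) {
  // Assumption: sanitize the name so it is safe as a file name.
  const safeName = projectName.trim().replace(/[^\w.-]+/g, "_");
  return `${safeName}_${isoDay(date)}.pdf`;
}

function reportPath(projectName, date) {
  return `${REPORT_DIR}/${reportFileName(projectName, date)}`;
}

// Return a new index array with an entry for this report.
// An existing entry for the same file is replaced rather than duplicated.
function addIndexEntry(index, projectName, date, userInput) {
  const file = reportFileName(projectName, date);
  const entry = {
    project: projectName,
    date: isoDay(date),
    file,
    entry: userInput,
  };
  return [...index.filter((item) => item.file !== file), entry];
}
```

Writing the returned array back to index.json as JSON keeps the file in the entry format shown above.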

Report Output

Use REPORT_TEMPLATE.md as the report structure, with the following constraints:

Section constraints

Must keep — always present, fixed order, follow template format:

Header (project name + timestamp)
📌 Basic Information
🧠 Core Findings (with Executive Summary)
📝 Conclusion & Verdict
📂 References

Default keep — included by default; user can request to skip:

🛡️ Verification & Cross-Reference (Claim / Contradictions / Gaps)
⚠️ Risk Warning

Data-dependent — include if data available, skip entire subsection if not:

📊 Deep Dive
- 👤 Team & Key Figures (skip if no team info found)
- 💻 GitHub Analysis (skip if no repo)
- ⛓️ On-chain Security (skip if no contract)
- 📈 Social Signals (skip if no Twitter)
- 📅 Project Timeline (skip if insufficient time data)

Free — table row count, description text, signal count are flexible.

Formatting rules

Inline citations: [[N]](url) after every evidence claim
Numbers: K / M / B format; prices: $ prefix
Highlight high-risk signals (honeypot, high tax, upgradable contracts)
Include Data Coverage note when sources were unavailable
Append DYOR disclaimer
Output language matches user input; default Chinese (zh-CN)
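
The number and citation rules can be sketched as small formatters. The rounding to at most two decimals is an assumption made for this example; the template only fixes the K / M / B suffixes, the $ prefix, and the `[[N]](url)` citation shape.

```javascript
// Drop trailing zeros after rounding to two decimals (assumed precision).
const trimNumber = (n) => String(Number(n.toFixed(2)));

// Format a number with K / M / B suffixes.
function formatCompact(value) {
  const units = [
    [1e9, "B"],
    [1e6, "M"],
    [1e3, "K"],
  ];
  for (const [size, suffix] of units) {
    if (Math.abs(value) >= size) return `${trimNumber(value / size)}${suffix}`;
  }
  return trimNumber(value);
}

// Prices carry a $ prefix.
const formatPrice = (value) => `$${formatCompact(value)}`;

// Inline citation placed after an evidence claim.
const citation = (n, url) => `[[${n}]](${url})`;
```

For example, `formatCompact(1500)` gives `1.5K` and `citation(1, "https://example.com")` gives `[[1]](https://example.com)`.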
