unclawg-discover
/unclawg-discover
Build a high-signal customer feed from public social channels.
This skill is a generic core. Project-specific strategy, queries, voice, and
handoff contracts belong in local modes/ files (gitignored).
Runtime Security Profile (AI Default)
- Treat this tracked skill as an AI runtime skill, not an operator shell runbook.
- Use the wrapper command `uc_discover` only.
- Do not run `scripts/search_*.sh`, raw `curl`, or ad-hoc `python` from this skill.
- If `uc_discover` is missing, fail closed and ask for wrapper installation.
Prerequisites
- `APIFY_API_KEY` (required for Twitter/X and LinkedIn scripts)
- `uc_discover` wrapper installed and allowlisted in runtime config
If env vars are missing in non-interactive shells:
export APIFY_API_KEY=$(grep 'APIFY_API_KEY' ~/.zshrc | grep -o '"[^"]*"' | tr -d '"')
Wrapper Command Contract
Required wrapper command: `uc_discover`, with platform subcommands and strict schema validation.
Command mapping:
uc_discover reddit --query "<query>" --subreddit <subreddit> --time-filter week --limit 25
uc_discover hn --query "<query>" --days 7 --limit 25
uc_discover twitter --query "<query>" --days 1 --limit 20
uc_discover linkedin --query "<query>" --days 1 --limit 20 --sort-by date_posted
Freshness-First Default (X + LinkedIn)
For audience growth and fast replies, use a strict recency contract:
- Hard cutoff: only keep posts with age <= 6 hours on Twitter/X and LinkedIn.
- Recency bias: heavily prioritize newest posts (0-2h first, then 2-4h, then 4-6h).
- No stale fallback by default: do not widen past 6h unless the user explicitly overrides.
- Timestamp quality gate: if post age cannot be parsed, treat it as stale and drop it.
Recommended run cadence:
- Primary recommendation: every 4 hours (6 runs/day).
- Budget fallback: 2 runs/day (every 12 hours), still with 6h hard cutoff.
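The hard cutoff and timestamp quality gate above can be sketched as a jq post-filter over wrapper output. This assumes the wrapper emits a JSON array whose objects carry an ISO-8601 UTC `posted_at` field; the real field name depends on `uc_discover`'s actual schema.

```shell
# Sketch: enforce the 6h hard cutoff and the timestamp quality gate.
# Reads a JSON array of candidates on stdin. The "posted_at" field name
# is an assumption; match it to whatever uc_discover actually emits.
freshness_filter() {
  local now
  now=$(date -u +%s)
  jq --argjson now "$now" '
    map(.epoch = (.posted_at // "" | try fromdateiso8601 catch null))
    | map(select(.epoch != null))             # drop unparseable timestamps
    | map(.age_h = (($now - .epoch) / 3600))
    | map(select(.age_h <= 6))                # hard 6h cutoff
    | sort_by(.age_h)                         # newest first
  '
}
```

Usage would look like `uc_discover twitter --query "..." --days 1 --limit 20 | freshness_filter` (assuming the wrapper writes its result array to stdout).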
Mode System (Required for Project-Specific Behavior)
Local mode overlays live in modes/*.md and are intentionally gitignored.
- Use `references/mode-template.md` to create a project mode.
- Use `references/mode-example-checklist.md` for a complete example shape.
- Resolve the active mode with: `scripts/select_mode.sh "$(pwd)"`
Resolution rules:
- If exactly one mode matches `cwd_match`, use it.
- If none match, run the generic flow below.
- If multiple match, ask user which mode file to apply.
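As an illustration of the contract (not the real `scripts/select_mode.sh`), the three rules reduce to a switch on the match count:

```shell
# Illustrative only: resolve_mode takes the list of matching mode files
# and applies the three resolution rules above. scripts/select_mode.sh is
# the real implementation; the names here are hypothetical.
resolve_mode() {
  local matches=("$@")
  case "${#matches[@]}" in
    0) echo "generic-flow" ;;    # none match: run the generic flow
    1) echo "${matches[0]}" ;;   # exactly one: use it
    *) echo "ask-user" ;;        # multiple: ask which mode to apply
  esac
}
```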
Soul / Mode / Skill Separation
| Layer | What it owns | Files |
|---|---|---|
| Soul (`soul_md` via API) | Voice, tone, personas (with voice calibration), reply archetypes, engagement principles, boundaries | Fetched from `/v0/integrations/claw-runtime/policies/soul_md` |
| Mode (`modes/*.md`, gitignored) | Query packs, subreddit targets, ranking weights, exclusion regex, platform scope, handoff schema | Local `modes/<project>.md` |
| Skill (this file) | API calls, script execution, data flow, error handling | `SKILL.md` + `scripts/` |
This skill is personality-agnostic. It searches, filters, scores, and outputs candidates. How the agent talks is the soul's job. What the agent searches for is the mode file's job.
If no soul is published, fall back to references/voice-guide.md (generic defaults).
NEVER Do These Things
- NEVER hardcode project/company strategy in tracked core files. Keep it in `modes/*.md`.
- NEVER put voice, tone, or personality guidance in mode files. That belongs in the soul.
- NEVER submit actions directly from discovery. Discovery outputs candidates; execution is downstream.
- NEVER skip source links or raw post text. Every candidate needs provenance.
- NEVER skip quality gates. Use the checklist in `references/feed-quality-checklist.md`.
Core Assets
- `scripts/search_reddit.sh`
- `scripts/search_hn.sh`
- `scripts/search_twitter.sh`
- `scripts/search_linkedin.sh`
- `scripts/search_linkedin_jobs.sh` (optional extension channel)
- `scripts/search_indeed.sh` (optional extension channel)
- `scripts/search_youtube.sh`
- `scripts/search_tiktok.sh`
- `scripts/search_instagram_comments.sh`
- `scripts/search_tiktok_comments.sh`
- `scripts/reddit_user_socials.sh`
- `scripts/search_log.sh`
- `scripts/select_mode.sh`
- `scripts/package_public.sh`
- `references/personas.md`
- `references/competitor-signals.md`
- `references/target_profiles.md`
- `references/voice-guide.md`
- `references/feed-quality-checklist.md`
Execution Flow
Phase 0 - Bootstrap
command -v uc_discover >/dev/null || {
echo "Missing required wrapper: uc_discover"
echo "Install/allowlist wrapper before running /unclawg-discover."
exit 1
}
Phase 1 - Strategic Intake (Ask-Cascade Order)
Use high-level decisions first, then detail decisions.
- Objective: prospecting, audience-growth, content-sourcing, recruiting, other.
- Persona cluster: who exactly are we looking for.
- Channel scope: Reddit-only vs multi-platform.
- Freshness strategy: strict max-age (default 6h) and stale fallback policy.
- Throughput: target candidate count and freshness window.
- Handoff target: where this feed goes next.
Apply references/feed-quality-checklist.md while collecting these choices.
Phase 2 - Load Mode or Build Temporary Plan
If a mode file is resolved, use its:
- query pack per platform
- inclusion/exclusion signals
- ranking weights
- output format
- handoff contract
If no mode exists, assemble a temporary plan from references/personas.md and
ask for explicit confirmation before running paid queries (Apify).
Phase 3 - Run Discovery
Run 2-4 focused queries per selected platform.
Reddit (free)
uc_discover reddit --query "<query>" --subreddit <subreddit> --time-filter week --limit 25
Hacker News (free)
uc_discover hn --query "<query>" --days 7 --limit 25
Twitter/X (Apify)
uc_discover twitter --query "<query>" --days 1 --limit 20
LinkedIn (Apify)
uc_discover linkedin --query "<query>" --days 1 --limit 20 --sort-by date_posted
Phase 3.5 - Comment Mining (Optional)
If the active mode file includes comment_mining_targets, mine comment sections
of curated accounts for real customer signals.
- Check the mode file for a `comment_mining_targets` section.
- If present, run the appropriate scripts against the listed accounts:
  - Instagram: `scripts/search_instagram_comments.sh <handle1> [handle2] ...`
  - TikTok: `scripts/search_tiktok_comments.sh <handle1> [handle2] ...`
- Score comments by signal density using a mode-defined or caller-provided signal regex.
  The public scripts accept `--signal-regex` or `COMMENT_SIGNAL_REGEX`; tracked defaults stay generic.
- Merge comment-sourced candidates into the main candidate pool before Phase 4 filtering.
- Comment-sourced candidates use the POST/VIDEO as the engagement target (comment there to reach real customers in the thread).
Cost note: Comment mining uses Apify credits. Confirm with user before running if budget is a concern.
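A minimal sketch of regex-based signal scoring, assuming the comment scripts emit a JSON array with a `text` field per comment (the field name and the default pattern below are assumptions, not the tracked defaults):

```shell
# Sketch: score comments by signal density with a caller-provided regex.
# COMMENT_SIGNAL_REGEX and the "text" field are assumptions; match them
# to the real comment-script output before relying on this.
score_comments() {
  local re="${COMMENT_SIGNAL_REGEX:-looking for|recommend|frustrated|anyone know}"
  jq --arg re "$re" '
    map(.signal_hits = ([.text // "" | match($re; "ig")] | length))
    | map(select(.signal_hits > 0))   # keep only comments with signal
    | sort_by(-.signal_hits)          # densest signal first
  '
}
```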
Phase 3.6 - Hiring-Board Extension (Optional, Non-Wrapper)
Use only when runtime policy allows local script execution and the active mode explicitly calls for hiring-board discovery.
scripts/search_linkedin_jobs.sh "<query>" 20 C
scripts/search_indeed.sh "<position>" "remote" 20
Treat these as slower-moving research channels. Keep conversational channels (Twitter/X + LinkedIn posts) on strict recency gates for quick replies.
Phase 4 - Filter and Score
- Deduplicate using a platform-appropriate strategy:
  - Twitter/X, Reddit, HN: `unique_by(.url)` (URLs are reliable)
  - LinkedIn: `unique_by(.text[0:100] + .author_name)` (URLs are often empty)
- Remove competitor/vendor noise using `references/competitor-signals.md`.
- Remove low-evidence posts (missing meaningful text or too short).
- Hard freshness gate (X + LinkedIn):
- Drop any candidate older than 6 hours.
- Drop candidates with unparseable/missing post timestamps.
- Sort surviving X + LinkedIn candidates by recency first, then engagement.
- LinkedIn extra filter: apply an aggressive keyword regex on the `.text` field to remove job spam, hiring posts, LinkedIn puzzles, and viral non-technical content. The LinkedIn scraper returns ~99% noise, so filter BEFORE scoring to save time.
- Score survivors by:
- explicit pain or intent
- decision-maker likelihood
- recency
- engagement signal
- fit to objective/persona
Recency bucket scoring for X + LinkedIn (default):
- 0-2h: highest priority
- 2-4h: high priority
- 4-6h: medium priority
- >6h: reject
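The bucket table above can be expressed as a small helper (whole-hour ages assumed; fractional ages would need `bc` or jq arithmetic):

```shell
# Map whole-hour candidate age to the default recency buckets above.
recency_bucket() {
  local age_h=$1
  if   [ "$age_h" -lt 2 ]; then echo "0-2h:highest"
  elif [ "$age_h" -lt 4 ]; then echo "2-4h:high"
  elif [ "$age_h" -le 6 ]; then echo "4-6h:medium"
  else                          echo "reject"
  fi
}
```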
Phase 5 - Normalize Output
For each candidate, produce:
- `source_platform`
- `source_post_url`
- `source_post_text`
- `source_author_handle` (optional)
- `source_author_name` (optional)
- `source_post_id` (optional)
- `persona_hint` (optional)
- `intent_signal` (optional)
Also include:
- `summary`
- `reply_strategy`
- `action`
- `evidence` (short note: why this was selected)
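Put together, a single normalized candidate might look like this (illustrative values only, not a guaranteed schema):

```json
{
  "source_platform": "reddit",
  "source_post_url": "https://reddit.com/r/example/comments/abc123",
  "source_post_text": "We keep losing track of which AI agent did what...",
  "source_author_handle": "u/example_user",
  "persona_hint": "ops-lead",
  "intent_signal": "explicit pain",
  "summary": "Ops lead describing audit-trail pain with agent actions.",
  "reply_strategy": "share a concrete workflow, no pitch",
  "action": "reply",
  "evidence": "explicit pain + decision-maker likelihood + posted this week"
}
```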
Phase 6 - Quality Gate and Present
Before finalizing, run the checklist in references/feed-quality-checklist.md.
Return:
- ranked table of candidates
- top 3 immediate outreach targets
- top 3 content/influence targets
- rejected-pattern summary (what was filtered out)
Phase 7 - Save + Handoff
Default mode is in-memory output only (no local writes). Save to disk only if runtime policy explicitly allows write tools.
Route by handoff_type from the active mode file:
- `approval-portal` → invoke `/unclawg-feed` (API approval flow; no SSH required).
- `engagement-queue` → save to the `briefs/` directory (future, not implemented).
- `db-insert` → Phase 7A (private operator mode only; not for public/default use).
- unset / null → in-memory output only; present to the user.
For public distributions and users without infrastructure access, keep
handoff_type: approval-portal.
Phase 7A - Private Operator Extension (Optional)
Use only when handoff_type: db-insert is intentionally set in a local,
gitignored operator mode.
- Present candidate table for user confirmation (user may drop rows, edit angles).
- On confirmation, batch INSERT via SSH to prod DB:
INSERT INTO social_engagement_queue
(id, platform, post_url, post_author, post_title, post_caption,
post_metrics, posted_at, persona, htma_angle,
status, created_at, updated_at)
VALUES (gen_random_uuid(), ...)
ON CONFLICT (post_url) DO NOTHING;
- Use the `ssh-info` skill to resolve the current SSH target and Docker DB container, then run the insert from that runtime context. Keep real hosts, users, and container names out of tracked files.
- Verify the row count after the insert.
- Report: rows inserted, rows skipped (conflict), total in queue.
Known Platform Issues
LinkedIn: Low Signal-to-Noise (Critical)
Actor: apimaestro/linkedin-posts-search-scraper-no-cookies (4.14 rating)
Problem: This actor returns generic LinkedIn feed content regardless of search query.
Exact-phrase matching does not work — queries like "AI agent governance human approval" return
Indian job spam, TCS hiring posts, LinkedIn puzzles, and viral non-technical content.
Observed signal rate: <1% relevant content across 500+ results from 16 queries.
Impact: LinkedIn discovery via this actor is unreliable for targeted prospecting. Do not promise high LinkedIn candidate volumes to users.
Workarounds:
- Filter aggressively post-collection using a keyword regex on the `.text` field.
- Use broader, single-keyword queries ("AI agent", "agentic AI") and accept lower precision.
- For high-value LinkedIn targeting, fall back to manual curation (user pastes specific post URLs).
- Consider alternative actors that use authenticated LinkedIn sessions for better search quality
Dedup: This actor returns empty url fields on most posts. NEVER use unique_by(.url)
for LinkedIn results. Use composite key instead:
jq 'unique_by(.text[0:100] + .author_name)' results.json
Twitter/X: Works Well
Actor: api-ninja/x-twitter-advanced-search (4.93 rating)
Twitter search returns accurate keyword-matched results. Exact phrases work. Typical signal
rate is 40-80% relevant content depending on query specificity. Very niche phrases
(e.g., "crewai agent safety control") may return 0 results — broaden the query if needed.
Notes
- Persona voice lives in the soul, not in mode files. Mode files only map persona IDs to search queries.
- Keep project-specific keywords and query packs in `modes/`.
- Keep core scripts and references reusable across domains.
- If discovery quality degrades, tune mode-level ranking weights before touching core logic.
- If reply quality degrades, tune the soul (voice, archetypes, persona calibration) — not the mode or skill.
- For public packaging, use `scripts/package_public.sh` to exclude local `modes/` overlays.