unclawg-discover
/unclawg-discover
Build a high-signal customer feed from public social channels.
This skill is a generic core. Project-specific strategy, queries, voice, and
handoff contracts belong in local modes/ files (gitignored).
Runtime Security Profile (AI Default)
- Treat this tracked skill as an AI runtime skill, not an operator shell runbook.
- Use the wrapper command `uc_discover` only.
- Do not run `scripts/search_*.sh`, raw `curl`, or ad-hoc `python` from this skill.
- If `uc_discover` is missing, fail closed and ask for wrapper installation.
Prerequisites
- `APIFY_API_KEY` (required for Twitter/X and LinkedIn scripts)
- `uc_discover` wrapper installed and allowlisted in runtime config
If env vars are missing in non-interactive shells:
export APIFY_API_KEY=$(grep 'APIFY_API_KEY' ~/.zshrc | grep -o '"[^"]*"' | tr -d '"')
Wrapper Command Contract
Required wrapper command: `uc_discover`, with platform subcommands and strict schema validation.
Command mapping:
uc_discover reddit --query "<query>" --subreddit <subreddit> --time-filter week --limit 25
uc_discover hn --query "<query>" --days 7 --limit 25
uc_discover twitter --query "<query>" --days 1 --limit 20
uc_discover linkedin --query "<query>" --days 1 --limit 20 --sort-by date_posted
Freshness-First Default (X + LinkedIn)
For audience growth and fast replies, use a strict recency contract:
- Hard cutoff: only keep posts with age <= 6 hours on Twitter/X and LinkedIn.
- Recency bias: heavily prioritize newest posts (0-2h first, then 2-4h, then 4-6h).
- No stale fallback by default: do not widen past 6h unless the user explicitly overrides.
- Timestamp quality gate: if post age cannot be parsed, treat it as stale and drop it.
Recommended run cadence:
- Primary recommendation: every 4 hours (6 runs/day).
- Budget fallback: 2 runs/day (every 12 hours), still with 6h hard cutoff.
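The hard cutoff and timestamp quality gate above can be sketched as a jq post-filter over wrapper output. This assumes the wrapper emits a JSON array whose objects carry an ISO-8601 UTC `posted_at` field; the real field name depends on `uc_discover`'s actual schema.

```shell
# Sketch: enforce the 6h hard cutoff and the timestamp quality gate.
# Reads a JSON array of candidates on stdin. The "posted_at" field name
# is an assumption; match it to whatever uc_discover actually emits.
freshness_filter() {
  local now
  now=$(date -u +%s)
  jq --argjson now "$now" '
    map(.epoch = (.posted_at // "" | try fromdateiso8601 catch null))
    | map(select(.epoch != null))             # drop unparseable timestamps
    | map(.age_h = (($now - .epoch) / 3600))
    | map(select(.age_h <= 6))                # hard 6h cutoff
    | sort_by(.age_h)                         # newest first
  '
}
```

Usage would look like `uc_discover twitter --query "..." --days 1 --limit 20 | freshness_filter` (assuming the wrapper writes its result array to stdout).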
Mode System (Required for Project-Specific Behavior)
Local mode overlays live in modes/*.md and are intentionally gitignored.
- Use `references/mode-template.md` to create a project mode.
- Use `references/mode-example-checklist.md` for a complete example shape.
- Resolve the active mode with: `scripts/select_mode.sh "$(pwd)"`
Resolution rules:
- If exactly one mode matches `cwd_match`, use it.
- If none match, run the generic flow below.
- If multiple match, ask user which mode file to apply.
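As an illustration of the contract (not the real `scripts/select_mode.sh`), the three rules reduce to a switch on the match count:

```shell
# Illustrative only: resolve_mode takes the list of matching mode files
# and applies the three resolution rules above. scripts/select_mode.sh is
# the real implementation; the names here are hypothetical.
resolve_mode() {
  local matches=("$@")
  case "${#matches[@]}" in
    0) echo "generic-flow" ;;    # none match: run the generic flow
    1) echo "${matches[0]}" ;;   # exactly one: use it
    *) echo "ask-user" ;;        # multiple: ask which mode to apply
  esac
}
```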
Soul / Mode / Skill Separation
| Layer | What it owns | Files |
|---|---|---|
| Soul (`soul_md` via API) | Voice, tone, personas (with voice calibration), reply archetypes, engagement principles, boundaries | Fetched from `/v0/integrations/claw-runtime/policies/soul_md` |
| Mode (`modes/*.md`, gitignored) | Query packs, subreddit targets, ranking weights, exclusion regex, platform scope, handoff schema | Local `modes/<project>.md` |
| Skill (this file) | API calls, script execution, data flow, error handling | `SKILL.md` + `scripts/` |
This skill is personality-agnostic. It searches, filters, scores, and outputs candidates. How the agent talks is the soul's job. What the agent searches for is the mode file's job.
If no soul is published, fall back to references/voice-guide.md (generic defaults).
NEVER Do These Things
- NEVER hardcode project/company strategy in tracked core files. Keep it in `modes/*.md`.
- NEVER put voice, tone, or personality guidance in mode files. That belongs in the soul.
- NEVER submit actions directly from discovery. Discovery outputs candidates; execution is downstream.
- NEVER skip source links or raw post text. Every candidate needs provenance.
- NEVER skip quality gates. Use the checklist in `references/feed-quality-checklist.md`.
Core Assets
- `scripts/search_reddit.sh`
- `scripts/search_hn.sh`
- `scripts/search_twitter.sh`
- `scripts/search_linkedin.sh`
- `scripts/search_linkedin_jobs.sh` (optional extension channel)
- `scripts/search_indeed.sh` (optional extension channel)
- `scripts/search_youtube.sh`
- `scripts/search_tiktok.sh`
- `scripts/search_instagram_comments.sh`
- `scripts/search_tiktok_comments.sh`
- `scripts/reddit_user_socials.sh`
- `scripts/search_log.sh`
- `scripts/select_mode.sh`
- `scripts/package_public.sh`
- `references/personas.md`
- `references/competitor-signals.md`
- `references/target_profiles.md`
- `references/voice-guide.md`
- `references/feed-quality-checklist.md`
Execution Flow
Phase 0 - Bootstrap
command -v uc_discover >/dev/null || {
echo "Missing required wrapper: uc_discover"
echo "Install/allowlist wrapper before running /unclawg-discover."
exit 1
}
Phase 1 - Strategic Intake (Ask-Cascade Order)
Use high-level decisions first, then detail decisions.
- Objective: prospecting, audience-growth, content-sourcing, recruiting, other.
- Persona cluster: who exactly are we looking for.
- Channel scope: Reddit-only vs multi-platform.
- Freshness strategy: strict max-age (default 6h) and stale fallback policy.
- Throughput: target candidate count and freshness window.
- Handoff target: where this feed goes next.
Apply references/feed-quality-checklist.md while collecting these choices.
Phase 2 - Load Mode or Build Temporary Plan
If a mode file is resolved, use its:
- query pack per platform
- inclusion/exclusion signals
- ranking weights
- output format
- handoff contract
If no mode exists, assemble a temporary plan from references/personas.md and
ask for explicit confirmation before running paid queries (Apify).
Phase 3 - Run Discovery
Run 2-4 focused queries per selected platform.
Reddit (free)
uc_discover reddit --query "<query>" --subreddit <subreddit> --time-filter week --limit 25
Hacker News (free)
uc_discover hn --query "<query>" --days 7 --limit 25
Twitter/X (Apify)
uc_discover twitter --query "<query>" --days 1 --limit 20
LinkedIn (Apify)
uc_discover linkedin --query "<query>" --days 1 --limit 20 --sort-by date_posted
Phase 3.5 - Comment Mining (Optional)
If the active mode file includes comment_mining_targets, mine comment sections
of curated accounts for real customer signals.
- Check the mode file for a `comment_mining_targets` section.
- If present, run the appropriate scripts against the listed accounts:
  - Instagram: `scripts/search_instagram_comments.sh <handle1> [handle2] ...`
  - TikTok: `scripts/search_tiktok_comments.sh <handle1> [handle2] ...`
- Score comments by signal density using a mode-defined or caller-provided signal regex.
  The public scripts accept `--signal-regex` or `COMMENT_SIGNAL_REGEX`; tracked defaults stay generic.
- Merge comment-sourced candidates into the main candidate pool before Phase 4 filtering.
- Comment-sourced candidates use the POST/VIDEO as the engagement target (comment there to reach real customers in the thread).
Cost note: Comment mining uses Apify credits. Confirm with user before running if budget is a concern.
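A minimal sketch of regex-based signal scoring, assuming the comment scripts emit a JSON array with a `text` field per comment (the field name and the default pattern below are assumptions, not the tracked defaults):

```shell
# Sketch: score comments by signal density with a caller-provided regex.
# COMMENT_SIGNAL_REGEX and the "text" field are assumptions; match them
# to the real comment-script output before relying on this.
score_comments() {
  local re="${COMMENT_SIGNAL_REGEX:-looking for|recommend|frustrated|anyone know}"
  jq --arg re "$re" '
    map(.signal_hits = ([.text // "" | match($re; "ig")] | length))
    | map(select(.signal_hits > 0))   # keep only comments with signal
    | sort_by(-.signal_hits)          # densest signal first
  '
}
```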
Phase 3.6 - Hiring-Board Extension (Optional, Non-Wrapper)
Use only when runtime policy allows local script execution and the active mode explicitly calls for hiring-board discovery.
scripts/search_linkedin_jobs.sh "<query>" 20 C
scripts/search_indeed.sh "<position>" "remote" 20
Treat these as slower-moving research channels. Keep conversational channels (Twitter/X + LinkedIn posts) on strict recency gates for quick replies.
Phase 4 - Filter and Score
- Deduplicate using a platform-appropriate strategy:
  - Twitter/X, Reddit, HN: `unique_by(.url)` (URLs are reliable)
  - LinkedIn: `unique_by(.text[0:100] + .author_name)` (URLs are often empty)
- Remove competitor/vendor noise using `references/competitor-signals.md`.
- Remove low-evidence posts (missing meaningful text or too short).
- Hard freshness gate (X + LinkedIn):
- Drop any candidate older than 6 hours.
- Drop candidates with unparseable/missing post timestamps.
- Sort surviving X + LinkedIn candidates by recency first, then engagement.
- LinkedIn extra filter: apply an aggressive keyword regex on the `.text` field to remove job spam, hiring posts, LinkedIn puzzles, and viral non-technical content. The LinkedIn scraper returns ~99% noise, so filter BEFORE scoring to save time.
- Score survivors by:
- explicit pain or intent
- decision-maker likelihood
- recency
- engagement signal
- fit to objective/persona
Recency bucket scoring for X + LinkedIn (default):
- 0-2h: highest priority
- 2-4h: high priority
- 4-6h: medium priority
- >6h: reject
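The bucket table above can be expressed as a small helper (whole-hour ages assumed; fractional ages would need `bc` or jq arithmetic):

```shell
# Map whole-hour candidate age to the default recency buckets above.
recency_bucket() {
  local age_h=$1
  if   [ "$age_h" -lt 2 ]; then echo "0-2h:highest"
  elif [ "$age_h" -lt 4 ]; then echo "2-4h:high"
  elif [ "$age_h" -le 6 ]; then echo "4-6h:medium"
  else                          echo "reject"
  fi
}
```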
Phase 5 - Normalize Output
For each candidate, produce:
- `source_platform`
- `source_post_url`
- `source_post_text`
- `source_author_handle` (optional)
- `source_author_name` (optional)
- `source_post_id` (optional)
- `persona_hint` (optional)
- `intent_signal` (optional)
Also include:
- `summary`
- `reply_strategy`
- `action`
- `evidence` (short note: why this was selected)
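Put together, a single normalized candidate might look like this (illustrative values only, not a guaranteed schema):

```json
{
  "source_platform": "reddit",
  "source_post_url": "https://reddit.com/r/example/comments/abc123",
  "source_post_text": "We keep losing track of which AI agent did what...",
  "source_author_handle": "u/example_user",
  "persona_hint": "ops-lead",
  "intent_signal": "explicit pain",
  "summary": "Ops lead describing audit-trail pain with agent actions.",
  "reply_strategy": "share a concrete workflow, no pitch",
  "action": "reply",
  "evidence": "explicit pain + decision-maker likelihood + posted this week"
}
```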
Phase 6 - Quality Gate and Present
Before finalizing, run the checklist in references/feed-quality-checklist.md.
Return:
- ranked table of candidates
- top 3 immediate outreach targets
- top 3 content/influence targets
- rejected-pattern summary (what was filtered out)
Phase 7 - Save + Handoff
Default mode is in-memory output only (no local writes). Save to disk only if runtime policy explicitly allows write tools.
Route by handoff_type from the active mode file:
- `approval-portal` → invoke `/unclawg-feed` (API approval flow; no SSH required).
- `engagement-queue` → save to the `briefs/` directory (future, not implemented).
- `db-insert` → Phase 7A (private operator mode only; not for public/default use).
- unset / null → in-memory output only; present to the user.
For public distributions and users without infrastructure access, keep
handoff_type: approval-portal.
Phase 7A - Private Operator Extension (Optional)
Use only when handoff_type: db-insert is intentionally set in a local,
gitignored operator mode.
- Present candidate table for user confirmation (user may drop rows, edit angles).
- On confirmation, batch INSERT via SSH to prod DB:
INSERT INTO social_engagement_queue
(id, platform, post_url, post_author, post_title, post_caption,
post_metrics, posted_at, persona, htma_angle,
status, created_at, updated_at)
VALUES (gen_random_uuid(), ...)
ON CONFLICT (post_url) DO NOTHING;
- Use the `ssh-info` skill to resolve the current SSH target and Docker DB container, then run the insert from that runtime context. Keep real hosts, users, and container names out of tracked files.
- Verify the row count after the insert.
- Report: rows inserted, rows skipped (conflict), total in queue.
Known Platform Issues
LinkedIn: Low Signal-to-Noise (Critical)
Actor: apimaestro/linkedin-posts-search-scraper-no-cookies (4.14 rating)
Problem: This actor returns generic LinkedIn feed content regardless of search query.
Exact-phrase matching does not work — queries like "AI agent governance human approval" return
Indian job spam, TCS hiring posts, LinkedIn puzzles, and viral non-technical content.
Observed signal rate: <1% relevant content across 500+ results from 16 queries.
Impact: LinkedIn discovery via this actor is unreliable for targeted prospecting. Do not promise high LinkedIn candidate volumes to users.
Workarounds:
- Filter aggressively post-collection using a keyword regex on the `.text` field.
- Use broader, single-keyword queries ("AI agent", "agentic AI") and accept lower precision.
- For high-value LinkedIn targeting, fall back to manual curation (user pastes specific post URLs).
- Consider alternative actors that use authenticated LinkedIn sessions for better search quality
Dedup: This actor returns empty url fields on most posts. NEVER use unique_by(.url)
for LinkedIn results. Use composite key instead:
jq 'unique_by(.text[0:100] + .author_name)' results.json
Twitter/X: Works Well
Actor: api-ninja/x-twitter-advanced-search (4.93 rating)
Twitter search returns accurate keyword-matched results. Exact phrases work. Typical signal
rate is 40-80% relevant content depending on query specificity. Very niche phrases
(e.g., "crewai agent safety control") may return 0 results — broaden the query if needed.
Notes
- Persona voice lives in the soul, not in mode files. Mode files only map persona IDs to search queries.
- Keep project-specific keywords and query packs in `modes/`.
- Keep core scripts and references reusable across domains.
- If discovery quality degrades, tune mode-level ranking weights before touching core logic.
- If reply quality degrades, tune the soul (voice, archetypes, persona calibration) — not the mode or skill.
- For public packaging, use `scripts/package_public.sh` to exclude local `modes/` overlays.