Apify
Customization
Before executing, check for user customizations at:
~/.claude/PAI/USER/SKILLCUSTOMIZATIONS/Apify/
If this directory exists, load and apply any PREFERENCES.md, configurations, or resources found there. These override default behavior. If the directory does not exist, proceed with skill defaults.
🚨 MANDATORY: Voice Notification (REQUIRED BEFORE ANY ACTION)
You MUST send this notification BEFORE doing anything else when this skill is invoked.
-
Send voice notification:
curl -s -X POST http://localhost:31337/notify \ -H "Content-Type: application/json" \ -d '{"message": "Running the WORKFLOWNAME workflow in the Apify skill to ACTION"}' \ > /dev/null 2>&1 & -
Output text notification:
Running the **WorkflowName** workflow in the **Apify** skill to ACTION...
This is not optional. Execute this curl command immediately upon skill invocation.
Apify - Social Media & Web Scraping
Direct TypeScript access to 9 popular Apify actors with 99% token savings.
🔌 File-Based MCP
This skill is a file-based MCP - a code-first API wrapper that replaces token-heavy MCP protocol calls.
Why file-based? Filter data in code BEFORE returning to model context = 97.5% token savings.
🎯 Overview
Direct TypeScript access to the 9 most popular Apify actors without MCP overhead. Filter and transform data in code BEFORE it reaches the model context.
📊 Available Actors
Social Media (5 platforms)
- Instagram (145k users, 4.60★) - Profiles, posts, hashtags, comments
- LinkedIn (26k users, 4.10★) - Profiles, jobs, posts
- TikTok (90k users, 4.61★) - Profiles, videos, hashtags, comments
- YouTube (40k users, 4.40★) - Channels, videos, comments, search
- Facebook (35k users, 4.56★) - Posts, groups, comments
Business & Lead Generation
- Google Maps (198k users, 4.76★) - HIGHEST VALUE!
- Search businesses, extract contacts, reviews, images
- Perfect for lead generation
E-commerce
- Amazon (8k users, 4.97★) - Products, reviews, pricing
Web Scraping
- Web Scraper (94k users, 4.39★) - General-purpose, works with ANY website
🚀 Quick Start
Basic Usage Pattern
import { scrapeInstagramProfile, searchGoogleMaps } from 'actors'
// 1. Call the actor wrapper
const profile = await scrapeInstagramProfile({
username: 'target_username',
maxPosts: 50
})
// 2. Filter in code - BEFORE data reaches model!
const viral = profile.latestPosts?.filter(p => p.likesCount > 10000)
// 3. Only filtered results reach model context
console.log(viral) // ~10 posts instead of 50
📚 Examples by Use Case
Social Media Monitoring
Instagram - Track engagement:
import { scrapeInstagramProfile, scrapeInstagramPosts } from 'actors'
// Get profile with recent posts
const profile = await scrapeInstagramProfile({
username: 'competitor',
maxPosts: 100
})
// Filter in code - only high-performing posts from last 30 days
const thirtyDaysAgo = Date.now() - (30 * 24 * 60 * 60 * 1000)
const topRecent = profile.latestPosts
?.filter(p =>
new Date(p.timestamp).getTime() > thirtyDaysAgo &&
p.likesCount > 5000
)
.sort((a, b) => b.likesCount - a.likesCount)
.slice(0, 10)
// Only 10 posts reach model instead of 100!
LinkedIn - Job search:
import { searchLinkedInJobs } from 'actors'
const jobs = await searchLinkedInJobs({
keywords: 'AI engineer',
location: 'San Francisco',
remote: true,
maxResults: 200
})
// Filter in code - only senior roles at well-funded startups
const topJobs = jobs.filter(j =>
j.seniority?.includes('Senior') &&
parseInt(j.applicants || '0') > 50
)
TikTok - Trend analysis:
import { scrapeTikTokHashtag } from 'actors'
const videos = await scrapeTikTokHashtag({
hashtag: 'ai',
maxResults: 500
})
// Filter in code - only viral content
const viral = videos
.filter(v => v.playCount > 1000000)
.sort((a, b) => b.playCount - a.playCount)
.slice(0, 20)
Lead Generation (Business Intelligence)
Google Maps - Local business leads:
import { searchGoogleMaps } from 'actors'
// Search with contact info extraction
const places = await searchGoogleMaps({
query: 'restaurants in Austin',
maxResults: 500,
includeReviews: true,
maxReviewsPerPlace: 20,
scrapeContactInfo: true // Extracts emails from websites!
})
// Filter in code - only highly-rated with email/phone
const qualifiedLeads = places
.filter(p =>
p.rating >= 4.5 &&
p.reviewsCount >= 100 &&
(p.email || p.phone)
)
.map(p => ({
name: p.name,
rating: p.rating,
reviews: p.reviewsCount,
email: p.email,
phone: p.phone,
website: p.website,
address: p.address
}))
// Export leads - only qualified results!
console.log(`Found ${qualifiedLeads.length} qualified leads`)
Google Maps - Review sentiment analysis:
import { scrapeGoogleMapsReviews } from 'actors'
const reviews = await scrapeGoogleMapsReviews({
placeUrl: 'https://maps.google.com/maps?cid=12345',
maxResults: 1000
})
// Filter in code - analyze sentiment by rating
const recentNegative = reviews
.filter(r => {
const thirtyDaysAgo = Date.now() - (30 * 24 * 60 * 60 * 1000)
return (
r.rating <= 2 &&
new Date(r.publishedAtDate).getTime() > thirtyDaysAgo &&
r.text.length > 50
)
})
// Identify common complaints
const complaints = recentNegative.map(r => r.text)
E-commerce & Competitive Intelligence
Amazon - Price monitoring:
import { scrapeAmazonProduct } from 'actors'
const product = await scrapeAmazonProduct({
productUrl: 'https://www.amazon.com/dp/B08L5VT894',
includeReviews: true,
maxReviews: 200
})
// Filter in code - only recent negative reviews
const recentNegative = product.reviews
?.filter(r => {
const weekAgo = Date.now() - (7 * 24 * 60 * 60 * 1000)
return (
r.rating <= 2 &&
new Date(r.date).getTime() > weekAgo
)
})
console.log(`Price: $${product.price}`)
console.log(`Rating: ${product.rating}/5`)
console.log(`Recent issues: ${recentNegative?.length} complaints`)
Custom Web Scraping
Any Website - Custom extraction:
import { scrapeWebsite } from 'actors'
const products = await scrapeWebsite({
startUrls: ['https://example.com/products'],
linkSelector: 'a.product-link',
maxPagesPerCrawl: 100,
pageFunction: `
async function pageFunction(context) {
const { request, $, log } = context
return {
url: request.url,
title: $('h1.product-title').text(),
price: $('span.price').text(),
inStock: $('.in-stock').length > 0,
description: $('.description').text()
}
}
`
})
// Filter in code - only available products under $100
const affordable = products.filter(p =>
p.inStock &&
parseFloat(p.price.replace('$', '')) < 100
)
🎨 Advanced Patterns
Pattern 1: Multi-Platform Social Listening
import {
scrapeInstagramHashtag,
scrapeTikTokHashtag,
searchYouTube
} from 'actors'
// Run all platforms in parallel
const [instagramPosts, tiktokVideos, youtubeVideos] = await Promise.all([
scrapeInstagramHashtag({ hashtag: 'ai', maxResults: 100 }),
scrapeTikTokHashtag({ hashtag: 'ai', maxResults: 100 }),
searchYouTube({ query: '#ai', maxResults: 100 })
])
// Combine and filter - only viral content across all platforms
const allViral = [
...instagramPosts.filter(p => p.likesCount > 10000),
...tiktokVideos.filter(v => v.playCount > 100000),
...youtubeVideos.filter(v => v.viewsCount > 50000)
]
console.log(`Found ${allViral.length} viral posts across 3 platforms`)
Pattern 2: Lead Enrichment Pipeline
import { searchGoogleMaps, scrapeLinkedInProfile } from 'actors'
// 1. Find businesses on Google Maps
const restaurants = await searchGoogleMaps({
query: 'restaurants in SF',
maxResults: 100,
scrapeContactInfo: true
})
// 2. Filter for qualified leads
const qualified = restaurants.filter(r =>
r.rating >= 4.5 &&
r.email &&
r.reviewsCount >= 50
)
// 3. Enrich with LinkedIn data (if available)
const enriched = await Promise.all(
qualified.map(async (restaurant) => {
// Try to find LinkedIn company page
// ... additional enrichment logic
return restaurant
})
)
Pattern 3: Competitive Analysis Dashboard
import {
scrapeInstagramProfile,
scrapeYouTubeChannel,
scrapeTikTokProfile
} from 'actors'
async function analyzeCompetitor(username: string) {
// Gather data from all platforms
const [instagram, youtube, tiktok] = await Promise.all([
scrapeInstagramProfile({ username, maxPosts: 30 }),
scrapeYouTubeChannel({ channelUrl: `https://youtube.com/@${username}`, maxVideos: 30 }),
scrapeTikTokProfile({ username, maxVideos: 30 })
])
// Calculate engagement metrics in code
return {
username,
instagram: {
followers: instagram.followersCount,
avgLikes: average(instagram.latestPosts?.map(p => p.likesCount) || []),
engagementRate: calculateEngagement(instagram)
},
youtube: {
subscribers: youtube.subscribersCount,
avgViews: average(youtube.videos?.map(v => v.viewsCount) || [])
},
tiktok: {
followers: tiktok.followersCount,
avgPlays: average(tiktok.videos?.map(v => v.playCount) || [])
}
}
}
💰 Token Savings Calculator
Example: Instagram profile with 100 posts
MCP Approach:
1. search-actors → 1,000 tokens
2. call-actor → 1,000 tokens
3. get-actor-output → 50,000 tokens (100 unfiltered posts)
TOTAL: ~52,000 tokens
File-Based Approach:
const profile = await scrapeInstagramProfile({
username: 'user',
maxPosts: 100
})
// Filter in code - only top 10 posts
const top = profile.latestPosts
?.sort((a, b) => b.likesCount - a.likesCount)
.slice(0, 10)
// TOTAL: ~500 tokens (only 10 filtered posts reach model)
Savings: 99% reduction (52,000 → 500 tokens)
🔧 Actor Reference
Social Media
scrapeInstagramProfile(input)- Profile + postsscrapeInstagramPosts(input)- Posts from userscrapeInstagramHashtag(input)- Posts by hashtagscrapeInstagramComments(input)- Comments on post
scrapeLinkedInProfile(input)- Profile + experience + emailsearchLinkedInJobs(input)- Job listingsscrapeLinkedInPosts(input)- Posts from profile/company
TikTok
scrapeTikTokProfile(input)- Profile + videosscrapeTikTokHashtag(input)- Videos by hashtagscrapeTikTokComments(input)- Comments on video
YouTube
scrapeYouTubeChannel(input)- Channel + videossearchYouTube(input)- Search videosscrapeYouTubeComments(input)- Comments on video
scrapeFacebookPosts(input)- Posts from pagesscrapeFacebookGroups(input)- Group postsscrapeFacebookComments(input)- Post comments
Business & Lead Generation
Google Maps
searchGoogleMaps(input)- Search places (with contact extraction!)scrapeGoogleMapsPlace(input)- Single place detailsscrapeGoogleMapsReviews(input)- Place reviews
E-commerce
Amazon
scrapeAmazonProduct(input)- Product details + reviewsscrapeAmazonReviews(input)- Product reviews only
Web Scraping
General Web
scrapeWebsite(input)- Custom multi-page crawlingscrapePage(url, pageFunction)- Single page extraction
⚙️ Configuration
Environment Variables:
# Required - Get from https://console.apify.com/account/integrations
APIFY_TOKEN=apify_api_xxxxx...
Actor Run Options:
{
memory: 2048, // MB: 128, 256, 512, 1024, 2048, 4096, 8192
timeout: 300, // seconds
build: 'latest' // or specific build number
}
🎯 When to Use This vs MCP
Use File-Based (this skill):
- ✅ Need to filter large datasets (>100 results)
- ✅ Want to transform/aggregate data in code
- ✅ Multiple sequential operations
- ✅ Control flow (loops, conditionals)
- ✅ Maximum token efficiency
Use MCP:
- ❌ Simple single operations with small results (<10 items)
- ❌ One-off exploratory queries
- ❌ Don't want to write code
🔗 Links
- Apify Platform: https://apify.com
- Actor Store: https://apify.com/store
- API Docs: https://docs.apify.com/api/v2
Remember: Filter data in code BEFORE returning to model context. This is where the 99% token savings happen!
Gotchas
- Actor selection matters. Each social platform has specific actors — don't use a generic scraper for Instagram when a dedicated Instagram actor exists.
- Rate limits vary by platform and plan. Check actor documentation for limits before running large scrapes.
- Scraped data format varies by actor. Read the actor's output schema before processing results.
Examples
Example 1: Scrape Instagram profile
User: "get the recent posts from this Instagram account"
→ Selects Instagram Profile actor
→ Runs with target profile URL
→ Returns structured post data (text, engagement, dates)
Example 2: LinkedIn company scrape
User: "scrape this company's LinkedIn page"
→ Selects LinkedIn Company actor
→ Returns company info, employee count, recent posts
Execution Log
After completing any workflow, append a single JSONL entry:
echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","skill":"Apify","workflow":"WORKFLOW_USED","input":"8_WORD_SUMMARY","status":"ok|error","duration_s":SECONDS}' >> ~/.claude/PAI/MEMORY/SKILLS/execution.jsonl
Replace WORKFLOW_USED with the workflow executed, 8_WORD_SUMMARY with a brief input description, and SECONDS with approximate wall-clock time. Log status: "error" if the workflow failed.
More from danielmiessler/personal_ai_infrastructure
osint
Structured OSINT investigations — people lookup, company intel, investment due diligence, entity/threat intel, domain recon, organization research using public sources with ethical authorization framework. USE WHEN OSINT, due diligence, background check, research person, company intel, investigate, company lookup, domain lookup, entity lookup, organization lookup, threat intel, discover OSINT sources.
259firstprinciples
Physics-based reasoning framework (Musk/Elon methodology) that deconstructs problems to irreducible fundamental truths rather than reasoning by analogy. Three-step structure: DECONSTRUCT (break to constituent parts and actual values), CHALLENGE (classify every element as hard constraint / soft constraint / unvalidated assumption — only physics is truly immutable), RECONSTRUCT (build optimal solution from fundamentals alone, ignoring inherited form). Outputs: constituent-parts breakdown, constraint classification table, and reconstructed solution with key insight. Three workflows: Deconstruct.md, Challenge.md, Reconstruct.md. Integrates with RedTeam (attack assumptions before deploying adversarial agents), Security (decompose threat model), Architecture (challenge design constraints), and Pentesters (decompose assumed security boundaries). Other skills invoke via: Challenge on all stated constraints → classify as hard/soft/assumption. Cross-domain synthesis: solutions from unrelated fields often apply once the fundamental truths are exposed. NOT FOR incident investigation and causal chains (use RootCauseAnalysis). NOT FOR structural feedback loops (use SystemsThinking). USE WHEN first principles, fundamental truths, challenge assumptions, is this a real constraint, rebuild from scratch, what are we actually paying for, what is this really made of, start over, physics first, question everything, reasoning by analogy, is this really necessary.
160documents
Read, write, convert, and analyze documents — routes to PDF, DOCX, XLSX, PPTX sub-skills for creation, editing, extraction, and format conversion. USE WHEN document, process file, create document, convert format, extract text, PDF, DOCX, XLSX, PPTX, Word, Excel, spreadsheet, PowerPoint, presentation, slides, consulting report, large PDF, merge PDF, fill form, tracked changes, redlining.
114council
Multi-agent collaborative debate that produces visible round-by-round transcripts with genuine intellectual friction. All council members are custom-composed via ComposeAgent (Agents skill) with domain expertise, unique voice, and personality tailored to the specific topic — never built-in generic types. ComposeAgent invoked as: bun run ~/.claude/skills/Agents/Tools/ComposeAgent.ts. Two workflows: DEBATE (3 rounds, full transcript + synthesis, parallel execution within rounds, 40-90 seconds total) and QUICK (1 round, fast perspective check). Context files: CouncilMembers.md (agent composition instructions), RoundStructure.md (three-round structure and timing), OutputFormat.md (transcript format templates). Agents are designed per debate topic to create real disagreement; 4-6 well-composed agents outperform 12 generic ones. Council is collaborative-adversarial (debate to find best path); for pure adversarial attack on an idea, use RedTeam instead. NOT FOR parallel task execution across agents (use Delegation skill). USE WHEN council, debate, multiple perspectives, weigh options, deliberate, get different views, multi-agent discussion, what would experts say, is there consensus, pros and cons from multiple angles.
112privateinvestigator
Ethical people-finding using 15 parallel research agents (45 search threads) across public records, social media, reverse lookups. Public data only, no pretexting. USE WHEN find person, locate, reconnect, people search, skip trace, reverse lookup, social media search, public records search, verify identity.
112redteam
Military-grade adversarial analysis that deploys 32 parallel expert agents (engineers, architects, pentesters, interns) to stress-test ideas, strategies, and plans — not systems or infrastructure. Two workflows: ParallelAnalysis (5-phase: decompose into 24 atomic claims → 32-agent parallel attack → synthesis → steelman → counter-argument, each 8 points) and AdversarialValidation (competing proposals synthesized into best solution). Context files: Philosophy.md (core principles, success criteria, agent types), Integration.md (how to combine with FirstPrinciples, Council, and other skills; output format). Targets arguments, not network vulnerabilities. Findings ranked by severity; goal is to strengthen, not destroy — weaknesses delivered with remediation paths. Collaborates with FirstPrinciples (decompose assumptions before attacking) and Council (Council debates to find paths; RedTeam attacks whatever survives). Also invoked internally by Ideate (TEST phase) and WorldThreatModel (horizon stress-testing). NOT FOR AI instruction set auditing (use BitterPillEngineering). NOT FOR network/system vulnerability testing (use a security assessment skill). USE WHEN red team, attack idea, counterarguments, critique, stress test, devil's advocate, find weaknesses, break this, poke holes, what could go wrong, strongest objection, adversarial validation, battle of bots.
112