scrape-to-crm
Scrape-to-CRM Pipeline
End-to-end pipeline: LinkedIn scrape (Apify) → data transform → n8n workflow → Attio CRM upsert.
Trigger Phrases
- "scrape LinkedIn to CRM", "scrape to Attio"
- "pipeline LinkedIn leads"
- "scrape and push to Attio"
- LinkedIn URL + "push to CRM"
/scrape-to-crm [target]
Pipeline Steps
Step 1: Gather Requirements
Ask the user (if not provided via arguments):
-
What to scrape — one of:
- LinkedIn search URL (e.g.,
https://www.linkedin.com/search/results/people/?keywords=...) - LinkedIn company page URL (e.g.,
https://www.linkedin.com/company/acme/) - Company name or keyword (e.g., "fintech Tel Aviv")
- LinkedIn search URL (e.g.,
-
Attio target — which object and optional list:
- Object:
people(default) orcompanies - List: optional Attio list name to add records to
- Object:
-
Geography (if adding to a list with Country/City attributes):
- Country: e.g., "Israel", "Turkey", "Greece"
- City: e.g., "Haifa", "Tel Aviv", "Ashdod", "Istanbul"
-
Source — how this lead was sourced (default: "Scraped"):
- Options: Scraped, Referral, Manual
-
Limit — max records to scrape (default: 25, max: 100)
Step 2: Apify — Find and Run LinkedIn Actor
Use MCP Apify tools in sequence:
2a. Search for the best actor
Tool: mcp__apify__search-actors
Query: "linkedin" + context (e.g., "linkedin profile scraper", "linkedin company scraper", "linkedin search")
Preferred actors by use case:
| Use Case | Preferred Actor | Fallback |
|---|---|---|
| Profile scraping | anchor~linkedin-profile-scraper |
Search for "linkedin profile" |
| Company page | apify~linkedin-company-scraper |
Search for "linkedin company" |
| Search results | curious_coder~linkedin-search |
Search for "linkedin search scraper" |
| People search | bebity~linkedin-people-search |
Search for "linkedin people" |
2b. Fetch actor details and input schema
Tool: mcp__apify__fetch-actor-details
Actor ID: [from search results]
Review the input schema to understand required fields.
2c. Run the actor
Tool: mcp__apify__call-actor
Actor ID: [selected actor]
Input: {
// Varies by actor — always check schema from step 2b
// Common patterns:
"startUrls": [{"url": "<linkedin_url>"}], // or "searchUrl", "urls"
"maxResults": <limit>,
"proxy": {"useApifyProxy": true}
}
Important: Set async: false for small runs (<50 records), async: true for larger runs.
2d. Get results
Tool: mcp__apify__get-actor-output
Run ID: [from call-actor response]
Format: "json"
Step 3: Transform Scraped Data
Map Apify output fields to Attio-compatible format. Use the field mapping from references/field-mapping.md.
Core transformation logic (execute in a Code node or inline):
function transformProfiles(items) {
return items.map(item => ({
// People fields
full_name: item.fullName || item.name || `${item.firstName || ''} ${item.lastName || ''}`.trim(),
email: item.email || item.emailAddress || null,
phone: normalizePhone(item.phone || item.phoneNumber || null),
title: item.title || item.headline || item.position || null,
company_name: item.companyName || item.company || null,
linkedin_url: item.linkedInUrl || item.profileUrl || item.url || null,
location: item.location || item.geoLocation || null,
// Company fields (if scraping companies)
domain: item.website || item.companyUrl || null,
industry: item.industry || null,
employee_count: item.employeeCount || item.staffCount || null,
description: item.description || item.about || null,
}));
}
function normalizePhone(phone) {
if (!phone) return null;
let digits = phone.replace(/\D/g, '');
// Israeli numbers: 05X or 0X → +972
if (/^0[2-9]/.test(digits) && digits.length === 10) {
digits = '972' + digits.slice(1);
}
return '+' + digits;
}
Step 4: n8n Workflow (Optional)
If the user wants a persistent/repeatable pipeline, build an n8n workflow:
Use the n8n-workflow-builder skill patterns. The workflow should have:
- Webhook Trigger — receives scraped data as POST body
- Code Node — transforms data using the mapping from Step 3
- HTTP Request Node — upserts each record to Attio
Tool: mcp__n8n__search_nodes
Query: "webhook"
For one-time runs, skip n8n and push directly to Attio from Step 5.
Step 5: Attio CRM — Upsert Records
Push transformed records to Attio using the API.
People upsert
curl -X PUT "https://api.attio.com/v2/objects/people/records?matching_attribute=email_addresses" \
-H "Authorization: Bearer $ATTIO_API_KEY" \
-H "Content-Type: application/json" \
-d "$(cat <<'BODY'
{
"data": {
"values": {
"name": [{"full_name": "<full_name>"}],
"email_addresses": [{"email_address": "<email>"}],
"phone_numbers": [{"original_phone_number": "<phone_e164>"}],
"job_title": [{"value": "<title>"}],
"linkedin": [{"value": "<linkedin_url>"}]
}
}
}
BODY
)"
Companies upsert
curl -X PUT "https://api.attio.com/v2/objects/companies/records?matching_attribute=domains" \
-H "Authorization: Bearer $ATTIO_API_KEY" \
-H "Content-Type: application/json" \
-d "$(cat <<'BODY'
{
"data": {
"values": {
"name": [{"value": "<company_name>"}],
"domains": [{"domain": "<domain>"}],
"description": [{"value": "<description>"}]
}
}
}
BODY
)"
Link person to company
After upserting both, link via:
PUT /v2/objects/people/records?matching_attribute=email_addresses
Include "company" attribute with company record ID
Step 5b: Add Records to List (if list specified)
If the user specified an Attio list in Step 1, add each upserted record to that list.
Create list (if it doesn't exist)
curl -X POST "https://api.attio.com/v2/lists" \
-H "Authorization: Bearer $ATTIO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"data": {
"name": "<list_name>",
"api_slug": "<snake_case_slug>",
"parent_object": "companies",
"workspace_access": "full-access",
"workspace_member_access": []
}
}'
Add record to list
For each upserted record, create a list entry using the record_id from the upsert response:
curl -X POST "https://api.attio.com/v2/lists/{list_id}/entries" \
-H "Authorization: Bearer $ATTIO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"data": {
"parent_record_id": "<record_id from upsert response>",
"parent_object": "companies",
"entry_values": {
"country": "<country from Step 1>",
"city": "<city from Step 1>",
"source": "<source from Step 1, default: Scraped>"
}
}
}'
Note: If the list has Country/City/Source select attributes, always populate entry_values with the geography and source from Step 1. Omit any field that wasn't provided by the user.
Critical: parent_object is required — omitting it causes a silent failure where the API accepts the call but no entry appears in the list. Set it to "companies" or "people" matching the list's parent_object setting.
Scopes needed: list_entry:read-write, list_configuration:read
Notes:
- Multiple entries for the same parent record are allowed
- Throws on unique attribute conflicts if the list has unique constraints
Step 6: Report Results
Present a summary table:
## Scrape-to-CRM Results
**Source:** [LinkedIn URL or search query]
**Actor:** [actor name used]
**Records scraped:** X
**Records pushed to Attio:** Y (Z new, W updated)
**Failures:** N
| Name | Title | Company | Email | Status |
|------|-------|---------|-------|--------|
| ... | ... | ... | ... | Created/Updated/Failed |
Attio API Gotchas
Reference: ~/.claude/skills/lead-enrichment/references/api-patterns.md
nameattribute → must usefull_namefield, NOTvalue- Phone numbers → use
original_phone_numberfield, must be E.164 format - Israeli numbers → strip non-digits, replace leading
0with+972 - Body serialization → always use
JSON.stringify()— string concat breaks with special chars - Matching attributes → People:
email_addresses, Companies:domains - Rate limits → Attio allows ~10 req/s; add 100ms delay between upserts for bulk
Environment Variables
Read from ~/.env or project .env:
APIFY_API_TOKEN— required for Apify actor runsATTIO_API_KEY— required for CRM upsertsN8N_API_KEY— optional, only if building n8n workflow
Error Handling
- Apify actor fails → show error, suggest alternative actor via
search-actors - No email in scraped data → skip Attio People upsert for that record, note in report
- Attio upsert fails → log the record and error, continue with remaining records
- Rate limit hit → back off with exponential delay (100ms → 200ms → 400ms)
- Partial results → always report what succeeded vs. failed; never silently drop records
Testing
Run /scrape-to-crm test to execute a dry-run:
- Asks for a small LinkedIn URL (company page with <10 employees)
- Runs Apify scrape with
maxResults: 5 - Transforms data and displays mapped fields (no CRM push)
- User confirms → pushes to Attio
- Verifies records exist via Attio API GET