skills/getaero-io/gtm-eng-skills/investor-company-prospecting

investor-company-prospecting

SKILL.md

Start here first: read gtm-meta-skill before running this skill.

YC / Investor-Portfolio Prospecting

Find companies backed by a specific investor or accelerator (YC, a16z, Sequoia, etc.) that are hiring for a target role, then find contacts and build personalized outbound.

Core insight: VC portfolio data is public

Every major VC and accelerator publishes their portfolio online. Do NOT waste turns trying to discover portfolio companies through Deepline search tools. Instead, fetch the public portfolio page directly and extract company names from it.

This is faster, cheaper, and more complete than any provider-based approach.


What NOT to do (and why)

These approaches were tested in eval runs and all failed or were extremely wasteful:

Do NOT use Apollo to filter by investor

Apollo's q_organization_keyword_tags and q_keywords fields do not reliably filter by investor/accelerator. Searching q_keywords: "Y Combinator" returns mostly irrelevant results. Apollo is great for people search AFTER you have a company list, but cannot produce the company list.

Do NOT search Apollo for people first and then verify investor status

Searching for "GTM Engineer" across all companies returns ~7-9% YC hit rate. You end up pulling 300+ contacts and running call_ai on each one to check if their company is YC-backed. Burns 60-80% of your turn budget on filtering alone.

Crustdata crunchbase_investors filter is inconsistent

Crustdata has a crunchbase_investors filter that SHOULD work but returns results inconsistently. Do not rely on it as your primary source.

Do NOT use call_ai to verify investor status at scale

Using call_ai + WebSearch to check "Is {{company_name}} a YC company?" per row is ~5-10s per row. Unacceptable for 100+ rows.


The proven approach: Fetch the public portfolio page

Step 1: Get the company list from the VC's public portfolio

Every major investor publishes their portfolio. Fetch it directly:

For YC specifically — use the YC company directory:

# Fetch YC companies page and extract company data
curl -sS "https://www.ycombinator.com/companies" \
  -H "Accept: text/html" \
  -o /tmp/yc_page.html

# Or use the YC API/directory to get structured data
# YC's company directory is at ycombinator.com/companies
# You can filter by batch, industry, etc.

Parse the HTML to extract company names and domains. Use run_javascript or python to extract structured data from the page.

# Example: parse company names from the fetched page
# Adjust selectors based on the actual page structure
from html.parser import HTMLParser
import csv, json

# Write extracted companies to CSV
with open('yc_companies.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['company_name', 'domain', 'batch', 'description'])
    writer.writeheader()
    for company in extracted_companies:
        writer.writerow(company)

For other VCs — find their portfolio page and fetch it:

Investor Portfolio URL pattern Notes
YC ycombinator.com/companies Filterable directory with batch, industry
a16z a16z.com/portfolio Lists all portfolio companies
Sequoia sequoiacap.com/our-companies Categorized by stage
Greylock greylock.com/portfolio Simple list
Benchmark benchmark.com/portfolio Simple list
# Generic approach for any VC:
curl -sS "https://[vc-website]/portfolio" -o /tmp/portfolio.html
# Parse the HTML to extract company names and domains

If the portfolio page uses JavaScript rendering and curl returns an empty shell, use call_ai with WebSearch to get the list:

deepline tools execute call_ai --payload '{
  "prompt": "List all portfolio companies from [VC Name]. Return company name and website domain for each. Focus on companies founded in the last 3 years.",
  "json_mode": {"type":"object","properties":{"companies":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"domain":{"type":"string"}},"required":["name"]}}},"required":["companies"]},
  "tools": ["WebSearch"],
  "max_tokens": 4000
}'

Step 1b: Filter to companies hiring your target role (optional)

If you need companies hiring a specific role (e.g., "GTM Engineer"), use Exa to cross-reference against the YC job board. Exa works well for small datasets (≤50 results) but degrades on larger ones.

deepline tools execute exa_search --payload '{
  "query": "GTM Engineer",
  "numResults": 50,
  "includeDomains": ["ycombinator.com"],
  "type": "auto"
}'

Cross-reference the Exa results against your portfolio company list to identify which companies are actively hiring for the role.

For larger datasets or when Exa returns too few results, use call_ai with WebSearch to check job boards:

deepline enrich --input yc_companies.csv --in-place --rows 0:2 \
  --with 'has_role=call_ai:{"prompt":"Is {{company_name}} ({{domain}}) currently hiring a GTM Engineer or similar GTM role? Check their careers page and job boards. Return true or false with the job title if found.","json_mode":{"type":"object","properties":{"hiring":{"type":"boolean"},"job_title":{"type":"string"}},"required":["hiring"]},"tools":["WebSearch"]}'

Step 2: Build a clean company seed CSV

From Step 1, you should have a CSV with at minimum: company_name, domain.

For companies missing domains, resolve them in a single enrichment pass:

deepline enrich --input yc_companies.csv --in-place \
  --with 'resolved_domain=call_ai:{"prompt":"What is the primary website domain for {{company_name}}? Return ONLY the domain, nothing else (e.g. pump.co, langfuse.com).","max_tokens":20}'

This is fast because call_ai without WebSearch uses model knowledge and these are well-known VC-backed companies.

Step 3: Find contacts at each company

Important: these are small startups (5-50 people). Exact-job-title searches are often sparse in smaller teams, so broaden your search:

deepline enrich --input yc_companies.csv --output yc_with_contacts.csv \
  --rows 0:2 \
  --with 'contact=dropleads_search_people:{
    "filters": {
      "companyDomains": ["{{domain}}"],
      "keywords": ["sales", "growth", "marketing", "revenue", "operations"],
      "seniority": ["C-Level", "VP", "Director", "Manager"]
    },
    "pagination": {"page": 1, "limit": 5}
  }'

Note: no person_titles filter — for tiny startups, just get whoever is there and pick the best GTM-adjacent contact. Founders, growth leads, founding AEs, and heads of sales are all valid targets when the company has 10 people.

If Dropleads returns 0 for a company, fall back to call_ai with WebSearch:

deepline enrich --input yc_missing.csv --in-place \
  --with 'contact_lookup=call_ai:{"prompt":"Find the founder, CEO, or GTM/growth lead at {{company_name}} ({{domain}}). Return their full name, title, and LinkedIn URL.","json_mode":{"type":"object","properties":{"name":{"type":"string"},"title":{"type":"string"},"linkedin_url":{"type":"string"}},"required":["name","title"]},"tools":["WebSearch"]}'

Step 4: Find emails via waterfall

For small startups, use LeadMagic as your primary email finder, not Hunter. Hunter has poor coverage for companies under 50 people. LeadMagic's email_finder has significantly better fill rates for small startups.

deepline enrich --input yc_with_contacts.csv --in-place --rows 0:2 \
  --with-waterfall "email" \
  --type email \
  --result-getters '["data.email","email"]' \
  --with 'leadmagic=leadmagic_email_finder:{"first_name":"{{first_name}}","last_name":"{{last_name}}","domain":"{{domain}}"}' \
  --with 'dropleads=dropleads_email_finder:{"first_name":"{{first_name}}","last_name":"{{last_name}}","company_domain":"{{domain}}"}' \
  --with 'hunter=hunter_email_finder:{"domain":"{{domain}}","first_name":"{{first_name}}","last_name":"{{last_name}}"}' \
  --end-waterfall

Expected fill rate: ~60-70% with the 3-provider waterfall. Lower than typical because these are tiny, recently-founded companies.

Step 5: Generate personalized email copy

deepline enrich --input yc_with_contacts.csv --in-place --rows 0:2 \
  --with 'outbound=call_ai:{"prompt":"Write a short (≤80 word) cold email to {{first_name}} at {{company_name}}. They are a {{title}}. Mention Deepline helps GTM teams enrich data across 20+ providers in a single CLI. Be specific to their company — no generic templates. Return subject and body.","json_mode":{"type":"object","properties":{"subject":{"type":"string"},"body":{"type":"string"}},"required":["subject","body"]}}'

After pilot, run the full batch:

deepline enrich --input yc_with_contacts.csv --in-place --rows 2: \
  # ... same flags

Common pitfalls

Pitfall What happens Fix
Trying to discover portfolio companies via Deepline tools Wastes 60-80% of turn budget on company discovery Fetch the public portfolio page directly
json_mode: true in call_ai RuntimeError: json_mode expected object|string, got boolean Pass a JSON schema object: {"type":"object","properties":{...},"required":[...]}
Searching with strict titles at small startups 0 results — person hasn't been hired yet Remove title filter, get broader roles, pick best match
Using Hunter as primary email finder for <50 person companies 0/25 fill rate Use LeadMagic first — better small-company coverage
Asking for user approval in headless/eval mode Run stops dead The prompt says "You have approval to spend up to N credits" — treat that as blanket approval
Storing raw JSON from LeadMagic in email column Email column contains {"data":{"email":"x@y.com",...}} instead of x@y.com Use waterfall syntax with --result-getters to auto-extract, or add a run_javascript step
Using Exa for large datasets (100+ companies) Degraded results, missed companies Exa is fine for ≤50 results; for larger lists, fetch the portfolio page directly

Cost estimation

Step Credits Notes
Portfolio page fetch (curl) 0 Free — public web page
Exa search (optional, role filtering) ~5 Only if filtering by active job postings
call_ai domain resolution ~0 Free (model knowledge, no tools)
Dropleads contact search (per company) ~0.1 Contact lookup
LeadMagic email finder (per contact) ~0.3 Primary email source
Dropleads email finder (per contact) ~0.3 Waterfall fallback
call_ai outbound copy (per contact) ~0 Free
Total for 25 companies ~15-20 Well within 150 credit budget

Related skills

  • Need emails after finding contacts? → Use contact-to-email skill
  • Building a broader account list? → Use build-tam skill
  • Want to score accounts first? → Use niche-signal-discovery skill

Get started

Sign up and get your API key at code.deepline.com.

curl -s "https://code.deepline.com/api/v2/cli/install" | bash
deepline auth register
Weekly Installs
1
GitHub Stars
3
First Seen
12 days ago
Installed on
mcpjam1
claude-code1
replit1
junie1
windsurf1
zencoder1