customer-discovery
Customer Discovery
Find all customers of a company by scanning multiple public data sources. Produces a deduplicated report with confidence scoring.
Quick Start
Find all customers of Datadog
Who are Notion's customers? Use deep mode.
Inputs
| Input | Required | Default | Description |
|---|---|---|---|
| Company name | Yes | — | The company to research |
| Website URL | No | Auto-detected | The company's website URL |
| Depth | No | standard | quick, standard, or deep |
Procedure
Step 1: Gather Inputs
Ask the user for:
- Company name (required)
- Company website URL (optional — if not provided, WebSearch for it)
- Depth tier — present these options, default to Standard:
- Quick (~2-3 min): Website logos, case studies, G2 reviews, press search
- Standard (~5-8 min): Quick + blog posts, Wayback Machine, LinkedIn, Twitter, Reddit, HN, job postings, YouTube
- Deep (~10-15 min): Standard + SEC filings, podcasts, GitHub, integration directories, BuiltWith, Crunchbase
Step 2: Create Output Directory
mkdir -p customer-discovery-[company-slug]
Step 3: Run Sources for Selected Tier
Collect all results into a running list. For each customer found, record:
- name: Company name
- confidence: high / medium / low
- source_type: e.g., "logo_wall", "case_study", "g2_review", "press", "job_posting"
- evidence_url: URL where the evidence was found
- notes: Brief description of the evidence
Quick Sources
1. Website logo wall
Run the scrape_website_logos.py script:
python3 skills/capabilities/customer-discovery/scripts/scrape_website_logos.py \
--url "[company-url]" --output json
Parse the JSON output and add each result to the customer list.
2. Case studies page
Use WebFetch on the company's case studies page (try /case-studies, /customers, /resources/case-studies). Extract customer names from page headings and content.
3. G2/Capterra reviews
If the review-scraper skill is available, use it to find reviewer companies:
python3 skills/capabilities/review-scraper/scripts/scrape_reviews.py \
--platform g2 --url "[g2-product-url]" --max-reviews 50 --output json
First, WebSearch for the company's G2 page: site:g2.com "[company]". Extract reviewer company names from review author info.
4. Web search for press
WebSearch these queries and extract customer mentions from results:
"[company]" customer OR "case study" OR partnership"[company]" "we use" OR "switched to" OR "chose"
Standard Sources (in addition to Quick)
5. Company blog posts
WebSearch: site:[company-domain] customer OR "case study" OR partnership OR "customer story"
6. Wayback Machine logos
Run the scrape_wayback_logos.py script:
python3 skills/capabilities/customer-discovery/scripts/scrape_wayback_logos.py \
--url "[company-url]" --output json
Logos marked still_present: false are especially interesting — they indicate former customers.
7. Founder/exec LinkedIn posts
WebSearch: site:linkedin.com "[company]" customer OR "excited to announce" OR "welcome"
8. Twitter/X mentions
WebSearch: site:twitter.com "[company]" "we use" OR "just switched to" OR "loving"
9. Reddit/HN mentions
WebSearch these queries:
site:reddit.com "we use [company]" OR "[company] customer"site:news.ycombinator.com "[company]" customer OR user
10. Job postings
WebSearch: "experience with [company]" site:linkedin.com/jobs OR site:greenhouse.io OR site:lever.co
Companies requiring experience with the product are likely customers.
11. YouTube testimonials
WebSearch: site:youtube.com "[company]" customer OR testimonial OR review
Deep Sources (in addition to Standard)
12. SEC filings
WebSearch: site:sec.gov "[company]" — Look for mentions in 10-K and 10-Q filings.
13. Podcast transcripts
WebSearch: "[company]" podcast customer OR transcript OR interview
14. GitHub usage signals
WebSearch: site:github.com "[company-package-name]" in dependency files, package.json, requirements.txt, etc.
15. Integration directories
WebFetch marketplace pages where the company lists integrations:
- Salesforce AppExchange
- Zapier integrations page
- Slack App Directory
- Any marketplace relevant to the company
16. BuiltWith detection
python3 skills/capabilities/customer-discovery/scripts/search_builtwith.py \
--technology "[company-slug]" --max-results 50 --output json
17. Crunchbase
WebSearch: site:crunchbase.com "[company]" customers OR partners
Step 4: Deduplicate Results
Merge results by company name using fuzzy matching:
- Normalize: lowercase, strip suffixes (Inc, Corp, LLC, Ltd, Co., GmbH)
- Treat "Acme Inc" = "Acme" = "ACME Corp" = "acme.com" as the same company
- When merging, keep the highest confidence level and all evidence URLs
Step 5: Assign Confidence
Apply these rules:
High confidence:
- Logo on current website (from scrape_website_logos.py with confidence "high")
- Published case study or customer story
- Direct quote or testimonial on the company's site
- Official partnership page listing
Medium confidence:
- G2/Capterra review (reviewer's company)
- Press article mentioning customer relationship
- Job posting requiring experience with the product
- YouTube testimonial or video review
- Logo found only in Wayback Machine (was on site, now removed)
Low confidence:
- Single social media mention (tweet, Reddit post)
- Indirect reference ("heard good things about X")
- BuiltWith detection only (technology on site doesn't mean they're a paying customer)
- HN discussion mention
Step 6: Generate Report
Create two output files:
customer-discovery-[company]/report.md:
# Customer Discovery: [Company Name]
**Date:** YYYY-MM-DD
**Depth:** quick | standard | deep
**Total customers found:** N
## High Confidence (N)
| Customer | Source | Evidence |
|----------|--------|----------|
| Shopify | Case study | [link] |
| ... | ... | ... |
## Medium Confidence (N)
| Customer | Source | Evidence |
|----------|--------|----------|
| ... | ... | ... |
## Low Confidence (N)
| Customer | Source | Evidence |
|----------|--------|----------|
| ... | ... | ... |
## Sources Scanned
- Website logo wall: [url] — N customers found
- G2 reviews: N reviews analyzed — N companies identified
- Wayback Machine: N snapshots checked — N logos found (N removed)
- Web search: N queries — N mentions
- ...
## Methodology
This report was generated using the customer-discovery skill, which scans
public data sources to identify companies that use [Company Name]. Confidence
levels reflect the strength and directness of the evidence found.
customer-discovery-[company]/customers.csv:
CSV with columns: company_name,confidence,source_type,evidence_url,notes
Write the CSV using a code block or Python script.
Scripts Reference
| Script | Purpose | Key flags |
|---|---|---|
scrape_website_logos.py |
Extract logos from current website | --url, --output json|summary |
scrape_wayback_logos.py |
Find historical logos via Wayback Machine | --url, --paths, --output json|summary |
search_builtwith.py |
BuiltWith technology detection (deep mode) | --technology, --max-results, --output json|summary |
All scripts require requests: pip3 install requests
External skill scripts (use if available):
skills/capabilities/review-scraper/scripts/scrape_reviews.py— G2/Capterra/Trustpilot reviews (requires Apify token)skills/capabilities/linkedin-post-research/scripts/search_posts.py— LinkedIn post search (requires Crustdata API key)
Cost
- Quick / Standard: Free (uses WebSearch + free APIs like Wayback Machine CDX)
- Deep: Mostly free. BuiltWith paid API is optional (
--api-keyflag); free scraping is used by default. - External skills (review-scraper, linkedin-post-research) may require paid API tokens.