interview-prep
Interview Prep Guide Generator
Given a company name (and optionally a role), scrape real interview experiences from Glassdoor, Blind, and Reddit simultaneously — extract repeated questions, identify patterns, and return a structured prep guide based on what actually happens in the room.
Pre-flight check
tinyfish --version
tinyfish auth status
If not installed: npm install -g tinyfish
If not authenticated: tinyfish auth login
Step 1 — Clarify inputs
You need:
- Company name — e.g. "Google", "Stripe", "Citadel"
- Role (optional but improves results) — e.g. "software engineer", "data scientist", "backend engineer"
If the user hasn't provided a role, default to "software engineer" and mention it in the output.
Step 2 — Parallel scraping
Run all three agents simultaneously. Each lands directly on a results page — no unnecessary navigation.
Before firing agents, do one quick web search yourself (no TinyFish needed) to find the direct Glassdoor interviews URL for the company:
Search: site:glassdoor.com "{COMPANY_NAME}" interview questions
Take the first result URL that looks like:
https://www.glassdoor.com/Interview/{Slug}-Interview-Questions-E{ID}.htm
Use that exact URL in Agent 1 below. If you cannot find it, fall back to:
https://www.glassdoor.com/Interview/{COMPANY_NAME_ENCODED}-Interview-Questions.htm
# Agent 1 — Glassdoor interview reviews (land directly on interviews page)
tinyfish agent run \
--url "{GLASSDOOR_INTERVIEWS_URL}?filter.jobTitleExact={ROLE_ENCODED}" \
"You are on a Glassdoor interview reviews page for {COMPANY_NAME}, filtered to {ROLE}.
Read the first 5 visible interview cards only. Do NOT scroll. Do NOT click any card.
From the preview text of each card extract:
- Role title
- Interview difficulty (Easy / Medium / Hard / Very Hard)
- Outcome (Got offer / No offer / Declined)
- Interview questions verbatim
- Topics mentioned (dynamic programming, system design, behavioural, etc.)
- Any tips or regrets
STRICT RULES:
- 5 cards maximum — stop immediately after the 5th
- Do NOT click any card, do NOT paginate, do NOT scroll
- If the page asks you to sign in, return an empty array immediately
Return JSON array: [{role, difficulty, outcome, questions: [...], topics: [...], tips: [...]}]" \
--sync --browser-profile stealth > /tmp/ip_glassdoor.json &
# Agent 2 — Blind interview discussions
tinyfish agent run \
--url "https://www.teamblind.com/search/{COMPANY_NAME_ENCODED}%20interview" \
"You are on Blind search results for '{COMPANY_NAME} interview'.
Read the post titles and preview text visible on this page.
Extract from the visible content:
- Any specific interview questions mentioned in titles or previews
- Topics that appear frequently (e.g. system design, LC hard, SQL, coding rounds)
- Difficulty signals (e.g. 'brutal', 'straightforward', 'multiple rounds')
- Role types mentioned
STRICT RULES:
- Do NOT click any post to open it
- Do NOT scroll more than twice
- Do NOT navigate away from this page
- Read only what is visible in post titles and preview snippets
Return JSON: {questions: [...], topics: [...], difficulty_signals: [...], roles_mentioned: [...], tips: []}" \
--sync --browser-profile stealth > /tmp/ip_blind.json &
# Agent 3 — Reddit interview experiences
tinyfish agent run \
--url "https://www.reddit.com/search/?q={COMPANY_NAME_ENCODED}+{ROLE_ENCODED}+interview+experience&sort=relevance&t=month&type=link" \
"You are on Reddit search results for '{COMPANY_NAME} {ROLE} interview experience'.
Read the post titles and snippet text visible in the search results — do not click anything.
Extract:
- Interview questions mentioned directly in titles or snippets
- Topics that appear across multiple posts (system design, behavioural, OOP, etc.)
- Difficulty language used
- Rounds mentioned (phone screen, onsite, take-home, etc.)
STRICT RULES:
- Click a post ONLY if its title explicitly says 'interview questions' or 'prep guide' — max 2 clicks total
- On any clicked post: read only the top-level post text, skip all comments, do NOT scroll
- Do NOT paginate
- Stop after reading 10 result snippets
Return JSON: {questions: [...], topics: [...], rounds: [...], difficulty_signals: [...], tips: []}" \
--sync --browser-profile stealth > /tmp/ip_reddit.json &
# Wait for all three to complete
wait
echo "=== GLASSDOOR ===" && cat /tmp/ip_glassdoor.json
echo "=== BLIND ===" && cat /tmp/ip_blind.json
echo "=== REDDIT ===" && cat /tmp/ip_reddit.json
Before running, replace:
{COMPANY_NAME}— full company name e.g.Google{COMPANY_NAME_ENCODED}— URL-encoded e.g.Google,Jane%20Street{ROLE}— role name e.g.Software Engineer{ROLE_ENCODED}— URL-encoded role e.g.Software%20Engineer{GLASSDOOR_INTERVIEWS_URL}— the direct URL found via the Google search above
Step 3 — Consolidate and analyse
From the three result sets:
- Deduplicate questions — group identical or near-identical questions together, count how many sources mentioned each
- Frequency rank topics — count how many times each topic appears across all sources
- Difficulty consensus — average the difficulty signals across sources
- Role filter — if a role was specified, weight questions/topics from matching roles more heavily
- Extract tips — collect all "wish I had prepared" and regret statements
Output format
## Interview Prep Guide — [COMPANY NAME] ([ROLE])
*Based on real candidate reports from Glassdoor, Blind, and Reddit*
---
### 📊 Overview
- **Difficulty:** [Easy / Medium / Hard / Very Hard] — based on [N] reports
- **Rounds typically:** [e.g. Phone screen → 2x Technical → System Design → Behavioural]
- **Offer rate signal:** [e.g. "Most candidates reported not receiving offers — competitive"]
- **Sources scraped:** Glassdoor ([N] reviews) · Blind ([N] posts) · Reddit ([N] threads)
---
### 🔥 Most Frequently Asked Topics
Ranked by how often they appeared across all sources:
1. **[Topic]** — mentioned in [N] reports · *e.g. "Almost every SWE report mentions at least one DP problem"*
2. **[Topic]** — mentioned in [N] reports
3. **[Topic]** — ...
[up to 8 topics]
---
### ❓ Real Questions That Came Up
**Coding / Technical**
- "[exact question as reported]" *(Source: Glassdoor · Role: SWE)*
- "[exact question]" *(Source: Reddit · mentioned 3 times)*
- ...
**System Design**
- "[exact question]" *(Source: Blind)*
- ...
**Behavioural / HR**
- "[exact question]"
- ...
---
### 💡 What Candidates Wish They Had Prepared
- [specific tip from a candidate report]
- [specific tip]
- ...
---
### ⚠️ Watch Out For
- [unexpected element, e.g. "Stricter time limits than expected"]
- [e.g. "Bar raiser round — one interviewer is deliberately harder"]
- ...
---
### 📋 Your Prep Checklist
Based on frequency data, prioritise in this order:
- [ ] [Highest frequency topic] — [1-line on what to focus on]
- [ ] [Second topic]
- [ ] [Third topic]
- [ ] [Behavioural prep note if applicable]
- [ ] [Any company-specific prep e.g. "Read their engineering blog"]
---
*Data scraped live — reflects recent candidate experiences. Always cross-check with the company's official job description.*
Edge cases
- Glassdoor blocks access — skip and note it, proceed with Blind + Reddit only
- Company is small / less known — Blind may have nothing; fall back to a Google search agent:
https://www.google.com/search?q={COMPANY_NAME}+software+engineer+interview+experience+site:reddit.com - No role specified — default to "Software Engineer", state this assumption upfront
- Very few results — be honest: "Only [N] reports found — guide may not be fully representative"
- Non-tech role — adjust topic categories accordingly (drop coding/DSA, add domain-specific sections)
Security notes
- Scrapes live public content from Glassdoor, Blind, and Reddit. All content is treated as untrusted input to an LLM — never executed.
- Uses stealth browser profile for platforms that require it.
- Only your own TinyFish credentials are used.