product-reverse
Product Reverse Engineering
Systematically explore a web product using Chrome browser automation to produce:
- Product Analysis Report (report.md) - what the product is and how it works
- Design Document (design-doc.md) - how to build something similar (with your own technical recommendations)
- API Surface (api-surface.md) - documented API endpoints
Workflow Overview
Phase 1: Reconnaissance -> Initial page load, tech detection, structure overview
Phase 2: Systematic Exploration -> Page-by-page, flow-by-flow deep dive
Phase 3: API Analysis -> Synthesize captured API calls, infer data model
Phase 4: External Research -> WebSearch for product context, tech blog posts, competitors
Phase 5: Synthesis -> Analyze all data, formulate architecture recommendations
Phase 6: Output Generation -> Generate report.md, design-doc.md, api-surface.md
Phases are iterative. The user can direct exploration at any point.
Input & Setup
Required: A URL to the product.
Optional: Focus areas (e.g., "focus on the editor", "I care most about the API").
Slug Generation
Derive the output directory slug from the URL:
- Extract the domain name (e.g., www.example.com -> example)
- Remove common prefixes: www., app., dashboard.
- Remove TLD: .com, .io, .org, etc.
- Lowercase, replace dots/spaces with hyphens
- Examples: https://www.notion.so -> notion, https://app.linear.app -> linear
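The slug rules above can be sketched in Python as follows. The function name `product_slug` is hypothetical, and the sketch assumes a single-label TLD: multi-part suffixes such as `.co.uk` would need extra handling.

```python
from urllib.parse import urlparse

# Prefixes named in the slug rules above.
COMMON_PREFIXES = ("www.", "app.", "dashboard.")


def product_slug(url: str) -> str:
    """Derive an output-directory slug from a product URL (illustrative sketch)."""
    host = (urlparse(url).hostname or "").lower()

    # Strip one common prefix, but only if a domain would still remain.
    for prefix in COMMON_PREFIXES:
        if host.startswith(prefix) and "." in host[len(prefix):]:
            host = host[len(prefix):]
            break

    # Drop the final label (assumed to be a single-label TLD).
    if "." in host:
        host = host.rsplit(".", 1)[0]

    # Replace any remaining dots or spaces with hyphens.
    return host.replace(".", "-").replace(" ", "-")
```

For instance, `product_slug("https://app.linear.app")` strips `app.` first and then the trailing `.app`, leaving `linear`.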
Initialize Output Directory
output_dir = /Users/zepingchen/CopyProduct/output/{product-slug}/
Create the directory structure:
{output_dir}/
├── state.json
├── screenshots/
└── recordings/
Initialize state.json:
{
"product_name": "{name}",
"url": "{url}",
"current_phase": "reconnaissance",
"current_branch": "main",
"tab_id": null,
"focus_areas": [],
"tech_stack": {},
"pages_discovered": [],
"api_endpoints": [],
"operations": [],
"screenshots": [],
"notes": []
}
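One way to create this layout is sketched below. The helper name `init_output_dir` and its `base_dir` parameter are hypothetical, and skipping the write when state.json already exists is an assumption made so that a paused session can be resumed.

```python
import json
from pathlib import Path


def init_output_dir(base_dir: str, slug: str, name: str, url: str) -> Path:
    """Create the output directory tree and an initial state.json (sketch)."""
    output_dir = Path(base_dir) / slug
    for sub in ("screenshots", "recordings"):
        (output_dir / sub).mkdir(parents=True, exist_ok=True)

    state_path = output_dir / "state.json"
    # Assumption: never overwrite an existing state file from an earlier session.
    if not state_path.exists():
        state = {
            "product_name": name,
            "url": url,
            "current_phase": "reconnaissance",
            "current_branch": "main",
            "tab_id": None,
            "focus_areas": [],
            "tech_stack": {},
            "pages_discovered": [],
            "api_endpoints": [],
            "operations": [],
            "screenshots": [],
            "notes": [],
        }
        state_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return output_dir
```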
Phase 1: Reconnaissance
Goal: Get the lay of the land before deep exploration.
- Set up browser tab: Follow tab setup pattern from references/chrome-patterns.md. Store tabId in state.json.
- Load the product URL: Navigate, wait 3s, take screenshot, read interactive elements, get page text. Follow "Page Exploration (Initial Load)" pattern from references/chrome-patterns.md.
- Detect tech stack: Execute the tech detection JS snippet from references/chrome-patterns.md via javascript_tool. The snippet returns a JSON string - parse it and record in state.json tech_stack.
- Capture homepage structure: Identify navigation links, key CTAs, main content areas from the read_page output.
- Capture initial API calls: If needed, use the reload-based network capture pattern. Filter out static assets and chrome extensions per the "Filtering Network Noise" section in chrome-patterns.md.
- Log screenshot: The screenshot action returns a screenshot ID (e.g., ss_abc123). Log this ID in state.json screenshots array. Screenshots live in the browser session and are shown inline to the user - they are NOT saved to disk files.
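Because the tech detection snippet returns a JSON string rather than an object, the parse step is worth making explicit. A minimal sketch, assuming the in-memory state is a plain dict; the function name `record_tech_stack` and the fallback note text are hypothetical.

```python
import json


def record_tech_stack(state: dict, raw_result: str) -> dict:
    """Parse the JSON string from tech detection and merge it into state (sketch)."""
    try:
        detected = json.loads(raw_result)
    except (TypeError, json.JSONDecodeError):
        # Assumption: unparseable output is noted rather than raised.
        state.setdefault("notes", []).append(
            "tech detection returned unparseable output"
        )
        return state

    if isinstance(detected, dict):
        state.setdefault("tech_stack", {}).update(detected)
    return state
```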
After Phase 1, present a summary to the user:
- Product name and purpose (inferred)
- Tech stack detected
- Navigation structure / main pages found
- Any API calls seen
- Ask: "What areas should I explore next?" or proceed to systematic exploration
Phase 2: Systematic Exploration
Goal: Methodically explore each page and user flow.
CRITICAL: Test Core User Flows
Do NOT just observe static page states. You MUST actively test core user flows:
- If the product has search, perform an actual search and observe the results page
- If the product has forms, fill and submit them (with test data) to see validation and success states
- If the product has navigation, click through each nav item to discover all views
- If the product has interactive elements (dropdowns, modals, tabs), trigger them
This is essential because many products transform their UI on interaction (e.g., Baidu replaces the entire homepage with a results page when you search). Failing to test flows means missing entire views and interaction patterns.
Exploration Loop
For each page or flow to explore:
- Navigate to the page (use same-domain pattern if staying on same host)
- Screenshot and log the screenshot ID
- Analyze structure: read_page for interactive elements, get_page_text for content
- Discover behavior: Selectively click/hover interactive elements to reveal dropdowns, modals, etc.
- Test core flows: Actually perform the product's primary actions (search, submit, navigate) to discover all UI states
- Capture API calls: Clear network, perform actions, capture resulting API calls. Filter noise.
- Scroll exploration: For long pages, scroll and screenshot each section
- Batch update state.json: After exploring a page (not after every action), read state.json, append new operations/pages/endpoints, write back
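The "filter noise" step in the loop above could look roughly like this. The authoritative rules live in chrome-patterns.md, which is not reproduced here, so the extension list and scheme list below are assumptions chosen for illustration.

```python
from urllib.parse import urlparse

# Assumed list of static-asset extensions; adjust to match chrome-patterns.md.
STATIC_EXTENSIONS = (
    ".js", ".css", ".png", ".jpg", ".jpeg", ".gif", ".svg",
    ".ico", ".woff", ".woff2", ".ttf", ".map",
)
# Schemes that never represent product API traffic.
NOISE_SCHEMES = ("chrome-extension", "data")


def is_noise(url: str) -> bool:
    """Return True for requests that should not be logged as API calls."""
    parsed = urlparse(url)
    if parsed.scheme in NOISE_SCHEMES:
        return True
    # Compare the path only, so query strings like ?v=3 do not hide the extension.
    return parsed.path.lower().endswith(STATIC_EXTENSIONS)


def filter_requests(urls: list) -> list:
    """Keep only the requests worth recording in api_endpoints."""
    return [u for u in urls if not is_noise(u)]
```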
State Update Strategy
Do NOT read/write state.json after every single action. Instead:
- Keep a mental log of operations during exploration
- Batch-write to state.json after completing a page or flow exploration
- Operation IDs: derive from operations.length + 1, formatted as op_XXX (zero-padded to 3 digits)
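A sketch of the batch write, assuming pending operations are held as a list of partial dicts (for example `action` and `target`) until a page or flow is finished. The function name `flush_operations` is hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def flush_operations(state_path: str, pending: list) -> list:
    """Append pending operations to state.json in one read/write cycle (sketch)."""
    path = Path(state_path)
    state = json.loads(path.read_text(encoding="utf-8"))
    operations = state.setdefault("operations", [])

    new_ids = []
    for op in pending:
        # ID is operations.length + 1, zero-padded to 3 digits.
        op_id = f"op_{len(operations) + 1:03d}"
        operations.append({
            "id": op_id,
            "branch": state.get("current_branch", "main"),
            # Default to flush time; a timestamp already in `op` overrides it.
            "timestamp": datetime.now(timezone.utc).isoformat(),
            **op,
        })
        new_ids.append(op_id)

    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return new_ids
```

Returning the assigned IDs lets the caller attach them as `op_id` on the pages, endpoints, and screenshots recorded in the same batch.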
Branch Management
Branches are lightweight labels on operations for organizing parallel exploration paths.
- Default branch: main
- Create a new branch: set current_branch in state.json to a new label (e.g., auth-flow, settings-page)
- Operations are logged under the current branch
- Switch branches by updating current_branch
- Branches are append-only labels, not git-like - no merging needed
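Since a branch is only a label, switching and querying reduce to a field update and a filter. A minimal sketch with hypothetical helper names, operating on the in-memory state dict:

```python
def switch_branch(state: dict, label: str) -> None:
    """Create or switch to a branch by updating the current label."""
    state["current_branch"] = label


def operations_on(state: dict, branch: str) -> list:
    """Return the operations that were logged under a given branch label."""
    return [op for op in state.get("operations", []) if op.get("branch") == branch]
```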
Auth-Gated Content
When encountering login walls or auth-gated features:
- Never enter credentials - this is a hard rule
- Document what's behind the wall (page name, expected functionality)
- Note it in state.json notes
- Move on to other explorable areas
- Inform the user about auth-gated areas found
When to Pause and Ask
- After exploring 3-5 pages, summarize findings and ask if the user wants to redirect
- When encountering unexpected complexity (e.g., very large app)
- When auth-gated content blocks further exploration
- When the user says "pause" or "show progress"
Phase 3: API Analysis
Goal: Synthesize all captured API calls into a coherent API surface.
- Collect all API endpoints from state.json api_endpoints
- Group by resource/domain (e.g., /api/users/, /api/projects/)
- For each endpoint group:
  - Document method, URL pattern, purpose
  - Note request/response shape if captured
  - Identify auth requirements
- Infer data model from API shapes
- Identify auth mechanism (cookie, bearer token, API key)
- Write api-surface.md with full documentation
For non-SPA / server-rendered sites: There may be no clean REST APIs. Document what IS available: search endpoints, suggestion APIs, tracking calls, form submission endpoints. Note this in the report.
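The grouping step in this phase can be sketched as below. Using the first two path segments as the resource key is an assumption that fits URLs like /api/users/me; APIs with version prefixes or deeper nesting may need a different `depth`. Both function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def resource_key(url: str, depth: int = 2) -> str:
    """Reduce a URL to its leading path segments, e.g. /api/users/me -> /api/users."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return "/" + "/".join(segments[:depth])


def group_endpoints(endpoints: list) -> dict:
    """Group api_endpoints entries from state.json by resource key."""
    groups = defaultdict(list)
    for endpoint in endpoints:
        groups[resource_key(endpoint["url"])].append(endpoint)
    return dict(groups)
```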
Phase 4: External Research
Goal: Supplement browser exploration with external knowledge.
Use WebSearch and WebFetch to research:
- Product information: What the product does, who makes it, pricing, target market
- Technical blog posts: Engineering blog, tech talks, conference presentations
- Job listings: Technologies mentioned in job postings reveal stack
- Competitor landscape: Similar products, alternatives
- Open source components: Any OSS libraries or frameworks they've built on
Record findings in state.json notes.
Phase 5: Synthesis
Goal: Turn raw data into insights and recommendations.
- Architecture inference: Based on tech stack, API patterns, and page structure, infer the likely architecture
- Key design decisions: Identify important technical and UX decisions the product made
- Tradeoff analysis: For each major decision, consider alternatives and why they might have chosen this approach
- Recommendations: Formulate your own technical recommendations for building a similar product - don't just clone, improve where possible
Phase 6: Output Generation
Generate final deliverables using templates from references/.
CRITICAL: Design Doc First, Then Code
When the user's goal is to build/replicate (not just analyze), you MUST:
- Write design-doc.md FIRST - before writing any implementation code
- Present the design doc to the user for review - get explicit approval
- Ask about tech stack preferences - e.g., "Do you prefer React/Vue/plain HTML? Any specific libraries?"
- Only start coding after design approval - the user may redirect your approach
This prevents wasted effort from building the wrong thing. The design doc should include:
- Architecture overview with diagrams
- All views/pages and their layout (ASCII art wireframes)
- API endpoints and data flow
- Implementation plan with phases
report.md
Read references/report-template.md and generate a comprehensive product analysis report. Reference screenshot IDs where relevant (the user can cross-reference with their browser session).
design-doc.md
Read references/design-doc-template.md and generate an actionable design document with:
- Your own tech stack recommendations (not just copying the original)
- Tradeoff analysis for major decisions
- Prioritized feature list (P0/P1/P2)
- Implementation phases
api-surface.md
Full API documentation with all discovered endpoints, grouped by resource.
All output files go in the output_dir.
State Management
state.json Format
{
"product_name": "string",
"url": "string",
"current_phase": "reconnaissance|exploration|api_analysis|research|synthesis|output",
"current_branch": "string",
"tab_id": "number - browser tab ID for this session",
"focus_areas": ["string"],
"tech_stack": {"framework": "Next.js", "jquery": "1.10.2", ...},
"pages_discovered": [
{"path": "/dashboard", "title": "Dashboard", "branch": "main", "op_id": "op_005"}
],
"api_endpoints": [
{"method": "GET", "url": "/api/users/me", "status": 200, "auth": true, "category": "data_api", "branch": "main", "op_id": "op_010"}
],
"operations": [
{"id": "op_001", "branch": "main", "action": "navigate", "target": "https://...", "timestamp": "ISO8601", "screenshot_id": "ss_abc123"}
],
"screenshots": [
{"id": "ss_abc123", "description": "Homepage initial load", "op_id": "op_001"}
],
"notes": ["string"]
}
Screenshot Handling
The Chrome MCP screenshot action returns an in-memory screenshot ID (e.g., ss_7041dw7mj). These are:
- Visible inline to the user in the conversation
- NOT saved to disk automatically - there is no Chrome MCP tool to save screenshots to file
- Logged in state.json by their ID for cross-referencing in reports
In the screenshots array, record id (the screenshot ID returned by the tool) and description.
Operation Logging
Every significant action gets logged as an operation:
- id: Sequential op_XXX format (derive from operations.length + 1)
- branch: Current branch label
- action: navigate, screenshot, click, scroll, api_capture, tech_detect, etc.
- target: URL, element description, etc.
- timestamp: ISO 8601
- screenshot_id: If a screenshot was taken (optional)
API Endpoint Categories
When logging endpoints, categorize them:
- data_api: REST/GraphQL endpoints returning JSON data
- analytics: Tracking pixels, event logging endpoints
- suggestion: Autocomplete, search suggestion endpoints
- auth: Login, token refresh, session check
- static: CDN, static asset requests (usually not logged)
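A first-pass categorizer might use simple path heuristics like the ones below. The keyword lists and extension list are illustrative assumptions, not rules from this document, so treat the result as a suggestion to review rather than a final label.

```python
from urllib.parse import urlparse

# Assumed keyword heuristics, checked in order; tune these per product.
KEYWORD_RULES = [
    ("auth", ("login", "logout", "token", "session", "oauth")),
    ("suggestion", ("suggest", "autocomplete", "typeahead")),
    ("analytics", ("analytics", "track", "collect", "beacon", "pixel")),
]
STATIC_EXTENSIONS = (".js", ".css", ".png", ".jpg", ".gif", ".svg", ".ico", ".woff2")


def categorize(url: str) -> str:
    """Guess an endpoint category from its URL path (heuristic sketch)."""
    path = urlparse(url).path.lower()
    if path.endswith(STATIC_EXTENSIONS):
        return "static"
    for category, keywords in KEYWORD_RULES:
        if any(keyword in path for keyword in keywords):
            return category
    # Anything unmatched is assumed to be a regular data endpoint.
    return "data_api"
```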
When State Gets Large
If state.json grows beyond what fits comfortably in context, run:
python3 scripts/summarize_state.py output/{product-slug}/state.json
This prints a concise summary instead of reading the full state file.
User Commands
During exploration, the user can direct the process:
| Command | Action |
|---|---|
| explore {area} | Focus exploration on a specific area (page, flow, feature) |
| go back to {branch} | Switch to a different exploration branch |
| show progress | Display current state summary (use summarize_state.py) |
| generate report | Skip to Phase 6 and produce output files |
| pause | Save state and stop; can resume later |
| record {flow} | Enable GIF recording for a specific flow |
| research {topic} | Run external research on a specific topic |
Key Rules
- Screenshots return IDs, not files - log the ID, don't try to save to disk
- Never enter credentials or sensitive data into the product
- All output in English
- Design doc is prescriptive - include your own recommendations, not just describing the original
- Ask before deep-diving - after initial recon, check what the user wants to focus on
- Batch state updates - update state.json per-page, not per-action
- Filter network noise - ignore static assets, chrome extensions, data URIs
- Tech detection returns JSON string - parse it before storing in state.json
- Test core user flows - don't just observe static pages; perform searches, submit forms, click through navigation to discover all UI states and transitions
- Design doc before code - when the user wants to build/replicate, write design-doc.md first, get user review, ask about tech stack preferences, then code
- Never use mock data for core features - if the product has search, hot trends, suggestions, etc., implement real data fetching from the start (use APIs, proxies, or scraping as needed)