spa-reverse-engineer
SPA Reverse Engineering — React + Vite + Workbox + CDP
Reverse engineer modern SPAs to extract APIs, intercept service workers, debug runtime state, and build tooling.
When to use
Use this skill when:
- Analyzing perplexity.ai SPA internals (React component tree, state, hooks)
- Intercepting Workbox service worker caching and request strategies
- Using Chrome DevTools Protocol (CDP) to automate browser interactions
- Building Chrome extensions for traffic interception or state extraction
- Debugging Vite-bundled source maps and module graph
- Extracting GraphQL/REST schemas from SPA network layer
- Writing Puppeteer/Playwright scripts for automated API discovery
Instructions
Step 1: Identify SPA Stack
Detect the technology stack of the target SPA:
// In DevTools Console:
// React detection
window.__REACT_DEVTOOLS_GLOBAL_HOOK__ // React DevTools presence
document.querySelector('#__next') // Next.js
document.querySelector('#root') // Vite/CRA
document.querySelector('#app') // Vue (for comparison)
// Vite detection
document.querySelector('script[type="module"]') // ESM modules
// Check source for /@vite/client or /.vite/ paths
// Workbox / Service Worker
navigator.serviceWorker.getRegistrations() // List SWs
// Check Application → Service Workers in DevTools
// State management
window.__REDUX_DEVTOOLS_EXTENSION__ // Redux
// React DevTools → Components → hooks for Zustand/Jotai/Recoil
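The same checks can be approximated offline against the raw HTML of a page, which helps when triaging many targets before opening DevTools. The following is a minimal sketch; the `STACK_MARKERS` table and the `detect_stack` helper are illustrative names, not part of any library, and the marker list is an assumption that will need tuning per target.

```python
import re

# Hypothetical marker table: label -> regex searched in the raw HTML.
STACK_MARKERS = {
    "next.js": re.compile(r'id="__next"|__NEXT_DATA__|/_next/static/'),
    "vite-dev": re.compile(r"/@vite/client"),
    "esm-modules": re.compile(r'<script[^>]+type="module"'),
    "service-worker": re.compile(r"serviceWorker\.register"),
}


def detect_stack(html: str) -> list[str]:
    """Return the sorted labels whose marker appears in the HTML."""
    return sorted(label for label, rx in STACK_MARKERS.items() if rx.search(html))


sample = '<div id="__next"></div><script src="/_next/static/chunks/main.js"></script>'
print(detect_stack(sample))
```

A missing marker is not proof of absence: service workers are often registered from a bundled chunk rather than from inline HTML, so the runtime checks above remain the reference.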
Step 2: React Internals Analysis
Component Tree Extraction
// Get the React fiber attached to a DOM element.
// Rendered elements carry __reactFiber$<id>; the root container carries __reactContainer$<id> (React 17+).
function getFiber(element) {
  const key = Object.keys(element).find(k =>
    k.startsWith('__reactFiber$') ||
    k.startsWith('__reactContainer$') ||
    k.startsWith('__reactInternalInstance$') // React 16
  );
  return key ? element[key] : null;
}
// Walk fiber tree
function walkFiber(fiber, depth = 0) {
  if (!fiber) return;
  const name = fiber.type?.displayName || fiber.type?.name || fiber.type;
  if (typeof name === 'string') {
    console.log(' '.repeat(depth) + name);
  }
  walkFiber(fiber.child, depth + 1);
  walkFiber(fiber.sibling, depth);
}
// Start from the root container (#root for Vite/CRA, #__next for Next.js)
const root = document.getElementById('root') || document.getElementById('__next');
if (root) walkFiber(getFiber(root));
State & Props Extraction
// Extract component state via fiber
function getComponentState(fiber) {
const state = [];
let hook = fiber.memoizedState;
while (hook) {
state.push(hook.memoizedState);
hook = hook.next;
}
return state;
}
// Find specific component by name
function findComponent(fiber, name) {
if (!fiber) return null;
if (fiber.type?.name === name || fiber.type?.displayName === name) {
return fiber;
}
return findComponent(fiber.child, name) || findComponent(fiber.sibling, name);
}
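Once a fiber tree has been exported from the browser, the same child/sibling traversal can run offline. The sketch below assumes a hypothetical JSON dump in which each node is a dict with `name`, `state`, `child`, and `sibling` keys. Live fibers contain cycles and cannot be passed to `JSON.stringify` directly, so producing such a dump needs a custom serializer that is not shown here.

```python
from typing import Any, Iterator, Optional

# Hypothetical dump shape: {"name": str, "state": list, "child": dict, "sibling": dict}
Fiber = dict[str, Any]


def iter_fibers(root: Optional[Fiber]) -> Iterator[tuple[int, Fiber]]:
    """Yield (depth, fiber) in the same pre-order as the recursive walkFiber."""
    stack = [(0, root)]
    while stack:
        depth, fiber = stack.pop()
        if fiber is None:
            continue
        yield depth, fiber
        # Push the sibling first so the child subtree is visited before it.
        stack.append((depth, fiber.get("sibling")))
        stack.append((depth + 1, fiber.get("child")))


def find_component(root: Optional[Fiber], name: str) -> Optional[Fiber]:
    """Return the first fiber whose name matches, or None."""
    for _, fiber in iter_fibers(root):
        if fiber.get("name") == name:
            return fiber
    return None


dump = {
    "name": "App",
    "child": {
        "name": "Header",
        "sibling": {"name": "SearchBox", "state": ["query text"]},
    },
}
print([(depth, fiber["name"]) for depth, fiber in iter_fibers(dump)])
print(find_component(dump, "SearchBox")["state"])
```

The explicit stack avoids Python's recursion limit on long sibling chains, which large lists tend to produce.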
Step 3: Vite Bundle Analysis
Source Map Extraction
# Find source maps from bundled assets
curl -s https://www.perplexity.ai/ | grep -oP 'src="\K[^"]*\.js(?=")' | while read -r url; do
  echo "Checking: $url"
  # Assumes root-relative script paths; absolute URLs must not get the host prefix
  curl -sI "https://www.perplexity.ai${url}.map" | head -5
done
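Appending `.map` to the script URL is a convention rather than a guarantee. Bundlers normally declare the map location in a trailing `//# sourceMappingURL=` comment, and resolving that comment against the script URL is usually more reliable. A minimal sketch follows; `source_map_url` is an illustrative helper name and the example URLs are placeholders.

```python
import re
from typing import Optional
from urllib.parse import urljoin

_SOURCE_MAP_RE = re.compile(r"//[#@]\s*sourceMappingURL=(\S+)\s*$")


def source_map_url(script_url: str, script_text: str) -> Optional[str]:
    """Resolve the map URL declared in the last lines of a script, if any."""
    for line in reversed(script_text.strip().splitlines()[-5:]):
        match = _SOURCE_MAP_RE.search(line)
        if match:
            target = match.group(1)
            if target.startswith("data:"):
                return None  # inline map: nothing separate to fetch
            return urljoin(script_url, target)
    return None


js = "console.log(1);\n//# sourceMappingURL=index-abc.js.map\n"
print(source_map_url("https://example.com/assets/index-abc.js", js))
```

Production builds frequently strip the comment or withhold the map file, so a `None` result or a 404 on the resolved URL is the common case.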
Module Graph
// In Vite dev mode (if accessible):
// Vite core serves no module-graph URL; if vite-plugin-inspect is installed, /__inspect/ shows the module graph and per-plugin transforms
// In production, analyze chunks:
// Network → JS files → Initiator column (request chain)
// Sources → Webpack/Vite tree → module paths
Step 4: Service Worker & Workbox Interception
Analyze Caching Strategy
// List all cached URLs
async function listCaches() {
const names = await caches.keys();
for (const name of names) {
const cache = await caches.open(name);
const keys = await cache.keys();
console.log(`Cache: ${name} (${keys.length} entries)`);
keys.forEach(k => console.log(` ${k.url}`));
}
}
// Intercept SW fetch events (from SW scope)
self.addEventListener('fetch', event => {
console.log('[SW Intercept]', event.request.method, event.request.url);
});
Workbox Strategy Detection
// Common Workbox strategies to look for in SW source:
// - CacheFirst → Static assets (fonts, images)
// - NetworkFirst → API calls (dynamic data)
// - StaleWhileRevalidate → Frequently updated content
// - NetworkOnly → Always fresh (auth endpoints)
// - CacheOnly → Offline-only content
// Check SW source for workbox patterns:
// workbox.strategies.CacheFirst
// workbox.routing.registerRoute
// workbox.precaching.precacheAndRoute
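Scanning a downloaded service worker file for these names can be automated. The sketch below counts strategy class names in the source text; `count_strategies` is an illustrative helper, and the sample string is made up. It assumes the class names survive bundling, which is not guaranteed: a minified build may rename them, so a zero count does not prove a strategy is unused.

```python
import re
from collections import Counter

STRATEGIES = (
    "CacheFirst",
    "NetworkFirst",
    "StaleWhileRevalidate",
    "NetworkOnly",
    "CacheOnly",
)
_STRATEGY_RE = re.compile(r"\b(" + "|".join(STRATEGIES) + r")\b")


def count_strategies(sw_source: str) -> Counter:
    """Count occurrences of each Workbox strategy name in SW source text."""
    return Counter(_STRATEGY_RE.findall(sw_source))


sw_sample = (
    "registerRoute(/\\.png$/, new workbox.strategies.CacheFirst());"
    "registerRoute('/api', new NetworkFirst());"
    "registerRoute(/\\.woff2$/, new CacheFirst());"
)
print(dict(count_strategies(sw_sample)))
```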
Step 5: Chrome DevTools Protocol (CDP)
Automated Interception via CDP
import asyncio
from playwright.async_api import async_playwright

async def intercept_with_cdp():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()
        page = await context.new_page()

        # Open a CDP session bound to this page
        cdp = await page.context.new_cdp_session(page)

        # Network.enable covers both HTTP and WebSocket events; one call is enough
        await cdp.send('Network.enable')
        cdp.on('Network.requestWillBeSent', lambda params:
            print(f"[CDP] {params['request']['method']} {params['request']['url']}")
        )
        cdp.on('Network.responseReceived', lambda params:
            print(f"[CDP] {params['response']['status']} {params['response']['url']}")
        )

        # WebSocket frames (the frame payload lives under the 'response' key)
        cdp.on('Network.webSocketFrameSent', lambda params:
            print(f"[WS→] {params['response']['payloadData'][:200]}")
        )
        cdp.on('Network.webSocketFrameReceived', lambda params:
            print(f"[←WS] {params['response']['payloadData'][:200]}")
        )

        await page.goto('https://www.perplexity.ai/')
        await page.wait_for_timeout(60000)  # keep the page open for 60 s of capture
        await browser.close()

asyncio.run(intercept_with_cdp())
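Printing every event gets noisy quickly. One option is to append each `Network.requestWillBeSent` payload to a list and reduce it to a method-to-paths map afterwards. The sketch below shows only that reduction step; `summarize_requests` is an illustrative helper, and the sample URLs and paths are placeholders rather than observed endpoints.

```python
from urllib.parse import urlsplit


def summarize_requests(events: list[dict]) -> dict[str, set[str]]:
    """Group request paths by HTTP method from Network.requestWillBeSent params."""
    endpoints: dict[str, set[str]] = {}
    for params in events:
        request = params["request"]
        # Drop the query string so repeated calls collapse to one endpoint.
        path = urlsplit(request["url"]).path or "/"
        endpoints.setdefault(request["method"], set()).add(path)
    return endpoints


# Hypothetical captured events, trimmed to the fields used above.
events = [
    {"request": {"method": "GET", "url": "https://example.com/rest/threads?limit=5"}},
    {"request": {"method": "POST", "url": "https://example.com/rest/ask"}},
    {"request": {"method": "GET", "url": "https://example.com/rest/threads?limit=10"}},
]
print(summarize_requests(events))
```

In the capture script, the collection side would be a handler such as `cdp.on('Network.requestWillBeSent', events.append)`.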
Runtime JS Evaluation via CDP
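# Execute JS in page context (reuses the cdp session from the previous example)
import json

result = await cdp.send('Runtime.evaluate', {
    # "?? null" keeps the value JSON-serializable when __NEXT_DATA__ is absent
    'expression': 'JSON.stringify(window.__NEXT_DATA__ ?? null)',
    'returnByValue': True,
})
next_data = json.loads(result['result']['value'])  # None when the page is not Next.js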
Step 6: Chrome Extension Development
Manifest v3 Extension for Traffic Capture
{
"manifest_version": 3,
"name": "pplx-sdk Traffic Capture",
"version": "1.0",
"permissions": [
"webRequest", "activeTab", "storage", "debugger"
],
"host_permissions": ["https://www.perplexity.ai/*"],
"background": {
"service_worker": "background.js"
},
"content_scripts": [{
"matches": ["https://www.perplexity.ai/*"],
"js": ["content.js"],
"run_at": "document_start"
}]
}
Background Script — Request Interception
// background.js
chrome.webRequest.onBeforeRequest.addListener(
  (details) => {
    // The URL filter below already limits events to /rest/ endpoints
    console.log('[pplx-capture]', details.method, details.url);
    const bytes = details.requestBody?.raw?.[0]?.bytes;
    if (!bytes) return; // no body, or the body was supplied as a file
    const text = new TextDecoder().decode(new Uint8Array(bytes));
    let body;
    try {
      body = JSON.parse(text);
    } catch {
      body = text; // keep non-JSON bodies as raw text
    }
    chrome.storage.local.set({
      // requestId avoids key collisions between requests in the same millisecond
      [`req_${details.requestId}`]: {
        url: details.url,
        method: details.method,
        body,
        timestamp: Date.now()
      }
    });
  },
  { urls: ["https://www.perplexity.ai/rest/*"] },
  ["requestBody"]
);
Content Script — React State Extraction
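// content.js: inject a hook into the page context
// Caveats: this only works when the page exposes a global React (bundled apps usually do not,
// and at document_start it may not be loaded yet), it only sees class components,
// and a strict CSP can block inline scripts.
const script = document.createElement('script');
script.textContent = `
  (() => {
    if (typeof React === 'undefined' || !React.Component) return;
    // Hook into React state updates
    const origSetState = React.Component.prototype.setState;
    React.Component.prototype.setState = function (state, cb) {
      try {
        window.postMessage({
          type: 'PPLX_STATE_UPDATE',
          component: this.constructor.name,
          // Updater functions serialize to undefined, so fall back to null
          state: JSON.parse(JSON.stringify(state) ?? 'null')
        }, '*');
      } catch (err) { /* non-serializable state: skip reporting */ }
      return origSetState.call(this, state, cb);
    };
  })();
`;
document.documentElement.appendChild(script);
script.remove();
// Listen for state updates posted by this page only
window.addEventListener('message', (event) => {
  if (event.source === window && event.data?.type === 'PPLX_STATE_UPDATE') {
    chrome.runtime.sendMessage(event.data);
  }
});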
Step 7: Map Discoveries to SDK
| SPA Discovery | SDK Target | Action |
|---|---|---|
| React component state | domain/models.py | Model the state shape |
| API fetch calls | transport/http.py | Add endpoint methods |
| SSE event handlers | transport/sse.py | Map event types |
| Service worker cache | shared/ | Understand caching behavior |
| Auth token flow | shared/auth.py | Token refresh logic |
| WebSocket frames | transport/ | New WebSocket transport |
| GraphQL queries | domain/ | Query/mutation services |
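For the "model the state shape" action, a quick first pass is to reduce a captured JSON payload to its type skeleton before writing model classes by hand. The sketch below is one way to do that; `infer_shape` is an illustrative helper and the sample payload is invented, not a real response.

```python
from typing import Any


def infer_shape(value: Any) -> Any:
    """Describe a JSON value by type names, recursing into objects and arrays."""
    if isinstance(value, dict):
        return {key: infer_shape(item) for key, item in value.items()}
    if isinstance(value, list):
        # Describe a list by the shape of its first element, if it has one.
        return [infer_shape(value[0])] if value else []
    if value is None:
        return "null"
    return type(value).__name__


# Hypothetical captured payload.
payload = {
    "query": "hello",
    "sources": [{"url": "https://example.com", "rank": 1}],
    "done": False,
    "error": None,
}
print(infer_shape(payload))
```

Sampling only the first list element is a deliberate simplification; heterogeneous arrays or optional fields need several captures merged before the shape can be trusted.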
Step 8: SPA Source Code Graph
After runtime analysis, build a static code graph of the SPA source. Delegate to codegraph for structural analysis.
Source Map Recovery
# Extract original source paths from source maps
# \K drops the src=" prefix so only the path reaches the loop
curl -s https://www.perplexity.ai/ | grep -oP 'src="\K/[^"]*\.js(?=")' | while read -r url; do
  echo "Checking: $url" >&2  # progress goes to stderr so it stays out of the sorted list
  curl -s "https://www.perplexity.ai${url}.map" 2>/dev/null | \
    python3 -c "import sys,json; d=json.load(sys.stdin); print('\n'.join(d.get('sources',[])))" 2>/dev/null
done | sort -u
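The raw `sources` entries usually carry bundler prefixes and relative segments, and they mix application code with dependencies. A small normalizer makes the list easier to compare with a source tree. The sketch below assumes `webpack://<namespace>/` style prefixes; `original_sources` is an illustrative helper and the sample paths are invented.

```python
import json
import re

# Strips an optional webpack://<namespace>/ prefix, then any leading ./ or ../ segments.
_PREFIX_RE = re.compile(r"^(?:webpack://[^/]*/)?(?:\.{1,2}/)*")


def original_sources(map_text: str, include_deps: bool = False) -> list[str]:
    """Return normalized, de-duplicated source paths from a source map."""
    data = json.loads(map_text)
    paths = set()
    for source in data.get("sources", []):
        path = _PREFIX_RE.sub("", source)
        if include_deps or "node_modules/" not in path:
            paths.add(path)
    return sorted(paths)


# Hypothetical source map content.
map_text = json.dumps({
    "version": 3,
    "sources": [
        "webpack://_N_E/./src/App.tsx",
        "webpack://_N_E/./node_modules/react/index.js",
        "../../src/hooks/useThread.ts",
    ],
    "mappings": "",
})
print(original_sources(map_text))
```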
Static Analysis (from recovered source or public repo)
# Component tree from source
grep -rn "export \(default \)\?function \|export const .* = (" src/ --include="*.tsx" --include="*.jsx"
# Import graph
grep -rn "import .* from " src/ --include="*.ts" --include="*.tsx" | \
awk -F: '{print $1 " → " $NF}' | sort -u
# Hook usage map
grep -rn "use[A-Z][a-zA-Z]*(" src/ --include="*.tsx" | \
grep -oP 'use[A-Z][a-zA-Z]*' | sort | uniq -c | sort -rn
# API call sites (fetch, axios, etc.)
grep -rn "fetch(\|axios\.\|api\.\|apiClient\." src/ --include="*.ts" --include="*.tsx"
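The grep-based import graph breaks on multi-line import statements. When the recovered sources are already loaded in memory, a regex that tolerates line breaks gives a more complete edge list. The sketch below is one approach; `import_graph` is an illustrative helper, the sample files are invented, and dynamic `import()` calls and `require()` are deliberately out of scope.

```python
import re

# Matches `import x from 'mod'`, `import { a,\n b } from "mod"` and `import 'mod'`.
_IMPORT_RE = re.compile(
    r"""^\s*import\s+(?:[^'"]+?\s+from\s+)?['"]([^'"]+)['"]""",
    re.MULTILINE,
)


def import_graph(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each file path to the sorted module specifiers it imports."""
    return {
        path: sorted(set(_IMPORT_RE.findall(text)))
        for path, text in files.items()
    }


# Hypothetical recovered sources.
files = {
    "src/App.tsx": (
        "import React from 'react';\n"
        "import {\n  Thread,\n  ThreadList\n} from './Thread';\n"
        "import './app.css';\n"
    ),
    "src/Thread.tsx": 'import { useState } from "react";\n',
}
print(import_graph(files))
```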
Cross-Reference: Runtime ↔ Static
| Runtime Discovery (spa-expert) | Static Discovery (codegraph) | Cross-Reference |
|---|---|---|
| Fiber tree component names | Source component definitions | Match names to source files |
| Hook state values | Hook implementations | Map state shape to hook logic |
| Network API calls | fetch()/axios call sites | Confirm endpoints in source |
| Context provider values | createContext() definitions | Map runtime state to types |
| Service worker routes | Workbox config in source | Validate caching strategy |
Perplexity.ai SPA Notes
Known Stack
- Framework: Next.js (React 18+)
- Bundler: Webpack (via Next.js, not raw Vite — skill covers both for broader SPA RE)
- State: React hooks + context (observed patterns)
- Streaming: SSE via fetch() with ReadableStream
- Auth: Cookie-based (pplx.session-id)
Key DOM Selectors
// Query input
document.querySelector('textarea[placeholder*="Ask"]')
// Response area
document.querySelector('[class*="prose"]')
// Thread list
document.querySelector('[class*="thread"]')