web-clone
<Use_When>
- User provides a target URL and wants the site replicated as working code
- User says "clone site", "clone website", "copy webpage", or "web-clone"
- Task requires both visual fidelity AND functional parity with the original
- Reference is a live URL (not a static screenshot — use
$visual-verdictfor screenshot-only tasks) </Use_When>
<Do_Not_Use_When>
- User only has screenshot references without a live URL — use
$visual-verdictdirectly - User wants to modify, redesign, or "improve" the site — use standard implementation flow
- Target requires authentication, payment flows, or backend API parity — out of scope for v1
- Multi-page / multi-route deep cloning — v1 handles single-page scope only </Do_Not_Use_When>
<Scope_Limits> v1 scope: Single page clone of the provided URL.
Included:
- Layout structure (header, nav, content areas, sidebar, footer)
- Typography (font families, sizes, weights, line heights)
- Colors, spacing, borders, border-radius
- Core interactions: navigation links, buttons, form elements, dropdowns, modals, toggles
- Responsive hints from the extracted layout (flexbox/grid patterns)
Excluded:
- Backend API integration or data fetching
- Authentication flows or protected content
- Dynamic/personalized content (user-specific data)
- Multi-page crawling or route graph cloning
- Third-party widget functionality (maps, embeds, chat widgets)
- Image/asset replication (use placeholders for external images)
Legal notice: Only clone sites you own or have explicit permission to replicate. Respect copyright and trademarks. </Scope_Limits>
- Before first tool use, call
ToolSearch("browser")orToolSearch("playwright")to discover available browser tools. - If no browser tools are found, instruct the user:
Playwright MCP is required. Configure it: codex mcp add playwright npx "@playwright/mcp@latest" - Required tools:
browser_navigate,browser_snapshot,browser_take_screenshot,browser_evaluate,browser_wait_for. Optional:browser_click,browser_network_requests.
<Tool_Usage>
- Before first MCP tool use, call
ToolSearch("browser")orToolSearch("playwright")to discover deferred Playwright MCP tools. - If no browser tools are found, stop immediately and instruct the user to configure Playwright MCP.
- Use
browser_snapshot(accessibility tree) for structural understanding — it is far more token-efficient than screenshots. - Use
browser_take_screenshotonly when visual verification is needed (Pass 1 baseline, Pass 4 comparison). - Use
browser_evaluatefor DOM/style extraction — pass the scripts from this skill EXACTLY as written (do not modify them). - If running within ralph, use
state_write/state_readfor web-clone state persistence between iterations. - Skip Codex consultation for straightforward extraction; use it only if verification repeatedly fails on the same issue. </Tool_Usage>
<State_Management> Persist extraction and progress data so the pipeline can resume if interrupted.
- After Pass 1 completes: Write extraction summary to
.omx/state/{scope}/web-clone-extraction.jsoncontaining:target_url,extracted_attimestampscreenshot_path(path totarget-full.png)landmark_count(number of nav, main, footer, form elements)interactive_count(number of detected interactive elements)extraction_size_kb(approximate size of DOM extraction data)
- After each Pass 4 verification: Append the composite verdict to
.omx/state/{scope}/web-clone-verdicts.json. - When running within ralph: Also persist the
visualportion of the composite verdict to.omx/state/{scope}/ralph-progress.jsonfor ralph compatibility, mappingvisual.score→ top-levelscoreandvisual.verdict→ top-levelverdict. - On completion or failure: Write final status with
completed_atorfailed_attimestamp. </State_Management>
<Context_Budget> Pass 1 extraction can produce very large data. Apply these limits proactively:
- DOM tree: If the serialized JSON exceeds ~30KB, reduce
depthparameter from 8 to 4 and re-extract. Focus on top-level structure. - Accessibility snapshot: If it exceeds ~20KB, this is normal for complex pages. Summarize key landmarks rather than keeping the full tree.
- Interactive elements: Cap at 50 elements. If more exist, keep only visible ones (
isVisible: true). - Total extraction context: Aim for under 60KB combined. If exceeded, prioritize: screenshot > accessibility snapshot > interactive elements > DOM styles.
- Image tokens: Full-page screenshots are expensive. Take one baseline in Pass 1 and one comparison in Pass 4. Do not take screenshots between iterations unless debugging a specific region. </Context_Budget>
Pass 1 — Extract
Capture the target page's structure, styles, interactions, and visual baseline.
- Navigate:
browser_navigatetotarget_url. - Wait for render:
browser_wait_forwith appropriate condition (network idle or timeout of 5s) to ensure full render including lazy-loaded content. - Accessibility snapshot:
browser_snapshot— captures the semantic tree (roles, names, values, interactive states). This is your primary structural reference. - Full-page screenshot:
browser_take_screenshotwithfullPage: true— save as reference baselinetarget-full.png. - DOM + computed styles:
browser_evaluatewith the following script. COPY THIS SCRIPT EXACTLY — do not modify it:(() => { const walk = (el, depth = 0) => { if (depth > 8 || !el.tagName) return null; const cs = window.getComputedStyle(el); return { tag: el.tagName.toLowerCase(), id: el.id || undefined, classes: [...el.classList].slice(0, 5), styles: { display: cs.display, position: cs.position, width: cs.width, height: cs.height, padding: cs.padding, margin: cs.margin, fontSize: cs.fontSize, fontFamily: cs.fontFamily, fontWeight: cs.fontWeight, lineHeight: cs.lineHeight, color: cs.color, backgroundColor: cs.backgroundColor, border: cs.border, borderRadius: cs.borderRadius, flexDirection: cs.flexDirection, justifyContent: cs.justifyContent, alignItems: cs.alignItems, gap: cs.gap, gridTemplateColumns: cs.gridTemplateColumns, }, text: el.childNodes.length === 1 && el.childNodes[0].nodeType === 3 ? el.textContent?.trim().slice(0, 100) : undefined, children: [...el.children].map(c => walk(c, depth + 1)).filter(Boolean), }; }; return walk(document.body); })() - Interactive elements:
browser_evaluateto catalog all interactable elements. COPY THIS SCRIPT EXACTLY — do not modify it:(() => { const results = []; document.querySelectorAll( 'button, a[href], input, select, textarea, [role="button"], ' + '[onclick], [aria-haspopup], [aria-expanded], details, dialog' ).forEach(el => { results.push({ tag: el.tagName.toLowerCase(), type: el.type || el.getAttribute('role') || 'interactive', text: (el.textContent || '').trim().slice(0, 80), href: el.href || undefined, ariaLabel: el.getAttribute('aria-label') || undefined, isVisible: el.offsetParent !== null, }); }); return results; })() - Network patterns (optional):
browser_network_requests— note XHR/fetch calls for reference. Do not attempt to replicate backends.
Keep all extraction results in working memory for Pass 2.
Pass 2 — Build Plan
Analyze extraction results and decompose into a component plan.
-
Identify page regions: From DOM tree + accessibility snapshot, identify major sections:
- Navigation bar / header
- Hero / banner section
- Main content area(s)
- Sidebar (if present)
- Footer
- Overlay elements (modals, drawers)
-
Map components: For each region, define:
- Component name and responsibility
- Key style properties (from computed styles)
- Content summary (headings, text, images)
- Child components if nested
-
Create interaction map: From interactive elements list:
- Navigation links → anchor tags with
href - Form elements → proper
<form>with inputs, labels, validation - Buttons → click handlers (toggle, submit, navigate)
- Dropdowns/modals → show/hide toggle with transitions
- Accordions/tabs → state-based visibility
- Navigation links → anchor tags with
-
Extract design tokens: Identify recurring values:
- Color palette (primary, secondary, background, text colors)
- Font stack (families, size scale, weight scale)
- Spacing scale (padding/margin patterns)
- Border radius values
-
Define file structure:
{output_dir}/ ├── index.html (or App.tsx / App.vue) ├── styles/ │ ├── globals.css (reset + tokens) │ └── components.css (or scoped styles) ├── scripts/ │ └── interactions.js (toggle, modal, dropdown logic) └── assets/ (placeholder images)Adapt to
tech_stackif specified (React components, Vue SFCs, etc.).
Pass 3 — Generate Clone
Implement the clone from the plan. Work component-by-component.
- Scaffold: Create the directory structure and base files.
- Design tokens first: Implement CSS custom properties or Tailwind config from extracted tokens.
- Layout shell: Build the page-level layout matching the original's flexbox/grid structure.
- Components: Implement each region top-down:
- Match DOM structure from extraction (semantic tags, landmark roles)
- Apply computed styles — prioritize layout properties, then typography, then decorative
- Use actual extracted text content; use placeholder
<img>for external images
- Interactions: Wire up detected behaviors:
- Navigation: working
<a>tags (to#anchors or stubs for v1) - Forms: proper structure with
<label>, input types, placeholder text - Toggles: JavaScript for dropdowns, modals, accordions
- Hover/focus states: CSS transitions matching original behavior
- Navigation: working
- Responsive: If the original uses responsive breakpoints (detectable from media queries in computed styles or from viewport behavior), add basic responsive rules.
Pass 4 — Verify
Compare the clone against the original across three dimensions.
-
Serve the clone: Start a local server for the generated project:
npx serve {output_dir} -l 3456 --no-clipboardIf
npx serveis unavailable, fall back to:python3 -m http.server 3456 -d {output_dir}. The clone will be accessible athttp://localhost:3456. -
Visual verification:
- Navigate to the clone with Playwright:
browser_navigateto clone URL. - Take full-page screenshot of clone.
- Run
$visual-verdictwith:reference_images=["target-full.png"],generated_screenshot="clone-full.png",category_hint="web-clone". - The visual portion of the verdict feeds directly into the composite verdict below.
- Visual pass threshold: score >= 85.
- Navigate to the clone with Playwright:
-
Structural verification: Compare landmark counts:
- Count
<nav>,<main>,<footer>,<form>,<button>,<a>in both original and clone. - Structure passes when all major landmarks exist (missing landmarks = fail).
- Count
-
Functional spot-check: Test 2–3 detected interactions via Playwright:
- Click a navigation link → verify URL change or scroll behavior
- Toggle a dropdown/modal → verify visibility change
- Interact with a form field → verify it accepts input
- Use
browser_clickandbrowser_snapshotto verify state changes.
-
Emit composite verdict:
{
"visual": {
"score": 82,
"verdict": "revise",
"category_match": true,
"differences": ["Header spacing tighter than original"],
"suggestions": ["Increase nav gap to 24px"]
},
"functional": {
"tested": 3,
"passed": 2,
"failures": ["Dropdown does not open on click"]
},
"structure": {
"landmark_match": true,
"missing": [],
"extra": []
},
"overall_verdict": "revise",
"priority_fixes": [
"Fix dropdown toggle interaction",
"Increase header nav spacing"
]
}
Pass 5 — Iterate
Fix highest-impact issues and re-verify.
- Prioritize fixes by impact: layout > interactions > spacing > typography > colors.
- Apply targeted edits: Fix only the issues listed in
priority_fixes. Do not refactor working code. - Re-verify: Repeat Pass 4.
- Loop: Continue until
overall_verdictispassOR max 5 iterations reached. - Final report: Summarize what was successfully cloned, any remaining differences, and elements that could not be replicated.
<Output_Contract> After each verification pass, emit a composite web-clone verdict JSON:
{
"visual": {
"score": 0,
"verdict": "revise",
"category_match": false,
"differences": ["..."],
"suggestions": ["..."],
"reasoning": "short explanation"
},
"functional": {
"tested": 0,
"passed": 0,
"failures": ["..."]
},
"structure": {
"landmark_match": false,
"missing": ["..."],
"extra": ["..."]
},
"overall_verdict": "revise",
"priority_fixes": ["..."]
}
Rules:
visualfollows theVisualVerdictshape from$visual-verdictfunctional.tested/passedare counts;failureslist specific interaction failuresstructure.landmark_matchistruewhen all major HTML landmarks (nav, main, footer, forms) are presentoverall_verdict:passwhen visual.score >= 85 AND functional.failures is empty AND structure.landmark_match is truepriority_fixes: ordered by impact, drives the next iteration </Output_Contract>
<Iteration_Thresholds>
- Visual pass: score >= 85
- Functional pass: zero failures on tested interactions
- Structure pass: all major landmarks present
- Overall pass: all three dimensions pass
- Max iterations: 5 (report best achieved result if threshold not met) </Iteration_Thresholds>
<Error_Handling>
- Playwright MCP unavailable: Stop. Instruct user to configure it. Do not attempt to clone without browser tools.
- Page fails to load: Report the URL and HTTP status. Suggest the user verify the URL is accessible.
- browser_evaluate returns empty: The page may use heavy client-side rendering. Wait longer (
browser_wait_forwith extended timeout) and retry once. - Visual score stuck below threshold after 3 iterations: Report the current state as best-effort. List the unresolved differences for the user.
- Extraction data too large for context: Truncate deep DOM branches (depth > 6). Focus on top-level structure and defer nested details to iteration fixes. </Error_Handling>
Pass 1: Navigate to HN. Extract: table-based layout, orange (#ff6600) nav bar, story list with links + points + comments, footer. Screenshot saved.
Pass 2: Regions: nav bar (logo + links), story table (30 rows × title + meta), footer. Tokens: orange #ff6600, gray #828282, Verdana font, 10pt base. Interaction map: story links (external), comment links, "more" pagination.
Pass 3: Generate index.html with HN-style table layout, CSS matching extracted colors/fonts, working <a> tags for stories.
Pass 4: Visual score=78 (font size off, spacing between stories too tight). Functional 2/2 (links work). Structure match=true.
Pass 5 iteration 1: Fix font to Verdana 10pt, increase row padding → score=88. Functional 2/2. Structure match. → overall_verdict: pass. Done.
<Final_Checklist>
- Pass 1 extraction completed and summarized (screenshot + accessibility tree + DOM styles + interactions)
- Pass 2 component plan created with file structure
- Pass 3 clone generated and files written to
output_dir - Clone serves locally without errors
- Pass 4 composite verdict emitted with all three dimensions
-
overall_verdictispass, or max 5 iterations reached with best-effort report - When in ralph: visual verdict persisted to
ralph-progress.json - Extraction summary persisted to
web-clone-extraction.json</Final_Checklist>