illustrated-explainer-spec
Illustrated Explainer Spec Implementation Guide
Skill by ara.so — Daily 2026 Skills collection.
What This Project Is
A spec (not a library) for building a locally-run single-page web app where:
- User types a topic → AI generates a 16:9 watercolor-style illustrated explainer page
- User clicks anywhere on the image → AI generates a "drill-into" next page for that spot
- This repeats infinitely, preserving painting style across all pages
- Content-addressed caching means identical queries/clicks never re-generate
The spec is stack-agnostic — you choose the framework, image model API, and language. This skill shows you how to implement it end-to-end.
Architecture Overview
Browser (thin client)
  └── POST /api/page ──► Server
                           ├── hash → check disk cache
                           ├── composite red marker onto parent image
                           ├── call image model (text + optional image)
                           └── write PNG → return page object
Page Object Shape
interface Page {
id: string; // deterministic hash
imageUrl: string; // e.g. "/generated/abc123.png"
parentId: string | null;
parentClick: { x: number; y: number } | null; // normalized 0–1
initialQuery: string | null; // only on page 1
}
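For concreteness, here are the two variants this shape allows (the ids below are made-up placeholder hashes, not real output). A first page carries initialQuery and no parent; a child page carries the opposite:

```javascript
// Illustrative Page objects; the ids are made-up placeholders.
// Invariant: exactly one of initialQuery / parentId is non-null.
const firstPage = {
  id: '0f3c9a1b2d4e5f60718293a4b5c6d7e8',
  imageUrl: '/generated/0f3c9a1b2d4e5f60718293a4b5c6d7e8.png',
  parentId: null,
  parentClick: null,
  initialQuery: 'how volcanoes work',
};

const childPage = {
  id: '9b8a7c6d5e4f30211203f4e5d6c7b8a9',
  imageUrl: '/generated/9b8a7c6d5e4f30211203f4e5d6c7b8a9.png',
  parentId: firstPage.id,
  parentClick: { x: 0.62, y: 0.41 }, // normalized: fractions of width/height
  initialQuery: null,
};

console.log(childPage.parentId === firstPage.id); // true
```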
Recommended Stack (Node.js + Google Gemini)
mkdir explainer && cd explainer
npm init -y
npm install express sharp @google/generative-ai dotenv
Note: crypto is a Node.js built-in (do not install the deprecated npm package of the same name), and cors is unnecessary here because the client is served from the same origin.
.env
GEMINI_API_KEY=your_key_here
CACHE_VERSION=v1
PORT=3000
Server Implementation
server.js — Full Reference Implementation
import express from 'express';
import fs from 'fs';
import path from 'path';
import crypto from 'crypto';
import sharp from 'sharp';
import { GoogleGenerativeAI } from '@google/generative-ai';
import 'dotenv/config';
const app = express();
app.use(express.json());
app.use(express.static('public'));
app.use('/generated', express.static('public/generated'));
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const VERSION = process.env.CACHE_VERSION || 'v1';
const GENERATED_DIR = path.join('public', 'generated');
fs.mkdirSync(GENERATED_DIR, { recursive: true });
// ── Content-addressed IDs ──────────────────────────────────────────────────
function hashFirstPage(query) {
const normalized = query.trim().replace(/\s+/g, ' ').toLowerCase();
return crypto
.createHash('sha256')
.update(`first${VERSION}${normalized}`)
.digest('hex')
.slice(0, 32);
}
function hashChildPage(parentId, x, y) {
const rx = Math.round(x * 100) / 100;
const ry = Math.round(y * 100) / 100;
return crypto
.createHash('sha256')
.update(`child${VERSION}${parentId}${rx}${ry}`)
.digest('hex')
.slice(0, 32);
}
// ── Style description (single source of truth) ────────────────────────────
const STYLE_DESCRIPTION = `Painting style (must remain consistent across every page):
- Light warm paper background with generous margins
- Clean, even dark gray or black ink outlines, consistent thin line weight
- Soft watercolor washes, pale palette: ivory, pale green, pale blue, light gray, with restrained warm accents
- A large serif title printed at the top center of the image
- Calm, well-composed scene with breathing room
Strict exclusions:
- No decorative borders, seals, parchment aging, ornate fonts, or vintage texture
- No 3D render, photorealism, neon, dark themes, or modern app UI cards
- No dense paragraphs of text, watermarks, or tiny unreadable labels
- No tourist map roads, landmarks, transit, or "traveler-guide" framing`;
function firstPagePrompt(query) {
return `${STYLE_DESCRIPTION}
Subject: ${query}
Compose a single 16:9 illustrated explainer page about the subject above.
Let the scene's content (objects, layout, metaphor) be whatever best
explains the subject — cross-section, exploded view, timeline, anatomy,
flow, comparison, or scene — chosen to fit this specific topic.
Output a single PNG image, 16:9. Print the title clearly inside the image.`;
}
const CHILD_PAGE_PROMPT = `${STYLE_DESCRIPTION}
You are continuing an illustrated explainer book.
The provided image is the previous page. A red circle marks
the area the reader pointed at.
Generate the next page: a single 16:9 image that goes deeper
into whatever the red circle is on — zoom in, expand its inner
structure, or show its mechanism.
Critical: match the painting style of the provided image exactly
— same line weight, same paper tone, same pastel palette, same
title typography. The two pages must feel like consecutive spreads
in the same hand-drawn book.
Do NOT include the red circle or any cursor mark in the output.
Output a single PNG image, 16:9.`;
// ── Red marker compositing ─────────────────────────────────────────────────
async function compositeRedMarker(imagePath, nx, ny) {
const img = sharp(imagePath);
const { width, height } = await img.metadata();
const cx = Math.round(nx * width);
const cy = Math.round(ny * height);
const radius = Math.round(width * 0.04);
// Build SVG marker: outer ring + inner dot
const svg = `<svg width="${width}" height="${height}" xmlns="http://www.w3.org/2000/svg">
<circle cx="${cx}" cy="${cy}" r="${radius}" fill="rgba(220,30,30,0.25)" stroke="red" stroke-width="${Math.max(3, radius * 0.15)}"/>
<circle cx="${cx}" cy="${cy}" r="${Math.round(radius * 0.3)}" fill="red" stroke="white" stroke-width="2"/>
</svg>`;
return img
.composite([{ input: Buffer.from(svg), blend: 'over' }])
.png()
.toBuffer();
}
// ── Image model call ───────────────────────────────────────────────────────
async function callImageModel(prompt, referenceImageBuffer = null) {
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash-preview-image-generation' });
const parts = [{ text: prompt }];
if (referenceImageBuffer) {
parts.push({
inlineData: {
mimeType: 'image/png',
data: referenceImageBuffer.toString('base64'),
},
});
}
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 120_000);
try {
const result = await model.generateContent(
  {
    contents: [{ role: 'user', parts }],
    // The image-generation preview models reject image-only output;
    // request both modalities and extract the image part below.
    generationConfig: { responseModalities: ['TEXT', 'IMAGE'] },
  },
  // Wire the abort signal in; without it the 120s timeout cancels nothing.
  { signal: controller.signal },
);
clearTimeout(timeout);
const candidates = result.response.candidates ?? [];
for (const candidate of candidates) {
for (const part of candidate.content?.parts ?? []) {
if (part.inlineData?.mimeType?.startsWith('image/')) {
return Buffer.from(part.inlineData.data, 'base64');
}
}
}
throw new Error('No inline image in model response');
} catch (err) {
clearTimeout(timeout);
throw err;
}
}
// ── Generation queue (serialize requests) ─────────────────────────────────
let queue = Promise.resolve();
function enqueue(fn) {
queue = queue.then(fn, fn); // always advance queue even on error
return queue;
}
// ── Core generation logic ──────────────────────────────────────────────────
async function generatePage(id, prompt, referenceImageBuffer) {
const outPath = path.join(GENERATED_DIR, `${id}.png`);
// Cache hit
if (fs.existsSync(outPath) && fs.statSync(outPath).size > 0) {
return `/generated/${id}.png`;
}
const imageBytes = await callImageModel(prompt, referenceImageBuffer);
fs.writeFileSync(outPath, imageBytes);
return `/generated/${id}.png`;
}
// ── /api/page endpoint ────────────────────────────────────────────────────
const ID_REGEX = /^[0-9a-f]{32}$/;
app.post('/api/page', async (req, res) => {
const { query, parentId, parentClick } = req.body;
// Validate
if (query !== undefined) {
if (typeof query !== 'string' || query.trim().length < 1 || query.length > 300) {
return res.status(400).json({ error: 'query must be 1–300 chars' });
}
} else {
if (!ID_REGEX.test(parentId)) {
return res.status(400).json({ error: 'invalid parentId' });
}
const { x, y } = parentClick ?? {};
if (
typeof x !== 'number' || typeof y !== 'number' ||
!isFinite(x) || !isFinite(y) ||
x < 0 || x > 1 || y < 0 || y > 1
) {
return res.status(400).json({ error: 'x and y must be finite floats in [0, 1]' });
}
}
try {
const page = await enqueue(async () => {
if (query !== undefined) {
// First page
const trimmed = query.trim();
const id = hashFirstPage(trimmed);
const imageUrl = await generatePage(id, firstPagePrompt(trimmed), null);
return { id, imageUrl, parentId: null, parentClick: null, initialQuery: trimmed };
} else {
// Child page
const id = hashChildPage(parentId, parentClick.x, parentClick.y);
const parentPath = path.join(GENERATED_DIR, `${parentId}.png`);
if (!fs.existsSync(parentPath)) {
throw new Error('Parent image not found');
}
const markedBuffer = await compositeRedMarker(parentPath, parentClick.x, parentClick.y);
const imageUrl = await generatePage(id, CHILD_PAGE_PROMPT, markedBuffer);
return { id, imageUrl, parentId, parentClick, initialQuery: null };
}
});
res.json({ page });
} catch (err) {
console.error(err);
res.status(500).json({ error: 'Generation failed, try clicking elsewhere.' });
}
});
app.listen(process.env.PORT || 3000, () => {
console.log(`Explainer running on http://localhost:${process.env.PORT || 3000}`);
});
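The enqueue() pattern above is the server's entire concurrency story, so it is worth sanity-checking in isolation: a later task must still run after an earlier one rejects. A standalone sketch (no server needed):

```javascript
// Same pattern as enqueue() in server.js: each task starts only after
// the previous one settles, including when an earlier task rejects.
let queue = Promise.resolve();
function enqueue(fn) {
  queue = queue.then(fn, fn); // advance on success and on failure
  return queue;
}

const order = [];
enqueue(async () => { order.push('a'); });
enqueue(async () => { order.push('b'); throw new Error('boom'); });
enqueue(async () => { order.push('c'); }); // still runs after the failure
queue.then(() => console.log(order.join(''))); // "abc"
```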
Client Implementation
public/index.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>Drill-Down Explainer</title>
<style>
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: system-ui, sans-serif; background: #1a1a1a; color: #eee; display: flex; flex-direction: column; height: 100vh; }
#topbar { display: flex; align-items: center; gap: 12px; padding: 10px 16px; background: #111; flex-shrink: 0; }
#appname { font-weight: bold; font-size: 1.1rem; }
#counter { font-size: 0.85rem; color: #aaa; margin-right: auto; }
#topic-input { flex: 1; padding: 6px 10px; border-radius: 6px; border: none; font-size: 1rem; max-width: 420px; }
button { padding: 6px 14px; border-radius: 6px; border: none; cursor: pointer; font-size: 0.9rem; background: #444; color: #eee; }
button:disabled { opacity: 0.4; cursor: not-allowed; }
#generate-btn { background: #4a7eff; color: #fff; }
#canvas-area { flex: 1; position: relative; display: flex; align-items: center; justify-content: center; overflow: hidden; }
#page-img { max-width: 100%; max-height: 100%; cursor: crosshair; display: block; }
#loading-overlay { position: absolute; inset: 0; background: rgba(0,0,0,0.55); align-items: center; justify-content: center; font-size: 1.2rem; display: none; }
#error-banner { background: #b00; color: #fff; padding: 6px 16px; font-size: 0.9rem; display: none; flex-shrink: 0; }
#thumbstrip { display: flex; gap: 8px; padding: 8px 16px; background: #111; overflow-x: auto; flex-shrink: 0; min-height: 72px; }
.thumb { width: 96px; height: 54px; object-fit: cover; border-radius: 4px; cursor: pointer; border: 2px solid transparent; flex-shrink: 0; }
.thumb.active { border-color: #4a7eff; }
.ripple { position: absolute; border-radius: 50%; background: rgba(255,80,80,0.5); transform: scale(0); animation: ripple 0.6s ease-out forwards; pointer-events: none; width: 40px; height: 40px; margin: -20px; }
@keyframes ripple { to { transform: scale(3); opacity: 0; } }
</style>
</head>
<body>
<div id="topbar">
<span id="appname">🔍 Explainer</span>
<span id="counter"></span>
<input id="topic-input" placeholder="Type a topic…" maxlength="300" />
<button id="generate-btn">Generate</button>
<button id="back-btn" disabled>← Back</button>
<button id="reset-btn" disabled>Reset</button>
</div>
<div id="error-banner"></div>
<div id="canvas-area">
<img id="page-img" style="display:none" alt="Explainer page" />
<div id="loading-overlay">Generating the next page…</div>
</div>
<div id="thumbstrip"></div>
<script>
const state = { pages: [], currentIndex: -1, loading: false };
const topicInput = document.getElementById('topic-input');
const generateBtn = document.getElementById('generate-btn');
const backBtn = document.getElementById('back-btn');
const resetBtn = document.getElementById('reset-btn');
const counter = document.getElementById('counter');
const canvasArea = document.getElementById('canvas-area');
const pageImg = document.getElementById('page-img');
const loadingOverlay = document.getElementById('loading-overlay');
const errorBanner = document.getElementById('error-banner');
const thumbstrip = document.getElementById('thumbstrip');
function showError(msg) {
errorBanner.textContent = msg;
errorBanner.style.display = 'block';
}
function clearError() { errorBanner.style.display = 'none'; }
function setLoading(val) {
state.loading = val;
loadingOverlay.style.display = val ? 'flex' : 'none';
topicInput.disabled = val;
generateBtn.disabled = val;
}
function render() {
const { pages, currentIndex } = state;
const hasCurrent = currentIndex >= 0;
const current = hasCurrent ? pages[currentIndex] : null;
counter.textContent = hasCurrent ? `${currentIndex + 1} / ${pages.length}` : '';
backBtn.disabled = currentIndex <= 0;
resetBtn.disabled = pages.length === 0;
pageImg.style.display = current ? 'block' : 'none';
if (current) pageImg.src = current.imageUrl;
// Thumbnails
thumbstrip.innerHTML = '';
pages.forEach((p, i) => {
const img = document.createElement('img');
img.className = 'thumb' + (i === currentIndex ? ' active' : '');
img.src = p.imageUrl;
img.title = `Page ${i + 1}`;
img.addEventListener('click', () => { state.currentIndex = i; render(); });
thumbstrip.appendChild(img);
});
if (thumbstrip.lastChild) {
thumbstrip.lastChild.scrollIntoView({ inline: 'end', behavior: 'smooth' });
}
}
async function postPage(body) {
const res = await fetch('/api/page', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(body),
});
if (!res.ok) {
const err = await res.json().catch(() => ({}));
throw new Error(err.error || 'Server error');
}
return (await res.json()).page;
}
generateBtn.addEventListener('click', async () => {
const query = topicInput.value.trim();
if (!query || state.loading) return;
clearError();
setLoading(true);
try {
const page = await postPage({ query });
state.pages = [page];
state.currentIndex = 0;
render();
} catch (e) { showError(e.message); }
finally { setLoading(false); }
});
topicInput.addEventListener('keydown', e => { if (e.key === 'Enter') generateBtn.click(); });
pageImg.addEventListener('click', async (e) => {
if (state.loading || state.currentIndex < 0) return;
const rect = pageImg.getBoundingClientRect();
const x = (e.clientX - rect.left) / rect.width;
const y = (e.clientY - rect.top) / rect.height;
// Ripple animation
const ripple = document.createElement('div');
ripple.className = 'ripple';
ripple.style.left = e.clientX - canvasArea.getBoundingClientRect().left + 'px';
ripple.style.top = e.clientY - canvasArea.getBoundingClientRect().top + 'px';
canvasArea.appendChild(ripple);
setTimeout(() => ripple.remove(), 700);
const current = state.pages[state.currentIndex];
clearError();
setLoading(true);
try {
const page = await postPage({ parentId: current.id, parentClick: { x, y } });
// Truncate forward history, append new
state.pages = state.pages.slice(0, state.currentIndex + 1);
state.pages.push(page);
state.currentIndex = state.pages.length - 1;
render();
} catch (e) { showError(e.message); }
finally { setLoading(false); }
});
backBtn.addEventListener('click', () => {
if (state.currentIndex > 0) { state.currentIndex--; render(); }
});
resetBtn.addEventListener('click', () => {
state.pages = []; state.currentIndex = -1;
topicInput.value = '';
clearError();
render();
});
render();
</script>
</body>
</html>
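The click handler's pixel-to-normalized conversion (and the server's inverse in compositeRedMarker) is plain arithmetic, so it can be checked without a browser. A minimal sketch of both directions:

```javascript
// Client side: pixel click inside the rendered <img>, mapped to [0, 1].
function toNormalized(clientX, clientY, rect) {
  return {
    x: (clientX - rect.left) / rect.width,
    y: (clientY - rect.top) / rect.height,
  };
}

// Server side: normalized point back to pixels in the full-size image.
function toPixels(n, width, height) {
  return { cx: Math.round(n.x * width), cy: Math.round(n.y * height) };
}

const rect = { left: 100, top: 50, width: 960, height: 540 }; // displayed size
const n = toNormalized(580, 320, rect); // click at the displayed midpoint
console.log(n); // { x: 0.5, y: 0.5 }
console.log(toPixels(n, 1792, 1024)); // { cx: 896, cy: 512 }
```

Because the coordinates are normalized, the displayed image size never has to match the generated image size.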
Configuration
Environment Variables
| Variable | Required | Description |
|---|---|---|
| GEMINI_API_KEY | Yes | Google Gemini API key |
| CACHE_VERSION | No | Bump to invalidate all caches (default: v1) |
| PORT | No | Server port (default: 3000) |
package.json
{
"type": "module",
"scripts": {
"start": "node server.js",
"dev": "node --watch server.js"
}
}
Running
# Set your API key
export GEMINI_API_KEY=your_key_here
# Start server
npm start
# Dev mode (auto-restart on file change)
npm run dev
Alternative: OpenAI gpt-image-1
If using OpenAI instead of Gemini, replace callImageModel:
import OpenAI, { toFile } from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
async function callImageModel(prompt, referenceImageBuffer = null) {
  if (!referenceImageBuffer) {
    // First page — text-to-image
    const res = await openai.images.generate({
      model: 'gpt-image-1',
      prompt,
      size: '1536x1024', // widest landscape size gpt-image-1 supports (3:2, nearest to 16:9)
    });
    // gpt-image-1 always returns base64; it does not accept a response_format parameter
    return Buffer.from(res.data[0].b64_json, 'base64');
  }
  // Child page — image edit with the marked parent as reference
  const imageFile = await toFile(referenceImageBuffer, 'parent.png', { type: 'image/png' });
  const res = await openai.images.edit({
    model: 'gpt-image-1',
    image: imageFile,
    prompt,
    size: '1536x1024',
  });
  return Buffer.from(res.data[0].b64_json, 'base64');
}
Common Patterns
Invalidating the Cache
Bump CACHE_VERSION in .env:
CACHE_VERSION=v2
All new requests will compute new hashes and regenerate. Old files in public/generated/ can be deleted manually.
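For example, assuming the default directory layout, this clears every cached render (each page regenerates on its next request):

```shell
rm -f public/generated/*.png
```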
Inspecting Cached Files
ls public/generated/ # all generated PNGs, named by hash
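The cache hit test in generatePage is nothing more than "a non-empty file with that hash exists on disk". A standalone sketch of the same discipline with a fake generator in a temp directory (no model involved):

```javascript
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'explainer-'));
let modelCalls = 0;

// Same check as generatePage: an existing non-empty file is a hit.
function cachedGenerate(id) {
  const outPath = path.join(dir, `${id}.png`);
  if (fs.existsSync(outPath) && fs.statSync(outPath).size > 0) {
    return outPath; // cache hit: the "model" is skipped
  }
  modelCalls++; // stands in for callImageModel
  fs.writeFileSync(outPath, Buffer.from('fake-png-bytes'));
  return outPath;
}

cachedGenerate('abc123');
cachedGenerate('abc123'); // second call is a disk hit
console.log(modelCalls); // 1
```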
Testing Cache Hit (no model call)
# Generate once
curl -X POST http://localhost:3000/api/page \
-H 'Content-Type: application/json' \
-d '{"query":"how volcanoes work"}'
# Second call — check server logs, model should NOT be called
curl -X POST http://localhost:3000/api/page \
-H 'Content-Type: application/json' \
-d '{"query":"how volcanoes work"}'
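The same property can also be checked offline: the normalization inside hashFirstPage means whitespace and case variants collapse to one cache id. A standalone sketch replicating that hash with CACHE_VERSION=v1:

```javascript
import crypto from 'node:crypto';

const VERSION = 'v1'; // matches CACHE_VERSION=v1

// Same normalization and key construction as hashFirstPage in server.js.
function hashFirstPage(query) {
  const normalized = query.trim().replace(/\s+/g, ' ').toLowerCase();
  return crypto.createHash('sha256')
    .update(`first${VERSION}${normalized}`)
    .digest('hex')
    .slice(0, 32);
}

console.log(hashFirstPage('How   Volcanoes Work') === hashFirstPage('  how volcanoes work  ')); // true
console.log(hashFirstPage('how volcanoes work') === hashFirstPage('how geysers work')); // false
```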
Testing Child Page
# Use the id from a first-page response
curl -X POST http://localhost:3000/api/page \
-H 'Content-Type: application/json' \
-d '{"parentId":"<id-from-first-page>","parentClick":{"x":0.5,"y":0.5}}'
Acceptance Checklist
From the spec §12 — verify each:
- "how volcanoes work" → watercolor-style page, title inside image, no map elements
- "how a smartphone is built" → cross-section/exploded view, same style
- Click visible object → next page drills into that object, style matches
- Drill 5 pages deep → style stays consistent across all pages
- Back button returns to previous page; thumbnails jump without network request
- Reset clears state back to empty input
- Restart server, type same query → returns instantly (disk cache hit)
- Two rapid clicks → second request waits for first to complete (check server logs)
Troubleshooting
| Problem | Fix |
|---|---|
| `No inline image in model response` | Model returned text only; verify the model name supports image output and that responseModalities requests both `'TEXT'` and `'IMAGE'` (the preview models reject image-only output) |
| Style drifts across pages | Ensure STYLE_DESCRIPTION is one const — never duplicated or paraphrased in prompts |
| Red marker not visible on dark images | Increase ring radius (width * 0.05) or add white stroke on outer ring |
| Second click fires before first finishes | Check serialization queue — both requests must be inside enqueue() |
| Cache miss after server restart | Verify CACHE_VERSION hasn't changed and public/generated/ is not being cleaned on start |
| `Parent image not found` 500 error | Client sent a parentId for a page whose PNG was deleted; clear state and start over |
| Images too slow | Add a lightweight loading progress bar; generation typically takes 10–30s per page |