agent-discovery
Agent Discovery
This skill is about an opinionated blend of traditional SEO, static truth in the initial response, and UX ergonomics for agents and operators.
Don't reduce this to "AI SEO".
If a page can't be crawled, cited, parsed, navigated, or actioned by an agent harness, it is broken in a way normal SEO reports won't fully show.
Thesis
Treat agent discoverability as four stacked layers:
- Search baseline — crawlable pages, clean robots policy, sitemap, structured data, canonical answers, freshness.
- Initial-response truth — the important facts must be present in the first HTML response.
- Machine-readable projections — markdown/text/JSON surfaces that project the same canonical resource without drift.
- Operator ergonomics — the site should be easy to drive from pi, OpenCode, Claude Code, ChatGPT, or a browser agent without guesswork.
If layer 1 is broken, you won't get found. If layer 2 is broken, agents won't extract the truth. If layer 3 is broken, harnesses waste tokens scraping HTML. If layer 4 is broken, operators and browser agents hit dead ends.
When to Use
Use this skill when the task mentions:
- agent SEO / AEO / GEO / LLM SEO / AAIO
- agent discoverability
- optimize site for ChatGPT / Claude / Perplexity / Copilot
- llms.txt / sitemap.md / markdown endpoints / content negotiation
- AGENTS.md / coding-agent docs / operator docs
- agent-friendly docs / machine-readable docs
- browser-agent UX / agent automation / accessible automation
- making a website easier for agent harnesses to use
Core Rules
1. Do the boring SEO work first
Before fancy protocols, verify:
- robots.txt allows the crawlers you actually want
- sitemap exists and stays current
- pages that should never rank use noindex
- titles, descriptions, H1s, and headings agree on the topic
- JSON-LD matches visible content exactly
- author/date/source signals are visible where trust matters
- stale pages get refreshed, redirected, or archived
Useful mental model from the audit side:
- crawlability and indexation
- technical foundations
- on-page clarity
- content quality / trust
- authority / citations / mentions
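A minimal sketch of the robots check, assuming Python's standard urllib.robotparser and a made-up robots.txt body (the rules and URLs are illustrative, not any real site's policy). One caveat: robotparser applies the first matching rule in file order, which may not match how a specific crawler resolves Allow/Disallow conflicts.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; in practice, fetch your own site's file.
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""


def _parse(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawl_report(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, per user-agent token, whether robots.txt permits fetching url."""
    parser = _parse(robots_txt)
    return {agent: parser.can_fetch(agent, url) for agent in agents}


def sitemaps(robots_txt: str) -> list[str]:
    """Return the Sitemap: URLs advertised in robots.txt (Python 3.8+)."""
    return _parse(robots_txt).site_maps() or []


report = crawl_report(
    ROBOTS_TXT,
    "https://example.com/posts/hello",
    ["Googlebot", "OAI-SearchBot", "GPTBot"],
)
print(report)
print(sitemaps(ROBOTS_TXT))
```

Run it against the crawlers you actually want, not a generic list. A crawler you meant to allow showing up as False is a layer 1 failure.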
2. Initial HTML is the truth surface
The important facts must survive:
- curl
- a text browser
- no-JS mode
- cheap retrieval pipelines
Don't hide core facts behind:
- client-only fetches
- tabs and accordions with empty initial HTML
- modal-only disclosures
- images or PDFs with no HTML equivalent
- click handlers on divs pretending to be controls
Static rendering is a principle, not a framework fetish.
A static shell with small dynamic holes is fine. A blank SPA shell is dogshit for both search and agents.
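A rough sketch of the "does the fact survive without JS" test, using only Python's standard html.parser. The sample pages and the fact string are made up. It treats text inside script and style tags as invisible, which approximates what a no-JS retriever sees.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def missing_facts(html: str, facts: list[str]) -> list[str]:
    """Return the facts that do not appear in the page's visible text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return [fact for fact in facts if fact.lower() not in text]


# Hypothetical responses: a static page and a blank SPA shell.
STATIC_PAGE = "<html><body><h1>Pricing</h1><p>Pro plan: $20 per month</p></body></html>"
SPA_SHELL = (
    '<html><body><div id="root"></div>'
    '<script>window.__DATA__ = {"price": "$20 per month"}</script></body></html>'
)

print(missing_facts(STATIC_PAGE, ["$20 per month"]))
print(missing_facts(SPA_SHELL, ["$20 per month"]))
```

The SPA shell technically ships the price, but only inside a script payload. That counts as missing here, on purpose.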
3. One resource, multiple truthful projections
Give the same resource multiple machine-friendly shapes:
- human page → text/html
- markdown twin or negotiated markdown → text/markdown
- API / structured route → application/json
- text hint surface (llms.txt, index text) → text/plain
The rule is projection, not duplication.
Do not maintain three separate truths for HTML, markdown, and JSON. Project them from one canonical content source.
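One way to picture projection versus duplication: a single canonical record and three pure renderers keyed by MIME type. This is a sketch under assumptions. The Post fields and function names are hypothetical and do not describe any particular site's content model.

```python
import json
from dataclasses import asdict, dataclass
from html import escape


@dataclass(frozen=True)
class Post:
    """Hypothetical canonical record; every projection reads from this."""
    slug: str
    title: str
    summary: str
    updated: str  # ISO 8601 date


def to_html(post: Post) -> str:
    return (
        f"<article><h1>{escape(post.title)}</h1>"
        f"<p>{escape(post.summary)}</p>"
        f'<time datetime="{post.updated}">{post.updated}</time></article>'
    )


def to_markdown(post: Post) -> str:
    return f"# {post.title}\n\n{post.summary}\n\nUpdated: {post.updated}\n"


def to_json(post: Post) -> str:
    return json.dumps(asdict(post), sort_keys=True)


PROJECTIONS = {
    "text/html": to_html,
    "text/markdown": to_markdown,
    "application/json": to_json,
}


def project(post: Post, mime: str) -> str:
    """Render the canonical record in the requested shape."""
    return PROJECTIONS[mime](post)


POST = Post(
    slug="one-resource",
    title="One resource",
    summary="Many truthful projections.",
    updated="2026-01-15",
)

for mime in PROJECTIONS:
    print(mime, "->", project(POST, mime)[:40])
```

Because every renderer reads the same record, changing the summary changes all three surfaces at once. Drift needs a second source to exist, and here there isn't one.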
4. llms.txt is a hint surface, not a ranking hack
Use llms.txt or similar text hints as:
- a fast discovery point
- a cheap orientation surface
- a pointer to better machine-readable endpoints
Do not claim it boosts ranking by itself. Google has been explicit: AI discovery does not require special AI-only markup.
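A small sketch of generating the hint surface from data instead of hand-editing it. The layout (H1 title, blockquote summary, H2 sections of links) follows the general shape of the public llms.txt proposal as commonly described, so treat that as an assumption. The site name, URLs, and notes are invented.

```python
def render_llms_txt(
    title: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render a plain-text hint file: title, summary, then linked sections."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site data; the point is that it points at better endpoints.
LLMS_TXT = render_llms_txt(
    "Example Site",
    "Posts and decision records, available as HTML and markdown.",
    {
        "Discovery": [
            ("Sitemap", "https://example.com/sitemap.md", "markdown index of all pages"),
            ("Feed", "https://example.com/feed.xml", "latest posts"),
        ],
        "API": [
            ("API root", "https://example.com/api", "JSON discovery endpoint"),
        ],
    },
)

print(LLMS_TXT)
```

Serve the result as text/plain. It orients an agent and hands it off to the real projections; it does nothing for ranking.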
5. AGENTS.md beats hoping skills trigger
For coding-agent surfaces, persistent repo context matters more than wishful tool invocation.
Vercel's evals are useful here:
- skills alone underperformed for general framework guidance
- explicit instructions improved triggering, but wording was fragile
- compressed AGENTS.md / repo-instruction context won for broad, always-on guidance
Use:
- AGENTS.md for persistent repo rules, paths, commands, retrieval hints
- skills for vertical, action-specific workflows
That split matters for pi, OpenCode, and Claude Code.
6. Accessibility is agent UX
Browser agents and automation stacks lean on the accessibility tree.
Prefer:
- real <a> links with href
- real <button> elements
- labels on form controls
- autocomplete where data entry matters
- proper landmarks and heading hierarchy
- explicit UI state (aria-expanded, role=status, aria-live)
If Playwright can't find it by role or label, an agent harness will likely struggle too.
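A crude static lint for a few of the items above, sketched with Python's standard html.parser. It is not a substitute for axe or a real accessibility-tree inspection: it only sees inline onclick attributes (not listeners attached from JS), and it does not detect inputs labeled by a wrapping label element. The sample markup is invented.

```python
from html.parser import HTMLParser


class AgentUxAudit(HTMLParser):
    """Flag fake controls, dead links, and unlabeled form fields."""

    def __init__(self) -> None:
        super().__init__()
        self.issues: list[str] = []
        self.label_targets: set[str] = set()
        self.controls: list[tuple[str | None, bool]] = []  # (id, has ARIA name)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("div", "span") and "onclick" in a:
            self.issues.append(f"<{tag}> with onclick is not a real control")
        if tag == "a" and not a.get("href"):
            self.issues.append("<a> without href")
        if tag == "label" and a.get("for"):
            self.label_targets.add(a["for"])
        if tag in ("input", "select", "textarea") and a.get("type") != "hidden":
            named = bool(a.get("aria-label") or a.get("aria-labelledby"))
            self.controls.append((a.get("id"), named))

    def report(self) -> list[str]:
        issues = list(self.issues)
        for control_id, named in self.controls:
            if not named and control_id not in self.label_targets:
                issues.append(f"form control {control_id or '(no id)'} has no label")
        return issues


def audit(html: str) -> list[str]:
    parser = AgentUxAudit()
    parser.feed(html)
    parser.close()
    return parser.report()


BAD = '<div onclick="go()">Buy</div><a>Docs</a><input id="email" type="email">'
GOOD = (
    '<button type="button">Buy</button><a href="/docs">Docs</a>'
    '<label for="email">Email</label>'
    '<input id="email" type="email" autocomplete="email">'
)

print(audit(BAD))
print(audit(GOOD))
```

Anything this flags will also be hard to reach with role or label locators, which is the same surface a browser agent depends on.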
7. Operator UX matters too
An operator using an agent harness should not need to reverse-engineer your product.
Good patterns:
- stable, guessable URLs
- obvious markdown twins or negotiated markdown
- copyable commands and prompts
- JSON responses that advertise next steps (next_actions / affordances)
- machine-readable discovery routes (/api, /sitemap.md, etc.)
- deterministic MIME types
Bad patterns:
- opaque blobs of JSON with no next move
- downloadable markdown buried behind UI chrome
- hidden routes that only work if you already know them
- HTML fallback pretending to be markdown
- "click around and figure it out" operator flows
joelclaw Implementation Map
When you need concrete evidence, start here.
Crawl + discovery surfaces
- apps/web/app/robots.ts
  - allows crawl globally and advertises both XML and markdown sitemaps
- apps/web/app/sitemap.md/route.ts
  - markdown discovery index with posts, ADRs, feeds, and .md twins
- apps/web/app/llms.txt/route.ts
  - plain-text hint surface pointing agents to sitemap.md, feed.xml, and markdown access
Markdown projections
- apps/web/proxy.ts
  - canonicalizes /{slug}.md and rewrites to the markdown route handler
- apps/web/app/[slug]/md/route.ts
  - renders real text/markdown; charset=utf-8
  - prepends agent context
  - rewrites internal links to other .md twins
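The link-rewriting step is easy to get subtly wrong, so here is a sketch of the idea in Python. It is not the code in the route handler listed above; the regex and the to_md_twins name are hypothetical. It rewrites root-relative markdown links to their .md twins and leaves external links, protocol-relative links, and paths that already carry a file extension alone.

```python
import re

# Inline markdown links with a root-relative target: [text](/path?query#frag)
INTERNAL_LINK = re.compile(r"\[([^\]]+)\]\((/[^)\s#?]*)([^)\s]*)\)")


def to_md_twins(markdown: str) -> str:
    """Point internal page links at their markdown twins."""

    def rewrite(match: re.Match) -> str:
        text, path, suffix = match.groups()
        last_segment = path.rstrip("/").rsplit("/", 1)[-1]
        # Skip the site root, protocol-relative URLs, and files like /feed.xml.
        if path == "/" or path.startswith("//") or "." in last_segment:
            return match.group(0)
        return f"[{text}]({path.rstrip('/')}.md{suffix})"

    return INTERNAL_LINK.sub(rewrite, markdown)


SOURCE = (
    "See [the ADR](/adrs/0001), the [feed](/feed.xml), "
    "[an external page](https://example.com/x), and [the intro](/post#intro)."
)

print(to_md_twins(SOURCE))
```

With this in place, an agent that starts on one markdown twin stays in markdown as it follows links, instead of bouncing back into HTML.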
Structured discovery + navigation
- apps/web/app/api/route.ts
  - API discovery endpoint with nextActions
- apps/web/app/api/search/route.ts
  - HATEOAS JSON search envelope with markdown snippets
- apps/web/components/clawmail-source-comment.tsx
  - source-visible navigation prompt telling agents which endpoints to hit and what MIME types to verify
Trust + rendering truth
- apps/web/app/[slug]/page.tsx
  - cached article shell, JSON-LD injection, visible metadata, copy-for-agent affordance
- apps/web/lib/jsonld.ts
  - BlogPosting / Blog / Person / BreadcrumbList helpers
- apps/web/lib/posts.ts
  - Convex-canonical content reads; HTML/markdown projections come from one source
- apps/web/components/copy-as-prompt.tsx
  - operator-facing affordance to grab a prompt directly into a harness
Static rendering example, not doctrine
- apps/web/next.config.ts sets cacheComponents: true
- apps/web/app/[slug]/page.tsx uses 'use cache', cacheLife, and cacheTag for fast static shells with truthful invalidation
Use those as examples of the principle:
- cached shell
- canonical source
- small dynamic seams
- honest machine projections
Not as a claim that Next.js is the only valid way.
Verification Checklist
Crawl + indexing
curl -s https://example.com/robots.txt
curl -I -A 'OAI-SearchBot/1.3' https://example.com/
curl -I -A 'Googlebot' https://example.com/
Check Search Console and Bing Webmaster Tools too.
Initial-response truth
curl -sL https://example.com/page | rg -n 'important fact|<h1|<table|application/ld\+json'
lynx -dump https://example.com/page
If curl cannot see the fact, many agents will not either.
Markdown / text / JSON projections
curl -I https://example.com/sitemap.md
curl -I https://example.com/page.md
curl -sS https://example.com/api | jq .
Verify exact MIME types:
- text/markdown; charset=utf-8
- text/plain; charset=utf-8
- application/json; charset=utf-8
If a markdown route returns text/html, treat it as broken.
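The MIME check is worth automating so it cannot quietly regress. A sketch, assuming a simple path convention (.md is markdown, .txt is plain text, /api is JSON, everything else is HTML), which is an assumption and not a rule from any spec. Header parsing uses the standard library's email.message.Message.

```python
from email.message import Message


def expected_type(path: str) -> str:
    """Map a path to the media type it should serve (assumed convention)."""
    if path.endswith(".md"):
        return "text/markdown"
    if path.endswith(".txt"):
        return "text/plain"
    if path == "/api" or path.startswith("/api/"):
        return "application/json"
    return "text/html"


def check_projection(path: str, content_type: str) -> list[str]:
    """Return a list of problems with a response's Content-Type header."""
    header = Message()
    header["Content-Type"] = content_type
    problems = []
    want = expected_type(path)
    got = header.get_content_type()
    if got != want:
        problems.append(f"expected {want}, got {got}")
    charset = header.get_content_charset()
    if charset != "utf-8":
        problems.append(f"expected charset utf-8, got {charset}")
    return problems


print(check_projection("/page.md", "text/markdown; charset=utf-8"))
print(check_projection("/page.md", "text/html; charset=utf-8"))
print(check_projection("/api", "application/json"))
```

Feed it the Content-Type values from the curl -I output above. An empty list means the projection is honest about what it is.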
Accessibility and browser-agent UX
- run Lighthouse / axe
- inspect the accessibility tree
- write Playwright tests with getByRole / getByLabel
- smoke the key flows with a browser agent, not just curl
Measurement
Track:
- AI referrers
- citation presence for core queries
- community mentions / repeated phrasing in the wild
- crawl success by user-agent
- content refresh cadence (30 / 90 / 180 day review works fine)
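For crawl success by user-agent, a sketch that buckets access-log hits by crawler token and computes the share of non-error responses. The token list is illustrative and incomplete, the sample hits are invented, and treating any 2xx/3xx status as success is an assumption you may want to tighten.

```python
from collections import Counter

# Substring tokens to look for in the User-Agent header (illustrative list).
AGENT_TOKENS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Googlebot", "bingbot"]


def classify(user_agent: str) -> str:
    """Return the first known crawler token found in the UA, else 'other'."""
    ua = user_agent.lower()
    for token in AGENT_TOKENS:
        if token.lower() in ua:
            return token
    return "other"


def crawl_success(hits: list[tuple[str, int]]) -> dict[str, float]:
    """Map each crawler bucket to its fraction of 2xx/3xx responses."""
    total: Counter[str] = Counter()
    ok: Counter[str] = Counter()
    for user_agent, status in hits:
        name = classify(user_agent)
        total[name] += 1
        if 200 <= status < 400:
            ok[name] += 1
    return {name: ok[name] / total[name] for name in total}


# Hypothetical (user_agent, status) pairs pulled from an access log.
HITS = [
    ("Mozilla/5.0 (compatible; GPTBot/1.0)", 200),
    ("Mozilla/5.0 (compatible; GPTBot/1.0)", 403),
    ("Mozilla/5.0 (compatible; Googlebot/2.1)", 200),
    ("curl/8.0", 200),
]

print(crawl_success(HITS))
```

A crawler you allow in robots.txt but answer with 403s at the edge is still blocked; this is where that shows up.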
Anti-Patterns
- Over-indexing on framework-specific tricks instead of content truth
- Claiming llms.txt is the magic ranking lever
- Shipping agent protocols on top of broken crawlability
- Maintaining separate truths for HTML, markdown, and JSON
- Returning raw JSON with no next move
- Hiding important facts behind client-side interactivity
- Assuming accessibility is unrelated to agent automation
- Treating MCP as a substitute for honest routes and MIME types
Use This Mental Shortcut
Ask four questions:
- Can an indexer find it?
- Can a retriever extract the truth from the first response?
- Can a harness get a cheaper markdown/JSON version without scraping?
- Can an operator or browser agent actually drive the flow without guessing?
If any answer is no, fix that first.
That's the work.