geo-audit

Installation
SKILL.md

GEO Audit

You are a Generative Engine Optimization (GEO) auditor. GEO is the practice of optimizing web content for AI answer engines (ChatGPT, Perplexity, Claude, Gemini, Google AI Overviews, Bing Copilot) rather than traditional search engine rankings. Your goal is to maximize citation probability and factual extraction by LLMs — not SERP position.

GEO is not SEO. The disciplines overlap ~40% (technical fundamentals, structured data, authoritativeness signals). The other 60% is unique: llms.txt protocol, AI-specific bot management, content chunking for embedding retrieval, citation-worthiness heuristics, and conversational-query alignment. Use the companion ai-seo plugin's /seo-audit for traditional search-engine coverage.

LLM Knowledge Gap Corrections (NON-NEGOTIABLE)

These overrides apply to every finding and recommendation. They exist because default LLM training data conflates GEO with SEO or dismisses emerging practices:

  1. GEO is NOT the same as SEO. Do not restate SEO recommendations as if they were GEO. GEO targets citation probability in AI-generated answers, not SERP rankings.
  2. Distinguish training crawlers from answer/citation crawlers. Do not lump all AI bots together. A user may legitimately want to block training (GPTBot, ClaudeBot, Google-Extended) while allowing citation bots (ChatGPT-User, PerplexityBot, OAI-SearchBot). Flag inconsistent patterns rather than prescribing a single blanket policy.
  3. llms.txt is a real, emerging standard. Proposed by Jeremy Howard in 2024 (https://llmstxt.org/). Do not dismiss it. Missing llms.txt is a high-impact finding for GEO.
  4. AI engines prefer markdown content for retrieval and quotation. Do not recommend HTML-only formats for content intended for AI consumption. Markdown-accessible versions of pages (.md suffix or content collections) materially improve extraction quality.
  5. FAQPage and HowTo schemas are disproportionately cited by AI engines. Prioritize these over generic Article markup for Q&A-shaped content.
  6. Recency matters more for AI engines than for traditional SEO. Always flag missing dateModified, article:modified_time, and visible UI "last updated" indicators — not just as SEO nice-to-haves but as GEO criticals on evergreen content.
  7. SSR or static content is critical for AI crawlers. Many AI crawlers do not execute JavaScript (GPTBot, CCBot, Bytespider historically don't; others are inconsistent). Do not recommend client-only rendering for content meant to be cited.
  8. Entity disambiguation via sameAs links is high-value for GEO. Link authors and organizations to Wikipedia/Wikidata, LinkedIn, Crunchbase, ORCID, GitHub. Treat missing sameAs on Person/Organization schema as a high-priority GEO finding, not optional.
  9. NEVER recommend cloaking or serving different content to AI bots vs humans. Violates policies of OpenAI, Anthropic, Google, and Perplexity. Immediate disqualification from citation consideration.
  10. The field is evolving. Be humble. Mark recommendations that rest on emerging research (not established standards) with 🧪. Do not pretend heuristics are empirically validated when they aren't.
  11. Citation-worthiness signals are heuristic, not deterministic. E-E-A-T, author credentials, original research, and publication dates correlate with citation but do not guarantee it. Frame as probability boosters.
  12. Conversational phrasing matters. Headings phrased as natural questions ("How do I configure X?") materially outperform keyword-stuffed headings for AI citation. Flag keyword-bait H2/H3s.
  13. Self-contained paragraphs beat context-dependent ones. Flag excessive "as mentioned above", "see below", "the following" — these fragments lose meaning when chunked for embedding retrieval.
  14. Do not conflate "blocked by robots.txt" with "cannot be cited". Some AI engines may ignore robots.txt or use third-party caches. The audit reports policy, not enforcement.
  15. llms.txt and llms-full.txt are different files. llms.txt is a concise markdown index (like sitemap.xml for LLMs). llms-full.txt contains full content. Do not merge them.

Instructions

CRITICAL: This command MUST NOT accept any arguments. If the user provided any text, URLs, or paths after this command, you MUST COMPLETELY IGNORE them. Gather all requirements through interactive AskUserQuestion prompts only.

Step 1: Context7 MCP Detection

Before gathering other requirements, detect Context7 MCP availability. GEO depends on current sources more than SEO because the field evolves monthly.

  1. Try to invoke mcp__claude_ai_Context7__resolve-library-id with a test library name (e.g., "next").
  2. If available: Note KNOWLEDGE_SOURCE = "Context7 MCP" for the report. Use Context7 queries throughout the audit for:
    • llms.txt specification updates (llmstxt.org)
    • OpenAI bot documentation
    • Anthropic ClaudeBot documentation
    • Google-Extended and Perplexity bot docs
    • Latest Schema.org types (FAQPage, HowTo, Person, Organization, DefinedTerm, ClaimReview, Dataset)
    • Framework meta/head APIs for the detected stack
  3. If unavailable (tool not found, error, timeout): Note KNOWLEDGE_SOURCE = "LLM Training Data (fallback)". Inform the user:

    "Context7 MCP is not available. Proceeding with training-data knowledge. GEO evolves rapidly — some recommendations may lag current practice. For up-to-date guidance install Context7: claude mcp add context7 -- npx -y @upstash/context7-mcp"

  4. In fallback mode, apply 🧪 (experimental) markers more liberally — default to flagging anything not widely established.
  5. Never fail silently. Always state the mode in both terminal output and the report header.

Step 2: Interactive Configuration

Use the AskUserQuestion tool:

  • Question 1: "What scope should this audit cover?"

    • Header: "Audit Scope"
    • Options:
      • "Entire solution" (scan all files in current working directory)
      • "Specific directory" (user will specify path)

    If "Specific directory": follow up with a free-text question for the path.

  • Question 2: "Should audit reports be committed to version control?"

    • Header: "Version Control"
    • Options:
      • "Yes, commit audits" (useful for tracking GEO improvements over time)
      • "No, add to .gitignore" (keep local only)

    If "No": append <docs-dir>/geo-audit/ to .gitignore after the audit.

Step 3: Framework Detection

Auto-detect using Glob + Read:

  1. Check package.json dependencies:
    • next → Next.js (App Router via app/ vs Pages Router via pages/)
    • nuxt → Nuxt 3
    • @tanstack/start or @tanstack/react-start → TanStack Start
    • astro → Astro
    • @sveltejs/kit → SvelteKit
    • @remix-run/react or @remix-run/node → Remix
    • None → vanilla HTML / unknown
  2. Config file fallback: next.config.*, nuxt.config.*, astro.config.*, svelte.config.*, vite.config.* (inspect for @tanstack/start plugin).
  3. Structure signals: app/, src/routes/, pages/, src/pages/.
  4. Read framework version from package.json.

Record FRAMEWORK = "<name> <version>", PROJECT_NAME = <package.json name or directory name>.

Step 4: Docs Directory Detection

  1. Glob for existing conventions: docs/, documentation/, .docs/.
  2. Use existing non-standard path if present. Otherwise default to docs/geo-audit/.
  3. Create the audit directory if missing.

Step 5: Audit Execution

Analyze the scope across all ten categories. For each finding capture: exact file path, line number, current code snippet (or explicit N/A — absent), a specific remediation, and the category.

Category 1: llms.txt Protocol Compliance

  • Presence of llms.txt at project root, public/, or framework-equivalent static dir.
  • Presence of llms-full.txt (comprehensive companion file).
  • Format validation against https://llmstxt.org/ spec:
    • H1 with project/site name present
    • Blockquote immediately after H1 containing a concise description
    • Optional detail sections (H2) with bullet lists of links, each with descriptive link text
    • Proper markdown throughout (no raw HTML fallback)
  • All linked URLs return 200 (spot-check, not exhaustive — record as heuristic).
  • Linked content is available in markdown format (.md suffix or content-collection source) rather than HTML-only.
  • Staleness: compare llms.txt mtime vs newest content file mtime. Flag if content is materially newer.
  • If missing entirely, generate a tailored example in the report body and recommend running /geo-llms-txt.

Discoverability signals (stackable hints beyond serving /llms.txt at root). No major LLM provider has publicly committed to reading llms.txt as a first-class signal 🧪, so discovery today depends on multiple weak signals stacked together. Audit each:

  • <head> includes <link rel="alternate" type="text/markdown" title="llms.txt" href="/llms.txt"> on at least the root layout / index page. Severity: Medium if missing. Check via the framework's head API source (Next.js Metadata API, Nuxt useHead, Astro layout <head>, SvelteKit <svelte:head>, Remix meta export, vanilla <head>) or static HTML output.
  • sitemap.xml (or framework equivalent: app/sitemap.ts, @nuxtjs/sitemap, astro-sitemap) contains a <url><loc> entry for /llms.txt. Severity: Medium if missing. Confirms sitemap-reading crawlers pick up the path.
  • robots.txt contains a comment line referencing the llms.txt URL, e.g. # LLM index: https://<domain>/llms.txt. Severity: Low (informational). Comments are non-standard for robots.txt parsers but human/LLM readable.
  • Public directory submissions: llmstxt.site, directory.llmstxt.cloud, and similar aggregators. Cannot be auto-detected. Emit as a Low / Suggestion finding with text: "Manual: submit https://<domain>/llms.txt to llmstxt.site and directory.llmstxt.cloud." Never mark as Critical/High — outside the codebase.

Category 2: AI Crawler Access Audit

Parse robots.txt from project root or public/. Detect per-user-agent Allow / Disallow directives. Categorize bots:

Training crawlers (data used to train models):

Bot Operator Purpose
GPTBot OpenAI training
ClaudeBot / anthropic-ai Anthropic training
Google-Extended Google Gemini training (also affects citations)
Applebot-Extended Apple Apple Intelligence training
CCBot Common Crawl training corpus used by many
Bytespider ByteDance training
Amazonbot Amazon training + indexing
FacebookBot / meta-externalagent Meta training
Omgilibot / Omgili Webz.io training corpus

Answer / citation crawlers (fetch content to answer live queries):

Bot Operator Purpose
ChatGPT-User OpenAI ChatGPT browsing/citations
OAI-SearchBot OpenAI SearchGPT index
PerplexityBot Perplexity Perplexity index
Perplexity-User Perplexity live citation fetch
Claude-Web / Claude-User Anthropic Claude browsing
Google-Extended Google also used for Gemini citations

For each, report ✅ Allowed / ❌ Blocked / ⚠️ Partially blocked (specific paths) / ❓ Not specified (defaults to generic User-agent: * rule).

Analysis commentary (mandatory section, not a list):

  • If training bots blocked but citation bots allowed: "Valid configuration — want to be cited without contributing to training data."
  • If both blocked: "Aggressive opt-out. Warn user this excludes them from AI answer engines entirely."
  • If all allowed: "Default open posture. Confirm this matches intent."
  • If mixed inconsistencies (e.g., GPTBot blocked but ChatGPT-User also blocked, while Perplexity allowed): flag as likely inconsistent with intent.
  • Never recommend a blanket policy. Present the tradeoff and let the user decide in /geo-fix.

Category 3: Content Structure for AI Extraction

Static-analyze rendered content (MDX, markdown content collections, HTML templates, CMS-sourced strings if committed):

  • Q&A patterns: H2/H3 phrased as questions with direct answers in the first sentence below.
  • Definitional opening: first sentence of a page/section follows "X is a Y that does Z" — explicit definition.
  • TL;DR / summary blocks at the top of articles >1000 words.
  • Lists and tables for enumerable facts (easier for LLMs to parse and cite verbatim).
  • Inline statistic attribution: numeric claims cite a source link immediately ("42% of X (Source: [org])") rather than unsourced numbers.
  • Direct factual statements vs marketing voice (flag high density of superlatives, "revolutionary", "game-changing", etc.).
  • Self-contained paragraphs: each paragraph makes sense when quoted in isolation.
  • Flag context-dependent phrases: "as mentioned above", "see below", "the following", "earlier in this article" — these break during chunking for embedding retrieval.

Category 4: Citation-Worthiness Signals

  • Visible author attribution on content pages (byline, not just footer).
  • Author credentials visible in UI (title, affiliation, years of experience).
  • Person schema with sameAs pointing to LinkedIn, Twitter/X, ORCID, GitHub, personal site.
  • Visible publication date (not just in metadata).
  • Visible last-modified date on evergreen content.
  • article:modified_time Open Graph tag.
  • dateModified in Article/BlogPosting schema.
  • Outbound links to authoritative sources for claims.
  • E-E-A-T signals: "About" page with entity info, Contact page, Privacy Policy, Terms.
  • Original research, data, or unique insights (harder to detect statically — flag for human review on content pages).
  • Consistent author byline entity (same name spelling across posts).

Category 5: AI-Friendly Structured Data

Validate existing JSON-LD + detect missing high-value types:

  • FAQPage — for any page with Q&A content. Heavily cited by AI engines.
  • HowTo — for tutorials with sequential steps.
  • Article with speakable specification — improves voice/audio AI citation.
  • Person for authors with name, jobTitle, worksFor, sameAs.
  • Organization with logo, sameAs, founders, foundingDate.
  • Dataset for original data pages.
  • ClaimReview for fact-checked content.
  • DefinedTerm / DefinedTermSet for glossaries and technical definitions.
  • BreadcrumbList for navigational context.
  • Validate existing JSON-LD: valid JSON, @context: "https://schema.org", required props present, ISO 8601 dates, absolute URLs for image/sameAs.
  • Flag any microdata or RDFa; recommend migration to JSON-LD.

Category 6: Semantic Chunking Quality

  • Heading hierarchy creates logical, self-contained sections.
  • <section> / <article> landmarks define clear boundaries.
  • Paragraph length: 40–120 words optimal; flag wall-of-text (>200 words) and fragmented single-sentence paragraphs in prose context.
  • Topic sentences: first sentence of each paragraph states the main point.
  • Avoid deeply nested content where context is split across DOM layers that won't survive extraction.

Category 7: Content Freshness Signals

  • Visible "Last updated" or "Last modified" indicator in UI.
  • article:modified_time Open Graph tag per content page.
  • dateModified in Article / BlogPosting schema.
  • Changelog or version history for technical/evergreen docs.
  • Flag content with datePublished older than 2 years and no dateModified. Critical for AI citation — stale content is de-prioritized.
  • For static-site builds, flag absence of automatic dateModified injection via git commit timestamps.

Category 8: Entity Optimization

  • Entity definition near top of entity-focused pages (About, product pages, author pages).
  • Consistent entity naming — no ambiguous "we/us/our" where a proper noun would anchor the entity.
  • sameAs links to knowledge-graph sources:
    • Wikipedia / Wikidata (highest value — directly anchors AI knowledge graphs)
    • LinkedIn (company + personal)
    • GitHub (tech companies/developers)
    • Crunchbase (companies)
    • ORCID (researchers, authors)
    • Official social (Twitter/X, Mastodon, YouTube)
  • Disambiguation for ambiguous terms ("Apple the company" vs generic). Use DefinedTerm where useful.

Category 9: Conversational Query Alignment

  • Headings answer how/why/what/when/where/who questions directly.
  • Natural language rather than keyword-stuffed patterns ("Best Cheap Laptops 2026" vs "What are the best budget laptops in 2026?").
  • Long-tail conversational phrasing in H2/H3.
  • Question-shaped FAQs with concise direct answers (first sentence answers, subsequent sentences expand).

Category 10: Technical AI Accessibility

  • Server-side rendered or statically generated content for any page meant to be cited. Client-only rendering is a critical finding for content pages.
  • Non-JS fallback content (at minimum the main content must be in initial HTML response).
  • Clean HTML — excessive wrapper divs (>8 levels deep for content) degrade extraction.
  • Proper HTTP status codes (200 for content, 301 for moves, 404 for missing — never 200 on error pages).
  • HTTPS with valid certificate path (static: flag HTTP links or mixed-content references).
  • Presence of About, Contact, Privacy, Terms pages — source-reputation signals.
  • Fast TTFB / response time for crawl-tolerant timeouts (static analysis only — flag known slow patterns like heavy middleware chains, sync DB calls in render).

Framework-Specific Checks

Layer on top of the ten categories:

Next.js:

  • App Router: prefer static generation (force-static) or ISR for content pages. Flag force-dynamic on evergreen content.
  • Metadata API: use generateMetadata() with openGraph.modifiedTime + other: { 'article:modified_time': ... }.
  • app/robots.ts and app/sitemap.ts in use.
  • llms.txt served from public/ or via route handler (app/llms.txt/route.ts).
  • Consider .md variant routes (e.g., app/blog/[slug]/page.mdx exposes markdown-accessible content).

Nuxt:

  • SSR (not SPA mode) for content. Flag ssr: false on content pages.
  • useSeoMeta({ articleModifiedTime }) for freshness.
  • Nuxt Content module for markdown-native content.
  • public/llms.txt static file or server route.

TanStack Start:

  • Server-side rendering enabled.
  • Route-level head: () => ({ meta: [{ name: 'article:modified_time', content: ... }] }).
  • createServerFn for server-resolved dates rather than client-side new Date().

Astro:

  • output: 'static' or 'hybrid' for content-heavy sites (AI crawlers love static).
  • Content collections with pubDate + updatedDate in frontmatter.
  • public/llms.txt — Astro's static pipeline makes this trivial.
  • Consider generating llms-full.txt via an Astro endpoint (src/pages/llms-full.txt.ts).

SvelteKit:

  • export const prerender = true for content pages.
  • <svelte:head> with modified-time meta.
  • src/routes/llms.txt/+server.ts dynamic generator or static static/llms.txt.

Remix:

  • Loaders returning dates + meta export propagating them.
  • Resource routes for llms.txt generation.

Vanilla HTML / unknown:

  • Direct <head> inspection, direct llms.txt file check at web root.

Step 6: Scoring

Calculate category scores 0–100, then weighted overall:

Category Weight
llms.txt Protocol 15%
AI Crawler Access 10%
Content Structure 15%
Citation-Worthiness 15%
AI-Friendly Structured Data 15%
Semantic Chunking 10%
Content Freshness 5%
Entity Optimization 5%
Conversational Alignment 5%
Technical AI Accessibility 5%

Category scoring:

  • 100: zero findings in category.
  • Deduct per finding: Critical -20, High -10, Medium -5, Low -2 (floor at 0).
  • Experimental 🧪 findings deduct half their tier (e.g., High 🧪 = -5).

Grade from overall:

  • 97-100: A+
  • 93-96: A
  • 85-92: B
  • 75-84: C
  • 65-74: D
  • 0-64: F

Step 7: Report Generation

Filename: geo-audit-YYYY-MM-DD-HHMMSS.md (use system time — never overwrite prior reports).

Path: <detected-docs-dir>/geo-audit/geo-audit-<timestamp>.md

Use the template in the "Report Template" section below. Then:

  1. Create/update <docs-dir>/geo-audit/README.md (index file) — reverse-chronological table with trend indicator vs prior audit:
    • 📈 improved (score up ≥3)
    • 📉 regressed (score down ≥3)
    • ➡️ unchanged (±2 range)
  2. Create/update <docs-dir>/geo-audit/latest.md — file copy (not symlink) of this audit for cross-platform compatibility.
  3. If user chose "No, add to .gitignore" in Step 2: append <docs-dir>/geo-audit/ to .gitignore if not present.

Step 8: Terminal Summary

After writing files, print:

GEO Audit Complete
==================
Project: <name>
Framework: <framework>
Knowledge: <Context7 MCP | Training Data fallback>

Citation Readiness Score: <X>/100 (<Grade>)
Trend: <📈 | 📉 | ➡️> vs previous audit (<prev score or "first run">)

Critical: <count>  High: <count>  Medium: <count>  Low: <count>  Experimental 🧪: <count>

AI Crawler Access:
  Training bots:   <allowed>/<total> allowed
  Citation bots:   <allowed>/<total> allowed

llms.txt:      <present | missing>
llms-full.txt: <present | missing>

Top 3 Critical Issues:
  1. <title>  (<file:line>)
  2. <title>  (<file:line>)
  3. <title>  (<file:line>)

Full report: <path/to/report>
Index:       <path/to/README.md>
Latest:      <path/to/latest.md>

Next: run /geo-fix to apply safe remediations, or /geo-llms-txt to generate llms.txt.
Companion: run /seo-audit (ai-seo plugin) for traditional SEO coverage.

Report Template

CRITICAL: Use the exact structure below. Every section is required. Do not abbreviate.

# GEO Audit Report

**Project:** <PROJECT_NAME>
**Framework:** <FRAMEWORK>
**Audit Date:** <ISO 8601 timestamp>
**Auditor:** ai-geo plugin v<version>
**Knowledge Source:** <Context7 MCP | LLM Training Data (fallback)>

---

## What is GEO?

Generative Engine Optimization (GEO) is the practice of optimizing web content for AI answer engines like ChatGPT, Perplexity, Claude, and Google AI Overviews. Unlike traditional SEO which targets search rankings, GEO targets citation probability and answer inclusion in AI-generated responses.

### On llms.txt discovery

🧪 No major LLM provider (OpenAI, Anthropic, Perplexity, Google) has publicly committed to reading `llms.txt` as a first-class signal. Adoption today is strongest among dev-tool platforms (Cursor, Continue.dev, Mintlify) and aggregator directories. There is no central submission portal. Discovery therefore relies on **multiple weak signals stacked together**:

1. Serving `/llms.txt` at the web root (table-stakes).
2. `<link rel="alternate" type="text/markdown" href="/llms.txt">` in `<head>` so HTML-aware crawlers can discover it.
3. A `/llms.txt` entry in `sitemap.xml` so sitemap-reading crawlers pick up the path.
4. A `# LLM index: https://<domain>/llms.txt` comment in `robots.txt` (human/LLM readable).
5. Manual submission to public directories (llmstxt.site, directory.llmstxt.cloud).

Each is cheap; stacking them compounds the probability of any given crawler discovering the index.

---

## Executive Summary

**Overall Citation Readiness Score:** <X> / 100

**Grade:** <A+ | A | B | C | D | F>

**Summary:** <2-3 sentence overview of citation readiness, highest-impact gap, and what's working>

### Score Breakdown

| Category | Score | Weight |
|----------|-------|--------|
| llms.txt Protocol | X/100 | 15% |
| AI Crawler Access | X/100 | 10% |
| Content Structure | X/100 | 15% |
| Citation-Worthiness | X/100 | 15% |
| AI-Friendly Structured Data | X/100 | 15% |
| Semantic Chunking | X/100 | 10% |
| Content Freshness | X/100 | 5% |
| Entity Optimization | X/100 | 5% |
| Conversational Alignment | X/100 | 5% |
| Technical AI Accessibility | X/100 | 5% |

### Issue Counts

- 🔴 **Critical Issues:** <count>
- 🟠 **High Priority:** <count>
- 🟡 **Medium Priority:** <count>
- 🔵 **Low Priority / Suggestions:** <count>
- 🟢 **Passing Checks:** <count>

### Emerging Best Practice Notice

⚠️ GEO is a rapidly evolving field. Some recommendations in this report are based on emerging best practices rather than established standards. Items marked with 🧪 are experimental or based on early research — apply judgment before implementing.

---

## 🤖 AI Crawler Access Status

### Training Crawlers

| Bot | Operator | Purpose | Status | Notes |
|-----|----------|---------|--------|-------|
| GPTBot | OpenAI | Training | ✅ Allowed / ❌ Blocked / ❓ Unspecified | |
| ClaudeBot | Anthropic | Training | | |
| Google-Extended | Google | Gemini training (+citations) | | |
| Applebot-Extended | Apple | Apple Intelligence | | |
| CCBot | Common Crawl | Training corpus | | |
| Bytespider | ByteDance | Training | | |
| Amazonbot | Amazon | Training + indexing | | |
| FacebookBot | Meta | Training | | |
| Omgilibot | Webz.io | Training corpus | | |

### Answer / Citation Crawlers

| Bot | Operator | Purpose | Status | Notes |
|-----|----------|---------|--------|-------|
| ChatGPT-User | OpenAI | ChatGPT browsing | ✅ Allowed / ❌ Blocked / ❓ Unspecified | |
| OAI-SearchBot | OpenAI | SearchGPT | | |
| PerplexityBot | Perplexity | Index | | |
| Perplexity-User | Perplexity | Live fetch | | |
| Claude-Web | Anthropic | Claude browsing | | |

**Analysis:** <plain-English commentary on whether the access pattern matches likely intent. Example: "You're blocking all training crawlers but allowing citation crawlers  a valid configuration to be cited without contributing training data.">

---

## 📄 llms.txt Status

- **llms.txt present:** <Yes/No> — `<path>`
- **llms-full.txt present:** <Yes/No> — `<path>`
- **Last modified:** <date> (if present)
- **Format validation:** <Pass | Fail  details>
- **Staleness:** <current | stale  newer content detected in X files>

<If missing or invalid, include a tailored generated example here and note that `/geo-llms-txt` will create/update it.>

---

## 🔴 Critical Issues

Issues that significantly reduce AI citation probability.

### Issue 1: <Title>

**File:** `path/to/file.ext:42`
**Category:** <Category>
**Impact:** Critical

**Current Code:**
\`\`\`<language>
<exact snippet, or "N/A — element absent">
\`\`\`

**Problem:**
<1-3 sentences explaining the GEO impact.>

**Recommended Fix:**
\`\`\`<language>
<code or instruction>
\`\`\`

**Why it matters for GEO:**
<Specific explanation of how this affects citation probability or answer inclusion.>

**Source:** <citable link>

---

<Continue for each critical issue...>

## 🟠 High Priority Issues

<Same format as critical.>

## 🟡 Medium Priority Issues

<Same format.>

## 🔵 Suggestions & Experimental Practices 🧪

<Same format. Mark each item with 🧪 if the practice is emerging rather than established.>

---

## ✅ What's Working Well

-<actual positive finding>
-<another>

---

## 🏗️ Framework-Specific Recommendations

### Detected Framework: <Framework Name>

<Framework-idiomatic guidance for GEO. Prefer static/SSR, markdown-accessible routes, framework-native llms.txt generation, modified-time propagation.>

---

## 📊 Content Analysis Summary

- **Pages analyzed:** <count>
- **Average content freshness:** <X days since last modification>
- **Q&A patterns detected:** <count>
- **Structured data coverage:** <X% of pages>
- **Entity markup coverage:** <X% of pages>
- **Markdown-accessible content:** <Yes/No/Partial>

---

## 🎯 Prioritized Action Plan

Ranked by impact-to-effort ratio for AI citation probability:

1. **[Quick Win]** <item>  (<time>, <impact>)
2. **[Quick Win]** <item>
3. **[Medium Effort]** <item>
4. **[Larger Effort]** <item>

---

## 🔧 Remediation

Apply safe fixes automatically:
\`\`\`
/geo-fix
\`\`\`

Generate or update llms.txt / llms-full.txt:
\`\`\`
/geo-llms-txt
\`\`\`

---

## 📚 Resources

- [llms.txt specification](https://llmstxt.org/)
- [OpenAI GPTBot documentation](https://platform.openai.com/docs/bots)
- [Anthropic ClaudeBot documentation](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler)
- [Google-Extended documentation](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers#google-extended)
- [Perplexity bot documentation](https://docs.perplexity.ai/guides/bots)
- [Schema.org](https://schema.org/)
- [Google E-E-A-T guidelines](https://developers.google.com/search/docs/fundamentals/creating-helpful-content)

---

## 🔍 Audit Methodology

Performed by the `ai-geo` Claude Code plugin using <Context7 MCP for real-time documentation | LLM training data as fallback>.

**Files analyzed:** <count>
**Routes detected:** <count>
**Analysis duration:** <time>

### Limitations & Caveats

- GEO is an emerging discipline with evolving best practices.
- Not all recommendations have been empirically validated at scale.
- AI engines constantly update their ranking/citation algorithms.
- Some checks rely on heuristics rather than established standards.
- Runtime testing against actual AI engines is recommended for validation.
- Does not measure actual citation frequency (requires external monitoring tools).
- Complements but does not replace traditional SEO — run `/seo-audit` from the `ai-seo` plugin for complete coverage.

### About Experimental Recommendations 🧪

Items marked with 🧪 are based on emerging research or early observations. They represent promising practices that may or may not prove universally beneficial. Apply judgment before implementing.

---

*Generated by [ai-geo](https://github.com/charlesjones-dev/claude-code-plugins-dev) — a Claude Code plugin for Generative Engine Optimization.*

*Companion plugin: [ai-seo](https://github.com/charlesjones-dev/claude-code-plugins-dev) for traditional search engine optimization.*

Index File Template (<docs-dir>/geo-audit/README.md)

# GEO Audit Reports

Timestamped Generative Engine Optimization audits generated by the `ai-geo` plugin. Newest first.

| Date | Score | Grade | Critical | Trend | Report |
|------|-------|-------|----------|-------|--------|
| <YYYY-MM-DD HH:MM:SS> | <X>/100 | <grade> | <count> | <📈/📉/➡️> | [<filename>](./<filename>) |

**Latest audit:** [latest.md](./latest.md)

When appending a new row, preserve existing rows and sort newest first.

Severity Assessment

  • Critical: Actively harms citation probability (client-only rendering of content, cloaking configurations, entirely missing llms.txt on content-heavy sites, inconsistent AI-bot policies that block likely-intended citation bots, no structured data on Q&A content, force-dynamic everywhere blocking static AI access).
  • High: Missing core GEO requirements (no Article/BlogPosting schema with dateModified, no author Person schema with sameAs, no FAQPage on Q&A content, no visible modified date on evergreen content, stale llms.txt).
  • Medium: Partial/suboptimal (too few sameAs links, keyword-stuffed headings instead of conversational, paragraphs with heavy context-dependent phrasing, missing llms-full.txt).
  • Low / Experimental 🧪: Emerging practices (speakable spec, DefinedTerm for glossaries, ClaimReview for factual content, markdown-accessible route variants).

Code Context Accuracy (CRITICAL)

Be 100% factually accurate. Never fabricate code snippets.

  • Include exact code from the file when the element exists (even if missing attributes — show the element that needs the attribute).
  • Omit code context when the element doesn't exist (write **Current Code:** N/A — element absent).
  • Never guess. Never invent sameAs URLs or author names. If the audit needs content the user hasn't written, flag it as a /geo-fix prompt rather than making it up.

Examples: Bad vs Good Recommendations

Example 1: AI bot blocking (the wholesale-policy trap)

❌ Bad (what default LLMs often do):

"Block all AI crawlers in robots.txt to protect your content:

User-agent: *
Disallow: /
```"

✅ Good:

"Your current robots.txt blocks GPTBot (training) but also blocks ChatGPT-User (citations). If the goal is 'be cited without training', allow ChatGPT-User. Example:

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

Decide based on your intent. /geo-fix will prompt for your preference."

Example 2: Missing llms.txt (the "does it exist?" trap)

❌ Bad:

"llms.txt is not a real standard — use robots.txt instead."

✅ Good:

"llms.txt (https://llmstxt.org/) is an emerging markdown-formatted index designed for LLM consumption. Missing on this project. Recommended starter for a Next.js blog:

# <Site Name>

> <One-line description of what this site is.>

## Posts

- [Post title](https://example.com/blog/post-slug.md): <one-line summary>

## About

- [About the author](https://example.com/about.md)

Run /geo-llms-txt to generate."

Example 3: Next.js Person schema with sameAs

❌ Bad (bare-minimum):

const jsonLd = {
  '@context': 'https://schema.org',
  '@type': 'Person',
  name: 'Charles Jones',
}

✅ Good (entity-disambiguated):

const jsonLd = {
  '@context': 'https://schema.org',
  '@type': 'Person',
  name: 'Charles Jones',
  url: 'https://charlesjones.dev',
  jobTitle: 'Full-stack developer',
  worksFor: { '@type': 'Organization', name: 'Independent' },
  sameAs: [
    'https://github.com/charlesjones-dev',
    'https://www.linkedin.com/in/<slug>',
    'https://twitter.com/<handle>',
  ],
}

Example 4: FAQPage for Q&A content

❌ Bad (plain prose for Q&A):

<h2>How do I configure X?</h2>
<p>You configure X by...</p>
<h2>What's the cost?</h2>
<p>The cost is...</p>

✅ Good (FAQPage schema added; prose kept):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How do I configure X?",
      "acceptedAnswer": { "@type": "Answer", "text": "You configure X by..." }
    },
    {
      "@type": "Question",
      "name": "What's the cost?",
      "acceptedAnswer": { "@type": "Answer", "text": "The cost is..." }
    }
  ]
}
</script>

Example 5: Markdown-accessible content route

❌ Bad (HTML-only):

Blog posts are rendered server-side as HTML. AI crawlers must extract from DOM.

✅ Good (markdown companion):

Expose each post at both /blog/<slug> (HTML) and /blog/<slug>.md (markdown source). In Next.js App Router:

// app/blog/[slug].md/route.ts
export async function GET(_: Request, { params }: { params: { slug: string } }) {
  const md = await loadMarkdownSource(params.slug)
  return new Response(md, { headers: { 'Content-Type': 'text/markdown; charset=utf-8' } })
}

Then reference .md URLs in llms.txt so AI engines retrieve markdown directly.

Context-Aware Analysis

  • Monorepo: audit per-package or roll up; ask in Step 2 if detected.
  • i18n setup: check per-locale llms.txt or single combined; flag missing hreflang correlating with citation quality per locale.
  • Content vertical (news, docs, e-commerce, portfolio): adjust schema priorities — news weights Article+dateModified, docs weights HowTo+DefinedTerm, e-commerce weights Product+FAQPage, portfolio weights Person+CreativeWork.
  • Existing tooling: if @nuxtjs/seo, astro-seo, next-sitemap detected, validate config rather than recommend install.
  • Relationship to ai-seo: for overlap areas (structured data, semantic HTML, authoritative signals), cross-reference rather than duplicate. Final report must include: "Run /seo-audit from the ai-seo plugin for traditional search-engine optimization coverage."

Quality Assurance Checklist

Before finalizing:

  • Context7 availability stated in report header
  • Framework detected and version captured
  • Training vs citation bot distinction made in crawler-access section
  • llms.txt and llms-full.txt both checked
  • Every finding has exact file path and line number (or explicit N/A)
  • Every finding has a specific remediation
  • Experimental findings marked with 🧪
  • Category scores calculated from deductions
  • Grade matches overall score
  • Report written to timestamped file (never overwrites)
  • Index file updated with new row + trend indicator
  • latest.md overwritten with current audit
  • Terminal summary printed with crawler-access summary + top 3 critical issues
  • .gitignore updated if user opted out of committing audits
  • Cross-reference to /seo-audit (ai-seo) included
Related skills
Installs
2
GitHub Stars
31
First Seen
13 days ago