content-semantics
Content & Semantics
Fixes Category 3 (Content & Semantics, 20% weight) issues from IsAgentReady.com. AI agents use the accessibility tree — not visual rendering — to parse pages. Good semantic HTML and heading hierarchy directly improve AI comprehension.
When to Use
- IsAgentReady scan shows issues in Content & Semantics category
- Site has low scores on checkpoints 3.1–3.8
- User asks to fix semantic HTML, headings, SSR, alt text, ARIA, or link texts
- Building a new site and want AI agent readiness from the start
When NOT to Use
- Issues are in other categories (use ai-content-discovery, structured-data, agent-protocols, or security-trust)
- Problem is purely visual styling (not semantic structure)
- Site already scores A+ on Content & Semantics
Checkpoints Overview
| ID | Checkpoint | Points | Priority |
|---|---|---|---|
| 3.1 | Server-side rendered content | 20 | Critical |
| 3.2 | Heading hierarchy | 20 | Critical |
| 3.3 | Semantic HTML elements | 20 | Important |
| 3.4 | ARIA landmarks | 10 | Important |
| 3.5 | Image alt text | 15 | Important |
| 3.6 | Language attribute | 5 | Nice-to-have |
| 3.7 | Descriptive link texts | 10 | Nice-to-have |
| 3.8 | Question-based headings | 10 | Important |
Total: 110 points. Category weight: 20% of overall score.
Checkpoint 3.1: Server-Side Rendered Content (20 pts)
What the scanner checks: Whether <body> contains >200 characters of visible text in raw HTML (no JS execution). Detects SPA shells (<div id="root"></div>). Checks for <noscript> fallback.
Scoring:
- 20 pts — >200 chars visible text in raw HTML
- 10 pts — Limited text but <noscript> fallback present
- 5 pts — Some visible text but under 200 chars
- 0 pts — Empty SPA shell or no visible text
Why it matters: Most AI crawlers and assistants do not execute JavaScript. They parse raw HTML to understand page content, so without SSR your content is invisible to them.
Fix Workflow
- Test current state — check what AI agents see:
      curl -s https://example.com | sed 's/<script[^>]*>.*<\/script>//g; s/<style[^>]*>.*<\/style>//g; s/<[^>]*>//g' | tr -s ' \n' | head -20
  If this returns minimal text, your content relies on JavaScript. The sed filter works line by line, so treat the output as a rough check: multi-line scripts leak through, and on minified single-line HTML the greedy match can strip real content.
- Detect SPA shell — look for empty containers:
      curl -s https://example.com | grep -E '<div id="(root|app|__next)"></div>'
- Implement SSR — see references/ssr-strategies.md for framework-specific guides (Next.js, Nuxt, Astro, Remix, SvelteKit, Angular).
- Add <noscript> fallback (partial credit if full SSR isn't feasible):
      <noscript>
        <p>This site requires JavaScript. Visit our <a href="/sitemap">sitemap</a> for a text-based overview.</p>
      </noscript>
- Verify — re-run the curl test and confirm >200 chars of visible text.
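The scoring tiers for checkpoint 3.1 can be approximated offline with a short Python sketch. It is not the scanner's actual implementation: the names `VisibleText` and `ssr_score` are hypothetical, the choice to ignore `<title>` text is an assumption, and the tie-break order between tiers is a guess based on the list above.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text outside <script>, <style>, <noscript>, and <title>."""

    # Assumption: <title> text is not counted as visible body text.
    SKIPPED = {"script", "style", "noscript", "title"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip_depth = 0
        self.has_noscript = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        if tag == "noscript":
            self.has_noscript = True

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def ssr_score(raw_html: str) -> int:
    """Estimate the 3.1 score from raw HTML (no JavaScript execution)."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    # Collapse whitespace so indentation does not inflate the count.
    text = " ".join(" ".join(parser.chunks).split())
    if len(text) > 200:
        return 20
    if parser.has_noscript:
        return 10
    if text:
        return 5
    return 0
```

Feeding it the output of `curl -s` for a page gives a quick before/after comparison when moving from an SPA shell to server-rendered markup.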
Checkpoint 3.2: Heading Hierarchy (20 pts)
What the scanner checks: Exactly one <h1>, sequential heading levels (no h1→h3 skip), non-empty heading text.
Scoring:
- 20 pts — Single <h1>, no level skips, all headings have text
- 15 pts — Good hierarchy but some headings have empty text
- 10 pts — Single <h1> present but levels are skipped
- 5 pts — Multiple <h1> tags
- 0 pts — No <h1> found or no headings at all
Why it matters: AI agents use headings to build a content outline and determine topic hierarchy. A clear h1→h2→h3 structure helps AI systems extract and summarize your content accurately.
Fix Workflow
- Audit current headings (-P requires GNU grep; it enables the non-greedy .*? match):
      curl -s https://example.com | grep -oP '<h[1-6][^>]*>.*?</h[1-6]>' | head -30
- Fix the hierarchy — ensure exactly one <h1> and sequential levels:
      <h1>Page Title — One Per Page</h1>
        <h2>Main Section</h2>
          <h3>Subsection</h3>
          <h3>Subsection</h3>
        <h2>Another Section</h2>
          <h3>Subsection</h3>
            <h4>Detail</h4>
- Common fixes:
  - Multiple <h1> tags → keep one, demote others to <h2>
  - <h1> → <h3> skip → add intermediate <h2>
  - Empty headings → add meaningful text or remove the tag
  - Logo in <h1> → move logo out, use text <h1> for page title
- Verify — re-check that heading levels are sequential.
See references/gotchas.md for common heading hierarchy pitfalls (component-level <h1>, level resets).
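As a rough illustration of the heading rules, the sketch below walks the headings in document order and applies the 3.2 tiers. The names `HeadingCollector` and `heading_score` are hypothetical, and the precedence between tiers (for example, a page with both a skipped level and an empty heading) is an assumption, since the rubric does not spell it out.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Record (level, text) for every heading, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def heading_score(raw_html: str) -> int:
    """Estimate the 3.2 score; tier precedence here is an assumption."""
    collector = HeadingCollector()
    collector.feed(raw_html)
    collector.close()
    levels = [level for level, _ in collector.headings]
    h1_count = levels.count(1)
    if h1_count == 0:
        return 0
    if h1_count > 1:
        return 5
    # A skip is any step down the outline of more than one level (h1 -> h3).
    if any(b - a > 1 for a, b in zip(levels, levels[1:])):
        return 10
    if any(not text for _, text in collector.headings):
        return 15
    return 20
```

Moving back up the outline (h3 followed by h2) is not treated as a skip, which matches the example hierarchy in the workflow.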
Checkpoint 3.3: Semantic HTML Elements (20 pts)
What the scanner checks: Presence of 5 semantic landmark elements: <header>, <nav>, <main>, <article> or <section>, <footer>. 4 points each.
Scoring:
- 20 pts (pass) — All 5 elements present
- 12–16 pts (partial) — 3–4 elements present
- 0–8 pts (fail) — Fewer than 3 elements
Why it matters: AI agents navigate pages using the accessibility tree, not visual layout. Semantic elements like <main>, <nav>, and <article> provide structural meaning that AI systems rely on to parse content.
Fix Workflow
- Check which elements are present:
      curl -s https://example.com | grep -oE '<(header|nav|main|article|section|footer)[[:space:]>]' | sort -u
- Implement the full semantic structure:
      <body>
        <header>
          <nav>
            <a href="/">Home</a>
            <a href="/about">About</a>
            <a href="/contact">Contact</a>
          </nav>
        </header>
        <main>
          <article>
            <h1>Page Title</h1>
            <section>
              <h2>Introduction</h2>
              <p>Content here...</p>
            </section>
            <section>
              <h2>Details</h2>
              <p>More content...</p>
            </section>
          </article>
        </main>
        <footer>
          <nav aria-label="Footer">
            <a href="/privacy">Privacy Policy</a>
            <a href="/terms">Terms of Service</a>
          </nav>
          <p>© 2026 Example Inc.</p>
        </footer>
      </body>
- Replace <div> wrappers with semantic equivalents:
  - <div class="header"> → <header>
  - <div class="nav"> → <nav>
  - <div class="main"> → <main>
  - <div class="footer"> → <footer>
  - <div class="sidebar"> → <aside>
See references/semantic-html-guide.md for the complete element reference and <article> vs <section> guidance.
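The "4 points each" rule for checkpoint 3.3 is simple enough to sketch in Python. The helper below is illustrative only (`TagSet` and `semantic_score` are hypothetical names) and returns both the estimated score and the list of missing elements, which is usually the more useful part.

```python
from html.parser import HTMLParser


class TagSet(HTMLParser):
    """Remember every element name that appears in the document."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        self.seen.add(tag)


def semantic_score(raw_html: str):
    """Return (estimated 3.3 score, names of missing elements)."""
    parser = TagSet()
    parser.feed(raw_html)
    parser.close()
    present = {
        "header": "header" in parser.seen,
        "nav": "nav" in parser.seen,
        "main": "main" in parser.seen,
        # Either <article> or <section> satisfies the fourth slot.
        "article or section": bool({"article", "section"} & parser.seen),
        "footer": "footer" in parser.seen,
    }
    missing = [name for name, found in present.items() if not found]
    return 4 * (len(present) - len(missing)), missing
```

For example, a page built from `<div class="header">` wrappers plus `<main>` and `<article>` would come back as 8 points with `header`, `nav`, and `footer` listed as missing.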
Checkpoint 3.4: ARIA Landmarks (10 pts)
What the scanner checks:
- Implicit landmarks from semantic elements (<main> → main, <nav> → navigation, <header> → banner, <footer> → contentinfo, <aside> → complementary)
- Explicit ARIA roles (role="navigation", role="main", etc.)
- Additive ARIA attributes: aria-label, aria-labelledby, aria-describedby, aria-live, aria-expanded, aria-current, aria-hidden
Scoring:
- 10 pts — 3+ landmark regions AND at least 1 additive ARIA attribute
- 7 pts — 3+ landmark regions, no additive ARIA
- 5 pts — 1+ landmark AND 1+ additive ARIA attribute
- 3 pts — Any landmark or ARIA attribute present
- 0 pts — No landmarks or ARIA attributes found
Why it matters: ARIA landmarks help AI agents identify page regions (navigation, main content, footer). This is the same API used by screen readers and is increasingly used by AI browsing agents.
Fix Workflow
- Use semantic HTML first — these provide implicit landmarks without extra attributes:
      <header>  <!-- implicit role="banner" -->
      <nav>     <!-- implicit role="navigation" -->
      <main>    <!-- implicit role="main" -->
      <footer>  <!-- implicit role="contentinfo" -->
      <aside>   <!-- implicit role="complementary" -->
- Add ARIA attributes for disambiguation and state:
      <nav aria-label="Primary">
        <a href="/" aria-current="page">Home</a>
        <a href="/about">About</a>
      </nav>
      <nav aria-label="Footer">
        <a href="/privacy">Privacy</a>
      </nav>
      <div aria-live="polite" aria-label="Search results">
        <!-- Dynamic content area -->
      </div>
      <button aria-expanded="false" aria-label="Toggle menu">Menu</button>
- Do NOT add redundant roles to semantic elements:
      <!-- WRONG — redundant, not recommended by W3C -->
      <main role="main">
      <!-- CORRECT — semantic element is sufficient -->
      <main>
- Verify — check for 3+ landmark types plus at least one aria-* attribute.
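The landmark tiers can be sketched as follows. This is an approximation under stated assumptions: landmarks are counted as distinct types, only the five roles listed above are recognized as explicit roles (the source's "etc." is not expanded), and `LandmarkCollector` and `aria_score` are hypothetical names.

```python
from html.parser import HTMLParser

# Element -> implicit landmark role, as listed for this checkpoint.
IMPLICIT_ROLES = {
    "main": "main",
    "nav": "navigation",
    "header": "banner",
    "footer": "contentinfo",
    "aside": "complementary",
}
LANDMARK_ROLES = set(IMPLICIT_ROLES.values())
ADDITIVE_ATTRS = {
    "aria-label", "aria-labelledby", "aria-describedby", "aria-live",
    "aria-expanded", "aria-current", "aria-hidden",
}


class LandmarkCollector(HTMLParser):
    """Collect distinct landmark types and additive ARIA attributes."""

    def __init__(self):
        super().__init__()
        self.landmarks = set()
        self.aria_attrs = set()

    def handle_starttag(self, tag, attrs):
        if tag in IMPLICIT_ROLES:
            self.landmarks.add(IMPLICIT_ROLES[tag])
        for name, value in attrs:
            if name == "role" and value in LANDMARK_ROLES:
                self.landmarks.add(value)
            elif name in ADDITIVE_ATTRS:
                self.aria_attrs.add(name)


def aria_score(raw_html: str) -> int:
    """Estimate the 3.4 score from the tiers in the scoring list."""
    collector = LandmarkCollector()
    collector.feed(raw_html)
    collector.close()
    count = len(collector.landmarks)
    has_attr = bool(collector.aria_attrs)
    if count >= 3 and has_attr:
        return 10
    if count >= 3:
        return 7
    if count >= 1 and has_attr:
        return 5
    if count >= 1 or has_attr:
        return 3
    return 0
```

The jump from 7 to 10 points needs only one additive attribute, so labelling a single `<nav>` is typically enough once three landmark types exist.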
Checkpoint 3.5: Image Alt Text (15 pts)
What the scanner checks: All <img> tags have non-empty alt attributes. Score = (images with alt / total images) * 15. No images on page = full score.
Why it matters: AI agents cannot see images. Alt text is the only way for AI systems to understand image content, context, and relevance to the surrounding text.
Fix Workflow
- Find images missing alt text:
      curl -s https://example.com | grep -oE '<img [^>]*>' | grep -v 'alt='
- Add appropriate alt text:
      <!-- Content images: describe what the image shows -->
      <img src="team-photo.jpg" alt="Engineering team at the 2026 company offsite">
      <img src="chart.png" alt="Revenue growth chart showing 40% increase in Q4 2025">
      <img src="product.jpg" alt="Wireless noise-cancelling headphones in matte black">

      <!-- Decorative images: use empty alt (NOT missing alt) -->
      <img src="divider.svg" alt="">
      <img src="bg-pattern.png" alt="">

      <!-- Icons with adjacent text: use empty alt to avoid repetition -->
      <img src="email-icon.svg" alt=""> <span>Email us</span>

      <!-- Logos: use the company/brand name -->
      <img src="logo.svg" alt="Acme Inc.">
- Alt text guidelines:
  - Be specific and concise (under 125 characters)
  - Describe the content and function, not appearance
  - Don't start with "Image of..." or "Picture of..."
  - For charts/graphs, describe the key data point or trend
  - For decorative images, use alt="" (empty, not missing)
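The proportional formula for checkpoint 3.5 can be sketched like this. Whether the scanner counts an intentionally empty `alt=""` toward the score is not stated unambiguously, so this sketch assumes only non-empty alt text counts and reports empty and missing alts separately; `ImageCollector` and `alt_text_report` are hypothetical names.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Store the attributes of every <img> as a dict."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs))


def alt_text_report(raw_html: str) -> dict:
    """Split images into missing / empty / described and estimate the score."""
    collector = ImageCollector()
    collector.feed(raw_html)
    collector.close()
    images = collector.images
    missing = [img.get("src") for img in images if "alt" not in img]
    empty = [
        img.get("src") for img in images
        if "alt" in img and not (img["alt"] or "").strip()
    ]
    described = len(images) - len(missing) - len(empty)
    # Assumption: only non-empty alt text earns credit; no images = full score.
    score = 15.0 if not images else 15 * described / len(images)
    return {"missing": missing, "empty": empty, "score": score}
```

The `missing` list is the one to fix first; entries in `empty` are only a problem when the image is not actually decorative.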
Checkpoint 3.6: Language Attribute (5 pts)
What the scanner checks: <html> tag has a lang attribute with a valid BCP 47 language code.
Scoring:
- 5 pts — Valid BCP 47 code (e.g., en, nl, en-US)
- 2 pts — lang attribute present but invalid value
- 0 pts — No lang attribute
Why it matters: The lang attribute tells AI agents what language your content is in, enabling correct text processing, translation, and language-specific understanding.
Fix Workflow
- Check current state:
      curl -s https://example.com | grep -oE '<html[^>]*lang="[^"]*"'
- Add or fix the lang attribute:
      <!-- English -->
      <html lang="en">

      <!-- Regional variants -->
      <html lang="en-US">
      <html lang="en-GB">
      <html lang="nl-NL">
      <html lang="de-DE">
      <html lang="fr-FR">
      <html lang="ja">
      <html lang="zh-Hans">
- For multilingual content, use lang on specific elements:
      <html lang="en">
        <body>
          <p>This is English text.</p>
          <blockquote lang="fr">Ceci est en français.</blockquote>
        </body>
      </html>
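A quick local check of the lang attribute might look like the sketch below. The pattern is a deliberately simplified shape test (language, optional script, optional region) and not a full BCP 47 validator: it does not consult the language subtag registry, and it rejects valid but rarer forms such as variants and extensions. `HtmlLang` and `lang_score` are hypothetical names.

```python
import re
from html.parser import HTMLParser

# Simplified shape: language[-Script][-REGION], e.g. en, zh-Hans, en-US.
LANG_TAG = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z]{4})?(-(?:[A-Za-z]{2}|[0-9]{3}))?")


class HtmlLang(HTMLParser):
    """Capture the lang attribute of the first <html> tag."""

    def __init__(self):
        super().__init__()
        self.found_html = False
        self.lang = None  # None means the attribute is absent

    def handle_starttag(self, tag, attrs):
        if tag == "html" and not self.found_html:
            self.found_html = True
            attr_map = dict(attrs)
            if "lang" in attr_map:
                self.lang = attr_map["lang"] or ""


def lang_score(raw_html: str) -> int:
    """Estimate the 3.6 score: 5 valid, 2 present but invalid, 0 absent."""
    parser = HtmlLang()
    parser.feed(raw_html)
    parser.close()
    if parser.lang is None:
        return 0
    return 5 if LANG_TAG.fullmatch(parser.lang.strip()) else 2
```

Values such as `english_us` or an empty string fall into the 2-point tier, which is a common result of templating mistakes.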
Checkpoint 3.7: Descriptive Link Texts (10 pts)
What the scanner checks: Samples up to 50 links. Flags generic text: "click here", "read more", "learn more", "here", "link", "more", "details", "info", "more details". Score = (descriptive links / total sampled) * 10.
Why it matters: AI agents use link text to understand navigation and discover related content. Generic text like "click here" provides no context about where the link leads.
Fix Workflow
- Find generic links:
      curl -s https://example.com | grep -oiE '<a [^>]*>([^<]*)</a>' | grep -iE '>(click here|read more|learn more|here|link|more|details|info|more details)<'
- Replace with descriptive text:
      <!-- WRONG -->
      <a href="/pricing">Click here</a>
      <a href="/blog/ai-trends">Read more</a>
      <a href="/docs">Learn more</a>
      To sign up, <a href="/register">click here</a>.

      <!-- CORRECT -->
      <a href="/pricing">View pricing plans</a>
      <a href="/blog/ai-trends">Read our AI trends analysis</a>
      <a href="/docs">Explore the documentation</a>
      <a href="/register">Create your free account</a>.
- Patterns for common cases:
  - Blog posts: "Read [article title]" instead of "Read more"
  - CTAs: describe the action — "Start free trial", "Download the report"
  - Navigation: use the destination name — "About us", "API documentation"
  - Lists: "[Item name] details" instead of "Details"
- For "Read more" in card layouts, make the entire card clickable or use aria-label:
      <!-- Option A: descriptive visible text -->
      <a href="/blog/ai-trends">Read AI trends analysis</a>

      <!-- Option B: aria-label when visual design requires short text -->
      <a href="/blog/ai-trends" aria-label="Read AI trends analysis">Read more</a>
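The link-text check can be sketched in Python as below. Several details are assumptions: the sample is taken as the first 50 links in document order, only visible text is inspected (an `aria-label` is ignored), trailing punctuation is stripped before comparison, and no score is returned for pages without links because the source does not define that case. `LinkCollector` and `link_text_report` are hypothetical names.

```python
from html.parser import HTMLParser

GENERIC_TEXTS = {
    "click here", "read more", "learn more", "here", "link",
    "more", "details", "info", "more details",
}


class LinkCollector(HTMLParser):
    """Record (href, visible text) for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_link = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._in_link = False


def link_text_report(raw_html: str, sample_size: int = 50) -> dict:
    """List generic links in the sample and estimate the 3.7 score."""
    collector = LinkCollector()
    collector.feed(raw_html)
    collector.close()
    sampled = collector.links[:sample_size]
    generic = [
        (href, text) for href, text in sampled
        if text.lower().rstrip(".!") in GENERIC_TEXTS
    ]
    score = None
    if sampled:
        score = 10 * (len(sampled) - len(generic)) / len(sampled)
    return {"sampled": len(sampled), "generic": generic, "score": score}
```

The `generic` list pairs each flagged text with its href, which makes it easier to write a replacement that names the destination.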
Checkpoint 3.8: Question-Based Headings (10 pts)
What the scanner checks: H2 headings that use question format — ending with ? or starting with a question word (how, what, why, when, where, which, who, can, does, do, is, are, will, should, would, could, shall).
Scoring:
- 10 pts — 2+ question-format H2 headings, or (3 or fewer total H2s and at least 30% are questions)
- 5 pts — At least 1 question-format H2
- 0 pts — H2 headings exist but none are questions
- Skip (0/0) — No H2 headings present
Why it matters: 78.4% of ChatGPT citations come from pages with question-based H2 headings. Question headings match how users query AI systems, making your content more likely to be selected as a source for AI-generated answers.
Fix Workflow
- Audit current H2 headings (-P requires GNU grep; it enables the non-greedy .*? match):
      curl -s https://example.com | grep -oP '<h2[^>]*>.*?</h2>'
- Rewrite H2 headings as questions:
      <!-- WRONG -->
      <h2>AI Agent Readiness</h2>
      <h2>Content Discovery by AI Systems</h2>

      <!-- CORRECT -->
      <h2>What Is AI Agent Readiness?</h2>
      <h2>How Do AI Systems Discover Your Content?</h2>
      <h2>Why Does Structured Data Matter for AI?</h2>
- Recognized question starters: how, what, why, when, where, which, who, can, does, do, is, are, will, should, would, could, shall — or any heading ending with ?.
- Not all headings need to be questions — aim for 2+ question H2s on pages with 4+ H2s, or 30%+ on pages with 3 or fewer.
- Pair with FAQPage schema (checkpoint 2.7) for maximum AI citation potential — match H2 questions with JSON-LD Question/Answer pairs.
- Verify:
      curl -s https://example.com | grep -oP '<h2[^>]*>.*?</h2>' | grep -iE '(\?|>(How|What|Why|When|Where|Which|Who|Can|Does|Do|Is|Are|Will|Should|Would|Could|Shall) )'
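The question-heading tiers, including the 30% rule for short pages, can be sketched as follows. This is an approximation and not the scanner's code: `H2Collector`, `is_question`, and `question_heading_score` are hypothetical names, a question word is matched on a word boundary, and `None` stands in for the "Skip (0/0)" case.

```python
import re
from html.parser import HTMLParser

QUESTION_WORDS = (
    "how", "what", "why", "when", "where", "which", "who", "can", "does",
    "do", "is", "are", "will", "should", "would", "could", "shall",
)
STARTS_WITH_QUESTION_WORD = re.compile(
    r"(?:%s)\b" % "|".join(QUESTION_WORDS), re.IGNORECASE
)


class H2Collector(HTMLParser):
    """Collect the text of every <h2>."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._in_h2 = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self._text = []

    def handle_data(self, data):
        if self._in_h2:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self.headings.append("".join(self._text).strip())
            self._in_h2 = False


def is_question(heading: str) -> bool:
    return heading.endswith("?") or bool(STARTS_WITH_QUESTION_WORD.match(heading))


def question_heading_score(raw_html: str):
    """Estimate the 3.8 score; None means the checkpoint is skipped."""
    collector = H2Collector()
    collector.feed(raw_html)
    collector.close()
    h2s = collector.headings
    if not h2s:
        return None
    questions = sum(is_question(h) for h in h2s)
    if questions >= 2 or (len(h2s) <= 3 and questions / len(h2s) >= 0.3):
        return 10
    if questions >= 1:
        return 5
    return 0
```

Under these rules the 5-point tier only occurs on pages with four or more H2s and exactly one question, so adding a second question heading is usually the cheapest way to reach full marks.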
Quick Wins Checklist
For the fastest score improvement, fix in this order:
- Add <html lang="en"> — 5 pts, one line (checkpoint 3.6)
- Add semantic elements — replace wrapper <div>s with <header>, <nav>, <main>, <footer> (checkpoint 3.3)
- Fix heading hierarchy — ensure one <h1>, sequential levels (checkpoint 3.2)
- Add ARIA attributes — aria-label on duplicate navs, aria-current="page" (checkpoint 3.4)
- Add alt text to all images (checkpoint 3.5)
- Fix generic link texts — replace "click here" and "read more" (checkpoint 3.7)
- Rewrite H2s as questions — match how users query AI systems (checkpoint 3.8)
- Implement SSR — largest effort, and tied for the highest single-checkpoint score at 20 pts (checkpoint 3.1)
Key Gotchas
Common mistakes that cause checkpoint failures:
- Multiple <h1> tags — Components that each add their own <h1> break hierarchy
- Heading level skips — Going from <h1> directly to <h3> without <h2>
- SPA with empty shell — <div id="root"></div> scores 0 without SSR
- Missing alt vs empty alt — <img src="..."> (no alt) fails; <img src="..." alt=""> (empty) is valid for decorative images
- Generic link text — "Click here" and "Read more" provide no context for AI agents
- Statement headings instead of questions — "AI Agent Readiness" scores 0; "What Is AI Agent Readiness?" scores points
See references/gotchas.md for detailed correct vs incorrect examples of each.
References
- references/semantic-html-guide.md — Semantic elements, ARIA mapping, heading patterns, link text patterns
- references/ssr-strategies.md — Framework-specific SSR implementation guides
- references/gotchas.md — Common pitfalls with correct vs incorrect examples
Instructions
- Identify failing checkpoints from the IsAgentReady.com scan results
- Follow the fix workflow for each failing checkpoint above
- Apply the code examples — adapt HTML structure and content to the user's site
- Verify each fix using the curl commands provided in each workflow
- Re-scan at isagentready.com to confirm improvements
If $ARGUMENTS is provided, interpret it as the URL to fix or the specific checkpoint to address.