Agent Readiness Audit

Audit the live site, not the source tree alone. Prefer the same fetch path an external agent would use in the wild: direct HTTP requests, sitemap sampling, and page-level inspection.

Do not reduce the result to a homepage-only scan or a binary checklist.

1. Set scope

Use $ARGUMENTS as the base URL when provided. Otherwise infer the base URL from context and state the assumption.

Decide whether the host being audited is:

a docs-only host
an app/tool host
a mixed host

The host type determines which optional checks apply, such as MCP, plugin manifests, or other tool discovery files. Do not penalize a docs-only host for missing tooling manifests that belong on a separate service.

For docs.docker.com, treat the public docs host as docs-only. Docker's MCP server is published separately, so missing MCP files on the docs host should be reported as N/A, not as a failure.

2. Gather sitewide signals

Always check these resources first:

/llms.txt
/llms-full.txt
/robots.txt
/sitemap.xml
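A minimal sketch of how these four resources might be probed by hand. The function names are illustrative and not part of the skill, and `https://example.com` in the usage comment is a placeholder. The sketch only reports status, content type, and final URL; it does not judge the content.

```shell
# Build the four sitewide URLs for a base URL (a trailing slash is tolerated).
sitewide_urls() {
  base="${1%/}"
  for path in /llms.txt /llms-full.txt /robots.txt /sitemap.xml; do
    printf '%s%s\n' "$base" "$path"
  done
}

# Print "status content-type final-url" for each resource, following redirects.
probe_sitewide() {
  sitewide_urls "$1" | while IFS= read -r url; do
    curl -sS -L -o /dev/null \
      -w '%{http_code} %{content_type} %{url_effective}\n' "$url"
  done
}

# Usage (placeholder host):
#   probe_sitewide "https://example.com"
```

A 200 status alone is weak evidence: some hosts return an HTML error page with status 200, so the content type column is worth reading too.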

Only check host-level tool manifests when the host is an app/tool host or a mixed host, or when it explicitly advertises them:

/.well-known/ai-plugin.json
/.well-known/agent.json
/.well-known/agents.json

Use the bundled script for a baseline:

bash .agents/skills/agent-readiness-audit/scripts/baseline-probes.sh \
  "$ARGUMENTS"

The script produces baseline evidence only. You still need to interpret what matters for a docs property and score it with the rubric.

For docs-only hosts, you may skip tool-manifest probes to reduce noise:

CHECK_TOOL_MANIFESTS=0 \
  bash .agents/skills/agent-readiness-audit/scripts/baseline-probes.sh \
  "$ARGUMENTS"

3. Sample representative pages

Use the sitemap when available. Do not rely on the homepage alone.

If llms.txt exists, sample some URLs from it as well. This helps catch stale or misleading discovery surfaces that a sitemap-only sample would miss.

Sample at least 12 pages when the site is large enough, and cover multiple page types:

homepage or docs landing page
section landing pages
task guides
product manuals
reference or API pages
tutorial or learning pages

If the sitemap is missing or unusable, discover pages through internal links and note the lower confidence.

If the site has distinct delivery patterns, sample each one. For example:

normal content pages
generated reference pages
versioned docs
localized docs
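One way the sitemap sampling in this step might be done, as a rough sketch. It assumes a plain (not gzipped) sitemap with `<loc>` entries; for a sitemap index, the `<loc>` values are child sitemaps and need a second fetch. The helper names are hypothetical. Evenly spaced sampling spreads the picks across the file, but it does not guarantee page-type coverage, so the resulting list still needs a manual check against the page types above.

```shell
# Extract <loc> URLs from sitemap XML on stdin, one per line.
sitemap_locs() {
  grep -o '<loc>[^<]*</loc>' | sed -e 's/<loc>//' -e 's/<\/loc>//'
}

# Keep N evenly spaced lines from stdin (default 12) so the sample spans
# the whole sitemap instead of only its first section.
spread_sample() {
  awk -v n="${1:-12}" '
    { lines[NR] = $0 }
    END {
      if (NR <= n) { for (i = 1; i <= NR; i++) print lines[i]; exit }
      for (k = 0; k < n; k++) print lines[int(k * NR / n) + 1]
    }'
}

# Usage (placeholder host):
#   curl -sS -L "https://example.com/sitemap.xml" | sitemap_locs | spread_sample 12
```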

4. Run fetch-path checks on each sample

For each sampled page, verify:

HTML fetch status, content type, and final URL
Accept: text/markdown behavior
direct markdown route behavior such as <page>.md or another stable path
page-level markdown alternate links and whether they actually resolve
whether page actions such as "Open Markdown" agree with the working route
whether the HTML title or H1 matches the markdown H1 closely enough for retrieval parity
whether main content is present in the initial HTML
redirect chain length and canonical URL consistency
obvious chrome/noise in the markdown response

Do not assume a .md mirror exists just because another site uses one. Verify the actual markdown path the site exposes.

Treat these as separate signals:

negotiated markdown works
a stable direct markdown URL works
the page advertises the correct markdown URL

If the page advertises dead markdown alternates but a working markdown route exists, do not fail markdown delivery outright. Score it as a discoverability and consistency problem instead.
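The three signals can be checked independently. The sketch below is one possible approach, not the skill's own tooling: the function names are hypothetical, the `.md` suffix in the usage comment is only an example of a route to verify, and the link extraction assumes double-quoted attributes on a single line.

```shell
# Print status, content type, redirect count, and final URL (tab-separated).
# The second argument is the Accept header value; it defaults to text/html.
fetch_meta() {
  url="$1"; accept="${2:-text/html}"
  curl -sS -L -o /dev/null -H "Accept: $accept" \
    -w '%{http_code}\t%{content_type}\t%{num_redirects}\t%{url_effective}\n' "$url"
}

# Succeed when a Content-Type value denotes markdown (charset etc. ignored).
is_markdown_type() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    text/markdown*|text/x-markdown*) return 0 ;;
    *) return 1 ;;
  esac
}

# Read HTML on stdin; print the href of each advertised markdown alternate.
advertised_markdown_href() {
  grep -io '<link[^>]*>' \
    | grep -i 'rel="alternate"' \
    | grep -i 'type="text/markdown"' \
    | sed -n 's/.*href="\([^"]*\)".*/\1/p'
}

# Usage (placeholder page):
#   page="https://example.com/docs/some-page/"
#   fetch_meta "$page" text/markdown                 # negotiated markdown
#   fetch_meta "${page%/}.md"                        # direct route, if one exists
#   curl -sS -L "$page" | advertised_markdown_href   # what the page advertises
```

Fetching each advertised href with `fetch_meta` then shows whether the page advertises a route that actually resolves.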

For API or generated reference pages, also verify whether a machine-readable asset such as OpenAPI YAML is directly linked and fetchable.

5. Judge structure and legibility

Measure structural signals:

exactly one h1
sane heading hierarchy with no skipped levels
main and article presence where appropriate
canonical tags
JSON-LD or breadcrumb structured data
stable anchors and deep-linkable headings
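The first two signals lend themselves to a quick mechanical check. This is a rough sketch with hypothetical helper names; it pattern-matches tags rather than parsing HTML, so treat its output as a hint to confirm by reading the page. Here a "sane" hierarchy is assumed to mean that no heading jumps more than one level deeper than the heading before it.

```shell
# Count opening <h1> tags in HTML on stdin (case-insensitive, any attributes).
count_h1() {
  grep -io '<h1[[:space:]>]' | wc -l | tr -d ' '
}

# Print heading levels in document order, one digit per line.
heading_levels() {
  grep -io '<h[1-6][[:space:]>]' | sed 's/^<[hH]\([1-6]\).*/\1/'
}

# Read levels on stdin; fail if any heading skips a level (for example h2 to h4).
hierarchy_ok() {
  awk 'NR > 1 && $1 > prev + 1 { bad = 1 } { prev = $1 } END { exit bad + 0 }'
}

# Usage (placeholder page):
#   curl -sS -L "https://example.com/docs/some-page/" > page.html
#   count_h1 < page.html
#   heading_levels < page.html | hierarchy_ok || echo "heading level skipped"
```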

Also make a qualitative judgment about agent legibility:

markdown strips site chrome cleanly
headings are specific and task-oriented
code blocks stay intelligible without client-side JS
the page is not dominated by banners, injected chat, or nav noise

Measure code block labeling explicitly when code samples are common. A page type with many untagged fenced blocks should lose points even if the prose is otherwise clean.
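Code block labeling can be measured on the markdown response with a small counter. The sketch below is a simplification under stated assumptions: it treats any line starting with three backticks or tildes as a fence toggle, so nested fences of different lengths would be miscounted. The function name is hypothetical.

```shell
# Read markdown on stdin; print "<fenced blocks> <blocks without a language tag>".
count_fences() {
  awk '
    /^[ \t]*(```|~~~)/ {
      if (in_block) { in_block = 0; next }   # closing fence
      in_block = 1; total++                  # opening fence
      info = $0
      sub(/^[ \t]*(```+|~~~+)[ \t]*/, "", info)
      if (info == "") untagged++
    }
    END { printf "%d %d\n", total, untagged }'
}

# Usage (placeholder page and route):
#   curl -sS -L "https://example.com/docs/some-page.md" | count_fences
```

Comparing the two numbers per page type makes it easier to see where untagged blocks cluster.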

For page types that intentionally render interactive UIs with JavaScript, judge them separately from normal docs pages. If the HTML shell is thin, check whether the page still provides:

a fetchable markdown summary
a directly linked machine-readable asset
a usable non-JS fallback

6. Score with the rubric

Use references/rubric.md.

Rules:

score only what you verified
mark non-applicable checks as N/A
normalize the final score against applicable points only
do not let optional manifest checks dominate the grade
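The normalization rule can be sketched as simple arithmetic. The input format here is an assumption made for illustration (one check per line as "earned max", with `NA` in place of the earned points for non-applicable checks); the real point values and caps come from the rubric, which this sketch does not reproduce.

```shell
# Read "<earned> <max>" or "NA <max>" lines on stdin;
# print the percentage of applicable points earned.
normalized_score() {
  awk '
    $1 == "NA" { next }                     # N/A checks count toward neither total
    { earned += $1; possible += $2 }
    END {
      if (possible == 0) { print "n/a"; exit }
      printf "%.1f\n", 100 * earned / possible
    }'
}

# Usage: 8/10 and 3/10 earned, one 5-point check marked N/A.
#   printf '8 10\nNA 5\n3 10\n' | normalized_score    # prints 55.0
```

Excluding the N/A row from the denominator is what keeps a docs-only host from losing points for manifests it should not serve.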

Apply the foundational caps from the rubric. A site with broken discovery or broken markdown delivery should not earn a high grade because it has clean metadata.

Do not average away a weak page type. If one major page type, such as API reference, is materially worse than the rest of the corpus, call it out as the weakest segment and reflect it in the category notes.

7. Compare with external scanners when useful

If external scanner results are available, compare them to your live findings. Treat them as secondary evidence.

If a scanner and the live fetch disagree:

trust the live fetch
report the mismatch explicitly
explain whether the scanner is testing a different assumption

8. Produce a remediation list

Turn findings into a short backlog:

P0: fetchability or discovery blockers
P1: recurring structural or parity issues
P2: polish, optional manifests, or low-impact enhancements

For each remediation, include:

the failing signal
why it matters to agents
a concrete fix
whether it is sitewide or page-type-specific

9. Report in a stable format

Use references/report-template.md.

Always include:

overall score and grade
confidence level
sampled URLs or sample strategy
category scores
highest-priority findings
remediation backlog

Notes

Favor docs-delivery checks over marketing-site heuristics.
Do not fail a docs host for lacking MCP or plugin manifests unless the host itself is meant to expose tools.
Treat raw byte size as supporting evidence, not as a primary scoring input.
Prefer short evidence excerpts and commands over long copied page text.
