agent-readiness-audit

Agent Readiness Audit

Audit the live site, not the source tree alone. Prefer the same fetch path an external agent would use in the wild: direct HTTP requests, sitemap sampling, and page-level inspection.

Do not reduce the result to a homepage-only scan or a binary checklist.

1. Set scope

Use $ARGUMENTS as the base URL when provided. Otherwise infer the base URL from context and state the assumption.

Decide whether the host being audited is:

  • a docs-only host
  • an app/tool host
  • a mixed host

This matters for optional checks such as MCP, plugin manifests, or other tool discovery files. Do not penalize a docs-only host for missing tooling manifests that belong on a separate service.

For docs.docker.com, treat the public docs host as docs-only. Docker's MCP server is published separately, so missing MCP files on the docs host should be reported as N/A, not as a failure.

2. Gather sitewide signals

Always check these resources first:

  • /llms.txt
  • /llms-full.txt
  • /robots.txt
  • /sitemap.xml

Only check host-level tool manifests when the host is an app/tool host, mixed host, or explicitly advertises them:

  • /.well-known/ai-plugin.json
  • /.well-known/agent.json
  • /.well-known/agents.json
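
The probe selection above can be sketched as a small helper. This is an illustration only, not the logic of the bundled script: the function name `probe_urls`, the host-type labels, and the `advertises_manifests` flag are assumptions introduced here.

```python
from urllib.parse import urljoin

# Paths copied from the two lists above.
SITEWIDE_PATHS = ["/llms.txt", "/llms-full.txt", "/robots.txt", "/sitemap.xml"]
TOOL_MANIFEST_PATHS = [
    "/.well-known/ai-plugin.json",
    "/.well-known/agent.json",
    "/.well-known/agents.json",
]

# Hypothetical labels for the three host kinds named in step 1.
HOST_TYPES = {"docs-only", "app-tool", "mixed"}


def probe_urls(base_url, host_type, advertises_manifests=False):
    """Return the URLs to probe for one host.

    Sitewide resources are always included. Tool manifests are added only
    for app/tool hosts, mixed hosts, or hosts that advertise them.
    """
    if host_type not in HOST_TYPES:
        raise ValueError(f"unknown host type: {host_type!r}")
    paths = list(SITEWIDE_PATHS)
    if host_type in {"app-tool", "mixed"} or advertises_manifests:
        paths += TOOL_MANIFEST_PATHS
    # The paths are root-relative, so urljoin resolves them against the
    # host root even when base_url carries a path of its own.
    return [urljoin(base_url, path) for path in paths]
```

For a docs-only host this yields only the four sitewide URLs, so the manifest checks never appear in the evidence and can be reported as N/A rather than as failures.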

Use the bundled script for a baseline:

bash .agents/skills/agent-readiness-audit/scripts/baseline-probes.sh \
  "$ARGUMENTS"

The script produces baseline evidence only. You still need to interpret what matters for a docs property and score it with the rubric.

For docs-only hosts, you may skip tool-manifest probes to reduce noise:

CHECK_TOOL_MANIFESTS=0 \
  bash .agents/skills/agent-readiness-audit/scripts/baseline-probes.sh \
  "$ARGUMENTS"

3. Sample representative pages

Use the sitemap when available. Do not rely on the homepage alone.

If llms.txt exists, sample some URLs from it as well. This helps catch stale or misleading discovery surfaces that a sitemap-only sample would miss.

Sample at least 12 pages when the site is large enough, and cover multiple page types:

  • homepage or docs landing page
  • section landing pages
  • task guides
  • product manuals
  • reference or API pages
  • tutorial or learning pages

If the sitemap is missing or unusable, discover pages through internal links and note the lower confidence.

If the site has distinct delivery patterns, sample each one. For example:

  • normal content pages
  • generated reference pages
  • versioned docs
  • localized docs
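
One way to draw a sample that covers several sections is sketched below. It assumes a standard sitemaps.org `urlset` document and uses the first path segment as a rough stand-in for page type; both the grouping rule and the function names are assumptions for illustration, and a real audit would still confirm the page types by hand.

```python
import random
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urlparse

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Extract every <loc> value from a sitemap document.

    For a sitemap index, the values are child sitemap URLs, which then
    need to be fetched and parsed in turn.
    """
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def stratified_sample(urls, minimum=12, seed=0):
    """Pick up to `minimum` URLs, taking turns across top-level sections."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        groups[segments[0] if segments else ""].append(url)

    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    for bucket in groups.values():
        rng.shuffle(bucket)

    target = min(minimum, len(urls))
    sample = []
    while len(sample) < target:
        # One URL per section per pass, so no single section dominates.
        for key in sorted(groups):
            if groups[key] and len(sample) < target:
                sample.append(groups[key].pop())
    return sample
```

Recording the seed alongside the sampled URLs makes the audit repeatable when the report is reviewed later.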

4. Run fetch-path checks on each sample

For each sampled page, verify:

  • HTML fetch status, content type, and final URL
  • Accept: text/markdown behavior
  • direct markdown route behavior such as <page>.md or another stable path
  • page-level markdown alternate links and whether they actually resolve
  • whether page actions such as "Open Markdown" agree with the working route
  • whether the HTML title or H1 matches the markdown H1 closely enough for retrieval parity
  • whether main content is present in the initial HTML
  • redirect chain length and canonical URL consistency
  • obvious chrome/noise in the markdown response
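
The title parity check in the list above could be approximated as follows. This is a rough sketch: the 0.8 threshold, the containment shortcut for titles that carry a site suffix, and the function names are all assumptions, and the H1 extraction ignores front matter and fenced code.

```python
import re
from difflib import SequenceMatcher


def markdown_h1(markdown):
    """Return the text of the first ATX-style H1 line, or None."""
    for line in markdown.splitlines():
        match = re.match(r"^#\s+(.+?)\s*#*\s*$", line)
        if match:
            return match.group(1)
    return None


def _normalize(text):
    # Lowercase and collapse whitespace so cosmetic differences do not count.
    return " ".join(text.lower().split())


def title_parity(html_title, md_h1, threshold=0.8):
    """True when the HTML title and the markdown H1 are close enough."""
    a, b = _normalize(html_title), _normalize(md_h1)
    if not a or not b:
        return False
    # An HTML title often appends a site name, so containment also counts.
    if b in a:
        return True
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

A failed parity check is a prompt to look at the page, not a verdict on its own, since short titles can differ legitimately.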

Do not assume a .md mirror exists just because another site uses one. Verify the actual markdown path the site exposes.

Treat these as separate signals:

  • negotiated markdown works
  • a stable direct markdown URL works
  • the page advertises the correct markdown URL

If the page advertises dead markdown alternates but a working markdown route exists, do not fail markdown delivery outright. Score it as a discoverability and consistency problem instead.

For API or generated reference pages, also verify whether a machine-readable asset such as OpenAPI YAML is directly linked and fetchable.
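
The three markdown signals and the rule about dead alternates can be kept apart with a small record per page. The sketch below is one possible encoding; the class, the field names, and the three outcome labels are hypothetical and do not come from the rubric.

```python
from dataclasses import dataclass


@dataclass
class MarkdownSignals:
    negotiated_ok: bool  # a request with Accept: text/markdown returned markdown
    direct_ok: bool      # a stable direct URL such as <page>.md returned markdown
    advertised_ok: bool  # the markdown URL advertised by the page resolved


def classify_markdown_delivery(signals):
    """Map the three signals to a finding for one sampled page."""
    any_route_works = (
        signals.negotiated_ok or signals.direct_ok or signals.advertised_ok
    )
    if not any_route_works:
        return "delivery-failure"
    if not signals.advertised_ok:
        # Markdown is reachable, but the page points agents at a dead or
        # missing alternate: a discoverability and consistency problem.
        return "discoverability-issue"
    return "pass"
```

Keeping the booleans in the evidence, rather than only the final label, lets the report show which route an agent would actually succeed with.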

5. Judge structure and legibility

Measure structural signals:

  • exactly one h1
  • sane heading hierarchy
  • main and article presence where appropriate
  • canonical tags
  • JSON-LD or breadcrumb structured data
  • stable anchors and deep-linkable headings
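
Most of these structural signals can be read from the initial HTML with the standard library alone. The following is a minimal sketch, not a full validator: the report keys are invented for illustration, and using an `id` attribute on subheadings as a proxy for deep-linkable anchors is an assumption.

```python
from html.parser import HTMLParser


class StructureProbe(HTMLParser):
    """Collect structural signals from one HTML document."""

    def __init__(self):
        super().__init__()
        self.headings = []  # (level, has_id) in document order
        self.has_main = False
        self.has_article = False
        self.canonical = None
        self.has_json_ld = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.headings.append((int(tag[1]), bool(attr.get("id"))))
        elif tag == "main":
            self.has_main = True
        elif tag == "article":
            self.has_article = True
        elif tag == "link" and "canonical" in (attr.get("rel") or "").lower().split():
            self.canonical = attr.get("href")
        elif tag == "script" and (attr.get("type") or "").lower() == "application/ld+json":
            self.has_json_ld = True


def structure_report(html):
    probe = StructureProbe()
    probe.feed(html)
    probe.close()
    levels = [level for level, _ in probe.headings]
    return {
        "h1_count": levels.count(1),
        # A jump such as h2 -> h4 skips a level in the hierarchy.
        "level_skips": [(a, b) for a, b in zip(levels, levels[1:]) if b > a + 1],
        "has_main": probe.has_main,
        "has_article": probe.has_article,
        "canonical": probe.canonical,
        "has_json_ld": probe.has_json_ld,
        "subheadings_without_id": sum(
            1 for level, has_id in probe.headings if level > 1 and not has_id
        ),
    }
```

Because the parser only sees the fetched HTML, anything injected by client-side JavaScript is absent from the report, which matches what a non-rendering agent would see.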

Also make a qualitative judgment about agent legibility:

  • markdown strips site chrome cleanly
  • headings are specific and task-oriented
  • code blocks stay intelligible without client-side JS
  • the page is not dominated by banners, injected chat, or nav noise

Measure code block labeling explicitly when code samples are common. A page type with many untagged fenced blocks should lose points even if the prose is otherwise clean.
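
Counting untagged fences in a markdown response is straightforward. The sketch below follows a simplified reading of the CommonMark fence rules (same fence character, closing fence at least as long as the opening one, no info string on the closing fence) and ignores indented code blocks; the function name is an assumption.

```python
import re

# Up to three spaces of indentation, then a run of 3+ backticks or tildes.
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def fence_label_stats(markdown):
    """Return (total_fenced_blocks, untagged_blocks) for a markdown document."""
    total = untagged = 0
    open_fence = None  # (fence character, fence length) while inside a block
    for line in markdown.splitlines():
        match = FENCE.match(line)
        if not match:
            continue
        marker, info = match.group(1), match.group(2).strip()
        if open_fence is None:
            open_fence = (marker[0], len(marker))
            total += 1
            if not info:
                untagged += 1
        elif marker[0] == open_fence[0] and len(marker) >= open_fence[1] and not info:
            open_fence = None  # matching closing fence
    return total, untagged
```

Reporting the ratio per page type, rather than per page, makes it easier to see whether unlabeled blocks are a template problem or an isolated one.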

For page types that intentionally render interactive UIs with JavaScript, judge them separately from normal docs pages. If the HTML shell is thin, check whether the page still provides:

  • a fetchable markdown summary
  • a directly linked machine-readable asset
  • a usable non-JS fallback

6. Score with the rubric

Use references/rubric.md.

Rules:

  • score only what you verified
  • mark non-applicable checks as N/A
  • normalize the final score against applicable points only
  • do not let optional manifest checks dominate the grade

Apply the foundational caps from the rubric. A site with broken discovery or broken markdown delivery should not earn a high grade merely because its metadata is clean.

Do not average away a weak page type. If one major page type, such as API reference, is materially worse than the rest of the corpus, call it out as the weakest segment and reflect it in the category notes.
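
The normalization rule and the weakest-segment rule can be expressed in a few lines. The actual point values and caps live in references/rubric.md and are not reproduced here; the numbers and names below are made up purely to show the arithmetic.

```python
def normalized_score(checks):
    """Score as a percentage of applicable points.

    `checks` is a list of (earned, possible) pairs; earned is None for N/A,
    which removes that check from both numerator and denominator.
    """
    applicable = [(earned, possible) for earned, possible in checks if earned is not None]
    possible_total = sum(possible for _, possible in applicable)
    if possible_total == 0:
        return None  # nothing was verified, so there is nothing to score
    earned_total = sum(earned for earned, _ in applicable)
    return 100.0 * earned_total / possible_total


def apply_cap(score, cap=None):
    """Clamp a score to a foundational cap when one applies."""
    return score if cap is None else min(score, cap)


def weakest_page_type(scores_by_type):
    """Return the (page_type, score) pair with the lowest score."""
    return min(scores_by_type.items(), key=lambda item: item[1])


# Hypothetical example: one N/A manifest check worth 5 points is excluded.
example = [(8, 10), (None, 5), (4, 10)]
print(normalized_score(example))  # 12 of 20 applicable points -> 60.0
```

Reporting the weakest page type separately keeps a poor API reference from being hidden inside a healthy sitewide average.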

7. Compare with external scanners when useful

If external scanner results are available, compare them to your live findings. Treat them as secondary evidence.

If a scanner and the live fetch disagree:

  • trust the live fetch
  • report the mismatch explicitly
  • explain whether the scanner is testing a different assumption

8. Produce a remediation list

Turn findings into a short backlog:

  • P0: fetchability or discovery blockers
  • P1: recurring structural or parity issues
  • P2: polish, optional manifests, or low-impact enhancements

For each remediation, include:

  • the failing signal
  • why it matters to agents
  • a concrete fix
  • whether it is sitewide or page-type-specific
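
A backlog entry with these four fields might be modeled as below. The class, the field names, and the sample entries are hypothetical and are not taken from a real audit.

```python
from dataclasses import dataclass

PRIORITY_ORDER = {"P0": 0, "P1": 1, "P2": 2}


@dataclass
class Remediation:
    priority: str        # "P0", "P1", or "P2"
    failing_signal: str  # what was observed
    why_it_matters: str  # impact on agents
    fix: str             # concrete change to make
    scope: str           # "sitewide" or the affected page type


def ordered_backlog(items):
    """Sort remediations so blockers come first; ties keep input order."""
    return sorted(items, key=lambda item: PRIORITY_ORDER[item.priority])


# Invented entries, for illustration only.
backlog = ordered_backlog([
    Remediation("P2", "no JSON-LD on guides", "weaker structured metadata",
                "add breadcrumb JSON-LD to the guide template", "task guides"),
    Remediation("P0", "advertised markdown alternate returns 404",
                "agents following the link get no content",
                "point the alternate link at the working markdown route", "sitewide"),
])
```

Sorting is stable, so findings of equal priority stay in the order the auditor recorded them.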

9. Report in a stable format

Use references/report-template.md.

Always include:

  • overall score and grade
  • confidence level
  • sampled URLs or sample strategy
  • category scores
  • highest-priority findings
  • remediation backlog

Notes

  • Favor docs-delivery checks over marketing-site heuristics.
  • Do not fail a docs host for lacking MCP or plugin manifests unless the host itself is meant to expose tools.
  • Treat raw byte size as supporting evidence, not as a primary scoring input.
  • Prefer short evidence excerpts and commands over long copied page text.

Repository: docker/docs