google-news-seo
Google News SEO Diagnostic Engine
Language / 语言:Detect the user's language and respond in the same language throughout. 用户用中文提问则全程用中文回复;用英文提问则全程用英文回复。
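Disclaimer / 声明: This skill is a heuristic audit framework, not an official Google verdict. Its scores and checklists do not replace Search Console, Publisher Center, or legal counsel's judgment on content policy.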
HTTP fetch · 抓取工具约定
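Every step that retrieves the HTML of a live page is written as web_fetch in this file. At runtime, map it to the equivalent capability actually available in your environment (for example, Cursor's built-in web fetch or an MCP fetch tool), and use the exact name shown in the tool panel so that a case or alias mismatch does not cause the call to be skipped.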
Priority rubric · 优先级分级(对齐业界)
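- P0 (Blocking / Compliance): directly affects crawling or indexing, or involves misleading markup or content-policy risk. Examples: robots blocking, inaccessible article, clearly wrong datePublished, clearly deceptive author attribution.
- P1 (High impact): significantly affects Google News visibility, stable indexing, or ranking competitiveness, but is usually not a single-point blocker. Examples: weak News sitemap signals, missing key Schema fields, incomplete author page.
- P2 (Best practice / Optimization): improves click-through rate, readability, or quality signals; an incremental gain. Examples: title length range, keyword placement near the start, minor heading-hierarchy adjustments.
Industry note / 行业口径: According to Google Search Central and Publisher Center documentation, Article/NewsArticle structured data is a recommended enhancement, not an absolute requirement for Top Stories or Google News. Escalate to P0 / P0-candidate only when the Schema clearly conflicts with the facts on the page or is misleading.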
Google News System Model
Google News operates on a two-layer architecture. This skill evaluates both layers independently.
| Layer | Name | Function |
|---|---|---|
| Layer 1 | News Index System | Determines whether a site enters the Google News index |
| Layer 2 | News Ranking System | Determines whether articles rank in topic clusters and appear in Top Stories |
Layer 1 must pass before Layer 2 matters. If a site is not indexed, ranking optimization is premature.
Diagnostic routing:
- Layer 1 fails → focus on index eligibility (publisher trust, crawlability, Schema, sitemap)
- Layer 1 passes, Layer 2 fails → focus on ranking signals (freshness, content type, cluster compatibility, competitors)
- Both pass → surface remaining optimization opportunities via competitor gap analysis
0 · Initial Assessment 审计前评估
Before starting any checks, gather context by asking the following questions. Skip any question already answered in the user's prompt.
| # | Question / 问题 |
|---|---|
| 1 | Site type / 站点类型 — Is this a dedicated news publisher, a corporate blog with a news section, or another site type? |
| 2 | Target topics / keywords / 目标关键词或议题 — Which topics, keywords, or named entities are most important for this audit? |
| 3 | Current status / 当前状态 — Any known issues, recent migrations, or Google Search Console alerts? |
| 4 | Scope / 审计范围 — Full Google News Diagnostic (Layer 1 + Layer 2 + Competitor Analysis), article-level audit, or a specific area (Schema only / EEAT only / Technical only / On-Page only)? |
Skip rule / 跳过规则: If the user's prompt already implies answers (e.g., provides a URL + says "check the Schema"), skip the answered questions and proceed directly.
Full Diagnostic trigger / 全站诊断触发: If scope is "Full Diagnostic" or the user asks "why is my site not in Google News" / "why don't my articles rank" / "why do competitors outrank me", run all Layer 1 + Layer 2 checks and generate the Google News Diagnostic Report (Section 10).
Acknowledgement / 确认语句: Before starting checks, output one line:
Auditing [URL / article / site] — scope: [scope summary]. Starting checks...
0.5 · News Index Presence Detection 索引存在检测
Run this check first when performing a Full Diagnostic or when the user asks why their site is not appearing in Google News.
Methodology limits / 方法局限
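A site:<domain> query constructed through WebSearch is not equivalent to Google's internal "Google News inclusion" status: results vary with index region, personalization, crawl snapshot time, and which search interface modules appear. Conclusions from this section must be labeled heuristic (approximate), and the user should be advised to cross-check with Google Search Console, Publisher Center, the "News" filter in search tools or site-specific reports, and the Rich Results Test.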
Step 1 — Simulate Google News index query
Use WebSearch to simulate a site:<domain> query in a Google News context. Construct the search as:
site:<domain> news articles
Analyze the returned results to determine whether the site's articles appear in Google News coverage.
Step 2 — Classify index presence
| Result | Status | Definition |
|---|---|---|
| 0 results detected | Not Indexed | No evidence of Google News indexing |
| 1–5 results | Limited Presence | Some articles indexed; inconsistent coverage |
| 6+ results across multiple days | Strong Presence | Active Google News publisher |
Also record:
- Total detected article count
- Timestamp of the most recently indexed article
- Whether results span multiple days/topics (diversity indicator)
Step 3 — Route diagnostic focus
| Index Status | Diagnostic Focus |
|---|---|
| Not Indexed | Prioritize Layer 1 checks; flag as root cause category |
| Limited Presence | Run both Layer 1 and Layer 2; present findings for each |
| Strong Presence | Note Layer 1 as passing; focus diagnostic output on Layer 2 ranking checks |
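The classification and routing tables above can be sketched as a small function. This is a minimal sketch, not an official rule: the function name `classify_index_presence` and its parameters are hypothetical, and treating "6+ results that all fall on a single day" as Limited Presence is an assumption, since the table does not cover that case.

```python
def classify_index_presence(article_count, distinct_days):
    """Map heuristic search results to an index status and a diagnostic focus.

    article_count: number of articles detected for the site (heuristic).
    distinct_days: number of different publication days among those results.
    """
    if article_count == 0:
        status, focus = "Not Indexed", "Layer 1"
    elif article_count >= 6 and distinct_days > 1:
        status, focus = "Strong Presence", "Layer 2"
    else:
        # Assumption: 1-5 results, or 6+ results on a single day only.
        status, focus = "Limited Presence", "Both Layers"
    # The result is always heuristic; it is not the official Google News index.
    return {"status": status, "focus": focus, "heuristic": True}
```

The `heuristic` flag is always set so that downstream report code cannot present the status as an official index state.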
Output format
News Index Status: [Not Indexed ❌ / Limited Presence ⚠️ / Strong Presence ✅] (heuristic — not official Google News index)
Detected article count: [N]
Latest indexed article: [timestamp or "not detected"]
Diagnostic focus: [Layer 1 / Both Layers / Layer 2]
Suggested verification: [e.g. GSC coverage, Publisher Center, News tab site: search]
1 · Prerequisites 前置要求
Determine input type first / 先判断产物类型:
| Input | Action |
|---|---|
| Live URL / 线上 URL | Fetch page, extract JSON-LD Schema |
| Raw Schema JSON | Parse directly |
| URL or HTML + EEAT scan request / URL 或 HTML + EEAT 扫描请求 | Run full E-E-A-T analysis → see Sections 7–9 / 执行 EEAT 全维度分析 → 见第 7–9 节 |
Schema Fetch Protocol / Schema 获取流程
Retrieve JSON-LD in three sequential phases. Only advance to the next phase if the current one yields no JSON-LD blocks.
⚠️ Anti-pattern — do NOT do this: Do NOT report "JSON-LD 无法通过抓取自动提取(前端渲染)" or "client-side rendered" based solely on a failed web_fetch. SSR/SSG sites (Next.js, Nuxt, Hugo, WordPress) pre-render JSON-LD into static HTML, so curl can fetch it directly. A fetch failure alone is not evidence of CSR.
Phase 1 — web_fetch (preferred)
Use web_fetch to retrieve the page. If the response contains one or more <script type="application/ld+json"> blocks → extract and proceed to Schema analysis. Stop here.
Phase 2 — curl fallback (when web_fetch yields no JSON-LD)
Note: Attempting curl fallback — SSR/SSG sites pre-render JSON-LD into static HTML; curl can retrieve it directly without JS execution.
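sed -i portability / 跨平台: The placeholder substitution applied to the temporary Python script below differs by OS. macOS (BSD sed) uses sed -i '' "s|…|…|g" file; Linux (GNU sed) uses sed -i "s|…|…|g" file (without the intermediate ''). Use whichever form matches the current OS.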
If Shell tool is unavailable, skip directly to Phase 3.
# Step A: fetch raw HTML with Googlebot UA
# Note: macOS mktemp doesn't support extensions; use a plain suffix
TMPFILE=$(mktemp /tmp/page_XXXXXX)
curl -sL \
-H "User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
--max-time 15 \
"<URL>" > "$TMPFILE" 2>&1
echo "File size: $(wc -c < "$TMPFILE") bytes"
- 403 / 429: retry once with Chrome UA: -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
- Timeout / non-zero exit / file size 0: skip to Phase 3
# Step B: extract JSON-LD — write Python script via heredoc to avoid quoting issues
cat > /tmp/extract_jsonld.py << 'PYEOF'
import re, json
html = open('/tmp/page_XXXXXX.html').read() # replace XXXXXX with actual suffix from $TMPFILE
pattern = re.compile(r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>', re.DOTALL | re.IGNORECASE)
blocks = pattern.findall(html)
print(f'Total blocks found: {len(blocks)}')
for i, b in enumerate(blocks, 1):
    print(f'=== JSON-LD Block {i} ===')
    try:
        print(json.dumps(json.loads(b.strip()), indent=2, ensure_ascii=False))
    except Exception as e:
        print(f'Parse error: {e}')
        print(b.strip()[:500])
PYEOF
# Replace the filename placeholder with the actual $TMPFILE path, then run:
sed -i '' "s|/tmp/page_XXXXXX.html|$TMPFILE|g" /tmp/extract_jsonld.py
python3 /tmp/extract_jsonld.py
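# Step B fallback — grep (if Python 3 unavailable)
# Limitation: grep matches line by line and ERE has no non-greedy quantifier,
# so this only handles a JSON-LD block that sits alone on a single line
grep -oE '<script[^>]+type="application/ld\+json"[^>]*>.*</script>' "$TMPFILE" | sed 's/<[^>]*>//g'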
# Step C: clean up
rm -f "$TMPFILE" /tmp/extract_jsonld.py
If extraction returns results → use for Schema analysis; note "Schema retrieved via curl fallback".
- Multiple blocks: extract all; use the block with "@type": "NewsArticle" for Schema analysis
- Malformed JSON: output raw block, flag as "partially retrievable", continue best-effort analysis
Phase 3 — 🔍 Manual (only when both earlier phases yield nothing)
Mark Schema detection as 🔍 Manual. Output exactly:
Schema could not be auto-detected (web_fetch and curl both returned no JSON-LD).
Verify manually: 🔗 https://search.google.com/test/rich-results?url=<URL>
❌ Do NOT say: "前端渲染" / "client-side rendering" / "JavaScript-rendered" ✅ Only say: "could not be auto-detected"
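For the Phase 2 rule about multiple blocks (use the one typed NewsArticle), a selection helper might look like the following. It is a sketch under assumptions: the name `find_news_article` is hypothetical, and the handling of `@graph` containers and list-valued `@type` is an addition not stated in this file, included because some CMS plugins emit JSON-LD in that shape.

```python
import json


def find_news_article(raw_blocks):
    """Return the first JSON-LD node typed NewsArticle, or None if none is found.

    raw_blocks: list of strings, one per <script type="application/ld+json"> body.
    """
    for raw in raw_blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            # Malformed block: skip here; report it separately as "partially retrievable".
            continue
        top_level = data if isinstance(data, list) else [data]
        nodes = []
        for node in top_level:
            if not isinstance(node, dict):
                continue
            nodes.append(node)
            # Assumption: nested nodes may live under an "@graph" array.
            graph = node.get("@graph")
            if isinstance(graph, list):
                nodes.extend(n for n in graph if isinstance(n, dict))
        for node in nodes:
            types = node.get("@type")
            types = types if isinstance(types, list) else [types]
            if "NewsArticle" in types:
                return node
    return None
```

Returning `None` keeps the caller free to fall through to the 🔍 Manual path without guessing at the cause.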
3 hard requirements for Google News inclusion / 三项硬性门槛:
- Dedicated news publisher / 专属新闻网站 — the site's core purpose must be news, not a product with a news section
- Content policy compliance / 内容政策合规 — no dangerous / deceptive / manipulated-media content; AI-generated content must be transparently disclosed
- Technical compliance / 技术合规 — permanent URLs, HTML-rendered content, robots.txt must not block Googlebot-News
1.5 · Publisher Trust Detection 发布者信任度检测
Layer 1 check. Google evaluates whether the site operates as a legitimate news publisher before indexing it.
Step 1 — Verify trust pages
Use web_fetch to check each URL. A page passes if it returns HTTP 200 with non-trivial content (not a redirect to homepage).
| Page | URL pattern | Points |
|---|---|---|
| About | <domain>/about | 8 pts |
| Editorial Policy | <domain>/editorial-policy | 8 pts |
| Contact | <domain>/contact | 8 pts |
| Team | <domain>/team | 8 pts |
| Authors | <domain>/authors | 8 pts |
Subtotal (pages): 40 pts
Step 2 — Detect newsroom description
Scan the About page text for presence of keywords indicating a professional newsroom:
newsroom · editorial team · journalism · reporters · editor-in-chief · press · media organization
- Keywords present → 30 pts
- Absent → 0 pts, flag as P1: "No editorial identity statement found on About page"
Step 3 — Detect Organization Schema
Check homepage and About page JSON-LD for @type: Organization or @type: NewsMediaOrganization with fields: name, url, logo.
| Condition | Points |
|---|---|
| All three fields present | 30 pts |
| 1–2 fields present | 15 pts |
| Schema absent | 0 pts |
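The three steps add up to 100 points (40 for pages, 30 for the newsroom description, 30 for Organization Schema). A minimal sketch of that arithmetic follows; the function name and argument shapes are hypothetical, and treating "Schema present but none of name, url, logo filled in" as 0 points is an assumption, since the table does not list that case.

```python
TRUST_PAGES = ("about", "editorial-policy", "contact", "team", "authors")


def publisher_trust_score(pages_ok, has_newsroom_keywords, org_fields_present):
    """Compute the 0-100 Publisher Trust Score.

    pages_ok: dict mapping a page slug from TRUST_PAGES to True/False.
    has_newsroom_keywords: True if the About page mentions a newsroom keyword.
    org_fields_present: how many of name/url/logo exist, or None if no Schema.
    """
    page_pts = 8 * sum(1 for page in TRUST_PAGES if pages_ok.get(page, False))
    newsroom_pts = 30 if has_newsroom_keywords else 0
    if not org_fields_present:  # None (Schema absent) or 0 fields (assumption)
        schema_pts = 0
    elif org_fields_present >= 3:
        schema_pts = 30
    else:
        schema_pts = 15
    return page_pts + newsroom_pts + schema_pts
```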
Publisher Trust Score Output
Publisher Trust Score: [0-100]
| Check | Result | Notes |
|-------|--------|-------|
| /about | Pass ✅ / Fail ❌ | |
| /editorial-policy | Pass ✅ / Fail ❌ | |
| /contact | Pass ✅ / Fail ❌ | |
| /team | Pass ✅ / Fail ❌ | |
| /authors | Pass ✅ / Fail ❌ | |
| Newsroom description | Pass ✅ / Fail ❌ | |
| Organization Schema | Pass ✅ / Partial ⚠️ / Fail ❌ | [missing fields] |
1.6 · Author Authority Detection 作者权威性检测
Layer 1 check. Google builds an author credibility graph. Unverifiable or AI-generated author attribution is a strong negative signal.
Step 1 — Verify author presence and schema match
Check both:
- Visible byline on the page (author name in HTML)
- author.name in NewsArticle Schema
| Condition | Result |
|---|---|
| Both present and matching | Pass ✅ — 30 pts |
| Schema author missing | Fail ❌ — 0 pts (P1) |
| Mismatch between Schema and page byline | Fail ❌ — 0 pts (P0) |
Step 2 — Fetch author profile page
Use web_fetch on the URL from author.url (Schema) or the byline hyperlink.
A complete author profile requires:
- Author name visible
- Biography ≥ 50 words
- At least one of: job title, social media link, publication count
| Condition | Points |
|---|---|
| Full profile (name + bio 50w+ + credential) | 40 pts |
| Partial profile (name + bio only) | 20 pts |
| Profile missing or 404 | 0 pts (P1) |
Step 3 — Detect suspicious AI author names
Scan author.name (Schema) and page byline for obvious AI-as-author labels (e.g. role phrases), not arbitrary substrings inside human names:
AI Agent · ChatGPT · GPT-4 · OpenAI (as sole byline) · System Writer · Auto Writer · AI Writer · Gemini (as product credited as author) · Claude only when clearly denoting the model as author (e.g. "Written by Claude"), not when part of a plausible human name (e.g. "Jean-Claude", "Claude Smith").
- Clear AI-as-author attribution → flag as P0 candidate, and always mark it as requiring manual confirmation; never fail a byline on a surname or a common name fragment alone.
- Ambiguous / substring-only match → 🔍 Manual; do not deduct the points for this item.
- No concerning pattern → 10 pts
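The three outcomes in Step 3 could be approximated with a small classifier. This is a sketch, not a definitive detector: `classify_byline`, the regular expressions, and the "By / Written by" prefix stripping are all assumptions made for illustration, and any flagged result still needs manual confirmation as stated above.

```python
import re

# Role-style labels that denote an AI system wherever they appear in a byline.
ROLE_LABELS = re.compile(
    r"\b(AI Agent|ChatGPT|GPT-4|System Writer|Auto Writer|AI Writer)\b",
    re.IGNORECASE,
)
# Product names that count as AI attribution only when they are the whole byline.
SOLE_BYLINE_PRODUCTS = {"openai", "gemini", "claude"}
# The same names inside a longer byline are ambiguous (could be a human name).
AMBIGUOUS = re.compile(r"\b(OpenAI|Gemini|Claude)\b", re.IGNORECASE)


def classify_byline(byline):
    """Return 'P0-candidate (needs manual confirmation)', 'manual', or 'pass'."""
    # Drop a leading "By" / "Written by" so "Written by Claude" reduces to "Claude".
    name = re.sub(r"^(written\s+by|by)\s+", "", byline.strip(), flags=re.IGNORECASE)
    if ROLE_LABELS.search(name) or name.lower() in SOLE_BYLINE_PRODUCTS:
        return "P0-candidate (needs manual confirmation)"
    if AMBIGUOUS.search(name):
        return "manual"  # substring-only match: no points deducted
    return "pass"
```

Names such as "Jean-Claude Dupont" land in `manual` rather than being flagged, which matches the rule that a name fragment alone must not fail the check.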
Step 4 — Detect social / credential signals
Check author profile for: Twitter/X link, LinkedIn link, professional email, years of experience mentioned in bio, or domain expertise statement.
- 1+ signals present → 20 pts
- No signals → 0 pts
Author Authority Score Output
Author Authority Score: [0-100]
| Check | Result | Notes |
|-------|--------|-------|
| Author byline present | Pass ✅ / Fail ❌ | |
| Schema author.name matches byline | Pass ✅ / Mismatch ⚠️ / Fail ❌ | |
| Author profile page | Full ✅ / Partial ⚠️ / Missing ❌ | |
| No suspicious AI name | Pass ✅ / Flag ❌ | [matched keyword if flagged] |
| Social / credential signal | Present ✅ / Absent ⚠️ | |
2 · NewsArticle Schema Checklist Schema 审查清单
Core checks — high impact / 核心检查(高影响,默认 P1)
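Calibration note / 评级口径: The items below default to P1 (they affect visibility and rich-result quality). If misleading markup appears (for example, the author or publication time conflicts with the facts on the page), escalate to a P0 candidate and require manual confirmation.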
- [ ] @context = "https://schema.org" (not http://)
- [ ] @type = "NewsArticle" (matches actual content type)
- [ ] dateModified >= datePublished (must not be earlier)
- [ ] image URL contains no AI-tool markers (qwen_generated / ChatGPT Image / dall-e / midjourney)
- [ ] author.name matches the byline shown on the page (Schema ≠ AI agent while page shows human)
- [ ] author is a real person with a verifiable author page (not a team name or AI Agent)
- [ ] publisher.logo exists and is a valid ImageObject
Recommended — affects rich results / 建议补充项(影响富摘要)
- [ ] mainEntityOfPage points to the article URL
- [ ] description field present (populated from article summary)
- [ ] articleSection set
- [ ] BreadcrumbList last item has an "item" URL
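Several of the core checks are mechanical and could be automated along these lines. The sketch below covers only `@context`, `@type`, date ordering, AI-tool markers in image URLs, and `publisher.logo`; the function name is hypothetical, it assumes `publisher` is a single object, and it assumes timestamps carry a numeric UTC offset (older Python versions do not parse a trailing "Z" with `datetime.fromisoformat`). The author checks need the rendered page and are not covered.

```python
from datetime import datetime
from urllib.parse import unquote

AI_IMAGE_MARKERS = ("qwen_generated", "chatgpt image", "dall-e", "midjourney")


def check_news_article(schema):
    """Return a list of issue strings for a parsed NewsArticle JSON-LD dict."""
    issues = []
    if str(schema.get("@context", "")).rstrip("/") != "https://schema.org":
        issues.append('@context should be "https://schema.org"')
    if schema.get("@type") != "NewsArticle":
        issues.append('@type should be "NewsArticle"')
    published = schema.get("datePublished")
    modified = schema.get("dateModified")
    if published and modified:
        # Assumption: both values are ISO 8601 with the same offset style.
        if datetime.fromisoformat(modified) < datetime.fromisoformat(published):
            issues.append("dateModified is earlier than datePublished")
    images = schema.get("image") or []
    if not isinstance(images, list):
        images = [images]
    for image in images:
        # unquote() so "ChatGPT%20Image" in a URL still matches the marker.
        if any(marker in unquote(str(image)).lower() for marker in AI_IMAGE_MARKERS):
            issues.append(f"AI-tool marker in image URL: {image}")
    logo = (schema.get("publisher") or {}).get("logo")
    if not (isinstance(logo, dict) and logo.get("@type") == "ImageObject" and logo.get("url")):
        issues.append("publisher.logo missing or not a valid ImageObject")
    return issues
```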
3 · AI Content Checks AI 内容专项
| Issue / 问题 | Risk / 风险 | Fix / 修复 |
|---|---|---|
| Image filename contains AI tool name | Manipulated media policy violation | Rename on upload; strip AI-tool prefixes |
| Schema author ≠ page byline | Deceptive markup | Use the same real editor name in both |
| No human editor byline | Insufficient E-E-A-T | Establish human editorial attribution |
| AI Agent listed as author | Unverifiable authority | Replace with the reviewing editor |
Recommended attribution pattern for AI content / AI 内容推荐署名模式:
- Schema author → real editor's Person node
- Page display → "Generated by AI, reviewed by [Editor Name]" / "AI 生成,经 [编辑姓名] 审校"
- Editor's author page must include: real name, professional bio, contact
4 · Schema Fix Template 修复模板
{
"@context": "https://schema.org",
"@type": "NewsArticle",
"mainEntityOfPage": {
"@type": "WebPage",
"@id": "https://example.com/news/article-slug/"
},
"headline": "Article title / 文章标题",
"description": "100-200 char summary / 文章摘要",
"datePublished": "2026-03-03T19:55:26-05:00",
"dateModified": "2026-03-03T20:10:00-05:00",
"image": ["https://cdn.example.com/images/article-cover.jpg"],
"author": [{
"@type": "Person",
"name": "Editor Name / 编辑姓名",
"url": "https://example.com/author/editor-slug/",
"jobTitle": "Senior Editor"
}],
"publisher": {
"@type": "Organization",
"name": "Publication Name",
"url": "https://example.com",
"logo": {
"@type": "ImageObject",
"url": "https://example.com/logo.png",
"width": 600,
"height": 60
}
},
"articleSection": "Markets",
"isAccessibleForFree": true
}
5 · Systemic Bugs 系统性 Bug(批量修复)
If the same issue appears across multiple articles, fix at the template level. 若多篇文章存在相同问题,需从模板层修复,不要逐篇处理。
| Bug | Root cause / 根因 | Fix / 修复 |
|---|---|---|
| dateModified < datePublished | Field assignment order / 赋值顺序错误 | Set modified to last-edited timestamp |
| "http://schema.org" | Hardcoded old protocol / 模板写死旧协议 | Global replace → "https://schema.org" |
| publisher.logo missing | Not in template / 模板缺字段 | Add logo ImageObject to Schema template |
| description missing | Not mapped from summary / 未映射摘要 | Auto-populate from summary / excerpt |
| BreadcrumbList last item has no item | Template omission / 模板漏写 | Set last item's item = article URL |
| AI-tool name in image filename | No rename on upload / 上传未重命名 | Strip AI-tool prefixes in upload pipeline |
5.5 · Technical SEO 技术检查
| Check / 检查项 | Pass Condition / 通过条件 | Priority | Auto / Manual |
|---|---|---|---|
| robots.txt accessible | <domain>/robots.txt returns HTTP 200 | P1 | Auto |
| Googlebot-News not blocked | No Disallow rule under User-agent: Googlebot-News covering the article path | P0 | Auto |
| News Sitemap exists | Sitemap URL (from robots.txt or <domain>/news-sitemap.xml) is accessible and contains <news:news> tags | P1 | Auto |
| News Sitemap has <news:news> tags | Sitemap is valid per Google News Sitemap spec | P1 | Auto |
| News Sitemap freshness | At least one <news:publication_date> within the last 48 hours | P1 | Auto |
| News Sitemap Health Score | accessible (30 pts) + valid news namespace (40 pts) + 48h freshness (30 pts) | — | Auto |
| Crawlability Score | article text in HTML (30 pts) + server-rendered Schema (30 pts) + canonical valid (20 pts) + robots not blocking (20 pts) | — | Auto |
| Core Web Vitals — LCP | LCP < 2.5s (Good); 2.5–4s (Needs Improvement ⚠️); > 4s (Poor ❌) | P1 | 🔍 Manual |
| Core Web Vitals — INP | INP < 200ms (Good); 200–500ms (⚠️); > 500ms (❌) | P1 | 🔍 Manual |
| Core Web Vitals — CLS | CLS < 0.1 (Good); 0.1–0.25 (⚠️); > 0.25 (❌) | P1 | 🔍 Manual |
| HTTPS | Article URL uses https:// | P0 | Auto |
CWV verification / CWV 验证: Use PageSpeed Insights: https://pagespeed.web.dev/report?url=<url>
News Sitemap Health Score calculation:
Score = (accessible ? 30 : 0) + (news namespace valid ? 40 : 0) + (article within 48h ? 30 : 0)
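The health score formula above can be computed directly from the sitemap XML. The sketch below uses only the standard library; the function name is hypothetical, `now` must be a timezone-aware datetime supplied by the caller, dates without an offset are assumed to be UTC, and "valid news namespace" is approximated as "at least one `<news:news>` element in the Google News sitemap namespace".

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

NS = {"news": "http://www.google.com/schemas/sitemap-news/0.9"}


def news_sitemap_health(xml_text, now):
    """Return the 0-100 News Sitemap Health Score.

    xml_text: sitemap body as a string, or None if the sitemap was not accessible.
    now: timezone-aware datetime used as the reference point for the 48h window.
    """
    if xml_text is None:
        return 0
    score = 30  # accessible
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return score
    if root.find(".//news:news", NS) is None:
        return score
    score += 40  # news namespace in use
    cutoff = now - timedelta(hours=48)
    for element in root.iterfind(".//news:publication_date", NS):
        text = (element.text or "").strip().replace("Z", "+00:00")
        try:
            published = datetime.fromisoformat(text)
        except ValueError:
            continue  # unparseable date: ignore this entry
        if published.tzinfo is None:
            published = published.replace(tzinfo=timezone.utc)  # assumption: UTC
        if cutoff <= published <= now:
            return score + 30  # at least one article within 48 hours
    return score
```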
5.6 · On-Page SEO 文章页检查
Title Tag / 标题标签
| Check | Pass Condition | Priority |
|---|---|---|
| Title tag present | <title> element exists and is non-empty | P0 |
| Length 50–70 characters | Title length between 50 and 70 characters | P2 |
| Primary keyword near start | Key topic/entity within the first 60 characters | P2 |
| Unique (not duplicated) | Title is not identical to other pages (🔍 Manual) | P1 |
| No keyword stuffing | Same keyword not repeated > 2 times | P2 |
Meta Description
| Check | Pass Condition | Priority |
|---|---|---|
| Meta description present | <meta name="description"> exists and non-empty | P2 |
| Length 120–160 characters | Description length between 120 and 160 characters | P2 |
| Contains primary keyword | Primary topic/entity appears in description | P2 |
Canonical Tag
| Check | Pass Condition | Priority |
|---|---|---|
| Canonical tag present | <link rel="canonical"> exists | P1 |
| Points to correct URL | Canonical URL matches the article URL (self-referencing) or a valid canonical version | P1 / P0 if target returns non-200 |
Heading Structure
| Check | Pass Condition | Priority |
|---|---|---|
| Exactly one H1 | Page contains exactly one <h1> element | P2 |
| H1 matches headline | H1 text consistent with article headline | P2 |
| Logical hierarchy | No heading levels skipped (e.g., no H1 → H3 without H2) | P2 |
| H1 contains primary keyword | Primary topic/entity appears in H1 | P2 |
5.7 · Crawlability and Rendering Check 爬取与渲染检测
Layer 1 check. Verifies that Googlebot can access and read the article content without JavaScript execution.
Step 1 — Fetch with Googlebot UA
TMPFILE=$(mktemp /tmp/crawl_XXXXXX)
curl -sL \
-H "User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
--max-time 15 \
"<ARTICLE_URL>" > "$TMPFILE" 2>&1
echo "File size: $(wc -c < "$TMPFILE") bytes"
Step 2 — Detect article body in initial HTML
sed -i: macOS uses sed -i '' "s|…|…|g"; Linux (GNU) uses sed -i "s|…|…|g" (same as §1 Schema Phase 2).
cat > /tmp/check_body.py << 'PYEOF'
import re
html = open('TMPFILE_PATH').read()
clean = re.sub(r'<(script|style)[^>]*>.*?</(script|style)>', '', html, flags=re.DOTALL|re.IGNORECASE)
clean = re.sub(r'<[^>]+>', ' ', clean)
clean = re.sub(r'\s+', ' ', clean).strip()
print(f"Visible text length: {len(clean)} chars")
print(f"First 300 chars: {clean[:300]}")
PYEOF
sed -i '' "s|TMPFILE_PATH|$TMPFILE|g" /tmp/check_body.py
python3 /tmp/check_body.py
- ≥ 200 chars of visible text → article text present → 30 pts
- < 200 chars → flag: "Article text not detected in initial HTML — recommend verifying with Rich Results Test"
❌ Do NOT label as "client-side rendered" — only say "article text not detected in initial HTML"
Step 3 — Detect server-rendered Schema
Check whether raw HTML contains <script type="application/ld+json"> with "@type": "NewsArticle" before any dynamically loaded bundles.
| Result | Points | Note |
|---|---|---|
| NewsArticle Schema in raw HTML | 30 pts | Server-rendered ✅ |
| JSON-LD found but no NewsArticle type | 15 pts | Partial |
| No JSON-LD in raw HTML | 0 pts | Flag: "Potential hydration-only Schema — verify Next.js/Nuxt SSR is configured to pre-render Schema (use getServerSideProps or static export)" |
Step 4 — Verify canonical and robots
- Canonical tag present and pointing to article URL → 20 pts
- Not blocked by robots.txt for Googlebot or Googlebot-News → 20 pts
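The robots part of Step 4 can be checked offline once robots.txt has been fetched. The sketch below uses Python's standard `urllib.robotparser`; the function name `blocked_agents` is hypothetical, and the standard-library parser only approximates Google's own matching rules, so treat a surprising result as a prompt to verify in Search Console rather than as a verdict.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt, url, agents=("Googlebot", "Googlebot-News")):
    """Return {agent: True if robots.txt blocks that agent from url}.

    robots_txt: the raw text of the site's robots.txt.
    url: the article URL being audited.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}
```

If every value in the result is `False`, the check earns its 20 points; any `True` value for the article path is the P0 condition listed in §5.5.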
Crawlability Score Output
Crawlability Score: [0-100]
| Check | Result | Points | Notes |
|-------|--------|--------|-------|
| Article text in initial HTML | Pass ✅ / Fail ❌ | 30 / 0 | |
| Server-rendered Schema | Full ✅ / Partial ⚠️ / Absent ❌ | 30/15/0 | |
| Canonical tag valid | Pass ✅ / Fail ❌ | 20 / 0 | |
| Not blocked by robots | Pass ✅ / Fail ❌ | 20 / 0 | |
5.8 · URL Structure Analysis URL 结构分析
Layer 1 check. URL patterns affect Googlebot's ability to recognize and categorize news content.
Recommended patterns ✅
- /news/<descriptive-slug>
- /article/<descriptive-slug>
- /<year>/<month>/<descriptive-slug>
- /<category>/<descriptive-slug>
Problematic patterns ❌
| Pattern | Example | Priority | Recommendation |
|---|---|---|---|
| Query-parameter article ID | ?id=123, ?article_id=456, ?p=789, ?postid=1 | P1 | Migrate to path-based URL; implement 301 redirect |
| Numeric-only slug | /news/12345 (no keywords) | P2 | Add descriptive keywords to URL slug |
| Session ID in URL | ?session=abc, ?PHPSESSID= | P0 | Remove session parameters from indexable URLs |
| Tracking params without canonical | ?utm_source= without self-referencing canonical | P1 | Ensure canonical points to clean URL |
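The problematic patterns in the table can be detected with standard URL parsing. This is a sketch: the function name, the `has_clean_canonical` flag, and the exact parameter sets are assumptions drawn from the examples in the table, not an exhaustive list.

```python
from urllib.parse import parse_qs, urlsplit

ID_PARAMS = {"id", "article_id", "p", "postid"}
SESSION_PARAMS = {"session", "phpsessid"}


def url_issues(url, has_clean_canonical=False):
    """Return a list of (priority, issue) tuples for an article URL.

    has_clean_canonical: True if the page's canonical tag points to the URL
    without tracking parameters (checked separately by the caller).
    """
    parts = urlsplit(url)
    params = {key.lower() for key in parse_qs(parts.query, keep_blank_values=True)}
    issues = []
    if params & SESSION_PARAMS:
        issues.append(("P0", "session ID in URL"))
    if params & ID_PARAMS:
        issues.append(("P1", "query-parameter article ID"))
    if any(key.startswith("utm_") for key in params) and not has_clean_canonical:
        issues.append(("P1", "tracking params without canonical"))
    last_segment = parts.path.rstrip("/").rsplit("/", 1)[-1]
    if last_segment.isdigit():
        issues.append(("P2", "numeric-only slug"))
    return issues
```

An empty list corresponds to "Issues found: None" in the output format below.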
Output
URL Structure: [Recommended ✅ / Needs Improvement ⚠️ / Problematic ❌]
Detected pattern: [pattern description]
Issues found: [list or "None"]
5.9 · Freshness Signal Analysis 新鲜度信号分析
Layer 2 check. Google News heavily weights freshness; publication timing is a competitive ranking factor.
Step 1 — Validate datePublished / dateModified
| Check | Pass Condition | Priority | Points |
|---|---|---|---|
| datePublished present | ISO 8601 timestamp in Schema | P0 | required |
| dateModified ≥ datePublished | Modified timestamp not earlier than published | P0 | 30 pts if valid |
| No excessive modification | Not modified > 5× within 24 hours | P1 | deduct 15 if flagged |
Step 2 — Estimate publication speed
Compare datePublished with the article's <news:publication_date> in the News Sitemap (if available), or use HTTP Last-Modified / Date response headers as proxy.
| Speed | Condition | Points |
|---|---|---|
| Fast | Sitemap entry within 30 min of datePublished | 40 pts |
| Moderate | 30 min – 1 hour gap | 20 pts |
| Slow | > 1 hour gap | 0 pts |
Step 3 — Flag manipulation signals
- dateModified updated repeatedly with no detectable content change → flag as "possible freshness manipulation" (P1)
- datePublished set in the future → flag as P0 error
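The point values from Steps 1 to 3 combine into the Freshness Score as 30 (valid dates) + 40/20/0 (publication speed) + 30, or 15 when manipulation is flagged. A minimal sketch of that arithmetic, with a hypothetical function name and the assumption that all three timestamps are ISO 8601 strings with a numeric UTC offset:

```python
from datetime import datetime, timedelta


def freshness_score(date_published, date_modified, sitemap_date, manipulation_flagged):
    """Return the 0-100 Freshness Score from three ISO 8601 timestamps."""
    published = datetime.fromisoformat(date_published)
    modified = datetime.fromisoformat(date_modified)
    in_sitemap = datetime.fromisoformat(sitemap_date)

    # Step 1: dateModified must not be earlier than datePublished.
    validity = 30 if modified >= published else 0

    # Step 2: gap between datePublished and the sitemap publication_date.
    gap = abs(in_sitemap - published)
    if gap <= timedelta(minutes=30):
        speed = 40
    elif gap <= timedelta(hours=1):
        speed = 20
    else:
        speed = 0

    # Step 3: flagged manipulation deducts 15 from the 30 available points.
    manipulation = 15 if manipulation_flagged else 30
    return validity + speed + manipulation
```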
Freshness Score Output
Freshness Score: [0-100]
| Signal | Result | Points |
|--------|--------|--------|
| datePublished / dateModified valid | Pass ✅ / Fail ❌ | 30 / 0 |
| Publication speed | Fast ✅ / Moderate ⚠️ / Slow ❌ | 40/20/0 |
| No manipulation signals | Pass ✅ / Flagged ⚠️ | 30 / 15 |
5.10 · Content Type Classification 内容类型分类
Layer 2 check. Content type affects cluster placement, ranking duration, and competitive differentiation.
Step 1 — Classify article type
Evaluate based on headline, body length, quote count, and source references:
| Type | Signals | Base Score |
|---|---|---|
| Breaking News | Published within 2h of event; headline includes "Breaking", "Just In", "Developing"; body < 600 words | 90–100 |
| Analysis | Body 800+ words; 3+ named sources or citations; analytical headline ("Why", "How", "What This Means For") | 70–90 |
| Digest / Roundup | 5+ outbound links to sources; "Roundup", "Weekly", "Today's News" in headline | 40–60 |
| Aggregation | No direct quotes; no named reporter byline; summarizes other sources only | 20–40 |
Step 2 — Detect AI template patterns
Scan H2/H3 headings within the article body for:
Key Takeaways · Pros and Cons · Opposing Views · Summary · FAQ · Related Topics · Bottom Line
- Any pattern detected → flag as "AI template structure detected" (P1) → deduct 20 pts from base score
- Recommend restructuring to inverted-pyramid news format
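The heading scan in Step 2 could be sketched with a regular expression over the article HTML. The function names are hypothetical, and requiring the heading text to match a pattern exactly (rather than merely contain it) is an assumption chosen to limit false positives on ordinary headings such as "Summary of the ruling".

```python
import re

TEMPLATE_HEADINGS = {
    "key takeaways", "pros and cons", "opposing views", "summary",
    "faq", "related topics", "bottom line",
}


def detect_template_headings(html):
    """Return the H2/H3 heading texts that match a known template pattern."""
    found = []
    for inner in re.findall(r"<h[23][^>]*>(.*?)</h[23]>", html, flags=re.IGNORECASE | re.DOTALL):
        label = re.sub(r"<[^>]+>", "", inner).strip().lower()  # drop nested tags
        if label in TEMPLATE_HEADINGS:
            found.append(label)
    return found


def adjusted_content_score(base_score, html):
    """Deduct 20 points from the base score when any template heading is present."""
    if detect_template_headings(html):
        return max(0, base_score - 20)
    return base_score
```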
Content Type Score Output
Content Type: [Breaking News / Analysis / Digest / Aggregation]
AI Template Detected: [Yes ⚠️ / No ✅]
Content Type Score: [0-100]
5.11 · Publisher Authority Estimation 发布者权威估算
Layer 2 check. Google evaluates publisher-level authority when ranking articles in topic clusters.
Estimate publisher authority using available signals:
| Signal | How to check | Points |
|---|---|---|
| Article volume | Count total articles in News Sitemap; 10+ = strong, 3–9 = moderate, < 3 = low | 25 pts |
| Organization Schema | @type: Organization or NewsMediaOrganization present on homepage | 25 pts |
| Brand mentions | WebSearch "<publisher name>" news — check result count and source quality | 25 pts |
| Internal link structure | Article pages link to author pages, topic hubs, related articles | 25 pts |
Publisher Authority Score Output
Publisher Authority Score: [0-100]
| Signal | Result | Notes |
|--------|--------|-------|
| Article volume | Strong ✅ / Moderate ⚠️ / Low ❌ | [count] |
| Organization Schema | Present ✅ / Absent ❌ | |
| Brand mentions | Strong ✅ / Moderate ⚠️ / Weak ❌ | |
| Internal link structure | Good ✅ / Partial ⚠️ / Poor ❌ | |
5.12 · Topic Cluster Compatibility 话题聚类兼容性
Layer 2 check. Google News groups articles about the same event into topic clusters; these signals determine cluster membership.
Step 1 — Headline entity analysis
Check whether the headline contains at least one named entity (person, organization, location, financial instrument, or event name).
- Named entity present → 25 pts; note the detected entity
- Generic headline (no named entities) → 0 pts (P1): "Add specific entity names to headline for cluster matching"
Step 2 — Entity density in body
Count named entities per 100 words in the article body (use recognizable proper nouns as proxy):
- ≥ 1 entity per 100 words → 25 pts
- < 1 entity per 100 words → 0 pts (P1): "Increase named entity density — add specific company names, people, or locations"
Step 3 — Timeliness check
- Article published within 6 hours of the referenced event → 25 pts
- Published 6–24 hours after → 15 pts
- Published > 24 hours after → 0 pts
Step 4 — Original reporting signals
Check for any of:
- Direct quoted speech with attribution ("…" said [Name])
- Exclusive data, statistics, or documents
- Bylined reporter with a domain-specific author profile

- 1+ signals present → 25 pts
- None detected → 0 pts (P1): "Add original reporting signals — direct quotes or exclusive data"
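The four steps contribute up to 25 points each. A sketch of the combined score follows; the function name and inputs are hypothetical, entity and word counts are assumed to come from an earlier extraction step, and any density below 1 entity per 100 words is treated as low.

```python
def cluster_score(headline_has_entity, entity_count, word_count,
                  hours_after_event, has_original_reporting):
    """Return the 0-100 Topic Cluster Compatibility Score."""
    score = 25 if headline_has_entity else 0

    # Integer comparison for ">= 1 entity per 100 words" avoids float rounding.
    if word_count > 0 and entity_count * 100 >= word_count:
        score += 25

    if hours_after_event <= 6:
        score += 25
    elif hours_after_event <= 24:
        score += 15

    if has_original_reporting:
        score += 25
    return score
```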
Topic Cluster Compatibility Score Output
Topic Cluster Compatibility Score: [0-100]
| Signal | Result | Points |
|--------|--------|--------|
| Headline named entity | Present ✅ / Absent ❌ | 25 / 0 |
| Entity density | High ✅ / Low ❌ | 25 / 0 |
| Timeliness | < 6h ✅ / 6-24h ⚠️ / > 24h ❌ | 25/15/0 |
| Original reporting | Present ✅ / Absent ❌ | 25 / 0 |
5.13 · Top Stories Detection Top Stories 检测
Layer 2 check. Determines whether the article topic triggers a Google Top Stories carousel and whether the analyzed site appears in it.
Step 1 — Extract topic keyword
From the article headline, identify the primary entity or event name. Use the most specific named entity (e.g., "Apple WWDC 2026" rather than "tech event").
Step 2 — Search for Top Stories carousel
Use WebSearch to search the topic keyword. Analyze results for:
- A "Top Stories", "News", or "In the News" carousel module
- Publisher names and headlines appearing in the carousel
- Timestamps of carousel articles
| Result | Output |
|---|---|
| Top Stories carousel found | Topic triggers Top Stories; extract publisher list |
| No carousel found | "No Top Stories carousel detected for this topic" |
Step 3 — Compare site against carousel publishers
Check whether the analyzed domain appears in the list of carousel publishers.
| Result | Status |
|---|---|
| Analyzed site in carousel | Top Stories Presence: Confirmed ✅ |
| Analyzed site absent, competitors present | Top Stories Presence: Gap detected ❌ |
| No carousel exists | Top Stories Presence: Topic not triggering carousel ⚠️ |
Output
Top Stories Presence: [Confirmed ✅ / Gap detected ❌ / Not triggering ⚠️]
Competitors in carousel:
| Publisher | Headline | Published |
|-----------|----------|-----------|
| [name] | [headline] | [time] |
6 · Output Report Format 总结报告格式
After analysis, output the report in the following structure. 分析完成后按对应结构输出。
Full Diagnostic mode (Section 10 triggered): System Model → Dual-Layer Scorecard → Executive Summary → Detailed Check Tables → Competitor Gap Analysis → Priority Fix List
Article / Scoped audit mode: Executive Summary → Detailed Check Tables → Priority Fix List
Dual-Layer Scorecard (Full Diagnostic only)
Output at the top of Full Diagnostic reports, before the Executive Summary.
## Google News Diagnostic Report
**Google News SEO Score: XX / 100** [🟢 Strong / 🟡 Developing / 🔴 At Risk]
| Layer | Score | Status |
|-------|-------|--------|
| Layer 1 — Index Eligibility | XX/100 | Pass ✅ / Partial ⚠️ / Fail ❌ |
| Layer 2 — Ranking Potential | XX/100 | Pass ✅ / Partial ⚠️ / Fail ❌ |
**Diagnosis**: [plain-language statement — see Section 10 for routing logic]
Executive Summary 执行摘要
Output this section first (after Dual-Layer Scorecard if present), before all detailed tables.
Overall Health / 总体健康度:
| Condition | Rating |
|---|---|
| 0 P0 issues AND ≤ 2 P1 issues | 🟢 Good |
| 1–2 P0 issues OR 3–5 P1 issues | 🟡 Needs Work |
| 3+ P0 issues OR 6+ P1 issues | 🔴 Critical |
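The rating table resolves cleanly if the most severe condition is tested first. A minimal sketch (the function name is hypothetical):

```python
def overall_health(p0_count, p1_count):
    """Map P0/P1 issue counts to the Overall Health rating."""
    if p0_count >= 3 or p1_count >= 6:
        return "🔴 Critical"
    if p0_count >= 1 or p1_count >= 3:
        return "🟡 Needs Work"
    return "🟢 Good"  # 0 P0 issues and at most 2 P1 issues
```

Checking Critical before Needs Work matters for mixed cases such as one P0 issue with six P1 issues, which should rate as Critical.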
Top Issues / 优先问题(最多 5 项):
- [P0] **[Area]**: Brief description of issue
- [P1] **[Area]**: Brief description of issue
If more than 5 issues exist, show the 5 most severe and add: "See full findings below for all issues." If no issues found: "No critical issues found ✅"
Quick Wins / 快速修复(最多 3 项,低成本高收益):
- **[Fix name]**: One-sentence instruction (e.g., "Add publisher.name to NewsArticle Schema")
Quick Win criteria: fixable in < 30 minutes, no code deployment required (e.g., CMS field edit, meta tag addition, file rename). If none qualify: "No quick wins identified — remaining issues require development work."
Executive Summary template / 执行摘要模板:
### Executive Summary
**Overall Health**: 🟢 Good / 🟡 Needs Work / 🔴 Critical
**Scope**: [what was audited]
**Top Issues**
- [P0] **Schema**: dateModified is earlier than datePublished
- [P1] **On-Page**: Meta description missing on article page
- [P1] **Technical**: No News Sitemap found
**Quick Wins**
- **Fix dateModified**: Set dateModified to the last-edited timestamp in your CMS Schema output
- **Add meta description**: Map the article excerpt/summary field to meta description in your theme template
Detailed Findings / 逐项检查表
Table / 表格:
| Check item / 检查项 | Result / 结果 | Notes / 说明 |
|---|---|---|
| (item) | Pass ✅ / Fail ❌ / Manual 🔍 | Issue description or fix suggestion |
Priority Fix List / 优先级修复列表
- P0 — blocks indexing or violates content policy / 影响收录或违反内容政策
- P1 — affects rich results or EEAT signal strength / 影响富摘要或 EEAT 信号
- P2 — best practice / 规范性
6.5 · Competitor Gap Analysis 竞争对手差距分析
Run during Full Diagnostic or when the user asks "why do competitors rank higher?" / "为什么竞争对手排名更高?"
Step 1 — Identify top competitors
Use WebSearch to search: "<article topic>" news
Extract the top 3–5 news publisher results: domain, headline, publish timestamp.
Competitors detected:
| Rank | Publisher | Headline | Published |
|------|-----------|----------|-----------|
| 1 | [domain] | [headline] | [time] |
Step 2 — Fetch competitor articles
For each competitor URL, use the three-phase fetch protocol (web_fetch → curl → Manual) to extract:
- datePublished from Schema or page
- NewsArticle Schema completeness: count present required fields out of 9 (@type, headline, image, datePublished, dateModified, author, publisher, publisher.logo, mainEntityOfPage)
- Author authority: named author with linked profile page (Full / Partial / None)
Step 3 — Compute gap metrics
Publication speed gap:
Your site: datePublished → [timestamp]
Earliest competitor: [publisher] → [timestamp]
Gap: [X minutes / hours] → [Advantage ✅ / Disadvantage ❌]
Full comparison table:
| Publisher | Schema Completeness | Author Authority | Publish Speed |
|-----------|---------------------|------------------|---------------|
| Your site | XX% (X/9 fields) | Full / Partial / None | [timestamp] |
| [Competitor 1] | XX% | Full / Partial / None | [timestamp] |
| [Competitor 2] | XX% | Full / Partial / None | [timestamp] |
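The two numeric gap metrics (Schema completeness as X/9 fields and the publication speed gap) could be computed along these lines. This is a sketch under stated assumptions: the helper names are hypothetical, the JSON-LD is assumed to be already parsed into a `dict`, timestamps are assumed to be ISO 8601 strings that both carry a UTC offset, and rounding the percentage down is an assumption because the source does not specify a rounding rule.

```python
from datetime import datetime

# The nine fields listed in Step 2; a dotted name denotes a nested property.
CHECKED_FIELDS = [
    "@type", "headline", "image", "datePublished", "dateModified",
    "author", "publisher", "publisher.logo", "mainEntityOfPage",
]


def has_field(schema: dict, dotted: str) -> bool:
    """Return True if the (possibly nested) field is present and non-empty."""
    node = schema
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return node not in (None, "", [], {})


def schema_completeness(schema: dict) -> tuple:
    """Return (present_count, percent) over the nine checked fields."""
    present = sum(has_field(schema, name) for name in CHECKED_FIELDS)
    # Assumption: round down using integer arithmetic.
    return present, present * 100 // len(CHECKED_FIELDS)


def speed_gap_minutes(own_published: str, competitor_published: str) -> float:
    """Positive result means your article went live after the competitor's."""
    own = datetime.fromisoformat(own_published)
    competitor = datetime.fromisoformat(competitor_published)
    return (own - competitor).total_seconds() / 60


article = {
    "@type": "NewsArticle",
    "headline": "Example headline",
    "image": "https://example.com/lead.jpg",
    "datePublished": "2024-05-01T10:30:00+00:00",
    "author": {"@type": "Person", "name": "Example Author"},
    "publisher": {"@type": "Organization", "name": "Example Publisher"},
    "mainEntityOfPage": "https://example.com/article/",
}
print(schema_completeness(article))
print(speed_gap_minutes(article["datePublished"], "2024-05-01T10:00:00+00:00"))
```

In this example dateModified and publisher.logo are missing, so 7 of 9 fields are present, and the article trails the competitor by 30 minutes (a Disadvantage ❌ in the template above).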
Step 4 — Output gap analysis summary
## Competitor Gap Analysis
**Topic**: [search topic]
**Competitors analyzed**: [N]
### Key Gaps
**Publication Speed**
Your site: [X min after event]
Competitors avg: [Y min after event]
Gap: [difference] → [recommendation if gap > 15 min]
**Schema Completeness**
Your site: [XX%]
Competitors avg: [XX%]
Gap: [difference] → [recommendation if gap > 10 percentage points]
**Author Authority**
[Competitors provide full author profiles / Your site matches competitor standard]
### Recommendations
- [Specific action to close each identified gap]
7 · EEAT Scan Trigger and Input Handling 触发与输入处理
Trigger words / 触发词:
EEAT 扫描 / Run EEAT scan / 扫描 EEAT / EEAT audit / EEAT 审计 / 做个 EEAT 扫描
Step 1 — Read the signal checklist / 第一步:读取检查项清单
Before scanning, read eeat-reference.md (same directory as this file) to load all 24 signal definitions.
Read: eeat-reference.md
Step 2 — Determine input type / 第二步:判断输入类型
| Input | Action |
|---|---|
| Live URL / 线上 URL | Use web_fetch to fetch the page; extract full page HTML, JSON-LD, and visible text |
| Raw HTML / 原始 HTML | Parse provided HTML directly; no fetch needed |
| URL unreachable / 无法抓取 | Mark all signals that require live page inspection as 🔍 Manual; proceed with available information |
Step 3 — Execute dimensional scans / 第三步:按维度执行扫描
Run the four dimensions in order: Experience → Expertise → Authoritativeness → Trustworthiness
For each signal in eeat-reference.md:
- Check the pass condition against the fetched page content
- Record result: Pass ✅ / Fail ❌ / 🔍 Manual
- For Fail results, note what was found and what is expected
- For Manual signals, note what requires manual verification
8 · EEAT Scoring Algorithm EEAT 评分算法
Per-dimension score / 维度评分:
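Dimension score = floor(passed signals in the dimension / effective signals in the dimension × 100)
- Effective signals = total signals − 🔍 Manual signals (Manual signals are excluded from the denominator)
- Each of the four dimensions is scored independently
Total score / 总分:
Total score = floor((Experience + Expertise + Authoritativeness + Trustworthiness) / 4)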
Rating labels / 评级标签:
| Score range | Label |
|---|---|
| 80–100 | Good / 良好 ✅ |
| 50–79 | Needs improvement / 需改进 ⚠️ |
| 0–49 | Poor / 差 ❌ |
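The scoring rules in this section can be illustrated with a short sketch. The function names and the `"pass"` / `"fail"` / `"manual"` result strings are hypothetical. Returning `None` when every signal in a dimension is Manual is an assumption, since the formula leaves that case (a zero denominator) undefined. Integer floor division is used in place of `floor()` on a float to avoid rounding artifacts.

```python
from typing import List, Optional


def dimension_score(results: List[str]) -> Optional[int]:
    """Score one dimension from per-signal results: 'pass', 'fail', 'manual'."""
    effective = [r for r in results if r != "manual"]  # Manual leaves the denominator
    if not effective:
        # Assumption: no numeric score when every signal needs manual review.
        return None
    return effective.count("pass") * 100 // len(effective)


def total_score(experience: int, expertise: int,
                authoritativeness: int, trustworthiness: int) -> int:
    """Floor of the unweighted mean of the four dimension scores."""
    return (experience + expertise + authoritativeness + trustworthiness) // 4


def rating_label(score: int) -> str:
    """Map a 0-100 score to the rating bands in the table above."""
    if score >= 80:
        return "Good"
    if score >= 50:
        return "Needs improvement"
    return "Poor"


# Two passes, two fails and one Manual signal: 2 of 4 effective signals pass.
experience = dimension_score(["pass", "fail", "pass", "manual", "fail"])
print(experience, rating_label(experience))
print(total_score(experience, 80, 75, 90))
```

With these inputs the Experience dimension scores 50, and the four dimension scores 50, 80, 75 and 90 average to 73.75, which floors to 73.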
9 · EEAT Report Format EEAT 报告格式
After completing all scans, output the report in the following structure. Use the same language as the user's prompt throughout (Chinese prompt → Chinese report; English prompt → English report).
Report template / 报告模板:
## EEAT 扫描报告
**扫描对象**:https://example.com/article/ (或 "Raw HTML input")
**扫描日期**:YYYY-MM-DD
**总分:XX / 100**
---
### 维度总览
| 维度 | 得分 | 评级 |
|------|------|------|
| 经验 (Experience) | XX | 良好 ✅ / 需改进 ⚠️ / 差 ❌ |
| 专业度 (Expertise) | XX | ... |
| 权威性 (Authoritativeness) | XX | ... |
| 可信度 (Trustworthiness) | XX | ... |
---
### 经验 (Experience) — XX/100
| 检查项 | 结果 | 说明 |
|--------|------|------|
| 第一手内容标识 | Pass ✅ | 符合要求 |
| 经历日期明确 | Fail ❌ | 未注明具体经历时间 |
| 作者署名可见 | Pass ✅ | 符合要求 |
| 作者简介链接存在 | 🔍 Manual | 需手动核查作者页是否可访问 |
| 原创媒体 | Fail ❌ | 图片文件名含 "dall-e" |
### 专业度 (Expertise) — XX/100
| 检查项 | 结果 | 说明 |
|--------|------|------|
| ... | ... | ... |
### 权威性 (Authoritativeness) — XX/100
| 检查项 | 结果 | 说明 |
|--------|------|------|
| ... | ... | ... |
### 可信度 (Trustworthiness) — XX/100
| 检查项 | 结果 | 说明 |
|--------|------|------|
| ... | ... | ... |
---
### 行动建议
**P0 — 立即修复(影响收录或违反内容政策)**
- **[经验]** 原创媒体 — 将图片文件名中的 "dall-e" 前缀去除,上传流程中禁止保留 AI 工具名称
- **[可信度]** HTTPS — 将站点迁移至 HTTPS 并配置 301 重定向
**P1 — 应尽快修复**
- **[专业度]** 作者资质说明 — 在作者简介中补充职业背景或领域经验描述
- **[权威性]** 发布方名称缺失 — 在 NewsArticle Schema 的 publisher.name 字段填写机构名称
**P2 — 建议跟进**
- **[专业度]** 内容深度 — 文章不足 500 字,建议扩充至覆盖 3 个以上子议题
10 · Google News Diagnostic Report 完整诊断报告
Trigger: Run this section when scope is "Full Diagnostic", or when the user asks why their site is not in Google News / why articles don't rank / why competitors outrank them.
Score Aggregation / 评分聚合
Each Layer 1 and Layer 2 sub-metric below is scored 0–100 first (use the checklists in earlier sections; three-state checks map to 100 / 50 / 0 as each subsection defines).
Layer 1 Score (0–100) — weights within Layer 1 sum to 100%:
Layer1 = 0.25×PublisherTrust + 0.25×AuthorAuthority + 0.25×SchemaHealth
+ 0.17×NewsSitemapHealth + 0.08×Crawlability
| Sub-check | Weight (of Layer 1) |
|---|---|
| Publisher Trust Score | 25% |
| Author Authority Score | 25% |
| Schema Health (completeness % as 0–100) | 25% |
| News Sitemap Health Score | 17% |
| Crawlability Score | 8% |
Layer 2 Score (0–100) — weights within Layer 2 sum to 100% (renormalized from 15∶10∶10∶5):
Layer2 = 0.375×Freshness + 0.25×ContentType + 0.25×TopicClusterCompat
+ 0.125×TopStoriesSignal
| Sub-check | Weight (of Layer 2) |
|---|---|
| Freshness Score | 37.5% |
| Content Type Score | 25% |
| Topic Cluster Compatibility Score | 25% |
| Top Stories Presence (fixed mapping: Confirmed→100, Gap→0, Not triggering→50) | 12.5% |
Final Google News SEO Score (0–100):
Google News SEO Score = 0.6 × Layer1 + 0.4 × Layer2
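(Layer 1 contributes 60% of the final score and Layer 2 contributes 40%. The percentages in the tables above are within-layer weights used to compose Layer1 and Layer2; do not confuse them with the 60/40 split.)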
Rating labels:
| Score | Label |
|---|---|
| 80–100 | 🟢 Strong |
| 50–79 | 🟡 Developing |
| 0–49 | 🔴 At Risk |
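A minimal sketch of the aggregation above, assuming each sub-score has already been computed on a 0–100 scale. The function names and dictionary layout are hypothetical; the weights, the Top Stories mapping and the 60/40 split are copied from this section. The source does not say whether the final score is rounded, so the sketch leaves it unrounded and compares it to the band thresholds directly.

```python
# Within-layer weights copied from the two tables above.
LAYER1_WEIGHTS = {
    "PublisherTrust": 0.25,
    "AuthorAuthority": 0.25,
    "SchemaHealth": 0.25,
    "NewsSitemapHealth": 0.17,
    "Crawlability": 0.08,
}
LAYER2_WEIGHTS = {
    "Freshness": 0.375,
    "ContentType": 0.25,
    "TopicClusterCompat": 0.25,
    "TopStoriesSignal": 0.125,
}
# Fixed mapping for the Top Stories Presence check.
TOP_STORIES_MAPPING = {"Confirmed": 100, "Gap": 0, "Not triggering": 50}


def layer_score(scores: dict, weights: dict) -> float:
    """Weighted sum of 0-100 sub-scores; fails loudly if one is missing."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing sub-scores: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


def final_score(layer1: float, layer2: float) -> float:
    """Layer 1 carries 60% of the final score, Layer 2 carries 40%."""
    return 0.6 * layer1 + 0.4 * layer2


def rating_label(score: float) -> str:
    # Assumption: fractional scores fall into the band of their lower bound.
    if score >= 80:
        return "Strong"
    if score >= 50:
        return "Developing"
    return "At Risk"


layer1 = layer_score({name: 80 for name in LAYER1_WEIGHTS}, LAYER1_WEIGHTS)
layer2 = layer_score(
    {
        "Freshness": 60,
        "ContentType": 80,
        "TopicClusterCompat": 40,
        "TopStoriesSignal": TOP_STORIES_MAPPING["Gap"],
    },
    LAYER2_WEIGHTS,
)
score = final_score(layer1, layer2)
print(round(layer1, 1), round(layer2, 1), round(score, 1), rating_label(score))
```

In this example Layer 1 works out to 80 and Layer 2 to 52.5 (22.5 + 20 + 10 + 0), so the final score is 0.6 × 80 + 0.4 × 52.5 = 69, which falls in the Developing band.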
Diagnosis Routing / 诊断结论路由
| Condition | Diagnosis Statement |
|---|---|
| Layer 1 score < 50 | "Primary issue: This site is likely not eligible for Google News indexing. Fix Layer 1 issues before optimizing for ranking." |
| Layer 1 50–69 (Partial) | "Site has partial Google News index presence. Resolve remaining Layer 1 gaps to achieve consistent indexing, then address Layer 2 ranking." |
| Layer 1 ≥ 70, Layer 2 < 50 | "Site is indexed but articles are not competitive in Google News clusters. Focus on Layer 2 ranking improvements." |
| Layer 1 ≥ 70, Layer 2 ≥ 70 | "Site and articles meet Google News baseline requirements. Competitor gap analysis shows remaining optimization opportunities." |
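The routing table can be expressed as a small decision function. This is an illustrative sketch with hypothetical names: it returns a short key for each row, not the full diagnosis statement. One combination, Layer 1 ≥ 70 with Layer 2 between 50 and 69, is not covered by the table, so the sketch returns `None` for it and does not invent a statement.

```python
from typing import Optional


def route_diagnosis(layer1: float, layer2: float) -> Optional[str]:
    """Return a key for the matching routing-table row, or None if no row applies."""
    if layer1 < 50:
        return "layer1_not_eligible"      # row 1: fix Layer 1 before ranking work
    if layer1 < 70:
        return "layer1_partial"           # row 2: partial index presence
    if layer2 < 50:
        return "layer2_not_competitive"   # row 3: indexed but not competitive
    if layer2 >= 70:
        return "baseline_met"             # row 4: move on to competitor gaps
    # Layer 1 >= 70 and 50 <= Layer 2 < 70: the table defines no statement here.
    return None


print(route_diagnosis(45, 90))
print(route_diagnosis(82, 60))
```

Layer 1 is evaluated first, matching the rule that Layer 1 must pass before Layer 2 matters.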
Full Diagnostic Report Template / 完整诊断报告模板
## Google News Diagnostic Report
**Analyzed**: [URL or domain]
**Date**: [YYYY-MM-DD]
**Google News SEO Score**: XX / 100 🟢 Strong / 🟡 Developing / 🔴 At Risk
---
### Dual-Layer Scorecard
| Layer | Score | Status |
|-------|-------|--------|
| Layer 1 — Index Eligibility | XX/100 | Pass ✅ / Partial ⚠️ / Fail ❌ |
| Layer 2 — Ranking Potential | XX/100 | Pass ✅ / Partial ⚠️ / Fail ❌ |
**Diagnosis**: [statement from routing table above]
---
### Layer 1 — Index Eligibility
| Check | Score | Status |
|-------|-------|--------|
| News Index Status | — | Not Indexed ❌ / Limited ⚠️ / Strong ✅ |
| Publisher Trust | XX/100 | Pass / Partial / Fail |
| Author Authority | XX/100 | Pass / Partial / Fail |
| Schema Health | XX% complete | Pass / Partial / Fail |
| News Sitemap Health | XX/100 | Pass / Partial / Fail |
| Crawlability | XX/100 | Pass / Partial / Fail |
| URL Structure | — | Recommended ✅ / Issues ⚠️ |
---
### Layer 2 — Ranking Potential
| Check | Score | Status |
|-------|-------|--------|
| Freshness | XX/100 | Fast ✅ / Moderate ⚠️ / Slow ❌ |
| Content Type | [type] / XX pts | Breaking / Analysis / Digest / Aggregation |
| Publisher Authority | XX/100 | Strong / Moderate / Weak |
| Topic Cluster Compatibility | XX/100 | High / Medium / Low |
| Top Stories Presence | — | Confirmed ✅ / Gap ❌ / Not triggering ⚠️ |
---
### Competitor Gap Analysis Summary
[See Section 6.5 output]
---
### Priority Fix List
**P0 — Fix immediately (blocks indexing)**
- [item]
**P1 — Fix soon (affects ranking)**
- [item]
**P2 — Best practice**
- [item]
References 参考资源
- Google News ranking factors, optimization strategies, AI content policy, News Sitemap examples, two-layer architecture model, Topic Cluster signals: see reference.md
- EEAT signal definitions and priority table: see eeat-reference.md