blog-factcheck

SKILL.md

Blog Fact-Check

Verify statistics, claims, and source attributions in blog posts. Pure Claude pipeline with no external NLP dependencies.

Workflow

Step 1: Read the Blog Post

Read the target file and identify all sections containing data claims.

Step 2: Extract Statistical Claims

Scan the full text for every claim that includes a number, percentage, dollar amount, or named source. Build a claims list with these fields:

Field Description
claim_text The exact sentence or phrase containing the statistic
value The numeric value (e.g., "42%", "$1.2M", "3x")
attribution Named source if present (e.g., "HubSpot", "Gartner 2025")
url Cited URL if present (from markdown link or parenthetical)
location Heading or line number where the claim appears

Step 3: Verify Cited Claims

For each claim that includes a URL:

  1. Fetch the source page via WebFetch
  2. Search the returned content for the specific numeric value
  3. If exact value found, check surrounding context matches the claim topic
  4. Assign a confidence score (see Verification Scoring below)

Process claims sequentially to avoid rate-limiting source sites.

Step 4: Flag Uncited Claims

For claims without a URL:

  • Mark status as UNVERIFIED
  • Suggest a search query the user can run to find a source
  • If the attribution names a specific organization, suggest their domain

Step 5: Generate Verification Report

Output the full results table, summary statistics, and recommended actions.

Claim Extraction Patterns

Identify claims matching these structures:

Fully cited (highest priority):

  • [Number]% [claim] ([Source], [Year]) - parenthetical citation
  • [claim] [Number]% ... [markdown link to source] - inline link
  • According to [Source], [Number]... - attribution lead

Uncited statistics (flag for sourcing):

  • [Number]% of [noun phrase] - standalone percentage
  • [Number]x more/less/higher/lower - multiplier claims
  • $[Number] [claim] - dollar figures without attribution

Weak signals (check context before extracting):

  • studies show, research indicates, data suggests + nearby number
  • survey found, report reveals, analysis shows + nearby number
  • Round numbers in isolation (e.g., "millions of users") - skip unless specific

Verification Scoring

Score Status Criteria
1.0 VERIFIED Exact number found on cited page in matching context
0.7-0.9 PARAPHRASE Similar data found but with different wording, rounding, or timeframe
0.3-0.6 WEAK Source page exists and covers the topic but the specific statistic is not visible
0.0 NOT FOUND Cited page does not contain the claimed data anywhere
N/A UNVERIFIED No source URL provided for the claim

Scoring guidance:

  • A claim of "43%" when the source says "nearly half" scores 0.8
  • A claim of "2024" data when the source only has "2023" scores 0.7
  • A claim citing a homepage when the stat lives on a subpage scores 0.3
  • A 404 or unreachable URL scores 0.0

Output Format

Verification Report: [Post Title]

File: [path] Claims found: [total] Verified: [count] | Paraphrase: [count] | Weak: [count] | Not Found: [count] | Unverified: [count]

# Claim Source URL Score Status Notes
1 "73% of marketers..." https://example.com/report 1.0 VERIFIED Exact match found in section 3
2 "5x ROI improvement" https://example.com/study 0.8 PARAPHRASE Source says "nearly 5x"
3 "60% prefer video" (none) N/A UNVERIFIED Try: "video preference statistics 2025"

Recommended Actions

  • [List claims that need source URLs]
  • [List claims with weak or not-found scores that need replacement sources]
  • [List claims where the source data may be outdated]

Integration

This skill can be called from blog-analyze as an optional deep-verification step. When invoked from the analyzer, only claims scoring below 0.7 are flagged in the analysis report.

Standalone usage: /blog factcheck path/to/post.md

Limitations

  • Paywalled content: WebFetch cannot access content behind login walls. These score as WEAK (0.5) with a note about paywall detection.
  • Dynamic pages: JavaScript-rendered content may not be available via WebFetch. If the page returns minimal content, note this in the status.
  • PDF sources: WebFetch may not extract PDF text reliably. Flag PDF URLs for manual verification.
  • Archived pages: If a URL returns 404, suggest checking web.archive.org.
  • Rate limits: Process no more than 10 URLs per run to avoid overwhelming source servers. If a post has more than 10 cited URLs, verify the first 10 and list the remainder as SKIPPED.
Weekly Installs
6
GitHub Stars
275
First Seen
2 days ago
Installed on
cursor6
opencode5
gemini-cli5
claude-code5
github-copilot5
codex5