blog-factcheck
Blog Fact-Check
Verify statistics, claims, and source attributions in blog posts. Pure Claude pipeline with no external NLP dependencies.
Workflow
Step 1: Read the Blog Post
Read the target file and identify all sections containing data claims.
Step 2: Extract Statistical Claims
Scan the full text for every claim that includes a number, percentage, dollar amount, or named source. Build a claims list with these fields:
| Field | Description |
|---|---|
| claim_text | The exact sentence or phrase containing the statistic |
| value | The numeric value (e.g., "42%", "$1.2M", "3x") |
| attribution | Named source if present (e.g., "HubSpot", "Gartner 2025") |
| url | Cited URL if present (from markdown link or parenthetical) |
| location | Heading or line number where the claim appears |
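As an illustration only, one entry in the claims list could be modelled as a small Python record. The skill itself is carried out by Claude rather than by a script, so the class name `Claim` and the `is_cited` helper are assumptions made for this sketch. The field names come from the table above, and the sample values reuse the placeholder rows from the Output Format section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    """One extracted claim; field names mirror the claims table."""
    claim_text: str                    # exact sentence or phrase containing the statistic
    value: str                         # numeric value as written, e.g. "42%"
    attribution: Optional[str] = None  # named source, if present
    url: Optional[str] = None          # cited URL, if present
    location: Optional[str] = None     # heading or line number

    @property
    def is_cited(self) -> bool:
        # Hypothetical helper: claims with a URL go to Step 3, the rest to Step 4.
        return bool(self.url)


# Placeholder values, not real data.
cited = Claim("73% of marketers...", "73%", url="https://example.com/report")
uncited = Claim("60% prefer video", "60%")
```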
Step 3: Verify Cited Claims
For each claim that includes a URL:
- Fetch the source page via WebFetch
- Search the returned content for the specific numeric value
- If the exact value is found, check that the surrounding context matches the topic of the claim
- Assign a confidence score (see Verification Scoring below)
Process claims sequentially, one fetch at a time, to avoid overloading source sites or triggering their rate limits.
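The value search and context check in Step 3 can be sketched in Python. This only illustrates the logic; the skill performs the check through Claude and WebFetch. The function name, the `topic_terms` argument, and the 200-character window are all assumptions introduced for the example.

```python
import re
from typing import Sequence


def find_value_in_context(
    page_text: str, value: str, topic_terms: Sequence[str], window: int = 200
) -> bool:
    """Return True if `value` appears in page_text with a topic term nearby."""
    # Match the value as a whole token so "42%" does not match inside "142%".
    pattern = re.compile(r"(?<![\w.])" + re.escape(value) + r"(?![\w%])")
    lowered_terms = [term.lower() for term in topic_terms]
    for match in pattern.finditer(page_text):
        start = max(0, match.start() - window)
        end = match.end() + window
        context = page_text[start:end].lower()
        # Assumed context test: at least one topic term within the window.
        if any(term in context for term in lowered_terms):
            return True
    return False
```

A hit from this kind of check corresponds to the "exact value found in matching context" case; a value found without any topic term nearby would still need a closer read before scoring.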
Step 4: Flag Uncited Claims
For claims without a URL:
- Mark status as UNVERIFIED
- Suggest a search query the user can run to find a source
- If the attribution names a specific organization, suggest their domain
Step 5: Generate Verification Report
Output the full results table, summary statistics, and recommended actions.
Claim Extraction Patterns
Identify claims matching these structures:
Fully cited (highest priority):
- [Number]% [claim] ([Source], [Year]) - parenthetical citation
- [claim] [Number]% ... [markdown link to source] - inline link
- According to [Source], [Number]... - attribution lead
Uncited statistics (flag for sourcing):
- [Number]% of [noun phrase] - standalone percentage
- [Number]x more/less/higher/lower - multiplier claims
- $[Number] [claim] - dollar figures without attribution
Weak signals (check context before extracting):
- "studies show", "research indicates", "data suggests" + nearby number
- "survey found", "report reveals", "analysis shows" + nearby number
- Round numbers in isolation (e.g., "millions of users") - skip unless specific
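For readers who want a mechanical approximation of the uncited-statistic and weak-signal patterns, the following Python sketch expresses them as regular expressions. The regexes and function names are assumptions for illustration, and "nearby number" is simplified here to "any digit in the same sentence"; the skill itself relies on Claude reading the text, not on these expressions.

```python
import re

# Assumed approximations of the three uncited-statistic structures.
PERCENT = re.compile(r"\b\d+(?:\.\d+)?%")
MULTIPLIER = re.compile(
    r"\b\d+(?:\.\d+)?x\s+(?:more|less|higher|lower)\b", re.IGNORECASE
)
DOLLAR = re.compile(r"\$\d[\d,]*(?:\.\d+)?[KMB]?\b")

WEAK_SIGNAL = re.compile(
    r"\b(?:studies show|research indicates|data suggests|"
    r"survey found|report reveals|analysis shows)\b",
    re.IGNORECASE,
)


def extract_values(sentence: str) -> list:
    """Return every percentage, multiplier, and dollar figure in a sentence."""
    found = []
    for pattern in (PERCENT, MULTIPLIER, DOLLAR):
        found.extend(match.group(0) for match in pattern.finditer(sentence))
    return found


def has_weak_signal(sentence: str) -> bool:
    """True when a weak-signal phrase shares a sentence with any digit."""
    return bool(WEAK_SIGNAL.search(sentence)) and bool(re.search(r"\d", sentence))
```

Vague quantities such as "millions of users" contain no digits, so neither function picks them up, which matches the "skip unless specific" rule.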
Verification Scoring
| Score | Status | Criteria |
|---|---|---|
| 1.0 | VERIFIED | Exact number found on cited page in matching context |
| 0.7-0.9 | PARAPHRASE | Similar data found but with different wording, rounding, or timeframe |
| 0.3-0.6 | WEAK | Source page exists and covers the topic but the specific statistic is not visible |
| 0.0 | NOT FOUND | Cited page does not contain the claimed data anywhere |
| N/A | UNVERIFIED | No source URL provided for the claim |
Scoring guidance:
- A claim of "43%" when the source says "nearly half" scores 0.8
- A claim of "2024" data when the source only has "2023" scores 0.7
- A claim citing a homepage when the stat lives on a subpage scores 0.3
- A 404 or unreachable URL scores 0.0
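The mapping from score to status can be written as a short Python function. This is a sketch under one stated assumption: the table leaves gaps between bands (for example between 0.9 and 1.0), and the function below assigns any score in a gap to the lower band. The name `status_for` is hypothetical.

```python
from typing import Optional


def status_for(score: Optional[float]) -> str:
    """Map a confidence score to the status labels in the scoring table."""
    if score is None:
        return "UNVERIFIED"  # no source URL was provided
    if score >= 1.0:
        return "VERIFIED"
    if score >= 0.7:
        return "PARAPHRASE"
    if score >= 0.3:
        return "WEAK"
    # Assumption: anything below 0.3 is treated like 0.0.
    return "NOT FOUND"
```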
Output Format
Verification Report: [Post Title]
File: [path]
Claims found: [total]
Verified: [count] | Paraphrase: [count] | Weak: [count] | Not Found: [count] | Unverified: [count]
| # | Claim | Source URL | Score | Status | Notes |
|---|---|---|---|---|---|
| 1 | "73% of marketers..." | https://example.com/report | 1.0 | VERIFIED | Exact match found in section 3 |
| 2 | "5x ROI improvement" | https://example.com/study | 0.8 | PARAPHRASE | Source says "nearly 5x" |
| 3 | "60% prefer video" | (none) | N/A | UNVERIFIED | Try: "video preference statistics 2025" |
Recommended Actions
- [List claims that need source URLs]
- [List claims with weak or not-found scores that need replacement sources]
- [List claims where the source data may be outdated]
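To make the report layout concrete, here is a minimal Python sketch that assembles the header, summary counts, and results table from a list of result dictionaries. The function name `render_report` and the dictionary keys (`claim`, `url`, `score`, `status`, `notes`) are assumptions for this example; the Recommended Actions list is left out for brevity.

```python
from collections import Counter

STATUSES = ["VERIFIED", "PARAPHRASE", "WEAK", "NOT FOUND", "UNVERIFIED"]


def render_report(title, path, rows):
    """Build the report text from a list of result dicts (hypothetical shape)."""
    counts = Counter(row["status"] for row in rows)
    # "NOT FOUND".title() gives "Not Found", matching the summary labels.
    summary = " | ".join(f"{s.title()}: {counts.get(s, 0)}" for s in STATUSES)
    lines = [
        f"Verification Report: {title}",
        f"File: {path}",
        f"Claims found: {len(rows)}",
        summary,
        "| # | Claim | Source URL | Score | Status | Notes |",
        "|---|---|---|---|---|---|",
    ]
    for i, row in enumerate(rows, start=1):
        score = "N/A" if row["score"] is None else f"{row['score']:.1f}"
        url = row["url"] or "(none)"
        lines.append(
            f'| {i} | "{row["claim"]}" | {url} | {score} '
            f'| {row["status"]} | {row["notes"]} |'
        )
    return "\n".join(lines)
```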
Integration
This skill can be called from blog-analyze as an optional deep-verification step.
When invoked from the analyzer, only claims scoring below 0.7 are flagged in the
analysis report.
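The below-0.7 filter applied when called from the analyzer might look like the following Python sketch. How UNVERIFIED claims (score N/A) are treated in that mode is not specified here, so the function simply leaves them out; that choice, the function name, and the dictionary shape are all assumptions.

```python
def flagged_for_analyzer(results, threshold=0.7):
    """Return scored results strictly below the threshold."""
    # Assumption: results with no score (UNVERIFIED) are not included.
    return [
        r for r in results
        if r["score"] is not None and r["score"] < threshold
    ]
```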
Standalone usage: /blog factcheck path/to/post.md
Limitations
- Paywalled content: WebFetch cannot access content behind login walls. Score these claims as WEAK (0.5) and note that a paywall was detected.
- Dynamic pages: JavaScript-rendered content may not be available via WebFetch. If the page returns minimal content, note this in the status.
- PDF sources: WebFetch may not extract PDF text reliably. Flag PDF URLs for manual verification.
- Archived pages: If a URL returns 404, suggest checking web.archive.org.
- Rate limits: Process no more than 10 URLs per run to avoid overwhelming source servers. If a post has more than 10 cited URLs, verify the first 10 and list the remainder as SKIPPED.
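The 10-URL cap can be illustrated with a small Python helper that splits the cited URLs into a batch to verify and a remainder to list as SKIPPED. The names `MAX_URLS_PER_RUN` and `split_batch` are hypothetical, and the sketch assumes the URLs are already in document order.

```python
MAX_URLS_PER_RUN = 10  # limit stated in the rate-limit note


def split_batch(urls, limit=MAX_URLS_PER_RUN):
    """Return (to_verify, skipped), preserving the original order."""
    return urls[:limit], urls[limit:]


# Placeholder URLs for demonstration only.
urls = [f"https://example.com/source-{n}" for n in range(1, 13)]
to_verify, skipped = split_batch(urls)
```

With twelve cited URLs, the first ten are verified and the last two are reported as SKIPPED.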