# audit-oe: OpenEvidence Citation Audit v2

Independently verify every citation in an OpenEvidence response using parallel PubMed/bioRxiv/ClinicalTrials lookups, detect transitive citation errors, find cross-citation contradictions, and produce a structured accuracy report.
## Prerequisites
- OpenEvidence MCP server connected and authenticated
- PubMed MCP server available (citation verification + full text)
- bioRxiv MCP server available (preprint verification)
- Clinical Trials MCP server available (endpoint verification)
- WebFetch available (DOI resolution fallback)
## Trigger
Use this skill when:
- User asks to "audit", "verify", or "check" an OpenEvidence response
- User asks to query OpenEvidence and validate its citations
- User wants to critically appraise OE's evidence base
## Input
Either:
- A topic/question -- skill queries OE then audits the response
- An existing OE article ID -- skill fetches and audits it directly
## Output Structure

```
{topic-slug}/
  original.md        # Raw OE extracted answer
  assets/
    citations.bib    # BibTeX block from OE
  report.md          # Structured audit report with provenance + contradictions
```
## Known OE Failure Modes (why this skill exists)

OE uses a full-text RAG system with these indexed corpora (decoded from the ROT-1-obfuscated `origin` field):
| Origin Key | Decoded | Risk |
|---|---|---|
| `qvcnfe_bctusbdut_*` | `pubmed_abstracts_hindex_35pct_ada3small` | Low -- abstract-level claims |
| `kbdd_gvmmufyu_tdsbqfe` | `jacc_fulltext_scraped` | Medium -- full-text chunks |
| `mbodfu_gvmmufyu_tdsbqfe` | `lancet_fulltext_scraped` | HIGH -- reviews quoting others |
| `ofkn_sfwjfx_bsujdmf_*` | `nejm_review_article_fulltext_sftp` | HIGH -- reviews quoting others |
| `hvjefmjoft_gvmmufyu_*` | `guidelines_fulltext_usa_manual` | Medium -- guideline recommendations |
| `nfejb_boopubufe_hfnjoj` | `media_annotated_gemini` | Medium -- AI-annotated figures |
Primary failure mode: Transitive Citation. When a review (e.g., Nauck 2026) writes "GLP-1 RA reduce stroke by 13%" citing Kristensen 2019, OE retrieves that chunk and attributes the stroke finding to Nauck. But Nauck is just quoting -- it's not their finding. If another meta-analysis (Galli 2025) finds NO stroke benefit, OE doesn't detect the contradiction.
## Workflow

### Phase 1: Query OpenEvidence
1. Call `oe_auth_status()` -- abort if invalid
2. Call `oe_ask` with:
   - `question`: <user's topic>
   - `include_bibtex: true`
   - `crossref_validate: true`
   - `wait_for_completion: true`
   - `timeout_sec: 120`
3. Save `extracted_answer_raw` -> `{topic-slug}/original.md`
4. Save the BibTeX block -> `{topic-slug}/assets/citations.bib`
5. Record: `article_id`, `citationCount`, `crossrefValidatedCount`
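A minimal orchestration sketch of Phase 1, assuming a thin Python client `oe` that proxies the OpenEvidence MCP tools by name; the `bibtex` response key and `valid` auth field are assumptions, not documented OE fields:

```python
from pathlib import Path

# Hypothetical wrapper: `oe` proxies the OpenEvidence MCP tools by name.
if not oe.oe_auth_status().get("valid"):
    raise SystemExit("OpenEvidence auth invalid -- aborting audit")

resp = oe.oe_ask(
    question=topic,
    include_bibtex=True,
    crossref_validate=True,
    wait_for_completion=True,
    timeout_sec=120,
)

out = Path(topic_slug)
(out / "assets").mkdir(parents=True, exist_ok=True)
(out / "original.md").write_text(resp["extracted_answer_raw"])
(out / "assets" / "citations.bib").write_text(resp["bibtex"])  # key name assumed
audit_meta = {k: resp.get(k) for k in ("article_id", "citationCount", "crossrefValidatedCount")}
```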
### Phase 2: Parse, Map, and Decode Provenance

The orchestrator extracts spans from the `structured_article` (not just the BibTeX):
```python
# Access structured spans with citation metadata
sections = article.output.structured_article.articlesection_set
for section in sections:
    for para in section.articleparagraph_set:
        for span in para.articlespan_set:
            text = span.text
            for citation in span.citations:
                # Decode ROT-1 provenance: shift each character back by one
                raw_origin = citation.metadata.origin
                origin = ''.join(chr(ord(c) - 1) for c in raw_origin)
                impact = citation.metadata.why_cited.impact_score
```
For each citation, build an enhanced descriptor:
```yaml
- index: N
  authors: "LastName et al."
  title: "..."
  journal: "..."
  year: YYYY
  doi: "10.xxxx/..."
  pmid: "NNNNNNN"
  claim_text: "The exact sentence from the OE span"
  strategy: pubmed_pmid | pubmed_doi | pubmed_title | biorxiv | web_doi
  # NEW v2 fields:
  oe_origin: "lancet_fulltext_scraped"
  oe_impact_score: 21.54
  risk_level: HIGH | MEDIUM | LOW
  has_quantitative_claim: true   # contains HR, CI, %, or p-value
  needs_transitive_check: true   # review/guideline + quantitative = yes
```
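One way to populate `has_quantitative_claim` is a regex heuristic over the span text. This sketch is our own (the pattern is not part of the skill spec); it flags hazard/odds/risk ratios, percentages, confidence intervals, and p-values:

```python
import re

# Heuristic: any HR/OR/RR, percentage, 95% CI, or p-value in the
# claim text marks the span as quantitative.
QUANT_RE = re.compile(
    r"\b(HR|OR|RR)\s*[=:]?\s*\d"   # HR = 0.87, OR: 1.2, ...
    r"|\b\d{1,3}(\.\d+)?\s*%"      # 13%, 17.5%
    r"|\b95%\s*CI\b"               # 95% CI
    r"|\bp\s*[<=>]\s*0?\.\d+",     # p < 0.05
    re.IGNORECASE,
)

def has_quantitative_claim(text: str) -> bool:
    return bool(QUANT_RE.search(text))
```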
Risk level assignment:
- HIGH: origin contains "fulltext_scraped" AND study_type is review or guideline
- MEDIUM: origin contains "fulltext" OR "media_annotated"
- LOW: origin is "pubmed_abstracts"
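As a sketch, the assignment above reduces to a small helper (hypothetical; the MEDIUM fallback for unknown provenance mirrors the Error Handling table):

```python
def assign_risk(origin: str, study_type: str) -> str:
    """Map decoded OE provenance + study type to a risk tier."""
    if "fulltext_scraped" in origin and study_type in ("review", "guideline"):
        return "HIGH"
    if "fulltext" in origin or "media_annotated" in origin:
        return "MEDIUM"
    if origin.startswith("pubmed_abstracts"):
        return "LOW"
    return "MEDIUM"  # undecodable/unknown origin -> MEDIUM (see Error Handling)
```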
### Phase 3: Parallel Citation Verification (Enhanced)

Launch one agent per citation using `model: "haiku"`. Each agent now has ENHANCED instructions:
#### Step A: Verify Paper Exists (unchanged)

`get_article_metadata` with PMID → `search_articles` by title → bioRxiv `get_preprint` → WebFetch DOI
#### Step B: Fetch Content (AGGRESSIVE)

1. `get_article_metadata` → check for a PMCID
2. If a PMCID exists: ALWAYS call `get_full_text_article` (don't skip)
3. If no PMCID and the abstract is empty/generic:
   → `WebFetch("https://doi.org/{DOI}", prompt="Extract the structured abstract, key findings, and conclusions")`
4. If the abstract mentions an NCT number:
   → Call the Clinical Trials MCP: `get_trial_details(nct_id)`
   → Extract primary/secondary endpoints, sample size, status
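Sketched as code, the Step B cascade looks like this, assuming thin Python wrappers `pubmed`, `trials`, and `webfetch` around the respective MCP tools (only the tool names come from this skill; the wrapper shapes and return types are ours):

```python
import re

def fetch_content(pubmed, trials, webfetch, citation: dict) -> dict:
    """Step B cascade: PMC full text > DOI resolution > abstract, plus registry lookup."""
    meta = pubmed.get_article_metadata(pmid=citation["pmid"])
    result = {"data_source": "abstract_only", "text": meta.get("abstract", "")}
    if meta.get("pmcid"):  # always prefer full text when a PMCID exists
        result = {"data_source": "full_text",
                  "text": pubmed.get_full_text_article(pmcid=meta["pmcid"])}
    elif not result["text"].strip():  # no PMCID, empty abstract -> resolve the DOI
        result = {"data_source": "doi_resolution",
                  "text": webfetch(f"https://doi.org/{citation['doi']}",
                                   prompt="Extract the structured abstract, "
                                          "key findings, and conclusions")}
    nct = re.search(r"NCT\d{8}", result["text"])
    if nct:  # verify RCT endpoints against the registry
        result["trial"] = trials.get_trial_details(nct_id=nct.group())
    return result
```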
#### Step C: Score Claim Accuracy (Multi-dimensional)
| Dimension | Score | Weight | How to assess |
|---|---|---|---|
| Paper exists | 0 or 1 | required | PubMed/DOI lookup |
| Metadata match | 0-1 | 10% | Author, year, journal correct? |
| Claim direction | 0-1 | 25% | Does paper support the direction of the claim? |
| Numbers verified | 0-1 | 35% | Specific HRs, CIs, % match? |
| Correct attribution | 0-1 | 20% | Is this the paper's OWN finding (not quoting another)? |
| No contradiction | 0-1 | 10% | Does any other evidence contradict? (filled in Phase 4) |
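A plausible roll-up of the table into a single composite. The gate-then-weight structure is implied by the table; treating `no_contradiction` as 1.0 until Phase 4 fills it in is our assumption:

```python
WEIGHTS = {
    "metadata_match": 0.10,
    "claim_direction": 0.25,
    "numbers_verified": 0.35,
    "correct_attribution": 0.20,
    "no_contradiction": 0.10,
}

def composite_score(exists: bool, dims: dict) -> float:
    """Weighted composite; 'paper exists' is a hard gate, not a weighted term."""
    if not exists:
        return 0.0
    # no_contradiction defaults to 1.0 until the Phase 4 scan fills it in
    return sum(w * dims.get(k, 1.0 if k == "no_contradiction" else 0.0)
               for k, w in WEIGHTS.items())
```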
#### Step D: Classify and Flag
- study_type: RCT | meta-analysis | cohort | review | guideline | editorial | preprint
- is_primary_source: true/false # Did this paper GENERATE the data, or just CITE it?
- transitive_risk: true/false # Review + quantitative claim about a specific trial
- trial_name_mentioned: "LEADER" | "SUSTAIN-6" | null # For trace-back
#### Agent Output Format (v2)
```
CITATION_REPORT:
- citation_index: [i]
- exists: true/false
- existence_details: "..."
- correct_doi: "..."
- correct_pmid: "..."
- claim_text: "..."
- dimensions:
    metadata_match: [0-1]
    claim_direction: [0-1]
    numbers_verified: [0-1]
    correct_attribution: [0-1]
- composite_score: [0-1]
- study_type: "..."
- is_primary_source: true/false
- transitive_risk: true/false
- trial_name_mentioned: "..." or null
- sample_size: "..."
- journal: "..."
- peer_reviewed: true/false
- full_text_available: true/false
- data_source: abstract_only / full_text / doi_resolution / clinical_trials_registry
- key_findings_from_source: "..."   # What the paper ACTUALLY found (for the contradiction check)
- warnings: ["..."]
END_CITATION_REPORT
```
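The orchestrator has to pull these blocks back out of free-form agent output. A minimal parser sketch, using the delimiters from the format above (the flat key/value extraction is a simplification that ignores nesting under `dimensions`):

```python
import re

BLOCK_RE = re.compile(r"CITATION_REPORT:(.*?)END_CITATION_REPORT", re.DOTALL)
FIELD_RE = re.compile(r"^\s*-?\s*(\w+):\s*(.+)$", re.MULTILINE)

def parse_citation_reports(agent_output: str) -> list[dict]:
    """Extract each CITATION_REPORT block as a flat {field: raw_value} dict."""
    return [
        {key: value.strip() for key, value in FIELD_RE.findall(block)}
        for block in BLOCK_RE.findall(agent_output)
    ]
```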
### Phase 3b: Transitive Trace-Back (NEW)

Triggered for citations where `transitive_risk: true` AND `trial_name_mentioned` is not null.

Launch additional haiku agents to find and verify the ORIGINAL source:
Prompt: "The review [Nauck 2026] claims 'stroke reduction 13-17%'
referencing what appears to be [LEADER / SUSTAIN-6 / Kristensen 2019].
Search PubMed for the ORIGINAL trial/meta-analysis.
Verify if the number matches the original source."
Output:

```
TRACEBACK_REPORT:
- original_citation_index: [i]
- traced_to_pmid: "..."
- traced_to_title: "..."
- traced_to_study_type: "RCT" or "meta-analysis"
- number_matches_original: true/false
- original_finding: "..."
- attribution_correct: true/false   # Should OE have cited the original instead?
END_TRACEBACK_REPORT
```
### Phase 4: Cross-Citation Contradiction Scan (NEW)

Launch one sonnet-model agent that receives ALL Phase 3 and 3b reports.

Instructions:
1. Read all CITATION_REPORT entries
2. Extract key_findings_from_source for each
3. For EACH quantitative claim in the OE response:
   - Check whether multiple citations report DIFFERENT findings on the same outcome
   - Flag contradictions with severity (see the sketch after this list):
     - CRITICAL: one source confirms, another explicitly denies
     - WARNING: sources report different magnitudes (>20% difference)
     - NOTE: sources use different populations/timeframes (may explain the difference)
4. Produce a CONTRADICTION_REPORT
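The severity rules can be sketched as a comparator over two parsed findings on the same outcome (field names `direction`, `effect_pct`, and `population` are illustrative, not part of the agent format):

```python
def classify_severity(a: dict, b: dict) -> str | None:
    """Apply the CRITICAL/WARNING/NOTE rules to two findings on one outcome."""
    if a["direction"] != b["direction"]:
        return "CRITICAL"  # one source confirms, the other explicitly denies
    if a.get("effect_pct") and b.get("effect_pct"):
        gap = abs(a["effect_pct"] - b["effect_pct"]) / max(a["effect_pct"], b["effect_pct"])
        if gap > 0.20:
            if a.get("population") != b.get("population"):
                return "NOTE"  # population/timeframe mismatch may explain the gap
            return "WARNING"   # same outcome, >20% difference in magnitude
    return None  # no contradiction on this outcome
```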
Output:

```
CONTRADICTION_REPORT:
- contradictions_found: N
- items:
  - outcome: "stroke reduction"
    claim_in_oe: "13-17% reduction"
    source_a: {pmid: X, finding: "no difference", origin: "jacc_fulltext"}
    source_b: {pmid: Y, finding: "13-17%", origin: "lancet_fulltext (quoting others)"}
    severity: CRITICAL
    explanation: "Source A is a comprehensive meta-analysis of 99,599 patients finding no stroke benefit. Source B is a review article quoting older, smaller meta-analyses."
END_CONTRADICTION_REPORT
```
### Phase 5: Collation and Multi-Dimensional Report

The orchestrator computes:
| Metric | Formula |
|---|---|
| Citation Existence Rate | exists_count / total x 100% |
| Mean Composite Score | average(all composite_score) |
| Transitive Citation Rate | transitive_risk_count / total x 100% |
| Contradiction Count | from Phase 4 |
| Full-Text Verification Rate | full_text_count / total x 100% |
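Assuming the parsed reports have been coerced to typed fields, the roll-up is direct (a sketch; an empty report list is not handled):

```python
def summary_metrics(reports: list[dict], contradiction_count: int) -> dict:
    """Compute the Phase 5 table from typed CITATION_REPORT dicts."""
    total = len(reports)
    return {
        "citation_existence_rate": 100 * sum(r["exists"] for r in reports) / total,
        "mean_composite_score": sum(r["composite_score"] for r in reports) / total,
        "transitive_citation_rate": 100 * sum(r["transitive_risk"] for r in reports) / total,
        "contradiction_count": contradiction_count,
        "full_text_verification_rate": 100 * sum(r["full_text_available"] for r in reports) / total,
    }
```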
### Grading (v2)

| Grade | Criteria |
|---|---|
| PASS | >= 90% exist AND mean composite >= 0.8 AND 0 CRITICAL contradictions |
| CAUTION | >= 75% exist AND mean composite >= 0.6 AND <= 1 CRITICAL contradiction |
| FAIL | < 75% exist OR mean composite < 0.6 OR > 1 CRITICAL contradiction OR fabricated citations |
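A direct encoding of the grading table (hypothetical helper; FAIL conditions are checked first so they dominate the PASS/CAUTION thresholds):

```python
def grade(exist_rate: float, mean_composite: float,
          critical_contradictions: int, fabricated: bool) -> str:
    """Grade an audit; exist_rate and mean_composite are on 0-1 scales."""
    if (fabricated or exist_rate < 0.75 or mean_composite < 0.6
            or critical_contradictions > 1):
        return "FAIL"
    if exist_rate >= 0.90 and mean_composite >= 0.80 and critical_contradictions == 0:
        return "PASS"
    return "CAUTION"
```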
### Report Template (v2)

```markdown
# OpenEvidence Citation Audit Report

**Query:** "{question}"
**Date:** {YYYY-MM-DD}
**OE Article ID:** `{uuid}`
**OE Crossref Self-Validation:** {n}/{total}

## Executive Summary

| Metric | Result |
|--------|--------|
| Citations verified | X/N (Y%) |
| Mean composite score | Z.ZZ/1.0 |
| Transitive citations detected | M |
| Cross-citation contradictions | K (C critical) |
| Full-text verification rate | P% |

## Overall Assessment: {PASS|CAUTION|FAIL}

## Provenance Analysis

| # | Paper | OE Data Source | Risk Level | Primary Source? |
|---|-------|----------------|------------|-----------------|

## Cross-Citation Contradictions

| Outcome | OE Claims | Source A Says | Source B Says | Severity |
|---------|-----------|---------------|---------------|----------|

## Transitive Citation Trace

| # | OE Cites (Review) | Claim Actually From | Verified Against Original? |
|---|-------------------|---------------------|----------------------------|

## Citation-by-Citation Results

| # | Paper | Exists | Composite | Direction | Numbers | Attribution | Flags |
|---|-------|--------|-----------|-----------|---------|-------------|-------|

## Detailed Findings

### Citation [i]: Author (Year) -- Score: X.XX
- Claim: "..."
- Verified: ...
- Provenance: {origin} (risk: {level})
- Attribution: Primary / Transitive (traced to: ...)
- Verdict: ...

## Evidence Strength Summary

## Methodology Notes
```
## Error Handling
| Failure | Action |
|---|---|
| OE auth invalid | Abort with clear message |
| DOI doesn't resolve | Try PubMed title search, then WebSearch. Mark exists=false if all fail |
| Paper not in PubMed | Try bioRxiv, then WebFetch on DOI |
| Paywalled (no PMCID) | WebFetch on DOI for publisher abstract; note limitation |
| BibTeX metadata wrong | Note discrepancy, still verify actual paper |
| Paper retracted | Flag CRITICALLY in warnings |
| Preprint | Flag peer_reviewed=false, check for published version |
| Non-academic source | Skip verification, note in report |
| Agent timeout | Retry once; if still fails, report partial results |
| OE structured data missing | Fall back to BibTeX-only parsing (v1 behavior) |
| Origin field not decodable | Mark provenance as "unknown", apply MEDIUM risk |
## Configuration
| Parameter | Default | Description |
|---|---|---|
| model_verify | haiku | Model for Phase 3 verification agents |
| model_contradict | sonnet | Model for Phase 4 contradiction agent |
| timeout_sec | 120 | OE query timeout |
| max_parallel | 15 | Max concurrent verification agents |
| full_text | aggressive | Always attempt PMC + DOI fallback |
| trace_transitive | true | Run Phase 3b for reviews with quantitative claims |
| check_contradictions | true | Run Phase 4 cross-citation scan |
| clinical_trials | true | Verify RCT endpoints against registry |
| output_dir | ./{topic-slug}/ | Where to write results |