check-citations

Verify academic citations against CrossRef, Semantic Scholar, and OpenAlex. Detects AI-hallucinated references, chimeric citations (real title + wrong authors), and suspicious patterns before submission.

When to Use

After writing or editing a .bib file with AI assistance
Before submitting a paper, thesis, or report
When reviewing AI-generated literature sections
As a CI/CD check in LaTeX manuscript pipelines
When auditing existing bibliographies for dead or fabricated references

Background

6-55% of AI-generated citations are fabricated (varies by model/domain)
100+ hallucinated references found in NeurIPS 2025 accepted papers
Universities increasingly treat fake citations as academic misconduct
Three hallucination types: fully fabricated, chimeric (real title + wrong authors), modified real (slightly altered metadata)

Usage

Quick Check (Single File)

python scripts/citation_checker.py references.bib

Check All `.bib` Files in a Directory

python scripts/citation_checker.py path/to/report/

JSON Output (CI/CD Pipelines)

python scripts/citation_checker.py references.bib --json

Verbose Mode (Debug API Responses)

python scripts/citation_checker.py references.bib --verbose

How It Works

Cascading Multi-Source Verification

Each citation is checked against three independent databases:

Source	Coverage	Strength
CrossRef	140M+ DOI-registered works	Best for journal/conference papers with DOIs
Semantic Scholar	200M+ papers	Best author disambiguation, arXiv coverage
OpenAlex	240M+ works	Broadest coverage, fully open

Verification logic:

Found in 2+ sources with matching title → verified (high confidence)
Found in 1 source only → suspicious (manual check recommended)
Found in 0 sources → not_found (likely hallucinated)

Chimeric Detection

When a citation's title matches a real paper but the authors don't overlap at all, it's flagged as a possible chimeric hallucination — the most dangerous type because the title looks real on Google Scholar.

Red Flag Heuristics

Invalid DOI format (doesn't start with 10.xxxx/)
Suspiciously generic title patterns ("A Comprehensive Survey of...")
Future publication year
Missing author or year fields
Single-word author names (incomplete metadata)

Exit Codes

Code	Meaning
0	All citations verified
1	One or more citations not found
2	Suspicious citations only (no hard failures)

Dependencies

pip install requests

No API keys required — uses free tiers of all three databases.

Accuracy (Tested)

Category	Result	Description
Known-good	9/10 (90%)	Famous ML papers (Vaswani, Devlin, Brown, He, etc.)
Known-bad	10/10 (100%)	Fabricated papers with plausible titles
Chimeric	5/5 (100%)	Real titles with wrong authors
False positive rate	10%	1 miss: unpublished tech report without DOI
False negative rate	0%	No fake paper was ever verified

The core guarantee: fake papers are never marked as real.

Limitations

Papers without DOI that have many derivatives (e.g., BERT without DOI) may not be found via title search alone — always include DOIs when available
Semantic Scholar free tier rate-limits at ~100 requests/5 minutes — batch verification is slower
Cannot verify papers behind paywalls or not indexed in any of the three databases
Book chapters, technical reports, and grey literature have lower coverage

Integration with LaTeX Workflows

Pre-commit Hook

#!/bin/bash
# .git/hooks/pre-commit
python scripts/citation_checker.py references.bib --json > /tmp/cite_check.json
NOT_FOUND=$(python3 -c "import json; d=json.load(open('/tmp/cite_check.json')); print(d['summary']['not_found'])")
if [ "$NOT_FOUND" -gt "0" ]; then
    echo "BLOCKED: $NOT_FOUND unfound citations. Run 'python scripts/citation_checker.py references.bib --verbose' to investigate."
    exit 1
fi

GitHub Actions

- name: Check citations
  run: |
    pip install requests
    python scripts/citation_checker.py references.bib --json > citation_report.json
    python -c "
    import json, sys
    r = json.load(open('citation_report.json'))
    if r['summary']['not_found'] > 0:
        print(f'FAIL: {r[\"summary\"][\"not_found\"]} citations not found')
        sys.exit(1)
    print(f'PASS: {r[\"summary\"][\"verified\"]} verified, {r[\"summary\"][\"suspicious\"]} suspicious')
    "

check-citations

check-citations

When to Use

Background

Usage

Quick Check (Single File)

Check All .bib Files in a Directory

JSON Output (CI/CD Pipelines)

Verbose Mode (Debug API Responses)

How It Works

Cascading Multi-Source Verification

Chimeric Detection

Red Flag Heuristics

Exit Codes

Dependencies

Accuracy (Tested)

Limitations

Integration with LaTeX Workflows

Pre-commit Hook

GitHub Actions

Check All `.bib` Files in a Directory