# Citation Audit Skill (`citation-audit`)
## Purpose
Verify every citation in a manuscript against its actual source. LLMs hallucinate citations, invent arXiv IDs, misattribute findings, and confuse authors. This skill catches all of that by fetching and reading each cited work.
## Why This Exists
LLMs are unreliable with citations in three distinct ways:
- **Ghost papers** — The paper does not exist. Title, authors, or venue are fabricated.
- **Wrong metadata** — The paper exists but the bib entry has the wrong arXiv ID, wrong authors, wrong year, or wrong venue.
- **Inverted claims** — The paper exists and the bib is correct, but the manuscript mischaracterizes what the paper says.
All three are invisible to structural audits (cross-reference checks, compilation tests). They require reading the actual cited work.
## Inputs

- The manuscript `.tex` file(s)
- The `.bib` file
- Web access (to fetch papers from arXiv, conference sites, URLs)
## Execution
### Phase 1: Extract citation contexts
For each `\citep{}`, `\citet{}`, `\cite{}` in the manuscript:
- Record the bib key
- Record the surrounding sentence or paragraph (the claim context)
- Classify the claim type:
  - FACTUAL: "X et al. found Y" / "X et al. measured Y"
  - METHODOLOGICAL: "We follow X" / "We use the benchmark from X"
  - POSITIONAL: "Unlike X, we..." / "X does not measure..."
  - PARENTHETICAL: "(X, 2024)" — no specific claim, just a reference
- For FACTUAL and POSITIONAL claims, extract the specific assertion the manuscript makes about the cited work
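A minimal extraction sketch in Python, assuming the manuscript is available as plain `.tex` text; the regex and the context `window` size are illustrative choices, and a real pass would also handle variants like `\citeauthor{}`:

```python
import re

# Matches \cite{...}, \citep{...}, \citet{...}; keys may be comma-separated.
CITE_RE = re.compile(r"\\cite[pt]?\{([^}]+)\}")

def extract_citation_contexts(tex_source: str, window: int = 200):
    """Yield (bib_key, claim_context) pairs for every citation command."""
    for match in CITE_RE.finditer(tex_source):
        start = max(0, match.start() - window)
        end = min(len(tex_source), match.end() + window)
        context = " ".join(tex_source[start:end].split())  # normalize whitespace
        for key in match.group(1).split(","):  # \citep{a,b} cites two works
            yield key.strip(), context
```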
### Phase 2: Verify bib entry metadata
For each bib entry, verify against the actual source:
For arXiv papers (`eprint` field present):

- Fetch `https://arxiv.org/abs/{eprint_id}`
- Compare: title, authors, year
- If the fetched paper has a DIFFERENT title/authors than the bib entry, this is a WRONG ID or GHOST PAPER
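The abstract page can be fetched directly, but the arXiv export API returns the same metadata as structured Atom XML; a stdlib-only sketch (the endpoint is real, the error handling is an assumption about how unknown IDs are reported):

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def fetch_arxiv_metadata(eprint_id: str) -> dict:
    """Fetch title, authors, and year for an arXiv ID via the export API."""
    url = f"http://export.arxiv.org/api/query?id_list={eprint_id}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        feed = ET.fromstring(resp.read())
    entry = feed.find(f"{ATOM}entry")
    title_el = entry.find(f"{ATOM}title") if entry is not None else None
    if title_el is None or title_el.text.strip() == "Error":
        return {}  # no such paper: candidate WRONG ID or GHOST PAPER
    return {
        "title": " ".join(title_el.text.split()),
        "authors": [a.find(f"{ATOM}name").text
                    for a in entry.findall(f"{ATOM}author")],
        "year": entry.find(f"{ATOM}published").text[:4],
    }
```

An empty return is a signal to investigate, not a final verdict; the abstract page at `https://arxiv.org/abs/{eprint_id}` remains the source of truth for spot-checking.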
For conference/journal papers (`booktitle` or `journal` field):
- Search for the paper by title + author on the web
- Verify: venue, year, author list
- If the paper cannot be found at the stated venue, flag as UNVERIFIABLE or GHOST PAPER
For web resources (`howpublished` with URL):
- Fetch the URL
- Verify it loads and the content matches the described resource
- If the URL is dead or redirects to unrelated content, flag as DEAD LINK
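A sketch of the reachability half of this check; content matching still requires reading the fetched page, and the `User-Agent` header is just a politeness choice:

```python
import urllib.request
import urllib.error

def check_url(url: str) -> dict:
    """Classify a cited URL as OK, REDIRECTED, or DEAD."""
    req = urllib.request.Request(url, headers={"User-Agent": "citation-audit"})
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            final = resp.geturl()  # where any redirects landed
            status = "REDIRECTED" if final != url else "OK"
            return {"status": status, "final_url": final}
    except urllib.error.HTTPError as err:
        return {"status": "DEAD", "http_code": err.code}
    except urllib.error.URLError:
        return {"status": "DEAD", "http_code": None}
```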
For each entry, check:
- Paper exists (reachable via arXiv, DOI, URL, or web search)
- Title matches (exact or near-exact)
- Authors match (at least first author correct)
- Year matches
- Venue matches (if applicable)
- Entry type appropriate (`@inproceedings` for conferences, `@article` for journals, `@misc` for preprints/blogs)
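Once metadata has been fetched, the field comparison can be mechanized; a sketch using stdlib fuzzy matching (the 0.9 threshold is an assumption, and "near-exact" title matching may first need LaTeX markup stripped):

```python
from difflib import SequenceMatcher

def compare_metadata(bib: dict, actual: dict, threshold: float = 0.9) -> dict:
    """Field-by-field match between a bib entry and fetched metadata."""
    def similar(a: str, b: str) -> bool:
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

    def first(names):
        return (names or [""])[0]  # guard against empty author lists

    return {
        "title": similar(bib.get("title", ""), actual.get("title", "")),
        "first_author": similar(first(bib.get("authors")),
                                first(actual.get("authors"))),
        "year": bib.get("year") == actual.get("year"),
    }
```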
### Phase 3: Verify claim accuracy
For each FACTUAL or POSITIONAL claim:
- Read the cited paper (abstract + relevant sections at minimum)
- Compare the manuscript's claim against what the paper actually says
- Classify:
  - ACCURATE — The claim faithfully represents the cited work
  - INACCURATE — The claim mischaracterizes the cited work
  - INVERTED — The claim says the opposite of what the paper found
  - OVERCLAIMED — The claim is stronger than what the paper supports
  - UNDERCLAIMED — The cited work supports a stronger claim than stated
  - UNVERIFIABLE — Cannot access the paper to verify
For INACCURATE and INVERTED findings, provide:
- What the manuscript claims
- What the cited paper actually says
- The specific section/page of the cited paper that contradicts the claim
- A suggested correction
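The verdicts stay consistent across the report if each finding is kept as an explicit record; one possible shape (all names here are illustrative, not part of the skill's interface):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class ClaimVerdict(Enum):
    ACCURATE = "accurate"
    INACCURATE = "inaccurate"
    INVERTED = "inverted"
    OVERCLAIMED = "overclaimed"
    UNDERCLAIMED = "underclaimed"
    UNVERIFIABLE = "unverifiable"

@dataclass
class ClaimFinding:
    bib_key: str
    manuscript_claim: str            # exact text, with line number
    paper_says: str                  # summary of the actual finding
    source_location: str             # section/page consulted
    verdict: ClaimVerdict
    suggested_fix: Optional[str] = None  # required for INACCURATE/INVERTED
```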
### Phase 4: Check for missing citations
Scan the manuscript for:
- Claims that cite no source but should (empirical claims without attribution)
- Tools, benchmarks, or datasets mentioned by name without citation
- Methods described as "standard" or "well-known" that have a canonical citation
## Output Format
### Per-citation report
```markdown
### [bib_key] — [VERDICT]

**Bib entry:** [title] by [authors] ([year])
**Actual paper:** [actual title] by [actual authors] ([actual year])
**Metadata match:** title [✓/✗] | authors [✓/✗] | year [✓/✗] | venue [✓/✗]
**Claim in manuscript (line N):** "[exact text]"
**What the paper actually says:** "[summary of actual finding]"
**Claim accuracy:** [ACCURATE / INACCURATE / INVERTED / OVERCLAIMED / UNDERCLAIMED]
**Fix required:** [description of what needs to change, or "None"]
```
### Summary table
| Bib Key | Exists | Metadata | Claim | Verdict |
|---------|--------|----------|-------|---------|
| key1 | ✓ | ✓ | ✓ | PASS |
| key2 | ✓ | ✗ | ✗ | FAIL |
| key3 | ✗ | — | — | GHOST |
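The summary table can be rendered mechanically from the per-citation results; a sketch (the row fields are hypothetical and would be populated by the earlier phases):

```python
def render_summary(rows: list[dict]) -> str:
    """Render the summary table above as markdown."""
    def mark(ok):
        return "—" if ok is None else ("✓" if ok else "✗")
    lines = ["| Bib Key | Exists | Metadata | Claim | Verdict |",
             "|---------|--------|----------|-------|---------|"]
    for r in rows:
        lines.append(f"| {r['key']} | {mark(r['exists'])} | "
                     f"{mark(r['metadata_ok'])} | {mark(r['claim_ok'])} | "
                     f"{r['verdict']} |")
    return "\n".join(lines)
```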
### Verdict categories
- PASS — Paper exists, metadata correct, claims accurate
- METADATA — Paper exists, bib entry has errors (wrong ID, wrong authors, wrong year)
- CLAIM — Paper exists, metadata correct, but manuscript mischaracterizes it
- GHOST — Paper does not exist as described
- DEAD — URL/link is broken
- UNVERIFIABLE — Cannot access the paper to verify
### Severity
- CRITICAL: GHOST papers, INVERTED claims
- HIGH: Wrong arXiv IDs, wrong authors, INACCURATE claims
- MEDIUM: Wrong year, wrong venue, OVERCLAIMED
- LOW: Missing citations, incomplete bib entries, UNDERCLAIMED
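Since severity is a straight lookup from finding type, it can be encoded as a table; a sketch (the finding-type names are illustrative):

```python
SEVERITY = {
    "GHOST": "CRITICAL",        "INVERTED": "CRITICAL",
    "WRONG_ARXIV_ID": "HIGH",   "WRONG_AUTHORS": "HIGH",   "INACCURATE": "HIGH",
    "WRONG_YEAR": "MEDIUM",     "WRONG_VENUE": "MEDIUM",   "OVERCLAIMED": "MEDIUM",
    "MISSING_CITATION": "LOW",  "INCOMPLETE_ENTRY": "LOW", "UNDERCLAIMED": "LOW",
}
```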
## Important notes
- NEVER trust your own knowledge of papers. ALWAYS fetch and verify. Your training data contains hallucinated citations. The only way to verify is to read the actual source.
- For arXiv papers, always fetch the abstract page to confirm the paper exists and matches.
- For conference papers, search DBLP, ACM DL, or the conference site.
- WebFetch and WebSearch are your primary tools. Do not skip verification because a citation "looks right."
- Blog posts and documentation URLs change. Always check that the URL still works and points to the described content.
- When a bib entry has both an `eprint` (arXiv ID) and a `booktitle` (venue), verify both independently.
## Phase 5: Fix (when invoked with "fix" or "on")
When the user invokes with an argument containing "fix" or "on", execute Phases 1–4 as above, then apply fixes for every non-PASS citation.
### What to auto-fix (no user confirmation needed)
These are mechanical corrections with a single correct answer:
METADATA errors (paper exists, bib entry wrong):
- Wrong arXiv ID → replace `eprint` with the correct ID
- Wrong authors → replace with authors from the actual paper
- Wrong year → replace with year from the actual paper
- Wrong title → replace with title from the actual paper
- Wrong venue → replace with venue from the actual paper
- Wrong entry type → change `@misc`/`@inproceedings` as appropriate
DEAD links:
- URL redirects → update the `howpublished` URL to the final destination
- URL 404 but resource found at a different URL → update the URL
- URL 404 and resource gone → flag as HUMAN-REQUIRED
Minor author corrections:
- Misspelled author names → fix spelling
- Missing authors from author list → add them
- Collective author name where individual names are available → replace (keep collective name as a note if it is how the group identifies)
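A sketch of a mechanical field fix applied directly to the `.bib` text. It is regex-based and assumes conventional `field = {value}` formatting with no nested braces; a real implementation might prefer a proper parser such as `bibtexparser`:

```python
import re

def fix_bib_field(bib_text: str, key: str, field: str, new_value: str) -> str:
    """Replace one field's value inside the bib entry identified by `key`."""
    # Grab the whole entry: from "@type{key," to a closing "}" on its own line.
    entry_re = re.compile(r"(@\w+\{" + re.escape(key) + r"\s*,.*?\n\})",
                          re.DOTALL)

    def patch(match: re.Match) -> str:
        # Assumes brace-delimited values with no nested braces.
        field_re = re.compile(re.escape(field) + r"\s*=\s*\{[^{}]*\}")
        # Lambda replacement avoids backslash-escape surprises in new_value.
        return field_re.sub(lambda _: f"{field} = {{{new_value}}}",
                            match.group(1), count=1)

    return entry_re.sub(patch, bib_text, count=1)
```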
### What requires HUMAN-REQUIRED decision
Present these and wait for the user:
GHOST papers:
- Paper does not exist at all → present options:
  - (a) Replace with a real paper that makes the same point
  - (b) Remove the citation and adjust the prose
  - (c) The user knows the paper exists and provides the correct reference
INVERTED or INACCURATE claims:
- The manuscript says X about a paper that actually says Y → present:
  - What the manuscript claims
  - What the paper actually says
  - A suggested rewrite of the prose that accurately represents the paper
  - Whether the paper still supports the manuscript's argument (and how)
- Let the user decide the final wording
Dead URLs with no replacement found:
- Blog post / resource deleted with no archive or alternative
### Fix procedure
1. Apply all auto-fixes to the `.bib` file
2. For each HUMAN-REQUIRED item, present the options clearly
3. After user decisions, apply prose changes to the `.tex` file
4. Verify: re-read the `.bib` and `.tex` to confirm all fixes applied
5. Update the audit report: mark each finding as `[FIXED]`, `[RESOLVED]`, or `[DEFERRED]`
### Safety rules
- NEVER invent a replacement citation. If a ghost paper needs replacing, search for real papers that make the cited point. Present candidates to the user with abstracts. Let the user choose.
- NEVER change the manuscript's argument. If an inverted claim needs fixing, present the rewrite as a suggestion, not an edit.
- NEVER remove a citation without user confirmation, even if it is a ghost paper. The user may know something you do not.
- When fixing URLs, always verify the new URL loads and contains the expected content before writing it.
Save the report as `[name]-citation-audit.md` in the manuscript directory.