bib-validate
Bibliography Validation
Read-only skill. Never edit source files — produce a categorised report only.
Citation key rule: Existing keys in the project always take precedence. They come from the user's reference management system and are canonical. When suggesting replacements (typo corrections, preprint upgrades, metadata fixes), always keep the user's key and update the .bib entry metadata around it — never suggest renaming a key to match some "standard" format.
When to Use
- Before compiling a final version of a paper
- After adding new citations to check nothing was missed
- When biber/bibtex reports undefined citations
- As part of a pre-submission checklist (pair with /proofread)
When NOT to Use
- Finding new references: use /literature for discovery
- Building a bibliography from scratch: use /literature with .bib generation
- General proofreading: use /proofread (which also flags citation format issues)
Phase 0: Session Log (Suggested)
Bibliography validation with preprint staleness checks can be context-heavy (OpenAlex lookups, web searches for published versions). Before starting, suggest running /session-log to capture prior work as a recovery checkpoint. If the user declines, proceed without it.
Convention
Default bibliography file is references.bib — this is the standard across all projects (per the /latex skill convention). However, the skill also supports:
- Any .bib file found in the same directory as the .tex files being audited
- Embedded bibliographies using \begin{thebibliography} / \bibitem{key} blocks
- Both external and embedded simultaneously (rare but possible)
Bibliography Detection
At the start of validation, detect which bibliography method the project uses:
1. External .bib file (standard)
Look for .bib files in the project directory. Priority order:
- references.bib (preferred; standard naming convention across all projects)
- Any other .bib file in the same directory as the .tex files
If multiple .bib files are found, validate all of them and produce a combined report. Note which file each issue belongs to. If a legacy-named .bib file (e.g., paperpile.bib) exists alongside references.bib, flag it as a potential cleanup opportunity (the project may have migrated from Paperpile).
Full validation applies: cross-reference checks and quality checks.
2. Embedded \begin{thebibliography} / \bibitem{key}
Some LaTeX documents define references inline rather than using an external .bib file. Detect by scanning .tex files for \begin{thebibliography}.
Extract keys from \bibitem entries:
- \bibitem{key}: standard form; the key is the argument in braces
- \bibitem[label]{key}: optional label form (e.g., \bibitem[Smith et al., 2020]{smith2020}); the key is in the second set of braces
Only cross-reference checks apply (missing keys, unused keys, typos). Quality checks (required fields, year, author formatting) are skipped because embedded bibliographies don't have structured metadata.
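The two \bibitem forms can be matched with a single regular expression. The following is a minimal sketch, not the skill's actual implementation; the function name extract_bibitem_keys is hypothetical, and the pattern assumes the optional label contains no nested "]".

```python
import re

# \bibitem{key} or \bibitem[label]{key}; the optional label is skipped.
# Assumption: labels do not contain a nested "]" character.
BIBITEM_RE = re.compile(r"\\bibitem\s*(?:\[[^\]]*\])?\s*\{([^}]+)\}")


def extract_bibitem_keys(tex_source: str) -> list[str]:
    """Return the keys of all bibitem entries, in document order."""
    return [key.strip() for key in BIBITEM_RE.findall(tex_source)]


sample = r"""
\begin{thebibliography}{9}
\bibitem{knuth1984} D. Knuth, The TeXbook.
\bibitem[Smith et al., 2020]{smith2020} Smith et al., A paper.
\end{thebibliography}
"""
print(extract_bibitem_keys(sample))  # ['knuth1984', 'smith2020']
```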
3. Both (rare)
If a project has both a .bib file and \begin{thebibliography} blocks, validate both:
- Run full validation on the .bib file
- Run cross-reference checks on \bibitem entries
- Merge both key sets when checking for missing citations
Workflow
- Find files: Locate all .tex files in the project
- Detect bibliography type: Check for .bib files and/or \begin{thebibliography} blocks
- Extract citation keys from .tex: Scan for all citation commands
- Extract entry keys from bibliography source(s):
  - External: Parse all @type{key, entries from .bib file(s)
  - Embedded: Parse all \bibitem{key} and \bibitem[label]{key} entries
- Cross-reference: Compare the two sets
- Quality checks: Validate .bib entry completeness (external only)
- Produce report: Write results to stdout (or save if requested)
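For the external case, the @type{key, pattern can be matched with a regular expression. This is a rough sketch under stated assumptions: extract_bib_keys is a hypothetical helper, and a regex is only an approximation of BibTeX syntax (a full parser would be more robust against unusual entries).

```python
import re

# Matches "@type{key," at the start of an entry. @comment, @string and
# @preamble are not bibliography entries, so they are excluded.
ENTRY_RE = re.compile(r"@(\w+)\s*[{(]\s*([^,\s{}()]+)\s*,")
NON_ENTRY_TYPES = {"comment", "string", "preamble"}


def extract_bib_keys(bib_source: str) -> dict[str, str]:
    """Map each entry key to its lowercase entry type."""
    keys = {}
    for entry_type, key in ENTRY_RE.findall(bib_source):
        if entry_type.lower() not in NON_ENTRY_TYPES:
            keys[key] = entry_type.lower()
    return keys


bib = """
@string{jan = "January"}
@article{smith2020,
  author = {Smith, Jane},
  title = {A Paper},
}
@Book{knuth1984, title = {The TeXbook}}
"""
print(extract_bib_keys(bib))  # {'smith2020': 'article', 'knuth1984': 'book'}
```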
Citation Commands to Scan
Scan .tex files for all of these patterns:
| Command | Description |
|---|---|
| \cite{key} | Basic citation |
| \citet{key} | Textual: Author (Year) |
| \citep{key} | Parenthetical: (Author, Year) |
| \textcite{key} | biblatex textual |
| \autocite{key} | biblatex auto |
| \parencite{key} | biblatex parenthetical |
| \citeauthor{key} | Author name only |
| \citeyear{key} | Year only |
| \nocite{key} | Include in bibliography without in-text citation |
Also handle multi-key citations: \citep{key1, key2, key3}
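A scan for these commands, including multi-key citations, might look like the sketch below. The function name extract_cited_keys is hypothetical. The pattern also tolerates a starred variant and up to two optional [..] arguments, which is an assumption beyond the table, and it does not strip % comments from the source.

```python
import re

# Commands from the table above. An optional star and up to two optional
# [..] arguments (e.g. \citep[see][p. 4]{key}) may precede the key list.
CITE_RE = re.compile(
    r"\\(?:cite|citet|citep|textcite|autocite|parencite|citeauthor|citeyear|nocite)"
    r"\*?\s*(?:\[[^\]]*\]\s*){0,2}\{([^}]*)\}"
)


def extract_cited_keys(tex_source: str) -> set[str]:
    """Collect every citation key, splitting multi-key citations on commas."""
    keys = set()
    for group in CITE_RE.findall(tex_source):
        for key in group.split(","):
            key = key.strip()
            # \nocite{*} means "include everything" and is not a real key.
            if key and key != "*":
                keys.add(key)
    return keys


src = r"See \citep{key1, key2 ,key3} and \citet[p.~3]{smith2020}; \nocite{*}"
print(sorted(extract_cited_keys(src)))
# ['key1', 'key2', 'key3', 'smith2020']
```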
Cross-Reference Checks
Critical: Missing Entries
Citation keys used in .tex but not defined in the bibliography source (.bib file or \bibitem entries).
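These leave the citation undefined: the build reports an undefined-citation warning and the output shows an unresolved placeholder (such as [?]) instead of the reference.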
Warning: Unused Entries
Keys defined in the bibliography source but never cited in any .tex file.
Not errors, but may indicate:
- Forgotten citations (should they be \nocite?)
- Leftover entries from earlier drafts
- Entries intended for a different paper
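Both checks reduce to set differences between the cited keys and the defined keys. A minimal sketch, assuming the two key sets have already been extracted (the function name cross_reference is hypothetical):

```python
def cross_reference(cited: set[str], defined: set[str]) -> dict[str, list[str]]:
    """Compare keys cited in .tex files with keys defined in the bibliography."""
    return {
        "missing": sorted(cited - defined),  # Critical: cited but not defined
        "unused": sorted(defined - cited),   # Warning: defined but never cited
    }


report = cross_reference(
    cited={"smith2020", "jones2019"},
    defined={"smith2020", "lee2018"},
)
print(report)  # {'missing': ['jones2019'], 'unused': ['lee2018']}
```

When a project has both a .bib file and embedded \bibitem entries, the defined set would be the union of both key sets.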
Warning: Possible Typos (Fuzzy Match)
For each missing key, check if a similar key exists in the bibliography using edit distance:
- Edit distance = 1: Very likely a typo
- Edit distance = 2: Possibly a typo
- Flag these with the suggested correction
Common typo patterns:
- Year off by one: smith2020 vs smith2021
- Missing/extra letter: santanna vs sant'anna vs santana
- Underscore vs camelCase: smith_jones vs smithjones
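The fuzzy match can be sketched with a plain Levenshtein distance. The function names below are hypothetical, and Levenshtein is one reasonable reading of "edit distance" here; the source does not say which variant is intended.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest_corrections(missing_key, defined_keys, max_distance=2):
    """Return (distance, candidate, label) tuples, closest match first."""
    suggestions = []
    for candidate in defined_keys:
        d = edit_distance(missing_key, candidate)
        if 1 <= d <= max_distance:
            label = "very likely a typo" if d == 1 else "possibly a typo"
            suggestions.append((d, candidate, label))
    return sorted(suggestions)


print(suggest_corrections("smith2020", ["smith2021", "jones1999"]))
# [(1, 'smith2021', 'very likely a typo')]
```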
Zotero Library Cross-Reference
After the disk-based cross-reference, check each cited key against the user's Zotero library via the refpile MCP server. This ensures the local .bib and Zotero stay in sync.
For each citation key found in the .tex files:
- Call search_library (refpile MCP) with the citation key as query
- Match on the citationKey field in results
Status categories:
| .bib | Zotero | Status | Report |
|---|---|---|---|
| Yes | Yes | Healthy | ✓ In sync |
| Yes | No | Drift | ⚠ In local .bib but not in Zotero — may need import |
| No | Yes | Export gap | ℹ In Zotero but not exported to local .bib |
| No | No | Missing | ✗ Missing from both — add to Zotero first |
Include this as a "Zotero Sync" section in the report, after the cross-reference results and before quality checks.
Graceful degradation: If the refpile MCP is unavailable (Zotero not running, server not started), skip this phase with a warning: "Zotero cross-reference skipped — refpile MCP unavailable. Run with Zotero open for full validation." Continue with disk-only validation.
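The status table maps directly onto a lookup keyed by the two membership tests. The sketch below assumes the Zotero citation keys have already been collected into a set; the refpile calls themselves are not shown because their response shape is not specified here, and classify_sync is a hypothetical name.

```python
# (in local .bib, in Zotero) -> status, following the table above.
STATUS = {
    (True, True): "Healthy",
    (True, False): "Drift",
    (False, True): "Export gap",
    (False, False): "Missing",
}


def classify_sync(cited, bib_keys, zotero_keys):
    """Return the sync status for every cited key."""
    return {
        key: STATUS[(key in bib_keys, key in zotero_keys)]
        for key in sorted(cited)
    }


result = classify_sync(
    cited={"a", "b", "c", "d"},
    bib_keys={"a", "b"},
    zotero_keys={"a", "c"},
)
print(result)
# {'a': 'Healthy', 'b': 'Drift', 'c': 'Export gap', 'd': 'Missing'}
```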
Quality Checks on .bib Entries
These checks apply only to external .bib files. Embedded bibliographies lack structured metadata, so quality checks are skipped for them.
Required Fields by Entry Type
| Entry Type | Required Fields |
|---|---|
| @article | author, title, journal, year |
| @book | author/editor, title, publisher, year |
| @incollection | author, title, booktitle, publisher, year |
| @inproceedings | author, title, booktitle, year |
| @techreport | author, title, institution, year |
| @unpublished | author, title, note, year |
| @phdthesis | author, title, school, year |
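The table can be encoded as data and checked per entry. This is a sketch under assumptions: the entry is taken to be already parsed into a type and a dict of fields, and missing_required_fields is a hypothetical helper. Each requirement is a tuple of alternatives so that author/editor for @book is satisfied by either field.

```python
# Each requirement is a tuple of acceptable alternatives.
REQUIRED_FIELDS = {
    "article": [("author",), ("title",), ("journal",), ("year",)],
    "book": [("author", "editor"), ("title",), ("publisher",), ("year",)],
    "incollection": [("author",), ("title",), ("booktitle",), ("publisher",), ("year",)],
    "inproceedings": [("author",), ("title",), ("booktitle",), ("year",)],
    "techreport": [("author",), ("title",), ("institution",), ("year",)],
    "unpublished": [("author",), ("title",), ("note",), ("year",)],
    "phdthesis": [("author",), ("title",), ("school",), ("year",)],
}


def missing_required_fields(entry_type: str, fields: dict) -> list[str]:
    """List required fields that are absent or empty for this entry type."""
    present = {name.lower() for name, value in fields.items() if str(value).strip()}
    missing = []
    for alternatives in REQUIRED_FIELDS.get(entry_type.lower(), []):
        if not any(name in present for name in alternatives):
            missing.append("/".join(alternatives))
    return missing


print(missing_required_fields("article", {"author": "Smith, J.", "title": "T", "year": "2020"}))
# ['journal']
```

Entry types not listed in the table return an empty list, so they are not flagged.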
Year Reasonableness
- Flag entries with year < 1900 or year > current year + 1
- Flag entries with no year at all
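A sketch of the year check follows. The name check_year is hypothetical, and flagging non-numeric values separately is an added assumption (the rules above only cover range and absence).

```python
from datetime import date


def check_year(year_field, today=None):
    """Return a description of the problem, or None if the year looks fine."""
    current = (today or date.today()).year
    if year_field is None or not str(year_field).strip():
        return "missing year"
    text = str(year_field).strip()
    if not text.isdigit():
        # Assumption: values such as "in press" are reported, not parsed.
        return f"non-numeric year: {text}"
    year = int(text)
    if year < 1900 or year > current + 1:
        return f"year out of range: {year}"
    return None


fixed_today = date(2025, 6, 1)
print(check_year("1899", fixed_today))  # year out of range: 1899
print(check_year("2026", fixed_today))  # None
```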
Author Formatting
- Check for inconsistent author formats within the file
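- Flag entries where the author field contains "and others" or "et al.". A literal "et al." is never valid in BibTeX (it is parsed as part of a name), and "and others", although BibTeX accepts it, abbreviates the author list. All authors must be listed explicitly. Severity: Warning.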
- Flag entries with organisation names that might need {{braces}} to prevent splitting
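The truncated-author check can be sketched with two regular expressions. The helper name check_author_field is hypothetical, and the sketch covers only the "and others" / "et al." rule; detecting organisation names reliably needs heuristics that are not attempted here.

```python
import re

ET_AL_RE = re.compile(r"\bet\s+al\b\.?", re.IGNORECASE)
AND_OTHERS_RE = re.compile(r"\band\s+others\b", re.IGNORECASE)


def check_author_field(author: str) -> list[str]:
    """Return Warning-level messages for truncated author lists."""
    warnings = []
    if ET_AL_RE.search(author):
        warnings.append('author contains "et al."; list all authors explicitly')
    if AND_OTHERS_RE.search(author):
        warnings.append('author contains "and others"; list all authors explicitly')
    return warnings


print(check_author_field("Smith, Jane and others"))
# ['author contains "and others"; list all authors explicitly']
```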
DOI Resolution (optional — triggered by --verify-dois flag or when issues are suspected)
Preferred method: bibliography MCP scholarly_verify_dois. Collect all DOIs from the .bib file and call scholarly_verify_dois (up to 50 per call). This batch-verifies each DOI against all enabled sources (OpenAlex, Scopus, WoS). Results:
- VERIFIED (2+ sources confirm) — DOI is valid, metadata can be trusted
- SINGLE_SOURCE (1 source only) — DOI exists but warrants a manual spot-check
- NOT_FOUND — DOI not found in any source; resolve manually via WebFetch
Fallback for NOT_FOUND DOIs: Resolve via https://doi.org/[DOI] and confirm the returned metadata matches the entry:
- Title match: Does the DOI landing page title match the .bib title?
- Author match: Does the first author on the landing page match the .bib first author?
- Journal match: Does the venue match?
Flag mismatches as:
- Warning: DOI mismatch — DOI resolves to a different paper than claimed. This usually means the DOI is wrong (adjacent DOI in the same journal volume) or the authors are wrong (conflation of researchers in the same subfield).
This check catches:
- Wrong DOIs (e.g., off-by-one in the DOI suffix)
- Author conflation (real researchers incorrectly attributed to a paper)
- Metadata copied from secondary sources without verification
For manual WebFetch resolution, process in batches of 5 to avoid rate limiting. Only flag confirmed mismatches — if the DOI cannot be resolved (404, timeout), note it as "unresolvable" at Info level.
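Both batch limits (up to 50 DOIs per scholarly_verify_dois call, 5 per round of manual resolution) come down to chunking a list. A minimal sketch with a hypothetical batched helper; the MCP and WebFetch calls themselves are not shown.

```python
from itertools import islice


def batched(items, size):
    """Yield consecutive chunks of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


dois = [f"10.1234/example.{n}" for n in range(12)]  # made-up DOIs for illustration

# Up to 50 per scholarly_verify_dois call:
print([len(b) for b in batched(dois, 50)])  # [12]

# 5 at a time for manual resolution of NOT_FOUND DOIs:
print([len(b) for b in batched(dois, 5)])   # [5, 5, 2]
```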
Preprint Staleness Check
For every entry that looks like a preprint, check whether a peer-reviewed version has since been published. Full detection signals, lookup protocol, and classification: references/preprint-check.md
Severity Levels
| Level | Meaning |
|---|---|
| Critical | Missing entry for a cited key; the citation stays undefined in the compiled output |
| Warning | Unused entry, possible typo, missing required field |
| Info | Year oddity, formatting suggestion, bibliography type note |
Bibliography Output
After validation, offer these actions if applicable:
- Embedded bibliography → offer to create references.bib: If the project uses \begin{thebibliography}, offer to extract the references into a proper references.bib file (one @misc entry per \bibitem, with the full text as a note field). The author can then enrich the entries with proper metadata.
- Non-standard .bib name → offer to rename: If the existing .bib file is not named references.bib, offer to rename it to references.bib and update the \bibliography{} command (or \addbibresource{} for biblatex) in the .tex file.
These are offers only — do not make changes without explicit confirmation.
Report Format
Full report template with all sections: references/report-template.md
Sections: Summary table → Critical (missing entries) → Warning (typos, unused, missing fields, DOI mismatches, stale preprints) → Info (year issues) → Limitations (for embedded bibliographies).
Optional: Metadata Verification via MCP Tools
When missing entries or suspicious metadata are flagged, check these sources in order:
- Zotero library (refpile MCP): call search_library by title. The user may already have the reference but with a different key. If found, use the Zotero citation key.
- Bibliography MCP (scholarly sources):
  - scholarly_search: search by title to find the correct entry across OpenAlex + Scopus + WoS
  - scholarly_verify_dois: batch-verify DOIs across all sources (preferred over manual DOI resolution)
  - openalex_lookup_doi: look up full metadata for a specific DOI
For Python client fallback (citation networks, institution analysis): references/openalex-verification.md
Deep Verification Mode (Parallel, Disk-Based)
Triggered by: --deep-verify flag, 40+ entries, or "deep verify" / "verify all references". Spawns parallel sub-agents that verify batches and write results to disk. Full architecture, batch JSON format, and assembly: references/deep-verify.md
Council Mode (Optional)
For high-stakes submissions. Trigger: "council bib-validate", "thorough bib check". Full details: references/council-mode.md
Quality Scoring
When producing a full validation report, apply numeric quality scoring using the shared framework:
- Framework: ../shared/quality-scoring.md (severity tiers, thresholds, verdict rules)
Map validation findings to the framework tiers:
- Critical (-15 to -25): Missing entry for a cited key (citation stays undefined at compile time)
- Major (-5 to -14): DOI mismatch, stale preprint with published version available, "et al." in author field
- Minor (-1 to -4): Missing optional fields, year oddities, unused entries
Compute the score and include the Score Block in the report after the summary table.
Cross-References
- /proofread: For overall paper quality including citation format
- /literature: For finding and adding new references (includes full OpenAlex workflows)
- /latex: For compilation with reference checking
- /latex-autofix: For compilation and error resolution. Run after fixing bibliography issues to verify citations compile cleanly.