academic-latex-pipeline
Academic LaTeX Pipeline
Converts academic survey Markdown (often from Obsidian) into polished LaTeX PDFs. The pipeline has five phases, each with a decision gate before proceeding.
When to Use
- User has a .md survey/paper and wants a PDF
- User wants to fix formatting in an existing LaTeX-compiled PDF
- User needs Korean font support in LaTeX (XeLaTeX + Noto Sans CJK KR)
- User wants to replace Mermaid diagrams with TikZ figures
- User mentions build_latex.py or survey compilation
- User wants to restructure a LaTeX project into section-based folders
Phase Overview
Phase 1: Content Quality → iterative-academic-writing skill, Critical=0 to pass
Phase 2: LaTeX Build → MD→TEX→PDF pipeline with Korean fonts
Phase 3: Format Review → Page-by-page visual inspection, fix overflows
Phase 4: Figure Validation → TikZ rendering, captions, sizing
Phase 5: Git Management → latex-project-manager skill for structure + push
Phase 1: Content Writing Loop
Invoke the iterative-academic-writing skill on the source .md file. The skill applies 14 academic writing principles with FactBase verification and hallucination detection.
Gate: Critical issues = 0 → proceed to Phase 2.
This phase ensures content quality before expensive LaTeX processing. Don't skip it — fixing content errors after PDF generation wastes time.
Phase 2: LaTeX Build Pipeline
2.1 Project Structure
Projects use section-based folder organization for maintainability:
ProjectName/
├── main.tex # Shared preamble + project selector switch
├── <project>/
│ ├── content.tex # \input orchestrator for all sections
│ ├── refs.bib # BibTeX bibliography (NOT inline thebibliography)
│ ├── figures/ # Images and generated figures
│ │ └── .gitkeep
│ └── sections/
│ ├── 00_frontmatter.tex # \title, \author, \maketitle, \abstract
│ ├── 01_background.tex # Each \section in its own file
│ ├── ...
│ └── NN_bibliography.tex # \bibliographystyle + \bibliography
├── .gitignore
└── build_and_compile.sh # Optional: shell wrapper for compilation
For MD→TEX projects (Obsidian source), also include:
├── SourceDocument.md # Obsidian source (excluded from git)
└── build_latex.py # Python build script (MD → TEX → PDF)
Multi-project layout: Use \newcommand{\professor}{project_name} in main.tex to switch between projects sharing the same preamble. See latex-project-manager skill for details.
2.2 Bibliography Management (CRITICAL)
Always use BibTeX .bib files. Never use inline \begin{thebibliography}.
- Create refs.bib in the project folder root
- Use \bibliographystyle{plainnat} + \bibliography{<project>/refs}
- All references must use \citep{} or \citet{} (both provided by natbib); no plain-text "(Author, Year)"
- 3-pass compilation: pdflatex → bibtex → pdflatex → pdflatex (three LaTeX passes, with one BibTeX run after the first)
2.3 Build Script (build_latex.py)
The build script handles the full MD→TEX transformation:
- Preprocess MD: Strip wikilinks [[...]], remove Obsidian YAML frontmatter, clean tags
- Pandoc conversion: pandoc input.md -f markdown -t latex
- Inject preamble with Korean font support:
  \usepackage{fontspec}
  \usepackage{ucharclasses}
  \setmainfont{Noto Sans CJK KR}
  \newfontfamily\hangulfont{Noto Sans CJK KR}
  \setTransitionsForCJK{\hangulfont}{}{}
  Why ucharclasses instead of xeCJK? The xeCJK package requires ctexhook.sty, which is missing from many LaTeX distributions. ucharclasses is more portable.
- Replace Mermaid blocks with TikZ figures
- Wrap examples in tcolorbox environments
- Inject citations: Match PaperName (Year) → \cite{key_year}
- Fix tables: Use p{Xcm} columns instead of l/c/r to prevent overflow
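As an illustration of the preprocessing step, here is a minimal Python sketch. The function name preprocess_markdown and the exact regular expressions are assumptions made for this example, not the contents of the actual build_latex.py. It drops a leading YAML frontmatter block, discards embeds, unwraps wikilinks, and removes inline tags that are preceded by whitespace.

```python
import re


def preprocess_markdown(text: str) -> str:
    """Strip Obsidian-specific syntax before the text is handed to Pandoc.

    Hypothetical helper: the patterns are a sketch, not the real build script.
    """
    # Remove a leading YAML frontmatter block delimited by '---' lines.
    text = re.sub(r"\A---\r?\n.*?\r?\n---\r?\n", "", text, count=1, flags=re.DOTALL)
    # Discard embeds such as ![[figure.png]] (this sketch simply drops them).
    text = re.sub(r"!\[\[[^\]]*\]\]", "", text)
    # [[target|alias]] -> alias, [[target]] -> target
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # Remove inline tags like #survey or #ml/nlp. A tag must start the text or
    # follow whitespace, and '#' must be followed by a letter, so Markdown
    # headings ("# Title") and URL fragments are left alone.
    text = re.sub(r"(?<!\S)#[A-Za-z][\w/-]*", "", text)
    return text
```

Tags inside code spans are not protected by this sketch, so a real script would likely need to skip fenced code before applying the tag pattern.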
2.4 Font Installation
mkdir -p ~/.local/share/fonts
# Download Noto Sans CJK KR from github.com/googlei18n/noto-cjk/releases
fc-cache -fv ~/.local/share/fonts/
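To confirm the font is visible after refreshing the cache, a small Python check over fc-list output could look like the sketch below. The helper name has_font_family is hypothetical, and the parsing assumes the usual fc-list line shape of file path, family list, and style separated by colons.

```python
def has_font_family(fc_list_output: str, family: str = "Noto Sans CJK KR") -> bool:
    """Return True if any fc-list line names the given font family.

    Hypothetical helper; assumes lines shaped like
    "<file>: <family>[,<alias>...]:style=<style>".
    """
    for line in fc_list_output.splitlines():
        parts = line.split(":")
        if len(parts) >= 2:
            families = [name.strip() for name in parts[1].split(",")]
            if family in families:
                return True
    return False
```

One way to feed it is subprocess.run(["fc-list"], capture_output=True, text=True).stdout, which requires fontconfig to be installed.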
2.5 Compilation (3-pass)
pdflatex -interaction=nonstopmode main.tex # Pass 1 (for Korean, use xelatex in every pass; fontspec does not work under pdflatex)
bibtex main # Citations
pdflatex -interaction=nonstopmode main.tex # Pass 2 (resolve refs)
pdflatex -interaction=nonstopmode main.tex # Pass 3 (final)
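The same sequence can be driven from Python. The sketch below is one possible shape, with hypothetical function names (build_commands, compile_pdf); it is not taken from build_latex.py. The engine is a parameter so that Korean builds can pass "xelatex" for all three passes.

```python
import subprocess


def build_commands(main: str = "main", engine: str = "pdflatex") -> list[list[str]]:
    """Return the latex -> bibtex -> latex -> latex command sequence."""
    latex = [engine, "-interaction=nonstopmode", f"{main}.tex"]
    return [list(latex), ["bibtex", main], list(latex), list(latex)]


def compile_pdf(main: str = "main", engine: str = "pdflatex", cwd: str = ".") -> bool:
    """Run each step in order and stop at the first failing step."""
    for cmd in build_commands(main, engine):
        result = subprocess.run(cmd, cwd=cwd)
        # Assumption: bibtex exits with 1 when it only emitted warnings,
        # so that status is tolerated; any LaTeX non-zero status is a failure.
        limit = 1 if cmd[0] == "bibtex" else 0
        if result.returncode > limit:
            return False
    return True
```

For a Korean document, compile_pdf("main", engine="xelatex") would run xelatex in place of pdflatex.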
2.6 Overfull Hbox Prevention (Preamble)
\tolerance=1000
\emergencystretch=3em
\hfuzz=2pt
Gate: Compilation succeeds without errors → proceed to Phase 3.
Phase 3: Format Review Loop
Review the PDF page by page. Check for:
Critical (must fix, loop back):
- Table overflow beyond margins
- Missing or blank figures
- Unreadable/clipped text
- [[wikilink]] artifacts surviving preprocessing
- Undefined citations
Minor (can defer):
- Spacing tweaks, caption capitalization, color preferences
For each critical issue:
- Table overflow → adjust column widths, use tabularx with X columns
- Missing figures → test TikZ in standalone mode, simplify
- Wikilinks → fix regex in build script's preprocessing step
- Undefined citations → add entries to refs.bib
Recompile after fixes. Gate: no Critical issues → Phase 4.
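Two of the critical checks, overflowing boxes and undefined citations, also leave traces in the .log file, so a quick scan can complement the visual pass. The following is a rough sketch with hypothetical names (scan_log, min_pt), assuming the standard LaTeX and natbib warning wording.

```python
import re

# Assumed message formats, as emitted by LaTeX and natbib:
#   Overfull \hbox (15.2pt too wide) in paragraph at lines 10--12
#   Package natbib Warning: Citation `key' on page 3 undefined on input line 42.
OVERFULL = re.compile(r"Overfull \\hbox \((\d+(?:\.\d+)?)pt too wide\)")
UNDEFINED_CITE = re.compile(r"Citation [`']([^']+)' .*undefined")


def scan_log(log_text: str, min_pt: float = 2.0) -> dict:
    """Collect overfull-hbox widths above min_pt and undefined citation keys."""
    widths = [float(m.group(1)) for m in OVERFULL.finditer(log_text)]
    return {
        "overfull_pt": [w for w in widths if w > min_pt],
        "undefined_citations": sorted(set(UNDEFINED_CITE.findall(log_text))),
    }
```

TeX wraps long log lines, so a warning split across two lines can be missed; the scan narrows down where to look but does not replace the page-by-page review.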
Phase 4: Figure/Image Review
For each TikZ figure:
- Does it render correctly?
- Is the caption present and descriptive?
- Is sizing appropriate (\resizebox{\textwidth}{!}{...})?
- Is placement correct ([H] float specifier)?
Test problematic TikZ in isolation:
\documentclass[tikz]{standalone}
\usepackage{tikz}
\begin{document}
% TikZ code here
\end{document}
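When several figures need isolating, the wrapper can be generated rather than typed by hand. This is a minimal sketch; wrap_standalone is a hypothetical helper name and simply reproduces the template above around a given TikZ snippet.

```python
def wrap_standalone(tikz_code: str) -> str:
    """Wrap a TikZ snippet in the standalone test document shown above."""
    return "\n".join([
        r"\documentclass[tikz]{standalone}",
        r"\usepackage{tikz}",
        r"\begin{document}",
        tikz_code.strip(),
        r"\end{document}",
        "",  # trailing newline
    ])
```

Writing the result to a scratch file and compiling it on its own keeps figure errors separate from errors in the main document.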
Gate: all figures correct → Phase 5.
Phase 5: Git Management
Use latex-project-manager push for structured git operations.
Files to include in repo:
- main.tex, <project>/content.tex, <project>/sections/*.tex
- <project>/refs.bib
- <project>/figures/*
- .gitignore
- build_latex.py, build_and_compile.sh (if applicable)
Files to exclude:
- Original .md Obsidian source (stays in Obsidian vault only)
- .obsidian/ directory
- LaTeX build artifacts (.aux, .log, .out, .toc, .bbl, .blg)
- PDF files (compiled on-demand)
.gitignore template:
*.aux
*.log
*.out
*.toc
*.bbl
*.blg
*.synctex.gz
*.fls
*.fdb_latexmk
*.pdf
.DS_Store
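Before committing, it can help to check a file list against these patterns. The sketch below approximates gitignore matching for slash-free patterns by comparing only the file name; is_ignored and IGNORE_PATTERNS are names invented for this example.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the .gitignore template above.
IGNORE_PATTERNS = [
    "*.aux", "*.log", "*.out", "*.toc", "*.bbl", "*.blg",
    "*.synctex.gz", "*.fls", "*.fdb_latexmk", "*.pdf", ".DS_Store",
]


def is_ignored(path: str) -> bool:
    """Approximate check: match the file name against each pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in IGNORE_PATTERNS)
```

Git's own matching rules are richer (negation, directory patterns), so git check-ignore remains the authoritative test.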
Git authentication:
- GitHub: $GITHUB_TOKEN env var or user-provided token
- Overleaf: $OVERLEAF_TOKEN env var (requires Premium plan for git access)
Common Issues & Fixes
| Issue | Fix |
|---|---|
| Korean text missing | Verify Noto Sans CJK KR installed, check fc-list \| grep Noto |
| Overfull hbox | Increase \tolerance, \emergencystretch, reword long lines |
| Table overflow | Use p{2cm} or X columns, reduce content |
| Broken tcolorbox | Check \tcbuselibrary{most} is loaded |
| Undefined citations | Add missing keys to refs.bib, rerun bibtex |
| Mermaid not replaced | Check regex pattern in build script |
| pgfplots \\ in labels | Use {Label Text} with align=center instead |
| $\to$ in \legend | Use \textrightarrow{} instead |
| Inline thebibliography | Convert to refs.bib + \bibliography{} |
English Version Generation
For bilingual projects, create a separate English build:
- Translate MD content (keep same structure)
- Use English-specific preamble (no CJK fonts needed, use standard \usepackage[T1]{fontenc})
- Generate survey_main_EN.tex → survey_main_EN.pdf
- Both versions share refs.bib
Related Skills
- iterative-academic-writing: Phase 1 content evaluation
- latex-project-manager: Phase 5 project structuring and git push
- pdf: General PDF manipulation (merge, split, forms)