Vaccine Design

Computational pipeline for designing peptide/subunit vaccine candidates through epitope prediction, population coverage optimization, and immunogenicity assessment.

Reasoning Strategy

Vaccine design requires presenting the right epitopes to elicit protective immunity — not just any immune response, but one that is neutralizing, durable, and broadly applicable. For T-cell vaccines, the core tool is MHC binding prediction (IEDB tools): predict peptide-MHC affinity across multiple HLA alleles, then select epitopes with broad coverage of the target population. For antibody vaccines, prioritize surface-exposed conserved regions — a deeply buried or hypervariable region makes a poor antibody target. MHC binding does not equal immunogenicity; many good binders are not immunogenic in vivo due to tolerance, poor processing, or lack of T-cell help. A multi-epitope strategy (combining MHC-I for CD8+ CTL response, MHC-II for CD4+ helper response, and B-cell epitopes for antibody induction) is more robust than any single epitope. Conservation across pathogen strains is critical — an epitope that mutates under immune pressure (like HIV envelope hypervariable regions) is a poor vaccine target.

LOOK UP DON'T GUESS: Do not predict MHC binding or population coverage from memory — use IEDB_predict_mhci_binding and IEDB_predict_mhcii_binding for predictions and IEDB_search_epitopes for validated experimental data. Do not assume what's on the pathogen surface; retrieve annotated sequences from UniProt or BVBRC.

Key principles:

Epitope-driven — vaccines work by presenting epitopes to T/B cells; start with epitope prediction
Population coverage matters — HLA diversity means no single epitope covers everyone; design for breadth
Multi-epitope is better — combine CD8+ (MHC-I) and CD4+ (MHC-II) epitopes for robust immunity
Conservation = broad protection — conserved epitopes across strains provide cross-protective immunity
Evidence grading — T1: clinical trial data, T2: in-vivo immunogenicity, T3: in-vitro binding, T4: computational prediction only

When to Use

"Design a vaccine against [pathogen]"
"Predict T-cell epitopes for [protein]"
"What MHC-I epitopes does [protein] have?"
"Assess population coverage of these epitopes"
"Find conserved epitopes across [pathogen] strains"

Not this skill: For HLA typing or allele frequency only, use tooluniverse-hla-immunogenomics. For antibody engineering, use tooluniverse-antibody-engineering.

Core Tools

Tool	Use For
`IEDB_search_epitopes`	Search experimentally validated epitopes
`IEDB_get_epitope`	Get detailed epitope data (assay results, MHC restriction)
`iedb_search_mhc`	Search validated MHC binding assay data
`IEDB_predict_mhci_binding`	Predict MHC-I binding (NetMHCpan EL; rank < 0.5% = strong binder)
`IEDB_predict_mhcii_binding`	Predict MHC-II binding (NetMHCIIpan EL; CD4+ helper epitopes)
`UniProt_get_entry_by_accession`	Get antigen protein sequence
`UniProt_search`	Find pathogen protein sequences
`BVBRC_search_genome_features`	Search pathogen proteomes
`alphafold_get_prediction`	Get/predict antigen 3D structure
`EnsemblVEP_annotate_hgvs`	Check epitope conservation across variants
`PubMed_search_articles`	Find published vaccine studies
`search_clinical_trials`	Find ongoing vaccine clinical trials

Workflow

Phase 0: Antigen Selection
  Pathogen → essential surface proteins → sequence retrieval
    |
Phase 1: T-Cell Epitope Prediction
  MHC-I (CD8+ CTL) and MHC-II (CD4+ helper) binding prediction
    |
Phase 2: B-Cell Epitope Prediction
  Linear and conformational B-cell epitopes for antibody response
    |
Phase 3: Population Coverage
  HLA allele frequencies → design for target population
    |
Phase 4: Conservation Analysis
  Cross-strain epitope conservation → broad protection
    |
Phase 5: Candidate Assembly & Report
  Multi-epitope construct design → immunogenicity assessment

Phase 0: Antigen Selection

Best antigens for vaccines: Surface-exposed, essential for pathogen function, conserved across strains.

# Find pathogen surface proteins
UniProt_search(query="[organism] AND locations:(location:cell surface) AND reviewed:true")
# Or search BVBRC for annotated pathogen proteomes
BVBRC_search_genome_features(keyword="surface protein", genome_id="[taxon_id]")

Antigen prioritization: prefer surface-exposed (secreted/outer membrane) over cytoplasmic; >95% conserved across strains; essential for pathogen viability; known immunogen in natural infection. Use UniProt subcellular location annotations and PubMed to verify these properties.

Phase 1: T-Cell Epitope Prediction

MHC-I epitopes (CD8+ cytotoxic T cells — kill infected cells):

# Option A: Search for KNOWN validated epitopes from IEDB
iedb_search_mhc(
    mhc_class="I",
    qualitative_measure="Positive",
    filters={"source_organism_iri": "eq.NCBITaxon:2697049"},  # SARS-CoV-2
    select=["linear_sequence", "mhc_restriction", "qualitative_measure"],
    limit=50
)

# Option B: PREDICT novel peptide binding (recommended for new proteins)
IEDB_predict_mhci_binding(
    sequence="YOUR_PROTEIN_SEQUENCE",  # full protein or peptide
    allele="HLA-A*02:01",             # or H-2-Kd for mouse
    method="netmhcpan_el",            # EL = eluted ligand (recommended)
    length=9                           # 8-11 for MHC-I
)
# Returns peptides ranked by percentile_rank:
# < 0.5% = strong binder (include in vaccine)
# 0.5-2% = moderate binder (consider)
# > 2% = weak/non-binder (exclude)

MHC-II epitopes (CD4+ helper T cells — activate B cells and CD8+ T cells):

iedb_search_mhc(
    mhc_class="II",
    qualitative_measure="Positive",
    filters={"source_organism_iri": "eq.NCBITaxon:2697049"},
    limit=50
)

Binding affinity interpretation:

IC50 (nM)	Classification	Vaccine Relevance
< 50	Strong binder	Include — high presentation probability
50-500	Moderate binder	Consider — may contribute to response
500-5000	Weak binder	Exclude — unlikely to be presented
> 5000	Non-binder	Exclude

HLA supertype strategy: For broad coverage, predict against HLA supertypes:

A2 supertype (A02:01, A02:06, A*68:02) — covers ~40% globally
A3 supertype (A03:01, A11:01, A*31:01) — covers ~25%
B7 supertype (B07:02, B35:01, B*51:01) — covers ~25%
A2 + A3 + B7 + B44 combined — covers >90% of most populations

Phase 2: B-Cell Epitope Prediction

B-cell epitopes trigger antibody production. Look for:

Linear epitopes: Continuous peptide sequences (easier to synthesize)
Conformational epitopes: 3D surface patches (requires structural data)

# Check for known B-cell epitopes
IEDB_search_epitopes(query="[protein_name]", epitope_type="B cell")
# Get structure for conformational epitope prediction
alphafold_get_prediction(uniprot_id="[accession]")

B-cell epitope criteria: Surface-exposed loops, hydrophilic regions, flexible regions (high B-factor). Combine computational prediction with structural analysis.

Phase 3: Population Coverage

# Search for epitopes restricted to common HLA alleles in target population
# NOTE: No HLA frequency tool exists in ToolUniverse. For population coverage:
# 1. Use IEDB Analysis Resource (tools.iedb.org/population) for population coverage calculation
# 2. Use the HLA supertype strategy (see above) to ensure broad coverage
# 3. Search PubMed for published HLA frequency data: PubMed_search_articles(query="HLA allele frequency [population]")

Population coverage targets:

Coverage Level	Interpretation	Action
>90%	Excellent — vaccine will work in most individuals	Proceed to development
70-90%	Good — most people covered; some populations underserved	Add more epitopes for uncovered HLA types
50-70%	Moderate — significant gaps	Redesign with broader HLA coverage
<50%	Poor — vaccine will miss too many people	Fundamental redesign needed

Phase 4: Conservation Analysis

Check if epitopes are conserved across pathogen strains/variants:

# Search for protein variants across strains
PubMed_search_articles(query="[pathogen] [protein] sequence variation strains")
# Check specific mutations in epitope regions
EnsemblVEP_annotate_hgvs(hgvs_notation="[variant_in_epitope]")

Conservation interpretation:

100% conserved across all known strains → ideal vaccine target
>95% conserved → good target; monitor emerging variants
80-95% conserved → may need strain-specific variants in construct
<80% conserved → avoid; pathogen evolves to escape this epitope

Phase 5: Candidate Assembly & Report

Multi-epitope construct design principles:

Include 3-5 MHC-I epitopes (CD8+ response)
Include 2-3 MHC-II epitopes (CD4+ helper response)
Include 1-2 B-cell epitopes (antibody response)
Connect with appropriate linkers (AAY for MHC-I, GPGPG for MHC-II)
Add adjuvant sequence if needed (e.g., flagellin domain for TLR5)

Report structure:

Antigen Selection — rationale, conservation, essentiality
Epitope Map — all predicted epitopes with binding affinities and HLA restrictions
Top Epitopes — ranked by binding strength × conservation × population coverage
Population Coverage — % coverage per major world population
Conservation Analysis — strain coverage, escape risk assessment
Construct Design — multi-epitope sequence with linkers
Clinical Precedent — existing vaccines/trials for related antigens
Limitations — predicted only (T4 evidence); needs experimental validation

Limitations

All predictions are computational (T4 evidence) — experimental validation (binding assays, immunogenicity studies) is required before any clinical development
No immunogenicity guarantee — MHC binding ≠ immunogenicity; many good binders are not immunogenic in vivo
B-cell epitope prediction is less reliable than T-cell; conformational epitopes require accurate structures
No adjuvant optimization — adjuvant selection requires empirical testing
Pathogen evasion — rapidly evolving pathogens (HIV, influenza) may escape epitope-based vaccines

tooluniverse-vaccine-design