skills/mims-harvard/tooluniverse/tooluniverse-immune-repertoire-analysis


ToolUniverse Immune Repertoire Analysis

Comprehensive skill for analyzing T-cell receptor (TCR) and B-cell receptor (BCR) repertoire sequencing data to characterize adaptive immune responses, clonal expansion, and antigen specificity.

Overview

Adaptive immune receptor repertoire sequencing (AIRR-seq) enables comprehensive profiling of T-cell and B-cell populations through high-throughput sequencing of TCR and BCR variable regions. This skill provides an 8-phase workflow for:

  • Clonotype identification and tracking
  • Diversity and clonality assessment
  • V(D)J gene usage analysis
  • CDR3 sequence characterization
  • Clonal expansion and convergence detection
  • Epitope specificity prediction
  • Integration with single-cell phenotyping
  • Longitudinal repertoire tracking

Core Workflow

Phase 1: Data Import & Clonotype Definition

Load AIRR-seq data from common formats (MiXCR, ImmunoSEQ, AIRR standard, 10x Genomics VDJ). Standardize columns to: cloneId, count, frequency, cdr3aa, cdr3nt, v_gene, j_gene, chain. Define clonotypes using one of three methods:

  • cdr3aa: Amino acid CDR3 sequence only
  • cdr3nt: Nucleotide CDR3 sequence
  • vj_cdr3: V gene + J gene + CDR3aa (most common, recommended)

Aggregate by clonotype, sort by count, assign ranks.
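
A minimal, dependency-free sketch of the aggregation step, assuming each input row is a dict that already uses the standardized column names above. The function name `define_clonotypes` mirrors the Quick Start, but this body is illustrative, not the skill's own implementation; joining key fields with "|" is an arbitrary choice.

```python
from collections import defaultdict

# Fields that make up the clonotype key for each method listed above.
KEY_FIELDS = {
    "cdr3aa": ("cdr3aa",),
    "cdr3nt": ("cdr3nt",),
    "vj_cdr3": ("v_gene", "j_gene", "cdr3aa"),
}


def define_clonotypes(records, method="vj_cdr3"):
    """Aggregate standardized rows (dicts) into ranked clonotypes.

    Each row needs a 'count' plus the fields used by `method`.
    Returns dicts with 'clonotype', 'count', 'frequency' and 'rank',
    sorted by descending count (ties broken by key for stable output).
    """
    if method not in KEY_FIELDS:
        raise ValueError(f"unknown clonotype method: {method!r}")
    fields = KEY_FIELDS[method]
    counts = defaultdict(int)
    for row in records:
        key = "|".join(str(row[field]) for field in fields)
        counts[key] += int(row["count"])
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        {"clonotype": key, "count": n, "frequency": n / total, "rank": rank}
        for rank, (key, n) in enumerate(ordered, start=1)
    ]
```

Two rows with the same V gene, J gene and CDR3aa but different nucleotide sequences merge under `vj_cdr3` and stay separate under `cdr3nt`.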

Phase 2: Diversity & Clonality Analysis

Calculate diversity metrics for the repertoire:

  • Shannon entropy: Overall diversity (higher = more diverse)
  • Simpson index: Probability that two randomly drawn sequences belong to the same clonotype
  • Inverse Simpson: Effective number of clonotypes
  • Gini coefficient: Inequality in clonotype distribution
  • Clonality: 1 - Pielou's evenness (higher = more clonal)
  • Richness: Number of unique clonotypes
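
The metrics above can be computed from per-clonotype counts alone. This is a sketch using the natural log for Shannon entropy and Simpson's index with replacement (sum of squared proportions); both are conventions assumed here, and other tools may use log2 or the without-replacement form. A repertoire with a single clonotype is treated as maximally clonal.

```python
import math


def calculate_diversity(counts):
    """Diversity metrics from per-clonotype counts (zeros are ignored)."""
    counts = [c for c in counts if c > 0]
    if not counts:
        raise ValueError("no clonotypes with positive counts")
    total = sum(counts)
    props = [c / total for c in counts]
    richness = len(counts)

    shannon = -sum(p * math.log(p) for p in props)  # natural log
    simpson = sum(p * p for p in props)  # two draws with replacement
    inverse_simpson = 1.0 / simpson

    # Gini coefficient on counts sorted in ascending order.
    ranked = sorted(counts)
    weighted = sum(i * x for i, x in enumerate(ranked, start=1))
    gini = 2 * weighted / (richness * total) - (richness + 1) / richness

    # Pielou's evenness J = H / ln(S); a single clonotype is treated as J = 0.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return {
        "richness": richness,
        "shannon_entropy": shannon,
        "simpson_index": simpson,
        "inverse_simpson": inverse_simpson,
        "gini": gini,
        "clonality": 1.0 - evenness,
    }
```

For four equally abundant clonotypes, Gini and clonality are 0 and inverse Simpson is 4; a repertoire dominated by one clone pushes both Gini and clonality toward 1.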

Generate rarefaction curves to assess whether sequencing depth is sufficient.
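
One way to build the curve without repeated random subsampling is analytical (Hurlbert-style) rarefaction: the expected number of clonotypes in a subsample of m reads drawn without replacement is the sum over clonotypes of 1 - C(N - n_i, m) / C(N, m). This swapped-in technique and the helper names are illustrative assumptions; log-gamma keeps the binomial ratios stable for large N. A curve that is still rising at full depth suggests undersampling.

```python
import math


def _log_comb(n, k):
    """log of the binomial coefficient C(n, k) via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def expected_richness(counts, depth):
    """Expected clonotypes in a subsample of `depth` reads (no replacement)."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the total count")
    log_denominator = _log_comb(total, depth)
    expected = 0.0
    for c in counts:
        if total - c < depth:
            expected += 1.0  # too few other reads: this clonotype is always sampled
        else:
            expected += 1.0 - math.exp(_log_comb(total - c, depth) - log_denominator)
    return expected


def rarefaction_curve(counts, steps=10):
    """(depth, expected richness) pairs at `steps` evenly spaced depths."""
    total = sum(counts)
    depths = sorted({max(1, round(total * k / steps)) for k in range(1, steps + 1)})
    return [(d, expected_richness(counts, d)) for d in depths]
```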

Phase 3: V(D)J Gene Usage Analysis

Analyze V and J gene usage patterns weighted by clonotype count:

  • V gene family usage frequencies
  • J gene family usage frequencies
  • V-J pairing frequencies
  • Statistical testing for biased usage (chi-square test vs. uniform expectation)
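
A stdlib sketch of count-weighted gene and V-J pair usage plus a chi-square statistic against a uniform expectation. Collapsing "TRBV20-1*01" to the family "TRBV20" by dropping the allele and member number is an assumed IMGT-style convention; the helper names are hypothetical. For a p-value, `scipy.stats.chisquare(list(usage.values()))` computes the same statistic, since its default expectation is uniform.

```python
from collections import Counter


def gene_family(gene):
    """'TRBV20-1*01' -> 'TRBV20' (drop the allele, then the member number)."""
    return gene.split("*")[0].split("-")[0]


def weighted_usage(records, fields, family=False):
    """Sum counts per gene, or per tuple of genes (e.g. a V-J pair)."""
    usage = Counter()
    for row in records:
        genes = tuple(gene_family(row[f]) if family else row[f] for f in fields)
        usage[genes if len(genes) > 1 else genes[0]] += row["count"]
    return usage


def usage_frequencies(usage):
    """Convert weighted counts to frequencies."""
    total = sum(usage.values())
    return {gene: n / total for gene, n in usage.items()}


def chi_square_uniform(usage):
    """Chi-square statistic and degrees of freedom vs. equal usage of observed genes."""
    observed = list(usage.values())
    expected = sum(observed) / len(observed)
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, len(observed) - 1
```

Only genes observed in the sample enter the test here; including unobserved reference genes as zeros would change the degrees of freedom.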

Phase 4: CDR3 Sequence Analysis

Characterize CDR3 sequences:

  • Length distribution: Typical TCR CDR3 = 12-18 aa; BCR CDR3 = 10-20 aa
  • Amino acid composition: Weighted by clonotype frequency
  • Flag unusual length distributions (may indicate PCR bias)
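
A sketch of the three CDR3 checks, assuming clonotype dicts with `cdr3aa` and `count`. The default range uses the TCR bounds above (12-18 aa); the 10% "outside range" threshold is a hypothetical cut-off, not a value from this skill. Weighting by count is equivalent to weighting by frequency.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cdr3_length_distribution(clonotypes):
    """Fraction of counts at each CDR3 amino acid length."""
    lengths = Counter()
    for c in clonotypes:
        lengths[len(c["cdr3aa"])] += c["count"]
    total = sum(lengths.values())
    return {length: n / total for length, n in sorted(lengths.items())}


def aa_composition(clonotypes):
    """Count-weighted amino acid frequencies; non-standard symbols are skipped."""
    composition = Counter()
    for c in clonotypes:
        for aa in c["cdr3aa"]:
            if aa in AMINO_ACIDS:
                composition[aa] += c["count"]
    total = sum(composition.values())
    return {aa: composition[aa] / total for aa in AMINO_ACIDS}


def flag_length_outliers(length_dist, expected=(12, 18), max_outside=0.10):
    """Return (fraction outside the expected range, whether it exceeds max_outside)."""
    low, high = expected
    outside = sum(f for length, f in length_dist.items() if length < low or length > high)
    return outside, outside > max_outside
```

For BCR data, pass `expected=(10, 20)`.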

Phase 5: Clonal Expansion Detection

Identify expanded clonotypes above a frequency threshold (default: 95th percentile). Track clonotypes longitudinally across multiple timepoints to measure persistence, mean/max frequency, and fold changes.
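
A plain-Python sketch of both steps. The percentile uses linear interpolation (the same rule as NumPy's default), and "above" is taken as strictly greater than the threshold. For tracking, absent clonotypes count as frequency 0, and the pseudocount in the last-versus-first fold change is an assumed value to avoid division by zero. Unlike the Quick Start, which indexes a DataFrame, this version returns plain lists and dicts.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile, q in 0..100."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    low = math.floor(pos)
    high = min(low + 1, len(xs) - 1)
    return xs[low] + (xs[high] - xs[low]) * (pos - low)


def detect_expanded_clones(clonotypes, pct=95):
    """Clonotypes whose frequency is strictly above the pct-th percentile."""
    threshold = percentile([c["frequency"] for c in clonotypes], pct)
    expanded = [c for c in clonotypes if c["frequency"] > threshold]
    return {"threshold": threshold, "n_expanded": len(expanded),
            "expanded_clonotypes": expanded}


def track_clonotypes(timepoints, pseudocount=1e-6):
    """timepoints: ordered list of {clonotype: frequency} dicts, one per sample."""
    summary = {}
    for cid in set().union(*timepoints):
        freqs = [tp.get(cid, 0.0) for tp in timepoints]
        summary[cid] = {
            "persistence": sum(f > 0 for f in freqs) / len(freqs),
            "mean_frequency": sum(freqs) / len(freqs),
            "max_frequency": max(freqs),
            "fold_change": (freqs[-1] + pseudocount) / (freqs[0] + pseudocount),
        }
    return summary
```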

Phase 6: Convergence & Public Clonotypes

  • Convergent recombination: The same CDR3 amino acid sequence encoded by different nucleotide sequences (evidence of antigen-driven selection)
  • Public clonotypes: Shared across multiple samples/individuals (may indicate common antigen responses)
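
Both definitions reduce to grouping. In this sketch, convergence groups rows by `cdr3aa` and keeps those with more than one distinct `cdr3nt`; public clonotypes are keys found in at least `min_samples` samples (2 is an assumed default). The sample keys should come from the same clonotype method, e.g. `vj_cdr3`.

```python
from collections import defaultdict


def convergent_clonotypes(records):
    """CDR3 amino acid sequences encoded by more than one nucleotide sequence."""
    nt_by_aa = defaultdict(set)
    for row in records:
        nt_by_aa[row["cdr3aa"]].add(row["cdr3nt"])
    return {aa: sorted(nts) for aa, nts in nt_by_aa.items() if len(nts) > 1}


def public_clonotypes(samples, min_samples=2):
    """samples: {sample_id: iterable of clonotype keys}."""
    seen_in = defaultdict(set)
    for sample_id, clonotypes in samples.items():
        for key in clonotypes:
            seen_in[key].add(sample_id)
    return {key: sorted(ids) for key, ids in seen_in.items() if len(ids) >= min_samples}
```

For example, TGTGCCAGCAGCTTT and TGCGCAAGTTCATTC both translate to CASSF, so CASSF is reported as convergent.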

Phase 7: Epitope Prediction & Specificity

Query epitope databases for known TCR-epitope associations:

  • IEDB (IEDB_search_tcells): Search by CDR3 receptor sequence
  • VDJdb (manual): https://vdjdb.cdr3.net/search
  • PubMed literature (PubMed_search): Search for CDR3 + epitope/antigen/specificity
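
A sketch of the query loop for the two automated sources. The tool runner is passed in as `run_tool(name, arguments)`, so the loop can be tested without network access. The argument keys (`receptor_cdr3`, `query`, `max_results`) are hypothetical placeholders; check each tool's parameter schema before use. VDJdb stays a manual lookup.

```python
def query_epitope_database(cdr3_list, run_tool, max_pubmed=5):
    """Collect IEDB and PubMed results per CDR3 via an injected tool runner."""
    results = {}
    for cdr3 in cdr3_list:
        # Hypothetical argument key; confirm against the IEDB_search_tcells schema.
        iedb = run_tool("IEDB_search_tcells", {"receptor_cdr3": cdr3})
        query = f'"{cdr3}" AND (epitope OR antigen OR specificity)'
        # Hypothetical argument keys; confirm against the PubMed_search schema.
        pubmed = run_tool("PubMed_search", {"query": query, "max_results": max_pubmed})
        results[cdr3] = {"iedb": iedb, "pubmed": pubmed}
    return results
```

With ToolUniverse, one option is `tu = ToolUniverse()`, `tu.load_tools()`, then `run_tool = lambda name, args: tu.run({"name": name, "arguments": args})`, assuming the dict-based `run` call described in the ToolUniverse README; verify against your installed version.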

Phase 8: Integration with Single-Cell Data

Link TCR/BCR clonotypes to cell phenotypes from paired single-cell RNA-seq:

  • Map clonotypes to cell barcodes
  • Identify expanded clonotype phenotypes on UMAP
  • Analyze clonotype-cluster associations (cross-tabulation)
  • Find cluster-specific clonotypes (>80% of cells in one cluster)
  • Differential gene expression: expanded vs. non-expanded cells
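
A sketch of the cross-tabulation, the >80% cluster-specificity rule, and the expanded/non-expanded split that feeds differential expression. It assumes cells arrive as `(barcode, clonotype, cluster)` tuples. The `min_cells` defaults are assumptions: without a size floor, every singleton clonotype would trivially be 100% cluster-specific. The differential expression itself is left to the single-cell skill.

```python
from collections import Counter, defaultdict


def clonotype_cluster_table(cells):
    """Cross-tabulate cells: {clonotype: Counter({cluster: n_cells})}."""
    table = defaultdict(Counter)
    for _barcode, clonotype, cluster in cells:
        table[clonotype][cluster] += 1
    return table


def cluster_specific_clonotypes(table, min_fraction=0.8, min_cells=3):
    """Clonotypes with more than `min_fraction` of their cells in one cluster."""
    specific = {}
    for clonotype, by_cluster in table.items():
        n_cells = sum(by_cluster.values())
        cluster, top = by_cluster.most_common(1)[0]
        if n_cells >= min_cells and top / n_cells > min_fraction:
            specific[clonotype] = (cluster, top / n_cells)
    return specific


def expanded_barcodes(cells, min_cells=2):
    """Barcodes whose clonotype is found in at least `min_cells` cells."""
    cells = list(cells)
    sizes = Counter(clonotype for _barcode, clonotype, _cluster in cells)
    return {barcode for barcode, clonotype, _cluster in cells if sizes[clonotype] >= min_cells}
```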

ToolUniverse Tool Integration

Key Tools Used:

  • IEDB_search_tcells - Known T-cell epitopes
  • IEDB_search_bcells - Known B-cell epitopes
  • PubMed_search - Literature on TCR/BCR specificity
  • UniProt_get_protein - Antigen protein information

Integration with Other Skills:

  • tooluniverse-single-cell - Single-cell transcriptomics
  • tooluniverse-rnaseq-deseq2 - Bulk RNA-seq analysis
  • tooluniverse-variant-analysis - Somatic hypermutation analysis (BCR)

Quick Start

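from tooluniverse import ToolUniverse

# The helpers below (load_airr_data, define_clonotypes, calculate_diversity,
# detect_expanded_clones, analyze_vdj_usage, query_epitope_database) are not
# imported here; they are assumed to be defined as in ANALYSIS_DETAILS.md.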

# 1. Load data
tcr_data = load_airr_data("clonotypes.txt", format='mixcr')

# 2. Define clonotypes
clonotypes = define_clonotypes(tcr_data, method='vj_cdr3')

# 3. Calculate diversity
diversity = calculate_diversity(clonotypes['count'])
print(f"Shannon entropy: {diversity['shannon_entropy']:.2f}")

# 4. Detect expanded clones
expansion = detect_expanded_clones(clonotypes)
print(f"Expanded clonotypes: {expansion['n_expanded']}")

# 5. Analyze V(D)J usage
vdj_usage = analyze_vdj_usage(tcr_data)

# 6. Query epitope databases
top_clones = expansion['expanded_clonotypes']['clonotype'].head(10)
epitopes = query_epitope_database(top_clones)

See Also

  • ANALYSIS_DETAILS.md - Detailed code snippets for all 8 phases
  • USE_CASES.md - Complete use cases (immunotherapy, vaccine, autoimmune, single-cell integration) and best practices