tooluniverse-rare-disease-diagnosis
Rare Disease Diagnosis Advisor
Systematic diagnosis support for rare diseases using phenotype matching, gene panel prioritization, and variant interpretation across Orphanet, OMIM, HPO, ClinVar, and structure-based analysis.
KEY PRINCIPLES:
- Report-first approach - Create report file FIRST, update progressively
- Phenotype-driven - Convert symptoms to HPO terms before searching
- Multi-database triangulation - Cross-reference Orphanet, OMIM, OpenTargets
- Evidence grading - Grade diagnoses by supporting evidence strength
- Actionable output - Prioritized differential diagnosis with next steps
- Genetic counseling aware - Consider inheritance patterns and family history
- English-first queries - Always use English terms in tool calls (phenotype descriptions, gene names, disease names), even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language
When to Use
Apply when user asks:
- "Patient has [symptoms], what rare disease could this be?"
- "Unexplained developmental delay with [features]"
- "WES found VUS in [gene], is this pathogenic?"
- "What genes should we test for [phenotype]?"
- "Differential diagnosis for [rare symptom combination]"
Report-First Approach (MANDATORY)
- Create the report file FIRST: `[PATIENT_ID]_rare_disease_report.md`, with all section headers and `[Researching...]` placeholders
- Progressively update it as you gather data
- Output separate data files:
  - `[PATIENT_ID]_gene_panel.csv` - Prioritized genes for testing
  - `[PATIENT_ID]_variant_interpretation.csv` - If variants are provided
Every finding MUST include a source citation (ORPHA code, OMIM number, tool name).
See REPORT_TEMPLATE.md for the full template and example outputs for each phase.
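The report-first rule can be sketched as two helpers. The first writes every header with a placeholder before any tool is called. The second swaps one placeholder for findings as each phase completes. The section names, `create_report`, and `update_section` are illustrative assumptions. REPORT_TEMPLATE.md remains the authoritative layout.

```python
import tempfile
from pathlib import Path

# Illustrative section list only; the authoritative headers are in REPORT_TEMPLATE.md.
SECTIONS = [
    "Phenotype Standardization",
    "Differential Diagnosis",
    "Gene Panel",
    "Expression & Tissue Context",
    "Pathway Analysis",
    "Variant Interpretation",
    "Structure Analysis",
    "Literature Evidence",
    "Recommendations",
]
PLACEHOLDER = "[Researching...]"


def create_report(patient_id: str, out_dir: Path) -> Path:
    """Write the report skeleton (every header + placeholder) before any tool call."""
    path = out_dir / f"{patient_id}_rare_disease_report.md"
    lines = [f"# Rare Disease Report: {patient_id}", ""]
    for section in SECTIONS:
        lines += [f"## {section}", "", PLACEHOLDER, ""]
    path.write_text("\n".join(lines), encoding="utf-8")
    return path


def update_section(path: Path, section: str, content: str) -> None:
    """Swap one section's placeholder for findings; fails if already filled."""
    text = path.read_text(encoding="utf-8")
    stub = f"## {section}\n\n{PLACEHOLDER}"
    if stub not in text:
        raise ValueError(f"section {section!r} is missing or already filled")
    path.write_text(text.replace(stub, f"## {section}\n\n{content}", 1), encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        report = create_report("P001", Path(tmp))
        update_section(report, "Phenotype Standardization",
                       "- Seizure (HP:0001250) [source: HPO_search_terms]")
        print(report.read_text(encoding="utf-8"))
```

Writing the skeleton first means an interrupted session still leaves a readable file. The remaining `[Researching...]` markers show which phases are unfinished.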
Tool Parameter Corrections
| Tool | WRONG Parameter | CORRECT Parameter |
|---|---|---|
| `OpenTargets_get_associated_diseases_by_target_ensemblId` | `ensemblID` | `ensemblId` |
| `ClinVar_get_variant_by_id` | `variant_id` | `id` |
| `MyGene_query_genes` | `gene` | `q` |
| `gnomAD_get_variant_frequencies` | `variant` | `variant_id` |
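A minimal sketch of applying these corrections automatically before a tool call. `PARAM_FIXES` and `fix_arguments` are hypothetical helpers, not part of ToolUniverse. The mapping is copied directly from the table above.

```python
# Corrections copied from the table above: tool -> {wrong_name: correct_name}.
PARAM_FIXES = {
    "OpenTargets_get_associated_diseases_by_target_ensemblId": {"ensemblID": "ensemblId"},
    "ClinVar_get_variant_by_id": {"variant_id": "id"},
    "MyGene_query_genes": {"gene": "q"},
    "gnomAD_get_variant_frequencies": {"variant": "variant_id"},
}


def fix_arguments(tool_name: str, arguments: dict) -> dict:
    """Return a copy of `arguments` with known-wrong parameter names renamed."""
    fixes = PARAM_FIXES.get(tool_name, {})
    return {fixes.get(key, key): value for key, value in arguments.items()}
```

Tools not listed in the table pass through unchanged.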
Workflow Overview
- Phase 1: Phenotype Standardization - Convert symptoms to HPO terms, identify core vs. variable features, note onset/inheritance
- Phase 2: Disease Matching - Search Orphanet, cross-reference OMIM, query DisGeNET -> ranked differential diagnosis
- Phase 3: Gene Panel Identification - Extract genes from top diseases, validate with ClinGen, check expression (GTEx)
- Phase 3.5: Expression & Tissue Context - CELLxGENE cell-type expression, ChIPAtlas regulatory context
- Phase 3.6: Pathway Analysis - KEGG pathways, Reactome processes, IntAct protein interactions
- Phase 4: Variant Interpretation (if variants are provided) - ClinVar lookup, gnomAD frequency, computational predictions (CADD, AlphaMissense, EVE, SpliceAI)
- Phase 5: Structure Analysis (for VUS) - AlphaFold2 prediction, domain impact assessment (InterPro)
- Phase 6: Literature Evidence - PubMed studies, BioRxiv/MedRxiv preprints, OpenAlex citation analysis
- Phase 7: Report Synthesis - Prioritized differential, recommended testing, next steps
For detailed code examples and algorithms for each phase, see DIAGNOSTIC_WORKFLOW.md.
Phase Summaries
Phase 1: Phenotype Standardization
- Use `HPO_search_terms(query=symptom)` to convert each clinical description to HPO terms
- Classify features as Core (always present), Variable (present in >50% of affected individuals), Occasional (<50%), or Age-specific
- Record age of onset and family history for inheritance pattern hints
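A minimal sketch of the Phase 1 feature classes. It assumes each HPO term already carries a frequency (fraction of affected individuals, 0-1) and an age-specific flag. Both fields are hypothetical inputs, for example taken from disease annotations. "Always present" is read literally as a frequency of 1.0. A frequency of exactly 0.5, which the rules leave unassigned, falls into Occasional here.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    hpo_id: str                 # e.g. "HP:0001166" (Arachnodactyly)
    label: str
    frequency: float            # fraction of affected individuals with the feature (0-1)
    age_specific: bool = False  # hypothetical flag: feature appears only at certain ages


def classify_feature(f: Feature) -> str:
    """Bucket a feature into Core / Variable / Occasional / Age-specific."""
    if f.age_specific:
        return "Age-specific"
    if f.frequency >= 1.0:      # "always present"
        return "Core"
    if f.frequency > 0.5:
        return "Variable"
    return "Occasional"
```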
Phase 2: Disease Matching
- Orphanet: `Orphanet_search_diseases(operation="search_diseases", query=keyword)`, then `Orphanet_get_genes(operation="get_genes", orpha_code=code)` for each hit
- OMIM: `OMIM_search(operation="search", query=gene)`, then `OMIM_get_entry` and `OMIM_get_clinical_synopsis` for details
- DisGeNET: `DisGeNET_search_gene(operation="search_gene", gene=symbol)` for gene-disease association scores
- Score phenotype overlap: Excellent (>80%), Good (60-80%), Possible (40-60%), Unlikely (<40%)
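A minimal sketch of the overlap scoring. It assumes overlap is the percentage of the patient's HPO terms that are annotated to the disease. This is one reasonable definition; DIAGNOSTIC_WORKFLOW.md may use another. The sketch also matches exact HPO IDs only, so a more specific patient term will not match a broader disease term.

```python
def phenotype_overlap(patient_terms: set[str], disease_terms: set[str]) -> float:
    """Percentage of the patient's HPO terms that are annotated to the disease."""
    if not patient_terms:
        return 0.0
    return 100.0 * len(patient_terms & disease_terms) / len(patient_terms)


def overlap_category(pct: float) -> str:
    if pct > 80:
        return "Excellent"
    if pct >= 60:
        return "Good"
    if pct >= 40:
        return "Possible"
    return "Unlikely"


def rank_diseases(patient_terms: set[str],
                  diseases: dict[str, set[str]]) -> list[tuple[str, float, str]]:
    """Rank candidate diseases (name -> HPO term set) by descending overlap."""
    scored = [(name, phenotype_overlap(patient_terms, terms)) for name, terms in diseases.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(name, round(pct, 1), overlap_category(pct)) for name, pct in scored]
```

Boundary values go to the higher band's lower neighbor: exactly 80% is Good, and exactly 60% is Good.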
Phase 3: Gene Panel Identification
- Extract genes from top candidate diseases
- ClinGen validation (critical): `ClinGen_search_gene_validity`, `ClinGen_search_dosage_sensitivity`, `ClinGen_search_actionability`
- ClinGen classification determines panel inclusion:
  - Definitive/Strong/Moderate: Include in panel
  - Limited: Include but flag
  - Disputed/Refuted: Exclude
- Expression: Use `MyGene_query_genes` to get the Ensembl ID, then `GTEx_get_median_gene_expression` to confirm tissue expression
- Prioritization scoring (points are summed per gene): Tier 1 (gene of the top-ranked disease, +5), Tier 2 (gene shared by multiple candidate diseases, +3), Tier 3 (ClinGen Definitive, +3), Tier 4 (expressed in the affected tissue, +2), Tier 5 (pLI >0.9, +1)
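A minimal sketch that combines the ClinGen inclusion rule with the additive score. The `CandidateGene` fields are hypothetical summaries of the tool outputs, not ToolUniverse response formats. Genes with no ClinGen curation are not covered by the rules above. Including them with a flag is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

INCLUDE = {"Definitive", "Strong", "Moderate"}
FLAG = {"Limited"}
EXCLUDE = {"Disputed", "Refuted"}


@dataclass
class CandidateGene:
    symbol: str
    in_top_disease: bool           # linked to the top-ranked disease
    n_candidate_diseases: int      # how many candidate diseases list this gene
    clingen: Optional[str]         # ClinGen gene-disease validity class, None if uncurated
    expressed_in_tissue: bool      # expression confirmed in the affected tissue
    pli: Optional[float] = None    # gnomAD pLI, None if unavailable


def panel_status(gene: CandidateGene) -> str:
    if gene.clingen in EXCLUDE:
        return "exclude"
    if gene.clingen in FLAG:
        return "include (flag: Limited)"
    if gene.clingen in INCLUDE:
        return "include"
    return "include (flag: no ClinGen curation)"  # assumption: not covered by the rules


def priority_score(gene: CandidateGene) -> int:
    score = 0
    if gene.in_top_disease:
        score += 5
    if gene.n_candidate_diseases > 1:
        score += 3
    if gene.clingen == "Definitive":
        score += 3
    if gene.expressed_in_tissue:
        score += 2
    if gene.pli is not None and gene.pli > 0.9:
        score += 1
    return score


def build_panel(genes: list[CandidateGene]) -> list[tuple[str, int, str]]:
    """Drop Disputed/Refuted genes, then sort the rest by descending score."""
    kept = [g for g in genes if panel_status(g) != "exclude"]
    kept.sort(key=priority_score, reverse=True)
    return [(g.symbol, priority_score(g), panel_status(g)) for g in kept]
```

Exclusion runs before scoring. A Refuted gene therefore never enters the panel, even if it would score highly.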
Phase 3.5: Expression & Tissue Context
- CELLxGENE: `CELLxGENE_get_expression_data` and `CELLxGENE_get_cell_metadata` for cell-type-specific expression
- ChIPAtlas: `ChIPAtlas_enrichment_analysis` and `ChIPAtlas_get_peak_data` for regulatory context (TF binding)
- Purpose: confirm that candidate genes are expressed in disease-relevant tissues and cell types
Phase 3.6: Pathway Analysis
- KEGG: `kegg_find_genes(query="hsa:{gene}")`, then `kegg_get_gene_info` for pathway membership
- IntAct: `intact_search_interactions(query=gene, species="human")` for protein-protein interactions
- Identify convergent pathways across candidate genes (convergence strengthens candidacy)
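A minimal sketch of the convergence check. It assumes the KEGG results have already been reduced to a mapping from gene symbol to pathway IDs. That parsing step is not shown and depends on the actual `kegg_get_gene_info` output. A pathway counts as convergent when at least `min_genes` candidates belong to it.

```python
from collections import defaultdict


def convergent_pathways(gene_pathways: dict[str, set[str]],
                        min_genes: int = 2) -> dict[str, list[str]]:
    """Invert gene -> pathways and keep pathways shared by >= min_genes candidates."""
    members: dict[str, set[str]] = defaultdict(set)
    for gene, pathways in gene_pathways.items():
        for pathway in pathways:
            members[pathway].add(gene)
    # Most-shared pathways first; ties broken by pathway ID for stable output.
    ordered = sorted(members.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return {pw: sorted(genes) for pw, genes in ordered if len(genes) >= min_genes}
```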
Phase 4: Variant Interpretation (if provided)
- ClinVar: `ClinVar_search_variants(query=hgvs)` for existing classifications
- gnomAD: `gnomAD_get_variant_frequencies(variant_id=id)` for population frequency:
  - Ultra-rare (<0.00001), Rare (<0.0001), Low frequency (<0.01), Common (>=0.01, likely benign)
- Computational predictions (for VUS):
  - CADD: `CADD_get_variant_score` - PHRED >=20 supports PP3
  - AlphaMissense: `AlphaMissense_get_variant_score` - pathogenic classification = strong PP3
  - EVE: `EVE_get_variant_score` - score >0.5 supports PP3
  - SpliceAI: `SpliceAI_predict_splice` - delta score >=0.5 indicates splice impact
- ACMG criteria: PVS1 (null variant), PS1 (same amino acid change as an established pathogenic variant), PM2 (absent from population databases), PP3 (computational evidence), BA1 (allele frequency >5%)
- Consensus from 2+ concordant predictors strengthens PP3 evidence
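A minimal sketch of the frequency bins and the predictor thresholds listed above. Several inputs are assumptions:

- The scores are already extracted from the tool responses.
- AlphaMissense is reduced to a yes/no "pathogenic" flag.
- An allele frequency of None or 0 is treated as absent from gnomAD, which triggers PM2.

The sketch does not weight AlphaMissense as "strong" PP3. It only reports which predictors support PP3 and whether two or more agree.

```python
from typing import Optional


def frequency_evidence(af: Optional[float]) -> tuple[str, list[str]]:
    """Bin a gnomAD allele frequency and return frequency-based ACMG criteria."""
    if not af:  # None or 0.0: treated as absent from gnomAD
        return "Absent", ["PM2"]
    if af < 1e-5:
        label = "Ultra-rare"
    elif af < 1e-4:
        label = "Rare"
    elif af < 0.01:
        label = "Low frequency"
    else:
        label = "Common"
    return label, (["BA1"] if af > 0.05 else [])


def pp3_assessment(cadd_phred: Optional[float] = None,
                   alphamissense_pathogenic: Optional[bool] = None,
                   eve_score: Optional[float] = None,
                   spliceai_delta: Optional[float] = None) -> dict:
    """List predictors meeting their PP3 threshold and flag 2+ concordant calls."""
    supporting = []
    if cadd_phred is not None and cadd_phred >= 20:
        supporting.append("CADD")
    if alphamissense_pathogenic:
        supporting.append("AlphaMissense")
    if eve_score is not None and eve_score > 0.5:
        supporting.append("EVE")
    if spliceai_delta is not None and spliceai_delta >= 0.5:
        supporting.append("SpliceAI")
    return {"PP3": bool(supporting), "concordant": len(supporting) >= 2,
            "predictors": supporting}
```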
Phase 5: Structure Analysis (for VUS)
- Perform when: VUS, missense in critical domain, novel variant, or additional evidence needed
- AlphaFold2: `NvidiaNIM_alphafold2(sequence=seq, algorithm="mmseqs2")` for structure prediction
- Domain impact: `InterPro_get_protein_domains(accession=uniprot_id)` to check functional domains
- Assess pLDDT confidence at the variant position, domain location, and structural role
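A minimal sketch of the position-level assessment. AlphaFold writes per-residue pLDDT into the B-factor column of its PDB output. The bands below are the AlphaFold DB confidence levels. Two points are assumptions: that `NvidiaNIM_alphafold2` returns PDB text, and that InterPro domains have been converted to `(name, start, end)` tuples.

```python
from typing import Optional


def plddt_from_pdb(pdb_text: str) -> dict[int, float]:
    """Read per-residue pLDDT from the B-factor column (cols 61-66) of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def plddt_band(plddt: float) -> str:
    """AlphaFold DB confidence bands."""
    if plddt > 90:
        return "Very high"
    if plddt > 70:
        return "Confident"
    if plddt > 50:
        return "Low"
    return "Very low"


def structural_context(position: int, plddt_by_residue: dict[int, float],
                       domains: list[tuple[str, int, int]]) -> dict:
    """Summarize model confidence and domain overlap at a 1-based residue position."""
    plddt: Optional[float] = plddt_by_residue.get(position)
    return {
        "position": position,
        "pLDDT": plddt,
        "confidence": plddt_band(plddt) if plddt is not None else "not modeled",
        "domains": [name for name, start, end in domains if start <= position <= end],
    }
```

A variant in a "Low" or "Very low" region should not be given much structural weight, even if it falls inside an annotated domain.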
Phase 6: Literature Evidence
- PubMed: `PubMed_search_articles(query="disease AND genetics")` for published studies
- Preprints: `BioRxiv_search_preprints`, `ArXiv_search_papers(category="q-bio")` for the latest findings
- Citations: `openalex_search_works` for citation analysis of key papers
- Note: preprints are not peer-reviewed; flag them accordingly
Phase 7: Report Synthesis
- Compile all phases into final report with evidence grading
- Provide prioritized differential diagnosis with next steps
- Include specialist referral suggestions and family screening recommendations
Evidence Grading
| Tier | Criteria | Example |
|---|---|---|
| T1 (High) | Phenotype match >80% + gene match | Marfan with FBN1 mutation |
| T2 (Medium-High) | Phenotype match 60-80% OR likely pathogenic variant | Good phenotype fit |
| T3 (Medium) | Phenotype match 40-60% OR VUS in candidate gene | Possible diagnosis |
| T4 (Low) | Phenotype match <40% OR uncertain gene | Low probability |
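The table's criteria overlap. For example, a 50% match with a likely pathogenic variant satisfies both T2 and T3. This sketch therefore assigns the highest tier that is met. It also assumes that "gene match" means a pathogenic or likely pathogenic variant in a gene linked to the disease, and that a pathogenic variant qualifies for T2 just as a likely pathogenic one does. Both readings are interpretations, not rules stated in the table.

```python
from typing import Optional

PATHOGENIC = {"pathogenic", "likely_pathogenic"}


def evidence_tier(phenotype_match_pct: float, variant_class: Optional[str] = None) -> str:
    """Return the highest evidence tier whose criteria are met.

    variant_class: ACMG class of the best variant in a gene linked to this disease
    ("pathogenic", "likely_pathogenic", "vus"), or None if no variant was found.
    """
    gene_match = variant_class in PATHOGENIC
    if phenotype_match_pct > 80 and gene_match:
        return "T1 (High)"
    if phenotype_match_pct >= 60 or gene_match:
        return "T2 (Medium-High)"
    if phenotype_match_pct >= 40 or variant_class == "vus":
        return "T3 (Medium)"
    return "T4 (Low)"
```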
Completeness Checklist
Phase 1 (Phenotype): All symptoms as HPO terms, core vs. variable distinguished, onset documented, family history noted
Phase 2 (Disease Matching): >=5 candidates (or all matching), overlap % calculated, inheritance patterns, ORPHA + OMIM IDs
Phase 3 (Gene Panel): >=5 genes prioritized, ClinGen evidence level per gene, expression validated, testing strategy recommended
Phase 4 (Variants): ClinVar classification, gnomAD frequency, ACMG criteria applied, classification justified
Phase 5 (Structure): Structure predicted (if VUS), pLDDT reported, domain impact assessed, structural evidence summarized
Phase 7 (Recommendations): >=3 next steps, specialist referrals, family screening addressed
See CHECKLIST.md for the full interactive checklist.
Fallback Chains
| Primary Tool | Fallback 1 | Fallback 2 |
|---|---|---|
| `Orphanet_search_by_hpo` | `OMIM_search` | PubMed phenotype search |
| `ClinVar_get_variant` | `gnomAD_get_variant` | VEP annotation |
| `NvidiaNIM_alphafold2` | `alphafold_get_prediction` | UniProt features |
| `GTEx_expression` | `HPA_expression` | Tissue-specific literature |
| `gnomAD_get_variant` | `ExAC_frequencies` | 1000 Genomes |
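A minimal sketch of walking one fallback chain. The caller supplies `call_tool`. It is a hypothetical callable, such as a closure that maps each tool name to that tool's own arguments and invokes it through whatever ToolUniverse client the agent uses. An exception or an empty result moves on to the next tool. Only the tool-name columns are encoded, because the third column ("VEP annotation", "UniProt features", ...) describes a manual step rather than a single tool.

```python
from typing import Any, Callable

# Tool-name columns of the fallback table; descriptive third-column fallbacks are left to the agent.
FALLBACK_CHAINS = {
    "Orphanet_search_by_hpo": ["Orphanet_search_by_hpo", "OMIM_search"],
    "ClinVar_get_variant": ["ClinVar_get_variant", "gnomAD_get_variant"],
    "NvidiaNIM_alphafold2": ["NvidiaNIM_alphafold2", "alphafold_get_prediction"],
    "GTEx_expression": ["GTEx_expression", "HPA_expression"],
    "gnomAD_get_variant": ["gnomAD_get_variant", "ExAC_frequencies"],
}


def run_with_fallbacks(chain: list[str], call_tool: Callable[[str], Any]) -> tuple[str, Any]:
    """Try each tool in order; return (tool_used, result) for the first non-empty result."""
    failures: dict[str, str] = {}
    for tool in chain:
        try:
            result = call_tool(tool)
        except Exception as exc:  # timeouts, network errors, bad parameters, ...
            failures[tool] = repr(exc)
            continue
        if result:  # an empty result counts as a miss
            return tool, result
        failures[tool] = "empty result"
    raise RuntimeError(f"every tool in the chain failed: {failures}")
```

Returning the name of the tool that answered lets the report cite the actual source, as the citation rule requires.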
Reference Files
- DIAGNOSTIC_WORKFLOW.md - Detailed code examples and algorithms for each phase
- REPORT_TEMPLATE.md - Report template, phase output examples, CSV formats
- TOOLS_REFERENCE.md - Complete tool documentation
- CHECKLIST.md - Interactive completeness checklist
- EXAMPLES.md - Worked diagnosis examples