tooluniverse-multiomic-disease-characterization
Multi-Omics Disease Characterization Pipeline
Characterize diseases across multiple molecular layers (genomics, transcriptomics, proteomics, pathways) to provide systems-level understanding of disease mechanisms, identify therapeutic opportunities, and discover biomarker candidates.
KEY PRINCIPLES:
- Report-first approach - Create report file FIRST, then populate progressively
- Disease disambiguation FIRST - Resolve all identifiers before omics analysis
- Layer-by-layer analysis - Systematically cover all omics layers
- Cross-layer integration - Identify genes/targets appearing in multiple layers
- Evidence grading - Grade all evidence from T1 (human/clinical) to T4 (annotation/prediction only)
- Tissue context - Emphasize disease-relevant tissues/organs
- Quantitative scoring - Multi-Omics Confidence Score (0-100)
- Druggable focus - Prioritize targets with therapeutic potential
- Biomarker identification - Highlight diagnostic/prognostic markers
- Mechanistic synthesis - Generate testable hypotheses
- Source references - Every statement must cite tool/database
- Completeness checklist - Mandatory section showing analysis coverage
- English-first queries - Always use English terms in tool calls. Respond in user's language
When to Use This Skill
Apply when users:
- Ask about disease mechanisms across omics layers
- Need multi-omics characterization of a disease
- Want to understand disease at the systems biology level
- Ask "What pathways/genes/proteins are involved in [disease]?"
- Need biomarker discovery for a disease
- Want to identify druggable targets from disease profiling
- Ask for integrated genomics + transcriptomics + proteomics analysis
- Need cross-layer concordance analysis
- Ask about disease network biology / hub genes
NOT for (use other skills instead):
- Single gene/target validation -> Use tooluniverse-drug-target-validation
- Drug safety profiling -> Use tooluniverse-adverse-event-detection
- General disease overview -> Use tooluniverse-disease-research
- Variant interpretation -> Use tooluniverse-variant-interpretation
- GWAS-specific analysis -> Use tooluniverse-gwas-* skills
- Pathway-only analysis -> Use tooluniverse-systems-biology
Input Parameters
| Parameter | Required | Description | Example |
|---|---|---|---|
| disease | Yes | Disease name, OMIM ID, EFO ID, or MONDO ID | Alzheimer disease, MONDO_0004975 |
| tissue | No | Tissue/organ of interest | brain, liver, blood |
| focus_layers | No | Specific omics layers to emphasize | genomics, transcriptomics, pathways |
Multi-Omics Confidence Score (0-100)
Score Components
Data Availability (0-40 points):
- Genomics data available (GWAS or rare variants): 10 points
- Transcriptomics data available (DEGs or expression): 10 points
- Protein data available (PPI or expression): 5 points
- Pathway data available (enriched pathways): 10 points
- Clinical/drug data available (approved drugs or trials): 5 points
Evidence Concordance (0-40 points):
- Multi-layer genes (appear in 3+ layers): up to 20 points (2 per gene, max 10 genes)
- Consistent direction (genetics + expression concordant): 10 points
- Pathway-gene concordance (genes found in enriched pathways): 10 points
Evidence Quality (0-20 points):
- Strong genetic evidence (GWAS p < 5e-8): 10 points
- Clinical validation (approved drugs): 10 points
Score Interpretation
| Score | Tier | Interpretation |
|---|---|---|
| 80-100 | Excellent | Comprehensive multi-omics coverage, high confidence, strong cross-layer concordance |
| 60-79 | Good | Good coverage across most layers, some gaps |
| 40-59 | Moderate | Moderate coverage, limited cross-layer integration |
| 0-39 | Limited | Limited data, single-layer analysis dominates |
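
The rubric above can be sketched as a small scoring function. This is a minimal illustration, not part of any ToolUniverse API: the function names, argument names, and the boolean-flag input format are assumptions made for the example. Only the point values and tier cutoffs come from the lists and table above.

```python
from typing import Dict

# Point values copied from the "Data Availability" list above.
AVAILABILITY_POINTS = {
    "genomics": 10,
    "transcriptomics": 10,
    "protein": 5,
    "pathway": 10,
    "clinical": 5,
}


def confidence_score(
    available: Dict[str, bool],
    n_multilayer_genes: int,
    direction_concordant: bool,
    pathway_gene_concordant: bool,
    has_gwas_significant_hit: bool,
    has_approved_drug: bool,
) -> int:
    """Hypothetical helper: sum the rubric into a 0-100 score."""
    # Data availability (0-40): points only for layers that returned data.
    score = sum(pts for layer, pts in AVAILABILITY_POINTS.items() if available.get(layer))
    # Evidence concordance (0-40): 2 points per multi-layer gene, capped at 10 genes.
    score += 2 * min(max(n_multilayer_genes, 0), 10)
    score += 10 if direction_concordant else 0
    score += 10 if pathway_gene_concordant else 0
    # Evidence quality (0-20): GWAS p < 5e-8 and approved drugs.
    score += 10 if has_gwas_significant_hit else 0
    score += 10 if has_approved_drug else 0
    return score


def score_tier(score: int) -> str:
    """Map a score to the tier names in the Score Interpretation table."""
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Moderate"
    return "Limited"
```

Keeping each component as a separate argument makes it straightforward to fill in the per-component rows of the score table in the report.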
Evidence Grading System
| Tier | Symbol | Criteria | Examples |
|---|---|---|---|
| T1 | [T1] | Direct human evidence, clinical proof | FDA-approved drug, GWAS hit (p<5e-8), clinical trial result |
| T2 | [T2] | Experimental evidence | Differential expression (validated), functional screen, mouse KO |
| T3 | [T3] | Computational/database evidence | PPI network, pathway mapping, expression correlation |
| T4 | [T4] | Annotation/prediction only | GO annotation, text-mined association, predicted interaction |
Report Template
Create this file structure at the start: {disease_name}_multiomic_report.md
# Multi-Omics Disease Characterization: {Disease Name}
**Report Generated**: {date}
**Disease Identifiers**: (to be filled)
**Multi-Omics Confidence Score**: (to be calculated)
---
## Executive Summary
(2-3 sentence disease mechanism synthesis - fill after all layers complete)
---
## 1. Disease Definition & Context
### Disease Identifiers
| System | ID | Source |
|--------|-----|--------|
### Description
### Synonyms
### Disease Hierarchy (parents/children)
### Affected Tissues/Organs
### Therapeutic Areas
**Sources**: (tools used)
---
## 2. Genomics Layer
### 2.1 GWAS Associations
| SNP | P-value | Effect | Gene | Study | Source |
|-----|---------|--------|------|-------|--------|
### 2.2 GWAS Studies Summary
| Study ID | Trait | Sample Size | Year | Source |
|----------|-------|-------------|------|--------|
### 2.3 Associated Genes (Genetic Evidence)
| Gene | Ensembl ID | Association Score | Evidence Type | Source |
|------|------------|-------------------|---------------|--------|
### 2.4 Rare Variants (ClinVar)
| Variant | Gene | Clinical Significance | Source |
|---------|------|-----------------------|--------|
### Genomics Layer Summary
- Total GWAS hits:
- Top genes by genetic evidence:
- Genetic architecture:
**Sources**: (tools used)
---
## 3. Transcriptomics Layer
### 3.1 Differential Expression Studies
| Experiment | Condition | Up-regulated | Down-regulated | Source |
|------------|-----------|--------------|----------------|--------|
### 3.2 Expression Atlas Disease Evidence
| Gene | Score | Source |
|------|-------|--------|
### 3.3 Tissue Expression Patterns (GTEx/HPA)
| Gene | Tissue | Expression Level | Source |
|------|--------|-----------------|--------|
### 3.4 Biomarker Candidates (Expression-Based)
| Gene | Tissue Specificity | Fold Change | Evidence | Source |
|------|-------------------|-------------|----------|--------|
### Transcriptomics Layer Summary
- Differential expression datasets:
- Top DEGs:
- Tissue-specific patterns:
**Sources**: (tools used)
---
## 4. Proteomics & Interaction Layer
### 4.1 Protein-Protein Interactions (STRING)
| Protein A | Protein B | Score | Source |
|-----------|-----------|-------|--------|
### 4.2 Hub Genes (Network Centrality)
| Gene | Degree | Betweenness | Role | Source |
|------|--------|-------------|------|--------|
### 4.3 Protein Complexes (IntAct)
| Complex | Members | Function | Source |
|---------|---------|----------|--------|
### 4.4 Tissue-Specific PPI Network
| Gene | Interaction Score | Tissue | Source |
|------|-------------------|--------|--------|
### Proteomics Layer Summary
- Total PPIs:
- Hub genes:
- Network modules:
**Sources**: (tools used)
---
## 5. Pathway & Network Layer
### 5.1 Enriched Pathways (Enrichr/Reactome)
| Pathway | Database | P-value | Genes | Source |
|---------|----------|---------|-------|--------|
### 5.2 Reactome Pathway Details
| Pathway ID | Name | Genes Involved | Source |
|------------|------|----------------|--------|
### 5.3 KEGG Pathways
| Pathway ID | Name | Description | Source |
|------------|------|-------------|--------|
### 5.4 WikiPathways
| Pathway ID | Name | Organism | Source |
|------------|------|----------|--------|
### Pathway Layer Summary
- Top enriched pathways:
- Key pathway nodes:
- Cross-pathway connections:
**Sources**: (tools used)
---
## 6. Gene Ontology & Functional Annotation
### 6.1 Biological Processes
| GO Term | Name | P-value | Genes | Source |
|---------|------|---------|-------|--------|
### 6.2 Molecular Functions
| GO Term | Name | P-value | Genes | Source |
|---------|------|---------|-------|--------|
### 6.3 Cellular Components
| GO Term | Name | P-value | Genes | Source |
|---------|------|---------|-------|--------|
**Sources**: (tools used)
---
## 7. Therapeutic Landscape
### 7.1 Approved Drugs
| Drug | ChEMBL ID | Mechanism | Target | Phase | Source |
|------|-----------|-----------|--------|-------|--------|
### 7.2 Druggable Targets
| Gene | Tractability | Modality | Clinical Precedent | Source |
|------|-------------|----------|-------------------|--------|
### 7.3 Drug Repurposing Candidates
| Drug | Original Indication | Mechanism | Target | Source |
|------|---------------------|-----------|--------|--------|
### 7.4 Clinical Trials
| NCT ID | Title | Phase | Status | Intervention | Source |
|--------|-------|-------|--------|--------------|--------|
### Therapeutic Summary
- Approved drugs:
- Clinical pipeline:
- Novel targets:
**Sources**: (tools used)
---
## 8. Multi-Omics Integration
### 8.1 Cross-Layer Gene Concordance
| Gene | Genomics | Transcriptomics | Proteomics | Pathways | Layers | Evidence Tier |
|------|----------|-----------------|------------|----------|--------|---------------|
### 8.2 Multi-Omics Hub Genes (Top 20)
| Rank | Gene | Layers Found | Key Evidence | Druggable | Source |
|------|------|-------------|--------------|-----------|--------|
### 8.3 Biomarker Candidates
| Biomarker | Type | Evidence Layers | Confidence | Source |
|-----------|------|-----------------|------------|--------|
### 8.4 Mechanistic Hypotheses
1. (Hypothesis with supporting evidence from multiple layers)
2. ...
### 8.5 Systems-Level Insights
- Key disrupted processes:
- Critical pathway nodes:
- Therapeutic intervention points:
- Testable hypotheses:
---
## Multi-Omics Confidence Score
| Component | Points | Max | Details |
|-----------|--------|-----|---------|
| Genomics data | | 10 | |
| Transcriptomics data | | 10 | |
| Protein data | | 5 | |
| Pathway data | | 10 | |
| Clinical data | | 5 | |
| Multi-layer genes | | 20 | |
| Direction concordance | | 10 | |
| Pathway-gene concordance | | 10 | |
| Genetic evidence quality | | 10 | |
| Clinical validation | | 10 | |
| **TOTAL** | | **100** | |
**Score**: XX/100 - [Tier]
---
## Data Availability Checklist
| Omics Layer | Data Available | Tools Used | Findings |
|-------------|---------------|------------|----------|
| Genomics (GWAS) | Yes/No | | |
| Genomics (Rare Variants) | Yes/No | | |
| Transcriptomics (DEGs) | Yes/No | | |
| Transcriptomics (Expression) | Yes/No | | |
| Proteomics (PPI) | Yes/No | | |
| Proteomics (Expression) | Yes/No | | |
| Pathways (Enrichment) | Yes/No | | |
| Pathways (KEGG/Reactome) | Yes/No | | |
| Gene Ontology | Yes/No | | |
| Drugs/Therapeutics | Yes/No | | |
| Clinical Trials | Yes/No | | |
| Literature | Yes/No | | |
---
## Completeness Checklist
- [ ] Disease disambiguation complete (IDs resolved)
- [ ] Genomics layer analyzed (GWAS + variants)
- [ ] Transcriptomics layer analyzed (DEGs + expression)
- [ ] Proteomics layer analyzed (PPI + interactions)
- [ ] Pathway layer analyzed (enrichment + mapping)
- [ ] Gene Ontology analyzed (BP + MF + CC)
- [ ] Therapeutic landscape analyzed (drugs + targets + trials)
- [ ] Cross-layer integration complete (concordance analysis)
- [ ] Multi-Omics Confidence Score calculated
- [ ] Biomarker candidates identified
- [ ] Hub genes identified
- [ ] Mechanistic hypotheses generated
- [ ] Executive summary written
- [ ] All sections have source citations
---
## References
### Data Sources Used
| # | Tool | Parameters | Section | Items Retrieved |
|---|------|------------|---------|-----------------|
### Database Versions
- OpenTargets: (current)
- GWAS Catalog: (current)
- STRING: (current)
- Reactome: (current)
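
The report-first step can be sketched as follows. This is only an illustration under stated assumptions: the function name, the rule for turning a disease name into a file name, and the decision to write just the header lines (rather than the full template) are choices made for this example.

```python
import re
from datetime import date
from pathlib import Path


def create_report_skeleton(disease_name: str, out_dir: str = ".") -> Path:
    """Hypothetical helper: create {disease_name}_multiomic_report.md before any analysis."""
    # Assumed slug rule: runs of non-alphanumeric characters become underscores.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", disease_name).strip("_")
    path = Path(out_dir) / f"{slug}_multiomic_report.md"
    header = (
        f"# Multi-Omics Disease Characterization: {disease_name}\n"
        f"**Report Generated**: {date.today().isoformat()}\n"
        "**Disease Identifiers**: (to be filled)\n"
        "**Multi-Omics Confidence Score**: (to be calculated)\n"
    )
    # The remaining template sections would be appended here in the same way.
    path.write_text(header, encoding="utf-8")
    return path
```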
Phase 0: Disease Disambiguation (ALWAYS FIRST)
Objective: Resolve disease to standard identifiers for all downstream queries.
Tools Used
OpenTargets_get_disease_id_description_by_name (primary):
- Input: diseaseName (string) - Disease name
- Output: {data: {search: {hits: [{id, name, description}]}}}
- Use: Get MONDO/EFO IDs and description
- CRITICAL: Disease IDs from OpenTargets use underscore format (e.g., MONDO_0004975), NOT colon format
OSL_get_efo_id_by_disease_name (secondary):
- Input: disease (string) - Disease name
- Output: {efo_id, name}
- Use: Get EFO/MONDO ID
OpenTargets_get_disease_description_by_efoId:
- Input: efoId (string) - Disease ID (e.g., MONDO_0004975)
- Output: {data: {disease: {id, name, description, dbXRefs}}}
- Use: Get full description, cross-references (OMIM, UMLS, DOID, etc.)
OpenTargets_get_disease_synonyms_by_efoId:
- Input: efoId (string)
- Output: {data: {disease: {id, name, synonyms: [{relation, terms}]}}}
OpenTargets_get_disease_therapeutic_areas_by_efoId:
- Input: efoId (string)
- Output: {data: {disease: {id, name, therapeuticAreas: [{id, name}]}}}
OpenTargets_get_disease_ancestors_parents_by_efoId:
- Input: efoId (string)
- Output: {data: {disease: {id, name, ancestors: [{id, name}]}}}
OpenTargets_get_disease_descendants_children_by_efoId:
- Input: efoId (string)
- Output: {data: {disease: {id, name, descendants: [{id, name}]}}}
OpenTargets_map_any_disease_id_to_all_other_ids:
- Input: inputId (string) - Any known disease ID (e.g., OMIM:104300, UMLS:C0002395)
- Output: {data: {disease: {id, name, dbXRefs: [str], ...}}}
- Use: Cross-map between OMIM, UMLS, ICD10, DOID, etc.
Workflow
- Search by disease name to get primary ID (OpenTargets)
- Get full description and cross-references
- Get synonyms for search term expansion
- Get therapeutic areas for context
- Get disease hierarchy (parents/children)
- If user provided OMIM/other ID, map to MONDO/EFO first
Collision-Aware Search
When disease name returns multiple hits:
- Check if user's input matches any hit exactly
- If ambiguous, present top 3-5 options and ask user to select
- Always prefer the most specific disease (not parent categories)
- For cancer, prefer the specific tumor type over generic "cancer"
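
The first two rules above can be sketched as a small selection helper. It is a minimal sketch that assumes the hits have already been extracted from the search response as a list of dicts with a name key; the function name and the tuple return shape are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def pick_disease_hit(query: str, hits: List[Dict]) -> Tuple[Optional[Dict], List[Dict]]:
    """Return (exact_match, options_to_show_the_user)."""
    wanted = query.strip().lower()
    for hit in hits:
        # Case-insensitive exact match on the canonical name.
        if hit.get("name", "").strip().lower() == wanted:
            return hit, []
    # No exact match: treat as ambiguous and surface at most 5 options.
    return None, hits[:5]
```

Preferring the most specific disease over a parent category is left to the caller here, since it needs the hierarchy tools listed above rather than the search hits alone.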
Key Disease IDs to Track
After disambiguation, store these for all downstream queries:
- efo_id - Primary ID for OpenTargets queries (e.g., MONDO_0004975)
- disease_name - Canonical name (e.g., Alzheimer disease)
- synonyms - For literature search expansion
- therapeutic_areas - For context
- dbXRefs - Cross-references (OMIM, UMLS, DOID, etc.)
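
Because OpenTargets expects the underscore form for MONDO/EFO IDs, a user-supplied colon-style ID may need normalizing before it is stored as efo_id. A minimal sketch follows; the helper name is hypothetical, and limiting the rewrite to the MONDO and EFO prefixes is an assumption, made so that cross-reference IDs such as OMIM:104300 are passed through unchanged.

```python
# Prefixes assumed to use the underscore form in OpenTargets queries.
UNDERSCORE_PREFIXES = ("MONDO", "EFO")


def to_opentargets_id(disease_id: str) -> str:
    """Convert e.g. 'MONDO:0004975' to 'MONDO_0004975'; leave other IDs untouched."""
    cleaned = disease_id.strip()
    prefix, sep, local = cleaned.partition(":")
    if sep and prefix.upper() in UNDERSCORE_PREFIXES:
        return f"{prefix.upper()}_{local}"
    return cleaned
```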
Phase 1: Genomics Layer
Objective: Identify genetic variants, GWAS associations, and genetically implicated genes.
Tools Used
OpenTargets_get_associated_targets_by_disease_efoId (primary):
- Input: efoId (string) - Disease EFO/MONDO ID
- Output: {data: {disease: {id, name, associatedTargets: {count, rows: [{target: {id, approvedSymbol}, score}]}}}}
- Use: Get ALL disease-associated genes ranked by overall evidence score
- NOTE: Returns top 25 by default. For comprehensive analysis, note the total count
OpenTargets_get_evidence_by_datasource:
- Input: efoId (string), ensemblId (string), optional datasourceIds (array), size (int, default 50)
- Output: {data: {disease: {evidences: {count, rows: [{...evidence details}]}}}}
- Use: Get specific evidence types. Key datasourceIds for genomics:
  - ['ot_genetics_portal'] - GWAS/genetics
  - ['gene2phenotype', 'genomics_england', 'orphanet'] - Rare variants
  - ['eva'] - ClinVar variants
gwas_search_associations (GWAS Catalog):
- Input: disease_trait (string), size (int, default 20)
- Output: {data: [{association_id, p_value, or_per_copy_num, or_value, beta, risk_frequency, efo_traits: [{...}], ...}], metadata: {pagination: {totalElements}}}
- Use: Get genome-wide significant associations
- NOTE: Use disease name (e.g., "Alzheimer"), not ID. Returns paginated results
gwas_get_studies_for_trait:
- Input: disease_trait (string), size (int)
- Output: {data: [...studies], metadata: {pagination}}
- NOTE: May return empty if trait name does not match exactly. Try synonyms
gwas_get_variants_for_trait:
- Input: disease_trait (string), size (int)
- Output: {data: [...variants], metadata: {pagination}}
GWAS_search_associations_by_gene:
- Input: gene_name (string)
- Output: Associations for a specific gene
OpenTargets_search_gwas_studies_by_disease:
- Input: diseaseIds (array of strings), enableIndirect (bool, default true), size (int, default 10)
- Output: {data: {studies: {count, rows: [{id, studyType, traitFromSource, publicationFirstAuthor, publicationDate, pubmedId, nSamples, nCases, nControls, ...}]}}}
- Use: Get GWAS studies from OpenTargets genetics portal
clinvar_search_variants:
- Input: condition (string) or gene (string), optional max_results (int)
- Output: List of ClinVar variants with clinical significance
- Use: Rare variant / monogenic disease evidence
Workflow
- Get associated genes from OpenTargets (overall scores)
- For top 10-15 genes, get genetic evidence specifically via OpenTargets_get_evidence_by_datasource
- Search GWAS Catalog for associations
- Search OpenTargets GWAS studies
- Search ClinVar for rare variants
- For top GWAS genes, check GWAS_search_associations_by_gene
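
When working through the GWAS Catalog step, it can help to keep only genome-wide significant rows (p < 5e-8, the threshold used in the scoring rubric). The sketch below assumes a response dict shaped like the gwas_search_associations output described above; the function name is hypothetical, and p_value is coerced with float() in case it arrives as a string.

```python
from typing import Dict, List

GENOME_WIDE_P = 5e-8  # threshold used in this document's evidence grading


def significant_associations(response: Dict, threshold: float = GENOME_WIDE_P) -> List[Dict]:
    """Return associations with p_value below the threshold, most significant first."""
    kept = []
    for row in response.get("data", []):
        try:
            p = float(row.get("p_value"))
        except (TypeError, ValueError):
            continue  # skip rows with a missing or unparseable p-value
        if p < threshold:
            kept.append(row)
    return sorted(kept, key=lambda r: float(r["p_value"]))
```

Results are paginated, so this filters one page at a time; metadata.pagination.totalElements indicates how many associations exist in total.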
Gene Tracking
Maintain a dictionary of genes found in genomics layer:
genomics_genes = {
'PSEN1': {'score': 0.87, 'evidence': 'genetic', 'ensembl_id': 'ENSG00000080815', 'layer': 'genomics'},
'APP': {'score': 0.82, 'evidence': 'genetic', 'ensembl_id': 'ENSG00000142192', 'layer': 'genomics'},
# ...
}
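
A dictionary of this shape could be populated directly from the OpenTargets_get_associated_targets_by_disease_efoId response. The sketch below assumes the response structure documented above; the function name is hypothetical, and the 'evidence' tag simply mirrors the example dictionary (the score itself is the overall association score).

```python
from typing import Dict


def track_genomics_genes(response: Dict) -> Dict[str, Dict]:
    """Build the genomics_genes tracking dict from an associated-targets response."""
    rows = (
        response.get("data", {})
        .get("disease", {})
        .get("associatedTargets", {})
        .get("rows", [])
    )
    genes = {}
    for row in rows:
        target = row.get("target", {})
        symbol = target.get("approvedSymbol")
        if not symbol:
            continue  # skip rows without a gene symbol
        genes[symbol] = {
            "score": row.get("score"),
            "evidence": "genetic",  # tag mirrors the example above
            "ensembl_id": target.get("id"),
            "layer": "genomics",
        }
    return genes
```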
Phase 2: Transcriptomics Layer
Objective: Identify differentially expressed genes, tissue-specific expression, and expression-based biomarkers.
Tools Used
ExpressionAtlas_search_differential:
- Input: optional gene (string), condition (string), species (string, default 'homo sapiens')
- Output: Differential expression studies and results
- Use: Find studies where genes are differentially expressed in disease
ExpressionAtlas_search_experiments:
- Input: optional gene (string), condition (string), species (string)
- Output: Expression experiments relevant to condition
- Use: Find all Expression Atlas experiments for the disease
expression_atlas_disease_target_score:
- Input: efoId (string), pageSize (int, required)
- Output: Genes scored by expression evidence for the disease
- Use: Get expression-based disease-gene association scores
europepmc_disease_target_score:
- Input: efoId (string), pageSize (int, required)
- Output: Genes scored by literature evidence for the disease
- Use: Complement expression evidence with literature-mined associations
HPA_get_rna_expression_by_source (Human Protein Atlas):
- Input: gene_name (string), source_type (string: 'tissue', 'blood', 'brain'), source_name (string: e.g., 'brain', 'liver')
- Output: {status, data: {gene_name, source_type, source_name, expression_value, expression_level, expression_unit}}
- NOTE: ALL 3 params required. source_type options: 'tissue', 'blood', 'brain', 'cell_line', 'single_cell'
HPA_get_rna_expression_in_specific_tissues:
- Input: gene_name (string), tissues (array of strings)
- Output: Expression across specified tissues
HPA_get_cancer_prognostics_by_gene:
- Input: gene_name (string)
- Output: Cancer prognostic data (if cancer context)
HPA_get_subcellular_location:
- Input: gene_name (string)
- Output: Subcellular localization data
HPA_search_genes_by_query:
- Input: query (string)
- Output: Matching genes in HPA
Workflow
- Search Expression Atlas for differential expression studies
- Get expression-based disease scores
- Get literature-based disease scores (EuropePMC)
- For top 10-15 genes from genomics layer, check tissue expression via HPA
- Check disease-relevant tissue expression patterns
- For cancer: check prognostic biomarkers
Gene Tracking
Add transcriptomics genes to tracking:
transcriptomics_genes = {
'APOE': {'expression_score': 0.75, 'tissues': ['brain'], 'evidence': 'differential_expression', 'layer': 'transcriptomics'},
# ...
}
Phase 3: Proteomics & Interaction Layer
Objective: Map protein-protein interactions, identify hub genes, and characterize interaction networks.
Tools Used
STRING_get_interaction_partners (primary PPI):
- Input: protein_ids (array of strings - gene names work), species (int, default 9606), confidence_score (float, default 0.4), limit (int, default 20)
- Output: {status: 'success', data: [{stringId_A, stringId_B, preferredName_A, preferredName_B, ncbiTaxonId, score, nscore, fscore, pscore, ascore, escore, dscore, tscore}]}
- Use: Get interaction partners for disease genes
- NOTE: protein_ids is an array, NOT string. Gene symbols like ['APOE'] work
STRING_get_network:
- Input: protein_ids (array), species (int), confidence_score (float)
- Output: Network of interactions between input proteins
- Use: Build disease-specific PPI network
STRING_functional_enrichment:
- Input: protein_ids (array), species (int)
- Output: Functional enrichment results (GO, KEGG, etc.)
- Use: Functional characterization of disease gene set
STRING_ppi_enrichment:
- Input: protein_ids (array), species (int)
- Output: Statistical test for PPI enrichment (more interactions than expected)
- Use: Test if disease genes form a connected module
intact_get_interactions:
- Input: identifier (string - UniProt ID or gene name)
- Output: Molecular interaction data from IntAct
intact_search_interactions:
- Input: query (string), first (int, default 0), max (int, default 25)
- Output: Search results for interactions
HPA_get_protein_interactions_by_gene:
- Input: gene_name (string)
- Output: {gene, interactions, interactor_count, interactors: [...]}
humanbase_ppi_analysis:
- Input: gene_list (array), tissue (string), max_node (int), interaction (string), string_mode (bool)
- Output: Tissue-specific PPI network
- NOTE: ALL params required. interaction options: 'coexpression', 'interaction', 'coexpression_and_interaction'. string_mode: true/false
Workflow
- Take top 15-20 genes from genomics + transcriptomics layers
- Query STRING for interaction partners of each gene
- Build composite PPI network using STRING_get_network
- Test PPI enrichment (are genes more connected than random?)
- Get functional enrichment from STRING
- For disease-relevant tissue, get tissue-specific network (HumanBase)
- Identify hub genes (highest degree centrality)
- Check IntAct for experimentally validated interactions
Hub Gene Analysis
Calculate network centrality metrics:
- Degree: Number of interaction partners
- Betweenness: Number of shortest paths through node
- Hub score: Genes with degree > mean + 1 SD are hubs
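
The degree and hub-score rules above can be sketched in plain Python. The example assumes edges shaped like the STRING data rows (preferredName_A / preferredName_B); the function name is hypothetical, and using the population standard deviation (statistics.pstdev) rather than the sample standard deviation is an assumption, since the text does not say which is intended. Betweenness is not computed here and would typically need a graph library.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List


def find_hubs(edges: List[Dict]) -> Dict[str, int]:
    """Return {gene: degree} for genes with degree > mean + 1 SD."""
    neighbors = defaultdict(set)
    for edge in edges:
        a, b = edge["preferredName_A"], edge["preferredName_B"]
        if a == b:
            continue  # ignore self-interactions
        # Sets deduplicate repeated A-B / B-A rows.
        neighbors[a].add(b)
        neighbors[b].add(a)
    degree = {gene: len(partners) for gene, partners in neighbors.items()}
    if not degree:
        return {}
    cutoff = mean(degree.values()) + pstdev(degree.values())
    return {gene: d for gene, d in degree.items() if d > cutoff}
```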
Phase 4: Pathway & Network Layer
Objective: Identify enriched biological pathways and cross-pathway connections.
Tools Used
enrichr_gene_enrichment_analysis (primary enrichment):
- Input: gene_list (array of gene symbols, min 2), libs (array of library names)
- Output: {status: 'success', data: '{...JSON string with enrichment results...}'}
- Key libraries: ['KEGG_2021_Human'], ['Reactome_2022'], ['WikiPathway_2023_Human'], ['GO_Biological_Process_2023'], ['GO_Molecular_Function_2023'], ['GO_Cellular_Component_2023']
- NOTE: data field is a JSON string, needs parsing. Contains connected_paths and per-library results
- NOTE: libs is REQUIRED as array
ReactomeAnalysis_pathway_enrichment:
- Input: identifiers (string - space-separated gene list), optional page_size (int, default 20), include_disease (bool), projection (bool)
- Output: {data: {token, analysis_type, pathways_found, pathways: [{pathway_id, name, species, is_disease, is_lowest_level, entities_found, entities_total, entities_ratio, p_value, fdr, reactions_found, reactions_total}]}}
- Use: Reactome-specific pathway enrichment with statistical testing
Reactome_map_uniprot_to_pathways:
- Input: id (string - UniProt accession)
- Output: List of Reactome pathways containing this protein
- Use: Map individual proteins to pathways
Reactome_get_pathway:
- Input: stId (string - Reactome stable ID, e.g., 'R-HSA-73817')
- Output: Pathway details
Reactome_get_pathway_reactions:
- Input: stId (string)
- Output: Reactions within pathway
kegg_search_pathway:
- Input: keyword (string)
- Output: Array of KEGG pathway matches
kegg_get_pathway_info:
- Input: pathway_id (string, e.g., 'hsa04930')
- Output: Detailed pathway information
WikiPathways_search:
- Input: query (string), optional organism (string, e.g., 'Homo sapiens')
- Output: Matching community-curated pathways
Workflow
- Collect all genes from genomics + transcriptomics layers (top 20-30)
- Run Enrichr enrichment for KEGG, Reactome, WikiPathways
- Run ReactomeAnalysis for more detailed Reactome enrichment with p-values
- Search KEGG for disease-specific pathways
- Search WikiPathways for disease pathways
- For top Reactome pathways, get detailed reactions
- Identify cross-pathway connections (genes in multiple pathways)
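
Two small pieces of this workflow can be sketched in code: building the space-separated identifiers string that ReactomeAnalysis_pathway_enrichment expects, and finding genes that sit in more than one pathway. Both helper names are hypothetical, and the pathway-to-genes mapping is assumed to have been assembled already from the enrichment results.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def reactome_identifiers(genes: Iterable[str]) -> str:
    """Join gene symbols into one space-separated string, dropping duplicates."""
    return " ".join(dict.fromkeys(genes))  # dict.fromkeys keeps first-seen order


def cross_pathway_genes(
    pathway_genes: Dict[str, List[str]], min_pathways: int = 2
) -> Dict[str, List[str]]:
    """Return {gene: sorted pathway names} for genes in at least min_pathways pathways."""
    membership = defaultdict(set)
    for pathway, genes in pathway_genes.items():
        for gene in genes:
            membership[gene].add(pathway)
    return {
        gene: sorted(pathways)
        for gene, pathways in membership.items()
        if len(pathways) >= min_pathways
    }
```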
Phase 5: Gene Ontology & Functional Annotation
Objective: Characterize biological processes, molecular functions, and cellular components.
Tools Used
enrichr_gene_enrichment_analysis (GO enrichment):
- Use with libs=['GO_Biological_Process_2023'] for BP
- Use with libs=['GO_Molecular_Function_2023'] for MF
- Use with libs=['GO_Cellular_Component_2023'] for CC
GO_get_annotations_for_gene:
- Input: gene_id (string - gene symbol or UniProt ID)
- Output: List of GO annotations with terms, aspects, evidence codes
GO_search_terms:
- Input: query (string)
- Output: Matching GO terms
QuickGO_annotations_by_gene:
- Input: gene_product_id (string - UniProt accession, e.g., 'UniProtKB:P02649'), optional aspect (string: 'biological_process', 'molecular_function', 'cellular_component'), taxon_id (int: 9606), limit (int: 25)
- Output: GO annotations with evidence codes
OpenTargets_get_target_gene_ontology_by_ensemblID:
- Input: ensemblId (string)
- Output: GO terms associated with target
Workflow
- Run Enrichr GO enrichment for all 3 aspects using combined gene list
- For top 5 genes, get detailed GO annotations from QuickGO
- For top genes, get OpenTargets GO terms
- Summarize key biological processes, molecular functions, cellular components
Phase 6: Therapeutic Landscape
Objective: Map approved drugs, druggable targets, repurposing opportunities, and clinical trials.
Tools Used
OpenTargets_get_associated_drugs_by_disease_efoId (primary):
- Input: efoId (string), size (int, REQUIRED - use 100)
- Output: {data: {disease: {knownDrugs: {count, rows: [{drug: {id, name, tradeNames, maximumClinicalTrialPhase, isApproved, hasBeenWithdrawn}, phase, mechanismOfAction, target: {id, approvedSymbol}, disease: {id, name}, urls: [{url, name}]}]}}}}
- Use: All drugs associated with disease (approved + investigational)
OpenTargets_get_target_tractability_by_ensemblID:
- Input: ensemblId (string)
- Output: Tractability assessment (small molecule, antibody, PROTAC, etc.)
OpenTargets_get_associated_drugs_by_target_ensemblID:
- Input: ensemblId (string), size (int, REQUIRED)
- Output: Drugs targeting this gene/protein
search_clinical_trials:
- Input: query_term (string, REQUIRED), optional condition (string), intervention (string), pageSize (int, default 10)
- Output: Clinical trial results
- NOTE: query_term is REQUIRED even if condition is provided
OpenTargets_get_drug_mechanisms_of_action_by_chemblId:
- Input: chemblId (string)
- Output: Mechanism of action details
Workflow
- Get all drugs for disease from OpenTargets
- For top disease-associated genes, check tractability
- For top genes with no approved drugs, identify repurposing candidates
- Search clinical trials for disease
- For top approved drugs, get mechanism of action
Drug Tracking
drug_targets = {
'PSEN1': {'drugs': ['Semagacestat'], 'tractability': 'small_molecule', 'clinical_phase': 3},
'ACHE': {'drugs': ['Donepezil', 'Galantamine'], 'tractability': 'small_molecule', 'clinical_phase': 4},
# ...
}
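
A tracking dictionary like this could be assembled from the knownDrugs rows returned by OpenTargets_get_associated_drugs_by_disease_efoId. The sketch below assumes the response structure documented above and a hypothetical function name. It fills only the drugs and clinical_phase fields; tractability comes from a separate tool call and is left out.

```python
from typing import Dict


def track_drug_targets(response: Dict) -> Dict[str, Dict]:
    """Group known drugs by target gene and keep the highest phase seen per gene."""
    rows = (
        response.get("data", {})
        .get("disease", {})
        .get("knownDrugs", {})
        .get("rows", [])
    )
    targets = {}
    for row in rows:
        symbol = (row.get("target") or {}).get("approvedSymbol")
        drug = (row.get("drug") or {}).get("name")
        if not symbol or not drug:
            continue  # skip incomplete rows
        entry = targets.setdefault(symbol, {"drugs": [], "clinical_phase": 0})
        if drug not in entry["drugs"]:
            entry["drugs"].append(drug)  # the same drug can appear in several rows
        entry["clinical_phase"] = max(entry["clinical_phase"], row.get("phase") or 0)
    return targets
```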
Phase 7: Multi-Omics Integration
Objective: Integrate findings across all layers to identify cross-layer genes, calculate concordance, and generate mechanistic hypotheses.
Cross-Layer Gene Concordance Analysis
This is the core integrative step. For each gene found in the analysis:
- Count layers: In how many omics layers does this gene appear?
  - Genomics (GWAS, rare variants, genetic association)
  - Transcriptomics (DEGs, expression score)
  - Proteomics (PPI hub, protein expression)
  - Pathways (enriched pathway member)
  - Therapeutics (drug target)
- Score genes: Genes appearing in 3+ layers are "multi-omics hub genes"
- Direction concordance: Do genetics and expression agree?
  - Risk allele + upregulated = concordant gain-of-function
  - Risk allele + downregulated = concordant loss-of-function
  - Discordant = needs investigation
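
The layer-counting and hub-gene steps can be sketched as below. This is a minimal illustration that assumes each layer's genes have already been collected into a list keyed by layer name; the function names are hypothetical. Direction concordance is not covered, because it depends on effect-direction data that the tracking dictionaries above do not define.

```python
from typing import Dict, List


def count_layers(layer_genes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return {gene: sorted list of layers in which the gene appears}."""
    found: Dict[str, List[str]] = {}
    for layer, genes in layer_genes.items():
        for gene in set(genes):  # count each gene once per layer
            found.setdefault(gene, []).append(layer)
    return {gene: sorted(layers) for gene, layers in found.items()}


def multi_omics_hubs(layer_genes: Dict[str, List[str]], min_layers: int = 3) -> List[str]:
    """Genes seen in at least min_layers layers, ranked by layer count, then name."""
    layers = count_layers(layer_genes)
    hubs = [gene for gene, seen in layers.items() if len(seen) >= min_layers]
    return sorted(hubs, key=lambda gene: (-len(layers[gene]), gene))
```

The length of the returned hub list feeds the "Multi-layer genes" component of the confidence score (2 points per gene, up to 10 genes).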
Biomarker Identification
For each multi-omics hub gene, assess biomarker potential:
- Diagnostic: Gene expression distinguishes disease vs healthy
- Prognostic: Expression/variant predicts outcome (cancer prognostics from HPA)
- Predictive: Variant/expression predicts treatment response (pharmacogenomics)
- Evidence level: Number of supporting omics layers
Mechanistic Hypothesis Generation
From the integrated data:
- Identify the most supported biological processes (GO + pathways)
- Map causal chain: genetic variant -> gene expression -> protein function -> pathway disruption -> disease
- Identify intervention points (druggable nodes in the causal chain)
- Generate testable hypotheses
Confidence Score Calculation
Calculate the Multi-Omics Confidence Score (0-100) based on:
- Data availability across layers
- Cross-layer concordance
- Evidence quality
- Clinical validation
Phase 8: Report Finalization
Executive Summary
Write a 2-3 sentence synthesis covering:
- Disease mechanism in systems terms
- Key genes/pathways identified
- Therapeutic opportunities
Final Report Quality Checklist
Before presenting to user, verify:
- All 8 sections have content (or marked as "No data available")
- Every data point has a source citation
- Executive summary reflects key findings
- Multi-Omics Confidence Score calculated
- Top 20 genes ranked by multi-omics evidence
- Top 10 enriched pathways listed
- Biomarker candidates identified
- Cross-layer concordance table complete
- Therapeutic opportunities summarized
- Mechanistic hypotheses generated
- Data Availability Checklist complete
- Completeness Checklist complete
- References section lists all tools used
Tool Parameter Quick Reference
| Tool | Key Parameters | Notes |
|---|---|---|
OpenTargets_get_disease_id_description_by_name |
diseaseName |
Primary disambiguation |
| OSL_get_efo_id_by_disease_name | disease | Secondary disambiguation |
| OpenTargets_get_associated_targets_by_disease_efoId | efoId | Returns top 25 genes |
| OpenTargets_get_evidence_by_datasource | efoId, ensemblId, datasourceIds[], size | Per-gene evidence |
| OpenTargets_search_gwas_studies_by_disease | diseaseIds[], size | GWAS studies |
| gwas_search_associations | disease_trait, size | GWAS Catalog |
| clinvar_search_variants | condition or gene, max_results | Rare variants |
| ExpressionAtlas_search_differential | condition, species | DEGs |
| expression_atlas_disease_target_score | efoId, pageSize (REQUIRED) | Expression scores |
| europepmc_disease_target_score | efoId, pageSize (REQUIRED) | Literature scores |
| HPA_get_rna_expression_by_source | gene_name, source_type, source_name (ALL REQUIRED) | Tissue expression |
| STRING_get_interaction_partners | protein_ids[], species (9606), limit | PPI partners |
| STRING_get_network | protein_ids[], species | PPI network |
| STRING_functional_enrichment | protein_ids[], species | Functional enrichment |
| STRING_ppi_enrichment | protein_ids[], species | Network significance |
| intact_search_interactions | query, max | Experimental PPIs |
| humanbase_ppi_analysis | gene_list[], tissue, max_node, interaction, string_mode (ALL REQ) | Tissue PPI |
| enrichr_gene_enrichment_analysis | gene_list[], libs[] (BOTH REQUIRED) | Pathway/GO enrichment |
| ReactomeAnalysis_pathway_enrichment | identifiers (space-sep string) | Reactome enrichment |
| Reactome_map_uniprot_to_pathways | id (UniProt accession) | Protein-pathway mapping |
| kegg_search_pathway | keyword | KEGG pathway search |
| WikiPathways_search | query, organism | WikiPathways search |
| GO_get_annotations_for_gene | gene_id | GO annotations |
| QuickGO_annotations_by_gene | gene_product_id (e.g., 'UniProtKB:P02649') | Detailed GO |
| OpenTargets_get_associated_drugs_by_disease_efoId | efoId, size (REQUIRED) | Disease drugs |
| OpenTargets_get_target_tractability_by_ensemblID | ensemblId | Druggability |
| search_clinical_trials | query_term (REQUIRED), condition, pageSize | Clinical trials |
| PubMed_search_articles | query, limit | Literature |
| ensembl_lookup_gene | gene_id, species ('homo_sapiens' REQUIRED) | Gene lookup |
| MyGene_query_genes | query, species, fields, size | Gene info |
| OpenTargets_get_similar_entities_by_disease_efoId | efoId, threshold, size (ALL REQUIRED) | Similar diseases |
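Several rows above mark parameters as REQUIRED. A minimal sketch of a pre-flight check is shown below. It only inspects an argument dictionary and does not call any tool; the helper name `missing_required` and the `REQUIRED_ARGS` mapping are hypothetical, and the mapping lists only the parameters the table explicitly flags.

```python
# Required arguments per tool, transcribed from the rows marked REQUIRED
# in the parameter table. Tools without a REQUIRED marker are omitted.
REQUIRED_ARGS = {
    "expression_atlas_disease_target_score": ["efoId", "pageSize"],
    "europepmc_disease_target_score": ["efoId", "pageSize"],
    "HPA_get_rna_expression_by_source": ["gene_name", "source_type", "source_name"],
    "humanbase_ppi_analysis": ["gene_list", "tissue", "max_node", "interaction", "string_mode"],
    "enrichr_gene_enrichment_analysis": ["gene_list", "libs"],
    "OpenTargets_get_associated_drugs_by_disease_efoId": ["efoId", "size"],
    "search_clinical_trials": ["query_term"],
    "ensembl_lookup_gene": ["species"],
    "OpenTargets_get_similar_entities_by_disease_efoId": ["efoId", "threshold", "size"],
}


def missing_required(tool_name, arguments):
    """Return the required argument names that are absent or None.

    Hypothetical helper: tools not listed in REQUIRED_ARGS yield an empty list.
    """
    required = REQUIRED_ARGS.get(tool_name, [])
    return [name for name in required if arguments.get(name) is None]


# Example: an HPA query that forgot two of its three required arguments.
print(missing_required("HPA_get_rna_expression_by_source", {"gene_name": "APOE"}))
```

Running the check before each call makes a missing parameter show up as a clear message rather than as an empty or failed tool response.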
Response Format Notes (Verified)
OpenTargets Associated Targets
{
"data": {
"disease": {
"id": "MONDO_0004975",
"name": "Alzheimer disease",
"associatedTargets": {
"count": 2456,
"rows": [
{
"target": {"id": "ENSG00000080815", "approvedSymbol": "PSEN1"},
"score": 0.87
}
]
}
}
}
}
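Assuming the response has exactly the nesting shown above, the target rows can be flattened into tuples as in this sketch. The function name `extract_targets` is hypothetical, and the defensive `or {}` handling is an assumption about how missing sections might appear.

```python
def extract_targets(response):
    """Flatten an associated-targets response into (symbol, ensembl_id, score) tuples.

    Returns an empty list when the disease or its rows are missing.
    """
    disease = (response.get("data") or {}).get("disease") or {}
    rows = (disease.get("associatedTargets") or {}).get("rows") or []
    return [
        (row["target"]["approvedSymbol"], row["target"]["id"], row["score"])
        for row in rows
    ]


# The sample response from this section, trimmed to the fields used.
example = {
    "data": {
        "disease": {
            "id": "MONDO_0004975",
            "name": "Alzheimer disease",
            "associatedTargets": {
                "count": 2456,
                "rows": [
                    {
                        "target": {"id": "ENSG00000080815", "approvedSymbol": "PSEN1"},
                        "score": 0.87,
                    }
                ],
            },
        }
    }
}

print(extract_targets(example))  # [('PSEN1', 'ENSG00000080815', 0.87)]
```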
GWAS Catalog Associations
{
"data": [
{
"association_id": 216440893,
"p_value": 2e-09,
"or_per_copy_num": 0.94,
"or_value": "0.94",
"efo_traits": [{"..."}],
"risk_frequency": "NR"
}
],
"metadata": {"pagination": {"totalElements": 1061816}}
}
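Note that in the sample above or_value is a string while p_value is a number. The sketch below filters associations at the genome-wide threshold used elsewhere in this skill (p < 5e-8) and converts or_value to a float where possible. The function name and the output field names are hypothetical.

```python
GENOME_WIDE_P = 5e-8  # strict threshold recommended for common diseases


def significant_associations(response, threshold=GENOME_WIDE_P):
    """Keep associations with p_value below threshold, most significant first."""
    hits = []
    for assoc in response.get("data", []):
        p_value = assoc.get("p_value")
        if p_value is None or p_value >= threshold:
            continue
        try:
            # or_value arrives as a string such as "0.94"; it may be absent
            # or non-numeric, in which case the odds ratio is left as None.
            odds_ratio = float(assoc.get("or_value"))
        except (TypeError, ValueError):
            odds_ratio = None
        hits.append(
            {
                "association_id": assoc.get("association_id"),
                "p_value": p_value,
                "odds_ratio": odds_ratio,
            }
        )
    return sorted(hits, key=lambda hit: hit["p_value"])
```

For the sample record above (p_value 2e-09, or_value "0.94"), the association passes the filter and its odds ratio is returned as the float 0.94.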
STRING Interactions
{
"status": "success",
"data": [
{
"stringId_A": "9606.ENSP00000252486",
"stringId_B": "9606.ENSP00000466775",
"preferredName_A": "APOE",
"preferredName_B": "APOC2",
"score": 0.999
}
]
}
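One simple way to nominate hub genes from edges in this shape is to count each gene's distinct partners (its degree). The sketch below does that; the function name is hypothetical, the default cutoff of 0.7 is an assumption (STRING's usual "high confidence" level), and degree is only one of several possible hub definitions.

```python
from collections import Counter


def hub_genes(response, min_score=0.7, top_n=10):
    """Rank genes by number of distinct partners among edges scoring >= min_score."""
    degree = Counter()
    seen_pairs = set()
    for edge in response.get("data", []):
        if edge.get("score", 0) < min_score:
            continue
        gene_a = edge["preferredName_A"]
        gene_b = edge["preferredName_B"]
        pair = frozenset((gene_a, gene_b))
        # Skip self-loops and edges already counted in the other direction.
        if len(pair) < 2 or pair in seen_pairs:
            continue
        seen_pairs.add(pair)
        degree[gene_a] += 1
        degree[gene_b] += 1
    return degree.most_common(top_n)


# The single edge from the sample response gives each gene a degree of 1.
example = {
    "status": "success",
    "data": [
        {"preferredName_A": "APOE", "preferredName_B": "APOC2", "score": 0.999}
    ],
}
print(hub_genes(example))
```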
Reactome Enrichment
{
"data": {
"token": "...",
"pathways_found": 154,
"pathways": [
{
"pathway_id": "R-HSA-1251985",
"name": "Nuclear signaling by ERBB4",
"species": "Homo sapiens",
"is_disease": false,
"is_lowest_level": true,
"entities_found": 3,
"entities_total": 47,
"entities_ratio": 0.00291,
"p_value": 4.0e-06,
"fdr": 0.00068,
"reactions_found": 3,
"reactions_total": 34
}
]
}
}
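With 154 pathways returned in the sample, some filtering is usually needed before reporting. A minimal sketch, assuming the field names shown above; the function name, the 0.05 FDR default, and the lowest-level option are illustrative choices, not part of the tool.

```python
def significant_pathways(response, max_fdr=0.05, lowest_level_only=False):
    """Return pathways with fdr <= max_fdr, sorted by ascending fdr.

    When lowest_level_only is True, keep only pathways whose
    is_lowest_level flag is set, which drops broad parent pathways.
    """
    pathways = (response.get("data") or {}).get("pathways") or []
    kept = [
        pathway
        for pathway in pathways
        if pathway.get("fdr", 1.0) <= max_fdr
        and (pathway.get("is_lowest_level") or not lowest_level_only)
    ]
    return sorted(kept, key=lambda pathway: pathway.get("fdr", 1.0))
```

The sample pathway above (fdr 0.00068, is_lowest_level true) passes both filters.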
HPA RNA Expression
{
"status": "success",
"data": {
"gene_name": "APOE",
"source_type": "tissue",
"source_name": "brain",
"expression_value": "2714.9",
"expression_level": "very high",
"expression_unit": "nTPM"
}
}
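In the sample above expression_value is a string ("2714.9"), so it needs converting before numeric comparison. The sketch below ranks several per-tissue responses for one gene. The function name is hypothetical, and the second record in the example is a made-up placeholder, not real HPA data.

```python
def rank_tissues(responses):
    """Return (source_name, nTPM) pairs sorted from highest to lowest expression."""
    ranked = []
    for response in responses:
        record = response.get("data") or {}
        try:
            value = float(record.get("expression_value"))
        except (TypeError, ValueError):
            continue  # skip records with a missing or non-numeric value
        ranked.append((record.get("source_name"), value))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


responses = [
    # Placeholder record for illustration only (hypothetical tissue and value).
    {"status": "success", "data": {"source_name": "tissue_x", "expression_value": "12.5"}},
    # The sample record from this section.
    {"status": "success", "data": {"source_name": "brain", "expression_value": "2714.9"}},
]
print(rank_tissues(responses))  # [('brain', 2714.9), ('tissue_x', 12.5)]
```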
Enrichr Results
{
"status": "success",
"data": "{\"connected_paths\": {\"Path: ...\": \"Total Weight: ...\"}}"
}
NOTE: The data field is a JSON-encoded string, not a nested object. Parse it before accessing keys such as connected_paths.
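A minimal sketch of that parsing step, using only the standard library. The function name is hypothetical, and the connected_paths key is taken from the sample above; other keys may exist.

```python
import json


def parse_enrichr(response):
    """Decode the string-valued data field and return its connected_paths mapping."""
    data = response.get("data")
    if isinstance(data, str):
        data = json.loads(data)  # data is a JSON string, so decode it first
    return (data or {}).get("connected_paths", {})


# Illustrative response built in the same shape as the sample (placeholder values).
example = {
    "status": "success",
    "data": json.dumps({"connected_paths": {"Path: A": "Total Weight: 1"}}),
}
print(parse_enrichr(example))  # {'Path: A': 'Total Weight: 1'}
```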
Common Use Patterns
1. Comprehensive Disease Profiling
User: "Characterize Alzheimer's disease across omics layers"
-> Run all 8 phases
-> Produce full multi-omics report
2. Therapeutic Target Discovery
User: "What are druggable targets for rheumatoid arthritis?"
-> Emphasize Phase 1 (genomics), Phase 6 (therapeutics), Phase 7 (integration)
-> Focus on tractability and clinical precedent
3. Biomarker Identification
User: "Find diagnostic biomarkers for pancreatic cancer"
-> Emphasize Phase 2 (transcriptomics), Phase 3 (proteomics), Phase 7 (biomarkers)
-> Focus on tissue-specific expression and diagnostic potential
4. Mechanism Elucidation
User: "What pathways are dysregulated in Crohn's disease?"
-> Emphasize Phase 4 (pathways), Phase 5 (GO), Phase 7 (mechanistic hypotheses)
-> Focus on pathway enrichment and cross-pathway connections
5. Drug Repurposing
User: "What existing drugs could be repurposed for ALS?"
-> Emphasize Phase 1 (genetics), Phase 6 (therapeutic landscape), Phase 7 (repurposing)
-> Focus on drugs targeting disease-associated genes
6. Systems Biology
User: "What are the hub genes and key pathways in type 2 diabetes?"
-> Emphasize Phase 3 (PPI network), Phase 4 (pathways), Phase 7 (network analysis)
-> Focus on hub genes and network modules
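The phase emphasis for patterns 2 to 6 can be kept as a small lookup, sketched below. The pattern keys and the function name are hypothetical labels; the phase numbers are copied from the patterns above, and None stands for pattern 1 (run every phase).

```python
# Phase numbers to emphasize for each use pattern, as listed above.
PHASE_EMPHASIS = {
    "therapeutic_target_discovery": [1, 6, 7],
    "biomarker_identification": [2, 3, 7],
    "mechanism_elucidation": [4, 5, 7],
    "drug_repurposing": [1, 6, 7],
    "systems_biology": [3, 4, 7],
}


def phases_for(pattern):
    """Return the emphasized phase numbers, or None to run every phase."""
    return PHASE_EMPHASIS.get(pattern)


print(phases_for("biomarker_identification"))  # [2, 3, 7]
```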
Edge Case Handling
Rare Diseases (limited data)
- Genomics layer may dominate (single gene)
- GWAS data are typically sparse or absent, especially for monogenic conditions
- Focus on ClinVar variants, pathway consequences
- Confidence score will be lower (less cross-layer data)
Common Diseases (overwhelming data)
- Thousands of GWAS associations
- Prioritize by effect size and significance
- Focus on top 20-30 genes for downstream analysis
- Use strict significance thresholds (p < 5e-8)
Cancer
- Include somatic mutations (if CIViC or cBioPortal tools are available)
- Check cancer prognostics via HPA
- Include tumor-specific expression patterns
- Clinical trial landscape may be extensive
Monogenic Diseases
- Single gene dominates
- ClinVar/OMIM evidence is primary
- Pathway analysis reveals downstream effects
- Therapeutic landscape may be limited (gene therapy, enzyme replacement)
Polygenic Diseases
- Many weak genetic signals
- GWAS provides the gene list
- Pathway enrichment reveals convergent biology
- Network analysis identifies hub genes
Tissue Ambiguity
- Diseases affecting multiple tissues
- Query HPA for all relevant tissues
- Compare tissue-specific expression patterns
- Use tissue context from disease ontology
Fallback Strategies
If disease name not found
- Try synonyms
- Try broader disease category
- Try OMIM/UMLS ID mapping
- If all of the above fail, report the disambiguation failure and ask the user to clarify
If no GWAS data
- Check ClinVar for rare variants
- Use OpenTargets genetic evidence
- Note in report as "Limited genetic data"
- Adjust confidence score accordingly
If no expression data
- Try different disease name/synonym
- Check HPA for individual gene expression
- Use OpenTargets expression evidence
- Note as "Limited transcriptomics data"
If no pathway enrichment
- Relax the gene-list inclusion threshold to enlarge the input set
- Try different pathway databases
- Map individual genes to pathways via Reactome
- Note as "No significant pathway enrichment"
If no drugs found
- Check if disease is rare/orphan
- Look for drugs targeting individual genes
- Check clinical trials for investigational therapies
- Note as "No approved drugs - novel therapeutic opportunity"
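Each fallback list above follows the same shape: try sources in order, stop at the first that returns data, and otherwise record a note in the report. A minimal sketch of that control flow follows. The function name is hypothetical, the lambdas are stand-ins for real tool calls, and treating any exception as "no data" is a simplifying assumption.

```python
def first_with_results(strategies):
    """Try (label, fetch) pairs in order; return (label, result, tried).

    fetch is a zero-argument callable. An empty result or a raised
    exception counts as a miss and the next strategy is attempted.
    Returns (None, None, tried) when every strategy misses.
    """
    tried = []
    for label, fetch in strategies:
        tried.append(label)
        try:
            result = fetch()
        except Exception:  # simplification: any failure means "no data"
            result = None
        if result:
            return label, result, tried
    return None, None, tried


# Stand-in callables mimicking the "no GWAS data" fallback order.
label, result, tried = first_with_results(
    [
        ("GWAS Catalog", lambda: []),
        ("ClinVar rare variants", lambda: ["variant_1"]),
        ("OpenTargets genetic evidence", lambda: ["evidence_1"]),
    ]
)
note = None if label else "Limited genetic data"
print(label, tried, note)
```

Keeping the list of attempted sources makes it straightforward to cite which databases were queried, in line with the source-reference principle.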