tooluniverse-multiomic-disease-characterization
Multi-Omics Disease Characterization Pipeline
Characterize diseases across multiple molecular layers (genomics, transcriptomics, proteomics, pathways) to provide systems-level understanding of disease mechanisms, identify therapeutic opportunities, and discover biomarker candidates.
KEY PRINCIPLES:
- Report-first approach - Create report file FIRST, then populate progressively
- Disease disambiguation FIRST - Resolve all identifiers before omics analysis
- Layer-by-layer analysis - Systematically cover all omics layers
- Cross-layer integration - Identify genes/targets appearing in multiple layers
- Evidence grading - Grade all evidence from T1 (direct human/clinical) to T4 (annotation/prediction only)
- Tissue context - Emphasize disease-relevant tissues/organs
- Quantitative scoring - Multi-Omics Confidence Score (0-100)
- Druggable focus - Prioritize targets with therapeutic potential
- Biomarker identification - Highlight diagnostic/prognostic markers
- Mechanistic synthesis - Generate testable hypotheses
- Source references - Every statement must cite tool/database
- Completeness checklist - Mandatory section showing analysis coverage
- English-first queries - Always use English terms in tool calls. Respond in user's language
When to Use This Skill
Apply when users:
- Ask about disease mechanisms across omics layers
- Need multi-omics characterization of a disease
- Want to understand disease at the systems biology level
- Ask "What pathways/genes/proteins are involved in [disease]?"
- Need biomarker discovery for a disease
- Want to identify druggable targets from disease profiling
- Ask for integrated genomics + transcriptomics + proteomics analysis
- Need cross-layer concordance analysis
- Ask about disease network biology / hub genes
NOT for (use other skills instead):
- Single gene/target validation -> Use tooluniverse-drug-target-validation
- Drug safety profiling -> Use tooluniverse-adverse-event-detection
- General disease overview -> Use tooluniverse-disease-research
- Variant interpretation -> Use tooluniverse-variant-interpretation
- GWAS-specific analysis -> Use tooluniverse-gwas-* skills
- Pathway-only analysis -> Use tooluniverse-systems-biology
Input Parameters
| Parameter | Required | Description | Example |
|---|---|---|---|
| disease | Yes | Disease name, OMIM ID, EFO ID, or MONDO ID | Alzheimer disease, MONDO_0004975 |
| tissue | No | Tissue/organ of interest | brain, liver, blood |
| focus_layers | No | Specific omics layers to emphasize | genomics, transcriptomics, pathways |
Multi-Omics Confidence Score (0-100)
Score Components
Data Availability (0-40 points):
- Genomics data available (GWAS or rare variants): 10 points
- Transcriptomics data available (DEGs or expression): 10 points
- Protein data available (PPI or expression): 5 points
- Pathway data available (enriched pathways): 10 points
- Clinical/drug data available (approved drugs or trials): 5 points
Evidence Concordance (0-40 points):
- Multi-layer genes (appear in 3+ layers): up to 20 points (2 per gene, max 10 genes)
- Consistent direction (genetics + expression concordant): 10 points
- Pathway-gene concordance (genes found in enriched pathways): 10 points
Evidence Quality (0-20 points):
- Strong genetic evidence (GWAS p < 5e-8): 10 points
- Clinical validation (approved drugs): 10 points
Score Interpretation
| Score | Tier | Interpretation |
|---|---|---|
| 80-100 | Excellent | Comprehensive multi-omics coverage, high confidence, strong cross-layer concordance |
| 60-79 | Good | Good coverage across most layers, some gaps |
| 40-59 | Moderate | Moderate coverage, limited cross-layer integration |
| 0-39 | Limited | Limited data, single-layer analysis dominates |
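The score components and tiers above can be sketched as a small calculation. This is a minimal illustration, not the skill's reference implementation: the `LayerFindings` fields and function names are hypothetical, and it assumes the two 10-point concordance items are awarded all-or-nothing, which the rubric does not specify.

```python
from dataclasses import dataclass


@dataclass
class LayerFindings:
    """Hypothetical summary of what the analysis found (names are illustrative)."""
    has_genomics: bool = False          # GWAS or rare variants found
    has_transcriptomics: bool = False   # DEGs or expression data found
    has_protein: bool = False           # PPI or protein expression found
    has_pathways: bool = False          # enriched pathways found
    has_clinical: bool = False          # approved drugs or trials found
    multi_layer_genes: int = 0          # genes appearing in 3+ layers
    direction_concordant: bool = False  # genetics and expression agree
    pathway_gene_concordant: bool = False
    gwas_significant: bool = False      # any GWAS hit with p < 5e-8
    approved_drugs: bool = False


def confidence_score(f: LayerFindings) -> int:
    """Sum the three rubric blocks: availability (40), concordance (40), quality (20)."""
    availability = (10 * f.has_genomics + 10 * f.has_transcriptomics
                    + 5 * f.has_protein + 10 * f.has_pathways + 5 * f.has_clinical)
    # 2 points per multi-layer gene, capped at 10 genes (20 points).
    concordance = (2 * min(f.multi_layer_genes, 10)
                   + 10 * f.direction_concordant + 10 * f.pathway_gene_concordant)
    quality = 10 * f.gwas_significant + 10 * f.approved_drugs
    return availability + concordance + quality


def score_tier(score: int) -> str:
    """Map a 0-100 score to the tier names in the interpretation table."""
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Moderate"
    return "Limited"
```

Keeping the per-component points separate (rather than returning only the total) makes it easier to fill the per-row "Points" column of the score table in the report.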
Evidence Grading System
| Tier | Symbol | Criteria | Examples |
|---|---|---|---|
| T1 | [T1] | Direct human evidence, clinical proof | FDA-approved drug, GWAS hit (p<5e-8), clinical trial result |
| T2 | [T2] | Experimental evidence | Differential expression (validated), functional screen, mouse KO |
| T3 | [T3] | Computational/database evidence | PPI network, pathway mapping, expression correlation |
| T4 | [T4] | Annotation/prediction only | GO annotation, text-mined association, predicted interaction |
Report Template
Create this file structure at the start: {disease_name}_multiomic_report.md
# Multi-Omics Disease Characterization: {Disease Name}
**Report Generated**: {date}
**Disease Identifiers**: (to be filled)
**Multi-Omics Confidence Score**: (to be calculated)
---
## Executive Summary
(2-3 sentence disease mechanism synthesis - fill after all layers complete)
---
## 1. Disease Definition & Context
### Disease Identifiers
| System | ID | Source |
|--------|-----|--------|
### Description
### Synonyms
### Disease Hierarchy (parents/children)
### Affected Tissues/Organs
### Therapeutic Areas
**Sources**: (tools used)
---
## 2. Genomics Layer
### 2.1 GWAS Associations
| SNP | P-value | Effect | Gene | Study | Source |
|-----|---------|--------|------|-------|--------|
### 2.2 GWAS Studies Summary
| Study ID | Trait | Sample Size | Year | Source |
|----------|-------|-------------|------|--------|
### 2.3 Associated Genes (Genetic Evidence)
| Gene | Ensembl ID | Association Score | Evidence Type | Source |
|------|------------|-------------------|---------------|--------|
### 2.4 Rare Variants (ClinVar)
| Variant | Gene | Clinical Significance | Source |
|---------|------|-----------------------|--------|
### Genomics Layer Summary
- Total GWAS hits:
- Top genes by genetic evidence:
- Genetic architecture:
**Sources**: (tools used)
---
## 3. Transcriptomics Layer
### 3.1 Differential Expression Studies
| Experiment | Condition | Up-regulated | Down-regulated | Source |
|------------|-----------|--------------|----------------|--------|
### 3.2 Expression Atlas Disease Evidence
| Gene | Score | Source |
|------|-------|--------|
### 3.3 Tissue Expression Patterns (GTEx/HPA)
| Gene | Tissue | Expression Level | Source |
|------|--------|-----------------|--------|
### 3.4 Biomarker Candidates (Expression-Based)
| Gene | Tissue Specificity | Fold Change | Evidence | Source |
|------|-------------------|-------------|----------|--------|
### Transcriptomics Layer Summary
- Differential expression datasets:
- Top DEGs:
- Tissue-specific patterns:
**Sources**: (tools used)
---
## 4. Proteomics & Interaction Layer
### 4.1 Protein-Protein Interactions (STRING)
| Protein A | Protein B | Score | Source |
|-----------|-----------|-------|--------|
### 4.2 Hub Genes (Network Centrality)
| Gene | Degree | Betweenness | Role | Source |
|------|--------|-------------|------|--------|
### 4.3 Protein Complexes (IntAct)
| Complex | Members | Function | Source |
|---------|---------|----------|--------|
### 4.4 Tissue-Specific PPI Network
| Gene | Interaction Score | Tissue | Source |
|------|-------------------|--------|--------|
### Proteomics Layer Summary
- Total PPIs:
- Hub genes:
- Network modules:
**Sources**: (tools used)
---
## 5. Pathway & Network Layer
### 5.1 Enriched Pathways (Enrichr/Reactome)
| Pathway | Database | P-value | Genes | Source |
|---------|----------|---------|-------|--------|
### 5.2 Reactome Pathway Details
| Pathway ID | Name | Genes Involved | Source |
|------------|------|----------------|--------|
### 5.3 KEGG Pathways
| Pathway ID | Name | Description | Source |
|------------|------|-------------|--------|
### 5.4 WikiPathways
| Pathway ID | Name | Organism | Source |
|------------|------|----------|--------|
### Pathway Layer Summary
- Top enriched pathways:
- Key pathway nodes:
- Cross-pathway connections:
**Sources**: (tools used)
---
## 6. Gene Ontology & Functional Annotation
### 6.1 Biological Processes
| GO Term | Name | P-value | Genes | Source |
|---------|------|---------|-------|--------|
### 6.2 Molecular Functions
| GO Term | Name | P-value | Genes | Source |
|---------|------|---------|-------|--------|
### 6.3 Cellular Components
| GO Term | Name | P-value | Genes | Source |
|---------|------|---------|-------|--------|
**Sources**: (tools used)
---
## 7. Therapeutic Landscape
### 7.1 Approved Drugs
| Drug | ChEMBL ID | Mechanism | Target | Phase | Source |
|------|-----------|-----------|--------|-------|--------|
### 7.2 Druggable Targets
| Gene | Tractability | Modality | Clinical Precedent | Source |
|------|-------------|----------|-------------------|--------|
### 7.3 Drug Repurposing Candidates
| Drug | Original Indication | Mechanism | Target | Source |
|------|---------------------|-----------|--------|--------|
### 7.4 Clinical Trials
| NCT ID | Title | Phase | Status | Intervention | Source |
|--------|-------|-------|--------|--------------|--------|
### Therapeutic Summary
- Approved drugs:
- Clinical pipeline:
- Novel targets:
**Sources**: (tools used)
---
## 8. Multi-Omics Integration
### 8.1 Cross-Layer Gene Concordance
| Gene | Genomics | Transcriptomics | Proteomics | Pathways | Layers | Evidence Tier |
|------|----------|-----------------|------------|----------|--------|---------------|
### 8.2 Multi-Omics Hub Genes (Top 20)
| Rank | Gene | Layers Found | Key Evidence | Druggable | Source |
|------|------|-------------|--------------|-----------|--------|
### 8.3 Biomarker Candidates
| Biomarker | Type | Evidence Layers | Confidence | Source |
|-----------|------|-----------------|------------|--------|
### 8.4 Mechanistic Hypotheses
1. (Hypothesis with supporting evidence from multiple layers)
2. ...
### 8.5 Systems-Level Insights
- Key disrupted processes:
- Critical pathway nodes:
- Therapeutic intervention points:
- Testable hypotheses:
---
## Multi-Omics Confidence Score
| Component | Points | Max | Details |
|-----------|--------|-----|---------|
| Genomics data | | 10 | |
| Transcriptomics data | | 10 | |
| Protein data | | 5 | |
| Pathway data | | 10 | |
| Clinical data | | 5 | |
| Multi-layer genes | | 20 | |
| Direction concordance | | 10 | |
| Pathway-gene concordance | | 10 | |
| Genetic evidence quality | | 10 | |
| Clinical validation | | 10 | |
| **TOTAL** | | **100** | |
**Score**: XX/100 - [Tier]
---
## Data Availability Checklist
| Omics Layer | Data Available | Tools Used | Findings |
|-------------|---------------|------------|----------|
| Genomics (GWAS) | Yes/No | | |
| Genomics (Rare Variants) | Yes/No | | |
| Transcriptomics (DEGs) | Yes/No | | |
| Transcriptomics (Expression) | Yes/No | | |
| Proteomics (PPI) | Yes/No | | |
| Proteomics (Expression) | Yes/No | | |
| Pathways (Enrichment) | Yes/No | | |
| Pathways (KEGG/Reactome) | Yes/No | | |
| Gene Ontology | Yes/No | | |
| Drugs/Therapeutics | Yes/No | | |
| Clinical Trials | Yes/No | | |
| Literature | Yes/No | | |
---
## Completeness Checklist
- [ ] Disease disambiguation complete (IDs resolved)
- [ ] Genomics layer analyzed (GWAS + variants)
- [ ] Transcriptomics layer analyzed (DEGs + expression)
- [ ] Proteomics layer analyzed (PPI + interactions)
- [ ] Pathway layer analyzed (enrichment + mapping)
- [ ] Gene Ontology analyzed (BP + MF + CC)
- [ ] Therapeutic landscape analyzed (drugs + targets + trials)
- [ ] Cross-layer integration complete (concordance analysis)
- [ ] Multi-Omics Confidence Score calculated
- [ ] Biomarker candidates identified
- [ ] Hub genes identified
- [ ] Mechanistic hypotheses generated
- [ ] Executive summary written
- [ ] All sections have source citations
---
## References
### Data Sources Used
| # | Tool | Parameters | Section | Items Retrieved |
|---|------|------------|---------|-----------------|
### Database Versions
- OpenTargets: (current)
- GWAS Catalog: (current)
- STRING: (current)
- Reactome: (current)
Phase 0: Disease Disambiguation (ALWAYS FIRST)
Objective: Resolve disease to standard identifiers for all downstream queries.
Tools Used
OpenTargets_get_disease_id_description_by_name (primary):
- Input: diseaseName (string) - Disease name
- Output: {data: {search: {hits: [{id, name, description}]}}}
- Use: Get MONDO/EFO IDs and description
- CRITICAL: Disease IDs from OpenTargets use underscore format (e.g., MONDO_0004975), NOT colon format
OSL_get_efo_id_by_disease_name (secondary):
- Input: disease (string) - Disease name
- Output: {efo_id, name}
- Use: Get EFO/MONDO ID
OpenTargets_get_disease_description_by_efoId:
- Input: efoId (string) - Disease ID (e.g., MONDO_0004975)
- Output: {data: {disease: {id, name, description, dbXRefs}}}
- Use: Get full description, cross-references (OMIM, UMLS, DOID, etc.)
OpenTargets_get_disease_synonyms_by_efoId:
- Input: efoId (string)
- Output: {data: {disease: {id, name, synonyms: [{relation, terms}]}}}
OpenTargets_get_disease_therapeutic_areas_by_efoId:
- Input: efoId (string)
- Output: {data: {disease: {id, name, therapeuticAreas: [{id, name}]}}}
OpenTargets_get_disease_ancestors_parents_by_efoId:
- Input: efoId (string)
- Output: {data: {disease: {id, name, ancestors: [{id, name}]}}}
OpenTargets_get_disease_descendants_children_by_efoId:
- Input: efoId (string)
- Output: {data: {disease: {id, name, descendants: [{id, name}]}}}
OpenTargets_map_any_disease_id_to_all_other_ids:
- Input: inputId (string) - Any known disease ID (e.g., OMIM:104300, UMLS:C0002395)
- Output: {data: {disease: {id, name, dbXRefs: [str], ...}}}
- Use: Cross-map between OMIM, UMLS, ICD10, DOID, etc.
Workflow
- Search by disease name to get primary ID (OpenTargets)
- Get full description and cross-references
- Get synonyms for search term expansion
- Get therapeutic areas for context
- Get disease hierarchy (parents/children)
- If user provided OMIM/other ID, map to MONDO/EFO first
Collision-Aware Search
When disease name returns multiple hits:
- Check if user's input matches any hit exactly
- If ambiguous, present top 3-5 options and ask user to select
- Always prefer the most specific disease (not parent categories)
- For cancer, prefer the specific tumor type over generic "cancer"
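The collision rules above can be sketched over the documented search response shape `{data: {search: {hits: [...]}}}`. The helper names are hypothetical, and converting only MONDO/EFO prefixes to underscore form is an assumption made here so that IDs such as OMIM:104300 keep the colon form expected by the mapping tool.

```python
def to_underscore_id(disease_id: str) -> str:
    """Convert MONDO:0004975 -> MONDO_0004975; other inputs are returned stripped.

    Assumption: only MONDO and EFO prefixes are rewritten.
    """
    text = disease_id.strip()
    prefix, sep, rest = text.partition(":")
    if sep and prefix.upper() in {"MONDO", "EFO"}:
        return f"{prefix.upper()}_{rest}"
    return text


def select_hit(response: dict, user_input: str, max_options: int = 5):
    """Return (chosen_hit, options_to_show_user).

    A hit is chosen when exactly one hit matches the user's input by name or ID,
    or when the search returned a single hit. Otherwise nothing is chosen and
    the top hits are returned so the user can pick.
    """
    hits = response.get("data", {}).get("search", {}).get("hits", [])
    wanted_name = user_input.strip().lower()
    wanted_id = to_underscore_id(user_input)
    exact = [
        h for h in hits
        if h.get("name", "").lower() == wanted_name or h.get("id") == wanted_id
    ]
    if len(exact) == 1:
        return exact[0], []
    if len(hits) == 1:
        return hits[0], []
    return None, hits[:max_options]
```

When `select_hit` returns `None` with a non-empty option list, the options are what would be presented to the user for selection.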
Key Disease IDs to Track
After disambiguation, store these for all downstream queries:
- efo_id - Primary ID for OpenTargets queries (e.g., MONDO_0004975)
- disease_name - Canonical name (e.g., Alzheimer disease)
- synonyms - For literature search expansion
- therapeutic_areas - For context
- dbXRefs - Cross-references (OMIM, UMLS, DOID, etc.)
Phase 1: Genomics Layer
Objective: Identify genetic variants, GWAS associations, and genetically implicated genes.
Tools Used
OpenTargets_get_associated_targets_by_disease_efoId (primary):
- Input: efoId (string) - Disease EFO/MONDO ID
- Output: {data: {disease: {id, name, associatedTargets: {count, rows: [{target: {id, approvedSymbol}, score}]}}}}
- Use: Get ALL disease-associated genes ranked by overall evidence score
- NOTE: Returns top 25 by default. For comprehensive analysis, note the total count
OpenTargets_get_evidence_by_datasource:
- Input: efoId (string), ensemblId (string), optional datasourceIds (array), size (int, default 50)
- Output: {data: {disease: {evidences: {count, rows: [{...evidence details}]}}}}
- Use: Get specific evidence types. Key datasourceIds for genomics:
  - ['ot_genetics_portal'] - GWAS/genetics
  - ['gene2phenotype', 'genomics_england', 'orphanet'] - Rare variants
  - ['eva'] - ClinVar variants
gwas_search_associations (GWAS Catalog):
- Input: disease_trait (string), size (int, default 20)
- Output: {data: [{association_id, p_value, or_per_copy_num, or_value, beta, risk_frequency, efo_traits: [{...}], ...}], metadata: {pagination: {totalElements}}}
- Use: Get genome-wide significant associations
- NOTE: Use disease name (e.g., "Alzheimer"), not ID. Returns paginated results
gwas_get_studies_for_trait:
- Input: disease_trait (string), size (int)
- Output: {data: [...studies], metadata: {pagination}}
- NOTE: May return empty if trait name does not match exactly. Try synonyms
gwas_get_variants_for_trait:
- Input: disease_trait (string), size (int)
- Output: {data: [...variants], metadata: {pagination}}
GWAS_search_associations_by_gene:
- Input: gene_name (string)
- Output: Associations for a specific gene
OpenTargets_search_gwas_studies_by_disease:
- Input: diseaseIds (array of strings), enableIndirect (bool, default true), size (int, default 10)
- Output: {data: {studies: {count, rows: [{id, studyType, traitFromSource, publicationFirstAuthor, publicationDate, pubmedId, nSamples, nCases, nControls, ...}]}}}
- Use: Get GWAS studies from OpenTargets genetics portal
clinvar_search_variants:
- Input: condition (string) or gene (string), optional max_results (int)
- Output: List of ClinVar variants with clinical significance
- Use: Rare variant / monogenic disease evidence
Workflow
- Get associated genes from OpenTargets (overall scores)
- For top 10-15 genes, get genetic evidence specifically via OpenTargets_get_evidence_by_datasource
- Search GWAS Catalog for associations
- Search OpenTargets GWAS studies
- Search ClinVar for rare variants
- For top GWAS genes, check GWAS_search_associations_by_gene
Gene Tracking
Maintain a dictionary of genes found in genomics layer:
genomics_genes = {
'PSEN1': {'score': 0.87, 'evidence': 'genetic', 'ensembl_id': 'ENSG00000080815', 'layer': 'genomics'},
'APP': {'score': 0.82, 'evidence': 'genetic', 'ensembl_id': 'ENSG00000142192', 'layer': 'genomics'},
# ...
}
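As a rough sketch, the tracking dictionary above could be filled from the documented `associatedTargets` response. The function name is hypothetical, and labelling every row as 'genetic' simply mirrors the example above; in practice the overall association score mixes evidence types, so the label would be set from the per-datasource evidence instead.

```python
def track_genomics_genes(response: dict, evidence: str = "genetic") -> dict:
    """Build {symbol: {...}} entries from an associatedTargets response."""
    rows = response["data"]["disease"]["associatedTargets"]["rows"]
    return {
        row["target"]["approvedSymbol"]: {
            "score": row["score"],
            "evidence": evidence,
            "ensembl_id": row["target"]["id"],
            "layer": "genomics",
        }
        for row in rows
    }


def coverage_note(response: dict) -> str:
    """Report how many of the total associated targets were actually returned."""
    targets = response["data"]["disease"]["associatedTargets"]
    return f"{len(targets['rows'])} of {targets['count']} associated targets retrieved"
```

The coverage note is a convenient way to record in the report that only the first page of associated targets was analyzed.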
Phase 2: Transcriptomics Layer
Objective: Identify differentially expressed genes, tissue-specific expression, and expression-based biomarkers.
Tools Used
ExpressionAtlas_search_differential:
- Input: optional gene (string), condition (string), species (string, default 'homo sapiens')
- Output: Differential expression studies and results
- Use: Find studies where genes are differentially expressed in disease
ExpressionAtlas_search_experiments:
- Input: optional gene (string), condition (string), species (string)
- Output: Expression experiments relevant to condition
- Use: Find all Expression Atlas experiments for the disease
expression_atlas_disease_target_score:
- Input: efoId (string), pageSize (int, required)
- Output: Genes scored by expression evidence for the disease
- Use: Get expression-based disease-gene association scores
europepmc_disease_target_score:
- Input: efoId (string), pageSize (int, required)
- Output: Genes scored by literature evidence for the disease
- Use: Complement expression evidence with literature-mined associations
HPA_get_rna_expression_by_source (Human Protein Atlas):
- Input: gene_name (string), source_type (string: 'tissue', 'blood', 'brain'), source_name (string: e.g., 'brain', 'liver')
- Output: {status, data: {gene_name, source_type, source_name, expression_value, expression_level, expression_unit}}
- NOTE: ALL 3 params required. source_type options: 'tissue', 'blood', 'brain', 'cell_line', 'single_cell'
HPA_get_rna_expression_in_specific_tissues:
- Input: gene_name (string), tissues (array of strings)
- Output: Expression across specified tissues
HPA_get_cancer_prognostics_by_gene:
- Input: gene_name (string)
- Output: Cancer prognostic data (if cancer context)
HPA_get_subcellular_location:
- Input: gene_name (string)
- Output: Subcellular localization data
HPA_search_genes_by_query:
- Input: query (string)
- Output: Matching genes in HPA
Workflow
- Search Expression Atlas for differential expression studies
- Get expression-based disease scores
- Get literature-based disease scores (EuropePMC)
- For top 10-15 genes from genomics layer, check tissue expression via HPA
- Check disease-relevant tissue expression patterns
- For cancer: check prognostic biomarkers
Gene Tracking
Add transcriptomics genes to tracking:
transcriptomics_genes = {
'APOE': {'expression_score': 0.75, 'tissues': ['brain'], 'evidence': 'differential_expression', 'layer': 'transcriptomics'},
# ...
}
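One way to fold an HPA_get_rna_expression_by_source result into this tracking dictionary is sketched below. The function name, the `expression_nTPM` key, and the `min_ntpm` cutoff are all assumptions for illustration; the only facts taken from the documented response are its field names and that `expression_value` arrives as a string.

```python
def record_tissue_expression(tracking: dict, hpa_response: dict,
                             min_ntpm: float = 1.0) -> dict:
    """Add a tissue to a gene's tracking entry if its expression passes a cutoff.

    min_ntpm is an illustrative threshold, not an HPA recommendation.
    """
    if hpa_response.get("status") != "success":
        return tracking
    data = hpa_response.get("data", {})
    try:
        # expression_value is delivered as a string such as "2714.9"
        value = float(data["expression_value"])
        gene, tissue = data["gene_name"], data["source_name"]
    except (KeyError, TypeError, ValueError):
        return tracking
    if value >= min_ntpm:
        entry = tracking.setdefault(
            gene,
            {"tissues": [], "evidence": "tissue_expression", "layer": "transcriptomics"},
        )
        tissues = entry.setdefault("tissues", [])
        if tissue not in tissues:
            tissues.append(tissue)
        entry.setdefault("expression_nTPM", {})[tissue] = value
    return tracking
```

Existing entries (for example one created from differential expression evidence) are kept and only extended, so the original evidence label is not overwritten.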
Phase 3: Proteomics & Interaction Layer
Objective: Map protein-protein interactions, identify hub genes, and characterize interaction networks.
Tools Used
STRING_get_interaction_partners (primary PPI):
- Input: protein_ids (array of strings - gene names work), species (int, default 9606), confidence_score (float, default 0.4), limit (int, default 20)
- Output: {status: 'success', data: [{stringId_A, stringId_B, preferredName_A, preferredName_B, ncbiTaxonId, score, nscore, fscore, pscore, ascore, escore, dscore, tscore}]}
- Use: Get interaction partners for disease genes
- NOTE: protein_ids is an array, NOT string. Gene symbols like ['APOE'] work
STRING_get_network:
- Input: protein_ids (array), species (int), confidence_score (float)
- Output: Network of interactions between input proteins
- Use: Build disease-specific PPI network
STRING_functional_enrichment:
- Input: protein_ids (array), species (int)
- Output: Functional enrichment results (GO, KEGG, etc.)
- Use: Functional characterization of disease gene set
STRING_ppi_enrichment:
- Input: protein_ids (array), species (int)
- Output: Statistical test for PPI enrichment (more interactions than expected)
- Use: Test if disease genes form a connected module
intact_get_interactions:
- Input: identifier (string - UniProt ID or gene name)
- Output: Molecular interaction data from IntAct
intact_search_interactions:
- Input: query (string), first (int, default 0), max (int, default 25)
- Output: Search results for interactions
HPA_get_protein_interactions_by_gene:
- Input: gene_name (string)
- Output: {gene, interactions, interactor_count, interactors: [...]}
humanbase_ppi_analysis:
- Input: gene_list (array), tissue (string), max_node (int), interaction (string), string_mode (bool)
- Output: Tissue-specific PPI network
- NOTE: ALL params required. interaction options: 'coexpression', 'interaction', 'coexpression_and_interaction'. string_mode: true/false
Workflow
- Take top 15-20 genes from genomics + transcriptomics layers
- Query STRING for interaction partners of each gene
- Build composite PPI network using STRING_get_network
- Test PPI enrichment (are genes more connected than random?)
- Get functional enrichment from STRING
- For disease-relevant tissue, get tissue-specific network (HumanBase)
- Identify hub genes (highest degree centrality)
- Check IntAct for experimentally validated interactions
Hub Gene Analysis
Calculate network centrality metrics:
- Degree: Number of interaction partners
- Betweenness: Number of shortest paths through node
- Hub score: Genes with degree > mean + 1 SD are hubs
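The degree and hub rules above can be sketched with the standard library over STRING-style interaction rows. Function names are hypothetical, and using the population standard deviation is an assumption, since the rule does not say which SD is meant. Betweenness is left out here because it is usually computed with a graph library.

```python
from collections import defaultdict
from statistics import mean, pstdev


def degree_table(interactions: list, min_score: float = 0.4) -> dict:
    """Count distinct interaction partners per gene from STRING-style rows."""
    neighbors = defaultdict(set)
    for row in interactions:
        a, b = row["preferredName_A"], row["preferredName_B"]
        # Skip self-loops and edges below the confidence cutoff.
        if a != b and row.get("score", 0.0) >= min_score:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {gene: len(partners) for gene, partners in neighbors.items()}


def hub_genes(degrees: dict) -> list:
    """Genes with degree > mean + 1 SD, highest degree first."""
    if len(degrees) < 2:
        return []
    values = list(degrees.values())
    cutoff = mean(values) + pstdev(values)  # assumption: population SD
    hubs = [gene for gene, degree in degrees.items() if degree > cutoff]
    return sorted(hubs, key=lambda gene: (-degrees[gene], gene))
```

Because partners are stored in sets, an edge reported twice (once from each protein's partner query) is counted only once.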
Phase 4: Pathway & Network Layer
Objective: Identify enriched biological pathways and cross-pathway connections.
Tools Used
enrichr_gene_enrichment_analysis (primary enrichment):
- Input: gene_list (array of gene symbols, min 2), libs (array of library names)
- Output: {status: 'success', data: '{...JSON string with enrichment results...}'}
- Key libraries: ['KEGG_2021_Human'], ['Reactome_2022'], ['WikiPathway_2023_Human'], ['GO_Biological_Process_2023'], ['GO_Molecular_Function_2023'], ['GO_Cellular_Component_2023']
- NOTE: data field is a JSON string, needs parsing. Contains connected_paths and per-library results
- NOTE: libs is REQUIRED as array
ReactomeAnalysis_pathway_enrichment:
- Input: identifiers (string - space-separated gene list), optional page_size (int, default 20), include_disease (bool), projection (bool)
- Output: {data: {token, analysis_type, pathways_found, pathways: [{pathway_id, name, species, is_disease, is_lowest_level, entities_found, entities_total, entities_ratio, p_value, fdr, reactions_found, reactions_total}]}}
- Use: Reactome-specific pathway enrichment with statistical testing
Reactome_map_uniprot_to_pathways:
- Input: id (string - UniProt accession)
- Output: List of Reactome pathways containing this protein
- Use: Map individual proteins to pathways
Reactome_get_pathway:
- Input: stId (string - Reactome stable ID, e.g., 'R-HSA-73817')
- Output: Pathway details
Reactome_get_pathway_reactions:
- Input: stId (string)
- Output: Reactions within pathway
kegg_search_pathway:
- Input: keyword (string)
- Output: Array of KEGG pathway matches
kegg_get_pathway_info:
- Input: pathway_id (string, e.g., 'hsa04930')
- Output: Detailed pathway information
WikiPathways_search:
- Input: query (string), optional organism (string, e.g., 'Homo sapiens')
- Output: Matching community-curated pathways
Workflow
- Collect all genes from genomics + transcriptomics layers (top 20-30)
- Run Enrichr enrichment for KEGG, Reactome, WikiPathways
- Run ReactomeAnalysis for more detailed Reactome enrichment with p-values
- Search KEGG for disease-specific pathways
- Search WikiPathways for disease pathways
- For top Reactome pathways, get detailed reactions
- Identify cross-pathway connections (genes in multiple pathways)
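Two small pieces of this workflow lend themselves to a sketch: building the space-separated `identifiers` string that ReactomeAnalysis_pathway_enrichment expects, and ranking the returned pathways. The helper names and the 0.05 FDR cutoff are assumptions made for illustration, not values prescribed by the pipeline.

```python
def reactome_identifiers(genes) -> str:
    """Join gene symbols into one space-separated string, dropping blanks and duplicates."""
    cleaned = (g.strip() for g in genes if g and g.strip())
    # dict.fromkeys keeps first-seen order while removing duplicates.
    return " ".join(dict.fromkeys(cleaned))


def top_pathways(response: dict, max_fdr: float = 0.05, limit: int = 10) -> list:
    """Return pathways passing the FDR cutoff, most significant first."""
    pathways = response.get("data", {}).get("pathways", [])
    kept = [p for p in pathways if p.get("fdr", 1.0) <= max_fdr]
    return sorted(kept, key=lambda p: p["fdr"])[:limit]
```

Sorting on `fdr` rather than `p_value` keeps the ranking consistent with the multiple-testing-corrected value reported in the same response.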
Phase 5: Gene Ontology & Functional Annotation
Objective: Characterize biological processes, molecular functions, and cellular components.
Tools Used
enrichr_gene_enrichment_analysis (GO enrichment):
- Use with libs=['GO_Biological_Process_2023'] for BP
- Use with libs=['GO_Molecular_Function_2023'] for MF
- Use with libs=['GO_Cellular_Component_2023'] for CC
GO_get_annotations_for_gene:
- Input: gene_id (string - gene symbol or UniProt ID)
- Output: List of GO annotations with terms, aspects, evidence codes
GO_search_terms:
- Input: query (string)
- Output: Matching GO terms
QuickGO_annotations_by_gene:
- Input: gene_product_id (string - UniProt accession, e.g., 'UniProtKB:P02649'), optional aspect (string: 'biological_process', 'molecular_function', 'cellular_component'), taxon_id (int: 9606), limit (int: 25)
- Output: GO annotations with evidence codes
OpenTargets_get_target_gene_ontology_by_ensemblID:
- Input: ensemblId (string)
- Output: GO terms associated with target
Workflow
- Run Enrichr GO enrichment for all 3 aspects using combined gene list
- For top 5 genes, get detailed GO annotations from QuickGO
- For top genes, get OpenTargets GO terms
- Summarize key biological processes, molecular functions, cellular components
Phase 6: Therapeutic Landscape
Objective: Map approved drugs, druggable targets, repurposing opportunities, and clinical trials.
Tools Used
OpenTargets_get_associated_drugs_by_disease_efoId (primary):
- Input: efoId (string), size (int, REQUIRED - use 100)
- Output: {data: {disease: {knownDrugs: {count, rows: [{drug: {id, name, tradeNames, maximumClinicalTrialPhase, isApproved, hasBeenWithdrawn}, phase, mechanismOfAction, target: {id, approvedSymbol}, disease: {id, name}, urls: [{url, name}]}]}}}}
- Use: All drugs associated with disease (approved + investigational)
OpenTargets_get_target_tractability_by_ensemblID:
- Input: ensemblId (string)
- Output: Tractability assessment (small molecule, antibody, PROTAC, etc.)
OpenTargets_get_associated_drugs_by_target_ensemblID:
- Input: ensemblId (string), size (int, REQUIRED)
- Output: Drugs targeting this gene/protein
search_clinical_trials:
- Input: query_term (string, REQUIRED), optional condition (string), intervention (string), pageSize (int, default 10)
- Output: Clinical trial results
- NOTE: query_term is REQUIRED even if condition is provided
OpenTargets_get_drug_mechanisms_of_action_by_chemblId:
- Input: chemblId (string)
- Output: Mechanism of action details
Workflow
- Get all drugs for disease from OpenTargets
- For top disease-associated genes, check tractability
- For top genes with no approved drugs, identify repurposing candidates
- Search clinical trials for disease
- For top approved drugs, get mechanism of action
Drug Tracking
drug_targets = {
'PSEN1': {'drugs': ['Semagacestat'], 'tractability': 'small_molecule', 'clinical_phase': 3},
'ACHE': {'drugs': ['Donepezil', 'Galantamine'], 'tractability': 'small_molecule', 'clinical_phase': 4},
# ...
}
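A minimal sketch of how the drug tracking dictionary might be derived from the documented `knownDrugs` rows follows. The function name is hypothetical. Tractability is deliberately not filled in, because it comes from the separate tractability tool rather than from this response.

```python
def track_drug_targets(response: dict) -> dict:
    """Group knownDrugs rows by target symbol, keeping the highest phase seen."""
    rows = response["data"]["disease"]["knownDrugs"]["rows"]
    targets = {}
    for row in rows:
        symbol = row["target"]["approvedSymbol"]
        entry = targets.setdefault(symbol, {"drugs": [], "clinical_phase": 0})
        name = row["drug"]["name"]
        # The same drug can appear in several rows (one per trial or indication).
        if name not in entry["drugs"]:
            entry["drugs"].append(name)
        entry["clinical_phase"] = max(entry["clinical_phase"], row.get("phase") or 0)
    return targets
```

Targets that appear in the genomics or transcriptomics tracking but not in this dictionary are the natural starting list for the repurposing step.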
Phase 7: Multi-Omics Integration
Objective: Integrate findings across all layers to identify cross-layer genes, calculate concordance, and generate mechanistic hypotheses.
Cross-Layer Gene Concordance Analysis
This is the core integrative step. For each gene found in the analysis:
- Count layers: In how many omics layers does this gene appear?
  - Genomics (GWAS, rare variants, genetic association)
  - Transcriptomics (DEGs, expression score)
  - Proteomics (PPI hub, protein expression)
  - Pathways (enriched pathway member)
  - Therapeutics (drug target)
- Score genes: Genes appearing in 3+ layers are "multi-omics hub genes"
- Direction concordance: Do genetics and expression agree?
  - Risk allele + upregulated = concordant gain-of-function
  - Risk allele + downregulated = concordant loss-of-function
  - Discordant = needs investigation
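The layer counting and hub rule can be sketched as below, taking one set of gene symbols per layer. All names are hypothetical. The direction check is one possible reading of the rule above: it assumes a direction for the risk allele's effect on expression is available (for example from eQTL data) and compares it with the observed change in disease tissue.

```python
def layer_membership(layers: dict) -> dict:
    """layers: {'genomics': {'APP', ...}, ...} -> {gene: sorted list of layer names}."""
    membership = {}
    for layer, genes in layers.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(layer)
    return {gene: sorted(found) for gene, found in membership.items()}


def multi_omics_hubs(membership: dict, min_layers: int = 3) -> list:
    """Genes found in at least min_layers layers, most layers first."""
    hubs = [g for g, found in membership.items() if len(found) >= min_layers]
    return sorted(hubs, key=lambda g: (-len(membership[g]), g))


def direction_call(genetic_direction: str, expression_direction: str) -> str:
    """Compare 'up'/'down' from genetics with 'up'/'down' observed in disease."""
    if genetic_direction == expression_direction == "up":
        return "concordant gain-of-function"
    if genetic_direction == expression_direction == "down":
        return "concordant loss-of-function"
    return "discordant - needs investigation"
```

The membership dictionary maps directly onto the rows of the Cross-Layer Gene Concordance table, with the list length giving the "Layers" column.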
Biomarker Identification
For each multi-omics hub gene, assess biomarker potential:
- Diagnostic: Gene expression distinguishes disease vs healthy
- Prognostic: Expression/variant predicts outcome (cancer prognostics from HPA)
- Predictive: Variant/expression predicts treatment response (pharmacogenomics)
- Evidence level: Number of supporting omics layers
Mechanistic Hypothesis Generation
From the integrated data:
- Identify the most supported biological processes (GO + pathways)
- Map causal chain: genetic variant -> gene expression -> protein function -> pathway disruption -> disease
- Identify intervention points (druggable nodes in the causal chain)
- Generate testable hypotheses
Confidence Score Calculation
Calculate the Multi-Omics Confidence Score (0-100) based on:
- Data availability across layers
- Cross-layer concordance
- Evidence quality
- Clinical validation
Phase 8: Report Finalization
Executive Summary
Write a 2-3 sentence synthesis covering:
- Disease mechanism in systems terms
- Key genes/pathways identified
- Therapeutic opportunities
Final Report Quality Checklist
Before presenting to user, verify:
- All 8 sections have content (or marked as "No data available")
- Every data point has a source citation
- Executive summary reflects key findings
- Multi-Omics Confidence Score calculated
- Top 20 genes ranked by multi-omics evidence
- Top 10 enriched pathways listed
- Biomarker candidates identified
- Cross-layer concordance table complete
- Therapeutic opportunities summarized
- Mechanistic hypotheses generated
- Data Availability Checklist complete
- Completeness Checklist complete
- References section lists all tools used
Tool Parameter Quick Reference
| Tool | Key Parameters | Notes |
|---|---|---|
OpenTargets_get_disease_id_description_by_name |
diseaseName |
Primary disambiguation |
OSL_get_efo_id_by_disease_name | disease | Secondary disambiguation
OpenTargets_get_associated_targets_by_disease_efoId | efoId | Returns top 25 genes
OpenTargets_get_evidence_by_datasource | efoId, ensemblId, datasourceIds[], size | Per-gene evidence
OpenTargets_search_gwas_studies_by_disease | diseaseIds[], size | GWAS studies
gwas_search_associations | disease_trait, size | GWAS Catalog
clinvar_search_variants | condition or gene, max_results | Rare variants
ExpressionAtlas_search_differential | condition, species | DEGs
expression_atlas_disease_target_score | efoId, pageSize (REQUIRED) | Expression scores
europepmc_disease_target_score | efoId, pageSize (REQUIRED) | Literature scores
HPA_get_rna_expression_by_source | gene_name, source_type, source_name (ALL REQUIRED) | Tissue expression
STRING_get_interaction_partners | protein_ids[], species (9606), limit | PPI partners
STRING_get_network | protein_ids[], species | PPI network
STRING_functional_enrichment | protein_ids[], species | Functional enrichment
STRING_ppi_enrichment | protein_ids[], species | Network significance
intact_search_interactions | query, max | Experimental PPIs
humanbase_ppi_analysis | gene_list[], tissue, max_node, interaction, string_mode (ALL REQ) | Tissue PPI
enrichr_gene_enrichment_analysis | gene_list[], libs[] (BOTH REQUIRED) | Pathway/GO enrichment
ReactomeAnalysis_pathway_enrichment | identifiers (space-sep string) | Reactome enrichment
Reactome_map_uniprot_to_pathways | id (UniProt accession) | Protein-pathway mapping
kegg_search_pathway | keyword | KEGG pathway search
WikiPathways_search | query, organism | WikiPathways search
GO_get_annotations_for_gene | gene_id | GO annotations
QuickGO_annotations_by_gene | gene_product_id (e.g., 'UniProtKB:P02649') | Detailed GO
OpenTargets_get_associated_drugs_by_disease_efoId | efoId, size (REQUIRED) | Disease drugs
OpenTargets_get_target_tractability_by_ensemblID | ensemblId | Druggability
search_clinical_trials | query_term (REQUIRED), condition, pageSize | Clinical trials
PubMed_search_articles | query, limit | Literature
ensembl_lookup_gene | gene_id, species ('homo_sapiens' REQUIRED) | Gene lookup
MyGene_query_genes | query, species, fields, size | Gene info
OpenTargets_get_similar_entities_by_disease_efoId | efoId, threshold, size (ALL REQUIRED) | Similar diseases
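Several rows in the table above flag parameters as required. The following is a minimal sketch of a pre-flight check that catches a missing required argument before a tool call is made. `REQUIRED_PARAMS` and `missing_params` are hypothetical names, not part of ToolUniverse, and the mapping is copied by hand from the flagged rows only. Where the flag trails a list of parameters (for example `efoId, pageSize (REQUIRED)`), the sketch assumes every listed parameter is required.

```python
# Hypothetical pre-flight check. The mapping is hand-copied from the rows the
# table flags as REQUIRED; it is an illustration, not an official schema.
REQUIRED_PARAMS = {
    "expression_atlas_disease_target_score": {"efoId", "pageSize"},
    "europepmc_disease_target_score": {"efoId", "pageSize"},
    "HPA_get_rna_expression_by_source": {"gene_name", "source_type", "source_name"},
    "humanbase_ppi_analysis": {
        "gene_list", "tissue", "max_node", "interaction", "string_mode",
    },
    "enrichr_gene_enrichment_analysis": {"gene_list", "libs"},
    "OpenTargets_get_associated_drugs_by_disease_efoId": {"efoId", "size"},
    "search_clinical_trials": {"query_term"},
    "ensembl_lookup_gene": {"gene_id", "species"},
    "OpenTargets_get_similar_entities_by_disease_efoId": {"efoId", "threshold", "size"},
}


def missing_params(tool_name, arguments):
    """Return the sorted required parameters that are absent from `arguments`."""
    required = REQUIRED_PARAMS.get(tool_name, set())
    return sorted(required - set(arguments))


# Example: only gene_name supplied for the HPA tool.
print(missing_params("HPA_get_rna_expression_by_source", {"gene_name": "APOE"}))
```

Tools that are not listed in the mapping are treated as having no flagged parameters, so the check returns an empty list for them.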
Response Format Notes (Verified)
OpenTargets Associated Targets
{
  "data": {
    "disease": {
      "id": "MONDO_0004975",
      "name": "Alzheimer disease",
      "associatedTargets": {
        "count": 2456,
        "rows": [
          {
            "target": {"id": "ENSG00000080815", "approvedSymbol": "PSEN1"},
            "score": 0.87
          }
        ]
      }
    }
  }
}
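As a rough illustration of how this shape might be consumed, the sketch below pulls a ranked list of (symbol, Ensembl ID, score) tuples out of the nested response. `top_targets` is a hypothetical helper, and the `response` dictionary simply repeats the example above rather than calling the tool.

```python
# Sample response copied from the documented shape above (no live call).
response = {
    "data": {
        "disease": {
            "id": "MONDO_0004975",
            "name": "Alzheimer disease",
            "associatedTargets": {
                "count": 2456,
                "rows": [
                    {
                        "target": {"id": "ENSG00000080815", "approvedSymbol": "PSEN1"},
                        "score": 0.87,
                    }
                ],
            },
        }
    }
}


def top_targets(resp, limit=25):
    """Return (symbol, ensembl_id, score) tuples, highest score first."""
    disease = (resp.get("data") or {}).get("disease") or {}
    rows = (disease.get("associatedTargets") or {}).get("rows") or []
    ranked = sorted(rows, key=lambda r: r.get("score", 0.0), reverse=True)
    return [
        (r["target"]["approvedSymbol"], r["target"]["id"], r["score"])
        for r in ranked[:limit]
    ]


print(top_targets(response))
```

The `or {}` guards mean that an empty or failed response yields an empty list instead of raising, which fits the fallback handling described later in this document.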
GWAS Catalog Associations
{
  "data": [
    {
      "association_id": 216440893,
      "p_value": 2e-09,
      "or_per_copy_num": 0.94,
      "or_value": "0.94",
      "efo_traits": [{"..."}],
      "risk_frequency": "NR"
    }
  ],
  "metadata": {"pagination": {"totalElements": 1061816}}
}
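A minimal sketch of filtering associations in this shape at the genome-wide significance threshold (p < 5e-8) that this skill recommends for common diseases. `significant_associations` is a hypothetical helper. The first sample record repeats the example above with the elided `efo_traits` field left out, and the second record is invented purely to show a rejected row.

```python
GENOME_WIDE_P = 5e-8  # threshold named in the edge-case guidance of this skill

response = {
    "data": [
        {
            "association_id": 216440893,
            "p_value": 2e-09,
            "or_per_copy_num": 0.94,
            "or_value": "0.94",
            "risk_frequency": "NR",
        },
        # Made-up record for illustration only; not real GWAS Catalog data.
        {"association_id": 0, "p_value": 1e-05},
    ],
    "metadata": {"pagination": {"totalElements": 1061816}},
}


def significant_associations(resp, threshold=GENOME_WIDE_P):
    """Keep associations with p_value below `threshold`, most significant first."""
    hits = [
        a for a in resp.get("data", [])
        if a.get("p_value") is not None and a["p_value"] < threshold
    ]
    return sorted(hits, key=lambda a: a["p_value"])


for assoc in significant_associations(response):
    print(assoc["association_id"], assoc["p_value"])
```

Note that `or_value` arrives as a string while `or_per_copy_num` is numeric in the example, so the numeric field is the safer one to sort or compare on.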
STRING Interactions
{
  "status": "success",
  "data": [
    {
      "stringId_A": "9606.ENSP00000252486",
      "stringId_B": "9606.ENSP00000466775",
      "preferredName_A": "APOE",
      "preferredName_B": "APOC2",
      "score": 0.999
    }
  ]
}
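One way hub genes could be derived from edges in this shape is a simple degree count over interactions above a confidence cutoff. The sketch below is an illustration only: `hub_genes` is a hypothetical helper, the 0.7 cutoff is an assumed choice, and every edge except the first (which repeats the example above) uses placeholder names that are not real STRING output.

```python
from collections import Counter

response = {
    "status": "success",
    "data": [
        {"preferredName_A": "APOE", "preferredName_B": "APOC2", "score": 0.999},
        # Placeholder edges for illustration only; not real STRING output.
        {"preferredName_A": "APOE", "preferredName_B": "GENE_X", "score": 0.9},
        {"preferredName_A": "APOE", "preferredName_B": "GENE_Y", "score": 0.8},
        {"preferredName_A": "GENE_X", "preferredName_B": "GENE_Y", "score": 0.5},
    ],
}


def hub_genes(resp, min_score=0.7, top_n=5):
    """Count edges per gene among interactions scoring at least `min_score`."""
    degree = Counter()
    for edge in resp.get("data", []):
        if edge.get("score", 0.0) >= min_score:
            degree[edge["preferredName_A"]] += 1
            degree[edge["preferredName_B"]] += 1
    return degree.most_common(top_n)


print(hub_genes(response))
```

Degree is the crudest centrality measure; it is used here because it needs no extra dependencies, and a graph library could replace it if betweenness or module detection is wanted.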
Reactome Enrichment
{
  "data": {
    "token": "...",
    "pathways_found": 154,
    "pathways": [
      {
        "pathway_id": "R-HSA-1251985",
        "name": "Nuclear signaling by ERBB4",
        "species": "Homo sapiens",
        "is_disease": false,
        "is_lowest_level": true,
        "entities_found": 3,
        "entities_total": 47,
        "entities_ratio": 0.00291,
        "p_value": 4.0e-06,
        "fdr": 0.00068,
        "reactions_found": 3,
        "reactions_total": 34
      }
    ]
  }
}
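A short sketch of selecting significant pathways from a response in this shape. `significant_pathways` is a hypothetical helper and the FDR cutoff of 0.05 is an assumed default, not a value prescribed by this skill. The sample data repeats the example above.

```python
response = {
    "data": {
        "token": "...",
        "pathways_found": 154,
        "pathways": [
            {
                "pathway_id": "R-HSA-1251985",
                "name": "Nuclear signaling by ERBB4",
                "species": "Homo sapiens",
                "is_disease": False,
                "is_lowest_level": True,
                "entities_found": 3,
                "entities_total": 47,
                "entities_ratio": 0.00291,
                "p_value": 4.0e-06,
                "fdr": 0.00068,
                "reactions_found": 3,
                "reactions_total": 34,
            }
        ],
    }
}


def significant_pathways(resp, max_fdr=0.05):
    """Return pathways with fdr <= max_fdr, sorted by ascending fdr."""
    pathways = (resp.get("data") or {}).get("pathways") or []
    hits = [p for p in pathways if p.get("fdr", 1.0) <= max_fdr]
    return sorted(hits, key=lambda p: p["fdr"])


for pathway in significant_pathways(response):
    print(pathway["pathway_id"], pathway["name"], pathway["fdr"])
```

Filtering on `fdr` rather than the raw `p_value` accounts for the many pathways tested at once (154 were returned in the example).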
HPA RNA Expression
{
  "status": "success",
  "data": {
    "gene_name": "APOE",
    "source_type": "tissue",
    "source_name": "brain",
    "expression_value": "2714.9",
    "expression_level": "very high",
    "expression_unit": "nTPM"
  }
}
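In the example above `expression_value` is a string, so it needs converting before any numeric comparison across tissues. The sketch below shows one way to do that defensively; `expression_ntpm` is a hypothetical helper and the sample repeats the documented response.

```python
response = {
    "status": "success",
    "data": {
        "gene_name": "APOE",
        "source_type": "tissue",
        "source_name": "brain",
        "expression_value": "2714.9",
        "expression_level": "very high",
        "expression_unit": "nTPM",
    },
}


def expression_ntpm(resp):
    """Return expression_value as a float, or None if absent or unparseable."""
    data = resp.get("data") or {}
    try:
        return float(data["expression_value"])
    except (KeyError, TypeError, ValueError):
        return None


print(expression_ntpm(response))
```

Returning `None` instead of raising lets a caller record the tissue as "no data" and continue with the remaining tissues.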
Enrichr Results
{
  "status": "success",
  "data": "{\"connected_paths\": {\"Path: ...\": \"Total Weight: ...\"}}"
}
NOTE: The data field is a JSON-encoded string, not an object. Parse it before accessing connected_paths.
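A minimal sketch of that parsing step using the standard library. `parse_enrichr` is a hypothetical helper, and the sample response repeats the example above, including its elided path and weight strings.

```python
import json

response = {
    "status": "success",
    "data": "{\"connected_paths\": {\"Path: ...\": \"Total Weight: ...\"}}",
}


def parse_enrichr(resp):
    """Decode the `data` field if it is a JSON string; pass dicts through."""
    data = resp.get("data")
    if isinstance(data, str):
        data = json.loads(data)
    return data or {}


parsed = parse_enrichr(response)
for path, weight in parsed["connected_paths"].items():
    print(path, "->", weight)
```

The `isinstance` check keeps the helper safe to call twice, or on a response whose `data` has already been decoded upstream.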
Common Use Patterns
1. Comprehensive Disease Profiling
User: "Characterize Alzheimer's disease across omics layers"
-> Run all 8 phases
-> Produce full multi-omics report
2. Therapeutic Target Discovery
User: "What are druggable targets for rheumatoid arthritis?"
-> Emphasize Phase 1 (genomics), Phase 6 (therapeutics), Phase 7 (integration)
-> Focus on tractability and clinical precedent
3. Biomarker Identification
User: "Find diagnostic biomarkers for pancreatic cancer"
-> Emphasize Phase 2 (transcriptomics), Phase 3 (proteomics), Phase 7 (biomarkers)
-> Focus on tissue-specific expression and diagnostic potential
4. Mechanism Elucidation
User: "What pathways are dysregulated in Crohn's disease?"
-> Emphasize Phase 4 (pathways), Phase 5 (GO), Phase 7 (mechanistic hypotheses)
-> Focus on pathway enrichment and cross-pathway connections
5. Drug Repurposing
User: "What existing drugs could be repurposed for ALS?"
-> Emphasize Phase 1 (genetics), Phase 6 (therapeutic landscape), Phase 7 (repurposing)
-> Focus on drugs targeting disease-associated genes
6. Systems Biology
User: "What are the hub genes and key pathways in type 2 diabetes?"
-> Emphasize Phase 3 (PPI network), Phase 4 (pathways), Phase 7 (network analysis)
-> Focus on hub genes and network modules
Edge Case Handling
Rare Diseases (limited data)
- Genomics layer may dominate (single gene)
- Limited GWAS data (monogenic)
- Focus on ClinVar variants, pathway consequences
- Confidence score will be lower (less cross-layer data)
Common Diseases (overwhelming data)
- Thousands of GWAS associations
- Prioritize by effect size and significance
- Focus on top 20-30 genes for downstream analysis
- Use strict significance thresholds (p < 5e-8)
Cancer
- Include somatic mutations (if CIViC/cBioPortal available)
- Check cancer prognostics via HPA
- Include tumor-specific expression patterns
- Clinical trial landscape may be extensive
Monogenic Diseases
- Single gene dominates
- ClinVar/OMIM evidence is primary
- Pathway analysis reveals downstream effects
- Therapeutic landscape may be limited (gene therapy, enzyme replacement)
Polygenic Diseases
- Many weak genetic signals
- GWAS provides the gene list
- Pathway enrichment reveals convergent biology
- Network analysis identifies hub genes
Tissue Ambiguity
- Diseases affecting multiple tissues
- Query HPA for all relevant tissues
- Compare tissue-specific expression patterns
- Use tissue context from disease ontology
Fallback Strategies
If disease name not found
- Try synonyms
- Try broader disease category
- Try OMIM/UMLS ID mapping
- Report disambiguation failure and ask user
If no GWAS data
- Check ClinVar for rare variants
- Use OpenTargets genetic evidence
- Note in report as "Limited genetic data"
- Adjust confidence score accordingly
If no expression data
- Try different disease name/synonym
- Check HPA for individual gene expression
- Use OpenTargets expression evidence
- Note as "Limited transcriptomics data"
If no pathway enrichment
- Reduce gene list stringency
- Try different pathway databases
- Map individual genes to pathways via Reactome
- Note as "No significant pathway enrichment"
If no drugs found
- Check if disease is rare/orphan
- Look for drugs targeting individual genes
- Check clinical trials for investigational therapies
- Note as "No approved drugs - novel therapeutic opportunity"
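The fallback strategies above share one pattern: try sources in priority order, use the first that returns data, and otherwise record a note in the report. A rough sketch of that pattern follows. `first_nonempty` is a hypothetical helper, and the lambdas are stand-ins for real tool calls, which are not shown here.

```python
def first_nonempty(strategies, note_if_empty):
    """Try (label, callable) pairs in order and return the first non-empty result.

    A strategy that raises is treated as having produced no data.
    """
    for label, fetch in strategies:
        try:
            result = fetch()
        except Exception:  # broad on purpose in this sketch: any failure means "skip"
            continue
        if result:
            return {"source": label, "result": result, "note": None}
    return {"source": None, "result": None, "note": note_if_empty}


# Stand-in callables; a real pipeline would wrap the corresponding tool calls.
genetics = first_nonempty(
    [
        ("GWAS Catalog", lambda: []),                       # no GWAS data
        ("ClinVar", lambda: [{"variant": "placeholder"}]),  # rare-variant fallback
        ("OpenTargets genetic evidence", lambda: []),
    ],
    note_if_empty="Limited genetic data",
)
print(genetics["source"], genetics["note"])
```

Keeping the winning `source` label alongside the result supports the requirement that every statement in the report cite the tool or database it came from.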