tooluniverse-multiomic-disease-characterization

Originally from mims-harvard/tooluniverse

Multi-Omics Disease Characterization Pipeline

Characterize diseases across multiple molecular layers (genomics, transcriptomics, proteomics, pathways) to provide systems-level understanding of disease mechanisms, identify therapeutic opportunities, and discover biomarker candidates.

KEY PRINCIPLES:

Report-first approach - Create report file FIRST, then populate progressively
Disease disambiguation FIRST - Resolve all identifiers before omics analysis
Layer-by-layer analysis - Systematically cover all omics layers
Cross-layer integration - Identify genes/targets appearing in multiple layers
Evidence grading - Grade all evidence from T1 (human/clinical) to T4 (annotation/prediction only)
Tissue context - Emphasize disease-relevant tissues/organs
Quantitative scoring - Multi-Omics Confidence Score (0-100)
Druggable focus - Prioritize targets with therapeutic potential
Biomarker identification - Highlight diagnostic/prognostic markers
Mechanistic synthesis - Generate testable hypotheses
Source references - Every statement must cite tool/database
Completeness checklist - Mandatory section showing analysis coverage
English-first queries - Always use English terms in tool calls. Respond in user's language

When to Use This Skill

Apply when users:

Ask about disease mechanisms across omics layers
Need multi-omics characterization of a disease
Want to understand disease at the systems biology level
Ask "What pathways/genes/proteins are involved in [disease]?"
Need biomarker discovery for a disease
Want to identify druggable targets from disease profiling
Ask for integrated genomics + transcriptomics + proteomics analysis
Need cross-layer concordance analysis
Ask about disease network biology / hub genes

NOT for (use other skills instead):

Single gene/target validation -> Use tooluniverse-drug-target-validation
Drug safety profiling -> Use tooluniverse-adverse-event-detection
General disease overview -> Use tooluniverse-disease-research
Variant interpretation -> Use tooluniverse-variant-interpretation
GWAS-specific analysis -> Use tooluniverse-gwas-* skills
Pathway-only analysis -> Use tooluniverse-systems-biology

Input Parameters

| Parameter | Required | Description | Example |
|-----------|----------|-------------|---------|
| disease | Yes | Disease name, OMIM ID, EFO ID, or MONDO ID | `Alzheimer disease`, `MONDO_0004975` |
| tissue | No | Tissue/organ of interest | `brain`, `liver`, `blood` |
| focus_layers | No | Specific omics layers to emphasize | `genomics`, `transcriptomics`, `pathways` |

Multi-Omics Confidence Score (0-100)

Score Components

Data Availability (0-40 points):

Genomics data available (GWAS or rare variants): 10 points
Transcriptomics data available (DEGs or expression): 10 points
Protein data available (PPI or expression): 5 points
Pathway data available (enriched pathways): 10 points
Clinical/drug data available (approved drugs or trials): 5 points

Evidence Concordance (0-40 points):

Multi-layer genes (appear in 3+ layers): up to 20 points (2 per gene, max 10 genes)
Consistent direction (genetics + expression concordant): 10 points
Pathway-gene concordance (genes found in enriched pathways): 10 points

Evidence Quality (0-20 points):

Strong genetic evidence (GWAS p < 5e-8): 10 points
Clinical validation (approved drugs): 10 points

Score Interpretation

| Score | Tier | Interpretation |
|-------|------|----------------|
| 80-100 | Excellent | Comprehensive multi-omics coverage, high confidence, strong cross-layer concordance |
| 60-79 | Good | Good coverage across most layers, some gaps |
| 40-59 | Moderate | Moderate coverage, limited cross-layer integration |
| 0-39 | Limited | Limited data, single-layer analysis dominates |
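
The rubric above can be sketched as a small function. This is a minimal illustration, not part of the skill itself: the `ScoreInputs` container and its field names are hypothetical, and it assumes each non-gene component is awarded all-or-nothing, which the rubric does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class ScoreInputs:
    """Hypothetical container for the facts the rubric asks about."""
    has_genomics: bool = False            # GWAS or rare variants found
    has_transcriptomics: bool = False     # DEGs or expression data found
    has_protein: bool = False             # PPI or protein expression found
    has_pathways: bool = False            # enriched pathways found
    has_clinical: bool = False            # approved drugs or trials found
    multi_layer_genes: int = 0            # genes appearing in 3+ layers
    direction_concordant: bool = False    # genetics and expression agree
    pathway_gene_concordant: bool = False # genes found in enriched pathways
    has_gwas_significant: bool = False    # any GWAS hit with p < 5e-8
    has_approved_drug: bool = False       # clinical validation


def confidence_score(x: ScoreInputs) -> int:
    """Sum the three rubric blocks (40 + 40 + 20 = 100 points maximum)."""
    availability = (10 * x.has_genomics + 10 * x.has_transcriptomics
                    + 5 * x.has_protein + 10 * x.has_pathways
                    + 5 * x.has_clinical)
    # 2 points per multi-layer gene, capped at 10 genes.
    concordance = (2 * min(max(x.multi_layer_genes, 0), 10)
                   + 10 * x.direction_concordant
                   + 10 * x.pathway_gene_concordant)
    quality = 10 * x.has_gwas_significant + 10 * x.has_approved_drug
    return availability + concordance + quality


def score_tier(score: int) -> str:
    """Map a 0-100 score to the tier names in the interpretation table."""
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Moderate"
    return "Limited"
```

Keeping the three blocks as separate sums makes it easy to fill the per-component rows of the score table in the report.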

Evidence Grading System

| Tier | Symbol | Criteria | Examples |
|------|--------|----------|----------|
| T1 | [T1] | Direct human evidence, clinical proof | FDA-approved drug, GWAS hit (p<5e-8), clinical trial result |
| T2 | [T2] | Experimental evidence | Differential expression (validated), functional screen, mouse KO |
| T3 | [T3] | Computational/database evidence | PPI network, pathway mapping, expression correlation |
| T4 | [T4] | Annotation/prediction only | GO annotation, text-mined association, predicted interaction |

Report Template

Create this file structure at the start: {disease_name}_multiomic_report.md

# Multi-Omics Disease Characterization: {Disease Name}

**Report Generated**: {date}
**Disease Identifiers**: (to be filled)
**Multi-Omics Confidence Score**: (to be calculated)

---

## Executive Summary

(2-3 sentence disease mechanism synthesis - fill after all layers complete)

---

## 1. Disease Definition & Context

### Disease Identifiers
| System | ID | Source |
|--------|-----|--------|

### Description
### Synonyms
### Disease Hierarchy (parents/children)
### Affected Tissues/Organs
### Therapeutic Areas

**Sources**: (tools used)

---

## 2. Genomics Layer

### 2.1 GWAS Associations
| SNP | P-value | Effect | Gene | Study | Source |
|-----|---------|--------|------|-------|--------|

### 2.2 GWAS Studies Summary
| Study ID | Trait | Sample Size | Year | Source |
|----------|-------|-------------|------|--------|

### 2.3 Associated Genes (Genetic Evidence)
| Gene | Ensembl ID | Association Score | Evidence Type | Source |
|------|------------|-------------------|---------------|--------|

### 2.4 Rare Variants (ClinVar)
| Variant | Gene | Clinical Significance | Source |
|---------|------|-----------------------|--------|

### Genomics Layer Summary
- Total GWAS hits:
- Top genes by genetic evidence:
- Genetic architecture:

**Sources**: (tools used)

---

## 3. Transcriptomics Layer

### 3.1 Differential Expression Studies
| Experiment | Condition | Up-regulated | Down-regulated | Source |
|------------|-----------|--------------|----------------|--------|

### 3.2 Expression Atlas Disease Evidence
| Gene | Score | Source |
|------|-------|--------|

### 3.3 Tissue Expression Patterns (GTEx/HPA)
| Gene | Tissue | Expression Level | Source |
|------|--------|-----------------|--------|

### 3.4 Biomarker Candidates (Expression-Based)
| Gene | Tissue Specificity | Fold Change | Evidence | Source |
|------|-------------------|-------------|----------|--------|

### Transcriptomics Layer Summary
- Differential expression datasets:
- Top DEGs:
- Tissue-specific patterns:

**Sources**: (tools used)

---

## 4. Proteomics & Interaction Layer

### 4.1 Protein-Protein Interactions (STRING)
| Protein A | Protein B | Score | Source |
|-----------|-----------|-------|--------|

### 4.2 Hub Genes (Network Centrality)
| Gene | Degree | Betweenness | Role | Source |
|------|--------|-------------|------|--------|

### 4.3 Protein Complexes (IntAct)
| Complex | Members | Function | Source |
|---------|---------|----------|--------|

### 4.4 Tissue-Specific PPI Network
| Gene | Interaction Score | Tissue | Source |
|------|-------------------|--------|--------|

### Proteomics Layer Summary
- Total PPIs:
- Hub genes:
- Network modules:

**Sources**: (tools used)

---

## 5. Pathway & Network Layer

### 5.1 Enriched Pathways (Enrichr/Reactome)
| Pathway | Database | P-value | Genes | Source |
|---------|----------|---------|-------|--------|

### 5.2 Reactome Pathway Details
| Pathway ID | Name | Genes Involved | Source |
|------------|------|----------------|--------|

### 5.3 KEGG Pathways
| Pathway ID | Name | Description | Source |
|------------|------|-------------|--------|

### 5.4 WikiPathways
| Pathway ID | Name | Organism | Source |
|------------|------|----------|--------|

### Pathway Layer Summary
- Top enriched pathways:
- Key pathway nodes:
- Cross-pathway connections:

**Sources**: (tools used)

---

## 6. Gene Ontology & Functional Annotation

### 6.1 Biological Processes
| GO Term | Name | P-value | Genes | Source |
|---------|------|---------|-------|--------|

### 6.2 Molecular Functions
| GO Term | Name | P-value | Genes | Source |
|---------|------|---------|-------|--------|

### 6.3 Cellular Components
| GO Term | Name | P-value | Genes | Source |
|---------|------|---------|-------|--------|

**Sources**: (tools used)

---

## 7. Therapeutic Landscape

### 7.1 Approved Drugs
| Drug | ChEMBL ID | Mechanism | Target | Phase | Source |
|------|-----------|-----------|--------|-------|--------|

### 7.2 Druggable Targets
| Gene | Tractability | Modality | Clinical Precedent | Source |
|------|-------------|----------|-------------------|--------|

### 7.3 Drug Repurposing Candidates
| Drug | Original Indication | Mechanism | Target | Source |
|------|---------------------|-----------|--------|--------|

### 7.4 Clinical Trials
| NCT ID | Title | Phase | Status | Intervention | Source |
|--------|-------|-------|--------|--------------|--------|

### Therapeutic Summary
- Approved drugs:
- Clinical pipeline:
- Novel targets:

**Sources**: (tools used)

---

## 8. Multi-Omics Integration

### 8.1 Cross-Layer Gene Concordance
| Gene | Genomics | Transcriptomics | Proteomics | Pathways | Layers | Evidence Tier |
|------|----------|-----------------|------------|----------|--------|---------------|

### 8.2 Multi-Omics Hub Genes (Top 20)
| Rank | Gene | Layers Found | Key Evidence | Druggable | Source |
|------|------|-------------|--------------|-----------|--------|

### 8.3 Biomarker Candidates
| Biomarker | Type | Evidence Layers | Confidence | Source |
|-----------|------|-----------------|------------|--------|

### 8.4 Mechanistic Hypotheses
1. (Hypothesis with supporting evidence from multiple layers)
2. ...

### 8.5 Systems-Level Insights
- Key disrupted processes:
- Critical pathway nodes:
- Therapeutic intervention points:
- Testable hypotheses:

---

## Multi-Omics Confidence Score

| Component | Points | Max | Details |
|-----------|--------|-----|---------|
| Genomics data | | 10 | |
| Transcriptomics data | | 10 | |
| Protein data | | 5 | |
| Pathway data | | 10 | |
| Clinical data | | 5 | |
| Multi-layer genes | | 20 | |
| Direction concordance | | 10 | |
| Pathway-gene concordance | | 10 | |
| Genetic evidence quality | | 10 | |
| Clinical validation | | 10 | |
| **TOTAL** | | **100** | |

**Score**: XX/100 - [Tier]

---

## Data Availability Checklist

| Omics Layer | Data Available | Tools Used | Findings |
|-------------|---------------|------------|----------|
| Genomics (GWAS) | Yes/No | | |
| Genomics (Rare Variants) | Yes/No | | |
| Transcriptomics (DEGs) | Yes/No | | |
| Transcriptomics (Expression) | Yes/No | | |
| Proteomics (PPI) | Yes/No | | |
| Proteomics (Expression) | Yes/No | | |
| Pathways (Enrichment) | Yes/No | | |
| Pathways (KEGG/Reactome) | Yes/No | | |
| Gene Ontology | Yes/No | | |
| Drugs/Therapeutics | Yes/No | | |
| Clinical Trials | Yes/No | | |
| Literature | Yes/No | | |

---

## Completeness Checklist

- [ ] Disease disambiguation complete (IDs resolved)
- [ ] Genomics layer analyzed (GWAS + variants)
- [ ] Transcriptomics layer analyzed (DEGs + expression)
- [ ] Proteomics layer analyzed (PPI + interactions)
- [ ] Pathway layer analyzed (enrichment + mapping)
- [ ] Gene Ontology analyzed (BP + MF + CC)
- [ ] Therapeutic landscape analyzed (drugs + targets + trials)
- [ ] Cross-layer integration complete (concordance analysis)
- [ ] Multi-Omics Confidence Score calculated
- [ ] Biomarker candidates identified
- [ ] Hub genes identified
- [ ] Mechanistic hypotheses generated
- [ ] Executive summary written
- [ ] All sections have source citations

---

## References

### Data Sources Used
| # | Tool | Parameters | Section | Items Retrieved |
|---|------|------------|---------|-----------------|

### Database Versions
- OpenTargets: (current)
- GWAS Catalog: (current)
- STRING: (current)
- Reactome: (current)
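
The report-first principle (create the file before any analysis) could be sketched as below. The function name is hypothetical, only the template header is written here, and replacing spaces with underscores in the file name is an assumption; the remaining template sections would be appended the same way.

```python
from datetime import date
from pathlib import Path


def create_report_skeleton(disease_name: str, out_dir: str = ".") -> Path:
    """Write the report header to {disease_name}_multiomic_report.md."""
    # Assumption: spaces in the disease name become underscores.
    safe_name = disease_name.strip().replace(" ", "_")
    path = Path(out_dir) / f"{safe_name}_multiomic_report.md"
    header = "\n".join([
        f"# Multi-Omics Disease Characterization: {disease_name}",
        "",
        f"**Report Generated**: {date.today().isoformat()}",
        "**Disease Identifiers**: (to be filled)",
        "**Multi-Omics Confidence Score**: (to be calculated)",
        "",
    ])
    path.write_text(header, encoding="utf-8")
    return path
```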

Phase 0: Disease Disambiguation (ALWAYS FIRST)

Objective: Resolve disease to standard identifiers for all downstream queries.

Tools Used

OpenTargets_get_disease_id_description_by_name (primary):

Input: diseaseName (string) - Disease name
Output: {data: {search: {hits: [{id, name, description}]}}}
Use: Get MONDO/EFO IDs and description
CRITICAL: Disease IDs from OpenTargets use underscore format (e.g., MONDO_0004975), NOT colon format

OSL_get_efo_id_by_disease_name (secondary):

Input: disease (string) - Disease name
Output: {efo_id, name}
Use: Get EFO/MONDO ID

OpenTargets_get_disease_description_by_efoId:

Input: efoId (string) - Disease ID (e.g., MONDO_0004975)
Output: {data: {disease: {id, name, description, dbXRefs}}}
Use: Get full description, cross-references (OMIM, UMLS, DOID, etc.)

OpenTargets_get_disease_synonyms_by_efoId:

Input: efoId (string)
Output: {data: {disease: {id, name, synonyms: [{relation, terms}]}}}

OpenTargets_get_disease_therapeutic_areas_by_efoId:

Input: efoId (string)
Output: {data: {disease: {id, name, therapeuticAreas: [{id, name}]}}}

OpenTargets_get_disease_ancestors_parents_by_efoId:

Input: efoId (string)
Output: {data: {disease: {id, name, ancestors: [{id, name}]}}}

OpenTargets_get_disease_descendants_children_by_efoId:

Input: efoId (string)
Output: {data: {disease: {id, name, descendants: [{id, name}]}}}

OpenTargets_map_any_disease_id_to_all_other_ids:

Input: inputId (string) - Any known disease ID (e.g., OMIM:104300, UMLS:C0002395)
Output: {data: {disease: {id, name, dbXRefs: [str], ...}}}
Use: Cross-map between OMIM, UMLS, ICD10, DOID, etc.

Workflow

Search by disease name to get primary ID (OpenTargets)
Get full description and cross-references
Get synonyms for search term expansion
Get therapeutic areas for context
Get disease hierarchy (parents/children)
If user provided OMIM/other ID, map to MONDO/EFO first

Collision-Aware Search

When disease name returns multiple hits:

Check if user's input matches any hit exactly
If ambiguous, present top 3-5 options and ask user to select
Always prefer the most specific disease (not parent categories)
For cancer, prefer the specific tumor type over generic "cancer"
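
A minimal sketch of the first two rules, assuming `hits` is the list found under `data.search.hits` in the disambiguation response. The function name is hypothetical. Specificity (the last two rules) is left to the user's choice, since it cannot be judged from `name` alone.

```python
from typing import Optional


def pick_disease_hit(query: str, hits: list[dict],
                     max_options: int = 5) -> tuple[Optional[dict], list[dict]]:
    """Return (chosen_hit, options_to_present).

    A hit is chosen only when it is the sole hit or the single exact
    (case-insensitive) name match; otherwise the top options are returned
    so the user can select one.
    """
    if len(hits) == 1:
        return hits[0], []
    wanted = query.strip().lower()
    exact = [h for h in hits if h.get("name", "").strip().lower() == wanted]
    if len(exact) == 1:
        return exact[0], []
    # Ambiguous (or no hits at all): hand back up to max_options candidates.
    return None, hits[:max_options]
```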

Key Disease IDs to Track

After disambiguation, store these for all downstream queries:

efo_id - Primary ID for OpenTargets queries (e.g., MONDO_0004975)
disease_name - Canonical name (e.g., Alzheimer disease)
synonyms - For literature search expansion
therapeutic_areas - For context
dbXRefs - Cross-references (OMIM, UMLS, DOID, etc.)
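
One way to hold these values is a plain dictionary built from the description response. The helper names below are hypothetical. The colon-to-underscore conversion applies only to the primary `efo_id`; cross-references such as `OMIM:104300` keep their colon form, as in the inputs to OpenTargets_map_any_disease_id_to_all_other_ids.

```python
def to_opentargets_id(disease_id: str) -> str:
    """Convert 'MONDO:0004975' to 'MONDO_0004975'; underscore IDs pass through."""
    return disease_id.strip().replace(":", "_", 1)


def build_disease_context(description_response: dict,
                          synonyms=None, therapeutic_areas=None) -> dict:
    """Collect the identifiers that downstream phases reuse.

    Expects the documented shape {data: {disease: {id, name, dbXRefs}}}.
    """
    disease = description_response["data"]["disease"]
    return {
        "efo_id": to_opentargets_id(disease["id"]),
        "disease_name": disease["name"],
        "synonyms": list(synonyms or []),
        "therapeutic_areas": list(therapeutic_areas or []),
        "dbXRefs": list(disease.get("dbXRefs") or []),
    }
```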

Phase 1: Genomics Layer

Objective: Identify genetic variants, GWAS associations, and genetically implicated genes.

Tools Used

OpenTargets_get_associated_targets_by_disease_efoId (primary):

Input: efoId (string) - Disease EFO/MONDO ID
Output: {data: {disease: {id, name, associatedTargets: {count, rows: [{target: {id, approvedSymbol}, score}]}}}}
Use: Get ALL disease-associated genes ranked by overall evidence score
NOTE: Returns top 25 by default. For comprehensive analysis, note the total count

OpenTargets_get_evidence_by_datasource:

Input: efoId (string), ensemblId (string), optional datasourceIds (array), size (int, default 50)
Output: {data: {disease: {evidences: {count, rows: [{...evidence details}]}}}}
Use: Get specific evidence types. Key datasourceIds for genomics:
- ['ot_genetics_portal'] - GWAS/genetics
- ['gene2phenotype', 'genomics_england', 'orphanet'] - Rare variants
- ['eva'] - ClinVar variants

gwas_search_associations (GWAS Catalog):

Input: disease_trait (string), size (int, default 20)
Output: {data: [{association_id, p_value, or_per_copy_num, or_value, beta, risk_frequency, efo_traits: [{...}], ...}], metadata: {pagination: {totalElements}}}
Use: Get genome-wide significant associations
NOTE: Use disease name (e.g., "Alzheimer"), not ID. Returns paginated results

gwas_get_studies_for_trait:

Input: disease_trait (string), size (int)
Output: {data: [...studies], metadata: {pagination}}
NOTE: May return empty if trait name does not match exactly. Try synonyms

gwas_get_variants_for_trait:

Input: disease_trait (string), size (int)
Output: {data: [...variants], metadata: {pagination}}

GWAS_search_associations_by_gene:

Input: gene_name (string)
Output: Associations for a specific gene

OpenTargets_search_gwas_studies_by_disease:

Input: diseaseIds (array of strings), enableIndirect (bool, default true), size (int, default 10)
Output: {data: {studies: {count, rows: [{id, studyType, traitFromSource, publicationFirstAuthor, publicationDate, pubmedId, nSamples, nCases, nControls, ...}]}}}
Use: Get GWAS studies from OpenTargets genetics portal

clinvar_search_variants:

Input: condition (string) or gene (string), optional max_results (int)
Output: List of ClinVar variants with clinical significance
Use: Rare variant / monogenic disease evidence

Workflow

Get associated genes from OpenTargets (overall scores)
For top 10-15 genes, get genetic evidence specifically via OpenTargets_get_evidence_by_datasource
Search GWAS Catalog for associations
Search OpenTargets GWAS studies
Search ClinVar for rare variants
For top GWAS genes, check GWAS_search_associations_by_gene

Gene Tracking

Maintain a dictionary of genes found in genomics layer:

genomics_genes = {
    'PSEN1': {'score': 0.87, 'evidence': 'genetic', 'ensembl_id': 'ENSG00000080815', 'layer': 'genomics'},
    'APP': {'score': 0.82, 'evidence': 'genetic', 'ensembl_id': 'ENSG00000142192', 'layer': 'genomics'},
    # ...
}
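
A sketch of filling this dictionary from the OpenTargets_get_associated_targets_by_disease_efoId response shape documented above. The function name is hypothetical. The `evidence` label is supplied by the caller, on the assumption that genetic evidence has been confirmed separately, because the overall association score alone does not establish it.

```python
def track_genomics_genes(response: dict, evidence: str = "genetic") -> dict:
    """Map approvedSymbol -> tracking entry, from associatedTargets.rows."""
    rows = response["data"]["disease"]["associatedTargets"]["rows"]
    genes = {}
    for row in rows:
        target = row["target"]
        genes[target["approvedSymbol"]] = {
            "score": row["score"],
            # Assumption: the caller has checked the evidence type, e.g. via
            # OpenTargets_get_evidence_by_datasource, before using 'genetic'.
            "evidence": evidence,
            "ensembl_id": target["id"],
            "layer": "genomics",
        }
    return genes
```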

Phase 2: Transcriptomics Layer

Objective: Identify differentially expressed genes, tissue-specific expression, and expression-based biomarkers.

Tools Used

ExpressionAtlas_search_differential:

Input: optional gene (string), condition (string), species (string, default 'homo sapiens')
Output: Differential expression studies and results
Use: Find studies where genes are differentially expressed in disease

ExpressionAtlas_search_experiments:

Input: optional gene (string), condition (string), species (string)
Output: Expression experiments relevant to condition
Use: Find all Expression Atlas experiments for the disease

expression_atlas_disease_target_score:

Input: efoId (string), pageSize (int, required)
Output: Genes scored by expression evidence for the disease
Use: Get expression-based disease-gene association scores

europepmc_disease_target_score:

Input: efoId (string), pageSize (int, required)
Output: Genes scored by literature evidence for the disease
Use: Complement expression evidence with literature-mined associations

HPA_get_rna_expression_by_source (Human Protein Atlas):

Input: gene_name (string), source_type (string: 'tissue', 'blood', 'brain', 'cell_line', or 'single_cell'), source_name (string: e.g., 'brain', 'liver')
Output: {status, data: {gene_name, source_type, source_name, expression_value, expression_level, expression_unit}}
NOTE: ALL 3 params required. source_type options: 'tissue', 'blood', 'brain', 'cell_line', 'single_cell'

HPA_get_rna_expression_in_specific_tissues:

Input: gene_name (string), tissues (array of strings)
Output: Expression across specified tissues

HPA_get_cancer_prognostics_by_gene:

Input: gene_name (string)
Output: Cancer prognostic data (if cancer context)

HPA_get_subcellular_location:

Input: gene_name (string)
Output: Subcellular localization data

HPA_search_genes_by_query:

Input: query (string)
Output: Matching genes in HPA

Workflow

Search Expression Atlas for differential expression studies
Get expression-based disease scores
Get literature-based disease scores (EuropePMC)
For top 10-15 genes from genomics layer, check tissue expression via HPA
Check disease-relevant tissue expression patterns
For cancer: check prognostic biomarkers

Gene Tracking

Add transcriptomics genes to tracking:

transcriptomics_genes = {
    'APOE': {'expression_score': 0.75, 'tissues': ['brain'], 'evidence': 'differential_expression', 'layer': 'transcriptomics'},
    # ...
}

Phase 3: Proteomics & Interaction Layer

Objective: Map protein-protein interactions, identify hub genes, and characterize interaction networks.

Tools Used

STRING_get_interaction_partners (primary PPI):

Input: protein_ids (array of strings - gene names work), species (int, default 9606), confidence_score (float, default 0.4), limit (int, default 20)
Output: {status: 'success', data: [{stringId_A, stringId_B, preferredName_A, preferredName_B, ncbiTaxonId, score, nscore, fscore, pscore, ascore, escore, dscore, tscore}]}
Use: Get interaction partners for disease genes
NOTE: protein_ids is an array, NOT string. Gene symbols like ['APOE'] work

STRING_get_network:

Input: protein_ids (array), species (int), confidence_score (float)
Output: Network of interactions between input proteins
Use: Build disease-specific PPI network

STRING_functional_enrichment:

Input: protein_ids (array), species (int)
Output: Functional enrichment results (GO, KEGG, etc.)
Use: Functional characterization of disease gene set

STRING_ppi_enrichment:

Input: protein_ids (array), species (int)
Output: Statistical test for PPI enrichment (more interactions than expected)
Use: Test if disease genes form a connected module

intact_get_interactions:

Input: identifier (string - UniProt ID or gene name)
Output: Molecular interaction data from IntAct

intact_search_interactions:

Input: query (string), first (int, default 0), max (int, default 25)
Output: Search results for interactions

HPA_get_protein_interactions_by_gene:

Input: gene_name (string)
Output: {gene, interactions, interactor_count, interactors: [...]}

humanbase_ppi_analysis:

Input: gene_list (array), tissue (string), max_node (int), interaction (string), string_mode (bool)
Output: Tissue-specific PPI network
NOTE: ALL params required. interaction options: 'coexpression', 'interaction', 'coexpression_and_interaction'. string_mode: true/false

Workflow

Take top 15-20 genes from genomics + transcriptomics layers
Query STRING for interaction partners of each gene
Build composite PPI network using STRING_get_network
Test PPI enrichment (are genes more connected than random?)
Get functional enrichment from STRING
For disease-relevant tissue, get tissue-specific network (HumanBase)
Identify hub genes (highest degree centrality)
Check IntAct for experimentally validated interactions

Hub Gene Analysis

Calculate network centrality metrics:

Degree: Number of interaction partners
Betweenness: Number of shortest paths through node
Hub score: Genes with degree > mean + 1 SD are hubs
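
The degree and hub rules can be sketched in plain Python over STRING-style rows (`preferredName_A`, `preferredName_B`). The function names are hypothetical, and the sample standard deviation is an assumption, since the rule does not say which SD is meant. Betweenness is not computed here; it needs a graph library or a shortest-path pass.

```python
from collections import defaultdict
from statistics import mean, stdev


def degree_by_gene(interactions: list[dict]) -> dict[str, int]:
    """Count distinct interaction partners per gene (duplicate edges ignored)."""
    neighbors = defaultdict(set)
    for edge in interactions:
        a, b = edge["preferredName_A"], edge["preferredName_B"]
        if a == b:
            continue  # skip self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {gene: len(partners) for gene, partners in neighbors.items()}


def hub_genes(degrees: dict[str, int]) -> list[str]:
    """Genes whose degree exceeds mean + 1 SD (sample SD assumed)."""
    values = list(degrees.values())
    if len(values) < 2:
        return []  # an SD needs at least two data points
    cutoff = mean(values) + stdev(values)
    return sorted(gene for gene, degree in degrees.items() if degree > cutoff)
```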

Phase 4: Pathway & Network Layer

Objective: Identify enriched biological pathways and cross-pathway connections.

Tools Used

enrichr_gene_enrichment_analysis (primary enrichment):

Input: gene_list (array of gene symbols, min 2), libs (array of library names)
Output: {status: 'success', data: '{...JSON string with enrichment results...}'}
Key libraries: ['KEGG_2021_Human'], ['Reactome_2022'], ['WikiPathway_2023_Human'], ['GO_Biological_Process_2023'], ['GO_Molecular_Function_2023'], ['GO_Cellular_Component_2023']
NOTE: data field is a JSON string, needs parsing. Contains connected_paths and per-library results
NOTE: libs is REQUIRED as array

ReactomeAnalysis_pathway_enrichment:

Input: identifiers (string - space-separated gene list), optional page_size (int, default 20), include_disease (bool), projection (bool)
Output: {data: {token, analysis_type, pathways_found, pathways: [{pathway_id, name, species, is_disease, is_lowest_level, entities_found, entities_total, entities_ratio, p_value, fdr, reactions_found, reactions_total}]}}
Use: Reactome-specific pathway enrichment with statistical testing

Reactome_map_uniprot_to_pathways:

Input: id (string - UniProt accession)
Output: List of Reactome pathways containing this protein
Use: Map individual proteins to pathways

Reactome_get_pathway:

Input: stId (string - Reactome stable ID, e.g., 'R-HSA-73817')
Output: Pathway details

Reactome_get_pathway_reactions:

Input: stId (string)
Output: Reactions within pathway

kegg_search_pathway:

Input: keyword (string)
Output: Array of KEGG pathway matches

kegg_get_pathway_info:

Input: pathway_id (string, e.g., 'hsa04930')
Output: Detailed pathway information

WikiPathways_search:

Input: query (string), optional organism (string, e.g., 'Homo sapiens')
Output: Matching community-curated pathways

Workflow

Collect all genes from genomics + transcriptomics layers (top 20-30)
Run Enrichr enrichment for KEGG, Reactome, WikiPathways
Run ReactomeAnalysis for more detailed Reactome enrichment with p-values
Search KEGG for disease-specific pathways
Search WikiPathways for disease pathways
For top Reactome pathways, get detailed reactions
Identify cross-pathway connections (genes in multiple pathways)
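
Two small helpers illustrate the first and last steps. The names are hypothetical, and `pathway_genes` is assumed to be a mapping from pathway name to member genes that the caller assembles from the enrichment results. The first helper builds the space-separated `identifiers` string that ReactomeAnalysis_pathway_enrichment expects; the second finds genes shared across pathways.

```python
from collections import defaultdict


def reactome_identifiers(genes: list[str]) -> str:
    """Join gene symbols into one space-separated string, dropping repeats."""
    return " ".join(dict.fromkeys(g.strip() for g in genes if g.strip()))


def cross_pathway_genes(pathway_genes: dict[str, list[str]],
                        min_pathways: int = 2) -> dict[str, list[str]]:
    """Return gene -> sorted pathway names for genes in min_pathways or more."""
    membership = defaultdict(set)
    for pathway, genes in pathway_genes.items():
        for gene in genes:
            membership[gene].add(pathway)
    return {gene: sorted(paths) for gene, paths in membership.items()
            if len(paths) >= min_pathways}
```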

Phase 5: Gene Ontology & Functional Annotation

Objective: Characterize biological processes, molecular functions, and cellular components.

Tools Used

enrichr_gene_enrichment_analysis (GO enrichment):

Use with libs=['GO_Biological_Process_2023'] for BP
Use with libs=['GO_Molecular_Function_2023'] for MF
Use with libs=['GO_Cellular_Component_2023'] for CC

GO_get_annotations_for_gene:

Input: gene_id (string - gene symbol or UniProt ID)
Output: List of GO annotations with terms, aspects, evidence codes

GO_search_terms:

Input: query (string)
Output: Matching GO terms

QuickGO_annotations_by_gene:

Input: gene_product_id (string - UniProt accession, e.g., 'UniProtKB:P02649'), optional aspect (string: 'biological_process', 'molecular_function', 'cellular_component'), taxon_id (int: 9606), limit (int: 25)
Output: GO annotations with evidence codes

OpenTargets_get_target_gene_ontology_by_ensemblID:

Input: ensemblId (string)
Output: GO terms associated with target

Workflow

Run Enrichr GO enrichment for all 3 aspects using combined gene list
For top 5 genes, get detailed GO annotations from QuickGO
For top genes, get OpenTargets GO terms
Summarize key biological processes, molecular functions, cellular components

Phase 6: Therapeutic Landscape

Objective: Map approved drugs, druggable targets, repurposing opportunities, and clinical trials.

Tools Used

OpenTargets_get_associated_drugs_by_disease_efoId (primary):

Input: efoId (string), size (int, REQUIRED - use 100)
Output: {data: {disease: {knownDrugs: {count, rows: [{drug: {id, name, tradeNames, maximumClinicalTrialPhase, isApproved, hasBeenWithdrawn}, phase, mechanismOfAction, target: {id, approvedSymbol}, disease: {id, name}, urls: [{url, name}]}]}}}}
Use: All drugs associated with disease (approved + investigational)

OpenTargets_get_target_tractability_by_ensemblID:

Input: ensemblId (string)
Output: Tractability assessment (small molecule, antibody, PROTAC, etc.)

OpenTargets_get_associated_drugs_by_target_ensemblID:

Input: ensemblId (string), size (int, REQUIRED)
Output: Drugs targeting this gene/protein

search_clinical_trials:

Input: query_term (string, REQUIRED), optional condition (string), intervention (string), pageSize (int, default 10)
Output: Clinical trial results
NOTE: query_term is REQUIRED even if condition is provided

OpenTargets_get_drug_mechanisms_of_action_by_chemblId:

Input: chemblId (string)
Output: Mechanism of action details

Workflow

Get all drugs for disease from OpenTargets
For top disease-associated genes, check tractability
For top genes with no approved drugs, identify repurposing candidates
Search clinical trials for disease
For top approved drugs, get mechanism of action

Drug Tracking

drug_targets = {
    'PSEN1': {'drugs': ['Semagacestat'], 'tractability': 'small_molecule', 'clinical_phase': 3},
    'ACHE': {'drugs': ['Donepezil', 'Galantamine'], 'tractability': 'small_molecule', 'clinical_phase': 4},
    # ...
}

Phase 7: Multi-Omics Integration

Objective: Integrate findings across all layers to identify cross-layer genes, calculate concordance, and generate mechanistic hypotheses.

Cross-Layer Gene Concordance Analysis

This is the core integrative step. For each gene found in the analysis:

Count layers: In how many omics layers does this gene appear?
- Genomics (GWAS, rare variants, genetic association)
- Transcriptomics (DEGs, expression score)
- Proteomics (PPI hub, protein expression)
- Pathways (enriched pathway member)
- Therapeutics (drug target)
Score genes: Genes appearing in 3+ layers are "multi-omics hub genes"
Direction concordance: Do genetics and expression agree?
- Risk allele + upregulated = concordant gain-of-function
- Risk allele + downregulated = concordant loss-of-function
- Discordant = needs investigation
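
A minimal sketch of the layer count and direction rules, assuming each layer's genes have been collected into a set of symbols. The function names and the `"up"`/`"down"` labels are hypothetical. Any combination not listed above is returned as needing investigation.

```python
def layer_concordance(layers: dict[str, set[str]], min_layers: int = 3) -> dict:
    """Count layers per gene; genes in min_layers or more are multi-omics hubs."""
    layers_by_gene: dict[str, list[str]] = {}
    for layer, genes in layers.items():
        for gene in genes:
            layers_by_gene.setdefault(gene, []).append(layer)
    hubs = sorted(
        (g for g, found in layers_by_gene.items() if len(found) >= min_layers),
        key=lambda g: (-len(layers_by_gene[g]), g),  # most layers first
    )
    return {"layers_by_gene": layers_by_gene, "hub_genes": hubs}


def direction_call(has_risk_allele: bool, expression_change: str) -> str:
    """Apply the direction rules; expression_change is 'up' or 'down'."""
    if has_risk_allele and expression_change == "up":
        return "concordant gain-of-function"
    if has_risk_allele and expression_change == "down":
        return "concordant loss-of-function"
    return "needs investigation"
```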

Biomarker Identification

For each multi-omics hub gene, assess biomarker potential:

Diagnostic: Gene expression distinguishes disease vs healthy
Prognostic: Expression/variant predicts outcome (cancer prognostics from HPA)
Predictive: Variant/expression predicts treatment response (pharmacogenomics)
Evidence level: Number of supporting omics layers

Mechanistic Hypothesis Generation

From the integrated data:

Identify the most supported biological processes (GO + pathways)
Map causal chain: genetic variant -> gene expression -> protein function -> pathway disruption -> disease
Identify intervention points (druggable nodes in the causal chain)
Generate testable hypotheses

Confidence Score Calculation

Calculate the Multi-Omics Confidence Score (0-100) based on:

Data availability across layers
Cross-layer concordance
Evidence quality
Clinical validation

Phase 8: Report Finalization

Executive Summary

Write a 2-3 sentence synthesis covering:

Disease mechanism in systems terms
Key genes/pathways identified
Therapeutic opportunities

Final Report Quality Checklist

Before presenting to user, verify:

Tool Parameter Quick Reference

| Tool | Key Parameters | Notes |
|------|----------------|-------|
| `OpenTargets_get_disease_id_description_by_name` | `diseaseName` | Primary disambiguation |
| `OSL_get_efo_id_by_disease_name` | `disease` | Secondary disambiguation |
| `OpenTargets_get_associated_targets_by_disease_efoId` | `efoId` | Returns top 25 genes |
| `OpenTargets_get_evidence_by_datasource` | `efoId`, `ensemblId`, `datasourceIds[]`, `size` | Per-gene evidence |
| `OpenTargets_search_gwas_studies_by_disease` | `diseaseIds[]`, `size` | GWAS studies |
| `gwas_search_associations` | `disease_trait`, `size` | GWAS Catalog |
| `clinvar_search_variants` | `condition` or `gene`, `max_results` | Rare variants |
| `ExpressionAtlas_search_differential` | `condition`, `species` | DEGs |
| `expression_atlas_disease_target_score` | `efoId`, `pageSize` (REQUIRED) | Expression scores |
| `europepmc_disease_target_score` | `efoId`, `pageSize` (REQUIRED) | Literature scores |
| `HPA_get_rna_expression_by_source` | `gene_name`, `source_type`, `source_name` (ALL REQUIRED) | Tissue expression |
| `STRING_get_interaction_partners` | `protein_ids[]`, `species` (9606), `limit` | PPI partners |
| `STRING_get_network` | `protein_ids[]`, `species` | PPI network |
| `STRING_functional_enrichment` | `protein_ids[]`, `species` | Functional enrichment |
| `STRING_ppi_enrichment` | `protein_ids[]`, `species` | Network significance |
| `intact_search_interactions` | `query`, `max` | Experimental PPIs |
| `humanbase_ppi_analysis` | `gene_list[]`, `tissue`, `max_node`, `interaction`, `string_mode` (ALL REQUIRED) | Tissue PPI |
| `enrichr_gene_enrichment_analysis` | `gene_list[]`, `libs[]` (BOTH REQUIRED) | Pathway/GO enrichment |
| `ReactomeAnalysis_pathway_enrichment` | `identifiers` (space-separated string) | Reactome enrichment |
| `Reactome_map_uniprot_to_pathways` | `id` (UniProt accession) | Protein-pathway mapping |
| `kegg_search_pathway` | `keyword` | KEGG pathway search |
| `WikiPathways_search` | `query`, `organism` | WikiPathways search |
| `GO_get_annotations_for_gene` | `gene_id` | GO annotations |
| `QuickGO_annotations_by_gene` | `gene_product_id` (e.g., 'UniProtKB:P02649') | Detailed GO |
| `OpenTargets_get_associated_drugs_by_disease_efoId` | `efoId`, `size` (REQUIRED) | Disease drugs |
| `OpenTargets_get_target_tractability_by_ensemblID` | `ensemblId` | Druggability |
| `search_clinical_trials` | `query_term` (REQUIRED), `condition`, `pageSize` | Clinical trials |
| `PubMed_search_articles` | `query`, `limit` | Literature |
| `ensembl_lookup_gene` | `gene_id`, `species` ('homo_sapiens' REQUIRED) | Gene lookup |
| `MyGene_query_genes` | `query`, `species`, `fields`, `size` | Gene info |
| `OpenTargets_get_similar_entities_by_disease_efoId` | `efoId`, `threshold`, `size` (ALL REQUIRED) | Similar diseases |

Response Format Notes (Verified)

OpenTargets Associated Targets

{
  "data": {
    "disease": {
      "id": "MONDO_0004975",
      "name": "Alzheimer disease",
      "associatedTargets": {
        "count": 2456,
        "rows": [
          {
            "target": {"id": "ENSG00000080815", "approvedSymbol": "PSEN1"},
            "score": 0.87
          }
        ]
      }
    }
  }
}
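
As a minimal sketch of how this response might be consumed, the helper below pulls `(Ensembl ID, symbol, score)` tuples out of the nested structure shown above. The function name `extract_target_scores` is an assumption made for illustration and is not part of ToolUniverse; the field names are taken from the sample response.

```python
from typing import Any


def extract_target_scores(response: dict[str, Any]) -> list[tuple[str, str, float]]:
    """Return (ensembl_id, symbol, score) tuples, highest score first.

    Hypothetical helper: missing levels of nesting yield an empty list
    instead of raising, so a failed or empty tool call is easy to detect.
    """
    disease = (response.get("data") or {}).get("disease") or {}
    rows = (disease.get("associatedTargets") or {}).get("rows") or []
    targets = [
        (row["target"]["id"], row["target"]["approvedSymbol"], float(row["score"]))
        for row in rows
    ]
    return sorted(targets, key=lambda item: item[2], reverse=True)


# Sample payload copied from the response format above.
opentargets_sample = {
    "data": {
        "disease": {
            "id": "MONDO_0004975",
            "name": "Alzheimer disease",
            "associatedTargets": {
                "count": 2456,
                "rows": [
                    {
                        "target": {"id": "ENSG00000080815", "approvedSymbol": "PSEN1"},
                        "score": 0.87,
                    }
                ],
            },
        }
    }
}

top_targets = extract_target_scores(opentargets_sample)
```

Returning an empty list for an empty response keeps the "no data" case explicit, which fits the fallback strategies described later in this document.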

GWAS Catalog Associations

{
  "data": [
    {
      "association_id": 216440893,
      "p_value": 2e-09,
      "or_per_copy_num": 0.94,
      "or_value": "0.94",
      "efo_traits": [{"..."}],
      "risk_frequency": "NR"
    }
  ],
  "metadata": {"pagination": {"totalElements": 1061816}}
}
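
Note that in the sample above `or_value` is a string and `risk_frequency` holds the placeholder `"NR"`. The sketch below normalizes one record into numeric fields, assuming that `"NR"` means "not reported" and should become `None`. The helper names `to_float` and `normalize_association` are hypothetical.

```python
from typing import Any, Optional


def to_float(value: Any) -> Optional[float]:
    """Convert numbers or numeric strings to float; anything else becomes None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def normalize_association(record: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: keep only the numeric fields used for ranking."""
    return {
        "association_id": record.get("association_id"),
        "p_value": to_float(record.get("p_value")),
        "odds_ratio": to_float(record.get("or_value")),
        # Assumption: "NR" stands for "not reported", so it maps to None.
        "risk_frequency": to_float(record.get("risk_frequency")),
    }


# Record copied from the response format above (efo_traits omitted).
gwas_record = {
    "association_id": 216440893,
    "p_value": 2e-09,
    "or_per_copy_num": 0.94,
    "or_value": "0.94",
    "risk_frequency": "NR",
}

normalized = normalize_association(gwas_record)
```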

STRING Interactions

{
  "status": "success",
  "data": [
    {
      "stringId_A": "9606.ENSP00000252486",
      "stringId_B": "9606.ENSP00000466775",
      "preferredName_A": "APOE",
      "preferredName_B": "APOC2",
      "score": 0.999
    }
  ]
}
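
One possible way to turn these interaction rows into hub-gene candidates is to count each gene's distinct partners. The sketch below is an illustration, not the pipeline's own method: the function name `hub_degrees` and the default cutoff of 0.7 are assumptions, and every edge in the sample other than APOE/APOC2 is made up for demonstration.

```python
from collections import Counter


def hub_degrees(interactions, min_score=0.7):
    """Count distinct interaction partners per gene above a score cutoff.

    Hypothetical helper. Each unordered pair is counted once, so A-B and
    B-A rows do not inflate the degree. min_score=0.7 is an assumed cutoff.
    """
    seen = set()
    degrees = Counter()
    for edge in interactions:
        a, b = edge["preferredName_A"], edge["preferredName_B"]
        pair = frozenset((a, b))
        if a == b or pair in seen or float(edge["score"]) < min_score:
            continue
        seen.add(pair)
        degrees[a] += 1
        degrees[b] += 1
    return degrees


string_edges = [
    # From the sample response above.
    {"preferredName_A": "APOE", "preferredName_B": "APOC2", "score": 0.999},
    # Made-up rows for illustration: a reversed duplicate, a second partner,
    # and a low-confidence edge that falls below the cutoff.
    {"preferredName_A": "APOC2", "preferredName_B": "APOE", "score": 0.999},
    {"preferredName_A": "APOE", "preferredName_B": "GENE_X", "score": 0.95},
    {"preferredName_A": "APOC2", "preferredName_B": "GENE_Y", "score": 0.40},
]

degrees = hub_degrees(string_edges)
top_hub = degrees.most_common(1)
```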

Reactome Enrichment

{
  "data": {
    "token": "...",
    "pathways_found": 154,
    "pathways": [
      {
        "pathway_id": "R-HSA-1251985",
        "name": "Nuclear signaling by ERBB4",
        "species": "Homo sapiens",
        "is_disease": false,
        "is_lowest_level": true,
        "entities_found": 3,
        "entities_total": 47,
        "entities_ratio": 0.00291,
        "p_value": 4.0e-06,
        "fdr": 0.00068,
        "reactions_found": 3,
        "reactions_total": 34
      }
    ]
  }
}
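
A short sketch of filtering this response down to significant pathways follows. The function name `significant_pathways` and the default FDR cutoff of 0.05 are assumptions chosen for illustration; the field names come from the sample above.

```python
def significant_pathways(response, max_fdr=0.05, lowest_level_only=False):
    """Return pathways with fdr <= max_fdr, most significant first.

    Hypothetical helper; the 0.05 default is an assumed threshold.
    """
    pathways = (response.get("data") or {}).get("pathways") or []
    kept = [
        p for p in pathways
        if p["fdr"] <= max_fdr
        and (p.get("is_lowest_level", False) or not lowest_level_only)
    ]
    return sorted(kept, key=lambda p: p["fdr"])


# Trimmed copy of the sample response above.
reactome_sample = {
    "data": {
        "token": "...",
        "pathways_found": 154,
        "pathways": [
            {
                "pathway_id": "R-HSA-1251985",
                "name": "Nuclear signaling by ERBB4",
                "is_lowest_level": True,
                "entities_found": 3,
                "entities_total": 47,
                "p_value": 4.0e-06,
                "fdr": 0.00068,
            }
        ],
    }
}

hits = significant_pathways(reactome_sample)
```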

HPA RNA Expression

{
  "status": "success",
  "data": {
    "gene_name": "APOE",
    "source_type": "tissue",
    "source_name": "brain",
    "expression_value": "2714.9",
    "expression_level": "very high",
    "expression_unit": "nTPM"
  }
}
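
In the sample above, `expression_value` arrives as a string (`"2714.9"`), so it needs converting before any numeric comparison across tissues. A minimal sketch, with `parse_hpa_expression` as a hypothetical helper name:

```python
def parse_hpa_expression(response):
    """Return a flat record with a numeric expression value, or None on failure.

    Hypothetical helper: treats a non-success status or an unparseable
    value as "no data" rather than raising.
    """
    if response.get("status") != "success":
        return None
    data = response.get("data") or {}
    try:
        value = float(data["expression_value"])
    except (KeyError, TypeError, ValueError):
        return None
    return {
        "gene": data.get("gene_name"),
        "tissue": data.get("source_name"),
        "value": value,
        "unit": data.get("expression_unit"),
        "level": data.get("expression_level"),
    }


# Sample payload copied from the response format above.
hpa_sample = {
    "status": "success",
    "data": {
        "gene_name": "APOE",
        "source_type": "tissue",
        "source_name": "brain",
        "expression_value": "2714.9",
        "expression_level": "very high",
        "expression_unit": "nTPM",
    },
}

apoe_brain = parse_hpa_expression(hpa_sample)
```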

Enrichr Results

{
  "status": "success",
  "data": "{\"connected_paths\": {\"Path: ...\": \"Total Weight: ...\"}}"
}

NOTE: The data field is a JSON-encoded string, not a nested object. Parse it before accessing its contents.
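
A minimal sketch of that parsing step, using only the standard library. The helper name `parse_enrichr_data` is an assumption. It also accepts an already-decoded dict, in case a tool version returns one.

```python
import json


def parse_enrichr_data(response):
    """Decode the JSON string in response["data"]; return {} if unusable.

    Hypothetical helper, not part of ToolUniverse.
    """
    data = response.get("data")
    if isinstance(data, str):
        try:
            decoded = json.loads(data)
        except json.JSONDecodeError:
            return {}
        return decoded if isinstance(decoded, dict) else {}
    return data if isinstance(data, dict) else {}


# Same shape as the sample above: "data" holds a JSON string, not an object.
enrichr_sample = {
    "status": "success",
    "data": json.dumps({"connected_paths": {"Path: ...": "Total Weight: ..."}}),
}

enrichr_result = parse_enrichr_data(enrichr_sample)
connected_paths = enrichr_result.get("connected_paths", {})
```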

Common Use Patterns

1. Comprehensive Disease Profiling

User: "Characterize Alzheimer's disease across omics layers"
-> Run all phases (Phase 0 through Phase 8)
-> Produce full multi-omics report

2. Therapeutic Target Discovery

User: "What are druggable targets for rheumatoid arthritis?"
-> Emphasize Phase 1 (genomics), Phase 6 (therapeutics), Phase 7 (integration)
-> Focus on tractability and clinical precedent

3. Biomarker Identification

User: "Find diagnostic biomarkers for pancreatic cancer"
-> Emphasize Phase 2 (transcriptomics), Phase 3 (proteomics), Phase 7 (biomarkers)
-> Focus on tissue-specific expression and diagnostic potential

4. Mechanism Elucidation

User: "What pathways are dysregulated in Crohn's disease?"
-> Emphasize Phase 4 (pathways), Phase 5 (GO), Phase 7 (mechanistic hypotheses)
-> Focus on pathway enrichment and cross-pathway connections

5. Drug Repurposing

User: "What existing drugs could be repurposed for ALS?"
-> Emphasize Phase 1 (genetics), Phase 6 (therapeutic landscape), Phase 7 (repurposing)
-> Focus on drugs targeting disease-associated genes

6. Systems Biology

User: "What are the hub genes and key pathways in type 2 diabetes?"
-> Emphasize Phase 3 (PPI network), Phase 4 (pathways), Phase 7 (network analysis)
-> Focus on hub genes and network modules

Edge Case Handling

Rare Diseases (limited data)

Genomics layer may dominate (single gene)
Limited GWAS data (monogenic)
Focus on ClinVar variants, pathway consequences
Confidence score will be lower (less cross-layer data)

Common Diseases (overwhelming data)

Thousands of GWAS associations
Prioritize by effect size and significance
Focus on top 20-30 genes for downstream analysis
Use strict significance thresholds (p < 5e-8)
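
The prioritization above could be sketched as follows. This is one possible reading, not the pipeline's defined method: it assumes each association has already been mapped to a `gene` and carries numeric `p_value` and `odds_ratio` fields (all hypothetical field names), and it measures effect size as the absolute log odds ratio. The gene names and values in the example are made up.

```python
import math

GENOME_WIDE_P = 5e-8  # threshold stated in the list above


def prioritize_genes(associations, max_genes=30):
    """Rank genes by their best association: lowest p-value, then largest effect.

    Hypothetical helper. Associations at or above the genome-wide
    threshold, or without a p-value, are dropped.
    """
    best = {}
    for assoc in associations:
        p = assoc.get("p_value")
        if p is None or p >= GENOME_WIDE_P:
            continue
        odds = assoc.get("odds_ratio")
        effect = abs(math.log(odds)) if odds and odds > 0 else 0.0
        key = (p, -effect)  # smaller tuple means higher priority
        gene = assoc["gene"]
        if gene not in best or key < best[gene]:
            best[gene] = key
    return sorted(best, key=best.get)[:max_genes]


# Made-up associations for illustration only.
example_associations = [
    {"gene": "GENE_A", "p_value": 1e-50, "odds_ratio": 3.0},
    {"gene": "GENE_B", "p_value": 2e-09, "odds_ratio": 0.94},
    {"gene": "GENE_B", "p_value": 1e-12, "odds_ratio": 1.1},
    {"gene": "GENE_C", "p_value": 1e-05, "odds_ratio": 2.0},
    {"gene": "GENE_D", "p_value": 2e-09, "odds_ratio": 1.5},
    {"gene": "GENE_E", "p_value": None, "odds_ratio": None},
]

ranked = prioritize_genes(example_associations)
```

Keeping only the best association per gene prevents a locus with many correlated signals from crowding the top 20-30 slots.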

Cancer

Include somatic mutations (if CIViC/cBioPortal available)
Check cancer prognostics via HPA
Include tumor-specific expression patterns
Clinical trial landscape may be extensive

Monogenic Diseases

Single gene dominates
ClinVar/OMIM evidence is primary
Pathway analysis reveals downstream effects
Therapeutic landscape may be limited (gene therapy, enzyme replacement)

Polygenic Diseases

Many weak genetic signals
GWAS provides the gene list
Pathway enrichment reveals convergent biology
Network analysis identifies hub genes

Tissue Ambiguity

Diseases affecting multiple tissues
Query HPA for all relevant tissues
Compare tissue-specific expression patterns
Use tissue context from disease ontology

Fallback Strategies

If disease name not found

Try synonyms
Try broader disease category
Try OMIM/UMLS ID mapping
Report disambiguation failure and ask user
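
The fallback order above can be expressed as a simple chain of attempts. The sketch below assumes a caller-supplied `search` callable that returns a disease ID or `None`. Both `resolve_disease` and `search` are hypothetical stand-ins for whichever disambiguation tool is in use, and the OMIM/UMLS mapping step would slot in as one more group of candidate queries.

```python
from typing import Callable, Optional, Sequence


def resolve_disease(
    name: str,
    synonyms: Sequence[str],
    broader_terms: Sequence[str],
    search: Callable[[str], Optional[str]],
) -> Optional[tuple[str, str]]:
    """Try the name, then synonyms, then broader categories, in that order.

    Returns (matched_query, disease_id), or None when every attempt fails,
    in which case the caller should report the failure and ask the user.
    """
    for query in [name, *synonyms, *broader_terms]:
        disease_id = search(query)
        if disease_id:
            return query, disease_id
    return None


# Stub lookup standing in for a real search tool (illustration only).
known_ids = {"alzheimer disease": "MONDO_0004975"}


def stub_search(query: str) -> Optional[str]:
    return known_ids.get(query.lower())


match = resolve_disease("Alzheimers", ["Alzheimer disease"], ["dementia"], stub_search)
no_match = resolve_disease("unknown condition", [], [], stub_search)
```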

If no GWAS data

Check ClinVar for rare variants
Use OpenTargets genetic evidence
Note in report as "Limited genetic data"
Adjust confidence score accordingly

If no expression data

Try different disease name/synonym
Check HPA for individual gene expression
Use OpenTargets expression evidence
Note as "Limited transcriptomics data"

If no pathway enrichment

Reduce gene list stringency
Try different pathway databases
Map individual genes to pathways via Reactome
Note as "No significant pathway enrichment"

If no drugs found

Check if disease is rare/orphan
Look for drugs targeting individual genes
Check clinical trials for investigational therapies
Note as "No approved drugs - novel therapeutic opportunity"

Repository

wu-yc/labclaw

GitHub Stars

977

First Seen

Mar 15, 2026

Security Audits

Gen Agent Trust Hub: Pass

Socket: Pass

Snyk: Warn
