Spatial Multi-Omics Analysis Pipeline

Comprehensive biological interpretation of spatial omics data. Transforms spatially variable genes (SVGs), domain annotations, and tissue context into actionable biological insights covering pathway enrichment, cell-cell interactions, druggable targets, immune microenvironment, and multi-modal integration.

KEY PRINCIPLES:

  1. Report-first approach - Create report file FIRST, then populate progressively
  2. Domain-by-domain analysis - Characterize each spatial region independently before comparison
  3. Gene-list-centric - Analyze user-provided SVGs and marker genes with ToolUniverse databases
  4. Biological interpretation - Go beyond statistics to explain biological meaning of spatial patterns
  5. Disease focus - Emphasize disease mechanisms and therapeutic opportunities when disease context is provided
  6. Evidence grading - Grade all evidence from T1 (direct human/clinical evidence) to T4 (annotation/prediction only)
  7. Multi-modal thinking - Integrate RNA, protein, and metabolite information when available
  8. Validation guidance - Suggest experimental validation approaches for key findings
  9. Source references - Every statement must cite tool/database source
  10. Completeness checklist - Mandatory section showing analysis coverage
  11. English-first queries - Always use English terms in tool calls. Respond in user's language

When to Use This Skill

Apply when users:

  • Provide spatially variable genes from spatial transcriptomics experiments
  • Ask about biological interpretation of spatial domains/clusters
  • Need pathway enrichment analysis of spatial gene expression data
  • Want to understand cell-cell interactions from spatial data
  • Ask about tumor microenvironment heterogeneity from spatial omics
  • Need druggable targets in specific spatial regions
  • Ask about tissue zonation patterns (liver, brain, kidney)
  • Want to integrate spatial transcriptomics + proteomics data
  • Ask about immune infiltration patterns from spatial data
  • Need to compare healthy vs disease regions spatially
  • Ask "What pathways are enriched in this tumor core vs tumor margin?"
  • Ask "What cell-cell interactions occur in this spatial domain?"

NOT for (use other skills instead):

  • Single gene interpretation without spatial context -> Use tooluniverse-target-research
  • Variant interpretation -> Use tooluniverse-variant-interpretation
  • Drug safety profiling -> Use tooluniverse-adverse-event-detection
  • Disease-only analysis without spatial data -> Use tooluniverse-multiomic-disease-characterization
  • GWAS analysis -> Use tooluniverse-gwas-* skills
  • Bulk RNA-seq (non-spatial) -> Use tooluniverse-systems-biology

Input Parameters

| Parameter | Required | Description | Example |
|-----------|----------|-------------|---------|
| svgs | Yes | Spatially variable genes (gene symbols) | ['EGFR', 'CDH1', 'VIM', 'MYC', 'CD3E'] |
| tissue_type | Yes | Tissue/organ type | brain, liver, lung, breast, skin |
| technology | No | Spatial omics platform used | 10x Visium, MERFISH, DBiTplus, SLIDE-seq |
| disease_context | No | Disease if applicable | breast cancer, Alzheimer disease, liver cirrhosis |
| spatial_domains | No | Dict mapping domain name to marker genes | {'Tumor core': ['MYC','EGFR'], 'Stroma': ['VIM','COL1A1']} |
| cell_types | No | Cell types identified in deconvolution | ['Epithelial', 'T cell', 'Macrophage', 'Fibroblast'] |
| proteins | No | Proteins detected (if multi-modal) | ['CD3', 'CD8', 'PD-L1', 'Ki67'] |
| metabolites | No | Metabolites detected (if SpatialMETA) | ['glutamine', 'lactate', 'ATP'] |
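
As a minimal sketch, the parameters above could be collected and checked before any tool call is made. The `SpatialInputs` class and its `validate` method are hypothetical names used for illustration only; they are not part of ToolUniverse.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpatialInputs:
    """Hypothetical container mirroring the input parameter table."""

    svgs: list[str]
    tissue_type: str
    technology: str | None = None
    disease_context: str | None = None
    spatial_domains: dict[str, list[str]] = field(default_factory=dict)
    cell_types: list[str] = field(default_factory=list)
    proteins: list[str] = field(default_factory=list)
    metabolites: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means both required inputs are present."""
        problems = []
        if not self.svgs:
            problems.append("svgs is required and must not be empty")
        if not self.tissue_type.strip():
            problems.append("tissue_type is required")
        return problems
```

Only the two required parameters are checked here; optional fields default to empty so later phases can test them for truthiness.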

Spatial Omics Integration Score (0-100)

Score Components

Data Completeness (0-30 points):

  • SVGs provided (>10 genes): 5 points
  • Disease context provided: 5 points
  • Spatial domains defined: 5 points
  • Cell type composition available: 5 points
  • Multi-modal data (protein/metabolite): 5 points
  • Literature context found: 5 points

Biological Insight (0-40 points):

  • Significant pathway enrichment (FDR < 0.05): 10 points
  • Cell-cell interaction predictions: 10 points
  • Disease mechanism identified: 10 points
  • Druggable targets found in disease regions: 10 points

Evidence Quality (0-30 points):

  • Cross-database validation (gene found in 3+ databases): 10 points
  • Clinical validation (approved drugs for spatial targets): 10 points
  • Literature support (PubMed evidence for spatial patterns): 10 points

Score Interpretation

| Score | Tier | Interpretation |
|-------|------|----------------|
| 80-100 | Excellent | Comprehensive spatial characterization, strong biological insights, druggable targets identified |
| 60-79 | Good | Good pathway and interaction analysis, some disease/therapeutic context |
| 40-59 | Moderate | Basic enrichment complete, limited spatial domain comparison or interaction analysis |
| 0-39 | Limited | Minimal data, gene-level annotation only |
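
The scoring arithmetic above can be sketched as a small lookup. The point values and tier boundaries are copied from the lists and table in this section; the component key names and the `integration_score` function are hypothetical.

```python
# Point values copied from the Score Components lists; key names are illustrative.
POINTS = {
    # Data Completeness (0-30)
    "svgs_provided": 5,
    "disease_context": 5,
    "spatial_domains": 5,
    "cell_types": 5,
    "multi_modal_data": 5,
    "literature_context": 5,
    # Biological Insight (0-40)
    "pathway_enrichment": 10,
    "cell_cell_interactions": 10,
    "disease_mechanism": 10,
    "druggable_targets": 10,
    # Evidence Quality (0-30)
    "cross_database_validation": 10,
    "clinical_validation": 10,
    "literature_support": 10,
}

# (minimum score, tier label), checked from highest to lowest.
TIERS = [(80, "Excellent"), (60, "Good"), (40, "Moderate"), (0, "Limited")]


def integration_score(achieved):
    """Sum the points for the achieved components and look up the tier."""
    achieved = set(achieved)
    unknown = achieved - set(POINTS)
    if unknown:
        raise ValueError(f"Unknown score components: {sorted(unknown)}")
    total = sum(POINTS[name] for name in achieved)
    tier = next(label for floor, label in TIERS if total >= floor)
    return total, tier
```

Each component is treated as all-or-nothing, which matches how the lists above award points.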

Evidence Grading System

| Tier | Symbol | Criteria | Examples |
|------|--------|----------|----------|
| T1 | [T1] | Direct human evidence, clinical proof | FDA-approved drug for spatial target, validated biomarker |
| T2 | [T2] | Experimental evidence | Validated spatial pattern in literature, known ligand-receptor pair |
| T3 | [T3] | Computational/database evidence | PPI network prediction, pathway enrichment, expression correlation |
| T4 | [T4] | Annotation/prediction only | GO annotation, text-mined association, predicted interaction |
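
One way to keep grading and source citation consistent is to attach both to every statement through a single helper. This is only a sketch: the `graded` function and the "(Source: ...)" suffix are assumptions, and only the tier symbols and criteria come from the table above.

```python
# Tier criteria copied from the Evidence Grading System table.
EVIDENCE_TIERS = {
    "T1": "Direct human evidence, clinical proof",
    "T2": "Experimental evidence",
    "T3": "Computational/database evidence",
    "T4": "Annotation/prediction only",
}


def graded(statement, tier, source):
    """Append the tier symbol and the tool/database source to a report statement."""
    if tier not in EVIDENCE_TIERS:
        raise ValueError(f"Unknown evidence tier: {tier}")
    return f"{statement} [{tier}] (Source: {source})"
```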

Report Template

Create the report file at the start, named {tissue}_{disease}_spatial_omics_report.md, with the following structure:

# Spatial Multi-Omics Analysis Report: {Tissue Type}

**Report Generated**: {date}
**Technology**: {platform}
**Tissue**: {tissue_type}
**Disease Context**: {disease or "Normal tissue"}
**Total SVGs Analyzed**: {count}
**Spatial Domains**: {count}
**Spatial Omics Integration Score**: (to be calculated)

---

## Executive Summary

(2-3 sentence synthesis of key spatial findings - fill after all phases complete)

---

## 1. Tissue & Disease Context

### Tissue Information
| Property | Value | Source |
|----------|-------|--------|
| Tissue type | | |
| Disease | | |
| Expected cell types | | HPA |

### Disease Identifiers (if applicable)
| System | ID | Source |
|--------|-----|--------|

**Sources**: (tools used)

---

## 2. Spatially Variable Gene Characterization

### 2.1 Gene ID Resolution
| Gene Symbol | Ensembl ID | Entrez ID | UniProt | Function | Source |
|-------------|------------|-----------|---------|----------|--------|

### 2.2 Tissue Expression Patterns
| Gene | Tissue Expression | Specificity | Source |
|------|-------------------|-------------|--------|

### 2.3 Subcellular Localization
| Gene | Location | Confidence | Source |
|------|----------|------------|--------|

### 2.4 Disease Associations
| Gene | Disease | Score | Evidence | Source |
|------|---------|-------|----------|--------|

**Sources**: (tools used)

---

## 3. Pathway Enrichment Analysis

### 3.1 STRING Functional Enrichment
| Category | Term | Description | P-value | FDR | Genes | Source |
|----------|------|-------------|---------|-----|-------|--------|

### 3.2 Reactome Pathway Analysis
| Pathway ID | Name | P-value | FDR | Genes Found | Total Genes | Source |
|------------|------|---------|-----|-------------|-------------|--------|

### 3.3 GO Biological Processes
| GO Term | Description | P-value | FDR | Genes | Source |
|---------|-------------|---------|-----|-------|--------|

### 3.4 GO Molecular Functions
| GO Term | Description | P-value | FDR | Genes | Source |
|---------|-------------|---------|-----|-------|--------|

### 3.5 GO Cellular Components
| GO Term | Description | P-value | FDR | Genes | Source |
|---------|-------------|---------|-----|-------|--------|

### Pathway Summary
- Top enriched pathways:
- Key biological processes:
- Spatial pathway implications:

**Sources**: (tools used)

---

## 4. Spatial Domain Characterization

### Domain: {domain_name}

#### Marker Genes
| Gene | Function | Pathways | Source |
|------|----------|----------|--------|

#### Enriched Pathways (domain-specific)
| Pathway | P-value | FDR | Genes | Source |
|---------|---------|-----|-------|--------|

#### Cell Type Signature
| Cell Type | Marker Genes Present | Confidence |
|-----------|---------------------|------------|

#### Biological Interpretation
(Narrative interpretation of this domain)

(Repeat for each domain)

### 4.N Domain Comparison
| Feature | Domain 1 | Domain 2 | Domain 3 |
|---------|----------|----------|----------|
| Top pathway | | | |
| Cell types | | | |
| Disease relevance | | | |

**Sources**: (tools used)

---

## 5. Cell-Cell Interaction Inference

### 5.1 Protein-Protein Interactions (STRING)
| Protein A | Protein B | Score | Type | Source |
|-----------|-----------|-------|------|--------|

### 5.2 Ligand-Receptor Pairs
| Ligand | Receptor | Domain (Ligand) | Domain (Receptor) | Evidence | Source |
|--------|----------|-----------------|-------------------|----------|--------|

### 5.3 Signaling Pathways
| Pathway | Components in Data | Spatial Distribution | Source |
|---------|--------------------|---------------------|--------|

### 5.4 Interaction Network Summary
- Key interaction hubs:
- Cross-domain interactions:
- Predicted cell-cell communication axes:

**Sources**: (tools used)

---

## 6. Disease & Therapeutic Context

### 6.1 Disease Gene Overlap
| Gene | Disease Association Score | Evidence Type | Source |
|------|--------------------------|---------------|--------|

### 6.2 Druggable Targets in Spatial Domains
| Gene | Domain | Tractability | Modality | Approved Drugs | Source |
|------|--------|-------------|----------|----------------|--------|

### 6.3 Drug Mechanisms Relevant to Spatial Targets
| Drug | Target | Mechanism | Phase | Source |
|------|--------|-----------|-------|--------|

### 6.4 Clinical Trials
| NCT ID | Title | Target Gene | Phase | Status | Source |
|--------|-------|-------------|-------|--------|--------|

### Therapeutic Summary
- Druggable genes in disease regions:
- Approved therapies:
- Pipeline drugs:
- Novel opportunities:

**Sources**: (tools used)

---

## 7. Multi-Modal Integration

### 7.1 Protein-RNA Concordance (if protein data available)
| Gene/Protein | RNA Pattern | Protein Pattern | Concordance | Source |
|-------------|-------------|-----------------|-------------|--------|

### 7.2 Subcellular Context
| Gene | mRNA Location (spatial) | Protein Location (HPA) | Concordance | Source |
|------|------------------------|----------------------|-------------|--------|

### 7.3 Metabolic Context (if metabolomics available)
| Gene | Metabolic Pathway | Metabolites Detected | Spatial Pattern | Source |
|------|-------------------|---------------------|-----------------|--------|

**Sources**: (tools used)

---

## 8. Immune Microenvironment (if relevant)

### 8.1 Immune Cell Markers
| Cell Type | Marker Genes | Spatial Domain | Source |
|-----------|-------------|----------------|--------|

### 8.2 Immune Checkpoint Expression
| Checkpoint | Gene | Expression Pattern | Source |
|------------|------|--------------------|--------|

### 8.3 Tumor-Immune Interface (if cancer)
| Feature | Finding | Evidence | Source |
|---------|---------|----------|--------|

### Immune Summary
- Immune infiltration pattern:
- Key immune checkpoints:
- Immunotherapy implications:

**Sources**: (tools used)

---

## 9. Literature & Validation Context

### 9.1 Literature Evidence
| PMID | Title | Relevance | Year | Source |
|------|-------|-----------|------|--------|

### 9.2 Known Spatial Patterns
(Known tissue architecture/zonation from literature)

### 9.3 Validation Recommendations
| Priority | Gene/Target | Method | Rationale |
|----------|-------------|--------|-----------|
| High | | IHC / smFISH | |
| Medium | | IF / ISH | |

**Sources**: (tools used)

---

## Spatial Omics Integration Score

| Component | Points | Max | Details |
|-----------|--------|-----|---------|
| SVGs provided | | 5 | |
| Disease context | | 5 | |
| Spatial domains | | 5 | |
| Cell types | | 5 | |
| Multi-modal data | | 5 | |
| Literature context | | 5 | |
| Pathway enrichment | | 10 | |
| Cell-cell interactions | | 10 | |
| Disease mechanism | | 10 | |
| Druggable targets | | 10 | |
| Cross-database validation | | 10 | |
| Clinical validation | | 10 | |
| Literature support | | 10 | |
| **TOTAL** | | **100** | |

**Score**: XX/100 - [Tier]

---

## Completeness Checklist

- [ ] Gene ID resolution complete
- [ ] Tissue expression patterns analyzed (HPA)
- [ ] Subcellular localization checked (HPA)
- [ ] Pathway enrichment complete (STRING + Reactome)
- [ ] GO enrichment complete (BP + MF + CC)
- [ ] Spatial domains characterized individually
- [ ] Domain comparison performed
- [ ] Protein-protein interactions analyzed (STRING)
- [ ] Ligand-receptor pairs identified
- [ ] Disease associations checked (OpenTargets)
- [ ] Druggable targets identified (OpenTargets tractability)
- [ ] Drug mechanisms reviewed
- [ ] Multi-modal integration performed (if data available)
- [ ] Immune microenvironment characterized (if relevant)
- [ ] Literature search completed
- [ ] Validation recommendations provided
- [ ] Spatial Omics Integration Score calculated
- [ ] Executive summary written
- [ ] All sections have source citations

---

## References

### Data Sources Used
| # | Tool | Parameters | Section | Items Retrieved |
|---|------|------------|---------|-----------------|

### Database Versions
- OpenTargets: (current)
- STRING: v12.0
- Reactome: (current)
- HPA: (current)
- GTEx: v10
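
The report file name pattern `{tissue}_{disease}_spatial_omics_report.md` can be built with a small helper. This is a sketch only: `report_filename` is a hypothetical name, and falling back to "normal" when no disease is given is an assumption, since the pattern does not say what to use for normal tissue.

```python
import re


def report_filename(tissue, disease=None):
    """Build '{tissue}_{disease}_spatial_omics_report.md' with filesystem-safe parts."""

    def slug(text):
        # Lowercase and collapse anything that is not a letter or digit into "_".
        return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")

    # Assumption: use "normal" when no disease context is provided.
    disease_part = slug(disease) if disease else "normal"
    return f"{slug(tissue)}_{disease_part}_spatial_omics_report.md"
```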

Phase 0: Input Processing & Disambiguation (ALWAYS FIRST)

Objective: Parse user input, resolve tissue/disease identifiers, establish analysis context.

Tools Used

OpenTargets_get_disease_id_description_by_name (if disease context provided):

  • Input: diseaseName (string) - Disease name
  • Output: {data: {search: {hits: [{id, name, description}]}}}
  • Use: Get MONDO/EFO IDs for disease queries

OpenTargets_get_disease_description_by_efoId:

  • Input: efoId (string) - Disease ID (e.g., MONDO_0007254)
  • Output: {data: {disease: {id, name, description, dbXRefs}}}
  • Use: Get full disease description

HPA_search_genes_by_query (tissue cell type context):

  • Input: query (string) - Search term
  • Output: List of gene entries matching query
  • Use: Verify tissue-relevant genes

Workflow

  1. Parse SVG list from user input (ensure valid gene symbols)
  2. Identify tissue type and map to standard ontology term
  3. If disease provided, resolve to MONDO/EFO ID using OpenTargets
  4. Get disease description and cross-references
  5. Determine analysis scope:
    • Cancer? -> Include immune microenvironment, somatic mutations, druggable targets
    • Neurological? -> Include brain region specificity, neuronal markers
    • Metabolic? -> Include metabolic zonation, enzyme distribution
    • Normal tissue? -> Focus on tissue architecture and cell type composition
  6. Set up report file with header information

Decision Logic

  • Cancer tissue: Enable immune microenvironment phase, CIViC/cBioPortal queries, immuno-oncology analysis
  • Normal tissue: Skip disease phases, focus on tissue zonation and cell type composition
  • Liver/kidney/brain: Enable zonation-specific analysis
  • No disease context: Proceed with tissue biology only
  • Small gene list (<20): Warn about limited enrichment power, emphasize gene-level analysis
  • Large gene list (>500): Suggest filtering to top SVGs by significance before enrichment
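
The decision rules above can be sketched as one function that returns phase switches and warnings. `plan_analysis` and the dictionary keys are hypothetical; whether a disease counts as cancer is left to the caller (`is_cancer`), because the rules above do not define how that is detected.

```python
ZONATED_TISSUES = {"liver", "kidney", "brain"}


def plan_analysis(n_svgs, tissue_type, disease_context=None, is_cancer=False):
    """Translate the decision logic into a dict of switches plus warnings."""
    plan = {
        # No disease context: proceed with tissue biology only.
        "disease_phases": disease_context is not None,
        # Cancer enables the immune phase and CIViC/cBioPortal queries.
        "immune_microenvironment": is_cancer,
        "civic_cbioportal": is_cancer,
        # Liver/kidney/brain enable zonation-specific analysis.
        "zonation": tissue_type.strip().lower() in ZONATED_TISSUES,
        "warnings": [],
    }
    if n_svgs < 20:
        plan["warnings"].append(
            "Fewer than 20 genes: limited enrichment power; emphasize gene-level analysis"
        )
    elif n_svgs > 500:
        plan["warnings"].append(
            "More than 500 genes: consider filtering to top SVGs by significance before enrichment"
        )
    return plan
```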

Phase 1: Gene Characterization

Objective: Resolve gene identifiers, annotate functions, tissue specificity, and subcellular localization.

Tools Used

MyGene_query_genes (gene ID resolution):

  • Input: query (string) - Gene symbol
  • Output: {hits: [{_id, symbol, name, ensembl: {gene}, entrezgene}]}
  • Use: Resolve gene symbol to Ensembl ID, Entrez ID
  • NOTE: First hit may not be exact match - filter by symbol field
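
A minimal sketch of the symbol filter described in the note, assuming the response shape listed under Output. `pick_exact_hit` is a hypothetical helper, and the example response is hand-written for illustration rather than a real query result.

```python
def pick_exact_hit(response, symbol):
    """Return the first hit whose symbol matches exactly (case-insensitive), else None."""
    for hit in response.get("hits", []):
        if hit.get("symbol", "").upper() == symbol.upper():
            return hit
    return None


# Illustrative response only: the first hit is a related gene, not the one requested.
example = {
    "hits": [
        {"_id": "example-1", "symbol": "EGFR-AS1"},
        {"_id": "example-2", "symbol": "EGFR", "ensembl": {"gene": "ENSG00000146648"}},
    ]
}
hit = pick_exact_hit(example, "EGFR")
```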

UniProt_get_function_by_accession (gene function):

  • Input: accession (string) - UniProt accession
  • Output: List of function description strings
  • Use: Get protein function annotation

UniProt_get_subcellular_location_by_accession (protein localization):

  • Input: accession (string)
  • Output: Subcellular location information
  • Use: Determine where the protein is located in the cell

HPA_get_subcellular_location (validated localization):

  • Input: gene_name (string) - Gene symbol
  • Output: {gene_name, main_locations: [], additional_locations: [], location_summary}
  • Use: Experimentally validated protein subcellular location

HPA_get_rna_expression_by_source (tissue expression):

  • Input: gene_name (string), source_type (string: 'tissue'), source_name (string)
  • Output: {data: {gene_name, source_type, source_name, expression_value, expression_level}}
  • Use: Check expression in the specific tissue of interest
  • NOTE: All 3 parameters are REQUIRED

HPA_get_comprehensive_gene_details_by_ensembl_id (full HPA data):

  • Input: ensembl_id (string), include_isoforms (bool), include_images (bool), include_antibodies (bool), include_expression (bool) - ALL 5 parameters REQUIRED
  • Output: {ensembl_id, gene_name, uniprot_ids, summary, protein_classes, tissue_expression, cell_line_expression, ...}
  • Use: One-stop gene characterization from HPA
  • NOTE: Use include_expression=True for tissue data; set others to False for faster response

HPA_get_cancer_prognostics_by_gene (cancer prognosis):

  • Input: ensembl_id (string) - Ensembl gene ID (NOT gene_name)
  • Output: {gene_name, prognostic_cancers_count, prognostic_summary: [{cancer_type, prognostic_type, p_value}]}
  • Use: Prognostic significance in cancer (if cancer context)

UniProtIDMap_gene_to_uniprot (ID mapping):

  • Input: gene_name (string), organism (string, default 'human')
  • Output: UniProt accession for the gene
  • Use: Map gene symbol to UniProt accession

Workflow

  1. For each SVG (batch if >20, sample top genes):
    a. Query MyGene to get Ensembl ID, Entrez ID
    b. Map to UniProt accession
    c. Get subcellular location from HPA
    d. Get tissue expression from HPA
    e. If cancer: check cancer prognostics
  2. Compile gene characterization table
  3. Identify genes with tissue-specific expression
  4. Note genes with nuclear vs membrane vs secreted localization (relevant for spatial patterns)

Batch Strategy for Large Gene Lists

  • Up to 50 genes: Characterize all individually
  • 51-200 genes: Characterize top 50 by priority (known disease genes first), summarize rest
  • More than 200 genes: Characterize top 30, use enrichment for the full list
  • Always run pathway enrichment on the FULL list regardless
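
The batch strategy can be sketched as a selection function. `select_for_characterization` is a hypothetical name; placing the boundaries at exactly 50 and 200 genes is an assumption, and "priority" is reduced here to "known disease genes first, otherwise input order".

```python
def select_for_characterization(genes, disease_genes=frozenset()):
    """Pick the genes to characterize individually; enrichment still uses the full list."""
    # sorted() is stable, so disease genes move to the front and input order is otherwise kept.
    ranked = sorted(genes, key=lambda gene: gene not in disease_genes)
    if len(ranked) <= 50:
        return ranked
    if len(ranked) <= 200:
        return ranked[:50]
    return ranked[:30]
```

The function only narrows per-gene characterization; the caller should still pass the original, unfiltered list to the enrichment tools.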

Phase 2: Pathway & Functional Enrichment

Objective: Identify biological pathways and functions enriched in SVGs and per-domain gene sets.

Tools Used

STRING_functional_enrichment (primary enrichment):

  • Input: protein_ids (array of gene symbols), species (int, 9606 for human)
  • Output: {status: 'success', data: [{category, term, number_of_genes, number_of_genes_in_background, p_value, fdr, description, inputGenes, preferredNames}]}
  • Use: Comprehensive enrichment across GO, KEGG, Reactome, COMPARTMENTS, DISEASES
  • Categories: Process (GO:BP), Function (GO:MF), Component (GO:CC), KEGG, Reactome, COMPARTMENTS, DISEASES, Keyword, PMID
  • NOTE: This is the PRIMARY enrichment tool. Returns all categories in one call

ReactomeAnalysis_pathway_enrichment (Reactome-specific):

  • Input: identifiers (string, space-separated gene symbols, NOT array)
  • Output: {data: {token, pathways_found, pathways: [{pathway_id, name, p_value, fdr, entities_found, entities_total}]}}
  • Use: Detailed Reactome pathway analysis with hierarchy
  • NOTE: identifiers is a SPACE-SEPARATED STRING, not array

Reactome_map_uniprot_to_pathways (individual gene):

  • Input: id (string) - UniProt accession
  • Output: Plain list of pathway objects (no data wrapper)
  • Use: Map individual proteins to Reactome pathways

GO_get_annotations_for_gene (individual gene GO):

  • Input: gene_id (string) - Gene symbol or ID
  • Output: Plain list of GO annotation objects
  • Use: Get GO annotations for individual genes

kegg_search_pathway (KEGG pathway search):

  • Input: query (string) - Pathway name or keyword
  • Output: Pathway search results
  • Use: Find KEGG pathways relevant to spatial findings

WikiPathways_search (WikiPathways):

  • Input: query (string) - Search term
  • Output: WikiPathways search results
  • Use: Additional pathway context

Workflow

  1. Global SVG enrichment: Run STRING_functional_enrichment on ALL SVGs
    • Filter results by FDR < 0.05
    • Separate by category (Process, Function, Component, KEGG, Reactome)
    • Report top 10-15 per category
  2. Reactome detailed analysis: Run ReactomeAnalysis_pathway_enrichment
    • Report top pathways with FDR < 0.05
  3. Per-domain enrichment (if spatial domains provided):
    • Run STRING_functional_enrichment on each domain's gene set
    • Compare enriched pathways across domains
    • Identify domain-specific vs shared pathways
  4. Compile pathway tables: Merge results from all enrichment tools
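
The filtering and grouping in step 1 can be sketched as below, assuming rows shaped like the `data` list returned by STRING_functional_enrichment (`category`, `term`, `fdr`, ...). Both function names are hypothetical; the second one only illustrates the space-separated string that ReactomeAnalysis_pathway_enrichment expects.

```python
from collections import defaultdict


def top_terms_by_category(rows, fdr_cutoff=0.05, top_n=15):
    """Keep rows with FDR below the cutoff, group by category, return the best top_n each."""
    grouped = defaultdict(list)
    for row in rows:
        if row["fdr"] < fdr_cutoff:
            grouped[row["category"]].append(row)
    return {
        category: sorted(items, key=lambda r: r["fdr"])[:top_n]
        for category, items in grouped.items()
    }


def reactome_identifiers(genes):
    """Join gene symbols into the space-separated string form (not an array)."""
    return " ".join(genes)
```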

Enrichment Interpretation

  • Signaling pathways (RTK, Wnt, Notch, Hedgehog): Cell-cell communication
  • Metabolic pathways: Tissue metabolic zonation
  • Immune pathways: Immune infiltration/exclusion
  • ECM/adhesion pathways: Tissue structure and remodeling
  • Cell cycle/proliferation: Growth zones
  • Apoptosis/stress: Damage zones

Phase 3: Spatial Domain Characterization

Objective: Characterize each spatial domain biologically and compare between domains.

Tools Used

Uses the same tools as Phase 2 (STRING_functional_enrichment, ReactomeAnalysis_pathway_enrichment) applied per domain, plus:

HPA_get_biological_processes_by_gene (per-gene processes):

  • Input: gene_name (string)
  • Output: Biological processes associated with the gene
  • Use: Annotate domain marker genes

HPA_get_protein_interactions_by_gene (gene interactions):

  • Input: gene_name (string)
  • Output: Known protein interaction partners
  • Use: Build domain-specific interaction context

Workflow

  1. For each spatial domain:
    a. Get marker gene list
    b. Run STRING_functional_enrichment on domain genes
    c. Identify top pathways, GO terms
    d. Assign likely cell type(s) based on marker genes:
      • Epithelial: CDH1, EPCAM, KRT18, KRT19
      • Mesenchymal/Fibroblast: VIM, COL1A1, COL3A1, FAP, ACTA2
      • Immune T cell: CD3E, CD3D, CD4, CD8A, CD8B
      • Immune B cell: CD19, MS4A1 (CD20), CD79A
      • Macrophage: CD68, CD163, CSF1R
      • Endothelial: PECAM1, VWF, CDH5
      • Neuronal: SNAP25, SYP, MAP2, NEFL
      • Hepatocyte: ALB, HNF4A, CYP3A4
    e. Generate biological interpretation narrative
  2. Compare domains:
    • Differential pathways
    • Unique vs shared genes
    • Disease-relevant vs homeostatic regions
    • Transition zones (shared genes between adjacent domains)
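
The "unique vs shared genes" comparison can be sketched with set operations on the `spatial_domains` dict. `compare_domains` is a hypothetical helper; because the input carries no adjacency information, shared genes are reported for every pair of domains, and deciding which pairs are physically adjacent is left to the analyst.

```python
from itertools import combinations


def compare_domains(spatial_domains):
    """Return (genes unique to each domain, genes shared by each pair of domains)."""
    sets = {name: set(genes) for name, genes in spatial_domains.items()}
    unique = {}
    for name, genes in sets.items():
        # Union of every other domain's genes.
        others = set().union(*(g for n, g in sets.items() if n != name))
        unique[name] = genes - others
    pairwise_shared = {
        (a, b): sets[a] & sets[b]
        for a, b in combinations(sets, 2)
        if sets[a] & sets[b]
    }
    return unique, pairwise_shared
```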

Cell Type Assignment Rules

When user does not provide cell type annotations, infer from marker genes:

  • Check each gene against known cell type markers
  • Use HPA tissue/cell type expression data for validation
  • Report confidence level (high: 3+ markers match, medium: 2 markers, low: 1 marker)
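
A minimal sketch of these rules, using the marker lists from the Phase 3 workflow and the confidence thresholds above. `infer_cell_types` is a hypothetical name, and the HPA validation step is not included.

```python
# Marker sets copied from the Phase 3 workflow (gene symbols only).
CELL_TYPE_MARKERS = {
    "Epithelial": {"CDH1", "EPCAM", "KRT18", "KRT19"},
    "Mesenchymal/Fibroblast": {"VIM", "COL1A1", "COL3A1", "FAP", "ACTA2"},
    "Immune T cell": {"CD3E", "CD3D", "CD4", "CD8A", "CD8B"},
    "Immune B cell": {"CD19", "MS4A1", "CD79A"},
    "Macrophage": {"CD68", "CD163", "CSF1R"},
    "Endothelial": {"PECAM1", "VWF", "CDH5"},
    "Neuronal": {"SNAP25", "SYP", "MAP2", "NEFL"},
    "Hepatocyte": {"ALB", "HNF4A", "CYP3A4"},
}


def infer_cell_types(domain_genes):
    """Map each matching cell type to (matched markers, confidence)."""
    genes = {gene.upper() for gene in domain_genes}
    calls = {}
    for cell_type, markers in CELL_TYPE_MARKERS.items():
        hits = sorted(genes & markers)
        if not hits:
            continue
        # high: 3+ markers, medium: 2 markers, low: 1 marker
        confidence = "high" if len(hits) >= 3 else "medium" if len(hits) == 2 else "low"
        calls[cell_type] = (hits, confidence)
    return calls
```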

Phase 4: Cell-Cell Interaction Inference

Objective: Predict cell-cell communication from spatial gene expression patterns.

Tools Used

STRING_get_interaction_partners (PPI network):

  • Input: protein_ids (array), species (int, 9606), limit (int), confidence_score (float, 0.7)
  • Output: {status: 'success', data: [{preferredName_A, preferredName_B, score, nscore, fscore, pscore, ascore, escore, dscore, tscore}]}
  • Use: Find protein-protein interactions among SVGs
  • Score types: nscore=neighborhood, fscore=fusion, pscore=phylogenetic, ascore=coexpression, escore=experimental, dscore=database, tscore=textmining

STRING_get_protein_interactions (pairwise interactions):

  • Input: protein_ids (array), species (int, 9606)
  • Output: Interaction data between specified proteins
  • Use: Get interactions within a specific gene set

intact_search_interactions (IntAct database):

  • Input: query (string), max (int)
  • Output: Interaction data from IntAct
  • Use: Complement STRING with IntAct interactions

Reactome_get_interactor (Reactome interactions):

  • Input: Protein/gene identifier
  • Output: Reactome interaction data
  • Use: Pathway-level interaction context

DGIdb_get_drug_gene_interactions (drug-gene interactions):

  • Input: genes (array of strings)
  • Output: Drug-gene interaction data
  • Use: Identify druggable interaction nodes

Ligand-Receptor Analysis

Known ligand-receptor pairs to check in SVG list:

  • Growth factors: EGF-EGFR, HGF-MET, VEGFA-KDR, FGF-FGFR, PDGF-PDGFRA/B
  • Cytokines: TNF-TNFRSF1A/B, IL6-IL6R, IFNG-IFNGR1/2, TGFB1-TGFBR1/2
  • Chemokines: CXCL12-CXCR4, CCL2-CCR2, CXCL10-CXCR3
  • Immune checkpoints: CD274(PD-L1)-PDCD1(PD-1), CD80/CD86-CTLA4, LGALS9-HAVCR2(TIM-3)
  • Notch signaling: DLL1/3/4-NOTCH1/2/3/4, JAG1/2-NOTCH1/2
  • Wnt signaling: WNT ligands-FZD receptors
  • Adhesion: CDH1-CDH1 (homotypic), ITGA/B integrins-ECM
  • Hedgehog: SHH-PTCH1
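
Checking the gene list for these pairs can be sketched as a lookup against the domain assignments. `KNOWN_PAIRS` below contains only those pairs from the list above that are written with a single, explicit gene symbol on each side; it is not a complete ligand-receptor resource, and `find_lr_pairs` is a hypothetical helper.

```python
# Subset of the pairs listed above that use one explicit gene symbol per side.
KNOWN_PAIRS = [
    ("EGF", "EGFR"), ("HGF", "MET"), ("IL6", "IL6R"),
    ("CXCL12", "CXCR4"), ("CCL2", "CCR2"), ("CXCL10", "CXCR3"),
    ("CD274", "PDCD1"), ("LGALS9", "HAVCR2"), ("SHH", "PTCH1"),
]


def find_lr_pairs(spatial_domains):
    """List known pairs whose ligand and receptor both appear in the domain marker genes."""
    gene_to_domains = {}
    for domain, genes in spatial_domains.items():
        for gene in genes:
            gene_to_domains.setdefault(gene, set()).add(domain)
    found = []
    for ligand, receptor in KNOWN_PAIRS:
        for ligand_domain in sorted(gene_to_domains.get(ligand, ())):
            for receptor_domain in sorted(gene_to_domains.get(receptor, ())):
                found.append({
                    "ligand": ligand,
                    "receptor": receptor,
                    "ligand_domain": ligand_domain,
                    "receptor_domain": receptor_domain,
                    # Different domains suggest potential cross-domain signaling.
                    "cross_domain": ligand_domain != receptor_domain,
                })
    return found
```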

Workflow

  1. Run STRING_get_interaction_partners on all SVGs
    • Filter interactions with score > 0.7
    • Identify hub genes (most connections)
  2. Check for known ligand-receptor pairs in gene list
    • Cross-reference with spatial domain assignments
    • Identify potential cross-domain signaling
  3. Build interaction network:
    • Intra-domain interactions (within same spatial region)
    • Inter-domain interactions (between different regions)
    • Identify signaling axes (e.g., tumor-stroma, immune-tumor)
  4. Map interactions to Reactome signaling pathways
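
Step 1 (score filter and hub identification) can be sketched as below, assuming rows shaped like the `data` list of STRING_get_interaction_partners (`preferredName_A`, `preferredName_B`, `score`). `hub_genes` is a hypothetical helper; "hub" is taken to mean highest degree after filtering.

```python
from collections import Counter


def hub_genes(rows, min_score=0.7, top_n=5):
    """Count distinct partners per gene among interactions with score > min_score."""
    degree = Counter()
    seen_edges = set()
    for row in rows:
        if row["score"] <= min_score:
            continue
        edge = frozenset((row["preferredName_A"], row["preferredName_B"]))
        # Skip self-interactions and edges already counted in the other direction.
        if len(edge) < 2 or edge in seen_edges:
            continue
        seen_edges.add(edge)
        degree.update(edge)
    return degree.most_common(top_n)
```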

Phase 5: Disease & Therapeutic Context

Objective: Connect spatial findings to disease mechanisms and identify druggable targets.

Tools Used

OpenTargets_get_associated_targets_by_disease_efoId (disease genes):

  • Input: efoId (string), size (int)
  • Output: {data: {disease: {associatedTargets: {count, rows: [{target: {id, approvedSymbol}, score}]}}}}
  • Use: Get disease-associated genes, overlap with SVGs

OpenTargets_get_target_tractability_by_ensemblID (druggability):

  • Input: ensemblId (string)
  • Output: Tractability data (small molecule, antibody, other modalities)
  • Use: Assess if spatial targets are druggable

OpenTargets_get_associated_drugs_by_target_ensemblID (drugs for target):

  • Input: ensemblId (string), size (int)
  • Output: Drug data for the target
  • Use: Find approved/clinical drugs targeting spatial genes

OpenTargets_get_drug_mechanisms_of_action_by_chemblId (drug mechanism):

  • Input: chemblId (string)
  • Output: Mechanism of action data
  • Use: Understand how drugs act on spatial targets

OpenTargets_target_disease_evidence (evidence linking target to disease):

  • Input: ensemblId (string), efoId (string)
  • Output: Evidence items linking target to disease
  • Use: Specific evidence for each spatial gene in disease

clinical_trials_search (clinical trials):

  • Input: action = "search_studies", condition (string), intervention (string), limit (int)
  • Output: {total_count, studies: [{nctId, title, status, conditions}]}
  • Use: Find clinical trials for spatial targets
  • NOTE: action MUST be "search_studies"

DGIdb_get_gene_druggability (druggability categories):

  • Input: genes (array of strings)
  • Output: {data: {genes: {nodes: [{name, geneCategories: [{name}]}]}}}
  • Use: Classify genes as druggable, kinase, GPCR, etc.

civic_search_genes (CIViC cancer evidence, if cancer):

  • Input: none by gene name (the tool does not filter by name)
  • Output: Gene list from CIViC
  • Use: Check if SVGs have CIViC clinical evidence by matching SVG symbols against the returned gene list

Workflow

  1. Disease gene overlap (if disease context provided):
    a. Get disease-associated targets from OpenTargets
    b. Intersect with SVGs
    c. For overlapping genes, get specific evidence
  2. Druggable target identification:
    a. Run DGIdb_get_gene_druggability on all SVGs
    b. For druggable genes, check OpenTargets tractability
    c. Get approved drugs for druggable spatial targets
  3. Clinical trials:
    a. Search for trials targeting spatial genes in the disease context
    b. Prioritize trials for genes in disease-enriched spatial domains
  4. Cancer-specific (if cancer):
    a. Check CIViC for clinical evidence
    b. Get mutation prevalence from cBioPortal (if specific mutations known)
    c. Check immune checkpoint genes in spatial data
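
The intersection in step 1 can be sketched against the response shape listed for OpenTargets_get_associated_targets_by_disease_efoId. `disease_gene_overlap` is a hypothetical helper, and it assumes the response has already been retrieved.

```python
def disease_gene_overlap(response, svgs):
    """Return (symbol, Ensembl ID, score) for disease targets that are also SVGs, best first."""
    rows = response["data"]["disease"]["associatedTargets"]["rows"]
    wanted = {gene.upper() for gene in svgs}
    overlap = [
        (row["target"]["approvedSymbol"], row["target"]["id"], row["score"])
        for row in rows
        if row["target"]["approvedSymbol"].upper() in wanted
    ]
    return sorted(overlap, key=lambda item: item[2], reverse=True)
```

The Ensembl ID is kept in each tuple because the tractability and drug tools in this phase take `ensemblId` rather than a gene symbol.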

Phase 6: Multi-Modal Integration

Objective: Integrate protein, RNA, and metabolite spatial data when available.

Tools Used

HPA_get_subcellular_location (protein localization):

  • Input: gene_name (string)
  • Output: {gene_name, main_locations, additional_locations, location_summary}
  • Use: Compare mRNA spatial pattern with protein subcellular location

HPA_get_rna_expression_in_specific_tissues (tissue RNA):

  • Input: ensembl_id (string), tissue_name (string)
  • Output: Expression data for specific tissue
  • Use: Validate spatial expression against bulk tissue data

Reactome_map_uniprot_to_pathways (metabolic pathways):

  • Input: id (string) - UniProt accession
  • Output: List of pathways
  • Use: Map genes to metabolic pathways for metabolomics integration

kegg_get_pathway_info (KEGG pathway details):

  • Input: pathway_id (string) - KEGG pathway ID
  • Output: Pathway information including metabolites
  • Use: Link spatial genes to metabolic pathways and metabolites

Workflow

  1. RNA-Protein concordance (if protein data provided):
    a. For each gene with both RNA and protein data:
      • Compare spatial RNA pattern with protein detection
      • Check HPA for known post-transcriptional regulation
      • Note concordant (expected) vs discordant (interesting) patterns
  2. Subcellular context:
    a. Map spatial RNA localization to protein subcellular location (HPA)
    b. Secreted proteins -> likely paracrine signaling
    c. Membrane proteins -> cell surface markers
    d. Nuclear proteins -> transcription factors
  3. Metabolic integration (if metabolomics available):
    a. Map genes to metabolic pathways (Reactome, KEGG)
    b. Link detected metabolites to enzyme-encoding genes
    c. Identify spatial metabolic heterogeneity
    d. Check for known metabolic zonation patterns
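
The subcellular context rules in step 2 can be sketched as keyword matching on the `main_locations` list returned by HPA_get_subcellular_location. The keywords ("secreted", "plasma membrane", "nucle") are assumptions about how location names are worded, and `interpret_locations` is a hypothetical helper; treat its notes as prompts for interpretation, not conclusions.

```python
def interpret_locations(main_locations):
    """Turn location names into the interpretation hints from the subcellular context step."""
    text = " ".join(main_locations).lower()
    notes = []
    if "secreted" in text:
        notes.append("secreted: likely paracrine signaling")
    # Match "plasma membrane" rather than "membrane" to avoid nuclear membrane hits.
    if "plasma membrane" in text:
        notes.append("membrane: cell surface marker")
    if "nucle" in text:
        notes.append("nuclear: possible transcription factor")
    return notes
```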

Phase 7: Immune Microenvironment (Cancer/Inflammation)

Objective: Characterize immune cell composition and checkpoint expression in spatial context.

Conditions for Activation

Only execute if:

  • Disease context is cancer, autoimmune, or inflammatory
  • SVGs include immune markers (CD3E, CD8A, CD68, CD163, etc.)
  • User specifically asks about immune patterns
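
A sketch of the activation check. It assumes that any one of the three conditions is enough to run the phase, which the list above does not state explicitly. `should_run_immune_phase` is a hypothetical helper, the marker set contains only the examples named above, and the disease category is supplied by the caller.

```python
# Only the example markers named in the conditions; extend from the marker reference as needed.
IMMUNE_MARKER_EXAMPLES = {"CD3E", "CD8A", "CD68", "CD163"}
IMMUNE_DISEASE_CATEGORIES = {"cancer", "autoimmune", "inflammatory"}


def should_run_immune_phase(svgs, disease_category=None, user_asked=False):
    """Assumption: the phase runs if ANY of the three conditions holds."""
    has_markers = bool(IMMUNE_MARKER_EXAMPLES & {gene.upper() for gene in svgs})
    return disease_category in IMMUNE_DISEASE_CATEGORIES or has_markers or user_asked
```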

Tools Used

STRING_functional_enrichment (immune pathway enrichment):

  • Applied to immune-relevant SVGs
  • Filter for immune-related GO terms and pathways

OpenTargets_get_target_tractability_by_ensemblID (checkpoint druggability):

  • Applied to immune checkpoint genes
  • Check for approved immunotherapies

iedb_search_epitopes (epitope data):

  • Input: organism_name (string), source_antigen_name (string)
  • Output: {status, data, count}
  • Use: Check if spatial antigens have known epitopes

Immune Cell Markers Reference

| Cell Type | Key Markers | Extended Markers |
|-----------|-------------|------------------|
| CD8+ T cell | CD8A, CD8B | GZMA, GZMB, PRF1, IFNG |
| CD4+ T cell | CD4 | IL2, IL4, IL17A, FOXP3 (Treg) |
| Regulatory T cell | FOXP3, IL2RA | CTLA4, TIGIT |
| B cell | CD19, MS4A1, CD79A | IGHG1, IGHM |
| Plasma cell | SDC1 (CD138), XBP1 | IGHG1, MZB1 |
| M1 Macrophage | CD68, NOS2, TNF | IL1B, CXCL10 |
| M2 Macrophage | CD68, CD163, MRC1 | ARG1, IL10 |
| Dendritic cell | ITGAX (CD11c), HLA-DRA | CD80, CD86 |
| NK cell | NCAM1 (CD56), NKG7 | GNLY, KLRD1 |
| Neutrophil | FCGR3B, CXCR2 | S100A8, S100A9 |
| Mast cell | KIT, TPSAB1 | CPA3, HDC |

Immune Checkpoint Reference

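| Checkpoint | Gene | Ligand | Therapeutic Antibody |
|------------|------|--------|----------------------|
| PD-1/PD-L1 | PDCD1/CD274 | CD274, PDCD1LG2 | Pembrolizumab, Nivolumab, Atezolizumab |
| CTLA-4 | CTLA4 | CD80, CD86 | Ipilimumab |
| TIM-3 | HAVCR2 | LGALS9 | Sabatolimab |
| LAG-3 | LAG3 | HLA class II | Relatlimab |
| TIGIT | TIGIT | PVR, NECTIN2 (PVRL2) | Tiragolumab |
| VISTA | VSIR | SELPLG (PSGL-1) | - |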

Workflow

  1. Identify immune-related SVGs from marker reference
  2. Classify immune cell types present per spatial domain
  3. Check immune checkpoint expression
  4. Assess immune infiltration patterns:
    • Hot (T cell infiltrated) vs Cold (immune desert) vs Excluded
  5. Identify potential immunotherapy targets
  6. Check for tertiary lymphoid structures (B cell + T cell clusters)
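Steps 4 and 6 can be approximated from gene lists alone, with the caveat that marker presence is a crude proxy for infiltration. The sketch below assumes the caller already knows which domain is tumor and which is the surrounding margin or stroma; the function names and the presence/absence rule are hypothetical simplifications, not a validated classifier.

```python
T_CELL_MARKERS = {"CD3E", "CD8A", "CD8B", "CD4"}
B_CELL_MARKERS = {"CD19", "MS4A1", "CD79A"}


def infiltration_pattern(tumor_genes, surrounding_genes):
    """Label a tumor as hot, excluded, or cold from T cell marker presence."""
    in_tumor = bool(T_CELL_MARKERS & {g.upper() for g in tumor_genes})
    nearby = bool(T_CELL_MARKERS & {g.upper() for g in surrounding_genes})
    if in_tumor:
        return "hot"        # T cell markers inside the tumor domain
    if nearby:
        return "excluded"   # T cell markers only in margin/stroma
    return "cold"           # no T cell markers in either domain


def tls_candidate(domain_genes):
    """Flag a domain where B cell and T cell markers co-occur."""
    genes = {g.upper() for g in domain_genes}
    return bool(genes & T_CELL_MARKERS) and bool(genes & B_CELL_MARKERS)
```

A `tls_candidate` hit is only a prompt for follow-up: confirming a tertiary lymphoid structure needs spatial clustering or histology, which this skill does not perform.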

Phase 8: Literature & Validation Context

Objective: Provide literature evidence for spatial findings and suggest validation experiments.

Tools Used

PubMed_search_articles (literature search):

  • Input: query (string), max_results (int)
  • Output: List of [{pmid, title, authors, journal, pub_date, doi}]
  • Use: Find published evidence for spatial patterns

openalex_literature_search (broader literature):

  • Input: query (string), per_page (int)
  • Output: List of works with titles, DOIs, abstracts
  • Use: Complement PubMed with preprints and broader coverage

Literature Search Strategy

  1. Tissue + spatial: "{tissue} spatial transcriptomics" - e.g., "liver spatial transcriptomics"
  2. Disease + spatial: "{disease} spatial omics" - e.g., "breast cancer spatial omics"
  3. Gene + tissue: "{top_gene} {tissue} expression" for key SVGs
  4. Zonation (if relevant): "{tissue} zonation gene expression"
  5. Technology: "{technology} {tissue}" - e.g., "Visium breast cancer"
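The five query templates can be generated programmatically so that every report uses the same search strings. This is a minimal sketch; the function name and its parameters are hypothetical, and the returned strings are meant to be passed as the `query` argument of the literature tools.

```python
def build_literature_queries(tissue, disease=None, top_genes=(),
                             technology=None, zonation=False):
    """Build search strings following the five-part strategy, in order."""
    queries = [f"{tissue} spatial transcriptomics"]           # 1. tissue + spatial
    if disease:
        queries.append(f"{disease} spatial omics")            # 2. disease + spatial
    queries.extend(f"{gene} {tissue} expression"              # 3. gene + tissue
                   for gene in top_genes)
    if zonation:
        queries.append(f"{tissue} zonation gene expression")  # 4. zonation
    if technology:
        queries.append(f"{technology} {tissue}")              # 5. technology
    return queries
```

For example, `build_literature_queries("liver", top_genes=["CYP2E1"], zonation=True)` yields the tissue query, one gene query, and the zonation query, in that order.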

Validation Recommendations Template

| Priority | Target | Method | Rationale | Feasibility |
|---|---|---|---|---|
| High | Key SVG | smFISH / RNAscope | Validate spatial pattern at single-molecule level | Medium |
| High | Druggable target | IHC on serial sections | Confirm protein expression in spatial domain | High |
| High | Ligand-receptor pair | Proximity ligation assay (PLA) | Confirm physical interaction at tissue level | Medium |
| Medium | Domain markers | Multiplexed IF (CODEX/IBEX) | Validate multiple markers simultaneously | Low-Medium |
| Medium | Pathway | Spatial metabolomics (MALDI/DESI) | Confirm metabolic pathway activity | Low |
| Low | Novel interaction | Co-culture + conditioned media | Functional validation of predicted interaction | Medium |

Workflow

  1. Search PubMed for tissue + disease + spatial transcriptomics
  2. Search for known spatial patterns in the tissue type
  3. Cross-reference findings with published spatial atlas data
  4. Generate validation recommendations based on:
    • Novelty of finding (novel patterns need more validation)
    • Clinical relevance (druggable targets prioritized)
    • Technical feasibility
  5. Cite relevant methodology papers for each validation approach

Tool Parameter Reference (CRITICAL)

Verified Parameter Names

| Tool | Parameter | CORRECT | Common MISTAKE | Notes |
|---|---|---|---|---|
| MyGene_query_genes | query | `query` | `q` | Filter results by symbol field |
| STRING_functional_enrichment | identifiers | `protein_ids` (array) | `identifiers` | Also needs `species=9606` |
| STRING_get_interaction_partners | identifiers | `protein_ids` (array) | `identifiers` | `limit`, `confidence_score` optional |
| ReactomeAnalysis_pathway_enrichment | genes | `identifiers` (string) | Array | SPACE-SEPARATED string, NOT array |
| HPA_get_subcellular_location | gene | `gene_name` | `ensembl_id` | Uses gene symbol |
| HPA_get_cancer_prognostics_by_gene | gene | `ensembl_id` | `gene_name` | Uses Ensembl ID, NOT symbol |
| HPA_get_rna_expression_by_source | params | `gene_name`, `source_type`, `source_name` | - | ALL 3 required |
| HPA_get_rna_expression_in_specific_tissues | gene | `ensembl_id` | `gene_name` | Uses Ensembl ID |
| OpenTargets_get_target_tractability_by_ensemblID | target | `ensemblId` | `ensemblID` | camelCase |
| OpenTargets_get_associated_drugs_by_target_ensemblID | target | `ensemblId`, `size` | - | Both REQUIRED |
| OpenTargets_get_associated_targets_by_disease_efoId | disease | `efoId` | `diseaseId` | Returns `{data: {disease: {associatedTargets}}}` |
| DGIdb_get_gene_druggability | genes | `genes` (array) | `gene_name` | Array of strings |
| DGIdb_get_drug_gene_interactions | genes | `genes` (array) | `gene_name` | Array of strings |
| clinical_trials_search | action | `action='search_studies'` | Missing action | `action` is REQUIRED |
| ensembl_lookup_gene | species | `species='homo_sapiens'` | No species | REQUIRED parameter |
| GTEx tools | operation | `operation` (SOAP) | Missing | All GTEx tools need `operation` parameter |
| HPA_get_comprehensive_gene_details_by_ensembl_id | all params | ALL 5 required: `ensembl_id`, `include_isoforms`, `include_images`, `include_antibodies`, `include_expression` | Missing booleans | Set booleans to False except expression |
| GTEx tools | gencode | `gencode_id` (array) | `gene_id` | Requires versioned GENCODE ID |
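Small builder functions help avoid the mistakes listed in the table, because the correct parameter names are written once. The sketch below only constructs argument dictionaries; the builder names are hypothetical, the default `size=10` is an assumption, and how the dictionaries are passed to ToolUniverse is deliberately left out.

```python
def string_enrichment_args(genes, species=9606):
    """STRING tools take `protein_ids` as an array plus a species taxon ID."""
    return {"protein_ids": list(genes), "species": species}


def reactome_enrichment_args(genes):
    """Reactome enrichment takes ONE space-separated string, not an array."""
    return {"identifiers": " ".join(genes)}


def opentargets_drugs_args(ensembl_id, size=10):
    """Both `ensemblId` (camelCase) and `size` are required; size=10 is an assumed default."""
    return {"ensemblId": ensembl_id, "size": size}


def hpa_comprehensive_args(ensembl_id):
    """All five parameters are required; only expression is switched on."""
    return {
        "ensembl_id": ensembl_id,
        "include_isoforms": False,
        "include_images": False,
        "include_antibodies": False,
        "include_expression": True,
    }
```

Keeping these in one module means a renamed parameter only has to be corrected in a single place.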

Response Format Reference

| Tool | Response Format | Key Fields |
|---|---|---|
| STRING_functional_enrichment | `{status, data: [{category, term, description, p_value, fdr, inputGenes}]}` | Filter by FDR < 0.05 |
| ReactomeAnalysis_pathway_enrichment | `{data: {pathways: [{pathway_id, name, p_value, fdr, entities_found, entities_total}]}}` | Top 20 returned |
| STRING_get_interaction_partners | `{status, data: [{preferredName_A, preferredName_B, score}]}` | Score > 0.7 for high confidence |
| MyGene_query_genes | `{hits: [{_id, symbol, name, ensembl: {gene}, entrezgene}]}` | Filter by exact symbol match |
| HPA_get_subcellular_location | `{gene_name, main_locations: [], additional_locations: [], location_summary}` | Direct dict response |
| OpenTargets_get_target_tractability_by_ensemblID | `{data: {target: {id, tractability: [{label, modality, value}]}}}` | Check value=true |
| DGIdb_get_gene_druggability | `{data: {genes: {nodes: [{name, geneCategories: [{name}]}]}}}` | GraphQL response |
| PubMed_search_articles | Plain list of `[{pmid, title, authors, journal, pub_date}]` | No data wrapper |
| clinical_trials_search | `{total_count, studies: [{nctId, title, status, conditions}]}` | total_count can be None |
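The notes in the last column translate into a few defensive parsing helpers. This sketch assumes the responses arrive as plain Python dicts shaped as in the table; the helper names are hypothetical, and falling back to `len(studies)` when `total_count` is None is one possible choice rather than documented tool behaviour.

```python
def significant_terms(string_response, fdr_cutoff=0.05):
    """Keep STRING enrichment rows with FDR below the cutoff."""
    rows = string_response.get("data") or []
    return [row for row in rows
            if row.get("fdr") is not None and row["fdr"] < fdr_cutoff]


def exact_symbol_hit(mygene_response, symbol):
    """Return the first MyGene hit whose symbol matches exactly (case-insensitive)."""
    for hit in mygene_response.get("hits", []):
        if (hit.get("symbol") or "").upper() == symbol.upper():
            return hit
    return None


def trial_count(trials_response):
    """total_count can be None; fall back to the number of returned studies."""
    total = trials_response.get("total_count")
    if total is not None:
        return total
    return len(trials_response.get("studies") or [])
```

The exact-match filter matters for gene families: a query for one symbol can return hits for several related genes, and only the exact match should be carried forward.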

Fallback Strategies

Pathway Enrichment

  • Primary: STRING_functional_enrichment (most comprehensive, one call)
  • Fallback: ReactomeAnalysis_pathway_enrichment (Reactome-specific)
  • Default: Individual gene GO annotations (GO_get_annotations_for_gene)

Tissue Expression

  • Primary: HPA_get_rna_expression_by_source
  • Fallback: HPA_get_comprehensive_gene_details_by_ensembl_id
  • Default: Note "tissue expression data unavailable"

Disease Association

  • Primary: OpenTargets_get_associated_targets_by_disease_efoId
  • Fallback: OpenTargets_target_disease_evidence (per gene)
  • Default: Skip disease section if no disease context

Drug Information

  • Primary: OpenTargets_get_associated_drugs_by_target_ensemblID
  • Fallback: DGIdb_get_drug_gene_interactions
  • Default: Note "no approved drugs identified"

Literature

  • Primary: PubMed_search_articles
  • Fallback: openalex_literature_search
  • Default: Note "no spatial-specific literature found"
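Each Primary / Fallback / Default chain follows the same shape, so it can be expressed once as a generic helper. This is a sketch, assuming each tool call is wrapped in a zero-argument callable supplied by the caller; the helper name and the returned dictionary layout are hypothetical.

```python
def first_successful(steps, default_note):
    """Try (name, callable) steps in order; return the first non-empty result.

    A step counts as failed if it raises or returns an empty/falsy value.
    If every step fails, the default note is returned with source=None.
    """
    for name, call in steps:
        try:
            result = call()
        except Exception:
            continue  # tool unavailable or errored: move on to the next step
        if result:
            return {"source": name, "result": result}
    return {"source": None, "result": default_note}
```

Recording `source` alongside the result keeps the report able to cite which tool actually produced each statement, as the key principles require.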

Common Use Cases

Use Case 1: Cancer Spatial Heterogeneity

Input: Visium data from breast cancer with 5 spatial domains (tumor core, tumor margin, stroma, immune infiltrate, normal tissue) and 200 SVGs.

Analysis focus:

  • Tumor-specific pathways (proliferation, DNA repair)
  • Immune infiltration patterns (hot vs cold)
  • Tumor-stroma interactions (CAF signaling)
  • Druggable targets in tumor core
  • Immune checkpoint expression patterns
  • Prognostic genes per domain

Use Case 2: Brain Tissue Zonation

Input: MERFISH data from hippocampus with cell-type specific genes and neuronal subtype markers.

Analysis focus:

  • Neuronal subtype characterization
  • Synaptic signaling pathways
  • Neurotransmitter receptor distribution
  • Known hippocampal zonation patterns (CA1, CA3, DG)
  • Neurodegenerative disease gene overlap

Use Case 3: Liver Metabolic Zonation

Input: Spatial transcriptomics of liver with periportal vs pericentral gene gradients.

Analysis focus:

  • Metabolic enzyme distribution (CYP450, gluconeogenesis, lipogenesis)
  • Wnt signaling gradient (known zonation regulator)
  • Oxygen gradient-responsive genes
  • Drug metabolism enzyme spatial patterns
  • Liver disease gene overlap

Use Case 4: Tumor-Immune Interface

Input: DBiTplus data from melanoma with spatial protein + RNA data showing tumor-immune boundary.

Analysis focus:

  • Immune cell composition at boundary
  • Checkpoint ligand-receptor pairs
  • Immune exclusion mechanisms
  • Immunotherapy target identification
  • Multi-modal (RNA + protein) concordance

Use Case 5: Developmental Spatial Patterns

Input: Spatial transcriptomics of embryonic tissue with developmental patterning genes.

Analysis focus:

  • Morphogen gradients (Wnt, BMP, FGF, SHH)
  • Transcription factor spatial patterns
  • Cell fate determination genes
  • Developmental signaling pathways
  • Comparison to adult tissue patterns

Use Case 6: Disease Progression Mapping

Input: Spatial data from neurodegenerative tissue showing disease gradient from affected to unaffected regions.

Analysis focus:

  • Disease gene expression gradient
  • Inflammatory response spatial pattern
  • Neuronal loss markers
  • Glial activation patterns
  • Therapeutic window identification

Limitations & Known Issues

Database-Specific

  • Enrichment: enrichr_gene_enrichment_analysis returns connectivity graph (107MB), NOT standard enrichment. Use STRING_functional_enrichment instead
  • GTEx: SOAP-style tools requiring operation parameter; needs versioned GENCODE IDs (e.g., ENSG00000141510.16)
  • HPA: Some tools use gene_name, others use ensembl_id - check parameter reference
  • OpenTargets: Disease IDs use underscore format (MONDO_0007254), not colon
  • cBioPortal_get_cancer_studies: BROKEN - has literal {limit} in URL causing 400 error
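Two of the identifier pitfalls above can be caught before a call is made. The sketch below uses hypothetical helper names; the regular expression assumes human Ensembl gene IDs of the form `ENSG` plus 11 digits followed by a version suffix, as in the example given.

```python
import re

# Versioned GENCODE/Ensembl gene ID, e.g. ENSG00000141510.16
_VERSIONED_GENCODE = re.compile(r"^ENSG\d{11}\.\d+$")


def is_versioned_gencode_id(gene_id):
    """True only if the ID carries a version suffix, as GTEx tools require."""
    return bool(_VERSIONED_GENCODE.match(gene_id))


def to_opentargets_disease_id(disease_id):
    """Convert colon-style IDs (MONDO:0007254) to underscore style (MONDO_0007254)."""
    return disease_id.replace(":", "_", 1)
```

An unversioned ID cannot be fixed by guessing the version number; it has to be looked up, so the check is best used to fail early with a clear message.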

Conceptual

  • No raw spatial data processing: This skill analyzes gene LISTS, not raw spatial matrices (Seurat/Scanpy/squidpy handle raw data)
  • No spatial statistics: Cannot perform Moran's I, spatial autocorrelation, or variogram analysis
  • No image analysis: Cannot process H&E or fluorescence images
  • No deconvolution: Cannot perform cell type deconvolution (use BayesSpace, cell2location, RCTD externally)
  • Ligand-receptor inference: Based on gene co-expression + known pairs, not spatial proximity statistics (use CellChat, NicheNet, COMMOT externally)

Technical

  • Large gene lists: >200 genes may slow STRING queries; batch or sample
  • Response format variability: Always check both dict and list response types
  • Rate limits: STRING and OpenTargets may throttle frequent requests
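The first two technical issues can be handled with small utilities. This is a minimal sketch: the helper names are hypothetical, the batch size of 200 mirrors the threshold mentioned above, and `as_records` assumes list payloads sit under a `data` key when the response is a dict.

```python
def batched(genes, size=200):
    """Split a gene list into chunks of at most `size` genes."""
    genes = list(genes)
    return [genes[i:i + size] for i in range(0, len(genes), size)]


def as_records(response, key="data"):
    """Normalise a dict-or-list tool response into a list of records."""
    if isinstance(response, list):
        return response                      # plain list, e.g. PubMed results
    if isinstance(response, dict):
        payload = response.get(key, response)
        if isinstance(payload, list):
            return payload                   # {"data": [...]} wrapper
        return [payload]                     # single dict response
    return []                                # None or unexpected type
```

Note that enrichment computed per batch is not equivalent to enrichment on the full list, so batching suits per-gene lookups better than set-level statistics.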

Summary

The Spatial Multi-Omics Analysis skill provides:

  1. Gene characterization (ID resolution, function, localization, tissue expression)
  2. Pathway & functional enrichment (STRING, Reactome, GO, KEGG)
  3. Spatial domain characterization (per-domain and cross-domain comparison)
  4. Cell-cell interaction inference (PPI, ligand-receptor, signaling pathways)
  5. Disease & therapeutic context (disease genes, druggable targets, clinical trials)
  6. Multi-modal integration (RNA-protein concordance, metabolic pathways)
  7. Immune microenvironment characterization (cell types, checkpoints, immunotherapy)
  8. Literature context & validation recommendations

  • Outputs: Comprehensive markdown report with Spatial Omics Integration Score (0-100)
  • Best for: Biological interpretation of spatial omics experiments (post-processing after spatial data analysis tools)
  • Uses: 70+ ToolUniverse tools across 9 analysis phases
  • Time: ~10-20 minutes depending on gene list size and analysis scope
