tooluniverse-gwas-snp-interpretation
GWAS SNP Interpretation Skill
Overview
Interpret genetic variants (SNPs) from GWAS by aggregating evidence from multiple sources to provide clinical and biological context.
Use Cases:
- "Interpret rs7903146" (TCF7L2 diabetes variant)
- "What diseases is rs429358 associated with?" (APOE Alzheimer's variant)
- "Clinical significance of rs1801133" (MTHFR variant)
- "Is rs12913832 in any fine-mapped loci?" (Eye color variant)
What It Does
The skill provides a comprehensive interpretation of SNPs by:
- SNP Annotation: Retrieves basic variant information including genomic coordinates, alleles, functional consequence, and mapped genes
- Association Discovery: Finds all GWAS trait/disease associations with statistical significance
- Fine-Mapping Evidence: Identifies credible sets the variant belongs to (fine-mapped causal loci)
- Gene Mapping: Uses Locus-to-Gene (L2G) predictions to identify likely causal genes
- Clinical Summary: Aggregates the evidence into a summary of key traits, fine-mapped loci, and predicted causal genes
Workflow
User Input: rs7903146
↓
[1] SNP Lookup
→ Get location, consequence, MAF
→ gwas_get_snp_by_id
↓
[2] Association Search
→ Find all trait/disease associations
→ gwas_get_associations_for_snp
↓
[3] Fine-Mapping (Optional)
→ Get credible set membership
→ OpenTargets_get_variant_credible_sets
↓
[4] Gene Predictions
→ Extract L2G scores for causal genes
→ (embedded in credible sets)
↓
[5] Clinical Summary
→ Aggregate evidence
→ Identify key traits and genes
↓
Output: Comprehensive Interpretation Report
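The five steps above could be chained roughly as follows. This is a minimal sketch, not the skill's actual implementation: `interpret_snp`, the injected lookup functions, and the dictionary keys are hypothetical, and the real tools (`gwas_get_snp_by_id`, `gwas_get_associations_for_snp`, `OpenTargets_get_variant_credible_sets`) are replaced by offline stand-ins so the example runs without network access.

```python
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


def interpret_snp(
    rs_id: str,
    get_snp: Callable[[str], Record],
    get_associations: Callable[[str], List[Record]],
    get_credible_sets: Optional[Callable[[Record], List[Record]]] = None,
    p_threshold: float = 5e-8,
) -> Record:
    """Chain the five workflow steps using caller-supplied lookup functions."""
    # [1] SNP lookup: location, consequence, MAF
    snp = get_snp(rs_id)
    # [2] Association search, keeping hits below the significance threshold
    associations = [a for a in get_associations(rs_id) if a["p_value"] < p_threshold]
    # [3] Fine-mapping is optional; skip it when no lookup is supplied
    credible_sets = get_credible_sets(snp) if get_credible_sets else []
    # [4] Gene predictions are embedded in each credible set
    genes = sorted(
        {g["gene"] for cs in credible_sets for g in cs.get("predicted_genes", [])}
    )
    # [5] Summary: key traits and genes
    traits = sorted({a["trait"] for a in associations})
    return {
        "snp": snp,
        "associations": associations,
        "credible_sets": credible_sets,
        "key_traits": traits,
        "predicted_genes": genes,
    }


# Offline stand-ins for the real tools (hypothetical, for illustration only).
def fake_snp(rs_id: str) -> Record:
    return {"rs_id": rs_id, "chromosome": "10", "position": 112998590}


def fake_associations(rs_id: str) -> List[Record]:
    return [
        {"trait": "Type 2 diabetes", "p_value": 1.2e-128},
        {"trait": "Hypothetical weak trait", "p_value": 1e-4},
    ]


def fake_credible_sets(snp: Record) -> List[Record]:
    return [
        {"trait": "Renal failure",
         "predicted_genes": [{"gene": "TCF7L2", "score": 0.863}]}
    ]


report = interpret_snp("rs7903146", fake_snp, fake_associations, fake_credible_sets)
print(report["key_traits"], report["predicted_genes"])
```

Passing the lookups in as arguments keeps the orchestration testable and makes the optional fine-mapping step a matter of omitting one argument.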
Data Sources
GWAS Catalog (EMBL-EBI)
- SNP annotations: Functional consequences, mapped genes, population frequencies
- Associations: P-values, effect sizes, study metadata
- Coverage: 350,000+ publications, 670,000+ associations
Open Targets Genetics
- Fine-mapping: Statistical credible sets from SuSiE, FINEMAP methods
- L2G predictions: Machine learning-based gene prioritization
- Colocalization: QTL evidence for causal genes
- Coverage: UK Biobank, FinnGen, and other large cohorts
Input Parameters
Required
- rs_id (str): dbSNP rs identifier
  - Format: "rs" + number (e.g., "rs7903146")
  - Must be a valid rsID in GWAS Catalog
Optional
- include_credible_sets (bool, default=True): Query fine-mapping data
  - True: Complete interpretation (slower, ~10-30s)
  - False: Fast associations only (~2-5s)
- p_threshold (float, default=5e-8): Genome-wide significance threshold
- max_associations (int, default=100): Maximum associations to retrieve
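One way to capture these parameters and their defaults is a small validated container. The sketch below is an assumption, not part of the skill: `InterpretationParams` is a hypothetical name, and the check only covers the "rs" + number format. It cannot tell whether the rsID actually exists in the GWAS Catalog.

```python
import re
from dataclasses import dataclass

# "rs" followed by one or more digits, e.g. "rs7903146"
RS_ID_PATTERN = re.compile(r"rs\d+")


@dataclass(frozen=True)
class InterpretationParams:
    """Hypothetical container mirroring the documented inputs and defaults."""

    rs_id: str
    include_credible_sets: bool = True
    p_threshold: float = 5e-8
    max_associations: int = 100

    def __post_init__(self) -> None:
        # Format check only; catalog membership needs a real lookup.
        if not RS_ID_PATTERN.fullmatch(self.rs_id):
            raise ValueError(f"not a dbSNP rs identifier: {self.rs_id!r}")
        if not 0 < self.p_threshold <= 1:
            raise ValueError("p_threshold must be in (0, 1]")
        if self.max_associations < 1:
            raise ValueError("max_associations must be at least 1")


params = InterpretationParams("rs7903146", include_credible_sets=False)
print(params)
```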
Output Format
Returns SNPInterpretationReport containing:
1. SNP Basic Info
{
    'rs_id': 'rs7903146',
    'chromosome': '10',
    'position': 112998590,
    'ref_allele': 'C',
    'alt_allele': 'T',
    'consequence': 'intron_variant',
    'mapped_genes': ['TCF7L2'],
    'maf': 0.293
}
2. Trait Associations
[
    {
        'trait': 'Type 2 diabetes',
        'p_value': 1.2e-128,
        'beta': '0.28 unit increase',
        'study_id': 'GCST010555',
        'pubmed_id': '33536258',
        'effect_allele': 'T'
    },
    ...
]
3. Credible Sets (Fine-Mapping)
[
    {
        'study_id': 'GCST90476118',
        'trait': 'Renal failure',
        'finemapping_method': 'SuSiE-inf',
        'p_value': 3.5e-42,
        'predicted_genes': [
            {'gene': 'TCF7L2', 'score': 0.863}
        ],
        'region': '10:112950000-113050000'
    },
    ...
]
4. Clinical Significance
Genome-wide significant associations with 100 traits/diseases:
- Type 2 diabetes
- Diabetic retinopathy
- HbA1c levels
...
Identified in 20 fine-mapped loci.
Predicted causal genes: TCF7L2
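A summary in this shape could be assembled from the association and credible-set records. The following is a sketch under assumptions: `format_clinical_summary` and its `max_traits` parameter are hypothetical, and the p-values other than the Type 2 diabetes one are made up for illustration.

```python
from typing import Any, Dict, List


def format_clinical_summary(
    associations: List[Dict[str, Any]],
    credible_sets: List[Dict[str, Any]],
    p_threshold: float = 5e-8,
    max_traits: int = 3,
) -> str:
    """Render the text summary from association and credible-set records."""
    significant = [a for a in associations if a["p_value"] < p_threshold]
    # dict.fromkeys removes duplicates while preserving first-seen order
    traits = list(dict.fromkeys(a["trait"] for a in significant))
    genes = list(
        dict.fromkeys(
            g["gene"] for cs in credible_sets for g in cs.get("predicted_genes", [])
        )
    )
    lines = [f"Genome-wide significant associations with {len(traits)} traits/diseases:"]
    lines += [f"- {t}" for t in traits[:max_traits]]
    if len(traits) > max_traits:
        lines.append("...")
    lines.append(f"Identified in {len(credible_sets)} fine-mapped loci.")
    lines.append("Predicted causal genes: " + (", ".join(genes) or "none"))
    return "\n".join(lines)


example_associations = [
    {"trait": "Type 2 diabetes", "p_value": 1.2e-128},
    {"trait": "HbA1c levels", "p_value": 3e-20},      # illustrative p-value
    {"trait": "Type 2 diabetes", "p_value": 1e-50},   # duplicate trait, other study
    {"trait": "Hypothetical weak trait", "p_value": 1e-3},
]
example_credible_sets = [
    {"predicted_genes": [{"gene": "TCF7L2", "score": 0.863}]},
    {"predicted_genes": [{"gene": "TCF7L2", "score": 0.71}]},
]
summary = format_clinical_summary(example_associations, example_credible_sets)
print(summary)
```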
Example Usage
See QUICK_START.md for platform-specific examples.
Tools Used
GWAS Catalog Tools
- gwas_get_snp_by_id: Get SNP annotation
- gwas_get_associations_for_snp: Get all trait associations
Open Targets Tools
- OpenTargets_get_variant_info: Get variant details with population frequencies
- OpenTargets_get_variant_credible_sets: Get fine-mapping credible sets with L2G
Interpretation Guide
P-value Significance Levels
- p < 5e-8: Genome-wide significant (strong evidence)
- p < 5e-6: Suggestive (moderate evidence)
- p < 0.05: Nominal (weak evidence)
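These tiers translate directly into a small classifier. In the sketch below the function name and the "not significant" label for p >= 0.05 are assumptions; the three thresholds are the ones listed above.

```python
def classify_p_value(p: float) -> str:
    """Map a p-value to the evidence tiers listed above (strict '<' bounds)."""
    if not 0 <= p <= 1:
        raise ValueError("p-value must be between 0 and 1")
    if p < 5e-8:
        return "genome-wide significant"
    if p < 5e-6:
        return "suggestive"
    if p < 0.05:
        return "nominal"
    return "not significant"  # label assumed; the guide does not name this tier


for p in (1.2e-128, 1e-7, 0.01, 0.5):
    print(p, classify_p_value(p))
```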
L2G Score Interpretation
- > 0.5: High confidence causal gene
- 0.1-0.5: Moderate confidence
- < 0.1: Low confidence
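Applied to the `predicted_genes` entries of a credible set, the score bands might be used like this. It is a sketch only: `classify_l2g` and `top_gene` are hypothetical helpers, and scores of exactly 0.1 and 0.5 are assigned to the moderate band, which is one reading of the ranges above.

```python
from typing import Any, Dict, Optional, Tuple


def classify_l2g(score: float) -> str:
    """Map an L2G score in [0, 1] to a confidence band."""
    if not 0 <= score <= 1:
        raise ValueError("L2G score must be between 0 and 1")
    if score > 0.5:
        return "high"
    if score >= 0.1:
        return "moderate"
    return "low"


def top_gene(credible_set: Dict[str, Any]) -> Optional[Tuple[str, float, str]]:
    """Return (gene, score, band) for the best-scoring gene, or None if absent."""
    genes = credible_set.get("predicted_genes", [])
    if not genes:
        return None
    best = max(genes, key=lambda g: g["score"])
    return best["gene"], best["score"], classify_l2g(best["score"])


credible_set = {
    "trait": "Renal failure",
    "predicted_genes": [{"gene": "TCF7L2", "score": 0.863}],
}
print(top_gene(credible_set))
```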
Clinical Actionability
- High: Multiple genome-wide significant associations + in credible sets + high L2G scores
- Moderate: Genome-wide significant associations but limited fine-mapping
- Low: Suggestive associations or limited replication
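The three levels can be approximated with a simple rule. The sketch below makes several assumptions that the guide does not state: "multiple" is taken to mean at least two genome-wide significant associations, "high L2G" means a best score above 0.5, and replication is not modeled at all, so anything without a genome-wide significant hit falls into "low".

```python
from typing import Optional, Sequence


def clinical_actionability(
    p_values: Sequence[float],
    n_credible_sets: int,
    best_l2g: Optional[float],
    gws_threshold: float = 5e-8,
) -> str:
    """Heuristic tiering of a variant; thresholds are illustrative assumptions."""
    n_gws = sum(1 for p in p_values if p < gws_threshold)
    has_high_l2g = best_l2g is not None and best_l2g > 0.5
    if n_gws >= 2 and n_credible_sets > 0 and has_high_l2g:
        return "high"
    if n_gws >= 1:
        return "moderate"
    return "low"


print(clinical_actionability([1.2e-128, 3.5e-42], n_credible_sets=20, best_l2g=0.863))
print(clinical_actionability([1e-9], n_credible_sets=0, best_l2g=None))
print(clinical_actionability([1e-6], n_credible_sets=0, best_l2g=None))
```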
Limitations
- Variant ID Conversion: OpenTargets requires chr_pos_ref_alt format, which may need allele lookup
- Population Specificity: Associations may vary by ancestry
- Effect Sizes: Beta values are study-dependent (different phenotype scales)
- Causality: Associations don't prove causation; fine-mapping improves confidence
- Currency: Data reflects published GWAS; latest studies may not be included
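For the variant ID conversion noted in the first limitation, the `chr_pos_ref_alt` string can be built from the SNP Basic Info fields once both alleles are known. This is a minimal sketch: `to_variant_id` is a hypothetical helper, the field names follow the example output in this document, and it assumes the position is already on the genome build that Open Targets expects.

```python
from typing import Any, Mapping


def to_variant_id(snp: Mapping[str, Any]) -> str:
    """Build a chr_pos_ref_alt identifier from SNP annotation fields."""
    chrom = str(snp["chromosome"])
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]  # "chr10" -> "10"
    ref, alt = snp.get("ref_allele"), snp.get("alt_allele")
    if not ref or not alt:
        # This is the "allele lookup" case: the ID cannot be built without both.
        raise ValueError("ref and alt alleles are required to build the variant ID")
    return f"{chrom}_{int(snp['position'])}_{ref}_{alt}"


snp_info = {
    "rs_id": "rs7903146",
    "chromosome": "10",
    "position": 112998590,
    "ref_allele": "C",
    "alt_allele": "T",
}
print(to_variant_id(snp_info))
```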
Best Practices
- Use Full Interpretation: Enable include_credible_sets=True for clinical decisions
- Check Multiple Variants: Look at other variants in the same locus
- Validate Populations: Consider ancestry-specific effect sizes
- Review Publications: Check original studies for context
- Integrate Evidence: Combine with functional data, eQTLs, pQTLs
Technical Notes
Performance
- Fast mode (no credible sets): 2-5 seconds
- Full mode (with credible sets): 10-30 seconds
- Bottleneck: OpenTargets GraphQL API rate limits
Error Handling
- Invalid rs_id: Returns error message
- No associations: Returns empty list with note
- API failures: Graceful degradation (returns partial results)
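Graceful degradation of this kind is often implemented by running each step independently and recording failures instead of aborting. The sketch below illustrates the pattern only: `run_steps` and the step names are hypothetical, and the failing step simulates an API error rather than calling a real service.

```python
from typing import Any, Callable, Dict, Tuple


def run_steps(
    steps: Dict[str, Callable[[], Any]]
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Run each step; keep successful results and collect error messages."""
    results: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for name, step in steps.items():
        try:
            results[name] = step()
        except Exception as exc:  # broad on purpose: any failure degrades one step
            errors[name] = f"{type(exc).__name__}: {exc}"
    return results, errors


def failing_credible_sets() -> list:
    raise TimeoutError("simulated rate limit")


results, errors = run_steps(
    {
        "associations": lambda: [{"trait": "Type 2 diabetes", "p_value": 1.2e-128}],
        "credible_sets": failing_credible_sets,
    }
)
print(results)
print(errors)
```

The caller still receives the associations and can report that fine-mapping data was unavailable for this request.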
Related Skills
- Gene Function Analysis: Interpret predicted causal genes
- Disease Ontology Lookup: Understand trait classifications
- PubMed Literature Search: Find original GWAS publications
- Variant Effect Prediction: Functional consequence analysis
References
- GWAS Catalog: https://www.ebi.ac.uk/gwas/
- Open Targets Genetics: https://genetics.opentargets.org/
- GWAS Significance Thresholds: Fadista et al. 2016
- L2G Method: Mountjoy et al. 2021 (Nature Genetics)
Version
- Version: 1.0.0
- Last Updated: 2026-02-13
- ToolUniverse Version: >= 1.0.0
- Tools Required: gwas_get_snp_by_id, gwas_get_associations_for_snp, OpenTargets_get_variant_credible_sets