tooluniverse-gwas-study-explorer

Originally from mims-harvard/tooluniverse

GWAS Study Deep Dive & Meta-Analysis

Compare GWAS studies, perform meta-analyses, and assess replication across cohorts

Overview

The GWAS Study Deep Dive & Meta-Analysis skill enables comprehensive comparison of genome-wide association studies (GWAS) for the same trait, meta-analysis of genetic loci across studies, and systematic assessment of replication and study quality. It integrates data from the NHGRI-EBI GWAS Catalog and Open Targets Genetics to provide a complete picture of the genetic architecture of complex traits.

Key Capabilities

Study Comparison: Compare all GWAS studies for a trait, assessing sample sizes, ancestries, and platforms
Meta-Analysis: Aggregate effect sizes across studies and calculate heterogeneity statistics
Replication Assessment: Identify replicated vs novel findings across discovery and replication cohorts
Quality Evaluation: Assess statistical power, ancestry diversity, and data availability

Use Cases

1. Comprehensive Trait Analysis

Scenario: "I want to understand all available GWAS data for type 2 diabetes"

Workflow:

Search for all T2D studies in GWAS Catalog
Filter by sample size and ancestry
Extract top associations from each study
Identify consistently replicated loci
Assess ancestry-specific effects

Outcome: Complete landscape of T2D genetics with replicated findings and population-specific signals
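The "consistently replicated loci" step above can be sketched as a simple count of how many studies report each variant at genome-wide significance. This is a minimal illustration, not the skill's implementation: the record fields (`study_id`, `rsid`, `p_value`) and the example values are hypothetical and do not reflect the GWAS Catalog response schema.

```python
from collections import defaultdict

GENOME_WIDE_P = 5e-8  # genome-wide significance threshold used in this document


def replicated_loci(associations, min_studies=2):
    """Return {rsid: sorted study ids} for variants significant in >= min_studies studies."""
    studies_by_snp = defaultdict(set)
    for rec in associations:
        if rec["p_value"] < GENOME_WIDE_P:
            studies_by_snp[rec["rsid"]].add(rec["study_id"])
    return {
        rsid: sorted(studies)
        for rsid, studies in studies_by_snp.items()
        if len(studies) >= min_studies
    }


# Hypothetical records; identifiers and p-values are illustrative only.
example = [
    {"study_id": "S1", "rsid": "rs_example_1", "p_value": 1e-30},
    {"study_id": "S2", "rsid": "rs_example_1", "p_value": 3e-12},
    {"study_id": "S1", "rsid": "rs_example_2", "p_value": 2e-9},
    {"study_id": "S2", "rsid": "rs_example_2", "p_value": 0.02},
]
print(replicated_loci(example))
```

Counting by exact rsID is a simplification; in practice nearby variants in LD are usually grouped into one locus before counting.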

2. Locus-Specific Meta-Analysis

Scenario: "Is the TCF7L2 association with T2D consistent across all studies?"

Workflow:

Retrieve all TCF7L2 (rs7903146) associations for T2D
Calculate combined effect size and p-value
Assess heterogeneity (I² statistic)
Generate forest plot data
Interpret heterogeneity level

Outcome: Quantitative assessment of effect size consistency with heterogeneity interpretation
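As a rough sketch of the "forest plot data" step, each study's log odds ratio (`beta`) and standard error (`se`) can be turned into an odds ratio, a 95% confidence interval, and an inverse-variance weight. The input field names and the function name are assumptions made for illustration.

```python
import math


def forest_plot_rows(studies, z=1.96):
    """Build one forest-plot row per study from a log odds ratio and its standard error."""
    rows = []
    for s in studies:
        beta, se = s["beta"], s["se"]
        rows.append({
            "study": s["study"],
            "or": math.exp(beta),                # odds ratio
            "ci_low": math.exp(beta - z * se),   # lower 95% bound
            "ci_high": math.exp(beta + z * se),  # upper 95% bound
            "weight": 1.0 / se ** 2,             # inverse-variance weight
        })
    total = sum(r["weight"] for r in rows)
    for r in rows:
        r["weight_pct"] = 100.0 * r["weight"] / total
    return rows


# Hypothetical inputs, for illustration only.
rows = forest_plot_rows([
    {"study": "A", "beta": 0.0, "se": 0.1},
    {"study": "B", "beta": 0.2, "se": 0.2},
])
for r in rows:
    print(r)
```

The rows can be passed to any plotting library; the percentage weight shows how strongly each study would drive a fixed-effects estimate.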

3. Replication Analysis

Scenario: "Which findings from the discovery cohort replicated in the independent sample?"

Workflow:

Get top hits from discovery study
Check for presence and significance in replication study
Assess direction consistency
Calculate replication rate
Separate replicated findings from failed replications and from variants not tested in the replication study

Outcome: Systematic replication report with success rates and failed findings
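The replication workflow above can be sketched as follows. The sketch assumes discovery hits are given as `{rsid: beta}` and replication results as `{rsid: {"beta": ..., "p_value": ...}}`; these shapes, the function name, and the example values are hypothetical. A hit counts as replicated when the direction matches and the replication p-value is below `alpha`.

```python
def replication_report(discovery, replication, alpha=0.05):
    """Classify discovery hits as replicated, failed, or not tested in the replication study."""
    replicated, failed, not_tested = [], [], []
    for rsid, disc_beta in discovery.items():
        rep = replication.get(rsid)
        if rep is None:
            not_tested.append(rsid)
            continue
        same_direction = (disc_beta > 0) == (rep["beta"] > 0)
        if same_direction and rep["p_value"] < alpha:
            replicated.append(rsid)
        else:
            failed.append(rsid)
    tested = len(replicated) + len(failed)
    return {
        "replicated": replicated,
        "failed": failed,
        "not_tested": not_tested,
        "replication_rate": len(replicated) / tested if tested else None,
    }


# Hypothetical inputs, for illustration only.
discovery = {"rs_a": 0.2, "rs_b": -0.1, "rs_c": 0.3, "rs_d": 0.15}
replication = {
    "rs_a": {"beta": 0.15, "p_value": 0.001},
    "rs_b": {"beta": 0.05, "p_value": 0.01},   # opposite direction
    "rs_c": {"beta": 0.10, "p_value": 0.4},    # same direction, not significant
}
report = replication_report(discovery, replication)
print(report)
```

Variants missing from the replication study are reported separately rather than counted as failures, since absence usually reflects coverage rather than evidence against the association.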

4. Multi-Ancestry Comparison

Scenario: "Are T2D loci consistent across European and East Asian populations?"

Workflow:

Filter studies by ancestry
Compare top associations between populations
Identify shared vs population-specific loci
Assess allele frequency differences
Evaluate transferability of genetic risk scores

Outcome: Ancestry-specific genetic architecture with transferability assessment
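A minimal sketch of the shared vs population-specific comparison, assuming each ancestry's significant loci are supplied as `{rsid: effect allele frequency}`. The function name, input shape, and example frequencies are hypothetical.

```python
def compare_ancestries(loci_a, loci_b):
    """Split loci into shared and ancestry-specific sets and report frequency gaps."""
    shared = sorted(set(loci_a) & set(loci_b))
    return {
        "shared": shared,
        "only_a": sorted(set(loci_a) - set(loci_b)),
        "only_b": sorted(set(loci_b) - set(loci_a)),
        # Absolute difference in effect allele frequency for shared loci
        "freq_diff": {r: round(abs(loci_a[r] - loci_b[r]), 3) for r in shared},
    }


# Hypothetical loci and allele frequencies, for illustration only.
european = {"rs_shared": 0.30, "rs_eur": 0.12}
east_asian = {"rs_shared": 0.05, "rs_eas": 0.40}
comparison = compare_ancestries(european, east_asian)
print(comparison)
```

A locus that appears in only one list is not necessarily population-specific: lower allele frequency or a smaller sample in the other ancestry can leave a real effect below the significance threshold.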

Statistical Methods

Meta-Analysis Approach

This skill implements standard GWAS meta-analysis methods:

Fixed-Effects Model:

Used when heterogeneity is low (I² < 25%)
Weights studies by inverse variance
Assumes true effect size is the same across studies
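The fixed-effects model described above is the standard inverse-variance weighted estimator. The sketch below is an illustration under that assumption, not the skill's code; the function name is hypothetical. The p-value uses `math.erfc` so that very small values do not round to zero.

```python
import math


def fixed_effects(betas, ses):
    """Inverse-variance weighted fixed-effects estimate: (beta, se, two-sided p)."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, p


# Hypothetical per-study estimates, for illustration only.
beta, se, p = fixed_effects([0.1, 0.3], [0.1, 0.1])
print(beta, se, p)
```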

Random-Effects Model (recommended when I² > 50%):

Accounts for between-study variation
More conservative than fixed-effects
Better for diverse ancestries or methodologies
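The document does not say which random-effects estimator is used. The sketch below uses the DerSimonian-Laird method, a common choice, purely as an illustration; the function name is hypothetical. The between-study variance `tau2` is added to each study's variance before weighting.

```python
import math


def random_effects_dl(betas, ses):
    """DerSimonian-Laird random-effects estimate: (beta, se, tau2)."""
    k = len(betas)
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    beta_fe = sum(wi * b for wi, b in zip(w, betas)) / sw
    q = sum(wi * (b - beta_fe) ** 2 for wi, b in zip(w, betas))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    sw_re = sum(w_re)
    beta_re = sum(wi * b for wi, b in zip(w_re, betas)) / sw_re
    return beta_re, math.sqrt(1.0 / sw_re), tau2


# Hypothetical per-study estimates, for illustration only.
beta_re, se_re, tau2 = random_effects_dl([0.1, 0.5], [0.1, 0.1])
print(beta_re, se_re, tau2)
```

When the studies agree, `tau2` is zero and the result equals the fixed-effects estimate; when they disagree, the standard error widens, which is why the model is described as more conservative.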

Heterogeneity Assessment:

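The I² statistic estimates the percentage of total variation in effect estimates across studies that is due to between-study heterogeneity rather than sampling error. Negative values are set to zero:

I² = max(0, (Q - df) / Q) × 100%

where Q  = Cochran's Q statistic
      df = degrees of freedom (n_studies - 1)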
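The formula can be computed directly from per-study effect sizes and standard errors. In this sketch Q is the inverse-variance weighted sum of squared deviations from the fixed-effects estimate, and I² is truncated at zero when Q is smaller than df. The function name is hypothetical.

```python
def cochran_q_i2(betas, ses):
    """Return (Q, df, I2 in percent) for per-study effect sizes and standard errors."""
    w = [1.0 / se ** 2 for se in ses]
    beta_fe = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    q = sum(wi * (b - beta_fe) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical per-study estimates, for illustration only.
q, df, i2 = cochran_q_i2([0.1, 0.5], [0.1, 0.1])
print(q, df, i2)
```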

Interpretation Guidelines:

I² < 25%: Low heterogeneity → fixed-effects appropriate
I² = 25-50%: Moderate heterogeneity → investigate sources
I² = 50-75%: Substantial heterogeneity → random-effects preferred
I² > 75%: Considerable heterogeneity → meta-analysis may not be appropriate
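The guidelines above map naturally onto a small lookup function. The listed ranges do not say which band an exact boundary value belongs to, so this sketch assumes 25 and 50 fall in the higher band and 75 in the "substantial" band; that choice and the function name are assumptions.

```python
def interpret_i2(i2):
    """Map an I2 percentage to (heterogeneity level, recommendation)."""
    if not 0 <= i2 <= 100:
        raise ValueError("I2 must be between 0 and 100")
    if i2 < 25:
        return "low", "fixed-effects appropriate"
    if i2 < 50:
        return "moderate", "investigate sources"
    if i2 <= 75:
        return "substantial", "random-effects preferred"
    return "considerable", "meta-analysis may not be appropriate"


print(interpret_i2(87.5))
```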

Sources of Heterogeneity

Common reasons for high I²:

Ancestry differences: Different allele frequencies and LD structure
Phenotype heterogeneity: Trait definition varies across studies
Platform differences: Imputation quality and coverage
Winner's curse: Discovery studies overestimate effect sizes
Cohort characteristics: Age, sex, environmental factors

Recommendations:

Perform subgroup analysis by ancestry
Use meta-regression to investigate sources
Consider excluding outlier studies
Apply genomic control correction
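The first recommendation, subgroup analysis by ancestry, can be sketched by pooling studies separately within each ancestry label using inverse-variance weights. The field names (`ancestry`, `beta`, `se`), function names, and example values are hypothetical.

```python
import math
from collections import defaultdict


def ivw(pairs):
    """Inverse-variance weighted (beta, se) from a list of (beta, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in pairs]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, pairs)) / total
    return beta, math.sqrt(1.0 / total)


def subgroup_by_ancestry(studies):
    """Pool studies within each ancestry group: {ancestry: (beta, se)}."""
    groups = defaultdict(list)
    for s in studies:
        groups[s["ancestry"]].append((s["beta"], s["se"]))
    return {ancestry: ivw(pairs) for ancestry, pairs in groups.items()}


# Hypothetical studies, for illustration only.
studies = [
    {"ancestry": "EUR", "beta": 0.2, "se": 0.1},
    {"ancestry": "EUR", "beta": 0.2, "se": 0.1},
    {"ancestry": "EAS", "beta": 0.1, "se": 0.2},
]
pooled = subgroup_by_ancestry(studies)
print(pooled)
```

If heterogeneity drops sharply within subgroups, ancestry is a likely source of the overall I².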

Study Quality Assessment

Quality Metrics

The skill evaluates studies based on:

1. Sample Size:

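Power to detect associations (as a rough guide, 80% power at genome-wide significance for OR = 1.2 requires n > 10,000; the exact figure depends on allele frequency and case-control ratio)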
Precision of effect size estimates
Ability to detect modest effects

2. Ancestry Diversity:

Single-ancestry vs multi-ancestry
Population stratification control
Transferability of findings

3. Data Availability:

Summary statistics available for meta-analysis
Individual-level data vs summary-level
Imputation quality scores

4. Genotyping Quality:

Platform density and coverage
Imputation reference panel
Quality control measures

5. Statistical Rigor:

Genome-wide significance threshold (p < 5×10⁻⁸)
Multiple testing correction
Replication in independent cohort
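The sample-size guideline under metric 1 can be sanity-checked with a large-sample normal approximation. The sketch assumes an additive per-allele test in a case-control design with a small effect, where the non-centrality parameter is approximately N × φ(1 - φ) × 2p(1 - p) × β², with φ the case fraction, p the allele frequency, and β the log odds ratio. The function name and default values are assumptions, and the result is approximate.

```python
import math
from statistics import NormalDist


def approx_power(n_total, odds_ratio, maf, case_fraction=0.5, alpha=5e-8):
    """Approximate power of a two-sided per-allele test at significance level alpha."""
    beta = math.log(odds_ratio)
    ncp = (
        n_total
        * case_fraction * (1 - case_fraction)
        * 2 * maf * (1 - maf)
        * beta ** 2
    )
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # The contribution of the opposite tail is negligible and is ignored here.
    return NormalDist().cdf(math.sqrt(ncp) - z_crit)


for n in (5_000, 12_000, 50_000):
    print(n, round(approx_power(n, odds_ratio=1.2, maf=0.3), 3))
```

Under these assumptions, required sample size grows quickly as allele frequency falls or the case-control ratio becomes unbalanced, so a single sample-size cutoff should be read as a rule of thumb.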

Quality Tiers

Tier 1 (High Quality):

n ≥ 50,000
Summary statistics available
Multi-ancestry or large single-ancestry
Imputed to high-quality reference
Independent replication

Tier 2 (Moderate Quality):

n ≥ 10,000
Standard GWAS platform
Adequate power for common variants
Some data availability

Tier 3 (Limited):

n < 10,000
Limited power
May miss modest effects
Use with caution
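A simplified sketch of the tier assignment. The tier lists do not say whether every criterion is required, so this version assumes Tier 1 needs the sample-size threshold plus summary statistics and independent replication, and that sample size alone separates Tier 2 from Tier 3. The function and parameter names are hypothetical.

```python
def quality_tier(sample_size, has_summary_stats=False, has_replication=False):
    """Assign a quality tier (1 = high, 2 = moderate, 3 = limited)."""
    if sample_size >= 50_000 and has_summary_stats and has_replication:
        return 1
    if sample_size >= 10_000:
        return 2
    return 3


print(quality_tier(120_000, has_summary_stats=True, has_replication=True))
print(quality_tier(120_000))
print(quality_tier(4_000, has_summary_stats=True))
```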

Best Practices

Before Meta-Analysis

Check phenotype consistency: Ensure studies measure the same trait
Verify ancestry overlap: High heterogeneity expected if ancestries differ
Harmonize alleles: Align effect alleles across studies
Quality control: Exclude low-quality studies or associations
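Allele harmonization can be sketched as follows: if a study reports the opposite effect allele from the reference, swap the alleles, flip the sign of beta, and complement the effect allele frequency. The record fields and function name are hypothetical. The sketch does not handle strand flips or palindromic (A/T, C/G) variants, which need extra care.

```python
def harmonize(record, ref_effect, ref_other):
    """Align a record to the reference effect allele; return None if alleles do not match."""
    ea = record["effect_allele"].upper()
    oa = record["other_allele"].upper()
    if (ea, oa) == (ref_effect, ref_other):
        return dict(record, effect_allele=ea, other_allele=oa)
    if (ea, oa) == (ref_other, ref_effect):
        flipped = dict(record, effect_allele=ref_effect, other_allele=ref_other)
        flipped["beta"] = -record["beta"]       # effect now refers to the other allele
        if "eaf" in record:
            flipped["eaf"] = 1 - record["eaf"]  # frequency of the new effect allele
        return flipped
    return None  # mismatch (possible strand issue): exclude or resolve manually


# Hypothetical record, for illustration only.
rec = {"rsid": "rs_x", "effect_allele": "t", "other_allele": "C", "beta": 0.2, "eaf": 0.3}
print(harmonize(rec, "C", "T"))
```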

Interpreting Results

Genome-wide significance: p < 5×10⁻⁸ (Bonferroni for ~1M independent tests)
Replication threshold: p < 0.05 in an independent cohort, corrected for the number of variants carried forward to replication
Direction consistency: Effect should be same direction across studies
Heterogeneity: I² > 50% suggests caution in interpretation
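The genome-wide threshold is plain Bonferroni arithmetic: a 0.05 family-wise error rate divided by roughly one million independent tests. A short sketch, with hypothetical names:

```python
import math

ALPHA = 0.05          # family-wise error rate
N_TESTS = 1_000_000   # approximate number of independent common-variant tests

threshold = ALPHA / N_TESTS


def is_genome_wide_significant(p_value):
    """True if the p-value passes the Bonferroni-corrected threshold."""
    return p_value < threshold


print(threshold)
print(is_genome_wide_significant(3e-9), is_genome_wide_significant(1e-6))
```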

Common Pitfalls

❌ Don't:

Meta-analyze without checking heterogeneity
Ignore ancestry differences
Over-interpret nominal p-values
Assume replication failure means false positive

✅ Do:

Always report I² statistic
Perform sensitivity analyses
Consider ancestry-stratified analysis
Account for winner's curse in discovery studies
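One simple form of sensitivity analysis is leave-one-out: recompute the pooled estimate with each study removed in turn and see whether any single study drives the result. The sketch below uses inverse-variance weights and needs at least two studies; the field names, function name, and example values are hypothetical.

```python
def leave_one_out(studies):
    """Return {omitted study: pooled beta of the remaining studies}."""
    results = {}
    for i, left_out in enumerate(studies):
        rest = studies[:i] + studies[i + 1:]
        weights = [1.0 / s["se"] ** 2 for s in rest]
        pooled = sum(w * s["beta"] for w, s in zip(weights, rest)) / sum(weights)
        results[left_out["study"]] = pooled
    return results


# Hypothetical studies, for illustration only; study C is an outlier.
studies = [
    {"study": "A", "beta": 0.1, "se": 0.1},
    {"study": "B", "beta": 0.1, "se": 0.1},
    {"study": "C", "beta": 0.7, "se": 0.1},
]
loo = leave_one_out(studies)
print(loo)
```

A large shift when one study is omitted flags it as a candidate outlier worth checking for phenotype, ancestry, or platform differences.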

Limitations & Caveats

Data Limitations

Incomplete Overlap: Studies may analyze different SNPs
Cohort Overlap: Some cohorts participate in multiple studies, which violates the independence assumption of meta-analysis and inflates significance
Publication Bias: Significant findings more likely to be published
Winner's Curse: Discovery studies overestimate effect sizes
Imputation Quality: Varies across studies and populations

Statistical Limitations

Heterogeneity: High I² may preclude meaningful meta-analysis
Sample Size Differences: Large studies dominate fixed-effects models
Allele Frequency Differences: The same variant can differ in frequency and LD context across ancestries, which changes statistical power and the apparent effect size
Linkage Disequilibrium: Fine-mapping needed to identify causal variants
Gene-Environment Interactions: Not captured in standard meta-analysis

Interpretation Guidelines

When I² > 75%:

Meta-analysis results should be interpreted with extreme caution
Investigate sources of heterogeneity systematically
Consider ancestry-specific or subgroup analyses
Descriptive comparison may be more appropriate than meta-analysis

When Studies Conflict:

Check for methodological differences
Verify phenotype definitions match
Investigate population stratification
Consider conditional analysis

Scientific References

Key Publications

GWAS Best Practices:
- Visscher et al. (2017). "10 Years of GWAS Discovery" American Journal of Human Genetics 101(1): 5-22
- PMID: 28686856
- DOI: 10.1016/j.ajhg.2017.06.005
Meta-Analysis Methods:
- Evangelou & Ioannidis (2013). "Meta-analysis methods for genome-wide association studies and beyond" Nature Reviews Genetics 14: 379-389
- PMID: 23657481
Heterogeneity Interpretation:
- Higgins et al. (2003). "Measuring inconsistency in meta-analyses" BMJ 327: 557-560
- PMID: 12958120
Multi-Ancestry GWAS:
- Peterson et al. (2019). "Genome-wide Association Studies in Ancestrally Diverse Populations" Cell 179(3): 589-603
- PMID: 31607513
Replication Standards:
- Chanock et al. (2007). "Replicating genotype-phenotype associations" Nature 447: 655-660
- PMID: 17554299

Tools Used

GWAS Catalog API

gwas_search_studies: Find studies by trait
gwas_get_study_by_id: Get detailed study metadata
gwas_get_associations_for_study: Retrieve study associations
gwas_get_associations_for_snp: Get SNP associations across studies
gwas_search_associations: Search associations by trait

Open Targets Genetics GraphQL API

OpenTargets_search_gwas_studies_by_disease: Disease-based study search
OpenTargets_get_gwas_study: Detailed study information with LD populations
OpenTargets_get_variant_credible_sets: Fine-mapped loci for variant
OpenTargets_get_study_credible_sets: All credible sets for study
OpenTargets_get_variant_info: Variant annotation and allele frequencies

Glossary

Association: Statistical relationship between a genetic variant and a trait

Credible Set: Set of variants likely to contain the causal variant (from fine-mapping)

Effect Size: Magnitude of genetic association (beta coefficient or odds ratio)

Fine-Mapping: Statistical method to identify causal variants within a locus

Genome-Wide Significance: p < 5×10⁻⁸, accounting for ~1M independent tests

Heterogeneity (I²): Percentage of variance due to between-study differences

L2G (Locus-to-Gene): Score predicting which gene is affected by a GWAS locus

LD (Linkage Disequilibrium): Non-random association of alleles at different loci

Meta-Analysis: Statistical combination of results from multiple studies

Replication: Independent confirmation of an association in a new cohort

Summary Statistics: Per-SNP statistics (p-value, beta, SE) from GWAS

Winner's Curse: Overestimation of effect size in discovery studies

Next Steps

After running this skill, consider:

Fine-Mapping: Use credible sets from Open Targets to identify causal variants
Functional Follow-Up: Investigate biological mechanisms of replicated loci
Genetic Risk Scores: Calculate polygenic risk scores using validated loci
Drug Target Identification: Use L2G scores to prioritize therapeutic targets
Cross-Trait Analysis: Look for pleiotropy with related traits

Version History

v1.0 (2026-02-13): Initial release with study comparison, meta-analysis, and replication assessment

Created by: ToolUniverse GWAS Analysis Team
Last Updated: 2026-02-13
License: Open source (MIT)

Repository: wu-yc/labclaw
GitHub Stars: 981
First Seen: Mar 15, 2026
