gget

Unified CLI and Python access to 20+ genomic databases. All modules work as both command-line tools and Python functions.

Installation

uv pip install --upgrade gget

Some modules need a one-time setup step before first use: gget setup <module>, where <module> is one of alphafold, cellxgene, elm, or gpt.

Quick Start

# CLI: gget <module> [arguments]
gget search -s human BRCA1
gget info ENSG00000012048
gget seq ENSG00000012048 -t   # protein sequence

# Python: gget.module(arguments)
import gget
gget.search(["BRCA1"], species="homo_sapiens")
gget.info(["ENSG00000012048"])

Common CLI flags: -o (save results to a file), -csv (CSV output instead of the default JSON), -q (quiet; suppress progress information)
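When the CLI is driven from a script, it can help to assemble the command as an argument list so that the common flags stay explicit. The following is a minimal sketch: `build_gget_command` is a hypothetical helper, not part of gget, and it only knows the three flags listed above.

```python
import shlex
import subprocess

def build_gget_command(module, args, out=None, csv=False, quiet=False):
    """Assemble a `gget <module> [arguments]` command as an argument list."""
    cmd = ["gget", module, *args]
    if out is not None:
        cmd += ["-o", out]   # save results to a file
    if csv:
        cmd.append("-csv")   # CSV output
    if quiet:
        cmd.append("-q")     # quiet mode
    return cmd

cmd = build_gget_command(
    "search", ["-s", "human", "BRCA1"], out="brca1.csv", csv=True, quiet=True
)
print(shlex.join(cmd))  # gget search -s human BRCA1 -o brca1.csv -csv -q

# To actually run it (requires gget on PATH and network access):
# subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with multi-word search terms such as "GABA receptor".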

Supporting Files

Scripts

  • scripts/gene_analysis.py - Gene discovery to sequence analysis pipeline
  • scripts/enrichment_pipeline.py - Gene list enrichment workflow
  • scripts/batch_sequence_analysis.py - Batch BLAST/alignment processing

Module Overview

Reference & Gene Information

| Module | What it does | Example |
|--------|--------------|---------|
| ref | Download reference genomes (Ensembl) | gget ref -w gtf -d human |
| search | Find genes by name/description | gget search -s human "GABA receptor" |
| info | Gene/transcript metadata (Ensembl+UniProt+NCBI) | gget info ENSG00000012048 |
| seq | Nucleotide/protein sequences | gget seq -t ENSG00000012048 |

Sequence Analysis

| Module | What it does | Example |
|--------|--------------|---------|
| blast | NCBI BLAST searches | gget blast MKWMFK... -db swissprot |
| blat | UCSC BLAT genomic mapping | gget blat ATCGATCG -a human |
| muscle | Multiple sequence alignment | gget muscle sequences.fasta |
| diamond | Fast local alignment | gget diamond query.fa -ref ref.fa |

Structure & Protein

| Module | What it does | Example |
|--------|--------------|---------|
| pdb | Query Protein Data Bank | gget pdb 7S7U |
| alphafold | Predict 3D structure (setup required) | gget alphafold MKWMFK... |
| elm | Eukaryotic linear motifs (setup required) | gget elm LIAQSIGQASFV |

Expression & Disease

| Module | What it does | Example |
|--------|--------------|---------|
| archs4 | Correlated genes / tissue expression | gget archs4 -w tissue ACE2 |
| cellxgene | Single-cell RNA-seq data (setup required) | gget cellxgene --gene ACE2 --tissue lung |
| enrichr | GO/pathway enrichment analysis | gget enrichr -db ontology ACE2 AGT |
| bgee | Orthologs / expression across species | gget bgee ENSG00000169194 |
| opentargets | Disease & drug associations | gget opentargets ENSG00000169194 |
| cbio | Cancer genomics (cBioPortal) | gget cbio search breast |
| cosmic | Somatic mutations (requires account) | gget cosmic EGFR |

Other

| Module | What it does |
|--------|--------------|
| mutate | Generate mutated sequences from annotations |
| setup | Install module-specific dependencies |

Key Workflows

Gene Discovery → Sequence Analysis

# Search → info → sequence → BLAST
results = gget.search(["GABA", "receptor"], species="homo_sapiens")
top_ids = results["ensembl_id"].tolist()[:5]   # first five hits
info = gget.info(top_ids)
sequences = gget.seq(top_ids, translate=True)  # amino acid sequences
# my_sequence: an amino acid sequence string that you supply
# (swissprot is a protein database)
blast_hits = gget.blast(my_sequence, database="swissprot", limit=10)
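To pick a single sequence out of the gget.seq result for the BLAST step, it helps to index the result by header. The sketch below assumes that gget.seq returns a flat list alternating FASTA header lines and sequences; that matches recent gget releases as far as I know, but check your installed version. `fasta_lines_to_dict` is a hypothetical helper, not part of gget.

```python
def fasta_lines_to_dict(lines):
    """Pair alternating header/sequence entries into {header: sequence}."""
    if len(lines) % 2 != 0:
        raise ValueError("expected an even number of entries (header, sequence, ...)")
    records = {}
    for header, sequence in zip(lines[0::2], lines[1::2]):
        if not header.startswith(">"):
            raise ValueError(f"not a FASTA header: {header!r}")
        # Drop the leading '>' and surrounding whitespace from the header
        records[header[1:].strip()] = sequence
    return records

# Made-up example data in the assumed shape, not real gget output
example = [">seq_a some description", "MKT", ">seq_b", "MAL"]
records = fasta_lines_to_dict(example)
print(records)
```

With real data, `my_sequence` for the BLAST call could then be taken from `records` by header.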

Expression & Enrichment

# Tissue expression → correlated genes → enrichment
tissue_expr = gget.archs4("ACE2", which="tissue")
correlated = gget.archs4("ACE2", which="correlation")
enrichment = gget.enrichr(correlated["gene_symbol"].tolist()[:50], database="ontology", plot=True)

Enrichr Database Shortcuts

| Shortcut | Database |
|----------|----------|
| pathway | KEGG_2021_Human |
| transcription | ChEA_2016 |
| ontology | GO_Biological_Process_2021 |
| diseases_drugs | GWAS_Catalog_2019 |
| celltypes | PanglaoDB_Augmented_2021 |
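gget resolves these shortcuts itself, so nothing extra is needed to use them. For logging or reporting which Enrichr library a run actually queried, the table can be mirrored in a small lookup. This is an illustrative sketch: `ENRICHR_SHORTCUTS` and `resolve_enrichr_database` are hypothetical names, and the mapping simply copies the table above, so it may lag behind the installed gget version.

```python
# Mirror of the shortcut table above (may differ in other gget versions)
ENRICHR_SHORTCUTS = {
    "pathway": "KEGG_2021_Human",
    "transcription": "ChEA_2016",
    "ontology": "GO_Biological_Process_2021",
    "diseases_drugs": "GWAS_Catalog_2019",
    "celltypes": "PanglaoDB_Augmented_2021",
}

def resolve_enrichr_database(name):
    """Return the full database name for a shortcut; other names pass through unchanged."""
    return ENRICHR_SHORTCUTS.get(name, name)

print(resolve_enrichr_database("ontology"))         # GO_Biological_Process_2021
print(resolve_enrichr_database("KEGG_2021_Human"))  # KEGG_2021_Human
```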

Single-Cell Data

# Gene symbols are case-sensitive: 'PAX7' (human), 'Pax7' (mouse)
adata = gget.cellxgene(gene=["ACE2", "ABCA1"], tissue="lung", cell_type="epithelial cell")
# Filters: disease, development_stage, sex, assay, donor_id, ethnicity
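When filters come from user input or a config file, some of them are usually unset. One way to handle that is to build the keyword arguments first and drop the empty ones. The sketch below is an assumption-laden illustration: `build_cellxgene_kwargs` is a hypothetical helper, and the allowed filter names are taken only from the list above, not checked against the gget signature.

```python
# Filter names as listed above, plus the two used in the example call
ALLOWED_FILTERS = {
    "tissue", "cell_type", "disease", "development_stage",
    "sex", "assay", "donor_id", "ethnicity",
}

def build_cellxgene_kwargs(gene, **filters):
    """Build keyword arguments for gget.cellxgene, skipping filters set to None."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    # Accept a single symbol or a list of symbols; case is preserved on purpose
    genes = [gene] if isinstance(gene, str) else list(gene)
    kwargs = {"gene": genes}
    kwargs.update({key: value for key, value in filters.items() if value is not None})
    return kwargs

kwargs = build_cellxgene_kwargs(["ACE2", "ABCA1"], tissue="lung", disease=None)
print(kwargs)  # {'gene': ['ACE2', 'ABCA1'], 'tissue': 'lung'}

# With gget installed and set up (gget setup cellxgene):
# adata = gget.cellxgene(**kwargs)
```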

Comparative Genomics

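orthologs = gget.bgee("ENSG00000169194", type="orthologs")
# gget.seq returns FASTA lines as a list ([header, sequence, ...]); take the sequence
human_seq = gget.seq("ENSG00000169194", translate=True)[1]
# mouse_seq: amino acid sequence obtained the same way for the mouse ortholog
alignment = gget.muscle([human_seq, mouse_seq])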
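The CLI example for gget muscle takes a FASTA file, so another option is to write the sequences to disk first and align the file. A minimal sketch follows; `write_fasta` is a hypothetical helper, the sequences are made up, and the final call is left commented out because it needs gget installed.

```python
import os
import tempfile

def write_fasta(records, path, width=60):
    """Write {name: sequence} to a FASTA file, wrapping sequences at `width` characters."""
    with open(path, "w") as handle:
        for name, sequence in records.items():
            handle.write(f">{name}\n")
            for start in range(0, len(sequence), width):
                handle.write(sequence[start:start + width] + "\n")

# Made-up placeholder sequences, not real orthologs
records = {"human_example": "MKTAYIAK", "mouse_example": "MKTAYLAK"}
fasta_path = os.path.join(tempfile.mkdtemp(), "sequences.fasta")
write_fasta(records, fasta_path)

# With gget installed, the file can then be aligned:
# alignment = gget.muscle(fasta_path)
```

Keeping the FASTA file also leaves a record of exactly which sequences went into the alignment.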

Best Practices

  • Use --limit to control result sizes
  • Save results with -o for reproducibility
  • Pass at most ~1000 Ensembl IDs per gget info call; split longer lists into batches
  • Use gget diamond with --threads for faster local alignment; save DB with --diamond_db
  • For gget muscle, use -s5 (Super5) for large datasets
  • AlphaFold multimer: use -mr 20 for accuracy, -r for AMBER relaxation
  • Update regularly with uv pip install --upgrade gget; the queried databases change their structure over time, and older versions can stop working
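The batch-size advice for gget info can be sketched as follows. `chunked` and `info_in_batches` are hypothetical helpers, not part of gget, and the default of 1000 is simply the rough limit mentioned above.

```python
def chunked(ids, size=1000):
    """Yield consecutive slices of `ids` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def info_in_batches(ensembl_ids, size=1000):
    """Call gget.info once per batch and return the list of per-batch results."""
    import gget  # imported lazily so that chunked() works without gget installed
    return [gget.info(batch) for batch in chunked(ensembl_ids, size)]

# Made-up identifiers, used only to illustrate the batching
ids = [f"ID{i}" for i in range(2500)]
batch_sizes = [len(batch) for batch in chunked(ids)]
print(batch_sizes)  # [1000, 1000, 500]
```

The per-batch results can then be combined afterwards, for example with pandas.concat if they are data frames.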

Attribution

Adapted from K-Dense-AI/claude-scientific-skills (BSD-2-Clause). Citation: Luebbert & Pachter (2023) Bioinformatics. https://doi.org/10.1093/bioinformatics/btac836
