gget
Installation
SKILL.md
gget
Unified CLI and Python access to 20+ genomic databases. All modules work as both command-line tools and Python functions.
Installation
uv pip install --upgrade gget
Some modules require setup: gget setup alphafold|cellxgene|elm|gpt
Quick Start
# CLI: gget <module> [arguments]
gget search -s human BRCA1
gget info ENSG00000012048
gget seq ENSG00000012048 -t # protein sequence
# Python: gget.module(arguments)
import gget
gget.search(["BRCA1"], species="homo_sapiens")
gget.info(["ENSG00000012048"])
Common flags: -o (save to file), -csv (CSV output), -q (quiet)
Supporting Files
- module_reference.md - Complete parameter reference for all 20+ modules
- database_info.md - Database descriptions and update frequencies
- workflows.md - Extended workflow examples
Scripts
scripts/gene_analysis.py- Gene discovery to sequence analysis pipelinescripts/enrichment_pipeline.py- Gene list enrichment workflowscripts/batch_sequence_analysis.py- Batch BLAST/alignment processing
Module Overview
Reference & Gene Information
| Module | What it does | Example |
|---|---|---|
ref |
Download reference genomes (Ensembl) | gget ref -w gtf -d human |
search |
Find genes by name/description | gget search -s human "GABA receptor" |
info |
Gene/transcript metadata (Ensembl+UniProt+NCBI) | gget info ENSG00000012048 |
seq |
Nucleotide/protein sequences | gget seq -t ENSG00000012048 |
Sequence Analysis
| Module | What it does | Example |
|---|---|---|
blast |
NCBI BLAST searches | gget blast MKWMFK... -db swissprot |
blat |
UCSC BLAT genomic mapping | gget blat ATCGATCG -a human |
muscle |
Multiple sequence alignment | gget muscle sequences.fasta |
diamond |
Fast local alignment | gget diamond query.fa -ref ref.fa |
Structure & Protein
| Module | What it does | Example |
|---|---|---|
pdb |
Query Protein Data Bank | gget pdb 7S7U |
alphafold |
Predict 3D structure (setup required) | gget alphafold MKWMFK... |
elm |
Eukaryotic linear motifs (setup required) | gget elm LIAQSIGQASFV |
Expression & Disease
| Module | What it does | Example |
|---|---|---|
archs4 |
Correlated genes / tissue expression | gget archs4 -w tissue ACE2 |
cellxgene |
Single-cell RNA-seq data (setup required) | gget cellxgene --gene ACE2 --tissue lung |
enrichr |
GO/pathway enrichment analysis | gget enrichr -db ontology ACE2 AGT |
bgee |
Orthologs / expression across species | gget bgee ENSG00000169194 |
opentargets |
Disease & drug associations | gget opentargets ENSG00000169194 |
cbio |
Cancer genomics (cBioPortal) | gget cbio search breast |
cosmic |
Somatic mutations (requires account) | gget cosmic EGFR |
Other
| Module | What it does |
|---|---|
mutate |
Generate mutated sequences from annotations |
setup |
Install module-specific dependencies |
Key Workflows
Gene Discovery → Sequence Analysis
# Search → info → sequence → BLAST
results = gget.search(["GABA", "receptor"], species="homo_sapiens")
info = gget.info(results["ensembl_id"].tolist()[:5])
sequences = gget.seq(results["ensembl_id"].tolist()[:5], translate=True)
blast_hits = gget.blast(my_sequence, database="swissprot", limit=10)
Expression & Enrichment
# Tissue expression → correlated genes → enrichment
tissue_expr = gget.archs4("ACE2", which="tissue")
correlated = gget.archs4("ACE2", which="correlation")
enrichment = gget.enrichr(correlated["gene_symbol"].tolist()[:50], database="ontology", plot=True)
Enrichr Database Shortcuts
| Shortcut | Database |
|---|---|
pathway |
KEGG_2021_Human |
transcription |
ChEA_2016 |
ontology |
GO_Biological_Process_2021 |
diseases_drugs |
GWAS_Catalog_2019 |
celltypes |
PanglaoDB_Augmented_2021 |
Single-Cell Data
# Gene symbols are case-sensitive: 'PAX7' (human), 'Pax7' (mouse)
adata = gget.cellxgene(gene=["ACE2", "ABCA1"], tissue="lung", cell_type="epithelial cell")
# Filters: disease, development_stage, sex, assay, donor_id, ethnicity
Comparative Genomics
orthologs = gget.bgee("ENSG00000169194", type="orthologs")
human_seq = gget.seq("ENSG00000169194", translate=True)
alignment = gget.muscle([human_seq, mouse_seq])
Best Practices
- Use
--limitto control result sizes - Save results with
-ofor reproducibility - Process max ~1000 Ensembl IDs at once with
gget info - Use
gget diamondwith--threadsfor faster local alignment; save DB with--diamond_db - For
gget muscle, use-s5(Super5) for large datasets - AlphaFold multimer: use
-mr 20for accuracy,-rfor AMBER relaxation - Update regularly:
uv pip install --upgrade gget(databases change structure)
Attribution
Adapted from K-Dense-AI/claude-scientific-skills (BSD-2-Clause). Citation: Luebbert & Pachter (2023) Bioinformatics. https://doi.org/10.1093/bioinformatics/btac836
Weekly Installs
3
Repository
delphine-l/claude_globalGitHub Stars
11
First Seen
8 days ago
Security Audits
Installed on
claude-code3
opencode2
gemini-cli2
deepagents2
antigravity2
github-copilot2