Entrez Link

Navigate between NCBI databases using Biopython's Entrez module (ELink utility).

Required Setup

from Bio import Entrez

Entrez.email = 'your.email@example.com'  # Required by NCBI
Entrez.api_key = 'your_api_key'          # Optional; raises the limit from 3 to 10 requests per second

Core Function

Entrez.elink() - Cross-Database Links

Find related records in the same or different databases.

# Find proteins linked to a gene
handle = Entrez.elink(dbfrom='gene', db='protein', id='672')
record = Entrez.read(handle)
handle.close()

# Extract linked IDs
linkset = record[0]
if linkset['LinkSetDb']:
    links = linkset['LinkSetDb'][0]['Link']
    protein_ids = [link['Id'] for link in links]
    print(f"Found {len(protein_ids)} linked proteins")

Key Parameters:

| Parameter | Description | Example |
|-----------|-------------|---------|
| dbfrom | Source database | 'gene' |
| db | Target database | 'protein' |
| id | Source record ID(s) | '672' or '672,675' |
| linkname | Specific link type | 'gene_protein_refseq' |
| cmd | Link command | 'neighbor', 'neighbor_score' |
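
The cmd value changes the shape of the result. A minimal sketch of reading similarity scores from a cmd='neighbor_score' result follows. It assumes each link carries a 'Score' field next to 'Id'; the helper name scored_links and the sample record (hand-written, with made-up IDs and scores) are illustrative, not real NCBI output.

```python
def scored_links(record):
    """Return (linked_id, score) pairs from a parsed cmd='neighbor_score' result."""
    pairs = []
    for linkset in record:
        for linksetdb in linkset.get('LinkSetDb', []):
            for link in linksetdb['Link']:
                # int() covers the case where scores arrive as strings
                pairs.append((link['Id'], int(link['Score'])))
    # Highest score first
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hand-written stand-in for Entrez.read() output; IDs and scores are made up
sample = [{
    'DbFrom': 'pubmed',
    'IdList': ['1'],
    'LinkSetDb': [{
        'DbTo': 'pubmed',
        'LinkName': 'pubmed_pubmed',
        'Link': [{'Id': '2', 'Score': '150'}, {'Id': '3', 'Score': '900'}],
    }],
}]
print(scored_links(sample))
```

Against the live service, the record would come from Entrez.read(Entrez.elink(dbfrom='pubmed', db='pubmed', id=pmid, cmd='neighbor_score')).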

ELink Result Structure

record[0]                          # First linkset
record[0]['DbFrom']                # Source database
record[0]['IdList']                # Input IDs
record[0]['LinkSetDb']             # List of link results
record[0]['LinkSetDb'][0]['DbTo']  # Target database
record[0]['LinkSetDb'][0]['LinkName']  # Link name
record[0]['LinkSetDb'][0]['Link']  # List of linked records
record[0]['LinkSetDb'][0]['Link'][0]['Id']  # Linked ID
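
When no linkname is passed, LinkSetDb can contain several entries (for example both gene_protein and gene_protein_refseq), so indexing [0] may not select the link type intended. The sketch below groups linked IDs by LinkName instead. ids_by_linkname is a hypothetical helper, and the sample record is hand-written with placeholder IDs.

```python
def ids_by_linkname(record):
    """Map each LinkName to its linked IDs across all linksets of a parsed ELink result."""
    grouped = {}
    for linkset in record:
        for linksetdb in linkset.get('LinkSetDb', []):
            ids = [link['Id'] for link in linksetdb['Link']]
            grouped.setdefault(linksetdb['LinkName'], []).extend(ids)
    return grouped


# Hand-written stand-in for Entrez.read() output; the protein IDs are placeholders
sample = [{
    'DbFrom': 'gene',
    'IdList': ['672'],
    'LinkSetDb': [
        {'DbTo': 'protein', 'LinkName': 'gene_protein',
         'Link': [{'Id': '111'}, {'Id': '222'}]},
        {'DbTo': 'protein', 'LinkName': 'gene_protein_refseq',
         'Link': [{'Id': '222'}]},
    ],
}]
grouped = ids_by_linkname(sample)
for name, ids in grouped.items():
    print(f"{name}: {ids}")
```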

Common Link Paths

Gene to Other Databases

| From | To | Link Name | Description |
|------|----|-----------|-------------|
| gene | protein | gene_protein | All proteins |
| gene | protein | gene_protein_refseq | RefSeq proteins only |
| gene | nucleotide | gene_nuccore | Nucleotide sequences |
| gene | nucleotide | gene_nuccore_refseqrna | RefSeq mRNA |
| gene | pubmed | gene_pubmed | Related publications |
| gene | homologene | gene_homologene | Homologs |
| gene | snp | gene_snp | SNPs in gene |
| gene | clinvar | gene_clinvar | Clinical variants |

Nucleotide to Other Databases

| From | To | Link Name | Description |
|------|----|-----------|-------------|
| nucleotide | protein | nuccore_protein | Encoded proteins |
| nucleotide | gene | nuccore_gene | Gene records |
| nucleotide | pubmed | nuccore_pubmed | Publications |
| nucleotide | taxonomy | nuccore_taxonomy | Organism taxonomy |
| nucleotide | biosample | nuccore_biosample | Sample info |
| nucleotide | sra | nuccore_sra | Related SRA data |

Protein to Other Databases

| From | To | Link Name | Description |
|------|----|-----------|-------------|
| protein | nucleotide | protein_nuccore | Coding sequences |
| protein | gene | protein_gene | Gene records |
| protein | pubmed | protein_pubmed | Publications |
| protein | structure | protein_structure | 3D structures |
| protein | cdd | protein_cdd | Conserved domains |

PubMed Links

| From | To | Link Name | Description |
|------|----|-----------|-------------|
| pubmed | pubmed | pubmed_pubmed | Related articles |
| pubmed | gene | pubmed_gene | Mentioned genes |
| pubmed | protein | pubmed_protein | Mentioned proteins |
| pubmed | nucleotide | pubmed_nuccore | Mentioned sequences |
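
The link paths listed above can also be kept in code so that calls pick a linkname consistently. This is only a sketch: LINK_PATHS holds a handful of the names from the tables, elink_kwargs is a hypothetical helper, and '12345' is a placeholder ID.

```python
# A few link names copied from the tables above, keyed by (source db, target db)
LINK_PATHS = {
    ('gene', 'protein'): ['gene_protein', 'gene_protein_refseq'],
    ('gene', 'nucleotide'): ['gene_nuccore', 'gene_nuccore_refseqrna'],
    ('nucleotide', 'protein'): ['nuccore_protein'],
    ('protein', 'structure'): ['protein_structure'],
    ('pubmed', 'pubmed'): ['pubmed_pubmed'],
}


def elink_kwargs(dbfrom, db, uid, prefer=None):
    """Build keyword arguments for Entrez.elink(), adding a linkname when one is known."""
    kwargs = {'dbfrom': dbfrom, 'db': db, 'id': uid}
    names = LINK_PATHS.get((dbfrom, db), [])
    if prefer in names:
        kwargs['linkname'] = prefer      # caller asked for a specific known link
    elif len(names) == 1:
        kwargs['linkname'] = names[0]    # only one known link for this path
    return kwargs


print(elink_kwargs('gene', 'protein', '672', prefer='gene_protein_refseq'))
print(elink_kwargs('protein', 'structure', '12345'))
```

The resulting dictionary can be unpacked into a call, as in Entrez.elink(**elink_kwargs('gene', 'protein', '672', prefer='gene_protein_refseq')).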

Code Patterns

Gene to Protein

from Bio import Entrez

Entrez.email = 'your.email@example.com'

def get_proteins_for_gene(gene_id):
    handle = Entrez.elink(dbfrom='gene', db='protein', id=gene_id, linkname='gene_protein_refseq')
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    return [link['Id'] for link in record[0]['LinkSetDb'][0]['Link']]

protein_ids = get_proteins_for_gene('672')  # BRCA1
print(f"RefSeq proteins: {protein_ids[:5]}")

Nucleotide to Gene

def get_gene_for_nucleotide(nuc_id):
    handle = Entrez.elink(dbfrom='nucleotide', db='gene', id=nuc_id)
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return None
    return record[0]['LinkSetDb'][0]['Link'][0]['Id']

gene_id = get_gene_for_nucleotide('NM_007294')
print(f"Gene ID: {gene_id}")

Find Related PubMed Articles

def get_related_articles(pmid, max_results=10):
    handle = Entrez.elink(dbfrom='pubmed', db='pubmed', id=pmid, linkname='pubmed_pubmed')
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    links = record[0]['LinkSetDb'][0]['Link']
    return [link['Id'] for link in links[:max_results]]

related = get_related_articles('35412348')
print(f"Related articles: {related}")

Get All Available Links

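def discover_links(db, record_id):
    handle = Entrez.elink(dbfrom=db, id=record_id, cmd='acheck')
    record = Entrez.read(handle)
    handle.close()

    # acheck reports links under IdCheckList -> IdLinkSet -> LinkInfo,
    # not under LinkSetDb
    links = {}
    for id_linkset in record[0]['IdCheckList']['IdLinkSet']:
        for info in id_linkset['LinkInfo']:
            links[info['LinkName']] = info['DbTo']
    return links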

available = discover_links('gene', '672')
for name, target in available.items():
    print(f"{name} -> {target}")

Navigate Gene -> Protein -> Structure

def gene_to_structures(gene_id):
    # Gene to protein
    handle = Entrez.elink(dbfrom='gene', db='protein', id=gene_id, linkname='gene_protein_refseq')
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    protein_ids = [link['Id'] for link in record[0]['LinkSetDb'][0]['Link'][:5]]

    # Protein to structure
    handle = Entrez.elink(dbfrom='protein', db='structure', id=','.join(protein_ids))
    record = Entrez.read(handle)
    handle.close()

    structure_ids = []
    for linkset in record:
        if linkset['LinkSetDb']:
            structure_ids.extend([link['Id'] for link in linkset['LinkSetDb'][0]['Link']])
    return structure_ids

structures = gene_to_structures('672')
print(f"Structure IDs: {structures[:5]}")

Link Multiple IDs at Once

def batch_link(dbfrom, db, ids):
    # Pass the IDs as a list: Biopython then sends one id= parameter per ID
    # and ELink returns one linkset per input ID. A comma-joined string
    # would instead return a single linkset merging the links of all IDs.
    if isinstance(ids, str):
        ids = ids.split(',')

    handle = Entrez.elink(dbfrom=dbfrom, db=db, id=ids)
    record = Entrez.read(handle)
    handle.close()

    results = {}
    for linkset in record:
        source_id = linkset['IdList'][0]
        linked_ids = []
        if linkset['LinkSetDb']:
            linked_ids = [link['Id'] for link in linkset['LinkSetDb'][0]['Link']]
        results[source_id] = linked_ids
    return results

results = batch_link('gene', 'protein', ['672', '675', '7157'])
for gene, proteins in results.items():
    print(f"Gene {gene}: {len(proteins)} proteins")

Get Publications for a Sequence

def get_sequence_publications(accession):
    # First get the GI/UID
    handle = Entrez.esearch(db='nucleotide', term=f'{accession}[accn]')
    search = Entrez.read(handle)
    handle.close()

    if not search['IdList']:
        return []
    uid = search['IdList'][0]

    # Link to PubMed
    handle = Entrez.elink(dbfrom='nucleotide', db='pubmed', id=uid)
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    return [link['Id'] for link in record[0]['LinkSetDb'][0]['Link']]

pmids = get_sequence_publications('NM_007294')
print(f"PubMed IDs: {pmids[:5]}")

Link Commands

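| Command | Description |
|---------|-------------|
| neighbor | Default; returns records in the target database linked to the input IDs |
| neighbor_score | Returns neighbors within the same database together with similarity scores |
| neighbor_history | Stores the linked IDs on the History server |
| acheck | Lists all links available for the input IDs |
| ncheck | Checks whether links exist within the same database |
| lcheck | Checks whether external links (LinkOuts) exist |
| llinks | Lists URLs and attributes of LinkOut providers that are not libraries |
| prlinks | Lists the primary LinkOut provider for each input ID |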
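
For large result sets, cmd='neighbor_history' keeps the linked IDs on NCBI's History server instead of returning them. Below is a minimal sketch of pulling out the values needed for a later EFetch, assuming the parsed result exposes 'WebEnv' and a 'LinkSetDbHistory' list whose entries hold 'QueryKey'. history_keys is a hypothetical helper and the sample values are placeholders.

```python
def history_keys(record):
    """Return (webenv, [(linkname, query_key), ...]) from a cmd='neighbor_history' result."""
    linkset = record[0]
    keys = [(entry['LinkName'], entry['QueryKey'])
            for entry in linkset.get('LinkSetDbHistory', [])]
    return linkset.get('WebEnv'), keys


# Hand-written stand-in for Entrez.read() output; WebEnv and QueryKey are placeholders
sample = [{
    'DbFrom': 'gene',
    'IdList': ['672'],
    'LinkSetDbHistory': [
        {'DbTo': 'protein', 'LinkName': 'gene_protein_refseq', 'QueryKey': '1'},
    ],
    'WebEnv': 'EXAMPLE_WEBENV',
}]
webenv, keys = history_keys(sample)
print(webenv, keys)
```

The returned values can then be passed on, for example Entrez.efetch(db='protein', webenv=webenv, query_key=query_key, rettype='fasta', retmode='text').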

Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| Empty LinkSetDb | No links exist | Check if record has linked data |
| HTTPError 400 | Invalid ID or database | Verify ID exists in source database |
| KeyError | Missing expected field | Check if LinkSetDb is empty first |
| Single linkset expected, got list | Multiple input IDs | Iterate through record list |
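
One way to make the HTTP 400 case easier to diagnose is to wrap the call and report which parameters were sent. The sketch below is an assumption-laden illustration: call_elink is a hypothetical wrapper, and the request itself is passed in as a callable so the example can run offline with a stand-in that always fails.

```python
from urllib.error import HTTPError


def call_elink(elink_call, **params):
    """Run an ELink call, turning HTTP 400 into a more descriptive ValueError.

    elink_call is any callable that performs the request and returns the parsed
    result, for example: lambda **kw: Entrez.read(Entrez.elink(**kw))
    """
    try:
        return elink_call(**params)
    except HTTPError as err:
        if err.code == 400:
            raise ValueError(
                f"ELink rejected the request; check dbfrom={params.get('dbfrom')!r}, "
                f"db={params.get('db')!r}, id={params.get('id')!r}"
            ) from err
        raise  # other HTTP errors are left for the caller to handle


# Offline demonstration with a stand-in callable that always fails with HTTP 400
def always_400(**params):
    raise HTTPError('https://example.invalid', 400, 'Bad Request', None, None)


try:
    call_elink(always_400, dbfrom='gene', db='protein', id='not-an-id')
except ValueError as err:
    print(err)
```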

Decision Tree

Need to find related records?
├── Know what link you want?
│   └── Use elink with specific linkname
├── Discover what links exist?
│   └── Use elink with cmd='acheck'
├── Navigate to target database?
│   └── Use elink(dbfrom=X, db=Y, id=Z)
├── Find related records in same database?
│   └── Use elink(dbfrom=X, db=X) with neighbor
├── Chain multiple databases?
│   └── Call elink multiple times
└── Need the actual records?
    └── Use elink first, then efetch with IDs
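
The last branch of the tree (elink first, then efetch) can be sketched as a single function. fetch_linked_records and its entrez parameter are hypothetical: the parameter exists only so a stand-in object can replace Bio.Entrez in tests, and Entrez.email is assumed to have been set beforehand. Only the first LinkSetDb entry is used, so passing a linkname-specific call may be preferable in practice.

```python
def fetch_linked_records(dbfrom, db, uid, rettype='fasta', retmode='text',
                         max_records=5, entrez=None):
    """Follow an ELink from uid into db, then fetch the linked records with EFetch."""
    if entrez is None:
        # Imported lazily so that a stand-in object can be injected instead
        from Bio import Entrez as entrez

    # Step 1: find linked IDs in the target database
    handle = entrez.elink(dbfrom=dbfrom, db=db, id=uid)
    record = entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return ''
    links = record[0]['LinkSetDb'][0]['Link']
    ids = [link['Id'] for link in links][:max_records]

    # Step 2: fetch the records behind those IDs
    handle = entrez.efetch(db=db, id=','.join(ids), rettype=rettype, retmode=retmode)
    data = handle.read()
    handle.close()
    return data
```

A call such as fetch_linked_records('gene', 'protein', '672') would then return the text of up to five linked protein records.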

Related Skills

  • entrez-search - Search databases before linking
  • entrez-fetch - Retrieve records after finding linked IDs
  • batch-downloads - Download many linked records efficiently