alphafold-database

SKILL.md

AlphaFold Database

Programmatic access to DeepMind's AlphaFold Protein Structure Database (200M+ predicted structures).

Quick Reference

# Fetch structure via Biopython
from Bio.PDB import alphafold_db
predictions = list(alphafold_db.get_predictions("P00520"))
alphafold_db.download_cif_for(predictions[0], directory="./output")

# Direct API call
import requests
resp = requests.get("https://alphafold.ebi.ac.uk/api/prediction/P00520")
entry_id = resp.json()[0]['entryId']  # AF-P00520-F1

# Download structure file
structure_url = f"https://alphafold.ebi.ac.uk/files/{entry_id}-model_v4.cif"

When to Use

  • Obtain 3D coordinates for proteins without experimental structures
  • Assess prediction quality via pLDDT and PAE metrics
  • Download structure files (mmCIF, PDB) for visualization or docking
  • Retrieve proteome-scale datasets for computational analysis

Key Concepts

Term Description
UniProt Accession Protein identifier (e.g., P00520) used to query
AlphaFold ID Format: AF-{UniProt}-F{fragment} (e.g., AF-P00520-F1)
pLDDT Per-residue confidence (0-100); >90 = reliable, <50 = disordered
PAE Predicted Aligned Error; <5A = high confidence domain positions

See references/confidence-scores.md for detailed interpretation guidance.

File Types

File URL Pattern Contents
Coordinates {id}-model_v4.cif Atomic positions (mmCIF)
Confidence {id}-confidence_v4.json Per-residue pLDDT array
PAE Matrix {id}-predicted_aligned_error_v4.json Inter-residue error

Base URL: https://alphafold.ebi.ac.uk/files/

Core Operations

Fetch Structure Metadata

import requests
resp = requests.get(f"https://alphafold.ebi.ac.uk/api/prediction/{uniprot_id}")
metadata = resp.json()[0]
af_id = metadata['entryId']

Download All Files

Use scripts/alphafold_utils.py:

from scripts.alphafold_utils import download_alphafold_files
paths = download_alphafold_files("AF-P04637-F1", output_dir="./data")

Analyze Confidence

from scripts.alphafold_utils import get_plddt_scores
stats = get_plddt_scores("AF-P04637-F1")
print(f"Average pLDDT: {stats['mean']:.1f}")

Bulk Proteome Access

# Google Cloud Storage
gsutil ls gs://public-datasets-deepmind-alphafold-v4/
gsutil -m cp "gs://public-datasets-deepmind-alphafold-v4/proteomes/proteome-tax_id-9606-*.tar" ./

See references/bulk-access.md for BigQuery queries and batch processing.

Caveats

  • Predictions, not experiments: Verify critical findings experimentally
  • Confidence matters: Always check pLDDT before using regions
  • Single chains only: No multimers or complexes
  • No ligands: Missing cofactors, ions, PTMs

Setup

pip install biopython requests numpy matplotlib pandas scipy
# Optional: pip install google-cloud-bigquery gsutil

Links

Weekly Installs
26
First Seen
Feb 25, 2026
Installed on
mcpjam26
claude-code26
replit26
junie26
windsurf26
zencoder26