alphafold-database
AlphaFold Database
Programmatic access to DeepMind's AlphaFold Protein Structure Database (200M+ predicted structures).
Quick Reference
# Fetch structure via Biopython
from Bio.PDB import alphafold_db
predictions = list(alphafold_db.get_predictions("P00520"))
alphafold_db.download_cif_for(predictions[0], directory="./output")
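# Direct API call
import requests
resp = requests.get("https://alphafold.ebi.ac.uk/api/prediction/P00520", timeout=30)
resp.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
entry_id = resp.json()[0]['entryId']  # AF-P00520-F1
# Build the structure file URL (the "v4" suffix is the database version and may change)
structure_url = f"https://alphafold.ebi.ac.uk/files/{entry_id}-model_v4.cif"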
When to Use
- Obtain 3D coordinates for proteins without experimental structures
- Assess prediction quality via pLDDT and PAE metrics
- Download structure files (mmCIF, PDB) for visualization or docking
- Retrieve proteome-scale datasets for computational analysis
Key Concepts
| Term | Description |
|---|---|
| UniProt Accession | Protein identifier (e.g., P00520) used to query |
| AlphaFold ID | Format: AF-{UniProt}-F{fragment} (e.g., AF-P00520-F1) |
| pLDDT | Per-residue confidence (0-100); >90 = very high confidence, <50 = very low confidence, often disordered |
| PAE | Predicted Aligned Error in Å between residue pairs; <5 Å = confident relative positioning of domains |
See references/confidence-scores.md for detailed interpretation guidance.
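The two confidence metrics in the table can be turned into small helpers. The following is a minimal sketch, not part of this skill's scripts: `plddt_band` and `mean_pae_between` are hypothetical names, the intermediate 70 cut-off follows the colour bands shown on the AlphaFold Database site, how exact boundary values are assigned is a choice made here, and the PAE matrix is assumed to be a plain list of lists in Å.

```python
from statistics import mean


def plddt_band(score: float) -> str:
    """Map a per-residue pLDDT score (0-100) to a confidence band."""
    if not 0 <= score <= 100:
        raise ValueError(f"pLDDT must be within 0-100, got {score}")
    if score > 90:
        return "very high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very low"


def mean_pae_between(pae, rows, cols):
    """Average PAE (in Å) over one block of the matrix.

    rows and cols are (start, end) residue ranges, 1-based and inclusive.
    A low value suggests the two regions are confidently placed
    relative to each other.
    """
    (r0, r1), (c0, c1) = rows, cols
    block = [pae[i][j] for i in range(r0 - 1, r1) for j in range(c0 - 1, c1)]
    return mean(block)
```

For two putative domains, `mean_pae_between(pae, (1, 120), (150, 300))` gives one number to compare against the 5 Å guideline; the residue ranges here are placeholders.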
File Types
| File | URL Pattern | Contents |
|---|---|---|
| Coordinates | {id}-model_v4.cif | Atomic positions (mmCIF) |
| Confidence | {id}-confidence_v4.json | Per-residue pLDDT array |
| PAE Matrix | {id}-predicted_aligned_error_v4.json | Inter-residue error |
Base URL: https://alphafold.ebi.ac.uk/files/
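The URL patterns above combine mechanically with the base URL. A minimal sketch, assuming the patterns stay as listed: `build_file_urls` is a hypothetical helper, and the `version` parameter is an assumption added so the `v4` suffix is not hardcoded in calling code.

```python
BASE_URL = "https://alphafold.ebi.ac.uk/files/"

# Suffix templates taken from the File Types table; {version} replaces the "4".
FILE_SUFFIXES = {
    "coordinates": "model_v{version}.cif",
    "confidence": "confidence_v{version}.json",
    "pae": "predicted_aligned_error_v{version}.json",
}


def build_file_urls(entry_id: str, version: int = 4) -> dict:
    """Return the download URL for each file type of one AlphaFold entry."""
    return {
        kind: f"{BASE_URL}{entry_id}-{suffix.format(version=version)}"
        for kind, suffix in FILE_SUFFIXES.items()
    }
```

Calling `build_file_urls("AF-P00520-F1")["coordinates"]` reproduces the structure URL used in the Quick Reference.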
Core Operations
Fetch Structure Metadata
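import requests
uniprot_id = "P00520"  # UniProt accession to look up
resp = requests.get(f"https://alphafold.ebi.ac.uk/api/prediction/{uniprot_id}", timeout=30)
resp.raise_for_status()  # raise on HTTP errors such as an unknown accession
metadata = resp.json()[0]  # the API returns a list of prediction records
af_id = metadata['entryId']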
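The metadata record can also supply download links directly, which avoids hardcoding a version suffix. This is a sketch under assumptions: the field names `cifUrl`, `pdbUrl`, and `paeDocUrl` are believed to appear in prediction records but should be checked against the API docs, and both function names are hypothetical.

```python
def first_prediction(payload):
    """Return the first prediction record, or None if the payload is empty."""
    if not isinstance(payload, list) or not payload:
        return None
    return payload[0]


def pick_download_urls(record, keys=("cifUrl", "pdbUrl", "paeDocUrl")):
    """Collect whichever of the assumed URL fields are present and non-empty."""
    return {key: record[key] for key in keys if record.get(key)}
```

In live code the payload would be `resp.json()` from the prediction endpoint; a `None` result from `first_prediction` signals that no prediction was returned for the accession.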
Download All Files
Use scripts/alphafold_utils.py:
from scripts.alphafold_utils import download_alphafold_files
paths = download_alphafold_files("AF-P04637-F1", output_dir="./data")
Analyze Confidence
from scripts.alphafold_utils import get_plddt_scores
stats = get_plddt_scores("AF-P04637-F1")
print(f"Average pLDDT: {stats['mean']:.1f}")
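The contents of `scripts/alphafold_utils.py` are not shown here, so the following is only a sketch of the kind of summary such a helper might compute, not its actual implementation. It assumes the confidence JSON has already been parsed into a dict with a `confidenceScore` list (an assumed key name); `summarize_plddt` is a hypothetical function.

```python
from statistics import mean


def summarize_plddt(confidence: dict) -> dict:
    """Summarise per-residue pLDDT scores from a parsed confidence file."""
    scores = confidence["confidenceScore"]  # assumed key; verify on a real file
    if not scores:
        raise ValueError("confidence file contains no scores")
    n = len(scores)
    return {
        "mean": mean(scores),
        "min": min(scores),
        "max": max(scores),
        # Fractions of residues in the two extreme bands from Key Concepts.
        "frac_very_high": sum(s > 90 for s in scores) / n,
        "frac_very_low": sum(s < 50 for s in scores) / n,
    }
```

A high mean can hide a long disordered tail, so the two fractions are often more informative than the average alone.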
Bulk Proteome Access
# Google Cloud Storage
gsutil ls gs://public-datasets-deepmind-alphafold-v4/
gsutil -m cp "gs://public-datasets-deepmind-alphafold-v4/proteomes/proteome-tax_id-9606-*.tar" ./
See references/bulk-access.md for BigQuery queries and batch processing.
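Once a proteome archive is downloaded, it helps to list which entries it contains before extracting anything. A minimal sketch, assuming archive members are named like `AF-P04637-F1-model_v4.cif.gz`; that naming is an assumption to verify against a real archive, and both function names are hypothetical.

```python
import re
import tarfile

# Assumed member naming: AF-{UniProt}-F{fragment}-model_v{version}.cif[.gz]
MODEL_NAME = re.compile(r"^(AF-[A-Z0-9]+-F\d+)-model_v\d+\.cif(\.gz)?$")


def entry_id_from_name(name: str):
    """Return the AlphaFold ID for a coordinate file name, else None."""
    match = MODEL_NAME.match(name.rsplit("/", 1)[-1])
    return match.group(1) if match else None


def entry_ids_in_archive(tar_path):
    """Return the sorted AlphaFold IDs of coordinate files in a tar archive."""
    ids = set()
    with tarfile.open(tar_path) as archive:
        for name in archive.getnames():
            entry_id = entry_id_from_name(name)
            if entry_id:
                ids.add(entry_id)
    return sorted(ids)
```

Reading only the member names keeps this fast, since no file contents are decompressed.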
Caveats
- Predictions, not experiments: Verify critical findings experimentally
- Confidence matters: Always check pLDDT before using regions
- Single chains only: No multimers or complexes
- No ligands: Missing cofactors, ions, PTMs
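Checking pLDDT before using a region can be made concrete by extracting the stretches that pass a threshold. This is a sketch, not part of the skill's scripts: `confident_segments` is a hypothetical helper, the default threshold of 70 and minimum length of 5 are illustrative choices, and residue numbering is assumed to start at 1 and match the order of the score list.

```python
def confident_segments(scores, threshold=70.0, min_length=5):
    """Return (start, end) residue ranges where every pLDDT >= threshold.

    Ranges are 1-based and inclusive; runs shorter than min_length are dropped.
    """
    segments = []
    start = None
    for index, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = index  # a new confident run begins here
        elif start is not None:
            if index - start >= min_length:
                segments.append((start, index - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(scores) - start + 1 >= min_length:
        segments.append((start, len(scores)))
    return segments
```

Downstream steps such as docking or structural alignment can then be restricted to the returned ranges.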
Setup
pip install biopython requests numpy matplotlib pandas scipy
# Optional: pip install google-cloud-bigquery gsutil
Links
- Database: https://alphafold.ebi.ac.uk/
- API Docs: https://alphafold.ebi.ac.uk/api-docs
- Biopython: https://biopython.org/docs/dev/api/Bio.PDB.alphafold_db.html