BoltzGen All-Atom Design
Prerequisites
| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.10+ | 3.12 |
| CUDA | 12.0+ | 12.2 |
| GPU VRAM | 24GB | 80GB (A800) |
| RAM | 32GB | 64GB |
How to run
Local installation
```bash
git clone https://github.com/HannesStark/boltzgen.git
cd boltzgen
pip install -e .
```
Binder design from an input YAML file
```bash
boltzgen run example/vanilla_protein/1g13prot.yaml \
  --output workbench/test_run \
  --protocol protein-anything \
  --num_designs 10 \
  --budget 2
# --num_designs is the number of intermediate designs. In practice you will want between 10,000 and 60,000.
# --budget is how many designs should be in the final diversity-optimized set.
```
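If you drive BoltzGen from a Python script, the validation and run steps can be wrapped with `subprocess`. The sketch below is minimal and assumes the `boltzgen` CLI is on `PATH`. The helper names `build_commands` and `run_campaign` are hypothetical, and the sketch uses only the flags shown in this guide.

```python
import subprocess


def build_commands(config, output, protocol="protein-anything",
                   num_designs=10, budget=2):
    """Return the `boltzgen check` and `boltzgen run` argument lists.

    Hypothetical helper: it only assembles flags documented in this guide.
    """
    check_cmd = ["boltzgen", "check", config]
    run_cmd = [
        "boltzgen", "run", config,
        "--output", output,
        "--protocol", protocol,
        "--num_designs", str(num_designs),
        "--budget", str(budget),
    ]
    return check_cmd, run_cmd


def run_campaign(config, output, **kwargs):
    """Validate the YAML first, then launch the full pipeline."""
    check_cmd, run_cmd = build_commands(config, output, **kwargs)
    # check=True raises CalledProcessError if either step exits non-zero
    subprocess.run(check_cmd, check=True)
    subprocess.run(run_cmd, check=True)


if __name__ == "__main__":
    # Print the command instead of running it
    _, cmd = build_commands("example/vanilla_protein/1g13prot.yaml",
                            "workbench/test_run")
    print(" ".join(cmd))
```

Passing arguments as a list avoids shell quoting issues with paths. Running `check` first means a malformed YAML fails fast instead of after the model has loaded.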
YAML configuration
BoltzGen uses an entity-based YAML format to specify what to design and what the target is.
Important notes:
- Residue indices use `label_seq_id` (1-indexed), not `auth_seq_id`.
- File paths are relative to the YAML file location.
- Run `boltzgen check config.yaml` to verify the configuration before running.
- View the target in Molstar to confirm the binding site is correctly specified.
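Structures from the PDB usually carry author numbering (`auth_seq_id`). Residue numbers copied from a paper or a PyMOL session may therefore need translating to `label_seq_id`.

The sketch below does this by reading the `_atom_site` loop of an mmCIF file. It makes three assumptions:
- Biopython's `MMCIF2Dict` is used for parsing.
- The chain is identified by its author chain ID.
- Insertion codes are ignored.

The helper names `auth_to_label` and `load_cif` are hypothetical.

```python
def auth_to_label(cif, chain, auth_numbers):
    """Map author residue numbers to label_seq_id for one chain.

    `cif` is a dict shaped like Biopython's MMCIF2Dict output: each
    _atom_site key maps to a list of strings, one entry per atom.
    Insertion codes are ignored in this sketch.
    """
    mapping = {}
    for asym, auth, label in zip(cif["_atom_site.auth_asym_id"],
                                 cif["_atom_site.auth_seq_id"],
                                 cif["_atom_site.label_seq_id"]):
        # label_seq_id is "." for non-polymer atoms (ligands, waters): skip them
        if asym == chain and label not in (".", "?"):
            mapping.setdefault(int(auth), int(label))
    # Raises KeyError if a number is not an observed residue of the chain
    return [mapping[n] for n in auth_numbers]


def load_cif(path):
    """Read an mmCIF file into a dict (requires Biopython)."""
    from Bio.PDB.MMCIF2Dict import MMCIF2Dict
    return MMCIF2Dict(path)
```

For example, `auth_to_label(load_cif("target.cif"), "A", [45, 67, 89])` returns the values to put in `binding:`.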
Entity Types
Designed Protein
```yaml
entities:
  - protein:
      id: B  # Chain ID for designed protein
      sequence: 80..140  # Variable length (80-140 residues)
```
Sequence specification:
- `80..140`: random length between 80 and 140 residues
- `80`: exactly 80 designed residues
- `AAAVVV20PPP`: specific fixed residues with 20 designed residues in the middle
- `3..5C6C3`: designed residues with specific cysteines
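To see how a specification expands, here is a small parser for the notation as described in the list above. It is a hypothetical illustration of those rules, not BoltzGen's own parser:
- Digits or `a..b` ranges are designed residues.
- Uppercase letters are fixed residues.

It reports the total length range. Given one chosen length per designed segment, it also reports the 1-indexed positions of the fixed residues.

```python
import re

# A token is a designed range "a..b", a designed count "n", or one fixed residue letter
TOKEN = re.compile(r"(\d+)\.\.(\d+)|(\d+)|([A-Z])")


def parse_spec(spec):
    """Split a sequence spec into ("designed", lo, hi) or ("fixed", letter) tuples."""
    spec = str(spec)  # YAML may load a bare "80" as an int
    segments, pos = [], 0
    while pos < len(spec):
        m = TOKEN.match(spec, pos)
        if m is None:
            raise ValueError(f"cannot parse {spec!r} at position {pos}")
        if m.group(1) is not None:
            segments.append(("designed", int(m.group(1)), int(m.group(2))))
        elif m.group(3) is not None:
            n = int(m.group(3))
            segments.append(("designed", n, n))
        else:
            segments.append(("fixed", m.group(4)))
        pos = m.end()
    return segments


def length_range(spec):
    """Minimum and maximum total chain length."""
    lo = hi = 0
    for seg in parse_spec(spec):
        if seg[0] == "designed":
            lo, hi = lo + seg[1], hi + seg[2]
        else:
            lo, hi = lo + 1, hi + 1
    return lo, hi


def fixed_positions(spec, lengths):
    """1-indexed (position, letter) of fixed residues, given one length per designed segment."""
    chosen = iter(lengths)
    pos, out = 0, []
    for seg in parse_spec(spec):
        if seg[0] == "designed":
            n = next(chosen)
            if not seg[1] <= n <= seg[2]:
                raise ValueError(f"length {n} outside {seg[1]}..{seg[2]}")
            pos += n
        else:
            pos += 1
            out.append((pos, seg[1]))
    return out
```

Under this reading, `3..5C6C3` spans 14 to 16 residues. With a 3-residue first segment, its cysteines fall at positions 4 and 11, the residues bonded in the WHL stapled-peptide example. A longer first segment shifts both cysteines.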
Target from File
```yaml
entities:
  - file:
      path: target.cif  # CIF or PDB file (relative to YAML)
      include:  # Which chains/residues to include
        - chain:
            id: A
            res_index: 2..50,55..  # Optional: specific residues
      binding_types:  # Where design should bind
        - chain:
            id: A
            binding: 45,67,89  # Binding site residues
      structure_groups: "all"  # Optional: structure specification
```
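The `res_index` and `binding` fields take comma-separated residue lists. To double-check which residues a string selects before running, a small expander helps.

This is a hypothetical helper, not part of BoltzGen. It assumes three item forms:
- `n`
- `a..b`
- open-ended `a..`, running to the last residue of the chain

```python
def expand_res_index(spec, chain_length):
    """Expand a res_index/binding string into a sorted list of 1-indexed residues.

    Supports "n", "a..b" and open-ended "a.." (runs to chain_length).
    """
    residues = set()
    for item in str(spec).split(","):
        item = item.strip()
        if not item:
            continue
        if ".." in item:
            start, _, end = item.partition("..")
            lo = int(start)
            hi = int(end) if end else chain_length  # open-ended range
        else:
            lo = hi = int(item)
        if not 1 <= lo <= hi <= chain_length:
            raise ValueError(f"{item!r} is outside 1..{chain_length}")
        residues.update(range(lo, hi + 1))
    return sorted(residues)
```

For a 60-residue chain, `expand_res_index("2..50,55..", 60)` selects residues 2-50 and 55-60.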
Non-Designed Protein
```yaml
entities:
  - protein:
      id: X
      sequence: AAVTTTTPPP  # Fixed sequence (not designed)
```
Constraints (Bonds)
```yaml
constraints:
  - bond:
      atom1: [S, 11, SG]  # [chain_id, res_index, atom_name]
      atom2: [S, 18, SG]  # Disulfide bond
```
Protocol-Specific Examples
Protein Binder Design (protein-anything)
```yaml
entities:
  # Designed binder (80-140 residues)
  - protein:
      id: B
      sequence: 80..140
  # Target protein
  - file:
      path: target.cif
      include:
        - chain:
            id: A
      binding_types:
        - chain:
            id: A
            binding: 45,67,89
```
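When sweeping many targets or binding sites, generating this YAML from Python avoids indentation mistakes. Below is a minimal sketch that mirrors the example's structure and assumes PyYAML for serialization. `binder_config` and `write_config` are hypothetical names.

```python
def binder_config(target_path, target_chain, binding,
                  min_len=80, max_len=140, binder_chain="B"):
    """Build the protein-anything config above as a plain dict."""
    return {
        "entities": [
            # Designed binder with a variable length range
            {"protein": {"id": binder_chain, "sequence": f"{min_len}..{max_len}"}},
            # Target structure, its included chain, and binding-site residues
            {"file": {
                "path": target_path,
                "include": [{"chain": {"id": target_chain}}],
                "binding_types": [{"chain": {
                    "id": target_chain,
                    "binding": ",".join(str(r) for r in binding),
                }}],
            }},
        ]
    }


def write_config(config, path):
    """Serialize to YAML (requires PyYAML: pip install pyyaml)."""
    import yaml
    with open(path, "w") as fh:
        yaml.safe_dump(config, fh, sort_keys=False)
```

Because `path` is resolved relative to the YAML file, write the config next to the target structure. Run `boltzgen check` on the result before launching the pipeline.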
Peptide Design (peptide-anything)
```yaml
entities:
  # Designed peptide (12-20 residues)
  - protein:
      id: G
      sequence: 12..20
  - file:
      path: target.cif
      include:
        - chain:
            id: A
      binding_types:
        - chain:
            id: A
            binding: 343,344,251
      structure_groups: "all"
```
Cyclic Peptide with Disulfide
```yaml
entities:
  - protein:
      id: S
      sequence: 10..14C6C3  # Designed with cysteines
  - file:
      path: target.cif
      include:
        - chain:
            id: A
constraints:
  - bond:
      atom1: [S, 11, SG]
      atom2: [S, 18, SG]
```
WHL Stapled Peptide
```yaml
entities:
  - protein:
      id: R
      sequence: 3..5C6C3
  - ligand:
      id: Q
      ccd: WHL
  - file:
      path: target.cif
      include:
        - chain:
            id: A
constraints:
  - bond:
      atom1: [R, 4, SG]
      atom2: [Q, 1, CK]
  - bond:
      atom1: [R, 11, SG]
      atom2: [Q, 1, CH]
```
Small Molecule Binding (protein-small_molecule)
```yaml
entities:
  - protein:
      id: A
      sequence: 100..150
  - ligand:
      smiles: "CCO"  # Ethanol
      # or ccd: ATP  # From CCD database
```
Nanobody Design (nanobody-anything)
```yaml
entities:
  - protein:
      id: H
      sequence: EVQLVESGG...  # Framework with designed CDRs
      # Use specific residue notation for CDR design
  - file:
      path: antigen.cif
      include:
        - chain:
            id: A
```
Advanced Options
Partial Target Flexibility
```yaml
entities:
  - file:
      path: target.cif
      include:
        - chain:
            id: A
      structure_groups:
        - group:
            visibility: 1  # Fixed structure
            id: A
            res_index: 10..50
        - group:
            visibility: 0  # Flexible (not structurally specified)
            id: A
            res_index: 51..60
```
Redesign Existing Residues
```yaml
entities:
  - file:
      path: complex.cif
      include:
        - chain:
            id: A
      design:  # Residues to redesign
        - chain:
            id: A
            res_index: 14..19
```
Secondary Structure Constraints
```yaml
entities:
  - file:
      path: target.cif
      design:
        - chain:
            id: A
            res_index: 14..19
      secondary_structure:
        - chain:
            id: A
            helix: 15..17
            sheet: 19
            loop: 14
```
Not-Binding Regions
```yaml
entities:
  - file:
      path: target.cif
      include:
        - chain:
            id: A
        - chain:
            id: B
      binding_types:
        - chain:
            id: A
            binding: 45,67,89
        - chain:
            id: B
            not_binding: "all"  # Design should NOT bind here
```
Design protocols
| Protocol | Use case |
|---|---|
| `protein-anything` | Design proteins to bind proteins or peptides |
| `peptide-anything` | Design cyclic peptides to bind proteins |
| `protein-small_molecule` | Design proteins to bind small molecules |
| `nanobody-anything` | Design nanobody CDRs |
| `antibody-anything` | Design antibody CDRs |
Output format
```text
output/
├── sample_0/
│   ├── design.cif       # All-atom structure (CIF format)
│   ├── metrics.json     # Confidence scores
│   └── sequence.fasta   # Sequence
├── sample_1/
│   └── ...
└── summary.csv
```
Note: BoltzGen outputs CIF format. Convert to PDB if needed:
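```python
from Bio.PDB import MMCIFParser, PDBIO

# Parse the BoltzGen mmCIF output (QUIET=True suppresses parser warnings)
parser = MMCIFParser(QUIET=True)
structure = parser.get_structure("design", "design.cif")

# Write the same structure out in PDB format
io = PDBIO()
io.set_structure(structure)
io.save("design.pdb")
```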
Sample output
Successful run
```text
$ modal run modal_boltzgen.py --input-yaml binder.yaml --protocol protein-anything --num-designs 10
Running: boltzgen run binder.yaml --output /tmp/out --protocol protein-anything --num_designs 10
[INFO] Loading BoltzGen model...
[INFO] Generating designs...
[INFO] Running inverse folding...
[INFO] Running structure prediction...
[INFO] Filtering and ranking...
[INFO] Pipeline complete
Results saved to: ./out/boltzgen/2501161234/
```
Output directory structure:
```text
out/boltzgen/2501161234/
├── intermediate_designs/                 # Raw diffusion outputs
│   ├── design_0.cif
│   └── design_0.npz
├── intermediate_designs_inverse_folded/
│   ├── refold_cif/                       # Refolded complexes
│   └── aggregate_metrics_analyze.csv
└── final_ranked_designs/
    ├── final_10_designs/                 # Top designs
    └── results_overview.pdf              # Summary plots
```
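To collect the top-ranked structures from the newest run programmatically, a short `pathlib` sketch follows. It assumes three things:
- The layout shown above.
- Run directories are named with equal-length digit timestamps, so the largest name is the newest run.
- Final designs are stored as `.cif` files.

`latest_run` and `final_designs` are hypothetical names.

```python
from pathlib import Path


def latest_run(root="out/boltzgen"):
    """Return the most recent timestamped run directory under root."""
    runs = [p for p in Path(root).iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no runs under {root}")
    # Assumption: names like 2501161234 are equal-length digit timestamps,
    # so the lexicographically largest name is the newest run
    return max(runs, key=lambda p: p.name)


def final_designs(run_dir):
    """Sorted CIF files inside the final_*_designs folder(s) of a run."""
    return sorted(Path(run_dir).glob("final_ranked_designs/final_*_designs/*.cif"))
```

For example, `final_designs(latest_run())` lists the final structures of the most recent Modal run.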
What good output looks like:
- Refolding RMSD < 2.0 Å (the design folds as predicted)
- ipTM > 0.5 (confident interface)
- All designs complete the pipeline without errors
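These thresholds can be applied to `aggregate_metrics_analyze.csv` with the standard `csv` module. The column names `refold_rmsd` and `iptm` below are placeholders, not confirmed BoltzGen headers. Inspect the CSV header and pass the real names through the `rmsd_col` and `iptm_col` parameters.

```python
import csv


def passes(row, rmsd_col="refold_rmsd", iptm_col="iptm",
           max_rmsd=2.0, min_iptm=0.5):
    """True if a design meets both rule-of-thumb thresholds."""
    try:
        return float(row[rmsd_col]) < max_rmsd and float(row[iptm_col]) > min_iptm
    except (KeyError, ValueError):
        # Missing or non-numeric metric: treat the design as failing
        return False


def passing_designs(csv_path, **thresholds):
    """Rows of a metrics CSV that pass the thresholds."""
    with open(csv_path, newline="") as fh:
        return [row for row in csv.DictReader(fh) if passes(row, **thresholds)]
```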
Decision tree
```text
Should I use BoltzGen?
│
└─ What type of design?
   ├─ All-atom precision needed → protein-structure-design-boltzgen ✓
   ├─ Ligand binding pocket → protein-structure-design-boltzgen ✓
   └─ Antibody or nanobody design → antibody-design-iggm
```
Typical performance
| Campaign Size | Time (L40S) | Cost (Modal) | Notes |
|---|---|---|---|
| 50 designs | 30-45 min | ~$8 | Quick exploration |
| 100 designs | 1-1.5h | ~$15 | Standard campaign |
| 500 designs | 5-8h | ~$70 | Large campaign |
Per design: ~30-60 s for a typical binder.
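The recommended campaign size of 10,000-60,000 intermediate designs is far beyond the table. A linear extrapolation gives a rough budget.

This sketch assumes:
- Runtime scales linearly on a single L40S at 30-60 s per design.
- Cost is about $0.15 per design, derived from the table's ~$15 per 100 designs.

Actual costs depend on current Modal pricing and on design size.

```python
def estimate(num_designs, sec_per_design=(30, 60), usd_per_design=0.15):
    """Rough (low_hours, high_hours, cost_usd) for a campaign, assuming linear scaling."""
    low_h = num_designs * sec_per_design[0] / 3600
    high_h = num_designs * sec_per_design[1] / 3600
    return low_h, high_h, num_designs * usd_per_design
```

For 10,000 designs this gives roughly 83 to 167 GPU-hours and about $1,500, so small test runs before scaling up are worthwhile.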
Verify
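```bash
# Replace workbench/test_run with the directory passed to --output
find workbench/test_run/intermediate_designs -name "*.cif" | wc -l  # should equal --num_designs
```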
Troubleshooting
- Verify config first: always run `boltzgen check config.yaml` before running the full pipeline.
- Slow generation: use fewer designs for initial testing, then scale up.
- OOM errors: use an A100-80GB or reduce `--num_designs` (`--num-designs` in the Modal wrapper).
- Wrong binding site: residue indices use `label_seq_id` (1-indexed); check them in the Molstar viewer.
Error interpretation
| Error | Cause | Fix |
|---|---|---|
| `RuntimeError: CUDA out of memory` | Large design or long protein | Use an A100-80GB or reduce the number of designs |
| `FileNotFoundError: *.cif` | Target file not found | File paths are relative to the YAML location |
| `ValueError: invalid chain` | Chain not in target | Verify chain IDs with Molstar or PyMOL |
| `modal: command not found` | Modal CLI not installed | Run `pip install modal && modal setup` |
Next: Validate with structure-prediction-boltz-2.