Text-Based Molecule Editing
Modify molecular structures guided by natural language property descriptions.
When to Use
- User wants to optimize a molecule for specific properties (solubility, binding, drug-likeness)
- User provides a molecule and requests property-based modifications
- User wants to explore structural variants guided by text descriptions
Workflow
Step 1: Prepare Input Molecule
from open_biomed.data import Molecule
from open_biomed.tools.tool_registry import TOOLS
# Option A: From molecule name (queries PubChem)
tool = TOOLS["molecule_name_request"]
result, _ = tool.run(accession="aspirin")
molecule = result[0]
# Option B: From SMILES directly
molecule = Molecule.from_smiles("CC(=O)Oc1ccccc1C(=O)O")
Step 2: Calculate Baseline Properties (Optional)
qed_tool = TOOLS["molecule_qed"]
logp_tool = TOOLS["molecule_logp"]
sa_tool = TOOLS["molecule_sa"]
qed, _ = qed_tool.run(molecule=molecule)
logp, _ = logp_tool.run(molecule=molecule)
sa, _ = sa_tool.run(molecule=molecule)
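The three calls above share one pattern, so the baseline scores can be gathered into a single dictionary for later comparison. The sketch below assumes, as the workflow's `logp[0]` indexing suggests, that each tool's `run(molecule=...)` returns a `(values, extra)` pair whose first element holds the score for the given molecule. `collect_properties` and `_StubTool` are hypothetical names used for illustration; they are not part of OpenBioMed.

```python
def collect_properties(molecule, tools):
    """Run each property tool on one molecule and return {name: score}.

    Assumes tool.run(molecule=...) returns (values, extra) and that
    values[0] is the score for the given molecule.
    """
    scores = {}
    for name, tool in tools.items():
        values, _ = tool.run(molecule=molecule)
        scores[name] = float(values[0])
    return scores


class _StubTool:
    """Stand-in for a TOOLS entry so the sketch runs without OpenBioMed."""

    def __init__(self, score):
        self.score = score

    def run(self, molecule):
        return [self.score], None


stub_tools = {"qed": _StubTool(0.55), "logp": _StubTool(1.31), "sa": _StubTool(1.58)}
baseline = collect_properties("CC(=O)Oc1ccccc1C(=O)O", stub_tools)
print(baseline)
```

With the real registry, the `tools` argument would presumably be built from the entries used above, for example `{"qed": TOOLS["molecule_qed"], "logp": TOOLS["molecule_logp"], "sa": TOOLS["molecule_sa"]}`.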
Step 3: Run Text-Based Editing
from open_biomed.core.pipeline import InferencePipeline
from open_biomed.data import Text
pipeline = InferencePipeline(
    task="text_based_molecule_editing",
    model="molt5",
    model_ckpt="./checkpoints/server/text_based_molecule_editing_biot5.ckpt",
    device="cuda:0"
)
outputs = pipeline.run(
    molecule=molecule,
    text=Text.from_str("This molecule should be more soluble in water"),
)
edited_molecule = outputs[0][0]
Step 4: Compare Properties
qed_new, _ = qed_tool.run(molecule=edited_molecule)
logp_new, _ = logp_tool.run(molecule=edited_molecule)
print(f"Original SMILES: {molecule.smiles}")
print(f"Edited SMILES: {edited_molecule.smiles}")
print(f"LogP change: {logp[0]:.2f} → {logp_new[0]:.2f}")
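To report every property with a signed difference rather than LogP alone, a small formatting helper can be used. This is a minimal sketch: `describe_changes` is a hypothetical helper, and the numbers are the aspirin values quoted in this document's Example section, entered by hand rather than computed here.

```python
def describe_changes(before, after):
    """Return one 'NAME: old → new (signed delta)' line per shared property."""
    lines = []
    for name in before:
        if name not in after:
            continue  # skip properties that were not recomputed
        delta = after[name] - before[name]
        lines.append(
            f"{name}: {before[name]:.2f} → {after[name]:.2f} ({delta:+.2f})"
        )
    return lines


before = {"LogP": 1.31, "QED": 0.55, "SA": 1.58}
after = {"LogP": 1.01, "QED": 0.59, "SA": 1.81}
report = describe_changes(before, after)
print("\n".join(report))
```

Whether a change is an improvement depends on the property: lower is better for LogP when the goal is solubility and for SA, while higher is better for QED.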
Expected Outputs
| Step | Output | Description |
|---|---|---|
| Step 1 | Molecule object | Input molecule with SMILES |
| Step 2 | float values | QED (0-1), LogP, SA scores |
| Step 3 | Molecule object | Edited molecule with new structure |
| Step 4 | Comparison | Before/after property summary |
Interpretation Guide
LogP (Lipophilicity)
| Value | Solubility | Interpretation |
|---|---|---|
| < 0 | High water solubility | Very hydrophilic |
| 0-2 | Moderate | Good balance for oral drugs |
| 2-5 | Low water solubility | May need formulation help |
| > 5 | Very lipophilic | Poor absorption likely |
QED (Quantitative Estimate of Drug-likeness)
| Value | Quality | Interpretation |
|---|---|---|
| > 0.7 | Excellent | Highly drug-like |
| 0.5-0.7 | Good | Acceptable drug-likeness |
| 0.3-0.5 | Moderate | May need optimization |
| < 0.3 | Poor | Significant liabilities |
SA (Synthetic Accessibility)
| Value | Difficulty | Interpretation |
|---|---|---|
| 1-3 | Easy | Straightforward synthesis |
| 3-5 | Moderate | Some challenges |
| 5-7 | Difficult | Complex synthesis needed |
| > 7 | Very difficult | Likely impractical |
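The three tables above can be applied programmatically. The sketch below encodes their thresholds as plain functions. The tables do not say which band owns a shared boundary such as LogP = 2, so this sketch assumes a boundary value belongs to the band that starts there. The function names are hypothetical.

```python
def interpret_logp(logp):
    """Map a LogP value to the 'Solubility' label from the LogP table."""
    if logp < 0:
        return "High water solubility"
    if logp < 2:
        return "Moderate"
    if logp <= 5:
        return "Low water solubility"
    return "Very lipophilic"


def interpret_qed(qed):
    """Map a QED value to the 'Quality' label from the QED table."""
    if qed > 0.7:
        return "Excellent"
    if qed >= 0.5:
        return "Good"
    if qed >= 0.3:
        return "Moderate"
    return "Poor"


def interpret_sa(sa):
    """Map an SA score to the 'Difficulty' label from the SA table."""
    if sa > 7:
        return "Very difficult"
    if sa >= 5:
        return "Difficult"
    if sa >= 3:
        return "Moderate"
    return "Easy"


# Aspirin baseline values from the Example section.
print(interpret_logp(1.31), interpret_qed(0.55), interpret_sa(1.58))
```

These labels are coarse screening aids; values near a threshold deserve a closer look instead of a hard cut.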
Error Handling
Model Checkpoint Not Found
Symptom: FileNotFoundError for checkpoint file
Solution: Ensure checkpoint exists at ./checkpoints/server/text_based_molecule_editing_biot5.ckpt
import os
ckpt_path = "./checkpoints/server/text_based_molecule_editing_biot5.ckpt"
if not os.path.exists(ckpt_path):
    raise FileNotFoundError(f"Download checkpoint to: {ckpt_path}")
Invalid SMILES Output
Symptom: Model generates invalid SMILES string
Solution: The model returns None for invalid molecules. Try:
- Rephrasing the edit prompt
- Using beam search with more beams
- Running multiple times for different outputs
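The rephrase-and-retry advice above can be sketched as a loop that stops at the first non-`None` result. `first_valid_edit` and `edit_fn` are hypothetical: in practice `edit_fn` might wrap `pipeline.run(...)` and return `outputs[0][0]`. The second prompt is an illustrative rewording, not a recommended one, and repeated attempts only help if generation is not deterministic.

```python
def first_valid_edit(edit_fn, prompts, attempts_per_prompt=3):
    """Return (prompt, result) for the first non-None edit, else (None, None).

    edit_fn is a hypothetical callable that takes a prompt string and returns
    an edited molecule, or None when the generated SMILES is invalid.
    """
    for prompt in prompts:
        for _ in range(attempts_per_prompt):
            result = edit_fn(prompt)
            if result is not None:
                return prompt, result
    return None, None


# Stand-in for the real model call: rejects the first prompt, accepts the second.
def fake_edit(prompt):
    return "CC(=O)Oc1ccc(C(=O)O)cc1C(=O)O" if "polar" in prompt else None


prompts = [
    "This molecule should be more soluble in water",
    "Add a polar group to this molecule",
]
used_prompt, edited = first_valid_edit(fake_edit, prompts)
print(used_prompt, edited)
```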
CUDA Out of Memory
Symptom: RuntimeError: CUDA out of memory
Solution: Use CPU or smaller batch:
pipeline = InferencePipeline(
    task="text_based_molecule_editing",
    model="molt5",
    model_ckpt="./checkpoints/server/text_based_molecule_editing_biot5.ckpt",
    device="cpu"  # Fallback to CPU
)
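The fallback can also be automated by catching the out-of-memory error and retrying on the next device. This is a sketch under assumptions: `run_with_cpu_fallback`, `make_pipeline`, and `run` are hypothetical names, and the check relies on the error message containing "out of memory", as in the symptom above. With the real library, `make_pipeline` would build an `InferencePipeline` with `device=device`.

```python
def run_with_cpu_fallback(make_pipeline, run, devices=("cuda:0", "cpu")):
    """Try each device in order, moving on only after an out-of-memory error."""
    if not devices:
        raise ValueError("no devices given")
    last_error = None
    for device in devices:
        try:
            return device, run(make_pipeline(device))
        except RuntimeError as err:
            if "out of memory" not in str(err).lower():
                raise  # unrelated failure: do not hide it
            last_error = err
    raise last_error


# Stand-ins so the sketch runs without a GPU or OpenBioMed.
def fake_make_pipeline(device):
    return device


def fake_run(pipeline):
    if pipeline.startswith("cuda"):
        raise RuntimeError("CUDA out of memory")
    return "edited"


device_used, output = run_with_cpu_fallback(fake_make_pipeline, fake_run)
print(device_used, output)
```

CPU inference is typically much slower, so this suits occasional single-molecule edits better than large batches.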
Example
Input: aspirin
Prompt: "This molecule should be more soluble in water"
Original SMILES: CC(=O)Oc1ccccc1C(=O)O
Edited SMILES: CC(=O)Oc1ccc(C(=O)O)cc1C(=O)O
Property Changes:
LogP: 1.31 → 1.01 (-0.30, more soluble)
QED: 0.55 → 0.59 (+0.04, better drug-likeness)
SA: 1.58 → 1.81 (+0.23, slightly harder to synthesize)
See Also
- examples/basic_example.py - Full runnable example script
- examples/solubility_optimization.py - Solubility-focused workflow
- references/troubleshooting.md - Detailed error handling
- references/advanced.md - Advanced prompt engineering tips
Repository: pharmolix/openbiomed