chemistry-rdkit
RDKit Cheminformatics Best Practices
Molecular I/O
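- Create molecules from SMILES: `mol = Chem.MolFromSmiles('CCO')`
- Always check for None: `MolFromSmiles` returns `None` on invalid input instead of raising an exception
- Convert to canonical SMILES: `Chem.MolToSmiles(mol)`
- Read SDF files: `suppl = Chem.SDMolSupplier('file.sdf')`
- Read SMILES files: `suppl = Chem.SmilesMolSupplier('file.smi')` (the first line is treated as a header by default; pass `titleLine=False` if the file has none)
- Write molecules: `writer = Chem.SDWriter('output.sdf')`, then call `writer.write(mol)` for each molecule and `writer.close()` when done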
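The I/O steps above can be combined roughly as follows. This is a minimal sketch, assuming a standard RDKit install; the helper names `parse_smiles`, `canonical_smiles` and `write_sdf` are hypothetical, and storing the input SMILES as the record title is just an illustrative choice.

```python
from rdkit import Chem


def parse_smiles(smiles):
    """Parse a SMILES string; RDKit returns None (and logs an error) on bad input."""
    return Chem.MolFromSmiles(smiles)


def canonical_smiles(smiles):
    """Return RDKit's canonical SMILES, or None if the input cannot be parsed."""
    mol = parse_smiles(smiles)
    if mol is None:
        return None
    return Chem.MolToSmiles(mol)


def write_sdf(smiles_list, path):
    """Write every parseable SMILES to an SD file and return how many were written."""
    writer = Chem.SDWriter(path)
    written = 0
    try:
        for smi in smiles_list:
            mol = parse_smiles(smi)
            if mol is None:
                continue  # skip invalid entries instead of crashing the whole run
            mol.SetProp("_Name", smi)  # hypothetical choice: input SMILES as record title
            writer.write(mol)
            written += 1
    finally:
        writer.close()  # flush buffered records and close the file
    return written
```

For example, `canonical_smiles('OCC')` and `canonical_smiles('CCO')` give the same string, while an unclosed ring such as `'C1CC'` yields `None`. The file can be read back with `Chem.SDMolSupplier(path)`.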
Molecular Descriptors
- Molecular weight: `Descriptors.MolWt(mol)` (average molecular weight)
- LogP (lipophilicity): `Descriptors.MolLogP(mol)` (Wildman-Crippen estimate)
- TPSA (topological polar surface area): `Descriptors.TPSA(mol)`
- H-bond donors/acceptors: `Descriptors.NumHDonors(mol)`, `Descriptors.NumHAcceptors(mol)`
- Rotatable bonds: `Descriptors.NumRotatableBonds(mol)`
- Lipinski Rule of 5: MW <= 500, LogP <= 5, HBD <= 5, HBA <= 10; a compound is usually flagged only when it breaks more than one of these limits
Fingerprints and Similarity
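- Morgan (circular) fingerprints: `AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)`; recent RDKit releases favor `rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)` and may emit a deprecation warning for the older call
- RDKit (path-based) fingerprints: `Chem.RDKFingerprint(mol)`
- MACCS keys: `MACCSkeys.GenMACCSKeys(mol)` (166 predefined structural keys)
- Tanimoto similarity: `DataStructs.TanimotoSimilarity(fp1, fp2)`
- Use radius=2 (roughly equivalent to ECFP4) as the default for most applications
- For virtual screening, a Tanimoto similarity above about 0.7 on Morgan/ECFP4 fingerprints is a common heuristic for structural similarity; suitable cutoffs depend on the fingerprint type, so do not reuse one threshold across fingerprints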
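A small similarity search over a list of SMILES could be sketched as below, using the generator-based Morgan API and the 0.7 heuristic from the list as an assumed default. `morgan_fp` and `similar_hits` are hypothetical helpers; in practice the library fingerprints would be computed once and cached.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Morgan radius 2 (ECFP4-like) folded into 2048 bits
MORGAN_GEN = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def morgan_fp(smiles):
    """Return a Morgan bit-vector fingerprint, raising on unparseable SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    return MORGAN_GEN.GetFingerprint(mol)


def similar_hits(query_smiles, library_smiles, threshold=0.7):
    """Return (smiles, Tanimoto) pairs at or above threshold, best match first."""
    query_fp = morgan_fp(query_smiles)
    library_fps = [morgan_fp(s) for s in library_smiles]
    # BulkTanimotoSimilarity compares one fingerprint against a list in a single call
    scores = DataStructs.BulkTanimotoSimilarity(query_fp, library_fps)
    hits = [(s, score) for s, score in zip(library_smiles, scores) if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


print(similar_hits("CCO", ["CCO", "c1ccccc1"]))
```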
Substructure Search
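- SMARTS patterns: `pattern = Chem.MolFromSmarts('[OH]')` (also returns `None` on invalid input)
- Check for a match: `mol.HasSubstructMatch(pattern)`
- Get all matches: `mol.GetSubstructMatches(pattern)` (a tuple of atom-index tuples)
- Common SMARTS: `[#6](=O)[OH]` (carboxylic acid), `[NH2]` (primary amine; this also matches amide NH2 groups, so use `[NX3;H2;!$(NC=O)]` to exclude amides)
- Filter compound libraries by the presence or absence of functional groups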
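Filtering a library by functional group, using the carboxylic-acid SMARTS from the list, can be sketched like this. The helper names are hypothetical; invalid library entries are silently skipped here, which you may prefer to log instead.

```python
from rdkit import Chem

CARBOXYLIC_ACID = "[#6](=O)[OH]"


def filter_by_group(smiles_list, smarts):
    """Keep SMILES whose molecule contains the SMARTS pattern; skip invalid SMILES."""
    pattern = Chem.MolFromSmarts(smarts)
    if pattern is None:
        raise ValueError(f"Invalid SMARTS: {smarts}")
    kept = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is not None and mol.HasSubstructMatch(pattern):
            kept.append(smi)
    return kept


def match_atoms(smiles, smarts):
    """Return every match as a tuple of atom indices (one tuple per hit)."""
    mol = Chem.MolFromSmiles(smiles)
    pattern = Chem.MolFromSmarts(smarts)
    return mol.GetSubstructMatches(pattern)


print(filter_by_group(["CC(=O)O", "CCO", "C1CC"], CARBOXYLIC_ACID))
print(match_atoms("OCCO", "[OH]"))  # both hydroxyl oxygens of ethylene glycol
```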
Property Calculation Patterns
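- Batch processing: iterate over an `SDMolSupplier` and skip `None` entries (records that failed to parse)
- Use `Descriptors.descList` (a list of `(name, function)` pairs) to compute all available descriptors
- For drug-likeness and ADMET-style filtering, apply the Lipinski and Veber rules (Veber: at most 10 rotatable bonds and TPSA <= 140 Å²) and screen for PAINS substructures with `rdkit.Chem.FilterCatalog`
- Generate 3D coordinates: `AllChem.EmbedMolecule(mol, AllChem.ETKDG())` (add hydrogens first; the call returns -1 if embedding fails)
- Minimize energy: `AllChem.MMFFOptimizeMolecule(mol)`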
Common Pitfalls
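- Always sanitize molecules (the default behavior); disable sanitization only when needed
- Add hydrogens explicitly before 3D work: `Chem.AddHs(mol)`
- Handle stereochemistry: `Chem.AssignStereochemistry(mol, cleanIt=True, force=True)` re-perceives stereocenters after a molecule is modified, and `Chem.FindMolChiralCenters(mol, includeUnassigned=True)` lists them
- Large SDF files: use `ForwardSDMolSupplier` for memory efficiency; it reads records sequentially from a file object (including compressed streams) without random access
- Kekulization errors usually indicate an aromatic SMILES that has no valid Kekulé structure, often because of a missing `[nH]` (for example, `c1ccnc1` fails while `c1cc[nH]c1` parses)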
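Several of these pitfalls meet in a typical 3D and streaming workflow, sketched below. The helper names are hypothetical, the ETKDGv3 parameter set and the fixed random seed are assumed choices for reproducibility, and the return-code meanings in the comments are those documented for RDKit's embedding and MMFF routines.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def embed_3d(smiles, seed=42):
    """Add hydrogens, embed one conformer with ETKDG and MMFF-minimize it."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    mol = Chem.AddHs(mol)  # explicit hydrogens are needed for realistic geometry
    params = AllChem.ETKDGv3()
    params.randomSeed = seed  # assumed: fixed seed so repeated runs give the same conformer
    if AllChem.EmbedMolecule(mol, params) == -1:  # -1 signals embedding failure
        raise RuntimeError(f"Embedding failed for {smiles}")
    # 0 = converged, 1 = more iterations needed, -1 = force field setup failed
    status = AllChem.MMFFOptimizeMolecule(mol)
    return mol, status


def iter_sdf(path):
    """Stream valid molecules from a (possibly large) SD file one at a time."""
    with open(path, "rb") as fh:
        for mol in Chem.ForwardSDMolSupplier(fh):
            if mol is not None:  # skip records that fail to parse or sanitize
                yield mol


def parses_ok(smiles):
    """True if RDKit can parse and sanitize (including kekulize) the SMILES."""
    return Chem.MolFromSmiles(smiles) is not None


print(parses_ok("c1ccnc1"), parses_ok("c1cc[nH]c1"))
```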