chembl-database
ChEMBL Database
ChEMBL is the European Bioinformatics Institute's repository of bioactive compound data, containing over 2 million compounds, 19 million bioactivity measurements, and 13,000+ drug targets.
Use Cases
- Find potent inhibitors for a protein target
- Search for compounds similar to a known drug
- Retrieve drug mechanism of action data
- Filter compounds by molecular properties (Lipinski, etc.)
- Export bioactivity data for ML or analysis
Installation
uv pip install chembl_webresource_client
Basic Usage
from chembl_webresource_client.new_client import new_client
# Fetch compound by identifier
mol = new_client.molecule.get('CHEMBL192')
# Retrieve target data
tgt = new_client.target.get('CHEMBL203')
# Query activity measurements
acts = new_client.activity.filter(
    target_chembl_id='CHEMBL203',
    standard_type='IC50',
    standard_value__lte=50
)
Available Endpoints
| Resource | Description |
|---|---|
| `molecule` | Compound structures and properties |
| `target` | Biological targets |
| `activity` | Bioassay measurements |
| `assay` | Experimental protocols |
| `drug` | Approved drug data |
| `mechanism` | Drug mechanisms of action |
| `drug_indication` | Therapeutic indications |
| `similarity` | Structure similarity search |
| `substructure` | Substructure search |
| `document` | Literature references |
| `cell_line` | Cell line data |
| `protein_class` | Protein classifications |
| `image` | SVG molecular images |
Query Operators
The client uses Django-style filtering:
| Operator | Function | Example |
|---|---|---|
| `__exact` | Exact match | `pref_name__exact='Aspirin'` |
| `__icontains` | Case-insensitive substring | `pref_name__icontains='kinase'` |
| `__lte`, `__gte` | Less/greater than or equal | `standard_value__lte=10` |
| `__lt`, `__gt` | Less/greater than | `pchembl_value__gt=7` |
| `__range` | Value within range | `molecule_properties__alogp__range=[-1, 5]` |
| `__in` | Value in list | `target_chembl_id__in=['CHEMBL203']` |
| `__isnull` | Null check | `pchembl_value__isnull=False` |
| `__startswith` | Prefix match | `pref_name__startswith='Proto'` |
| `__regex` | Regular expression | `pref_name__regex='^[A-Z]{3}'` |
Common Workflows
Find Target Inhibitors
from chembl_webresource_client.new_client import new_client
activity = new_client.activity
# Get potent BRAF inhibitors (IC50 <= 100 nM)
braf_hits = activity.filter(
    target_chembl_id='CHEMBL5145',
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)
for hit in braf_hits:
    print(f"{hit['molecule_chembl_id']}: {hit['standard_value']} nM")
Search by Target Name
from chembl_webresource_client.new_client import new_client
target = new_client.target
activity = new_client.activity
# Find CDK targets
cdk_targets = target.filter(
    pref_name__icontains='cyclin-dependent kinase',
    target_type='SINGLE PROTEIN'
)
target_ids = [t['target_chembl_id'] for t in cdk_targets]
# Get activities for the first five of these targets
cdk_activities = activity.filter(
    target_chembl_id__in=target_ids[:5],
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)
Structure Similarity Search
from chembl_webresource_client.new_client import new_client
sim = new_client.similarity
# Find molecules 80% similar to ibuprofen
ibuprofen_smiles = 'CC(C)Cc1ccc(cc1)C(C)C(=O)O'
matches = sim.filter(smiles=ibuprofen_smiles, similarity=80)
for m in matches:
    print(f"{m['molecule_chembl_id']}: {m['similarity']}%")
Substructure Search
from chembl_webresource_client.new_client import new_client
sub = new_client.substructure
# Find compounds with benzimidazole core
benzimidazole = 'c1ccc2[nH]cnc2c1'
compounds = sub.filter(smiles=benzimidazole)
Filter by Molecular Properties
from chembl_webresource_client.new_client import new_client
mol = new_client.molecule
# Fragment-like compounds (rule-of-three style cutoffs, stricter than Lipinski)
fragments = mol.filter(
    molecule_properties__mw_freebase__lte=300,
    molecule_properties__alogp__lte=3,
    molecule_properties__hbd__lte=3,
    molecule_properties__hba__lte=3
)
Drug Mechanisms of Action
from chembl_webresource_client.new_client import new_client
mech = new_client.mechanism
drug_ind = new_client.drug_indication
# Get mechanism of metformin
metformin_id = 'CHEMBL1431'
mechanisms = mech.filter(molecule_chembl_id=metformin_id)
for m in mechanisms:
    print(f"Target: {m['target_chembl_id']}")
    print(f"Action: {m['action_type']}")
# Get approved indications
indications = drug_ind.filter(molecule_chembl_id=metformin_id)
Generate Molecule Images
from chembl_webresource_client.new_client import new_client
img = new_client.image
img.set_format('svg')  # Request SVG output explicitly
# Get SVG of caffeine
caffeine_svg = img.get('CHEMBL113')
with open('caffeine.svg', 'w') as f:
    f.write(caffeine_svg)
Key Response Fields
Molecule Properties
| Field | Description |
|---|---|
| `molecule_chembl_id` | ChEMBL identifier |
| `pref_name` | Preferred name |
| `molecule_structures.canonical_smiles` | SMILES string |
| `molecule_structures.standard_inchi_key` | InChI key |
| `molecule_properties.mw_freebase` | Molecular weight |
| `molecule_properties.alogp` | Calculated LogP |
| `molecule_properties.hba` / `hbd` | H-bond acceptors/donors |
| `molecule_properties.psa` | Polar surface area |
| `molecule_properties.rtb` | Rotatable bonds |
| `molecule_properties.num_ro5_violations` | Lipinski violations |
| `molecule_properties.qed_weighted` | QED drug-likeness |
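The dotted names above refer to nested dictionaries inside each molecule record. A minimal sketch of pulling a flat summary out of one record follows. It assumes that `molecule_structures` or `molecule_properties` can be missing or null for some entries and that numeric properties may arrive as strings; `summarize_molecule` is a hypothetical helper, and the sample record is hand-written for illustration rather than fetched from the API.

```python
def summarize_molecule(record):
    """Flatten the nested fields of one molecule record into a small dict."""
    # Assumption: either nested block may be absent or None for some entries.
    structures = record.get("molecule_structures") or {}
    props = record.get("molecule_properties") or {}

    def as_float(value):
        # Assumption: numeric properties may be serialized as strings.
        return None if value is None else float(value)

    return {
        "chembl_id": record.get("molecule_chembl_id"),
        "name": record.get("pref_name"),
        "smiles": structures.get("canonical_smiles"),
        "mw": as_float(props.get("mw_freebase")),
        "alogp": as_float(props.get("alogp")),
        "ro5_violations": props.get("num_ro5_violations"),
    }


# Illustrative record shaped like an API response (values typed in by hand).
sample = {
    "molecule_chembl_id": "CHEMBL25",
    "pref_name": "ASPIRIN",
    "molecule_structures": {"canonical_smiles": "CC(=O)Oc1ccccc1C(=O)O"},
    "molecule_properties": {"mw_freebase": "180.16", "alogp": "1.31",
                            "num_ro5_violations": 0},
}
summary = summarize_molecule(sample)
print(summary)
```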
Activity Fields
| Field | Description |
|---|---|
| `molecule_chembl_id` | Compound ID |
| `target_chembl_id` | Target ID |
| `standard_type` | Measurement type (IC50, Ki, EC50) |
| `standard_value` | Numeric value |
| `standard_units` | Units (nM, uM) |
| `pchembl_value` | Normalized -log10 value |
| `data_validity_comment` | Quality flag |
| `potential_duplicate` | Duplicate indicator |
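The arithmetic behind a normalized -log10 value can be sketched as follows: convert the measurement to molar units and take the negative base-10 logarithm, so 100 nM corresponds to 7 and 1 uM to 6. This only illustrates the conversion; ChEMBL assigns `pchembl_value` itself and only to selected records, so the function below (a hypothetical helper) is not a replacement for the stored field.

```python
import math

# Conversion factors to nanomolar for the units named in the table above.
TO_NANOMOLAR = {"nM": 1.0, "uM": 1000.0}


def negative_log_molar(value, units="nM"):
    """Return -log10 of a concentration expressed in molar units."""
    if value <= 0:
        raise ValueError("concentration must be positive")
    value_nm = value * TO_NANOMOLAR[units]
    # 1 nM = 1e-9 M, so -log10(value in M) = 9 - log10(value in nM).
    return 9.0 - math.log10(value_nm)


print(negative_log_molar(100, "nM"))
print(negative_log_molar(1, "uM"))
```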
Target Fields
| Field | Description |
|---|---|
| `target_chembl_id` | ChEMBL target ID |
| `pref_name` | Preferred name |
| `target_type` | SINGLE PROTEIN, PROTEIN COMPLEX, etc. |
| `organism` | Species |
Mechanism Fields
| Field | Description |
|---|---|
| `molecule_chembl_id` | Drug ID |
| `target_chembl_id` | Target ID |
| `mechanism_of_action` | Description |
| `action_type` | INHIBITOR, AGONIST, ANTAGONIST, etc. |
Export to DataFrame
import pandas as pd
from chembl_webresource_client.new_client import new_client
activity = new_client.activity
# Ki measurements for the human dopamine D2 receptor (CHEMBL217)
results = activity.filter(
    target_chembl_id='CHEMBL217',
    standard_type='Ki',
    pchembl_value__isnull=False
)
df = pd.DataFrame(list(results))
df.to_csv('dopamine_d2_ligands.csv', index=False)
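An export like this usually holds several rows per compound, one per measurement. A minimal sketch for collapsing them to one median `pchembl_value` per molecule is shown below in plain Python so it runs without pandas. It assumes the values may arrive as strings and that records without a value should be skipped; `median_pchembl_by_molecule` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import median


def median_pchembl_by_molecule(records):
    """Group activity records by molecule and return the median pChEMBL of each."""
    grouped = defaultdict(list)
    for rec in records:
        value = rec.get("pchembl_value")
        if value is None:
            continue  # no normalized value to aggregate
        # Assumption: numeric fields may be serialized as strings.
        grouped[rec["molecule_chembl_id"]].append(float(value))
    return {mol_id: median(values) for mol_id, values in grouped.items()}
```

With the query above, this could be called as `median_pchembl_by_molecule(results)` before or instead of building the DataFrame.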
Configuration
from chembl_webresource_client.settings import Settings
cfg = Settings.Instance()
cfg.CACHING = True          # Enable response caching
cfg.CACHE_EXPIRE = 43200    # Cache TTL in seconds (12 hours)
cfg.TIMEOUT = 60            # Request timeout in seconds
cfg.TOTAL_RETRIES = 5       # Retry attempts
Data Quality Notes
- ChEMBL data is manually curated, but verify `data_validity_comment` fields
- Check `potential_duplicate` flags when aggregating results
- Use `pchembl_value` for normalized comparisons across assay types
- Activity values without `standard_units` should be used cautiously
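These checks can be applied to the records after retrieval. The sketch below is one possible reading of the notes above, not a ChEMBL recommendation: it keeps a record only if its validity comment is empty or "Manually validated", it is not flagged as a potential duplicate, and it has both units and a `pchembl_value`. The accepted comment values and the encoding of the duplicate flag are assumptions, and `keep_reliable` is a hypothetical helper.

```python
# Assumption: an empty comment or "Manually validated" means no known problem.
ACCEPTED_VALIDITY = {None, "Manually validated"}


def is_flagged_duplicate(record):
    # Assumption: the flag may arrive as 0/1, a bool, or a string.
    return str(record.get("potential_duplicate")).lower() in {"1", "true"}


def keep_reliable(records):
    """Return only the activity records that pass the basic quality checks."""
    kept = []
    for rec in records:
        if rec.get("data_validity_comment") not in ACCEPTED_VALIDITY:
            continue  # curators raised a concern about this value
        if is_flagged_duplicate(rec):
            continue  # likely repeats a measurement reported elsewhere
        if not rec.get("standard_units"):
            continue  # value cannot be interpreted without units
        if rec.get("pchembl_value") is None:
            continue  # no normalized value for cross-assay comparison
        kept.append(rec)
    return kept
```

Stricter or looser rules may suit a given project, for example keeping duplicates when only counting distinct compounds.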
Best Practices
- Use caching - Reduces API load and improves performance
- Filter early - Apply filters to reduce data transfer
- Limit results - Use `[:n]` slicing for testing
- Check validity - Inspect `data_validity_comment` fields
- Use pchembl_value - Normalized values enable cross-assay comparison
- Batch queries - Use `__in` operator for multiple IDs
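For long ID lists, the batching advice can be sketched as a helper that splits the list into chunks and issues one `__in` query per chunk. The helper names and the batch size of 50 are arbitrary assumptions; `resource` stands for any client endpoint with a `filter()` method, such as `new_client.activity`.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_batches(resource, ids, field="molecule_chembl_id", size=50):
    """Query `resource` once per chunk of IDs using the __in operator."""
    results = []
    for batch in chunked(list(ids), size):
        # Builds e.g. molecule_chembl_id__in=[...] for each chunk.
        results.extend(resource.filter(**{f"{field}__in": batch}))
    return results
```

A call such as `fetch_in_batches(new_client.activity, molecule_ids)` would then return the combined records; further filters can be added inside the helper as needed.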
Error Handling
from chembl_webresource_client.new_client import new_client
mol = new_client.molecule
try:
    result = mol.get('INVALID_ID')
except Exception as e:
    if '404' in str(e):
        print("Compound not found")
    elif '503' in str(e):
        print("Service unavailable - retry later")
    else:
        raise
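The "retry later" case can be automated with a small wrapper. The sketch below retries a callable with exponential backoff when the error message contains "503", reusing the string check from the snippet above, and re-raises anything else immediately. The function names, attempt count, and delays are assumptions, and this sits on top of the client's own `TOTAL_RETRIES` setting rather than replacing it.

```python
import time


def is_transient(error):
    # Same string-based check as above; a status code would be more robust
    # if the raised exception exposes one.
    return "503" in str(error)


def call_with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as error:
            last_attempt = attempt == attempts - 1
            if last_attempt or not is_transient(error):
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Usage would look like `call_with_retries(lambda: mol.get('CHEMBL192'))`; the `sleep` parameter exists so the delay can be replaced in tests.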
External Links
- ChEMBL: https://www.ebi.ac.uk/chembl/
- API Documentation: https://chembl.gitbook.io/chembl-interface-documentation
- Python Client: https://github.com/chembl/chembl_webresource_client