ChEMBL Database

ChEMBL is the European Bioinformatics Institute's repository of bioactive compound data, containing over 2 million compounds, 19 million bioactivity measurements, and 13,000+ drug targets.

Use Cases

  • Find potent inhibitors for a protein target
  • Search for compounds similar to a known drug
  • Retrieve drug mechanism of action data
  • Filter compounds by molecular properties (Lipinski, etc.)
  • Export bioactivity data for ML or analysis

Installation

uv pip install chembl_webresource_client

Basic Usage

from chembl_webresource_client.new_client import new_client

# Fetch compound by identifier
mol = new_client.molecule.get('CHEMBL192')

# Retrieve target data
tgt = new_client.target.get('CHEMBL203')

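# Query activity measurements (IC50 <= 50 nM against the target)
acts = new_client.activity.filter(
    target_chembl_id='CHEMBL203',
    standard_type='IC50',
    standard_value__lte=50,
    standard_units='nM'  # without units the numeric threshold is ambiguous
)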

Available Endpoints

Resource         Description
molecule         Compound structures and properties
target           Biological targets
activity         Bioassay measurements
assay            Experimental protocols
drug             Approved drug data
mechanism        Drug mechanisms of action
drug_indication  Therapeutic indications
similarity       Structure similarity search
substructure     Substructure search
document         Literature references
cell_line        Cell line data
protein_class    Protein classifications
image            SVG molecular images

Query Operators

The client uses Django-style filtering:

Operator      Function                    Example
__exact       Exact match                 pref_name__exact='Aspirin'
__icontains   Case-insensitive substring  pref_name__icontains='kinase'
__lte, __gte  Less/greater than or equal  standard_value__lte=10
__lt, __gt    Less/greater than           pchembl_value__gt=7
__range       Value within range          alogp__range=[-1, 5]
__in          Value in list               target_chembl_id__in=['CHEMBL203']
__isnull      Null check                  pchembl_value__isnull=False
__startswith  Prefix match                pref_name__startswith='Proto'
__regex       Regular expression          pref_name__regex='^[A-Z]{3}'
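
Because each lookup is an ordinary keyword argument, a filter can be assembled as a dictionary and unpacked into filter(). The sketch below illustrates this; build_activity_filters is a hypothetical helper, not part of the client, and it only builds a dict without contacting the API.

```python
def build_activity_filters(target_id, standard_type="IC50",
                           max_nm=None, require_pchembl=False):
    """Assemble Django-style lookup kwargs for new_client.activity.filter().

    Hypothetical helper for illustration; it makes no network request.
    """
    filters = {
        "target_chembl_id": target_id,
        "standard_type": standard_type,
    }
    if max_nm is not None:
        # Pair the numeric bound with explicit units so the threshold is meaningful
        filters["standard_value__lte"] = max_nm
        filters["standard_units"] = "nM"
    if require_pchembl:
        # Keep only records that carry a pchembl_value
        filters["pchembl_value__isnull"] = False
    return filters


filters = build_activity_filters("CHEMBL203", max_nm=50, require_pchembl=True)
print(filters)

# Usage (requires network access):
#   from chembl_webresource_client.new_client import new_client
#   acts = new_client.activity.filter(**filters)
```

Building the dictionary separately makes it easy to log or reuse the same criteria across several queries.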

Common Workflows

Find Target Inhibitors

from chembl_webresource_client.new_client import new_client

activity = new_client.activity

# Get potent BRAF inhibitors (IC50 <= 100 nM)
braf_hits = activity.filter(
    target_chembl_id='CHEMBL5145',
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)

for hit in braf_hits:
    print(f"{hit['molecule_chembl_id']}: {hit['standard_value']} nM")

Search by Target Name

from chembl_webresource_client.new_client import new_client

target = new_client.target
activity = new_client.activity

# Find CDK targets
cdk_targets = target.filter(
    pref_name__icontains='cyclin-dependent kinase',
    target_type='SINGLE PROTEIN'
)

target_ids = [t['target_chembl_id'] for t in cdk_targets]

# Get activities for the first five targets (keeps the example query small)
cdk_activities = activity.filter(
    target_chembl_id__in=target_ids[:5],
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)

Structure Similarity Search

from chembl_webresource_client.new_client import new_client

sim = new_client.similarity

# Find molecules at least 80% similar to ibuprofen
ibuprofen_smiles = 'CC(C)Cc1ccc(cc1)C(C)C(=O)O'
matches = sim.filter(smiles=ibuprofen_smiles, similarity=80)

for m in matches:
    print(f"{m['molecule_chembl_id']}: {m['similarity']}%")

Substructure Search

from chembl_webresource_client.new_client import new_client

sub = new_client.substructure

# Find compounds with benzimidazole core
benzimidazole = 'c1ccc2[nH]cnc2c1'
compounds = sub.filter(smiles=benzimidazole)

Filter by Molecular Properties

from chembl_webresource_client.new_client import new_client

mol = new_client.molecule

# Fragment-like compounds (these are "rule of three" thresholds, not Lipinski's rule of five)
fragments = mol.filter(
    molecule_properties__mw_freebase__lte=300,
    molecule_properties__alogp__lte=3,
    molecule_properties__hbd__lte=3,
    molecule_properties__hba__lte=3
)
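
The same thresholds can also be re-checked client-side on records that have already been downloaded. This is a minimal sketch, assuming molecule_properties is a dict whose numeric values may arrive as strings or be missing; passes_thresholds and FRAGMENT_LIMITS are hypothetical names, and the example record is illustrative rather than real ChEMBL output.

```python
def passes_thresholds(props, limits):
    """Return True if every property in `limits` is present and <= its limit.

    `props` is assumed to be a record's molecule_properties dict; values
    may be strings (e.g. "180.16"), numbers, or None.
    """
    if not props:
        return False
    for name, limit in limits.items():
        raw = props.get(name)
        if raw is None:
            return False  # treat missing data as a failure rather than a pass
        if float(raw) > limit:
            return False
    return True


# Same thresholds as the server-side filter above
FRAGMENT_LIMITS = {"mw_freebase": 300, "alogp": 3, "hbd": 3, "hba": 3}

# Illustrative record, not real ChEMBL output
example = {"mw_freebase": "180.16", "alogp": "1.31", "hbd": 1, "hba": 3}
print(passes_thresholds(example, FRAGMENT_LIMITS))
```

Treating a missing property as a failure is a design choice; a more permissive screen could skip missing values instead.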

Drug Mechanisms of Action

from chembl_webresource_client.new_client import new_client

mech = new_client.mechanism
drug_ind = new_client.drug_indication

# Get mechanism of metformin
metformin_id = 'CHEMBL1431'
mechanisms = mech.filter(molecule_chembl_id=metformin_id)

for m in mechanisms:
    print(f"Target: {m['target_chembl_id']}")
    print(f"Action: {m['action_type']}")

# Get approved indications
indications = drug_ind.filter(molecule_chembl_id=metformin_id)

Generate Molecule Images

from chembl_webresource_client.new_client import new_client

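img = new_client.image
img.set_format('svg')  # request SVG explicitly rather than relying on the default format

# Get SVG of caffeine
caffeine_svg = img.get('CHEMBL113')

# The payload may be str or bytes depending on the client version; handle both
mode = 'wb' if isinstance(caffeine_svg, bytes) else 'w'
with open('caffeine.svg', mode) as f:
    f.write(caffeine_svg)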

Key Response Fields

Molecule Properties

Field                                    Description
molecule_chembl_id                       ChEMBL identifier
pref_name                                Preferred name
molecule_structures.canonical_smiles     SMILES string
molecule_structures.standard_inchi_key   InChI key
molecule_properties.mw_freebase          Molecular weight
molecule_properties.alogp                Calculated LogP
molecule_properties.hba / hbd            H-bond acceptors/donors
molecule_properties.psa                  Polar surface area
molecule_properties.rtb                  Rotatable bonds
molecule_properties.num_ro5_violations   Lipinski violations
molecule_properties.qed_weighted         QED drug-likeness
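
The dotted names above denote nested dictionaries inside each molecule record. The sketch below flattens a record into a single-level dict; it assumes that molecule_structures or molecule_properties can be None for some entries (for example, records without a defined structure), and flatten_molecule is a hypothetical helper.

```python
def flatten_molecule(record):
    """Pull commonly used nested fields out of a molecule record into a flat dict."""
    # Either sub-dict may be absent or None, so fall back to an empty dict
    structures = record.get("molecule_structures") or {}
    props = record.get("molecule_properties") or {}
    return {
        "molecule_chembl_id": record.get("molecule_chembl_id"),
        "pref_name": record.get("pref_name"),
        "canonical_smiles": structures.get("canonical_smiles"),
        "standard_inchi_key": structures.get("standard_inchi_key"),
        "mw_freebase": props.get("mw_freebase"),
        "alogp": props.get("alogp"),
    }


# Illustrative record with a missing structure block
sample = {
    "molecule_chembl_id": "CHEMBL0000",  # placeholder ID, not a real lookup
    "pref_name": "EXAMPLE",
    "molecule_structures": None,
    "molecule_properties": {"mw_freebase": "180.16", "alogp": "1.31"},
}
print(flatten_molecule(sample))
```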

Activity Fields

Field                  Description
molecule_chembl_id     Compound ID
target_chembl_id       Target ID
standard_type          Measurement type (IC50, Ki, EC50)
standard_value         Numeric value
standard_units         Units (nM, uM)
pchembl_value          -log10 of the molar activity value
data_validity_comment  Quality flag
potential_duplicate    Duplicate indicator
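
To make the relationship between standard_value and pchembl_value concrete, the arithmetic can be sketched as follows. This assumes the input is a concentration in nM; pchembl_from_nm is a hypothetical helper that only illustrates the conversion, and the stored pchembl_value remains the value to rely on.

```python
import math


def pchembl_from_nm(value_nm):
    """Convert a concentration in nM to -log10 of the molar concentration."""
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    # 1 nM = 1e-9 M, so -log10(value_nm * 1e-9) = 9 - log10(value_nm)
    return 9.0 - math.log10(value_nm)


print(pchembl_from_nm(100))  # 100 nM corresponds to 1e-7 M
print(pchembl_from_nm(1))    # 1 nM corresponds to 1e-9 M
```

On this scale, larger numbers mean more potent compounds, which is why a filter such as pchembl_value__gt=7 selects values below 100 nM.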

Target Fields

Field             Description
target_chembl_id  ChEMBL target ID
pref_name         Preferred name
target_type       SINGLE PROTEIN, PROTEIN COMPLEX, etc.
organism          Species

Mechanism Fields

Field                Description
molecule_chembl_id   Drug ID
target_chembl_id     Target ID
mechanism_of_action  Description
action_type          INHIBITOR, AGONIST, ANTAGONIST, etc.

Export to DataFrame

import pandas as pd
from chembl_webresource_client.new_client import new_client

activity = new_client.activity

# Ki measurements for the dopamine D2 receptor (CHEMBL217) that have a pchembl_value
results = activity.filter(
    target_chembl_id='CHEMBL217',
    standard_type='Ki',
    pchembl_value__isnull=False
)

df = pd.DataFrame(list(results))
df.to_csv('dopamine_d2_ligands.csv', index=False)

Configuration

from chembl_webresource_client.settings import Settings

cfg = Settings.Instance()

cfg.CACHING = True           # Enable response caching
cfg.CACHE_EXPIRE = 43200     # Cache TTL in seconds (12 hours)
cfg.TIMEOUT = 60             # Request timeout in seconds
cfg.TOTAL_RETRIES = 5        # Retry attempts per request

Data Quality Notes

  • ChEMBL data is manually curated, but still check data_validity_comment for flagged records
  • Check potential_duplicate flags when aggregating results
  • Use pchembl_value for normalized comparisons across assay types
  • Treat activity values that lack standard_units with caution, since their scale is ambiguous
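
The checks above can be applied in one pass over downloaded activity records. This is a minimal sketch, assuming each record is a dict, that standard_value may arrive as a string, and that potential_duplicate may be encoded as 0/1, a boolean, or a string; clean_activities is a hypothetical helper and the sample rows are illustrative.

```python
def clean_activities(records):
    """Keep records with no validity comment, no duplicate flag,
    explicit units, and a numeric standard_value."""
    kept = []
    for rec in records:
        if rec.get("data_validity_comment"):
            continue  # flagged by curators
        flag = str(rec.get("potential_duplicate")).strip().lower()
        if flag in ("1", "true"):
            continue  # likely duplicate measurement
        if not rec.get("standard_units"):
            continue  # value has no stated scale
        try:
            value = float(rec.get("standard_value"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric value
        kept.append({**rec, "standard_value": value})
    return kept


# Illustrative rows, not real ChEMBL output
rows = [
    {"standard_value": "12.5", "standard_units": "nM",
     "data_validity_comment": None, "potential_duplicate": 0},
    {"standard_value": "3.0", "standard_units": "nM",
     "data_validity_comment": "Outside typical range", "potential_duplicate": 0},
    {"standard_value": "8.0", "standard_units": None,
     "data_validity_comment": None, "potential_duplicate": 0},
]
print(len(clean_activities(rows)))
```

Dropping flagged rows outright is the strictest option; for exploratory work it may be preferable to keep them and add a column recording why each was flagged.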

Best Practices

  1. Use caching - Reduces API load and improves performance
  2. Filter early - Apply filters to reduce data transfer
  3. Limit results - Use [:n] slicing for testing
  4. Check validity - Inspect data_validity_comment fields
  5. Use pchembl_value - Normalized values enable cross-assay comparison
  6. Batch queries - Use __in operator for multiple IDs
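
For item 6, a long ID list can be split into batches so that each __in query stays small. The sketch below shows one way to do this; chunked is a hypothetical helper, and the batch size of 20 in the usage comment is an arbitrary assumption rather than a documented limit.

```python
def chunked(ids, size):
    """Yield successive lists of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


target_ids = ["CHEMBL203", "CHEMBL5145", "CHEMBL217"]
for batch in chunked(target_ids, 2):
    print(batch)

# Usage (requires network access):
#   from chembl_webresource_client.new_client import new_client
#   for batch in chunked(target_ids, 20):
#       hits = new_client.activity.filter(target_chembl_id__in=batch,
#                                         standard_type='IC50')
```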

Error Handling

from chembl_webresource_client.new_client import new_client

mol = new_client.molecule

try:
    result = mol.get('INVALID_ID')
except Exception as e:
    if '404' in str(e):
        print("Compound not found")
    elif '503' in str(e):
        print("Service unavailable - retry later")
    else:
        raise
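
For transient failures such as the 503 case above, a small retry wrapper with exponential backoff is one option. This is a sketch under the same assumption as the example above, namely that the status code appears in the exception message; call_with_retries is a hypothetical helper, and the client's own TOTAL_RETRIES setting may already cover many cases.

```python
import time


def call_with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); if it raises an exception mentioning 503, wait and retry.

    Delays grow as base_delay * 2**attempt. Any other exception, or a 503
    on the final attempt, is re-raised unchanged.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            is_last = attempt == attempts - 1
            if "503" not in str(exc) or is_last:
                raise
            sleep(base_delay * 2 ** attempt)


# Usage (requires network access):
#   from chembl_webresource_client.new_client import new_client
#   result = call_with_retries(lambda: new_client.molecule.get('CHEMBL192'))
```

The sleep parameter is injectable so the wrapper can be exercised in tests without real delays.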
