chembl-database

ChEMBL Database

ChEMBL is the European Bioinformatics Institute's repository of bioactive compound data, containing over 2 million compounds, 19 million bioactivity measurements, and 13,000+ drug targets.

Use Cases

Find potent inhibitors for a protein target
Search for compounds similar to a known drug
Retrieve drug mechanism of action data
Filter compounds by molecular properties (Lipinski, etc.)
Export bioactivity data for ML or analysis

Installation

uv pip install chembl_webresource_client

Basic Usage

from chembl_webresource_client.new_client import new_client

# Fetch compound by identifier
mol = new_client.molecule.get('CHEMBL192')

# Retrieve target data
tgt = new_client.target.get('CHEMBL203')

# Query activity measurements
acts = new_client.activity.filter(
    target_chembl_id='CHEMBL203',
    standard_type='IC50',
    standard_value__lte=50
)

Available Endpoints

Resource	Description
`molecule`	Compound structures and properties
`target`	Biological targets
`activity`	Bioassay measurements
`assay`	Experimental protocols
`drug`	Approved drug data
`mechanism`	Drug mechanisms of action
`drug_indication`	Therapeutic indications
`similarity`	Structure similarity search
`substructure`	Substructure search
`document`	Literature references
`cell_line`	Cell line data
`protein_class`	Protein classifications
`image`	SVG molecular images

Query Operators

The client uses Django-style filtering:

Operator	Function	Example
`__exact`	Exact match (case-sensitive)	`pref_name__exact='ASPIRIN'`
`__icontains`	Case-insensitive substring	`pref_name__icontains='kinase'`
`__lte`, `__gte`	Less/greater than or equal	`standard_value__lte=10`
`__lt`, `__gt`	Less/greater than	`pchembl_value__gt=7`
`__range`	Value within range	`alogp__range=[-1, 5]`
`__in`	Value in list	`target_chembl_id__in=['CHEMBL203']`
`__isnull`	Null check	`pchembl_value__isnull=False`
`__startswith`	Prefix match	`pref_name__startswith='Proto'`
`__regex`	Regular expression	`pref_name__regex='^[A-Z]{3}'`
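
Because each lookup in the table is an ordinary keyword argument, filters can also be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named `build_filters` (it is not part of the client) that joins a field name and an operator with a double underscore:

```python
def build_filters(conditions):
    """Turn (field, operator, value) triples into Django-style filter kwargs.

    `build_filters` is a hypothetical helper, not part of
    chembl_webresource_client. An operator of None or "exact" yields the
    bare field name.
    """
    kwargs = {}
    for field, operator, value in conditions:
        key = field if operator in (None, "exact") else f"{field}__{operator}"
        kwargs[key] = value
    return kwargs


filters = build_filters([
    ("target_chembl_id", None, "CHEMBL203"),
    ("standard_type", "exact", "IC50"),
    ("pchembl_value", "gte", 7),
])
print(filters)
```

The resulting dictionary can be unpacked into a query, for example `new_client.activity.filter(**filters)`.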

Common Workflows

Find Target Inhibitors

from chembl_webresource_client.new_client import new_client

activity = new_client.activity

# Get potent BRAF inhibitors (IC50 <= 100 nM)
braf_hits = activity.filter(
    target_chembl_id='CHEMBL5145',
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)

for hit in braf_hits:
    print(f"{hit['molecule_chembl_id']}: {hit['standard_value']} nM")
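
The loop above prints hits in whatever order the service returns them. To rank them locally, something like the sketch below may help. It assumes that numeric fields can arrive as strings or as null, so values go through `float()`. The helper name `rank_by_potency` and the sample records are invented for illustration.

```python
def rank_by_potency(records):
    """Return records sorted by ascending standard_value (most potent first).

    Hypothetical helper. Values are passed through float() because numeric
    fields may be serialized as strings; records without a usable value
    are skipped.
    """
    usable = []
    for rec in records:
        try:
            value = float(rec["standard_value"])
        except (KeyError, TypeError, ValueError):
            continue
        usable.append((value, rec))
    usable.sort(key=lambda pair: pair[0])
    return [rec for _, rec in usable]


# Made-up records shaped like activity results, for illustration only.
sample = [
    {"molecule_chembl_id": "CHEMBL_A", "standard_value": "85.0"},
    {"molecule_chembl_id": "CHEMBL_B", "standard_value": "3.2"},
    {"molecule_chembl_id": "CHEMBL_C", "standard_value": None},
]
ranked = rank_by_potency(sample)
print([r["molecule_chembl_id"] for r in ranked])
```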

Search by Target Name

from chembl_webresource_client.new_client import new_client

target = new_client.target
activity = new_client.activity

# Find CDK targets
cdk_targets = target.filter(
    pref_name__icontains='cyclin-dependent kinase',
    target_type='SINGLE PROTEIN'
)

target_ids = [t['target_chembl_id'] for t in cdk_targets]

# Get activities for these targets
cdk_activities = activity.filter(
    target_chembl_id__in=target_ids[:5],
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)
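
The slice `target_ids[:5]` keeps the query small but silently drops any remaining targets. One option is to split the full ID list into batches and run one `__in` query per batch. This is a minimal sketch; `chunked` is a hypothetical helper, the batch size of 5 is arbitrary, and the IDs are placeholders.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder IDs for illustration; real ones come from the target query.
target_ids = [f"TARGET_{i}" for i in range(12)]
batches = list(chunked(target_ids, 5))
print([len(batch) for batch in batches])
```

Each batch can then be passed as `target_chembl_id__in=batch` and the results concatenated.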

Structure Similarity Search

from chembl_webresource_client.new_client import new_client

sim = new_client.similarity

# Find molecules at least 80% similar to ibuprofen
ibuprofen_smiles = 'CC(C)Cc1ccc(cc1)C(C)C(=O)O'
matches = sim.filter(smiles=ibuprofen_smiles, similarity=80)

for m in matches:
    print(f"{m['molecule_chembl_id']}: {m['similarity']}%")
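
The `similarity` argument is a percentage threshold. Fingerprint-based searches of this kind are commonly scored with the Tanimoto coefficient (shared bits divided by total distinct bits). Assuming that metric, which this document does not state explicitly, the score can be illustrated on toy fingerprints:

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


# Toy fingerprints, not derived from real molecules.
query_bits = {1, 4, 7, 9, 12}
candidate_bits = {1, 4, 7, 9, 15}
score = tanimoto(query_bits, candidate_bits)
print(f"{score * 100:.0f}%")
```

With four shared bits out of six distinct bits, this pair would fall below an 80% threshold.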

Substructure Search

from chembl_webresource_client.new_client import new_client

sub = new_client.substructure

# Find compounds with benzimidazole core
benzimidazole = 'c1ccc2[nH]cnc2c1'
compounds = sub.filter(smiles=benzimidazole)

Filter by Molecular Properties

from chembl_webresource_client.new_client import new_client

mol = new_client.molecule

# Fragment-like compounds (rule-of-three style cut-offs, stricter than Lipinski's rule of five)
fragments = mol.filter(
    molecule_properties__mw_freebase__lte=300,
    molecule_properties__alogp__lte=3,
    molecule_properties__hbd__lte=3,
    molecule_properties__hba__lte=3
)
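
The same thresholds can be re-checked locally on records that are already downloaded. The sketch below assumes a `molecule_properties` dictionary whose values may be strings or null. `passes_fragment_filter` is a hypothetical helper and the sample values are illustrative only.

```python
def passes_fragment_filter(props):
    """Check a molecule_properties dict against the cut-offs used above.

    Hypothetical helper. Missing or null properties count as a failure,
    and values are cast with float() in case they arrive as strings.
    """
    limits = {"mw_freebase": 300, "alogp": 3, "hbd": 3, "hba": 3}
    for name, limit in limits.items():
        value = props.get(name)
        if value is None or float(value) > limit:
            return False
    return True


# Illustrative property values, not taken from ChEMBL.
small = {"mw_freebase": "151.16", "alogp": "0.9", "hbd": 2, "hba": 2}
large = {"mw_freebase": "446.5", "alogp": "4.1", "hbd": 1, "hba": 6}
print(passes_fragment_filter(small), passes_fragment_filter(large))
```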

Drug Mechanisms of Action

from chembl_webresource_client.new_client import new_client

mech = new_client.mechanism
drug_ind = new_client.drug_indication

# Get mechanism of metformin
metformin_id = 'CHEMBL1431'
mechanisms = mech.filter(molecule_chembl_id=metformin_id)

for m in mechanisms:
    print(f"Target: {m['target_chembl_id']}")
    print(f"Action: {m['action_type']}")

# Get recorded indications (entries are not necessarily limited to approved uses)
indications = drug_ind.filter(molecule_chembl_id=metformin_id)

Generate Molecule Images

from chembl_webresource_client.new_client import new_client

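img = new_client.image
img.set_format('svg')  # request SVG explicitly

# Get SVG of caffeine
caffeine_svg = img.get('CHEMBL113')

# The payload may be bytes or str depending on the client version
mode = 'wb' if isinstance(caffeine_svg, bytes) else 'w'
with open('caffeine.svg', mode) as f:
    f.write(caffeine_svg)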

Key Response Fields

Molecule Properties

Field	Description
`molecule_chembl_id`	ChEMBL identifier
`pref_name`	Preferred name
`molecule_structures.canonical_smiles`	SMILES string
`molecule_structures.standard_inchi_key`	InChI key
`molecule_properties.mw_freebase`	Molecular weight
`molecule_properties.alogp`	Calculated LogP
`molecule_properties.hba` / `hbd`	H-bond acceptors/donors
`molecule_properties.psa`	Polar surface area
`molecule_properties.rtb`	Rotatable bonds
`molecule_properties.num_ro5_violations`	Lipinski violations
`molecule_properties.qed_weighted`	QED drug-likeness
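
The dotted names in the table refer to nested dictionaries inside a molecule record. As a minimal sketch, a few of them can be flattened as below. It assumes the nested sections may be missing or null for some entries; `summarize_molecule` is a hypothetical helper and the sample record is invented.

```python
def summarize_molecule(record):
    """Flatten a few nested fields of a molecule record into one dict.

    Hypothetical helper. Nested sections are treated as optional because
    some entries may lack structures or computed properties.
    """
    structures = record.get("molecule_structures") or {}
    properties = record.get("molecule_properties") or {}
    return {
        "chembl_id": record.get("molecule_chembl_id"),
        "name": record.get("pref_name"),
        "smiles": structures.get("canonical_smiles"),
        "mw": properties.get("mw_freebase"),
    }


# Invented record shaped like a molecule result, for illustration only.
record = {
    "molecule_chembl_id": "CHEMBL_X",
    "pref_name": "EXAMPLE",
    "molecule_structures": {"canonical_smiles": "CCO"},
    "molecule_properties": None,
}
summary = summarize_molecule(record)
print(summary)
```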

Activity Fields

Field	Description
`molecule_chembl_id`	Compound ID
`target_chembl_id`	Target ID
`standard_type`	Measurement type (IC50, Ki, EC50)
`standard_value`	Numeric value
`standard_units`	Units (nM, uM)
`pchembl_value`	Normalized value: -log10 of the molar activity
`data_validity_comment`	Quality flag
`potential_duplicate`	Duplicate indicator
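
Since `pchembl_value` is a negative log10 of a molar quantity, a value reported in nM converts as 9 - log10(value). The sketch below works through that arithmetic. It is only an illustration of the scale, not a replacement for the curated field, which ChEMBL may leave empty for some records.

```python
import math


def p_value_from_nm(value_nm):
    """Convert a concentration in nM to its negative log10 molar value."""
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    return 9.0 - math.log10(value_nm)


for nm in (1, 100, 10_000):
    print(nm, "nM ->", round(p_value_from_nm(nm), 2))
```

On this scale, 100 nM corresponds to 7 and each tenfold gain in potency adds one unit.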

Target Fields

Field	Description
`target_chembl_id`	ChEMBL target ID
`pref_name`	Preferred name
`target_type`	SINGLE PROTEIN, PROTEIN COMPLEX, etc.
`organism`	Species

Mechanism Fields

Field	Description
`molecule_chembl_id`	Drug ID
`target_chembl_id`	Target ID
`mechanism_of_action`	Description
`action_type`	INHIBITOR, AGONIST, ANTAGONIST, etc.

Export to DataFrame

import pandas as pd
from chembl_webresource_client.new_client import new_client

activity = new_client.activity

results = activity.filter(
    target_chembl_id='CHEMBL217',  # dopamine D2 receptor
    standard_type='Ki',
    pchembl_value__isnull=False
)

df = pd.DataFrame(list(results))
df.to_csv('dopamine_d2_ligands.csv', index=False)
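
An export like this often contains several measurements per compound. One possible pre-processing step is to collapse them to a median `pchembl_value` per molecule before writing the file. The sketch below uses only the standard library; `median_pchembl` is a hypothetical helper and the records are made up.

```python
from collections import defaultdict
from statistics import median


def median_pchembl(records):
    """Map each molecule_chembl_id to the median of its pchembl_value entries."""
    grouped = defaultdict(list)
    for rec in records:
        value = rec.get("pchembl_value")
        if value is not None:
            grouped[rec["molecule_chembl_id"]].append(float(value))
    return {mol_id: median(values) for mol_id, values in grouped.items()}


# Made-up records shaped like activity results.
sample = [
    {"molecule_chembl_id": "CHEMBL_A", "pchembl_value": "7.0"},
    {"molecule_chembl_id": "CHEMBL_A", "pchembl_value": "8.0"},
    {"molecule_chembl_id": "CHEMBL_B", "pchembl_value": "6.5"},
    {"molecule_chembl_id": "CHEMBL_B", "pchembl_value": None},
]
summary = median_pchembl(sample)
print(summary)
```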

Configuration

from chembl_webresource_client.settings import Settings

cfg = Settings.Instance()

cfg.CACHING = True           # Enable response caching
cfg.CACHE_EXPIRE = 43200     # Cache TTL (12 hours)
cfg.TIMEOUT = 60             # Request timeout (seconds)
cfg.TOTAL_RETRIES = 5        # Retry attempts

Data Quality Notes

ChEMBL data is manually curated, but curation does not guarantee correctness: check `data_validity_comment` for flagged values
Check `potential_duplicate` flags when aggregating results so repeated measurements are not counted twice
Use `pchembl_value` for normalized comparisons across assay types
Treat activity values that lack `standard_units` with caution
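
These checks can be combined into a single record filter. The sketch below is one possible reading of the notes above. `is_clean` is a hypothetical helper, and the way the duplicate flag is encoded (0, "0", False, or null for "not a duplicate") is an assumption.

```python
def is_clean(record):
    """Return True if a record passes the basic quality checks listed above.

    Hypothetical helper. The duplicate flag is compared against several
    falsy encodings because its serialization is an assumption here.
    """
    if record.get("data_validity_comment"):
        return False
    if record.get("potential_duplicate") not in (None, False, 0, "0"):
        return False
    if not record.get("standard_units"):
        return False
    return True


# Made-up records shaped like activity results.
records = [
    {"data_validity_comment": None, "potential_duplicate": 0, "standard_units": "nM"},
    {"data_validity_comment": "Outside typical range", "potential_duplicate": 0, "standard_units": "nM"},
    {"data_validity_comment": None, "potential_duplicate": 1, "standard_units": "nM"},
    {"data_validity_comment": None, "potential_duplicate": 0, "standard_units": None},
]
flags = [is_clean(rec) for rec in records]
print(flags)
```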

Best Practices

Use caching - Reduces API load and improves performance
Filter early - Apply filters to reduce data transfer
Limit results - Use [:n] slicing for testing
Check validity - Inspect data_validity_comment fields
Use pchembl_value - Normalized values enable cross-assay comparison
Batch queries - Use __in operator for multiple IDs

Error Handling

from chembl_webresource_client.new_client import new_client

mol = new_client.molecule

try:
    result = mol.get('INVALID_ID')
except Exception as e:
    if '404' in str(e):
        print("Compound not found")
    elif '503' in str(e):
        print("Service unavailable - retry later")
    else:
        raise
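
For transient failures such as a 503, an application-level retry with exponential backoff is one option on top of the client's own `TOTAL_RETRIES` setting. The sketch below is generic and does not depend on the client. `call_with_retry` is a hypothetical helper, and the flaky function only simulates a failing request.

```python
import time


def call_with_retry(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on any exception.

    Hypothetical helper; `sleep` is injectable so the delay can be
    replaced in tests. The last exception is re-raised once attempts
    are exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Simulated flaky call: fails twice, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("503 Service Unavailable")
    return "ok"


delays = []
result = call_with_retry(flaky, attempts=5, sleep=delays.append)
print(result, delays)
```

As written, the helper retries every exception, including a 404. In practice it would make sense to retry only errors that are likely to be temporary.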

External Links

ChEMBL: https://www.ebi.ac.uk/chembl/
API Documentation: https://chembl.gitbook.io/chembl-interface-documentation
Python Client: https://github.com/chembl/chembl_webresource_client

Repository

aminoanalytica/…a-skills

First Seen

Feb 25, 2026
