Install ChEMBL28 & rdkit cartridge #chemoinformatics #RDKit

ChEMBL 28 was released recently. It’s good news for chemoinformaticians, and time to update your local ChEMBL DB ;)

Of course I did it. At first I tried to build the ChEMBL 28 PostgreSQL DB in my main conda env, but rdkit-postgresql was difficult to install there because of package conflicts. So I made a clean environment just for PostgreSQL/RDKit and installed the ChEMBL DB in it.

The procedure is below. First, make a new env for the DB and download ChEMBL 28.

$ conda create -n rdkit-postgres python=3.7
$ conda activate rdkit-postgres
(rdkit-postgres) $ conda install -c conda-forge mamba
(rdkit-postgres) $ mamba install -c conda-forge postgresql==12.3
(rdkit-postgres) $ mamba install -c rdkit rdkit-postgresql
(rdkit-postgres) $ wget ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/chembl_28_postgresql.tar.gz
(rdkit-postgres) $ tar vxfz chembl_28_postgresql.tar.gz
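
You can also do the extraction from Python, for example inside a setup script. The standard-library tarfile module does the same job as `tar vxfz`. This is a minimal sketch that assumes the archive is already downloaded. The helper name extract_chembl_archive is hypothetical.

```python
import tarfile
from typing import List


def extract_chembl_archive(archive: str, dest: str = ".") -> List[str]:
    """Extract a .tar.gz archive like `tar vxfz` and return the member names.

    Hypothetical helper; it uses only the Python standard library.
    """
    with tarfile.open(archive, mode="r:gz") as tar:
        names = tar.getnames()
        for name in names:
            print(name)  # mimic the verbose (-v) listing of tar
        tar.extractall(path=dest)  # only extract archives you trust
    return names
```

Calling extract_chembl_archive("chembl_28_postgresql.tar.gz") should produce the same chembl_28/chembl_postgresql directory that the restore step below uses.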

Note that you don’t need to run ‘mamba install -c conda-forge rdkit’. The RDKit Python library is not required; installing rdkit-postgresql is enough.
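
Once the server is running, you can confirm that PostgreSQL sees the cartridge shipped with rdkit-postgresql, with no RDKit Python library involved. Look at the pg_available_extensions catalog view. This is a minimal sketch that assumes any DB-API cursor connected to the server. psycopg2 is one option, but it is not part of the setup in this post. The function name rdkit_extension_status is hypothetical.

```python
def rdkit_extension_status(cursor):
    """Return (default_version, installed_version) for the rdkit extension,
    or None if the server cannot see the extension at all.

    `cursor` is any DB-API 2.0 cursor connected to PostgreSQL (assumption).
    """
    cursor.execute(
        "SELECT default_version, installed_version "
        "FROM pg_available_extensions WHERE name = 'rdkit'"
    )
    row = cursor.fetchone()
    return None if row is None else tuple(row)
```

installed_version stays None until CREATE EXTENSION rdkit has been run in that database. With psycopg2 you could call it as rdkit_extension_status(psycopg2.connect(dbname="chembl_28").cursor()).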

Then initialize the PostgreSQL data directory with initdb, start the server, and create the chembl_28 database.

(rdkit-postgres) $ pg_ctl initdb -D ~/pgdata
(rdkit-postgres) $ pg_ctl -D ~/pgdata start
(rdkit-postgres) $ psql postgres
# CREATE DATABASE chembl_28;
# \q
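
If you rebuild the DB often, you can script the three steps above. This is a minimal sketch with Python's subprocess module. It assumes pg_ctl and psql from the rdkit-postgres env are on PATH. setup_commands and run_setup are hypothetical names, and by default run_setup only prints the commands (dry run).

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def setup_commands(pgdata: str = "~/pgdata", dbname: str = "chembl_28") -> List[List[str]]:
    """Return the initdb, start and CREATE DATABASE steps as argument lists."""
    data_dir = str(Path(pgdata).expanduser())
    return [
        ["pg_ctl", "initdb", "-D", data_dir],
        ["pg_ctl", "-D", data_dir, "start"],
        # same as typing the statement at the psql prompt of the "postgres" DB
        ["psql", "-d", "postgres", "-c", f"CREATE DATABASE {dbname};"],
    ]


def run_setup(pgdata: str = "~/pgdata", dbname: str = "chembl_28", dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in setup_commands(pgdata, dbname):
        print(shlex.join(cmd))  # shell-quoted, copy-pasteable form
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise on the first failing step
```

Passing each command as an argument list rather than one shell string avoids quoting problems with paths. check=True stops at the first step that fails.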

Now the chembl_28 DB has been created. Let’s load the data into it with the following commands.

(rdkit-postgres) $ cd chembl_28/chembl_postgresql
(rdkit-postgres) $ pg_restore --no-owner -h localhost -U iwatobipen -d chembl_28 chembl_28_postgresql.dmp

The restore takes about 10 to 20 minutes.
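
After the restore, a quick sanity check is to count rows in a few core ChEMBL tables. An error or a zero count points to an incomplete restore. This is a minimal sketch that assumes a DB-API cursor connected to chembl_28. count_rows and CORE_TABLES are hypothetical names.

```python
from typing import Dict, Iterable

# A few core tables of the ChEMBL PostgreSQL schema (extend as needed).
CORE_TABLES = ("molecule_dictionary", "compound_structures", "activities", "target_dictionary")


def count_rows(cursor, tables: Iterable[str] = CORE_TABLES) -> Dict[str, int]:
    """Return {table: row count} to check that pg_restore actually loaded data.

    `cursor` is any DB-API 2.0 cursor connected to chembl_28 (assumption).
    """
    counts = {}
    for table in tables:
        # Identifiers cannot be passed as query parameters, so only use
        # trusted, hard-coded table names here.
        cursor.execute(f"SELECT count(*) FROM {table}")
        counts[table] = cursor.fetchone()[0]
    return counts
```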

Then I added the RDKit extension to the DB. The details are described at the following URL:
https://www.rdkit.org/docs/Cartridge.html

(rdkit-postgres) $ psql chembl_28
# create extension if not exists rdkit;
# create schema rdk;
# select * into rdk.mols from (select molregno,mol_from_ctab(molfile::cstring) m  from compound_structures) tmp where m is not null;
# create index molidx on rdk.mols using gist(m);
# alter table rdk.mols add primary key (molregno);
# select molregno,torsionbv_fp(m) as torsionbv,morganbv_fp(m) as mfp2,featmorganbv_fp(m) as ffp2 into rdk.fps from rdk.mols;
# create index fps_ttbv_idx on rdk.fps using gist(torsionbv);
# create index fps_mfp2_idx on rdk.fps using gist(mfp2);
# create index fps_ffp2_idx on rdk.fps using gist(ffp2);
# alter table rdk.fps add primary key (molregno);

These steps take 30 minutes or more, but afterwards the DB can be searched with the RDKit cartridge functions.
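
Here is a minimal sketch of two searches using those functions. The first is a substructure search on rdk.mols, and the second is a Tanimoto similarity search on the mfp2 column of rdk.fps. Both follow the query patterns in the RDKit cartridge documentation. The sketch assumes a psycopg2-style cursor with client-side %s placeholders, so the literal % operator is written %%. The function names are hypothetical.

```python
from typing import Any, List, Tuple


def substructure_search(cursor, smiles: str, limit: int = 10) -> List[Tuple[Any, ...]]:
    """Molecules in rdk.mols that contain the query (SMILES) as a substructure."""
    cursor.execute(
        "SELECT molregno, mol_to_smiles(m) FROM rdk.mols "
        "WHERE m @> %s::mol LIMIT %s",
        (smiles, limit),
    )
    return cursor.fetchall()


def similarity_search(cursor, smiles: str, threshold: float = 0.5,
                      limit: int = 10) -> List[Tuple[Any, ...]]:
    """Morgan-fingerprint (mfp2) Tanimoto neighbours from rdk.fps."""
    # The % operator filters on the session setting rdkit.tanimoto_threshold.
    cursor.execute("SET rdkit.tanimoto_threshold = %s", (threshold,))
    cursor.execute(
        "SELECT molregno, tanimoto_sml(morganbv_fp(%s::mol), mfp2) AS sim "
        "FROM rdk.fps WHERE morganbv_fp(%s::mol) %% mfp2 "  # %% is a literal %
        "ORDER BY sim DESC LIMIT %s",
        (smiles, smiles, limit),
    )
    return cursor.fetchall()
```

The % filter can use the gist index on mfp2 created above. The threshold is a per-session setting, so it has to be set on the same connection that runs the query.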

Also, if pychembldb is installed, you can write queries in a more pythonic way. And of course, the running DB can also be accessed from a conda env other than the one PostgreSQL is installed in.
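
A client in another env only needs the connection details of the running server. Unless configured otherwise, the server listens on localhost port 5432. This is a minimal sketch that builds a standard postgresql:// URI. The helper chembl_dsn is hypothetical, and this post does not cover how pychembldb itself reads its connection settings.

```python
from urllib.parse import quote


def chembl_dsn(user: str, dbname: str = "chembl_28", host: str = "localhost",
               port: int = 5432, password: str = "") -> str:
    """Build a postgresql:// URI usable by libpq-based clients (hypothetical helper)."""
    auth = quote(user, safe="")  # percent-encode special characters
    if password:
        auth += ":" + quote(password, safe="")
    return f"postgresql://{auth}@{host}:{port}/{dbname}"
```

You can pass the resulting URI to psql (psql "postgresql://iwatobipen@localhost:5432/chembl_28"), to psycopg2.connect(), or to SQLAlchemy's create_engine().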

(rdkit-postgres) $ conda activate hoge
(hoge) $ ipython

The following code was run in the IPython console.

from pychembldb import *
from rdkit import Chem
from rdkit.Chem import Draw, rdDepictor
rdDepictor.SetPreferCoordGen(True)  # use CoordGen for cleaner 2D depictions
mols = []
for target in chembldb.query(Target).filter_by(pref_name='SARS-CoV-2'):
    for assay in target.assays:
        for act in assay.activities:
            # keep only exact ('=') IC50 measurements
            if act.standard_type != 'IC50' or act.standard_relation != '=':
                continue
            try:
                smiles = act.compound.molecule.structure.canonical_smiles
            except AttributeError:  # no compound, molecule or structure record
                continue
            mol = Chem.MolFromSmiles(smiles) if smiles else None
            if mol is None:  # missing or unparsable SMILES
                continue
            mol.SetProp('IC50', str(act.standard_value))  # value in standard_units
            mol.SetProp('pChembl_value', str(act.pchembl_value))
            mols.append(mol)
Draw.MolsToGridImage(mols[:10], molsPerRow=3, subImgSize=(300, 100),
                     legends=[m.GetProp('pChembl_value') for m in mols[:10]])
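
The pChembl_value stored on each molecule is ChEMBL's pChEMBL value, i.e. -log10 of the activity in molar units. This is a minimal sketch of the conversion for an IC50 given in nM. That unit is an assumption, so check standard_units before applying it. The function name is hypothetical.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nM to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 M, so -log10(x * 1e-9) = 9 - log10(x)
    return 9.0 - math.log10(ic50_nm)
```

For example, an IC50 of 1000 nM (1 µM) corresponds to a pIC50 of 6.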

I also made a version of pychembldb with razi integration: https://github.com/iwatobipen/pychembldb/tree/raziintegration

This package can search molecules using the RDKit cartridge functionality. In summary, ChEMBL + the RDKit cartridge is a really useful and powerful tool for chemoinformatics.

Any comments and suggestions are greatly appreciated.


Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinfo, coding, organic synthesis, and my family.

2 thoughts on “Install ChEMBL28 & rdkit cartridge #chemoinformatics #RDKit”

  1. Hey pen :) Your first code-block includes the following 2 lines –

    (rdkit-postgres) $ mamba install -c conda-forge postgresql==12.3
    (rdkit-postgres) $ mamba install -c rdkit rdkit-postgresql

    I’m wondering why you install postgresql 12.3 from conda-forge and then install rdkit-postgresql, which has an instance of postgresql in it anyway? I haven’t tried this, so I may be missing something! I usually just install rdkit-postgresql and load chembl into it.

    Thanks for your blog ! I always enjoy it :)

    1. Thanks for your comment.
      I installed PostgreSQL from conda-forge because I wanted to keep the DB environment separate and use a specific version of PostgreSQL.
      Thanks,
