ChEMBL 28 was released recently. That is good news for chemoinformaticians, and it is time to update your chembldb ;)
Of course I did. At first I tried to build a PostgreSQL ChEMBL 28 database in my main conda env, but installing rdkit-postgresql was difficult because of package conflicts. So I made a clean environment for postgresql/rdkit and installed chembldb in it.
The procedure is below. First, make a new env for the DB and download chembl28.
$ conda create -n rdkit-postgres python=3.7
$ conda activate rdkit-postgres
(rdkit-postgres) $ conda install -c conda-forge mamba
(rdkit-postgres) $ mamba install -c conda-forge postgresql==12.3
(rdkit-postgres) $ mamba install -c rdkit rdkit-postgresql
(rdkit-postgres) $ wget ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/chembl_28_postgresql.tar.gz
(rdkit-postgres) $ tar vxfz chembl_28_postgresql.tar.gz
Note that you don’t need to run ‘mamba install -c conda-forge rdkit’. The RDKit Python library is not required; only rdkit-postgresql is.
Then initialize the PostgreSQL data directory with initdb, start the server, and create the chembl_28 database.
(rdkit-postgres) $ pg_ctl initdb -D ~/pgdata
(rdkit-postgres) $ pg_ctl -D ~/pgdata start
(rdkit-postgres) $ psql postgres
# CREATE DATABASE chembl_28;
# \q
Now the chembl_28 DB exists. Let’s load the data into it with the following commands.
(rdkit-postgres) $ cd chembl_28/chembl_postgresql
(rdkit-postgres) $ pg_restore --no-owner -h localhost -U iwatobipen -d chembl_28 chembl_28_postgresql.dmp
After waiting 10 to 20 minutes, the restore will be finished.
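If you want to script the restore step, the same pg_restore call can be wrapped in Python. This is only a minimal sketch: the helper names build_restore_command and restore are hypothetical, the optional jobs argument maps to pg_restore's -j flag for a parallel restore (which is not part of the command above), and the user name is simply the one used in this post.

```python
import subprocess


def build_restore_command(dump_path, dbname, user, host="localhost", jobs=None):
    """Build the pg_restore argument list used in this post (hypothetical helper)."""
    cmd = ["pg_restore", "--no-owner", "-h", host, "-U", user, "-d", dbname]
    if jobs is not None:
        # Assumption: a parallel restore is wanted; -j is not in the original command.
        cmd += ["-j", str(jobs)]
    cmd.append(dump_path)
    return cmd


def restore(dump_path, dbname, user, **kwargs):
    """Run pg_restore; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_restore_command(dump_path, dbname, user, **kwargs), check=True)


if __name__ == "__main__":
    # Only prints the command; call restore(...) instead to actually load the dump.
    print(" ".join(build_restore_command("chembl_28_postgresql.dmp", "chembl_28", "iwatobipen")))
```

Keeping the command construction separate from the subprocess call makes it easy to check what will be run before starting a restore that takes many minutes.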
Then I added the rdkit extension to the DB. The details are described at the following URL.
https://www.rdkit.org/docs/Cartridge.html
(rdkit-postgres) $ psql chembl_28
# create extension if not exists rdkit;
# create schema rdk;
# select * into rdk.mols from (select molregno,mol_from_ctab(molfile::cstring) m from compound_structures) tmp where m is not null;
# create index molidx on rdk.mols using gist(m);
# alter table rdk.mols add primary key (molregno);
# select molregno,torsionbv_fp(m) as torsionbv,morganbv_fp(m) as mfp2,featmorganbv_fp(m) as ffp2 into rdk.fps from rdk.mols;
# create index fps_ttbv_idx on rdk.fps using gist(torsionbv);
# create index fps_mfp2_idx on rdk.fps using gist(mfp2);
# create index fps_ffp2_idx on rdk.fps using gist(ffp2);
# alter table rdk.fps add primary key (molregno);
These statements take 30 minutes or more, but afterwards the DB can be searched with the RDKit cartridge functions.
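Once the tables and indexes exist, the cartridge can be queried from Python as well. The sketch below assumes psycopg2 is installed in the calling env and that the server runs on localhost; run_query is a hypothetical helper name and the SMILES in the usage comment is just an example. The SQL follows the patterns in the cartridge documentation linked above: @> for substructure search, and the % operator together with tanimoto_sml for similarity search (written as %% here because psycopg2 reserves % for parameters).

```python
# Substructure search against the rdk.mols table created above.
SUBSTRUCTURE_SQL = (
    "select molregno, mol_to_smiles(m) from rdk.mols "
    "where m @> mol_from_smiles(%(smi)s::cstring) limit %(limit)s"
)

# Similarity search against the Morgan fingerprints stored in rdk.fps.
SIMILARITY_SQL = (
    "select molregno, "
    "tanimoto_sml(morganbv_fp(mol_from_smiles(%(smi)s::cstring)), mfp2) as similarity "
    "from rdk.fps "
    "where morganbv_fp(mol_from_smiles(%(smi)s::cstring)) %% mfp2 "
    "order by similarity desc limit %(limit)s"
)


def run_query(sql, smiles, limit=10, dbname="chembl_28", host="localhost"):
    """Run one of the queries above and return all rows (hypothetical helper)."""
    import psycopg2  # imported lazily so the SQL can be inspected without the driver

    conn = psycopg2.connect(dbname=dbname, host=host)
    try:
        with conn.cursor() as cur:
            cur.execute(sql, {"smi": smiles, "limit": limit})
            return cur.fetchall()
    finally:
        conn.close()


# Example usage (needs the running chembl_28 server from this post):
# for molregno, smiles in run_query(SUBSTRUCTURE_SQL, "c1ccc2ncccc2c1", limit=5):
#     print(molregno, smiles)
```

Passing the SMILES as a query parameter rather than formatting it into the string leaves the quoting to the driver.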
Also, if pychembldb is installed, you can write queries in a more Pythonic way. Of course the DB can be accessed from a different conda env, as long as postgresql is installed in it.
(rdkit-postgres) $ conda activate hoge
(hoge) $ ipython
The following code was run in the ipython console.
from pychembldb import *
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import rdMolDraw2D, rdDepictor
rdDepictor.SetPreferCoordGen(True)
mols = []
# Collect compounds with an exact IC50 value reported against the SARS-CoV-2 target
for target in chembldb.query(Target).filter_by(pref_name='SARS-CoV-2'):
    for assay in target.assays:
        for act in assay.activities:
            try:
                if act.standard_type == 'IC50' and act.standard_relation == '=':
                    mol = Chem.MolFromSmiles(act.compound.molecule.structure.canonical_smiles)
                    mol.SetProp('IC50', str(act.value))
                    mol.SetProp('pChembl_value', str(act.pchembl_value))
                    mols.append(mol)
            except Exception:
                # Skip records without a structure or with an unparsable SMILES
                pass
Draw.MolsToGridImage(mols[:10], molsPerRow=3, subImgSize=(300,100))

I also made a version of pychembldb with razi integrated. https://github.com/iwatobipen/pychembldb/tree/raziintegration
This package can search molecules with the RDKit cartridge functionality. In summary, ChEMBL + the RDKit cartridge is a really useful and powerful tool for chemoinformatics.
Any comments or suggestions are greatly appreciated.
Hey pen :) Your first code-block includes the following 2 lines –
(rdkit-postgres) $ mamba install -c conda-forge postgresql==12.3
(rdkit-postgres) $ mamba install -c rdkit rdkit-postgresql
I’m wondering why you install postgresql 12.3 from conda-forge and then install rdkit-postgresql, which has an instance of postgresql in it anyway? I haven’t tried this, so I may be missing something! I usually just install rdkit-postgresql and load chembl into it.
Thanks for your blog ! I always enjoy it :)
Thanks for your comment.
I installed postgresql via conda-forge because I wanted to set up the DB environment separately and use a specific version of postgresql.
Thanks,