Modified version of RDKit/Contrib/pzc #chemoinformatics #RDKit #ChEMBL #pychembldb

In 2013, Dr. Paul Czodrowski published a nice article in JCIM, and the code is available on GitHub. Paul also contributed the code to RDKit. https://github.com/pzc/rdkit/tree/master/Contrib/pzc

This code can build a model from a ChEMBL activity data set, using a given accession number as the query. I was interested in the code, but it is old: it doesn't support Python 3, and unfortunately the ChEMBL web API has changed, so the code no longer works.

So I converted the code to support Python 3 and changed the methods that retrieved data via the web API so that they retrieve data via pychembldb instead. pychembldb is a useful package for chemoinformaticians: it is a Python interface to ChEMBLdb built on SQLAlchemy, which means you don't need to write SQL queries to retrieve data from the ChEMBL DB.

Here is my modified version of PZC.
https://github.com/iwatobipen/pzc_pychembldb

To get the data and build a model, all you need is a one-line command with the accession number of the target for which you would like to build a predictive model.

PZC builds the model 10 times, saves all of the models, and then makes an HTML report.

Here is an example.

$ python p_con_pychembldb.py --accession P43088 --uniq
load rdk schema
gather Data for Accession-ID 'P43088'
100/### >|||||||valerror: Smiles

valerror: cansmirdkit
valerror: Smiles
valerror: cansmirdkit
**** snip *****
valerror: Smiles
valerror: cansmirdkit
/home/iwatobipen/miniconda3/envs/chemoinfo/lib/python3.7/site-packages/sklearn/metrics/_classification.py:846: RuntimeWarning: invalid value encountered in double_scalars
  mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)
[0 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1]
[0 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1]
[[ 6  1]
 [ 1 28]]
**** snip ****
/home/iwatobipen/miniconda3/envs/chemoinfo/lib/python3.7/site-packages/sklearn/metrics/_classification.py:846: RuntimeWarning: invalid value encountered in double_scalars
  mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)
[1 1 1 1 1 1 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 0 1 1 1 1]
[0 1 1 1 1 1 1 0 0 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 1 0 1 1 1 1]
[[ 6  3]
 [ 0 27]]
act/inact from TL's 71/17
Model 0 saved into File: P43088_5000nm_model_0.pkl
Model 1 saved into File: P43088_5000nm_model_1.pkl
Model 2 saved into File: P43088_5000nm_model_2.pkl
Model 3 saved into File: P43088_5000nm_model_3.pkl
Model 4 saved into File: P43088_5000nm_model_4.pkl
Model 5 saved into File: P43088_5000nm_model_5.pkl
Model 6 saved into File: P43088_5000nm_model_6.pkl
Model 7 saved into File: P43088_5000nm_model_7.pkl
Model 8 saved into File: P43088_5000nm_model_8.pkl
Model 9 saved into File: P43088_5000nm_model_9.pkl
Model 0 active: 71	inactive: 17
Model 1 active: 71	inactive: 17
Model 2 active: 70	inactive: 18
Model 3 active: 74	inactive: 14
Model 4 active: 75	inactive: 13
Model 5 active: 73	inactive: 15
Model 6 active: 73	inactive: 15
Model 7 active: 69	inactive: 19
Model 8 active: 74	inactive: 14
Model 9 active: 74	inactive: 14
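The 2x2 arrays in the log look like confusion matrices, and the RuntimeWarning comes from scikit-learn's Matthews correlation coefficient (MCC) calculation. As a rough sketch of how the two relate, the snippet below computes the MCC by hand from the two matrices shown above. The function name `mcc_from_confusion` is my own, and I am assuming scikit-learn's layout (rows are true classes, columns are predicted classes); the MCC value itself does not change if the matrix is transposed. A zero denominator, which is probably what triggers the warning, is reported here as 0.0.

```python
import math


def mcc_from_confusion(matrix):
    """Matthews correlation coefficient for a 2x2 confusion matrix.

    Assumed layout (scikit-learn convention): rows are true classes,
    columns are predicted classes, class 0 (inactive) first.
    """
    (tn, fp), (fn, tp) = matrix
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # A whole row or column is empty, so the MCC is undefined.
        # This sketch reports 0.0 instead of NaN.
        return 0.0
    return numerator / denominator


# The two confusion matrices copied from the log above.
first = mcc_from_confusion([[6, 1], [1, 28]])
second = mcc_from_confusion([[6, 3], [0, 27]])
print(f"{first:.3f} {second:.3f}")
```

A matrix such as [[0, 0], [0, 36]], where only one class appears in the test split, gives a zero denominator. With small and imbalanced data sets like this one (71 actives vs. 17 inactives), that can happen in some of the 10 runs.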

After running the code, I got 10 models and an HTML report. Here is the report.

The report makes it easy to understand all of the experiments. And of course, users can modify the model and the protocols. I think the code is useful not only for building predictive models but also as a data retriever for ChEMBL (step0).
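If you want to reuse the saved models later, something like the following sketch may help. It assumes that the `.pkl` files are ordinary pickle files and that they follow the naming pattern seen in the log (`P43088_5000nm_model_0.pkl` and so on). The helper name `load_models` and its parameters are hypothetical and not part of PZC. Unpickling scikit-learn models also needs a compatible scikit-learn version, and you should only unpickle files you trust.

```python
import pickle
from pathlib import Path


def load_models(directory, accession, tag="5000nm"):
    """Load pickled models named <accession>_<tag>_model_<n>.pkl.

    The pattern is copied from the file names in the log; 'tag' is
    just the middle part of the file name (an assumption of this sketch).
    Returns a dict mapping file name to the unpickled object.
    """
    pattern = f"{accession}_{tag}_model_*.pkl"
    models = {}
    for path in sorted(Path(directory).glob(pattern)):
        with path.open("rb") as handle:
            models[path.name] = pickle.load(handle)
    return models


# List whichever model files exist in the current directory.
print(sorted(load_models(".", "P43088")))
```

Keeping all 10 models in one dict makes it easy to compare their predictions on the same molecule or to average them.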

Some packages are required to use the code, but I think it is an interesting part of chemoinformatics ;)

Published by iwatobipen

I'm a medicinal chemist at a mid-sized pharmaceutical company. I love chemoinfo, coding, organic synthesis, and my family.
