Make a curated kinase inhibitor dataset from ChEMBL30 #memo #chemoinformatics

Kinases are among the most attractive targets for drug discovery, so a lot of data is available, not only on the proteins but also on their inhibitors.

ChEMBL is a useful public source of kinase inhibitor data. However, to use the data we need to retrieve it from the DB and curate it. Of course there are commercial databases focused on kinases, but they are not freely available, and I would like to use the data conveniently for my hobby ;).

If you feel the same way, I recommend checking the following repository, openkinome/kinodata. The URL is below.

https://github.com/openkinome/kinodata

The repository provides useful notebooks for building kinase-related datasets from ChEMBL29. Fortunately, ChEMBL30 is now available! So I modified the notebooks for ChEMBL30 and made a dataset of kinase inhibitors and their activities.

First, I cloned the repository and downloaded chembl_30_sqlite.tar.gz (the SQLite version) from the ChEMBL site. After that, I ran ‘kinases_in_chembl.ipynb’ with ChEMBL30.
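The unpacking step can be sketched in Python as follows. This is only a minimal sketch, not part of the kinodata notebooks: the helper names `extract_chembl_db` and `list_tables` are hypothetical, and the code assumes only that the tarball contains a single `.db` file somewhere inside it.

```python
import sqlite3
import tarfile
from contextlib import closing
from pathlib import Path


def extract_chembl_db(archive: Path, dest: Path) -> Path:
    """Unpack a ChEMBL SQLite tarball and return the path of the .db file inside.

    Hypothetical helper: it assumes the archive holds at least one *.db file.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    db_files = sorted(dest.rglob("*.db"))
    if not db_files:
        raise FileNotFoundError(f"no .db file found in {archive}")
    return db_files[0]


def list_tables(db_path: Path) -> list:
    """Return the table names in a SQLite file, as a quick sanity check."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

With the downloaded file in the working directory, something like `db = extract_chembl_db(Path("chembl_30_sqlite.tar.gz"), Path("."))` followed by `list_tables(db)` should show whether the database opened correctly before the notebook is pointed at it.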

human_kinases.aggregated.csv, which is provided by the original repo, was used in the following code. Most of the code in this blog post comes from the kinodata repo. I appreciate the great work of the authors.
Comparing ChEMBL29 and ChEMBL30, there is more data in version 30.

After running the code, I got 'human_kinases_and_chembl_targets.chembl_30.csv'.

Next, I ran kinase-bioactivities-in-chembl.ipynb to make a CSV that contains structure and biological activity information.

Here is the code.


The code above is worth reading because it contains a good example of SQL for extracting data from the ChEMBL DB, and it also provides a useful data curation process.
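To give a rough idea of what such an extraction query looks like, here is a minimal sketch. It is not the query from the kinodata notebook: the table and column names follow the public ChEMBL schema, but the filters (binding assays only, exact measurements, IC50/Ki/Kd with a pChEMBL value) are my own illustrative assumptions. The helper `build_toy_db` and all of its rows are made up so that the example runs without the real database.

```python
import sqlite3

# Table and column names follow the ChEMBL schema; the filters are
# illustrative assumptions, not the exact ones used in the kinodata notebook.
QUERY_TEMPLATE = """
SELECT td.chembl_id AS target_chembl_id,
       md.chembl_id AS molecule_chembl_id,
       cs.canonical_smiles,
       act.standard_type,
       act.pchembl_value
FROM activities act
JOIN assays a ON act.assay_id = a.assay_id
JOIN target_dictionary td ON a.tid = td.tid
JOIN molecule_dictionary md ON act.molregno = md.molregno
JOIN compound_structures cs ON act.molregno = cs.molregno
WHERE td.chembl_id IN ({placeholders})
  AND a.assay_type = 'B'
  AND act.standard_relation = '='
  AND act.standard_type IN ('IC50', 'Ki', 'Kd')
  AND act.pchembl_value IS NOT NULL
ORDER BY td.chembl_id, act.pchembl_value DESC
"""


def fetch_activities(conn, target_chembl_ids):
    """Return curated activity rows for the given target ChEMBL IDs."""
    ids = list(target_chembl_ids)
    if not ids:
        return []
    # One "?" per ID keeps the query parameterized.
    query = QUERY_TEMPLATE.format(placeholders=", ".join("?" for _ in ids))
    return conn.execute(query, ids).fetchall()


def build_toy_db():
    """Hypothetical in-memory stand-in for ChEMBL; every row is made up."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE target_dictionary (tid INTEGER, chembl_id TEXT);
        CREATE TABLE assays (assay_id INTEGER, tid INTEGER, assay_type TEXT);
        CREATE TABLE molecule_dictionary (molregno INTEGER, chembl_id TEXT);
        CREATE TABLE compound_structures (molregno INTEGER, canonical_smiles TEXT);
        CREATE TABLE activities (
            activity_id INTEGER, assay_id INTEGER, molregno INTEGER,
            standard_type TEXT, standard_relation TEXT, pchembl_value REAL);
        INSERT INTO target_dictionary VALUES (1, 'TOY_T1'), (2, 'TOY_T2');
        INSERT INTO assays VALUES (10, 1, 'B'), (11, 1, 'F'), (12, 2, 'B');
        INSERT INTO molecule_dictionary VALUES (100, 'TOY_M1'), (101, 'TOY_M2');
        INSERT INTO compound_structures VALUES (100, 'CCO'), (101, 'CCN');
        INSERT INTO activities VALUES
            (1, 10, 100, 'IC50', '=', 8.2),
            (2, 10, 101, 'IC50', '>', NULL),
            (3, 11, 100, 'Ki', '=', 7.5),
            (4, 12, 101, 'Kd', '=', 6.1);
    """)
    return conn
```

Against the real database, `conn` would instead come from `sqlite3.connect(...)` on the extracted ChEMBL30 file, and the ID list from the target column of the kinase CSV made in the previous step.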

Finally, I added a PCA analysis using fingerprints of the potent compounds in the dataset. I used useful_rdkit_utils to convert molecules to fingerprint arrays. With this package I could get molecular fingerprints in a few lines of code.
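The idea can be sketched roughly as below. Note that this sketch calls RDKit directly instead of useful_rdkit_utils, because I do not want to misquote that package's API here. The function names, the Morgan radius of 2 and the 2048-bit length are assumptions for illustration, and it needs rdkit, numpy and scikit-learn installed.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.decomposition import PCA


def smiles_to_fp_array(smiles, radius=2, n_bits=2048):
    """Convert a SMILES string to a Morgan bit vector as a numpy array.

    Returns None when RDKit cannot parse the SMILES.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    arr = np.zeros((0,), dtype=np.int8)
    # ConvertToNumpyArray resizes arr to the fingerprint length and fills it.
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr


def pca_of_fingerprints(smiles_list, n_components=2):
    """Project fingerprints of the parsable SMILES onto principal components."""
    fps = [smiles_to_fp_array(smi) for smi in smiles_list]
    kept = [fp for fp in fps if fp is not None]
    matrix = np.vstack(kept)
    return PCA(n_components=n_components).fit_transform(matrix)
```

For the real dataset, `smiles_list` would be the SMILES column of the bioactivity CSV after keeping only the potent rows (for example by a pChEMBL cutoff of your choice), and the two returned columns can be passed straight to a scatter plot.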

In summary, using these notebooks I could make a kinase inhibitor dataset conveniently. I’ll use it for many chemoinformatics tasks ;)


Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinfo, coding, organic synthesis, and my family.
