Kinases are attractive targets for drug discovery, so a lot of data is available, not only on the proteins themselves but also on their inhibitors.
ChEMBL is a useful public data source for kinase inhibitor data; however, to use it, we need to retrieve the data from the database and curate it. Of course, there are commercial databases focused on kinases, but they are not freely available. I would like to use the data conveniently for my hobby ;).
If you feel the same, I recommend checking out the following repository, openkinome/kinodata. The URL is below.
https://github.com/openkinome/kinodata
The repository provides useful notebooks for building kinase-related datasets from ChEMBL29. Fortunately, ChEMBL30 is now available! So I modified the notebooks for ChEMBL30 and made a kinase-inhibitor and activity dataset.
First, I cloned the repository and downloaded chembl_30_sqlite.tar.gz (the SQLite version) from the ChEMBL site. After that, I ran 'kinases_in_chembl.ipynb' with ChEMBL30.
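A minimal sketch of the download step in Python, assuming the tarball has already been downloaded: it extracts the archive, searches for the `.db` file instead of hard-coding the folder layout, and reads the release name from the `version` table of the ChEMBL schema as a quick check that the notebook will be pointed at ChEMBL30. The helper names `extract_chembl_sqlite` and `chembl_version` are hypothetical.

```python
import sqlite3
import tarfile
from contextlib import closing
from pathlib import Path


def extract_chembl_sqlite(archive: str, dest: str) -> Path:
    """Extract a ChEMBL SQLite tarball into `dest` and return the .db file found inside."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(str(dest_dir))
    # Search for the database instead of hard-coding the archive's folder layout.
    db_files = sorted(dest_dir.rglob("*.db"))
    if not db_files:
        raise FileNotFoundError(f"no .db file found in {archive}")
    return db_files[0]


def chembl_version(db_path: Path) -> str:
    """Return the release name stored in ChEMBL's `version` table."""
    # closing() guarantees the connection is closed; sqlite3's own context
    # manager only commits or rolls back the transaction.
    with closing(sqlite3.connect(str(db_path))) as conn:
        row = conn.execute("SELECT name FROM version").fetchone()
    return row[0]
```

Usage would be something like `db = extract_chembl_sqlite("chembl_30_sqlite.tar.gz", "chembl_30_data")` followed by `chembl_version(db)`. Extracting into an empty directory keeps the `*.db` search from picking up unrelated files.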
The file human_kinases.aggregated.csv, which is provided in the original repo, was used as input. Most of the code in this blog post comes from the kinodata repo, and I appreciate the great work of its authors. ChEMBL30 contains more data than ChEMBL29. After running the code, I got 'human_kinases_and_chembl_targets.chembl_30.csv'.
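To show roughly what the kinase-to-target mapping involves, here is a sketch (not the notebook's actual code) that joins UniProt accessions to ChEMBL target IDs through the `component_sequences`, `target_components` and `target_dictionary` tables. Restricting to human single-protein targets and the function name `map_uniprot_to_chembl` are my assumptions; reading the accessions out of human_kinases.aggregated.csv is left out because I don't assume its column names.

```python
import sqlite3
from typing import Iterable, List, Tuple

# ChEMBL join path: UniProt accession (component_sequences)
#   -> target_components -> target_dictionary (target ChEMBL ID)
TARGET_SQL = """
SELECT cs.accession, td.chembl_id, td.pref_name, td.target_type
FROM component_sequences AS cs
JOIN target_components AS tc ON tc.component_id = cs.component_id
JOIN target_dictionary AS td ON td.tid = tc.tid
WHERE td.organism = 'Homo sapiens'
  AND td.target_type = 'SINGLE PROTEIN'
  AND cs.accession IN ({placeholders})
ORDER BY cs.accession, td.chembl_id
"""


def map_uniprot_to_chembl(
    conn: sqlite3.Connection, accessions: Iterable[str]
) -> List[Tuple[str, str, str, str]]:
    """Return (accession, target_chembl_id, pref_name, target_type) rows."""
    accs = sorted(set(accessions))  # de-duplicate before binding parameters
    if not accs:
        return []
    # One "?" per accession so values are bound by sqlite3, not string-formatted.
    sql = TARGET_SQL.format(placeholders=", ".join("?" * len(accs)))
    return conn.execute(sql, accs).fetchall()
```

With the extracted database, a call would look like `map_uniprot_to_chembl(sqlite3.connect("chembl_30.db"), ["P00533"])`.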
Next, I ran kinase-bioactivities-in-chembl.ipynb to make a CSV file that contains structure and biological activity information.
Here is the code.
The code above is worth reading because it contains a good SQL example for extracting data from the ChEMBL database, and it provides a useful data curation process.
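The extract-then-curate pattern can be sketched in a simplified form: one SQL query that pulls pChEMBL values for a list of target ChEMBL IDs, followed by a basic curation step that collapses repeated measurements to their median. The filters (exact `=` relations, IC50/Ki/Kd only, no data validity comment) and the median aggregation are illustrative assumptions, not necessarily the rules kinodata applies, and the function names are hypothetical.

```python
import sqlite3
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

ACTIVITY_SQL = """
SELECT td.chembl_id AS target_chembl_id,
       md.chembl_id AS molecule_chembl_id,
       cs.canonical_smiles,
       act.standard_type,
       act.pchembl_value
FROM activities AS act
JOIN assays AS a ON a.assay_id = act.assay_id
JOIN target_dictionary AS td ON td.tid = a.tid
JOIN molecule_dictionary AS md ON md.molregno = act.molregno
JOIN compound_structures AS cs ON cs.molregno = act.molregno
WHERE td.chembl_id IN ({placeholders})
  AND act.standard_type IN ('IC50', 'Ki', 'Kd')
  AND act.standard_relation = '='
  AND act.pchembl_value IS NOT NULL
  AND act.data_validity_comment IS NULL
"""

Row = Tuple[str, str, str, str, float]


def fetch_activities(conn: sqlite3.Connection, target_ids: Iterable[str]) -> List[Row]:
    """Pull exact-valued activities that have a pChEMBL value for the given targets."""
    ids = sorted(set(target_ids))
    if not ids:
        return []
    sql = ACTIVITY_SQL.format(placeholders=", ".join("?" * len(ids)))
    return conn.execute(sql, ids).fetchall()


def aggregate_duplicates(rows: Iterable[Row]) -> List[Row]:
    """Collapse repeated (target, molecule, type) measurements to the median pChEMBL."""
    values: Dict[Tuple[str, str, str], List[float]] = defaultdict(list)
    smiles: Dict[str, str] = {}
    for target, molecule, smi, stype, pchembl in rows:
        values[(target, molecule, stype)].append(pchembl)
        smiles[molecule] = smi
    return [
        (target, molecule, smiles[molecule], stype, statistics.median(v))
        for (target, molecule, stype), v in sorted(values.items())
    ]
```

Keeping the measurement type in the grouping key avoids mixing IC50 and Ki values for the same compound; the median is less sensitive than the mean to a single outlying measurement.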
Finally, I added a PCA analysis using the fingerprints of the potent compounds in the dataset. I used useful_rdkit_utils to convert molecules to fingerprint arrays; with this package, I could get molecular fingerprints in just a few lines of code.
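A minimal sketch of the PCA step using only NumPy, assuming the fingerprints are already a 0/1 matrix with one row per compound (for example produced with useful_rdkit_utils or RDKit, a step omitted here) and that "potent" means pChEMBL ≥ 8.0. That cut-off and the helper names are hypothetical. PCA is computed by SVD of the mean-centered matrix.

```python
from typing import Tuple

import numpy as np


def select_potent(fps: np.ndarray, pchembl: np.ndarray, threshold: float = 8.0) -> np.ndarray:
    """Keep fingerprint rows whose pChEMBL value is >= threshold (hypothetical cut-off)."""
    return fps[pchembl >= threshold]


def pca_2d(X: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Project rows of X onto the first two principal components using SVD."""
    Xc = X - X.mean(axis=0)  # PCA works on mean-centered data
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    total = np.sum(s ** 2)
    if total == 0:
        raise ValueError("all fingerprints are identical; there is no variance to explain")
    coords = Xc @ vt[:2].T  # scores on PC1 and PC2
    explained = s[:2] ** 2 / total  # explained variance ratio of PC1 and PC2
    return coords, explained
```

Plotting `coords[:, 0]` against `coords[:, 1]` gives the usual 2D chemical-space map. scikit-learn's `PCA(n_components=2).fit_transform(X)` should give the same coordinates up to the sign of each component; the plain SVD version only avoids the extra dependency.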
In summary, using these notebooks, I could build a kinase inhibitor dataset conveniently. I'll use it for many chemoinformatics tasks ;)