Target prediction using ChEMBL

You know, there are some database that can publicly available database in chemo informatics area.
ChEMBL DB is one of useful database. George Papadatos introduced useful tool for target prediction using ChEMBL. He provided chembl target prediction model via ftp server !
So, everyone can use the model.
I used the model and tried to target prediction.
At first, I get the model from ftp server. And launched jupyter notebook. 😉

iwatobipen$ wget wget ftp://ftp.ebi.ac.uk/pub/databases/chembl/target_predictions/chembl_22_models.tar.gz
iwatobipen$ tar -vzxf chembl_22_models.tar.gz
iwatobipen$ jupyter notebook

It ready! Go ahead.

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import PandasTools
from rdkit import DataStructs
import pandas as pd
from pandas import concat
from collections import OrderedDict
import requests
import numpy
from sklearn.externals import joblib
from rdkit import rdBase
print( rdBase.rdkitVersion )
>2016.09.2

I tried in python3.5 environment.

morgan_nb = joblib.load( 'models_22/10uM/mNB_10uM_all.pkl' )
classes = list( morgan_nb.targets )
len( classes )
> 1524 # model has 1524 targets ( classes )

I used sitagliptin as input molecule.

smiles = 'C1CN2C(=NN=C2C(F)(F)F)CN1C(=O)C[C@@H](CC3=CC(=C(C=C3F)F)F)N'
mol = Chem.MolFromSmiles( smiles )
mol

Next, calculate morgan fingerprint and convert the fingerprint to numpy array.

fp = AllChem.GetMorganFingerprintAsBitVect( mol, 2, nBits=2048 )
res = numpy.zeros( len(fp), numpy.int32 )
DataStructs.ConvertToNumpyArray( fp, res )

Predict target and sort the result by probability.

probas = list( morgan_nb.predict_proba( res.reshape(1,-1))[0] ) 
top_pred = predictions.sort_values( by='proba', ascending = False).head(10)
top_pred

Jupyter notebook ver5 changed table view!

Then convert from chembl ID to target name.

def fetch_WS( trgt ):
    re = requests.get( 'https://www.ebi.ac.uk/chembl/api/data/target/{0}.json'.format(trgt) )
    return ( trgt, re.json()['pref_name'], re.json()['organism'] )
plist = []
for i , e in enumerate( top_pred['id'] ):
    plist.append( fetch_WS(e) )
target_info = pd.DataFrame( plist, columns=['id', 'name', 'organism'])
pd.merge( top_pred, target_info )

The model predicted stagliptin is DPP4 modulator! I think this work is interesting. I will try to predict another molecules and integrate local ChEMBL DB to improve performance.
😉

Original source code is following URL. Thanks for useful information!!!!
https://github.com/madgpap/notebooks
http://chembl.blogspot.jp/2016/03/target-prediction-models-update.html

広告

コメントを残す

以下に詳細を記入するか、アイコンをクリックしてログインしてください。

WordPress.com ロゴ

WordPress.com アカウントを使ってコメントしています。 ログアウト / 変更 )

Twitter 画像

Twitter アカウントを使ってコメントしています。 ログアウト / 変更 )

Facebook の写真

Facebook アカウントを使ってコメントしています。 ログアウト / 変更 )

Google+ フォト

Google+ アカウントを使ってコメントしています。 ログアウト / 変更 )

%s と連携中