Building a predictive model from MMPs

I'm still thinking about how to use MMP (matched molecular pair) data in our lab.
Inspired by two nice presentations, I tried to build a predictive model from MMPA (matched molecular pair analysis).
One is an IPython notebook about RDKit. link
The other is Greg's nice presentation about reaction fingerprints. link2
To build a predictive model from MMPs, I think I need to convert each molecular transformation into a fingerprint.
So I tried using the difference between the atom pair fingerprints of the two molecules in each pair.
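As a rough picture of what such a difference fingerprint holds, here is a minimal sketch that uses plain dictionaries in place of RDKit's sparse count vectors. The feature IDs and counts are made up for illustration, and `delta_counts` is a hypothetical helper, not an RDKit function.

```python
def delta_counts(fp_from, fp_to):
    """Return the nonzero entries of fp_to - fp_from for two count fingerprints.

    Each fingerprint is a dict mapping a feature ID to its count.
    """
    delta = {}
    for key in set(fp_from) | set(fp_to):
        diff = fp_to.get(key, 0) - fp_from.get(key, 0)
        if diff != 0:
            delta[key] = diff
    return delta


# Made-up atom pair counts for the two molecules of one matched pair.
fp_mol1 = {101: 3, 205: 1, 330: 2}
fp_mol2 = {101: 3, 205: 2, 412: 1}

delta = delta_counts(fp_mol1, fp_mol2)
print(sorted(delta.items()))
```

Atom pairs that occur equally often in both molecules cancel out, so the remaining entries reflect what changed between the two molecules.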
Data preparation is almost the same as in link.
The script uses two input files: TID_CHEMBL240.txt, taken from ChEMBL 19, and mmp_herg.txt, generated with the RDKit MMP code.
The steps are:
1st. Merge the MMPs with the hERG activity data and convert to pKi.
2nd. Calculate delta pKi.
3rd. Calculate the atom pair fingerprints.
4th. Classify whether delta pKi is over 1 (a more than 10-fold change in Ki) or not.
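Steps 2 and 4 can be illustrated on their own with a small sketch. The Ki values here are made up, and `pki_from_nm` is a hypothetical helper that assumes the activity is given in nM.

```python
import math


def pki_from_nm(value_nm):
    """Convert a Ki in nM to pKi; equivalent to -log10(value_nm * 1e-9)."""
    return 9.0 - math.log10(value_nm)


# Made-up Ki values (nM) for the two molecules of one matched pair.
ki_1_nm = 1.0
ki_2_nm = 100.0

# Step 2: delta pKi. Step 4: label the pair by whether delta exceeds 1.
delta = pki_from_nm(ki_1_nm) - pki_from_nm(ki_2_nm)
over_10_fold = delta > 1

print(delta, over_10_fold)
```

Because pKi is a log scale, a delta of 1 corresponds to a 10-fold difference in Ki, which is why 1 is used as the cutoff.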

import math
import pandas as pd
from rdkit import Chem
from rdkit.Chem import AllChem, PandasTools
from sklearn import feature_extraction, svm
from sklearn.metrics import confusion_matrix
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split

df = pd.read_table("TID_CHEMBL240.txt")
mmps = pd.read_csv("mmp_herg.txt", header=None, names=["smiles1", "smiles2","molregno1","molregno2","tform","core"])
PandasTools.AddMoleculeColumnToFrame(mmps,"smiles1","mol1")
PandasTools.AddMoleculeColumnToFrame(mmps,"smiles2","mol2")
mmps = mmps[["mol1","mol2","molregno1","molregno2","tform","core"]]


# make activity table
t1 = df[["MOLREGNO", "STANDARD_VALUE"]]
# 1st step: merge data (after the second merge, suffix _1 belongs to mol1 and _2 to mol2)
mmpdds = (mmps.merge(t1, left_on="molregno1", right_on="MOLREGNO", suffixes=("_1", "_2"))
              .merge(t1, left_on="molregno2", right_on="MOLREGNO", suffixes=("_1", "_2")))
# convert STANDARD_VALUE (treated as nM, hence the 1e-9 factor) to pKi
mmpdds["pKi_1"] = mmpdds.apply(lambda row: -1 * math.log10(float(row["STANDARD_VALUE_1"]) * 1e-9), axis=1)
mmpdds["pKi_2"] = mmpdds.apply(lambda row: -1 * math.log10(float(row["STANDARD_VALUE_2"]) * 1e-9), axis=1)

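# 2nd step: calc delta pKi as mol1 minus mol2
# (note: the fingerprint difference in step 3 is taken the other way, mol2 minus mol1)
mmpdds["delta"] = mmpdds.pKi_1 - mmpdds.pKi_2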
mmpdds = mmpdds[["mol1","mol2","molregno1", "molregno2", "pKi_1", "pKi_2", "delta", "tform", "core"]]

# 3rd step: calc atom pair fingerprints and their difference
mmpdds["afp_1"] = mmpdds.apply(lambda row: AllChem.GetAtomPairFingerprint(row["mol1"]), axis=1)
mmpdds["afp_2"] = mmpdds.apply(lambda row: AllChem.GetAtomPairFingerprint(row["mol2"]), axis=1)
mmpdds["deltafp"] = mmpdds.afp_2 - mmpdds.afp_1
mmpdds = mmpdds.dropna()
# 4th step: classify whether delta pKi is > 1 or not
mmpdds["ov10"] = mmpdds.delta > 1

Hmm, it seems to work fine.
On to the next step.
Next, I got the nonzero elements from each difference fingerprint and made a sparse matrix from them.
Scikit-learn has a good class for this, DictVectorizer.
So I used it to build the sparse matrix.
Then I split the dataset using train_test_split.
OK, ready to go.
First, build an SVC classification model.
Then apply it to the test set.
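To picture what the vectorizing step does, here is a hand-written stand-in in plain Python. It is only a sketch of the idea, not scikit-learn's implementation: `vectorize` is a hypothetical function and the input dictionaries are made up.

```python
def vectorize(dicts):
    """Map a list of {feature: value} dicts to column indices and sparse rows."""
    # One column per distinct feature, in sorted order.
    columns = {key: i for i, key in enumerate(sorted({k for d in dicts for k in d}))}
    # Each row keeps only its nonzero entries as {column index: value}.
    rows = [{columns[k]: v for k, v in d.items()} for d in dicts]
    return columns, rows


# Made-up difference fingerprints for three matched pairs.
nonzero = [{205: 1, 330: -2}, {412: 1}, {205: -1, 412: 3}]

columns, rows = vectorize(nonzero)
print(columns)
print(rows)
```

Every distinct feature ID becomes one column, and each pair stores only the columns where its difference is nonzero, which keeps the matrix small even though atom pair fingerprints have a very large feature space.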

nzf = [fp.GetNonzeroElements() for fp in mmpdds.deltafp]
v=feature_extraction.DictVectorizer(sparse=True)
sparse_mat=v.fit_transform(nzf)
x_train, x_test, y_train, y_test = train_test_split(
    sparse_mat, mmpdds.ov10, test_size=0.3, random_state=42)
clf=svm.SVC()
clf.fit(x_train,y_train)
pred=clf.predict(x_test)

cm = confusion_matrix(y_test, pred)
print(cm)

The confusion matrix was:
[[359   0]
 [  5  64]]
Fine!
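To read the matrix more closely, the usual summary metrics can be computed from the four reported counts. This sketch assumes scikit-learn's convention that rows are true classes and columns are predicted classes, with the labels in sorted order (False, then True).

```python
# Confusion matrix as reported above: rows are true classes, columns are
# predicted classes, in the order (False, True).
cm = [[359, 0],
      [5, 64]]

tn, fp = cm[0]
fn, tp = cm[1]
total = tn + fp + fn + tp

accuracy = (tn + tp) / total
recall = tp / (tp + fn)       # fraction of true "over 1" pairs that were found
precision = tp / (tp + fp)    # fraction of predicted "over 1" pairs that were right
baseline = (tn + fp) / total  # accuracy of always predicting False

print(round(accuracy, 3), round(recall, 3), round(precision, 3), round(baseline, 3))
```

The classes are imbalanced (69 positives out of 428 test pairs), so accuracy is best compared with the always-False baseline of about 0.84 rather than with 0.5.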

I'm not sure my sentences actually read as sentences...
Anyway, I confirmed that the code runs, so the base is good for now.
