Build a QSAR model using RDKit

I’m interested in deep learning.
A few days ago, I read the following paper:
Prediction of New Bioactive Molecules using a Bayesian Belief Network
The author shows that the Bayesian belief network for classification (BBNC) method is a useful addition to the computational chemist’s toolbox.
So today I tried writing a script that builds QSAR models.

First, calculate the molecular descriptors.
The code is as follows.

import sys, cPickle
import numpy as np
from rdkit import Chem
from rdkit.Chem import DataStructs
from rdkit.Chem import Descriptors
from rdkit.ML.Descriptors import MoleculeDescriptors
from sklearn import preprocessing

min_max_scaler = preprocessing.MinMaxScaler()

trainset = sys.argv[1]
testset = sys.argv[2]
trainset = [mol for mol in Chem.SDMolSupplier(trainset) if mol is not None]
testset = [mol for mol in Chem.SDMolSupplier(testset) if mol is not None]

nms=[x[0] for x in Descriptors._descList]
calc = MoleculeDescriptors.MolecularDescriptorCalculator(nms)

trainDescrs = [calc.CalcDescriptors(x) for x in trainset]
testDescrs  = [calc.CalcDescriptors(x) for x in testset]
trainDescrs = np.array(trainDescrs)
testDescrs = np.array(testDescrs)

x_train_minmax = min_max_scaler.fit_transform( trainDescrs )
# reuse the min/max learned from the training set; do not refit on the test set
x_test_minmax = min_max_scaler.transform( testDescrs )

classes={'(A) low':0,'(B) medium':1,'(C) high':1}
train_acts = np.array([classes[mol.GetProp("SOL_classification")] for mol in trainset],dtype="int")
test_acts = np.array([classes[mol.GetProp("SOL_classification")] for mol in testset],dtype="int")

dataset = ( (x_train_minmax, train_acts),(x_train_minmax, train_acts), (x_test_minmax, test_acts) )

# write the (train, valid, test) tuple to disk
f = open("rdk_sol_set_norm_descs.pkl", "wb")
cPickle.dump(dataset, f)
f.close()

Now I have the training and test sets as a pkl file.
Next, build the models using scikit-learn.
The code builds models using random forest, SVM, naive Bayes, and a restricted Boltzmann machine + SVM classifier (RBM-SVM).
Scikit-learn can chain the RBM and the SVM using its Pipeline class.
A model can be saved as a pkl file using cPickle. (The following code only prints the results. 😉 )
Scikit-learn is very simple to use, and powerful.
I posted the same code and example files (taken from RDKit) here.
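A note on the normalization step in the script above: a scaler is usually fitted on the training descriptors only and then applied unchanged to the test descriptors, so that both sets share one scale. Below is a minimal sketch of that idea. The two small arrays are made-up stand-ins for trainDescrs and testDescrs, not values from the solubility set, and the code uses Python 3 style print calls.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-ins for trainDescrs / testDescrs (2 descriptors per molecule).
train_descrs = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
test_descrs = np.array([[2.5, 15.0], [12.0, 40.0]])

scaler = MinMaxScaler()
x_train = scaler.fit_transform(train_descrs)  # learn min/max from the training set
x_test = scaler.transform(test_descrs)        # reuse the training min/max

print(x_train)
print(x_test)  # test values can fall outside [0, 1]
```

Test values that lie outside the training range map outside [0, 1]. If a later step needs inputs inside that range, np.clip(x_test, 0, 1) is one possible way to handle it.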
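Before the full script, here is a minimal sketch of how a Pipeline chains an RBM and an SVM. The data are random toy values in [0, 1] with arbitrary labels, not the solubility set, and the number of components is chosen small only to keep the example fast. It is written for Python 3.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Toy data: 40 samples, 8 features already in [0, 1]; labels are arbitrary.
X = rng.rand(40, 8)
y = (X[:, 0] > 0.5).astype(int)

rbm = BernoulliRBM(n_components=16, learning_rate=0.06, n_iter=20, random_state=0)
svc = SVC(gamma=0.001, C=100.0)
model = Pipeline(steps=[("rbm", rbm), ("svc", svc)])

# fit() trains the RBM first, then trains the SVM on the RBM's hidden-unit features.
model.fit(X, y)
preds = model.predict(X)
print(preds.shape)
```

The RBM step acts as an unsupervised feature extractor, and the SVM only ever sees its transformed output. BernoulliRBM expects inputs that are binary or lie in [0, 1], which is one reason min-max scaling of the descriptors matters here.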

import sys, cPickle
from sklearn.ensemble import RandomForestClassifier
from sklearn import svm
from sklearn.naive_bayes import GaussianNB
from sklearn import cross_validation
from sklearn import metrics
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

f = open(sys.argv[1], "rb")
train, valid, test = cPickle.load(f)

train_x, train_y = train
test_x, test_y = test

nclf = RandomForestClassifier( n_estimators=100, max_depth=5, random_state=0, n_jobs=1 )
nclf = nclf.fit( train_x, train_y )
preds = nclf.predict( test_x )
print metrics.confusion_matrix(test_y, preds)
print metrics.classification_report(test_y, preds)
accuracy = nclf.score(test_x, test_y)
print accuracy

print "SVM"
clf_svm = svm.SVC( gamma=0.001, C=100. )
clf_svm = clf_svm.fit( train_x, train_y )
preds_SVM = clf_svm.predict( test_x )
print metrics.confusion_matrix( test_y, preds_SVM )
print metrics.classification_report( test_y, preds_SVM )
accuracy = clf_svm.score( test_x, test_y )

print accuracy

print "NB"
gnb = GaussianNB()
clf_NB = gnb.fit( train_x, train_y )
preds_NB = clf_NB.predict( test_x )
print metrics.confusion_matrix( test_y, preds_NB )
print metrics.classification_report( test_y, preds_NB )

# score() is a method of the fitted classifier, not of the prediction array
accuracy = clf_NB.score( test_x, test_y )
print accuracy

print "RBM"
cls_svm2 = svm.SVC( gamma=0.001, C=100. )
rbm = BernoulliRBM(random_state = 0, verbose = True)
classifier = Pipeline( steps=[("rbm", rbm), ("cls_svm2", cls_svm2)] )
rbm.learning_rate = 0.06
rbm.n_iter = 20
rbm.n_components = 1000
classifier.fit(train_x, train_y)
pred_RBM = classifier.predict(test_x)
print metrics.confusion_matrix(test_y, pred_RBM)
print metrics.classification_report(test_y, pred_RBM)
accuracy = classifier.score( test_x, test_y )
print accuracy
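The script above only prints results. As a rough sketch of the "save the model as a pkl file" step mentioned earlier, the example below pickles a fitted classifier and loads it back. It uses random toy data, a hypothetical file name, and the Python 3 pickle module instead of cPickle.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
# Toy data standing in for the scaled descriptors and class labels.
X = rng.rand(30, 4)
y = (X[:, 0] > 0.5).astype(int)

clf = RandomForestClassifier(n_estimators=10, max_depth=5, random_state=0)
clf.fit(X, y)

# Hypothetical output path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "rf_model_example.pkl")
with open(path, "wb") as f:
    pickle.dump(clf, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

# The restored model should give the same predictions as the original.
same = np.array_equal(clf.predict(X), restored.predict(X))
print(same)
```

A pickled model is generally only safe to load with the same scikit-learn version that wrote it, so it helps to record the version next to the file.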

