Build regression model in Keras

I introduced Keras at mishimasyk#9, and my presentation showed how to build a classification model in Keras.
A participant asked me how to build a regression model in Keras, and I could not answer his question.
After syk#9, I searched the Keras API and found a good method.
Keras has a scikit-learn API (wrapper classes), and it can be used to build a regression model. 😉
https://keras.io/scikit-learn-api/
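To see why the wrapper makes regression easy, here is a simplified sketch of the pattern it follows, written in plain Python so it runs without Keras. It is not the real KerasRegressor code: the class names SketchRegressor and MeanModel are hypothetical, and MeanModel only predicts the training mean. The idea is that the wrapper receives a function that builds a model (build_fn), keeps the remaining keyword arguments such as the number of epochs and the batch size, builds a fresh model inside fit(), and exposes fit/predict like any scikit-learn regressor.

```python
# Simplified sketch of the wrapper pattern (NOT the real KerasRegressor implementation).
# SketchRegressor and MeanModel are hypothetical names used only for illustration.


class MeanModel:
    """Toy stand-in for a compiled Keras model: always predicts the training mean."""

    def __init__(self):
        self.mean_ = None

    def fit(self, X, y, **kwargs):
        # kwargs (e.g. epochs, batch_size) are accepted but ignored by this toy model
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class SketchRegressor:
    def __init__(self, build_fn, **fit_params):
        self.build_fn = build_fn        # called to create a new, untrained model
        self.fit_params = fit_params    # forwarded to model.fit()

    def get_params(self, deep=True):
        # scikit-learn utilities such as clone() rely on get_params()
        return {"build_fn": self.build_fn, **self.fit_params}

    def fit(self, X, y):
        self.model_ = self.build_fn()
        self.model_.fit(X, y, **self.fit_params)
        return self

    def predict(self, X):
        return self.model_.predict(X)


est = SketchRegressor(build_fn=MeanModel, epochs=100, batch_size=20)
est.fit([[0], [1], [2]], [1.0, 2.0, 3.0])
print(est.predict([[5], [6]]))  # [2.0, 2.0]
```

Because the model is created inside fit(), each call (for example, each fold of a cross-validation) starts from an untrained network.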
The example code follows.
It builds a QSAR model from the molecules in an SD file.

import numpy as np
import pandas as pd
import sys

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import DataStructs

# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

from keras.models import Sequential
from keras.layers import Activation, Dense, Dropout
from keras.wrappers.scikit_learn import KerasRegressor

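def getFpArr( mols, nBits = 1024 ):
    # Morgan bit-vector fingerprints with radius 2 (ECFP4-like)
    fps = [ AllChem.GetMorganFingerprintAsBitVect( mol, 2, nBits=nBits ) for mol in mols ]
    X = []
    for fp in fps:
        # ConvertToNumpyArray resizes arr to nBits and fills it with 0/1 values
        arr = np.zeros( (1,) )
        DataStructs.ConvertToNumpyArray( fp, arr )
        X.append( arr )
    return X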

def getResponse( mols, prop="ACTIVITY" ):
    # convert activity values in nM (e.g. IC50) to a log scale:
    # -log10(act * 1e-9) = 9 - log10(act)
    Y = []
    for mol in mols:
        act = mol.GetProp( prop )
        act = 9. - np.log10( float( act ) )
        Y.append( act )
    return Y

def base_model():
    # two hidden ReLU layers on the 1024-bit fingerprint input (Keras 2 API)
    model = Sequential()
    model.add( Dense( 100, input_dim=1024 ) )
    model.add( Activation( "relu" ) )
    model.add( Dense( 100 ) )
    model.add( Activation( "relu" ) )
    # single output unit with no activation (linear), as appropriate for regression
    model.add( Dense( 1 ) )
    model.compile( loss="mean_squared_error", optimizer="adam" )
    return model


if __name__ == '__main__':
    filename = sys.argv[1]
    # SDMolSupplier yields None for records RDKit cannot parse; skip them
    sdf = [ mol for mol in Chem.SDMolSupplier( filename ) if mol is not None ]
    X = getFpArr( sdf )
    Y = getResponse( sdf )

    trainx, testx, trainy, testy = train_test_split( X, Y, test_size=0.2, random_state=0 )
    trainx, testx, trainy, testy = np.asarray( trainx ), np.asarray( testx ), np.asarray( trainy ), np.asarray( testy )
    # extra keyword arguments (epochs, batch_size) are passed on to model.fit()
    estimator = KerasRegressor( build_fn = base_model,
                                epochs=100,      # called nb_epoch in Keras 1
                                batch_size=20,
                                )
    estimator.fit( trainx, trainy )
    pred_y = estimator.predict( testx )
    r2 = r2_score( testy, pred_y )
    # mean_squared_error returns the MSE; take the square root to get the RMSE
    rmse = np.sqrt( mean_squared_error( testy, pred_y ) )
    print( "KERAS: R2 : {0:f}, RMSE : {1:f}".format( r2, rmse ) )
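getResponse turns each activity value into 9 - log10(value). Assuming the ACTIVITY field holds IC50 values in nM, this is pIC50, the negative log10 of the IC50 in mol/L, so larger numbers mean more potent compounds. A small worked check in plain Python (to_pic50 is a hypothetical helper name, not part of the script):

```python
import math


def to_pic50(ic50_nm):
    """Convert an IC50 in nM to pIC50 = -log10(IC50 in mol/L).

    Assumption: the input is in nanomolar, as implied by the 9 - log10() in getResponse.
    """
    return 9.0 - math.log10(ic50_nm)


for ic50_nm in (1.0, 10.0, 1000.0):
    # same value as -log10(ic50_nm * 1e-9), written the way getResponse computes it
    print(f"{ic50_nm:g} nM -> pIC50 {to_pic50(ic50_nm):.2f}")
```

Each 10-fold decrease in IC50 raises pIC50 by one unit.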

Run the code.
I used a dataset from ChEMBL (CHEMBL952131_EGFR.sdf).

mishimasyk9 iwatobipen$ python keras_regression.py sdf/CHEMBL952131_EGFR.sdf 
Using Theano backend.
Epoch 1/100
102/102 [==============================] - 0s - loss: 62.2934     
........................   
Epoch 100/100
102/102 [==============================] - 0s - loss: 0.0123     
KERAS: R2 : 0.641975, RMSE : 0.578806

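R2 is 0.64, which is not bad. 😉
Note that the value printed as RMSE in this run is actually the mean squared error, because mean_squared_error returns the MSE; the RMSE is its square root, about 0.76.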
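Both numbers come from standard definitions: R2 = 1 - SS_res/SS_tot, and RMSE is the square root of the mean squared error. The plain-Python sketch below computes them on made-up values (not the EGFR data) and applies the square root to the value from the log, since mean_squared_error returns the MSE rather than the RMSE.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: what sklearn's mean_squared_error returns by default."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# made-up pIC50 values, for illustration only
y_true = [6.0, 7.0, 8.0, 9.0]
y_pred = [6.5, 7.0, 7.5, 9.0]
print("MSE :", mse(y_true, y_pred))    # 0.125
print("RMSE:", rmse(y_true, y_pred))   # about 0.354
print("R2  :", r2(y_true, y_pred))     # 0.9

# square root of the value the script printed as "RMSE" (about 0.76)
print(math.sqrt(0.578806))
```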
I pushed the script to the Mishima-syk/9 repository.
https://github.com/Mishima-syk/9/tree/master/iwatobipen
