Build regression model in Keras

I introduced Keras in mishimasyk#9. And my presentation was how to build classification model in Keras.
A participant asked me that how to build regression model in Keras. I could not answer his question.
After syk#9, I searched Keras API and found good method.
Keras has Scikit-learn API. The API can build regression model. ;-)
https://keras.io/scikit-learn-api/
Example code is following.
The code is used for build QSAR model.

import numpy as np
import pandas as pd
import sys

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import DataStructs

from sklearn.cross_validation import train_test_split
from sklearn.cross_validation import cross_val_score
from sklearn.cross_validation import KFold
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

from keras.models import Sequential
from keras.layers import Activation, Dense, Dropout
from keras.wrappers.scikit_learn import KerasRegressor

def getFpArr( mols, nBits = 1024 ):
    fps = [ AllChem.GetMorganFingerprintAsBitVect( mol, 2, nBits=nBits ) for mol in mols ]
    X = []
    for fp in fps:
        arr = np.zeros( (1,) )
        DataStructs.ConvertToNumpyArray( fp, arr )
        X.append( arr )
    return X

def getResponse( mols, prop="ACTIVITY" ):
    Y = []
    for mol in mols:
        act = mol.GetProp( prop )
        act = 9. - np.log10( float( act ) )
        Y.append( act )
    return Y

def base_model():
    model = Sequential()
    model.add( Dense( input_dim=1024, output_dim = 100 ) )
    model.add( Activation( "relu" ) )
    model.add( Dense( 100 ) )
    model.add( Activation( "relu" ) )
    model.add( Dense( 1 ) )
    #model.add( Activation( 'relu' ) )
    model.compile( loss="mean_squared_error",  optimizer="adam" )
    return model


if __name__ == '__main__':
    filename = sys.argv[1]
    sdf = [ mol for mol in Chem.SDMolSupplier( filename ) ]
    X = getFpArr( sdf )
    Y = getResponse( sdf )

    trainx, testx, trainy, testy = train_test_split( X, Y, test_size=0.2, random_state=0 )
    trainx, testx, trainy, testy = np.asarray( trainx ), np.asarray( testx ), np.asarray( trainy ), np.asarray( testy )
    estimator = KerasRegressor( build_fn = base_model,
                                nb_epoch=100,
                                batch_size=20,
                                 )
    estimator.fit( trainx, trainy )
    pred_y = estimator.predict( testx )
    r2 = r2_score( testy, pred_y )
    rmse = mean_squared_error( testy, pred_y )
    print( "KERAS: R2 : {0:f}, RMSE : {1:f}".format( r2, rmse ) )

Run the code.
I used CHEMBLdatast.

mishimasyk9 iwatobipen$ python keras_regression.py sdf/CHEMBL952131_EGFR.sdf 
Using Theano backend.
Epoch 1/100
102/102 [==============================] - 0s - loss: 62.2934     
........................   
Epoch 100/100
102/102 [==============================] - 0s - loss: 0.0123     
KERAS: R2 : 0.641975, RMSE : 0.578806

R2 is 0.64. It’s not so bad. ;-)
I pushed the script to the syk 9 repository.
https://github.com/Mishima-syk/9/tree/master/iwatobipen

Published by iwatobipen

I'm medicinal chemist in mid size of pharmaceutical company. I love chemoinfo, cording, organic synthesis, my family.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: