Graph convolution classification with deepchem

I posted about graph convolution regression using deepchem. And today, I tried graph convolution classification using deepchem.
Code is almost same as regression model. The only a difference point is use dc.models.MultitaskGraphClassifier instead of dc.models.MultitaskGraphRegressor.
I got sample ( JAK3 inhibitor ) data from chembl and tried to make model.

At first I used pandas to convert activity class ( active, non active )to 0,1 bit. Easy to do it.

import panda as pd
import pandas as pd
df = pd.read_table('jak3_chembl.txt', header=0)
df['activity_class'] = pd.factorize( df.ACTIVITY_COMMENT )
pd.factorize( df.ACTIVITY_COMMENT )
len(pd.factorize( df.ACTIVITY_COMMENT ))
df['activity_class'] = pd.factorize( df.ACTIVITY_COMMENT )[0]

df.to_csv('./preprocessed_jak3.csv', index=False)

Next wrote model and test it.

import tensorflow as tf
import deepchem as dc
import numpy as np
import pandas as pd

graph_featurizer = dc.feat.graph_features.ConvMolFeaturizer()
loader = dc.data.data_loader.CSVLoader( tasks=['activity_class'], smiles_field="CANONICAL_SMILES", id_field="CMPD_CHEMBLID", featurizer=graph_featurizer )
dataset = loader.featurize( './preprocessed_jak3.csv' )

splitter = dc.splits.splitters.RandomSplitter()
trainset,testset = splitter.train_test_split( dataset )

hp = dc.molnet.preset_hyper_parameters
param = hp.hps[ 'graphconv' ]
print(param['batch_size'])
g = tf.Graph()
graph_model = dc.nn.SequentialGraph( 75 )
graph_model.add( dc.nn.GraphConv( int(param['n_filters']), 75, activation='relu' ))
graph_model.add( dc.nn.BatchNormalization( epsilon=1e-5, mode=1 ))
graph_model.add( dc.nn.GraphPool() )
graph_model.add( dc.nn.GraphConv( int(param['n_filters']), int(param['n_filters']), activation='relu' ))
graph_model.add( dc.nn.BatchNormalization( epsilon=1e-5, mode=1 ))
graph_model.add( dc.nn.GraphPool() )
graph_model.add( dc.nn.Dense( int(param['n_fully_connected_nodes']), int(param['n_filters']), activation='relu' ))
graph_model.add( dc.nn.BatchNormalization( epsilon=1e-5, mode=1 ))
graph_model.add( dc.nn.GraphGather( 10 , activation='tanh'))

with tf.Session() as sess:
    model_graphconv = dc.models.MultitaskGraphClassifier( graph_model,
                                                      1,
                                                      75,
                                                     batch_size=10,
                                                     learning_rate = param['learning_rate'],
                                                     optimizer_type = 'adam',
                                                     beta1=.9,beta2=.999)
    model_graphconv.fit( trainset, nb_epoch=50 )

train_scores = {}
#regression_metric = dc.metrics.Metric( dc.metrics.pearson_r2_score, np.mean )
classification_metric = dc.metrics.Metric( dc.metrics.roc_auc_score, np.mean )
train_scores['graphconvreg'] = model_graphconv.evaluate( trainset,[ classification_metric ]  )
p=model_graphconv.predict( testset )

for i in range( len(p )):
    print( p[i], testset.y[i] )

print(train_scores) 

root@08d8f729f78b:/deepchem/pen_test# python graphconv_jak3.py

And datalog file is….

Loading raw samples now.
shard_size: 8192
About to start loading CSV from ./preprocessed_jak3.csv
Loading shard 1 of size 8192.
Featurizing sample 0
TIMING: featurizing shard 0 took 2.023 s
TIMING: dataset construction took 3.830 s
Loading dataset from disk.
TIMING: dataset construction took 2.263 s
Loading dataset from disk.
TIMING: dataset construction took 1.147 s
Loading dataset from disk.
50
Training for 50 epochs
Starting epoch 0
On batch 0
...............
On batch 0
On batch 50
computed_metrics: [0.97176380945032259]
{'graphconvreg': {'mean-roc_auc_score': 0.97176380945032259}}

Not so bad.
Classification model gives better result than regression model.
All code is pushed my github repository.
https://github.com/iwatobipen/deeplearning

Published by iwatobipen

I'm medicinal chemist in mid size of pharmaceutical company. I love chemoinfo, cording, organic synthesis, my family.

2 thoughts on “Graph convolution classification with deepchem

  1. Thanks for the great post. I’m trying to run this code but getting an error.
    When I run: “graph_model.add( dc.nn.GraphConv( int(param[‘n_filters’]), 75, activation=’relu’ ))” ,
    I get an error: “AttributeError: module ‘deepchem’ has no attribute ‘nn'”

    Any suggestion as to how to overcome it? Thanks

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: