I previously posted about graph convolution regression using DeepChem. Today I tried graph convolution classification with it.
The code is almost the same as the regression model. The only difference is that it uses dc.models.MultitaskGraphClassifier instead of dc.models.MultitaskGraphRegressor.
I got sample data (JAK3 inhibitors) from ChEMBL and tried to build a model.
First, I used pandas to convert the activity class (active, non active) to a 0/1 label, which is easy to do.
import pandas as pd

df = pd.read_table('jak3_chembl.txt', header=0)
# pd.factorize returns a (codes, uniques) tuple; keep only the integer codes
df['activity_class'] = pd.factorize(df.ACTIVITY_COMMENT)[0]
df.to_csv('./preprocessed_jak3.csv', index=False)
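One detail worth checking here: pd.factorize assigns codes in order of first appearance, so which class becomes 0 depends on the first row of the file. The sketch below uses a made-up four-row frame (the label strings "active" and "inactive" are assumptions, not the actual ChEMBL values) to show the behaviour, and an explicit mapping as one possible alternative.

```python
import pandas as pd

# Toy stand-in for the ChEMBL export; the label strings are assumed.
df = pd.DataFrame({"ACTIVITY_COMMENT": ["active", "inactive", "active", "active"]})

# factorize numbers the labels in order of first appearance.
codes, uniques = pd.factorize(df["ACTIVITY_COMMENT"])
print(list(codes))    # [0, 1, 0, 0] -> "active" became 0 here
print(list(uniques))  # ['active', 'inactive']

# An explicit mapping pins the meaning of 0 and 1 regardless of row order.
label_map = {"inactive": 0, "active": 1}  # hypothetical mapping
df["activity_class"] = df["ACTIVITY_COMMENT"].map(label_map)
print(df["activity_class"].tolist())  # [1, 0, 1, 1]
```

With the explicit mapping, any label that is missing from label_map becomes NaN, which makes unexpected values in the column easy to spot.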
Next, I wrote the model and tested it.
import tensorflow as tf
import deepchem as dc
import numpy as np
import pandas as pd

# Featurize the SMILES strings into graph-convolution inputs (75 features per atom)
graph_featurizer = dc.feat.graph_features.ConvMolFeaturizer()
loader = dc.data.data_loader.CSVLoader( tasks=['activity_class'],
                                        smiles_field="CANONICAL_SMILES",
                                        id_field="CMPD_CHEMBLID",
                                        featurizer=graph_featurizer )
dataset = loader.featurize( './preprocessed_jak3.csv' )

# Random train/test split
splitter = dc.splits.splitters.RandomSplitter()
trainset,testset = splitter.train_test_split( dataset )

# Preset hyperparameters from MoleculeNet
hp = dc.molnet.preset_hyper_parameters
param = hp.hps[ 'graphconv' ]
print(param['batch_size'])

g = tf.Graph()
# Two GraphConv / BatchNormalization / GraphPool blocks, then a dense layer and a gather layer
graph_model = dc.nn.SequentialGraph( 75 )
graph_model.add( dc.nn.GraphConv( int(param['n_filters']), 75, activation='relu' ))
graph_model.add( dc.nn.BatchNormalization( epsilon=1e-5, mode=1 ))
graph_model.add( dc.nn.GraphPool() )
graph_model.add( dc.nn.GraphConv( int(param['n_filters']), int(param['n_filters']), activation='relu' ))
graph_model.add( dc.nn.BatchNormalization( epsilon=1e-5, mode=1 ))
graph_model.add( dc.nn.GraphPool() )
graph_model.add( dc.nn.Dense( int(param['n_fully_connected_nodes']), int(param['n_filters']), activation='relu' ))
graph_model.add( dc.nn.BatchNormalization( epsilon=1e-5, mode=1 ))
graph_model.add( dc.nn.GraphGather( 10 , activation='tanh'))

with tf.Session() as sess:
    model_graphconv = dc.models.MultitaskGraphClassifier( graph_model,
                                                          1,
                                                          75,
                                                          batch_size=10,
                                                          learning_rate = param['learning_rate'],
                                                          optimizer_type = 'adam',
                                                          beta1=.9,beta2=.999)
    model_graphconv.fit( trainset, nb_epoch=50 )

train_scores = {}
#regression_metric = dc.metrics.Metric( dc.metrics.pearson_r2_score, np.mean )
classification_metric = dc.metrics.Metric( dc.metrics.roc_auc_score, np.mean )
# ROC AUC is evaluated on the training set here
train_scores['graphconvreg'] = model_graphconv.evaluate( trainset,[ classification_metric ] )

# Print each test-set prediction next to its true label
p=model_graphconv.predict( testset )
for i in range( len(p )):
    print( p[i], testset.y[i] )
print(train_scores)
root@08d8f729f78b:/deepchem/pen_test# python graphconv_jak3.py
And the log output is:
Loading raw samples now.
shard_size: 8192
About to start loading CSV from ./preprocessed_jak3.csv
Loading shard 1 of size 8192.
Featurizing sample 0
TIMING: featurizing shard 0 took 2.023 s
TIMING: dataset construction took 3.830 s
Loading dataset from disk.
TIMING: dataset construction took 2.263 s
Loading dataset from disk.
TIMING: dataset construction took 1.147 s
Loading dataset from disk.
50
Training for 50 epochs
Starting epoch 0
On batch 0
...............
On batch 0
On batch 50
computed_metrics: [0.97176380945032259]
{'graphconvreg': {'mean-roc_auc_score': 0.97176380945032259}}
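Not so bad.
The classification model gave a better score than the regression model, though note that the ROC AUC above is computed on the training set, not on the held-out test set.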
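For readers unfamiliar with the mean-roc_auc_score in the log: ROC AUC can be read as the probability that a randomly chosen active compound is scored above a randomly chosen inactive one. The following is a minimal pure-Python sketch of that definition, not DeepChem's implementation; the function name roc_auc and the toy labels and scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy data: two inactives (0) and two actives (1) with predicted scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75: three of the four active/inactive pairs are ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so the same calculation on the test-set predictions would show how much of the training score carries over to unseen compounds.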
All the code is pushed to my GitHub repository.
https://github.com/iwatobipen/deeplearning
Thanks for the great post. I’m trying to run this code but getting an error.
When I run: "graph_model.add( dc.nn.GraphConv( int(param['n_filters']), 75, activation='relu' ))",
I get an error: "AttributeError: module 'deepchem' has no attribute 'nn'"
Any suggestion as to how to overcome it? Thanks
Hi, thanks for your query. I think this post is out of date and the DeepChem API has changed since it was written.
The dc.nn module no longer exists; the layers are now provided by the layers module. Could you please check the following documentation?
https://deepchem.readthedocs.io/en/latest/api_reference/layers.html
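As a rough sketch of how the same workflow might look on a recent DeepChem 2.x release: the snippet below uses dc.models.GraphConvModel in classification mode instead of assembling the layers by hand. It assumes TensorFlow is installed alongside DeepChem and reuses the CSV file and column names from the post. The function name, layer sizes and batch size are my own choices rather than part of the original script, and argument names may differ between versions.

```python
def train_graphconv_classifier(csv_path="./preprocessed_jak3.csv", nb_epoch=50):
    """Train a graph-convolution classifier and return train/test ROC AUC scores."""
    # Imported here so this module can be loaded without DeepChem installed.
    import deepchem as dc

    # Featurize SMILES into graph-convolution inputs.
    featurizer = dc.feat.ConvMolFeaturizer()
    loader = dc.data.CSVLoader(
        tasks=["activity_class"],
        feature_field="CANONICAL_SMILES",  # named smiles_field in older releases
        id_field="CMPD_CHEMBLID",
        featurizer=featurizer,
    )
    dataset = loader.create_dataset(csv_path)
    trainset, testset = dc.splits.RandomSplitter().train_test_split(dataset)

    # Single-task binary classifier; the sizes below are assumed, not tuned.
    model = dc.models.GraphConvModel(
        n_tasks=1,
        mode="classification",
        graph_conv_layers=[64, 64],
        dense_layer_size=128,
        batch_size=50,
    )
    model.fit(trainset, nb_epoch=nb_epoch)

    # Score both splits so overfitting is visible.
    metric = dc.metrics.Metric(dc.metrics.roc_auc_score)
    return {
        "train": model.evaluate(trainset, [metric]),
        "test": model.evaluate(testset, [metric]),
    }
```

Calling train_graphconv_classifier() returns a dictionary with one score entry per split; comparing the train and test entries gives a quick check for overfitting.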
I hope this will help you.
Thanks!