A few days ago, I posted a blog entry about DeepChem. I am still playing with it. Today I tried the graph convolution regression model.
DeepChem provides a graph convolution regressor. Cool.
I used solubility data provided by AstraZeneca. https://www.ebi.ac.uk/chembl/assay/inspect/CHEMBL3301364
My test code follows. It is almost the same as DeepChem's example code.
CSVLoader is very useful because it not only reads the data but also calculates the graph features of each molecule.
Next, define the graph convolution network.
import tensorflow as tf
import deepchem as dc
import numpy as np

# Read the CSV and featurize each molecule as a graph in one step.
graph_featurizer = dc.feat.graph_features.ConvMolFeaturizer()
loader = dc.data.data_loader.CSVLoader(
    tasks=['LogS'],
    smiles_field="CANONICAL_SMILES",
    id_field="CMPD_CHEMBLID",
    featurizer=graph_featurizer)
dataset = loader.featurize('./bioactivity.csv')

# Random train/test split.
splitter = dc.splits.splitters.RandomSplitter()
trainset, testset = splitter.train_test_split(dataset)

# Preset hyperparameters for the graph convolution regressor.
hp = dc.molnet.preset_hyper_parameters
param = hp.hps['graphconvreg']
print(param['batch_size'])

g = tf.Graph()
# 75 is the number of atom features.
graph_model = dc.nn.SequentialGraph(75)
graph_model.add(dc.nn.GraphConv(int(param['n_filters']), 75, activation='relu'))
graph_model.add(dc.nn.BatchNormalization(epsilon=1e-5, mode=1))
graph_model.add(dc.nn.GraphPool())
graph_model.add(dc.nn.GraphConv(int(param['n_filters']), int(param['n_filters']), activation='relu'))
graph_model.add(dc.nn.BatchNormalization(epsilon=1e-5, mode=1))
graph_model.add(dc.nn.GraphPool())
graph_model.add(dc.nn.Dense(int(param['n_fully_connected_nodes']), int(param['n_filters']), activation='relu'))
graph_model.add(dc.nn.BatchNormalization(epsilon=1e-5, mode=1))
#graph_model.add(dc.nn.GraphGather(param['batch_size'], activation='tanh'))
graph_model.add(dc.nn.GraphGather(10, activation='tanh'))

with tf.Session() as sess:
    model_graphconv = dc.models.MultitaskGraphRegressor(
        graph_model, 1, 75,
        batch_size=10,
        learning_rate=param['learning_rate'],
        optimizer_type='adam',
        beta1=.9, beta2=.999)
    model_graphconv.fit(trainset, nb_epoch=30)

# Score the model on the training set, then predict on the test set.
train_scores = {}
regression_metric = dc.metrics.Metric(dc.metrics.pearson_r2_score, np.mean)
train_scores['graphconvreg'] = model_graphconv.evaluate(trainset, [regression_metric])
p = model_graphconv.predict(testset)

print(train_scores)
Next, run the code.
root@08d8f729f78b:/deepchem/pen_test# python graphconv_test.py > datalog
And the datalog file looks like this:
Loading raw samples now.
shard_size: 8192
About to start loading CSV from ./bioactivity.csv
Loading shard 1 of size 8192.
Featurizing sample 0
Featurizing sample 1000
...
Starting epoch 29
On batch 0
On batch 50
On batch 100
computed_metrics: [0.52744994044080606]
{'graphconvreg': {'mean-pearson_r2_score': 0.52744994044080606}}
The r2 score (about 0.53, computed on the training set) is still low, but I think it can be improved by changing nb_epoch.
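For reference, the pearson_r2_score reported in the log is the squared Pearson correlation between measured and predicted values. Here is a minimal NumPy sketch of that calculation. The function name and the LogS values are made up for illustration and are not taken from the AstraZeneca data.

```python
import numpy as np


def pearson_r2(y_true, y_pred):
    """Squared Pearson correlation between measured and predicted values."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    r = np.corrcoef(y_true, y_pred)[0, 1]
    return r ** 2


# Made-up LogS values, for illustration only.
measured = [-3.0, -2.5, -4.1, -1.2, -5.0]
predicted = [-2.8, -2.9, -3.7, -1.5, -4.4]
score = pearson_r2(measured, predicted)
print(round(score, 3))
```

Note that this score only measures linear correlation. Predictions that are shifted, rescaled, or even perfectly anti-correlated still score 1.0, so it is not the same as the coefficient of determination.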
All the sample code is uploaded to GitHub.
https://github.com/iwatobipen/deeplearning/blob/master/datalog
Hi, you should try early stopping to choose the number of epochs automatically.
https://machinelearningmastery.com/early-stopping-to-avoid-overtraining-neural-network-models/
It saves time while also avoiding over-fitting the model to the training set.
Good luck!
F.
Hi Francois,
You are right!
I skipped early stopping to keep the code simple, but I should use it.
Thanks
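The early stopping idea discussed above could be sketched roughly as follows. This is a generic patience-based loop, not DeepChem code: `fit_with_early_stopping`, `train_one_epoch`, and `validate` are hypothetical names, and the validation scores are made up for illustration.

```python
def fit_with_early_stopping(train_one_epoch, validate, max_epochs=100, patience=5):
    """Train until the validation score stops improving.

    train_one_epoch: hypothetical callable that trains the model for one epoch.
    validate: hypothetical callable returning a score where higher is better.
    Returns (best_epoch, best_score, epochs_run).
    """
    best_epoch, best_score = -1, float("-inf")
    epochs_run = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        score = validate()
        epochs_run += 1
        if score > best_score:
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs in a row: stop.
            break
    return best_epoch, best_score, epochs_run


# Made-up validation curve, for illustration only.
fake_scores = iter([0.30, 0.42, 0.50, 0.53, 0.52, 0.51, 0.50, 0.49, 0.48, 0.47])
best_epoch, best_score, epochs_run = fit_with_early_stopping(
    train_one_epoch=lambda: None,
    validate=lambda: next(fake_scores),
    max_epochs=10,
    patience=3,
)
print(best_epoch, best_score, epochs_run)
```

With the model from the post, `train_one_epoch` could wrap `model_graphconv.fit(trainset, nb_epoch=1)` and `validate` could read 'mean-pearson_r2_score' from `evaluate` on a held-out validation set. That assumes repeated fit calls continue training from the current weights, which I have not verified. The sketch also does not restore the weights from the best epoch.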
Hi!
Thanks for this post :)
When running your code, I get the following error:
AttributeError: module 'deepchem' has no attribute 'nn'
Would you by any chance know anything about it?
Best,
T.
Hi, which version of DeepChem did you use? I think there have been some API changes in the current version. I will check if you can give me the version info.
Thanks
Hi!
The deepchem version I’m using is 2.1.0.
Best,
T.
Hi, I see. The version of DeepChem used in my post is old. The current version of DeepChem doesn't have nn. You can find the same issue on GitHub: https://github.com/deepchem/deepchem/issues/1257
I'm not following the current version of DeepChem, but I think the following URL will help you:
https://deepchem.io/docs/notebooks/graph_convolutional_networks_for_tox21.html
Best,
Pen
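For readers hitting the same AttributeError, here is a minimal sketch of how the model-building part might look with the newer API. It assumes a DeepChem 2.x release where `dc.models.GraphConvModel` (the class used in the Tox21 notebook linked above) replaces the `dc.nn` layers and `MultitaskGraphRegressor`. The function name and its defaults are hypothetical, the network architecture is the library's default rather than the one in the post, and the code has not been run against version 2.1.0.

```python
def fit_graphconv_regressor(trainset, n_tasks=1, nb_epoch=30, batch_size=10):
    """Train a graph convolution regressor with dc.models.GraphConvModel.

    Assumes `trainset` was featurized with ConvMolFeaturizer, as in the post.
    """
    # Imported lazily so this snippet can be loaded without DeepChem installed.
    import deepchem as dc

    model = dc.models.GraphConvModel(n_tasks, mode="regression", batch_size=batch_size)
    model.fit(trainset, nb_epoch=nb_epoch)
    return model
```

The returned model can then be scored the same way as in the post, for example with `model.evaluate(testset, [regression_metric])`.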