Visualize atom weight of AttentiveFP #DGL #RDKit #Chemoinformatics

Yesterday, I posted an example of DGL (almost the same as the original example code).

I was able to build a regression model with my own dataset. Fortunately, the DGL developers provide code for visualizing the atom weights of a trained model.

This means that, after building a model with AttentiveFP, you can visualize the atom weights of a given molecule, i.e. how much each atom contributes to the target value.

I have seen many examples of this approach but had never tried it myself, so I gave it a try today.

The following code is the same as yesterday's.

%matplotlib inline 
import matplotlib.pyplot as plt
import os
from rdkit import Chem
from rdkit import RDPaths

import dgl
import numpy as np
import random
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torch.utils.data import Dataset
from dgl import model_zoo

from dgl.data.chem.utils import mol_to_complete_graph, mol_to_bigraph

from dgl.data.chem.utils import atom_type_one_hot
from dgl.data.chem.utils import atom_degree_one_hot
from dgl.data.chem.utils import atom_formal_charge
from dgl.data.chem.utils import atom_num_radical_electrons
from dgl.data.chem.utils import atom_hybridization_one_hot
from dgl.data.chem.utils import atom_total_num_H_one_hot
from dgl.data.chem.utils import one_hot_encoding
from dgl.data.chem import CanonicalAtomFeaturizer
from dgl.data.chem import CanonicalBondFeaturizer
from dgl.data.chem import ConcatFeaturizer
from dgl.data.chem import BaseAtomFeaturizer
from dgl.data.chem import BaseBondFeaturizer

from dgl.data.chem import one_hot_encoding
from dgl.data.utils import split_dataset

from functools import partial
from sklearn.metrics import roc_auc_score
def chirality(atom):
    try:
        return one_hot_encoding(atom.GetProp('_CIPCode'), ['R', 'S']) + \
               [atom.HasProp('_ChiralityPossible')]
    except:
        return [False, False] + [atom.HasProp('_ChiralityPossible')]
    
def collate_molgraphs(data):
    """Batching a list of datapoints for dataloader.
    Parameters
    ----------
    data : list of 3-tuples or 4-tuples.
        Each tuple is for a single datapoint, consisting of
        a SMILES, a DGLGraph, all-task labels and optionally
        a binary mask indicating the existence of labels.
    Returns
    -------
    smiles : list
        List of smiles
    bg : BatchedDGLGraph
        Batched DGLGraphs
    labels : Tensor of dtype float32 and shape (B, T)
        Batched datapoint labels. B is len(data) and
        T is the number of total tasks.
    masks : Tensor of dtype float32 and shape (B, T)
        Batched datapoint binary mask, indicating the
        existence of labels. If binary masks are not
        provided, return a tensor with ones.
    """
    assert len(data[0]) in [3, 4], \
        'Expect the tuple to be of length 3 or 4, got {:d}'.format(len(data[0]))
    if len(data[0]) == 3:
        smiles, graphs, labels = map(list, zip(*data))
        masks = None
    else:
        smiles, graphs, labels, masks = map(list, zip(*data))

    bg = dgl.batch(graphs)
    bg.set_n_initializer(dgl.init.zero_initializer)
    bg.set_e_initializer(dgl.init.zero_initializer)
    labels = torch.stack(labels, dim=0)
    
    if masks is None:
        masks = torch.ones(labels.shape)
    else:
        masks = torch.stack(masks, dim=0)
    return smiles, bg, labels, masks

atom_featurizer = BaseAtomFeaturizer(
                 {'hv': ConcatFeaturizer([
                  partial(atom_type_one_hot, allowable_set=[
                          'B', 'C', 'N', 'O', 'F', 'Si', 'P', 'S', 'Cl', 'As', 'Se', 'Br', 'Te', 'I', 'At'],
                    encode_unknown=True),
                  partial(atom_degree_one_hot, allowable_set=list(range(6))),
                  atom_formal_charge, atom_num_radical_electrons,
                  partial(atom_hybridization_one_hot, encode_unknown=True),
                  lambda atom: [0], # A placeholder for aromatic information,
                    atom_total_num_H_one_hot, chirality
                 ],
                )})
bond_featurizer = BaseBondFeaturizer({
                                     'he': lambda bond: [0 for _ in range(10)]
    })

train = os.path.join(RDPaths.RDDocsDir, 'Book/data/solubility.train.sdf')
test = os.path.join(RDPaths.RDDocsDir, 'Book/data/solubility.test.sdf')

# Load the molecules; the measured solubility is stored in the 'SOL' SDF property.
train_mols = [m for m in Chem.SDMolSupplier(train) if m is not None]
test_mols = [m for m in Chem.SDMolSupplier(test) if m is not None]

train_smi = [Chem.MolToSmiles(m) for m in train_mols]
train_sol = torch.tensor([float(m.GetProp('SOL')) for m in train_mols]).reshape(-1, 1)
test_smi = [Chem.MolToSmiles(m) for m in test_mols]
test_sol = torch.tensor([float(m.GetProp('SOL')) for m in test_mols]).reshape(-1, 1)

train_graph = [mol_to_bigraph(mol,
                              atom_featurizer=atom_featurizer,
                              bond_featurizer=bond_featurizer) for mol in train_mols]

test_graph = [mol_to_bigraph(mol,
                             atom_featurizer=atom_featurizer,
                             bond_featurizer=bond_featurizer) for mol in test_mols]

def run_a_train_epoch(n_epochs, epoch, model, data_loader,loss_criterion, optimizer):
    model.train()
    total_loss = 0
    losses = []
    
    for batch_id, batch_data in enumerate(data_loader):
        smiles, bg, labels, masks = batch_data
        if torch.cuda.is_available():
            bg.to(torch.device('cuda:0'))
            labels = labels.to('cuda:0')
            masks = masks.to('cuda:0')
        
        prediction = model(bg, bg.ndata['hv'], bg.edata['he'])
        loss = (loss_criterion(prediction, labels)*(masks != 0).float()).mean()
        #loss = loss_criterion(prediction, labels)
        #print(loss.shape)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        losses.append(loss.data.item())
        
    #total_score = np.mean(train_meter.compute_metric('rmse'))
    total_score = np.mean(losses)
    print('epoch {:d}/{:d}, training {:.4f}'.format( epoch + 1, n_epochs,  total_score))
    return total_score

model = model_zoo.chem.AttentiveFP(node_feat_size=39,
                                  edge_feat_size=10,
                                  num_layers=2,
                                  num_timesteps=2,
                                  graph_feat_size=200,
                                  output_size=1,
                                  dropout=0.2)
model = model.to('cuda:0')

train_loader = DataLoader(dataset=list(zip(train_smi, train_graph, train_sol)), batch_size=128, collate_fn=collate_molgraphs)
test_loader = DataLoader(dataset=list(zip(test_smi, test_graph, test_sol)), batch_size=128, collate_fn=collate_molgraphs)

loss_fn = nn.MSELoss(reduction='none')
optimizer = torch.optim.Adam(model.parameters(), lr=10 ** (-2.5), weight_decay=10 ** (-5.0),)
n_epochs = 100
epochs = []
scores = []
for e in range(n_epochs):
    score = run_a_train_epoch(n_epochs, e, model, train_loader, loss_fn, optimizer)
    epochs.append(e)
    scores.append(score)
model.eval()

OK, I built the predictive model. (Of course, the trained model can be saved with torch.save(model.state_dict(), PATH).)
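
For example, a minimal save-and-reload sketch with standard PyTorch (the file name here is just an example):

torch.save(model.state_dict(), 'attentivefp_solubility.pth')

# rebuild the same architecture and load the saved weights
model2 = model_zoo.chem.AttentiveFP(node_feat_size=39,
                                    edge_feat_size=10,
                                    num_layers=2,
                                    num_timesteps=2,
                                    graph_feat_size=200,
                                    output_size=1,
                                    dropout=0.2)
model2.load_state_dict(torch.load('attentivefp_solubility.pth'))
model2 = model2.to('cuda:0')
model2.eval()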

Let’s visualize molecules with their atom weights!

First, import the packages needed for molecule visualization.

import copy
from rdkit.Chem import rdDepictor
from rdkit.Chem.Draw import rdMolDraw2D
from IPython.display import SVG
from IPython.display import display
import matplotlib
import matplotlib.cm as cm

Then define the visualization function. The following code is borrowed from the original repository, thanks a lot. The DGL model has a get_node_weight option, which returns the node weights of the graph. The model has two GRU layers, so the timestep must be 0 or 1; in the following code I used 0 as the timestep.

def drawmol(idx, dataset, timestep):
    smiles, graph, _ = dataset[idx]
    print(smiles)
    bg = dgl.batch([graph])
    atom_feats, bond_feats = bg.ndata['hv'], bg.edata['he']
    if torch.cuda.is_available():
        print('use cuda')
        bg.to(torch.device('cuda:0'))
        atom_feats = atom_feats.to('cuda:0')
        bond_feats = bond_feats.to('cuda:0')
    
    _, atom_weights = model(bg, atom_feats, bond_feats, get_node_weight=True)
    assert timestep < len(atom_weights), 'Unexpected id for the readout round'
    atom_weights = atom_weights[timestep]
    min_value = torch.min(atom_weights)
    max_value = torch.max(atom_weights)
    atom_weights = (atom_weights - min_value) / (max_value - min_value)
    
    norm = matplotlib.colors.Normalize(vmin=0, vmax=1.28)
    cmap = cm.get_cmap('bwr')
    plt_colors = cm.ScalarMappable(norm=norm, cmap=cmap)
    atom_colors = {i: plt_colors.to_rgba(atom_weights[i].data.item()) for i in range(bg.number_of_nodes())}

    mol = Chem.MolFromSmiles(smiles)
    rdDepictor.Compute2DCoords(mol)
    drawer = rdMolDraw2D.MolDraw2DSVG(280, 280)
    drawer.SetFontSize(1)
    op = drawer.drawOptions()
    
    mol = rdMolDraw2D.PrepareMolForDrawing(mol)
    drawer.DrawMolecule(mol, highlightAtoms=range(bg.number_of_nodes()),
                             highlightBonds=[],
                             highlightAtomColors=atom_colors)
    drawer.FinishDrawing()
    svg = drawer.GetDrawingText()
    svg = svg.replace('svg:', '')
    if torch.cuda.is_available():
        atom_weights = atom_weights.to('cpu')
    return (Chem.MolFromSmiles(smiles), atom_weights.data.numpy(), svg)

Draw the test-set molecules. The model predicts solubility, and the color indicates each atom's contribution: red means a positive effect on solubility and blue a negative one.

target = test_loader.dataset
for i in range(len(target)):
    mol, aw, svg = drawmol(i, target, 0)
    display(SVG(svg))

Personally, I think a hydroxyl group has a positive effect on solubility, but the model shows that this is not always the case. Hmm, is something wrong with my code? Or do I need to think about the details of the model more carefully?
I would like to try more prediction tasks and write helper code that makes molecular visualization and model building with DGL's AttentiveFP more convenient.

Today’s whole code is uploaded below.
https://gist.github.com/iwatobipen/72a2d9dd616322f1f20469a152f2bb58

Any comments or suggestions will be highly appreciated. ;)

Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinfo, coding, organic synthesis, and my family.

18 thoughts on “Visualize atom weight of AttentiveFP #DGL #RDKit #Chemoinformatics”

  1. Great job, as always. For counter-intuitive attention weights, there are two possible explanations:
    1. After all, the models are playing with statistics and there can be a wrong correlation when the dataset cannot fully represent the underlying distribution of compounds in the chemical space.
    2. With mechanisms like graph convolution, back propagation, etc., the node features not only encode the particular atoms but also their local context.

    1. Thanks for the useful comment.
      If the dominant reason is 1., a larger training dataset should help, I think; I used a very small amount of data in this post.
      Considering explanation 2., the model has learned its own concept of functional groups, which is interesting to me. I'd like to try the approach on other datasets.

  2. That’s a great post! As for your discussion, I have a question about how to scale the atomic weights. In your post, the raw weights are converted into scaled ones in the same way as sklearn's MinMaxScaler. But if a raw atomic weight can take a negative value, the sign may be reversed. The sign (positive or negative; red or blue) is supposed to be important for intuitive visualization. In this regard, I guess it could be better to do the scaling just by dividing each value by the maximum absolute weight ([w / max(abs(weights)) for w in weights]), which may alleviate the counter-intuition a little in some cases.
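
     A rough sketch of this idea, replacing the min-max scaling inside drawmol() (atom_weights is the weight tensor returned by the model):

     # divide by the maximum absolute weight so the sign of each raw weight is preserved
     atom_weights = atom_weights / atom_weights.abs().max()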

    1. Hi, thank you for your nice suggestion!
      I agree with your opinion, and I checked the source code. The weights returned by AttentiveFP are calculated with a softmax function (if I understood DGL correctly),
      so all returned values will be positive.
      This means the current scaling process does not change the sign.
      However, I'll try your proposal later. Right now I'm writing this comment from a mobile phone...
      Thanks

      1. Perfectly correct. These are typically computed by sigmoid/softmax.

  3. “The model has two layers of GRU so timestep must be 0 or 1 following code I used 0 as timestep.”?

    Is “0 or 1” the GRU’s state? What will happen if you use 1 as the timestep?

    1. The model uses two GRU layers and “0 or 1” refers to the index of the GRU layer. Since we compute one suite of atom weights for each GRU layer, we therefore have two suites of atom weights and need to specify the atom weights to use for visualization. My bad for using the term “timestep”, which is probably a bit confusing.

  4. Hi, iwatobipen-san, and Mufei-san,

    Thanks for your quick and valuable comments! I didn't look at the implementation, but I understand that sigmoid/softmax is used to give 0-1 scores.

    I am still confused about how to distinguish “positive” and “negative” for each weight. It is not the case only for this example, but also for some others in which explainable methods are applied to QSAR models. I guess the following may somehow address the counter-intuition (the data size and/or the algorithm itself also matters, as you discussed).

    If the raw weights are calculated by sigmoid/softmax, how about scaling with 0.5, the middle point of such functions, as the center: e.g., [(w - 0.5) / max(abs(w - 0.5)) for w in weights]?
    Or,
    how about using a sequential colormap rather than a diverging colormap (see https://matplotlib.org/tutorials/colors/colormaps.html), as used in the original DGL example (cmap was set with cm.get_cmap('Oranges') there)? By doing this, only how large or small each weight is would be considered. (A rough sketch of this swap is at the end of this comment.)

    Thanks in advance!
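
    A rough sketch of the colormap swap inside drawmol(), following the original DGL example:

    norm = matplotlib.colors.Normalize(vmin=0, vmax=1.28)
    # sequential colormap: the colour only encodes how large each weight is
    cmap = cm.get_cmap('Oranges')
    plt_colors = cm.ScalarMappable(norm=norm, cmap=cmap)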

    1. Thanks for your suggestion.
      I modified my code to print the min and max values of each molecule during the visualization process (a rough sketch of the change is shown at the end of this reply).
      The URL is below.



      In the case of AttentiveFP, the min and max values are often smaller than 0.5, so scaling around 0.5 will not work.

      The middle point of the atom weights depends on the molecule; the magnitude of each atom's contribution to the molecular property differs from molecule to molecule.
      I'm not sure that 0.5 is the middle point of the weights in this case.
      I'll check other GCN visualization code.
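
      The modification is roughly the following, added inside drawmol() before the min-max scaling:

      # print the raw (unscaled) attention weight range for the current molecule
      print('raw atom weight range: {:.3f} - {:.3f}'.format(
          atom_weights.min().item(), atom_weights.max().item()))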

        1. Many thanks! Your reply clearly answered what I wanted to know! Based on these results, my first suggestion does not seem suitable here, and I see that how to do the scaling is a problem that needs further consideration.

        It was a very useful discussion! Please continue to make great posts!

  5. Does the AttentiveFP method implemented in DGL also work for classification? Is it possible for you to demonstrate how we can use it for a classification task?

    Thanks a lot!

      1. Thanks for your instruction!

        I would really appreciate it if you could demonstrate how to use it for classification tasks!

  6. This post is great!
    Can you show us how to draw a colorbar together with the highlighted graph? It would be better to have a colorbar in a research paper.
