New trial of AttentiveFP with new atom feature #DGL #RDKit #Chemoinformatics

Recently I posted an example of AttentiveFP and found that the atom weights don't directly reflect functional groups. I also got a useful suggestion in a comment from a DGL developer!

So I wondered: what would happen if I used functional-group features to train the model?

But how can I detect functional groups in a molecule? This is not trivial, because a functional group is a human-defined concept.

Fortunately, as you may know, RDKit has a useful contributed function that extracts functional groups automatically!

The original article is below:
Peter Ertl, "An algorithm to identify functional groups in organic molecules", Journal of Cheminformatics (2017). https://jcheminf.springeropen.com/articles/10.1186/s13321-017-0225-z

The implementation can be found at the following URL:
https://github.com/rdkit/rdkit/tree/master/Contrib/IFG
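As I understand the paper, the algorithm first marks "interesting" atoms (heteroatoms, carbons in non-aromatic C=C or C#C bonds, carbons double- or triple-bonded to a heteroatom, acetal carbons, and atoms in oxirane, aziridine and thiirane rings) and then merges connected marked atoms into one functional group. Below is a minimal pure-Python sketch of only that merge step, assuming the marking has already been done. The adjacency-dict representation and the name merge_marked_atoms are my own and are not part of the IFG code; the later steps (adding environment carbons and building the type string) are not shown.

```python
from collections import deque


def merge_marked_atoms(adjacency, marked):
    """Group marked atoms into connected components (candidate functional groups).

    adjacency: dict mapping atom index -> list of neighbour atom indices
    marked: set of atom indices already flagged by the marking rules
    Returns one sorted list of atom indices per group.
    """
    seen = set()
    groups = []
    for start in sorted(marked):
        if start in seen:
            continue
        # Breadth-first search that only walks through marked atoms.
        queue = deque([start])
        seen.add(start)
        group = []
        while queue:
            atom = queue.popleft()
            group.append(atom)
            for nbr in adjacency.get(atom, []):
                if nbr in marked and nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        groups.append(sorted(group))
    return groups


# Hypothetical chain HO-CH2-CH2-CH2-C(=O)OH
# atoms: 0 O(hydroxyl), 1 C, 2 C, 3 C, 4 C(carbonyl), 5 O(=O), 6 O(H)
adjacency = {
    0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4],
    4: [3, 5, 6], 5: [4], 6: [4],
}
# Marked by hand: the three oxygens plus the carbonyl carbon.
marked = {0, 4, 5, 6}
print(merge_marked_atoms(adjacency, marked))  # [[0], [4, 5, 6]]
```

The hydroxyl oxygen ends up alone, while the carbonyl carbon and both carboxyl oxygens merge into a single group because they are directly bonded.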

So I used this function to define a new atom featurizer. The code is below. The utility functions detect the functional groups in a molecule, store each atom's group type as an atom property, and then one-hot encode that property so it can be used as an atom feature.

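# ifgutil.py
# Helpers that label each atom with the functional group (IFG) it belongs to,
# using Peter Ertl's IFG code from RDKit's Contrib directory.
import os
import sys

from rdkit import RDPaths

try:
    # Newer releases: chemistry utilities live in DGL-LifeSci.
    from dgllife.utils import one_hot_encoding
except ImportError:
    # Older DGL releases (as used in this post) shipped them in dgl.data.chem.
    from dgl.data.chem.utils import one_hot_encoding

# The IFG module is not an installed package, so add its folder to sys.path.
ifg_path = os.path.join(RDPaths.RDContribDir, 'IFG')
sys.path.append(ifg_path)

import ifg


def map_fgs(mol):
    """Store each atom's functional-group type in the 'IFG_TYPE' atom property.

    Atoms that are not part of any functional group get an empty string.
    """
    for atom in mol.GetAtoms():
        atom.SetProp('IFG_TYPE', '')
    for fg in ifg.identify_functional_groups(mol):
        for atom_idx in fg.atomIds:
            # If an atom appears in more than one group, the last one wins.
            mol.GetAtomWithIdx(atom_idx).SetProp('IFG_TYPE', fg.type)
    return mol

def make_ifg_list(mols):
    """Collect every IFG type found in mols (after map_fgs) as a sorted list.

    Sorting keeps the one-hot column order stable between Python runs.
    """
    types = set()
    for mol in mols:
        for atom in mol.GetAtoms():
            types.add(atom.GetProp('IFG_TYPE'))
    return sorted(types)

def atom_ifg_one_hot(atom, allowable_set=None, encode_unknown=False):
    """One-hot encode the atom's IFG type over allowable_set."""
    if allowable_set is None:
        raise ValueError('allowable_set is required; build it with make_ifg_list().')
    if not atom.HasProp('IFG_TYPE'):
        raise KeyError('IFG_TYPE is not set; call map_fgs() on the molecule first.')
    fg_type = atom.GetProp('IFG_TYPE')
    return one_hot_encoding(fg_type, allowable_set, encode_unknown=encode_unknown)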
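make_ifg_list and atom_ifg_one_hot together turn per-atom IFG labels into feature vectors. The sketch below shows the same idea without RDKit or DGL so the encoding is easy to inspect. Each molecule is represented (hypothetically) as a plain list of the IFG_TYPE strings that map_fgs would write, the vocabulary is built from the training molecules only, and a trailing "unknown" column catches types that only show up at prediction time, which is roughly what encode_unknown=True is for. The names build_vocab and encode_atoms and the type strings are illustrative, not real IFG output.

```python
def build_vocab(train_mols):
    """Sorted list of IFG types seen in the training set ('' = no group)."""
    return sorted({fg_type for atom_types in train_mols for fg_type in atom_types})


def encode_atoms(atom_types, vocab):
    """One row per atom: one-hot over vocab plus a final 'unknown' column."""
    rows = []
    for fg_type in atom_types:
        row = [1 if fg_type == v else 0 for v in vocab]
        row.append(0 if fg_type in vocab else 1)  # unknown-type slot
        rows.append(row)
    return rows


# Hypothetical IFG_TYPE labels; real strings come from the IFG module.
train_mols = [
    ['', '', 'O'],         # e.g. an alcohol: only the O is in a group
    ['', 'C=O', 'C=O'],    # e.g. a ketone-like fragment
]
vocab = build_vocab(train_mols)
print(vocab)                           # ['', 'C=O', 'O']
print(encode_atoms(['', 'N'], vocab))  # [[1, 0, 0, 0], [0, 0, 0, 1]]
```

Sorting the vocabulary matters: list(set(...)) of strings can come out in a different order in each Python process because of hash randomization, which would silently shuffle the feature columns between training and prediction.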

Then I used this featurizer to train AttentiveFP.
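In DGL-LifeSci, an extra atom feature is usually plugged in by concatenating it with the existing atom features (ConcatFeaturizer) and attaching the result to the graph as node features (BaseAtomFeaturizer). The sketch below mimics only the concatenation idea in plain Python, using dicts as stand-ins for RDKit atoms so it runs without RDKit. The feature functions and labels are hypothetical; functools.partial binds allowable_set the same way you would for atom_ifg_one_hot.

```python
from functools import partial


def one_hot(value, allowable_set):
    """List of 0/1 flags, one per entry in allowable_set."""
    return [1 if value == s else 0 for s in allowable_set]


def element_one_hot(atom, allowable_set):
    # Stand-in for an element-type feature; atom is a plain dict here.
    return one_hot(atom['symbol'], allowable_set)


def ifg_one_hot(atom, allowable_set):
    # Stand-in for atom_ifg_one_hot: reads the stored IFG type label.
    return one_hot(atom['IFG_TYPE'], allowable_set)


def concat_features(funcs):
    """Return a function that concatenates the outputs of funcs for one atom."""
    def featurize(atom):
        feats = []
        for func in funcs:
            feats.extend(func(atom))
        return feats
    return featurize


featurize = concat_features([
    partial(element_one_hot, allowable_set=['C', 'N', 'O']),
    partial(ifg_one_hot, allowable_set=['', 'O']),
])
atoms = [{'symbol': 'C', 'IFG_TYPE': ''}, {'symbol': 'O', 'IFG_TYPE': 'O'}]
node_feats = [featurize(a) for a in atoms]
print(node_feats)  # [[1, 0, 0, 1, 0], [0, 0, 1, 0, 1]]
```

Each extra feature function widens every atom's vector by len(allowable_set), so the model's input node-feature size has to be increased to match.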

The whole code is uploaded to my gist. ;)


In this case, the atom weights still do not reflect functional groups, but I think the model may be capturing some features of the functional groups.

AttentiveFP uses GRUs, so its learning process is complex. Next, I would like to apply this featurizer to a simpler algorithm such as GCN.
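For reference, one GCN layer (Kipf and Welling's propagation rule) computes H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the adjacency matrix, D is the degree matrix of A + I, H holds the atom features and W is a learned weight matrix. Below is a minimal NumPy sketch on a tiny hypothetical three-atom chain whose only features are a two-column IFG one-hot. DGL provides a GraphConv layer for this; the sketch is not its implementation, just the arithmetic made visible.

```python
import numpy as np


def gcn_layer(adj, feats, weight):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    deg = a_hat.sum(axis=1)                     # degrees including self-loop
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt  # symmetric normalisation
    return np.maximum(norm_adj @ feats @ weight, 0.0)


# Hypothetical C-C-O chain; feature columns: 0 = "no group", 1 = "hydroxyl-like group".
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
feats = np.array([[1, 0],
                  [1, 0],
                  [0, 1]], dtype=float)
weight = np.eye(2)  # identity weights so the output is just the smoothed input
out = gcn_layer(adj, feats, weight)
print(np.round(out, 3))
```

With identity weights, one layer spreads the O atom's group flag to its direct neighbour (about 0.41 on the middle carbon) but not to the far carbon, so each additional layer extends the reach of the functional-group feature by one bond.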

Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.
