New trial of AttentiveFP with new atom feature #DGL #RDKit #Chemoinformatics

Recently I posted an example of AttentiveFP and I found that atom weights doesn’t directly reflect functional groups. And I could get useful suggestion via comment from DGL developper!

And I wonder that how about to use functional group feature to train the model.

But how can I detect functional groups in the molecule? Because functional group is human defined feature.

…. Fortunately, as you know! RDKit has useful function to extract functional group automatically!

Original article is below.
An algorithm to identify functional groups in organic molecules Peter Ertl https://jcheminf.springeropen.com/articles/10.1186/s13321-017-0225-z

And the implementation was found in following URL.
https://github.com/rdkit/rdkit/tree/master/Contrib/IFG

So I used the function to define new atom featurizer. The code is below. The util function can detect functional group of molecule and add the type to atom property. It can use for atom featurizer.

#ifgutil.py
import sys
import os
from rdkit import Chem
from rdkit import RDPaths
from collections import defaultdict
from dgl.data.chem.utils import one_hot_encoding

ifg_path = os.path.join(RDPaths.RDContribDir, 'IFG')
sys.path.append(ifg_path)

import ifg


def map_fgs(mol):
    atoms = list(mol.GetAtoms())
    for atom in atoms:
        atom.SetProp("IFG_TYPE", "")
    fgs = ifg.identify_functional_groups(mol)
    for fg in fgs:
        for atmid in fg.atomIds:
            atom = mol.GetAtomWithIdx(atmid)
            atom.SetProp('IFG_TYPE', fg.type)
    return mol

def make_ifg_list(mols):
    res = set()
    for mol in mols:
        for atom in mol.GetAtoms():
            ifg = atom.GetProp('IFG_TYPE')
            res.add(ifg)
    return list(res)

def atom_ifg_one_hot(atom, allowable_set=None, encode_unknown=False):
    if allowable_set is None:
        raise Exception
    try:
        ifg = atom.GetProp('IFG_TYPE')
    except:
        print('get IFG TYPE at First')
        
    return one_hot_encoding(ifg, allowable_set, encode_unknown=encode_unknown)

And I used the featurize for AttentiveFP training.

Whole code is uploaded to my gist. ;)

AttentiveFP with IFG.

In this case, atom weights does not reflect functional group but seems model can catch up some feature of functional group I think.

AttentiveFP uses GRU so learning process is complex. I would like to apply the featurizer more simple algorithm such as GCN.