Recently I posted an example of AttentiveFP and found that the atom weights don't directly reflect functional groups. I also got a useful suggestion in a comment from a DGL developer!
So I wondered: what about using functional group features to train the model?
But how can I detect functional groups in a molecule? A functional group is a human-defined feature, after all.
…Fortunately, as you know, RDKit has a useful function that extracts functional groups automatically!
The original article is below.
An algorithm to identify functional groups in organic molecules Peter Ertl https://jcheminf.springeropen.com/articles/10.1186/s13321-017-0225-z
And an implementation can be found at the following URL.
So I used the function to define a new atom featurizer. The code is below. The utility functions detect the functional groups in a molecule and store each group's type as an atom property, which an atom featurizer can then use.
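# ifgutil.py
import sys
import os
from rdkit import Chem
from rdkit import RDPaths
from collections import defaultdict
from dgl.data.chem.utils import one_hot_encoding

# The IFG implementation lives in RDKit's Contrib directory
ifg_path = os.path.join(RDPaths.RDContribDir, 'IFG')
sys.path.append(ifg_path)
import ifg


def map_fgs(mol):
    # Give every atom an empty label first, so atoms outside any group still have the property
    atoms = list(mol.GetAtoms())
    for atom in atoms:
        atom.SetProp('IFG_TYPE', '')
    # Overwrite the label for atoms that belong to a detected functional group
    fgs = ifg.identify_functional_groups(mol)
    for fg in fgs:
        for atmid in fg.atomIds:
            atom = mol.GetAtomWithIdx(atmid)
            atom.SetProp('IFG_TYPE', fg.type)
    return mol


def make_ifg_list(mols):
    # Collect the labels seen across all molecules (call map_fgs on them first)
    res = set()
    for mol in mols:
        for atom in mol.GetAtoms():
            ifg_type = atom.GetProp('IFG_TYPE')
            res.add(ifg_type)
    return list(res)


def atom_ifg_one_hot(atom, allowable_set=None, encode_unknown=False):
    if allowable_set is None:
        raise ValueError('allowable_set is required; build it with make_ifg_list')
    try:
        ifg_type = atom.GetProp('IFG_TYPE')
    except KeyError as err:
        # Previously this only printed a message and then failed on an undefined variable
        raise RuntimeError('IFG_TYPE is not set; call map_fgs on the molecule first') from err
    return one_hot_encoding(ifg_type, allowable_set, encode_unknown=encode_unknown)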
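To make the encoding step concrete, here is a minimal pure-Python sketch of what happens once `map_fgs` has labelled the atoms. It does not call RDKit or DGL: the per-atom labels are invented for illustration, and `build_vocab` and `one_hot` are hypothetical stand-ins for `make_ifg_list` and `one_hot_encoding`. The one deliberate difference is that the vocabulary is sorted, because a list built from a `set` has no guaranteed order and the feature columns could otherwise shift between runs.

```python
# Hypothetical per-atom IFG labels for two molecules, as map_fgs might store
# them in the IFG_TYPE property ("" means the atom is in no functional group).
# The label strings are invented for illustration.
mol_labels = [
    ["", "", "C(=O)O", "C(=O)O", "C(=O)O"],
    ["", "cN", "cN", ""],
]


def build_vocab(mols):
    """Collect every label seen in the dataset, in a reproducible order."""
    seen = set()
    for labels in mols:
        seen.update(labels)
    return sorted(seen)


def one_hot(value, allowable_set):
    """Return a boolean vector that is True at the position of `value`."""
    return [value == item for item in allowable_set]


vocab = build_vocab(mol_labels)
# One feature vector per atom of the first molecule
features = [one_hot(label, vocab) for label in mol_labels[0]]
```

Each atom ends up with a vector as long as the vocabulary, so the feature size depends on which functional groups appear in the dataset the vocabulary was built from.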
And I used the featurizer for AttentiveFP training.
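Because `atom_ifg_one_hot` needs an `allowable_set`, it has to be bound to the dataset's vocabulary before it can sit next to other per-atom feature functions. The sketch below shows that pattern with `functools.partial` in plain Python. `FakeAtom`, `ifg_one_hot`, `degree_one_hot` and `concat_features` are hypothetical stand-ins, not the RDKit or DGL objects used in the gist.

```python
from functools import partial


class FakeAtom:
    """Hypothetical stand-in for an RDKit atom carrying an IFG label."""

    def __init__(self, ifg_type, degree):
        self.ifg_type = ifg_type
        self.degree = degree


def ifg_one_hot(atom, allowable_set):
    # One-hot over functional group labels; the vocabulary must be supplied
    return [atom.ifg_type == item for item in allowable_set]


def degree_one_hot(atom, allowable_set=(0, 1, 2, 3, 4)):
    # One-hot over atom degree, standing in for an ordinary atom feature
    return [atom.degree == item for item in allowable_set]


def concat_features(atom, funcs):
    """Call each feature function on the atom and join the results."""
    out = []
    for func in funcs:
        out.extend(func(atom))
    return out


vocab = ["", "C(=O)O"]  # invented vocabulary for illustration
funcs = [degree_one_hot, partial(ifg_one_hot, allowable_set=vocab)]
vec = concat_features(FakeAtom("C(=O)O", 3), funcs)
```

After binding, every function in the list takes only an atom, so the functional group feature is concatenated like any other atom feature.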
The whole code is uploaded to my gist. ;)
In this case, the atom weights do not reflect functional groups, but I think the model seems to pick up some features of the functional groups.
AttentiveFP uses a GRU, so its learning process is complex. I would like to apply the featurizer to a simpler algorithm such as GCN.