I often use ECFP4 to build QSAR models, because I think the algorithm captures the details of molecules well.
The original circular fingerprint can be calculated with Pipeline Pilot, but RDKit can also calculate an ECFP4-like fingerprint called the Morgan fingerprint.
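For reference, this is roughly how an ECFP4-like Morgan fingerprint can be computed with RDKit. Radius 2 corresponds to ECFP4, because the 4 in ECFP4 is the diameter. The 2048-bit length and the two example molecules are my arbitrary choices, and the bits will not line up one-to-one with Pipeline Pilot's ECFP4. Newer RDKit releases may also print a deprecation warning for this function and suggest rdFingerprintGenerator instead.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# Two example molecules (arbitrary choice): phenol and toluene.
phenol = Chem.MolFromSmiles("Oc1ccccc1")
toluene = Chem.MolFromSmiles("Cc1ccccc1")

# Morgan fingerprint with radius 2 (ECFP4-like), folded to 2048 bits.
fp_phenol = AllChem.GetMorganFingerprintAsBitVect(phenol, 2, nBits=2048)
fp_toluene = AllChem.GetMorganFingerprintAsBitVect(toluene, 2, nBits=2048)

# Number of bits set, and Tanimoto similarity between the two molecules.
sim = DataStructs.TanimotoSimilarity(fp_phenol, fp_toluene)
print(fp_phenol.GetNumOnBits(), sim)
```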
By the way, when I was building QSAR models with Chainer, one question came up: does ECFP4 really represent molecular features well?
So I looked for other ways to represent molecules, and a few days ago I found an exciting paper.
The title is 'Convolutional Networks on Graphs for Learning Molecular Fingerprints'.
The URL is as follows:
The authors created a differentiable fingerprint called the 'neural fingerprint'.
That means the fingerprint itself is optimised during training. Sounds nice! 😉
The authors also describe the advantages of the method, including:
• Interpretability. Standard fingerprints encode each possible fragment completely distinctly, with no notion of similarity between fragments. In contrast, each feature of a neural graph fingerprint can be activated by similar but distinct molecular fragments, making the feature representation more meaningful.
This is exactly what I was looking for.
Next, let's look at the fingerprint in more detail.
Like ECFP, the neural fingerprint (NeuFP) takes a molecule and a radius as input; in addition, it needs the weights of each layer. The NeuFP's radius corresponds to the number of neural-network layers.
Where ECFP hashes each atom's neighbourhood into a new identifier, NeuFP applies a smooth (differentiable) function instead.
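To make the difference concrete, here is a toy comparison of the two kinds of per-atom update. Both functions are my own simplified illustrations, not code from the paper or from Pipeline Pilot: the hashed version uses SHA-1 over atomic numbers as stand-in identifiers, whereas real ECFP hashes richer atom invariants and bond information.

```python
import hashlib
import numpy as np

def hashed_update(atom_id, neighbor_ids):
    """ECFP-style step (simplified): hash the atom's identifier together
    with its sorted neighbour identifiers into a new integer identifier."""
    key = ",".join(map(str, [atom_id] + sorted(neighbor_ids)))
    return int(hashlib.sha1(key.encode()).hexdigest()[:8], 16)

def smooth_update(r_a, neighbor_rs, H):
    """Neural-fingerprint-style step: sum the feature vectors, multiply by
    a weight matrix and apply a sigmoid, which is differentiable in r and H."""
    v = r_a + np.sum(neighbor_rs, axis=0)
    return 1.0 / (1.0 + np.exp(-(v @ H)))

# Any change to the environment gives an unrelated integer identifier,
# so there is no notion of "nearby" identifiers.
id_1 = hashed_update(6, [6, 8])
id_2 = hashed_update(6, [6, 7])

# A tiny change to the input features gives only a tiny change in the output.
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 4))          # random weights, for illustration only
r_a = np.array([1.0, 0.0, 0.0, 1.0])
nbrs = [np.array([0.0, 1.0, 0.0, 1.0])]
out_1 = smooth_update(r_a, nbrs, H)
out_2 = smooth_update(r_a + 1e-4, nbrs, H)

print(id_1, id_2)
print(np.abs(out_1 - out_2).max())
```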
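Finally, the authors replace the indexing step (where ECFP maps each identifier to a bit position) with a softmax. I think this step is the key point, because the softmax is differentiable, so gradients can flow through the whole fingerprint.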
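Putting the pieces together, here is a minimal NumPy sketch of the forward pass, written from my reading of the paper's algorithm rather than from the authors' code. As I understand the paper, there is one hidden matrix per layer and atom degree plus one output matrix per layer, and I follow that here. The toy three-atom "molecule", its atom features, the fingerprint length and the random weights are all made up for illustration, and all atoms are updated synchronously within each layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Subtracting the max avoids overflow and does not change the result.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def neural_fingerprint(atom_features, neighbors, H, W):
    """Sketch of a neural graph fingerprint forward pass.

    atom_features: (n_atoms, F) initial atom feature vectors (hypothetical).
    neighbors:     list of neighbour-index lists, one per atom.
    H:             (R, max_degree + 1, F, F) hidden weights per layer and degree.
    W:             (R, F, S) output weights per layer.
    Returns a real-valued fingerprint of length S.
    """
    n_layers, fp_length = H.shape[0], W.shape[2]
    r = atom_features.astype(float)
    f = np.zeros(fp_length)
    for layer in range(n_layers):          # radius = number of layers
        new_r = np.empty_like(r)
        for a, nbrs in enumerate(neighbors):
            # Sum the atom's features with its neighbours' features.
            v = r[a] + r[nbrs].sum(axis=0)
            # Smooth update in place of hashing; matrix chosen by atom degree.
            new_r[a] = sigmoid(v @ H[layer, len(nbrs)])
            # Softmax in place of "index = identifier mod S": a soft write
            # spread over all S positions instead of setting a single bit.
            f += softmax(new_r[a] @ W[layer])
        r = new_r                          # synchronous update of all atoms
    return f

# Toy example: heavy atoms of ethanol, C-C-O, with made-up 4-dim features.
atom_features = np.array([
    [1, 0, 1, 0],   # C
    [1, 0, 0, 1],   # C
    [0, 1, 1, 0],   # O
])
neighbors = [[1], [0, 2], [1]]

rng = np.random.default_rng(0)
R, F, S, max_degree = 2, 4, 16, 4
H = rng.normal(scale=0.5, size=(R, max_degree + 1, F, F))
W = rng.normal(scale=0.5, size=(R, F, S))

fp = neural_fingerprint(atom_features, neighbors, H, W)
print(fp.round(3))
```

Because each softmax sums to one, the entries of fp add up to (number of atoms) × R, here 3 × 2 = 6. In the paper the weights are learned by backpropagation together with the downstream model; here they are random, so the values themselves carry no chemical meaning.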
The paper shows several prediction examples, and the neural fingerprint performed well in them.
Fortunately, the source code is available on GitHub, so readers who are interested in the paper can try the neural fingerprint themselves.
The repository includes some examples.
visualize.py visualizes the neural fingerprint. It's very cool.
I hope the documentation will be improved in the near future.