New fingerprint algorithm using RDKit

I often use ECFP4 for building QSAR models, because I think the algorithm captures the details of molecules well.

The original circular fingerprint can be calculated using Pipeline Pilot, but RDKit can also calculate an ECFP4-like fingerprint, called the Morgan fingerprint.
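For reference, here is a minimal sketch of how an ECFP4-like fingerprint can be calculated with RDKit. The molecule (ethanol) and the 2048-bit length are just my choices for illustration. Radius 2 corresponds to the diameter 4 in "ECFP4". Newer RDKit releases also offer a generator-based API in rdFingerprintGenerator, but I use the older AllChem call here.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Example molecule (ethanol); any valid SMILES would work here.
mol = Chem.MolFromSmiles("CCO")

# Morgan fingerprint with radius 2 (ECFP4-like), folded into 2048 bits.
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)

# Report the fingerprint length and how many bits are switched on.
print(fp.GetNumBits(), fp.GetNumOnBits())

# Indices of the bits that are on; each one comes from a hashed substructure.
print(list(fp.GetOnBits()))
```

Each on bit is only a hashed index, so two similar fragments usually end up in unrelated bits.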

BTW, when I tried to build QSAR models using Chainer, one question came up: does ECFP4 really represent molecular features?
So I looked for another way to represent molecules, and a few days ago I found an exciting report.
The title is ‘Convolutional Networks on Graphs for Learning Molecular Fingerprints’.
The URL is below.
http://arxiv.org/abs/1509.09292

The authors created a differentiable fingerprint called the ‘neural fingerprint’.
That means the fingerprint itself is optimised during training. Sounds nice! ;-)
The authors also describe the advantages of the method.


• Interpretability. Standard fingerprints encode each possible fragment completely distinctly, with no notion of similarity between fragments. In contrast, each feature of a neural graph fingerprint can be activated by similar but distinct molecular fragments, making the feature representation more meaningful.
….

This is what I was looking for.
Next, let's look at the fingerprint in more detail.

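Fig. 2 of the article shows pseudocode for the new fingerprint side by side with the current circular fingerprint.
[Fig. 2 of the paper: pseudocode of circular fingerprints and neural graph fingerprints]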

The neural fingerprint (NeuFP) takes a molecule and a radius as input, the same as ECFP, plus the weights of each layer. The radius of NeuFP corresponds to the number of neural network layers.
NeuFP uses a smooth function instead of hashing.
Finally, the authors use a softmax function instead of the indexing step. I think this step is the key point, because the softmax function is differentiable.
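To make the steps concrete, here is a rough sketch of the idea in plain Python. This is not the authors' code. It is my simplified reading of the pseudocode: the toy molecule, the one-hot atom features, the random weights, and all function names are my own assumptions, and I use a single hidden weight matrix per layer instead of one per atom degree.

```python
import math
import random

def matvec(vec, mat):
    """Multiply a row vector by a matrix given as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

def sigmoid(xs):
    """Smooth function that takes the place of hashing."""
    return [1.0 / (1.0 + math.exp(-x)) for x in xs]

def softmax(xs):
    """Differentiable replacement for the indexing step."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def neural_fingerprint(atom_features, neighbors, hidden_weights, output_weights):
    """Sum softmax contributions of every atom over every layer (radius)."""
    fp = [0.0] * len(output_weights[0][0])
    r = [list(f) for f in atom_features]
    for H, W in zip(hidden_weights, output_weights):  # one pass per layer
        new_r = []
        for a, r_a in enumerate(r):
            # Add the neighbor vectors to the atom's own vector.
            v = list(r_a)
            for n in neighbors[a]:
                v = [x + y for x, y in zip(v, r[n])]
            r_new = sigmoid(matvec(v, H))
            new_r.append(r_new)
            # Soft "bit assignment": spread the atom over all fingerprint slots.
            contribution = softmax(matvec(r_new, W))
            fp = [f + c for f, c in zip(fp, contribution)]
        r = new_r  # update all atoms at once after the layer is finished
    return fp

def random_matrix(rows, cols, rng):
    """Hypothetical untrained weights; in the paper these are learned."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Toy molecule: heavy atoms of ethanol (C-C-O), features are one-hot [is_C, is_O].
atom_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}

rng = random.Random(0)
radius, n_feat, n_hidden, fp_len = 2, 2, 4, 8
hidden_weights = [random_matrix(n_feat if i == 0 else n_hidden, n_hidden, rng)
                  for i in range(radius)]
output_weights = [random_matrix(n_hidden, fp_len, rng) for _ in range(radius)]

fp = neural_fingerprint(atom_features, neighbors, hidden_weights, output_weights)
print([round(x, 3) for x in fp])
```

Every operation here is smooth, so the weights could be trained by gradient descent with an autodiff tool. With random weights like these, the output is just an untrained real-valued vector.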

The report shows several prediction examples, and the neural fingerprint performed well in them.

Fortunately, the source code has been uploaded to GitHub, so readers who are interested in the report can try the neural fingerprint themselves.

There are some examples in the repo.
visualize.py visualizes the neural fingerprint. It’s very cool.
I hope the documentation will be improved in the near future.

Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinfo, coding, organic synthesis, and my family.
