Visualize Molecular Similarity

A few days ago, I found a nice piece of work on visualizing molecular similarity.
Molecular similarity measures, such as the Tanimoto coefficient, are used to compare molecular structures.
For chemists, however, a bare score is sometimes difficult to interpret.
Sereina Riniker and Gregory A. Landrum reported a very nice way to represent molecular similarity.
They used RDKit, scikit-learn and matplotlib to make molecular similarity maps, and they provided the source code on the web.
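As a reminder of what the score itself means, here is a minimal sketch of the Tanimoto coefficient in plain Python. It assumes a fingerprint is just a set of on-bit indices; the function name and the toy bit sets are made up for illustration and are not taken from the paper's code or from RDKit.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    # shared bits divided by the number of bits set in either fingerprint
    return shared / (len(a) + len(b) - shared)


# Toy fingerprints (hypothetical bit indices, not real molecules)
fp_ref = {1, 4, 7, 9, 15}
fp_probe = {1, 4, 8, 15, 20, 22}
score = tanimoto(fp_ref, fp_probe)
print(round(score, 3))
```

The result is a single number between 0 and 1, which is exactly why it is hard to see which part of the molecule is responsible for it.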

So I tried the code myself.
I used the same data as the paper.

First, make a “pics” folder inside the supp1 or supp2 folder.

lion$ mkdir pics
lion$ tree
.
├── data
│   ├── cmps.dat
│   └── training_cmps.dat
├── generate_maps.py
└── pics

Then run the script.
If you are interested in the code, please get the source code and read it. :-)

lion$ python generate_maps.py 
generate atom pairs similarity maps
generate morgan2 similarity maps
generate countmorgan2 similarity maps
generate featmorgan2 similarity maps
generate random forest similarity maps
generate naive bayes similarity maps
lion$ tree
.
├── data
│   ├── cmps.dat
│   └── training_cmps.dat
├── generate_maps.py
└── pics
    ├── mol1_ap.png
    ├── mol1_cmorgan2.png
    ├── mol1_fmorgan2.png
    ├── mol1_morgan2.png
    ├── mol1_nb.png
    ├── mol1_rf.png
    ├── mol2_ap.png
    ├── mol2_cmorgan2.png
    ├── mol2_fmorgan2.png
    ├── mol2_morgan2.png
    ├── mol2_nb.png
    └── mol2_rf.png

2 directories, 15 files

Now I have similarity maps for the Morgan, atom pair, count Morgan and feature Morgan fingerprints, and for the naive Bayes and random forest models.
Some of the pictures look like this.

Atom pair similarity (mol1_ap.png)

Feature Morgan similarity (mol1_fmorgan2.png)

Reference molecule: CCCN(CCCCN1CCN(c2ccccc2OC)CC1)Cc1ccc2ccccc2c1
Compared molecule: COc1cccc2cc(C(=O)NCCCCN3CCN(c4cccc5nccnc54)CC3)oc21
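As I understand the paper, the colour of each atom in a fingerprint-based map comes from a "remove one atom" idea: the atom's weight is the similarity to the reference minus the similarity after the bits involving that atom are taken out. The sketch below shows only that idea in plain Python. The helper names and the toy bit assignments are hypothetical, and it ignores the case where a removed bit is also set by another part of the molecule.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def atom_weights(ref_bits, probe_atom_bits):
    """Leave-one-atom-out weights.

    probe_atom_bits maps an atom index to the set of bits whose
    environment includes that atom (toy data, assumed for illustration).
    """
    all_bits = set().union(*probe_atom_bits.values())
    base = tanimoto(ref_bits, all_bits)
    weights = {}
    for atom, bits in probe_atom_bits.items():
        remaining = all_bits - bits  # fingerprint without this atom's bits
        # positive: removing the atom lowers similarity (atom helps)
        # negative: removing the atom raises similarity (atom hurts)
        weights[atom] = base - tanimoto(ref_bits, remaining)
    return weights


ref_bits = {1, 2, 3, 4}
probe_atom_bits = {0: {1, 2}, 1: {2, 3}, 2: {9}}
weights = atom_weights(ref_bits, probe_atom_bits)
for atom, w in sorted(weights.items()):
    print(atom, round(w, 3))
```

In this toy case atoms 0 and 1 get positive weights and atom 2 gets a negative one, because its only bit is absent from the reference.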

The code uses random forest and naive Bayes models, but you can add an SVM or any other method if you want.
I think this work is very useful for discussing molecular similarity!
I am also interested in the calcAtomGaussians method in RDKit.
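I have not reproduced RDKit's implementation here, but the general idea of turning per-atom weights into a smooth map can be sketched as a sum of 2D Gaussians, one centred on each atom and scaled by its weight. Everything in this block (the function name, sigma, step, grid size and the toy coordinates) is an assumption for illustration, and the normalisation constant is left out.

```python
import math


def atom_gaussian_grid(coords, weights, sigma=0.3, step=0.5, size=2.0):
    """Sum weighted 2D Gaussians centred on atoms over a square grid.

    coords  : list of (x, y) atom positions
    weights : one weight per atom
    Returns a list of rows, indexed as grid[iy][ix].
    """
    n = int(round(size / step)) + 1
    grid = []
    for iy in range(n):
        y = iy * step
        row = []
        for ix in range(n):
            x = ix * step
            z = 0.0
            for (ax, ay), w in zip(coords, weights):
                d2 = (x - ax) ** 2 + (y - ay) ** 2
                # unnormalised Gaussian bump, scaled by the atom weight
                z += w * math.exp(-d2 / (2.0 * sigma ** 2))
            row.append(z)
        grid.append(row)
    return grid


# Two toy atoms: one with a positive weight, one with a negative weight
coords = [(0.5, 0.5), (1.5, 1.5)]
atom_w = [1.0, -0.5]
grid = atom_gaussian_grid(coords, atom_w)
for row in grid:
    print(" ".join(f"{z:6.2f}" for z in row))
```

A grid like this could then be passed to a contour plot, so that positive and negative regions show up in different colours around the atoms.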

Ref.
Similarity maps – a visualization strategy for molecular fingerprints and machine-learning methods
Journal of Cheminformatics 2013, 5:43 doi:10.1186/1758-2946-5-43

Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.