Visualize Molecular Similarity

A few days ago, I found a nice piece of work on visualizing molecular similarity.
Molecular similarity measures, such as the Tanimoto coefficient, are used to compare molecular structures.
For chemists, however, a bare score is sometimes hard to interpret.
Gregory A. Landrum et al. reported a very nice way to represent molecular similarity visually.
They used RDKit, scikit-learn and matplotlib to make molecular similarity maps, and they made the source code available on the web.
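
To show why a single score can be hard to read, here is a minimal sketch of the Tanimoto coefficient in plain Python. It treats a fingerprint as a set of on-bit indices. The bit values and the function name `tanimoto` are made up for illustration and do not come from the paper's code.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    # Shared on-bits divided by all on-bits present in either fingerprint.
    return len(a & b) / union


# Hypothetical on-bits for a reference and a probe molecule.
fp_ref = {1, 4, 7, 9, 12}
fp_probe = {1, 4, 8, 12, 15}

score = tanimoto(fp_ref, fp_probe)
print(round(score, 3))  # 3 shared bits out of 7 in the union -> 0.429
```

The score says how much the two fingerprints overlap, but not which atoms are responsible for the overlap. That is the gap the similarity maps fill.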

So I tried the code myself.
I used the same data as the paper.

First, make a “pics” folder inside the supp1 or supp2 folder.

lion$ mkdir pics
lion$ tree
├── data
│   ├── cmps.dat
│   └── training_cmps.dat
└── pics

Then run the script.
If you are interested in the code, please get the source and read it. 🙂

lion$ python 
generate atom pairs similarity maps
generate morgan2 similarity maps
generate countmorgan2 similarity maps
generate featmorgan2 similarity maps
generate random forest similarity maps
generate naive bayes similarity maps
lion$ tree
├── data
│   ├── cmps.dat
│   └── training_cmps.dat
└── pics
    ├── mol1_ap.png
    ├── mol1_cmorgan2.png
    ├── mol1_fmorgan2.png
    ├── mol1_morgan2.png
    ├── mol1_nb.png
    ├── mol1_rf.png
    ├── mol2_ap.png
    ├── mol2_cmorgan2.png
    ├── mol2_fmorgan2.png
    ├── mol2_morgan2.png
    ├── mol2_nb.png
    └── mol2_rf.png

2 directories, 15 files

Now I have similarity maps for the Morgan, atom pair, count Morgan and feature Morgan fingerprints, and for the naive Bayes and random forest models.
Some of the pictures look like this.

[Image: atom pair similarity map]

[Image: feature Morgan similarity map]

The reference molecule is CCCN(CCCCN1CCN(c2ccccc2OC)CC1)Cc1ccc2ccccc2c1.
The compared molecule is COc1cccc2cc(C(=O)NCCCCN3CCN(c4cccc5nccnc54)CC3)oc21.
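
For these two molecules, the per-atom weights behind a map can be sketched with the SimilarityMaps module that ships with RDKit. This is not the script from the paper's supplementary material, only a small example under my own assumptions: Morgan fingerprint with radius 2 as a bit vector, and Dice similarity as the metric (RDKit's default for this function). The helper name `morgan2_bits` is mine.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem.Draw import SimilarityMaps

ref_smiles = "CCCN(CCCCN1CCN(c2ccccc2OC)CC1)Cc1ccc2ccccc2c1"
probe_smiles = "COc1cccc2cc(C(=O)NCCCCN3CCN(c4cccc5nccnc54)CC3)oc21"

ref_mol = Chem.MolFromSmiles(ref_smiles)
probe_mol = Chem.MolFromSmiles(probe_smiles)


def morgan2_bits(mol, atom_id=-1):
    # Morgan fingerprint (radius 2, bit vector). With atom_id >= 0,
    # RDKit leaves out the bits that atom contributes to.
    return SimilarityMaps.GetMorganFingerprint(
        mol, atomId=atom_id, radius=2, fpType="bv"
    )


# One weight per probe atom: similarity with all atoms minus
# similarity with that atom's bits removed.
weights = SimilarityMaps.GetAtomicWeightsForFingerprint(
    ref_mol, probe_mol, morgan2_bits, metric=DataStructs.DiceSimilarity
)

for atom, weight in zip(probe_mol.GetAtoms(), weights):
    print(atom.GetIdx(), atom.GetSymbol(), round(weight, 3))
```

A positive weight means that removing the atom lowers the similarity, so the atom supports the match with the reference. A negative weight means the opposite. The maps color each atom by these weights.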

Random forest and naive Bayes are used in the code, but you can add SVM or any other method if you want.
I think this work is very useful for discussing molecular similarity!
I am also interested in the calcAtomGaussians method in RDKit.

Similarity maps – a visualization strategy for molecular fingerprints and machine-learning methods
Journal of Cheminformatics 2013, 5:43 doi:10.1186/1758-2946-5-43

