A few days ago, I found a nice piece of work on visualizing molecular similarity.
Molecular similarity measures, such as the Tanimoto coefficient, are used to compare molecular structures.
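As a quick illustration of the Tanimoto coefficient, here is a minimal sketch in plain Python. It treats a fingerprint as a set of "on" bit indices and returns the size of the intersection divided by the size of the union. The function name and the two toy fingerprints are made up for this example; they are not from the paper or from RDKit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A & B| / |A | B| for two sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / len(union)


# Toy fingerprints (hypothetical bit indices, not real molecules).
fp_ref = {1, 4, 7, 9}
fp_probe = {1, 4, 8}

# 2 shared bits out of 5 distinct bits in total.
print(tanimoto(fp_ref, fp_probe))
```

The result is a single number between 0 and 1, which is exactly why it can be hard to see which part of a molecule is responsible for it.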
But for chemists, the scores alone are sometimes difficult to interpret.
Sereina Riniker and Gregory A. Landrum reported a very nice way to represent molecular similarity.
They used RDKit, scikit-learn and matplotlib to make molecular similarity maps, and they provided the source code on the web.
So, I tried that code.
I used the same data as in the report.
First, make a “pics” folder inside the supp1 or supp2 folder.
lion$ mkdir pics
lion$ tree
.
├── data
│   ├── cmps.dat
│   └── training_cmps.dat
├── generate_maps.py
└── pics
Then run the script.
If you are interested in the code, please get the source code and read it. :-)
lion$ python generate_maps.py
generate atom pairs similarity maps
generate morgan2 similarity maps
generate countmorgan2 similarity maps
generate featmorgan2 similarity maps
generate random forest similarity maps
generate naive bayes similarity maps
lion$ tree
.
├── data
│   ├── cmps.dat
│   └── training_cmps.dat
├── generate_maps.py
└── pics
    ├── mol1_ap.png
    ├── mol1_cmorgan2.png
    ├── mol1_fmorgan2.png
    ├── mol1_morgan2.png
    ├── mol1_nb.png
    ├── mol1_rf.png
    ├── mol2_ap.png
    ├── mol2_cmorgan2.png
    ├── mol2_fmorgan2.png
    ├── mol2_morgan2.png
    ├── mol2_nb.png
    └── mol2_rf.png

2 directories, 15 files
Now I have similarity maps for the Morgan, atom pair, count Morgan and feature Morgan fingerprints, and for the naive Bayes and random forest models.
Here are some of the pictures.
The reference molecule is CCCN(CCCCN1CCN(c2ccccc2OC)CC1)Cc1ccc2ccccc2c1.
The compared molecule is COc1cccc2cc(C(=O)NCCCCN3CCN(c4cccc5nccnc54)CC3)oc21.
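To give a feeling for where the colors in such a map come from, here is a rough sketch of the idea as I understand it from the paper: the weight of an atom is the similarity to the reference minus the similarity obtained when the bits involving that atom are removed. This is plain Python with toy data, not the code in generate_maps.py and not RDKit's implementation. The environment list, the bit indices and all function names are assumptions made up for the example.

```python
def tanimoto(a, b):
    """Tanimoto coefficient for two sets of on-bits (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fingerprint(environments, ignore_atom=None):
    """Collect the bits of all environments that do not contain ignore_atom.

    `environments` is a list of (bit, atoms) pairs, where `atoms` is the set
    of atom indices that take part in setting that bit (toy model).
    """
    return {bit for bit, atoms in environments if ignore_atom not in atoms}


def atom_weights(ref_fp, probe_envs, n_atoms, metric=tanimoto):
    """Weight of atom i = base similarity - similarity without atom i."""
    base = metric(ref_fp, fingerprint(probe_envs))
    return [base - metric(ref_fp, fingerprint(probe_envs, i))
            for i in range(n_atoms)]


# Hypothetical data: a reference fingerprint and a 3-atom probe molecule.
ref_fp = {10, 11, 12}
probe_envs = [(10, {0}), (11, {0, 1}), (20, {2})]

weights = atom_weights(ref_fp, probe_envs, n_atoms=3)
print(weights)
```

In this toy case atoms 0 and 1 get positive weights (removing them lowers the similarity), and atom 2 gets a negative weight (removing it raises the similarity). A similarity map then paints these per-atom weights onto the structure.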
Random forest and naive Bayes are used in the code, but you can add SVM or any other method if you want.
I think this work is very useful for discussing molecular similarity!
I am also interested in the calcAtomGaussians method in RDKit.
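I have not read the RDKit implementation in detail yet, so the following is only my guess at the general idea, written in plain Python: put a 2D Gaussian on each atom position, scale it by the atom weight, and sum everything on a grid. The function name, the parameters (sigma, step, lo, hi) and the coordinates are all hypothetical and do not match the RDKit signature.

```python
import math


def atom_gaussian_grid(coords, weights, sigma=0.3, step=0.25, lo=-1.0, hi=1.0):
    """Sum one weighted 2D Gaussian per atom on a square grid.

    coords  : list of (x, y) atom positions
    weights : one weight per atom (may be negative)
    Returns (axis, grid) where grid[row][col] is the value at
    (x=axis[col], y=axis[row]).
    """
    n = int(round((hi - lo) / step)) + 1
    axis = [lo + i * step for i in range(n)]
    grid = [[0.0] * n for _ in range(n)]
    for (ax, ay), w in zip(coords, weights):
        for row, y in enumerate(axis):
            for col, x in enumerate(axis):
                d2 = (x - ax) ** 2 + (y - ay) ** 2
                grid[row][col] += w * math.exp(-d2 / (2.0 * sigma ** 2))
    return axis, grid


# Two hypothetical atoms: one with a positive and one with a negative weight.
coords = [(-0.5, 0.0), (0.5, 0.0)]
weights = [0.5, -0.25]

axis, grid = atom_gaussian_grid(coords, weights)
row = axis.index(0.0)           # grid row for y = 0
print(grid[row][axis.index(-0.5)], grid[row][axis.index(0.5)])
```

The grid is positive around the first atom and negative around the second one, so drawing it as a contour or color map under the structure gives the kind of picture shown in the similarity maps.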
Ref.
Similarity maps – a visualization strategy for molecular fingerprints and machine-learning methods
Journal of Cheminformatics 2013, 5:43 doi:10.1186/1758-2946-5-43