Visualize Molecular Similarity

A few days ago, I came across a nice piece of work on visualizing molecular similarity.
Molecular similarity measures, such as the Tanimoto coefficient, are used to compare molecular structures.
For chemists, however, a bare similarity score is often hard to interpret.
Sereina Riniker and Gregory A. Landrum reported a very nice way to represent molecular similarity visually.
They used RDKit, scikit-learn and matplotlib to generate molecular similarity maps, and they made the source code available on the web.
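
As a reminder of what such a score is, here is a minimal sketch of the Tanimoto coefficient in plain Python. It assumes a fingerprint is given as a set of "on" bit indices; the two example fingerprints are made up for illustration and are not derived from real molecules.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.0
        return 0.0
    # Shared on-bits divided by all on-bits seen in either fingerprint
    return len(a & b) / union


# Hypothetical on-bit indices (illustration only)
fp_ref = {1, 4, 7, 9, 12}
fp_probe = {1, 4, 8, 12, 15, 20}

score = tanimoto(fp_ref, fp_probe)
print(round(score, 3))
```

The result is a single number between 0 and 1, which says nothing about which part of the molecule is responsible for it. That is the gap the similarity maps try to fill.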

So I tried running that code.
I used the same data as in the paper.

First, create a “pics” folder inside the supp1 or supp2 folder.

lion$ mkdir pics
lion$ tree
.
├── data
│   ├── cmps.dat
│   └── training_cmps.dat
├── generate_maps.py
└── pics

Then run the script.
If you are interested in the code, please get the source and read it. 🙂

lion$ python generate_maps.py 
generate atom pairs similarity maps
generate morgan2 similarity maps
generate countmorgan2 similarity maps
generate featmorgan2 similarity maps
generate random forest similarity maps
generate naive bayes similarity maps
lion$ tree
.
├── data
│   ├── cmps.dat
│   └── training_cmps.dat
├── generate_maps.py
└── pics
    ├── mol1_ap.png
    ├── mol1_cmorgan2.png
    ├── mol1_fmorgan2.png
    ├── mol1_morgan2.png
    ├── mol1_nb.png
    ├── mol1_rf.png
    ├── mol2_ap.png
    ├── mol2_cmorgan2.png
    ├── mol2_fmorgan2.png
    ├── mol2_morgan2.png
    ├── mol2_nb.png
    └── mol2_rf.png

2 directories, 15 files

Now I have similarity maps based on the Morgan, atom pair, count Morgan and feature Morgan fingerprints, as well as on the naive Bayes and random forest models.
Here are some of the pictures.

Atom pair similarity map (mol1_ap.png)

Feature Morgan similarity map (mol1_fmorgan2.png)

Reference molecule: CCCN(CCCCN1CCN(c2ccccc2OC)CC1)Cc1ccc2ccccc2c1
Compared molecule: COc1cccc2cc(C(=O)NCCCCN3CCN(c4cccc5nccnc54)CC3)oc21
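
To give a feeling for what the colors in such a map encode: as I understand the paper, each atom of the compared molecule gets a weight, namely the change in similarity to the reference when the bits that atom contributes to are removed. Below is a rough sketch of that idea in plain Python. The fingerprints, the bit-to-atom-environment table and the function names are all hypothetical toy data, and I use the Tanimoto coefficient here although other metrics could be swapped in. This is not the code from generate_maps.py.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def bits_without_atom(bit_envs, atom):
    """On-bits that survive when every environment containing `atom` is dropped."""
    return {
        bit
        for bit, envs in bit_envs.items()
        if any(atom not in env for env in envs)
    }


def atomic_weights(ref_bits, probe_bit_envs, n_atoms):
    """Weight of atom i = similarity(full probe) - similarity(probe without atom i)."""
    full = tanimoto(ref_bits, set(probe_bit_envs))
    return [
        full - tanimoto(ref_bits, bits_without_atom(probe_bit_envs, i))
        for i in range(n_atoms)
    ]


# Toy data (illustration only): on-bits of the reference fingerprint ...
ref_bits = {1, 2, 3, 5}
# ... and, for a 4-atom probe, which atom environments set which bit
probe_bit_envs = {
    1: [{0}],
    2: [{0, 1}],
    3: [{1, 2}],
    4: [{2}],
    6: [{2, 3}],
}

weights = atomic_weights(ref_bits, probe_bit_envs, n_atoms=4)
for idx, w in enumerate(weights):
    print(idx, round(w, 3))
```

A positive weight means that removing the atom lowers the similarity, so the atom contributes to the similarity. A negative weight means the opposite. In the toy data, atom 3 only sets a bit that the reference lacks, so its weight comes out negative.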

The code uses random forest and naive Bayes models, but you can add an SVM or any other method if you like.
I think this work is very useful for discussing molecular similarity!
I am also interested in the calcAtomGaussians method in RDKit.

Ref.
Similarity maps – a visualization strategy for molecular fingerprints and machine-learning methods
Journal of Cheminformatics 2013, 5:43 doi:10.1186/1758-2946-5-43
