Select diverse molecules from a dataset.

When I select a subset from a compound library, I think about molecular diversity, so I often use clustering. There are many clustering methods in chemoinformatics. Today, though, I tried another approach: a genetic algorithm (GA). My strategy is to score the diversity of a set of molecules as the sum of the dissimilarities between them. Let's start. For convenience, I used RDKit's first_prop_200.sdf as the dataset.
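To make the strategy concrete, here is a minimal sketch of the idea, not the exact code used in this post. It assumes each molecule is already represented by a fingerprint stored as a Python set of on-bit indices (in practice these would come from RDKit), takes dissimilarity to be 1 minus Tanimoto similarity, and sums it over all pairs in the subset. The function names (`tanimoto`, `diversity`, `mutate`, `crossover`, `select_diverse`) and the GA settings are hypothetical choices for illustration.

```python
import random
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def diversity(subset, fps):
    """Fitness: sum of pairwise dissimilarity (1 - Tanimoto) over the subset."""
    return sum(1.0 - tanimoto(fps[i], fps[j]) for i, j in combinations(subset, 2))


def mutate(subset, n_total, rng):
    """Drop one selected index and add one index that is not in the subset."""
    child = set(subset)
    child.remove(rng.choice(sorted(child)))
    child.add(rng.choice([i for i in range(n_total) if i not in child]))
    return frozenset(child)


def crossover(p1, p2, k, rng):
    """Draw k indices from the union of two parents, keeping the size fixed."""
    return frozenset(rng.sample(sorted(p1 | p2), k))


def select_diverse(fps, k, pop_size=30, generations=100, seed=0):
    """Evolve index subsets of size k toward a higher diversity score."""
    rng = random.Random(seed)
    n = len(fps)
    pop = [frozenset(rng.sample(range(n), k)) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism), refill with offspring.
        pop.sort(key=lambda s: diversity(s, fps), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            children.append(mutate(crossover(p1, p2, k, rng), n, rng))
        pop = parents + children
    return max(pop, key=lambda s: diversity(s, fps))


if __name__ == "__main__":
    # Toy fingerprints standing in for real molecules (assumed data).
    toy_fps = [{1, 2, 3}, {1, 2, 4}, {5, 6}, {7, 8}, {1, 2, 3, 4}]
    best = select_diverse(toy_fps, k=3)
    print(sorted(best), diversity(best, toy_fps))
```

Each individual in the population is a fixed-size set of molecule indices, so crossover and mutation are written to keep the subset size constant. A GA is a heuristic, so the result is a good subset rather than a guaranteed optimum.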