select diverse molecules from dataset.

To select a subset from a compound library, I think about molecular diversity, so I often use clustering. There are many clustering methods in chemoinformatics, but today I tried another approach: a genetic algorithm (GA). My strategy is to score the diversity of a set of molecules as the sum of the dissimilarities between them. Let's start. For convenience, I used the rdkit first_prop_200.sdf as the dataset.
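To make the scoring idea concrete, here is a minimal sketch of a "sum of dissimilarity" score. It is not the code from the post: it assumes fingerprints are plain Python sets of on-bit indices and uses Tanimoto distance as the dissimilarity, and the names `tanimoto_distance`, `diversity_score` and `toy_fps` are made up for illustration. A GA would use a score like this as the fitness of a candidate subset.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """Return 1 - Tanimoto similarity for two fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        # Two empty fingerprints: treat them as identical (assumption).
        return 0.0
    return 1.0 - len(a & b) / union


def diversity_score(fingerprints, subset):
    """Sum the pairwise dissimilarities of the molecules whose indices are in subset."""
    return sum(
        tanimoto_distance(fingerprints[i], fingerprints[j])
        for i, j in combinations(subset, 2)
    )


# Toy fingerprints (hypothetical data, not taken from the SDF file).
toy_fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {2, 3}]

print(diversity_score(toy_fps, [0, 1]))  # two similar molecules
print(diversity_score(toy_fps, [0, 2]))  # two molecules with no shared bits
```

A subset of molecules that share few bits gets a higher score, so maximising this value pushes the selection towards dissimilar compounds.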


GA with python

Genetic algorithms are sometimes used to design diverse compound library sets or to optimise hyperparameters for QSAR models, etc. I knew of only one genetic algorithm library in Python, 'pyevolve'. A few days ago I found another library, deap, which is under active development. I wrote a sample that solves the knapsack problem. Deap can be installed with pip or easy_install.
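For readers who have not seen a GA applied to the knapsack problem, here is a rough sketch of the idea. It deliberately uses only the Python standard library, not deap, so it is not the sample from the post. The item list, the capacity and every parameter value are hypothetical, and the tournament selection, one-point crossover and bit-flip mutation are just one common choice of operators.

```python
import random

# Hypothetical items as (weight, value) pairs and a hypothetical capacity.
ITEMS = [(12, 4), (2, 2), (1, 1), (1, 2), (4, 10)]
CAPACITY = 15


def total_weight(bits):
    """Total weight of the items switched on in the bit list."""
    return sum(w for (w, _), b in zip(ITEMS, bits) if b)


def fitness(bits):
    """Total value of the chosen items, or 0 if they exceed the capacity."""
    value = sum(v for (_, v), b in zip(ITEMS, bits) if b)
    return value if total_weight(bits) <= CAPACITY else 0


def run_ga(pop_size=30, generations=40, cx_prob=0.7, mut_prob=0.1, seed=0):
    """Evolve bit lists and return the best one seen with its fitness."""
    rng = random.Random(seed)
    n = len(ITEMS)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)[:]
    for _ in range(generations):
        next_pop = []
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals, copied.
            p1 = max(rng.sample(pop, 3), key=fitness)[:]
            p2 = max(rng.sample(pop, 3), key=fitness)[:]
            # One-point crossover: swap the tails after a random cut.
            if rng.random() < cx_prob:
                cut = rng.randint(1, n - 1)
                p1[cut:], p2[cut:] = p2[cut:], p1[cut:]
            # Bit-flip mutation on each child.
            for child in (p1, p2):
                for i in range(n):
                    if rng.random() < mut_prob:
                        child[i] = 1 - child[i]
                next_pop.append(child)
        pop = next_pop[:pop_size]
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best[:]
    return best, fitness(best)


best_bits, best_value = run_ga()
print(best_bits, best_value)
```

With deap, the same pieces (individual, fitness, crossover, mutation, selection) would be registered with the library instead of being written by hand.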