Select diverse molecules from a dataset

To select a subset from a compound library, I consider molecular diversity, so I often use clustering. There are many clustering methods in chemoinformatics, but today I tried another approach: a genetic algorithm (GA). My strategy is to evaluate the diversity of a set of molecules as the sum of the pairwise dissimilarities between its members. Let's start. For convenience, I used RDKit's first_prop_200.sdf as the dataset.
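The idea of scoring a subset by the sum of pairwise dissimilarities and letting a GA search for a high-scoring subset can be sketched in plain Python. This is not the original post's code. It assumes dissimilarity = 1 - Tanimoto similarity, and it uses hypothetical toy fingerprints (sets of "on" bit positions) instead of real RDKit fingerprints from first_prop_200.sdf. The GA operators (index-pool crossover, swap-in mutation, truncation selection) are illustrative choices.

```python
import itertools
import random

# Hypothetical toy fingerprints: each molecule is the set of its "on" bits.
# In practice these would be RDKit fingerprints of the SDF molecules.
FINGERPRINTS = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {10, 11, 12},
    {20, 21},
    {1, 2, 4},
    {30, 31, 32, 33},
]


def tanimoto(a, b):
    """Tanimoto similarity between two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def diversity(subset, fps):
    """Sum of pairwise dissimilarities (1 - Tanimoto) within a subset."""
    return sum(1.0 - tanimoto(fps[i], fps[j])
               for i, j in itertools.combinations(subset, 2))


def crossover(p1, p2, k, rng):
    """Child draws k unique molecule indices from the union of both parents."""
    pool = list(set(p1) | set(p2))
    return rng.sample(pool, k)


def mutate(ind, n_total, rate, rng):
    """With probability `rate`, swap each member for a molecule not yet chosen."""
    ind = list(ind)
    for pos in range(len(ind)):
        if rng.random() < rate:
            unused = [i for i in range(n_total) if i not in ind]
            if unused:
                ind[pos] = rng.choice(unused)
    return ind


def select_diverse(fps, k, pop_size=20, generations=50, mut_rate=0.2, seed=0):
    """GA over k-subsets of molecule indices, maximizing `diversity`."""
    rng = random.Random(seed)
    n = len(fps)
    pop = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the more diverse half as parents (elitist truncation selection).
        pop.sort(key=lambda s: diversity(s, fps), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            children.append(mutate(crossover(p1, p2, k, rng), n, mut_rate, rng))
        pop = parents + children
    return sorted(max(pop, key=lambda s: diversity(s, fps)))


if __name__ == "__main__":
    best = select_diverse(FINGERPRINTS, k=3)
    print(best, diversity(best, FINGERPRINTS))
```

Because the score is a sum over pairs, it grows with subset size, so comparing subsets is only meaningful at a fixed k; that is why each individual here is a fixed-size list of indices.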


GA with python

Genetic algorithms are sometimes used to design diverse compound libraries or to optimize QSAR hyperparameters, among other things. Until recently, the only Python genetic algorithm library I knew was 'pyevolve'. A few days ago I found another library, DEAP, which is under active development. I wrote a sample that solves the knapsack problem. DEAP can be installed with pip or easy_install (package name deap).
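To show what a GA for the 0/1 knapsack problem involves, here is a dependency-free sketch. It is not the post's DEAP sample: it hand-writes operators analogous to DEAP's tools.selTournament, tools.cxTwoPoint and tools.mutFlipBit so it runs without installing anything. The item weights/values, the capacity, and the "zero fitness when overweight" penalty are all hypothetical choices for illustration.

```python
import random

# Hypothetical items as (weight, value) pairs; not from the original post.
ITEMS = [(12, 4), (2, 2), (1, 1), (1, 2), (4, 10)]
CAPACITY = 15


def fitness(ind):
    """Total value of the chosen items; overweight solutions score 0."""
    weight = sum(w for (w, _), bit in zip(ITEMS, ind) if bit)
    value = sum(v for (_, v), bit in zip(ITEMS, ind) if bit)
    return value if weight <= CAPACITY else 0


def tournament(pop, k, rng):
    """Fittest of k random individuals (analogous to tools.selTournament)."""
    return max(rng.sample(pop, k), key=fitness)


def two_point_crossover(a, b, rng):
    """Swap the slice between two cut points (analogous to tools.cxTwoPoint)."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def flip_bit(ind, indpb, rng):
    """Flip each bit with probability indpb (analogous to tools.mutFlipBit)."""
    return [1 - b if rng.random() < indpb else b for b in ind]


def solve(pop_size=30, generations=40, cxpb=0.7, indpb=0.1, seed=1):
    rng = random.Random(seed)
    n = len(ITEMS)
    # Each individual is a bit list: 1 means "item is in the knapsack".
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            a, b = tournament(pop, 3, rng), tournament(pop, 3, rng)
            if rng.random() < cxpb:
                a, b = two_point_crossover(a, b, rng)
            offspring += [flip_bit(a, indpb, rng), flip_bit(b, indpb, rng)]
        pop = offspring[:pop_size]
        # Remember the best individual seen so far (like a hall of fame).
        best = max(pop + [best], key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    print(solve())
```

In DEAP the same pieces would be registered on a toolbox and driven by one of its ready-made loops; writing them out by hand just makes each step visible.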