Benchmarking platform for generative models. #RDKit #Chemoinformatics #DeepLearning #guacamol

Yesterday I posted about a benchmarking platform named ‘moses’ and found that it worked on test data. I then got a comment from @Mufei Li, a developer of DGL, suggesting that I try guacamol. I had checked guacamol before but had not tried it, so this time I installed it and gave it a try.

According to the original repo, GuacaMol is an open source Python package for benchmarking of models for de novo molecular design.

The package is developed by BenevolentAI. It can assess a de novo molecular generator on the following metrics.

  1. Validity
  2. Uniqueness
  3. Novelty
  4. Fréchet ChemNet Distance
  5. KL divergence
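
To make metrics 2 and 3 concrete, here is a minimal sketch of how uniqueness and novelty can be computed as simple set-based fractions. This is not GuacaMol's own code: the function names and the toy SMILES lists are made up for illustration, and I assume the strings are already valid, canonical SMILES (a real benchmark would canonicalize them with RDKit first).

```python
from typing import List


def uniqueness(generated: List[str]) -> float:
    """Fraction of generated strings that are distinct."""
    if not generated:
        return 0.0
    return len(set(generated)) / len(generated)


def novelty(generated: List[str], training: List[str]) -> float:
    """Fraction of distinct generated strings that are absent from the training set."""
    unique_generated = set(generated)
    if not unique_generated:
        return 0.0
    return len(unique_generated - set(training)) / len(unique_generated)


# Toy data for illustration only (assumed to be canonical SMILES already).
training_set = ["CCO", "CCN", "c1ccccc1"]
generated_set = ["CCO", "CCO", "CCC", "CCCl"]

print(uniqueness(generated_set))             # 3 distinct out of 4 samples
print(novelty(generated_set, training_set))  # 2 of the 3 distinct samples are new
```

Comparing plain strings only works if every molecule has exactly one written form, which is why canonicalization matters before counting.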

Item 5 is the point that differs from moses. The KL divergence metric is based on molecular descriptors, not on fingerprints. So a generated set with different structures but a similar distribution of molecular properties will get a high score.
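
The idea of a descriptor-based KL divergence score can be sketched as follows. This is a simplified illustration and not GuacaMol's implementation: I use plain histograms and hand-picked bin edges, the descriptor values are made-up toy numbers, and all function names are hypothetical. The final score, the mean of exp(-KL) over descriptors, follows my understanding of how the benchmark aggregates the per-descriptor divergences, so treat that as an assumption.

```python
import math
from typing import Dict, List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float], eps: float = 1e-10) -> List[float]:
    """Normalized histogram; a tiny pseudo-count keeps every bin non-zero."""
    counts = [eps] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            # Bins are half-open, except the last one, which includes its right edge.
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1.0
                break  # values outside the edges are simply ignored
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Discrete KL(P || Q) for two distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def kl_score(reference: Dict[str, List[float]],
             generated: Dict[str, List[float]],
             edges: Dict[str, List[float]]) -> float:
    """Mean of exp(-KL) over all descriptors; 1.0 means identical histograms."""
    partial = []
    for name in reference:
        p = histogram(reference[name], edges[name])
        q = histogram(generated[name], edges[name])
        partial.append(math.exp(-kl_divergence(p, q)))
    return sum(partial) / len(partial)


# Toy descriptor values for illustration only (not computed from real molecules).
edges = {"mol_weight": [0, 200, 400, 600], "logp": [-2, 0, 2, 4]}
reference = {"mol_weight": [150, 250, 320, 450], "logp": [-1.0, 0.5, 1.5, 3.0]}
same_props = {"mol_weight": [180, 260, 390, 500], "logp": [-0.5, 1.0, 1.9, 2.5]}
shifted = {"mol_weight": [450, 480, 520, 590], "logp": [2.5, 3.0, 3.5, 3.9]}

print(kl_score(reference, same_props, edges))  # same bin occupancy as the reference
print(kl_score(reference, shifted, edges))     # property distribution has shifted
```

Note that `same_props` contains different numbers from `reference` but fills the bins in the same proportions, so it scores as a perfect match. That is the behaviour described above: only the property distribution is compared, not the structures themselves.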

To test the package, I installed guacamol and ran a couple of its benchmarks.

Fortunately, guacamol can be installed via pip ;).

I tested two metrics: one was KLDivBenchmark and the other was UniquenessBenchmark. Most of my code is the same as the test code in the original repository.

I uploaded my code to my gist.

Moses and Guacamol are both useful packages for benchmarking.

It is important to get benchmark data that is compared against the same baseline. However, in a real drug discovery project, the molecular properties that are required depend on the situation of each project. So I think generator performance itself is not a big problem, because current models can already generate reasonable structures.

We need to go beyond that… ;)