Benchmarking platform for generative models. #RDKit #Chemoinformatics #DeepLearning #moses

There are lots of publications about molecular generators. Each publication implements a novel algorithm, so we need a tool for comparing these models and deciding which one is better for our purposes.

I often use PCA and t-SNE for chemical space visualization, and I calculate scores such as QED, SA/SC Score and molecular properties. However, I need unified metrics, and I think Molecular Sets (MOSES) is a nice tool for that.

MOSES provides the useful metrics shown below.

  1. Fragment similarity (FRAG), which is defined as the cosine similarity between the vectors of fragment frequencies of the generated and reference sets.
  2. Scaffold similarity (SCAFF), which is defined in the same way as FRAG but uses the frequencies of scaffolds instead of fragments.
  3. Nearest neighbor similarity (SNN), which is the average Tanimoto similarity between each generated molecule and its nearest neighbor in the test set.
  4. Internal diversity (IntDiv_p), which assesses the chemical diversity within the generated set of molecules G, with p = 1 or 2. T is the Tanimoto similarity.
    IntDiv_p(G) = 1 - \sqrt[p]{\frac{1}{|G|^2} \sum_{m_1, m_2 \in G} T(m_1, m_2)^p}
  5. Fréchet ChemNet Distance (FCD), which uses the activations of the penultimate layer of ChemNet to measure the distance between the reference molecules and the generated molecules.
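
To make the similarity-based definitions above concrete, here is a minimal pure-Python sketch. It is not the MOSES implementation. It assumes each molecule has already been turned into a set of on-bit indices (for example from a Morgan fingerprint) and each set of molecules into a list of fragment or scaffold strings. The function names and the toy data are hypothetical, and int_div follows the formula as written above, so self-pairs are included.

```python
from collections import Counter
from math import sqrt


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def snn(generated, reference):
    """Mean, over generated fingerprints, of the best Tanimoto match in the reference set."""
    best = [max(tanimoto(g, r) for r in reference) for g in generated]
    return sum(best) / len(best)


def int_div(generated, p=1):
    """1 minus the p-th root of the mean of T(m1, m2)^p over all ordered pairs."""
    n = len(generated)
    total = sum(tanimoto(a, b) ** p for a in generated for b in generated)
    return 1.0 - (total / (n * n)) ** (1.0 / p)


def cosine_similarity(items_a, items_b):
    """Cosine similarity of the frequency vectors of two lists of fragments or scaffolds."""
    ca, cb = Counter(items_a), Counter(items_b)
    dot = sum(ca[k] * cb[k] for k in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


# Toy fingerprints: hypothetical bit sets, not computed from real molecules.
gen_fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
ref_fps = [{1, 2, 3}, {8, 9}]

print("SNN    :", snn(gen_fps, ref_fps))
print("IntDiv :", int_div(gen_fps, p=1))
print("IntDiv2:", int_div(gen_fps, p=2))
print("Frag   :", cosine_similarity(["a", "a", "b"], ["a", "b", "b"]))
```

Higher SNN, FRAG and SCAFF values mean the generated set looks more like the reference set, while a higher IntDiv means the generated molecules are more different from each other.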

Fortunately, the source code of MOSES is freely available. I installed moses on my PC and tested it.

The MOSES repository provides an install script, so it is easy to install moses and the required packages. I modified the script before installing because I use torch 1.2.0, while the original requires version 1.1.0. So I commented out the line 16 of

After installing the package, I used moses from a Jupyter notebook. All molecules are borrowed from the test script of the repository.

The following code is an example. Getting the metrics is easy: just call metrics.get_all_metrics(testmolecules, generatedmolecules). As shown below, the method calculates all metrics implemented in MOSES and also reports some additional properties such as the ratio of valid molecules, QED, molecular weight, etc.

import pandas as pd
from rdkit import Chem

from moses import metrics

test = ['Oc1ccccc1-c1cccc2cnccc12','COc1cccc(NC(=O)Cc2coc3ccc(OC)cc23)c1']
test_sf = ['COCc1nnc(NC(=O)COc2ccc(C(C)(C)C)cc2)s1',
gen = ['CNC', 'Oc1ccccc1-c1cccc2cnccc12',

metrics.get_all_metrics(test, gen, k=3)

>> out

{'valid': 0.8,
 'unique@3': 1.0,
 'FCD/Test': 52.58373533265676,
 'SNN/Test': 0.3152585653588176,
 'Frag/Test': 0.30000000000000004,
 'Scaf/Test': 0.5,
 'IntDiv': 0.7189187332987785,
 'IntDiv2': 0.49790709357032226,
 'Filters': 0.75,
 'logP': 4.9581881764518005,
 'SA': 0.5086898026154574,
 'QED': 0.045033731661603064,
 'NP': 0.2902816615644048,
 'weight': 14761.927533455337}
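
For readers wondering what the 'valid' and 'unique@3' entries mean, here is a rough sketch of how such numbers can be computed. It reflects my understanding that validity is the fraction of generated SMILES that can be parsed, and that uniqueness is measured on the first k valid molecules after canonicalization. That is an assumption about the behavior, not a copy of the MOSES code. The helper names and the canonicalize parameter are hypothetical; rdkit_canonicalize uses only Chem.MolFromSmiles and Chem.MolToSmiles from RDKit.

```python
def rdkit_canonicalize(smiles):
    """Return canonical SMILES, or None if RDKit cannot parse the input."""
    from rdkit import Chem  # imported lazily so the helpers below work without RDKit
    mol = Chem.MolFromSmiles(smiles)
    return None if mol is None else Chem.MolToSmiles(mol)


def fraction_valid(smiles_list, canonicalize):
    """Share of SMILES strings for which canonicalize does not return None."""
    canon = [canonicalize(s) for s in smiles_list]
    return sum(c is not None for c in canon) / len(smiles_list)


def fraction_unique(smiles_list, canonicalize, k=None):
    """Share of distinct canonical SMILES among the first k valid molecules."""
    valid = [c for c in map(canonicalize, smiles_list) if c is not None]
    if k is not None:
        valid = valid[:k]
    return len(set(valid)) / len(valid)
```

With RDKit installed, fraction_valid(gen, rdkit_canonicalize) and fraction_unique(gen, rdkit_canonicalize, k=3) can be called on any list of generated SMILES.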

I think moses is a very useful tool for checking the performance of molecular generators. Thanks to the authors for developing such a nice tool for in silico drug discovery!

Today’s code is uploaded to gist.

Enjoy chemoinformatics! ;)

6 thoughts on “Benchmarking platform for generative models. #RDKit #Chemoinformatics #DeepLearning #moses

      • I tried some metrics they suggested for distribution learning, but not the whole benchmark. Would love to know your feedback :)

      • Hi, did you try guacamol, moses, both or other?
        What kind of metrics would you like to get?
        Currently I have only tried it on my personal computer, so the dataset is very small. I need to try a larger dataset. ;)

  1. @iwatobipen, mostly it’s just metrics like validity, novelty, uniqueness and the distribution discrepancy for common descriptors like logP, SAS, QED, MW.
