Recently, there have been many articles about de novo molecular design using deep learning, and each article uses its own metrics to evaluate the generated molecules. The most commonly used metrics are the percentage of valid molecules, the distribution of physicochemical properties, and molecular diversity based on fingerprints.
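Two of these metrics are easy to compute once the molecules have been parsed and fingerprinted. The sketch below is my own illustration, not code from any of those articles. `percent_valid` assumes a parser that returns `None` for an invalid SMILES string (RDKit's `Chem.MolFromSmiles` behaves this way). `internal_diversity` assumes each fingerprint is already available as a set of on-bit indices, and it reports the mean pairwise Tanimoto distance. Both function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Iterable, Optional, Sequence, Set


def percent_valid(smiles: Sequence[str], parse: Callable[[str], Optional[object]]) -> float:
    """Percentage of SMILES strings that the given parser accepts (parser returns None if invalid)."""
    if not smiles:
        raise ValueError("empty list of SMILES")
    n_valid = sum(parse(s) is not None for s in smiles)
    return 100.0 * n_valid / len(smiles)


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = a | b
    if not union:
        return 1.0  # convention: two empty fingerprints count as identical
    return len(a & b) / len(union)


def internal_diversity(fps: Iterable[Set[int]]) -> float:
    """Mean pairwise Tanimoto distance (1 - similarity) over all fingerprint pairs."""
    pairs = list(combinations(list(fps), 2))
    if not pairs:
        raise ValueError("need at least two fingerprints")
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)
```

For example, `internal_diversity([{1, 2, 3}, {1, 2, 3}, {4, 5}])` averages the distances 0, 1 and 1, giving about 0.667.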
The following article describes a new metric for evaluating generative models for molecules.
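The authors proposed the “Fréchet ChemNet Distance” (FCD), which measures the distance between the distribution of generated molecules and the distribution of ChEMBL molecules. ChemNet is a neural network trained to predict biological activities, so its internal representation captures both chemical and biological properties of a molecule. This is why FCD can serve as an indicator of drug-likeness.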
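Concretely, each molecule is mapped to a vector of ChemNet hidden-layer activations, a Gaussian is fitted to each set of vectors (mean m, covariance C), and the Fréchet distance between the two Gaussians is FCD = ||m − m_w||² + Tr(C + C_w − 2(C·C_w)^{1/2}). Below is a minimal NumPy sketch of that last step, assuming the activation matrices have already been computed (running ChemNet itself is not shown, and the function names are my own). Instead of a general matrix square root it uses the identity Tr((C₁C₂)^{1/2}) = Tr((C₁^{1/2} C₂ C₁^{1/2})^{1/2}), whose argument is symmetric positive semidefinite and therefore numerically well behaved.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semidefinite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(mat)
    w = np.clip(w, 0.0, None)  # remove tiny negative eigenvalues caused by rounding
    return (v * np.sqrt(w)) @ v.T


def frechet_distance(act1: np.ndarray, act2: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two activation matrices.

    act1, act2: arrays of shape (n_molecules, n_features), e.g. hidden-layer
    activations for generated molecules and for reference (ChEMBL) molecules.
    """
    act1 = np.asarray(act1, dtype=float)
    act2 = np.asarray(act2, dtype=float)
    mu1, mu2 = act1.mean(axis=0), act2.mean(axis=0)
    c1 = np.cov(act1, rowvar=False)
    c2 = np.cov(act2, rowvar=False)
    # Tr((C1 C2)^{1/2}) computed from the eigenvalues of the PSD matrix C1^{1/2} C2 C1^{1/2}
    s1 = _sqrtm_psd(c1)
    eig = np.clip(np.linalg.eigvalsh(s1 @ c2 @ s1), 0.0, None)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(c1) + np.trace(c2) - 2.0 * np.sqrt(eig).sum())
```

As a sanity check, two identical sets give a distance of about 0, and shifting every activation by 1 in each of 5 dimensions (covariances unchanged) gives ||m − m_w||² = 5.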
The authors built their own LSTM-based generative model and benchmarked it against several previously reported generative models.
In Fig. 3, their generative model (JKU net) shows the best performance (the smallest FCD). Interestingly, the GAN-based model ORGAN did not perform well.
Their model uses an end token and encodes multi-character atom symbols as single characters, e.g. Cl => L.
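A minimal sketch of this kind of preprocessing, assuming a simple string-replacement scheme. Only Cl => L comes from the article; the Br => R mapping and the newline end token are my own hypothetical choices. The naive reverse replacement assumes the SMILES contain no bracket atoms with an L or R in them (such as [Li] or [Ru]), which would otherwise be corrupted on decoding.

```python
# Hypothetical mapping: only "Cl" -> "L" is from the article; "Br" -> "R" is an assumed extra example.
MULTI_CHAR = {"Cl": "L", "Br": "R"}
END_TOKEN = "\n"  # assumed end-of-sequence symbol (a character that never appears in SMILES)


def encode(smiles: str) -> str:
    """Replace multi-character atom symbols with single characters and append the end token."""
    for multi, single in MULTI_CHAR.items():
        smiles = smiles.replace(multi, single)
    return smiles + END_TOKEN


def decode(encoded: str) -> str:
    """Strip the end token and restore the original multi-character atom symbols."""
    if encoded.endswith(END_TOKEN):
        encoded = encoded[: -len(END_TOKEN)]
    for multi, single in MULTI_CHAR.items():
        encoded = encoded.replace(single, multi)
    return encoded
```

With this encoding, each character of the encoded string is exactly one token for the LSTM, so "CCCl" becomes the four tokens C, C, L and the end token.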
FCD looks like a useful metric for evaluating generative models. I would like to test the method ASAP.
BTW, I wonder what the FCD between ChEMBL and a pharmaceutical company's compound collection would be.
Readers who work in pharma, why not try it? ;-)
The details of the Fréchet distance are described below.