An article on guidelines for RNN-based molecular generation #memo #chemoinformatics

I’m on summer vacation starting today. Due to the pandemic, we have no plans to travel this summer ;( I hope the situation gets better soon….

As readers know, SMILES-based de novo design is now used not only for materials design but also in drug discovery projects. Some years ago, this approach generated many invalid molecules because SMILES grammar is difficult to learn. Recently, however, RNN-based approaches work very well, and other approaches such as GAN-based, graph-based and image-based(???) methods work well too. Chemoinformaticians can also generate a focused compound set by combining an RNN generator with transfer learning.
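To make "learning SMILES grammar" a bit more concrete: a generator has to learn, among other things, to close every branch it opens and to pair every ring-closure label. The sketch below is a hypothetical, crude syntax check (`smiles_syntax_ok` is my own illustrative name) that tests only those two rules. It is an assumption-laden stand-in, not a real validity test: it ignores valence, aromaticity and atom symbols.

```python
def smiles_syntax_ok(smiles: str) -> bool:
    """Crude SMILES syntax check (illustrative only).

    Checks that branches '(' ')' are balanced and that every ring-closure
    label (single digit or '%nn') is opened and closed. Bracket atoms such
    as [NH4+] are skipped. Valence and aromaticity are NOT checked.
    """
    if not smiles:
        return False
    depth = 0             # current branch nesting level
    open_rings = set()    # ring-closure labels currently open
    in_bracket = False
    i = 0
    while i < len(smiles):
        c = smiles[i]
        if in_bracket:
            # digits inside [...] are isotopes/charges, not ring labels
            if c == "]":
                in_bracket = False
            i += 1
            continue
        if c == "[":
            in_bracket = True
        elif c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:
                return False  # closing a branch that was never opened
        elif c == "%":
            label = smiles[i + 1:i + 3]
            if len(label) != 2 or not label.isdigit():
                return False
            open_rings ^= {int(label)}  # toggle: open or close the ring
            i += 3
            continue
        elif c.isdigit():
            open_rings ^= {int(c)}
        i += 1
    return depth == 0 and not open_rings and not in_bracket


# e.g. benzene passes, an unclosed ring or branch fails
for s in ["c1ccccc1", "c1ccccc", "CC(=O"]:
    print(s, smiles_syntax_ok(s))
```

In real work the usual validity test is RDKit: `Chem.MolFromSmiles` returns `None` when it cannot parse a SMILES string, which also catches chemistry errors this sketch misses.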

I would like to introduce a nice article about guidelines for SMILES-based generators.

They investigated how the dataset and the number of epochs affect transfer learning. They used REINVENT (an RNN-based generator) and built a base model on the ChEMBL dataset. Then they performed transfer learning with several kinds of specific datasets, such as target-focused data and patent data.
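The pretrain-then-fine-tune workflow can be illustrated with a toy model. The sketch below is NOT REINVENT and not an RNN: as a deliberately simple stand-in, it uses a character bigram model whose "training" is counting transitions. "Pretraining" counts a general set; "fine-tuning" adds the focused set's counts once per epoch, so more epochs pull the model further toward the focused distribution. All function names are hypothetical. A bigram model also cannot learn SMILES grammar (paired rings, balanced branches), which is exactly why RNNs are used in practice.

```python
import random
from collections import Counter, defaultdict

BOS, EOS = "^", "$"  # start/end tokens (assumption of this toy model)


def count_bigrams(smiles_list):
    """Count character transitions, including start and end tokens."""
    counts = defaultdict(Counter)
    for s in smiles_list:
        tokens = [BOS] + list(s) + [EOS]
        for a, b in zip(tokens, tokens[1:]):
            counts[a][b] += 1
    return counts


def pretrain(general_set):
    """'Base model' trained on a large, general compound set."""
    return count_bigrams(general_set)


def fine_tune(base_counts, focused_set, epochs):
    """Toy transfer learning: copy the base model, then add the focused
    counts once per epoch. The base model itself is left unchanged."""
    model = defaultdict(Counter, {k: Counter(v) for k, v in base_counts.items()})
    focused = count_bigrams(focused_set)
    for _ in range(epochs):
        for a, nxt in focused.items():
            model[a].update(nxt)
    return model


def prob(model, a, b):
    """P(next char = b | current char = a) under the bigram model."""
    total = sum(model[a].values())
    return model[a][b] / total if total else 0.0


def sample(model, rng=None, max_len=100):
    """Sample one string character by character until EOS or max_len."""
    rng = rng or random.Random()
    out, cur = [], BOS
    for _ in range(max_len):
        nxt = model.get(cur)
        if not nxt:
            break
        chars, weights = zip(*nxt.items())
        cur = rng.choices(chars, weights=weights, k=1)[0]
        if cur == EOS:
            break
        out.append(cur)
    return "".join(out)
```

For example, pretraining on `["CCO", "CCN", "c1ccccc1"]` and fine-tuning on a single macrocycle-like string `"C1CCCCCCCCCCC1"` raises `prob(model, "C", "1")` from 0 to 0.125 after one epoch and to about 0.156 after five, which mirrors (very loosely) why the number of fine-tuning epochs matters.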

I won’t describe the details of the article here; if you are interested in it, please check it ;)

Their results are interesting to me. They indicate that a model trained on a large, general compound dataset can generate diverse, valid molecules, and that it can learn the features (distribution) of a specific compound class from a small set of compounds, for example macrocyclic compounds or compounds containing spirocyclic building blocks.
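Claims like "diverse, valid molecules" and "learned a specific feature" are usually checked with a few simple ratios. The sketch below computes commonly used ones: validity, uniqueness, novelty against the training set, and the fraction of generated molecules carrying a feature of interest. The function name and the `has_feature` predicate are hypothetical, and the validity check is passed in as a function so the sketch stays dependency-free. One assumption to flag: it compares raw strings, whereas in practice you would first canonicalize with RDKit (`Chem.MolToSmiles` writes canonical SMILES by default), since one molecule can have many SMILES strings.

```python
from typing import Callable, Dict, Iterable, List, Optional


def generation_metrics(
    generated: List[str],
    training: Iterable[str],
    is_valid: Callable[[str], bool],
    has_feature: Optional[Callable[[str], bool]] = None,
) -> Dict[str, float]:
    """Simple generator metrics on (ideally canonical) SMILES strings.

    validity     = valid / generated
    uniqueness   = unique valid / valid
    novelty      = unique valid not in training set / unique valid
    feature_rate = unique valid with the feature / unique valid
    """
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    n = len(generated)
    metrics = {
        "validity": len(valid) / n if n else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }
    if has_feature is not None:
        hits = sum(1 for s in unique if has_feature(s))
        metrics["feature_rate"] = hits / len(unique) if unique else 0.0
    return metrics
```

Comparing `feature_rate` before and after transfer learning (with `has_feature` being, say, a macrocycle or spiro-ring test built on RDKit ring information) is one way to see whether the small focused set actually shifted the model's output.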

This means that to build a focused-library generator, users don’t need a large focused training set. Instead, they need a general dataset for learning SMILES grammar and a small dataset for transfer learning.

Now we can use many open-source de novo compound generation algorithms and techniques. Is there a best way to do de novo design? No, it depends on our situation and requirements ;)

…. Many publications and codes are available these days…. I need to keep studying and keep my eyes open……


Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinfo, coding, organic synthesis, and my family.

4 thoughts on “The article about the guideline of RNN based molecular generation #memo #chemoinformatics”

  1. I’m one of your blog’s fans. I came across your blog last week and have been following it since.
    Your posts on chemoinformatics are really, really precious! Keep it up.
    I wish you and your family stay safe and sound during this COVID pandemic.
