Scaffold growing with RNN #RDKit #Pytorch #Chemoinformatics

My favorite molecular generator is REINVENT, a SMILES-based RNN generator, because it is very flexible and easy to modify.

Recently, the same group at AstraZeneca published a new version of REINVENT titled "SMILES-Based Deep Generative Scaffold Decorator for De-Novo Drug Design".

This is very exciting for me! There are many molecular generators these days, but few implementations of scaffold growing. Some approaches use conditional RNNs or conditional graph-based models, but they can't specify the positions where substituents grow. Their implementation, however, can decorate user-defined positions.

Fortunately, the code is available on GitHub!

Their approach has two models: a scaffold generator and a scaffold decorator. The scaffold decorator is trained on compound fragments that pass the Rule of 3. I think this is an important point, because it prevents the generation of overly large, undruggable compounds.
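For readers who want to apply a similar filter to their own fragments, here is a minimal sketch of a Rule-of-3 check. The thresholds (MW ≤ 300, cLogP ≤ 3, HBD ≤ 3, HBA ≤ 3, rotatable bonds ≤ 3) are an assumption based on the common Rule-of-3 definition, not necessarily the exact filter the authors used. The names `RO3_LIMITS`, `passes_rule_of_three` and `ro3_props` are hypothetical helpers; RDKit is imported lazily so the check itself stays pure Python.

```python
# Assumed Rule-of-3 thresholds; the paper's exact filter may differ.
RO3_LIMITS = {"mw": 300.0, "logp": 3.0, "hbd": 3, "hba": 3, "rotb": 3}


def passes_rule_of_three(props: dict) -> bool:
    """Return True if every property is at or below its assumed Rule-of-3 limit."""
    return all(props[key] <= limit for key, limit in RO3_LIMITS.items())


def ro3_props(smiles: str) -> dict:
    """Compute the properties used by passes_rule_of_three with RDKit (imported lazily)."""
    from rdkit import Chem
    from rdkit.Chem import Crippen, Descriptors, Lipinski

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    return {
        "mw": Descriptors.MolWt(mol),
        "logp": Crippen.MolLogP(mol),
        "hbd": Lipinski.NumHDonors(mol),
        "hba": Lipinski.NumHAcceptors(mol),
        "rotb": Lipinski.NumRotatableBonds(mol),
    }
```

Usage, assuming RDKit is installed: `passes_rule_of_three(ro3_props("c1ccc2ncccc2c1"))` checks quinoline itself. Keeping the threshold logic separate from the descriptor calculation makes it easy to swap in a different property set.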

OK, let's test the code. First, the code uses PySpark, so you need to install Spark and PySpark beforehand. PySpark (at least the version I used) works with Java 8 but not with Java 11, so please check your Java version.

Another key point is that the decorator model uses an attention mechanism. With attention, the model can find the positions where the substituents should be attached.

I think this is a very reasonable and efficient approach for scaffold decoration.
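To make the attention idea concrete, here is a toy scaled dot-product attention step in plain Python. It illustrates the general technique under simplifying assumptions (a single query vector, keys and values given as plain lists, no learned projections) and is not the decorator model's actual attention layer. Think of the query as the decoder state while a substituent is being written, and the keys/values as encoder outputs for the scaffold tokens; a high weight marks the scaffold token the model is focusing on.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    query: list of d floats (e.g. the decoder state)
    keys, values: one vector per scaffold token (e.g. encoder outputs)
    Returns (context vector, attention weights over the scaffold tokens).
    """
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the value vectors.
    context = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return context, weights
```

For example, `dot_product_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])` gives the larger weight to the first token, because its key points in the same direction as the query.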

You can read more details in the original article, so I'll skip the explanation and go straight to the code!

Are you ready? Let’s go.
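Given the Java 8 vs. Java 11 issue mentioned above, a quick pre-flight check of the Java version can save some debugging time. This is a minimal sketch with hypothetical helper names (`parse_java_major`, `java_major_version`); it assumes `java` is on your PATH and relies on `java -version` printing its version string to stderr.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from `java -version` output.

    Handles the legacy "1.8.0_292" style (-> 8) and the newer "11.0.2" / "17" styles.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("Could not find a Java version string")
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def java_major_version() -> int:
    """Run `java -version` (which writes to stderr) and parse the major version."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)
```

Calling `java_major_version()` before starting Spark tells you whether you are on Java 8 or a newer release.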

The following code is almost the same as in the original repo. I used quinoline as an example scaffold and the sample trained model for convenience.

$> git clone 
$> cd reinvent-scaffold-decorator
$> echo "c1c([*:0])cc2c(c1)c([*:1])ccn2" > scaffold.smi
$> ./ -m drd2_decorator/models/model.trained.50 -i scaffold.smi -o generated_molecules.parquet -r 32 -n 32 -d multi
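The positions to decorate are the attachment points written as `[*:n]` in scaffold.smi; in the quinoline example, `[*:0]` sits on C7 and `[*:1]` on C4. A small sanity check before sampling can confirm which labels a scaffold carries. This is a sketch with hypothetical helpers, not part of the repo: `attachment_labels` only reads the labels with a regex and does not validate the chemistry, while `attachment_neighbors` assumes RDKit is installed and reports the scaffold atom each label is bonded to.

```python
import re

# Attachment points written as [*:n] in a scaffold SMILES.
ATTACHMENT_RE = re.compile(r"\[\*:(\d+)\]")


def attachment_labels(scaffold_smiles: str) -> list:
    """Return the attachment-point numbers in the order they appear in the SMILES."""
    return [int(n) for n in ATTACHMENT_RE.findall(scaffold_smiles)]


def attachment_neighbors(scaffold_smiles: str) -> dict:
    """Map each attachment label to (atom index, symbol) of the scaffold atom it is bonded to."""
    from rdkit import Chem  # imported lazily; requires RDKit

    mol = Chem.MolFromSmiles(scaffold_smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {scaffold_smiles}")
    result = {}
    for atom in mol.GetAtoms():
        if atom.GetAtomicNum() == 0:  # dummy atom = attachment point
            neighbor = atom.GetNeighbors()[0]
            result[atom.GetAtomMapNum()] = (neighbor.GetIdx(), neighbor.GetSymbol())
    return result
```

For the scaffold above, `attachment_labels("c1c([*:0])cc2c(c1)c([*:1])ccn2")` returns `[0, 1]`, matching the two substituents expected in every decorated molecule.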

The code above takes a few minutes in my environment (GPU: GTX 1650), so a GPU machine is recommended. After the run, the generated molecules are stored in a parquet file.

The file can be read with PySpark. The result image is below. As you can see, every decorated molecule has two substituents at the defined positions. I posted about a scaffold and linker cutting method before; I think it would be interesting to combine it with this code ;)

The whole code can be found in my gist.


RDKit + deep learning frameworks are good friends for chemoinformatics! Let's enjoy!


Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.
