Transfer learning of DGMG for focused library generation #DGL #Chemoinformatics

Transfer learning is a very useful technique in deep learning, because it reuses a pre-trained model and only a small number of parameters need to be retrained.

I think it is useful for molecular generators too: if it works for a generator, it can be used for focused library generation. I posted about molecular generation with DGL before, so I tried to apply transfer learning with DGL.

First, import the packages used in this post.

import os
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit import RDPaths
from rdkit.Chem import Draw
from dgl.data.chem import utils
from dgl.model_zoo.chem import pretrain
from dgl.model_zoo.chem.dgmg import MoleculeEnv
import torch
from torch.utils.data import DataLoader
from torch.optim import Adam
import copy
# CDK2 ligand set distributed with RDKit
mols = Chem.SDMolSupplier(f"{RDPaths.RDDocsDir}/Book/data/cdk2.sdf")
# DGMG model pre-trained on ChEMBL, from the DGL model zoo
model = pretrain.load_pretrained('DGMG_ChEMBL_canonical')

Then make three copies of the DGMG model, one for each training set.

model1 = copy.deepcopy(model)
model2 = copy.deepcopy(model)
model3 = copy.deepcopy(model)

Download the utility functions for chemical structure handling from the original DGL repository.

!wget https://raw.githubusercontent.com/dmlc/dgl/master/examples/pytorch/model_zoo/chem/generative_models/dgmg/utils.py
!wget https://raw.githubusercontent.com/dmlc/dgl/master/examples/pytorch/model_zoo/chem/generative_models/dgmg/sascorer.py

Freeze all the layers, then set requires_grad back to True only for the last choose_dest_agent layer.

from utils import MoleculeDataset

# Freeze all parameters, then turn gradients back on only for the
# final choose_dest_agent layer of each copied model.
for m in (model1, model2, model3):
    for param in m.parameters():
        param.requires_grad = False
    for param in m.choose_dest_agent.parameters():
        param.requires_grad = True
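
As a quick sanity check (my addition, not in the original post), the number of trainable parameters can be compared with the total; only the choose_dest_agent parameters should still require gradients.

# Count trainable vs. total parameters after freezing
def count_params(m):
    trainable = sum(p.numel() for p in m.parameters() if p.requires_grad)
    total = sum(p.numel() for p in m.parameters())
    return trainable, total

for name, m in [('model1', model1), ('model2', model2), ('model3', model3)]:
    trainable, total = count_params(m)
    print(f'{name}: {trainable} / {total} parameters are trainable')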

For convenience, each model is trained for only 10 epochs and the generated structures are then checked. First, check the default output of the pre-trained model as a baseline.

# Sample 20 valid molecules from the pre-trained model
genmols = []
i = 0
while i < 20:
    SMILES = model(rdkit_mol=True)
    if Chem.MolFromSmiles(SMILES) is not None:
        genmols.append(Chem.MolFromSmiles(SMILES))
        i += 1
Draw.MolsToGridImage(genmols)

[Figure: grid of molecules generated by the pre-trained DGMG model]

Define the CDK2 molecules, plus a cyclic and a linear molecule, as the additional training data.

atom_types = ['O', 'Cl', 'C', 'S', 'F', 'Br', 'N']
bond_types = [Chem.rdchem.BondType.SINGLE,
              Chem.rdchem.BondType.DOUBLE,
              Chem.rdchem.BondType.TRIPLE]
env = MoleculeEnv(atom_types, bond_types)
from utils import Subset
from utils import Optimizer
subs1 = Subset([Chem.MolToSmiles(mol) for mol in mols], 'canonical', env)  # CDK2 ligands
subs2 = Subset(['C1NCOC1' for _ in range(10)], 'canonical', env)  # cyclic molecule
subs3 = Subset(['CNCOCC' for _ in range(10)], 'canonical', env)  # linear molecule
loader1 = DataLoader(subs1, 1)
loader2 = DataLoader(subs2, 1)
loader3 = DataLoader(subs3, 1)
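
As another small check (not in the original post), the sizes of the three training sets can be inspected; Subset is used with DataLoader above, so it should support len().

# Number of training SMILES in each fine-tuning set
print(len(subs1), len(subs2), len(subs3))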

The first trial uses the CDK2 molecules.

optimizer = Optimizer(0.1, Adam(model1.parameters(), 0.1))
model1.train()
for i in range(10):
    for data in loader1:
        optimizer.zero_grad()
        logp = model1(data, compute_log_prob=True)
        loss_averaged = - logp
        optimizer.backward_and_step(loss_averaged)
model1.eval()
genmols = []
i = 0
while i < 20:
    SMILES = model1(rdkit_mol=True)
    if Chem.MolFromSmiles(SMILES) is not None:
        genmols.append(Chem.MolFromSmiles(SMILES))
        i += 1
from rdkit.Chem import Draw
Draw.MolsToGridImage(genmols)

Hmm, it seems that more bicyclic compounds are generated...?
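
To check this impression a bit more quantitatively (an extra step, not in the original post), the ring counts of the generated molecules can be computed with RDKit.

from rdkit.Chem import rdMolDescriptors
# Ring count for each molecule generated by the fine-tuned model1
ring_counts = [rdMolDescriptors.CalcNumRings(m) for m in genmols]
print(sorted(ring_counts))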

Next, train on the cyclic molecule.

optimizer = Optimizer(0.1, Adam(model2.parameters(), 0.1))
model2.train()
for i in range(10):
    for data in loader2:
        optimizer.zero_grad()
        logp = model2(data, compute_log_prob=True)
        loss_averaged = - logp
        optimizer.backward_and_step(loss_averaged)
model2.eval()
genmols = []
i = 0
while i < 20:
    SMILES = model2(rdkit_mol=True)
    if Chem.MolFromSmiles(SMILES) is not None:
        genmols.append(Chem.MolFromSmiles(SMILES))
        i += 1
from rdkit.Chem import Draw
Draw.MolsToGridImage(genmols)

As expected, cyclic molecules are generated.
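
To see how strongly the fine-tuning pulled model2 toward its single training molecule (an extra check, not in the original post), the unique canonical SMILES among the generated structures can be counted.

# Unique canonical SMILES among the 20 generated molecules;
# 'C1NCOC1' was the only training example for model2
unique_smiles = set(Chem.MolToSmiles(m) for m in genmols)
print(len(unique_smiles), unique_smiles)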

Finally, let's check the linear molecule as the training data.

optimizer = Optimizer(0.1, Adam(model3.parameters(), 0.1))
model3.train()
for i in range(10):
    for data in loader3:
        optimizer.zero_grad()
        logp = model3(data, compute_log_prob=True)
        loss_averaged = - logp
        optimizer.backward_and_step(loss_averaged)
model3.eval()
genmols = []
i = 0
while i < 20:
    SMILES = model3(rdkit_mol=True)
    if Chem.MolFromSmiles(SMILES) is not None:
        genmols.append(Chem.MolFromSmiles(SMILES))
        i += 1
from rdkit.Chem import Draw
Draw.MolsToGridImage(genmols)

Wow, many linear molecules are generated.

This is a very simple example of transfer learning. I think it is a useful approach because users do not need to train a huge number of parameters from scratch.

My code is not very efficient because I don't fully understand how to train DGMG yet.

I would like to read the documentation more deeply.

Readers who are interested in the code can find the whole notebook at the following URL.

https://nbviewer.jupyter.org/github/iwatobipen/playground/blob/master/transf.ipynb
