Replace core with DeLinker #RDKit #Chemoinformatics #DeepLearning

In FBDD projects, the fragment-linking strategy is easy to understand, but I think linking two fragments in the real world is difficult. There are many tools for linking fragments virtually, and they can be applied not only to FBDD but also to scaffold hopping and similar tasks.

Compared with de novo compound (SMILES) generation, few examples of de novo fragment linking with deep learning have been reported.

Recently an interesting package was reported in JCIM. The article is open access, so readers can get the PDF from the ACS site.

The title is ‘Deep Generative Models for 3D Linker Design’ and URL is below.

The authors developed a Python package named DeLinker, which links two RDKit mol objects with a deep generative model.

The package targets Python 3.6 with TensorFlow 1.10. I wanted to test DeLinker in my development environment (Python 3.7), so I modified the code and tried it.

My environment was Python 3.7, tensorflow-gpu 1.14, and RDKit 2020.03.1.

To run in the environment above, I changed the API calls from TensorFlow 1.10 to TensorFlow 1.14: GRUCell should be called from compat.v1.nn.rnn_cell.

                        #cell = tf.contrib.rnn.GRUCell(new_h_dim)
                        cell = tf.compat.v1.nn.rnn_cell.GRUCell(new_h_dim)
                        #cell = tf.nn.rnn_cell.DropoutWrapper(cell,
                        #                state_keep_prob=self.placeholders['graph_state_keep_prob'])
                        cell = tf.compat.v1.nn.rnn_cell.DropoutWrapper(cell,
                                        state_keep_prob=self.placeholders['graph_state_keep_prob'])
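
If the same code has to run on more than one TensorFlow version, one option is to look a class up by trying several dotted paths in order instead of hard-coding one. Below is a minimal sketch of that idea. `first_available` is a hypothetical helper (it is not part of DeLinker or TensorFlow), and the demonstration uses the standard library `os` module so that it runs without TensorFlow installed.

```python
import os
from functools import reduce


def first_available(root, dotted_paths):
    """Return the first attribute reachable from `root` among `dotted_paths`.

    Hypothetical helper: each path such as "contrib.rnn.GRUCell" is resolved
    attribute by attribute, and paths that do not exist are skipped.
    """
    for path in dotted_paths:
        try:
            return reduce(getattr, path.split("."), root)
        except AttributeError:
            continue
    raise AttributeError("none of %r found on %r" % (dotted_paths, root))


# Demonstration on the standard library: the first path does not exist,
# so the helper falls back to the second one.
join = first_available(os, ["no_such_module.join", "path.join"])
print(join("a", "b"))
```

With TensorFlow, the call would look like `first_available(tf, ["compat.v1.nn.rnn_cell.GRUCell", "contrib.rnn.GRUCell"])`, using the two paths from the snippet above. I have not tested this against TensorFlow 1.10 itself, so treat it as an idea rather than a verified fix.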

After that change, I modified the example notebook from a fragment-linking task to a core-replacement task.

The following code runs in a Jupyter notebook inside the example folder. Let’s test it.
First, import the packages.

import sys

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem.Draw import MolDrawing, DrawingOptions
from rdkit.Chem import MolStandardize

import numpy as np

from itertools import product
from joblib import Parallel, delayed
import re
from collections import defaultdict

from IPython.display import clear_output
IPythonConsole.ipython_useSVG = True

from DeLinker_test import DenseGGNNChemModel
import frag_utils
import rdkit_conf_parallel
from data.prepare_data import read_file, preprocess
import example_utils
import rdkit
rdkit.__version__
> '2020.03.1'

Then add some basic settings and read the molecule from SMILES. The following code generates a 3D conformer for core replacement, because DeLinker generates linkers that preserve the relative position and angle of the fragments' linking points (exit vectors). So I need to generate a 3D structure first. Fortunately, RDKit makes this very easy. After generating the conformer, I removed the core structure. Now I have the side chains with a 3D conformation and the attachment points marked as *.

# How many cores for multiprocessing
n_cores = 4
# Whether to use GPU for generating molecules with DeLinker
use_gpu = True
vemurafenib = Chem.MolFromSmiles('CCCS(=O)(=O)Nc1ccc(F)c(c1F)C(=O)c2c[nH]c3c2cc(cn3)c4ccc(Cl)cc4')
core = Chem.MolFromSmiles('c12c(cc[NH]2)cccn1')

Draw.MolsToGridImage([vemurafenib, core])
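# Generate a 3D conformer on the H-added molecule, then strip the hydrogens
tempmol = Chem.AddHs(vemurafenib)
AllChem.EmbedMolecule(tempmol)
vemurafenib_3d = Chem.RemoveHs(tempmol)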

sidechains = Chem.ReplaceCore(vemurafenib_3d, core)

OK, now get the query data: the distance and angle between the fragments.

# Get distance and angle between fragments
dist, ang = frag_utils.compute_distance_and_angle(sidechains, "", Chem.MolToSmiles(sidechains))
Chem.MolToSmiles(sidechains), dist, ang
> 5.6219243402884125,
> 1.4035194537415576)
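
To make these two numbers more concrete, here is a minimal sketch of one plausible definition: the distance between the two anchor atoms and the angle (in radians) between the two exit vectors. This is an illustration under that assumption, not the implementation in `frag_utils`, which I have not reproduced here. The function name and the coordinates are made up for the example.

```python
import math


def distance_and_angle(anchor_a, exit_a, anchor_b, exit_b):
    """Distance between two anchor atoms and angle between their exit vectors.

    Each argument is an (x, y, z) tuple. An exit vector points from the
    anchor atom (kept in the fragment) to its attachment point (the * atom).
    """
    # Euclidean distance between the two anchor atoms
    dist = math.sqrt(sum((b - a) ** 2 for a, b in zip(anchor_a, anchor_b)))
    # Exit vectors: anchor -> attachment point
    va = [e - a for a, e in zip(anchor_a, exit_a)]
    vb = [e - a for a, e in zip(anchor_b, exit_b)]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    # Clamp to [-1, 1] to protect acos from floating point round-off
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return dist, math.acos(cos_angle)


# Toy coordinates: anchors 5 units apart, exit vectors perpendicular
dist, ang = distance_and_angle((0, 0, 0), (1, 0, 0), (3, 4, 0), (3, 5, 0))
print(dist, ang)
```

In the toy input the two exit vectors point along x and y, so the angle comes out as a right angle.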

In my example code, the file names and paths are the same as in the original example, so be careful if you follow along: the code will overwrite the original example output.

# Write data to file
data_path = "./fragments_test_data.txt"
with open(data_path, 'w') as f:
    f.write("%s %s %s" % (Chem.MolToSmiles(sidechains), dist, ang))

raw_data = read_file(data_path)
preprocess(raw_data, "zinc", "fragments_test", True)
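
To avoid overwriting the original example output, the query file could go to a temporary directory instead. The sketch below writes one line in the same `SMILES distance angle` layout used above and parses it back. `write_query` and `parse_query` are hypothetical helpers, and the SMILES and numbers are placeholders rather than the real side-chain output. I have not checked how `read_file` parses the file internally, so this only illustrates the line layout.

```python
import os
import tempfile


def write_query(path, smiles, dist, ang):
    """Write one query line: SMILES, distance and angle separated by spaces."""
    with open(path, "w") as f:
        f.write("%s %s %s" % (smiles, dist, ang))


def parse_query(path):
    """Read the query line back as (smiles, distance, angle)."""
    with open(path) as f:
        smiles, dist, ang = f.read().split()
    return smiles, float(dist), float(ang)


# A fresh temporary directory keeps the original example files untouched
tmp_dir = tempfile.mkdtemp(prefix="delinker_")
query_path = os.path.join(tmp_dir, "fragments_test_data.txt")

# Placeholder fragments and values, for illustration only
write_query(query_path, "[*]CC.[*]c1ccccc1", 5.62, 1.40)
print(parse_query(query_path))
```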

Almost there. Load the pretrained model and generate molecules.

import os
if not use_gpu:
    os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
# Arguments for DeLinker
args = defaultdict(None)
args['--dataset'] = 'zinc'
args['--config'] = '{"generation": true, \
                     "batch_size": 1, \
                     "number_of_generation_per_valid": 50, \
                     "min_atoms": 6, "max_atoms": 15, \
                     "train_file": "molecules_fragments_test.json", \
                     "valid_file": "molecules_fragments_test.json", \
                     "output_name": "DeLinker_example_generation.smi"}'
args['--freeze-graph-model'] = False
args['--restore'] = '../models/pretrained_DeLinker_model.pickle'

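# Setup model and generate molecules
model = DenseGGNNChemModel(args)
# With "generation": true in the config, train() samples molecules instead of training
model.train()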
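
The backslash-continued config string is easy to break with a stray quote or comma. A minimal alternative sketch is to build the same settings as a Python dict and serialize it with `json.dumps`. This assumes that the `--config` value is parsed as JSON, which its content suggests but which I have not verified in the DeLinker source.

```python
import json
from collections import defaultdict

# Same keys and values as the config string above
config = {
    "generation": True,
    "batch_size": 1,
    "number_of_generation_per_valid": 50,
    "min_atoms": 6,
    "max_atoms": 15,
    "train_file": "molecules_fragments_test.json",
    "valid_file": "molecules_fragments_test.json",
    "output_name": "DeLinker_example_generation.smi",
}

args = defaultdict(None)
# json.dumps turns Python True into JSON true and handles the quoting
args["--config"] = json.dumps(config)
print(args["--config"])
```

Changing a setting such as `max_atoms` then becomes a normal dict update instead of string editing.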

Let’s read the generated molecules and visualize them.

# Load molecules
generated_smiles = frag_utils.read_triples_file("./DeLinker_example_generation.smi")

in_mols = [smi[1] for smi in generated_smiles]
frag_mols = [smi[0] for smi in generated_smiles]
gen_mols = [smi[2] for smi in generated_smiles]

du = Chem.MolFromSmiles('*')
clean_frags = [Chem.MolToSmiles(Chem.RemoveHs(AllChem.ReplaceSubstructs(Chem.MolFromSmiles(smi),du,Chem.MolFromSmiles('[H]'),True)[0])) for smi in frag_mols]


# Check valid
results = []
for in_mol, frag_mol, gen_mol, clean_frag in zip(in_mols, frag_mols, gen_mols, clean_frags):
    if len(Chem.MolFromSmiles(gen_mol).GetSubstructMatch(Chem.MolFromSmiles(clean_frag)))>0:
        results.append([in_mol, frag_mol, gen_mol, clean_frag])

print("Number of generated SMILES: \t%d" % len(generated_smiles))
print("Number of valid SMILES: \t%d" % len(results))
print("%% Valid: \t\t\t%.2f%%" % (len(results)/len(generated_smiles)*100))

> Number of generated SMILES: 	500
> Number of valid SMILES: 	495
> % Valid: 			99.00%

from rdkit.Chem import Draw
from IPython.display import display
# Collect the generated molecules (third column of each result) for the grid image
gemols = []
for res in results[:100]:
    gemols.append(Chem.MolFromSmiles(res[2]))

Draw.MolsToGridImage(gemols[:30], molsPerRow=4)

The model generated new molecules with high validity, which looks nice. Images of the generated molecules are below.

The original molecule, vemurafenib, has a bicyclic core (azaindole), but the generated molecules have aliphatic or monocyclic linkers. I’m not sure whether this is due to the training set. I would like to check the training data later.

Anyway, DeLinker works for the fragment-linking task. In practice, we need to filter the generated molecules with other in silico tools such as docking, MD, etc.

It’s an interesting tool, but there are already many in silico tools for fragment linking and scaffold replacement, for example MOE, OE, Schrodinger, etc. So how should it be differentiated from them? IMHO an open source package offers a lot of flexibility. I’m looking forward to further research with the package.

Today’s code was uploaded to my gist. Thanks to the DeLinker developers!


Published by iwatobipen

I'm a medicinal chemist at a mid-sized pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.
