Enumerate partial heteroaromatic rings in a molecule #RDKit #Chemoinformatics

I posted hetero shuffling before. It worked well but redundant. There is a nice code in RDKit UGM2017 material. URL is below.

https://github.com/rdkit/UGM_2017/blob/master/Notebooks/Cole-Enumerate-Heterocycles.ipynb

The code defined transformation with hard coding and seems nice.

In case of real project, we sometime would like to do enumeration against partial substructure not all structure. I thought how to do it.

Fortunately RDKit can do it by setting “_protected” property of Atoms. It is worth to know (you know, the approach is described in RDKit document of course!).

Following code is almost borrowed form the UGM material. Thanks for sharing nice code. Import packages, read Reaction data and reaction objects at first.

 
from __future__ import print_function
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem import AllChem
from rdkit.Chem.Draw import IPythonConsole
import copy
import numpy as np

import pandas as pd
csvfile = './data/heterocycle_reactions.csv'

import csv
smarts_reader = csv.DictReader(open(csvfile))
REACTIONS = []
for row in smarts_reader:
    smarts = row['SMARTS']
    if not smarts:
        continue

    for product in row['CONVERT_TO'].split(','):
        reaction = smarts + '>>' + product
        REACTIONS.append(AllChem.ReactionFromSmarts(reaction))

Then define some functions. I used mol object as an input directly instead of SMILES.

, isomericSmiles=True)
if isosmi in unique:
continue
unique.add(isosmi)
Chem.SanitizeMol(newmol[0])
yield newmol[0]

def enumerate_heterocycles(mol):
start = mol
starting_points = [start]
seen = set()
while starting_points:
for newmol in get_unique_products(starting_points.pop()):
newmol_smiles = Chem.MolToSmiles(newmol)
if newmol_smiles in seen:
continue
starting_points.append(newmol)
seen.add(newmol_smiles)
yield newmol

Now ready to check it.

I used capivasertib which is kinase inhibitor as an example.

 
rwmol = Chem.RWMol(mcs_mol)

rwconf = Chem.Conformer(rwmol.GetNumAtoms())
matches = rwmol.GetSubstructMatch(mcs_mol)

ref_conf = mol1.GetConformer()
for i, match in enumerate(matches):
    print(ref_conf.GetAtomPosition(ref_match[i]).x)
    # Added atom position information from reference molecule
    rwconf.SetAtomPosition(match, ref_conf.GetAtomPosition(ref_match[i]))
rwmol.AddConformer(rwconf)

Check reference molecule and query molecule structure. I made two molobjects one is non protected and the other is protected atom excepting phenyl ring.

 
capivasertib = Chem.MolFromSmiles('c1cc(ccc1[C@H](CCO)NC(=O)C2(CCN(CC2)c3c4cc[nH]c4ncn3)N)Cl')

protected_capivasertib = copy.deepcopy(capivasertib)
atoms = protected_capivasertib.GetAtoms()
phenyl = Chem.MolFromSmiles('c1ccccc1')
mactches = protected_capivasertib.GetSubstructMatches(phenyl)
arr = np.array(mactches)
matches = arr.flatten()
for atom in atoms:
    if atom.GetIdx() not in matches:
        atom.SetProp('_protected', '1')
capivasertib
capivasertib

Let’s check it.

Enumerated hetero shuffled derivative from non protected and protected molecules. Then use ConstrainEmbed method.

Lots of molecules are generated from non protected molecule!

 
enume1 = list(enumerate_heterocycles(capivasertib))
enume2 = list(enumerate_heterocycles(protected_capivasertib))
print(len(enume1), len(enume2))
> 2592 9

And following results shows effect of “_protected” prop. It is very useful I think. RDKit has many cool features for chemoinformatics.

sourcecode language=”python” wraplines=”false” collapse=”false”]
Draw.MolsToGridImage(enume1[:10], molsPerRow=5)
Draw.MolsToGridImage(enume2[:10], molsPerRow=5)
[/sourcecode]

NON PROTECTED
PROTECTED

Lower figure shows hetero shuffled molecules at only phenyl rings.

I uploaded today’s code to my gist and repo.

https://nbviewer.jupyter.org/github/iwatobipen/playground/blob/master/Protect%20and%20enumerate%20heterocycles.ipynb

example
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.