Enumerate partial heteroaromatic rings in a molecule #RDKit #Chemoinformatics

I posted hetero shuffling before. It worked well but redundant. There is a nice code in RDKit UGM2017 material. URL is below.

https://github.com/rdkit/UGM_2017/blob/master/Notebooks/Cole-Enumerate-Heterocycles.ipynb

The code defined transformation with hard coding and seems nice.

In case of real project, we sometime would like to do enumeration against partial substructure not all structure. I thought how to do it.

Fortunately RDKit can do it by setting “_protected” property of Atoms. It is worth to know (you know, the approach is described in RDKit document of course!).

Following code is almost borrowed form the UGM material. Thanks for sharing nice code. Import packages, read Reaction data and reaction objects at first.

 
from __future__ import print_function
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem import AllChem
from rdkit.Chem.Draw import IPythonConsole
import copy
import numpy as np

import pandas as pd
csvfile = './data/heterocycle_reactions.csv'

import csv
smarts_reader = csv.DictReader(open(csvfile))
REACTIONS = []
for row in smarts_reader:
    smarts = row['SMARTS']
    if not smarts:
        continue

    for product in row['CONVERT_TO'].split(','):
        reaction = smarts + '>>' + product
        REACTIONS.append(AllChem.ReactionFromSmarts(reaction))

Then define some functions. I used mol object as an input directly instead of SMILES.

def get_unique_products(mol):
    unique = set()
    for rxn in REACTIONS:
        for newmol in rxn.RunReactants((mol,)):
            isosmi = Chem.MolToSmiles(newmol[0], isomericSmiles=True)
            if isosmi in unique:
                continue
            unique.add(isosmi)
            Chem.SanitizeMol(newmol[0])
            yield newmol[0]

def enumerate_heterocycles(mol):
    start = mol
    starting_points = [start]
    seen = set()
    while starting_points:
        for newmol in get_unique_products(starting_points.pop()):
            newmol_smiles = Chem.MolToSmiles(newmol)
            if newmol_smiles in seen:
                continue
            starting_points.append(newmol)
            seen.add(newmol_smiles)
            yield newmol

Now ready to check it.

I used capivasertib which is kinase inhibitor as an example.

 
rwmol = Chem.RWMol(mcs_mol)

rwconf = Chem.Conformer(rwmol.GetNumAtoms())
matches = rwmol.GetSubstructMatch(mcs_mol)

ref_conf = mol1.GetConformer()
for i, match in enumerate(matches):
    print(ref_conf.GetAtomPosition(ref_match[i]).x)
    # Added atom position information from reference molecule
    rwconf.SetAtomPosition(match, ref_conf.GetAtomPosition(ref_match[i]))
rwmol.AddConformer(rwconf)

Check reference molecule and query molecule structure. I made two molobjects one is non protected and the other is protected atom excepting phenyl ring.

 
capivasertib = Chem.MolFromSmiles('c1cc(ccc1[C@H](CCO)NC(=O)C2(CCN(CC2)c3c4cc[nH]c4ncn3)N)Cl')

protected_capivasertib = copy.deepcopy(capivasertib)
atoms = protected_capivasertib.GetAtoms()
phenyl = Chem.MolFromSmiles('c1ccccc1')
mactches = protected_capivasertib.GetSubstructMatches(phenyl)
arr = np.array(mactches)
matches = arr.flatten()
for atom in atoms:
    if atom.GetIdx() not in matches:
        atom.SetProp('_protected', '1')
capivasertib
capivasertib

Let’s check it.

Enumerated hetero shuffled derivative from non protected and protected molecules. Then use ConstrainEmbed method.

Lots of molecules are generated from non protected molecule!

 
enume1 = list(enumerate_heterocycles(capivasertib))
enume2 = list(enumerate_heterocycles(protected_capivasertib))
print(len(enume1), len(enume2))
> 2592 9

And following results shows effect of “_protected” prop. It is very useful I think. RDKit has many cool features for chemoinformatics.

 
Draw.MolsToGridImage(enume1[:10], molsPerRow=5)
Draw.MolsToGridImage(enume2[:10], molsPerRow=5)
NON PROTECTED
PROTECTED

Lower figure shows hetero shuffled molecules at only phenyl rings.

I uploaded today’s code to my gist and repo.

https://nbviewer.jupyter.org/github/iwatobipen/playground/blob/master/Protect%20and%20enumerate%20heterocycles.ipynb

example

Constrain Embedding with MCS as a core #RDKit #Chemoinformatics

Recently I’m struggling to generate 3D structure with RDKit. I would like to generate constrained 3D structure of target molecule with reference molecule.

You know, there are very nice articles are found in RDKit blog site. URLs are below.
http://rdkit.blogspot.com/2013/12/using-allchemconstrainedembed.html
http://rdkit.blogspot.com/2019/01/more-on-constrained-embedding.html

These posts are describing how to use ConstrainedEmbed and EmbedMolecule with coorMaps option. ConstrainedEmbed method. Regarding the posts, ConstrainedEmbed method is more suitable way to generate constrained 3D structure.

So I tried to use ConstrainedEmbed method with MCS as a core. But, molecule from smarts which can not sanitize could not generate 3D structure due to valence error.

To address the issue, I tested to use RWMol object as a query.

Following code, I used ethyl benzene as a query. At first I got MCS from two molecules. And generate mcs mol objects.

 
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import rdFMCS
from rdkit.Chem.Draw import IPythonConsole
mol1 = Chem.MolFromSmiles('CCc1ccccc1')
mol2 = Chem.MolFromSmiles('CCc1nnccc1')
mcs = rdFMCS.FindMCS([mol1, mol2])
mcs_mol = Chem.MolFromSmarts(mcs.smartsString)

Then get substructure match atom index of ref and target molecules.

 
ref_match = mol1.GetSubstructMatch(mcs_mol)
target_match = mol2.GetSubstructMatch(mcs_mol)

Then I made rwmol for conformer generation of mcs_mol. To do it, I set atom position by using reference molecule. Chem.Conformer method generates conformer object. And added conformer to rwmol object with AddConformer method.

 
rwmol = Chem.RWMol(mcs_mol)

rwconf = Chem.Conformer(rwmol.GetNumAtoms())
matches = rwmol.GetSubstructMatch(mcs_mol)

ref_conf = mol1.GetConformer()
for i, match in enumerate(matches):
    print(ref_conf.GetAtomPosition(ref_match[i]).x)
    # Added atom position information from reference molecule
    rwconf.SetAtomPosition(match, ref_conf.GetAtomPosition(ref_match[i]))
rwmol.AddConformer(rwconf)

Check reference molecule and query molecule structure.

 
IPythonConsole.drawMol3D(mol1, size=(200,200))
IPythonConsole.drawMol3D(rwmol)

Now I can get 3D query molecule!

Then use ConstrainEmbed method.

 
AllChem.ConstrainedEmbed(mol2, rwmol)
IPythonConsole.drawMol3D(mol2)
 
print(Chem.MolToMolBlock(mol1))
print(Chem.MolToMolBlock(mol2))
     RDKit          3D

  8  8  0  0  0  0  0  0  0  0999 V2000
    2.7975    0.0839    0.7058 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9966    0.0409   -0.5636 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.5291    0.0093   -0.2802 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0984   -1.2078   -0.1315 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.4446   -1.2253    0.1282 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1521   -0.0544    0.2377 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.4891    1.1401    0.0826 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.1390    1.2133   -0.1791 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  3  4  2  0
  4  5  1  0
  5  6  2  0
  6  7  1  0
  7  8  2  0
  8  3  1  0
M  END


     RDKit          3D

  8  8  0  0  0  0  0  0  0  0999 V2000
    2.7972    0.0849    0.7132 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0013    0.0551   -0.5760 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.5443    0.0073   -0.2790 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.1738    1.1566   -0.1708 N   0  0  0  0  0  0  0  0  0  0  0  0
   -1.4845    1.1340    0.0822 N   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1546   -0.0385    0.2385 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.4628   -1.2423    0.1318 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0973   -1.2199   -0.1320 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  3  4  2  0
  4  5  1  0
  5  6  2  0
  6  7  1  0
  7  8  2  0
  8  3  1  0
M  END

It worked. This example is very simple case. I would like to check more complex case later.
Today’s code uploaded below.

https://nbviewer.jupyter.org/github/iwatobipen/playground/blob/master/constrained_embedding.ipynb

code example

Hetero shuffle and halogen replacement of molecule #RDKit #Chemoinformatics

I posted comparison of rdChemReactions and EditableMol before. In the post I wrote rdChemReaction could not keep position of original molecule. But…. It is due to my wrong query.

Today I checked some function of RDKit. I tried to replace aromatic carbon to nitrogen and replace hydrogen which is attached aromatic carbon to fluorine and checked their position.

The code is below.

You can see today’s my code keeps geometry both replace aromatic carobon and replace hydrogen to fluorine case.

I would like to learn reaction grammar ASAP. It is very important for chemoirnfomatics!