Generate possible heteroaromatic cores from query molecule #RDKit #chemoinformatics

Hetero shuffling is an approach that replaces atoms in a scaffold to generate new molecules built on the atom-replaced scaffold. For example, with benzene as the core, shuffled cores include pyridine, pyrimidine, etc.
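As a minimal sketch of the idea (not the class described later in this post), the benzene example can be reproduced by changing atomic numbers on an editable copy and re-sanitizing. The helper name shuffle_atoms and the chosen atom indices are my own assumptions for illustration.

```python
from rdkit import Chem


def shuffle_atoms(smiles, replacements):
    """Replace atoms by index (hypothetical helper, for illustration only).

    replacements maps atom index -> new atomic number.
    """
    mol = Chem.RWMol(Chem.MolFromSmiles(smiles))
    for idx, atomic_num in replacements.items():
        mol.GetAtomWithIdx(idx).SetAtomicNum(atomic_num)
    # Re-sanitize so valences and aromaticity are recomputed for the new atoms
    Chem.SanitizeMol(mol)
    return Chem.MolToSmiles(mol)


# benzene with one carbon replaced by nitrogen (pyridine)
print(shuffle_atoms('c1ccccc1', {0: 7}))
# benzene with two meta carbons replaced by nitrogen (pyrimidine)
print(shuffle_atoms('c1ccccc1', {0: 7, 2: 7}))
```

The class later in the post instead round-trips the edited core through SMILES; this sketch only shows the basic atom swap.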

The approach is often used in medicinal chemistry projects to improve ADMET properties and biological activity, and also as part of a substance patent claim strategy. RDKit can’t do it directly out of the box, so I wrote some code to do it.

Let’s try. First, import the required packages. The following code was written in a Jupyter notebook.

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem import Fragments
from rdkit.Chem.Draw import IPythonConsole
from rdkit import rdBase
import numpy as np
import copy
from rdkit.Chem import AllChem
import itertools
rdBase.DisableLog('rdApp.error')

Then I defined a function named checkmol, which collects the aromatic atoms that are not in a five-membered ring and checks that none of them is sulfur or oxygen. Six-membered aromatic rings containing S or O are not normally used as building blocks, and because my code generates every combination of aromatic atoms, I need this check function.

def checkmol(mol):
    arom_atoms = mol.GetAromaticAtoms()
    symbols = [atom.GetSymbol() for atom in arom_atoms if not atom.IsInRingSize(5)]
    if symbols == []:
        return True
    elif 'O' in symbols or 'S' in symbols:
        return False
    else:
        return True
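A quick way to see what the filter accepts and rejects is to call it on a few small rings. The block below repeats checkmol so that it runs on its own; the three test molecules (pyridine, furan, 4-pyranone) are my own picks, not examples from the original notebook.

```python
from rdkit import Chem


def checkmol(mol):
    # Same logic as above: look only at aromatic atoms outside five-membered rings
    arom_atoms = mol.GetAromaticAtoms()
    symbols = [atom.GetSymbol() for atom in arom_atoms if not atom.IsInRingSize(5)]
    if symbols == []:
        return True
    elif 'O' in symbols or 'S' in symbols:
        return False
    else:
        return True


examples = {
    'pyridine': 'c1ccncc1',      # six-membered ring, only C and N: kept
    'furan': 'c1ccoc1',          # O sits in a five-membered ring: kept
    '4-pyranone': 'O=c1ccocc1',  # aromatic O in a six-membered ring: rejected
}
for name, smi in examples.items():
    print(name, checkmol(Chem.MolFromSmiles(smi)))
```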

Next I defined the main class. The HeteroShuffle class takes two arguments: the first is the query molecule whose scaffold you would like to change, and the second is the query core, which is the target of the shuffling.
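To make the two arguments concrete, here is a small sketch of how RDKit splits a molecule around a core, which is what the constructor does with Chem.ReplaceCore and Chem.ReplaceSidechains. The example molecule and the dummy_labels helper are assumptions chosen for illustration.

```python
from rdkit import Chem

# Hypothetical example: a disubstituted pyrimidine and its bare ring as the core
mol = Chem.MolFromSmiles('BrCCc1cncnc1C(=O)O')
query = Chem.MolFromSmiles('c1cncnc1')

subs = Chem.ReplaceCore(mol, query)        # side chains, core removed
core = Chem.ReplaceSidechains(mol, query)  # core, side chains removed


def dummy_labels(m):
    """Isotope labels of the attachment-point dummy atoms (illustrative helper)."""
    return sorted(a.GetIsotope() for a in m.GetAtoms() if a.GetAtomicNum() == 0)


print(Chem.MolToSmiles(subs))
print(Chem.MolToSmiles(core))
# The same labels appear on both sides, so each side chain can be matched to its position
print(dummy_labels(subs), dummy_labels(core))
```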

The make_connectors function generates reaction objects. These objects record the positions where the R-groups are attached to the query core.

The re_construct_mol function reconstructs a molecule from the R-groups and a new atom-shuffled core.
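The reconnection step can be sketched with a single reaction of the same form as the ones built in make_connectors. The core and side chain below are hypothetical inputs written directly as SMILES with a labelled dummy atom, not output taken from the class.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Hypothetical inputs: a shuffled core and one side chain, both labelled [1*]
core = Chem.MolFromSmiles('[1*]c1ccncc1')
subs = Chem.MolFromSmiles('[1*]C')

# Same pattern as in make_connectors: drop the two dummies labelled 1
# and bond the atoms that were attached to them
rxn = AllChem.ReactionFromSmarts('[1*][*:2].[1*][*:3]>>[*:2][*:3]')
products = rxn.RunReactants([core, subs])

# Round-trip through SMILES to obtain a sanitized molecule, as the class does
product = Chem.MolFromSmiles(Chem.MolToSmiles(products[0][0]))
print(Chem.MolToSmiles(product))
```

With several side chains, the class applies one such reaction per label in turn, feeding each product into the next reaction.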

The generate_mols function generates the shuffled heteroaromatic scaffolds. It enumerates the possible combinations of atoms with ‘itertools.product’, and the checkmol function filters out undesirable molecules.
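The size of the enumeration is easy to check on its own. This sketch uses the same four atomic numbers as the class (C, N, O, S) and an assumed two replaceable positions; with k candidate elements and n positions there are k**n combinations, most of which are later discarded as invalid or undesirable.

```python
import itertools

target_atomic_nums = [6, 7, 8, 16]  # C, N, O, S, as in the class
n_positions = 2                     # assumed number of replaceable atoms

combos = list(itertools.product(target_atomic_nums, repeat=n_positions))
print(len(combos))  # 4 ** 2 = 16
print(combos[:5])   # each tuple assigns one atomic number per position
```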

class HeteroShuffle():
    
    def __init__(self, mol, query):
        self.mol = mol
        self.query = query
        self.subs = Chem.ReplaceCore(self.mol, self.query)
        self.core = Chem.ReplaceSidechains(self.mol, self.query)
        self.target_atomic_nums = [6, 7, 8, 16]
    
    
    def make_connectors(self):
        n = len(Chem.MolToSmiles(self.subs).split('.'))
        map_no = n+1
        self.rxn_dict = {}
        for i in range(n):
            self.rxn_dict[i+1] = AllChem.ReactionFromSmarts('[{0}*][*:{1}].[{0}*][*:{2}]>>[*:{1}][*:{2}]'.format(i+1, map_no, map_no+1))
        return self.rxn_dict

    def re_construct_mol(self, core):
        '''
        re construct mols from given substructures and core
        '''
        keys = self.rxn_dict.keys()
        ps = [[core]]
        for key in keys:
            ps = self.rxn_dict[key].RunReactants([ps[0][0], self.subs])
        mol = ps[0][0]
        try:
            smi = Chem.MolToSmiles(mol)
            mol = Chem.MolFromSmiles(smi)
            Chem.SanitizeMol(mol)
            return mol
        except Exception:
            return None

    def get_target_atoms(self):
        '''
        get target atoms for replace
        target atoms means atoms which don't have anyatom(*) in neighbors
        '''
        atoms = []
        for atom in self.core.GetAromaticAtoms():
            neighbors = [a.GetSymbol() for a in atom.GetNeighbors()]
            if '*' not in neighbors and atom.GetSymbol() !='*':
                atoms.append(atom)
        print(len(atoms))
        return atoms
    
    def generate_mols(self):
        atoms = self.get_target_atoms()
        idxs = [atom.GetIdx() for atom in atoms]
        combinations = itertools.product(self.target_atomic_nums, repeat=len(idxs))
        smiles_set = set()
        self.make_connectors()
        for combination in combinations:
            target = copy.deepcopy(self.core)
            #print(Chem.MolToSmiles(target))
            for i, idx in enumerate(idxs):
                target.GetAtomWithIdx(idx).SetAtomicNum(combination[i])
            smi = Chem.MolToSmiles(target)
            #smi = smi.replace('sH','s').replace('oH','o').replace('cH3','c')
            #print('rep '+smi)
            target = Chem.MolFromSmiles(smi)
            if target is not None:
                n_attachment = len([atom for atom in target.GetAtoms() if atom.GetAtomicNum() == 0])
                n_aromatic_atoms = len(list(target.GetAromaticAtoms()))
                if target.GetNumAtoms() - n_attachment == n_aromatic_atoms:
                    try:
                        mol = self.re_construct_mol(target)  
                        if checkmol(mol):
                            smiles_set.add(Chem.MolToSmiles(mol))
                    except Exception:
                        pass
        mols = [Chem.MolFromSmiles(smi) for smi in smiles_set]
        return mols

Now we are ready. Let’s test the code. The first example uses a fused six-membered heterocycle as the query.

# Gefitinib
mol1 = Chem.MolFromSmiles('COC1=C(C=C2C(=C1)N=CN=C2NC3=CC(=C(C=C3)F)Cl)OCCCN4CCOCC4')
core1 = Chem.MolFromSmiles('c1ccc2c(c1)cncn2')
ht=HeteroShuffle(mol1, core1)
res=ht.generate_mols()
Draw.MolsToGridImage(res, molsPerRow=5)

The second example is a molecule with a five-membered ring as the query.

#  Oxaprozin
mol2 = Chem.MolFromSmiles('OC(=O)CCC1=NC(=C(O1)C1=CC=CC=C1)C1=CC=CC=C1')
core2 =  Chem.MolFromSmiles('c1cnco1')
ht=HeteroShuffle(mol2, core2)
res=ht.generate_mols()
Draw.MolsToGridImage(res, molsPerRow=5)

It worked. The code enumerates all possible combinations, including undesired rings (e.g. oxygen-containing six-membered heteroaromatic systems), but it seems to be an easy way to generate scaffold-diverse molecules.

All of the code can be viewed at the following URL. It is part of a short book on chemoinformatics, py4chemoinformatics.

https://nbviewer.jupyter.org/github/Mishima-syk/py4chemoinformatics/blob/master/notebooks/ch05_hetero_shuffle.ipynb

We have now started translating the book from Japanese to English with contributors.

2 thoughts on “Generate possible heteroaromatic cores from query molecule #RDKit #chemoinformatics”

Marco says:

01/05/2019 at 21:50

Hi, very interesting piece of code for a rapid scan of heteroaromatics.
What about pyrrole nitrogens? I expected to see the following one among the oxaprozin mates:
OC(=O)CCc1cc(c2ccccc2)c([nH]1)c3ccccc3
and this one as well:
OC(=O)CCc1nc(c2ccccc2)c([nH]1)c3ccccc3

Also, nitrogen on the attachment points might be useful (as an option):
OC(=O)CCc1cc(c2ccccc2)n(c1)c3ccccc3
OC(=O)CCc1cn(c2ccccc2)c(n1)c3ccccc3

iwatobipen says:

06/05/2019 at 21:59

Hi Marco,
Thank you for your comment.
Pyrrole nitrogens are aromatic atoms, so the code will change those nitrogen atoms to other aromatic atoms.

However, in the case of a substituted pyrrole, the nitrogen atom should not be replaced, because if the nitrogen is exchanged for a carbon atom the ring cannot satisfy the Hückel rule, I think.
Any comments or suggestions would be highly appreciated.
Thanks.

Published by iwatobipen
