Generate possible heteroaromatic cores from query molecule #RDKit #chemoinformatics

Hetero shuffling is an approach that replaces atoms in a scaffold to generate new molecules with an atom-swapped scaffold. For example, with benzene as the core, shuffled cores include pyridine, pyrimidine, etc.

The approach is often used in medicinal chemistry projects to improve ADMET properties and biological activity, and also as part of substance patent claim strategy. RDKit can't do it directly out of the box, so I would like to write code for it.

Let's try. First, as always, import the required packages. The following code was written in a Jupyter Notebook.

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem import Fragments
from rdkit.Chem.Draw import IPythonConsole
from rdkit import rdBase
import numpy as np
import copy
from rdkit.Chem import AllChem
import itertools
rdBase.DisableLog('rdApp.error')

Then I defined a function named checkmol, which collects the aromatic atoms that are not in a five-membered ring and checks that none of them is sulfur or oxygen. Six-membered aromatic rings containing S or O are not normally used as building blocks, and my code generates every combination of aromatic atoms, so I need this check function.

def checkmol(mol):
    arom_atoms = mol.GetAromaticAtoms()
    symbols = [atom.GetSymbol() for atom in arom_atoms if not atom.IsInRingSize(5)]
    if symbols == []:
        return True
    elif 'O' in symbols or 'S' in symbols:
        return False
    else:
        return True
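
As a quick sanity check of that rule, here is a minimal sketch. It restates checkmol in a compact but equivalent form so that the snippet runs on its own, and the three test molecules are my own choice for illustration: pyridine and furan should pass, while pyrylium (an aromatic oxygen in a six-membered ring) should be rejected.

```python
from rdkit import Chem

def checkmol(mol):
    # Same rule as above: reject aromatic O or S that is not in a five-membered ring.
    symbols = [a.GetSymbol() for a in mol.GetAromaticAtoms() if not a.IsInRingSize(5)]
    return not ('O' in symbols or 'S' in symbols)

# Illustrative molecules only; they are not taken from the post.
examples = {
    'pyridine': 'c1ccncc1',     # N in a six-membered ring: allowed
    'furan': 'c1ccoc1',         # O in a five-membered ring: allowed
    'pyrylium': 'c1cc[o+]cc1',  # O in a six-membered ring: rejected
}
results = {name: checkmol(Chem.MolFromSmiles(smi)) for name, smi in examples.items()}
print(results)
```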

Next I defined the main class. The HeteroShuffle class requires two arguments: the first is the query molecule whose scaffold I would like to change, and the second is the target core for shuffling.
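
The class starts by splitting the query molecule into side chains and core with Chem.ReplaceCore and Chem.ReplaceSidechains. A minimal sketch of that split on a toy input follows. Toluene and benzene are my own example, and count_dummies is a hypothetical helper that is not part of the class. Each fragment should carry one dummy atom (*) marking the attachment point.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('Cc1ccccc1')    # toluene as a toy query molecule
query = Chem.MolFromSmiles('c1ccccc1')   # benzene as the core to shuffle

subs = Chem.ReplaceCore(mol, query)        # side chains, with the core removed
core = Chem.ReplaceSidechains(mol, query)  # core, with the side chains removed

def count_dummies(m):
    # Hypothetical helper: count attachment points (atomic number 0).
    return sum(1 for a in m.GetAtoms() if a.GetAtomicNum() == 0)

print(Chem.MolToSmiles(subs), count_dummies(subs))
print(Chem.MolToSmiles(core), count_dummies(core))
```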

The make_connectors function generates reaction objects. The objects record the positions where the R-groups are attached to the query core.
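
To make the reaction templates concrete, the sketch below builds only the SMARTS strings, using the same format string as make_connectors but without creating reaction objects. connector_smarts is a hypothetical helper name, and two side chains is an assumed input. Each template joins the two atoms that carry a dummy atom with the same isotope label.

```python
def connector_smarts(n_sidechains):
    # Hypothetical helper mirroring the string formatting in make_connectors.
    map_no = n_sidechains + 1
    template = '[{0}*][*:{1}].[{0}*][*:{2}]>>[*:{1}][*:{2}]'
    return {i + 1: template.format(i + 1, map_no, map_no + 1)
            for i in range(n_sidechains)}

smarts = connector_smarts(2)  # assume a core with two attachment points
for label, pattern in smarts.items():
    print(label, pattern)
```

In the class, each of these strings is passed to AllChem.ReactionFromSmarts, one reaction per attachment label.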

The re_construct_mol function reconstructs a molecule from the R-groups and a new atom-shuffled core.

The generate_mols function generates shuffled heteroaromatic scaffolds. It enumerates the possible combinations of atoms with 'itertools.product', and the checkmol function filters out undesirable molecules.
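
The enumeration step on its own can be sketched in plain Python. The atomic numbers are the same four used by the class (C, N, O, S), while n_positions = 3 is a hypothetical number of replaceable atoms chosen for illustration.

```python
import itertools

target_atomic_nums = [6, 7, 8, 16]  # C, N, O, S
n_positions = 3                     # hypothetical number of replaceable atoms

# Every assignment of one atomic number to each replaceable position.
combos = list(itertools.product(target_atomic_nums, repeat=n_positions))
print(len(combos))   # 4**3 = 64 candidates
print(combos[:3])
```

The number of candidates grows as 4**n, so a core with ten replaceable atoms already gives 1,048,576 combinations before any filtering.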

class HeteroShuffle():
    
    def __init__(self, mol, query):
        self.mol = mol
        self.query = query
        self.subs = Chem.ReplaceCore(self.mol, self.query)
        self.core = Chem.ReplaceSidechains(self.mol, self.query)
        self.target_atomic_nums = [6, 7, 8, 16]
    
    
    def make_connectors(self):
        n = len(Chem.MolToSmiles(self.subs).split('.'))
        map_no = n+1
        self.rxn_dict = {}
        for i in range(n):
            self.rxn_dict[i+1] = AllChem.ReactionFromSmarts('[{0}*][*:{1}].[{0}*][*:{2}]>>[*:{1}][*:{2}]'.format(i+1, map_no, map_no+1))
        return self.rxn_dict

    def re_construct_mol(self, core):
        '''
        re construct mols from given substructures and core
        '''
        keys = self.rxn_dict.keys()
        ps = [[core]]
        for key in keys:
            ps = self.rxn_dict[key].RunReactants([ps[0][0], self.subs])
        mol = ps[0][0]
        try:
            smi = Chem.MolToSmiles(mol)
            mol = Chem.MolFromSmiles(smi)
            Chem.SanitizeMol(mol)
            return mol
        except:
            return None

    def get_target_atoms(self):
        '''
        get target atoms for replace
        target atoms means atoms which don't have anyatom(*) in neighbors
        '''
        atoms = []
        for atom in self.core.GetAromaticAtoms():
            neighbors = [a.GetSymbol() for a in atom.GetNeighbors()]
            if '*' not in neighbors and atom.GetSymbol() !='*':
                atoms.append(atom)
        print(len(atoms))
        return atoms
    
    def generate_mols(self):
        atoms = self.get_target_atoms()
        idxs = [atom.GetIdx() for atom in atoms]
        combinations = itertools.product(self.target_atomic_nums, repeat=len(idxs))
        smiles_set = set()
        self.make_connectors()
        for combination in combinations:
            target = copy.deepcopy(self.core)
            #print(Chem.MolToSmiles(target))
            for i, idx in enumerate(idxs):
                target.GetAtomWithIdx(idx).SetAtomicNum(combination[i])
            smi = Chem.MolToSmiles(target)
            #smi = smi.replace('sH','s').replace('oH','o').replace('cH3','c')
            #print('rep '+smi)
            target = Chem.MolFromSmiles(smi)
            if target is not None:
                n_attachment = len([atom for atom in target.GetAtoms() if atom.GetAtomicNum() == 0])
                n_aromatic_atoms = len(list(target.GetAromaticAtoms()))
                if target.GetNumAtoms() - n_attachment == n_aromatic_atoms:
                    try:
                        mol = self.re_construct_mol(target)  
                        if checkmol(mol):
                            smiles_set.add(Chem.MolToSmiles(mol))
                    except:
                        pass
        mols = [Chem.MolFromSmiles(smi) for smi in smiles_set]
        return mols

Now it's ready. Let's test the code. The first example uses a fused six-membered heterocycle as the query.

# Gefitinib
mol1 = Chem.MolFromSmiles('COC1=C(C=C2C(=C1)N=CN=C2NC3=CC(=C(C=C3)F)Cl)OCCCN4CCOCC4')
core1 = Chem.MolFromSmiles('c1ccc2c(c1)cncn2')
ht=HeteroShuffle(mol1, core1)
res=ht.generate_mols()
Draw.MolsToGridImage(res, molsPerRow=5)

The second one is a molecule with a five-membered ring as the query.

#  Oxaprozin
mol2 = Chem.MolFromSmiles('OC(=O)CCC1=NC(=C(O1)C1=CC=CC=C1)C1=CC=CC=C1')
core2 =  Chem.MolFromSmiles('c1cnco1')
ht=HeteroShuffle(mol2, core2)
res=ht.generate_mols()
Draw.MolsToGridImage(res, molsPerRow=5)

It worked. The code enumerates all possible combinations, including undesired rings (i.e. six-membered heteroaromatic systems containing oxygen), which checkmol then filters out. It seems to be an easy way to generate scaffold-diverse molecules.

All the code can be viewed at the following URL. It is part of a short book on chemoinformatics, py4chemoinformatics.

https://nbviewer.jupyter.org/github/Mishima-syk/py4chemoinformatics/blob/master/notebooks/ch05_hetero_shuffle.ipynb

We have now started translating the book from Japanese to English with contributors.

Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinfo, coding, organic synthesis, and my family.

2 thoughts on “Generate possible heteroaromatic cores from query molecule #RDKit #chemoinformatics”

  1. Hi, very interesting piece of code for a rapid scan of heteroaromatics:
    what about pyrrole nitrogens? I expected to see the following one among the oxaprozin mates
    OC(=O)CCc1cc(c2ccccc2)c([nH]1)c3ccccc3
    and this one as well:
    OC(=O)CCc1nc(c2ccccc2)c([nH]1)c3ccccc3

    Also, nitrogen on the attachment points might be useful (as an option):
    OC(=O)CCc1cc(c2ccccc2)n(c1)c3ccccc3
    OC(=O)CCc1cn(c2ccccc2)c(n1)c3ccccc3

  2. Hi Marco,
    Thank you for your comment.
    Pyrrole nitrogens are aromatic atoms, so the code will change those nitrogen atoms to other aromatic atoms.

    However, substituted pyrrole nitrogen atoms should not be replaced, because if the nitrogen is exchanged for a carbon atom, I think the ring can no longer satisfy Hückel's rule.
    I would highly appreciate any comments or suggestions.
    Thanks.
