Hetero shuffling is an approach that replaces atoms of a scaffold to generate new molecules with an atom-swapped scaffold. For example, with benzene as the core, shuffled cores include pyridine, pyrimidine, and so on.
The approach is often used in medicinal chemistry projects to improve ADMET properties and biological activity, and it is also used in substance patent claim strategy. RDKit can't do it directly out of the box, so I wrote some code to do it.
Let's try. First, import the required packages. The following code was written in a Jupyter notebook.
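As a minimal sketch of the idea, assuming RDKit is installed, the snippet below swaps ring atoms of benzene by atomic number and keeps the result only if it still sanitizes. The helper name `swap_ring_atoms` is hypothetical and is not part of the code in this post.

```python
from rdkit import Chem, rdBase

rdBase.DisableLog('rdApp.error')  # silence messages for rejected rings

def swap_ring_atoms(smiles, replacements):
    """Hypothetical helper: set new atomic numbers on the given atom indices.

    Returns canonical SMILES, or None when the edited ring fails sanitization.
    """
    mol = Chem.RWMol(Chem.MolFromSmiles(smiles))
    for idx, atomic_num in replacements.items():
        mol.GetAtomWithIdx(idx).SetAtomicNum(atomic_num)
    try:
        Chem.SanitizeMol(mol)  # recomputes valences and re-checks aromaticity
    except Exception:
        return None
    return Chem.MolToSmiles(mol)

benzene = 'c1ccccc1'
print(swap_ring_atoms(benzene, {0: 7}))        # one C -> N: pyridine
print(swap_ring_atoms(benzene, {0: 7, 2: 7}))  # two C -> N (1,3): pyrimidine
print(swap_ring_atoms(benzene, {0: 8}))        # neutral O in a six-membered ring is rejected
```

The oxygen case returns None because a neutral six-membered aromatic ring with one oxygen cannot be kekulized.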
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem import Fragments
from rdkit.Chem.Draw import IPythonConsole
from rdkit import rdBase
import numpy as np
import copy
from rdkit.Chem import AllChem
import itertools
rdBase.DisableLog('rdApp.error')
Then I defined a function named checkmol, which collects the aromatic atoms that are not in a five-membered ring and checks that none of them is sulfur or oxygen, because six-membered aromatic rings containing S or O are not normally used as building blocks. My code generates every combination of aromatic atoms, so I need this check function.
def checkmol(mol):
    # Symbols of aromatic atoms that are not part of a five-membered ring
    arom_atoms = mol.GetAromaticAtoms()
    symbols = [atom.GetSymbol() for atom in arom_atoms if not atom.IsInRingSize(5)]
    if symbols == []:
        return True
    elif 'O' in symbols or 'S' in symbols:
        return False
    else:
        return True
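To see what checkmol actually inspects, here is a small sketch that pulls out the same list of symbols for three example rings. The helper name `six_ring_aromatic_symbols` and the example molecules are my own choices for illustration.

```python
from rdkit import Chem

def six_ring_aromatic_symbols(mol):
    """Hypothetical helper: symbols of aromatic atoms outside any five-membered ring."""
    return [atom.GetSymbol() for atom in mol.GetAromaticAtoms()
            if not atom.IsInRingSize(5)]

furan = Chem.MolFromSmiles('c1ccoc1')
pyridine = Chem.MolFromSmiles('c1ccncc1')
coumarin = Chem.MolFromSmiles('O=c1ccc2ccccc2o1')

# Furan: every aromatic atom sits in a five-membered ring, so the list is empty
print(six_ring_aromatic_symbols(furan))
# Pyridine: only C and N, so checkmol would accept it
print(six_ring_aromatic_symbols(pyridine))
# Coumarin: the ring oxygen is in a six-membered aromatic ring, so checkmol would reject it
print(six_ring_aromatic_symbols(coumarin))
```

An empty list is why five-membered heteroaromatics such as furan or thiophene pass the check even though they contain O or S.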
Next I defined the main class. The HeteroShuffle class takes two arguments: the first is the molecule whose scaffold you would like to change, and the second is the query, the target core for shuffling.
The make_connectors method generates reaction objects. These objects record the positions where the R-groups are attached to the query core.
The re_construct_mol method reconstructs a molecule from the R-groups and a new atom-shuffled core.
The generate_mols method generates the shuffled heteroaromatic scaffolds. It enumerates the possible combinations of atoms with itertools.product, and the checkmol function filters out undesirable molecules.
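The following sketch shows the split-and-rejoin idea on a single-substituent example, assuming RDKit's default behaviour of labelling attachment points as isotope-numbered dummy atoms ([1*], [2*], ...). The input molecule is only an illustration, and the reaction SMARTS is the pattern from make_connectors written out by hand for label 1.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles('Cc1ccncc1')      # 4-methylpyridine (illustrative input)
query = Chem.MolFromSmiles('c1ccncc1')     # pyridine core

subs = Chem.ReplaceCore(mol, query)        # side chain(s) carrying labelled dummy atoms
core = Chem.ReplaceSidechains(mol, query)  # core carrying labelled dummy atoms

def count_dummies(m):
    """Count attachment-point (dummy) atoms, which have atomic number 0."""
    return sum(1 for atom in m.GetAtoms() if atom.GetAtomicNum() == 0)

# Join the neighbours of the two dummies that share label 1; the dummies are dropped
rxn = AllChem.ReactionFromSmarts('[1*][*:2].[1*][*:3]>>[*:2][*:3]')
products = rxn.RunReactants((core, subs))
rebuilt = products[0][0]
Chem.SanitizeMol(rebuilt)

print(Chem.MolToSmiles(subs))
print(Chem.MolToSmiles(core))
print(Chem.MolToSmiles(rebuilt))
```

With several substituents, the same reaction is applied once per label, which relies on ReplaceCore and ReplaceSidechains assigning matching labels to the same attachment point.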
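The enumeration step on its own looks like the sketch below. It uses only the standard library; the function name `enumerate_assignments` is hypothetical, while the list of atomic numbers is the one used in the class.

```python
import itertools

# Atomic numbers tried at every replaceable ring position: C, N, O, S
target_atomic_nums = [6, 7, 8, 16]

def enumerate_assignments(n_positions, atomic_nums=target_atomic_nums):
    """Return every assignment of atomic numbers to n_positions ring atoms."""
    return list(itertools.product(atomic_nums, repeat=n_positions))

combos = enumerate_assignments(2)
print(len(combos))   # 4 ** 2 = 16
print(combos[:5])    # [(6, 6), (6, 7), (6, 8), (6, 16), (7, 6)]
```

The number of candidates grows as 4 to the power of the number of replaceable atoms, so most of the work is in discarding combinations that do not parse or do not pass checkmol.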
class HeteroShuffle():

    def __init__(self, mol, query):
        self.mol = mol
        self.query = query
        # Side chains and core, both carrying labelled dummy atoms at the attachment points
        self.subs = Chem.ReplaceCore(self.mol, self.query)
        self.core = Chem.ReplaceSidechains(self.mol, self.query)
        # Candidate elements for each replaceable position: C, N, O, S
        self.target_atomic_nums = [6, 7, 8, 16]

    def make_connectors(self):
        n = len(Chem.MolToSmiles(self.subs).split('.'))
        map_no = n+1
        self.rxn_dict = {}
        for i in range(n):
            self.rxn_dict[i+1] = AllChem.ReactionFromSmarts('[{0}*][*:{1}].[{0}*][*:{2}]>>[*:{1}][*:{2}]'.format(i+1, map_no, map_no+1))
        return self.rxn_dict

    def re_construct_mol(self, core):
        '''
        re construct mols from given substructures and core
        '''
        keys = self.rxn_dict.keys()
        ps = [[core]]
        for key in keys:
            ps = self.rxn_dict[key].RunReactants([ps[0][0], self.subs])
        mol = ps[0][0]
        try:
            smi = Chem.MolToSmiles(mol)
            mol = Chem.MolFromSmiles(smi)
            Chem.SanitizeMol(mol)
            return mol
        except Exception:
            return None

    def get_target_atoms(self):
        '''
        get target atoms for replace
        target atoms means atoms which don't have anyatom(*) in neighbors
        '''
        atoms = []
        for atom in self.core.GetAromaticAtoms():
            neighbors = [a.GetSymbol() for a in atom.GetNeighbors()]
            if '*' not in neighbors and atom.GetSymbol() != '*':
                atoms.append(atom)
        print(len(atoms))
        return atoms

    def generate_mols(self):
        atoms = self.get_target_atoms()
        idxs = [atom.GetIdx() for atom in atoms]
        combinations = itertools.product(self.target_atomic_nums, repeat=len(idxs))
        smiles_set = set()
        self.make_connectors()
        for combination in combinations:
            target = copy.deepcopy(self.core)
            for i, idx in enumerate(idxs):
                target.GetAtomWithIdx(idx).SetAtomicNum(combination[i])
            smi = Chem.MolToSmiles(target)
            # Round trip through SMILES; invalid rings come back as None
            target = Chem.MolFromSmiles(smi)
            if target is not None:
                n_attachment = len([atom for atom in target.GetAtoms() if atom.GetAtomicNum() == 0])
                n_aromatic_atoms = len(list(target.GetAromaticAtoms()))
                # Keep only cores in which every non-dummy atom is still aromatic
                if target.GetNumAtoms() - n_attachment == n_aromatic_atoms:
                    try:
                        mol = self.re_construct_mol(target)
                        if checkmol(mol):
                            smiles_set.add(Chem.MolToSmiles(mol))
                    except Exception:
                        pass
        mols = [Chem.MolFromSmiles(smi) for smi in smiles_set]
        return mols
Now we are ready. Let's test the code. The first example uses a fused six-membered heterocycle as the query.
# Gefitinib
mol1 = Chem.MolFromSmiles('COC1=C(C=C2C(=C1)N=CN=C2NC3=CC(=C(C=C3)F)Cl)OCCCN4CCOCC4')
core1 = Chem.MolFromSmiles('c1ccc2c(c1)cncn2')
ht = HeteroShuffle(mol1, core1)
res = ht.generate_mols()
Draw.MolsToGridImage(res, molsPerRow=5)

The second example is a molecule with a five-membered ring as the query.
# Oxaprozin
mol2 = Chem.MolFromSmiles('OC(=O)CCC1=NC(=C(O1)C1=CC=CC=C1)C1=CC=CC=C1')
core2 = Chem.MolFromSmiles('c1cnco1')
ht = HeteroShuffle(mol2, core2)
res = ht.generate_mols()
Draw.MolsToGridImage(res, molsPerRow=5)

It worked. The code generates all possible combinations, including undesired rings (e.g. six-membered heteroaromatic systems containing oxygen). Still, it seems an easy way to generate scaffold-diverse molecules.
All of the code can be checked at the following URL. It is part of the code for a short book on chemoinformatics, py4chemoinformatics.
We have now started translating the book from Japanese to English with contributors.
Hi, very interesting piece of code for a rapid scan of heteroaromatics.
What about pyrrole nitrogens? I expected to see the following one among the oxaprozin mates:
OC(=O)CCc1cc(c2ccccc2)c([nH]1)c3ccccc3
and this one as well:
OC(=O)CCc1nc(c2ccccc2)c([nH]1)c3ccccc3
Also, nitrogen on the attachment points might be useful (as an option):
OC(=O)CCc1cc(c2ccccc2)n(c1)c3ccccc3
OC(=O)CCc1cn(c2ccccc2)c(n1)c3ccccc3
Hi Marco,
Thank you for your comment.
Pyrrole nitrogens are aromatic atoms, so the code will change those nitrogen atoms to other aromatic atoms.
However, a substituted pyrrole nitrogen atom should not be replaced, because if the nitrogen is exchanged for a carbon atom, the ring can no longer satisfy the Hückel rule, I think.
Any comments or suggestions would be highly appreciated.
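A quick way to see this in RDKit, as a rough sketch with N-methylpyrrole as an illustrative example: the aromatic SMILES parses with nitrogen in the ring, but the all-carbon analogue cannot be kekulized and comes back as None.

```python
from rdkit import Chem, rdBase

rdBase.DisableLog('rdApp.error')  # hide the kekulization error message

n_methylpyrrole = Chem.MolFromSmiles('Cn1cccc1')   # substituted pyrrole nitrogen
carbon_analogue = Chem.MolFromSmiles('Cc1cccc1')   # same ring with N replaced by C

print(n_methylpyrrole is not None)  # True: valid aromatic ring
print(carbon_analogue is None)      # True: five aromatic carbons cannot be kekulized
```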
Thanks.