TIPS of rdkit #RDKit #memorandum #chemoinformatics

As you know, RDKit is a very attractive and active project for chemoinformatics. Recent versions of RDKit include a heteroshuffle enumerator, which is useful for generating new molecules. You can find the details of the method below.

However, in a drug discovery project I think it is rarely necessary to enumerate every aromatic ring in the molecule. Being able to enumerate only a targeted ring would be more useful.

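Fortunately, EnumerateHeterocycles is built on the RDKit reaction machinery and applies a set of predefined reaction rules. This means that specific atoms can be protected by setting an atom property, because RDKit reactions do not match atoms that carry the '_protected' property. Let’s write some code! In the following code I use omeprazole as an example, and I get the ring information so that I can protect specific atoms.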

from rdkit import rdBase
from rdkit import Chem
from rdkit.Chem import EnumerateHeterocycles
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw
from rdkit.Chem import AllChem
import copy
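# Omeprazole as the example molecule
omeprazole = Chem.MolFromSmiles('CC1=CN=C(C(=C1OC)C)CS(=O)C2=NC3=C(N2)C=C(C=C3)OC')
# Ring information: AtomRings() gives one tuple of atom indices per ring
ringinfo = omeprazole.GetRingInfo()
rings = ringinfo.AtomRings()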
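
Which index in rings belongs to which ring is not obvious from the tuple alone, so it can help to list the rings and pick the one you want by its content. The following is only a minimal sketch; the helper name find_ring is hypothetical (it is not part of RDKit), and selecting by ring size and nitrogen count is just one simple criterion that happens to be enough for omeprazole.

```python
from rdkit import Chem

omeprazole = Chem.MolFromSmiles('CC1=CN=C(C(=C1OC)C)CS(=O)C2=NC3=C(N2)C=C(C=C3)OC')


def find_ring(mol, size, n_nitrogen):
    """Hypothetical helper: first ring with the given size and number of N atoms."""
    for ring in mol.GetRingInfo().AtomRings():
        symbols = [mol.GetAtomWithIdx(i).GetSymbol() for i in ring]
        if len(ring) == size and symbols.count('N') == n_nitrogen:
            return ring
    return None


# List every ring with its atom symbols to see which one is which
for i, ring in enumerate(omeprazole.GetRingInfo().AtomRings()):
    print(i, ring, [omeprazole.GetAtomWithIdx(a).GetSymbol() for a in ring])

# The pyridine ring of omeprazole: six-membered with exactly one nitrogen
pyridine_ring = find_ring(omeprazole, size=6, n_nitrogen=1)
print(pyridine_ring)
```

For molecules with several similar rings, a substructure query would probably be a better way to select the atoms than this simple count.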

First, enumerate heterocycles without any restriction.

res = EnumerateHeterocycles.EnumerateHeterocycles(omeprazole)
res = list(res)  # the function returns a generator
len(res)
> 384

Check the generated molecules.

Draw.MolsToGridImage(res[:20], molsPerRow=3)

Next, protect the pyridine ring and enumerate heterocycles only for the bicyclic moiety. As expected, the number of generated molecules is reduced.

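# Work on a copy so that the original molecule stays unprotected
protected_omeprazole = copy.deepcopy(omeprazole)
# rings[0] is the pyridine ring here; protected atoms are skipped by the reaction rules
for atmidx in rings[0]:
    atom = protected_omeprazole.GetAtomWithIdx(atmidx)
    atom.SetProp('_protected', '1')
res2 = EnumerateHeterocycles.EnumerateHeterocycles(protected_omeprazole)
res2 = list(res2)
len(res2)
> 96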
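
The protection step can be wrapped in a small function so that it is easy to reuse for other rings. This is a sketch under my own assumptions: the function name enumerate_with_protection is hypothetical, and the SMILES used to recognize the substituted pyridine part is something I wrote for this check, not something provided by RDKit. The exact counts may differ between RDKit versions, so the code only prints them.

```python
from rdkit import Chem
from rdkit.Chem import EnumerateHeterocycles


def enumerate_with_protection(mol, protected_atoms):
    """Hypothetical helper: enumerate heterocycles, leaving protected_atoms untouched."""
    work = Chem.Mol(mol)  # copy, so the input molecule gets no '_protected' flags
    for idx in protected_atoms:
        work.GetAtomWithIdx(idx).SetProp('_protected', '1')
    return list(EnumerateHeterocycles.EnumerateHeterocycles(work))


omeprazole = Chem.MolFromSmiles('CC1=CN=C(C(=C1OC)C)CS(=O)C2=NC3=C(N2)C=C(C=C3)OC')

# Pick the pyridine ring by content: six-membered with exactly one nitrogen
pyridine_ring = next(
    ring for ring in omeprazole.GetRingInfo().AtomRings()
    if len(ring) == 6
    and sum(omeprazole.GetAtomWithIdx(i).GetSymbol() == 'N' for i in ring) == 1
)

full = list(EnumerateHeterocycles.EnumerateHeterocycles(omeprazole))
focused = enumerate_with_protection(omeprazole, pyridine_ring)

# Substituted pyridine part of omeprazole, used as a substructure query (my assumption)
pyridine_part = Chem.MolFromSmiles('Cc1cnc(CS)c(C)c1OC')
kept = sum(m.HasSubstructMatch(pyridine_part) for m in focused)
print(len(full), len(focused), kept)
```

If the protection works, kept should be equal to the number of focused molecules, because every product still contains the original pyridine part.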

Check the structures.

Draw.MolsToGridImage(res2[:50], molsPerRow=5)

Protection works fine!

I also checked whether the function keeps the 3D structure of the molecule.

It also worked fine.
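
The 3D check is not shown in this post, so here is a minimal sketch of how it could be done. It is an assumption about the procedure, not the exact code I ran: the conformer is generated with EmbedMolecule on the hydrogen-added molecule, the hydrogens are removed again before enumeration, and only the first ten products are inspected to keep it quick.

```python
from itertools import islice

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import EnumerateHeterocycles

omeprazole = Chem.MolFromSmiles('CC1=CN=C(C(=C1OC)C)CS(=O)C2=NC3=C(N2)C=C(C=C3)OC')

# Generate one 3D conformer; EmbedMolecule returns the conformer id, or -1 on failure
mol_h = Chem.AddHs(omeprazole)
status = AllChem.EmbedMolecule(mol_h, randomSeed=42)
# Remove the hydrogens again; the heavy-atom coordinates are kept
mol3d = Chem.RemoveHs(mol_h)

# Take only the first few products from the generator
products = list(islice(EnumerateHeterocycles.EnumerateHeterocycles(mol3d), 10))

# Products that still carry a conformer, and whether those conformers are flagged as 3D
with_conf = [m for m in products if m.GetNumConformers() > 0]
all_3d = all(m.GetConformer().Is3D() for m in with_conf)
print(status, len(products), len(with_conf), all_3d)
```

Note that the atom order in the products can differ from the input molecule, so comparing coordinates atom by atom would need an atom mapping first.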

So this approach is useful for generating a focused library around a specific ring.

I uploaded the code to a gist.