Make RGroup decompose table with old ver. RDKit #RDKit #Chemoinformatics

Objective
Recent version of RDKit has rdRGroupDecomposition module for R Group decomposition. You know, it is very useful module for SAR analysis. However this function is not implemented in old version of rdkit (i.e. rdkit for 32 bit windows). I would like to implement similar module for 32bit windows system.

My approach with code
So, I tried to write RGroup decomposition function. Following code is slow, it needs more improvement for large molecular data sets.
At first import required packages and dataset.

import pandas as pd
from rdkit.Chem import Draw
from rdkit.Chem import PandasTools
from rdkit.Chem.Draw import IPythonConsole
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import rdFMCS
from rdkit.Chem import RWMol
from collections import defaultdict
from rdkit.Chem.rdDepictor import SetPreferCoordGen
SetPreferCoordGen(True)
from rdkit import RDPaths
import os
mols = [m for m in Chem.SDMolSupplier(os.path.join(RDPaths.RDDocsDir,'Book/data/cdk2.sdf'))]
for m in mols:
    AllChem.Compute2DCoords(m)

Then find MCS for molecules. I would like to extract core structure so set some options for FindMCS.

mcs = rdFMCS.FindMCS(mols[:4], completeRingsOnly=True, ringMatchesRingOnly=True)
core = Chem.MolFromSmarts(mcs.smartsString)

Then define RGroupDecomposer class for rgroup decomposition. I remove not in ring atoms from core structure because MedChem often uses it for a scaffold. To do it, RWmol module is used. It is useful for molecular editing. Following code automatically remove molecule which does not have core structure.

class RGroupDecomposer():

    def __init__(self, mols, core):
        self.mols = mols
        self.core = core
        
    def get_scaffold(self):
        emol = RWMol(self.core)
        atms = [a for a in emol.GetAtoms()]
        for a in atms:
            if not a.IsInRing():
                emol.RemoveAtom(a.GetIdx())
        return emol.GetMol()
    
    def rg_decompose(self):
        self.coremol = self.get_scaffold()
        self.match_mols = [mol for mol in self.mols if mol.HasSubstructMatch(self.coremol)]
        self.hmols = [Chem.AddHs(mol) for mol in self.match_mols]
        [AllChem.Compute2DCoords(mol) for mol in self.hmols]
        self.side_chains = [Chem.ReplaceCore(mol, self.coremol) for mol in self.hmols]
        self.core_info = Chem.ReplaceSidechains(self.hmols[0], self.coremol)
        
    def get_df(self):
        data = defaultdict(list)
        for i, side_chains in enumerate(self.side_chains):
            # Thank you for comment! fixed it
            data['mol'].apped(Chem.MolToSmiles(self.match_mols[i]))
            side_chains = Chem.MolToSmiles(side_chains).split('.')
            for idx, side_chain in enumerate(side_chains):
                side_chain = Chem.MolToSmiles(Chem.RemoveHs(Chem.MolFromSmiles(side_chain)))
                data['R_{}'.format(idx+1)].append(side_chain)
            data['core'].append(Chem.MolToSmiles(self.core_info))
            
        df = pd.DataFrame(data)
        return df

# this is helper function to convert dataframe of smiles to dataframe of ROMol.
def allsmi2rdmol(df):
    cols = df.columns.to_list()
    for col in cols:
        PandasTools.AddMoleculeColumnToFrame(df, smilesCol=str(col), molCol=str(col))
    return df

Now I finished all tasks.
Let’s check it. I used CDK2.sdf for test.

rgd = RGroupDecomposer(mols[:40], core)
rgd.rg_decompose()
df = rgd.get_df()
newdf = allsmi2rdmol(df)
from IPython.display import display_html
from IPython.display import HTML
HTML(df.to_html())

It seems work. I used HTML for svg image visualization. Usually this step is not required but my environment can’t render mol image without the step even if RenderImagesInAllDataFrames set True……

I’m not sure how to fix the problem ;-(

Today’s code is uploaded my gist.

code example for RGr decompose

Any comments and suggestions will be greatly appreciated.

Published by iwatobipen

I'm medicinal chemist in mid size of pharmaceutical company. I love chemoinfo, cording, organic synthesis, my family.

3 thoughts on “Make RGroup decompose table with old ver. RDKit #RDKit #Chemoinformatics

  1. Thanks for this post!

    One change I would suggest to fix the error where mol column is not showing the correct molecule structure in each row by adding the append command: data[‘mol’].append(Chem.MolToSmiles(self.match_mols[i]))

    or modifying RGroupDecomposer class, in the function get_df:

    def get_df(self):
        data = defaultdict(list)
        for i, side_chains in enumerate(self.side_chains):
            data['mol'].append(Chem.MolToSmiles(self.match_mols[i]))
            side_chains = Chem.MolToSmiles(side_chains).split('.')
            for idx, side_chain in enumerate(side_chains):
                side_chain = Chem.MolToSmiles(Chem.RemoveHs(Chem.MolFromSmiles(side_chain)))
                data['R_{}'.format(idx+1)].append(side_chain)
            data['core'].append(Chem.MolToSmiles(self.core_info))
    
        df = pd.DataFrame(data)
        return df
    

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: