New visualization way for SAR #RDKit #chemoinformatics

Recently I found very cool visualization code for SAR that is provided by @simonduerr. It seems cool!

You can get the code from his repository.

I’ve never seen such a cool visualization. Current version of the package can fix the position of the rgroup automatically.

So I tried to use the code for my own dataset.

I got SAR data from ChEMBL, its target is MPS1. At first import packages and load data.

from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import PandasTools
from rdkit.Chem import Draw
from radialscope import RadialScope as rs
from radialscope import draw_with_indeces
import matplotlib.pyplot as plt
from import ScalarMappable
import svgutils.compose as sc
from IPython.display import SVG, HTML
import numpy as np
import pandas as pd
df = pd.read_csv('./data/mps.csv', sep=';')

Then define core and template for data extraction and visualization. Next retrieve molecules and R group information which has defined core structure.

core = Chem.MolFromSmiles('c2cn(C1CCCCC1)c3ncncc23')
templ =  Chem.MolFromSmiles('c2cn(C1CCCCC1)c3nc(C)ncc23')
matchmols = []
AlogP = []
std_val = []
for idx, mol in enumerate(mols):
    if mol.HasSubstructMatch(core):
        pIC50 = np.round(9-np.log10(df['Standard Value'][idx]),2)

rg = []
for mol in matchmols:
    sc = Chem.ReplaceCore(mol,core)
    sc = Chem.MolToSmiles(sc).replace('[1*]', '~')

Then, define drawing settings, I defined white_cutoff to zero because I would like to parameter with white.

settings={ 'SMILESSTRING':Chem.MolToSmiles(templ),  # use methyl groups for your rests, smiles strings can't use labels
  'use_bw_atom_theme':False,  # draw all atoms black
 'use_bold_font': True, # replace all fonts wherever possible with bold text
 'white_cutoff': 0,  #make text of labels white if greater than this number
  'scalefactor':0.7  #scale the total plot in the end, for large molecule you will need to decrease this or make the viewbox bigger in a vector software such as Inkscape

Next, define R group information. I used pIC50 and AlogP as SAR information. And other setting is same as original example.

Attach_atom_id is key parameter, it defines where SAR information place.

atom index 0 is dummy methyl group of the template.

radial_scope_setup_R1= {
    'rest_label':"R$_1$", # Label of the atom in the radial scope plot
    'no_wedges':len(rg), # Number of wedges in the Radial scope Plot
    'coverangle_wedges':180, # Degrees of a full circle the Rscope should cover
    'startangle':-30, #Start angle of the Rscope 
    'CMAPINNER':"Blues",  # Colormap of the inner circle. Max value has full color, options see below
    'CMAPOUTER':"RdPu", # Colormap of the outer circle. Max value has full color
    'OUTERLABEL':"pIC50", # Label of the outer circle
    'INNERLABEL':"AlogP", # Label of the inner circle
    'rounding':True, # ENABLE rounding if you want something like >99 
    'rounding_boundary':10,  # cutoff for displaying > 
    'value_groups':rg, # Labels for the outer circle, you can use math using $, any ~ will be interpreted as smiles string
    'attach_atom_id': 0,

Now almost there, type following code.

scope_plot=rs(settings, radial_scope_setup_R1) # add all radial_scope dictionaries to this call

It works fine!

The package can visualize multiple substituents information. However I think it should be visualize one R group information per image. Because this way is difficult to understand combination of R groups.

I don’t understand the details of the code (it’s too hacky for me) but really useful for MedChem I think.

Thanks again for sharing the nice coode.

Today’s code was uploaded my gist and github repo.