Sometimes medicinal chemists turn to a scaffold hopping approach in a drug discovery project to overcome an issue with their current series.
When I think about scaffold hopping, I consider which interactions between the molecule and the protein are key.
The set of features responsible for those interactions is called the pharmacophore.
Scaffold hopping is also used as a "me-too" approach to find new IP space.
BTW, in 2006, researchers at Lilly published an interesting paper describing ErG (Extended reduced graph approach).
The URL is below.
http://pubs.acs.org/doi/abs/10.1021/ci050457y
In Figure 12, they showed some examples of molecules retrieved by similarity searches using Daylight FP and ErG FP.
It was interesting to me because compounds that are highly similar by ErG FP have low Tanimoto scores by Daylight FP.
And the algorithm has been implemented in a recent version of RDKit! That’s cool!
So, I wanted to try the function myself.
The function can be called from rdReducedGraphs.
My snippet is as follows…
from __future__ import print_function
import os

import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from rdkit import Chem, DataStructs, rdBase
from rdkit.Chem import AllChem, Draw, PandasTools, RDConfig, rdReducedGraphs
from rdkit.Chem.Draw import IPythonConsole

# The ErG FP is a real-valued vector, not a bit vector,
# so compute the continuous form of the Tanimoto coefficient.
def calc_ergfp(fp1, fp2):
    numerator = np.dot(fp1, fp2)
    denominator = np.dot(fp1, fp1) + np.dot(fp2, fp2) - numerator
    return numerator / denominator

docdir = RDConfig.RDDocsDir
sdfdir = os.path.join(docdir, 'Book/data/cdk2.sdf')
mols = [mol for mol in Chem.SDMolSupplier(sdfdir) if mol is not None]

# Calculate ECFP4-like (Morgan, radius 2) and ErG fingerprints
morganfps = [AllChem.GetMorganFingerprintAsBitVect(mol, 2) for mol in mols]
ergfps = [rdReducedGraphs.GetErGFingerprint(mol) for mol in mols]

# Track all pairwise data
molis = []
moljs = []
morgantcs = []
for i in range(len(morganfps)):
    for j in range(i):
        molis.append(Chem.MolToSmiles(mols[i]))
        moljs.append(Chem.MolToSmiles(mols[j]))
        tc = DataStructs.TanimotoSimilarity(morganfps[i], morganfps[j])
        morgantcs.append(tc)

ergtcs = []
for i in range(len(ergfps)):
    for j in range(i):
        tc = calc_ergfp(ergfps[i], ergfps[j])
        ergtcs.append(tc)

df = pd.DataFrame({'MORGAN': morgantcs, 'ERG': ergtcs, 'molis': molis, 'moljs': moljs})
PandasTools.AddMoleculeColumnToFrame(df, smilesCol='molis', molCol='ROMoli')
PandasTools.AddMoleculeColumnToFrame(df, smilesCol='moljs', molCol='ROMolj')
dfsummary = df[['ERG', 'MORGAN', 'ROMoli', 'ROMolj']]

# Extract pairs with Morgan Tc > 0.6
dfsummary[dfsummary.MORGAN > 0.6]
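As a side note, the calc_ergfp helper in the snippet is the continuous (real-valued) form of the Tanimoto coefficient, Tc = a·b / (a·a + b·b − a·b), which reduces to the usual bit-vector Tanimoto when every element is 0 or 1. Here is a minimal pure-Python sketch of the same formula. The name `tanimoto_continuous`, the toy vectors, and the choice to return 0.0 for two all-zero vectors are my assumptions for illustration, not part of RDKit.

```python
def tanimoto_continuous(a, b):
    """Tanimoto coefficient for real-valued vectors: a.b / (a.a + b.b - a.b)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    ab = sum(x * y for x, y in zip(a, b))
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    denominator = aa + bb - ab
    # Two all-zero vectors give 0/0; returning 0.0 is a convention
    # chosen for this sketch only.
    return ab / denominator if denominator else 0.0


# Toy vectors invented for illustration; they are not real ErG fingerprints.
v1 = [0.0, 1.0, 0.3, 0.0]
v2 = [0.0, 1.0, 0.0, 0.3]

print(tanimoto_continuous(v1, v1))  # identical vectors
print(tanimoto_continuous(v1, v2))  # partially overlapping vectors
```

With NumPy arrays, `np.dot` replaces the three sums, which is what the snippet does.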
Hmm, the pairs with Morgan Tc > 0.6 are all structurally similar to each other.
Next, I extracted only the pairs with ErG similarity > 0.7.
dfsummary[dfsummary.ERG > 0.7]
This set contains pairs that score as highly similar only when ErG is used.
Finally, I got an overview of the dataset.
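To pick out only the pairs where the two fingerprints disagree (high ErG, low Morgan), both conditions can be combined. The sketch below uses plain Python records so it runs without the SD file. The function name `hopping_candidates`, the Morgan cutoff of 0.4, and the similarity values are all made up for illustration and are not results from cdk2.sdf.

```python
def hopping_candidates(pairs, erg_min=0.7, morgan_max=0.4):
    """Return pairs with high ErG similarity but low Morgan similarity."""
    return [p for p in pairs if p["ERG"] > erg_min and p["MORGAN"] < morgan_max]


# Made-up similarity values for illustration only.
pairs = [
    {"pair": "A-B", "ERG": 0.85, "MORGAN": 0.72},  # similar by both methods
    {"pair": "A-C", "ERG": 0.78, "MORGAN": 0.25},  # similar by ErG only
    {"pair": "B-C", "ERG": 0.40, "MORGAN": 0.20},  # dissimilar by both
]

for p in hopping_candidates(pairs):
    print(p["pair"], p["ERG"], p["MORGAN"])
```

On the DataFrame itself, the equivalent filter would be something like dfsummary[(dfsummary.ERG > 0.7) & (dfsummary.MORGAN < 0.4)], again with the 0.4 cutoff being an arbitrary choice.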
sns.pairplot(dfsummary)
plt.show()
ErG is a fuzzier algorithm than the Morgan method.
I think the function is a useful tool for describing molecular features.
I’ll try it on another dataset soon.
Hi iwatobipen,
I’m really interested in your blog. I have some questions about the ErG fingerprint. Do you know how to generate the reduced graph for a molecule? I’ve been working on this recently, but I can’t find a function that produces a reduced graph sequence like ([Hf][Cu][Sc][V]=[Sc]). If you have any ideas, please leave me a comment! Thanks in advance.
Hi Zheng,
I think metal atoms are not defined in the function.
I’m not good at C++, but judging from the following code, metal atoms are not defined, because this function is designed for scaffold hopping of small molecules.
https://github.com/rdkit/rdkit/blob/d41752d558bf7200ab67b98cdd9e37f1bdd378de/Code/GraphMol/ReducedGraphs/ReducedGraphs.cpp
Sorry I can’t be of more help.
Kind regards,
Hi Pen,
Sorry, I think I misunderstood you. It’s not a metal atom but a pharmacophore center (refer to DOI: 10.1021/acs.jcim.8b00626), where [Sc], for example, is chosen to denote an aromatic ring. I believe that you may be interested in this representation.
Best,