ErGFingerprint in RDKit

Medicinal chemists sometimes turn to scaffold hopping in a drug discovery project to get around a problem with their current series.
When I think about scaffold hopping, I consider which interactions between the molecule and the protein are essential.
The set of features responsible for those interactions is called a pharmacophore.
Scaffold hopping is also used in me-too approaches to find new IP space.

By the way, in 2006 researchers at Lilly published an interesting method called ErG (the extended reduced graph approach).
The paper is here:
http://pubs.acs.org/doi/abs/10.1021/ci050457y
In Figure 12, they showed examples of molecules retrieved by similarity searches using Daylight FP and ErG FP.
It was interesting to me because compounds that are highly similar by ErG FP have low Tanimoto scores by Daylight FP.

The algorithm has been implemented in a recent version of RDKit. That’s cool!

So I wanted to try the function myself.
It can be called from rdReducedGraphs.
My snippet is below.

import os
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from rdkit import Chem
from rdkit.Chem import AllChem, rdReducedGraphs
from rdkit.Chem import DataStructs
from rdkit.Chem import Draw
from rdkit.Chem import PandasTools
from rdkit.Chem import RDConfig
from rdkit.Chem.Draw import IPythonConsole  # renders molecules inline in a notebook

# The ErG fingerprint is a vector of floats, not a bit vector,
# so the Tanimoto coefficient is computed by hand from dot products.
def calc_ergfp( fp1, fp2 ):
    numerator = np.dot( fp1, fp2 )
    denominator = np.dot( fp1, fp1 ) + np.dot( fp2, fp2 ) - numerator
    return numerator / denominator

docdir = RDConfig.RDDocsDir
sdfdir = os.path.join( docdir,'Book/data/cdk2.sdf')
mols = [ mol for mol in Chem.SDMolSupplier(sdfdir) if mol is not None ]

# Calculate ECFP4-like (Morgan, radius 2) and ErG fingerprints
morganfps = [ AllChem.GetMorganFingerprintAsBitVect(mol,2) for mol in mols ]
ergfps = [ rdReducedGraphs.GetErGFingerprint( mol ) for mol in mols ]
# Track SMILES and Tanimoto coefficients for every pair (i > j)
molis =[]
moljs =[]
morgantcs = [ ]
for i in range( len(morganfps) ):
    for j in range( i ):
        molis.append( Chem.MolToSmiles(mols[i]) )
        moljs.append( Chem.MolToSmiles(mols[j]) )
        tc = DataStructs.TanimotoSimilarity( morganfps[i], morganfps[j] )
        morgantcs.append( tc )

ergtcs = [ ]
for i in range( len(ergfps) ):
    for j in range( i ):
        tc = calc_ergfp( ergfps[i], ergfps[j] )
        ergtcs.append( tc )

df = pd.DataFrame( {'MORGAN':morgantcs, 'ERG':ergtcs, 'molis' : molis, 'moljs': moljs} )
PandasTools.AddMoleculeColumnToFrame( df, smilesCol='molis', molCol='ROMoli')
PandasTools.AddMoleculeColumnToFrame( df, smilesCol='moljs', molCol='ROMolj')
dfsummary = df[['ERG', "MORGAN", "ROMoli", "ROMolj"] ]
# Extract morgan tc > 0.6
dfsummary[dfsummary.MORGAN > 0.6]

Hmm, this set contains only closely similar structures.
[Figure: pairs with Morgan Tc > 0.6]
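
A note on the similarity used for the ErG vectors: because the fingerprint holds real values, bit counting does not apply, and calc_ergfp uses the continuous form of the Tanimoto coefficient, T(a, b) = a·b / (a·a + b·b - a·b). Below is a minimal, dependency-free sketch of that formula. The name continuous_tanimoto and the toy vectors are my own illustrations and are not part of RDKit.

```python
def continuous_tanimoto(a, b):
    """Continuous Tanimoto coefficient: a.b / (a.a + b.b - a.b)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    ab = sum(x * y for x, y in zip(a, b))
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    denom = aa + bb - ab
    # Two all-zero vectors give 0/0; treat that case as "no similarity".
    return ab / denom if denom else 0.0

# Toy vectors, made up for illustration (not real ErG output)
v1 = [1.0, 0.3, 0.0, 0.1]
v2 = [1.0, 0.0, 0.3, 0.1]
print(continuous_tanimoto(v1, v1))  # identical vectors
print(continuous_tanimoto(v1, v2))  # partially overlapping vectors
```

For 0/1 vectors this reduces to the familiar shared bits / (bits in A + bits in B - shared bits), so the ErG and Morgan scores can be read on the same scale.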

Next, I extracted only the pairs with ErG Tc > 0.7.

dfsummary[dfsummary.ERG > 0.7]

These pairs show high similarity only when ErG is used.
[Figure: pairs with ErG Tc > 0.7]
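
For scaffold hopping, the most interesting pairs are probably those that ErG rates as similar while Morgan does not. Here is a rough sketch of that filter, written over plain dictionaries so it runs without RDKit or pandas. The function name hop_candidates, the Morgan cutoff of 0.4, and the example scores are all invented for illustration; only the ErG cutoff of 0.7 comes from the post.

```python
def hop_candidates(records, erg_min=0.7, morgan_max=0.4):
    """Return records with high ErG but low Morgan similarity, best ErG first."""
    hits = [r for r in records
            if r["ERG"] > erg_min and r["MORGAN"] < morgan_max]
    return sorted(hits, key=lambda r: r["ERG"], reverse=True)

# Invented scores, only to show the shape of the data
pairs = [
    {"pair": "A-B", "ERG": 0.92, "MORGAN": 0.81},
    {"pair": "A-C", "ERG": 0.75, "MORGAN": 0.22},
    {"pair": "B-C", "ERG": 0.40, "MORGAN": 0.18},
    {"pair": "C-D", "ERG": 0.88, "MORGAN": 0.35},
]
for r in hop_candidates(pairs):
    print(r["pair"], r["ERG"], r["MORGAN"])
```

On the DataFrame from this post, the same idea would be dfsummary[(dfsummary.ERG > 0.7) & (dfsummary.MORGAN < 0.4)].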

Finally, I got an overview of the data set.

# Plot only the two numeric similarity columns
sns.pairplot(dfsummary, vars=["ERG", "MORGAN"])
plt.show()

[Figure: pair plot of ErG Tc and Morgan Tc]

ErG is a fuzzier algorithm than the Morgan method.
I think it is a useful tool for describing molecular features.
I’ll try another data set soon.

Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinfo, coding, organic synthesis, and my family.

3 thoughts on “ErGFingerprint in RDKit”

  1. Hi iwatobipen,

    I’m really interested in your blogs. I have some questions about ErGFingerprint. Do you know how to generate the reduced graph for a molecule? I’m working on this recently but I can’t find a function that can be used to produce a reduced graph sequence like ([Hf][Cu][Sc][V]=[Sc]). If you have any ideas, please give me some comments! Thanks in advance.

  2. Hi Pen,
    Sorry, I think I misunderstood you. It’s not a metal atom but a pharmacophore center (refer to DOI: 10.1021/acs.jcim.8b00626), where [Sc], for example, is chosen to denote an aromatic ring. I believe that you may be interested in this representation.

    Best,
