ErGFingerprint in RDKit

Sometimes medicinal chemists consider a scaffold hopping approach to overcome issues in a drug discovery project.
When I think about scaffold hopping, I think about the key interactions between the molecule and the protein.
Those key features are called the pharmacophore.
Scaffold hopping is also used as a "me too" approach to find new IP space.

BTW, in 2006, researchers at Lilly reported an interesting method called ErG (Extended reduced Graph).
The URL is as follows:
http://pubs.acs.org/doi/abs/10.1021/ci050457y
In Figure 12, they showed some examples of molecules retrieved by similarity searches using the Daylight FP and the ErG FP.
It was interesting to me because compounds that are highly similar by ErG FP have low Tanimoto scores with the Daylight FP.

And the algorithm has been implemented in a new version of RDKit! That's cool!

So I wanted to try the function myself.
It can be called from the rdReducedGraphs module.
My snippet is as follows…

from __future__ import print_function
import os
from rdkit import Chem, RDConfig, rdBase
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole  # renders molecules in Jupyter / pandas
from rdkit.Chem import AllChem, rdReducedGraphs
from rdkit.Chem import DataStructs
from rdkit.Chem import PandasTools
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

# ErG FP is not a bit vector but a real-valued numpy array,
# so use the continuous form of the Tanimoto coefficient:
# T(a, b) = a.b / (a.a + b.b - a.b)
def calc_ergfp(fp1, fp2):
    numerator = np.dot(fp1, fp2)
    denominator = np.dot(fp1, fp1) + np.dot(fp2, fp2) - numerator
    return numerator / denominator

docdir = RDConfig.RDDocsDir
sdfdir = os.path.join(docdir, 'Book/data/cdk2.sdf')
mols = [mol for mol in Chem.SDMolSupplier(sdfdir) if mol is not None]

# Calc ECFP4-like (Morgan, radius 2) and ErG fingerprints
morganfps = [AllChem.GetMorganFingerprintAsBitVect(mol, 2) for mol in mols]
ergfps = [rdReducedGraphs.GetErGFingerprint(mol) for mol in mols]

# Track every unique pair (i, j) with j < i. Both similarities are computed
# in the same loop so that all lists stay aligned row by row.
molis = []
moljs = []
morgantcs = []
ergtcs = []
for i in range(len(mols)):
    for j in range(i):
        molis.append(Chem.MolToSmiles(mols[i]))
        moljs.append(Chem.MolToSmiles(mols[j]))
        morgantcs.append(DataStructs.TanimotoSimilarity(morganfps[i], morganfps[j]))
        ergtcs.append(calc_ergfp(ergfps[i], ergfps[j]))

df = pd.DataFrame({'MORGAN': morgantcs, 'ERG': ergtcs, 'molis': molis, 'moljs': moljs})
PandasTools.AddMoleculeColumnToFrame(df, smilesCol='molis', molCol='ROMoli')
PandasTools.AddMoleculeColumnToFrame(df, smilesCol='moljs', molCol='ROMolj')
dfsummary = df[['ERG', 'MORGAN', 'ROMoli', 'ROMolj']]
# Extract pairs with Morgan Tc > 0.6
dfsummary[dfsummary.MORGAN > 0.6]
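calc_ergfp above is the continuous (real-valued) form of the Tanimoto coefficient, T(a, b) = a·b / (a·a + b·b - a·b). For 0/1 vectors it reduces to the usual bit-count Tanimoto |A and B| / |A or B|, which is why the ERG and MORGAN columns can be compared side by side. Below is a minimal pure-Python sketch of the same formula so it can be checked without RDKit; the name continuous_tanimoto and the zero-vector guard are my own additions, not part of RDKit.

```python
def continuous_tanimoto(a, b):
    """Tanimoto coefficient for real-valued vectors (same formula as calc_ergfp)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    ab = sum(x * y for x, y in zip(a, b))
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    denominator = aa + bb - ab
    # Hypothetical guard (not in calc_ergfp): two all-zero vectors would give 0/0
    if denominator == 0:
        return 0.0
    return ab / denominator

# On 0/1 vectors this equals |A and B| / |A or B|: 1 shared bit out of 3 set bits
print(continuous_tanimoto([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.333...
# Real-valued bins, like the fuzzy ErG counts, are handled the same way
print(continuous_tanimoto([1.0, 0.3, 0.0], [1.0, 0.3, 0.0]))  # 1.0
```

The guard matters if a molecule has no pharmacophore features at all, since its ErG vector would then be all zeros; calc_ergfp as written would return nan for a pair of such molecules.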

Hmm, these pairs are all obviously similar structures.
(Figure: pairs with Morgan Tc > 0.6)

Next, extract only the pairs with ErG > 0.7.

dfsummary[dfsummary.ERG > 0.7]

This set contains pairs that look highly similar only when ErG is used.
(Figure: pairs with ErG Tc > 0.7)
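For scaffold hopping, the most interesting pairs are the ones ErG calls similar but Morgan does not. With the DataFrame above this is a combined boolean mask such as dfsummary[(dfsummary.ERG > 0.7) & (dfsummary.MORGAN < 0.4)]; the Morgan cut-off of 0.4 is my own arbitrary choice, not a value from the ErG paper. The sketch below shows the same filter on plain Python records so the logic is explicit; the function name, thresholds, and scores are hypothetical.

```python
def scaffold_hop_candidates(pairs, erg_min=0.7, morgan_max=0.4):
    """Return pairs that look similar by ErG but dissimilar by Morgan.

    pairs: iterable of dicts with 'ERG' and 'MORGAN' similarity values.
    erg_min / morgan_max: hypothetical thresholds; tune them for your data.
    """
    hits = [p for p in pairs if p["ERG"] > erg_min and p["MORGAN"] < morgan_max]
    # Largest ErG-vs-Morgan gap first: the most "hop-like" pairs
    return sorted(hits, key=lambda p: p["ERG"] - p["MORGAN"], reverse=True)

# Made-up scores purely to illustrate the filter (not results from cdk2.sdf)
pairs = [
    {"id": "a", "ERG": 0.85, "MORGAN": 0.25},
    {"id": "b", "ERG": 0.90, "MORGAN": 0.70},
    {"id": "c", "ERG": 0.75, "MORGAN": 0.35},
    {"id": "d", "ERG": 0.50, "MORGAN": 0.10},
]
print([p["id"] for p in scaffold_hop_candidates(pairs)])  # ['a', 'c']
```

Sorting by the gap rather than by the ErG score alone puts pairs with the most different Morgan bits at the top, which is usually where the new scaffolds are.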

Finally, I got an overview of the dataset.

sns.pairplot(dfsummary[['ERG', 'MORGAN']])
plt.show()  # seaborn no longer exposes sns.plt, so use matplotlib directly

(Figure: scatter plot of ERG vs MORGAN similarity)
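One reason the two scores spread so differently is that ErG is fuzzy by design. As I understand the paper, ErG describes a molecule as pairs of pharmacophore-type nodes in a reduced graph, binned by topological distance, and each pair also adds a small "fuzz" increment to the neighbouring distance bins. The toy sketch below illustrates only that distance-binning idea for a single pair type; it is not RDKit's implementation, and the helper names are hypothetical. The 0.3 increment and the 1 to 15 distance range are what I believe are the defaults of GetErGFingerprint (fuzzIncrement, minPath, maxPath), so please check them against your RDKit version.

```python
def toy_distance_profile(distances, min_path=1, max_path=15, fuzz=0.3):
    """Toy ErG-style histogram for ONE pharmacophore pair type (hypothetical helper).

    distances: topological distances (in reduced-graph bonds) between
    feature pairs of this type, e.g. donor-acceptor pairs.
    """
    bins = [0.0] * (max_path - min_path + 1)
    for d in distances:
        if not min_path <= d <= max_path:
            continue  # pairs outside the distance range are ignored
        idx = d - min_path
        bins[idx] += 1.0
        # Fuzz: also give partial credit to the neighbouring distances
        if idx > 0:
            bins[idx - 1] += fuzz
        if idx < len(bins) - 1:
            bins[idx + 1] += fuzz
    return bins

def tanimoto(a, b):
    """Continuous Tanimoto, same formula as calc_ergfp."""
    ab = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - ab
    return ab / denom if denom else 0.0

# Same donor-acceptor pair, one bond further apart in the second molecule
crisp = tanimoto(toy_distance_profile([4], fuzz=0.0), toy_distance_profile([5], fuzz=0.0))
fuzzy = tanimoto(toy_distance_profile([4]), toy_distance_profile([5]))
print(crisp, fuzzy)  # crisp is 0.0, fuzzy is about 0.34
```

Without fuzz, shifting a feature by a single bond drops the overlap to zero; with fuzz the two profiles still overlap, which is the kind of tolerance that lets ErG match compounds with different scaffolds.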

ErG is a fuzzier similarity measure than the Morgan fingerprint.
I think the function is a useful tool for describing molecular features.
I'll try another dataset soon.
