ErGFingerprint in RDKit

Sometimes medicinal chemists turn to scaffold hopping in a drug discovery project to overcome their issues.
When I think about scaffold hopping, I consider which interactions between the molecule and the protein are key.
The set of features behind those interactions is called a pharmacophore.
Scaffold hopping is also used in me-too approaches to find new IP space.

BTW, in 2006 researchers at Lilly reported an interesting approach called ErG (extended reduced graph).
The URL is below.
http://pubs.acs.org/doi/abs/10.1021/ci050457y
In Figure 12, they show examples of molecules retrieved by similarity searches using Daylight FP and ErG FP.
It was interesting to me because compounds that are highly similar by ErG FP have low Tanimoto scores by Daylight FP.

And the algorithm is implemented in the new version of RDKit! That's cool!

So I wanted to try the function myself.
It can be called from rdReducedGraphs.
My snippet is as follows.

import os

import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from rdkit import Chem, DataStructs, RDConfig
from rdkit.Chem import AllChem, PandasTools, rdReducedGraphs
from rdkit.Chem.Draw import IPythonConsole  # draws molecules inline in a notebook

# The ErG FP is a vector of floats, not a bit vector,
# so it gets its own Tanimoto function.
def calc_ergfp( fp1, fp2 ):
    # Continuous Tanimoto: a.b / (a.a + b.b - a.b)
    numerator = np.dot( fp1, fp2 )
    denominator = np.dot( fp1, fp1 ) + np.dot( fp2, fp2 ) - numerator
    return numerator / denominator

docdir = RDConfig.RDDocsDir
sdfpath = os.path.join( docdir, 'Book/data/cdk2.sdf' )
# SDMolSupplier yields None for records it cannot parse, so skip those
mols = [ mol for mol in Chem.SDMolSupplier(sdfpath) if mol is not None ]

# Calculate ECFP4-like (Morgan, radius 2) and ErG fingerprints
morganfps = [ AllChem.GetMorganFingerprintAsBitVect(mol,2) for mol in mols ]
ergfps = [ rdReducedGraphs.GetErGFingerprint( mol ) for mol in mols ]
# Collect SMILES and similarities for every pair of molecules (j < i)
molis =[]
moljs =[]
morgantcs = [ ]
for i in range( len(morganfps) ):
    for j in range( i ):
        molis.append( Chem.MolToSmiles(mols[i]) )
        moljs.append( Chem.MolToSmiles(mols[j]) )
        tc = DataStructs.TanimotoSimilarity( morganfps[i], morganfps[j] )
        morgantcs.append( tc )

ergtcs = [ ]
for i in range( len(ergfps) ):
    for j in range( i ):
        tc = calc_ergfp( ergfps[i], ergfps[j] )
        ergtcs.append( tc )

df = pd.DataFrame( {'MORGAN':morgantcs, 'ERG':ergtcs, 'molis' : molis, 'moljs': moljs} )
PandasTools.AddMoleculeColumnToFrame( df, smilesCol='molis', molCol='ROMoli')
PandasTools.AddMoleculeColumnToFrame( df, smilesCol='moljs', molCol='ROMolj')
dfsummary = df[['ERG', "MORGAN", "ROMoli", "ROMolj"] ]
# Extract morgan tc > 0.6
dfsummary[dfsummary.MORGAN > 0.6]

Hmm, this subset contains only closely similar structures.
[Figure: pairs with Morgan Tc > 0.6]

Next, extract only the pairs with ErG Tc > 0.7.

dfsummary[dfsummary.ERG > 0.7]

This set includes pairs that look highly similar only when ErG is used.
[Figure: pairs with ErG Tc > 0.7]
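
To pull out the pairs that ErG considers similar but Morgan does not, the two cutoffs can be combined. Below is a minimal sketch in plain Python. The score values are invented for illustration and are not results from cdk2.sdf, and the Morgan cutoff of 0.4 and the helper name `hop_candidates` are my own assumptions.

```python
# Hypothetical (ERG, MORGAN) Tanimoto scores for four molecule pairs.
# Invented numbers for illustration only, not computed from cdk2.sdf.
pairs = [
    {"pair": "A-B", "ERG": 0.82, "MORGAN": 0.75},
    {"pair": "A-C", "ERG": 0.78, "MORGAN": 0.31},
    {"pair": "B-D", "ERG": 0.45, "MORGAN": 0.22},
    {"pair": "C-D", "ERG": 0.91, "MORGAN": 0.35},
]

def hop_candidates(rows, erg_min=0.7, morgan_max=0.4):
    """Keep pairs with high ErG but low Morgan similarity.

    erg_min follows the 0.7 cutoff used in the post;
    morgan_max=0.4 is an assumed value, tune it for your data.
    """
    hits = [r for r in rows if r["ERG"] > erg_min and r["MORGAN"] < morgan_max]
    # Largest gap between the two scores first
    return sorted(hits, key=lambda r: r["ERG"] - r["MORGAN"], reverse=True)

for row in hop_candidates(pairs):
    print(row["pair"], row["ERG"], row["MORGAN"])
```

With the DataFrame from the snippet, the same idea would be a boolean mask such as dfsummary[(dfsummary.ERG > 0.7) & (dfsummary.MORGAN < 0.4)].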

Finally, I got an overview of the dataset.

sns.pairplot(dfsummary)
plt.show()

[Figure: pair plot of ERG and MORGAN similarities]

ErG is a fuzzier algorithm than the Morgan method.
I think the function is a useful tool for describing molecular features.
I'll try another dataset soon.
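
A note on the similarity function in the snippet: because the ErG fingerprint holds real values rather than bits, the Tanimoto coefficient is computed in its continuous form, a.b / (a.a + b.b - a.b). Here is a small sketch in plain Python that shows the arithmetic. The vectors are hypothetical toy values, not real ErG fingerprints, and returning 0.0 for two all-zero vectors is my own convention.

```python
def continuous_tanimoto(a, b):
    """Tanimoto coefficient for real-valued vectors: a.b / (a.a + b.b - a.b)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    ab = sum(x * y for x, y in zip(a, b))
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    denom = aa + bb - ab
    # Two all-zero vectors give 0/0; returning 0.0 is an assumption of this sketch
    return ab / denom if denom else 0.0

# Hypothetical toy vectors, not real ErG fingerprints
v1 = [1.0, 0.3, 0.0, 0.3]
v2 = [1.0, 0.0, 0.3, 0.3]

print(continuous_tanimoto(v1, v1))  # identical vectors
print(continuous_tanimoto(v1, v2))  # 1.09 / (1.18 + 1.18 - 1.09)

# For 0/1 vectors the formula reduces to the usual bit-vector Tanimoto:
# one shared bit, three bits set in total
print(continuous_tanimoto([1, 1, 0], [1, 0, 1]))
```

The last call is the reason the same name, Tanimoto, is used for both the bit-vector and the real-valued case.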
