Calculate USRCAT with RDKit #RDKit

Some years ago, I posted blog about USRCAT.
USRCAT is shape based method like ROCS. And it works very fast. The code was freely available but to use the code, user need to install it.
But as you know, new version of RDKit implements this function! That is good news isn’t it.
I tried the function just now.
Source code is following.

import os
import seaborn as sns
import pandas as pd
from rdkit import Chem
from rdkit.Chem import rdBase
from rdkit.Chem import RDConfig
from rdkit.Chem import AllChem
from rdkit.Chem.rdMolDescriptors import GetUSRScore, GetUSRCAT
from rdkit.Chem import DataStructs
print( rdBase.rdkitVersion )
path = os.path.join( RDConfig.RDDocsDir, "Book/data/cdk2.sdf" )

mols = [ mol for mol in Chem.SDMolSupplier( path ) ]
for mol in mols:
    AllChem.EmbedMolecule( mol, 
                           useExpTorsionAnglePrefs = True,
                           useBasicKnowledge = True )
usrcats = [ GetUSRCAT( mol ) for mol in mols ]
fps = [ AllChem.GetMorganFingerprintAsBitVect( mol, 2 ) for mol in mols ]

data = { "tanimoto":[], "usrscore":[] }

for i in range( len( usrcats )):
    for j in range( i ):
        tc = DataStructs.TanimotoSimilarity( fps[ i ], fps[ j ] )
        score = GetUSRScore( usrcats[ i ], usrcats[ j ] )
        data["tanimoto"].append( tc )
        data["usrscore"].append( score )
        print( score, tc )
df = pd.DataFrame( data )

fig = sns.pairplot( df )
fig.savefig( 'plot.png' )

Run the code.

iwatobipen$ python
# output
0.4878222403055059 0.46296296296296297
0.2983740604270427 0.48148148148148145
0.36022943735904756 0.5660377358490566
0.3480531986117265 0.5
0.3593106395905704 0.6595744680851063
0.25662588527525304 0.6122448979591837
0.18452571918677163 0.46296296296296297
0.18534407651655047 0.5769230769230769
0.1698894448811921 0.5660377358490566
0.19927998441539707 0.6956521739130435
0.2052241644475582 0.15714285714285714
0.21930710455068858 0.10526315789473684
0.21800341857284924 0.1038961038961039

Tanimoto coeff and USRScore showed different score ( 2D vs 3D pharmacophore ). I think USRScore provides new way to estimate molecular similarity.

RDKit is really cool toolkit. I love it. ;-)


Published by iwatobipen

I'm medicinal chemist in mid size of pharmaceutical company. I love chemoinfo, cording, organic synthesis, my family.

5 thoughts on “Calculate USRCAT with RDKit #RDKit

  1. Wow, amazing blog structure! How long have you been blogging for? you make blogging glance easy. The whole glance of your website is excellent, as well as the content material!

  2. Are USRCAT or UFSRAT suited ligand -based screening on millions of compounds such as zinc database with 100 M?

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: