Some years ago, I wrote a blog post about USRCAT.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3505738/
USRCAT is a shape-based method like ROCS, and it is very fast. The code was freely available, but users had to install it separately.
But as you may know, a recent version of RDKit implements this function! That is good news, isn't it?
I tried the function just now.
The source code is as follows.
import os
import seaborn as sns
import pandas as pd
from rdkit import Chem
from rdkit.Chem import rdBase
from rdkit.Chem import RDConfig
from rdkit.Chem import AllChem
from rdkit.Chem.rdMolDescriptors import GetUSRScore, GetUSRCAT
from rdkit.Chem import DataStructs

print( rdBase.rdkitVersion )

# cdk2.sdf ships with the RDKit documentation
path = os.path.join( RDConfig.RDDocsDir, "Book/data/cdk2.sdf" )
mols = [ mol for mol in Chem.SDMolSupplier( path ) ]

# generate a 3D conformer for each molecule (ETKDG-style options)
for mol in mols:
    AllChem.EmbedMolecule( mol, useExpTorsionAnglePrefs = True, useBasicKnowledge = True )

# 3D descriptors (USRCAT) and 2D fingerprints (Morgan, radius 2)
usrcats = [ GetUSRCAT( mol ) for mol in mols ]
fps = [ AllChem.GetMorganFingerprintAsBitVect( mol, 2 ) for mol in mols ]

# compare every pair of molecules with both measures
data = { "tanimoto":[], "usrscore":[] }
for i in range( len( usrcats )):
    for j in range( i ):
        tc = DataStructs.TanimotoSimilarity( fps[ i ], fps[ j ] )
        score = GetUSRScore( usrcats[ i ], usrcats[ j ] )
        data["tanimoto"].append( tc )
        data["usrscore"].append( score )
        print( score, tc )

df = pd.DataFrame( data )
fig = sns.pairplot( df )
fig.savefig( 'plot.png' )
Run the code.
iwatobipen$ python usrcattest.py
# output
2017.09.1
0.4878222403055059 0.46296296296296297
0.2983740604270427 0.48148148148148145
0.36022943735904756 0.5660377358490566
0.3480531986117265 0.5
0.3593106395905704 0.6595744680851063
0.25662588527525304 0.6122448979591837
0.18452571918677163 0.46296296296296297
0.18534407651655047 0.5769230769230769
0.1698894448811921 0.5660377358490566
0.19927998441539707 0.6956521739130435
0.2052241644475582 0.15714285714285714
0.21930710455068858 0.10526315789473684
0.21800341857284924 0.1038961038961039
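In each output line, the first value is the USR score and the second is the Tanimoto coefficient. The two measures gave quite different values, which makes sense: the Tanimoto coefficient here compares 2D Morgan fingerprints, while the USR score compares 3D shape / pharmacophore descriptors. I think the USR score provides another way to estimate molecular similarity.
RDKit is a really cool toolkit. I love it. ;-)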
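To get a feeling for what the USR score means, here is a small pure-Python sketch of how I understand the scoring: the descriptor is split into blocks of 12 shape moments (one block for plain USR, five blocks for the 60-value USRCAT descriptor), the mean absolute difference of each block is weighted and summed, and the score is the reciprocal of one plus that sum. The function name `usr_score` and the constant `BLOCK` are my own for illustration, and I have not checked this line by line against the RDKit implementation, so please treat it as an approximation of `GetUSRScore` and not as its definition.

```python
from typing import Optional, Sequence

BLOCK = 12  # assumed number of shape moments per atom-type subset


def usr_score(d1: Sequence[float], d2: Sequence[float],
              weights: Optional[Sequence[float]] = None) -> float:
    """Return a similarity in (0, 1]; 1.0 means identical descriptors.

    Hypothetical re-implementation for illustration only.
    """
    if len(d1) != len(d2) or len(d1) == 0 or len(d1) % BLOCK != 0:
        raise ValueError("descriptors must have equal, non-zero length divisible by 12")
    n_blocks = len(d1) // BLOCK
    if weights is None:
        weights = [1.0] * n_blocks  # assumption: equal weight for every subset
    if len(weights) != n_blocks:
        raise ValueError("need one weight per block of 12 values")

    denom = 1.0
    for b, w in enumerate(weights):
        start = b * BLOCK
        # sum of absolute differences within this block
        diff = sum(abs(x - y) for x, y in
                   zip(d1[start:start + BLOCK], d2[start:start + BLOCK]))
        denom += w * diff / BLOCK  # weighted mean absolute difference
    return 1.0 / denom
```

With this form, identical descriptors score exactly 1.0 and the score decays toward 0 as the descriptors diverge. That is a different scale from the Tanimoto coefficient, so I would compare rankings between the two measures and not the raw numbers.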
Wow, amazing blog structure! How long have you been blogging? You make blogging look easy. The whole look of your website is excellent, as well as the content!
Hi. Thank you for your comment. ;-) I started this blog in 2012. Are you interested in chemoinformatics / programming?
Are USRCAT or UFSRAT suited to ligand-based screening of millions of compounds, such as the ZINC database with 100 M entries?
Hi, I recommend reading the article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3505738/
Thank you