A new feature of RDKit 2019.09 #RDKit #Chemoinformatics

Last week, I learned that a new version of RDKit is available and can be installed with conda!

You can check the new features in the release notes in the original repository:
https://github.com/rdkit/rdkit/blob/master/ReleaseNotes.md

I have now installed RDKit 2019.09 and am reading the documentation.

The MolHash functionality, originally developed by NextMove Software, is implemented in this version.
RDKit documentation: https://www.rdkit.org/docs/source/rdkit.Chem.rdMolHash.html
NextMove Software documentation: https://nextmovesoftware.github.io/molhash/commandline.html#usage

MolHash can generate several kinds of hashes for a molecule. I tried the module on one molecule; the code is below.

from rdkit import Chem
from rdkit import rdBase
# confirm which RDKit version is installed
print(rdBase.rdkitVersion)
> 2019.09.1

from rdkit.Chem import rdMolHash
# read sample molecule from SMILES
tofa = Chem.MolFromSmiles('C[C@@H]1CCN(C[C@@H]1N(C)C2=NC=NC3=C2C=CN3)C(=O)CC#N')
# Generate molhash
print(rdMolHash.GenerateMoleculeHashString(tofa))
> 100-23-25-Hx54Xg-jb4lRQ-VbT4fg-F92gVQ-wedGAQ-s3wCEg

The MolHash module provides several hash functions. They are available as a dictionary through the ‘names’ attribute of HashFunction, and calling the MolHash function with one of them returns the corresponding hash string.

rdMolHash.HashFunction.names                                                         

{'AnonymousGraph': rdkit.Chem.rdMolHash.HashFunction.AnonymousGraph,
 'ElementGraph': rdkit.Chem.rdMolHash.HashFunction.ElementGraph,
 'CanonicalSmiles': rdkit.Chem.rdMolHash.HashFunction.CanonicalSmiles,
 'MurckoScaffold': rdkit.Chem.rdMolHash.HashFunction.MurckoScaffold,
 'ExtendedMurcko': rdkit.Chem.rdMolHash.HashFunction.ExtendedMurcko,
 'MolFormula': rdkit.Chem.rdMolHash.HashFunction.MolFormula,
 'AtomBondCounts': rdkit.Chem.rdMolHash.HashFunction.AtomBondCounts,
 'DegreeVector': rdkit.Chem.rdMolHash.HashFunction.DegreeVector,
 'Mesomer': rdkit.Chem.rdMolHash.HashFunction.Mesomer,
 'HetAtomTautomer': rdkit.Chem.rdMolHash.HashFunction.HetAtomTautomer,
 'HetAtomProtomer': rdkit.Chem.rdMolHash.HashFunction.HetAtomProtomer,
 'RedoxPair': rdkit.Chem.rdMolHash.HashFunction.RedoxPair,
 'Regioisomer': rdkit.Chem.rdMolHash.HashFunction.Regioisomer,
 'NetCharge': rdkit.Chem.rdMolHash.HashFunction.NetCharge,
 'SmallWorldIndexBR': rdkit.Chem.rdMolHash.HashFunction.SmallWorldIndexBR,
 'SmallWorldIndexBRL': rdkit.Chem.rdMolHash.HashFunction.SmallWorldIndexBRL,
 'ArthorSubstructureOrder': rdkit.Chem.rdMolHash.HashFunction.ArthorSubstructureOrder}
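
When only one hash is needed, a single entry can be picked out of the dictionary, or the enum member can be passed directly. The following is a minimal sketch using the same molecule as above; the variable names (formula_func, formula, counts) are only for illustration.

```python
from rdkit import Chem
from rdkit.Chem import rdMolHash

# Same sample molecule as in the post
tofa = Chem.MolFromSmiles('C[C@@H]1CCN(C[C@@H]1N(C)C2=NC=NC3=C2C=CN3)C(=O)CC#N')

# Option 1: look up the hash function by name in the 'names' dictionary
formula_func = rdMolHash.HashFunction.names['MolFormula']
formula = rdMolHash.MolHash(tofa, formula_func)

# Option 2: reference the enum member directly
counts = rdMolHash.MolHash(tofa, rdMolHash.HashFunction.AtomBondCounts)

print(formula)
print(counts)
```

Both calls return plain Python strings, so the results can be used as dictionary keys or stored in a table column.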

Generate MolHash with all defined hash functions.

# molhashf maps each hash function name to its enum member
molhashf = rdMolHash.HashFunction.names
for k, v in molhashf.items():
    print(k, rdMolHash.MolHash(tofa, v))

>
AnonymousGraph ****(*)*1**[*@@](*)[*@@](*(*)*2****3****23)*1
ElementGraph C[C@@H]1CCN(C(O)CCN)C[C@@H]1N(C)C1NCNC2NCCC12
CanonicalSmiles C[C@@H]1CCN(C(=O)CC#N)C[C@@H]1N(C)c1ncnc2[nH]ccc12
MurckoScaffold c1nc(N[C@@H]2CCCNC2)c2cc[nH]c2n1
ExtendedMurcko *[C@@H]1CCN(*)C[C@@H]1N(*)c1ncnc2[nH]ccc12
MolFormula C16H20N6O
AtomBondCounts 23,25
DegreeVector 0,8,11,4
Mesomer C[C@@H]1CCN([C]([O])C[C][N])C[C@@H]1N(C)[C]1[N][CH][N][C]2N[CH][CH][C]12_0
HetAtomTautomer C[C@@H]1CCN([C]([O])C[C][N])C[C@@H]1N(C)[C]1[N][CH][N][C]2[N][CH][CH][C]21_1_0
HetAtomProtomer C[C@@H]1CCN([C]([O])C[C][N])C[C@@H]1N(C)[C]1[N][CH][N][C]2[N][CH][CH][C]21_1
RedoxPair C[C@@H]1CCN([C]([O])C[C][N])C[C@@H]1N(C)[C]1[N][CH][N][C]2N[CH][CH][C]12
Regioisomer *C.*C(=O)CC#N.*N(*)*.C.C1CNC[C@H2][C@H2]1.c1ncc2cc[nH]c2n1
NetCharge 0
SmallWorldIndexBR B25R3
SmallWorldIndexBRL B25R3L11
ArthorSubstructureOrder 001700190100100007000092000000

Details of these functions are described on the NextMove web page.
https://nextmovesoftware.github.io/molhash/introduction.html
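
Because each hash is just a string, one possible use is as a grouping key. The sketch below is not from the RDKit or NextMove documentation; group_by_hash is a hypothetical helper written for this post, and the three SMILES are arbitrary examples. It groups molecules that share the same MolFormula hash.

```python
from collections import defaultdict

from rdkit import Chem
from rdkit.Chem import rdMolHash


def group_by_hash(smiles_list, hash_function):
    """Group SMILES strings by the value of one MolHash function.

    group_by_hash is a hypothetical helper, not part of RDKit.
    """
    groups = defaultdict(list)
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            # Skip strings that RDKit cannot parse
            continue
        key = rdMolHash.MolHash(mol, hash_function)
        groups[key].append(smi)
    return dict(groups)


# Ethanol and dimethyl ether are isomers, so they share a molecular formula
smiles = ['CCO', 'COC', 'c1ccccc1']
by_formula = group_by_hash(smiles, rdMolHash.HashFunction.MolFormula)
for key, members in by_formula.items():
    print(key, members)
```

Passing a different member, such as HashFunction.MurckoScaffold or HashFunction.HetAtomTautomer, would group the same list by scaffold or by a tautomer-insensitive form instead, without changing the helper.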

It is interesting that these functions provide several kinds of information about a molecule. I’ll think about how to apply them to chemoinformatics tasks.

Published by iwatobipen

I'm a medicinal chemist at a mid-sized pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.
