A new feature of RDKit 2019.09 #RDKit #Chemoinformatics

Last week, I learned that a new version of RDKit is available and can be installed with conda!

You can check the new features in the ReleaseNotes in the original repo:
https://github.com/rdkit/rdkit/blob/master/ReleaseNotes.md

I have now installed RDKit 2019.09 and am reading the documentation.

The MolHash functionality, originally developed by NextMove Software, is implemented in this version.
RDKit documentation: https://www.rdkit.org/docs/source/rdkit.Chem.rdMolHash.html
NextMove Software: https://nextmovesoftware.github.io/molhash/commandline.html#usage

MolHash can generate several kinds of hashes for a molecule. I tried the module on one molecule. The code is below.

from rdkit import Chem
from rdkit import rdBase
# check the installed RDKit version
print(rdBase.rdkitVersion)
> 2019.09.1

from rdkit.Chem import rdMolHash                                                             
# read sample molecule from SMILES
tofa = Chem.MolFromSmiles('C[C@@H]1CCN(C[C@@H]1N(C)C2=NC=NC3=C2C=CN3)C(=O)CC#N')             
# Generate molhash
print(rdMolHash.GenerateMoleculeHashString(tofa))                                            
100-23-25-Hx54Xg-jb4lRQ-VbT4fg-F92gVQ-wedGAQ-s3wCEg

The MolHash module provides several hash functions. They are available as a dictionary through the 'names' attribute of HashFunction, and by passing one of these functions to the MolHash function I could get the corresponding hash string.

rdMolHash.HashFunction.names                                                         

{'AnonymousGraph': rdkit.Chem.rdMolHash.HashFunction.AnonymousGraph,
 'ElementGraph': rdkit.Chem.rdMolHash.HashFunction.ElementGraph,
 'CanonicalSmiles': rdkit.Chem.rdMolHash.HashFunction.CanonicalSmiles,
 'MurckoScaffold': rdkit.Chem.rdMolHash.HashFunction.MurckoScaffold,
 'ExtendedMurcko': rdkit.Chem.rdMolHash.HashFunction.ExtendedMurcko,
 'MolFormula': rdkit.Chem.rdMolHash.HashFunction.MolFormula,
 'AtomBondCounts': rdkit.Chem.rdMolHash.HashFunction.AtomBondCounts,
 'DegreeVector': rdkit.Chem.rdMolHash.HashFunction.DegreeVector,
 'Mesomer': rdkit.Chem.rdMolHash.HashFunction.Mesomer,
 'HetAtomTautomer': rdkit.Chem.rdMolHash.HashFunction.HetAtomTautomer,
 'HetAtomProtomer': rdkit.Chem.rdMolHash.HashFunction.HetAtomProtomer,
 'RedoxPair': rdkit.Chem.rdMolHash.HashFunction.RedoxPair,
 'Regioisomer': rdkit.Chem.rdMolHash.HashFunction.Regioisomer,
 'NetCharge': rdkit.Chem.rdMolHash.HashFunction.NetCharge,
 'SmallWorldIndexBR': rdkit.Chem.rdMolHash.HashFunction.SmallWorldIndexBR,
 'SmallWorldIndexBRL': rdkit.Chem.rdMolHash.HashFunction.SmallWorldIndexBRL,
 'ArthorSubstructureOrder': rdkit.Chem.rdMolHash.HashFunction.ArthorSubstructureOrder}
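
As a small sketch of that lookup, a single hash function can be picked from the dictionary by name and passed to MolHash. The variable names (hash_func, formula) are only for illustration, and the molecule is the same one used earlier.

```python
from rdkit import Chem
from rdkit.Chem import rdMolHash

# Same sample molecule as above
tofa = Chem.MolFromSmiles('C[C@@H]1CCN(C[C@@H]1N(C)C2=NC=NC3=C2C=CN3)C(=O)CC#N')

# Look up one hash function by its name in the 'names' dictionary
hash_func = rdMolHash.HashFunction.names['MolFormula']

# MolHash returns the hash as a plain string
formula = rdMolHash.MolHash(tofa, hash_func)
print(formula)
```

The dictionary entry is the same object as the attribute rdMolHash.HashFunction.MolFormula, so either spelling should work.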

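Generate hashes with all defined hash functions.

# dictionary mapping hash function names to HashFunction values
molhashf = rdMolHash.HashFunction.names
for k, v in molhashf.items():
    print(k, rdMolHash.MolHash(tofa, v))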

>
AnonymousGraph ****(*)*1**[*@@](*)[*@@](*(*)*2****3****23)*1
ElementGraph C[C@@H]1CCN(C(O)CCN)C[C@@H]1N(C)C1NCNC2NCCC12
CanonicalSmiles C[C@@H]1CCN(C(=O)CC#N)C[C@@H]1N(C)c1ncnc2[nH]ccc12
MurckoScaffold c1nc(N[C@@H]2CCCNC2)c2cc[nH]c2n1
ExtendedMurcko *[C@@H]1CCN(*)C[C@@H]1N(*)c1ncnc2[nH]ccc12
MolFormula C16H20N6O
AtomBondCounts 23,25
DegreeVector 0,8,11,4
Mesomer C[C@@H]1CCN([C]([O])C[C][N])C[C@@H]1N(C)[C]1[N][CH][N][C]2N[CH][CH][C]12_0
HetAtomTautomer C[C@@H]1CCN([C]([O])C[C][N])C[C@@H]1N(C)[C]1[N][CH][N][C]2[N][CH][CH][C]21_1_0
HetAtomProtomer C[C@@H]1CCN([C]([O])C[C][N])C[C@@H]1N(C)[C]1[N][CH][N][C]2[N][CH][CH][C]21_1
RedoxPair C[C@@H]1CCN([C]([O])C[C][N])C[C@@H]1N(C)[C]1[N][CH][N][C]2N[CH][CH][C]12
Regioisomer *C.*C(=O)CC#N.*N(*)*.C.C1CNC[C@H2][C@H2]1.c1ncc2cc[nH]c2n1
NetCharge 0
SmallWorldIndexBR B25R3
SmallWorldIndexBRL B25R3L11
ArthorSubstructureOrder 001700190100100007000092000000
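
Some of the count-based hashes can be cross-checked against each other with plain Python. The sketch below copies three strings from the output above. It assumes, from my reading of the NextMove description, that DegreeVector lists the number of atoms with 4, 3, 2 and 1 neighbours, and that the B, R and L fields mean bonds, rings and degree-2 (linker) atoms. Please treat those interpretations as assumptions, not as a specification.

```python
import re

# Strings copied from the output above
atom_bond_counts = "23,25"
degree_vector = "0,8,11,4"
small_world_brl = "B25R3L11"

n_atoms, n_bonds = (int(x) for x in atom_bond_counts.split(","))
degree_counts = [int(x) for x in degree_vector.split(",")]

# Assumption: entries are counts of atoms with 4, 3, 2 and 1 neighbours
degree_sum = sum(d * n for d, n in zip((4, 3, 2, 1), degree_counts))

match = re.fullmatch(r"B(\d+)R(\d+)L(\d+)", small_world_brl)
sw_bonds, sw_rings, sw_linkers = (int(x) for x in match.groups())

print(sum(degree_counts) == n_atoms)      # every atom is counted once
print(degree_sum == 2 * n_bonds)          # each bond touches two atoms
print(sw_bonds == n_bonds)                # B field agrees with AtomBondCounts
print(sw_rings == n_bonds - n_atoms + 1)  # ring count of a connected graph
print(sw_linkers == degree_counts[2])     # L field agrees with degree-2 count
```

All five checks print True for this molecule, so the assumed reading is at least consistent with these numbers.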

Details of these functions are described on the NextMove web page:
https://nextmovesoftware.github.io/molhash/introduction.html

It is interesting that these functions provide so many kinds of information. I'll think about how to apply the method to chemoinformatics tasks.
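
One possible application is grouping molecules that share the same hash. The following is only a minimal sketch: group_by_hash is a hypothetical helper I wrote for illustration, not part of RDKit, and the three SMILES are toy inputs.

```python
from collections import defaultdict

from rdkit import Chem
from rdkit.Chem import rdMolHash


def group_by_hash(smiles_list, hash_function):
    """Group SMILES strings by the hash of the parsed molecule (hypothetical helper)."""
    groups = defaultdict(list)
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip SMILES that RDKit cannot parse
        groups[rdMolHash.MolHash(mol, hash_function)].append(smi)
    return dict(groups)


# Benzene and pyridine share the same six-membered ring graph;
# toluene has one extra atom, so it should fall into its own group
smiles = ['c1ccccc1', 'c1ccncc1', 'Cc1ccccc1']
groups = group_by_hash(smiles, rdMolHash.HashFunction.AnonymousGraph)
for key, members in groups.items():
    print(key, members)
```

Swapping AnonymousGraph for another entry such as MurckoScaffold or HetAtomTautomer would change what counts as "the same" molecule, which is the main design choice in this kind of grouping.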