Last week, I got the good news that a new version of RDKit is available and can be installed with conda!
You can check the new features in the release notes in the original repo:
https://github.com/rdkit/rdkit/blob/master/ReleaseNotes.md
I have installed RDKit 2019.09 and am now reading the documentation.
The MolHash function, originally developed by NextMove Software, is implemented in this version.
RDKit documentation: https://www.rdkit.org/docs/source/rdkit.Chem.rdMolHash.html
NextMove Software: https://nextmovesoftware.github.io/molhash/commandline.html#usage
MolHash can generate several kinds of hashes for a molecule. I tried the module on one molecule. The code is below.
from rdkit import Chem
from rdkit.Chem import rdBase
print(rdBase.rdkitVersion)
> 2019.09.1

from rdkit.Chem import rdMolHash
# read sample molecule from SMILES
tofa = Chem.MolFromSmiles('C[C@@H]1CCN(C[C@@H]1N(C)C2=NC=NC3=C2C=CN3)C(=O)CC#N')
# Generate molhash
print(rdMolHash.GenerateMoleculeHashString(tofa))
> 100-23-25-Hx54Xg-jb4lRQ-VbT4fg-F92gVQ-wedGAQ-s3wCEg
The MolHash module provides several hash functions. They are available as a dictionary through the ‘names’ attribute of HashFunction, and by passing one of them to the MolHash function together with a molecule, I could get the hash string.
rdMolHash.HashFunction.names
> {'AnonymousGraph': rdkit.Chem.rdMolHash.HashFunction.AnonymousGraph,
 'ElementGraph': rdkit.Chem.rdMolHash.HashFunction.ElementGraph,
 'CanonicalSmiles': rdkit.Chem.rdMolHash.HashFunction.CanonicalSmiles,
 'MurckoScaffold': rdkit.Chem.rdMolHash.HashFunction.MurckoScaffold,
 'ExtendedMurcko': rdkit.Chem.rdMolHash.HashFunction.ExtendedMurcko,
 'MolFormula': rdkit.Chem.rdMolHash.HashFunction.MolFormula,
 'AtomBondCounts': rdkit.Chem.rdMolHash.HashFunction.AtomBondCounts,
 'DegreeVector': rdkit.Chem.rdMolHash.HashFunction.DegreeVector,
 'Mesomer': rdkit.Chem.rdMolHash.HashFunction.Mesomer,
 'HetAtomTautomer': rdkit.Chem.rdMolHash.HashFunction.HetAtomTautomer,
 'HetAtomProtomer': rdkit.Chem.rdMolHash.HashFunction.HetAtomProtomer,
 'RedoxPair': rdkit.Chem.rdMolHash.HashFunction.RedoxPair,
 'Regioisomer': rdkit.Chem.rdMolHash.HashFunction.Regioisomer,
 'NetCharge': rdkit.Chem.rdMolHash.HashFunction.NetCharge,
 'SmallWorldIndexBR': rdkit.Chem.rdMolHash.HashFunction.SmallWorldIndexBR,
 'SmallWorldIndexBRL': rdkit.Chem.rdMolHash.HashFunction.SmallWorldIndexBRL,
 'ArthorSubstructureOrder': rdkit.Chem.rdMolHash.HashFunction.ArthorSubstructureOrder}
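To pick out a single hash function, it should be enough to look it up by name in the ‘names’ dictionary, or to use the enum member directly. The snippet below is a minimal sketch of that idea, using the same sample molecule as above; the variable names (formula_func, formula_hash, charge_hash) are my own and only for illustration.

```python
from rdkit import Chem
from rdkit.Chem import rdMolHash

# Same sample molecule as in the text above
tofa = Chem.MolFromSmiles('C[C@@H]1CCN(C[C@@H]1N(C)C2=NC=NC3=C2C=CN3)C(=O)CC#N')

# Look up a hash function by its name in the 'names' dictionary
formula_func = rdMolHash.HashFunction.names['MolFormula']
formula_hash = rdMolHash.MolHash(tofa, formula_func)

# Or refer to the enum member directly
charge_hash = rdMolHash.MolHash(tofa, rdMolHash.HashFunction.NetCharge)

print(formula_hash)
print(charge_hash)
```

Both ways refer to the same enum value, so which one to use is mostly a matter of taste. The dictionary lookup is handy when the function name comes from a config file or a command line argument.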
Next, I generated the MolHash with every defined hash function.
molhashf = rdMolHash.HashFunction.names
for k, v in molhashf.items():
    print(k, rdMolHash.MolHash(tofa, v))
> AnonymousGraph ****(*)*1**[*@@](*)[*@@](*(*)*2****3****23)*1
ElementGraph C[C@@H]1CCN(C(O)CCN)C[C@@H]1N(C)C1NCNC2NCCC12
CanonicalSmiles C[C@@H]1CCN(C(=O)CC#N)C[C@@H]1N(C)c1ncnc2[nH]ccc12
MurckoScaffold c1nc(N[C@@H]2CCCNC2)c2cc[nH]c2n1
ExtendedMurcko *[C@@H]1CCN(*)C[C@@H]1N(*)c1ncnc2[nH]ccc12
MolFormula C16H20N6O
AtomBondCounts 23,25
DegreeVector 0,8,11,4
Mesomer C[C@@H]1CCN([C]([O])C[C][N])C[C@@H]1N(C)[C]1[N][CH][N][C]2N[CH][CH][C]12_0
HetAtomTautomer C[C@@H]1CCN([C]([O])C[C][N])C[C@@H]1N(C)[C]1[N][CH][N][C]2[N][CH][CH][C]21_1_0
HetAtomProtomer C[C@@H]1CCN([C]([O])C[C][N])C[C@@H]1N(C)[C]1[N][CH][N][C]2[N][CH][CH][C]21_1
RedoxPair C[C@@H]1CCN([C]([O])C[C][N])C[C@@H]1N(C)[C]1[N][CH][N][C]2N[CH][CH][C]12
Regioisomer *C.*C(=O)CC#N.*N(*)*.C.C1CNC[C@H2][C@H2]1.c1ncc2cc[nH]c2n1
NetCharge 0
SmallWorldIndexBR B25R3
SmallWorldIndexBRL B25R3L11
ArthorSubstructureOrder 001700190100100007000092000000
Details of these functions are described on the NextMove web page.
https://nextmovesoftware.github.io/molhash/introduction.html
It is interesting that so many different kinds of information can be obtained with these functions. I’ll think about how to apply the method to chemoinformatics tasks.
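As one possible application, molecules could be grouped by a shared hash, for example to collect isomers with the same formula or compounds with the same scaffold. The code below is only a rough sketch under my own assumptions: group_by_hash is a hypothetical helper I wrote for illustration (it is not part of RDKit), and the SMILES lists are toy examples.

```python
from collections import defaultdict

from rdkit import Chem
from rdkit.Chem import rdMolHash


def group_by_hash(smiles_list, hash_func):
    """Group SMILES strings by the hash produced with hash_func.

    Hypothetical helper for illustration; not an RDKit function.
    """
    groups = defaultdict(list)
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            # Skip SMILES that RDKit cannot parse
            continue
        groups[rdMolHash.MolHash(mol, hash_func)].append(smi)
    return dict(groups)


# Ethanol and dimethyl ether share a formula; propane does not
isomers = ['CCO', 'COC', 'CCC']
by_formula = group_by_hash(isomers, rdMolHash.HashFunction.MolFormula)

# Toluene and ethylbenzene share a benzene scaffold; methylcyclohexane does not
rings = ['Cc1ccccc1', 'CCc1ccccc1', 'CC1CCCCC1']
by_scaffold = group_by_hash(rings, rdMolHash.HashFunction.MurckoScaffold)

for key, members in by_formula.items():
    print(key, members)
for key, members in by_scaffold.items():
    print(key, members)
```

Because the hash function is passed in as an argument, the same helper can be reused with any entry of HashFunction.names, so switching from formula-based to scaffold-based grouping is a one-word change.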