One of the hot topics in the new version of RDKit is the integration of MolVS, a tool for molecular standardization.
Molecular standardization is important not only for chemists but also for chemoinformaticians, because tautomers give different representations of the same molecule, and this can affect the accuracy of QSAR models.
I wrote about a molecular standardization tool named ‘MolVS’ before; at that time MolVS was a separate library. Now we can call MolVS from RDKit itself.
I used 2-hydroxypyridine as an example.
from rdkit import Chem
from rdkit import rdBase
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw
rdBase.rdkitVersion
from rdkit.Chem import MolStandardize
smi1 = 'c1cccc(O)n1'
mol1 = Chem.MolFromSmiles(smi1)
smi2 = 'C1=CC(=O)NC=C1'
mol2 = Chem.MolFromSmiles(smi2)
Draw.MolsToGridImage([mol1, mol2])
The two molecules have the same formula but different structures.
Standardizing them is very simple.
stsmi1 = MolStandardize.canonicalize_tautomer_smiles(smi1)
stsmi2 = MolStandardize.canonicalize_tautomer_smiles(smi2)
Draw.MolsToGridImage([Chem.MolFromSmiles(stsmi1), Chem.MolFromSmiles(stsmi2)])
It is also easy to get the possible tautomers from a SMILES string. The MolStandardize module has many other functions as well. I think it is very useful for data preprocessing.
tautomers = MolStandardize.enumerate_tautomers_smiles(smi1)
print(tautomers)
>{'O=c1cccc[nH]1', 'Oc1ccccn1'}
dir(MolStandardize)
['MolVSError', 'StandardizeError', 'Standardizer', 'ValidateError', 'Validator', ,,,, 'canonicalize_tautomer_smiles', 'charge', 'division', 'enumerate_tautomers_smiles', 'errors', 'fragment', 'log', 'logging', 'metal', 'normalize', 'print_function', 'standardize', 'standardize_smiles', 'tautomer', 'unicode_literals', 'utils', 'validate', 'validate_smiles', 'validations']
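As a rough sketch of how this could help in data preprocessing, the snippet below groups SMILES strings that map to the same canonical tautomer, so duplicates entered as different tautomers can be merged. The helper `group_by_canonical_tautomer` and the stand-in `stub_canonicalize` are hypothetical names made up for this illustration; the stub's lookup table is only there so the code runs without RDKit, and its mapping is illustrative rather than verified RDKit output.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_canonical_tautomer(
    smiles_list: Iterable[str],
    canonicalize: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Group input SMILES by the canonical tautomer SMILES they map to."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for smi in smiles_list:
        # Inputs that share a canonical form end up in the same bucket.
        groups[canonicalize(smi)].append(smi)
    return dict(groups)


# Hypothetical stand-in canonicalizer so the sketch runs without RDKit.
# The mapping is illustrative only, not verified RDKit output.
_STUB_TABLE = {
    'c1cccc(O)n1': 'O=c1cccc[nH]1',
    'C1=CC(=O)NC=C1': 'O=c1cccc[nH]1',
}


def stub_canonicalize(smi: str) -> str:
    # Unknown SMILES are returned unchanged.
    return _STUB_TABLE.get(smi, smi)


groups = group_by_canonical_tautomer(
    ['c1cccc(O)n1', 'C1=CC(=O)NC=C1', 'CCO'], stub_canonicalize
)
print(groups)
```

With RDKit available, MolStandardize.canonicalize_tautomer_smiles could presumably be passed in place of stub_canonicalize, since it also takes a SMILES string and returns one.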
I uploaded the snippet to my repo. It can be viewed at the following URL.
https://nbviewer.jupyter.org/github/iwatobipen/chemo_info/blob/master/rdkit_notebook/new_fp_generator.ipynb
P.S.
I will go to Kumamoto tomorrow to take part in a chemoinformatics conference. I hope to have many fruitful discussions.
Hi! I’m trying to use your script but I get this error:
from rdkit.Chem import MolStandardize
ImportError: cannot import name 'MolStandardize'
Do you know why I’m not able to import that module?
Thanks!