standardization of tautomers #RDKit

One of the hot topics in the new version of RDKit is the integration of MolVS, a tool for molecular standardization.
Molecular standardization is important not only for chemists but also for chemoinformaticians, because the same compound can be written as different tautomers, and inconsistent representations can affect the accuracy of QSAR models.
I wrote about the molecular standardization tool 'MolVS' before; at that time MolVS was a separate library. Now we can call MolVS from native RDKit.
I used 2-hydroxypyridine as an example.

from rdkit import Chem
from rdkit import rdBase
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw

rdBase.rdkitVersion

from rdkit.Chem import MolStandardize
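If `from rdkit.Chem import MolStandardize` fails with an ImportError, the installed RDKit is most likely older than the MolVS integration. The sketch below is a hypothetical troubleshooting helper (`report_standardize_support` is my own name, not part of RDKit): it prints the installed version and reports which standardization modules import cleanly. It assumes that newer releases ship the C++-backed `rdMolStandardize` module, and that the MolVS-style Python helpers used in this post (such as `canonicalize_tautomer_smiles`) may be missing from recent releases, so it checks for that function separately.

```python
import importlib

from rdkit import rdBase


def report_standardize_support():
    """Print the RDKit version and which standardization APIs can be imported.

    Hypothetical troubleshooting helper; it is not part of RDKit itself.
    """
    print("RDKit version:", rdBase.rdkitVersion)
    status = {}
    for name in ("rdkit.Chem.MolStandardize",
                 "rdkit.Chem.MolStandardize.rdMolStandardize"):
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    # The MolVS-style helper used in this post may be absent even if the package exists
    try:
        legacy = importlib.import_module("rdkit.Chem.MolStandardize")
        status["canonicalize_tautomer_smiles"] = hasattr(legacy, "canonicalize_tautomer_smiles")
    except ImportError:
        status["canonicalize_tautomer_smiles"] = False
    for name, ok in status.items():
        print(f"{name}: {'available' if ok else 'missing'}")
    return status


status = report_standardize_support()
```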

smi1 = 'c1cccc(O)n1'
mol1 = Chem.MolFromSmiles(smi1)
smi2 = 'C1=CC(=O)NC=C1'
mol2 = Chem.MolFromSmiles(smi2)

Draw.MolsToGridImage([mol1, mol2])

Same molecular formula (C5H5NO), but different structures: 2-hydroxypyridine and its tautomer 2-pyridone.
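The "same formula, different structure" point can also be checked in code. This is a small sketch using RDKit's `CalcMolFormula` and canonical SMILES; the `inputs` dictionary and its labels are my own. It shows that the two tautomers share one formula but still get two different canonical SMILES, so a naive SMILES-based deduplication would count them as two compounds.

```python
from rdkit import Chem
from rdkit.Chem.rdMolDescriptors import CalcMolFormula

# The two inputs used in this post
inputs = {
    "2-hydroxypyridine": "c1cccc(O)n1",
    "2-pyridone": "C1=CC(=O)NC=C1",  # Kekulé SMILES; RDKit aromatizes the ring
}

formulas = {}
canonical = {}
for name, smi in inputs.items():
    mol = Chem.MolFromSmiles(smi)
    formulas[name] = CalcMolFormula(mol)     # Hill-order formula, implicit Hs included
    canonical[name] = Chem.MolToSmiles(mol)  # canonical SMILES of this exact tautomer
    print(f"{name:18s} {formulas[name]:8s} {canonical[name]}")

# Same formula, yet the canonical SMILES differ
print(len(set(formulas.values())) == 1, len(set(canonical.values())) == 2)
```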

Standardizing them is very simple.

stsmi1 = MolStandardize.canonicalize_tautomer_smiles(smi1)
stsmi2 = MolStandardize.canonicalize_tautomer_smiles(smi2)

Draw.MolsToGridImage([Chem.MolFromSmiles(stsmi1), Chem.MolFromSmiles(stsmi2)])
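`canonicalize_tautomer_smiles` comes from the Python MolVS port available when this post was written. In newer RDKit releases, a comparable step is available through `TautomerEnumerator.Canonicalize()` in the `rdMolStandardize` module. A minimal sketch, assuming such a release is installed (the helper name `canonical_tautomer_smiles` is hypothetical). Its scoring rules are not guaranteed to pick the same tautomer as the old Python code, so I only check that both inputs collapse to one SMILES rather than showing a specific result.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# One enumerator can be reused for many molecules
enumerator = rdMolStandardize.TautomerEnumerator()


def canonical_tautomer_smiles(smi):
    """Return the SMILES of the tautomer that TautomerEnumerator picks as canonical."""
    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smi!r}")
    return Chem.MolToSmiles(enumerator.Canonicalize(mol))


std1 = canonical_tautomer_smiles("c1cccc(O)n1")
std2 = canonical_tautomer_smiles("C1=CC(=O)NC=C1")
print(std1, std2, std1 == std2)
```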

It is also easy to enumerate the possible tautomers of a SMILES string. The MolStandardize module also provides many other functions (standardization, validation, charge and fragment handling), so I think it is very useful for data preprocessing.

tautomers = MolStandardize.enumerate_tautomers_smiles(smi1)
print(tautomers)
# {'O=c1cccc[nH]1', 'Oc1ccccn1'}
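The legacy helper returns a Python set of SMILES. In recent RDKit releases, a similar enumeration can be sketched with `rdMolStandardize.TautomerEnumerator().Enumerate()`, assuming that API is installed; `enumerate_tautomer_smiles` is a hypothetical wrapper name. The enumerator's rule set may generate more tautomers than the two shown above, so this sketch prints whatever it returns instead of promising an exact set.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def enumerate_tautomer_smiles(smi):
    """Return the set of SMILES for all tautomers RDKit's enumerator generates."""
    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smi!r}")
    enumerator = rdMolStandardize.TautomerEnumerator()
    # Enumerate() returns an iterable of tautomer molecules (the input form included)
    return {Chem.MolToSmiles(t) for t in enumerator.Enumerate(mol)}


tautomers = enumerate_tautomer_smiles("c1cccc(O)n1")
print(sorted(tautomers))
```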
dir(MolStandardize)
['MolVSError',
 'StandardizeError',
 'Standardizer',
 'ValidateError',
 'Validator',
 ...
 'canonicalize_tautomer_smiles',
 'charge',
 'division',
 'enumerate_tautomers_smiles',
 'errors',
 'fragment',
 'log',
 'logging',
 'metal',
 'normalize',
 'print_function',
 'standardize',
 'standardize_smiles',
 'tautomer',
 'unicode_literals',
 'utils',
 'validate',
 'validate_smiles',
 'validations']
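To illustrate the data-preprocessing use case, here is a sketch of a per-molecule cleanup pipeline built from `rdMolStandardize` functions (`Cleanup`, `FragmentParent`, `Uncharger`, `TautomerEnumerator`), assuming a recent RDKit. The function name `preprocess_smiles` and the order of steps are my own choices, not a workflow prescribed by RDKit or MolVS. With this order, a sodium salt and both neutral tautomers of 2-hydroxypyridine should all map to one SMILES.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

uncharger = rdMolStandardize.Uncharger()
tautomer_enumerator = rdMolStandardize.TautomerEnumerator()


def preprocess_smiles(smi):
    """Hypothetical QSAR preprocessing: one standardized SMILES per input, or None."""
    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        return None                                # unparsable input
    mol = rdMolStandardize.Cleanup(mol)            # remove Hs, disconnect metals, normalize, reionize
    mol = rdMolStandardize.FragmentParent(mol)     # keep the parent fragment (drops counter-ions)
    mol = uncharger.uncharge(mol)                  # neutralize charges where possible
    mol = tautomer_enumerator.Canonicalize(mol)    # pick one canonical tautomer
    return Chem.MolToSmiles(mol)


for smi in ["c1cccc(O)n1", "C1=CC(=O)NC=C1", "[O-]c1ccccn1.[Na+]"]:
    print(smi, "->", preprocess_smiles(smi))
```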

I uploaded the notebook to my repo. You can view it at the following URL:
https://nbviewer.jupyter.org/github/iwatobipen/chemo_info/blob/master/rdkit_notebook/new_fp_generator.ipynb

P.S.
Tomorrow I will go to Kumamoto to participate in a chemoinformatics conference. I hope to have many fruitful discussions.


Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.

One thought on “standardization of tautomers #RDKit”

  1. Hi! I’m trying to use your script but I get this error:
    from rdkit.Chem import MolStandardize
    ImportError: cannot import name 'MolStandardize'
    Do you know why I’m not able to import that module?
    Thanks!
