Generate possible list of SMLIES with RDKit #RDKit

In the computer vision, it is often used data augmentation technique for getting large data set. On the other hand, Canonical SMILES representations are used in chemoinformatics area.
RDKit UGM in last year, Dr. Esben proposed new approach for RNN with SMILES. He expanded 602 training molecules to almost 8000 molecules with different smiles representation technique.

Click to access Bjerrum_RDKitUGM_Smiles_Enumeration_for_RNN.pdf

This approach seems works well.
In the UGM hackathon at this year, this random smiles generate function is implemented and it can call from new version of RDKit!
I appreciate rdkit developers!

It is very easy to use, pls see the code below.

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit import rdBase
print(rdBase.rdkitVersion)
>2018.09.1

I used kinase inhibitor as an example.

testsmi = 'CC(C1=C(C=CC(=C1Cl)F)Cl)OC2=C(N=CC(=C2)C3=CN(N=C3)C4CCNCC4)N'
mol = Chem.MolFromSmiles(testsmi)
mol


Default output of MolToSmiles is canonical manner.

print(Chem.MolToSmiles(mol))
>CC(Oc1cc(-c2cnn(C3CCNCC3)c2)cnc1N)c1c(Cl)ccc(F)c1Cl

But if you set MolToSmiles with doRandom=True option, the function return random but valid SMILES.

mols = []
for _ in range(50):
  smi = Chem.MolToSmiles(mol, doRandom=True)
  print(smi)
  m = Chem.MolFromSmiles(smi)
  mols.append(m)

>Fc1c(Cl)c(C(Oc2cc(-c3cn(nc3)C3CCNCC3)cnc2N)C)c(cc1)Cl
>O(c1cc(-c2cnn(c2)C2CCNCC2)cnc1N)C(c1c(Cl)c(ccc1Cl)F)C
>--snip--
>c1(N)ncc(-c2cnn(C3CCNCC3)c2)cc1OC(c1c(Cl)ccc(F)c1Cl)C
#check molecules
Draw.MolsToGridImage(mols, molsPerRow = 10)

Different SMILES but same molecule!

There are many deep learning approaches which use SMIELS as input. It is useful for these models to augment input data I think.

I uploaded my example code on google colab and github my repo.

Colab
https://colab.research.google.com/drive/1dMmgCpskrfI1afh3qmPdv8ixkGTuOCDH

github
https://github.com/iwatobipen/playground/blob/master/random_smiles_rdkit.ipynb

nbviewer
http://nbviewer.jupyter.org/github/iwatobipen/playground/blob/master/random_smiles_rdkit.ipynb

Published by iwatobipen

I'm medicinal chemist in mid size of pharmaceutical company. I love chemoinfo, cording, organic synthesis, my family.

One thought on “Generate possible list of SMLIES with RDKit #RDKit

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.