Generate a list of possible SMILES with RDKit #RDKit

In computer vision, data augmentation is often used to obtain a larger training set. In chemoinformatics, on the other hand, molecules are usually written as canonical SMILES, which gives one string per molecule.
At last year's RDKit UGM, Dr. Esben proposed a new approach for RNNs with SMILES. He expanded 602 training molecules to almost 8000 by writing each molecule as several different SMILES strings.
This approach seems to work well.
At this year's UGM hackathon, a function for generating random SMILES was implemented, and it can be called from the new version of RDKit!
I appreciate the RDKit developers!

It is very easy to use; please see the code below.

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit import rdBase

I used a kinase inhibitor as an example.

testsmi = 'CC(C1=C(C=CC(=C1Cl)F)Cl)OC2=C(N=CC(=C2)C3=CN(N=C3)C4CCNCC4)N'
mol = Chem.MolFromSmiles(testsmi)

By default, MolToSmiles returns the canonical SMILES.
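
As a minimal sketch of that default behaviour (the variable name canonical_smi is only for illustration), calling MolToSmiles with no extra options on the same molecule should give the same string every time:

```python
from rdkit import Chem

testsmi = 'CC(C1=C(C=CC(=C1Cl)F)Cl)OC2=C(N=CC(=C2)C3=CN(N=C3)C4CCNCC4)N'
mol = Chem.MolFromSmiles(testsmi)

# With default arguments the output is canonical,
# so repeated calls on the same molecule agree with each other.
canonical_smi = Chem.MolToSmiles(mol)
print(canonical_smi)
print(canonical_smi == Chem.MolToSmiles(mol))
```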


But if you call MolToSmiles with the doRandom=True option, the function returns a random but valid SMILES.

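mols = []
for _ in range(50):
  # write the molecule as a randomized (non-canonical) SMILES
  smi = Chem.MolToSmiles(mol, doRandom=True)
  # parse it back and keep the molecule for drawing
  m = Chem.MolFromSmiles(smi)
  mols.append(m)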

#check molecules
Draw.MolsToGridImage(mols, molsPerRow = 10)

Different SMILES but same molecule!
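
One way to confirm this without looking at the picture is to convert every randomized string back to canonical SMILES and count what is left. This is only a sketch; the names random_smiles and canonical_set are mine, and the number of distinct randomized strings will vary from run to run.

```python
from rdkit import Chem

testsmi = 'CC(C1=C(C=CC(=C1Cl)F)Cl)OC2=C(N=CC(=C2)C3=CN(N=C3)C4CCNCC4)N'
mol = Chem.MolFromSmiles(testsmi)

# 50 randomized SMILES strings for the same molecule
random_smiles = [Chem.MolToSmiles(mol, doRandom=True) for _ in range(50)]

# Re-canonicalize each string; identical molecules collapse to a single entry
canonical_set = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in random_smiles}

print(len(set(random_smiles)), 'distinct randomized strings')
print(len(canonical_set), 'distinct canonical SMILES')
```

If the second number is 1, all of the strings describe the same molecule.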

There are many deep learning approaches that use SMILES as input. I think augmenting the input data in this way is useful for these models.
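
For example, an augmentation step could look something like the sketch below. The helper augment_smiles and its parameters n_variants and max_tries are hypothetical names I made up for illustration, not part of RDKit; only MolFromSmiles and MolToSmiles with doRandom=True come from the library.

```python
from rdkit import Chem


def augment_smiles(smiles, n_variants=5, max_tries=100):
    """Return up to n_variants distinct randomized SMILES for one molecule.

    Hypothetical helper for illustration only, not an RDKit function.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # unparsable input: nothing to augment
        return []
    variants = set()
    # max_tries bounds the loop, because small molecules
    # may have fewer than n_variants distinct strings
    for _ in range(max_tries):
        if len(variants) >= n_variants:
            break
        variants.add(Chem.MolToSmiles(mol, doRandom=True))
    return sorted(variants)


testsmi = 'CC(C1=C(C=CC(=C1Cl)F)Cl)OC2=C(N=CC(=C2)C3=CN(N=C3)C4CCNCC4)N'
augmented = augment_smiles(testsmi, n_variants=5)
for smi in augmented:
    print(smi)
```

Each returned string can then be paired with the label of the original molecule when building the training set.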

I uploaded my example code to Google Colab and to my GitHub repo.