In computer vision, data augmentation is often used to enlarge a data set. In chemoinformatics, on the other hand, molecules are usually represented as canonical SMILES, so each molecule has only one string.
At last year's RDKit UGM, Dr. Esben Bjerrum proposed a new approach for RNNs with SMILES. He expanded 602 training molecules to almost 8000 by writing each molecule as several different SMILES strings.
Slides: Bjerrum_RDKitUGM_Smiles_Enumeration_for_RNN.pdf
This approach seems to work well.
At this year's UGM hackathon, a function that generates random SMILES was implemented, and it can be called from the new version of RDKit!
I appreciate the RDKit developers!
It is very easy to use; please see the code below.
    from rdkit import Chem
    from rdkit.Chem import Draw
    from rdkit.Chem.Draw import IPythonConsole
    from rdkit import rdBase
    print(rdBase.rdkitVersion)
    >2018.09.1
I used a kinase inhibitor as an example.
    testsmi = 'CC(C1=C(C=CC(=C1Cl)F)Cl)OC2=C(N=CC(=C2)C3=CN(N=C3)C4CCNCC4)N'
    mol = Chem.MolFromSmiles(testsmi)
    mol
By default, MolToSmiles outputs canonical SMILES.
    print(Chem.MolToSmiles(mol))
    >CC(Oc1cc(-c2cnn(C3CCNCC3)c2)cnc1N)c1c(Cl)ccc(F)c1Cl
But if you call MolToSmiles with the doRandom=True option, the function returns a random but valid SMILES.
    mols = []
    for _ in range(50):
        smi = Chem.MolToSmiles(mol, doRandom=True)
        print(smi)
        m = Chem.MolFromSmiles(smi)
        mols.append(m)
    >Fc1c(Cl)c(C(Oc2cc(-c3cn(nc3)C3CCNCC3)cnc2N)C)c(cc1)Cl
    >O(c1cc(-c2cnn(c2)C2CCNCC2)cnc1N)C(c1c(Cl)c(ccc1Cl)F)C
    >--snip--
    >c1(N)ncc(-c2cnn(C3CCNCC3)c2)cc1OC(c1c(Cl)ccc(F)c1Cl)C

    # check molecules
    Draw.MolsToGridImage(mols, molsPerRow=10)
Different SMILES, but the same molecule!
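Besides looking at the grid image, the same thing can be checked in code. The following is a minimal sketch, assuming an RDKit version that supports doRandom (2018.09 or later): it parses each random SMILES back into a molecule and compares its canonical SMILES with that of the original. The variable names are only for illustration.

```python
from rdkit import Chem

testsmi = 'CC(C1=C(C=CC(=C1Cl)F)Cl)OC2=C(N=CC(=C2)C3=CN(N=C3)C4CCNCC4)N'
mol = Chem.MolFromSmiles(testsmi)
canonical = Chem.MolToSmiles(mol)

# Generate 50 random (non-canonical) SMILES for the same molecule.
random_smiles = [Chem.MolToSmiles(mol, doRandom=True) for _ in range(50)]

# Parse every random SMILES again and canonicalize it.
roundtrip = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in random_smiles}

# If all strings describe the same molecule, only one canonical form remains.
all_same = roundtrip == {canonical}
print(len(set(random_smiles)), 'distinct strings, same molecule:', all_same)
```

The number of distinct strings changes from run to run because the atom ordering is random.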
Many deep learning approaches use SMILES as input. I think augmenting the input data in this way is useful for these models.
I uploaded my example code to Google Colab and my GitHub repo.
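As a rough idea of how this could be used for augmentation, here is a small sketch. The helper enumerate_smiles, its parameters n_variants and max_tries, and the toy training_set are my own hypothetical names and data, not part of RDKit or of the slides; only MolFromSmiles and MolToSmiles with doRandom=True come from RDKit.

```python
from rdkit import Chem


def enumerate_smiles(smiles, n_variants=10, max_tries=100):
    """Return up to n_variants distinct random SMILES for one molecule.

    Hypothetical helper for illustration. Small molecules have only a few
    possible SMILES, so the loop gives up after max_tries attempts.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # Unparsable input: return nothing instead of raising.
        return []
    variants = set()
    for _ in range(max_tries):
        if len(variants) >= n_variants:
            break
        variants.add(Chem.MolToSmiles(mol, doRandom=True))
    return sorted(variants)


# Toy data set, only for illustration.
training_set = ['CCO', 'c1ccccc1O', 'CC(=O)Nc1ccc(O)cc1']
augmented = {smi: enumerate_smiles(smi, n_variants=5) for smi in training_set}
for smi, variants in augmented.items():
    print(smi, '->', variants)
```

Using a set removes duplicates, because doRandom=True can return the same string more than once, especially for small molecules.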
Colab
https://colab.research.google.com/drive/1dMmgCpskrfI1afh3qmPdv8ixkGTuOCDH
github
https://github.com/iwatobipen/playground/blob/master/random_smiles_rdkit.ipynb
nbviewer
http://nbviewer.jupyter.org/github/iwatobipen/playground/blob/master/random_smiles_rdkit.ipynb