Calculate USRCAT with RDKit #RDKit

Some years ago, I wrote a blog post about USRCAT.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3505738/
USRCAT is a shape-based method like ROCS, and it is very fast. The code was freely available, but users had to install it separately.
But as you know, the new version of RDKit implements this function! That is good news, isn't it?
I tried the function just now.
The source code is as follows.

import os
import seaborn as sns
import pandas as pd
from rdkit import Chem
from rdkit.Chem import rdBase
from rdkit.Chem import RDConfig
from rdkit.Chem import AllChem
from rdkit.Chem.rdMolDescriptors import GetUSRScore, GetUSRCAT
from rdkit.Chem import DataStructs
print( rdBase.rdkitVersion )
path = os.path.join( RDConfig.RDDocsDir, "Book/data/cdk2.sdf" )

mols = [ mol for mol in Chem.SDMolSupplier( path ) ]
for mol in mols:
    AllChem.EmbedMolecule( mol, 
                           useExpTorsionAnglePrefs = True,
                           useBasicKnowledge = True )
usrcats = [ GetUSRCAT( mol ) for mol in mols ]
fps = [ AllChem.GetMorganFingerprintAsBitVect( mol, 2 ) for mol in mols ]

data = { "tanimoto":[], "usrscore":[] }

for i in range( len( usrcats )):
    for j in range( i ):
        tc = DataStructs.TanimotoSimilarity( fps[ i ], fps[ j ] )
        score = GetUSRScore( usrcats[ i ], usrcats[ j ] )
        data["tanimoto"].append( tc )
        data["usrscore"].append( score )
        print( score, tc )
df = pd.DataFrame( data )

fig = sns.pairplot( df )
fig.savefig( 'plot.png' )

Run the code.

iwatobipen$ python usrcattest.py
# output
2017.09.1
0.4878222403055059 0.46296296296296297
0.2983740604270427 0.48148148148148145
0.36022943735904756 0.5660377358490566
0.3480531986117265 0.5
0.3593106395905704 0.6595744680851063
0.25662588527525304 0.6122448979591837
0.18452571918677163 0.46296296296296297
0.18534407651655047 0.5769230769230769
0.1698894448811921 0.5660377358490566
0.19927998441539707 0.6956521739130435
0.2052241644475582 0.15714285714285714
0.21930710455068858 0.10526315789473684
0.21800341857284924 0.1038961038961039

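The Tanimoto coefficient and the USR score gave different values for the same pairs, which is expected when comparing a 2D fingerprint with a 3D shape and pharmacophore descriptor. I think the USR score provides another way to estimate molecular similarity.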
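
For a quick check on a single pair, here is a minimal sketch that starts from SMILES instead of an SD file. The helper name usrcat_from_smiles and the two example molecules are my own choices for illustration, and adding hydrogens before embedding (then removing them again) is an assumption about preparation, not something the code above does.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.rdMolDescriptors import GetUSRCAT, GetUSRScore


def usrcat_from_smiles(smiles, seed=42):
    """Embed one 3D conformer and return its USRCAT descriptor as a list."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: " + smiles)
    mol = Chem.AddHs(mol)
    # EmbedMolecule returns the conformer id, or -1 when embedding fails.
    if AllChem.EmbedMolecule(mol, randomSeed=seed) < 0:
        raise ValueError("3D embedding failed for " + smiles)
    # Drop the hydrogens again; the heavy-atom coordinates are kept.
    mol = Chem.RemoveHs(mol)
    return list(GetUSRCAT(mol))


aspirin = usrcat_from_smiles("CC(=O)Oc1ccccc1C(=O)O")
ibuprofen = usrcat_from_smiles("CC(C)Cc1ccc(cc1)C(C)C(=O)O")

print(len(aspirin))                        # number of values in the descriptor
print(GetUSRScore(aspirin, aspirin))       # a molecule compared with itself
print(GetUSRScore(aspirin, ibuprofen))     # two different molecules
```

The score depends on the embedded conformer, so a different random seed can give a slightly different value.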

RDKit is a really cool toolkit. I love it. 😉


Visit Berlin

This week I visited Berlin on a business trip. I had useful discussions and enjoyed the trip.
It was a pleasure to talk with lots of people and learn about new technology. On the other hand, I felt the limits of my English. It was very difficult to have discussions with people from other countries in a very limited time. Hmm… ;-(

BTW, in my free time I visited the East Side Gallery. The East Side Gallery is an international memorial for freedom. I saw lots of art there and was very impressed by it.


Also I enjoyed traditional German food. 😉
Currywurst

Eisbein

And huge potato salad!!!!

It was a good experience. I need to learn more and more. Keep learning!!

Draw high quality molecular image in RDKit #rdkit

Recently, I wanted to draw high quality molecule images using RDKit. The PNG images from older versions of RDKit are not good enough for me.
I found a solution in an RDKit discuss thread. The discussion recommended installing cairocffi.
I installed cairocffi via conda.

iwatobipen$ conda install -c conda-forge cairocffi

But… the result was still not good enough for me. ( This was in my Mac environment. The Linux environment worked fine. )

Next I tried converting SVG to PNG. Fortunately, I found cairosvg, a package that can convert SVG to PNG.
The following code is an example.
I also found a tip for RDKitters!
DrawingOptions can modify the drawing settings, for example font size, bond width, etc.

import argparse
import cairosvg
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import DrawingOptions

parser = argparse.ArgumentParser( description='smiles to png image' )
parser.add_argument( 'smiles' )
parser.add_argument( '--filename', default="mol." )
# global drawing settings used by Draw.MolToFile
DrawingOptions.atomLabelFontSize = 55
DrawingOptions.dotsPerAngstrom = 100
DrawingOptions.bondLineWidth = 3.0

if __name__=='__main__':
    param = parser.parse_args()
    smiles = param.smiles
    fname = param.filename
    mol = Chem.MolFromSmiles( smiles )
    # direct PNG output
    Draw.MolToFile( mol, fname+"png" )
    # write SVG first, then convert the SVG to PNG
    Draw.MolToFile( mol, "temp.svg" )
    cairosvg.svg2png( url='./temp.svg', write_to= "svg_"+fname+"png" )

The code generates a PNG image both directly from the SMILES and via SVG.
Here is the result.

iwatobipen$ python drawing.py "CC1=CC=C(C=C1)C2=CC(=NN2C3=CC=C(C=C3)S(=O)(=O)N)C(F)(F)F"

Direct png.

SVG to PNG

I think the image from SVG is higher quality. 😉
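
As a variation on the SVG route, the sketch below builds the SVG text in memory with rdMolDraw2D instead of going through Draw.MolToFile. This is a different drawing code path from the script above, so DrawingOptions does not apply to it; the function name smiles_to_svg, the file names, and the scale factor are my own assumptions for illustration.

```python
from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D


def smiles_to_svg(smiles, width=400, height=300):
    """Return an SVG drawing of the molecule as a text string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: " + smiles)
    # Computes 2D coordinates and kekulizes a copy of the molecule.
    mol = rdMolDraw2D.PrepareMolForDrawing(mol)
    drawer = rdMolDraw2D.MolDraw2DSVG(width, height)
    drawer.DrawMolecule(mol)
    drawer.FinishDrawing()
    return drawer.GetDrawingText()


svg = smiles_to_svg("CC1=CC=C(C=C1)C2=CC(=NN2C3=CC=C(C=C3)S(=O)(=O)N)C(F)(F)F")
with open("mol_2d.svg", "w") as handle:
    handle.write(svg)

# Optional rasterization, only when cairosvg (and the cairo library) is available.
try:
    import cairosvg
    cairosvg.svg2png(bytestring=svg.encode("utf-8"),
                     write_to="mol_2d.png", scale=3.0)
except (ImportError, OSError):
    pass
```

Because SVG is a vector format, the PNG resolution is decided at conversion time, which is why the scale argument matters more than the drawer size.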

QED calculation on RDKit 2017.09 #RDKit

QED (quantitative estimate of drug-likeness) is a drug-likeness score reported by the Hopkins group.
https://www.ncbi.nlm.nih.gov/pubmed/22270643

The authors provided a QED calculator for Pipeline Pilot, so QED could not be calculated without Pipeline Pilot.
But now we can calculate QED with RDKit!
RDKit 2017.09 implements a QED descriptor. Seems good, so let's use the function.
It is very simple. Just call qed! I used the same dataset as yesterday.

import os
from rdkit.Chem import rdBase, RDConfig
from rdkit import Chem
from rdkit.Chem import PandasTools
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem.Descriptors import qed
print( rdBase.rdkitVersion )

sdfpath = os.path.join( RDConfig.RDDocsDir, "Book/data/cdk2.sdf" )
mols = [ m for m in Chem.SDMolSupplier( sdfpath ) if m is not None ]
df = PandasTools.LoadSDF( sdfpath )
print( len( mols ))

df.head( 2 )

df[ "QED" ] =  df.ROMol.apply( qed )
df.head(2 )

from rdkit.Chem import QED
for mol in mols:
    print( QED.properties( mol ) )

It is easy, isn't it?
I pushed the sample code to my repository.
https://github.com/iwatobipen/chemo_info/blob/master/rdkit201709/QED_calc.ipynb
By the way, the original QED score was based on ChEMBL version 09, so the dataset is old. Would the score be different if we used a newer version of ChEMBL? 😉
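
To see what goes into the score, here is a small sketch on a single molecule. QED.properties returns the eight underlying properties, and the QED module also exposes variants with different weighting schemes. The choice of aspirin as the example is mine.

```python
from rdkit import Chem
from rdkit.Chem import QED

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, chosen only as an example

# The eight properties that feed the score, returned as a named tuple.
props = QED.properties(mol)
print(props._fields)
print(props)

# Default score, plus the variants with other weighting schemes.
print(QED.qed(mol))           # default weights
print(QED.weights_max(mol))   # "max" weights
print(QED.weights_none(mol))  # unweighted
```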

New function of RDKit 2017.09 #RDKit

Recently I updated my RDKit environment from 2017.03 to 2017.09 using conda.
The new version of RDKit implements a cool module named rdRGroupDecomposition.
It lets us render R-groups as a DataFrame.
I tried it on the cdk2.sdf dataset.
The code I wrote is below (using Jupyter notebook).

from rdkit import Chem
from rdkit.Chem import Draw, AllChem
from rdkit.Chem import PandasTools
from rdkit.Chem import rdBase
from rdkit.Chem import RDConfig
from rdkit.Chem.Draw import IPythonConsole
import os
PandasTools.InstallPandasTools()
base = RDConfig.RDDocsDir
datapath = os.path.join( base, "Book/data/cdk2.sdf")
mols = [ mol for mol in Chem.SDMolSupplier( datapath ) if mol is not None ]
# Mol objects that have 3D conformer information did not work well, so I removed the conformers.
for m in mols:
    m.RemoveAllConformers()
# define the core for RG decomposition.
core = Chem.MolFromSmiles('[nH]1cnc2cncnc21')
from rdkit.Chem import rdRGroupDecomposition
tables = PandasTools.LoadSDF( datapath )
rg = rdRGroupDecomposition.RGroupDecomposition( core )
for mol in mols[:5]:
    rg.Add( mol )
# Do RG decomposition.
rg.Process()

Then visualize the R-group decomposition result.

import pandas as pd
PandasTools.molRepresentation="svg"
modlf = PandasTools.LoadSDF( datapath )
frame = pd.DataFrame( rg.GetRGroupsAsColumns() )
frame

The result is the following image. 😉
The new version of RDKit is a cool & powerful tool for chemoinformatics. I really respect the developers of RDKit.
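
For readers without the notebook, here is a minimal text-only sketch of the same workflow on a toy set. The benzene core and the four SMILES are made up for illustration and are not from cdk2.sdf; asking for SMILES output instead of Mol objects is just to make the result printable.

```python
from rdkit import Chem
from rdkit.Chem import rdRGroupDecomposition

# Toy core and molecules, chosen only for illustration.
core = Chem.MolFromSmiles("c1ccccc1")
smiles_list = ["Cc1ccccc1", "Oc1ccccc1", "Nc1ccccc1", "CCCC"]

decomp = rdRGroupDecomposition.RGroupDecomposition(core)
matched = []
for smi in smiles_list:
    mol = Chem.MolFromSmiles(smi)
    # Add() returns a non-negative index when the core was found in the molecule.
    if decomp.Add(mol) >= 0:
        matched.append(smi)
decomp.Process()

# One entry per column (core and R-groups), one value per matched molecule.
columns = decomp.GetRGroupsAsColumns(asSmiles=True)
for name, values in columns.items():
    print(name, values)
print("matched:", matched)
```

Molecules that do not contain the core (butane here) are skipped, so the columns only have rows for the matched molecules.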

molecule encoder/decoder in deepchem #rdkit #deepchem

Today I updated deepchem on my Mac.
It was easy to install the new version of deepchem on Mac.

iwatobipen$ git clone https://github.com/deepchem/deepchem.git
iwatobipen$ cd deepchem
iwatobipen$ bash scripts/install_deepchem_conda.sh

That’s all. 😉

The new version of deepchem implements MoleculeVAE. MoleculeVAE generates new molecules using a predefined model.
Deepchem can use a predefined model that was trained on the ZINC dataset.
OK, let's run the code.
I tested MoleculeVAE by referring to the following web page.
https://www.deepchem.io/_modules/deepchem/models/autoencoder_models/test_tensorflowEncoders.html

Deepchem provides lots of useful functions for data preparation.
For example, converting SMILES to one-hot vectors and vice versa.
I used cdk2.sdf for structure generation.

from __future__ import print_function
import os
from rdkit import Chem, RDConfig
from rdkit.Chem import Draw
import deepchem as dc
from deepchem.models.autoencoder_models.autoencoder import TensorflowMoleculeEncoder, TensorflowMoleculeDecoder
from deepchem.feat.one_hot import zinc_charset
from deepchem.data import DiskDataset

datadir = os.path.join( RDConfig.RDDocsDir, 'Book/data/cdk2.sdf' )

mols = [ mol for mol in Chem.SDMolSupplier( datadir ) ]
smiles = [ Chem.MolToSmiles( mol ) for mol in mols ]
print( len( smiles ))

tf_encoder = TensorflowMoleculeEncoder.zinc_encoder()
tf_decoder = TensorflowMoleculeDecoder.zinc_decoder()

featurizer = dc.feat.one_hot.OneHotFeaturizer( zinc_charset, 120 )

# default setting: encode SMILES to one-hot vectors, padded to 120 characters.
features = featurizer( mols )
print( features.shape )

dataset = DiskDataset.from_numpy( features, features )
prediction = tf_encoder.predict_on_batch( dataset.X )
one_hot_dec = tf_decoder.predict_on_batch( prediction )
decoded_smiles = featurizer.untransform( one_hot_dec )

for smiles in decoded_smiles:
    print( smiles[0] )
    print( Chem.MolFromSmiles( smiles[0] ))
mols = [ Chem.MolFromSmiles( smi[0] ) for smi in decoded_smiles ]
im = Draw.MolsToGridImage( mols )
im.save( 'res.png' )

And the result was …. ;-(

iwatobipen$ python molVAE.py
/Users/iwatobipen/.pyenv/versions/anaconda3-2.4.0/lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Using TensorFlow backend.
47
2017-10-10 22:25:23.051035: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-10 22:25:23.051056: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-10 22:25:23.051060: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-10 22:25:23.051064: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
(47, 120, 35)
TIMING: dataset construction took 0.059 s
Loading dataset from disk.
CCCCNC(=O)CCn1c(=O)ncc[n1)ccn2
[22:25:29] SMILES Parse Error: syntax error for input: 'CCCCNC(=O)CCn1c(=O)ncc[n1)ccn2'
None
CC(C))CCN1CCCC1)c2nc[nH+]nn2
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C))CCN1CCCC1)c2nc[nH+]nn2'
None
CC(C)(C)C(CCC(=O)NCCc1cc[nH+]cn1
[22:25:29] SMILES Parse Error: extra open parentheses for input: 'CC(C)(C)C(CCC(=O)NCCc1cc[nH+]cn1'
None
CC(C)(C)N1CCCCC1)c2nc[nH+]nn2
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)(C)N1CCCCC1)c2nc[nH+]nn2'
None
CC(C)CCNCCCC)CN=C)c1cc[nH+]cn1
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)CCNCCCC)CN=C)c1cc[nH+]cn1'
None
Cc1ccnnc1SCc2ccccc2Crc2NCCC))F)C
[22:25:29] SMILES Parse Error: syntax error for input: 'Cc1ccnnc1SCc2ccccc2Crc2NCCC))F)C'
None
CC((C)))C(=O)NCc1ccccc1)c(/c(=C)C(C(=])C
[22:25:29] SMILES Parse Error: syntax error for input: 'CC((C)))C(=O)NCc1ccccc1)c(/c(=C)C(C(=])C'
None
Cc1ccnn1CCC(=O)N((C)(C)Ccc2[n(ccn2)CCC)(C)C
[22:25:29] SMILES Parse Error: syntax error for input: 'Cc1ccnn1CCC(=O)N((C)(C)Ccc2[n(ccn2)CCC)(C)C'
None
CC(CC)(C)CC(C)(C)CCNCCC(CCC)N
<rdkit.Chem.rdchem.Mol object at 0x10079c210>
COc1ccc2c(c1)CC)Cc2cc((cn=O)C(=O)N3
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'COc1ccc2c(c1)CC)Cc2cc((cn=O)C(=O)N3'
None
Cc1ncsc(=O)n1C)CC(=O)Nc2ccss[N/C(=O)[C)[O-])C
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1ncsc(=O)n1C)CC(=O)Nc2ccss[N/C(=O)[C)[O-])C'
None
Cc1c(c(=O)n2c(n1)C(=O)CC(C)C)C)C
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1c(c(=O)n2c(n1)C(=O)CC(C)C)C)C'
None
CN(C))(O)CNN(Cc1cccn1+c2cc[nH]c2c2N
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CN(C))(O)CNN(Cc1cccn1+c2cc[nH]c2c2N'
None
CC11ccc2n1c(ccnH=)Nc3cc((c(c3=O)C)NCC2OO
[22:25:29] SMILES Parse Error: syntax error for input: 'CC11ccc2n1c(ccnH=)Nc3cc((c(c3=O)C)NCC2OO'
None
CCNH+]1CCc2cnc2c1c(n(c3=))c4cc(cc(cc3C)C))CC1=O
[22:25:29] SMILES Parse Error: syntax error for input: 'CCNH+]1CCc2cnc2c1c(n(c3=))c4cc(cc(cc3C)C))CC1=O'
None
CC(=O)Nc1cc(ccc1OC)CCc2c3c([nH+nn2)cccco3
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(=O)Nc1cc(ccc1OC)CCc2c3c([nH+nn2)cccco3'
None
COc1ccc2c(n1)C)C)ccnc3ccccc3)NCC=)))C(=O)CC
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'COc1ccc2c(n1)C)C)ccnc3ccccc3)NCC=)))C(=O)CC'
None
CCS(=O)(=O)c1ccc(c(c1)C)2CC=c3cc(ccc3)NC(C)C)C(=O)N2
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCS(=O)(=O)c1ccc(c(c1)C)2CC=c3cc(ccc3)NC(C)C)C(=O)N2'
None
CC(=O)Nc1cccccc1OC)CCc2c3c([nH+]n23cccccc4
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(=O)Nc1cccccc1OC)CCc2c3c([nH+]n23cccccc4'
None
Cc1cccc2c11[nH]nc(c2))CCC(=O)N(C3)CCc3ccc(H+]c)C
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1cccc2c11[nH]nc(c2))CCC(=O)N(C3)CCc3ccc(H+]c)C'
None
Cc1cc(cc211[nH]cc(c2)CCCC(=O)CC(=O)NC3CCCn4ccc(=O)n4C
[22:25:29] SMILES Parse Error: extra open parentheses for input: 'Cc1cc(cc211[nH]cc(c2)CCCC(=O)CC(=O)NC3CCCn4ccc(=O)n4C'
None
Ccc1ccc2ccnn))n3c2ccc1)c4ccccc4F)
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Ccc1ccc2ccnn))n3c2ccc1)c4ccccc4F)'
None
Cc1nc(c2c(nn))n3c2ccc3)S4ccs2)C(C)(C[NH+]CCC))C)))n
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1nc(c2c(nn))n3c2ccc3)S4ccs2)C(C)(C[NH+]CCC))C)))n'
None
CC(CCc1ccncc1)Cc2cccc3c2NN3CCNCC3=O
[22:25:29] SMILES Parse Error: unclosed ring for input: 'CC(CCc1ccncc1)Cc2cccc3c2NN3CCNCC3=O'
None
CC(=O)(=OOc1ccn1C(=C)C(=O)N)2cc(=O)(c2=O)())
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(=O)(=OOc1ccn1C(=C)C(=O)N)2cc(=O)(c2=O)())'
None
CNS(=O)(=O)c(ccn1C((C)C(=O)Nc2ccc(cc2Cl)F
[22:25:29] SMILES Parse Error: syntax error for input: 'CNS(=O)(=O)c(ccn1C((C)C(=O)Nc2ccc(cc2Cl)F'
None
CCCC)CCC(=O)(=O)c1cO)1C(CC)C(=O)Nc2ccccc2F)/s1
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCCC)CCC(=O)(=O)c1cO)1C(CC)C(=O)Nc2ccccc2F)/s1'
None
CC((Cc1c(=O)nnc(=O)C(=O)CC(C)))c2ccccc2C2=O
[22:25:29] SMILES Parse Error: syntax error for input: 'CC((Cc1c(=O)nnc(=O)C(=O)CC(C)))c2ccccc2C2=O'
None
c1cs=c1C(=O)N(C2(CCCC2))c3c[nH]c(=O)])
[22:25:29] SMILES Parse Error: syntax error for input: 'c1cs=c1C(=O)N(C2(CCCC2))c3c[nH]c(=O)])'
None
CS(=O)(=O)c1cc(1CCc=O)NCC2CCC(CC2)c3c[nH+]c(=+)n3
[22:25:29] SMILES Parse Error: syntax error for input: 'CS(=O)(=O)c1cc(1CCc=O)NCC2CCC(CC2)c3c[nH+]c(=+)n3'
None
CN(Cc1ccccc1)c2nc(nnn+]c2c(c2SCc4ccccc4
[22:25:29] SMILES Parse Error: syntax error for input: 'CN(Cc1ccccc1)c2nc(nnn+]c2c(c2SCc4ccccc4'
None
CS(=O)(=O)c1cc==O)2c(n3c2cccc1SCC(=O)NO)))c(cc)/(([O)on
[22:25:29] SMILES Parse Error: syntax error for input: 'CS(=O)(=O)c1cc==O)2c(n3c2cccc1SCC(=O)NO)))c(cc)/(([O)on'
None
CN(C)C(=O)CC(C(=O)[O-])C(=O)CSc1ccc[nH+]c1=N
[22:25:29] Can't kekulize mol.  Unkekulized atoms: 14 15 16 17 18

None
Cc1c=O(c=n1Cc2cccn2)c3c(n2)CC((((=C)C)])CC-/)
[22:25:29] SMILES Parse Error: syntax error for input: 'Cc1c=O(c=n1Cc2cccn2)c3c(n2)CC((((=C)C)])CC-/)'
None
CC(C)(CC))C(=O)NCc1cccnc1Cl)CCc(=O)(=))CC///n2
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)(CC))C(=O)NCc1cccnc1Cl)CCc(=O)(=))CC///n2'
None
CC(C)(CCN)C(=O)CSc1cc(=O)c(=CC))]))))))))S2/c2)=O)C))/s)
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)(CCN)C(=O)CSc1cc(=O)c(=CC))]))))))))S2/c2)=O)C))/s)'
None
CCN1CC11C(C2)C(=O)N2C2N(C2S((=O)NC3CC2)C(=O)OCC
[22:25:29] SMILES Parse Error: syntax error for input: 'CCN1CC11C(C2)C(=O)N2C2N(C2S((=O)NC3CC2)C(=O)OCC'
None
CCN=N)c1cc=1Cc==)(=O)C)(C))C)CC2CCCCCC2)))())
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCN=N)c1cc=1Cc==)(=O)C)(C))C)CC2CCCCCC2)))())'
None
CC(CCc2ccccc2)C[=O)c3c(=O)n(c)+n3)c3nc(nc))c1
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(CCc2ccccc2)C[=O)c3c(=O)n(c)+n3)c3nc(nc))c1'
None
CCS(=O)(=O)CS(=O)NC1CC)C(=O)Nc1ccc2c1cccc1F))n1
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCS(=O)(=O)CS(=O)NC1CC)C(=O)Nc1ccc2c1cccc1F))n1'
None
CC1CCc2c(s2c2nc(n(=O)n)NC(=O)c3cccc(c3)SS(=O)=OO)C1
[22:25:29] SMILES Parse Error: unclosed ring for input: 'CC1CCc2c(s2c2nc(n(=O)n)NC(=O)c3cccc(c3)SS(=O)=OO)C1'
None
C[NH+]1CCCC(=O)[O-])CCc1c(=O)oc2c2c2cccc2C(s=])CCCC2=O
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'C[NH+]1CCCC(=O)[O-])CCc1c(=O)oc2c2c2cccc2C(s=])CCCC2=O'
None
Cc1cn(c(=O)c1C)CC(=O)Nc2ccss[N]S(=O)N)C(O)C)3(CCCC(C)\C)//)))C
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1cn(c(=O)c1C)CC(=O)Nc2ccss[N]S(=O)N)C(O)C)3(CCCC(C)\C)//)))C'
None
C[NH]11CCCC(=O)[O-])NC(=O)CN1C)c2ccn12)C((O)/O))C)CCCC)2)C1
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'C[NH]11CCCC(=O)[O-])NC(=O)CN1C)c2ccn12)C((O)/O))C)CCCC)2)C1'
None
CCN1CC11(CCN)n(c(=NCc2ccccc2))4ccn23C4CCCC4=O2CCC
[22:25:29] SMILES Parse Error: unclosed ring for input: 'CCN1CC11(CCN)n(c(=NCc2ccccc2))4ccn23C4CCCC4=O2CCC'
None
CC(C=()Scc1n[nH]c2c(n1)CCCC(=O)CC(=O)N3CC[NH+](CC3)CCc4cccon)))
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(C=()Scc1n[nH]c2c(n1)CCCC(=O)CC(=O)N3CC[NH+](CC3)CCc4cccon)))'
None
CC1CCc2c(c3cccccn2)C[H+]ccc(1O((((=+]3)Cc4ccss4Cl)1
[22:25:29] SMILES Parse Error: syntax error for input: 'CC1CCc2c(c3cccccn2)C[H+]ccc(1O((((=+]3)Cc4ccss4Cl)1'
None
[22:25:29] SMILES Parse Error: syntax error for input: 'CCCCNC(=O)CCn1c(=O)ncc[n1)ccn2'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C))CCN1CCCC1)c2nc[nH+]nn2'
[22:25:29] SMILES Parse Error: extra open parentheses for input: 'CC(C)(C)C(CCC(=O)NCCc1cc[nH+]cn1'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)(C)N1CCCCC1)c2nc[nH+]nn2'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)CCNCCCC)CN=C)c1cc[nH+]cn1'
[22:25:29] SMILES Parse Error: syntax error for input: 'Cc1ccnnc1SCc2ccccc2Crc2NCCC))F)C'
[22:25:29] SMILES Parse Error: syntax error for input: 'CC((C)))C(=O)NCc1ccccc1)c(/c(=C)C(C(=])C'
[22:25:29] SMILES Parse Error: syntax error for input: 'Cc1ccnn1CCC(=O)N((C)(C)Ccc2[n(ccn2)CCC)(C)C'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'COc1ccc2c(c1)CC)Cc2cc((cn=O)C(=O)N3'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1ncsc(=O)n1C)CC(=O)Nc2ccss[N/C(=O)[C)[O-])C'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1c(c(=O)n2c(n1)C(=O)CC(C)C)C)C'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CN(C))(O)CNN(Cc1cccn1+c2cc[nH]c2c2N'
[22:25:29] SMILES Parse Error: syntax error for input: 'CC11ccc2n1c(ccnH=)Nc3cc((c(c3=O)C)NCC2OO'
[22:25:29] SMILES Parse Error: syntax error for input: 'CCNH+]1CCc2cnc2c1c(n(c3=))c4cc(cc(cc3C)C))CC1=O'
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(=O)Nc1cc(ccc1OC)CCc2c3c([nH+nn2)cccco3'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'COc1ccc2c(n1)C)C)ccnc3ccccc3)NCC=)))C(=O)CC'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCS(=O)(=O)c1ccc(c(c1)C)2CC=c3cc(ccc3)NC(C)C)C(=O)N2'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(=O)Nc1cccccc1OC)CCc2c3c([nH+]n23cccccc4'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1cccc2c11[nH]nc(c2))CCC(=O)N(C3)CCc3ccc(H+]c)C'
[22:25:29] SMILES Parse Error: extra open parentheses for input: 'Cc1cc(cc211[nH]cc(c2)CCCC(=O)CC(=O)NC3CCCn4ccc(=O)n4C'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Ccc1ccc2ccnn))n3c2ccc1)c4ccccc4F)'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1nc(c2c(nn))n3c2ccc3)S4ccs2)C(C)(C[NH+]CCC))C)))n'
[22:25:29] SMILES Parse Error: unclosed ring for input: 'CC(CCc1ccncc1)Cc2cccc3c2NN3CCNCC3=O'
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(=O)(=OOc1ccn1C(=C)C(=O)N)2cc(=O)(c2=O)())'
[22:25:29] SMILES Parse Error: syntax error for input: 'CNS(=O)(=O)c(ccn1C((C)C(=O)Nc2ccc(cc2Cl)F'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCCC)CCC(=O)(=O)c1cO)1C(CC)C(=O)Nc2ccccc2F)/s1'
[22:25:29] SMILES Parse Error: syntax error for input: 'CC((Cc1c(=O)nnc(=O)C(=O)CC(C)))c2ccccc2C2=O'
[22:25:29] SMILES Parse Error: syntax error for input: 'c1cs=c1C(=O)N(C2(CCCC2))c3c[nH]c(=O)])'
[22:25:29] SMILES Parse Error: syntax error for input: 'CS(=O)(=O)c1cc(1CCc=O)NCC2CCC(CC2)c3c[nH+]c(=+)n3'
[22:25:29] SMILES Parse Error: syntax error for input: 'CN(Cc1ccccc1)c2nc(nnn+]c2c(c2SCc4ccccc4'
[22:25:29] SMILES Parse Error: syntax error for input: 'CS(=O)(=O)c1cc==O)2c(n3c2cccc1SCC(=O)NO)))c(cc)/(([O)on'
[22:25:29] Can't kekulize mol.  Unkekulized atoms: 14 15 16 17 18

[22:25:29] SMILES Parse Error: syntax error for input: 'Cc1c=O(c=n1Cc2cccn2)c3c(n2)CC((((=C)C)])CC-/)'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)(CC))C(=O)NCc1cccnc1Cl)CCc(=O)(=))CC///n2'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)(CCN)C(=O)CSc1cc(=O)c(=CC))]))))))))S2/c2)=O)C))/s)'
[22:25:29] SMILES Parse Error: syntax error for input: 'CCN1CC11C(C2)C(=O)N2C2N(C2S((=O)NC3CC2)C(=O)OCC'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCN=N)c1cc=1Cc==)(=O)C)(C))C)CC2CCCCCC2)))())'
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(CCc2ccccc2)C[=O)c3c(=O)n(c)+n3)c3nc(nc))c1'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCS(=O)(=O)CS(=O)NC1CC)C(=O)Nc1ccc2c1cccc1F))n1'
[22:25:29] SMILES Parse Error: unclosed ring for input: 'CC1CCc2c(s2c2nc(n(=O)n)NC(=O)c3cccc(c3)SS(=O)=OO)C1'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'C[NH+]1CCCC(=O)[O-])CCc1c(=O)oc2c2c2cccc2C(s=])CCCC2=O'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1cn(c(=O)c1C)CC(=O)Nc2ccss[N]S(=O)N)C(O)C)3(CCCC(C)\C)//)))C'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'C[NH]11CCCC(=O)[O-])NC(=O)CN1C)c2ccn12)C((O)/O))C)CCCC)2)C1'
[22:25:29] SMILES Parse Error: unclosed ring for input: 'CCN1CC11(CCN)n(c(=NCc2ccccc2))4ccn23C4CCCC4=O2CCC'
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(C=()Scc1n[nH]c2c(n1)CCCC(=O)CC(=O)N3CC[NH+](CC3)CCc4cccon)))'
[22:25:29] SMILES Parse Error: syntax error for input: 'CC1CCc2c(c3cccccn2)C[H+]ccc(1O((((=+]3)Cc4ccss4Cl)1'
Exception ignored in: <bound method BaseSession.__del__ of <tensorflow.python.client.session.Session object at 0x12a606128>>
Traceback (most recent call last):
  File "/Users/iwatobipen/.pyenv/versions/anaconda3-2.4.0/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 701, in __del__
TypeError: 'NoneType' object is not callable

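Hmm…. I could not get suitable structures from the molecule autoencoder. In this run only one of the decoded strings was parsed by RDKit as a valid molecule.
Generating molecules this way is difficult because the input data is based on SMILES strings, so a single unbalanced parenthesis or unclosed ring makes the whole string invalid. For now the ratio of invalid SMILES is high. But I think DeepChem and RDKit are a nice combination for chemoinformatics.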
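
To put a number on the invalid ratio without reading through the log, a small helper like the one below could be used. This is only a sketch: the function name valid_fraction is my own, and the three test strings are copied from the output above.

```python
from rdkit import Chem, RDLogger

# Silence the per-string "SMILES Parse Error" messages from RDKit.
RDLogger.DisableLog("rdApp.error")


def valid_fraction(smiles_list):
    """Return (canonical SMILES of the valid entries, fraction that parsed)."""
    if not smiles_list:
        return [], 0.0
    parsed = [Chem.MolFromSmiles(smi) for smi in smiles_list]
    valid = [Chem.MolToSmiles(mol) for mol in parsed if mol is not None]
    return valid, len(valid) / len(smiles_list)


# Three decoded strings taken from the log above.
decoded = [
    "CCCCNC(=O)CCn1c(=O)ncc[n1)ccn2",
    "CC(C))CCN1CCCC1)c2nc[nH+]nn2",
    "CC(CC)(C)CC(C)(C)CCNCCC(CCC)N",
]
valid, fraction = valid_fraction(decoded)
print(valid)
print(fraction)
```

With the script above, this could be applied to [ smi[0] for smi in decoded_smiles ] instead of the hand-copied list.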

Beyond the Ro5!

Recently, I became interested in an article in J. Med. Chem.
http://pubs.acs.org/doi/10.1021/acs.jmedchem.7b00717

The authors analyzed their in-house compound collection and proposed an easy-to-understand scoring function, AB-MPS.

AB-MPS is defined by the following equation.
AB-MPS = Abs( cLogD - 3 ) + NAR + NRB
where NAR is the number of aromatic rings and NRB is the number of rotatable bonds.
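
The equation is simple enough to sketch in a few lines. The function names below are my own. The SMILES helper uses RDKit's Crippen logP as a rough stand-in for the lipophilicity term, which is an assumption on my side and not the value used in the paper, so its numbers are only indicative.

```python
def ab_mps(lipophilicity, n_aromatic_rings, n_rotatable_bonds):
    """AB-MPS = |lipophilicity - 3| + number of aromatic rings + rotatable bonds."""
    return abs(lipophilicity - 3.0) + n_aromatic_rings + n_rotatable_bonds


def ab_mps_from_smiles(smiles):
    """Rough estimate from a SMILES string (hypothetical helper).

    Assumption: Crippen logP replaces the lipophilicity term of the paper.
    """
    # Imported here so that ab_mps() itself works without RDKit.
    from rdkit import Chem
    from rdkit.Chem import Crippen, rdMolDescriptors

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: " + smiles)
    return ab_mps(Crippen.MolLogP(mol),
                  rdMolDescriptors.CalcNumAromaticRings(mol),
                  rdMolDescriptors.CalcNumRotatableBonds(mol))


# Worked example with made-up inputs: |5 - 3| + 3 + 8
print(ab_mps(5.0, 3, 8))  # 13.0
```

Note that the rotatable bond count depends on the definition used by the toolkit, so values from different software may not match.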
They found that the AB-MPS of beyond-Ro5 compounds correlates well with oral bioavailability (F) and some ADMET parameters.
It will not hold everywhere, but I think the score is a good indicator for medicinal chemists because it is easy to understand and is based on an in-house dataset ( the authors' company's ). We can build more complex predictive models with machine learning, but with those models it is difficult to understand why certain compounds are good.
The in-house dataset is the key factor in its strength.
I am still thinking about how to collect in-house data and how to use such datasets more efficiently.