New function in RDKit 2017.09 #RDKit

Recently I updated my RDKit environment from 2017.03 to 2017.09 using conda.
The new version of RDKit implements a cool function named rdRGroupDecomposition.
The function enables us to render R-groups as a DataFrame.
I tried to visualize the cdk2.sdf dataset.
The code I wrote is below (using a Jupyter notebook).

from rdkit import Chem
from rdkit.Chem import Draw, AllChem
from rdkit.Chem import PandasTools
from rdkit.Chem import rdBase
from rdkit.Chem import RDConfig
from rdkit.Chem.Draw import IPythonConsole
import os
PandasTools.InstallPandasTools()
base = RDConfig.RDDocsDir
datapath = os.path.join( base, "Book/data/cdk2.sdf")
mols = [ mol for mol in Chem.SDMolSupplier( datapath ) if mol is not None ]
# Mol objects that carry 3D conformer information did not work well, so remove the conformers.
for m in mols: m.RemoveAllConformers()
# define the core for R-group decomposition.
core = Chem.MolFromSmiles('[nH]1cnc2cncnc21')
from rdkit.Chem import rdRGroupDecomposition
rg = rdRGroupDecomposition.RGroupDecomposition( core )
for mol in mols[:5]:
    rg.Add( mol )
# Do the R-group decomposition.
rg.Process()

Then visualize the R-group decomposition result.

import pandas as pd
PandasTools.molRepresentation = "svg"
frame = pd.DataFrame( rg.GetRGroupsAsColumns() )
frame

The result is the following image. 😉
The new version of RDKit is a cool and powerful tool for chemoinformatics. I really respect the RDKit developers.
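As a side note, GetRGroupsAsColumns returns a mapping from group labels ('Core', 'R1', …) to per-molecule lists, and sometimes one record per molecule is handier for inspection. A minimal pure-Python sketch of that reshaping (the mock values below are illustrative strings, not real RDKit output):

```python
# Reshape {label: [value per molecule]} into one {label: value} dict per molecule.
# Mock values stand in for the Mol objects GetRGroupsAsColumns really returns.
def columns_to_rows(columns):
    labels = list(columns)
    n = len(next(iter(columns.values()), []))
    return [{label: columns[label][i] for label in labels} for i in range(n)]

cols = {"Core": ["c1ccncc1", "c1ccncc1"], "R1": ["CC", "CO"]}
rows = columns_to_rows(cols)
print(rows[0]["R1"])  # -> CC
```

The resulting list of dicts can also be passed to pd.DataFrame to get one molecule per row instead of one group per column.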


molecule encoder/decoder in deepchem #rdkit #deepchem

Today I updated deepchem on my Mac.
It was easy to install the new version of deepchem on a Mac.

iwatobipen$ git clone https://github.com/deepchem/deepchem.git
iwatobipen$ cd deepchem
iwatobipen$ bash scripts/install_deepchem_conda.sh

That’s all. 😉

The new version of deepchem implements MoleculeVAE. MoleculeVAE generates new molecules by using a predefined model.
Deepchem can use a predefined model that was trained on the ZINC dataset.
OK, let's run the code.
I tested MoleculeVAE with reference to the following web page.
https://www.deepchem.io/_modules/deepchem/models/autoencoder_models/test_tensorflowEncoders.html

Deepchem provides lots of useful functions for data preparation.
For example, converting SMILES to one-hot vectors and vice versa.
I used cdk2.sdf for structure generation.
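The round trip the one-hot featurizer performs can be illustrated with a toy pure-Python sketch (the charset and padding length here are made up for brevity; the real ZINC charset has 35 characters and pads to 120 positions, as the (47, 120, 35) shape below shows):

```python
# Toy illustration of a OneHotFeaturizer-style round trip:
# encode a SMILES string to a (PAD x len(CHARSET)) one-hot matrix and back.
CHARSET = [""] + list("CNO()=c1n")  # toy charset; index 0 decodes to the empty pad
PAD = 20                            # toy length (the ZINC model pads to 120)

def encode(smiles):
    idxs = [CHARSET.index(ch) for ch in smiles] + [0] * (PAD - len(smiles))
    return [[1 if i == idx else 0 for i in range(len(CHARSET))] for idx in idxs]

def decode(onehot):
    return "".join(CHARSET[row.index(1)] for row in onehot)

print(decode(encode("c1ccO")))  # -> c1ccO
```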

from __future__ import print_function
import os
from rdkit import Chem, RDConfig
from rdkit.Chem import Draw
import deepchem as dc
from deepchem.models.autoencoder_models.autoencoder import TensorflowMoleculeEncoder, TensorflowMoleculeDecoder
from deepchem.feat.one_hot import zinc_charset
from deepchem.data import DiskDataset

datadir = os.path.join( RDConfig.RDDocsDir, 'Book/data/cdk2.sdf' )

mols = [ mol for mol in Chem.SDMolSupplier( datadir ) if mol is not None ]
smiles = [ Chem.MolToSmiles( mol ) for mol in mols ]
print( len( smiles ))

tf_encoder = TensorflowMoleculeEncoder.zinc_encoder()
tf_decoder = TensorflowMoleculeDecoder.zinc_decoder()

featurizer = dc.feat.one_hot.OneHotFeaturizer( zinc_charset, 120 )

# default setting ; encode smiles to one_hot vector and padding to 120 character.
features = featurizer( mols )
print( features.shape )

dataset = DiskDataset.from_numpy( features, features )
prediction = tf_encoder.predict_on_batch( dataset.X )
one_hot_dec = tf_decoder.predict_on_batch( prediction )
decoded_smiles = featurizer.untransform( one_hot_dec )

for smi in decoded_smiles:
    print( smi[0] )
    print( Chem.MolFromSmiles( smi[0] ))
# most decoded SMILES are invalid, so drop the None results before drawing
mols = [ Chem.MolFromSmiles( smi[0] ) for smi in decoded_smiles ]
mols = [ mol for mol in mols if mol is not None ]
im = Draw.MolsToGridImage( mols )
im.save( 'res.png' )

And the results were…. ;-(

iwatobipen$ python molVAE.py
/Users/iwatobipen/.pyenv/versions/anaconda3-2.4.0/lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Using TensorFlow backend.
47
2017-10-10 22:25:23.051035: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-10 22:25:23.051056: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-10 22:25:23.051060: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-10 22:25:23.051064: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
(47, 120, 35)
TIMING: dataset construction took 0.059 s
Loading dataset from disk.
CCCCNC(=O)CCn1c(=O)ncc[n1)ccn2
[22:25:29] SMILES Parse Error: syntax error for input: 'CCCCNC(=O)CCn1c(=O)ncc[n1)ccn2'
None
CC(C))CCN1CCCC1)c2nc[nH+]nn2
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C))CCN1CCCC1)c2nc[nH+]nn2'
None
CC(C)(C)C(CCC(=O)NCCc1cc[nH+]cn1
[22:25:29] SMILES Parse Error: extra open parentheses for input: 'CC(C)(C)C(CCC(=O)NCCc1cc[nH+]cn1'
None
CC(C)(C)N1CCCCC1)c2nc[nH+]nn2
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)(C)N1CCCCC1)c2nc[nH+]nn2'
None
CC(C)CCNCCCC)CN=C)c1cc[nH+]cn1
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)CCNCCCC)CN=C)c1cc[nH+]cn1'
None
Cc1ccnnc1SCc2ccccc2Crc2NCCC))F)C
[22:25:29] SMILES Parse Error: syntax error for input: 'Cc1ccnnc1SCc2ccccc2Crc2NCCC))F)C'
None
CC((C)))C(=O)NCc1ccccc1)c(/c(=C)C(C(=])C
[22:25:29] SMILES Parse Error: syntax error for input: 'CC((C)))C(=O)NCc1ccccc1)c(/c(=C)C(C(=])C'
None
Cc1ccnn1CCC(=O)N((C)(C)Ccc2[n(ccn2)CCC)(C)C
[22:25:29] SMILES Parse Error: syntax error for input: 'Cc1ccnn1CCC(=O)N((C)(C)Ccc2[n(ccn2)CCC)(C)C'
None
CC(CC)(C)CC(C)(C)CCNCCC(CCC)N
<rdkit.Chem.rdchem.Mol object at 0x10079c210>
COc1ccc2c(c1)CC)Cc2cc((cn=O)C(=O)N3
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'COc1ccc2c(c1)CC)Cc2cc((cn=O)C(=O)N3'
None
Cc1ncsc(=O)n1C)CC(=O)Nc2ccss[N/C(=O)[C)[O-])C
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1ncsc(=O)n1C)CC(=O)Nc2ccss[N/C(=O)[C)[O-])C'
None
Cc1c(c(=O)n2c(n1)C(=O)CC(C)C)C)C
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1c(c(=O)n2c(n1)C(=O)CC(C)C)C)C'
None
CN(C))(O)CNN(Cc1cccn1+c2cc[nH]c2c2N
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CN(C))(O)CNN(Cc1cccn1+c2cc[nH]c2c2N'
None
CC11ccc2n1c(ccnH=)Nc3cc((c(c3=O)C)NCC2OO
[22:25:29] SMILES Parse Error: syntax error for input: 'CC11ccc2n1c(ccnH=)Nc3cc((c(c3=O)C)NCC2OO'
None
CCNH+]1CCc2cnc2c1c(n(c3=))c4cc(cc(cc3C)C))CC1=O
[22:25:29] SMILES Parse Error: syntax error for input: 'CCNH+]1CCc2cnc2c1c(n(c3=))c4cc(cc(cc3C)C))CC1=O'
None
CC(=O)Nc1cc(ccc1OC)CCc2c3c([nH+nn2)cccco3
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(=O)Nc1cc(ccc1OC)CCc2c3c([nH+nn2)cccco3'
None
COc1ccc2c(n1)C)C)ccnc3ccccc3)NCC=)))C(=O)CC
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'COc1ccc2c(n1)C)C)ccnc3ccccc3)NCC=)))C(=O)CC'
None
CCS(=O)(=O)c1ccc(c(c1)C)2CC=c3cc(ccc3)NC(C)C)C(=O)N2
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCS(=O)(=O)c1ccc(c(c1)C)2CC=c3cc(ccc3)NC(C)C)C(=O)N2'
None
CC(=O)Nc1cccccc1OC)CCc2c3c([nH+]n23cccccc4
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(=O)Nc1cccccc1OC)CCc2c3c([nH+]n23cccccc4'
None
Cc1cccc2c11[nH]nc(c2))CCC(=O)N(C3)CCc3ccc(H+]c)C
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1cccc2c11[nH]nc(c2))CCC(=O)N(C3)CCc3ccc(H+]c)C'
None
Cc1cc(cc211[nH]cc(c2)CCCC(=O)CC(=O)NC3CCCn4ccc(=O)n4C
[22:25:29] SMILES Parse Error: extra open parentheses for input: 'Cc1cc(cc211[nH]cc(c2)CCCC(=O)CC(=O)NC3CCCn4ccc(=O)n4C'
None
Ccc1ccc2ccnn))n3c2ccc1)c4ccccc4F)
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Ccc1ccc2ccnn))n3c2ccc1)c4ccccc4F)'
None
Cc1nc(c2c(nn))n3c2ccc3)S4ccs2)C(C)(C[NH+]CCC))C)))n
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1nc(c2c(nn))n3c2ccc3)S4ccs2)C(C)(C[NH+]CCC))C)))n'
None
CC(CCc1ccncc1)Cc2cccc3c2NN3CCNCC3=O
[22:25:29] SMILES Parse Error: unclosed ring for input: 'CC(CCc1ccncc1)Cc2cccc3c2NN3CCNCC3=O'
None
CC(=O)(=OOc1ccn1C(=C)C(=O)N)2cc(=O)(c2=O)())
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(=O)(=OOc1ccn1C(=C)C(=O)N)2cc(=O)(c2=O)())'
None
CNS(=O)(=O)c(ccn1C((C)C(=O)Nc2ccc(cc2Cl)F
[22:25:29] SMILES Parse Error: syntax error for input: 'CNS(=O)(=O)c(ccn1C((C)C(=O)Nc2ccc(cc2Cl)F'
None
CCCC)CCC(=O)(=O)c1cO)1C(CC)C(=O)Nc2ccccc2F)/s1
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCCC)CCC(=O)(=O)c1cO)1C(CC)C(=O)Nc2ccccc2F)/s1'
None
CC((Cc1c(=O)nnc(=O)C(=O)CC(C)))c2ccccc2C2=O
[22:25:29] SMILES Parse Error: syntax error for input: 'CC((Cc1c(=O)nnc(=O)C(=O)CC(C)))c2ccccc2C2=O'
None
c1cs=c1C(=O)N(C2(CCCC2))c3c[nH]c(=O)])
[22:25:29] SMILES Parse Error: syntax error for input: 'c1cs=c1C(=O)N(C2(CCCC2))c3c[nH]c(=O)])'
None
CS(=O)(=O)c1cc(1CCc=O)NCC2CCC(CC2)c3c[nH+]c(=+)n3
[22:25:29] SMILES Parse Error: syntax error for input: 'CS(=O)(=O)c1cc(1CCc=O)NCC2CCC(CC2)c3c[nH+]c(=+)n3'
None
CN(Cc1ccccc1)c2nc(nnn+]c2c(c2SCc4ccccc4
[22:25:29] SMILES Parse Error: syntax error for input: 'CN(Cc1ccccc1)c2nc(nnn+]c2c(c2SCc4ccccc4'
None
CS(=O)(=O)c1cc==O)2c(n3c2cccc1SCC(=O)NO)))c(cc)/(([O)on
[22:25:29] SMILES Parse Error: syntax error for input: 'CS(=O)(=O)c1cc==O)2c(n3c2cccc1SCC(=O)NO)))c(cc)/(([O)on'
None
CN(C)C(=O)CC(C(=O)[O-])C(=O)CSc1ccc[nH+]c1=N
[22:25:29] Can't kekulize mol.  Unkekulized atoms: 14 15 16 17 18

None
Cc1c=O(c=n1Cc2cccn2)c3c(n2)CC((((=C)C)])CC-/)
[22:25:29] SMILES Parse Error: syntax error for input: 'Cc1c=O(c=n1Cc2cccn2)c3c(n2)CC((((=C)C)])CC-/)'
None
CC(C)(CC))C(=O)NCc1cccnc1Cl)CCc(=O)(=))CC///n2
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)(CC))C(=O)NCc1cccnc1Cl)CCc(=O)(=))CC///n2'
None
CC(C)(CCN)C(=O)CSc1cc(=O)c(=CC))]))))))))S2/c2)=O)C))/s)
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)(CCN)C(=O)CSc1cc(=O)c(=CC))]))))))))S2/c2)=O)C))/s)'
None
CCN1CC11C(C2)C(=O)N2C2N(C2S((=O)NC3CC2)C(=O)OCC
[22:25:29] SMILES Parse Error: syntax error for input: 'CCN1CC11C(C2)C(=O)N2C2N(C2S((=O)NC3CC2)C(=O)OCC'
None
CCN=N)c1cc=1Cc==)(=O)C)(C))C)CC2CCCCCC2)))())
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCN=N)c1cc=1Cc==)(=O)C)(C))C)CC2CCCCCC2)))())'
None
CC(CCc2ccccc2)C[=O)c3c(=O)n(c)+n3)c3nc(nc))c1
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(CCc2ccccc2)C[=O)c3c(=O)n(c)+n3)c3nc(nc))c1'
None
CCS(=O)(=O)CS(=O)NC1CC)C(=O)Nc1ccc2c1cccc1F))n1
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCS(=O)(=O)CS(=O)NC1CC)C(=O)Nc1ccc2c1cccc1F))n1'
None
CC1CCc2c(s2c2nc(n(=O)n)NC(=O)c3cccc(c3)SS(=O)=OO)C1
[22:25:29] SMILES Parse Error: unclosed ring for input: 'CC1CCc2c(s2c2nc(n(=O)n)NC(=O)c3cccc(c3)SS(=O)=OO)C1'
None
C[NH+]1CCCC(=O)[O-])CCc1c(=O)oc2c2c2cccc2C(s=])CCCC2=O
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'C[NH+]1CCCC(=O)[O-])CCc1c(=O)oc2c2c2cccc2C(s=])CCCC2=O'
None
Cc1cn(c(=O)c1C)CC(=O)Nc2ccss[N]S(=O)N)C(O)C)3(CCCC(C)\C)//)))C
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1cn(c(=O)c1C)CC(=O)Nc2ccss[N]S(=O)N)C(O)C)3(CCCC(C)\C)//)))C'
None
C[NH]11CCCC(=O)[O-])NC(=O)CN1C)c2ccn12)C((O)/O))C)CCCC)2)C1
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'C[NH]11CCCC(=O)[O-])NC(=O)CN1C)c2ccn12)C((O)/O))C)CCCC)2)C1'
None
CCN1CC11(CCN)n(c(=NCc2ccccc2))4ccn23C4CCCC4=O2CCC
[22:25:29] SMILES Parse Error: unclosed ring for input: 'CCN1CC11(CCN)n(c(=NCc2ccccc2))4ccn23C4CCCC4=O2CCC'
None
CC(C=()Scc1n[nH]c2c(n1)CCCC(=O)CC(=O)N3CC[NH+](CC3)CCc4cccon)))
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(C=()Scc1n[nH]c2c(n1)CCCC(=O)CC(=O)N3CC[NH+](CC3)CCc4cccon)))'
None
CC1CCc2c(c3cccccn2)C[H+]ccc(1O((((=+]3)Cc4ccss4Cl)1
[22:25:29] SMILES Parse Error: syntax error for input: 'CC1CCc2c(c3cccccn2)C[H+]ccc(1O((((=+]3)Cc4ccss4Cl)1'
None
Exception ignored in: <bound method BaseSession.__del__ of <tensorflow.python.client.session.Session object at 0x12a606128>>
Traceback (most recent call last):
  File "/Users/iwatobipen/.pyenv/versions/anaconda3-2.4.0/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 701, in __del__
TypeError: 'NoneType' object is not callable

Hmm…. I could not get suitable structures from the molecule autoencoder.
Molecule generation is difficult here because the input representation is based on SMILES strings, and the ratio of invalid SMILES was high. But I think DeepChem and RDKit are a nice combination for chemoinformatics.
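The failure mode can be quantified: with RDKit, the valid fraction is simply the share of non-None MolFromSmiles results. The pure-Python stand-in below uses parenthesis balance as a crude validity proxy (a hypothetical check, much weaker than a real SMILES parser; the two strings are taken from the log above):

```python
# Crude validity proxy: balanced round brackets. A real check would call
# Chem.MolFromSmiles on each string and count the non-None results.
def balanced(smiles):
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False          # extra close parenthesis
    return depth == 0                 # unbalanced if any open remains

decoded = ["CC(C))CCN1CCCC1)c2nc[nH+]nn2",    # invalid (extra close)
           "CC(CC)(C)CC(C)(C)CCNCCC(CCC)N"]   # the one string that parsed
ratio = sum(balanced(s) for s in decoded) / len(decoded)
print(ratio)  # -> 0.5
```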

tensorboard embeddings + RDKit #RDKit

Mainly I use Keras for deep learning, because Keras is easy to use and easy for me to understand.
Keras has a callback function for TensorBoard, but it is difficult to use TensorBoard embeddings through it.
You know, TensorBoard embeddings is a unique function to visualize features such as word vectors.
I wanted to use TensorBoard embeddings for visualization of chemical space.
A basic introduction to embeddings is described at the following URL.
https://www.tensorflow.org/programmers_guide/embedding

I referred to the following URL and changed some lines.
https://github.com/normanheckscher/mnist-tensorboard-embeddings/blob/master/mnist_t-sne.py

The following code reads an SDF, calculates fingerprints, and performs PCA or t-SNE.
The results can be viewed via TensorBoard. Fortunately, RDKit has the MolsToGridImage function, which is useful for making the sprite image for the embeddings!!!

import numpy as np
import pandas as pd
import sys
import argparse
import os
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import DataStructs
from rdkit.Chem import Draw

import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector

FLAGS = None

def getFpArr( mols, nBits = 1024 ):
    fps = [ AllChem.GetMorganFingerprintAsBitVect( mol, 2, nBits=nBits ) for mol in mols ]
    X = []
    for fp in fps:
        arr = np.zeros( (1,) )
        DataStructs.ConvertToNumpyArray( fp, arr )
        X.append( arr )
    return np.array( X )

def getResponse( mols, prop="ACTIVITY" ):
    Y = []
    for mol in mols:
        act = mol.GetProp( prop )
        act = 9. - np.log10( float( act ) )
        if act >= 6:
            Y.append(np.asarray( [1,0] ))
        else:
            Y.append(np.asarray( [0,1] ))
    return np.asarray( Y )


def generate_embeddings():
    sdf = Chem.SDMolSupplier( FLAGS.sdf )
    X = getFpArr( [ mol for mol in sdf ]  )
    sess = tf.InteractiveSession()
    with tf.device( '/cpu:0' ):
        embedding = tf.Variable( tf.stack( X[:], axis=0 ), trainable=False, name='embedding' )
    tf.global_variables_initializer().run()
    saver = tf.train.Saver()
    writer = tf.summary.FileWriter( FLAGS.log_dir+'/projector', sess.graph )
    config = projector.ProjectorConfig()
    embed = config.embeddings.add()
    embed.tensor_name = 'embedding:0'
    embed.metadata_path = os.path.join( FLAGS.log_dir, 'projector/metadata.tsv' )
    embed.sprite.image_path = os.path.join( FLAGS.data_dir, 'mols.png' )
    embed.sprite.single_image_dim.extend( [100, 100] )
    projector.visualize_embeddings( writer, config )
    saver.save( sess, os.path.join(FLAGS.log_dir, 'projector/amodel.ckpt'), global_step=len(X) )
def generate_metadata_file():
    sdf = Chem.SDMolSupplier( FLAGS.sdf )
    Y = getResponse( [ mol for mol in sdf ])
    def save_metadata( file ):
        with open( file, 'w' ) as f:
            f.write('id\tactivity_class\n')
            for i in range( Y.shape[0] ):
                c = np.nonzero( Y[i] )[0][0]
                f.write( '{}\t{}\n'.format( i, c ))
    save_metadata( FLAGS.log_dir + '/projector/metadata.tsv' )

def main(_):
    if tf.gfile.Exists( FLAGS.log_dir + '/projector' ):
        tf.gfile.DeleteRecursively( FLAGS.log_dir + '/projector' )
    tf.gfile.MakeDirs( FLAGS.log_dir + '/projector' )
    generate_metadata_file()
    generate_embeddings()

if __name__=='__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument( '--sdf', type=str )
    parser.add_argument( '--log_dir', type=str, default='/Users/iwatobipen/develop/py35env/testfolder/tensorflowtest/mollog' )
    parser.add_argument( '--data_dir', type=str, default='/Users/iwatobipen/develop/py35env/testfolder/tensorflowtest/mollog')

    FLAGS, unparsed = parser.parse_known_args()
    sdf = [ mol for mol in Chem.SDMolSupplier( FLAGS.sdf ) ]
    im = Draw.MolsToGridImage( sdf, molsPerRow=10, subImgSize=( 100, 100 ))
    im.save( os.path.join( FLAGS.data_dir + '/mols.png' ))
    tf.app.run( main=main, argv=[sys.argv[0]] + unparsed )

To run the code, type:

$ python tensormolembedding.py --sdf your.sdf

Then launch TensorBoard and access localhost:6006.

$ tensorboard --logdir your_log_dir

Then I got the following image.
This image shows the PCA result, but TensorBoard can also perform t-SNE analysis.
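TensorBoard computes the PCA in the browser, but the same projection can be reproduced offline from the fingerprint matrix. A minimal numpy sketch (random 0/1 bits stand in for the real 1024-bit Morgan fingerprint array):

```python
import numpy as np

# PCA by SVD on a centered fingerprint matrix; random 0/1 bits stand in
# for the real (n_mols x 1024) Morgan fingerprint array from getFpArr.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(47, 1024)).astype(float)

Xc = X - X.mean(axis=0)             # center each bit column
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = Xc @ Vt[:2].T              # project onto the first two components
print(coords.shape)                 # -> (47, 2)
```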

I pushed my code to my repo.
https://github.com/iwatobipen/deeplearning/tree/master/tensorflowembedding
TensorFlow has cool functions ;-).

Handle PyMOL via the CUI.

I often use PyMOL to visualize PDB files.
Recently I wanted to merge several PDB files into one PyMOL session file from the CUI, because I run the task as a batch job. So I searched the API documentation and tried it.
At first I needed to launch PyMOL in silent mode (no GUI).
Then I loaded the PDB files.
Next, I colored each object by B-factor as a spectrum.
Finally, I saved the objects as a PyMOL session file and closed PyMOL.
Everything worked well.

The following code and sample files are pushed to my repo.
https://github.com/iwatobipen/pymolscript

#test

import pymol
from pymol import cmd
pymol.finish_launching(['pymol','-qc'])
cmd.load('1atp.pdb')
cmd.load('1atp2.pdb')
cmd.load('1atp3.pdb')
cmd.spectrum('b', 'blue_white_red','1atp', 0, 100)
cmd.spectrum('b', 'yellow_cyan_blue','1atp2', 0, 100)
cmd.spectrum('b', 'green_magenta','1atp3', 0, 100)
cmd.save('somecolors.pse')
cmd.quit()
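The same session can also be built by feeding PyMOL a generated .pml command script (run with pymol -cq script.pml); a sketch that writes such a script for the three files above:

```python
# Generate a PyMOL command script equivalent to the API calls above.
jobs = [("1atp.pdb",  "blue_white_red"),
        ("1atp2.pdb", "yellow_cyan_blue"),
        ("1atp3.pdb", "green_magenta")]

lines = []
for path, palette in jobs:
    obj = path.rsplit(".", 1)[0]          # PyMOL names the object after the file
    lines.append(f"load {path}")
    lines.append(f"spectrum b, {palette}, {obj}, 0, 100")
lines.append("save somecolors.pse")

with open("colorize.pml", "w") as f:
    f.write("\n".join(lines) + "\n")
```

This keeps the coloring scheme as data, so adding a fourth PDB file is a one-line change.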

Create an MMPDB (matched molecular pair DB)!

Matched molecular pair analysis is a very common method for medicinal chemists to analyze SAR. There are lots of publications about it and applications in this area.
I often use rdkit/Contrib/mmpa to make my own MMP dataset.
The origin of the algorithm is described at the following URL.
https://www.ncbi.nlm.nih.gov/pubmed/20121045

Yesterday, good news was announced by @RDKit_org: the release of a package that can build an MMP DB.
I tried the package immediately.
The package is provided from a GitHub repo. To use it, I needed to install apsw first; APSW can be installed using conda.
Then install mmpdb with the setup script.

iwatobipen$ conda install -c conda-forge apsw
iwatobipen$ git clone https://github.com/rdkit/mmpdb.git
iwatobipen$ cd mmpdb
iwatobipen$ python setup.py install

After a successful installation, I could find the mmpdb command in the terminal.
I used CYP3A4 inhibition data from ChEMBL for the test.
I prepared two files: one has SMILES and IDs, and the other has IDs and IC50 values.
'*' means a missing value. In the following case I provided a single property (IC50), but the package can handle multiple properties. Readers who are interested in the package can see more details with the mmpdb --help command.
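A property file in the format shown below can be written with a few lines of Python; '*' marks missing values and the columns are tab separated (the IDs and values here are the ones from the sample file):

```python
import csv

# Write an mmpdb property file: an ID column plus property columns,
# with '*' marking missing values.
records = {924282: None, 605: None, 59721: 19952.62, 759749: 2511.89}

with open("prop.csv", "w", newline="") as f:
    w = csv.writer(f, delimiter="\t")
    w.writerow(["ID", "STANDARD_VALUE"])
    for molregno, value in records.items():
        w.writerow([molregno, "*" if value is None else value])
```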

iwatobipen$ head -n 10 chembl_cyp3a4.csv 
CANONICAL_SMILES,MOLREGNO
Cc1ccccc1c2cc(C(=O)n3cccn3)c4cc(Cl)ccc4n2,924282
CN(C)CCCN1c2ccccc2CCc3ccccc13,605
Cc1ccc(cc1)S(=O)(=O)\N=C(/c2ccc(F)cc2)\n3c(C)nc4ccccc34,1698776
NC[C@@H]1O[C@@H](Cc2c(O)c(O)ccc12)C34CC5CC(CC(C5)C3)C4,59721
Cc1ccc(cc1)S(=O)(=O)N(Cc2ccccc2)c3ccccc3C(=O)NCc4occc4,759749
O=C(N1CCC2(CC1)CN(C2)c3ccc(cc3)c4ccccc4)c5ccncc5,819161
iwatobipen$ head -n 10 prop.csv 
ID	STANDARD_VALUE
924282	*
605	*
1698776	*
59721	19952.62
759749	2511.89
819161	2511.89

mmpdb fragment has a --cut-smarts option.
It seems attractive to me! 😉
'''
--cut-smarts SMARTS   alternate SMARTS pattern to use for cutting (default:
                      '[#6+0;!$(*=,#[!#6])]!@!=!#[!#0;!#1;!$([CH2]);!$([CH3][CH2])]'),
                      or use one of: 'default', 'cut_AlkylChains', 'cut_Amides',
                      'cut_all', 'exocyclic', 'exocyclic_NoMethyl'
'''
Next step, make the MMP DB and join the property data to it.

# run fragmentation; my input file has a header and the delimiter is comma (the default is whitespace). The output file is cyp3a4.fragments.
# Each line of the input file must be unique!
iwatobipen$ mmpdb fragment chembl_cyp3a4.csv --has-header --delimiter 'comma' -o cyp3a4.fragments
# run indexing on the fragmented file and create an MMP DB.
iwatobipen$ mmpdb index cyp3a4.fragments -o cyp3a4.mmpdb

OK, I got the cyp3a4.mmpdb file (sqlite3 format).
Add the properties to the DB by typing the following command.

iwatobipen$ mmpdb loadprops -p prop.csv cyp3a4.mmpdb
Using dataset: MMPs from 'cyp3a4.fragments'
Reading properties from 'prop.csv'
Read 1 properties for 17143 compounds from 'prop.csv'
5944 compounds from 'prop.csv' are not in the dataset at 'cyp3a4.mmpdb'
Imported 5586 'STANDARD_VALUE' records (5586 new, 0 updated).
Generated 83759 rule statistics (1329408 rule environments, 1 properties)
Number of rule statistics added: 83759 updated: 0 deleted: 0
Loaded all properties and re-computed all rule statistics.

Ready to use the DB. Let's play with it.
First, identify possible transforms.

iwatobipen$ mmpdb transform --smiles 'c1ccc(O)cc1' cyp3a4.mmpdb --min-pair 10 -o transfom_res.txt
iwatobipen$ head -n3 transfom_res.txt 
ID	SMILES	STANDARD_VALUE_from_smiles	STANDARD_VALUE_to_smiles	STANDARD_VALUE_radius	STANDARD_VALUE_fingerprint	STANDARD_VALUE_rule_environment_id	STANDARD_VALUE_count	STANDARD_VALUE_avg	STANDARD_VALUE_std	STANDARD_VALUE_kurtosis	STANDARD_VALUE_skewness	STANDARD_VALUE_min	STANDARD_VALUE_q1	STANDARD_VALUE_median	STANDARD_VALUE_q3	STANDARD_VALUE_max	STANDARD_VALUE_paired_t	STANDARD_VALUE_p_value
1	CC(=O)NCCO	[*:1]c1ccccc1	[*:1]CCNC(C)=O	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	1049493	14	3632	5313.6	-0.71409	-0.033683	-6279.7	498.81	2190.5	7363.4	12530	-2.5576	0.023849
2	CC(C)CO	[*:1]c1ccccc1	[*:1]CC(C)C	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	1026671	20	7390.7	8556.1	-1.1253	-0.082107	-6503.9	-0	8666.3	13903	23534	-3.863	0.0010478

The output file contains the transformations together with their statistics.
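Since transfom_res.txt is tab separated with a header row, downstream filtering is easy; a sketch that picks the transform with the lowest p-value (the columns are trimmed to a few of those shown above, and the two data rows come from the output above):

```python
import csv, io

# Trimmed stand-in for transfom_res.txt: the real output has many more columns,
# but the parsing pattern is the same (tab-separated with a header row).
tsv = """ID\tSMILES\tSTANDARD_VALUE_avg\tSTANDARD_VALUE_p_value
1\tCC(=O)NCCO\t3632\t0.023849
2\tCC(C)CO\t7390.7\t0.0010478
"""
rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
best = min(rows, key=lambda r: float(r["STANDARD_VALUE_p_value"]))
print(best["SMILES"])  # -> CC(C)CO
```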
The DB can also be used to make predictions.
The following command generates two files with the prefix CYP3A:
CYP3A_pairs.txt
CYP3A_rules.txt

iwatobipen$ mmpdb predict --reference 'c1ccc(O)cc1' --smiles 'c1ccccc1' cyp3a4.mmpdb  -p STANDARD_VALUE --save-details --prefix CYP3A
iwatobipen$ head -n 3 CYP3A_pairs.txt
rule_environment_id	from_smiles	to_smiles	radius	fingerprint	lhs_public_id	rhs_public_id	lhs_smiles	rhs_smiles	lhs_value	rhs_value	delta
868610	[*:1]O	[*:1][H]	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	1016823	839661	C[C@]12CC[C@@H]3[C@H](CC[C@H]4C[C@@H](O)CC[C@@]43C)[C@@H]1CC[C@H]2C(=O)CO	CC(=O)[C@@H]1CC[C@H]2[C@H]3CC[C@H]4C[C@@H](O)CC[C@]4(C)[C@@H]3CC[C@@]21C	1000	15849	14849
868610	[*:1]O	[*:1][H]	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	3666	47209	O=c1c(O)c(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12	O=c1cc(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12	15849	5011.9	-10837
iwatobipen$ head -n 3 CYP3A_rules.txt 
rule_environment_statistics_id	rule_id	rule_environment_id	radius	fingerprint	from_smiles	to_smiles	count	avg	std	kurtosis	skewness	min	q1	median	q3	max	paired_t	p_value
28699	143276	868610	0	59SlQURkWt98BOD1VlKTGRkiqFDbG6JVkeTJ3ex3bOA	[*:1]O	[*:1][H]	16	-587.88	14102	-0.47579	-0.065761	-28460	-8991.5	-3247.8	10238	23962	0.16674	0.8698
54091	143276	1140189	1	tLP3hvftAkp3EUY+MHSruGd0iZ/pu5nwnEwNA+NiAh8	[*:1]O	[*:1][H]	15	-1617	13962	-0.25757	-0.18897	-28460	-9534.4	-4646	7271.1	23962	0.44855	0.66062

It is worth noting that the package can handle not only structural information but also properties.
I learned a lot of things from the source code.
The RDKit org is a cool community!
I pushed my code to my repo.
https://github.com/iwatobipen/mmpdb_test

The original repo URL is
https://github.com/rdkit/mmpdb
Do not miss it!

3D conformer fingerprint calculation using RDKit #RDKit

Recently, an attractive article was published in an ACS journal.
The article describes how to calculate a 3D structure based fingerprint and compares it with some fingerprints that are well known in this area.
The new method, called "E3FP", is an algorithm to calculate a 3D conformer fingerprint in the spirit of the Extended Connectivity Fingerprint (ECFP). E3FP encodes information not only about atoms that are connected but also about atoms that are not connected.
http://pubs.acs.org/doi/abs/10.1021/acs.jmedchem.7b00696

The authors show several examples: pairs that are very similar in 2D but not in 3D, and vice versa.
They also compare E3FP similarity with the ROCS score (TANIMOTO COMBO) and show good performance.
I was interested in the fingerprint. Fortunately, the authors published the code on Anaconda Cloud!!!!!!!
Install it and use it ASAP. ;-D
I am a Mac user, so installation is very easy: just type the commands below.
I found some tips for using the package.
First, molecules need the _Name property to perform the calculation.
Second, mol_from_sdf can read a molecule from an SDF, but it cannot read an SDF that contains multiple molecules. So I recommend using a molecule list instead of an SDF.

conda install -c sdaxen sdaxen_python_utilities
conda install -c keiserlab e3fp

I used cdk2.sdf for the test.
E3FP calculates an unfolded fingerprint, but it can be converted to a folded fingerprint and an RDKit fingerprint using the fold and to_rdkit functions.

%matplotlib inline
import pandas as pd
import numpy as np
from rdkit import Chem
from e3fp.fingerprint.generate import fprints_dict_from_mol
from e3fp.conformer.generate import generate_conformers
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw
from rdkit.Chem import DataStructs
from rdkit.Chem import AllChem
IPythonConsole.ipython_useSVG=True
# this sdf has 3D conformer, so I do not need to generate 3D conf.
mols = [ mol for mol in Chem.SDMolSupplier( "cdk2.sdf", removeHs=False ) ]
fpdicts = [ fprints_dict_from_mol( mol ) for mol in mols ]
# get the e3fp fingerprints;
# if a molecule has multiple conformers the function generates multiple fingerprints.
fps = [ fpdict[5][0] for fpdict in fpdicts ]
# convert to rdkit fp from e3fp fingerprint
binfp = [ fp.fold().to_rdkit() for fp in fps ]
# get Morgan fingerprints
morganfp = [ AllChem.GetMorganFingerprintAsBitVect(mol,2) for mol in mols ]

# calculate pair wise TC
df = {"MOLI":[], "MOLJ":[], "E3FPTC":[], "MORGANTC":[],"pairidx":[]}
for i in range( len(binfp) ):
    for j in range( i ):
        e3fpTC = DataStructs.TanimotoSimilarity( binfp[i], binfp[j] )
        morganTC = DataStructs.TanimotoSimilarity( morganfp[i], morganfp[j] )
        moli = mols[i].GetProp("_Name")
        molj = mols[j].GetProp("_Name")
        df["MOLI"].append( moli )
        df["MOLJ"].append( molj )
        df["E3FPTC"].append( e3fpTC )
        df["MORGANTC"].append( morganTC )
        df["pairidx"].append( str(i)+"_vs_"+str(j) )
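For reference, the Tanimoto coefficient returned by DataStructs.TanimotoSimilarity for binary fingerprints is just the number of shared on-bits divided by the number of bits set in either fingerprint. A pure-Python sketch on sets of bit indices:

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        return 0.0
    return len(bits_a & bits_b) / len(union)

print(tanimoto({1, 2, 3, 4}, {3, 4, 5, 6}))  # 2 shared / 6 total = 0.333...
```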

The method is fast and easy to use. The bottleneck is how to generate suitable conformer(s).
Readers who are interested in the package should check the author's article.

I pushed my sample code to my repo.
https://github.com/iwatobipen/3dfp/blob/master/sample_script.ipynb

Platform-as-a-Service for Deep Learning.

Yesterday, I enjoyed mishima.syk #10. I uploaded my presentation and code to mishimasyk repo.
I briefly introduced a PaaS for DL named ‘Floyd’. I think the service is interesting because it lets me run DL on the cloud with a GPU!

So, I will describe a very simple example to get started with “FLOYD” 😉
First, make an account on the site.
Next, install the command line tools. Just type pip install -U floyd-cli!

# Install floyd-cli
$ pip install -U floyd-cli

Third, log in to Floyd.

# from terminal
$ floyd login

Then a web browser will launch, and the page provides an authentication token. Copy and paste it.
Ready to start!
Let’s play with floyd.
The first example is iris dataset classification using scikit-learn.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

dataset = load_iris()
X = dataset.data
y = dataset.target

trainx, testx, trainy, testy = train_test_split( X, y, test_size=0.2, random_state=123 )

# Support vector classifier with an RBF kernel.
svc = SVC( kernel='rbf' )
svc.fit( trainx, trainy )

# Random forest classifier with default settings.
rfc = RandomForestClassifier()
rfc.fit( trainx, trainy )

predsvc = svc.predict( testx )
predrf = rfc.predict( testx )

print( classification_report( testy, predsvc ))
print( classification_report( testy, predrf ))

After initializing the project, use the floyd run command to start the code.

$ mkdir test_pj
$ cd test_pj
$ floyd init
$ floyd run 'python svc_rf_test.py'
Creating project run. Total upload size: 168.9KiB
Syncing code ...
[================================] 174656/174656 - 00:00:02
Done
RUN ID                  NAME                     VERSION
----------------------  ---------------------  ---------
xxxxxxxx  iwatobipen/test_pj:10         10

To view logs enter:
    floyd logs xxxxxxxx

I can check the status via the web browser.

Next, run the DNN classification model.
It is a very simple example, not so deeeeeeeeeeeep.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils import np_utils

dataset = load_iris()
X = dataset.data

xdim = 4
y = dataset.target
y = np_utils.to_categorical( y, 3 )
trainx, testx, trainy, testy = train_test_split( X, y, test_size=0.2,random_state = 123 )

model = Sequential()
model.add( Dense( 16, input_dim = xdim  ) )
model.add( Activation( 'relu' ))
model.add( Dense( 3 ))
model.add( Activation( 'softmax' ))
model.compile( loss = 'categorical_crossentropy',
               optimizer = 'rmsprop',
               metrics = ['accuracy'])

hist = model.fit( trainx, trainy, epochs = 50, batch_size = 1 )
classes = model.predict( testx, batch_size = 1 )

print( [ np.argmax(i) for i in classes ] )
print( [ np.argmax(i) for i in testy ] )
loss, acc = model.evaluate( testx, testy )

print( "loss, acc ={0},{1}".format( loss, acc ))

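The np_utils.to_categorical call above just one-hot encodes the integer class labels so they match the softmax output layer. A minimal numpy equivalent of what it does:

```python
import numpy as np

def to_one_hot(labels, num_classes):
    """One-hot encode integer class labels, like keras' np_utils.to_categorical."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

print(to_one_hot([0, 2, 1], 3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```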
Run the code in the same manner:

iwatobipen$ floyd run 'python dnn_test.py'
Creating project run. Total upload size: 168.9KiB
Syncing code ...
[================================] 174653/174653 - 00:00:02
Done
RUN ID                  NAME                     VERSION
----------------------  ---------------------  ---------
xxxxxxx  iwatobipen/test_pj:11         11

To view logs enter:
    floyd logs xxxxxxx

Check the log from the website.

2017-07-09 01:51:37,703 INFO - Preparing to run TaskInstance <TaskInstance: iwatobipen/test_pj:11 (id: Uus7cp996732cBWdgt3nz3) (checksum: 144078ab50a63ea6276efee221669d13) (last update: 2017-07-09 01:51:37.694913) [queued]>
2017-07-09 01:51:37,723 INFO - Starting attempt 1 at 2017-07-09 01:51:37.708707
2017-07-09 01:51:38,378 INFO - adding pip install -r floyd_requirements
2017-07-09 01:51:38,394 INFO - Executing command in container: stdbuf -o0 sh command.sh
2017-07-09 01:51:38,394 INFO - Pulling Docker image: floydhub/tensorflow:1.1.0-py3_aws.4
2017-07-09 01:51:39,652 INFO - Starting container...
2017-07-09 01:51:39,849 INFO -
################################################################################

2017-07-09 01:51:39,849 INFO - Run Output:
2017-07-09 01:51:40,317 INFO - Requirement already satisfied: Pillow in /usr/local/lib/python3.5/site-packages (from -r floyd_requirements.txt (line 1))
2017-07-09 01:51:40,320 INFO - Requirement already satisfied: olefile in /usr/local/lib/python3.5/site-packages (from Pillow->-r floyd_requirements.txt (line 1))
2017-07-09 01:51:43,354 INFO - Epoch 1/50
2017-07-09 01:51:43,460 INFO - 1/120 [..............................] - ETA: 8s - loss: 0.8263 - acc: 0.0000e+00
 58/120 [=============>................] - ETA: 0s - loss: 1.5267 - acc: 0.6552
115/120 [===========================>..] - ETA: 0s - loss: 1.2341 - acc: 0.6522
120/120 [==============================] - 0s - loss: 1.2133 - acc: 0.6583
2017-07-09 01:51:43,461 INFO - Epoch 2/50
..........................
 57/120 [=============>................] - ETA: 0s - loss: 0.1135 - acc: 0.9649
115/120 [===========================>..] - ETA: 0s - loss: 0.1242 - acc: 0.9739
120/120 [==============================] - 0s - loss: 0.1270 - acc: 0.9750
2017-07-09 01:51:48,660 INFO - Epoch 50/50
2017-07-09 01:51:48,799 INFO - 1/120 [..............................] - ETA: 0s - loss: 0.0256 - acc: 1.0000
 57/120 [=============>................] - ETA: 0s - loss: 0.0911 - acc: 0.9825
114/120 [===========================>..] - ETA: 0s - loss: 0.1146 - acc: 0.9737
120/120 [==============================] - 0s - loss: 0.1161 - acc: 0.9750
2017-07-09 01:51:48,799 INFO - [1, 2, 2, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 2, 2, 2, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 2, 2, 0]
2017-07-09 01:51:48,800 INFO - [1, 2, 2, 1, 0, 2, 1, 0, 0, 1, 2, 0, 1, 2, 2, 2, 0, 0, 1, 0, 0, 2, 0, 2, 0, 0, 0, 2, 2, 0]
2017-07-09 01:51:48,800 INFO - 30/30 [==============================] - 0s
2017-07-09 01:51:48,800 INFO - loss, acc =0.23778462409973145,0.8666666746139526

The following software packages (in addition to many other common libraries) are available in all the environments:
h5py, iPython, Jupyter, matplotlib, numpy, OpenCV, Pandas, Pillow, scikit-learn, scipy, sklearn

Also, users can install additional packages from PyPI (not Anaconda… 😦). To do that, put a file named ‘floyd_requirements.txt’ in the project folder.
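For example, a floyd_requirements.txt listing extra PyPI packages is just one package name per line, optionally with a version pin (the second entry here is only a placeholder to show the pinning syntax):

```
Pillow
tqdm==4.14.0
```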

In summary, Floyd is a very interesting service. It is easy to set up a DL environment and use a GPU on the cloud.
I hope FLOYD will support Anaconda, because I want to use chemoinformatics packages like RDKit, Open Babel, etc…