New function of RDKit 2017.09 #RDKit

Recently I updated my RDKit environment from 2017.03 to 2017.09 using conda.
The new version adds a cool module named rdRGroupDecomposition.
It lets us render R-groups as a DataFrame.
I tried it on the cdk2.sdf dataset.
The code I wrote is below (run in a Jupyter notebook).

from rdkit import Chem
from rdkit.Chem import Draw, AllChem
from rdkit.Chem import PandasTools
from rdkit.Chem import rdBase
from rdkit.Chem import RDConfig
from rdkit.Chem.Draw import IPythonConsole
import os
PandasTools.InstallPandasTools()
base = RDConfig.RDDocsDir
datapath = os.path.join( base, "Book/data/cdk2.sdf")
mols = [ mol for mol in Chem.SDMolSupplier( datapath ) if mol is not None ]
# Mol objects with 3D conformer information did not work well, so I removed the conformers.
for m in mols:
    m.RemoveAllConformers()
# Define the core for the R-group decomposition.
core = Chem.MolFromSmiles('[nH]1cnc2cncnc21')
from rdkit.Chem import rdRGroupDecomposition
tables = PandasTools.LoadSDF( datapath )
rg = rdRGroupDecomposition.RGroupDecomposition( core )
for mol in mols[:5]:
    rg.Add( mol )
# Run the R-group decomposition.
rg.Process()

Then visualize the R-group decomposition result.

import pandas as pd
PandasTools.molRepresentation="svg"
modlf = PandasTools.LoadSDF( datapath )
frame = pd.DataFrame( rg.GetRGroupsAsColumns() )
frame

The result is shown in the following image. 😉
The new version of RDKit is a cool and powerful tool for chemoinformatics. I really respect the RDKit developers.


molecule encoder/decoder in deepchem #rdkit #deepchem

Today I updated deepchem on my Mac.
It was easy to install the new version of deepchem on a Mac.

iwatobipen$ git clone https://github.com/deepchem/deepchem.git
iwatobipen$ cd deepchem
iwatobipen$ bash scripts/install_deepchem_conda.sh

That’s all. 😉

The new version of deepchem implements MoleculeVAE. MoleculeVAE generates new molecules using a predefined model.
Deepchem ships a predefined model that was trained on the ZINC dataset.
OK, let’s run the code.
I tested MoleculeVAE by referring to the following web page.
https://www.deepchem.io/_modules/deepchem/models/autoencoder_models/test_tensorflowEncoders.html

Deepchem provides lots of useful functions for data preparation.
For example, it can convert SMILES to one-hot vectors and vice versa.
I used cdk2.sdf for structure generation.
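
To make the one-hot idea concrete, here is a minimal pure-Python sketch of encoding a SMILES string character by character and decoding it again. It is not deepchem's OneHotFeaturizer: the tiny CHARSET and the helper names one_hot_encode and one_hot_decode are hypothetical, and multi-character symbols such as Cl are not handled.

```python
# Hypothetical toy character set; index 0 is the padding character.
# deepchem's zinc_charset is larger and is NOT reproduced here.
CHARSET = [" ", "C", "N", "O", "c", "n", "(", ")", "=", "1", "2"]


def one_hot_encode(smiles, charset=CHARSET, pad_length=120):
    """Return a pad_length x len(charset) matrix of 0/1 integers."""
    if len(smiles) > pad_length:
        raise ValueError("SMILES is longer than pad_length")
    index = {ch: i for i, ch in enumerate(charset)}
    matrix = []
    # Right-pad with spaces so every molecule has the same number of rows.
    for ch in smiles.ljust(pad_length):
        row = [0] * len(charset)
        row[index[ch]] = 1  # raises KeyError for characters outside the charset
        matrix.append(row)
    return matrix


def one_hot_decode(matrix, charset=CHARSET):
    """Take the highest-scoring character in each row and strip the padding."""
    chars = [charset[row.index(max(row))] for row in matrix]
    return "".join(chars).rstrip()


encoded = one_hot_encode("CC(=O)N", pad_length=10)
print(len(encoded), len(encoded[0]))
print(one_hot_decode(encoded))
```

Because decoding only takes the best-scoring character per position, it also works on the probability-like rows that a decoder network returns, and nothing forces the resulting string to be a valid SMILES.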

from __future__ import print_function
import os
from rdkit import Chem, RDConfig
from rdkit.Chem import Draw
import deepchem as dc
from deepchem.models.autoencoder_models.autoencoder import TensorflowMoleculeEncoder, TensorflowMoleculeDecoder
from deepchem.feat.one_hot import zinc_charset
from deepchem.data import DiskDataset

datadir = os.path.join( RDConfig.RDDocsDir, 'Book/data/cdk2.sdf' )

mols = [ mol for mol in Chem.SDMolSupplier( datadir ) ]
smiles = [ Chem.MolToSmiles( mol ) for mol in mols ]
print( len( smiles ))

tf_encoder = TensorflowMoleculeEncoder.zinc_encoder()
tf_decoder = TensorflowMoleculeDecoder.zinc_decoder()

featurizer = dc.feat.one_hot.OneHotFeaturizer( zinc_charset, 120 )

# Default setting: encode SMILES as one-hot vectors, padded to 120 characters.
features = featurizer( mols )
print( features.shape )

dataset = DiskDataset.from_numpy( features, features )
prediction = tf_encoder.predict_on_batch( dataset.X )
one_hot_dec = tf_decoder.predict_on_batch( prediction )
decoded_smiles = featurizer.untransform( one_hot_dec )

for smiles in decoded_smiles:
    print( smiles[0] )
    print( Chem.MolFromSmiles( smiles[0] ))
mols = [ Chem.MolFromSmiles( smi[0] ) for smi in decoded_smiles ]
im = Draw.MolsToGridImage( mols )
im.save( 'res.png' )

And the results were ... ;-(

iwatobipen$ python molVAE.py
/Users/iwatobipen/.pyenv/versions/anaconda3-2.4.0/lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Using TensorFlow backend.
47
2017-10-10 22:25:23.051035: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-10 22:25:23.051056: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-10 22:25:23.051060: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-10 22:25:23.051064: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
(47, 120, 35)
TIMING: dataset construction took 0.059 s
Loading dataset from disk.
CCCCNC(=O)CCn1c(=O)ncc[n1)ccn2
[22:25:29] SMILES Parse Error: syntax error for input: 'CCCCNC(=O)CCn1c(=O)ncc[n1)ccn2'
None
CC(C))CCN1CCCC1)c2nc[nH+]nn2
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C))CCN1CCCC1)c2nc[nH+]nn2'
None
CC(C)(C)C(CCC(=O)NCCc1cc[nH+]cn1
[22:25:29] SMILES Parse Error: extra open parentheses for input: 'CC(C)(C)C(CCC(=O)NCCc1cc[nH+]cn1'
None
CC(C)(C)N1CCCCC1)c2nc[nH+]nn2
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)(C)N1CCCCC1)c2nc[nH+]nn2'
None
CC(C)CCNCCCC)CN=C)c1cc[nH+]cn1
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)CCNCCCC)CN=C)c1cc[nH+]cn1'
None
Cc1ccnnc1SCc2ccccc2Crc2NCCC))F)C
[22:25:29] SMILES Parse Error: syntax error for input: 'Cc1ccnnc1SCc2ccccc2Crc2NCCC))F)C'
None
CC((C)))C(=O)NCc1ccccc1)c(/c(=C)C(C(=])C
[22:25:29] SMILES Parse Error: syntax error for input: 'CC((C)))C(=O)NCc1ccccc1)c(/c(=C)C(C(=])C'
None
Cc1ccnn1CCC(=O)N((C)(C)Ccc2[n(ccn2)CCC)(C)C
[22:25:29] SMILES Parse Error: syntax error for input: 'Cc1ccnn1CCC(=O)N((C)(C)Ccc2[n(ccn2)CCC)(C)C'
None
CC(CC)(C)CC(C)(C)CCNCCC(CCC)N
<rdkit.Chem.rdchem.Mol object at 0x10079c210>
COc1ccc2c(c1)CC)Cc2cc((cn=O)C(=O)N3
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'COc1ccc2c(c1)CC)Cc2cc((cn=O)C(=O)N3'
None
Cc1ncsc(=O)n1C)CC(=O)Nc2ccss[N/C(=O)[C)[O-])C
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1ncsc(=O)n1C)CC(=O)Nc2ccss[N/C(=O)[C)[O-])C'
None
Cc1c(c(=O)n2c(n1)C(=O)CC(C)C)C)C
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1c(c(=O)n2c(n1)C(=O)CC(C)C)C)C'
None
CN(C))(O)CNN(Cc1cccn1+c2cc[nH]c2c2N
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CN(C))(O)CNN(Cc1cccn1+c2cc[nH]c2c2N'
None
CC11ccc2n1c(ccnH=)Nc3cc((c(c3=O)C)NCC2OO
[22:25:29] SMILES Parse Error: syntax error for input: 'CC11ccc2n1c(ccnH=)Nc3cc((c(c3=O)C)NCC2OO'
None
CCNH+]1CCc2cnc2c1c(n(c3=))c4cc(cc(cc3C)C))CC1=O
[22:25:29] SMILES Parse Error: syntax error for input: 'CCNH+]1CCc2cnc2c1c(n(c3=))c4cc(cc(cc3C)C))CC1=O'
None
CC(=O)Nc1cc(ccc1OC)CCc2c3c([nH+nn2)cccco3
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(=O)Nc1cc(ccc1OC)CCc2c3c([nH+nn2)cccco3'
None
COc1ccc2c(n1)C)C)ccnc3ccccc3)NCC=)))C(=O)CC
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'COc1ccc2c(n1)C)C)ccnc3ccccc3)NCC=)))C(=O)CC'
None
CCS(=O)(=O)c1ccc(c(c1)C)2CC=c3cc(ccc3)NC(C)C)C(=O)N2
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCS(=O)(=O)c1ccc(c(c1)C)2CC=c3cc(ccc3)NC(C)C)C(=O)N2'
None
CC(=O)Nc1cccccc1OC)CCc2c3c([nH+]n23cccccc4
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(=O)Nc1cccccc1OC)CCc2c3c([nH+]n23cccccc4'
None
Cc1cccc2c11[nH]nc(c2))CCC(=O)N(C3)CCc3ccc(H+]c)C
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1cccc2c11[nH]nc(c2))CCC(=O)N(C3)CCc3ccc(H+]c)C'
None
Cc1cc(cc211[nH]cc(c2)CCCC(=O)CC(=O)NC3CCCn4ccc(=O)n4C
[22:25:29] SMILES Parse Error: extra open parentheses for input: 'Cc1cc(cc211[nH]cc(c2)CCCC(=O)CC(=O)NC3CCCn4ccc(=O)n4C'
None
Ccc1ccc2ccnn))n3c2ccc1)c4ccccc4F)
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Ccc1ccc2ccnn))n3c2ccc1)c4ccccc4F)'
None
Cc1nc(c2c(nn))n3c2ccc3)S4ccs2)C(C)(C[NH+]CCC))C)))n
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1nc(c2c(nn))n3c2ccc3)S4ccs2)C(C)(C[NH+]CCC))C)))n'
None
CC(CCc1ccncc1)Cc2cccc3c2NN3CCNCC3=O
[22:25:29] SMILES Parse Error: unclosed ring for input: 'CC(CCc1ccncc1)Cc2cccc3c2NN3CCNCC3=O'
None
CC(=O)(=OOc1ccn1C(=C)C(=O)N)2cc(=O)(c2=O)())
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(=O)(=OOc1ccn1C(=C)C(=O)N)2cc(=O)(c2=O)())'
None
CNS(=O)(=O)c(ccn1C((C)C(=O)Nc2ccc(cc2Cl)F
[22:25:29] SMILES Parse Error: syntax error for input: 'CNS(=O)(=O)c(ccn1C((C)C(=O)Nc2ccc(cc2Cl)F'
None
CCCC)CCC(=O)(=O)c1cO)1C(CC)C(=O)Nc2ccccc2F)/s1
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCCC)CCC(=O)(=O)c1cO)1C(CC)C(=O)Nc2ccccc2F)/s1'
None
CC((Cc1c(=O)nnc(=O)C(=O)CC(C)))c2ccccc2C2=O
[22:25:29] SMILES Parse Error: syntax error for input: 'CC((Cc1c(=O)nnc(=O)C(=O)CC(C)))c2ccccc2C2=O'
None
c1cs=c1C(=O)N(C2(CCCC2))c3c[nH]c(=O)])
[22:25:29] SMILES Parse Error: syntax error for input: 'c1cs=c1C(=O)N(C2(CCCC2))c3c[nH]c(=O)])'
None
CS(=O)(=O)c1cc(1CCc=O)NCC2CCC(CC2)c3c[nH+]c(=+)n3
[22:25:29] SMILES Parse Error: syntax error for input: 'CS(=O)(=O)c1cc(1CCc=O)NCC2CCC(CC2)c3c[nH+]c(=+)n3'
None
CN(Cc1ccccc1)c2nc(nnn+]c2c(c2SCc4ccccc4
[22:25:29] SMILES Parse Error: syntax error for input: 'CN(Cc1ccccc1)c2nc(nnn+]c2c(c2SCc4ccccc4'
None
CS(=O)(=O)c1cc==O)2c(n3c2cccc1SCC(=O)NO)))c(cc)/(([O)on
[22:25:29] SMILES Parse Error: syntax error for input: 'CS(=O)(=O)c1cc==O)2c(n3c2cccc1SCC(=O)NO)))c(cc)/(([O)on'
None
CN(C)C(=O)CC(C(=O)[O-])C(=O)CSc1ccc[nH+]c1=N
[22:25:29] Can't kekulize mol.  Unkekulized atoms: 14 15 16 17 18

None
Cc1c=O(c=n1Cc2cccn2)c3c(n2)CC((((=C)C)])CC-/)
[22:25:29] SMILES Parse Error: syntax error for input: 'Cc1c=O(c=n1Cc2cccn2)c3c(n2)CC((((=C)C)])CC-/)'
None
CC(C)(CC))C(=O)NCc1cccnc1Cl)CCc(=O)(=))CC///n2
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)(CC))C(=O)NCc1cccnc1Cl)CCc(=O)(=))CC///n2'
None
CC(C)(CCN)C(=O)CSc1cc(=O)c(=CC))]))))))))S2/c2)=O)C))/s)
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)(CCN)C(=O)CSc1cc(=O)c(=CC))]))))))))S2/c2)=O)C))/s)'
None
CCN1CC11C(C2)C(=O)N2C2N(C2S((=O)NC3CC2)C(=O)OCC
[22:25:29] SMILES Parse Error: syntax error for input: 'CCN1CC11C(C2)C(=O)N2C2N(C2S((=O)NC3CC2)C(=O)OCC'
None
CCN=N)c1cc=1Cc==)(=O)C)(C))C)CC2CCCCCC2)))())
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCN=N)c1cc=1Cc==)(=O)C)(C))C)CC2CCCCCC2)))())'
None
CC(CCc2ccccc2)C[=O)c3c(=O)n(c)+n3)c3nc(nc))c1
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(CCc2ccccc2)C[=O)c3c(=O)n(c)+n3)c3nc(nc))c1'
None
CCS(=O)(=O)CS(=O)NC1CC)C(=O)Nc1ccc2c1cccc1F))n1
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCS(=O)(=O)CS(=O)NC1CC)C(=O)Nc1ccc2c1cccc1F))n1'
None
CC1CCc2c(s2c2nc(n(=O)n)NC(=O)c3cccc(c3)SS(=O)=OO)C1
[22:25:29] SMILES Parse Error: unclosed ring for input: 'CC1CCc2c(s2c2nc(n(=O)n)NC(=O)c3cccc(c3)SS(=O)=OO)C1'
None
C[NH+]1CCCC(=O)[O-])CCc1c(=O)oc2c2c2cccc2C(s=])CCCC2=O
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'C[NH+]1CCCC(=O)[O-])CCc1c(=O)oc2c2c2cccc2C(s=])CCCC2=O'
None
Cc1cn(c(=O)c1C)CC(=O)Nc2ccss[N]S(=O)N)C(O)C)3(CCCC(C)\C)//)))C
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1cn(c(=O)c1C)CC(=O)Nc2ccss[N]S(=O)N)C(O)C)3(CCCC(C)\C)//)))C'
None
C[NH]11CCCC(=O)[O-])NC(=O)CN1C)c2ccn12)C((O)/O))C)CCCC)2)C1
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'C[NH]11CCCC(=O)[O-])NC(=O)CN1C)c2ccn12)C((O)/O))C)CCCC)2)C1'
None
CCN1CC11(CCN)n(c(=NCc2ccccc2))4ccn23C4CCCC4=O2CCC
[22:25:29] SMILES Parse Error: unclosed ring for input: 'CCN1CC11(CCN)n(c(=NCc2ccccc2))4ccn23C4CCCC4=O2CCC'
None
CC(C=()Scc1n[nH]c2c(n1)CCCC(=O)CC(=O)N3CC[NH+](CC3)CCc4cccon)))
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(C=()Scc1n[nH]c2c(n1)CCCC(=O)CC(=O)N3CC[NH+](CC3)CCc4cccon)))'
None
CC1CCc2c(c3cccccn2)C[H+]ccc(1O((((=+]3)Cc4ccss4Cl)1
[22:25:29] SMILES Parse Error: syntax error for input: 'CC1CCc2c(c3cccccn2)C[H+]ccc(1O((((=+]3)Cc4ccss4Cl)1'
None
[22:25:29] SMILES Parse Error: syntax error for input: 'CCCCNC(=O)CCn1c(=O)ncc[n1)ccn2'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C))CCN1CCCC1)c2nc[nH+]nn2'
[22:25:29] SMILES Parse Error: extra open parentheses for input: 'CC(C)(C)C(CCC(=O)NCCc1cc[nH+]cn1'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)(C)N1CCCCC1)c2nc[nH+]nn2'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)CCNCCCC)CN=C)c1cc[nH+]cn1'
[22:25:29] SMILES Parse Error: syntax error for input: 'Cc1ccnnc1SCc2ccccc2Crc2NCCC))F)C'
[22:25:29] SMILES Parse Error: syntax error for input: 'CC((C)))C(=O)NCc1ccccc1)c(/c(=C)C(C(=])C'
[22:25:29] SMILES Parse Error: syntax error for input: 'Cc1ccnn1CCC(=O)N((C)(C)Ccc2[n(ccn2)CCC)(C)C'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'COc1ccc2c(c1)CC)Cc2cc((cn=O)C(=O)N3'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1ncsc(=O)n1C)CC(=O)Nc2ccss[N/C(=O)[C)[O-])C'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1c(c(=O)n2c(n1)C(=O)CC(C)C)C)C'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CN(C))(O)CNN(Cc1cccn1+c2cc[nH]c2c2N'
[22:25:29] SMILES Parse Error: syntax error for input: 'CC11ccc2n1c(ccnH=)Nc3cc((c(c3=O)C)NCC2OO'
[22:25:29] SMILES Parse Error: syntax error for input: 'CCNH+]1CCc2cnc2c1c(n(c3=))c4cc(cc(cc3C)C))CC1=O'
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(=O)Nc1cc(ccc1OC)CCc2c3c([nH+nn2)cccco3'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'COc1ccc2c(n1)C)C)ccnc3ccccc3)NCC=)))C(=O)CC'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCS(=O)(=O)c1ccc(c(c1)C)2CC=c3cc(ccc3)NC(C)C)C(=O)N2'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(=O)Nc1cccccc1OC)CCc2c3c([nH+]n23cccccc4'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1cccc2c11[nH]nc(c2))CCC(=O)N(C3)CCc3ccc(H+]c)C'
[22:25:29] SMILES Parse Error: extra open parentheses for input: 'Cc1cc(cc211[nH]cc(c2)CCCC(=O)CC(=O)NC3CCCn4ccc(=O)n4C'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Ccc1ccc2ccnn))n3c2ccc1)c4ccccc4F)'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1nc(c2c(nn))n3c2ccc3)S4ccs2)C(C)(C[NH+]CCC))C)))n'
[22:25:29] SMILES Parse Error: unclosed ring for input: 'CC(CCc1ccncc1)Cc2cccc3c2NN3CCNCC3=O'
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(=O)(=OOc1ccn1C(=C)C(=O)N)2cc(=O)(c2=O)())'
[22:25:29] SMILES Parse Error: syntax error for input: 'CNS(=O)(=O)c(ccn1C((C)C(=O)Nc2ccc(cc2Cl)F'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCCC)CCC(=O)(=O)c1cO)1C(CC)C(=O)Nc2ccccc2F)/s1'
[22:25:29] SMILES Parse Error: syntax error for input: 'CC((Cc1c(=O)nnc(=O)C(=O)CC(C)))c2ccccc2C2=O'
[22:25:29] SMILES Parse Error: syntax error for input: 'c1cs=c1C(=O)N(C2(CCCC2))c3c[nH]c(=O)])'
[22:25:29] SMILES Parse Error: syntax error for input: 'CS(=O)(=O)c1cc(1CCc=O)NCC2CCC(CC2)c3c[nH+]c(=+)n3'
[22:25:29] SMILES Parse Error: syntax error for input: 'CN(Cc1ccccc1)c2nc(nnn+]c2c(c2SCc4ccccc4'
[22:25:29] SMILES Parse Error: syntax error for input: 'CS(=O)(=O)c1cc==O)2c(n3c2cccc1SCC(=O)NO)))c(cc)/(([O)on'
[22:25:29] Can't kekulize mol.  Unkekulized atoms: 14 15 16 17 18

[22:25:29] SMILES Parse Error: syntax error for input: 'Cc1c=O(c=n1Cc2cccn2)c3c(n2)CC((((=C)C)])CC-/)'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)(CC))C(=O)NCc1cccnc1Cl)CCc(=O)(=))CC///n2'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CC(C)(CCN)C(=O)CSc1cc(=O)c(=CC))]))))))))S2/c2)=O)C))/s)'
[22:25:29] SMILES Parse Error: syntax error for input: 'CCN1CC11C(C2)C(=O)N2C2N(C2S((=O)NC3CC2)C(=O)OCC'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCN=N)c1cc=1Cc==)(=O)C)(C))C)CC2CCCCCC2)))())'
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(CCc2ccccc2)C[=O)c3c(=O)n(c)+n3)c3nc(nc))c1'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'CCS(=O)(=O)CS(=O)NC1CC)C(=O)Nc1ccc2c1cccc1F))n1'
[22:25:29] SMILES Parse Error: unclosed ring for input: 'CC1CCc2c(s2c2nc(n(=O)n)NC(=O)c3cccc(c3)SS(=O)=OO)C1'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'C[NH+]1CCCC(=O)[O-])CCc1c(=O)oc2c2c2cccc2C(s=])CCCC2=O'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'Cc1cn(c(=O)c1C)CC(=O)Nc2ccss[N]S(=O)N)C(O)C)3(CCCC(C)\C)//)))C'
[22:25:29] SMILES Parse Error: extra close parentheses for input: 'C[NH]11CCCC(=O)[O-])NC(=O)CN1C)c2ccn12)C((O)/O))C)CCCC)2)C1'
[22:25:29] SMILES Parse Error: unclosed ring for input: 'CCN1CC11(CCN)n(c(=NCc2ccccc2))4ccn23C4CCCC4=O2CCC'
[22:25:29] SMILES Parse Error: syntax error for input: 'CC(C=()Scc1n[nH]c2c(n1)CCCC(=O)CC(=O)N3CC[NH+](CC3)CCc4cccon)))'
[22:25:29] SMILES Parse Error: syntax error for input: 'CC1CCc2c(c3cccccn2)C[H+]ccc(1O((((=+]3)Cc4ccss4Cl)1'
Exception ignored in: <bound method BaseSession.__del__ of <tensorflow.python.client.session.Session object at 0x12a606128>>
Traceback (most recent call last):
  File "/Users/iwatobipen/.pyenv/versions/anaconda3-2.4.0/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 701, in __del__
TypeError: 'NoneType' object is not callable

Hmm... I could not get suitable structures from the molecule autoencoder.
Generating molecules this way is difficult because the input is represented as SMILES strings, so a small decoding error breaks the syntax. In this run only one of the 47 decoded strings was parsed into a molecule. Still, I think DeepChem and RDKit make a nice combination for chemoinformatics.
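
Most of the failures in the log above are unbalanced parentheses, brackets, or ring closures. As a rough illustration, the following pure-Python sketch pre-screens decoded strings for those three problems and reports the fraction that passes. It is only a syntactic check and not a replacement for Chem.MolFromSmiles; the helper names looks_balanced and pass_ratio are hypothetical, and two-digit %nn ring closures are not treated specially.

```python
def looks_balanced(smiles):
    """Cheap check of parentheses, square brackets, and ring-closure digits."""
    depth = 0             # current nesting depth of ( ... )
    in_bracket = False    # True while inside a [ ... ] atom
    open_rings = set()    # ring-closure digits seen an odd number of times
    for ch in smiles:
        if in_bracket:
            if ch == "]":
                in_bracket = False
            elif ch == "[":
                return False          # nested bracket atom
        elif ch == "[":
            in_bracket = True
        elif ch == "]":
            return False              # closing bracket without an opening one
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False          # extra close parenthesis
        elif ch.isdigit():
            open_rings ^= {ch}        # toggle: first use opens, second closes
    return depth == 0 and not in_bracket and not open_rings


def pass_ratio(smiles_list):
    """Fraction of strings that pass the pre-screen (0.0 for an empty list)."""
    if not smiles_list:
        return 0.0
    return sum(looks_balanced(s) for s in smiles_list) / len(smiles_list)


decoded = [
    "CC(CC)(C)CC(C)(C)CCNCCC(CCC)N",   # parsed by RDKit in the log above
    "CC(C))CCN1CCCC1)c2nc[nH+]nn2",    # extra close parentheses
    "CN(C)C(=O)CC(C(=O)[O-])C(=O)CSc1ccc[nH+]c1=N",  # balanced, but RDKit could not kekulize it
]
print(pass_ratio(decoded))
```

The third string shows the limit of this check: it is balanced, yet RDKit still rejected it, so the real validity ratio has to come from the toolkit.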

Beyond the Ro5!

Recently, I read an interesting article in J. Med. Chem.
http://pubs.acs.org/doi/10.1021/acs.jmedchem.7b00717

The authors analyzed their in-house compound collection and derived an easy-to-understand scoring function, AB-MPS.

AB-MPS is defined by the following equation.
AB-MPS = Abs( cLogP – 3 ) + NAR + NRB
where NAR is the number of aromatic rings and NRB is the number of rotatable bonds.
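
The equation is simple enough to write as a one-line function. This is a minimal sketch of the formula exactly as written in this post; the function name ab_mps and the input values are made up for illustration, and the three descriptors would have to come from whatever toolkit you use.

```python
def ab_mps(clogp, n_aromatic_rings, n_rotatable_bonds):
    """AB-MPS = |cLogP - 3| + NAR + NRB, as written in this post."""
    return abs(clogp - 3.0) + n_aromatic_rings + n_rotatable_bonds


# Made-up descriptor values, only to show how each term contributes.
print(ab_mps(5.0, 3, 8))   # |5 - 3| + 3 + 8
print(ab_mps(1.0, 2, 4))   # |1 - 3| + 2 + 4
```

The lipophilicity term is smallest at a value of 3, and every aromatic ring or rotatable bond adds one to the score.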
They found that the AB-MPS of beyond-Ro5 compounds correlates well with oral bioavailability (F) and several ADMET parameters.
It will not hold everywhere, but I think the score is a good indicator for medicinal chemists because it is easy to understand and is based on an in-house dataset (from the authors' company). We could build a more complex predictive model with machine learning, but such a model makes it hard to understand why particular compounds are good.
The in-house dataset is the key to its strength.
I am still thinking about how to collect in-house data and how to use such datasets more efficiently.

Camp! Asagiri jamboree auto camping ground

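I half-jokingly sent in a postcard thinking it would be nice to win, and happily I did, so I went camping for the first time. I did not own any gear, so I bought everything from scratch.
Tent, tarp, table, a few dishes, sleeping bags, and so on.

As it was our first camping trip I was worried about the weather, but it was a beautiful day with no wind and we had a great time. I got a terrible sunburn and the tip of my nose turned bright red... This was a beginner-oriented event by Swen and Coleman, so cookers, ingredients, and the like were provided, which made it an easy plan.

On site, we first pitched the tarp and the tent while staring at the instruction manuals...
After lunch and a break we played around, tried fire-starting, and enjoyed the outdoors.

From the evening we made pot-au-feu in a Dutch oven, grilled steak, and cooked rice.
Mt. Fuji was clearly and beautifully visible, which made for a lovely atmosphere.
The ingredients were provided on site this time, which made things very easy. I am truly grateful to all the staff.


At night all the participants gathered around the campfire, roasting marshmallows, drinking beer, and dancing. It felt a little removed from everyday life, and my whole family enjoyed it very much.

The night was colder than expected, and I regretted having underestimated the mountains... The next morning ingredients were provided again and we had hot sandwiches and corn soup. Since it was cold, the warm soup was really welcome.

The second day's event was a fire-starting contest: who can light firewood fastest using a magnesium fire starter. I had tried over and over the day before without getting a fire going at all, so on the day I was on cheering duty, lol. In the end my partner came through and took second place.

As a prize we received a Coleman tarp. (ง°`ロ°)ง Yes!!
Everyone had fun and so did I, so these days I find myself wanting to go again.

Before next time... I want a cooker. I need to think properly about protection against the cold. One lantern is not enough. The expenses just keep growing...
With thanks to the staff and everyone involved who planned and ran the event, my Sunday night came to an end.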

tensorboard embeddings + RDKit #RDKit

I mainly use Keras for deep learning, because Keras is easy for me to use and understand.
Keras has a callback for TensorBoard, but it is difficult to use TensorBoard embeddings through it.
As you may know, TensorBoard embeddings is a unique feature for visualizing features such as word vectors.
I want to use TensorBoard embeddings to visualize chemical space.
A basic introduction to embeddings is given at the following URL.
https://www.tensorflow.org/programmers_guide/embedding

I referred to the following URL and changed some lines.
https://github.com/normanheckscher/mnist-tensorboard-embeddings/blob/master/mnist_t-sne.py

The following code reads an SDF file and calculates fingerprints so that PCA or t-SNE can be run on them in TensorBoard.
The results can be viewed in TensorBoard. Fortunately, RDKit has the MolsToGridImage function, which is useful for making the sprite image for embeddings!!!

import numpy as np
import pandas as pd
import sys
import argparse
import os
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import DataStructs
from rdkit.Chem import Draw

import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector

FLAGS = None

def getFpArr( mols, nBits = 1024 ):
    fps = [ AllChem.GetMorganFingerprintAsBitVect( mol, 2, nBits=nBits ) for mol in mols ]
    X = []
    for fp in fps:
        arr = np.zeros( (1,) )
        DataStructs.ConvertToNumpyArray( fp, arr )
        X.append( arr )
    return np.array( X )

def getResponse( mols, prop="ACTIVITY" ):
    Y = []
    for mol in mols:
        act = mol.GetProp( prop )
        act = 9. - np.log10( float( act ) )
        if act >= 6:
            Y.append(np.asarray( [1,0] ))
        else:
            Y.append(np.asarray( [0,1] ))
    return np.asarray( Y )


def generate_embeddings():
    sdf = Chem.SDMolSupplier( FLAGS.sdf )
    X = getFpArr( [ mol for mol in sdf ]  )
    sess = tf.InteractiveSession()
    with tf.device( '/cpu:0' ):
        embedding = tf.Variable( tf.stack( X[:], axis=0 ), trainable=False, name='embedding' )
    tf.global_variables_initializer().run()
    saver = tf.train.Saver()
    writer = tf.summary.FileWriter( FLAGS.log_dir+'/projector', sess.graph )
    config = projector.ProjectorConfig()
    embed = config.embeddings.add()
    embed.tensor_name = 'embedding:0'
    embed.metadata_path = os.path.join( FLAGS.log_dir + '/projector/metadata.tsv' )
    embed.sprite.image_path = os.path.join( FLAGS.data_dir + '/mols.png' )
    embed.sprite.single_image_dim.extend( [100, 100] )
    projector.visualize_embeddings( writer, config )
    saver.save( sess, os.path.join(FLAGS.log_dir, 'projector/amodel.ckpt'), global_step=len(X) )
def generate_metadata_file():
    sdf = Chem.SDMolSupplier( FLAGS.sdf )
    Y = getResponse( [ mol for mol in sdf ])
    def save_metadata( file ):
        with open( file, 'w' ) as f:
            f.write('id\tactivity_class\n')
            for i in range( Y.shape[0] ):
                c = np.nonzero( Y[i] )[0][0]
                f.write( '{}\t{}\n'.format( i, c ))
    save_metadata( FLAGS.log_dir + '/projector/metadata.tsv' )

def main(_):
    if tf.gfile.Exists( FLAGS.log_dir+'/projector' ):
        tf.gfile.DeleteRecursively( FLAGS.log_dir+'/projector' )
        tf.gfile.MkDir( FLAGS.log_dir + '/projector' )
    tf.gfile.MakeDirs( FLAGS.log_dir + '/projector' )
    generate_metadata_file()
    generate_embeddings()

if __name__=='__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument( '--sdf', type=str )
    parser.add_argument( '--log_dir', type=str, default='/Users/iwatobipen/develop/py35env/testfolder/tensorflowtest/mollog' )
    parser.add_argument( '--data_dir', type=str, default='/Users/iwatobipen/develop/py35env/testfolder/tensorflowtest/mollog')

    FLAGS, unparsed = parser.parse_known_args()
    sdf = [ mol for mol in Chem.SDMolSupplier( FLAGS.sdf ) ]
    im = Draw.MolsToGridImage( sdf, molsPerRow=10, subImgSize=( 100, 100 ))
    im.save( os.path.join( FLAGS.data_dir + '/mols.png' ))
    tf.app.run( main=main, argv=[sys.argv[0]] + unparsed )

To run the code, type:

$ python tensormolembedding.py --sdf your.sdf

Then launch tensorboard and access localhost:6006.

$ tensorboard --logdir your_log_dir

Then I got the following image.
This image shows the PCA result, but the projector can also perform t-SNE analysis.

I pushed my code to my repo.
https://github.com/iwatobipen/deeplearning/tree/master/tensorflowembedding
Tensorflow has cool functions ;-).
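
The labeling done by getResponse and generate_metadata_file above can be sketched without RDKit or NumPy. This is a minimal sketch, assuming the ACTIVITY property is a concentration in nM (the post does not state the unit), in which case 9 - log10(value) is a pIC50-like value. The helper names to_pactivity, activity_class, and metadata_lines are hypothetical.

```python
import math


def to_pactivity(value_nm):
    """Convert a concentration in nM (assumed unit) to a negative log molar value."""
    return 9.0 - math.log10(value_nm)


def activity_class(value_nm, threshold=6.0):
    """Return 0 when the converted value is >= threshold, otherwise 1.

    This mirrors the script above, where [1, 0] is stored for act >= 6
    and the index of the non-zero entry is written as the class.
    """
    return 0 if to_pactivity(value_nm) >= threshold else 1


def metadata_lines(values_nm):
    """Build the lines of a metadata.tsv file: a header plus one row per molecule."""
    lines = ["id\tactivity_class"]
    for i, value in enumerate(values_nm):
        lines.append("{}\t{}".format(i, activity_class(value)))
    return lines


# Made-up activity values in nM, only to show the output format.
for line in metadata_lines([100.0, 10000.0]):
    print(line)
```

The row order in the metadata file has to match the order of the rows in the embedding tensor, which is why both functions in the script read the same SDF in the same order.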

Handle pymol via CUI.

I often use PyMOL to visualize PDB files.
Recently I wanted to merge several PDB files into one PyMOL session file from the command line,
because I run the task as a batch job. So I searched the API documentation and tried it.
First I need to launch PyMOL in silent mode (no GUI).
Then I load the PDB files.
Next I color each object by B-factor as a spectrum.
Finally I save the objects as a PyMOL session file and close PyMOL.
Everything worked well.

The following code and sample files are pushed to my repo.
https://github.com/iwatobipen/pymolscript

#test

import pymol
from pymol import cmd
pymol.finish_launching(['pymol','-qc'])
cmd.load('1atp.pdb')
cmd.load('1atp2.pdb')
cmd.load('1atp3.pdb')
cmd.spectrum('b', 'blue_white_red','1atp', 0, 100)
cmd.spectrum('b', 'yellow_cyan_blue','1atp2', 0, 100)
cmd.spectrum('b', 'green_magenta','1atp3', 0, 100)
cmd.save('somecolors.pse')
# Close PyMOL (calling finish_launching() again would not do this).
cmd.quit()
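
For larger batches, one option is to generate a PyMOL command script instead of hard-coding each file. The following is a minimal sketch under that assumption: build_pml is a hypothetical helper that only produces text, and the file names and palettes are the ones used above. It relies on PyMOL naming each loaded object after the file name without its extension, and it does not quote paths that contain spaces.

```python
import os


def build_pml(jobs, session_file):
    """Return the text of a PyMOL script for a list of (pdb_path, palette) pairs."""
    lines = []
    for pdb_path, palette in jobs:
        # PyMOL names the loaded object after the file name without extension.
        name = os.path.splitext(os.path.basename(pdb_path))[0]
        lines.append("load {}".format(pdb_path))
        # Color the object by B-factor over the range 0 to 100.
        lines.append("spectrum b, {}, {}, 0, 100".format(palette, name))
    lines.append("save {}".format(session_file))
    lines.append("quit")
    return "\n".join(lines) + "\n"


jobs = [
    ("1atp.pdb", "blue_white_red"),
    ("1atp2.pdb", "yellow_cyan_blue"),
    ("1atp3.pdb", "green_magenta"),
]
print(build_pml(jobs, "somecolors.pse"))
```

Writing the returned text to a file such as merge.pml (a hypothetical name) and running pymol -qc merge.pml should then perform the same load, color, save, and quit steps without a GUI.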

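LEGO & Scratch

Timed with the children's summer holiday, there was a local hands-on making event. You can try all sorts of things, such as being a baker or working at a bank. One of the sessions was about learning how machines work through programming, so I applied and was selected.
So I let my child try it.
This course used LEGO WeDo bricks. They come with sensors and a motor, and you can control all kinds of motion by coding on a PC. The language is Scratch, so it is easy even for children to understand.
http://www.rika.com/lego/wedo2_1
This time the task was to build a spinning top and a spinner to launch it, and to make them move.
First, build the top and the spinner from LEGO.
The spinner is taking shape ↓

After building the top, we write the code on the PC. It is called code, but it is a matter of combining blocks, so it is intuitive.
This time the code is:
run the motor => play a sound => when the spinner is lifted away from the top => stop the motor.

↓ Finished and about to run it

It is great to be able to experience writing code, building something, and seeing it move.
There was nothing like this when I was a child. I did electronics projects with a soldering iron, and I was happy when the bath water-level sensor I built went off. It really is moving when something you made works.
LEGO is deep.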