Call Knime from Jupyter notebook! #Chemoinformatics #RDKit #Knime

I read exiting blog post yesterday! URL is below.
https://www.knime.com/knime-blog-general
@dr_greg_landrum developed very cool tools which can call knime from jupyter notebook and can execute jupyter notebool from knime.

Details of the tool is described in the Knime blog post. I am interested the tool and I can’t wait to try it in myself. So I used it from my mac book pro. At first I installed python knime package via pypi.

iwatobipen$ pip install knime

Now ready. I tried to make sample work flow. My workflow receives SMILES strings from jupyter and calculates RDKit descriptors, normalize then and return the result to notebook.

After that, I build regression model for solubility and apply it to test data. Dataset is supplied from rdkit Book/data folder.

OK, let’s go to the code. Following code is referenced Gregs blog post URL is above. First import packages.

%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib import style
import os
import pandas as pd
import numpy as np
from rdkit import Chem
from rdkit.Chem import RDConfig
from rdkit.Chem import rdMolDescriptors
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score
import knime
style.use('ggplot')

Next, prepare dataset for training and test. Type of data which passes Knime is pandas dataframe.

train_path = os.path.join(RDConfig.RDDocsDir, 'Book/data/solubility.train.sdf')
test_path = os.path.join(RDConfig.RDDocsDir, 'Book/data/solubility.test.sdf')

train_mols = [m for m in Chem.SDMolSupplier(train_path)]
train_y = np.asarray([m.GetProp('SOL') for m in train_mols], dtype=np.float32)
train_table = {'smiles':[Chem.MolToSmiles(m) for m in train_mols]}
train_df = pd.DataFrame(train_table)

test_mols =  [m for m in Chem.SDMolSupplier(test_path)]
test_y = np.asanyarray([m.GetProp('SOL') for m in test_mols], dtype=np.float32)
test_table = {'smiles':[Chem.MolToSmiles(m) for m in test_mols]}
test_df = pd.DataFrame(test_table)

Next, define Knime executable path and workspace path. My env is Mac so it is a little bit different to original blog post.

#My Knime env uploaded to 3.7 from 3.6.
knime.executable_path = '/Applications/KNIME 3.6.1.app/Contents/MacOS/Knime'
workspace = '/Users/iwatobipen/knime-workspace/'

Then check workflow. I made following workflow in advance. And I could see image of the WF on notebook. ;)

workflow = 'jupyter_integration'
knime.Workflow(workflow_path=workflow, workspace_path=workspace)

Now ready, let’s run the WF for descriptor calculation!

# training data
with knime.Workflow(workflow_path=workflow, workspace_path=workspace) as wf:
    wf.data_table_inputs[0] = train_df
    wf.execute()
train_x = wf.data_table_outputs[0]

# test data
with knime.Workflow(workflow_path=workflow, workspace_path=workspace) as wf:
    wf.data_table_inputs[0] = test_df
    wf.execute()
test_x = wf.data_table_outputs[0]

I could get dataset for build regression model and test. So I fit SVR of sklearn and test the model performance.

svr = SVR()
svr.gamma = 'auto'
svr.fit(train_x, train_y)

pred = svr.predict(test_x)
print(r2_score(test_y, pred))
print(mean_squared_error(test_y, pred))
>0.703289693583441
>1.2055499665537373

It seems not so bat. Check the performance with matplotlib.

a, b = min(test_y), max(test_y)
data = np.linspace(a,b, num=100)
plt.scatter(pred, test_y, c='b', alpha=0.5)
plt.plot(data, data, c='r')
plt.xlabel('pred')
plt.ylabel('actual')

The model can predict solubility of test molecules with high accuracy. In summary, integration Knime and Jupyter notebook has high potential for chemoinformatics I think. Because jupyter has flexibility and knime is powerful tool for routine work.
Whole code can view from following URL.
https://nbviewer.jupyter.org/github/iwatobipen/playground/blob/master/Knime_test.ipynb

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.