Integrating RDKit and KNIME using the Python scripting node

KNIME is a tool for building workflows.
Older versions of KNIME can't call RDKit directly from the Python scripting node.
Newer versions of KNIME and the Python scripting node can, which means users can build more flexible workflows. 😉
I set up my environment and tested it.
First, I installed new versions of KNIME and the Python scripting node, and installed RDKit using Anaconda.
Then I set the KNIME preferences as shown in the following picture.
The path to conda's Python is …/Users/{username}/.pyenv/versions/anaconda-2.4.0/bin/python2.7.
Screen Shot 2015-12-29 at 9.31.06 AM
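
Before pointing KNIME at an interpreter, it can help to confirm that the interpreter actually sees RDKit and pandas. The following is a minimal sketch, not part of KNIME or RDKit: `check_modules` is a hypothetical helper name, and the idea is to run the script with the same Python binary that is set in the preferences.

```python
from __future__ import print_function  # keeps print() consistent on Python 2.7

import importlib
import sys


def check_modules(names):
    """Return {module name: True/False} depending on whether it can be imported."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    # The interpreter path printed here should match the one set in KNIME.
    print("Interpreter:", sys.executable)
    for name, ok in sorted(check_modules(["rdkit", "pandas"]).items()):
        print(name, "OK" if ok else "MISSING")
```

If either module is reported as missing, the Python scripting node will most likely fail on the same import.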

Next, I made a sample workflow.
Screen Shot 2015-12-29 at 9.26.20 AM

This workflow retrieves data for user-defined ChEMBL IDs from CHEMBLDB and generates matched molecular pairs (MMP) using the ErlWood Chemoinformatics node.
At the same time, the chemical sketcher node provides the user's query.
This node can handle multiple molecules!
Screen Shot 2015-12-29 at 9.26.05 AM
Then the Python script node (2:1) takes the user's query molecules and the transformation rules in rxn format, and converts the molecules based on the MMPs.
The Python snippet is as follows.

# Write in python scripting node
from rdkit import Chem
from rdkit.Chem import AllChem
import pandas as pd

# input_table_1 and input_table_2 are provided by the KNIME node.
mols = input_table_1['Molecule (RDKit Mol)']
rxns = input_table_2['Transformation']

# Running every rxn would take a long time, so use only the first 100.
# Each reaction is prepared once here instead of once per molecule.
reactions = []
for rxn_block in list(rxns)[:100]:
    rxn = AllChem.ReactionFromRxnBlock(str(rxn_block))
    # Append atom map number 1 to each "*" and rebuild the reaction.
    rxnsmi = AllChem.ReactionToSmiles(rxn).replace("*", "*:1")
    reactions.append(AllChem.ReactionFromSmarts(rxnsmi))

products = set()  # a set removes duplicate SMILES

for mol in mols:
    for reaction in reactions:
        for product_set in reaction.RunReactants([mol]):
            for p in product_set:
                try:
                    Chem.SanitizeMol(p)
                except Exception:
                    continue  # skip products that fail sanitization
                products.add(Chem.MolToSmiles(p, isomericSmiles=True))

output_table = pd.DataFrame(list(products), columns=['smiles'])

This node handles data as pandas DataFrames.
Finally, the script outputs a DataFrame of SMILES strings.
Now I have the transformed molecules.
Screen Shot 2015-12-29 at 9.25.42 AM
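
To try the transformation step outside KNIME, the same idea can be sketched as a standalone script. This assumes RDKit is installed. The function name `apply_transformations`, the fluorine-to-chlorine reaction SMARTS, and the two input molecules are made-up examples for illustration, not the rules generated by the workflow.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def apply_transformations(mols, reactions):
    """Run every reaction on every molecule and return the unique product SMILES."""
    products = set()
    for mol in mols:
        for reaction in reactions:
            # RunReactants returns one tuple of product molecules per match.
            for product_set in reaction.RunReactants((mol,)):
                for product in product_set:
                    try:
                        Chem.SanitizeMol(product)
                    except Exception:
                        continue  # skip products that fail sanitization
                    products.add(Chem.MolToSmiles(product, isomericSmiles=True))
    return products


# Illustrative transformation (an assumption): swap an aromatic fluorine for chlorine.
reactions = [AllChem.ReactionFromSmarts("[c:1][F]>>[c:1][Cl]")]
mols = [Chem.MolFromSmiles(smi) for smi in ("Fc1ccccc1", "CCO")]

smiles = apply_transformations(mols, reactions)
print(sorted(smiles))
```

Here only fluorobenzene matches the rule, so the result holds a single product (chlorobenzene), and ethanol contributes nothing. Inside KNIME, the molecules and reactions would come from `input_table_1` and `input_table_2` instead of the hard-coded lists.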

There are some bugs in this node, but the combination of KNIME and RDKit will be a powerful tool for chemoinformatics.
