Integrating RDKit and KNIME using the Python scripting node

KNIME is a tool for building workflows.
Older versions of KNIME can't call RDKit directly from the Python scripting node.
Newer versions of KNIME and the Python scripting node can, which means users can build more flexible workflows. 😉
I set up my environment and tested it.
First, I installed a new version of KNIME and the Python scripting node, and installed RDKit using Anaconda.
Then I set the KNIME preferences as shown in the following picture.
The path to conda's Python is …/Users/{username}/.pyenv/versions/anaconda-2.4.0/bin/python2.7.
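Before pointing KNIME at an interpreter, it can help to confirm that the interpreter really sees RDKit and pandas. The following is a minimal sketch, not part of my original setup: `check_modules` is a hypothetical helper name, and it assumes Python 3.4 or later (`importlib.util.find_spec` is not available on the python2.7 shown above). Run it with the same Python binary that is set in the KNIME preferences.

```python
import importlib.util
import sys


def check_modules(names):
    """Map each module name to True if this interpreter can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Show which interpreter is running and whether the needed packages are visible.
print("Interpreter:", sys.executable)
print(check_modules(["rdkit", "pandas"]))
```

If `rdkit` is reported as `False`, KNIME is probably pointing at a different Python than the conda one.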
Screen Shot 2015-12-29 at 9.31.06 AM

Next, I made a sample workflow.
Screen Shot 2015-12-29 at 9.26.20 AM

This workflow retrieves data for user-defined ChemblIDs from CHEMBLDB and generates matched pairs using the ErlWood Chemoinformatics node.
At the same time, the Chemical sketcher node provides the user's query.
This node can handle multiple molecules!
Screen Shot 2015-12-29 at 9.26.05 AM
Then the Python script (2:1) node takes the user's query molecules and the transformation rules in rxn format, and transforms the molecules according to the matched molecular pair (MMP) rules.
The Python snippet is as follows.

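# Write this in the Python scripting node (two input tables, one output table)
from rdkit import Chem
from rdkit.Chem import AllChem
import pandas as pd

mols = input_table_1['Molecule (RDKit Mol)']
rxn_blocks = input_table_2['Transformation']

# Running every transformation takes a long time, so use only the first 100.
# Each rxn block is converted to reaction SMILES, ":1" is appended to every "*",
# and the reaction is rebuilt from that string. This is done once, outside the
# molecule loop, instead of once per molecule.
reactions = []
for rxn_block in list(rxn_blocks)[:100]:
    rxn = AllChem.ReactionFromRxnBlock(str(rxn_block))
    rxnsmi = AllChem.ReactionToSmiles(rxn).replace("*", "*:1")
    reactions.append(AllChem.ReactionFromSmarts(rxnsmi))

# A set removes duplicate product SMILES.
products = set()

for mol in mols:
    for reaction in reactions:
        # RunReactants returns one tuple of product molecules per match.
        for product_set in reaction.RunReactants([mol]):
            for p in product_set:
                try:
                    Chem.SanitizeMol(p)
                except Exception:
                    # As before, the product is kept even if sanitization fails.
                    pass
                products.add(Chem.MolToSmiles(p, isomericSmiles=True))

# The node returns whatever DataFrame is assigned to output_table.
output_table = pd.DataFrame(list(products), columns=['smiles'])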
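To see what the inner loop does for a single molecule and a single rule, here is a minimal sketch that runs outside KNIME. The helper name `apply_transformation` and the fluorine-to-chlorine reaction SMARTS are made up for illustration and are not rules from the workflow. Unlike the snippet above, this sketch skips products that fail sanitization instead of keeping them.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def apply_transformation(smiles, reaction_smarts):
    """Return the set of canonical product SMILES for one molecule and one reaction."""
    mol = Chem.MolFromSmiles(smiles)
    reaction = AllChem.ReactionFromSmarts(reaction_smarts)
    products = set()
    # RunReactants yields one tuple of product molecules per substructure match.
    for product_set in reaction.RunReactants((mol,)):
        for product in product_set:
            try:
                Chem.SanitizeMol(product)
            except Exception:
                continue  # skip products RDKit cannot sanitize
            products.add(Chem.MolToSmiles(product, isomericSmiles=True))
    return products


# Hypothetical rule: replace an aromatic fluorine with chlorine.
result = apply_transformation("Fc1ccccc1", "[c:1][F]>>[c:1][Cl]")
print(result)
```

Collecting SMILES in a set is what removes duplicates when several matches or several rules lead to the same product.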

This node handles data as pandas DataFrames.
Finally, it outputs the SMILES strings as a DataFrame.
Now I have the transformed molecules.
Screen Shot 2015-12-29 at 9.25.42 AM
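The DataFrame-in, DataFrame-out pattern can be tried without KNIME. The sketch below builds a stand-in for the input table by hand; the variable name `input_table` and the example SMILES are assumptions for illustration only. It sorts the unique strings so that the row order is reproducible, which a plain set does not guarantee.

```python
import pandas as pd

# Stand-in for a table KNIME would pass in; the contents are made up.
input_table = pd.DataFrame({"smiles": ["CCO", "c1ccccc1", "CCO"]})

# De-duplicate, then sort so the output row order is the same on every run.
unique_smiles = sorted(set(input_table["smiles"]))

# In the scripting node, the DataFrame assigned to output_table is the result.
output_table = pd.DataFrame(unique_smiles, columns=["smiles"])
print(output_table)
```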

There are some bugs in this node, but the combination of KNIME and RDKit will be a powerful tool for chemoinformatics.
