Develop a new KNIME node with Python #chemoinformatics #knime

Golden Week is a collection of four national holidays within seven days. My kid graduated from elementary school and from his favorite dodgeball team, and he joined a volleyball team. I quit coaching the dodgeball team at the same time, so I have time for coding and drinking beer again ;)

I watched the KNIME Summit talk about the new version of KNIME. It's really cool. I'm interested in node development with Python. With previous versions you could already develop your own nodes, but that required Java. Now we can develop KNIME nodes with Python instead of Java.

This feature is supported from KNIME version 4.6. The details are described in the official KNIME blog post:
https://www.knime.com/blog/4-steps-for-your-python-team-to-develop-knime-nodes

I read the blog post and tried to make my own KNIME chemoinformatics node today. I made a node that standardizes molecules with chembl_structure_pipeline. This library is really useful for normalizing molecules.

The following section shows my log.

First, I got the template code (basic.zip) from here.

The structure of the extracted zip file is below.

(base) iwatobipen@penguin:~/dev/knime_dev/basic$ tree
.
├── config.yml
├── Example_with_Python_node.knwf
├── my_conda_env.yml
├── README.md
└── tutorial_extension
    ├── icon.png
    ├── knime.yml
    ├── LICENSE.TXT
    └── my_extension.py

I modified config.yml and my_conda_env.yml as below.

#config.yml
org.tutorial.first_extension: # {group_id}.{name} from the knime.yml
  src: /home/iwatobipen/dev/knime_dev/basic/tutorial_extension # Path to folder containing the extension files
  conda_env_path: /home/iwatobipen/miniconda3/envs/my_python_env # Path to the Python environment to use
  debug_mode: true # Optional line, if set to true, it will always use the latest changes of execute/configure, when that method is used within the KNIME Analytics Platform

#my_conda_env.yml
name: my_python_env
channels:
  - knime
  - conda-forge
dependencies:
  - python=3.9
  - knime-extension=4.7
  - knime-python-base=4.7
  - rdkit
  - chembl_structure_pipeline

How to define config.yml is well documented in the KNIME blog post.
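Because the top-level key has to be {group_id}.{name} from knime.yml, it can help to generate config.yml instead of typing it. Below is a minimal sketch using only the standard library. The function name make_config_yml is hypothetical (it is not part of any KNIME tooling), the four keys are just the ones used in my config.yml above, and the usage assumes group_id is org.tutorial and name is first_extension in the template's knime.yml. Paths containing spaces or YAML special characters would need quoting, which this sketch does not do.

```python
def make_config_yml(group_id: str, name: str, src: str, conda_env_path: str,
                    debug_mode: bool = True) -> str:
    """Build the text of a KNIME Python extension config.yml (hypothetical helper)."""
    # The top-level key is "{group_id}.{name}", taken from knime.yml.
    key = f"{group_id}.{name}"
    lines = [
        f"{key}:",
        f"  src: {src}",
        f"  conda_env_path: {conda_env_path}",
        # YAML booleans are written in lowercase.
        f"  debug_mode: {str(debug_mode).lower()}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed split of the key: group_id="org.tutorial", name="first_extension".
    print(make_config_yml(
        "org.tutorial", "first_extension",
        "/home/iwatobipen/dev/knime_dev/basic/tutorial_extension",
        "/home/iwatobipen/miniconda3/envs/my_python_env",
    ), end="")
```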

After defining my_conda_env.yml, I created the conda environment from the yml file.

$ conda env create -f my_conda_env.yml
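Before pointing KNIME at the new environment, a quick sanity check is to confirm that the key packages can be imported with that environment's Python. This is a sketch built on importlib.util.find_spec; the helper name missing_modules and the file name check_env.py are hypothetical, and rdkit, chembl_structure_pipeline and knime.extension are the import names I assume for the packages listed in my_conda_env.yml. It could be run with `conda run -n my_python_env python check_env.py`.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found in this interpreter."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # find_spec imports the parent package of a dotted name,
            # so a missing parent raises instead of returning None.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names for the packages in my_conda_env.yml.
    required = ["rdkit", "chembl_structure_pipeline", "knime.extension"]
    missing = missing_modules(required)
    print("missing:", missing if missing else "none")
```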

After creating the env, I wrote the code for the KNIME node. My node takes SMILES strings as input, standardizes the molecules, and generates a molecular hash (MolHash) as output.

The code is below. Decorators define the node's input and output ports. The following code defines one input port and one output port. Additional ports can be added with further @knext.input_table and @knext.output_table decorators (https://knime-python.readthedocs.io/en/stable/content/content.html#python-script-api).

import logging

import knime.extension as knext
from rdkit import Chem
from rdkit.Chem import rdMolHash
from chembl_structure_pipeline import standardize_mol
from chembl_structure_pipeline import get_parent_mol

LOGGER = logging.getLogger(__name__)


# icon.png is the icon file shipped in tutorial_extension (see the tree above).
@knext.node(name="chembl structure pipeline", node_type=knext.NodeType.MANIPULATOR, icon_path="icon.png", category="/")
@knext.input_table(name="SMILES column", description="Table with a column of SMILES strings")
@knext.output_table(name="Output Data", description="Input table plus the RDKit mol standardized with chembl structure pipeline and its MolHash")
class TemplateNode:
    """Standardize molecules with chembl structure pipeline.

    This is a sample node implemented with chembl structure pipeline.
    The input data should be SMILES.
    """

    # Column holding the SMILES strings, selectable in the node dialog.
    column_param = knext.ColumnParameter(label="SMILES column", description="Column containing SMILES strings", port_index=0)

    def std_mol(self, smiles):
        # Missing cells and invalid SMILES both become None.
        if not isinstance(smiles, str):
            return None
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            return None
        # Standardize, then keep the parent (salts and solvents removed).
        stdmol = standardize_mol(mol)
        pm, _ = get_parent_mol(stdmol)
        Chem.Kekulize(pm)
        return pm

    def get_mol_hash(self, rdmol):
        # Pass None through instead of letting MolHash fail on it.
        if rdmol is None:
            return None
        return rdMolHash.MolHash(rdmol, rdMolHash.HashFunction.HetAtomTautomer)

    def configure(self, configure_context, input_schema_1):
        # Declare the two columns that execute() appends.
        return input_schema_1.append(knext.Column(Chem.rdchem.Mol, "STD_ROMol")).append(knext.Column(knext.string(), "MolHash"))

    def execute(self, exec_context, input_1):
        input_1_pandas = input_1.to_pandas()
        # Read SMILES from the column chosen in the dialog instead of a hard-coded name.
        input_1_pandas["STD_ROMol"] = input_1_pandas[self.column_param].apply(self.std_mol)
        input_1_pandas["MolHash"] = input_1_pandas["STD_ROMol"].apply(self.get_mol_hash)
        return knext.Table.from_pandas(input_1_pandas)
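std_mol returns None for SMILES that RDKit cannot parse, so each later step has to let None pass through instead of failing on it. The pattern can be sketched without RDKit. chain_or_none is a hypothetical helper, and the lambdas in the test stand in for parsing, standardization and hashing; in the node it could be used as input_1_pandas[col].apply(chain_or_none(parse, standardize, molhash)).

```python
from typing import Any, Callable, Optional


def chain_or_none(*steps: Callable[[Any], Optional[Any]]) -> Callable[[Any], Optional[Any]]:
    """Compose steps left to right, returning None as soon as any value is None."""
    def run(value):
        for step in steps:
            # Stop early so later steps never receive None.
            if value is None:
                return None
            value = step(value)
        return value
    return run
```

Keeping the None check in one place means the individual steps can stay simple, at the cost of not recording why a row failed.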


After writing the code, I added one line, “-Dknime.python.extension.config=/home/iwatobipen/dev/knime_dev/basic/config.yml”, to knime.ini, which is located in the KNIME installation folder.
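Editing knime.ini by hand works fine, but if you reinstall KNIME often, a small script can add the line only when it is missing. A minimal sketch, assuming knime.ini follows the Eclipse launcher convention that JVM -D options come after the -vmargs line, so appending at the end of a file that contains -vmargs is enough. The function name add_ini_option is hypothetical; back up knime.ini before trying it. Usage would look like add_ini_option("/path/to/knime/knime.ini", "-Dknime.python.extension.config=/home/iwatobipen/dev/knime_dev/basic/config.yml"), where /path/to/knime is a placeholder for your install folder.

```python
from pathlib import Path


def add_ini_option(ini_path, option):
    """Append `option` to an Eclipse-style .ini file unless it is already there.

    Returns True if the file was changed, False if the option was present.
    JVM -D options must follow the -vmargs line, so this sketch refuses to
    append when -vmargs is missing.
    """
    path = Path(ini_path)
    text = path.read_text()
    stripped = [line.strip() for line in text.splitlines()]
    if option in stripped:
        return False
    if "-vmargs" not in stripped:
        raise ValueError(f"{path} has no -vmargs line; not appending {option!r}")
    # Start the new option on its own line even if the file lacks a final newline.
    separator = "" if text.endswith("\n") else "\n"
    path.write_text(text + separator + option + "\n")
    return True
```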

Then, after launching KNIME, I could see my own node ;)

I made a simple workflow with the node. "chembl structure pipeline" is the node I developed ;)

I entered some SMILES with the Table Creator node.

When I ran the node, I got standardized molecules as the output.

The workflow not only standardizes molecules but also generates a MolHash, so the output looks like below. The Count column is the number of rows sharing each MolHash. It shows a count of 2 for 2-hydroxypyridine and 2-pyridone. Of course, they are tautomers.
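The counting step can be reproduced outside KNIME to double-check the grouping. A minimal sketch with collections.Counter; the function name count_by_hash is hypothetical, None stands for rows where SMILES parsing failed, and the hash strings in the usage example are placeholders, not real RDKit MolHash output.

```python
from collections import Counter


def count_by_hash(hashes):
    """Count how many rows share each MolHash, skipping missing values (None)."""
    return Counter(h for h in hashes if h is not None)


if __name__ == "__main__":
    # Placeholder hashes: the first two rows stand for two tautomers with one hash.
    rows = ["HASH_A", "HASH_A", "HASH_B", None]
    for molhash, count in count_by_hash(rows).most_common():
        print(molhash, count)
```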

It was interesting for me to make a new node with Python. I think it's useful not only for coders but also for non-coders.

Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinfo, coding, organic synthesis, and my family.
