Making an interactive plot with KNIME #RDKit #Chemoinformatics #Knime

Daria Goldmann gave a very cool presentation about KNIME at the RDKit UGM 2018.
https://github.com/rdkit/UGM_2018/blob/master/Presentations/Goldmann_KNIMEandRDKit.pdf

She demonstrated interactive analysis with the RDKit KNIME nodes and JavaScript nodes. I was really interested, but at that time it was difficult for me to build the workflow by myself.

BTW, I need to learn KNIME for data preparation this week. So I learned about KNIME and how to make an interactive plot with it.

The following sample is very simple but shows the power of KNIME.
The example makes an interactive chemical space plot with KNIME. The whole workflow is shown below. The KNIME version is 3.7.

First, load an SDF file, calculate descriptors and fingerprints with the RDKit nodes, and split the fingerprint into bit columns with the FingerprintExpander node. Then run PCA on the calculated fingerprints.
Then convert the molecules to SVG with the Open Babel node for visualization.

The key point of this flow is the wrapped metanode!
This node is built from two JS nodes, ‘Scatter plot’ and ‘Card view’.

After making the metanode, I defined the visual layout. The layout settings can be opened from the button at the right of the menu bar.

I set the card view option to show selected rows only, and also set several scatter plot options.

Now it is ready. Run the workflow! I can view the compounds that I select in the chemical space plot.
The image is below.

The new version of KNIME is a powerful tool not only for data analysis but also for data visualization. ;-)


Visualize chemical space using KNIME RDKit nodes

I usually use Python to analyze and visualize chemical space, because I love coding. ;-)
But I know that a workflow tool is also a useful way to do that.

So I tried to plot chemical space using KNIME. KNIME is one of the well-known workflow tools, and lots of nodes have been developed for it.

I made a very simple workflow to do PCA. My workflow is as follows.
[Image: workflow]

First, the flow reads SMILES strings from an Excel file and converts the SMILES to RDKit molecules.
Then it calculates Morgan fingerprints using the RDKit Fingerprint node. The node can also calculate various other fingerprints such as MACCS, topological, etc.
Next, the bit vector is expanded into 1024 bit columns.
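For readers who think in code, the expansion step can be sketched in plain Python. This is only an illustration of the idea (one 0/1 column per bit position), not what the KNIME node does internally. The function name `expand_bits` and the toy on-bit sets are made up for this example; real fingerprints would come from RDKit.

```python
def expand_bits(on_bits, n_bits=1024):
    """Turn a set of on-bit positions into a list of n_bits 0/1 values."""
    row = [0] * n_bits
    for position in on_bits:
        if not 0 <= position < n_bits:
            raise ValueError(f"bit position {position} is outside 0..{n_bits - 1}")
        row[position] = 1
    return row


# Toy fingerprints (hypothetical on-bit positions, not real Morgan bits).
fingerprints = {
    "mol_a": {1, 5, 1023},
    "mol_b": {0, 5},
}

# One row per molecule, one column per bit: the table shape that PCA expects.
table = {name: expand_bits(bits) for name, bits in fingerprints.items()}
```

Each row then has exactly 1024 numeric columns, which is why the PCA node can consume it directly.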
Then the flow runs PCA and makes a scatter plot. The plotting node comes from the Erlwood chemoinformatics nodes.
When I opened the scatter plot view, I got the following dynamic scatter plot.
[Image: scatter plot]
The node makes it easy to choose which columns to plot, and the user can set color or size by their own criteria. It can also show structures as labels. Wow, cool!

I also set up the activity cliff viewer.
The node needs two inputs: one is the SMILES and the other is a distance matrix based on similarity.
The N x N distance matrix is generated using the distance matrix calculate node.
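To make the distance matrix idea concrete, here is a minimal sketch in plain Python. I am assuming the distance is 1 minus Tanimoto similarity on fingerprint on-bits, which is a common choice, but the post does not record which distance the node was configured with. The function names and the toy fingerprints are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two sets of on-bit positions."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return len(bits_a & bits_b) / union


def distance_matrix(fingerprints):
    """Symmetric N x N matrix of 1 - Tanimoto, with zeros on the diagonal."""
    n = len(fingerprints)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - tanimoto(fingerprints[i], fingerprints[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Toy fingerprints as sets of on-bit positions (not real molecules).
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8}]
dist = distance_matrix(fps)
```

The first two toy fingerprints share three of five distinct bits, so they end up close together, while the third shares nothing with them and sits at the maximum distance.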
Finally I ran the flow and got a network view of the activity cliffs.
[Image: network view of activity cliffs]
Edges colored green indicate activity cliffs (in my case, delta pIC50 >= 1.0 and similarity >= 0.5).
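The cliff criterion itself is easy to write down in code. The sketch below uses the two thresholds from this post (delta pIC50 >= 1.0 and similarity >= 0.5) and assumes Tanimoto similarity on fingerprint on-bits. It is not the viewer node's actual implementation, and the compound names, fingerprints and pIC50 values are invented for illustration.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two sets of on-bit positions."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def find_activity_cliffs(compounds, min_similarity=0.5, min_delta=1.0):
    """Return pairs that are structurally similar but differ strongly in activity.

    compounds: list of (name, on_bits, pIC50) tuples.
    """
    cliffs = []
    for (name_a, fp_a, act_a), (name_b, fp_b, act_b) in combinations(compounds, 2):
        similarity = tanimoto(fp_a, fp_b)
        delta = abs(act_a - act_b)
        if similarity >= min_similarity and delta >= min_delta:
            cliffs.append((name_a, name_b, similarity, delta))
    return cliffs


# Invented toy data: (name, fingerprint on-bits, pIC50).
compounds = [
    ("cpd1", {1, 2, 3, 4}, 7.5),
    ("cpd2", {1, 2, 3, 5}, 6.0),
    ("cpd3", {1, 2, 3, 4, 5}, 7.4),
    ("cpd4", {7, 8}, 5.0),
]

cliffs = find_activity_cliffs(compounds)
cliff_pairs = [(a, b) for a, b, _, _ in cliffs]
```

With this toy data only the cpd1/cpd2 and cpd2/cpd3 pairs qualify: cpd1 and cpd3 are similar but nearly equipotent, and cpd4 is not similar to anything.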
Hmm, but it seems difficult to understand the SAR from this image. Cytoscape is a more suitable tool for visualizing networks.
Or did I make a mistake somewhere???

The activity cliffs table looks good.

KNIME is a powerful tool for medchem.

KNIME-RDKIT-R

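I often process data with RDKit and Python and then analyze it in R, so I keep wishing everything were connected.
Linking them with RPY2 works too, but with KNIME each piece can be connected, which might be convenient.
So I gave it a try.
In the R nodes on KNIME, all processing seems to be done on an object named R, and it took me a while to understand that.
For example, PLS regression looks like this.
If you don't load the pls package in each R node, it complains: "hey, I don't handle mvr objects."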

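R <- R  # no-op: the node works on the object named R
library("pls")  # must be loaded in every R node, otherwise mvr objects are not handled
R <- data.frame(R[, -3])  # drop the 3rd column before modeling
# PLS regression of pIC50 on all remaining columns, 20 components, leave-one-out CV
R <- plsr(pIC50 ~ ., data = R, ncomp = 20, validation = "LOO")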

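And what I built looks like this.
One flow builds the model and compares its predictions with the measured values;
the other is the flow for applying the model.
Descriptors are easy to get with the RDKit nodes, and for the modeling afterwards I just write a snippet like the one above and the rest takes care of itself, so this should save me some effort.
Performance is poor because everything runs locally...
Oh well.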

[Image: knime]

[Image: knime2]