MMPS in rdkit

I like Molecular Matched Pair Analysis because it is easy to understand and intuitive. Recently the P (pair) has been extended to S (series): molecular matched series. A developer of Open Babel reported MMPS at an ACS meeting. They also developed an application that implements MMPS, named Matsy. I saw Matsy at JCUP and it was quite impressive for …


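Ahead of the hands-on session in December, I am installing a full set of the tools we are likely to use. This assumes that VirtualBox and Vagrant are already installed. My installation skills have declined quite a bit, so the environment depends heavily on Anaconda. virtualenv may be the recommended approach, but this time I install everything directly.

First, install the base OS (Ubuntu) and boot it. Wait a while. Start the server and connect to it. Install Anaconda (the 3.x series this time) in the virtual machine and add it to the PATH. I will keep using it, so I put it in .bashrc. (Done.)

Next, use conda to install the deep learning modules for the hands-on, plus RDKit, which we might also use. The versions are a little old, but I decided that the convenience of installing with conda outweighs that. The installation will probably take a fair amount of time.

Install tensorflow, keras, rdkit ;-)
* tensorflow version 0.1
* keras version 1.0.7

Over ssh you only get a CUI, so for usability let's make Jupyter reachable from outside. Enable the commented-out section around line 29 of the Vagrantfile so that the IP address can be used. (This is a setting on the host side.) “private_network”, ip: “”

After rebooting, configure Jupyter on the VM. Set a password: start ipython in the virtual machine and run the password command, which shows a prompt where you can enter a suitable password. Write the resulting key into ~/.jupyter/jupyter_notebook_config.py.

Once this is configured, start jupyter notebook. The notebook server starts up. When you access it from the outside environment, it asks for the password, so enter it. Then you should see the usual view. Check that everything was installed. I think most of what we need is there...

For Docker, I uploaded something similar to Docker Hub a while ago, although it is old.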

Install redmine in vagrant

I tried to install Redmine in a VM today, and I ran into some trouble. How to install Redmine: first, install VirtualBox and Vagrant using the dmg files (for OS X). Second, the official documentation says the procedure is very simple: just type the following command. But I got an error. This is because “/opt/vagrant/embedded/bin/curl” does …

Molecular Fragmentation for MMPA

Recently I have wanted to develop a new MMP service. In this development, I want to control the number of cuts made in each molecule. Fortunately, RDKit has a good function for this, so I checked it. The following is a memorandum for myself. Read cdk2.sdf from the data directory. Check the molecules. OK! Go next. rdMMPA.FragmentMol is the function for fragmenting molecules. And …
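A minimal sketch of controlling the number of cuts, assuming RDKit is installed. The SMILES below is a made-up example rather than an entry from cdk2.sdf, and `fragment` is a hypothetical helper name; `maxCuts` and `resultsAsMols` are parameters of `rdMMPA.FragmentMol`.

```python
from rdkit import Chem
from rdkit.Chem import rdMMPA


def fragment(smiles, max_cuts):
    """Return (core, chains) SMILES pairs made with at most max_cuts cuts."""
    mol = Chem.MolFromSmiles(smiles)
    # resultsAsMols=False returns SMILES strings instead of Mol objects.
    return rdMMPA.FragmentMol(mol, maxCuts=max_cuts, resultsAsMols=False)


smiles = "c1ccccc1CCNC(=O)C"  # illustrative molecule, not from cdk2.sdf
single = fragment(smiles, 1)
double = fragment(smiles, 2)

print(len(single), len(double))
for core, chains in single:
    # For single cuts the core is empty and chains holds both pieces.
    print(repr(core), chains)
```

Raising `maxCuts` should only add fragmentations, so comparing the two counts is a quick way to see how much the limit matters for a given molecule.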

Tips for MCS of RDKit

FindMCS is a useful function for me, because sometimes I want to extract the common substructure from a set of compounds. But a large set of compounds gives boring results, such as an ethyl group. It’s no wonder. The FindMCS function of RDKit has a unique solution to this. To use “threshold” option I …
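A small sketch of the “threshold” option, assuming RDKit is installed. The four SMILES are made up for illustration: three phenethyl compounds plus one outlier that drags the strict MCS down to a short chain. With `threshold=0.75` the MCS only has to be present in 75% of the molecules.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS

# Illustrative input: the last molecule is the outlier.
smiles = ["c1ccccc1CCO", "c1ccccc1CCN", "c1ccccc1CCCl", "CCCCO"]
mols = [Chem.MolFromSmiles(s) for s in smiles]

# Default threshold=1.0: the MCS must occur in every molecule.
strict = rdFMCS.FindMCS(mols)
# Relaxed: the MCS must occur in at least 75% of the molecules.
relaxed = rdFMCS.FindMCS(mols, threshold=0.75)

print(strict.numAtoms, strict.smartsString)
print(relaxed.numAtoms, relaxed.smartsString)
```

Printing both results side by side shows whether relaxing the threshold recovers a more informative substructure for a given set.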

Extract Chemical Data From PDF, HTML, text etc.

I think medicinal chemists often grapple with many patents, papers, etc. Recently there are many commercially available patent databases. So, if we can use these databases, we can get the data that is embedded in patents. But if we don’t have them, we need to extract the data from PDF or XML. This is …

Make QSAR models and predict activity using pandas_ml and RDKit

At mishima.syk, y_sama introduced us to pandas_ml. I know it’s late, but I was interested in the module, so I used pandas_ml for QSAR. pandas_ml is a Python library that integrates pandas, scikit-learn, xgboost and seaborn. To use pandas_ml, I first installed the xgboost Python binding and then installed pandas_ml using pip. I tried to build …

Install rdkit to CentOS7

I usually use OS X or Ubuntu for coding. But in some cases CentOS is needed to build services. So, I checked how to install RDKit on CentOS 7. I used Docker for my test, because Docker can create a virtual environment on my MacBook. First I pull the CentOS 7 image. Then run the image. …

Visualize chemical space using Knime rdkit node

I usually use Python to analyse and visualize chemical space, because I love coding. ;-) I know that a workflow tool is also a useful solution for this, so I tried to plot chemical space using KNIME. KNIME is a well-known workflow tool, and lots of nodes have been developed for it. I made a very simple workflow to …

Scoring 3D diversity using RDKit #RDKit

Recently the importance of the 3D character of molecules has been increasing. When I design a library, I want to estimate not only 2D but also 3D diversity. Fortunately, RDKit implements a function for characterizing the 3D character of molecules, named ‘plane of best fit’ (PBF). You can call this function from the rdkit/Contrib/PBF folder. ;-) Great!!! And reference of …

Make virtual machine for chemoinformatics #RDKit

Recently a stable version of Docker for Mac was released. It’s good news for me. ;-) I used boot2docker before, but now I have switched to Docker for Mac, because it is easy to install and to share files with the host OS. Of course, I installed Docker for Mac! A virtual machine is one useful way to …

Make polar plots of exit vector about di-amine molecules #RDKit

Some days ago I posted a blog entry about comparing the exit vector distance of two molecules. At that time I used a Cartesian system to calculate the distance between two molecules. Today I tried to make a polar plot using RDKit and matplotlib. ;-) I used the 3D amine dataset and reused my old code. At first I made a scatter plot of similarity …

Natural Product likeness Score

Some years ago, a large number of molecules were produced using palladium-catalysed cross-coupling reactions such as Suzuki-Miyaura, Negishi, Stille, etc. These reactions had a great impact on medicinal chemistry, but they tend to produce flat molecules with a low Fsp3 score. Now I often read the words ‘Escape from flatland, sp3-rich molecules, 3D diversity …’.
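As a quick illustration of the Fsp3 score mentioned above (the fraction of carbon atoms that are sp3 hybridized), here is a minimal sketch assuming RDKit is installed. The three molecules are illustrative and `fsp3` is a hypothetical helper name; `CalcFractionCSP3` is the RDKit descriptor function.

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors


def fsp3(smiles):
    """Fraction of sp3 carbons among all carbons in the molecule."""
    mol = Chem.MolFromSmiles(smiles)
    return rdMolDescriptors.CalcFractionCSP3(mol)


# Flat aromatic ring, fully saturated ring, and a mixed case.
for name, smi in [("benzene", "c1ccccc1"),
                  ("cyclohexane", "C1CCCCC1"),
                  ("toluene", "Cc1ccccc1")]:
    print(name, round(fsp3(smi), 3))
```

A flat aromatic ring scores 0 and a fully saturated ring scores 1, which is why biaryl-heavy coupling products tend to sit at the low end.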

calculate exit vector distance of each molecule v2.

Some days ago, I posted about ‘exit vectors’. This way of representing chemical space gives me new insight for molecular design. I think it’s useful for replacing a linker or scaffold. In drug discovery it is important to design new molecules that are similar to a ligand for the target but not similar to a competitor’s. So, I added …

Interesting Web app Named ‘ChemTreeMap’ using RDKit.

I like web apps because users do not need client software to use them. I often use Cytoscape to visualise molecular networks. A network view is very informative. Yesterday, I found a cool web app that uses RDKit. The URL is as follows. The app is an open source application for visualizing molecular networks. If user can use …

Callback function of keras.

I’m still building QSAR models using deep learning, and I thought I had an overfitting problem. :-) The training error was decreasing, but the validation error was increasing with the number of epochs. :-/ It looked like overfitting, and I could not avoid it even when I used dropout. Tried lots of learning …
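The excerpt is cut off before it names the callback, so the following is only an assumption about one common remedy for this symptom: early stopping, which Keras provides as the `EarlyStopping` callback with `monitor` and `patience` arguments. The sketch below shows the underlying logic in plain Python without Keras; `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs. If that never
    happens, the last epoch index is returned.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss  # new best validation loss, reset the counter
            wait = 0
        else:
            wait += 1    # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses) - 1


# Made-up validation losses: improve for three epochs, then get worse.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70, 0.80]
stop = early_stopping_epoch(val_losses, patience=3)
print(stop)
```

A larger `patience` tolerates noisy validation curves at the cost of training longer past the best epoch.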

ErGFingerprint in RDKit

Sometimes medicinal chemists consider a scaffold hopping approach in a drug discovery project to overcome their issues. When I think about scaffold hopping, I consider what the key interactions between the molecule and the protein are. This is called the pharmacophore. The hopping approach is also used in me-too approaches to find another IP space. BTW, in 2006, researchers in Lilly …

Find MCS in R

Finding the maximum common substructure is useful for finding a core scaffold. I think it is common to find the MCS using commercially available tools (Pipeline Pilot?). I often use RDKit. ;-) Today I found a library that searches for the MCS in R, named fmcsR. That sounds nice, because if fmcsR works fine, I’ll implement the library to Spotfire …

New version of RDKit

Some days ago, a new version of RDKit (2015.09.01) was released. I had been looking forward to this release. It has lots of bug fixes and new features. I really thank the developers. Mac users can install RDKit using Homebrew! (Anaconda is not available yet.) I installed RDKit using Homebrew, and there was no trouble on El Capitan. One …

Array or sparse array ?

In the process of lead optimization, chemists often do SAR expansion around a potent compound. If the lead compound can be broken down into three parts, A (head), B (core) and C (tail), chemists (me…) often fix one part (e.g. core B) and change the other two. After optimizing A and C, they fix A and C and change B. This approach is called array synthesis …
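The array idea can be sketched with plain Python. The building-block names are placeholders. The excerpt does not define a sparse array, so the diagonal selection rule below is purely my assumption of one possible design, in which every head and every tail is used exactly once.

```python
from itertools import product

# Placeholder building blocks (hypothetical names).
heads = ["A1", "A2", "A3"]
tails = ["C1", "C2", "C3"]
core = "B1"  # the fixed part

# Full array: every head combined with every tail, core held fixed.
full_array = [(a, core, c) for a, c in product(heads, tails)]

# Assumed sparse design: keep only the pairs on a cyclic diagonal, so each
# head and each tail still appears once but far fewer compounds are made.
n = len(tails)
sparse_array = [
    (heads[i], core, tails[j])
    for i in range(len(heads))
    for j in range(n)
    if (i + j) % n == 0
]

print(len(full_array), len(sparse_array))
for compound in sparse_array:
    print("-".join(compound))
```

The full array grows as the product of the list sizes, while this sparse selection grows only linearly, which is the trade-off the title alludes to.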