Looking back on this year

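There are only about two hours left in this year. Looking back, I think I was blessed with many opportunities and encounters this year. I met people in person at conferences whom I had only known through following each other on twitter, and I received responses to my blog posts. My child doesn't listen to me, but has grown up quite a bit. Things inside the company were as unremarkable as ever, but my outside activities drew a lot of responses. I am grateful for that. My current job is, nominally, that of a medicinal chemistry researcher. I have my complaints in day-to-day work, but I have loved chemistry since I was small, so simply being able to hold this job makes me very fortunate. On the other hand, through talking with many people and through my work, I have found myself thinking more often about what the role of a drug discovery scientist really is. These days there are many excellent CMOs in Japan and abroad: some specialize in synthesis, some start from design, some run the assays, and so on. With an ample budget and sound judgment, you can create a situation in which you no longer have to design and synthesize compounds yourself. And when cost also enters the picture, what does it take to survive in this industry? In any case, I cannot make sound judgments without gaining experience and studying, so whatever happens, stopping means I am out. Next year I will keep studying at my own pace, even if I am slow to understand. This year's mishimasyk was again full of fun topics. What topics would be good for next time? Study meetings with people from outside the company are a great stimulus. I hope next year brings good encounters too. Wishing everyone a good year as well, I am getting sleepy, so I think I will go to bed. My best photo of the year? A shot taken in Strasbourg, France.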

Convert rdkit molecule object to igraph graph object.

Molecules are often handled as graphs in chemoinformatics, and Python has several libraries for graph analysis. Today I wrote a sample script that converts a molecule to a graph, using python-igraph and RDKit. RDKit has a method that returns the adjacency matrix of a molecule, so I used it. The code follows. Now test it. Seems …
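The adjacency-matrix route mentioned in this excerpt can be sketched in plain Python. This is only an illustration under stated assumptions, not the script from the post: the matrix below is hand-written for the heavy atoms of ethanol (C-C-O), and `adjacency_to_edges` is a hypothetical helper name. In practice the matrix would presumably come from RDKit (for example `Chem.GetAdjacencyMatrix(mol)`), and the resulting edge list could be handed to `igraph.Graph(edges)`.

```python
def adjacency_to_edges(matrix):
    """Return the undirected edge list (i, j) with i < j for a square adjacency matrix."""
    n = len(matrix)
    edges = []
    for i in range(n):
        # Only scan the upper triangle so each bond is reported once.
        for j in range(i + 1, n):
            if matrix[i][j]:
                edges.append((i, j))
    return edges


# Illustrative, hand-written adjacency matrix for ethanol heavy atoms: C0-C1-O2.
ethanol = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

edges = adjacency_to_edges(ethanol)
print(edges)  # [(0, 1), (1, 2)]
```

Scanning only the upper triangle keeps each bond from appearing twice, which suits an undirected molecular graph.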

Convert chemical file format.

Recently I learned about the KCF file format. It represents molecules as graph structures and is used in KEGG. KCF uses atom labels and orientation, plus bond information. Neither RDKit nor Open Babel can convert SDF to KCF. For anyone who wants to convert SDF to KCF, the KEGG site provides an API for file format conversion. Example …

D3 based open source data visualization tool ;-)

Some days ago, I posted about Superset. It amazed me. However, Superset needs a database from which to retrieve the data it visualizes, and handling a database such as Postgres, Oracle, or MySQL is difficult for a medicinal chemist. Today I found a very nice data visualization tool based on D3.js. Its name is ‘raw’. The tool can use …

Build regression model in Keras

I introduced Keras at mishimasyk#9. My presentation covered how to build a classification model in Keras. A participant asked me how to build a regression model in Keras, and I could not answer his question. After syk#9, I searched the Keras API and found a good method. Keras has a Scikit-learn API, and that API can build a regression model. ;-)

Cool web based data analytical platform

Yesterday, I enjoyed mishima.syk#9. ;-) I hope all participants enjoyed the meeting too. BTW, I found a cool platform for data analysis, named “Superset”. https://github.com/airbnb/superset You can see a cool review in README.md. Installing Superset is very easy. On macOS, for example, you only need pip! Then you can access localhost:8088.

Quantum annealing for QSAR!

In chemoinformatics, it is important to describe molecular similarity. I think the merit of similarity calculation based on fingerprint bit vectors is speed. But sometimes ECFP4 and related methods do not match a chemist's intuition. On the other hand, graph-based similarity such as MCS is useful, but its calculation cost is high. You know, …
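To make the speed argument concrete, here is a minimal sketch of Tanimoto similarity on bit vectors stored as Python integers. The helper names (`bits_to_int`, `tanimoto`) and the on-bit positions are hypothetical; real ECFP4 fingerprints would come from a toolkit such as RDKit. The point is only that the comparison reduces to a couple of bitwise operations and bit counts.

```python
def bits_to_int(on_bits):
    """Pack a collection of on-bit positions into a single integer bit vector."""
    value = 0
    for bit in on_bits:
        value |= 1 << bit
    return value


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by on-bits set in either vector."""
    common = bin(fp_a & fp_b).count("1")
    total = bin(fp_a | fp_b).count("1")
    return common / total if total else 0.0


# Made-up fingerprints for illustration only.
fp1 = bits_to_int([1, 5, 9, 12])
fp2 = bits_to_int([1, 5, 10, 12, 20])

print(tanimoto(fp1, fp2))  # 3 shared bits / 6 bits in the union = 0.5
```

A graph-based comparison such as MCS has to search for a common subgraph instead, which is where the higher cost comes from.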

GLARE algorithm using RDKit with python3

The current version of RDKit ships with many tools, and I am interested in the GLARE algorithm. https://github.com/rdkit/rdkit/blob/master/Contrib/Glare/glare.py This algorithm generates good-quality libraries from a large set of reagents. The key point of the method is to precalculate the reagent properties and sum the values to get those of the product. So, it does not need to calculate product properties on …
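The additive idea in this excerpt can be illustrated with a small sketch. It is not the GLARE implementation from RDKit's Contrib directory and it leaves out GLARE's reagent pruning; it only shows the step of estimating a product property by summing precalculated reagent values. The reagent names, the property (a heavy-atom count), and the cutoff are all made up for the example.

```python
from itertools import product

# Hypothetical precalculated reagent properties (here: heavy-atom counts).
amines = {"A1": 3, "A2": 4}
acids = {"B1": 4, "B2": 9}


def product_properties(*reagent_sets):
    """Estimate each product's property as the sum of its reagents' values."""
    result = {}
    for combo in product(*(rs.items() for rs in reagent_sets)):
        names = tuple(name for name, _ in combo)
        result[names] = sum(value for _, value in combo)
    return result


props = product_properties(amines, acids)

# Keep only the products whose estimated property is within an illustrative cutoff.
max_heavy_atoms = 10
kept = sorted(names for names, value in props.items() if value <= max_heavy_atoms)
print(kept)  # [('A1', 'B1'), ('A2', 'B1')]
```

Because each reagent is profiled once, the work for the property calculation grows with the number of reagents rather than with the number of enumerated products.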

Generate dataset for deep learning

I discussed the issues of deep learning in drug discovery with experts. In my understanding, there are two major problems. First, we can't use a large dataset to build a model in the early stage of a project. Second, we need to find molecular descriptors suited to DL. BTW, in the image classification area, there …

RemoteMonitor in keras

Several packages are available for deep learning in Python, and my favorite is Keras. https://keras.io/ Today I found a new callback in keras.callbacks named RemoteMonitor. It provides real-time visualization of learning. So, I wrote a very simple example using the IRIS dataset. To use RemoteMonitor, I first need to clone the api from the following URL. …

MMPS in rdkit

I like Matched Molecular Pair Analysis because it is easy to understand and intuitive. Recently the P (pair) has been extended to S (series): matched molecular series. A developer of Open Babel reported MMPS in an ACS journal. http://pubs.acs.org/doi/abs/10.1021/jm500022q They also developed an application implementing MMPS, named Matsy. I saw Matsy at JCUP and it was quite impressive for …

Let's build a VM that bundles the packages likely to be useful for chemoinformatics.

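This is the work of installing a full set of the things we are likely to use at the December hands-on. I assume that virtualbox and vagrant are already installed. My installation skills have declined quite a bit, so the environment relies heavily on anaconda. virtualenv may be the recommended route, but this time I install everything directly. First, install the base OS (ubuntu) and boot it. Wait a while. Start the server and connect to it. Install anaconda (the 3.x series this time) in the virtual environment and add it to the PATH. I will keep using it, so I write it into .bashrc. (Done.) Next, use conda to install the DL modules for the hands-on, plus RDKit, which we might use. The versions are a little old, but I judged that the convenience of installing with conda outweighs that. The installation will take a fair amount of time. Install tensorflow, keras, rdkit ;-) * tensorflow version 0.1 * keras version 1.0.7 Over ssh everything is CUI, so for usability let's make jupyter reachable from outside. Enable the commented-out part around line 29 of the Vagrantfile so that the IP address can be used. (This is a host-side setting.) config.vm.network "private_network", ip: "192.168.33.10" After rebooting, configure jupyter on the VM. Set a password. Start ipython in the virtual environment and configure it as follows. Running the command below brings up a prompt where you can set a password, so enter something suitable. Write the resulting key into ~/.jupyter/jupyter_notebook_config.py. Once this is set up, try starting jupyter notebook. The notebook server starts. From outside, go to http://192.168.33.10:8888 and you will be asked for the password, so enter it. Then you should see the usual view. Let's check that everything has been installed. Roughly everything we need seems to be there... For Docker, it is old, but I previously uploaded something similar to Dockerhub. https://hub.docker.com/r/iwatobipen/chemoinfo_test/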

Install redmine in vagrant

Today I tried to install redmine in a VM and ran into some trouble. How to install redmine: 1st, install VirtualBox and vagrant using the dmg files (for OS X). https://www.virtualbox.org/wiki/Downloads https://www.vagrantup.com/downloads.html 2nd, the official document says the procedure is very simple: just type the following command. But I got an error. This is because “/opt/vagrant/embedded/bin/curl” does …

Dark Chemical Matter (DCM) in screening deck

I was interested in this article because of its title, “The performance of dark chemical matter in high throughput screening”. https://www.ncbi.nlm.nih.gov/pubmed/27762554 What is dark chemical matter (DCM)? The definition of DCM is that compounds, which have been tested and found inactive in 50 or more assays, exhibit hit rates that are comparable to those of …

Molecular Fragmentation for MMPA

Recently I have wanted to develop a new MMP service. In the development process, I want to control the number of cuts made in each molecule. Fortunately, RDKit has a good function for this, so I checked it. The following is a memorandum for myself. Read cdk2.sdf from the data dir. Check the molecules. OK! Go next. rdMMPA.FragmentMol is the function for fragmenting molecules. And …

Tips for MCS of RDKit

FindMCS is a useful function for me, because I sometimes want to extract the common substructure from a set of compounds. But a large compound set gives me boring results such as an ethyl group. It's no wonder. RDKit's FindMCS function has a unique solution to that. To use the “threshold” option I …

An article in JMC.

My background is organic chemistry, and I was trained in medicinal chemistry on the job. Training in medicinal chemistry was discussed in the following article, which impressed me. https://www.ncbi.nlm.nih.gov/pubmed/27668824 “On-the-job training” is important for medchems because it builds their background in molecular design and so on. The author said, “”” For example, …

Extract Chemical Data From PDF, HTML, text etc.

I think medicinal chemists often grapple with many patents, papers, and so on. You know, many commercial patent databases are available these days. If we can use these databases, we can get the data embedded in patents. But if we don't have them, we need to extract the data from PDF or XML. This is …