Cryo-EM data analysis with deep learning #arXiv

The number of publications using cryo-EM has been growing recently, and so has the amount of deposited data. I obtained the following statistics from EMDB:
http://www.ebi.ac.uk/pdbe/emdb/statistics_main.html/

Cryo-EM is now attracting a lot of attention in the drug discovery area, because the method may make it possible to determine structures of difficult targets that are not accessible to X-ray analysis. I am not very familiar with cryo-EM, but I am interested in the technology.

One of the challenges is molecular model building. For cryo-EM, the 2D projection images need to be converted into volumetric data, and then the atomic coordinates of each amino acid have to be modeled. This modeling process is still a time-consuming step.

Today I found an exciting article.
https://arxiv.org/pdf/1901.00785.pdf
The authors developed a new approach to molecular modeling, named 'A2-Net', which uses a neural network (a 3D convolutional network) and Monte Carlo tree search (MCTS).

The neural network is used in step one, which determines the 3D coordinates of the atoms in each amino acid. MCTS is used in step two, which prunes the candidate amino acids along the main chain.
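As a rough illustration of the basic operation behind step one (this is only the building block of a 3D convolutional network, not the authors' actual A2-Net architecture), a 3D convolution slides a small kernel over a voxel grid:

```python
# Toy 3D convolution over a voxel grid, the basic operation behind 3D CNNs.
# This is only an illustration of the idea, not the A2-Net architecture.
import numpy as np

def conv3d(volume, kernel):
    """Valid-mode 3D convolution (strictly, cross-correlation) with one kernel."""
    kd, kh, kw = kernel.shape
    d, h, w = volume.shape
    out = np.zeros((d - kd + 1, h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(volume[i:i+kd, j:j+kh, k:k+kw] * kernel)
    return out

density = np.random.rand(8, 8, 8)      # stand-in for a cryo-EM density map
kernel = np.ones((3, 3, 3)) / 27.0     # simple averaging filter
print(conv3d(density, kernel).shape)   # (6, 6, 6)
```

A real network would learn many such kernels and stack them into layers, but the sliding-window idea over the density map is the same.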

It was interesting to me that the first step of A2-Net predicts the amino acid categories and their coordinates directly from the volumetric data!

After obtaining the proposals, they used a stacked 3D hourglass network for further pose estimation. Then MCTS is applied.

Finally, their method outperformed ROSETTA-denovo.

The article shows that deep learning is a powerful method for 3D detection. I would like to learn more about 3D detection and cryo-EM.

Happy New Year

We are now in 2019. Starting with a broken finger in the second half of 2018, then influenza over the New Year holidays, and a mysterious fever and cough after recovering from the flu, I am finally getting back on my feet. This New Year's season really reminded me how important health is. I also experienced first-hand how quickly I recovered after taking an anti-influenza drug, which made me appreciate once again how wonderful it is to work in a job where we can make medicines.

Anyway, this year I want to take better care of myself.
Last year my work content changed drastically, and my assignment pointed in roughly the same direction as what I want to do, so it was a very good year for me. That said, I dug a little too deeply into my own interests in places, so I want to be more conscious of how to accelerate drug discovery research. Last year was also a year with many business trips, which exposed the weakness of my schedule management; I will pay attention to that this year as well.

Looking back on my career so far, I feel fortunate to have had opportunities to discuss with people at the front of the field. This year, too, I want to actively look outward. Honestly, everyone around me is so talented that I can learn a great deal, and I am grateful for that.

So, my goals for this year:
1. Push more code to GitHub
 Managing code on GitHub makes it easy to look back when I wonder "what was that again?", so I want to make even better use of it than before.
2. English
 Well, this one is simply a must... Even on overseas business trips, discussions go nowhere unless I can speak, however badly...
3. PyTorch
 I have used it casually but never studied it in depth, so this year I want to build something with it. So far I have mainly used TensorFlow or Keras, so I would like to try the define-by-run side for once.
4. Mathematics
 This has been on the list for a while, but I want enough grounding to read a paper and implement it myself.
5. Health management and building stamina!
 Lately I am so out of shape that I feel satisfied after running just 10 km, so I want to train again and take on a full marathon...

I have rambled on, but I intend to keep plugging away this year as well.
I would like to end today's post by thanking my family for putting up with me always fiddling with my computer at home.

May this year be an even better one for all of you than last year!

Diary….

I like my town. It is comfortable for me to live in, because it is neither too urban, like Tokyo, nor too rural.

There are many beautiful places here, and the following pictures show my favorite one. The water in this river is very clean, and I can see fireflies around here in summer.

I want this scenery to continue forever.

After the walk I went to see a doctor; my finger is getting better. I hope it heals completely soon...

Make a Docker image with chemoinformatics toolkits

Docker (https://www.docker.com) is an open platform for distributed applications for developers and sysadmins.
Providing a Docker image means that anyone can build the same environment easily:
you create your own image and share it.
Docker provides containers.
Containers running on a single machine all share the same OS kernel.
That is the difference from a normal VM.
I had used Docker before, but I had never written my own image, so today I wrote a Dockerfile as a test.
First, I set up a Docker environment on my PC using Homebrew.

brew cask install virtualbox
brew install docker
brew install boot2docker

After the installation, I started up Docker and launched a Docker instance (Ubuntu).

boot2docker init
boot2docker up
docker run -i -t ubuntu /bin/bash

OK.
Next I wrote a Dockerfile based on Ubuntu.
In recent versions of Ubuntu, users can install RDKit with the apt-get command (RDKit 2015.03.1 for wily).
So the Dockerfile is as follows.
The image provides RDKit and some chemoinformatics packages.

from ubuntu

MAINTAINER iwatobipen

RUN apt-get update
RUN apt-get -y upgrade
RUN apt-get -y install python-pip python-dev build-essential
RUN pip install --upgrade pip
RUN apt-get -y install wget
RUN apt-get -y install python-pillow
RUN apt-get -y install python-numpy python-scipy python-patsy python-statsmodels
RUN apt-get -y install python-rdkit librdkit1 rdkit-data rdkit-doc
RUN apt-get -y install python-scikits-learn python-pandas python-pandas-lib
RUN apt-get -y install python-matplotlib python-matplotlib-data
RUN pip install seaborn
RUN pip install tornado
RUN apt-get -y install ipython

After saving the file, I built the image using the following command.

iwatobipen$ docker build -t iwatobipen/rdkit-ubuntu:0.1 .

Wow, I could build a Docker image!
Next I pushed the image to Docker Hub.

iwatobipen$ docker login
..............
Login Succeeded
iwatobipen$ docker push iwatobipen/rdkit-ubuntu
The push refers to a repository [docker.io/iwatobipen/rdkit-ubuntu] (len: 2)
33e1d12d46e0: Pushed 
.......
size: 28540

The result is at the following URL ;-)
https://hub.docker.com/r/iwatobipen/rdkit-ubuntu/
OK, Run the image.

iwatobipen$ docker run -i -t iwatobipen/rdkit-ubuntu /bin/bash
root@c644a9d6f0c6:/# 

Logged in...
Now use IPython.

root@c644a9d6f0c6:/# ipython
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
Type "copyright", "credits" or "license" for more information.

IPython 1.2.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from rdkit import rdBase

In [2]: rdBase.rdkitVersion
Out[2]: '2013.09.1'

Oops! A mistake...
I had used an old version of Ubuntu... ;-(

docker run -i -t ubuntu:15.10 /bin/bash

Then I rebuilt from the Dockerfile,
using "from ubuntu:15.10" instead of "from ubuntu".

Then tag the image.

iwatobipen$ docker tag c053137912aa iwatobipen/rdkit-ubuntu15
iwatobipen$ docker images
REPOSITORY                  TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
iwatobipen/rdkit-ubuntu15   0.1                 c053137912aa        6 minutes ago       841.3 MB
iwatobipen/rdkit-ubuntu15   latest              c053137912aa        6 minutes ago       841.3 MB
iwatobipen/rdkit-ubuntu     0.1                 33e1d12d46e0        24 hours ago        701.6 MB
iwatobipen/rdkit-ubuntu     latest              33e1d12d46e0        24 hours ago        701.6 MB
<none>                      <none>              0221cd7430fb        24 hours ago        494.8 MB
ubuntu                      latest              e9ae3c220b23        5 days ago          187.9 MB
pacur/ubuntu-wily           latest              f4e4eb1c359e        2 weeks ago         859.9 MB
ubuntu                      15.10               5eb72b199374        3 weeks ago         131.4 MB
iwatobipen$ docker run -i -t iwatobipen/rdkit-ubuntu15
root@f192406f11ba:/# ipython
Python 2.7.10 (default, Oct 14 2015, 16:09:02) 
Type "copyright", "credits" or "license" for more information.

IPython 2.3.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from rdkit import Chem

In [2]: from rdkit import rdBase

In [3]: rdBase.rdkitVersion  
Out[3]: '2015.03.1'

Good. Then I pushed it to Docker Hub.
The result is here.
https://hub.docker.com/r/iwatobipen/rdkit-ubuntu15/
It’s fun!

Visualizing the process of lead optimization

Sometimes we set milestones to manage a portfolio and/or to check the progress of projects.
These data are usually reported in documents, PowerPoint slides, etc., so it is difficult to grasp the state of lead optimization (LO) in a timely way.
Researchers at GSK published a solution for visualizing the LO process.
It was impressive to me.
The link is here.
http://www.ncbi.nlm.nih.gov/pubmed/26262898

They call it "LO telemetry"; it shows the time course of the total risk of compounds.
The total risk is calculated from the potency against each target and the ADME, tox, and physchem profiles.
Ideally, the total risk decreases as a project progresses. But there are a lot of problems in drug discovery projects (at least for me! ;-) ).
Fig. 5 shows one example.
The figure shows the progress of lead optimization together with the design entropy (chemical diversity).
The design entropy suddenly increases because of tox problems, and the physchem risk also increases slightly.
To avoid the tox problem (an adverse effect), the chemists considered changing the chemical series or making dynamic changes to the structure. This risks a loss of potency, but Fig. 5 shows that their strategy kept the pharmacological risk score low.
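The paper does not disclose the exact scoring function, but the idea of rolling individual parameter risks into one number can be sketched as a weighted average. The parameter names, scales, and weights below are my own toy illustration, not GSK's actual scheme:

```python
# Toy aggregation of per-parameter risks into a single "total risk" score.
# Parameter names, scales (0 = no risk, 1 = high risk), and weights are
# hypothetical illustrations, not the scheme published by GSK.

def total_risk(risks, weights=None):
    """Weighted average of per-parameter risk scores."""
    if weights is None:
        weights = {k: 1.0 for k in risks}  # default: equal weights
    return sum(weights[k] * risks[k] for k in risks) / sum(weights.values())

compound = {"potency": 0.2, "adme": 0.5, "tox": 0.8, "physchem": 0.3}
print(round(total_risk(compound), 2))  # equal weights -> simple mean, 0.45
```

Plotting such a score per compound over the course of a project would give a crude version of the telemetry curve; emphasizing tox is just a matter of changing the weights dict.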

The paper reports that the LO project team can check the telemetry. It tells the team about the bottlenecks and progress of their project.
The system can also be used for portfolio management.
It is useful for decision making and for motivating the team.
The telemetry also provides a vivid description of each project.
What do you think about metrics for lead optimization?

Passport for compounds

I was interested in the title:
"Compound Passport Service"
http://dx.doi.org/10.1016/j.drudis.2015.06.011
AZ built a passport for compounds to manage compound rights tracking.

The system can manage the status of compounds, such as ownership, permissions, and structure sharing.
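As a sketch of what such a record might contain (the field names below are my own guesses for illustration, not the actual schema described in the paper):

```python
# Hypothetical sketch of a compound "passport" record; the fields are my own
# illustration, not the schema of AZ's Compound Passport Service.
from dataclasses import dataclass, field

@dataclass
class CompoundPassport:
    compound_id: str
    owner: str
    structure_shared: bool = False
    permissions: list = field(default_factory=list)
    history: list = field(default_factory=list)

    def grant(self, party, right):
        """Record a permission grant and keep an audit trail."""
        self.permissions.append((party, right))
        self.history.append(f"granted {right} to {party}")

p = CompoundPassport("AZ-0001", owner="TeamA")
p.grant("TeamB", "screening")
print(p.permissions)  # [('TeamB', 'screening')]
```

The interesting part of such a system is the audit trail: who owns a compound and who was granted which rights, tracked over the compound's lifetime.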

I was really impressed with the concept and the system, because I think that the management of compound (and rights) logistics is a key factor in drug discovery.

I want to develop a seamless compound logistics system and a tracking system for medicinal chemistry...

The result

I went to my old school the day before yesterday; I like my old school.
An event about chemoinformatics was held there.
I enjoyed the presentations of all the participants.
My prediction result was ....... (please don't ask ;-))

All the teams used one of two approaches, SBDD or LBDD, and the winner used an LBDD approach.
It is worth thinking about which method is more effective for virtual screening.
Is it always true that richer resources produce more effective predictions?
I don't have the answer yet.

In the contest, two teams used deep learning for the prediction.
An academic team that used DL presented a very cool approach.
Tuning hyperparameters is often problematic in deep learning,
because a lot of parameters have to be optimized and the process is very time consuming.
So they chose a random sampling strategy to optimize the parameters, and they ran the calculation on a supercomputer.
I agree with the strategy.
Bergstra and Bengio reported on random search for hyper-parameter optimization before.
http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
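A minimal sketch of the idea, with a toy objective function standing in for a real training run (the search ranges and the objective below are made up for illustration):

```python
# Minimal sketch of random search over hyperparameters (in the spirit of
# Bergstra & Bengio). The objective and ranges are toy stand-ins: a real run
# would train and validate a model for each sampled configuration.
import random

def objective(lr, n_units):
    # Toy stand-in for a validation loss with a known sweet spot.
    return (lr - 0.01) ** 2 + (n_units - 64) ** 2 * 1e-5

random.seed(0)
best = None
for _ in range(50):
    params = {"lr": 10 ** random.uniform(-4, -1),  # log-uniform learning rate
              "n_units": random.randint(16, 256)}  # uniform layer width
    loss = objective(**params)
    if best is None or loss < best[0]:
        best = (loss, params)

print(best[1])  # best hyperparameters found by random sampling
```

The appeal over grid search is that each sample explores a new value of every parameter, so the important dimensions get covered densely even with a fixed budget, and the trials are embarrassingly parallel, which fits a supercomputer well.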

Is DL still a hot area in ML? I'll check some papers.
My snippet was uploaded to GitHub (not including the results and SDF).
https://github.com/iwatobipen/chemo_info/tree/master/ipab2015

I thank all the participants for the good discussion, and I thank my family for allowing me to code in my off-time.

Bonus..
Yesterday, I found a dinosaur!!
Wow ;-)