Cryo-EM data analysis with deep learning #arXiv

The number of publications using cryo-EM has been growing recently, and so has the amount of deposited data.
I got the following statistics from EMDB.

Cryo-EM is now attracting a lot of attention in drug discovery, because it may allow structure determination of difficult targets that are not accessible to X-ray analysis. I am not familiar with cryo-EM, but I am interested in the technology.

One of the challenges is molecular model building. In cryo-EM, the 2D projection images need to be converted into volumetric data, and then the atomic coordinates of each amino acid have to be modeled. This modeling process is still a time-consuming step.

Today I found an exciting article.
The authors developed a new approach to molecular modeling named ‘A2-Net’, which uses a neural network (a 3D convolutional network) and MCTS (Monte Carlo tree search).

The neural network is used in step one, which determines the 3D coordinates of the atoms in each amino acid. MCTS is used in step two, which prunes the candidate amino acids in the main chain.

It is interesting to me that the first step of A2-Net predicts the amino acid categories and their coordinates directly from the volumetric data!

After getting the proposals, they used a 3D stacked hourglass network for further pose estimation. Then MCTS is used.
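To get a feeling for what a 3D convolutional layer does with volumetric data, here is a minimal sketch of a single 3D convolution written in plain Python. It is only an illustration and not the A2-Net implementation: the function name conv3d_valid, the toy 3x3x3 volume and the 2x2x2 kernel are all my own assumptions.

```python
def conv3d_valid(volume, kernel):
    """Naive 3D cross-correlation over nested lists (no padding, stride 1)."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for z in range(depth - kd + 1):
        plane = []
        for y in range(height - kh + 1):
            row = []
            for x in range(width - kw + 1):
                # Sum of element-wise products inside the current 3D window.
                acc = 0.0
                for dz in range(kd):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy "density map": a 3x3x3 grid with a single bright voxel in the centre.
volume = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
volume[1][1][1] = 1.0

# Hypothetical 2x2x2 kernel of ones (a real network learns its kernel weights).
kernel = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]

response = conv3d_valid(volume, kernel)
```

Every 2x2x2 window of this small grid contains the centre voxel, so each of the eight output values is 1.0. A real network stacks many such layers with learned kernels and non-linearities.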

Finally, their method outperforms ROSETTA-denovo.

The article indicates that deep learning is a powerful method for 3D detection. I would like to learn more about 3D detection and cryo-EM.





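1. Upload more code to GitHub
2. English
3. PyTorch
 I have used it a little, but I have not studied it very deeply, so this year I want to build something with it. Until now I have mainly used TensorFlow or Keras, so I would like to try the define-by-run style for once.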




I like my town. It is a comfortable place for me to live, because it is neither too urban like Tokyo nor too rural.

There are many beautiful places, and the following pictures show my favorite one. The water in this river is very clean. I can see fireflies around here in summer.

I want this scenery to continue forever.

After the walk, I went to see a doctor. My finger is getting better. I hope it heals soon…

Making a Docker image with chemoinformatics toolkits.

Docker is an open platform for distributed applications, aimed at developers and sysadmins.
Providing a Docker image means that everyone can easily build the same environment.
In other words, you create your own image and share it.
Docker provides containers.
Containers running on a single machine all share the same OS kernel.
This is the difference from a normal VM.
I have used Docker before, but I had never written my own image, so today I wrote a Dockerfile as a test.
First, I set up a Docker environment on my PC using Homebrew.

brew cask install virtualbox
brew install docker
brew install boot2docker

After installation, I started up Docker.
Then I started a Docker container (ubuntu).

boot2docker init
boot2docker up
docker run -i -t ubuntu /bin/bash

Next I wrote a Dockerfile based on ubuntu.
In recent versions of ubuntu, RDKit can be installed with the apt-get command (version 2015.03.1 for wily).
So the Dockerfile is as follows.
The resulting image provides RDKit and some other chemoinformatics tools.

FROM ubuntu

MAINTAINER iwatobipen

RUN apt-get update
RUN apt-get -y upgrade
RUN apt-get -y install python-pip python-dev build-essential
RUN pip install --upgrade pip
RUN apt-get -y install wget
RUN apt-get -y install python-pillow
RUN apt-get -y install python-numpy python-scipy python-patsy python-statsmodels
RUN apt-get -y install python-rdkit librdkit1 rdkit-data rdkit-doc
RUN apt-get -y install python-scikits-learn python-pandas python-pandas-lib
RUN apt-get -y install python-matplotlib python-matplotlib-data
RUN pip install seaborn
RUN pip install tornado
RUN apt-get -y install ipython

After saving the file, build the image with the following command.

iwatobipen$ docker build -t iwatobipen/rdkit-ubuntu:0.1 .

Wow, I could build a Docker image!
Next I pushed the image to Docker Hub.

iwatobipen$ docker login
Login Succeeded
iwatobipen$ docker push iwatobipen/rdkit-ubuntu
The push refers to a repository [] (len: 2)
33e1d12d46e0: Pushed 
size: 28540

The result is at the following URL ;-)
OK, now run the image.

iwatobipen$ docker run -i -t iwatobipen/rdkit-ubuntu /bin/bash

Logged in…
Now start ipython.

root@c644a9d6f0c6:/# ipython
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
Type "copyright", "credits" or "license" for more information.

IPython 1.2.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from rdkit import rdBase

In [2]: rdBase.rdkitVersion
Out[2]: '2013.09.1'

Oops! A mistake…
I used an old version of ubuntu… ;-(

docker run -i -t ubuntu:15.10 /bin/bash

Then I rebuilt the image from the Dockerfile.
This time the base image on the first line is “ubuntu:15.10” instead of plain “ubuntu”.

Then tag the image.

iwatobipen$ docker tag c053137912aa iwatobipen/rdkit-ubuntu15
iwatobipen$ docker images
REPOSITORY                  TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
iwatobipen/rdkit-ubuntu15   0.1                 c053137912aa        6 minutes ago       841.3 MB
iwatobipen/rdkit-ubuntu15   latest              c053137912aa        6 minutes ago       841.3 MB
iwatobipen/rdkit-ubuntu     0.1                 33e1d12d46e0        24 hours ago        701.6 MB
iwatobipen/rdkit-ubuntu     latest              33e1d12d46e0        24 hours ago        701.6 MB
<none>                      <none>              0221cd7430fb        24 hours ago        494.8 MB
ubuntu                      latest              e9ae3c220b23        5 days ago          187.9 MB
pacur/ubuntu-wily           latest              f4e4eb1c359e        2 weeks ago         859.9 MB
ubuntu                      15.10               5eb72b199374        3 weeks ago         131.4 MB
iwatobipen$ docker run -i -t iwatobipen/rdkit-ubuntu15
root@f192406f11ba:/# ipython
Python 2.7.10 (default, Oct 14 2015, 16:09:02) 
Type "copyright", "credits" or "license" for more information.

IPython 2.3.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from rdkit import Chem

In [2]: from rdkit import rdBase

In [3]: rdBase.rdkitVersion  
Out[3]: '2015.03.1'
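
To catch this kind of mistake earlier, the version string can be compared against a minimum version instead of being checked by eye. The following is only a small sketch: the helper names parse_rdkit_version and is_at_least are my own, and it assumes a purely numeric version string like the two shown above (a string with a suffix would need extra handling). Inside the container, rdBase.rdkitVersion would be passed as the first argument.

```python
def parse_rdkit_version(version):
    """Turn a version string such as '2015.03.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_at_least(version, minimum):
    """Return True if `version` is the same as or newer than `minimum`."""
    return parse_rdkit_version(version) >= parse_rdkit_version(minimum)


# The two version strings reported by the images in this post.
first_image_ok = is_at_least("2013.09.1", "2015.03.1")
second_image_ok = is_at_least("2015.03.1", "2015.03.1")

print(first_image_ok)   # False: the first image ships an older RDKit
print(second_image_ok)  # True: the ubuntu:15.10 based image is new enough
```

Comparing tuples of integers avoids the pitfalls of comparing the raw strings character by character.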

Good. Then I pushed the image to Docker Hub.
The result is here.
It’s fun!

Visualizing the process of lead optimization

Sometimes we set milestones to manage a portfolio and/or to check the progress of projects.
These data are reported in documents, PowerPoint slides, etc., so it is difficult to grasp the status of LO in a timely manner.
Researchers at GSK published a solution for visualizing the LO process.
I was impressed by it.
The link is here.

They call it “LO telemetry”. It shows the time course of the total risk of the compounds.
Total risk is calculated from the potency against each target and from the ADME, Tox and physchem profiles.
Ideally, total risk decreases as the project progresses. But there are a lot of problems in a drug discovery project (at least for me! ;-) ).
Fig. 5 shows one example.
The figure shows the progress of lead optimization together with the design entropy (chemical diversity).
Design entropy suddenly increases because of Tox problems. The PhysChem property risk also increases slightly.
To avoid a tox problem (adverse effect), chemists think about changing the chemical series or making a dynamic change to the structure. This risks a loss of potency, but Fig. 5 shows that their strategy kept the pharmacological risk score low.
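
To make the two quantities a bit more concrete, here is a toy sketch. It does not reproduce the definitions in the paper: I assume that total risk is a weighted mean of per-category risk scores between 0 and 1, and that design entropy is the Shannon entropy of the chemical series made in one time window. The function names, category names and numbers are all made up for illustration.

```python
import math
from collections import Counter


def total_risk(risks, weights=None):
    """Weighted mean of per-category risk scores (hypothetical definition)."""
    if weights is None:
        weights = {name: 1.0 for name in risks}
    total_weight = sum(weights[name] for name in risks)
    return sum(risks[name] * weights[name] for name in risks) / total_weight


def design_entropy(series_labels):
    """Shannon entropy in bits of the series labels (hypothetical definition)."""
    counts = Counter(series_labels)
    n = len(series_labels)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Made-up risk scores for one compound.
compound_risk = total_risk(
    {"potency": 0.25, "adme": 0.5, "tox": 0.75, "physchem": 0.5}
)

# Made-up windows: the team stays in one series, then explores four series.
focused = design_entropy(["A", "A", "A", "A"])
exploring = design_entropy(["A", "B", "C", "D"])
```

With these numbers, compound_risk is 0.5, the focused window has an entropy of 0.0 bits and the exploring window has 2.0 bits. A jump like this is what a sudden change of chemical series would look like under this assumed definition.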

The paper reports that the LO project team can check the telemetry. It tells the team about bottlenecks and the progress of their project.
The system can also be used for portfolio management.
It is useful for decision making and for motivating the team.
On the other hand, the telemetry provides a vivid description of each project.
What do you think about metrics for lead optimization?

Passport for compounds.

I was interested in the title,
“Compound Passport Service”.
AZ made a passport for compounds in order to track compound rights.

The system can manage the status of compounds, such as ownership, permissions and whether the structure is shared.
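
As a thought experiment, the kind of record such a system keeps could be sketched as a small data class. This is not AZ's design: the class name CompoundPassport, its fields, the grant and allows methods and the example identifiers are all my own assumptions, based only on the three items mentioned above (ownership, permission, structure shared).

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class CompoundPassport:
    """Hypothetical rights record that travels with one compound."""
    compound_id: str
    owner: str
    structure_shared: bool = False
    # Maps a partner organisation to the uses it has been granted.
    permissions: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, partner, use):
        """Record that `partner` may use the compound for `use`."""
        self.permissions.setdefault(partner, set()).add(use)

    def allows(self, organisation, use):
        """The owner may do anything; partners only what was granted."""
        if organisation == self.owner:
            return True
        return use in self.permissions.get(organisation, set())


# Made-up example: OrgA owns the compound and lets OrgB screen it blind.
passport = CompoundPassport(compound_id="CPD-0001", owner="OrgA")
passport.grant("OrgB", "screening")

can_screen = passport.allows("OrgB", "screening")        # True
can_resynthesise = passport.allows("OrgB", "resynthesis")  # False
```

Keeping the rights next to the compound identifier means every transfer or assay request can be checked against the same record.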

I was really impressed with the concept and the system, because I think that managing compound (and rights) logistics is a key factor in drug discovery.

I want to develop a seamless compound logistics system and a tracking system for medicinal chemistry…

The result

I went to my old school the day before yesterday. I like my old school.
An event about chemoinformatics was held there.
I enjoyed the presentations of all the participants.
My prediction result was …….(please don’t ask ;-))

All teams used one of two approaches, SBDD or LBDD, and the winner used an LBDD approach.
It is worth thinking about which method is more effective for VS.
Is it not always true that richer resources produce more effective predictions?
I don’t have the answer yet.

In the contest, two teams used deep learning for the prediction.
An academic team that used DL presented a very cool approach.
Tuning hyperparameters is sometimes problematic in deep learning,
because a lot of parameters have to be optimised and the process is very time consuming.
So they chose a random sampling strategy to optimise the parameters, and they ran the calculation on a supercomputer.
I agree with the strategy.
Bengio reported random search for hyper-parameter optimisation before.
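
Random search itself is simple enough to sketch in a few lines. The following is not the team's code: the search space, the parameter names (learning_rate, n_hidden, dropout) and the toy objective, which stands in for training a network and returning its validation loss, are my own assumptions.

```python
import math
import random


def random_search(objective, space, n_trials, seed=0):
    """Sample each hyperparameter independently and keep the best trial."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: sampler(rng) for name, sampler in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space: each entry draws one value from its range.
space = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-5, -1),  # log-uniform
    "n_hidden": lambda rng: rng.choice([64, 128, 256, 512]),
    "dropout": lambda rng: rng.uniform(0.0, 0.5),
}


def toy_objective(params):
    """Stand-in for a validation loss; lowest near learning_rate = 1e-3."""
    return (math.log10(params["learning_rate"]) + 3) ** 2 + params["dropout"]


best_params, best_score = random_search(toy_objective, space, n_trials=50)
print(best_params, best_score)
```

Each trial is independent of the others, so the loop can be split across many nodes without any coordination, which fits well with running it on a supercomputer.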

Is DL still a hot area in ML? I’ll check some papers.
My snippet was uploaded to GitHub (results and SDF not included).

I thank all the participants for the good discussion, and I thank my family for allowing me to code in my off-time.

Yesterday, I found a Dinosaur !!
Wow ;-)