Tag: diary

Making a Docker image with chemoinformatics toolkits.

Docker (https://www.docker.com) is an open platform for developers and sysadmins working with distributed applications.
Providing a Docker image means that everyone can easily build the same environment.
In other words, you create your own image and share it.
Docker provides containers.
Containers running on a single machine all share the same OS kernel.
That is the difference from a normal VM.
I have used Docker before, but I had never written my own image, so today I wrote a Dockerfile as a test.
First, I set up a Docker environment on my PC using Homebrew.

brew cask install virtualbox
brew install docker
brew install boot2docker

After the installation, I started Docker.
Then I launched a Docker container (Ubuntu).

boot2docker init
boot2docker up
docker run -i -t ubuntu /bin/bash

OK.
Next I wrote a Dockerfile based on Ubuntu.
On recent versions of Ubuntu, RDKit can be installed with apt-get (RDKit 2015.03.1 on wily).
So the Dockerfile is as follows.
The resulting image provides RDKit and some other chemoinformatics packages.

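# Base image; an untagged name resolves to the "latest" tag
FROM ubuntu

MAINTAINER iwatobipen

# Refresh the package lists and upgrade the base system
RUN apt-get update
RUN apt-get -y upgrade
# Python build tools and pip
RUN apt-get -y install python-pip python-dev build-essential
RUN pip install --upgrade pip
RUN apt-get -y install wget
# Imaging and scientific Python stack
RUN apt-get -y install python-pillow
RUN apt-get -y install python-numpy python-scipy python-patsy python-statsmodels
# RDKit with its data and documentation packages
RUN apt-get -y install python-rdkit librdkit1 rdkit-data rdkit-doc
# Machine learning, data analysis, and plotting
RUN apt-get -y install python-scikits-learn python-pandas python-pandas-lib
RUN apt-get -y install python-matplotlib python-matplotlib-data
RUN pip install seaborn
RUN pip install tornado
# Interactive shell
RUN apt-get -y install ipython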

After saving the file, I built the image with the following command.

iwatobipen$ docker build -t iwatobipen/rdkit-ubuntu:0.1 .

Wow, I built a Docker image!
Next I pushed the image to Docker Hub.

iwatobipen$ docker login
..............
Login Succeeded
iwatobipen$ docker push iwatobipen/rdkit-ubuntu
The push refers to a repository [docker.io/iwatobipen/rdkit-ubuntu] (len: 2)
33e1d12d46e0: Pushed 
.......
size: 28540

The result is at the following URL 😉
https://hub.docker.com/r/iwatobipen/rdkit-ubuntu/
OK, let's run the image.

iwatobipen$ docker run -i -t iwatobipen/rdkit-ubuntu /bin/bash
root@c644a9d6f0c6:/# 

I'm in…
Let's use IPython.

root@c644a9d6f0c6:/# ipython
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
Type "copyright", "credits" or "license" for more information.

IPython 1.2.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from rdkit import rdBase

In [2]: rdBase.rdkitVersion
Out[2]: '2013.09.1'

Oops, a mistake…
I had used an old version of Ubuntu… ;-(
So I tried Ubuntu 15.10.

docker run -i -t ubuntu:15.10 /bin/bash

Then I rebuilt the image from the Dockerfile, using “FROM ubuntu:15.10” instead of “FROM ubuntu”.

Then I tagged the image.

iwatobipen$ docker tag c053137912aa iwatobipen/rdkit-ubuntu15
iwatobipen$ docker images
REPOSITORY                  TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
iwatobipen/rdkit-ubuntu15   0.1                 c053137912aa        6 minutes ago       841.3 MB
iwatobipen/rdkit-ubuntu15   latest              c053137912aa        6 minutes ago       841.3 MB
iwatobipen/rdkit-ubuntu     0.1                 33e1d12d46e0        24 hours ago        701.6 MB
iwatobipen/rdkit-ubuntu     latest              33e1d12d46e0        24 hours ago        701.6 MB
<none>                      <none>              0221cd7430fb        24 hours ago        494.8 MB
ubuntu                      latest              e9ae3c220b23        5 days ago          187.9 MB
pacur/ubuntu-wily           latest              f4e4eb1c359e        2 weeks ago         859.9 MB
ubuntu                      15.10               5eb72b199374        3 weeks ago         131.4 MB
iwatobipen$ docker run -i -t iwatobipen/rdkit-ubuntu15
root@f192406f11ba:/# ipython
Python 2.7.10 (default, Oct 14 2015, 16:09:02) 
Type "copyright", "credits" or "license" for more information.

IPython 2.3.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from rdkit import Chem

In [2]: from rdkit import rdBase

In [3]: rdBase.rdkitVersion  
Out[3]: '2015.03.1'
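
The manual version check above could also be scripted, so that a stale base image is caught right after the build. The following is only a minimal sketch: the helper names `parse_rdkit_version` and `is_at_least` are my own for illustration, and it assumes a purely numeric version string. Inside the container the string would come from `rdBase.rdkitVersion`.

```python
def parse_rdkit_version(version):
    """Turn a version string such as '2015.03.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_at_least(version, minimum):
    """Return True if `version` is the same as or newer than `minimum`."""
    return parse_rdkit_version(version) >= parse_rdkit_version(minimum)


if __name__ == "__main__":
    # Inside the container, replace the literals with rdBase.rdkitVersion.
    for found in ("2013.09.1", "2015.03.1"):
        status = "OK" if is_at_least(found, "2015.03.1") else "too old"
        print(found, status)
```

Comparing tuples of integers avoids the pitfalls of comparing version strings character by character.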

Good, so I pushed it to Docker Hub.
The result is here.
https://hub.docker.com/r/iwatobipen/rdkit-ubuntu15/
It’s fun!

Visualizing the process of lead optimization

We sometimes set milestones to manage a portfolio and/or to check the progress of projects.
These data are reported in documents, PowerPoint slides, and so on, so it is difficult to grasp the state of LO in a timely manner.
Researchers at GSK published a solution for visualizing the LO process.
I found it impressive.
The link is here.
http://www.ncbi.nlm.nih.gov/pubmed/26262898

They call it “LO telemetry”; it shows the time course of the total risk of the compounds.
Total risk is calculated from the potency against each target and from the ADME, Tox, and physchem profiles.
Ideally, total risk decreases as the project progresses. But there are a lot of problems in a drug discovery project (at least for me! 😉 ).
Fig. 5 shows one example.
The figure shows the progress of lead optimization together with design entropy (chemical diversity).
Design entropy suddenly increases because of Tox problems. PhysChem property risk also increases slightly.
To avoid a tox problem (adverse effect), chemists consider changing the chemical series or making a drastic change to the structure. This risks a loss of potency, but Fig. 5 shows that their strategy kept the pharmacological risk score low.
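
To make the two quantities concrete, here is a rough sketch in Python. It is not the scoring used in the GSK paper: I simply assume that total risk is the sum of per-category risk scores between 0 and 1, and that design entropy is the Shannon entropy of the chemical-series labels synthesized in one period. The function names and all numbers are made up for illustration.

```python
import math
from collections import Counter


def total_risk(risks):
    """Sum per-category risk scores (assumed to lie in [0, 1]) for one compound."""
    return sum(risks.values())


def design_entropy(series_labels):
    """Shannon entropy in bits of the chemical-series labels made in one period."""
    counts = Counter(series_labels)
    n = len(series_labels)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    compound = {"potency": 0.2, "adme": 0.5, "tox": 0.9, "physchem": 0.3}
    print("total risk:", round(total_risk(compound), 2))
    print("one series :", design_entropy(["A", "A", "A", "A"]))
    print("two series :", design_entropy(["A", "A", "B", "B"]))
```

Under these assumptions, a team that switches from one series to several in response to a tox finding would show up as a jump in entropy, which is the kind of pattern described for Fig. 5.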

The paper reports that the LO project team can check the telemetry, which tells the team about bottlenecks and the progress of their project.
The system can also be used for portfolio management.
It is useful for decision making and for motivating the team.
Beyond that, the telemetry provides a vivid description of each project.
What do you think about metrics for lead optimization?

Passport for compound.

The title caught my interest.
“Compound Passport Service”
http://dx.doi.org/10.1016/j.drudis.2015.06.011
AZ created a passport for compounds in order to track compound rights.

The system manages the status of each compound, such as ownership, permissions, and whether the structure has been shared.
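
As a thought experiment, such a passport could be modelled as a small record per compound. The sketch below is purely hypothetical: the class name, the fields, and the permission logic are my own assumptions and are not taken from the AZ system.

```python
from dataclasses import dataclass, field


@dataclass
class CompoundPassport:
    """Hypothetical record of the rights attached to one compound."""
    compound_id: str
    owner: str
    structure_shared: bool = False
    # Maps a partner name to the set of uses that partner has been granted.
    permissions: dict = field(default_factory=dict)

    def grant(self, partner, use):
        """Record that `partner` may use the compound for `use`."""
        self.permissions.setdefault(partner, set()).add(use)

    def is_allowed(self, partner, use):
        """The owner may do anything; partners need an explicit grant."""
        if partner == self.owner:
            return True
        return use in self.permissions.get(partner, set())


if __name__ == "__main__":
    passport = CompoundPassport(compound_id="CPD-0001", owner="CompanyA")
    passport.grant("PartnerB", "screening")
    print(passport.is_allowed("PartnerB", "screening"))
    print(passport.is_allowed("PartnerB", "resynthesis"))
```

Keeping the rights next to the compound identifier is what would allow a logistics system to check permissions before shipping a sample or sharing a structure.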

I was really impressed by the concept and the system, because I think that managing compound (and rights) logistics is a key factor in drug discovery.

I want to develop a seamless compound logistics system and a tracking system for medicinal chemistry…

The result

The day before yesterday I went to my old school, which I am fond of.
A chemoinformatics event was held there.
I enjoyed the presentations of all the participants.
My prediction result was …… (please don’t ask ;-))

Each team used one of two approaches, SBDD or LBDD, and the winner used an LBDD approach.
It is worth thinking about which method is more effective for VS.
Perhaps richer resources do not always produce more effective predictions?
I don’t have the answer yet.

In the contest, two teams used deep learning for the prediction.
An academic team that used DL presented a very cool approach.
Tuning hyper-parameters is sometimes problematic in deep learning, because a lot of parameters have to be optimised and the process is very time consuming.
So they chose a random sampling strategy to optimise the parameters, and they ran the calculations on a supercomputer.
I agree with the strategy.
Bergstra and Bengio previously reported random search for hyper-parameter optimisation.
http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
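
The idea of random search is simple enough to sketch in a few lines of Python. This is not the team's code: the search space, the parameter names, and the toy objective (a stand-in for a real validation loss) are all assumptions for illustration.

```python
import math
import random

# Hypothetical search space: each entry draws one value from a random generator.
SEARCH_SPACE = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-5, -1),  # log-uniform
    "n_hidden": lambda rng: rng.choice([32, 64, 128, 256, 512]),
}


def toy_objective(params):
    """Stand-in for a validation loss; a real run would train a network here."""
    lr_term = (math.log10(params["learning_rate"]) + 3) ** 2
    size_term = (params["n_hidden"] - 128) ** 2 / 1e4
    return lr_term + size_term


def random_search(objective, space, n_trials, seed=0):
    """Sample every hyper-parameter independently and keep the best trial."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: sampler(rng) for name, sampler in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    params, score = random_search(toy_objective, SEARCH_SPACE, n_trials=50)
    print(params, score)
```

Because the trials are independent of each other, they can be distributed over many nodes, which fits well with running on a supercomputer.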

Is DL still a hot area in ML? I’ll check some papers.
I uploaded my code snippets to GitHub (results and SDF files are not included).
https://github.com/iwatobipen/chemo_info/tree/master/ipab2015

I thank all the participants for the good discussions, and I thank my family for letting me code in my off-time.

Bonus..
Yesterday, I found a dinosaur!!
Wow 😉
[Photo: IMG_1349]

Think about SAR analysis.

I missed the chance to participate in the RDKit UGM because the tickets were sold out. ;-(
I’ll try next year…

SAR analysis is key to drug discovery.
MMPA is one of the major tools, and I like the method because it makes it easy to check the effect of a substituent in a molecule.
But sometimes it is difficult to understand why a parameter changes.
I found an interesting way to analyse SAR using MMPs, called ‘non-additive SAR’.
The link is as follows.
http://pubs.acs.org/doi/abs/10.1021/acs.jcim.5b00018

‘Non-additive’ means that the effect of adding a specific substituent at position A depends on the presence of another substituent at position B.
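
This definition can be written down for a cycle of four compounds: one with neither substituent, one with A only, one with B only, and one with both. The sketch below is my own illustration, not the code from the paper's supporting information; the function name and the pIC50 values are made up.

```python
def nonadditivity(p_none, p_a, p_b, p_ab):
    """Effect of adding A with B present minus the effect with B absent.

    All inputs are activities on a log scale (for example pIC50).
    A value near zero means the two substituents behave additively.
    """
    effect_a_without_b = p_a - p_none
    effect_a_with_b = p_ab - p_b
    return effect_a_with_b - effect_a_without_b


if __name__ == "__main__":
    # Made-up pIC50 values, for illustration only.
    print(nonadditivity(5.0, 5.5, 6.0, 6.5))  # additive case
    print(nonadditivity(5.0, 5.5, 6.0, 8.0))  # A helps much more when B is present
```

In practice the value has to be compared with the experimental uncertainty of the assay before calling a cycle non-additive.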

I run into this situation sometimes.
…Hmm, this part increases activity dramatically when the scaffold has this substituent.

So I think non-additive SAR is useful for med-chem.

The authors describe some examples of non-additive SAR in drug discovery projects.
The source code is available from the supporting information.

I customised the code and applied it to my project.
The results seem interesting. 😉

Importance of communication

I often think about the importance of communication between bench chemists and computational chemists.
Sometimes, in my situation, communication tools are limited because of skills, tool licenses, or other reasons.
For example, computational chemists present results to us in MOE, but bench chemists often use PyMOL, so data conversion is required. Spotfire is also difficult for some chemists because of the data preparation it needs.
As another example, synthetic chemistry is difficult for computational chemists.

If we have a lot of tools for communication, the situation becomes even more problematic, because we have to understand how to use all of them.

Researchers at NIBR have developed a cool platform to solve these problems.
The link is here: http://pubs.acs.org/doi/abs/10.1021/ci500598e
The platform, named FOCUS, manages the design and analysis cycle.

The story of the FOCUS development was interesting to me because they use Subversion and Jenkins for an automated global deployment process.
This suggests that agile development is effective for developing a global system.

The system seems to be user friendly and highly functional.
They not only developed the system but also trained the users. I think the key to success is user training, because it is the most important but also the most difficult step. They did it.

It was an exciting report for me.

Review on kinetics

I found a nice review about the kinetics of drug binding and residence time.
http://www.ncbi.nlm.nih.gov/pubmed/25782745

To improve in vitro and in vivo potency, I sometimes try to obtain SKR (structure-kinetic relationships) for molecular design.
If the only correlation I find is between residence time and lipophilicity or molecular weight, the information is not very useful, because a molecule that is too lipophilic or too heavy is not very drug-like.
So I’m thinking about the best way to use kinetic data.
The review contains some examples of how to use kinetic data for molecular design.
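
As a small worked example of the quantities involved: residence time is the reciprocal of the dissociation rate constant (1/k_off), and the equilibrium dissociation constant is K_D = k_off / k_on. The sketch below also computes a plain Pearson correlation, which could be used to check whether residence time merely tracks lipophilicity. The function names and the compound data are made up for illustration and are not from the review.

```python
import math


def residence_time(k_off):
    """Residence time in seconds for a dissociation rate constant in 1/s."""
    return 1.0 / k_off


def dissociation_constant(k_on, k_off):
    """Equilibrium K_D in M from k_on in 1/(M*s) and k_off in 1/s."""
    return k_off / k_on


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up (logP, k_off) pairs, for illustration only.
    compounds = [(1.5, 1e-1), (2.5, 3e-2), (3.5, 8e-3), (4.5, 1e-3)]
    logp = [c[0] for c in compounds]
    log_tau = [math.log10(residence_time(c[1])) for c in compounds]
    print("r(logP, log residence time) =", round(pearson_r(logp, log_tau), 3))
```

A correlation close to 1 in such a check would be a hint that the longer residence time comes mostly from added lipophilicity rather than from a specific interaction.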

I was interested in Table 1, because it lists a lot of examples that use SPR for kinetic data analysis.
Sometimes I think kinetic analysis using SPR is difficult because of the instability of the target protein or other factors, but there are lots of success stories. Hmm.

SKR is as attractive to me as SAR, but rational molecular design using kinetic data is still a challenging area for me.