Compare chainer example with GPU and without GPU.

Some days ago, @fmkz__ -san posted about chainer.

I had not run the example code yet, so I tried it.
First, I ran it without a GPU.

iwatobipen$ time python
load MNIST dataset
epoch 1
graph generated
train mean loss=0.192379207825, accuracy=0.941316669857
test  mean loss=0.0953161568585, accuracy=0.968600004911
epoch 20
train mean loss=0.00968988090991, accuracy=0.997333335777
test  mean loss=0.0921416912593, accuracy=0.984500007629
save the model
save the optimizer

real	6m33.396s
user	11m35.861s
sys	0m18.857s

Next, I ran it with the GPU.
The only change is adding the option ‘--gpu=0’.

iwatobipen$ time python --gpu=0
load MNIST dataset
epoch 1
graph generated
train mean loss=0.194442151414, accuracy=0.941800003027
test  mean loss=0.0869260625821, accuracy=0.972800006866
epoch 20
train mean loss=0.00370079859973, accuracy=0.998933334351
test  mean loss=0.104757102392, accuracy=0.983700006008
save the model
save the optimizer

real	2m4.095s
user	2m1.759s
sys	0m1.336s

With the GPU the run was about 3 times faster in wall-clock time (6m33s vs. 2m4s), and user CPU time dropped more than 5-fold.
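The ratios can be checked directly from the two `time` outputs above. This is only a minimal sketch: `parse_time` is a helper name I made up for illustration, and it assumes the `NmS.SSSs` format that bash prints.

```python
import re


def parse_time(value):
    """Convert a bash `time` duration such as '6m33.396s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Values copied from the two runs above.
cpu = {"real": "6m33.396s", "user": "11m35.861s"}
gpu = {"real": "2m4.095s", "user": "2m1.759s"}

# Ratio of CPU-only time to GPU time for each measurement.
speedup = {key: parse_time(cpu[key]) / parse_time(gpu[key]) for key in cpu}

for key, ratio in speedup.items():
    print(f"{key}: {ratio:.2f}x")
```

The "real" ratio is the one that matters for waiting time; the "user" ratio mostly shows how much work moved off the CPU.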

Hit triage, case of MTase

I often see the keyword “PAINS” in the literature.

Frequent hitters in an HTS deck are problematic, and a PAINS filter is useful for removing them, because false positives cost time and money.

Researchers at Lilly reported a nice letter in J. Med. Chem. Lett.

The authors analysed their HTS campaigns against methyltransferases (MTases).

They screened 9 MTases, with NHR and PDE as counter-screens.

I was interested in Table 3, which lists promiscuous MTase scaffolds.

I thought that some of the scaffolds mimic SAM, and that these scaffolds often show up in kinase inhibitors. At first I also thought they looked like a good starting point for SAR. But I was wrong….

Some compounds showed sub-micromolar IC50 values, but ITC experiments could not detect any heat. The authors also ran HDX experiments and got the same result (no protein exchange).

For me it was worth learning that ITC / HDX / SPR or related biochemical assays are useful for picking out real binders.

And the authors concluded with the following message.

We hope that sharing our learning from this challenging target class highlights the need for robust confirmation of actives ……… to avoid wasting further resources on likely false positives.

I like the sentence.

Sharing knowledge at the precompetitive stage helps us overcome difficulties. ;-)

Make Docker image File with Chemoinformatics toolkits.

Docker is an open platform for distributed applications for developers and sysadmins.
Providing a Docker image means that everyone can easily build the same environment.
In other words, you can create your own image and share it.
Docker provides containers.
Containers running on a single machine all share the same OS kernel.
That is the difference from a normal VM.
I had used Docker before, but I had never written my own image, so today I wrote a Dockerfile as a test.
First, I set up a Docker environment on my PC using Homebrew.

brew cask install virtualbox
brew install docker
brew install boot2docker

After installation, I started up Docker.
Then I started a Docker instance (ubuntu).

boot2docker init
boot2docker up
docker run -i -t ubuntu /bin/bash

Next I wrote a Dockerfile based on ubuntu.
In recent versions of ubuntu, users can install RDKit with the apt-get command (2015.03.1 for wily).
So the Dockerfile is as follows.
The resulting image provides RDKit and some chemoinformatics apps.

FROM ubuntu

MAINTAINER iwatobipen

RUN apt-get update
RUN apt-get -y upgrade
RUN apt-get -y install python-pip python-dev build-essential
RUN pip install --upgrade pip
RUN apt-get -y install wget
RUN apt-get -y install python-pillow
RUN apt-get -y install python-numpy python-scipy python-patsy python-statsmodels
RUN apt-get -y install python-rdkit librdkit1 rdkit-data rdkit-doc
RUN apt-get -y install python-scikits-learn python-pandas python-pandas-lib
RUN apt-get -y install python-matplotlib python-matplotlib-data
RUN pip install seaborn
RUN pip install tornado
RUN apt-get -y install ipython

After saving the file, I built the image using the following command.

iwatobipen$ docker build -t iwatobipen/rdkit-ubuntu:0.1 .

Wow, I could build a Docker image!
Next I pushed the image to Docker Hub.

iwatobipen$ docker login
Login Succeeded
iwatobipen$ docker push iwatobipen/rdkit-ubuntu
The push refers to a repository [] (len: 2)
33e1d12d46e0: Pushed 
size: 28540

The result is at the following URL ;-)
OK, now run the image.

iwatobipen$ docker run -i -t iwatobipen/rdkit-ubuntu /bin/bash

I'm logged in…
Now start ipython.

root@c644a9d6f0c6:/# ipython
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
Type "copyright", "credits" or "license" for more information.

IPython 1.2.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from rdkit import rdBase

In [2]: rdBase.rdkitVersion
Out[2]: '2013.09.1'

Oops! A mistake…
I used an old version of ubuntu… ;-(

docker run -i -t ubuntu:15.10 /bin/bash

Then I rebuilt the image from the Dockerfile, using “ubuntu:15.10” as the base image instead of “ubuntu”.

Then I tagged the image.

iwatobipen$ docker tag c053137912aa iwatobipen/rdkit-ubuntu15
iwatobipen$ docker images
REPOSITORY                  TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
iwatobipen/rdkit-ubuntu15   0.1                 c053137912aa        6 minutes ago       841.3 MB
iwatobipen/rdkit-ubuntu15   latest              c053137912aa        6 minutes ago       841.3 MB
iwatobipen/rdkit-ubuntu     0.1                 33e1d12d46e0        24 hours ago        701.6 MB
iwatobipen/rdkit-ubuntu     latest              33e1d12d46e0        24 hours ago        701.6 MB
<none>                      <none>              0221cd7430fb        24 hours ago        494.8 MB
ubuntu                      latest              e9ae3c220b23        5 days ago          187.9 MB
pacur/ubuntu-wily           latest              f4e4eb1c359e        2 weeks ago         859.9 MB
ubuntu                      15.10               5eb72b199374        3 weeks ago         131.4 MB
iwatobipen$ docker run -i -t iwatobipen/rdkit-ubuntu15
root@f192406f11ba:/# ipython
Python 2.7.10 (default, Oct 14 2015, 16:09:02) 
Type "copyright", "credits" or "license" for more information.

IPython 2.3.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from rdkit import Chem

In [2]: from rdkit import rdBase

In [3]: rdBase.rdkitVersion  
Out[3]: '2015.03.1'
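The mistake above (an image that quietly shipped RDKit 2013.09.1) could be caught with a small check run inside the container. This is just a sketch under my own assumptions: `parse_rdkit_version` and `is_recent_enough` are hypothetical helper names, and the 2015.03.1 minimum is simply the release seen in this post. The rdkit import is guarded so the script also runs where RDKit is not installed.

```python
import re


def parse_rdkit_version(version):
    """Turn a release string such as '2015.03.1' into a tuple like (2015, 3, 1)."""
    # Keep only the first three numeric fields so suffixes do not break parsing.
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def is_recent_enough(version, minimum="2015.03.1"):
    """Return True when `version` is at least `minimum` (assumed threshold)."""
    return parse_rdkit_version(version) >= parse_rdkit_version(minimum)


try:
    from rdkit import rdBase  # present only where RDKit is installed
    installed = rdBase.rdkitVersion
except ImportError:
    installed = None

if installed is None:
    print("RDKit is not installed in this environment")
else:
    print(installed, "OK" if is_recent_enough(installed) else "too old")
```

Running this right after `docker build` would have flagged the first image before I pushed it.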

Good, so I pushed it to Docker Hub.
The result is here.
It’s fun!

Set up GPU for chainer on OS X El Capitan.

I enjoyed the UGM held on Nov. 5 and 6, and I hope the other participants enjoyed the meeting too.

One of the topics of the meeting was “Deep Learning”(???).
I introduced a Python library for deep learning called ‘chainer’.

I presented my attempt to build a QSAR model using chainer, but the results were ….(Hahaha)
Chainer is a very powerful library, but I had one problem using it.
The problem was that chainer could not use the GPU in my environment (OS X El Capitan).
I searched Google and asked a developer how to solve it.
And now, I have solved the problem!
It depends on the OS version.
The new Mac OS, El Capitan, does not let you override DYLD_LIBRARY_PATH!
So when I added the CUDA DYLD_LIBRARY_PATH setting to .bashrc, it was not picked up!
The solution to this problem was very simple.
Make symbolic links to all of the CUDA libraries in /usr/local/lib.
So type the following command in a terminal (I used bash).

iwatobipen$ ln -s /usr/local/cuda/lib/* /usr/local/lib

That’s all.
It took a long time to solve…
Now I will try to build a QSAR model using DL with the GPU.
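The `ln -s` step above can also be sketched in Python, which makes it easier to skip names that already exist in the destination. This is only an illustration: `link_all` is a hypothetical helper name, and like the shell glob it ignores dotfiles.

```python
import os


def link_all(src_dir, dst_dir):
    """Symlink every visible entry of src_dir into dst_dir.

    Existing names in dst_dir are left untouched. Returns the links created.
    """
    created = []
    for name in sorted(os.listdir(src_dir)):
        if name.startswith("."):
            continue  # the shell glob `*` does not match dotfiles either
        target = os.path.join(src_dir, name)
        link = os.path.join(dst_dir, name)
        if os.path.lexists(link):
            continue  # never overwrite an existing file or link
        os.symlink(target, link)
        created.append(link)
    return created
```

For the case in this post the call would be `link_all("/usr/local/cuda/lib", "/usr/local/lib")`, run with whatever permissions /usr/local/lib requires on your machine.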

A note for Mac users:
CUDA can be installed using Homebrew (a very simple way to install CUDA).
Chainer and chainer-cuda-deps can be installed using pip.
So I recommend using pyenv to build your own environment.

Mishima.syk will be held on 12 Dec.
Don’t miss it ! ;-)

fish for drug discovery

Some years ago, I became interested in phenotypic screening using zebrafish.
In drug discovery projects, we use animal models (often rodents) to estimate efficacy, safety, etc.
Knock-out/knock-in mice are also used for target validation.
This is still a high-cost, low-throughput step.

So I think zebrafish are an interesting tool for checking efficacy and safety, or for target validation.

Calum A. MacRae et al. reported an exciting review in Nat. Rev. Drug Discov.

They described the advantages of screening in zebrafish.
1st, Broad range of accessible biology.
2nd, Early insight into toxicity.
They listed lots of examples of published zebrafish screens in Table 1.
It is also interesting that physiology and metabolism are well conserved.
Screening is often run in 96-well plates, which indicates that screening throughput is high.

Some success stories are described; one UCB project has gone into Phase II trials.
Not only cancer but also other diseases are targeted with this screening system.

Recently, genome editing technologies such as TALENs and CRISPR-Cas have enabled rapid access to knock-in / knock-out lines.
I think it’s a new era of drug discovery.

I don’t have any experience with zebrafish screening.
So I wonder about the solubility requirements for a screening library, because the fish live in aqueous media.
How about DMSO tolerance?
I’ll read some reports about this kind of screening.

make 3d PCA plot

I often use PCA (principal component analysis) to reduce dimensionality.
I do PCA using Python sklearn or R.

The basic R function “biplot” makes a 2D chart.
It’s an easy way to make a biplot.

Today I found a cool R library named “pca3d”.
Installation is easy! Just type the following command.


Now make a chart.
I used the iris data set as a test.

> library(pca3d)
> data(iris)
> pca <- prcomp(iris[,-5], scale.=TRUE)
> pca3d(pca, group=iris[,5], biplot=TRUE)
[1] 0.06599283 0.05354630 0.02004088
Creating new device

When I entered the pca3d command, XQuartz launched and I got a 3D biplot.
The chart can be rotated interactively.
(Screenshot: the 3D biplot of the iris data)

Hmm, a 3D chart is useful for checking the distribution of data points, but… I like the 2D biplot. ;-)

If you are interested in the library, please check the following site.

Metrics for Drug likeness

There are several indexes of drug-likeness.
Of course, I sometimes use LE and LLE for lead optimization.
More recently, QED has appeared as a new metric for estimating drug-likeness.
Which parameters do you like?
I read the following paper, reported by Patrick Barton et al.
It was interesting to me.

The authors proposed a new index named “AEI”, the ADMET efficiency index.
AEI is defined by the following formula.
AEI = (pActivity - |logP|) / PSA x 100

The key difference between AEI and LE is that AEI takes into account not only lipophilicity but also polarity (PSA).

The concept is based on the 75/3 rule.
I think the proposal is useful; Fig. 2 shows that BCS class II/III compounds are cleanly separated using logP and PSA.

The authors proposed a compound classification system using AEI and provided a decision tree (Fig. 8).

Compounds falling into class A have AEI>7 with an AEI ranging from 7 to 31 and a mean of 9.9. For this class the median daily dose is 25 mg with 93% of this class having a total daily dose of <300 mg. Compounds within the range 4 < AEI 300 mg. Class C compounds where AEI < 4 are the most concerning representing a relatively poor LLE and ADMET profile.
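To get a feel for the numbers, here is a rough sketch of the calculation. It rests on my own assumptions: I read the formula as (pActivity - |logP|) / PSA x 100, I treat everything between the class A (AEI > 7) and class C (AEI < 4) cut-offs as class B, and the three compounds are made-up examples, not data from the paper.

```python
def aei(p_activity, log_p, psa):
    """ADMET efficiency index, read as (pActivity - |logP|) / PSA * 100."""
    if psa <= 0:
        raise ValueError("PSA must be positive")
    return 100.0 * (p_activity - abs(log_p)) / psa


def aei_class(value):
    """Map an AEI value to class A, B or C.

    A (> 7) and C (< 4) follow the quoted text; treating everything in
    between as class B is an assumption.
    """
    if value > 7:
        return "A"
    if value < 4:
        return "C"
    return "B"


# Hypothetical compounds: (name, pActivity, logP, PSA)
compounds = [
    ("cpd-1", 9.0, 2.0, 70.0),
    ("cpd-2", 7.5, 3.5, 80.0),
    ("cpd-3", 6.0, 4.5, 60.0),
]

for name, p_activity, log_p, psa in compounds:
    value = aei(p_activity, log_p, psa)
    print(f"{name}: AEI={value:.1f} class={aei_class(value)}")
```

Because |logP| is subtracted and PSA is the divisor, a potent but greasy compound with low polarity can still land in class C.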

The tree helps chemists understand a compound's profile.
I’ll try calculating AEI for our in-house dataset.