Machine learning workflow tool for non-programmers #memo #machinelearning #dss

I’m on summer vacation. This summer is very hot and humid, so running is tough for me. ;-( And now a very big typhoon is coming to Japan. Oops…

Let’s leave that aside for now.

Today I would like to introduce a very cool tool for machine learning. These days we can use many machine learning packages from Python. Of course I use them too, for example scikit-learn, TensorFlow, PyTorch and Keras.

They are good tools for programmers, but they are a little bit difficult for non-programmers. Today I found a very interesting machine learning tool named Data Science Studio (DSS).

DSS is developed by Dataiku, a very new company founded in 2013. DSS is collaborative data science software. I watched a demo on YouTube and liked it, so I wanted to try it. Fortunately DSS can be used for free. There are several installation options, and I chose Docker.

It was very easy to build the environment. Just type the following two lines. ;-)

docker pull dataiku/dss
docker run -p 10000:10000 -d dataiku/dss

Now the DSS container is built and running. I can access it at localhost:10000.
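The container can take a little while to start, so it may help to check that the port is open before opening the browser. The following is only a minimal sketch: `port_is_open` is a helper name I made up for illustration, and it only checks that something accepts TCP connections on the port published by the docker run command, not that DSS itself has finished starting.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution errors.
        return False


if __name__ == "__main__":
    # 10000 is the port mapped with "-p 10000:10000" above.
    if port_is_open("localhost", 10000):
        print("Something is listening on localhost:10000")
    else:
        print("Nothing is listening on localhost:10000 yet")
```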

I made a test project named boston, which is a machine learning project using the Boston dataset from sklearn.

After uploading the sample data, I could look at the data and build a regression model very easily. The following screenshot shows the script view. It is easy to filter or remove data: just click a column header and select an action.

It is also very easy to make a chart: just select a chart style and drag and drop the columns you want to visualize. The following example is a scatter plot of TAX vs LSTAT.

Model building is easy! Just switch on or off the panel for each algorithm you want to use. Of course it is also easy to set up many combinations of hyperparameters.
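For comparison, this is roughly what "many combinations of hyperparameters" looks like when written by hand. It is a minimal sketch in plain Python and not how DSS does it internally: the parameter names and values are hypothetical, chosen to look like a gradient boosting setup, and `expand_grid` is a helper I made up for illustration.

```python
from itertools import product

# Hypothetical search space; names and values are illustrative only.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [2, 3, 5],
    "learning_rate": [0.05, 0.1],
}


def expand_grid(grid):
    """Return one dict per combination of the listed values."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


combinations = expand_grid(param_grid)
print(len(combinations), "candidate settings")  # 2 * 3 * 2 = 12
print(combinations[0])
```

Each dict in `combinations` is one setting that would be trained and scored, which is why the number of training runs grows quickly as more values are added.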

Now everything is ready, so just click the train button to start training.

DSS tracks every training run, so after training I could access all the results.

The following screenshots show the decision trees of the gradient boosting model and the variable importance.

DSS has many algorithms by default, but it is also easy to implement your own.

The machine learning workflow is data preparation, data analysis, building and optimizing a model, and testing the model. With DSS it is very easy to build this workflow.

DSS doesn’t include chemoinformatics tools such as RDKit or Open Babel, but I think they can be installed on the DSS server.

Recently the progress of machine learning has been very fast. I have to think about which is more efficient: coding by myself or using a workflow toolkit.

Coding is very interesting for me, but it takes more time….

Make virtual machine for chemoinformatics #RDKit

Recently the stable version of Docker for Mac was released. It’s good news for me. ;-)
https://www.docker.com/products/docker
I used boot2docker before, but now I have switched to Docker for Mac because it’s easy to install and easy to share files with the host OS.
Of course, I installed Docker for Mac!

A virtual machine is a useful way to test my code and keep my native environment clean.
Today I wrote a sample Dockerfile for chemoinformatics.
The virtual machine can run RDKit and Keras, which means it can do deep learning with RDKit.
My Dockerfile is here:
https://github.com/iwatobipen/docker4chmoinfo/blob/master/docker4chemoinfo/Dockerfile
In a Linux environment I can use a new version of RDKit.

FROM ubuntu:16.04 
MAINTAINER iwatobipen <seritaka@gmail.com>
RUN apt-get -y update && apt-get -y install wget
RUN apt-get -y install bzip2
RUN apt-get -y install git-all
RUN apt-get -y install libfreetype6-dev libxft-dev
RUN wget http://repo.continuum.io/archive/Anaconda3-4.1.1-Linux-x86_64.sh
RUN bash ./Anaconda3-4.1.1-Linux-x86_64.sh -b -p /opt/conda 
RUN rm ./Anaconda3-4.1.1-Linux-x86_64.sh
ENV PATH /bin:/usr/bin:/opt/conda/bin:$PATH
RUN conda install -y -c rdkit rdkit=2016.03.3
RUN pip install seaborn
RUN conda install -y -c conda-forge keras=1.0.7
CMD /bin/bash

To run the image, use the docker run command.

 docker run -i -t iwatobipen/chemoinfo_test /bin/bash

OK, let’s check the VM.

root@b33953263220:/# ipython
Python 3.5.2 |Anaconda 4.1.1 (64-bit)| (default, Jul  2 2016, 17:53:06) 
Type "copyright", "credits" or "license" for more information.

IPython 4.2.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from rdkit import Chem
In [2]: from rdkit import rdBase  
In [3]: rdBase.rdkitVersion
Out[3]: '2016.03.3'

In [4]: import keras
Using Theano backend.

In [5]: 

Hmm, my VM works well.
Anyone who runs this image is already set up to do deep learning.
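The same check can be scripted instead of typed into ipython. This is a minimal sketch: `report_versions` is a helper I made up for illustration, and it assumes each package exposes a `__version__` attribute, falling back to "unknown" when it does not.

```python
import importlib


def report_versions(module_names):
    """Map each module name to its version string, or None if the import fails."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Not every module defines __version__, so fall back to a placeholder.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    # The packages installed by the Dockerfile above.
    for name, version in report_versions(["rdkit", "keras", "seaborn"]).items():
        print(name, version if version is not None else "NOT INSTALLED")
```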
My Docker Hub repo is here:
https://hub.docker.com/r/iwatobipen/chemoinfo_test/

Enjoy.