A machine learning workflow tool for non-programmers #memo #machinelearning #dss

I’m on summer vacation. This summer is very hot and humid, so running is tough for me. ;-( And now a very big typhoon is coming to Japan. Oops…

Let’s leave that aside for now.

Today I would like to introduce a very cool tool for machine learning. These days we can use many machine learning packages with Python. Of course I use them too, for example scikit-learn, TensorFlow, PyTorch, and Keras.

They are good tools for programmers, but they are a little bit difficult for non-programmers. Today I found a very interesting machine learning tool named Data Science Studio (DSS).

DSS is developed by Dataiku, a very new company founded in 2013. DSS is collaborative data science software. I saw a demo on YouTube and liked it, so I wanted to try it. Fortunately DSS can be used for free. Users can choose from several installation options; I chose Docker.

It was very easy to build the environment. Just type the following 2 lines. ;-)

docker pull dataiku/dss
docker run -p 10000:10000 -d dataiku/dss

Now the DSS container is built and running. I can access it at localhost:10000.
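If the page does not open right away, a quick check from Python can tell whether anything is answering on that port yet. This is only a minimal sketch: the function name `dss_is_reachable` is my own, and it only assumes the port mapping from the `docker run` line above. It does not use any DSS API; it just asks whether an HTTP server responds.

```python
import urllib.error
import urllib.request


def dss_is_reachable(url="http://localhost:10000", timeout=3.0):
    """Return True if any HTTP server answers at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status code.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and so on.
        return False


if __name__ == "__main__":
    print("DSS reachable:", dss_is_reachable())
```

The container may need some time to start, so a False result right after `docker run` does not necessarily mean something is broken.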

I made a test project named boston, which is a machine learning project with the Boston dataset from sklearn.

After uploading the sample data, I could view the data and make a regression model very easily. The following screenshot is the view of the script. It is easy to filter or remove data: just click a column header and select an action.

It is also very easy to make a chart: just select a chart style and drag and drop the columns you want to visualize. The following example is a scatter plot of TAX vs LSTAT.

Model building is easy! Just switch the panel on or off for each algorithm you want to use. Of course it is also easy to set up many combinations of hyperparameters.
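For comparison, here is roughly what "many combinations of hyperparameters" means if I write it by hand in Python. This is just a sketch: the parameter names and values in `param_grid` are hypothetical examples and not what DSS uses internally, and `expand_grid` is my own helper.

```python
from itertools import product

# Hypothetical search space; the names mirror common gradient boosting settings.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [3, 5, 8],
    "learning_rate": [0.05, 0.1],
}


def expand_grid(grid):
    """Return every combination of the listed values as a list of dicts."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


combinations = expand_grid(param_grid)
print(len(combinations))  # 2 * 3 * 2 = 12
print(combinations[0])
```

Every extra value multiplies the number of models to train, so the grid grows quickly.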

Now everything is ready, so just click the train button to start training.

DSS tracks all training results, and after training I could access all of them.

The following screenshots show the decision trees of Gradient Boosting and the variable importance.

DSS has many algorithms by default, but it is also easy to implement your own algorithms.

A machine learning workflow consists of data preparation, data analysis, building and optimizing a model, and testing the model. DSS makes it very easy to build this workflow.
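To compare with coding by myself, the same workflow can be sketched with scikit-learn. This is only an illustration under some assumptions: scikit-learn is installed, synthetic data from `make_regression` stands in for the uploaded dataset, and the small parameter grid is a made-up example. It is not what DSS runs internally.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# 1. Data preparation: synthetic data (an assumption) split into train and test sets.
X, y = make_regression(
    n_samples=300, n_features=5, n_informative=3, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 2. Build and optimize the model with a small cross-validated grid search.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    cv=3,
)
search.fit(X_train, y_train)

# 3. Test the best model on held-out data and inspect variable importance.
best_model = search.best_estimator_
test_r2 = r2_score(y_test, best_model.predict(X_test))
importances = best_model.feature_importances_

print("best params:", search.best_params_)
print("test R^2:", round(test_r2, 3))
print("importances:", importances)
```

It is not much code, but every step that is one click in DSS becomes a few lines that I have to write and maintain myself.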

The tool doesn’t have chemoinformatics tools such as RDKit, Open Babel, etc., but I think they can be installed on the DSS server.

Recently the progress of machine learning has been very fast. I have to think about which is more efficient: coding by myself or using a workflow toolkit.

Coding is very interesting for me, but it takes more time….