New ML package for cheminformatics #cheminformatics #QSAR #ML

I introduced scikit-mol in a previous blog post. The package integrates scikit-learn and RDKit. It is easy to use because users can build QSAR models through scikit-learn’s API. I like the package. Recently I found another useful cheminformatics package named ‘molflux’, which is developed by researchers at Exscientia, a well-known AI drug discovery company. molflux…

New type of python notebook #marimo #cheminformatics #RDKit

JupyterLab, Jupyter Notebook, Streamlit and other packages are useful for data science because they let you analyze and visualize data step by step. I like Streamlit and Dash for making simple web apps. Some days ago I found a new and cool package named marimo. From the documentation, marimo is an open-source reactive notebook for Python: reproducible,…

Plot calibration curve with scikit-learn 1.0 #chemoinformatics #scikit-learn #memo

Recently scikit-learn ver. 1.0 (nightly build) was released. I often use sklearn for my tasks, so I would like to try the new version ;) The current stable version is 0.24, so I installed 1.0.rc2 via pip. Here are the release highlights and notes. https://scikit-learn.org/dev/auto_examples/release_highlights/plot_release_highlights_1_0_0.html https://scikit-learn.org/dev/whats_new/v1.0.html#changes-1-0 Ver. 1.0 has CalibrationDisplay, which can make a calibration-curve plot easily. So I…

Package for ML task management #chemoinformatics #memo #machine_learning #RDKit

Now we can build lots of predictive models rapidly with useful ML tools such as Keras, PyTorch, scikit-learn, LightGBM, etc. The problem for me is how to manage these experimental results. I posted about this topic previously, using MLflow and Optuna as examples. These tools have different features but both are very…

Useful ML tool for chemoinformatics #chemoinformatics #RDKit #Machine learning

Yesterday, I moved my main PC from Ubuntu 18.04 to 20.04 LTS. Now it works well, and I’m building a new (clean) environment for my coding. Today I would like to share a useful machine learning package named PyCaret. A brief introduction to PyCaret is below. (From the original site) PyCaret is an open-source, low-code machine learning library in Python that automates…

Tool for machine learning model logging #chemoinformatics #machine_learning

It is difficult to manage machine learning models because obtaining a good model requires many trials, called parameter optimization, and these trials generate lots of models. Optuna is one of the useful packages for model management and parameter optimization. I like it and have posted some code about Optuna. So today, I would like…

Optimize ML model with optuna and visualize the result with MLFlow #informatics #machine learning

As you know, Optuna is a very useful and powerful package for machine learning. I often use it in my own tasks. MLflow is also a useful package, and I posted about it before. MLflow has many functions for visualizing experiment results and managing models. https://iwatobipen.wordpress.com/2018/11/14/tracking-progress-of-machine-learning-machinelearning/ I think it will be useful if models can be…

Conformal prediction with python and rdkit_2 #RDKit #QSAR #Conformal_prediction

I posted about conformal prediction with Python and RDKit some days ago. After that I got very informative advice from @kjelljorner. Thanks a lot! His advice is below. Kjell Jorner (@kjelljorner), replying to @iwatobipen: I can recommend the cross conformal prediction or bootstrapped conformal prediction (also in nonconformist) to avoid having to put aside data for calibration.…

Predict probabilistic distribution with NGBoost #NGBoost #RDKit #QSAR #Chemoinformatics

Recently a novel gradient boosting method was published by the Andrew Ng group. It is interesting that NGBoost can output not only a point prediction but also a probability distribution over the outcome. This is useful for QSAR because we would like to know not only the predicted value/class but also the uncertainty of the prediction. Fortunately, NGBoost is available from Python! It can be…

New molecular fingerprint for chemoinformatics #map4 #RDKit #memo #chemoinformatics

A molecular fingerprint (FP) is very important for chemoinformatics because it is used for building many predictive models, not only for ADMET but also for biological activities. As you know, ECFP (Morgan fingerprint) is one of the gold-standard FPs of chemoinformatics because it shows stable performance on a wide range of problems. Since ECFP was reported, many new fingerprint algorithms…

Model interpretation with new drawing code of RDKit #RDKit #Machine learning #chemoinformatics

The following code does not use the new drawing code, but it revises one of my old posts. :) I think everyone who visits my blog has already read Greg’s nice blog post about drawing similarity maps with the new code. If you haven’t read it, I recommend reading it soon. The URL is below. ;) http://rdkit.blogspot.com/2020/01/similarity-maps-with-new-drawing-code.html Now…

Python package for Ensemble learning #Chemoinformatics #Scikit learn

Ensemble learning is a technique for machine learning. I wrote a post about blending before. The URL is below. https://iwatobipen.wordpress.com/2018/11/11/ensemble-learning-with-scikit-learn-and-xgboost-machine-learning/ I implemented the code myself at that time. Ensemble learning sometimes outperforms a single model, so it is worth trying. Fortunately, we can now use ensemble learning very easily by using a…

Python package of machine learning for imbalanced data #machine_learning #chemoinformatics

Recently I’ve been struggling with imbalanced data. I didn’t have any idea how to handle it, so my predictive model showed poor performance. Some days ago, I found a useful package for learning from imbalanced data, named ‘imbalanced-learn’. It can be installed from conda. The package provides methods for oversampling and undersampling. I had…

Machine learning workflow tool for non-programmers #memo #machinelearning #dss

I’m on summer vacation. This summer is hot and humid, so it is tough for me to go running. ;-( And now a very big typhoon is coming to Japan. Oops… Let’s leave that aside for now. Today I would like to introduce a very cool tool for machine learning. Recently we can use many machine learning…

Make original sklearn classifier-2 #sklearn #chemoinfo

After posting ‘Make original sklearn classifier’, I got comments from my followers @yamasaKit_-san and @kzfm-san. (Thanks!) So I checked the diversity of the models with principal component analysis (PCA). The example is almost the same as yesterday’s but a little different in the last part. The last part of my code is below. Extract feature importances from L1 layer classifiers and mono-random…

Tracking progress of machine learning #MachineLearning

To conduct machine learning, hyperparameters need to be optimized. For example, scikit-learn provides a grid search method. And as you know, there are several packages for this, such as hyperopt and GPyOpt. How do you manage the models you build? It is difficult for me. Recently I have been interested in MLflow. MLflow is an…

Ensemble learning with scikit-learn and XGBoost #machine learning

I often post about deep learning topics. But today I would like to post about ensemble learning. There are lots of documents describing ensemble learning, and I think the following document is very informative. Kaggle Ensembling Guide I am interested in one of the methods, named ‘blending’. According to the above URL, the merit of ‘blending’…