Useful ML package for cheminformatics #RDKit #cheminformatics #ML

As many readers know, scikit-learn is one of the most useful Python packages for cheminformatics. However, to use scikit-learn in cheminformatics tasks, users need to prepare data with other packages because scikit-learn doesn't support chemical data handling. So wouldn't it be nice if you could use chemical data through the scikit-learn API? I think so. Fortunately, there is…
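The excerpt stops before naming the package, so the following is not that package. It is a minimal sketch of the general idea under my own assumptions: a hypothetical `MorganFingerprinter` class wraps RDKit Morgan fingerprints in a scikit-learn-compatible transformer, so SMILES strings can go straight into a `Pipeline`. The class name, the radius/bit-size defaults and the toy SMILES/labels are illustrative only.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline


class MorganFingerprinter(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: list of SMILES -> Morgan fingerprint bit matrix."""

    def __init__(self, radius=2, n_bits=2048):
        self.radius = radius
        self.n_bits = n_bits

    def fit(self, X, y=None):
        # Nothing to learn; fingerprints depend only on each molecule.
        return self

    def transform(self, X):
        gen = rdFingerprintGenerator.GetMorganGenerator(radius=self.radius, fpSize=self.n_bits)
        out = np.zeros((len(X), self.n_bits), dtype=np.int8)
        for i, smi in enumerate(X):
            mol = Chem.MolFromSmiles(smi)
            if mol is None:
                raise ValueError(f"could not parse SMILES: {smi}")
            arr = np.zeros((0,), dtype=np.int8)  # resized by RDKit
            DataStructs.ConvertToNumpyArray(gen.GetFingerprint(mol), arr)
            out[i] = arr
        return out


# Toy data (illustrative labels, not real activities).
smiles = ["CCO", "CCN", "c1ccccc1", "c1ccccc1O", "CC(=O)O", "CCCC"]
labels = [0, 0, 1, 1, 0, 0]

model = make_pipeline(
    MorganFingerprinter(radius=2, n_bits=1024),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(smiles, labels)  # SMILES go in directly
pred = model.predict(["c1ccccc1C"])
print(pred)
```

Because the transformer subclasses `BaseEstimator`, its `radius` and `n_bits` can be tuned with `GridSearchCV` like any other pipeline parameter.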

Update scikit-learn and visualize chemical space with RDKit #RDKit #Chemoinformatics #Scikitlearn

As most readers know, a new version of scikit-learn has been released ;) Lots of improvements have been implemented, and you can see the details in the original documentation: https://scikit-learn.org/stable/whats_new/v1.3.html

One interesting piece of news is that scikit-learn v1.3 has implemented HDBSCAN. The original article is linked below, and the method has also been implemented as an independent Python package.

Article: https://link.springer.com/chapter/10.1007/978-3-642-37456-2_14
Package: https://github.com/scikit-learn-contrib/hdbscan

Probabilistic Random Forest approach to predict experimental value #RDKit #chemoinformatics #machine_learning

To build a predictive model, input values (X) and target values (y) are required. But in drug discovery, target values always carry experimental error, so any experimental (target) value has some uncertainty, and that makes it difficult to build a predictive model. Recently, Ola Engkvist's group at AZ published an interesting article in the Journal of Cheminformatics.
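To make "uncertainty in the target value" concrete, here is a minimal sketch under assumptions that are mine, not the paper's: the measured pXC50 is treated as the true value plus Gaussian noise with a hypothetical standard deviation `sigma`, which turns each measurement into a probability of being active above a hypothetical `threshold`. This is not the Probabilistic Random Forest algorithm itself. As a simpler stand-in, the soft labels are fed to an ordinary `RandomForestClassifier` by duplicating each compound as active (weight p) and inactive (weight 1 - p).

```python
import math

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def prob_active(measured, threshold=6.0, sigma=0.5):
    """P(true value > threshold), assuming Gaussian experimental error (std sigma)."""
    z = (measured - threshold) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def soft_label_dataset(X, y_measured, threshold=6.0, sigma=0.5):
    """Duplicate each sample: label 1 with weight p, label 0 with weight 1 - p."""
    p = np.array([prob_active(v, threshold, sigma) for v in y_measured])
    X2 = np.vstack([X, X])
    y2 = np.concatenate([np.ones(len(X), dtype=int), np.zeros(len(X), dtype=int)])
    w2 = np.concatenate([p, 1.0 - p])
    return X2, y2, w2


# Synthetic descriptors and noisy "measured" pXC50 values (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y_measured = 6.0 + X[:, 0] + rng.normal(scale=0.3, size=100)

X2, y2, w2 = soft_label_dataset(X, y_measured)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X2, y2, sample_weight=w2)
proba_active = clf.predict_proba(X[:5])[:, 1]  # column 1 = class "active"
print(proba_active)
```

Compounds measured right at the threshold get p = 0.5, so they no longer count as confidently active or inactive. That is the intuition behind taking experimental error into account.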

Make original sklearn classifier-2 #sklearn #chemoinfo

After I posted ‘Make original sklearn classifier’, I got comments from my followers @yamasaKit_-san and @kzfm-san (thanks!). So I checked the diversity of the models with principal component analysis (PCA). The example is almost the same as yesterday's but a little bit different in the last part. The last part of my code is below. Extract feature importances from L1 layer classifiers and mono-random…
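The excerpt is cut off before the code, so this is only a minimal sketch of the idea, not the original script: fit several first-layer classifiers, stack their `feature_importances_` vectors into a matrix (one row per model), and project it with PCA. Models that land close together rely on similar features and add little diversity. The synthetic dataset and the choice of random forests, extra trees and gradient boosting are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

importances, names = [], []
# Same algorithms with different seeds, plus one different algorithm.
for seed in range(3):
    for name, cls in [("rf", RandomForestClassifier), ("et", ExtraTreesClassifier)]:
        model = cls(n_estimators=100, random_state=seed).fit(X, y)
        importances.append(model.feature_importances_)
        names.append(f"{name}_{seed}")
gb = GradientBoostingClassifier(random_state=0).fit(X, y)
importances.append(gb.feature_importances_)
names.append("gb_0")

imp = np.vstack(importances)  # shape: (n_models, n_features)
coords = PCA(n_components=2).fit_transform(imp)
for name, (pc1, pc2) in zip(names, coords):
    print(f"{name}: PC1={pc1:.3f} PC2={pc2:.3f}")
```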

Ensemble learning with scikit-learn and XGBoost #machine learning

I often post about deep learning, but today I would like to post about ensemble learning. There are lots of documents describing ensemble learning, and I found the following one very informative: Kaggle Ensembling Guide. I was interested in one of its methods, named ‘blending’. According to that guide, the merit of ‘blending’…
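Blending, as described in the Kaggle Ensembling Guide, trains the base models on one part of the training data and fits a second-level "blender" on their predictions for a held-out part. Below is a minimal sketch with scikit-learn only, on synthetic data. The specific base models, split sizes and the logistic-regression blender are my assumptions. XGBoost is left out to keep dependencies small, though `xgboost.XGBClassifier` follows the scikit-learn API and could be appended to `base_models`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Blending: split the training data again; the holdout part trains the blender.
X_base, X_hold, y_base, y_hold = train_test_split(X_train, y_train, test_size=0.3, random_state=0)

base_models = [
    RandomForestClassifier(n_estimators=200, random_state=0),
    ExtraTreesClassifier(n_estimators=200, random_state=0),
    LogisticRegression(max_iter=1000),
]
for m in base_models:
    m.fit(X_base, y_base)


def meta_features(models, X):
    """One column per base model: its predicted probability of class 1."""
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])


blender = LogisticRegression().fit(meta_features(base_models, X_hold), y_hold)
y_pred = blender.predict(meta_features(base_models, X_test))
acc = accuracy_score(y_test, y_pred)
print(f"blended accuracy: {acc:.3f}")
```

Compared with stacking via out-of-fold predictions, blending is simpler and avoids leakage by construction, at the cost of training the base models on less data.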

Make predictive models with small data and visualize it #Chemoinformatics

This week I enjoyed the chemoinformatics conference held in Kumamoto. On the first day, I heard a very interesting lecture. It was a very basic data handling and visualization tutorial, but useful for newcomers to chemoinformatics. I wanted to reproduce the code example, so I tried it. First, visualize the training data. It…