Recently a pre-release (nightly build) of scikit-learn 1.0 was released. I often use sklearn for my tasks, so I would like to try the new version ;)
The current stable version is 0.24, so I installed 1.0rc2 via pip.
$ pip install scikit-learn==1.0rc2
Here are the release highlights and the release notes.
Ver 1.0 has a CalibrationDisplay class which can make a calibration curve plot easily. So I tested it with solubility data. As you know, calibration is applied to classification models such as SVC. To use it, the classifier object should have a predict_proba method (for SVC, this means setting probability=True).
A calibration curve shows the reliability of the model's predicted probabilities. To make the curve, the test predictions are grouped into bins; the X-axis is the mean predicted probability in each bin and the Y-axis is the fraction of positives in that bin. If the model is perfectly calibrated, the curve will be a straight line with gradient 1 (the diagonal y = x).
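To make the binning idea concrete, here is a minimal sketch in plain Python. The function name calibration_bins and the toy labels and probabilities are made up for illustration; scikit-learn's own implementation differs in details, so please treat this as a simplified picture rather than what the library does internally.

```python
def calibration_bins(y_true, y_prob, n_bins=5):
    """Return (mean predicted probability, fraction of positives) per non-empty bin."""
    sums = [0.0] * n_bins      # sum of predicted probabilities per bin
    positives = [0] * n_bins   # number of positive labels per bin
    counts = [0] * n_bins      # number of samples per bin
    for label, prob in zip(y_true, y_prob):
        # Uniform-width bins on [0, 1]; a probability of exactly 1.0 goes to the last bin.
        idx = min(int(prob * n_bins), n_bins - 1)
        sums[idx] += prob
        positives[idx] += label
        counts[idx] += 1
    # Empty bins are skipped, so both lists have the same length.
    mean_pred = [s / c for s, c in zip(sums, counts) if c > 0]   # X-axis values
    frac_pos = [p / c for p, c in zip(positives, counts) if c > 0]  # Y-axis values
    return mean_pred, frac_pos


# Toy data, invented for illustration only.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.1, 0.1, 0.3, 0.3, 0.7, 0.7, 0.9, 0.9]
mean_pred, frac_pos = calibration_bins(y_true, y_prob, n_bins=2)
print(mean_pred, frac_pos)
```

Plotting frac_pos against mean_pred gives the calibration curve; the closer the points are to the diagonal, the better the calibration.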
OK let’s test it. Most of the code is the same as in my previous post. I used RF and SVC for the test, and I uploaded my code to my gist.
The curves before calibration are shown below. The RF model seems to work better than SVC.
Then I calibrated these models and made the calibration curves again. To perform model calibration, I used CalibratedClassifierCV, which is implemented in scikit-learn. SVC seems slightly improved after calibration. The curves are shown below.
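As a rough sketch of the calibration step, the snippet below wraps an SVC in CalibratedClassifierCV and computes the points of its calibration curve with calibration_curve. The synthetic dataset, method="sigmoid", cv=5 and n_bins=10 are assumptions for illustration; the settings used in the gist may differ. The base estimator is passed positionally because its keyword name differs between scikit-learn versions.

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the solubility data (assumption, not the real dataset).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Wrap an unfitted SVC. CalibratedClassifierCV fits it inside 5-fold cross-validation
# and learns a sigmoid mapping from its decision scores to probabilities.
calibrated_svc = CalibratedClassifierCV(SVC(), method="sigmoid", cv=5)
calibrated_svc.fit(X_train, y_train)

# Calibrated probability of the positive class on the held-out test set.
y_prob = calibrated_svc.predict_proba(X_test)[:, 1]

# Y-axis (fraction of positives) and X-axis (mean predicted probability) per bin.
prob_true, prob_pred = calibration_curve(y_test, y_prob, n_bins=10)
```

The fitted calibrated_svc has predict_proba, so it can also be passed to CalibrationDisplay.from_estimator to draw the curve after calibration.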
Data visualization is important and useful for understanding my models and analyses.
Scikit-learn is a really useful package for chemoinformatics. I’ll read the documentation more deeply and test more of the new features.