Make original sklearn classifier-2 #sklearn #chemoinfo

After posted ‘Make original sklearn classifier’, I could get comment from my follower @yamasaKit_-san and @kzfm-san. (Thanks!)

So I checked diversity of models with principal component analysis(PCA).
The example is almost same as yesterday but little bit different at last part.
Last part of my code is below. Extract feature importances from L1 layer classifiers and mono-random forest classifier. And then conduct PCA.

labels = ["rf", "et", "gbc", "xgbc", "mono_rf"]
feature_importances_list = [clf.feature_importances_ for clf in blendclf.l1_clfs_]
feature_importances_list.append(mono_rf.feature_importances_)
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
res = pca.fit_transform(feature_importances_list)

Then plot results with matplotlib. And I used adjustText library for plot labeling. adjustText is powerful package for making nice view of labeled plot.

from adjustText import adjust_text
x, y = res[:,0], res[:,1]
plt.plot(x, y, 'bo')
plt.xlim(-0.1, 0.3)
plt.ylim(-0.05, 0.1)

texts = [plt.text(x[i], y[i], '{}'.format(labels[i])) for i in range(len(labels))]
adjust_text(texts)

The plot indicates that two models ET, RF learned similar feature importances to mono Randomforest but XGBC and GBC learned different feature importance. So, combination of ET and RF is redundant for Ensemble learning I think.

To build good ensemble model, I need to think about combination of models and how to tune many hyper parameters.

You can check whole code from the URL below.
http://nbviewer.jupyter.org/github/iwatobipen/skensemble/blob/master/solubility2.ipynb

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.