I often post about deep learning topics, but today I would like to write about ensemble learning.
There are lots of documents describing ensemble learning, and I found the following article (linked in the Ref. section at the end of this post) very informative.
I was particularly interested in one of the methods, named ‘blending’.
According to that article, the merits of ‘blending’ are as follows:
Blending has a few benefits:
It is simpler than stacking.
It wards against an information leak: The generalizers and stackers use different data.
You do not need to share a seed for stratified folds with your teammates. Anyone can throw models in the ‘blender’ and the blender decides if it wants to keep that model or not.
There are two layers in blending. The first layer is a set of multiple classifiers trained on the training data. The second layer is a classifier trained on the first layer's predictions; in the code below, the out-of-fold predicted probabilities from cross-validation on the training set become the features for the second layer, and the fold-averaged predictions on the held-out test set are used for evaluation.
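Before showing the full script, here is my own minimal sketch of this data flow (a simplified illustration, not the code used below): the out-of-fold probabilities of the first-layer models become the feature matrix of the second-layer model.

# Minimal blending sketch (simplified illustration only).
# Out-of-fold probabilities of two first-layer models become
# the features of a second-layer model.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict
from xgboost import XGBClassifier

X, y = load_iris(return_X_y=True)
layer1 = [RandomForestClassifier(n_estimators=100),
          SVC(probability=True, gamma='auto')]

# shape (n_samples, n_models * n_classes): one probability block per first-layer model
blend_features = np.hstack([cross_val_predict(clf, X, y, cv=5, method='predict_proba')
                            for clf in layer1])
layer2 = XGBClassifier(n_estimators=100).fit(blend_features, y)
print(blend_features.shape)  # (150, 6)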
I tried to write code for blending. In the following code I used scikit-learn and XGBoost.
First, import the libraries and define dictionaries of candidate classifiers for my convenience. Every first-layer classifier needs predict_proba, which is why SVC is constructed with probability=True.
import click
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from xgboost import XGBClassifier

# candidate classifiers for the first layer
l1_clf_dict = {'RF': RandomForestClassifier(n_estimators=100),
               'ETC': ExtraTreesClassifier(n_estimators=100),
               'GBC': GradientBoostingClassifier(learning_rate=0.05),
               'XGB': XGBClassifier(n_estimators=100),
               'SVC': SVC(probability=True, gamma='auto')}

# candidate classifiers for the second (meta) layer
l2_clf_dict = {'RF': RandomForestClassifier(n_estimators=100),
               'ETC': ExtraTreesClassifier(n_estimators=100),
               'GBC': GradientBoostingClassifier(learning_rate=0.05),
               'XGB': XGBClassifier(n_estimators=100),
               'SVC': SVC(probability=True, gamma='auto')}
Then I defined the model-building function. The following code can be applied to multi-class classification problems.
The code is a little bit complicated and it only returns the set of trained classifiers. I would like to improve it in the near future.
@click.command()
@click.option('--l1', default='all', type=str,
              help='models of the first layer; input format is csv. RF/ETC/GBC/XGB/SVC')
@click.option('--l2', default='XGB', type=str,
              help='model of the second layer; default is XGB')
@click.option('--nfolds', default=10, type=int,
              help='number of KFolds; default is 10')
@click.option('--traindata', default='train.npz', type=str,
              help='data for training')
def buildmodel(l1, l2, nfolds, traindata):
    skf = StratifiedKFold(nfolds)
    dataset = np.load(traindata)['arr_0']
    X = dataset[:, :-1]
    y = dataset[:, -1]
    # shuffle the dataset
    idx = np.random.permutation(y.size)
    X = X[idx]
    y = y[idx]
    num_cls = len(set(y))
    train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=794)
    if l1 == 'all':
        l1 = list(l1_clf_dict.keys())
        clfs = list(l1_clf_dict.values())
    else:
        l1 = l1.split(',')
        clfs = [l1_clf_dict[clf] for clf in l1]
    dataset_blend_train = np.zeros((train_X.shape[0], len(clfs), num_cls))
    dataset_blend_test = np.zeros((test_X.shape[0], len(clfs), num_cls))

    for j, clf in enumerate(clfs):
        dataset_blend_test_j = np.zeros((test_X.shape[0], nfolds, num_cls))
        for i, (train, val) in enumerate(skf.split(train_X, train_y)):
            print('fold {}'.format(i))
            X_train = train_X[train]
            y_train = train_y[train]
            X_val = train_X[val]
            y_val = train_y[val]
            clf.fit(X_train, y_train)
            # use the classifier's out-of-fold predictions for the next layer's training
            y_pred = clf.predict_proba(X_val)
            dataset_blend_train[val, j, :] = y_pred
            dataset_blend_test_j[:, i, :] = clf.predict_proba(test_X)
        # average the fold-wise test-set predictions of classifier j
        dataset_blend_test[:, j, :] = dataset_blend_test_j.mean(1)

    l2_clf = l2_clf_dict[l2]
    print('Blending')
    print(dataset_blend_train.shape)
    dataset_blend_train = dataset_blend_train.reshape((dataset_blend_train.shape[0], -1))
    l2_clf.fit(dataset_blend_train, train_y)
    dataset_blend_test = dataset_blend_test.reshape((dataset_blend_test.shape[0], -1))
    y_pred = l2_clf.predict_proba(dataset_blend_test)
    print(classification_report(test_y, np.argmax(y_pred, 1)))
    print(confusion_matrix(test_y, np.argmax(y_pred, 1)))
    print("*" * 50)
    # report the performance of each first-layer classifier for comparison
    for i, key in enumerate(l1):
        print('layer 1 {}'.format(key))
        l1_pred = clfs[i].predict_proba(test_X)
        print(classification_report(test_y, np.argmax(l1_pred, 1)))
        print(confusion_matrix(test_y, np.argmax(l1_pred, 1)))
        print("*" * 50)
    return (clfs, l2_clf)

if __name__ == '__main__':
    buildmodel()
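For example, to blend only the random forest and SVC in the first layer with XGBoost as the second layer, the script can be called like this (the option names come from the click decorators above):

python blending.py --l1 RF,SVC --l2 XGB --nfolds 5 --traindata train.npz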
Now the base code is ready.
Let’s make some sample data and run the code.
# make data
import numpy as np
from sklearn.datasets import load_iris

x = load_iris().data
y = load_iris().target
# append the target as the last column, as buildmodel expects
data = np.hstack((x, y.reshape(y.size, 1)))
np.savez('train.npz', data)
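Note that calling np.savez with a positional argument stores the array under the key 'arr_0', which is why buildmodel reads np.load(traindata)['arr_0'].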
Run the code :)
iwatobipen$ python blending.py --traindata train.npz
fold 0
fold 1
fold 2
--snip--
fold 7
fold 8
fold 9
Blending
(120, 5, 3)
              precision    recall  f1-score   support

         0.0       1.00      1.00      1.00        13
         1.0       1.00      0.83      0.91         6
         2.0       0.92      1.00      0.96        11

   micro avg       0.97      0.97      0.97        30
   macro avg       0.97      0.94      0.96        30
weighted avg       0.97      0.97      0.97        30

[[13  0  0]
 [ 0  5  1]
 [ 0  0 11]]
**************************************************
layer 1 RF
              precision    recall  f1-score   support

         0.0       1.00      1.00      1.00        13
         1.0       0.86      1.00      0.92         6
         2.0       1.00      0.91      0.95        11

   micro avg       0.97      0.97      0.97        30
   macro avg       0.95      0.97      0.96        30
weighted avg       0.97      0.97      0.97        30

[[13  0  0]
 [ 0  6  0]
 [ 0  1 10]]
**************************************************
layer 1 ETC
              precision    recall  f1-score   support

         0.0       1.00      1.00      1.00        13
         1.0       0.71      0.83      0.77         6
         2.0       0.90      0.82      0.86        11

   micro avg       0.90      0.90      0.90        30
   macro avg       0.87      0.88      0.88        30
weighted avg       0.91      0.90      0.90        30

[[13  0  0]
 [ 0  5  1]
 [ 0  2  9]]
**************************************************
layer 1 GBC
              precision    recall  f1-score   support

         0.0       1.00      1.00      1.00        13
         1.0       0.75      1.00      0.86         6
         2.0       1.00      0.82      0.90        11

   micro avg       0.93      0.93      0.93        30
   macro avg       0.92      0.94      0.92        30
weighted avg       0.95      0.93      0.93        30

[[13  0  0]
 [ 0  6  0]
 [ 0  2  9]]
**************************************************
layer 1 XGB
              precision    recall  f1-score   support

         0.0       1.00      1.00      1.00        13
         1.0       0.75      1.00      0.86         6
         2.0       1.00      0.82      0.90        11

   micro avg       0.93      0.93      0.93        30
   macro avg       0.92      0.94      0.92        30
weighted avg       0.95      0.93      0.93        30

[[13  0  0]
 [ 0  6  0]
 [ 0  2  9]]
**************************************************
layer 1 SVC
              precision    recall  f1-score   support

         0.0       1.00      1.00      1.00        13
         1.0       1.00      1.00      1.00         6
         2.0       1.00      1.00      1.00        11

   micro avg       1.00      1.00      1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

[[13  0  0]
 [ 0  6  0]
 [ 0  0 11]]
**************************************************
All classifiers in the first layer showed good performance. This is a very simple case with very small data, so it is hard to estimate the merit of blending here. I will check other datasets next.
Ref.
http://www.chioka.in/stacking-blending-and-stacked-generalization/
Great example of blending, although I believe blending is stacking – one and the same.
Thankfully, modern sklearn provides a stacking classifier we can use to directly blend and stack models. Using xgboost as the meta model works very well.
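For example, a rough sketch along those lines might look like this (assuming scikit-learn >= 0.22, which provides StackingClassifier; the estimator choices here are just placeholders):

# Rough sketch of sklearn's built-in stacking with xgboost as the meta model.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=794)

estimators = [('RF', RandomForestClassifier(n_estimators=100)),
              ('SVC', SVC(probability=True, gamma='auto'))]
# cv=5 builds out-of-fold predictions for the meta model, like the manual loop in the post
stack = StackingClassifier(estimators=estimators,
                           final_estimator=XGBClassifier(n_estimators=100),
                           cv=5)
stack.fit(train_X, train_y)
print(stack.score(test_X, test_y))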
Thanks for the information. I will try to use it ;)
Also, thanks for sharing cool ML content!