Python package for Ensemble learning #Chemoinformatics #Scikit learn

Ensemble learning is a technique for machine learning. I wrote post about blending learning before. URL is below.
I implemented the code by myself at that time.

Ensemble learning sometime outperform than single model. So it is useful for try to use the method. Fortunately now we can use ensemble learning very easily by using a python package named ‘ML-Ens‘ Installation is very easy, only use pip command common way for pythonista I think ;)

After installing the package user can build and train ensemble learning model with few lines. I would like to introduce two example of them one is stacking method and the other is a blending method. OK let’s go to code.

At first, load dataset and make input features. I used morgan fingerprint as input data.

from rdkit import Chem
from rdkit.Chem import DataStructs
from rdkit.Chem import AllChem
from rdkit import RDPaths
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw
from rdkit.Chem import PandasTools
import numpy as np
import pandas as pd
from IPython.display import HTML
traindf = PandasTools.LoadSDF(os.path.join(RDPaths.RDDocsDir,'Book/data/solubility.train.sdf'))
testdf = PandasTools.LoadSDF(os.path.join(RDPaths.RDDocsDir, 'Book/data/solubility.test.sdf'))
# Chek data

cls2lab = {'(A) low':0, '(B) medium':1, '(C) high':2}

def fp2np(fp):
    arr = np.zeros((0,))
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr
trainfp = [AllChem.GetMorganFingerprintAsBitVect(m, 2) for m in traindf.ROMol]
testfp =  [AllChem.GetMorganFingerprintAsBitVect(m, 2) for m in testdf.ROMol]
trainX = np.array([fp2np(fp) for fp in trainfp])
testX = np.array([fp2np(fp) for fp in testfp])
trainY = np.array([cls2lab[i] for i in traindf.SOL_classification.to_list()])
testY =  np.array([cls2lab[i] for i in testdf.SOL_classification.to_list()])

Then import several package for ensemble learning. SuperLearner is class for stacking and BlendEnsemble is class for blending.

Making ensemble model is easy. Just use add method to layer addition and finally call add_meta method for adding final prediction layer.

from mlens.ensemble import SuperLearner
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.metrics import r2_score, accuracy_score
from sklearn.svm import SVR, SVC

# For stacnking
ensemble = SuperLearner(scorer=accuracy_score, random_state=794, verbose=2)
ensemble.add([RandomForestClassifier(n_estimators=100, random_state=794), SVC(gamma='auto', C=1000)])
ensemble.add_meta(LogisticRegression(solver='lbfgs', multi_class='auto')), trainY)
pred = ensemble.predict(testX)
accuracy_score(testY, pred)

# Blending
from mlens.ensemble import BlendEnsemble
ensemble2 = BlendEnsemble(scorer=accuracy_score, test_size=0.2, verbose=2)
ensemble2.add([RandomForestClassifier(n_estimators=794, random_state=794),
ensemble2.add_meta(LogisticRegression(solver='lbfgs', multi_class='auto')), trainY)
pred_b = ensemble2.predict(testX)
accuracy_score(pred_b, testY)

Also more models can added with add method. I uploaded whole code on my gist. After calling fit, it is easy to access result data by using data method.

code example

Unfortunately the ensemble models described in the post don’t outperform single random forest model but mlens is nice tool for ensemble learning there is still room of improvement for model performance such as kind of models, hyper parameters etc.

Original document give more informations. Please go to link if reader has interest.

Published by iwatobipen

I'm medicinal chemist in mid size of pharmaceutical company. I love chemoinfo, cording, organic synthesis, my family.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: