PLS Regression using Scikit-learn

Today, I tried to build a PLS regression model using scikit-learn.
I got the data from this link.
The training data is “solubility.train.sdf”, and the test data is “solubility.test.sdf”.

Let’s try it.

#! /usr/bin/python
from sklearn.cross_decomposition import PLSRegression
from sklearn import metrics
import numpy as np

from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.ML.Descriptors import MoleculeDescriptors
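# names of all descriptors registered in RDKit's Descriptors module
nms = [ x[0] for x in Descriptors._descList ]
def calculator( mols ):
    # compute every descriptor in nms for each molecule (one row per molecule)
    calc = MoleculeDescriptors.MolecularDescriptorCalculator( nms )
    res = [ calc.CalcDescriptors( mol ) for mol in mols ]
    return res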

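# SDMolSupplier yields None for records RDKit cannot parse, so skip them
trainMols = [ mol for mol in Chem.SDMolSupplier("solubility.train.sdf") if mol is not None ]
testMols = [ mol for mol in Chem.SDMolSupplier("solubility.test.sdf") if mol is not None ]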

trainDescrs = calculator( trainMols )
testDescrs = calculator( testMols )

# the measured solubility of each molecule is stored in the SDF property 'SOL'
trainActs = np.array([ float( mol.GetProp('SOL') ) for mol in trainMols ])
testActs = np.array([ float( mol.GetProp('SOL') ) for mol in testMols ])

# fit a PLS model with 15 latent components on the training descriptors
pls2 = PLSRegression( n_components = 15 )
pls2.fit( trainDescrs, trainActs )

# predict the test-set solubility and report the coefficient of determination (R^2)
sol_pred = pls2.predict( testDescrs )
print( metrics.r2_score( testActs, sol_pred ) )

$ python
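The number the script prints is the coefficient of determination on the test set. metrics.r2_score computes it from the measured values $y_i$, the predictions $\hat{y}_i$ and the mean $\bar{y}$ of the measured values:

$$
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

A value of 1 means perfect prediction, a model that always predicts $\bar{y}$ scores 0, and a model worse than that scores below 0.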

It was easy to build a regression model using scikit-learn.
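PLSRegression hides what the model actually fits. For a single response such as SOL, the idea can be sketched as NIPALS PLS1 in plain NumPy. This is a simplified sketch, not scikit-learn's implementation: it centres but does not scale the data (PLSRegression scales by default, scale=True), and the helper names pls1_fit, pls1_predict and r2 are hypothetical. The demo uses random synthetic data, not the solubility set.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS (simplified sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk = X - x_mean  # centred X, deflated after each component
    yk = y - y_mean  # centred y, deflated after each component
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                # weights: direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                   # scores of this latent variable
        tt = t @ t
        p = Xk.T @ t / tt            # X loadings
        c = (yk @ t) / tt            # y loading (a scalar for one response)
        Xk = Xk - np.outer(t, p)     # remove what this component explains from X
        yk = yk - c * t              # and from y
        W.append(w)
        P.append(p)
        q.append(c)
    W = np.column_stack(W)           # shape (n_features, n_components)
    P = np.column_stack(P)
    q = np.array(q)
    # coefficients in the original feature space: B = W (P^T W)^-1 q
    B = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, B


def pls1_predict(model, X):
    """Predict with the (x_mean, y_mean, B) tuple returned by pls1_fit."""
    x_mean, y_mean, B = model
    return (np.asarray(X, dtype=float) - x_mean) @ B + y_mean


def r2(y_true, y_pred):
    """Coefficient of determination (same formula as metrics.r2_score for 1-D y)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # synthetic demo data: 50 samples, 4 features, known linear relation plus noise
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 4))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=50)
    for k in (1, 2, 4):
        model = pls1_fit(X, y, n_components=k)
        print(k, round(r2(y, pls1_predict(model, X)), 3))
```

With as many components as features, this PLS fit reproduces ordinary least squares. The point of PLS is to stop earlier (15 components in the script above), so that many, often correlated, descriptors are compressed into a few latent variables.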

