Binning data

I am currently trying to build a new SAR expansion method. To do that, I want to convert a dataset from continuous values to binned ones.
I searched Google and found a couple of ways to do that. One uses Numpy and the other uses Pandas.
I think the Pandas approach is the more efficient one.
Basic examples follow.
Numpy has the digitize function. It returns the indices of the bins to which each value in the input array belongs.

# using numpy
import numpy as np
x = np.array([0.2, 6.4, 3.0, 1.6])
# bin edges, in increasing order
bins = np.array([0.0, 1.0, 2.5, 4.0, 10.0])
# index i means bins[i-1] <= value < bins[i]
inds = np.digitize(x, bins)
print(inds)
# prints: [1 4 3 2]
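To make the returned indices easier to read, here is a minimal sketch that maps each index back to the bin edges around the value. It reuses x and bins from the example above; the names lower and upper are my own, for illustration only. It assumes every value lies inside the outermost edges, because a value below bins[0] gets index 0 and a value at or above bins[-1] gets len(bins), and the lookup below would then be wrong or fail.

```python
import numpy as np

x = np.array([0.2, 6.4, 3.0, 1.6])
bins = np.array([0.0, 1.0, 2.5, 4.0, 10.0])
inds = np.digitize(x, bins)

# With the default right=False, index i means bins[i-1] <= value < bins[i].
# Assumption: all values fall inside [bins[0], bins[-1]).
lower = bins[inds - 1]
upper = bins[inds]

for value, lo, hi in zip(x, lower, upper):
    print(f"{lo} <= {value} < {hi}")
```

Passing right=True to digitize closes each bin on the right edge instead of the left.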

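The next example uses Pandas.
Pandas has the cut function. Passing an integer as the second argument splits the range of the input into that many equal-width bins.

cut also accepts a labels argument, which returns the results as labels instead of intervals.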

# using pandas
import numpy as np
import pandas as pd
x = np.array([0.2, 6.4, 3.0, 1.6])

# 5 equal-width bins over the range of x
res = pd.cut(x, 5)
# same bins, reported with custom labels
res2 = pd.cut(x, 5, labels=['a', 'b', 'c', 'd', 'e'])

print( res )
Out[86]:
[(0.194, 1.44], (5.16, 6.4], (2.68, 3.92], (1.44, 2.68]]
Categories (5, object): [(0.194, 1.44] < (1.44, 2.68] < (2.68, 3.92] < (3.92, 5.16] < (5.16, 6.4]]

print( res2 )
Out[89]:
[a, e, c, b]
Categories (5, object): [a < b < c < d < e]
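cut can also take explicit bin edges instead of a bin count, which makes it comparable to the digitize example. The following is only a sketch under my own assumptions: the edges are the ones from the Numpy example, and the names edges, binned and codes are made up for illustration. Note that cut closes bins on the right by default, (a, b], while digitize closes them on the left, and that labels=False returns 0-based integer codes while digitize returns 1-based indices here.

```python
import numpy as np
import pandas as pd

x = np.array([0.2, 6.4, 3.0, 1.6])
# Same edges as in the digitize example; 5 edges define 4 bins.
edges = [0.0, 1.0, 2.5, 4.0, 10.0]

# One label per bin, so 4 labels for 4 bins.
binned = pd.cut(x, bins=edges, labels=["a", "b", "c", "d"])

# labels=False returns the 0-based integer position of each bin.
codes = pd.cut(x, bins=edges, labels=False)

print(list(binned))
print(codes)
```

With right=False, cut uses left-closed bins like digitize does by default.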

That’s all.

Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.
