I am currently trying to build a new SAR expansion method. To do that, I want to convert a dataset from continuous values to binned ones.
I searched Google and found a couple of ways to do it. One uses NumPy and the other uses Pandas.
I think using Pandas is the more efficient way.
Basic examples follow.
NumPy has a digitize function. It returns the indices of the bins to which each value in the input array belongs.
# using numpy
import numpy as np
x = np.array([0.2, 6.4, 3.0, 1.6])
bins = np.array([0.0, 1.0, 2.5, 4.0, 10.0])
inds = np.digitize(x, bins)
print(inds)
Out[91]: array([1, 4, 3, 2])
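Because digitize only returns indices, turning them into readable bin names takes one more step. The following is a minimal sketch of that step, not part of the original example: the label names are hypothetical, and it assumes every value lies inside the range covered by the bin edges (a value outside the range would get index 0 or len(bins) and be mislabeled).

```python
import numpy as np

x = np.array([0.2, 6.4, 3.0, 1.6])
bins = np.array([0.0, 1.0, 2.5, 4.0, 10.0])

# With the default right=False, index i means bins[i-1] <= value < bins[i].
inds = np.digitize(x, bins)

# Hypothetical label names, one per interval between consecutive edges.
labels = np.array(["low", "mid", "high", "very_high"])

# Indices start at 1 for values inside the range, so shift by one.
binned = labels[inds - 1]
print(binned)
```

Five edges define four intervals, so the labels array needs exactly four entries here.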
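The next example uses Pandas.
Pandas has a cut function. When the second argument is an integer, it splits the range of the data into that many equal-width bins.
cut also accepts a labels argument, which makes it return the given labels instead of intervals.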
# using pandas
import numpy as np
import pandas as pd
x = np.array([0.2, 6.4, 3.0, 1.6])
res = pd.cut(x, 5)
res2 = pd.cut(x, 5, labels=['a', 'b', 'c', 'd', 'e'])
print(res)
Out[86]:
[(0.194, 1.44], (5.16, 6.4], (2.68, 3.92], (1.44, 2.68]]
Categories (5, object): [(0.194, 1.44] < (1.44, 2.68] < (2.68, 3.92] < (3.92, 5.16] < (5.16, 6.4]]
print(res2)
Out[89]:
[a, e, c, b]
Categories (5, object): [a < b < c < d < e]
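cut can also take explicit bin edges instead of a bin count, which makes it directly comparable to the digitize example. The sketch below reuses the same edges; passing right=False is my assumption for matching digitize's default behavior, where each interval is closed on the left.

```python
import numpy as np
import pandas as pd

x = np.array([0.2, 6.4, 3.0, 1.6])
edges = [0.0, 1.0, 2.5, 4.0, 10.0]

# right=False gives left-closed intervals [a, b), like np.digitize's default.
res = pd.cut(x, bins=edges, right=False, labels=["a", "b", "c", "d"])

# .codes holds 0-based integer bin codes; adding 1 gives digitize-style indices.
codes = res.codes
print(list(res))
print(codes + 1)
```

With explicit edges the bins no longer depend on the minimum and maximum of the data, so the same binning can be reused on another dataset.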
That’s all.