What is Binning?

In practice, datasets are large and often contain noisy values that do not help in building a useful machine learning model. In such cases the data needs to be smoothed before it is used. One way of smoothing data is called binning. There are basically two types of data: categorical and continuous. Binning is the process of converting continuous data into categorical (discrete) data.

Binning or discretization is the process of transforming numerical variables into categorical counterparts.

Binning method for data smoothing

In the binning method for data smoothing, the data is first sorted, and the sorted values are then distributed into a number of buckets, or bins. Because binning methods consult the neighborhood of each value (the values around it), they perform local smoothing.

How to perform smoothing on the data?

There are three approaches to perform smoothing –
  1. Smoothing by bin means : In smoothing by bin means, each value in a bin is replaced by the mean value of the bin.
  2. Smoothing by bin median : In smoothing by bin medians, each value in a bin is replaced by the median value of the bin.
  3. Smoothing by bin boundary : In smoothing by bin boundaries, the minimum and maximum values in a given bin are identified as the bin boundaries. Each bin value is then replaced by the closest boundary value.
Sorted data for price (in dollars) : 2, 6, 7, 9, 13, 20, 21, 24, 30
Partition using the equal-frequency approach (each bin holds the same number of values):
Bin 1 : 2, 6, 7
Bin 2 : 9, 13, 20
Bin 3 : 21, 24, 30
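
The partition above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name equal_frequency_bins is made up for illustration, and the rule that leftover values go to the first bins when the data does not divide evenly is an assumption, not something the example above specifies.

```python
def equal_frequency_bins(values, n_bins):
    """Split values into n_bins bins holding (nearly) the same number of items."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # Assumption: when the split is uneven, the first `extra` bins
        # each take one additional value.
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


prices = [2, 6, 7, 9, 13, 20, 21, 24, 30]
for number, values in enumerate(equal_frequency_bins(prices, 3), start=1):
    print("Bin", number, ":", values)
```

With nine values and three bins, every bin receives exactly three values, which matches the bins listed above.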

Smoothing by bin mean :
Bin 1 : 5, 5, 5
Bin 2 : 14, 14, 14
Bin 3 : 25, 25, 25

Smoothing by bin median :
Bin 1 : 6, 6, 6
Bin 2 : 13, 13, 13
Bin 3 : 24, 24, 24

Smoothing by bin boundary :
Bin 1 : 2, 7, 7
Bin 2 : 9, 9, 20
Bin 3 : 21, 21, 30
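
The three smoothing results above can be checked with a short Python sketch. The helper names are hypothetical, and one detail is an assumption: when a value is exactly halfway between the two boundaries, it is sent to the lower boundary (the example above never hits that case).

```python
from statistics import mean, median


def smooth_by_mean(bin_values):
    """Replace every value in the bin with the bin mean."""
    return [mean(bin_values)] * len(bin_values)


def smooth_by_median(bin_values):
    """Replace every value in the bin with the bin median."""
    return [median(bin_values)] * len(bin_values)


def smooth_by_boundary(bin_values):
    """Replace every value with the closer of the bin minimum and maximum."""
    low, high = min(bin_values), max(bin_values)
    # Assumption: a value exactly in the middle goes to the lower boundary.
    return [low if v - low <= high - v else high for v in bin_values]


bins = [[2, 6, 7], [9, 13, 20], [21, 24, 30]]
for number, values in enumerate(bins, start=1):
    print("Bin", number)
    print("  mean     :", smooth_by_mean(values))
    print("  median   :", smooth_by_median(values))
    print("  boundary :", smooth_by_boundary(values))
```

Each function works on one bin at a time, so the smoothing stays local to the bin, as described earlier.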

Binning can also be used as a discretization technique. Here, discretization refers to converting or partitioning continuous attributes (also called features or variables) into discrete or nominal ones defined over intervals.
For example, attribute values can be discretized by applying equal-width or equal-frequency binning and then replacing each value by the bin mean or bin median, as in smoothing by bin means or smoothing by bin medians, respectively. Each continuous value is thereby converted to a nominal or discrete value, namely the value of its corresponding bin.
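
Equal-width binning is not worked through in this post, so here is a minimal Python sketch of it for discretization. The function name equal_width_labels and the label format "bin_1", "bin_2", ... are made up for illustration; the bin width is taken as (max - min) / number of bins, and the maximum value is placed in the last bin.

```python
def equal_width_labels(values, n_bins):
    """Assign each value a nominal label based on equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    labels = []
    for v in values:
        if width == 0:
            index = 0  # all values are identical, so one bin is enough
        else:
            # The maximum value would fall just past the last interval,
            # so the index is clamped to the last bin.
            index = min(int((v - low) / width), n_bins - 1)
        labels.append("bin_" + str(index + 1))
    return labels


prices = [2, 6, 7, 9, 13, 20, 21, 24, 30]
for price, label in zip(prices, equal_width_labels(prices, 3)):
    print(price, "->", label)
```

Unlike the equal-frequency approach, equal-width bins can hold different numbers of values: for these prices the first interval receives four values and the second only two.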

Reviewed by Mihir Jha on January 24, 2020
