In real-world cases, datasets are large and often contain a lot of noisy values that do not help in building a meaningful machine learning model. In such cases we need to smooth the data before using it. One technique for smoothing data is called binning. Data broadly falls into two types: categorical and continuous.
Binning, also called discretization, is the process of transforming continuous numerical variables into categorical (discrete) counterparts.
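As a rough illustration of discretization, the sketch below maps numeric values to category labels using equal-width intervals. The function name `equal_width_labels`, the sample ages, and the labels are all made up for this example, and equal-width splitting is only one of several possible ways to choose the intervals.

```python
def equal_width_labels(values, labels):
    """Map each numeric value to a label using equal-width intervals."""
    lo, hi = min(values), max(values)
    n = len(labels)
    width = (hi - lo) / n  # every interval spans the same range
    result = []
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything falls in the first bin
        else:
            # The maximum value would compute to index n, so clamp it
            # into the last interval.
            idx = min(int((v - lo) / width), n - 1)
        result.append(labels[idx])
    return result


ages = [3, 17, 25, 41, 58, 63]  # hypothetical sample data
print(equal_width_labels(ages, ["low", "medium", "high"]))
# ['low', 'low', 'medium', 'medium', 'high', 'high']
```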
Binning method for data smoothing –
Here, we need the binning method for data smoothing. In this method the data is first sorted, and the sorted values are then distributed into a number of buckets or bins. Because binning methods consult the neighborhood of each value (the values that fall in the same bin), they perform local smoothing.
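The sort-then-partition step can be sketched as follows. This is a minimal version that assumes equal-frequency bins (each bin holds the same number of values, with any remainder spread over the first bins); the helper name `equal_frequency_bins` is an illustrative choice, not a standard library function.

```python
def equal_frequency_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of (nearly) equal size."""
    ordered = sorted(values)
    # If the count is not divisible by n_bins, the first `extra` bins
    # each receive one additional value.
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


prices = [21, 2, 30, 9, 7, 24, 13, 6, 20]  # deliberately unsorted input
print(equal_frequency_bins(prices, 3))
# [[2, 6, 7], [9, 13, 20], [21, 24, 30]]
```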
How to perform smoothing on the data?
There are three approaches to perform smoothing –
- Smoothing by bin means : In smoothing by bin means, each value in a bin is replaced by the mean value of the bin.
- Smoothing by bin median : In this method each bin value is replaced by its bin median value.
- Smoothing by bin boundary : In smoothing by bin boundaries, the minimum and maximum values in a given bin are identified as the bin boundaries. Each bin value is then replaced by the closest boundary value.
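The three approaches can be sketched in a few lines of Python. This is a minimal illustration that operates on bins that have already been formed. The function names are chosen for this example, and the rule that a value exactly halfway between the two boundaries goes to the lower one is an assumption, since the description above does not say how ties are resolved.

```python
from statistics import mean, median


def smooth_by_mean(bin_values):
    """Replace every value in the bin with the bin mean."""
    m = mean(bin_values)
    return [m] * len(bin_values)


def smooth_by_median(bin_values):
    """Replace every value in the bin with the bin median."""
    med = median(bin_values)
    return [med] * len(bin_values)


def smooth_by_boundary(bin_values):
    """Replace every value with the closer of the bin's min and max."""
    lo, hi = min(bin_values), max(bin_values)
    # Assumption: ties (equal distance to both boundaries) go to the lower one.
    return [lo if v - lo <= hi - v else hi for v in bin_values]


# Three small bins of price values used as sample input.
bins = [[2, 6, 7], [9, 13, 20], [21, 24, 30]]
for b in bins:
    print(smooth_by_mean(b), smooth_by_median(b), smooth_by_boundary(b))
```

Note that the boundary method keeps two distinct values per bin, so it preserves more of the original spread than the mean or median methods.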
Sorted data for price (in dollars) : 2, 6, 7, 9, 13, 20, 21, 24, 30
Partition using equal frequency approach:
- Bin 1 : 2, 6, 7
- Bin 2 : 9, 13, 20
- Bin 3 : 21, 24, 30

Smoothing by bin mean :
- Bin 1 : 5, 5, 5
- Bin 2 : 14, 14, 14
- Bin 3 : 25, 25, 25

Smoothing by bin median :
- Bin 1 : 6, 6, 6
- Bin 2 : 13, 13, 13
- Bin 3 : 24, 24, 24

Smoothing by bin boundary :
- Bin 1 : 2, 7, 7
- Bin 2 : 9, 9, 20
- Bin 3 : 21, 21, 30
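As a check on the arithmetic, the bin means and the boundary choice for the middle value of each bin work out as follows, using the bins 2, 6, 7; 9, 13, 20; and 21, 24, 30.

```latex
\begin{align*}
\text{Bin 1 mean} &= \frac{2 + 6 + 7}{3} = \frac{15}{3} = 5 \\
\text{Bin 2 mean} &= \frac{9 + 13 + 20}{3} = \frac{42}{3} = 14 \\
\text{Bin 3 mean} &= \frac{21 + 24 + 30}{3} = \frac{75}{3} = 25
\end{align*}

\begin{align*}
\text{Bin 1: } & |6 - 2| = 4 > |7 - 6| = 1 \;\Rightarrow\; 6 \to 7 \\
\text{Bin 2: } & |13 - 9| = 4 < |20 - 13| = 7 \;\Rightarrow\; 13 \to 9 \\
\text{Bin 3: } & |24 - 21| = 3 < |30 - 24| = 6 \;\Rightarrow\; 24 \to 21
\end{align*}
```

The minimum and maximum of each bin are already boundary values, so they stay unchanged.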