Reason for data normalization in ML Models
November 14, 2015
Standardization/normalization is a common requirement for the majority of ML algorithms (decision-tree learners such as ID3 are a notable exception). It transforms the training data so that all features share a comparable scale. Many algorithms behave badly when the training data is not brought onto the same scale, because noise, outliers, or non-Gaussian feature distributions let some features dominate others.
Types of normalization
- Z-transform: also called standardization; rescales a feature to zero mean and unit variance.
- Normalization: also called min-max scaling; rescales a feature into a fixed range (typically [0, 1]) based on the minimum and maximum values of the variable.
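The two rescalings above can be sketched with plain NumPy (the feature values here are made up for illustration):

```python
import numpy as np

# A hypothetical feature column (e.g. Fico scores)
x = np.array([755.0, 820.0, 680.0, 710.0])

# Z-transform (standardization): zero mean, unit variance
z = (x - x.mean()) / x.std()

# Min-max scaling (normalization): rescale into [0, 1]
mm = (x - x.min()) / (x.max() - x.min())

print(round(abs(z.mean()), 10))  # 0.0  (mean is zero up to float error)
print(mm.min(), mm.max())        # 0.0 1.0
```

In practice libraries such as scikit-learn provide `StandardScaler` and `MinMaxScaler` for the same purpose, which also remember the training-set statistics for reuse on test data.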
Apart from normalization/standardization techniques, other pre-processing methods that can linearize strongly skewed data are logarithmic and square-root scaling. These are usually applied when the data is characterized by “bursts”, i.e. most of the data is well grouped in low values, but some portion of it has relatively much larger values.
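A quick sketch of how log scaling compresses such bursts (the balance values are hypothetical):

```python
import numpy as np

# Bursty data: mostly small values plus one much larger outlier
balances = np.array([100.0, 150.0, 200.0, 50000.0])

log_scaled = np.log1p(balances)   # log(1 + x), safe even if zeros occur
sqrt_scaled = np.sqrt(balances)   # milder compression than log

# The spread between smallest and largest value shrinks dramatically
print(balances.max() / balances.min())    # 500.0
print(log_scaled.max() / log_scaled.min())  # ≈ 2.3
```

The log transform turns a 500x spread into roughly a 2x spread, so the outlier no longer dwarfs the rest of the data.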
Feature normalization brings different features onto the same scale. Illustration:
|               | AcctID | Fico | Revolving_Balance | Num of Cards |
|---------------|--------|------|-------------------|--------------|
| Data point #1 | 10001  | 755  | 20000             | 5            |
| Data point #2 | 10002  | 820  | 5000              | 2            |
The features are Fico, Revolving_Balance and Num_of_Cards. Of these, ‘Revolving_Balance’ is on a scale of thousands, ‘Fico’ on a scale of hundreds, and ‘Num of Cards’ on a scale of tens.
Now, if we calculate distances between data points, the dimension with the largest values overrides the others: in the example above, the distance contributed by Num of Cards is completely drowned out by the distance contributed by Revolving_Balance if the data is not normalized.
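The effect can be checked numerically with the two data points from the table (the per-feature min/max ranges used for scaling are assumptions for illustration):

```python
import numpy as np

# Data points from the table: [Fico, Revolving_Balance, Num_of_Cards]
p1 = np.array([755.0, 20000.0, 5.0])
p2 = np.array([820.0, 5000.0, 2.0])

# Raw Euclidean distance: dominated almost entirely by Revolving_Balance
raw_dist = np.linalg.norm(p1 - p2)

# Min-max scale each feature first, using assumed per-feature ranges,
# so every dimension contributes on a comparable [0, 1] scale
mins = np.array([300.0, 0.0, 0.0])       # assumed feature minima
maxs = np.array([850.0, 50000.0, 10.0])  # assumed feature maxima
s1 = (p1 - mins) / (maxs - mins)
s2 = (p2 - mins) / (maxs - mins)
scaled_dist = np.linalg.norm(s1 - s2)

print(round(raw_dist, 1))     # ≈ 15000.1 (essentially just the balance term)
print(round(scaled_dist, 3))  # ≈ 0.44 (all three features now contribute)
```

Unscaled, the 15000 gap in Revolving_Balance makes the Fico and card-count differences invisible; after scaling, all three features contribute comparably to the distance.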
The only models that do not care about rescaling of the data are decision trees (like ID3, see the implementation here), since they split on one feature at a time and never compare feature values across dimensions.
When to use which method? It is hard to say in general; the choice is usually made through experimentation on the data at hand.