The MinMax Scaler is a method for normalizing the training data of a Machine Learning model, i.e. for bringing the numerical values to a uniform scale. This helps the model learn and converge faster, since the gradient changes uniformly and does not make large jumps due to differing scales.
What does normalization mean in Deep Learning?
Normalization of data means that all features of the model are brought to a uniform scale. For this purpose, the range between 0 and 1 or between -1 and 1 is usually chosen. In reality, numerical features usually come on very different scales. For example, if we take the age of a person and his or her salary, an age of 100 years is a numerically high value, while a monthly salary of €100 is comparatively low.
Normalization is also only used when the underlying data does not follow a Gaussian normal distribution. Thus, if one assumes that salary and age are normally distributed, one should not perform normalization.
Why normalize data before model training?
Normalization of data has many positive effects on the training of Machine Learning models. However, it should only be applied when the model does not assume a normal distribution of the data, as is the case with a neural network, for example. If, on the other hand, models such as LDA, Gaussian Naive Bayes, or Logistic Regression are used, normalization should be dispensed with and standardization used instead.
The following advantages result from the normalization of data:
- A uniform scale prevents the model from being biased towards features with large numerical values. Otherwise, features that naturally have higher numbers could contribute more to the model prediction than others.
- Training is more uniform because there are no large jumps in numbers that could lead to irregularities. This allows higher learning rates to be used and speeds up training.
- Normalization can also reduce the risk of Internal Covariate Shift. Internal Covariate Shift refers to the phenomenon in which the hidden layers of a neural network react to a change in the distribution of the input values. As a result, the weights in the layers change very strongly and the model does not converge.
What is the difference between standardization and normalization?
Normalization of data describes the process of bringing numerical values to a uniform scale, for example, to the range between 0 and 1 or between -1 and 1. Normalization should be used primarily when the underlying data does not follow a normal distribution.
Standardization, while in many cases also causing the values to lie on a uniform scale, actually has the goal of changing the distribution of the values so that they have a mean of 0 and a standard deviation of 1. Standardization is thus used so that all numerical input values follow the same distribution.
Normalization is strongly influenced by outliers, i.e. data points that take on significantly larger or smaller values than the surrounding data points. Because the scaling is bounded by the minimum and maximum, a single extreme value compresses the remaining data points into a very narrow band, making them almost impossible to distinguish. Therefore, outliers should be removed from the data set before normalization.
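This compression effect is easy to demonstrate. The following is a minimal sketch with made-up values, assuming scikit-learn and NumPy are installed:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up feature values with one extreme outlier at the end
values = np.array([[10.0], [12.0], [11.0], [13.0], [1000.0]])

scaled = MinMaxScaler().fit_transform(values)
print(scaled.ravel())
# roughly [0, 0.002, 0.001, 0.003, 1]: the four regular points are
# squeezed into a tiny band near 0, while only the outlier occupies
# the rest of the [0, 1] range
```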
Standardization, on the other hand, is barely affected by outliers, if at all. Because there is no scaling within fixed limits, the outliers can simply sit at the outer ends of the distribution. The information they carry is thus also preserved for the model.
The choice of model to be trained also determines whether the data should be standardized or normalized. Some models require normally distributed data, which is why only standardization should be used for them. These include, for example, LDA, Gaussian Naive Bayes, or Logistic Regression. For neural networks, on the other hand, normalization can be used, since no particular distribution of the data is assumed.
How does the MinMax Scaler work?
The MinMax Scaler is a form of normalization that scales the values between 0 and 1. It gets its name because the maximum and minimum values of the feature are used for the normalization. The concrete formula of the MinMax Scaler is:
\[x_{\text{scaled}} = \frac{x - x_{\text{min}}}{x_{\text{max}} - x_{\text{min}}}\]
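A short worked example with made-up numbers: for a feature with a minimum of 20 and a maximum of 100, the value 40 is scaled to \(x_{\text{scaled}} = \frac{40 - 20}{100 - 20} = 0.25\), while 20 itself maps to 0 and 100 maps to 1.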
MinMax Scaler vs. Standard Scaler
In practice, the question often arises whether to use the MinMax Scaler or the Standard Scaler. Although both are called scalers, the MinMax Scaler performs a normalization and the Standard Scaler a standardization. The two therefore have different areas of application: the MinMax Scaler brings the values to a uniform scale, while the Standard Scaler transforms the data to a mean of 0 and a standard deviation of 1.
Thus, the use of either method depends on the model being trained and whether normalization or standardization of the data is to be performed.
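The difference is easiest to see side by side. The following is a minimal sketch with made-up data, assuming scikit-learn and NumPy are installed:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up example data: one feature on a large scale
data = np.array([[1000.0], [2000.0], [3000.0], [4000.0], [5000.0]])

# MinMax Scaler: squeezes the values into the range [0, 1]
minmax_scaled = MinMaxScaler().fit_transform(data)
print(minmax_scaled.ravel())  # [0.   0.25 0.5  0.75 1.  ]

# Standard Scaler: shifts to mean 0 and scales to standard deviation 1
standard_scaled = StandardScaler().fit_transform(data)
print(standard_scaled.mean(), standard_scaled.std())  # ~0.0 ~1.0
```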
What are the advantages and disadvantages of the MinMax Scaler?
The MinMax Scaler is a popular technique for scaling the features of a data set to a fixed range of values. The advantages of this scaling technique include:
- Simplicity: Implemented in common machine learning libraries, such as scikit-learn, the scaler can be readily loaded and used with just a few lines of code.
- Retention of relationships: The definition of the MinMax Scaler ensures that the order of the data points is preserved. In addition, the relative distances between the data points are also largely preserved, making it a popular method for machine learning models, as the information content of the data is retained.
- Compatibility with distance-based algorithms: The MinMax Scaler is particularly popular with distance-based algorithms, such as k-nearest neighbors or clustering, as the similarity between the data points is preserved.
- Maintains interpretability: Because the MinMax Scaler is a simple linear mapping, a scaled value can be read directly as the relative position between the minimum and maximum of the original feature. The new data points can therefore be easily interpreted and understood. Depending on the application, interpretability plays a crucial role, and this is where the MinMax Scaler comes into its own.
In addition to these arguments for using the MinMax Scaler, its disadvantages should also be considered before use and weighed against the respective application:
- Sensitivity to outliers: Outliers can greatly affect the performance of the MinMax Scaler as they play directly into the calculation of the values. If an extreme outlier is added to a data set, the values of the scaling can change significantly, so that the majority of the data points may be very close to each other.
- Limited range: By default, the values of the MinMax Scaler lie in the range between 0 and 1. Depending on the application, this data range may be unsuitable for effective analysis, and a larger range, such as between -1 and 1, may be required (see the sketch after this list).
- Effects on the data distribution: With skewed distributions or extreme values, the distribution of the original data may be changed after scaling. This can have a significant impact on the prediction performance of certain algorithms.
- Dependence on the data range: It should also be noted that the performance of the MinMax scaler also depends on the data range of the original data. If the data points are already in a narrow data range, the MinMax Scaler may not be able to adequately capture the variations.
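Regarding the limited range: scikit-learn's implementation lets you choose a different target range via the feature_range parameter. A minimal sketch with made-up values:

```python
from sklearn.preprocessing import MinMaxScaler

# Made-up example data with a single feature
data = [[10.0], [20.0], [30.0]]

# Scale into [-1, 1] instead of the default [0, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
print(scaler.fit_transform(data).ravel())  # [-1.  0.  1.]
```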
Knowing these advantages and disadvantages helps to make the decision for or against using the MinMax Scaler. This choice should be made depending on the specific characteristics of the data set and attributes.
How to use the MinMax Scaler in Python?
Because the MinMax Scaler is included in the Scikit-Learn machine learning library, this scaling method can be used very easily in Python. In this section, we will take a closer look at the individual steps with sample data.
1. Import the corresponding class from Scikit-Learn.
2. Create sample data and an instance of the MinMax Scaler, which you store under the name “scaler”.
3. The scaler must then be fitted to the corresponding data.
In our example, the data has a two-dimensional structure and is stored in a list of lists.
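These first three steps might look as follows; a minimal sketch with made-up sample data:

```python
# 1. Import the MinMax Scaler from Scikit-Learn
from sklearn.preprocessing import MinMaxScaler

# 2. Sample data as a list of lists (two features per row)
#    and an instance of the scaler stored under the name "scaler"
data = [[-1.0, 2.0],
        [-0.5, 6.0],
        [0.0, 10.0],
        [1.0, 18.0]]
scaler = MinMaxScaler()

# 3. Fit the scaler: it learns the per-column minimum and maximum
scaler.fit(data)
print(scaler.data_min_)  # [-1.  2.]
print(scaler.data_max_)  # [ 1. 18.]
```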
4. The data can now be scaled using “transform”.
The new values are calculated based on the fitted scaler. With the help of “fit_transform”, the fitting to the data and the actual scaling can also be combined in a single step.
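Continuing the sketch above (reusing the `data` and `scaler` from step 3):

```python
# 4. Scale the data with the fitted scaler
scaled = scaler.transform(data)
print(scaled)
# [[0.   0.  ]
#  [0.25 0.25]
#  [0.5  0.5 ]
#  [1.   1.  ]]

# Fitting and scaling combined in a single step
scaled = scaler.fit_transform(data)
```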
As you can see, each column of the data set is scaled based on its respective minimum and maximum values. When training a machine learning model, it is important that the scaler is fitted only on the training data and then applied to both the training and test data. This ensures the consistency of the scaling process.
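In practice, this split might look like the following sketch (the train and test values are made up):

```python
from sklearn.preprocessing import MinMaxScaler

# Placeholder data: in practice these come from your train/test split
train_data = [[-1.0, 2.0], [0.0, 10.0], [1.0, 18.0]]
test_data = [[0.5, 6.0]]

# Fit the scaler on the training data only ...
scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train_data)

# ... then apply the same, already fitted scaler to the test data
test_scaled = scaler.transform(test_data)
print(test_scaled)  # [[0.75 0.25]]
```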
With the help of “inverse_transform”, the scaled values can be converted back into the original values. This can be useful, for example, for interpreting the original data range.
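Again continuing the walkthrough sketch (reusing the `scaler` and `scaled` from step 4):

```python
# Map the scaled values back to the original scale
restored = scaler.inverse_transform(scaled)
print(restored)
# [[-1.   2. ]
#  [-0.5  6. ]
#  [ 0.  10. ]
#  [ 1.  18. ]]
```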
With the help of these few steps, data can be transformed in Python with the MinMax Scaler. This is an important preparatory step in order to be able to pass the data set on to a machine learning model later.
This is what you should take with you
- The MinMax Scaler is a popular feature scaling technique in Machine Learning.
- It scales the features of a dataset to a specific range, typically between 0 and 1.
- The main advantage of the MinMax Scaler is that it preserves the shape of the original distribution while bringing the values within a desired range.
- It is especially useful when the features of a data set have very different scales; however, it reacts sensitively to outliers.
- The MinMax Scaler is easy to use, thanks to the scikit-learn library in Python.
- However, it has some limitations, such as sensitivity to outliers and the potential for information loss.
- It is important to apply the MinMax Scaler properly, fitting it only on the training data and using the same scaler to transform the testing data.
Other Articles on the Topic of MinMax Scaler
The documentation of the MinMax Scaler in Scikit-Learn can be found here.
Niklas Lang
I have been working as a machine learning engineer and software developer since 2020 and am passionate about the world of data, algorithms and software development. In addition to my work in the field, I teach at several German universities, including the IU International University of Applied Sciences and the Baden-Württemberg Cooperative State University, in the fields of data science, mathematics and business analytics.
My goal is to present complex topics such as statistics and machine learning in a way that makes them not only understandable, but also exciting and tangible. I combine practical experience from industry with sound theoretical foundations to prepare my students in the best possible way for the challenges of the data world.