Support Vector Machines (SVMs) are machine learning algorithms used to classify objects. In text or image classification, they can have advantages over neural networks because they are often faster to train and can deliver good results even with a small amount of training data.
How do Support Vector Machines work?
Suppose we have data with two classes (blue, yellow) and two features (x, y). We want the SVM to decide whether a data object is classified as blue or yellow based on the features x and y. Since we have only two dimensions, we can plot our training data in a coordinate system.
The result of training a Support Vector Machine is the so-called hyperplane, which best separates the two groups. In two-dimensional space, this is a simple line. The hyperplane is used to decide which class a data object falls into. In our example, all objects to the left of the hyperplane are classified as “yellow” and all to the right as “blue”.
The Support Vector Machine selects the hyperplane so that the gap between the two groups, called the margin, is as large as possible. The margin is measured as the distance from the hyperplane to the nearest element of each group; these nearest elements are the support vectors that give the method its name. If the margin is maximal, the hyperplane separates the two classes as clearly as possible.
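The decision rule described above can be sketched in a few lines of Python. This is only an illustration: the hyperplane parameters `W` and `B` are hypothetical values chosen by hand (the vertical line x = 3), not the result of training an SVM.

```python
# Hypothetical hyperplane parameters, chosen by hand for illustration.
# In 2D the hyperplane is the line w . p + b = 0; here that is x = 3.
W = (1.0, 0.0)
B = -3.0

def classify(point, w=W, b=B):
    """Classify a point by which side of the hyperplane it lies on."""
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    return "blue" if score > 0 else "yellow"

print(classify((1.0, 2.0)))  # point to the left of the line
print(classify((5.0, 2.0)))  # point to the right of the line
```

Only the sign of the score matters for the decision, which is why the plain SVM output says which side a point is on but not how certain that decision is.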
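To make the idea of the gap concrete, the following sketch computes it for two candidate lines on a small, made-up data set. The points and the two candidate hyperplanes are assumptions for illustration, and the code only compares given candidates; it does not perform the actual SVM optimization or check that a candidate separates the classes.

```python
import math

def distance_to_hyperplane(point, w, b):
    """Perpendicular distance from a point to the hyperplane w . p + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))

def margin(points, w, b):
    """Distance from the hyperplane to the nearest point."""
    return min(distance_to_hyperplane(p, w, b) for p in points)

# Made-up training data (assumption, not from the article).
yellow = [(1.0, 1.0), (2.0, 3.0)]
blue = [(6.0, 2.0), (7.0, 4.0)]
points = yellow + blue

# Two hand-picked candidate lines that both separate the classes.
candidates = {
    "x = 3": ((1.0, 0.0), -3.0),
    "x = 4": ((1.0, 0.0), -4.0),
}

for name, (w, b) in candidates.items():
    print(name, "gap:", margin(points, w, b))
```

Both lines separate the two groups, but the line x = 4 keeps a larger distance to the nearest points, so it is the better choice of the two under the criterion described above.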
How do SVMs behave with Non-Linearly Separable Data?
Unfortunately, real-world data cannot always be separated by a simple line as cleanly as in our example. Changing the data set just slightly can make the problem much more complex.
Although we can clearly distinguish the two sets of data by eye, they cannot be separated with a simple linear hyperplane. One way to use Support Vector Machines anyway is to introduce a new dimension, constructed so that the data points become separable by a hyperplane in the higher-dimensional space.
Since constructing such a dimension by hand can be very difficult and quickly becomes computationally expensive, we will not discuss this alternative in more detail in this article.
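For intuition only, here is a minimal sketch of what introducing a new dimension can look like. The data (an inner and an outer ring of points) and the added feature z = x² + y² are assumptions chosen for illustration; they are not taken from the example in this article.

```python
# Made-up data: an inner ring (yellow) surrounded by an outer ring (blue).
# No straight line in the (x, y) plane separates these two groups.
inner = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
outer = [(3.0, 0.0), (0.0, 3.0), (-3.0, 0.0), (0.0, -3.0)]

def lift(point):
    """Add a third dimension z = x^2 + y^2 (squared distance from the origin)."""
    x, y = point
    return (x, y, x * x + y * y)

inner_z = [lift(p)[2] for p in inner]
outer_z = [lift(p)[2] for p in outer]

# In the lifted space, the flat plane z = threshold separates the groups.
threshold = (max(inner_z) + min(outer_z)) / 2
print("inner z:", inner_z)
print("outer z:", outer_z)
print("separating plane: z =", threshold)
```

The difficulty in practice is that a suitable extra dimension is rarely this obvious, which is the motivation for the kernel trick.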
How does the Kernel Trick for Support Vector Classification work?
Instead of explicitly constructing new dimensions, Support Vector Machines use the so-called kernel trick. Our problem is that SVMs can only separate classes using linear hyperplanes. Therefore, we need to transform the non-linear data so that it can be separated linearly. To do this, we need a higher-dimensional space in which the data is linearly separable.
Mathematically, this corresponds to the optimization problem we have already seen above: we want to find the hyperplane that has the maximum distance to the nearest data point of each class. If we have a non-linear data set, a so-called mapping function also appears in this optimization problem. It maps each data object to a point in a higher-dimensional space where the data is separable by a hyperplane.
The biggest problem is finding this mapping, which we need in order to solve the optimization problem. Theoretically, there are infinitely many functions that could do this for us. However, we do not want our computer to have to try all these possibilities. Fortunately, a mathematical theorem comes to our aid: the so-called Mercer’s theorem.
In simple words, it says that we do not need to know the exact mapping to solve our optimization problem. It is enough to know how to compute the inner product of the mapped data points. So-called kernel functions (e.g. the Gaussian kernel or the spectrum kernel) perform exactly this operation. Any function that satisfies the conditions of Mercer’s theorem is a kernel function and can be used instead of an explicit mapping. This makes it much easier to work with data that is not linearly separable.
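One common way to write this optimization problem is the hard-margin formulation below. The notation is an assumption introduced here for illustration: $x_i$ are the training points, $y_i \in \{-1, +1\}$ their class labels, $w$ and $b$ the parameters of the hyperplane, and $\varphi$ the mapping function. For linearly separable data, $\varphi$ is simply the identity.

```latex
\min_{w,\, b} \; \frac{1}{2} \lVert w \rVert^2
\quad \text{subject to} \quad
y_i \left( w^\top \varphi(x_i) + b \right) \ge 1
\quad \text{for all } i = 1, \dots, n
```

Under these constraints, the distance from the hyperplane to the nearest point of each class is $1 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ is the same as maximizing that distance.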
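A small numerical check can make this less abstract. The sketch below uses a degree-2 polynomial kernel, chosen here because its explicit mapping `phi` is short enough to write down (it is not one of the kernels named above), and compares the kernel value with the inner product of the explicitly mapped points. A Gaussian kernel is included for comparison; its `gamma` value is an arbitrary assumption.

```python
import math

def dot(a, b):
    """Inner product of two vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(p):
    """Explicit mapping of a 2D point into 3D for the degree-2 polynomial kernel."""
    x1, x2 = p
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(a, b):
    """Degree-2 polynomial kernel: computed in the original 2D space only."""
    return dot(a, b) ** 2

def gaussian_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed example value."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)

a, b = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(a), phi(b))   # map first, then take the inner product
implicit = poly_kernel(a, b)     # kernel value, no mapping needed

print("explicit mapping:", explicit)
print("kernel function: ", implicit)
print("same up to rounding:", math.isclose(explicit, implicit))
print("Gaussian kernel:", gaussian_kernel(a, b))
```

Both routes give the same number up to floating-point rounding, but the kernel never constructs the higher-dimensional vectors. For the Gaussian kernel the corresponding space is infinite-dimensional, so the kernel function is the only practical route.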
What are the Advantages and Disadvantages of SVMs?
| Advantages | Disadvantages |
|---|---|
| Simple training of the model | Finding the right kernel function and parameters can be computationally intensive |
| Scale well to higher-dimensional data | Do not handle noisy data well |
| Good alternative to neural networks | Prone to overfitting when the number of features is much greater than the number of records |
| Can be used for linearly and non-linearly separable data | No direct probability interpretation of the result (the decision depends only on the side of the hyperplane) |
This is what you should take away
- Support Vector Machines are machine learning algorithms for classifying data objects.
- SVMs try to find the so-called hyperplane that separates the data groups most clearly from each other.
- If the data is not separable with a linear element, for example, a straight line or a plane, we can use the so-called kernel trick.
- SVMs are good alternatives to neural networks in the classification of objects.