Support Vector Machines (SVMs) are machine learning algorithms used to classify objects. In text or image classification, they can have advantages over neural networks because they train more quickly and often deliver good results even with a small amount of training data.
How do Support Vector Machines work?
Suppose we have data with two classes (blue, yellow) and two features (x, y). We want the SVM to decide whether the data object is classified as blue or yellow based on the features x and y. Since we have only two dimensions, we can map our training data into a coordinate system.
As its result, the Support Vector Machine delivers the so-called hyperplane that best separates the two groups. In two-dimensional space, this is simply a line. The hyperplane is used to decide which class a data object falls into. In our example, all objects to the left of the hyperplane are classified as “yellow” and all to the right as “blue”.
During training, the Support Vector Machine tries to choose the hyperplane so that the gap, called the margin, becomes as large as possible. The margin measures the distance from the nearest element of each group to the hyperplane. If it is at its maximum, the plane has been chosen so that the SVM separates the two classes as clearly as possible.
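The idea of the margin can be illustrated with a minimal sketch. The toy points and the two candidate lines below are hypothetical and chosen only for illustration. The code does not train an SVM; it only compares the margin of two hand-picked separating lines:

```python
import math

def geometric_margin(w, b, points):
    """Smallest distance from any point to the hyperplane w·x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(sum(wi * xi for wi, xi in zip(w, p)) + b) / norm_w for p in points)

# Hypothetical toy data: two classes in the (x, y) plane.
yellow = [(1.0, 1.0), (1.5, 3.0), (2.0, 2.0)]
blue = [(4.0, 2.0), (5.0, 1.0), (4.5, 3.5)]

# Two candidate vertical lines, each written as (w, b) with w·x + b = 0.
candidates = {
    "x = 3.0": ((1.0, 0.0), -3.0),
    "x = 2.5": ((1.0, 0.0), -2.5),
}

# Both lines separate the classes, but they leave different gaps.
margins = {name: geometric_margin(w, b, yellow + blue) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins)
print("Largest margin:", best)
```

Both lines classify the toy data correctly, but the line at x = 3.0 keeps a larger distance to the nearest points, which is the property an SVM optimizes for.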
How does it behave with Non-Linearly Separable Data?
Unfortunately, real-world data cannot always be separated by a simple line as cleanly as in our example. Changing the data set just a little can make the problem much more complex.
Although we can clearly distinguish the two sets of data by eye, they cannot be separated with a simple linear hyperplane. One way to use Support Vector Machines anyway is to introduce a new dimension and create it so that the data points are separable in a higher dimensional space by a hyperplane.
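As a small illustration of adding a dimension, consider the following sketch. The data points, the added feature z = x² + y² and the threshold of 1.0 are assumptions made up for this example; they are not taken from the data set described above:

```python
# Hypothetical toy data: "yellow" points lie close to the origin,
# "blue" points lie on a ring around them, so no straight line separates them.
yellow = [(0.5, 0.0), (-0.3, 0.4), (0.0, -0.6), (0.2, 0.2)]
blue = [(2.0, 0.0), (-1.5, 1.5), (0.0, -2.2), (1.4, -1.6)]

def lift(point):
    """Add a third feature z = x^2 + y^2 (squared distance from the origin)."""
    x, y = point
    return (x, y, x * x + y * y)

yellow_3d = [lift(p) for p in yellow]
blue_3d = [lift(p) for p in blue]

# In the lifted space, the flat plane z = 1.0 separates the two classes.
threshold = 1.0
yellow_below = all(z < threshold for _, _, z in yellow_3d)
blue_above = all(z > threshold for _, _, z in blue_3d)
print(yellow_below, blue_above)
```

Here a suitable new feature was easy to guess by looking at the data. For real data sets, finding such a feature by hand is usually much harder.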
Because this can be very difficult and, in most cases, becomes computationally expensive and unnecessarily inflates the algorithm, we will not discuss this alternative in more detail in this article.
How does the Kernel Trick for Support Vector Classification work?
Rather than classifying such non-linear data sets by explicitly adding new dimensions, Support Vector Machines use the so-called kernel trick. Our problem is that SVMs can only separate classes using linear hyperplanes. Therefore, we need to transform the non-linear data so that it can be separated linearly. To do this, we need to find a higher-dimensional space where the data is linearly separable.
Mathematically, this corresponds to an optimization problem, as we have seen above. We want to find the hyperplane with the maximum distance to the nearest data point from each class. If we have a non-linear data set, the so-called mapping function also appears in this optimization problem. It maps each data object to a point in a higher-dimensional space where the data is separable by a hyperplane.
The biggest problem is to find this mapping, which we need for the calculation of the optimization problem. Theoretically, there is an infinite number of functions that can solve this problem for us. However, we do not want our computer to have to calculate all these possibilities. Therefore, a mathematical theorem comes to our aid, namely the so-called Mercer’s Theorem.
In simple terms, it says that we do not need to know the exact mapping to solve our optimization problem. It is enough to know how to compute the inner product of two data points in the higher-dimensional space. Kernel functions (e.g. the Gauss kernel or the spectrum kernel) perform exactly this operation. Any function that satisfies Mercer’s theorem is a kernel function and can be used instead of the explicit mapping. This makes it much easier for us to optimize non-linearly separable data.
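The following minimal sketch shows the idea for one simple case. It uses a degree-2 polynomial kernel, which is an assumption chosen for illustration because its explicit mapping is easy to write down. The function names `phi`, `poly_kernel` and `gaussian_kernel` are hypothetical helpers, not part of any library:

```python
import math

def dot(a, b):
    """Inner product of two vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(v):
    """Explicit mapping of a 2D vector into a 3D feature space."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: k(a, b) = (a·b)^2."""
    return dot(a, b) ** 2

def gaussian_kernel(a, b, gamma=0.5):
    """Gauss (RBF) kernel: k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

a, b = (1.0, 2.0), (3.0, 0.5)

explicit = dot(phi(a), phi(b))   # map first, then take the inner product
via_kernel = poly_kernel(a, b)   # same value without ever computing phi

print(explicit, via_kernel)
print(gaussian_kernel(a, b))
```

Both computations give the same value up to floating-point rounding, but the kernel never constructs the higher-dimensional vectors. For the Gauss kernel, the corresponding feature space is infinite-dimensional, so the explicit route is not even available.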
How to evaluate the performance of SVM?
The performance of a Support Vector Machine (SVM) can be evaluated using various evaluation metrics, including:
- Accuracy: The proportion of correctly classified samples out of the total number of samples in the dataset.
- Precision: The proportion of true positive samples (correctly predicted positive) out of the total number of predicted positive samples.
- Recall: The proportion of true positive samples (correctly predicted positive) out of the total number of actual positive samples.
- F1 score: The harmonic mean of precision and recall, which gives equal weight to both metrics.
- Confusion matrix: A matrix that shows the number of true positives, true negatives, false positives, and false negatives for a binary classification problem.
- ROC curve: A graphical representation of the true positive rate (sensitivity) vs. false positive rate (1-specificity) at different classification thresholds.
- Precision-Recall curve: A graphical representation of the precision vs. recall at different classification thresholds.
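The count-based metrics from this list can be computed by hand, as in the following sketch. The labels and predictions are made-up example values, and `binary_metrics` is a hypothetical helper written for this illustration:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and the metrics derived from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    return {
        "confusion": {"tp": tp, "tn": tn, "fp": fp, "fn": fn},
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Made-up example: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In practice, scikit-learn's `sklearn.metrics` module offers ready-made functions such as `precision_score`, `recall_score`, `f1_score` and `confusion_matrix` for the same purpose.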
The choice of evaluation metric depends on the specific problem and the cost of different types of errors. For example, in a medical diagnosis problem, false negatives (missing a disease) may be more costly than false positives (diagnosing a disease when it is not present), and hence recall may be a more relevant metric.
It is also important to use cross-validation techniques such as k-fold cross-validation to check that the model is not overfitting the training data. This involves dividing the data into k folds, training the model on k-1 folds, and testing it on the remaining fold. The process is repeated k times, and the performance metrics are averaged over the k iterations.
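With scikit-learn, k-fold cross-validation might look like the sketch below. The choice of the Iris dataset, k = 5, a linear kernel and C=1 are assumptions made for this example only:

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Load a small built-in dataset for demonstration purposes.
iris = datasets.load_iris()
X, y = iris.data, iris.target

# 5-fold cross-validation: train on 4 folds, test on the remaining fold,
# and repeat until every fold has served as the test set once.
scores = cross_val_score(SVC(kernel="linear", C=1), X, y, cv=5)

print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
```

By default, `cross_val_score` reports accuracy for classifiers. A different metric can be requested through its `scoring` parameter, for example `scoring="f1_macro"`.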
What are the Advantages and Disadvantages of SVMs?
|Advantages|Disadvantages|
|---|---|
|Simple training of the model|Finding the right kernel function and parameters can be computationally intensive|
|Support Vector Machines scale well for higher-dimensional data|Support Vector Machines can not filter noise well|
|Good alternative to Neural Networks|Requires more records than number of features to work|
|Can be used for linear and non-linear separable data|No probability interpretation of the result (decides exclusively the side of the hyperplane)|
What are the different types of Support Vector Machines?
There are mainly three types of support vector machines (SVMs):
- Linear SVMs: Linear SVMs are used for classification tasks where the data can be separated by a straight line or, in higher dimensions, by a hyperplane. Linear SVMs are computationally efficient and can handle large datasets, but may not be suitable for datasets with complex or nonlinear relationships between the inputs and outputs.
- Nonlinear SVMs: Nonlinear SVMs are used for classification tasks where the data cannot be separated by a straight line or a hyperplane in the original feature space. Nonlinear SVMs use kernel functions to implicitly map the data into a higher-dimensional space where it can be separated by a hyperplane. Examples of kernel functions include polynomial, radial basis function (RBF), and sigmoid functions.
- Support Vector Regression (SVR): SVR is used for regression tasks where the goal is to predict a continuous output variable rather than a discrete class label. SVR works by fitting a function that keeps the deviations between the predicted outputs and the actual outputs small, typically ignoring errors that fall within a tolerance band around the prediction. SVR can handle both linear and nonlinear relationships between the inputs and outputs and can be used with different kernel functions.
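A regression with SVR could be sketched as follows. The noisy sine data is synthetic, and the hyperparameters (`kernel="rbf"`, `C=10`, `epsilon=0.1`) are assumptions picked for illustration, not tuned recommendations:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic data: a sine curve with a little Gaussian noise.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=80)

# epsilon sets the width of the tolerance band in which errors are ignored.
svr = SVR(kernel="rbf", C=10, epsilon=0.1)
svr.fit(X, y)

# R^2 on the training data, plus a prediction for a single new input.
r2 = svr.score(X, y)
prediction = svr.predict([[1.5]])
print("R^2:", r2)
print("Prediction at x=1.5:", prediction[0])
```

The output is a continuous value rather than a class label, which is the main difference from the classification variants.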
Each type of SVM has its own strengths and weaknesses, and the choice of SVM depends on the nature of the problem and the characteristics of the data. Linear SVMs are useful when the data is separable by a straight line, and when computational complexity is a concern. Nonlinear SVMs are useful when the data is not separable by a straight line, and when the relationships between the inputs and outputs are complex or nonlinear. SVR is useful when the goal is to predict a continuous output variable, and when the relationships between the inputs and outputs are not straightforward.
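The difference between a linear and a nonlinear SVM can be made visible with a small experiment. The sketch below assumes a synthetic data set of two concentric circles generated with scikit-learn's `make_circles`; the parameter values are arbitrary choices for this illustration:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: not separable by a straight line.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Train one SVM per kernel and record the accuracy on the test set.
accuracies = {}
for kernel in ("linear", "rbf"):
    model = SVC(kernel=kernel, C=1).fit(X_train, y_train)
    accuracies[kernel] = model.score(X_test, y_test)

print(accuracies)
```

On data like this, the RBF kernel should reach a clearly higher accuracy than the linear kernel, because only the kernelized model can represent a circular decision boundary.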
How to implement a Support Vector Machine in Python?
To implement SVM in Python, you can use the scikit-learn library, which provides an easy-to-use and well-documented SVM implementation. Below is a step-by-step guide to implementing it in Python:
Import the required libraries:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
Load the dataset you want to use. Scikit-learn provides several built-in datasets that can be used for experiments, for example the Iris dataset, which contains measurements of flowers from different iris species:
iris = datasets.load_iris()
X = iris.data
y = iris.target
Divide the data set into a training set and a test set. It is common to use a 70/30 or 80/20 split for training and testing:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
Instantiate an SVM object with the desired hyperparameters. The most important hyperparameter is the kernel, which determines the type of decision boundary the SVM should learn. The other hyperparameters control the strength of the regularization and other aspects of the model:
svm = SVC(kernel='linear', C=1)
Train the SVM with the training data:
svm.fit(X_train, y_train)
Predict the labels of the test data:
y_pred = svm.predict(X_test)
Evaluate SVM performance using a performance metric such as accuracy:
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)
Note that this is a very simple example and that there are many more advanced techniques for tuning hyperparameters and optimizing the performance of an SVM. However, this example should provide a good starting point for implementing SVM in Python.
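One common way to tune hyperparameters is a grid search with cross-validation. The following sketch uses scikit-learn's `GridSearchCV`; the parameter grid is an assumption chosen for illustration and should be adapted to the data at hand:

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Hypothetical search space: every combination is evaluated with 5-fold CV.
# Note that gamma has no effect for the linear kernel.
param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1, 10],
    "gamma": ["scale", "auto"],
}

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best cross-validation accuracy:", search.best_score_)

# The best model is refit on the full training set and can be scored directly.
test_accuracy = search.score(X_test, y_test)
print("Test accuracy:", test_accuracy)
```

The test set is only used once at the end, so the reported test accuracy is not influenced by the search itself.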
This is what you should take with you
- Support Vector Machines are machine learning algorithms for classifying data objects.
- SVMs try to find the best so-called hyperplane, which separates the data groups most clearly from each other.
- If the data is not separable with a linear element, for example, a straight line or a plane, we can use the so-called kernel trick.
- SVMs are good alternatives to neural networks in the classification of objects.