What is Supervised Learning?

Supervised Learning is a subcategory of Artificial Intelligence and Machine Learning. It is characterized by the fact that every example in the training data already comes with a correct label. This allows an algorithm to learn to predict these labels for new data objects. The counterpart is so-called unsupervised learning, where these labels are not present in the dataset and the algorithm must be trained differently.

How does supervised learning work?

Supervised learning algorithms use datasets to learn the relationship between inputs and outputs and then make the desired prediction. Ideally, the prediction and the label from the dataset are identical. The training dataset contains the inputs together with the correct outputs for them, and the model learns from these pairs over several iterations. A loss function measures how far the predictions deviate from the labels, and the algorithm tries to minimize this loss until a satisfactory result is achieved. The accuracy, in turn, indicates how often the correct output was predicted from the given inputs.
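
The training process described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the data is made up, the model is a simple line, the loss is the mean squared error, and the optimizer is plain gradient descent. The names `predict`, `mse_loss` and `train` are chosen for this example and do not come from any library.

```python
# Toy training set: inputs with their known correct outputs (labels).
# The data follows y = 2x + 1; the values are made up for illustration.
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
labels = [1.0, 3.0, 5.0, 7.0, 9.0]


def predict(w, b, x):
    """Linear model: the prediction for a single input x."""
    return w * x + b


def mse_loss(w, b):
    """Mean squared error between predictions and labels."""
    errors = [(predict(w, b, x) - y) ** 2 for x, y in zip(inputs, labels)]
    return sum(errors) / len(errors)


def train(learning_rate=0.05, iterations=2000):
    """Minimize the loss with plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(inputs)
    for _ in range(iterations):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(w, b, x) - y) * x
                     for x, y in zip(inputs, labels)) / n
        grad_b = sum(2 * (predict(w, b, x) - y)
                     for x, y in zip(inputs, labels)) / n
        # Step against the gradient to reduce the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


initial_loss = mse_loss(0.0, 0.0)
w, b = train()
final_loss = mse_loss(w, b)
print(f"loss before training: {initial_loss:.3f}, after: {final_loss:.6f}")
print(f"learned w={w:.3f}, b={b:.3f}")
```

Each iteration compares the predictions with the labels and adjusts the parameters slightly so that the loss shrinks. In practice, libraries take care of this loop, but the principle stays the same.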

You can think of it as a person who wants to learn English and can already speak German. With a German-English dictionary or a vocabulary book, the person can learn relatively easily on her own by covering the English column and then trying to “predict” the English word from the German word. She will repeat this training until she can correctly predict the English words a sufficient number of times. The person can measure her progress by counting the words she has translated incorrectly and putting them in proportion to all the words she has translated. The person will try to minimize this ratio over time until she can correctly translate all German words into English.
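
The ratio from this analogy is simply the number of wrong translations divided by the number of all translations. A small sketch, assuming a made-up vocabulary list and made-up answers (the function name `error_rate` is hypothetical):

```python
# Hypothetical vocabulary list: German word -> correct English word (the label).
vocabulary = {"Hund": "dog", "Katze": "cat", "Haus": "house", "Baum": "tree"}

# The learner's attempts in one round of practice (made-up answers).
attempts = {"Hund": "dog", "Katze": "cat", "Haus": "home", "Baum": "tree"}


def error_rate(predictions, labels):
    """Share of words translated incorrectly among all translated words."""
    wrong = sum(1 for word, label in labels.items()
                if predictions.get(word) != label)
    return wrong / len(labels)


rate = error_rate(attempts, vocabulary)
print(f"error rate: {rate:.2f}, accuracy: {1 - rate:.2f}")
```

Here one of four words is wrong, so the error rate is 0.25 and the accuracy is 0.75.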

Supervised learning can be divided into two broad categories:

  • Classification is used to assign new data objects to one or more predefined categories. The model tries to recognize patterns in the inputs that indicate membership in a category. One example is image recognition, where each image is assigned to a class. The model can then predict for an image, for example, whether a dog can be seen in it or not.
  • Regression models the relationship between inputs, called independent variables, and a numeric output, called the dependent variable, in order to predict that output for new inputs. For example, if we want to predict the sales of a company and we have the marketing activity and the average price of the previous year, the regression can also provide information about the influence of the marketing efforts on sales.
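
To make the difference between the two categories concrete, the following sketch shows one very simple method for each. It is an illustration under assumptions, not a recommendation for real projects: the classifier is a 1-nearest-neighbor rule, the regression is an ordinary least-squares line, and all data values and function names are invented for this example.

```python
def nearest_neighbor_class(train_points, train_labels, query):
    """Classification: return the label of the closest training point."""
    def squared_distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    best = min(range(len(train_points)),
               key=lambda i: squared_distance(train_points[i], query))
    return train_labels[best]


def fit_line(xs, ys):
    """Regression: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Classification: made-up features (weight in kg, height in cm) -> category.
animals = [(4.0, 25.0), (5.0, 23.0), (30.0, 60.0), (25.0, 55.0)]
species = ["cat", "cat", "dog", "dog"]
predicted_class = nearest_neighbor_class(animals, species, (28.0, 58.0))
print("predicted category:", predicted_class)

# Regression: made-up marketing spend -> sales (both in thousands).
marketing = [10.0, 20.0, 30.0, 40.0]
sales = [120.0, 140.0, 160.0, 180.0]
slope, intercept = fit_line(marketing, sales)
predicted_sales = slope * 50.0 + intercept
print(f"slope: {slope:.1f}, predicted sales at spend 50: {predicted_sales:.1f}")
```

The classifier returns a category, while the regression returns a number. The slope of the fitted line also indicates how strongly sales change with marketing spend in this toy data.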

Supervised Learning Applications

There are a variety of business applications that can benefit from supervised learning algorithms. We have briefly summarized the most popular ones below:

  • Object recognition in images: As mentioned earlier, supervised learning models can be used to recognize objects in images or assign images to a class. Companies use this feature, for example, in autonomous driving to recognize objects to which the car should react.
  • Prediction: If companies are able to predict future scenarios or states very accurately, they can weigh different decision options well against each other and choose the best one. For example, high-quality regression analysis for expected sales in the next year can be used to decide how much budget to allocate to marketing.
  • Customer sentiment analysis: On the Internet, customers have many channels through which they can publish reviews of a brand or a product. Companies therefore need to keep track of whether customers are mostly satisfied or not. A model trained on a comparatively small set of reviews that have been labeled as good or bad can then automatically classify a large number of comments.
  • Spam detection: Many mail programs let users mark individual emails as spam. This data is used to train machine learning models that automatically flag future emails as spam so that the end user does not even see them.
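
As a rough sketch of how labeled emails can turn into a spam filter, the following example trains a tiny Naive Bayes text classifier. The choice of Naive Bayes is an assumption made for illustration; real mail programs may use very different models. The emails, the label names "spam" and "ham", and the function names are all made up.

```python
import math
from collections import Counter

# Made-up emails that a user has already marked (labeled) as spam or not.
training_emails = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]


def train_naive_bayes(examples):
    """Count words per class and emails per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, class_counts


def classify(text, word_counts, class_counts):
    """Pick the class with the higher log-probability (add-one smoothing)."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_emails = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        # Start with the prior probability of the class.
        score = math.log(class_counts[label] / total_emails)
        for word in text.split():
            count = word_counts[label][word] + 1  # add-one smoothing
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, class_counts = train_naive_bayes(training_emails)
print(classify("cheap money", word_counts, class_counts))
print(classify("monday meeting", word_counts, class_counts))
```

The same idea carries over to sentiment analysis: replace the labels with "good" and "bad" and the emails with reviews.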

Problems with Supervised Learning

Despite the good results that supervised learning models achieve in many cases, these algorithms also come with some disadvantages:

  • Labeling training data is in many cases a laborious and expensive process if the categories are not yet available. For example, few images come with a label stating whether a dog appears in them or not, so this labeling has to be done manually first.
  • Training supervised learning models can be very time-consuming, especially with large datasets and complex models.
  • Human errors and biases in the data are learned as well. If a training dataset for classifying job applicants discriminates against certain social groups, the model will most likely continue to do so.
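
The last point can be illustrated with a deliberately simplistic sketch. It assumes a made-up hiring history in which qualified applicants from group "B" were always rejected, and a toy "model" that just memorizes the most common past decision for each combination of inputs. Both the data and the function name `train_majority_model` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical historical decisions: (group, qualified, hired).
# Qualified applicants from group "B" were rejected, so the labels are biased.
history = [
    ("A", True, True), ("A", True, True), ("A", False, False),
    ("B", True, False), ("B", True, False), ("B", False, False),
]


def train_majority_model(rows):
    """Learn the most common label for each (group, qualified) combination."""
    votes = defaultdict(Counter)
    for group, qualified, hired in rows:
        votes[(group, qualified)][hired] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}


model = train_majority_model(history)
print("qualified applicant, group A ->", model[("A", True)])
print("qualified applicant, group B ->", model[("B", True)])
```

The model reproduces the biased decisions exactly, because nothing in the training data tells it that the labels themselves are unfair.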

What you should take away

  • Supervised learning is a subcategory of artificial intelligence and describes models that are trained on data sets that already contain a correct output label.
  • Supervised learning algorithms can be divided into classification and regression models.
  • Companies use these models for a wide variety of applications, such as spam detection or object recognition in images.
  • Supervised learning is not without problems, as labeling datasets is expensive and the labels can contain human errors.
  • IBM has written an interesting article on the topic of supervised learning, which also briefly describes concrete supervised learning algorithms.