
Reinforcement Learning – simply explained!

Reinforcement learning is the fourth major learning method in machine learning, along with supervised, unsupervised, and semi-supervised learning. The main difference is that the model does not need a prepared training dataset. Instead, it generates its own experience by interacting with an environment and learns by being rewarded for desired behaviors and punished for undesired ones.

Examples of Reinforcement Learning

Before we can look in detail at what the training process looks like for such models, we should understand in which situations these algorithms can help:

  • Reinforcement learning is used when teaching a computer to play games. The aim is to learn which tactics lead to victory and which do not.
  • In autonomous driving, these learning algorithms are also used so that the vehicle can decide on its own which course of action is best.
  • For the air conditioning of server rooms, reinforcement learning models decide when and how much to cool down the room to use energy efficiently.

The applications of reinforcement learning are generally characterized by the fact that a large number of successive decisions have to be made. The programmer could also prescribe these concretely to the computer (example room temperature: “If the temperature rises above 24 °C, then cool down to 20 °C”).
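The hand-written rule from the example can be sketched in a few lines of Python. This is only an illustration: the function name `rule_based_cooling` and the constant names are assumptions, and only the two temperatures come from the text.

```python
from typing import Optional

THRESHOLD_C = 24.0  # temperature above which cooling starts (from the example)
TARGET_C = 20.0     # temperature to cool down to (from the example)


def rule_based_cooling(temperature_c: float) -> Optional[float]:
    """Return the target temperature if cooling should start, otherwise None."""
    if temperature_c > THRESHOLD_C:
        return TARGET_C
    return None


print(rule_based_cooling(25.5))  # cooling is triggered
print(rule_based_cooling(22.0))  # no action needed
```

Every additional situation (time of day, server load, outside temperature) would need another hand-written branch of this kind.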

Reinforcement learning is meant to avoid having to formulate such a chain of if-then conditions. On the one hand, writing all the rules by hand may simply be impossible in many use cases, such as autonomous driving, since the programmer cannot foresee every eventuality. On the other hand, the hope is that these models will discover new strategies for complex problems that a human might not come up with at all.

How Reinforcement Learning Works

Reinforcement learning models should be trained to make a series of decisions independently. Suppose we want to train such an algorithm, the agent, to play the game Pac-Man as successfully as possible. The agent starts at an arbitrary position in the game field and has a limited number of possible actions it can perform. In our case, these would be the four directions (up, down, right, or left) that it can go on the playing field.

The environment in this game consists of the playing field and the moving ghosts, which the agent must avoid. After each action, for example moving up, the agent receives direct feedback, the reward. In Pac-Man, this is either gaining points or running into a ghost. A reward can also be delayed: an action may only pay off one or two actions later. For the agent, future rewards are worth less than immediate ones, which is usually modeled by discounting them.
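The idea that later rewards count for less is commonly expressed with a discount factor. The following is a minimal sketch and not taken from the article: the function name `discounted_return`, the factor `gamma = 0.9`, and the reward values are assumptions chosen for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum a sequence of rewards, weighting the reward at step t by gamma**t."""
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (gamma ** step) * reward
    return total


# The same 10 points are worth less the later they arrive.
immediate = discounted_return([10, 0, 0])  # reward right after the action
delayed = discounted_return([0, 0, 10])    # reward two actions later
print(immediate, delayed)
```

With these assumed values, the delayed reward contributes roughly 8.1 instead of 10, so the agent prefers the quicker payoff when everything else is equal.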

Over time, the agent develops a so-called policy, i.e. a strategy of actions that promise the highest long-term reward. In the first rounds, the algorithm selects completely random actions, since it has not yet been able to gain any experience. Over time, however, a promising strategy emerges.
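How a policy can emerge from initially random behavior can be sketched with tabular Q-learning, one common reinforcement learning algorithm. The article does not name a specific algorithm, so this choice is an assumption. The environment is also an assumption: a tiny one-dimensional corridor with a ghost at one end and a pellet at the other stands in for the Pac-Man field. All names and parameter values (`train`, `alpha`, `gamma`, `epsilon`) are hypothetical.

```python
import random

N_CELLS = 5            # positions 0..4 in a one-dimensional corridor
GHOST, PELLET = 0, 4   # terminal cells: ghost on the left, pellet on the right
ACTIONS = (-1, +1)     # move one cell left or right
START = 2              # the agent always starts in the middle


def step(position, action):
    """Apply an action and return (next_position, reward, done)."""
    next_position = position + action
    if next_position == GHOST:
        return next_position, -10.0, True
    if next_position == PELLET:
        return next_position, 10.0, True
    return next_position, 0.0, False


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    for _ in range(episodes):
        position, done = START, False
        while not done:
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(position, a)])
            next_position, reward, done = step(position, action)
            # Value of the best follow-up action (zero after a terminal step).
            best_next = 0.0 if done else max(q[(next_position, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(position, action)] += alpha * (target - q[(position, action)])
            position = next_position
    return q


def greedy_policy(q):
    """For every non-terminal cell, pick the action with the highest value."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(1, N_CELLS - 1)}


q_values = train()
print(greedy_policy(q_values))
```

After training, the greedy policy moves right, toward the pellet, from every non-terminal cell. The occasional random action (`epsilon`) is what lets the agent find the pellet in the first place.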

Differences between Machine Learning Methods

In the field of machine learning, a distinction is made between a total of four different learning methods:

  1. Supervised learning algorithms learn relationships using a dataset that already contains the label that the model should predict. However, they can only recognize and learn structures that are contained in the training data. Supervised models are used, for example, in the classification of images. Using images that are already assigned to a class, they learn to recognize relationships that they can then apply to new images.
  2. Unsupervised learning algorithms learn from a dataset, but one that does not yet have these labels. They try to recognize their own rules and structures in order to be able to classify the data into groups that have the same properties as far as possible. Unsupervised learning can be used, for example, when you want to divide customers into groups based on common characteristics. For example, order frequency or order amount can be used for this purpose. However, it is up to the model itself to decide which characteristics it uses.
  3. Semi-supervised learning is the mixture of supervised learning and unsupervised learning. The model has a relatively small data set with labels available and a much larger data set with unlabeled data. The goal is to learn relationships from the small amount of labeled information and test those relationships in the unlabeled data set to learn from them.
  4. Reinforcement learning differs from the previous methods in that it does not need a prepared training dataset. Instead, the model gathers its own experience by acting in an environment and learns via the reward system described above.

Is Reinforcement Learning the Future of Deep Learning?

Reinforcement learning will not replace deep learning in the future. The two sub-areas are closely connected, and reinforcement learning agents are often built on deep neural networks, but they are not the same. Deep learning algorithms are very good at recognizing structures in large data sets and applying them to new, unknown data. Reinforcement learning models, on the other hand, learn to make decisions without a prepared training data set.

In many areas, machine learning and deep learning models will continue to be sufficient to achieve good results. The success of reinforcement learning, on the other hand, means that areas of artificial intelligence can now be opened up that were previously out of reach. There are also applications, such as stock trading, where reinforcement learning may replace conventional deep learning models if it delivers better results.

In stock trading, attempts have been made to learn from past market data how to identify and trade promising stocks. However, it can be much more promising to train a reinforcement learning algorithm to develop a concrete trading strategy that does not depend solely on patterns in past data.

This is what you should take with you

  • Reinforcement learning is a learning method in the field of machine learning.
  • It refers to models that are trained to make a sequence of decisions that promises the highest possible long-term reward.
  • Reinforcement learning is used, for example, to teach computers to play games or to make the right decisions in autonomous driving.
