
Long Short-Term Memory Networks (LSTM) - simply explained!

The Long Short-Term Memory (LSTM for short) model is a subtype of Recurrent Neural Networks (RNN). It is used to recognize patterns in data sequences, such as those that appear in sensor data, stock prices, or natural language. RNNs are able to do this because, in addition to a value itself, they also take its position in the sequence into account when making predictions.

What are Recurrent Neural Networks?

In order to understand how Recurrent Neural Networks work, we first have to take another look at how regular feedforward neural networks are structured. In these, a neuron of the hidden layer is connected to the neurons of the previous layer and the neurons of the following layer. In such a network, the output of a neuron can only be passed forward, never to a neuron in the same layer or even in a previous layer, hence the name "feedforward".

Figure: Structure of a feedforward neural network
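To make this concrete, here is a minimal sketch of such a forward pass in Python. The layer sizes and random weights are purely illustrative choices, not taken from the article:

```python
import numpy as np

# Information flows strictly forward: input -> hidden -> output.
def feedforward_step(x, W_hidden, W_out):
    h = np.tanh(W_hidden @ x)  # hidden layer only sees the previous layer
    return W_out @ h           # output layer only sees the hidden layer

rng = np.random.default_rng(0)
x = rng.normal(size=3)                    # one input vector with 3 features
W_hidden = 0.5 * rng.normal(size=(4, 3))  # input -> hidden weights
W_out = 0.5 * rng.normal(size=(2, 4))     # hidden -> output weights
print(feedforward_step(x, W_hidden, W_out))
```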

This is different for recurrent neural networks. Here, the output of a neuron can very well be used as input for a previous layer or for its own layer. This is much closer to the way our brain works than the structure of feedforward neural networks. In many applications, the results of the immediately preceding steps are also needed to improve the overall prediction.
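The following sketch shows what this feedback looks like in code: the hidden state computed at one time step is fed back in at the next step (again with purely illustrative sizes and weights):

```python
import numpy as np

# One recurrent step: the previous hidden state feeds back into the layer.
def rnn_step(x_t, h_prev, W_x, W_h, b):
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(1)
W_x = 0.5 * rng.normal(size=(4, 3))  # input -> hidden weights
W_h = 0.5 * rng.normal(size=(4, 4))  # hidden -> hidden (recurrent) weights
b = np.zeros(4)

h = np.zeros(4)                        # initial hidden state
for x_t in rng.normal(size=(5, 3)):    # a sequence of 5 input vectors
    h = rnn_step(x_t, h, W_x, W_h, b)  # h carries context across time steps
print(h)
```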

What Problems do RNNs face?

Recurrent Neural Networks were a real breakthrough in the field of Deep Learning because, for the first time, computations from the recent past were also included in the current computation, which significantly improved results in language processing. Nevertheless, they bring some problems during training that need to be taken into account.

As we have already explained in our article on the gradient method, when training neural networks with gradient descent, the gradient can either take on very small values close to 0 or very large values. In both cases, backpropagation can no longer update the weights of the neurons sensibly: a vanishing gradient means the weights barely change at all, while an exploding gradient produces updates so large that training becomes unstable. Because of the many recurrent connections and the slightly modified form of the backpropagation algorithm used for them, known as backpropagation through time, the probability that these problems occur is much higher than in normal feedforward networks.
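A tiny numeric illustration of the effect: backpropagation through time multiplies the gradient by much the same recurrent factor at every step, so over long sequences it shrinks toward zero or grows without bound (the factors 0.5 and 1.5 below are arbitrary example values):

```python
# Repeated multiplication by the recurrent factor over 50 time steps.
for factor in (0.5, 1.5):
    grad = 1.0
    for _ in range(50):
        grad *= factor
    print(f"factor {factor}: gradient after 50 steps = {grad:.3g}")
# factor 0.5 -> ~8.9e-16 (vanishing), factor 1.5 -> ~6.4e+08 (exploding)
```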

Regular RNNs are very good at remembering short-range contexts and incorporating them into predictions. For example, this allows an RNN to recognize that in the sentence "The clouds are in the ___" the word "sky" is needed to correctly complete the sentence in that context. In a longer sentence, on the other hand, it becomes much more difficult to maintain the context. In the slightly modified sentence "The clouds, which partly flow into each other and hang low, are in the ___", it becomes much harder for a Recurrent Neural Network to infer the word "sky".

How do Long Short-Term Memory Models work?

The problem with Recurrent Neural Networks is that they only have a short-term memory with which to retain previous information in the current neuron, and this ability decreases very quickly for longer sequences. As a remedy, LSTM models were introduced to retain past information for longer.

Put differently, an RNN simply stores all previous data in its "short-term memory". Once that memory runs out, it deletes the longest-retained information and replaces it with the new data. The LSTM model escapes this problem by keeping only selected information in its short-term memory.

For this purpose, the LSTM architecture consists of a total of three different gates (a code sketch of them follows the list):

  1. In the so-called Forget Gate, it is decided which current and previous pieces of information are kept and which are discarded. For this, the hidden state from the previous run and the current input are passed into a sigmoid function, which can only output values between 0 and 1. A value of 0 means that all previous information is forgotten, and a value of 1 means that all of it is kept.
  2. In the Input Gate, it is decided how valuable the current input is for solving the task. For this purpose, the current input and the hidden state from the last run are each multiplied by a weight matrix, and the result determines how much new information is written to the memory.
  3. In the Output Gate, the output of the LSTM model is calculated from the updated memory. Depending on the application, it can be, for example, a word that completes the meaning of the sentence.
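As a rough sketch of how these three gates interact, here is one LSTM step written out with NumPy. The weight layout and sizes are illustrative assumptions; real implementations usually fuse the four transformations into single matrices for efficiency:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One LSTM step; W, U, b hold the parameters of the four internal
# transformations: forget gate, input gate, candidate values, output gate.
def lstm_step(x_t, h_prev, c_prev, W, U, b):
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate: keep vs. discard memory
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate: how much new info to admit
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])  # candidate values for the memory
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate: what to expose as output
    c = f * c_prev + i * g  # new cell state (the long-term memory)
    h = o * np.tanh(c)      # new hidden state (the output of this step)
    return h, c

rng = np.random.default_rng(2)
n_in, n_hid = 3, 4  # illustrative sizes
W = {k: 0.5 * rng.normal(size=(n_hid, n_in)) for k in "figo"}
U = {k: 0.5 * rng.normal(size=(n_hid, n_hid)) for k in "figo"}
b = {k: np.zeros(n_hid) for k in "figo"}

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):  # a sequence of 5 input vectors
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h)
```

Note how the cell state c is only ever scaled and added to, never pushed through a squashing function on its way through time; this is what lets the gradient survive over longer sequences.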

This is what you should take with you

  • LSTM models are a subtype of Recurrent Neural Networks.
  • They are used to recognize patterns in data sequences, such as those that appear in sensor data, stock prices, or natural language.
  • A special architecture allows the LSTM model to decide whether to retain previous information in its short-term memory or discard it. As a result, it can also recognize longer dependencies in sequences.

Other Articles on the Topic of LSTM

  • TensorFlow provides a tutorial on how to use LSTM layers in their models.
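As a minimal starting point before working through that tutorial, a Keras model with an LSTM layer can look like this (the input shape of 10 time steps with 8 features each and the single regression output are placeholder choices):

```python
import tensorflow as tf

# A small sequence model: one LSTM layer followed by a dense output.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(10, 8)),  # 10 time steps, 8 features each
    tf.keras.layers.Dense(1),                       # e.g. a single regression output
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```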