Recurrent Neural Networks (RNNs) are the third major type of neural network, along with Feedforward Networks and Convolutional Neural Networks. They are used with time series data and sequential information, i.e. data where the previous data point has an influence on the current one. These networks contain at least one layer that is recurrent. In this article, we will take a closer look at what this means.
What are Recurrent Neural Networks?
In order to understand how Recurrent Neural Networks work, we have to take another look at how normal feedforward neural networks are structured. In these, a neuron of the hidden layer is connected with the neurons from the previous layer and the neurons from the following layer. In such a network, the output of a neuron can only be passed forward, but never to a neuron on the same layer or even the previous layer, hence the name “feedforward”.
This is different for recurrent neural networks. Here, the output of a neuron can also be used as input to a previous layer or to the current layer. This is arguably closer to the way our brain works than the structure of feedforward neural networks. In many applications, we also need the results of the steps computed immediately before in order to improve the overall result.
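To make the feedback idea concrete, here is a minimal sketch of a single recurrent layer in NumPy. The names `rnn_forward`, `W_x`, `W_h` and `b`, the tanh activation, and the layer sizes are assumptions chosen for illustration and do not come from a specific library.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    """Run a simple recurrent layer over a sequence and return every hidden state."""
    h = np.zeros(W_h.shape[0])  # initial hidden state: nothing remembered yet
    states = []
    for x_t in inputs:
        # The previous output h is fed back in next to the current input x_t.
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

# Hypothetical sizes: 5 time steps, 3 input features, 4 recurrent neurons.
rng = np.random.default_rng(0)
W_x = rng.normal(scale=0.5, size=(4, 3))  # input -> hidden weights
W_h = rng.normal(scale=0.5, size=(4, 4))  # hidden -> hidden (feedback) weights
b = np.zeros(4)
sequence = rng.normal(size=(5, 3))

states = rnn_forward(sequence, W_x, W_h, b)
print(states.shape)  # one hidden state per time step
```

If `W_h` is set to zero, the feedback disappears and the layer treats every time step independently, just like a feedforward layer would.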
Applications of RNNs
Recurrent Neural Networks are mainly used in natural language processing and for time-series data, i.e. when the information from the immediate past plays a major role. When translating texts, we should also keep the previously processed sequence of words in the “memory” of the neural network, instead of only translating word by word independently.
As soon as we have a proverb or idiom in the text to be translated, we must also take the preceding words into account, since this is the only way we can recognize that it is a proverb. As the applications in language processing become more complex, the context becomes even more important, for example when we want to gather information about a specific person across a text.
Types of RNNs
Depending on how far back the output of a neuron is passed within the network, we distinguish a total of four different types of recurrent neural networks:
1. Direct-Feedback-Network: The output of a neuron is used as the input of the same neuron.
2. Indirect-Feedback-Network: The output of a neuron is used as input in one of the previous layers.
3. Lateral-Feedback-Network: Here, the output of a neuron is connected to the input of a neuron of the same layer.
4. Complete-Feedback-Network: The output of a neuron has connections to the inputs of all (!) neurons in the network, whether in the same layer, a previous layer, or a subsequent layer.
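The four types above can be told apart by looking at where a connection starts and where it ends. The following sketch labels each connection in a small, hypothetical network with three layers. The function name `connection_type`, the `(layer, index)` representation of a neuron, and the assumption that a complete-feedback network also connects each neuron to itself are illustrative choices, not part of the definitions above.

```python
from collections import Counter
from itertools import product

def connection_type(src, dst):
    """Classify a connection from neuron src to neuron dst, each given as (layer, index)."""
    src_layer, _ = src
    dst_layer, _ = dst
    if src == dst:
        return "direct"        # neuron feeds its own input
    if src_layer == dst_layer:
        return "lateral"       # another neuron in the same layer
    if dst_layer < src_layer:
        return "indirect"      # a neuron in a previous layer
    return "feedforward"       # ordinary forward connection

# Hypothetical network: 2 input neurons, 3 hidden neurons, 2 output neurons.
layer_sizes = [2, 3, 2]
neurons = [(layer, i) for layer, size in enumerate(layer_sizes) for i in range(size)]

# Complete feedback: every neuron is connected to every neuron.
complete = Counter(connection_type(s, d) for s, d in product(neurons, repeat=2))
print(dict(complete))
```

A complete-feedback network therefore contains all three other feedback types plus the usual forward connections.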
Problems with Recurrent Neural Networks
Recurrent Neural Networks were a real breakthrough in the field of Deep Learning, as for the first time, the computations from the recent past were also included in the current computation, significantly improving the results in language processing. Nevertheless, during training they also bring some problems that need to be taken into account.
As we have already explained in our article on the gradient method, when training neural networks with the gradient method, the gradient can either take on very small values close to 0 (the vanishing gradient problem) or grow to very large values (the exploding gradient problem). In both cases, the weights of the neurons cannot be updated sensibly during backpropagation: either the weight barely changes at all, or the update becomes so large that training is numerically unstable. Because of the many interconnections in the recurrent neural network and the slightly modified form of the backpropagation algorithm used for it (backpropagation through time), the probability that these problems will occur is much higher than in normal feedforward networks.
Regular RNNs are good at remembering short contexts and incorporating them into predictions. For example, this allows the RNN to recognize that in the sentence “The clouds are in the ___” the word “sky” is needed to correctly complete the sentence in that context. In a longer sentence, on the other hand, it becomes much more difficult to maintain context. In the case of the slightly modified sentence “The clouds, which partly flow into each other and hang low, are in the ___”, it is already significantly more difficult for a Recurrent Neural Network to infer the word “sky”.
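Why recurrent networks are especially prone to this can be seen in a deliberately simplified sketch. It assumes a single neuron with a linear recurrence `h_t = w * h_(t-1)`, which is far simpler than a real RNN. The function name `gradient_through_time` and the chosen weights are purely illustrative.

```python
def gradient_through_time(w, steps):
    """Gradient of h_T with respect to h_0 for the linear recurrence h_t = w * h_(t-1)."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # chain rule: one factor of w per time step
    return grad

for w in (0.5, 1.0, 1.5):
    print(f"w = {w}: gradient after 50 steps = {gradient_through_time(w, 50):.3e}")
```

Because the same weight is multiplied in at every time step, a value slightly below 1 shrinks the gradient towards 0, while a value slightly above 1 makes it grow without bound as the sequence gets longer.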
Long Short-Term Memory (LSTM)
Due to the described problems of classical RNNs, a special form of recurrent neural network was developed, the so-called LSTM models. These are specially designed to include contexts over a longer period of time in the calculation. The main distinguishing feature from conventional RNNs is the so-called cell state. Roughly speaking, this is where the important information that must be preserved at all costs is stored. Various gates are used to carefully decide which data is allowed to “enter” the selected circle of the cell state or must leave it again. In a later article, we will deal with LSTM models in detail.
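As a rough preview, a single LSTM time step can be sketched in NumPy as follows. The sketch follows the commonly used formulation with a forget, input and output gate. The function names `sigmoid` and `lstm_step`, the stacked weight layout of `W` and `b`, and all sizes are assumptions made for this illustration and not the interface of a particular library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4 * hidden, hidden + inputs) and b has shape (4 * hidden,);
    the four row blocks belong to the forget, input and output gates and
    to the candidate values, in that order.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[:hidden])                 # forget gate: what leaves the cell state
    i = sigmoid(z[hidden:2 * hidden])       # input gate: what may enter the cell state
    o = sigmoid(z[2 * hidden:3 * hidden])   # output gate: what is exposed as output
    g = np.tanh(z[3 * hidden:])             # candidate values for the cell state
    c = f * c_prev + i * g                  # updated cell state
    h = o * np.tanh(c)                      # updated hidden state (the output)
    return h, c

# Hypothetical sizes: 3 input features, 4 hidden units, 6 time steps.
rng = np.random.default_rng(0)
hidden, inputs = 4, 3
W = rng.normal(scale=0.5, size=(4 * hidden, hidden + inputs))
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(6, inputs)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)
```

The important detail is the line that updates `c`: the old cell state is only scaled by the forget gate and added to, which makes it easier to carry information across many time steps than in a plain RNN.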
Recurrent Neural Networks are very old by Machine Learning standards and were first introduced back in 1986. For a long time, they, and in particular the LSTM architecture, were the non plus ultra in the field of language processing when it came to maintaining context. However, since 2017 and the introduction of Transformer models and the attention mechanism, this position has fundamentally changed.
Transformers are not only able to consider even longer contexts than LSTM models, but can also take into account the word position within the sentence and include it in the prediction. A major drawback is the very large model size even for simple applications and the resulting resource-intensive training. Nevertheless, Transformers have now replaced Recurrent Neural Networks in most linguistic applications.
This is what you should take with you
- Recurrent Neural Networks differ from Feedforward Neural Networks in that the output of neurons is also used as input in the same or previous layers.
- They are particularly useful in language processing and for time series data when the past context should be taken into account.
- We distinguish four types of RNNs, namely direct feedback, indirect feedback, lateral feedback, and complete feedback.
Other Articles on the Topic of Recurrent Neural Networks
- More information about RNNs can be found on the TensorFlow page.