Residual Neural Networks (ResNets) are a special type of neural network used primarily in image processing. They are characterized by very deep architectures that nevertheless achieve low error rates.
What architecture has been used in image recognition so far?
After the great success of a Convolutional Neural Network (CNN) at the ImageNet competition in 2012, CNNs became the dominant architecture in computer vision. The approach is loosely modeled on how human vision works: when we see an image, we automatically split it into many small sub-images and analyze them individually. By assembling these sub-images, we process and interpret the image. How can this principle be implemented in a Convolutional Neural Network?
The work happens in the so-called Convolution Layer. We define a filter that determines how large the sub-images we look at should be, and a stride that decides how many pixels we move the filter between calculations, i.e. how close the sub-images are to each other. This step already reduces the dimensionality of the image considerably.
The next step is the Pooling Layer. From a purely computational point of view, much the same happens here as in the Convolution Layer, with the difference that, depending on the application, we only keep either the average or the maximum value of each region. This condenses the image while preserving, in a few pixels, the small features that are crucial for solving the task.
Finally, the Convolutional Neural Network ends in a Fully-Connected Layer, as we already know it from regular neural networks. Now that the dimensions of the image have been greatly reduced, we can afford these densely connected layers. Here, the individual sub-images are linked together again in order to recognize the connections and carry out the classification.
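As a rough illustration, the three building blocks described above can be sketched in a few lines of Keras code. The layer sizes and the input shape are purely illustrative and not taken from any specific model:

```python
import tensorflow as tf

# Minimal CNN sketch: convolution -> pooling -> fully-connected classification.
model = tf.keras.Sequential([
    # Convolution layer: 32 filters of size 3x3 slide over the image with stride 1
    tf.keras.layers.Conv2D(32, (3, 3), strides=1, activation="relu",
                           input_shape=(224, 224, 3)),
    # Pooling layer: keep only the maximum value of each 2x2 region
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    # Fully-connected part: flatten the reduced feature maps and classify
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```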
What is the problem with deep neural networks?
In order to achieve better results, the architectures used became deeper and deeper: several CNN blocks were simply stacked on top of each other in the hope that this alone would improve performance. However, deep neural networks run into the problem of the so-called vanishing gradient.
A network is trained via so-called backpropagation. In short, the error travels through the network from the back to the front. In each layer, the gradient is calculated to determine how much the respective neuron contributed to the error. However, the closer this process gets to the initial layers, the smaller the gradient can become, so that the weights of neurons in the front layers are adjusted only very slightly or not at all. As a result, deep network structures often show a comparatively high error.
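A rough back-of-the-envelope illustration of this effect: if every layer contributes a factor of at most 0.25 to the gradient (the maximum derivative of the sigmoid function, ignoring the weights), the gradient reaching the first layers shrinks exponentially with depth:

```python
# Illustrative only: the gradient at the early layers is (roughly) a product of
# per-layer factors; with sigmoid activations each factor is at most 0.25.
max_sigmoid_derivative = 0.25

for depth in (5, 10, 20, 50):
    gradient_scale = max_sigmoid_derivative ** depth
    print(f"{depth:2d} layers -> gradient scaled by at most {gradient_scale:.2e}")
```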
In practice, however, we cannot make it that easy for ourselves and simply blame the decreasing performance on the vanishing gradient problem alone. In fact, it can be handled relatively well with so-called batch normalization layers. The worse performance of deeper neural networks can also be due to the initialization of the layers or to the optimization function.
How do residual neural networks solve the problem?
The basic building blocks of a residual neural network are the so-called residual blocks. The core idea is that so-called "skip connections" are built into the network. These ensure that the activation of an earlier layer is added to the output of a later layer.
This architecture allows the network to effectively skip certain layers, especially if they do not contribute anything to a better result. A residual neural network is composed of several of these residual blocks.
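In the notation of the original ResNet paper, a residual block therefore computes (in simplified form, ignoring dimension matching):

```latex
y = \mathcal{F}(x, \{W_i\}) + x
```

Here, x is the input of the block, F is the residual function learned by the stacked layers, and y is the block output. If the skipped layers contribute nothing useful, F can simply learn to output zero, and the block passes x through unchanged.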
What problems can arise with ResNets?
Especially with Convolutional Neural Networks, it naturally happens that the dimensionality at the beginning of the skip connection does not match that at its end. This is especially the case if several layers are skipped, since in Convolutional Neural Networks the dimensionality is changed in each block by the filters. The skip connection then cannot simply add the input of the earlier layer to the output of the later layer.
To solve this problem, the input on the skip path can be multiplied by a linear projection that aligns the dimensions. In many cases, a 1×1 convolutional layer is used for this purpose. However, it can also happen that no alignment of dimensions is necessary at all.
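A minimal Keras sketch of such a projection shortcut could look as follows; the layer sizes and strides are only illustrative assumptions:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(32, 32, 64))

# Main path: changes the number of channels from 64 to 128 and halves the resolution
x = tf.keras.layers.Conv2D(128, (3, 3), strides=2, padding="same", activation="relu")(inputs)
x = tf.keras.layers.Conv2D(128, (3, 3), padding="same")(x)

# Skip path: a 1x1 convolution with the same stride projects the input to 128 channels
shortcut = tf.keras.layers.Conv2D(128, (1, 1), strides=2, padding="same")(inputs)

# Both tensors now have the shape (16, 16, 128) and can be added
outputs = tf.keras.layers.Activation("relu")(tf.keras.layers.Add()([x, shortcut]))
model = tf.keras.Model(inputs, outputs)
```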
How to build a ResNet block in TensorFlow?
A ResNet block is relatively easy to program in TensorFlow, especially if you ensure that the dimensions are the same when merging.
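One possible way to build such a block with the Keras functional API is sketched below; the input shape and the dropout rate are illustrative assumptions, while the layer sizes follow the description after the code:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(64,))  # illustrative input dimension

# The input first passes through a dense layer with 1024 neurons
x = tf.keras.layers.Dense(1024, activation="relu")(inputs)

# Block of one dropout layer and two dense layers: 1024 -> 512 -> 1024
block = tf.keras.layers.Dropout(0.3)(x)
block = tf.keras.layers.Dense(512, activation="relu")(block)
block = tf.keras.layers.Dense(1024, activation="relu")(block)

# Skip connection: both tensors have 1024 units, so they can simply be added
outputs = tf.keras.layers.Add()([x, block])
model = tf.keras.Model(inputs, outputs)
model.summary()
```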
In this case, the input first passes through a dense layer with 1024 neurons. This is followed by a block consisting of a dropout layer and two dense layers, which first reduce the number of neurons to 512 before it is increased again to 1024. Then the merging takes place in the Add layer. Since both inputs have a dimensionality of 1024, they can be added without any problems.
How does the training of a ResNet model work?
The training of a residual neural network follows the standard process known from deep neural networks, with the difference that the skip connections must be taken into account. This special feature makes it possible to train even larger and deeper networks.
According to the principle of backpropagation, the training data is fed into the model and passes through all layers up to the output layer. The output of the model is then compared with the actual label of the data set in order to calculate the loss. This error is then passed through the network structure from back to front using the gradient method to adjust the weights of the neurons so that the error is minimized.
So far, this process is similar to that of a normal deep neural network. However, the skip connections of the ResNet ensure that the gradient information can flow more easily through the network, as it can bypass layers whose neurons would otherwise suffer from a vanishing gradient and hinder the learning process. Without the skip connections, such layers would stall the learning process, as the weights in front of them could hardly be updated.
In addition to standard training methods such as backpropagation and stochastic gradient descent (SGD), residual neural networks can of course also be combined with techniques such as dropout layers or batch normalization. These further improve performance and help prevent overfitting.
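As a minimal sketch, such a training run could look as follows in Keras, assuming `model` ends in a softmax classification layer and `x_train`/`y_train` are an already prepared dataset:

```python
import tensorflow as tf

# Standard Keras training loop with SGD; hyperparameters are illustrative.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.1)
```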
In summary, the training process of residual neural networks is very similar to that of normal deep neural networks. The data is fed into the model and passed through it so that a loss can be calculated. This loss then travels through the network from back to front, and the weights are updated using the gradient. The only difference is that the gradient can bypass layers via the skip connections, which leads to faster convergence and higher accuracy.
What are the advantages and disadvantages of using ResNets?
ResNets have become a popular choice for training deep neural networks. Before using them, however, their advantages and disadvantages should be weighed up:
Advantages:
- Improved accuracy: In various benchmark datasets, residual neural networks were able to achieve higher accuracy than conventional deep neural networks.
- Faster convergence: The skip connections ensure that faster convergence can be achieved, as the gradients can flow more directly through the model and vanishing gradients are bypassed. This also leads to faster training and therefore lower costs for the use of resources.
- Better generalization: Due to the structure of ResNets, these models learn more general structures in the data and do not focus on dataset-specific features. This increases the generalization of the model and delivers better results with unseen data.
- Transfer learning: ResNets are a popular choice for transfer learning, where a general model is pre-trained on a large dataset and then fine-tuned to a specific application with a smaller dataset; they have achieved very good results here. A short sketch of this approach follows this list. Pre-trained models of this kind have also become particularly important in the field of natural language processing.
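A minimal transfer learning sketch with a pre-trained ResNet could look like this; the number of target classes and the frozen base model are illustrative assumptions:

```python
import tensorflow as tf

# Load ResNet50 pre-trained on ImageNet, without its original classification head
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False  # freeze the pre-trained weights

# Add a small task-specific head that is fine-tuned on the smaller dataset
inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.resnet50.preprocess_input(inputs)
x = base(x, training=False)
outputs = tf.keras.layers.Dense(5, activation="softmax")(x)  # e.g. 5 target classes
model = tf.keras.Model(inputs, outputs)
```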
Disadvantages:
- Increased complexity: The skip connections lead to a higher complexity of the model, which in turn can be reflected in higher computing requirements and higher memory requirements. It must therefore be decided on a case-by-case basis whether the introduction of these layers is worthwhile or whether the neural network can manage without them.
- Overfitting: With small data sets, residual neural networks can also lead to overfitting, as the complex model structure cannot be sufficiently constrained by the few training instances. Suitable regularization techniques should therefore be used to avoid this.
- Interpretability: As with normal deep neural networks, it is very difficult to interpret large network structures. The branched network makes it impossible to understand exactly how the model arrives at decisions. The skip connections make this interpretability even more difficult.
Residual neural networks offer a powerful and versatile neural network architecture that can lead to good results, especially when building deep neural networks. However, the use of this structure must be re-evaluated from application to application, as its use also has some disadvantages.
This is what you should take with you
- Residual Neural Networks, or ResNets for short, offer a way to train deep neural networks without a high error rate.
- For this purpose, they are composed of many so-called residual blocks, which are characterized by a skip connection.
- The skip connection allows the network to skip one or more layers if they do not improve the result.
Other Articles on the Topic of ResNets
- Here you can find the original paper on Residual Neural Networks: Deep Residual Learning for Image Recognition.