What is a Generative Adversarial Network?

A Generative Adversarial Network (short: GAN) is a unique model built from two competing neural networks that improve each other during training. The first model tries to generate data that looks as real as possible, for example, images, and the downstream model in turn takes this data as input and classifies it as authentic or artificial.

The Generative Adversarial Network can be thought of as a game of cops and robbers. The robber tries as best as he can to commit thefts without being caught by the police. The more often he gets caught, the better he gets at remaining undetected, as he learns from his previous mistakes. The police, on the other hand, try to detect as many thefts as possible and improve their skills in finding the robber.

What are generative models?

To understand how GANs are built and how they work, let’s take a step back and look at the general types of models in machine learning. In general, we distinguish between supervised and unsupervised learning, as well as discriminative and generative models.

Let’s say we want to teach a child a new language, for example, English. If we do this according to the principle of supervised learning, we simply give him a dictionary with the English words and the translation into his native language, for example, German. The child will find it relatively easy to start learning and will probably be able to progress very quickly by memorizing the translations. Beyond that, however, it will have problems reading and understanding texts in English because it has only learned the German-English translations and not the grammatical structure of sentences in English.

Das Bild zeigt die verschiedenen Machine Learning Felder im Überblick. — Types of Machine Learning | Source: Autor

According to the principle of unsupervised learning, the scenario would look completely different. We would simply present the child with five English books, for example, and he would have to learn everything else on his own. This is, of course, a much more complex task. With the help of the “data,” the child could, for example, recognize that the word “I” occurs relatively frequently in texts and in many cases also appears at the beginning of a sentence, and draw conclusions from this.

When a supervised learning algorithm is used to assign the training data sets to a class depending on certain criteria, it is called a discriminative model. Unsupervised learning models, on the other hand, are mostly used to generate new artificial examples that have similar characteristics and distributions as the training examples. This is called a generative model.

How do Generative Adversarial Networks work?

Generative Adversarial Networks combine the different types of machine learning by combining two neural networks. Let’s assume we want to train as realistic as possible images of a sunset using a GAN. The images first pass through the so-called generator model.

The generator model tries to get better and better at generating realistic images of sunsets during the training. This model receives no real input from the training data set but so-called random noise. This is a purely random vector of a certain length with randomly generated numerical values. The distribution from which the numbers are drawn is secondary. Simply put, the generator model creates images more or less out of thin air that is supposed to resemble a sunset. Understandably, these will look relatively bad at the beginning.

The images generated in this way are then given to the second neural network, the so-called discriminator model. This network always receives either a real image from the training data set or a randomly generated image from the generator model and must decide for each image whether it is real or artificial.

The Generative Adversarial Network now learns in two steps. The discriminator model is supervised and thus tries to increase the accuracy of its classification by classifying real images as real and artificial ones as artificial. The generator model, on the other hand, is unsupervised and gets feedback on how many of the generated images were “unmasked” as fake and how many images were falsely passed as real. Then the model tries to increase the rate of fake images that were marked as real by the discriminator.

These opposing, or “adversarial, goals of the two networks improve each other and thus lead to an optimal result.

How does the Generative Adversarial Network work by example?

The Generative Adversarial Network consists of two neural networks: the generator and the discriminator. The generator takes a random noise vector as input and outputs an image of a dog. The discriminator takes an image as input and outputs a probability that the image is real (i.e., from a dataset of actual dog images) or fake (i.e., generated by the generator).

During training, the generator is trained to produce images that fool the discriminator into thinking they are real. At the same time, the discriminator is trained to distinguish between real and fake images.

The training process is iterative and includes the following steps:

The generator produces a set of fake dog images with random noise as input.
The discriminator is given a batch of real dog images and the batch of fake images generated by the generator and makes predictions about whether each image is real or fake.
The loss function is calculated for both the generator and the discriminator based on their performance on this batch of images.
The generator is updated to improve its ability to produce images that fool the discriminator.
The discriminator is updated to improve its ability to distinguish between real and fake images.
Steps 1-5 are repeated until the generator can produce realistic-looking images of dogs.

Once training is complete, you can use the generator to produce new images of dogs by simply providing a random noise vector as input. The generator will then output a new, never-before-seen image of a dog.

What are GANs used for?

In many cases, you train the generator and discriminator together, but in the end, it is only the trained generator that is really of interest. There are several use cases for which a generator is useful, such as:

When creating deepfakes, the trained generator is used to create videos of people that did not happen.
The generator can be used to transform two-dimensional drawings into three-dimensional space.
Generative Adversarial Networks also produce high-quality videos of older material that originally had poorer resolution.
GANs are also frequently used in the creation of artificially aged persons from images.

What are the limitations of GANs?

Generative Adversarial Networks (GANs) have attracted a lot of attention in the field of artificial intelligence due to their ability to generate realistic data. However, like any other technology, GANs have their limitations.

One of the main limitations of GANs is their instability during the training process. GANs consist of two neural networks, the generator, and the discriminator. The generating network generates new data, while the discriminating network evaluates how realistic the data is. During training, the two networks compete with each other, which can lead to instability and result in the generator producing poor-quality data or not converging at all.

Another limitation of GANs is the difficulty in generating high-resolution images. GANs often have difficulty producing high-quality images with a resolution of 512×512 or more. This is because the generator network must learn a large number of features to generate high-resolution images, which can be challenging.

GANs also require large amounts of data for training. The generator network must learn from a large dataset to produce high-quality data. This can be a major challenge, especially in cases where data is scarce or where generating new data is expensive or time-consuming.

In addition, GANs may experience mode collapse, where the generator produces a limited number of outputs and ignores other possible variations in the data distribution. This can lead to a lack of diversity in the generated data.

Finally, GANs can be susceptible to bias in the training data, which can cause the generator to produce biased or discriminatory results. This is particularly problematic when GANs are used in sensitive areas such as criminal justice or hiring practices.

In summary, while GANs have made remarkable progress in generating realistic data, they also have significant limitations. As with any technology, understanding and addressing these limitations is essential to the responsible use and development of GANs.

This is what you should take with you

A Generative Adversarial Network is a model that consists of two competing neural networks and tries to generate new, artificial data from a given distribution.
The generator part tries to generate data that looks as real as possible and the discriminator in turn tries to classify it as false.
The Generative Adversarial Network can be used, for example, to improve videos or images with poor quality or to display two-dimensional drawings in three dimensions.