Deepfakes are synthetic videos, images, or audio files generated with deep learning models. For example, an existing video sequence is taken and falsified by replacing the faces in it. The result is intended to look as realistic as possible, even though it was generated by an ML model. Besides being used in private videos, deepfakes can also be used to spread targeted misinformation.
How are deepfakes made?
Currently, two main types of models are used to produce deepfake videos.
Autoencoders are machine learning models that consist of an encoder and a decoder. Their original purpose is to learn a compressed yet information-rich representation of unstructured data. For example, we can use the same image as both input and target output. The encoder then learns a vector representation of the image (the so-called code) that is as compact as possible while still storing all the important features. The decoder uses this vector to reconstruct the original image. The better the vector representation learned by the autoencoder, the more realistic the generated image.
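To make the encoder, code, and decoder roles concrete, here is a minimal sketch of a linear autoencoder trained with plain gradient descent in NumPy. It is an illustration under simplifying assumptions, not how production deepfake models are built: the "images" are random 16-value vectors, the code has 4 values, and all names (`train_autoencoder`, `w_enc`, `w_dec`) and hyperparameters are made up for this example.

```python
import numpy as np

def train_autoencoder(data, code_size, steps=1000, lr=0.1, seed=0):
    """Train a toy linear autoencoder by gradient descent on the mean squared error."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    w_enc = rng.normal(scale=0.1, size=(n_features, code_size))  # encoder weights
    w_dec = rng.normal(scale=0.1, size=(code_size, n_features))  # decoder weights
    losses = []
    for _ in range(steps):
        code = data @ w_enc    # encoder: input -> compressed code
        recon = code @ w_dec   # decoder: code -> reconstructed input
        err = recon - data
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the mean squared reconstruction error.
        grad_recon = 2.0 * err / err.size
        grad_dec = code.T @ grad_recon
        grad_enc = data.T @ (grad_recon @ w_dec.T)
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, w_dec, losses

# Toy stand-in for images: 200 samples with 16 values each that really only
# depend on 4 hidden factors, so a 4-value code can capture them.
rng = np.random.default_rng(42)
images = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 16))
images /= images.std()

w_enc, w_dec, losses = train_autoencoder(images, code_size=4)
codes = images @ w_enc
print("code shape:", codes.shape)
print(f"reconstruction error: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The input serves as its own training target, which is exactly the setup described above. Real autoencoders for faces use deep convolutional networks instead of two weight matrices, but the data flow is the same.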
A total of two autoencoders are trained for a deepfake. The first model is fed with images or videos of the person who is to be seen in the final product. In most cases, these are celebrities, politicians, or athletes, in our example person A. The second model is trained on images of another person (person B), who provides the facial expressions or gestures to be imitated.
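Once both models are trained, an image of person B is encoded with the encoder of the second model. The resulting vector is then fed into the decoder of the first model, which creates an image that looks like person A but has taken over the movements and facial expressions of person B. For this to work, both models must encode faces into a compatible representation, which is why practical implementations typically train the two decoders on top of a single shared encoder.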
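The swap step described above comes down to combining the encoder of one model with the decoder of the other. The following sketch only shows this data flow. The `Autoencoder` class, the `swap_face` helper, and the 8x8 "frames" are hypothetical, and the weights are random placeholders standing in for trained ones, so the output is not a meaningful image.

```python
import numpy as np

class Autoencoder:
    """Toy linear autoencoder; the random weights stand in for trained ones."""

    def __init__(self, n_pixels, code_size, seed):
        rng = np.random.default_rng(seed)
        self.w_enc = rng.normal(scale=0.1, size=(n_pixels, code_size))
        self.w_dec = rng.normal(scale=0.1, size=(code_size, n_pixels))

    def encode(self, image):
        # Flatten the image and compress it into the code vector.
        return image.reshape(-1) @ self.w_enc

    def decode(self, code, shape):
        # Expand the code vector back into an image of the given shape.
        return (code @ self.w_dec).reshape(shape)

def swap_face(image_b, model_a, model_b):
    """Encode a frame of person B, then decode it with person A's decoder."""
    code = model_b.encode(image_b)               # expression and pose of B
    return model_a.decode(code, image_b.shape)   # rendered with A's decoder

model_a = Autoencoder(n_pixels=64, code_size=16, seed=1)  # would be trained on person A
model_b = Autoencoder(n_pixels=64, code_size=16, seed=2)  # would be trained on person B
frame_b = np.random.default_rng(0).random((8, 8))         # placeholder for a video frame
fake_frame = swap_face(frame_b, model_a, model_b)
print(fake_frame.shape)
```

For a video, this function would be applied frame by frame to the detected face region.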
The so-called Generative Adversarial Networks (GANs) are the second way to train an ML model to create deepfakes. In short, we train two neural networks together. The first, the generator, is trained to produce artificial images that share as many features as possible with the original training images. The second network, the discriminator, tries to tell the artificially created images apart from the original images. So we train two networks that compete against each other, and both get better and better as a result.
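The competition between the two networks can be sketched on a deliberately tiny problem. In this toy example the "images" are single numbers drawn from a normal distribution, the generator is a scaled and shifted noise value, and the second network is a logistic regression that outputs the probability that a sample is real. All names, the learning rate, and the starting values are assumptions chosen for illustration. Because this second network is linear, it can only judge where the samples lie, not their spread, so the sketch only shows the generated samples drifting toward the real ones.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
REAL_MEAN, REAL_STD = 0.0, 1.0   # the "original data" the generator should imitate
lr, batch = 0.05, 64

gen_scale, gen_shift = 1.0, -3.0  # generator starts far away from the real data
disc_w, disc_b = 0.0, 0.0         # second network: P(real) = sigmoid(w * x + b)
initial_shift = gen_shift

for step in range(3000):
    real = rng.normal(REAL_MEAN, REAL_STD, size=batch)
    noise = rng.normal(size=batch)
    fake = gen_scale * noise + gen_shift

    # Second network: learn to output 1 for real and 0 for generated samples
    # (gradient of the binary cross-entropy loss).
    p_real = sigmoid(disc_w * real + disc_b)
    p_fake = sigmoid(disc_w * fake + disc_b)
    disc_w -= lr * (np.mean(-(1 - p_real) * real) + np.mean(p_fake * fake))
    disc_b -= lr * (np.mean(-(1 - p_real)) + np.mean(p_fake))

    # Generator: change its samples so that they are rated as real
    # (gradient of the loss -log P(real) for the generated samples).
    p_fake = sigmoid(disc_w * fake + disc_b)
    grad_fake = -(1 - p_fake) * disc_w
    gen_scale -= lr * np.mean(grad_fake * noise)
    gen_shift -= lr * np.mean(grad_fake)

print(f"generator shift: {initial_shift:.2f} -> {gen_shift:.2f} (real mean: {REAL_MEAN})")
```

In image GANs both players are deep networks and training is considerably more delicate, but the alternating update scheme follows the same pattern.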
What are the types of deepfakes?
Training such models, and therefore creating good deepfakes, is very time-consuming and computationally intensive. Only the great advances in Graphics Processing Units (GPUs) have made this technique accessible to the masses, because they have significantly reduced training costs. Most deepfake files fall into one of the following categories:
- Face Swapping: The face and facial expressions of person A should be projected onto the body of person B. This can even replace the entire body of person B in a video or image with the body of person A.
- Body Puppetry: Movements, gestures, or facial expressions of person A are recorded and then artificially transferred to person B.
- Voice Swapping: A freely written text is spoken as authentically as possible in the voice of a specific person. This method can also be combined with body puppetry, for example.
How can you detect deepfakes?
High-quality deepfakes are difficult or even impossible to detect with the naked eye, especially for people who are new to the topic. In general, there are two approaches to unmasking such fake video or audio files.
The first approach is less concerned with the specific file and more with the circumstances. The following questions can be helpful in dealing with deepfakes:
- Would the person shown really do or say something like that? Is it to be expected that what is shown really happened?
- Can you find other sources, e.g. videos, newspaper articles, etc., that confirm what is shown?
- Can you find other footage of the same scene from a different angle?
If these questions can be answered with “yes”, the risk of falling victim to a deepfake is significantly lower. Beyond that, however, there are even more detailed and technical questions that can provide information about a deepfake:
- Are there typical deepfake features, e.g. a perfectly symmetrical face, crooked glasses, two different earrings, or similar?
- Do the lip movements look human? Do they match the spoken text?
- Does the person blink unusually often or unusually rarely?
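The blinking question can be turned into a rough automated check. The sketch below assumes that some face-tracking tool has already produced an eye-openness value per video frame (close to 0 when the eye is closed). That input, the threshold of 0.2, and the "plausible" range of 8 to 30 blinks per minute are all placeholder assumptions for illustration, not validated detection criteria.

```python
def count_blinks(eye_openness, closed_threshold=0.2):
    """Count how often the eye goes from open to closed in a per-frame series."""
    blinks = 0
    was_closed = False
    for value in eye_openness:
        is_closed = value < closed_threshold
        if is_closed and not was_closed:
            blinks += 1  # a new blink starts on this frame
        was_closed = is_closed
    return blinks

def blinks_per_minute(eye_openness, fps, closed_threshold=0.2):
    """Convert the blink count into a rate, given the video frame rate."""
    duration_minutes = len(eye_openness) / fps / 60
    return count_blinks(eye_openness, closed_threshold) / duration_minutes

def looks_suspicious(rate, plausible_range=(8, 30)):
    """Flag blink rates outside an assumed plausible range."""
    low, high = plausible_range
    return rate < low or rate > high

# Synthetic example: 10 seconds at 30 frames per second with two short blinks.
frames = [0.3] * 300
frames[50:53] = [0.05] * 3
frames[200:203] = [0.05] * 3

rate = blinks_per_minute(frames, fps=30)
print(f"{rate:.1f} blinks per minute, suspicious: {looks_suspicious(rate)}")
```

A conspicuous blink rate is only a hint and should be combined with the other questions above.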
For training purposes, we have linked some videos at the end of the article that are proven deepfakes. There you can test if you would have recognized them right away.
What is the danger of deepfakes?
Deepfakes can pose a threat to us in many areas of everyday life.
For example, these artificial files can be used for so-called CEO fraud. In this case, an employee receives a call that imitates the voice of a superior or a member of management as realistically as possible and is meant to get them to transfer money to fraudsters. When we hear what sounds like the real voice of a colleague or superior, we are unlikely to be as suspicious as when we receive a phishing email containing a malicious link.
Beyond that, the widespread distribution of high-quality deepfakes poses much more serious dangers. These media files can be used to spread targeted disinformation by creating and distributing offensive videos or audio files. Not only does this put individuals in a bad light, but in the worst case, it can even lead to upheaval in society.
This is what you should take with you
- Deepfakes are artificially created media, such as videos, images, or audio files, which have been created using Deep Learning methods.
- They show people in a context or environment for which no original recordings exist.
- Technically, these files are created with the help of so-called autoencoders or generative adversarial networks.
Other Articles on the Topic of Deepfakes
- You can find an interesting video on how to create a deepfake of David Beckham here.