The Generative Pre-trained Transformer 3, GPT-3 for short, is a Deep Learning model from the field of Natural Language Processing that can, among other things, independently compose texts, conduct dialogues, or derive programming code from text descriptions. Like its predecessors, the third version of the model was trained and released by OpenAI.
What is a Generative Pretrained Transformer?
As the name Generative Pre-trained Transformer suggests, GPT-3 is essentially a Transformer model. Transformers are so-called sequence-to-sequence models: they take a sequence of words (or tokens) as input and generate a suitable sequence as output.
The defining feature of this architecture is the so-called attention mechanism. It lets the model weigh which words or tokens within a sequence are important for the task at hand, which allows the context of a sentence to be preserved across many words. This was a completely new approach compared to models previously used for such applications, such as LSTMs.
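The core of the attention mechanism can be illustrated in a few lines. The sketch below shows scaled dot-product attention, the building block used inside Transformers; the matrix shapes and variable names are illustrative, not taken from any GPT-3 implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = Q.shape[-1]
    # Similarity of every query to every key, scaled to stabilize the softmax
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax: each query gets a weight distribution over all tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Output is a weighted mix of the value vectors
    return weights @ V, weights

# Toy example: a sequence of 3 tokens with 4-dimensional vectors
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
output, weights = scaled_dot_product_attention(Q, K, V)
```

Each row of `weights` sums to 1 and tells you how strongly the corresponding token attends to every other token in the sequence, which is exactly the "which words are important" signal described above.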
The different generations of GPT models share largely the same technical design; they differ mainly in their size and in the data sets on which they were trained. The GPT-3 model, for example, uses these data sets:
- Common Crawl includes data from twelve years of web scraping including website data, metadata, and texts.
- WebText2 contains websites that were linked in Reddit posts. As a quality filter, the linking posts must have a Reddit score of at least 3.
- Books1 and Books2 are two datasets consisting of books available on the Internet.
- Wikipedia Corpus contains English Wikipedia pages on various topics.
What can you use it for?
There are various use cases for a GPT-3 model. In addition to pure text creation and continuation, it can, among other things, generate complete computer programs. Here are some example applications that OpenAI mentions on its website:
- Question-answering system: Given a short text as context, the model can generate appropriate answers to a wide variety of questions.
- Grammar correction: Grammatically incorrect English sentences can be corrected.
- Summaries: Longer texts can be condensed into short, concise passages. The difficulty level can also be chosen freely, so that complicated texts can be summarized in the simplest possible language.
- Conversion of natural language into programming code: The GPT-3 model can turn verbal descriptions of algorithms into concrete code. Various languages and applications are supported, such as Python or SQL.
- Marketing copy: The model can also generate appealing, product-specific marketing texts from short, simple product descriptions.
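In practice, all of these use cases are driven the same way: a text prompt is assembled and sent to the model, often with a few demonstration examples ("few-shot prompting"). The sketch below shows how such a prompt for a question-answering task might be built; the function name and Q/A format are illustrative assumptions, not part of any official API.

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: demonstration Q/A pairs followed by the new question.

    The trailing "A:" cues the model to continue the pattern with an answer.
    """
    lines = []
    for question, answer in examples:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
    lines.append(f"Q: {query}")
    lines.append("A:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    [("What is the capital of France?", "Paris"),
     ("What is the capital of Spain?", "Madrid")],
    "What is the capital of Italy?",
)
print(prompt)
```

The resulting string would then be passed as the prompt to the model, which completes the final "A:" line in the style of the demonstrations.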
What are the weaknesses of a GPT-3 model?
Although the GPT-3 model covers a wide range of tasks and performs very well on them, it has a few weaknesses. The two main points mentioned in many articles are:
- The model can currently process only 2,048 tokens (about 1,500 words) in total, covering both input and output. Current research projects are trying to increase this context window further.
- The GPT-3 model has no persistent memory. Each computation and task is handled in isolation, regardless of what the model computed before or after it.
Looking at the use cases from the previous section, one might quickly conclude that this model could replace many human activities in the near future. Although individual results are already very impressive, the model is currently still far from taking over real tasks or jobs. Take programming as an example: few programs fit within roughly 1,500 "words" of output. Even if the code is generated and assembled in several stages, it is rather unlikely that the independently generated building blocks will work together properly.
This is what you should take with you
- The Generative Pretrained Transformer, GPT-3 for short, is a model from OpenAI that is used in the field of Natural Language Processing.
- It can be used, among other things, to convert natural language into programming code, to create faithful summaries of texts, or to build a question-answering system.
- Although progress in this area is amazing, the context size of 2,048 tokens, or about 1,500 words, is currently still a major weakness.
- Via the OpenAI API, the GPT-3 model is publicly accessible and can be used in your own applications.