History of the ChatGPT neural network: development path to GPT-4, what will happen in GPT-5
Digital specialists tend to feel that everyone already knows about ChatGPT and actively uses it in their work, or has at least tried to. This is not so: according to a study by Anketolog, out of 2,432 Russians surveyed, 43% had not heard of this neural network at all, and 53% would like to use it. The survey was conducted from April 26 to May 7, 2023. By that time, the neural network had already become famous thanks to its first release and to the updated version based on the GPT-4 language model.
We think that our audience is more familiar with ChatGPT than the people from the study above, since our readers are mainly digital specialists. But few people know how ChatGPT works, and few are familiar with the history of its development. We decided to put together a primer: understanding the principles of the neural network at a fundamental level will help us use it more effectively.
The GPT-4 language model that ChatGPT runs on can be integrated into your online service via an API. Services such as Copy.ai, Writesonic and Gerwin AI are built on the language model. If you have an idea for monetizing ChatGPT, you don’t have to create an expensive website right away: you can create a Telegram bot as an MVP. You can post a task for chatbot developers on Workspace. Workspace is the No. 1 tender platform in the digital field: collect responses and choose the best performer.
How ChatGPT works
Essentially, ChatGPT is a chatbot-style interface that serves as a bridge between the user and the GPT language model. The paid version of the product runs on the GPT-4 language model, and the free version runs on GPT-3.5. To understand how the neural network operates, we will explain how the language models of the GPT family are structured.
All versions of GPT use a causal language modeling approach. That is, they simply select the text word by word based on an analysis of the previous words. More recent versions of T9 do the same thing: they can suggest the next word based on the text already entered. The T9 system owes this capability to the simple language model that is part of it.
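To make the word-by-word idea concrete, here is a minimal sketch of a T9-style suggester. It is an illustration only, not how T9 or GPT is actually implemented: the tiny corpus and the function names `train_bigram`, `suggest_next` and `generate` are made up for this example, and the model looks at just one previous word.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        followers[prev][nxt] += 1
    return followers

def suggest_next(followers, prev_word):
    """Return the most frequent follower of prev_word, or None if it was never seen."""
    counts = followers.get(prev_word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

def generate(followers, start, max_words=5):
    """Extend the text one word at a time, always taking the top suggestion."""
    out = [start]
    for _ in range(max_words):
        nxt = suggest_next(followers, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)

# Hypothetical training text, chosen so that every suggestion is unambiguous.
corpus = "the cat sat on the mat and the cat sat on the sofa"
model = train_bigram(corpus)
print(suggest_next(model, "the"))
print(generate(model, "cat", max_words=3))
```

The generation loop has the same shape in GPT: predict a word, append it, predict again. What changes is how much of the preceding text the prediction takes into account.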
Neural networks of the GPT family operate on the basis of a deep neural network architecture called “Transformer”. This architecture was invented in 2017 at Google Brain, a Google research group that studies and develops neural networks. The creation of the Transformer architecture was a turning point in the design of neural networks and made it possible to create such a “smart” product as ChatGPT.
The advantage of the Transformer architecture is that a neural network based on it processes all the words of a request in parallel rather than one after another, while its layers capture the context and long-range dependencies in the request. In the case of language models, this means that the neural network generates the next word based on all previous words from the query and the connections between them. The less advanced T9 suggests the next word based on only one previous word, which is due to its more primitive architecture.
Like other language models, GPT relies on statistical patterns and has no real understanding or consciousness. It simply predicts the likely next words based on the data it was trained on. During training it read texts, identified patterns and stored them in its parameters for use in subsequent generation.
The design of the Transformer block made it possible to scale neural networks greatly, feeding them huge amounts of data without placing unreasonable demands on computing power. Therefore, with each update, GPT texts become more and more similar to human ones. As the network grows, its answers increasingly read like practical advice.
We will try to analyze the structure of the GPT language model so that everyone can understand it.
How the GPT language model works
GPT consists of an input layer, transformer blocks and a decoder. Essentially, these are separate neural networks that make up models of the Transformer architecture.
What happens in GPT when you enter a query:
The input layer receives the request that we want to process or continue. This layer converts text into numeric vectors called embeddings. The proximity of vectors in vector space reflects the syntactic structure and semantic similarity of words. For example, in one sentence the word “lock” may mean a device into which a key is inserted, and in another it can mean a section of a canal. Embeddings help GPT work out the most likely meaning of a word and generate text based on the semantics of the words in the query. Embeddings also help to establish connections between individual words and capture the syntactic structure of sentences.
Embeddings are processed by several transformer blocks. Each block allows the model to process and capture different aspects of the text, such as semantics, syntax, and context. Each block consists of an attention mechanism and a multilayer perceptron.
The attention mechanism allows the model to focus on specific words in context and take their influence into account when processing the rest of the text. Those embeddings that the neural network identifies as “important” will receive greater weight.
Next, the multilayer perceptron transforms the embeddings using linear operations and nonlinear activation functions. This lets GPT identify complex dependencies between embeddings, which makes coherent, high-quality text more likely.
After the request has passed through all the blocks, GPT uses a decoder to generate the continuation based on the processed embeddings and the weights assigned to them. The decoder calculates the probabilities of possible next words and outputs the most likely ones.
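The steps above can be sketched in a few dozen lines of plain Python. This is a toy under stated assumptions, not GPT's actual implementation: the five-word vocabulary, the random untrained weights and all names (`EMBED`, `attention`, `perceptron`, `next_word_probs`) are invented for illustration. It also leaves out things a real model has, such as learned query/key/value projections, several attention heads, positional information, normalization, and separate weights for every block.

```python
import math
import random

random.seed(0)  # fixed seed so the toy weights are reproducible

VOCAB = ["the", "key", "opens", "lock", "door"]  # hypothetical tiny vocabulary
DIM = 4  # embedding dimension; real models use hundreds or thousands

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def softmax(xs):
    """Turn arbitrary scores into positive numbers that sum to 1."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

EMBED = rand_matrix(len(VOCAB), DIM)  # input layer: one vector per word
W_IN = rand_matrix(8, DIM)            # perceptron, first linear layer (4 -> 8)
W_OUT = rand_matrix(DIM, 8)           # perceptron, second linear layer (8 -> 4)

def attention(vectors):
    """Causal attention: position i is a weighted mix of positions 0..i."""
    out = []
    for i, query in enumerate(vectors):
        visible = vectors[: i + 1]  # later words are hidden from this position
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(DIM)
                  for key in visible]
        weights = softmax(scores)   # "important" embeddings get larger weights
        out.append([sum(w * vec[d] for w, vec in zip(weights, visible))
                    for d in range(DIM)])
    return out

def perceptron(vec):
    """Linear layer, nonlinear activation (ReLU here), linear layer."""
    hidden = [max(0.0, h) for h in matvec(W_IN, vec)]
    return matvec(W_OUT, hidden)

def next_word_probs(words, n_blocks=2):
    vectors = [EMBED[VOCAB.index(w)] for w in words]       # text -> embeddings
    for _ in range(n_blocks):                               # transformer blocks
        vectors = [perceptron(v) for v in attention(vectors)]
    logits = matvec(EMBED, vectors[-1])  # decoder: score every vocabulary word
    return softmax(logits)               # scores -> probabilities

probs = next_word_probs(["the", "key", "opens"])
best = max(range(len(VOCAB)), key=lambda i: probs[i])
print(VOCAB[best], [round(p, 3) for p in probs])
```

Because the weights are random, the suggested word is meaningless. Training is what adjusts the numbers in `EMBED`, `W_IN` and `W_OUT` until the probabilities match real text.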
Let's look at how the GPT language model evolved and at what point the ChatGPT service appeared on its basis.
History of ChatGPT development
GPT-1
A year after Google Brain unveiled its Transformer architecture, OpenAI released the paper “Improving Language Understanding by Generative Pre-Training” and the first version of their language model, GPT-1. The first version of the model was not publicly available - it was an internal development of OpenAI. GPT-1 was an example of OpenAI's innovative approach to machine learning - a method of generative pre-training.
The GPT (Generative Pre-trained Transformer) architecture differed from Google's Transformer architecture in several key aspects:
GPT-1 stacked several blocks that use multi-head attention. This allowed the language model to pay attention to different aspects of the text at the same time, which contributed to a better understanding of context and relationships between words.
Compared to Transformer, GPT-1 used higher-dimensional embeddings. For example, if the embedding dimension is 300, each word is represented as a vector in 300-dimensional space. Each of the three hundred values can be thought of as a characteristic of the word, loosely related to things like semantics, context or syntax. Increasing the embedding dimension allowed GPT-1 to capture the subtle nuances and complex dependencies of words in the text.
Most importantly, GPT-1 used a generative pre-training method. This means that the model was first trained on a large amount of unlabeled data, and then further trained on specific tasks with supervision, that is, on labeled examples. This approach allowed GPT-1 to form a general representation of language and generalize its knowledge. But the main thing is that it became very easy to train such a language model: you load data into it, and it finds the patterns on its own. Then you correct it, and it’s ready for release.
Characteristics of GPT-1:
maximum request size is 512 tokens (tokens can be roughly considered as morphemes in words);
training data - 7,000 books;
number of parameters - about 117 million;
12 layers.
GPT-2
GPT-2 was released in February 2019 - it was the first public version of the language model. GPT-2 was the result of scaling up the GPT-1 language model. Its architecture did not fundamentally change, except that the number of layers was increased to 48 and 40 GB of data was loaded into it, due to which the number of parameters grew more than 10 times. Thanks to this, the neural network learned on its own to answer questions, generate fairly complex essays, and translate texts from language to language with varying degrees of success.
Characteristics of GPT-2:
maximum request size is 1,024 tokens;
training data - 8 million web pages or 40 GB of data;
number of parameters - 1.5 billion;
48 layers.
GPT-3, GPT-3.5 and the release of the first version of ChatGPT
The beta version of GPT-3 was released in June 2020. Even more data was loaded into it, and the number of neural network parameters increased more than 100 times compared to the previous version. With the upgrade, the neural network gained even more skills. It began to work even better with text, learning to give more complex answers in different styles, as well as to write program code and carry out simple mathematical calculations.
Characteristics of GPT-3:
maximum request size is 2,048 tokens;
training data - 570 GB of data;
neural network size - 800 gigabytes;
number of parameters - 175 billion;
96 layers.
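The link between layer counts and parameter counts in these lists can be checked with a common rule of thumb: the transformer blocks hold roughly 12 × layers × (hidden size)² weights. The sketch below is a back-of-the-envelope estimate, not an official formula from OpenAI. The layer counts come from the lists above, while the hidden sizes (768, 1600 and 12288) are not in this article and are assumptions taken from the published model configurations. The function name `approx_params` is made up for the example.

```python
def approx_params(n_layers, d_model):
    """Rough weight count of the transformer blocks: 12 * layers * d_model^2.

    Per block: about 4 * d^2 weights in attention and 8 * d^2 in the perceptron.
    Embeddings, biases and normalization layers are ignored.
    """
    return 12 * n_layers * d_model ** 2

# name: (layers from the lists above, hidden size assumed from published configs)
MODELS = {
    "GPT-1": (12, 768),
    "GPT-2": (48, 1600),
    "GPT-3": (96, 12288),
}

for name, (layers, d_model) in MODELS.items():
    billions = approx_params(layers, d_model) / 1e9
    print(f"{name}: ~{billions:.2f} billion parameters in the blocks")
```

For GPT-2 and GPT-3 the estimate lands close to the totals listed above. For GPT-1 it comes out noticeably lower, because in a small model the embedding table, which this formula ignores, makes up a large share of the parameters.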
Surprisingly, this version of the language model still did not gain worldwide popularity, although its capabilities were close to those of ChatGPT. The reason is the lack of a chatbot interface. GPT-3 was publicly available, but it had to be accessed through an API and was not free. On the other hand, this version of the language model was available in variations with different numbers of parameters: for example, the lightweight Ada version has only 350 million of them.
The revolution came with the InstructGPT language model, also known as GPT-3.5, which became the basis of ChatGPT. The key difference between InstructGPT and GPT-3 is that it was further trained with the help of people who assessed the quality of its answers, a technique known as reinforcement learning from human feedback (RLHF). Its maximum number of tokens also increased to 4,096.
Thanks to the additional training, the neural network became better suited to ordinary people who are not prompt engineers and simply write plain queries. Its power remained approximately the same as in the GPT-3 version. At the same time, open public access, a user-friendly interface and capabilities that had seemed incredible until then made ChatGPT world famous.
GPT-4
GPT-4 is a version of the language model that was released on March 14, 2023. This version became multimodal: it learned to work not only with text, but also with images. The number of parameters of this version is unknown; experts estimate it at approximately 500 billion. The new version of the neural network began to generate even better responses, but because the increase in power was not severalfold, there was no incredible leap in functionality. In my subjective opinion, a neural network is still not capable of replacing a commercial author.
This version of the language model is available in 2 forms: in the paid ChatGPT Plus subscription, and in the chatbot of the Bing search engine from Microsoft. We talked about how to use the updated version of ChatGPT in Bing in the article “ChatGPT Review: What it can do and how to use the neural network effectively.” Microsoft got access to the new neural network model because the company invested more than $10 billion in OpenAI.
On March 23, 2023, OpenAI added support for plugins that further expand the functionality of the service.
What will happen in GPT-5
Sam Altman, CEO of OpenAI, said that the company is not currently training GPT-5. He made the statement on April 13, 2023 at the “The Future of Business with AI” conference. According to Altman, the company is currently working on preventing the threats that may arise from the release of a more powerful version of the neural network. Some expect GPT-5 to become a strong artificial intelligence, smarter than humans. Initially, the release of the new version was planned for early 2024. The release date is currently unknown.