In this article, we look at the three changes that have significantly improved ChatGPT and other LLMs.

In recent years, Large Language Models (LLMs) have attracted worldwide attention and become central to natural language processing, giving us intelligent systems that understand and produce language better than ever before.


LLMs such as GPT-3, T5, and PaLM have improved dramatically in performance, and they are here to stay because they can do everything from generating human-like text to summarizing long passages of content. Research has repeatedly shown that LLMs perform well when they are large: by training on vast amounts of data, these models learn the grammar, semantics, and pragmatics of human language.


ChatGPT, the popular large language model developed by OpenAI, has grown so fast largely because of advanced techniques such as reinforcement learning from human feedback (RLHF). In RLHF, human judgments are folded into the training loop to improve the model's outputs, and the method is used to fine-tune pre-trained LLMs for tasks such as chatbots and virtual assistants.
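To make the idea concrete, here is a self-contained toy sketch of that feedback loop. The hard-coded `reward` function stands in for a reward model that would normally be learned from human preference rankings, and the plain REINFORCE update stands in for the PPO algorithm used in practice; none of the names or numbers here are OpenAI's.

```python
import math
import random

# Toy "policy": a categorical distribution over three canned replies.
# In real RLHF the policy is a pre-trained LLM.
replies = ["helpful answer", "rude answer", "off-topic answer"]
logits = [0.0, 0.0, 0.0]

def probs(logits):
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reward(reply):
    # Stand-in for a reward model trained on human preference data.
    return {"helpful answer": 1.0, "rude answer": -1.0,
            "off-topic answer": -0.2}[reply]

learning_rate = 0.1
for step in range(500):
    p = probs(logits)
    i = random.choices(range(len(replies)), weights=p)[0]  # sample a reply
    r = reward(replies[i])
    # REINFORCE update: raise the log-probability of rewarded replies
    # (d log p_i / d logit_j = 1{i == j} - p_j).
    for j in range(len(logits)):
        grad = (1.0 - p[j]) if j == i else -p[j]
        logits[j] += learning_rate * r * grad

print({rep: round(q, 3) for rep, q in zip(replies, probs(logits))})
```

After a few hundred steps the policy concentrates almost all of its probability on the reply the reward function favors. Production RLHF also penalizes drift from the pre-trained model with a KL term, but the core idea is the same: nudge the policy toward outputs humans prefer.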


In addition, the pre-trained base models on which ChatGPT and other LLMs are built have improved significantly, mainly due to changes in three areas:


1. Scaling up the model has been shown to be very helpful in improving its performance. Take the Pathways Language Model (PaLM), for example, whose performance benefits greatly from scaled-up few-shot learning. Few-shot learning reduces the number of task-specific training examples needed to adapt the model to a particular application (see the prompting sketch after the next paragraph).


PaLM, scaled to 540 billion parameters and trained on 6,144 TPU v4 chips using the Pathways system, demonstrated the benefits of continued scaling, outperforming a wide range of earlier models by a large margin. Both depth and width are therefore important factors in improving the performance of a base model.
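In this context, few-shot learning usually means in-context examples rather than gradient updates: the task is demonstrated inside the prompt itself, and no weights change. Below is a minimal sketch, assuming a made-up sentiment-classification task; the commented-out `complete` call is a hypothetical placeholder for whatever LLM API is available.

```python
# Few-shot prompting: demonstrate the task with a handful of worked
# examples inside the prompt instead of fine-tuning on labeled data.
examples = [
    ("The movie was a waste of time.", "negative"),
    ("An absolute masterpiece.", "positive"),
    ("I fell asleep halfway through.", "negative"),
]

def build_prompt(examples, query):
    lines = ["Classify the sentiment of each review."]
    for review, label in examples:
        lines.append(f"Review: {review}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_prompt(examples, "Surprisingly fun from start to finish.")
print(prompt)
# response = complete(prompt)  # hypothetical call to an LLM API
```

The larger the model, the better it tends to complete prompts like this from just a few demonstrations, which is exactly the scaling benefit PaLM highlighted.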


2. Another change is increasing the number of tokens used during pre-training. Models like DeepMind's Chinchilla have demonstrated that large language models perform better when given more pre-training data.


Chinchilla is a compute-optimal model. With 70B parameters and four times more training data than Gopher, Chinchilla consistently outperformed Gopher on the same compute budget, and it even beat LLMs like GPT-3, Jurassic-1, and Megatron-Turing NLG. The lesson is that, for compute-optimal training, the number of tokens should scale with the model: double the model size, double the training tokens.
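That rule of thumb translates into a back-of-the-envelope calculation. The sketch below assumes the common approximations that training compute is about 6 * N * D FLOPs (for N parameters and D tokens) and that the compute-optimal ratio is roughly 20 tokens per parameter; both are rough fits, not exact constants.

```python
import math

TOKENS_PER_PARAM = 20  # rough compute-optimal ratio (approximate fit)

def compute_optimal(flops_budget):
    """Split a training-compute budget between model size and data.

    Assumes C ~= 6 * N * D (N = parameters, D = training tokens)
    and D ~= 20 * N at the compute-optimal point.
    """
    # Solve 6 * N * (20 * N) = C  =>  N = sqrt(C / 120)
    n_params = math.sqrt(flops_budget / (6 * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

# A Chinchilla-scale budget: about 6 * 70e9 * 1.4e12 ~ 5.9e23 FLOPs.
n, d = compute_optimal(6 * 70e9 * 1.4e12)
print(f"params ~ {n / 1e9:.0f}B, tokens ~ {d / 1e9:.0f}B")  # ~70B, ~1400B
```

Because tokens scale linearly with parameters at the optimum, doubling the model size means doubling the training tokens, which is the rule stated above (and implies roughly four times the compute).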


3. The third change is the use of clean and diverse pre-training data. This is demonstrated by Galactica, a large language model built to store, combine, and reason about scientific knowledge. Trained on a large corpus of scientific papers, Galactica outperforms GPT-3, Chinchilla, and other models on scientific tasks. Another example is BioMedLM, a domain-specific LLM for biomedical text, which shows a huge performance boost when trained on domain-specific data. The clear takeaway is that pre-training on domain-specific data beats training on generic data for that domain.
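Curating clean, domain-specific pre-training data is largely a filtering problem. The sketch below shows how a raw corpus might be narrowed down to reasonably clean biomedical text; the keyword list and thresholds are illustrative inventions, and real pipelines add trained quality classifiers and deduplication on top of cheap heuristics like these.

```python
import re

# Hypothetical biomedical keyword list, standing in for a proper
# domain classifier.
BIOMED_TERMS = {"protein", "enzyme", "clinical", "genome", "antibody",
                "receptor", "dosage", "pathology"}

def looks_clean(doc: str) -> bool:
    """Cheap quality heuristics: drop very short or markup-heavy text."""
    words = doc.split()
    if len(words) < 50:
        return False
    alpha_ratio = sum(w.isalpha() for w in words) / len(words)
    return alpha_ratio > 0.8  # mostly real words, not menus or markup

def is_biomedical(doc: str, min_hits: int = 3) -> bool:
    tokens = set(re.findall(r"[a-z]+", doc.lower()))
    return len(tokens & BIOMED_TERMS) >= min_hits

def filter_corpus(docs):
    return [d for d in docs if looks_clean(d) and is_biomedical(d)]
```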


Conclusion

The success of LLMs is undoubtedly due to a mixture of factors, including the use of RLHF and the improvements to the pre-trained base models described above. These three changes have greatly affected the performance of LLMs. In addition, GLaM (Generalist Language Model) significantly improves performance by using a sparsely activated Mixture-of-Experts architecture, which expands the model's capacity at a lower training cost. Together, these changes pave the way for even more advanced language models that will continue to make our lives easier.
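The sparsely activated idea behind GLaM can be illustrated in a few lines: a small gating network routes each token to only its top-k experts, so the total parameter count grows with the number of experts while per-token compute grows only with k. The numpy sketch below follows the generic Mixture-of-Experts recipe with top-2 routing; it is not GLaM's actual architecture or code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Each expert is a small feed-forward weight matrix; the gate scores them.
experts = rng.normal(size=(n_experts, d_model, d_model)) * 0.1
gate_w = rng.normal(size=(d_model, n_experts)) * 0.1

def moe_layer(x):
    """Sparsely activated mixture-of-experts for one token vector x."""
    scores = x @ gate_w                 # one score per expert
    top = np.argsort(scores)[-top_k:]   # indices of the top-k experts
    weights = np.exp(scores[top])
    weights /= weights.sum()            # softmax over the chosen experts
    # Only top_k of the n_experts matrices are ever used, so capacity
    # scales with n_experts while compute scales with top_k.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=d_model)
print(moe_layer(token).shape)  # (16,)
```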