
The Quiet Revolution: How LLMs Made a Leap in Development Over 10 Years


In the last decade, the revolutionary emergence of Large Language Models (LLMs) has marked a pivotal juncture in the annals of technological advancement. Unveiling capabilities that blur the lines between artificial and human intellect, these models have ushered in an era of innovation, transforming countless facets of human endeavor. In this article, GVS Chaitanya, a tech-marketing specialist, embarks on a meticulous exploration of LLMs’ evolutionary journey, illuminating the groundbreaking strides and the profound paradigm shifts reshaping the landscape of artificial intelligence.

2010s: From RNNs to GRUs


While the idea behind LLMs can be traced back to the 1960s, when MIT researcher Joseph Weizenbaum created ELIZA, the first ever chatbot, it took decades for it to take the shape of modern AI models. The process accelerated significantly in the 2010s, when the dominant models for language tasks were recurrent neural networks (RNNs). RNNs are designed to handle sequential information, such as the words in a sentence. They process sequence elements one at a time, carrying a hidden state forward as a memory of what has already been processed. This design made RNNs usable for speech recognition and language modelling, but only on relatively small data sets.
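The one-element-at-a-time recurrence described above can be sketched in a few lines of numpy. This is a minimal illustration, not a trained model: the dimensions and random weights are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 3-dimensional inputs, 4-dimensional hidden state.
input_size, hidden_size = 3, 4
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def rnn_step(x, h_prev):
    """One recurrent step: the new hidden state mixes the current
    input with the previous hidden state (the network's memory)."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

# Process a sequence of 5 inputs one element at a time.
h = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):
    h = rnn_step(x, h)

print(h.shape)  # (4,)
```

Because each step reuses the same weights and feeds the previous state back in, long sequences squeeze all earlier context through one fixed-size vector, which is exactly the limitation the later architectures address.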


The RNN architecture became a foundation for further neural network development, bringing machine translation and voice recognition to new heights. Without RNNs, such ubiquitous tools as online translators or voice assistants would not have been possible.


Long Short-Term Memory (LSTM) networks became an important step in the RNN evolution. First introduced in 1997, they addressed plain RNNs’ inability to learn dependencies across long stretches of data. Unlike RNNs, LSTMs have a memory cell that can keep information for an extended period of time. This cell is controlled by input, output and forget gates, which decide what information to add to or remove from the cell state, enabling the model to learn long-term dependencies. This invention made it practical to use recurrent networks for recognising speech and translating languages.
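The gating logic can be made concrete with a minimal numpy sketch of a single LSTM step. The dimensions and random weights are illustrative assumptions, not values from any real model.

```python
import numpy as np

rng = np.random.default_rng(1)
input_size, hidden_size = 3, 4

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, each acting on [x, h_prev] concatenated.
def make_weights():
    return (rng.normal(scale=0.1, size=(hidden_size, input_size + hidden_size)),
            np.zeros(hidden_size))

(W_i, b_i), (W_f, b_f), (W_o, b_o), (W_c, b_c) = (make_weights() for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    i = sigmoid(W_i @ z + b_i)        # input gate: what to add
    f = sigmoid(W_f @ z + b_f)        # forget gate: what to erase
    o = sigmoid(W_o @ z + b_o)        # output gate: what to expose
    c_tilde = np.tanh(W_c @ z + b_c)  # candidate memory content
    c = f * c_prev + i * c_tilde      # update the memory cell
    h = o * np.tanh(c)                # new hidden state
    return h, c

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x in rng.normal(size=(6, input_size)):
    h, c = lstm_step(x, h, c)
```

The key design choice is the separate cell state `c`: because it is updated additively through the forget and input gates rather than rewritten wholesale, information can survive many steps.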


Seventeen years after the first LSTM network appeared in 1997, a streamlined variant, the Gated Recurrent Unit (GRU), came into existence. Instead of LSTM’s three gates and a separate memory cell, a GRU uses two gates – an update gate and a reset gate. They can be trained to keep information from long ago without losing it over many iterations, or to discard information irrelevant to the prediction.
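For comparison with the LSTM step, here is an equally minimal numpy sketch of one GRU step, again with toy dimensions and random weights chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
input_size, hidden_size = 3, 4

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_weights():
    return (rng.normal(scale=0.1, size=(hidden_size, input_size + hidden_size)),
            np.zeros(hidden_size))

(W_z, b_z), (W_r, b_r), (W_h, b_h) = (make_weights() for _ in range(3))

def gru_step(x, h_prev):
    z_in = np.concatenate([x, h_prev])
    z = sigmoid(W_z @ z_in + b_z)  # update gate: how much old state to keep
    r = sigmoid(W_r @ z_in + b_r)  # reset gate: how much old state to consult
    h_tilde = np.tanh(W_h @ np.concatenate([x, r * h_prev]) + b_h)
    return (1 - z) * h_tilde + z * h_prev  # blend new and old state

h = np.zeros(hidden_size)
for x in rng.normal(size=(6, input_size)):
    h = gru_step(x, h)
```

Note that the GRU carries only the hidden state `h` and uses fewer weight matrices than the LSTM, which is why it is generally cheaper to train.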


As was the case with the previous advancements, GRUs opened new opportunities for language model applications. They proved competitive with LSTMs on tasks such as time series prediction and language modelling while being simpler and cheaper to train.

The Rise of Transformers

Just three years after the GRU emerged in 2014, Google researcher Ashish Vaswani and his co-authors published a paper that started an AI revolution. Titled “Attention Is All You Need”, the study introduced the Transformer, a new type of AI model architecture that altered the existing rules. Transformers, in contrast with RNNs, process each word in relation to all the other words in a sentence thanks to a built-in self-attention mechanism. This brought AI’s understanding of context and meaning to a new level and paved the way for the most modern LLMs, such as GPT, the generative pre-trained transformer.
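The “every word in relation to all the other words” idea is scaled dot-product self-attention, which can be sketched in numpy as follows. The sequence length, embedding size, and random projection matrices are toy assumptions for the example.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence X
    (one row per token). Every token attends to every other token."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise token affinities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(3)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out, weights = self_attention(X, W_q, W_k, W_v)
# Each output row is a context-weighted mix of ALL token values,
# computed in one matrix multiply rather than step by step.
```

Unlike the recurrent steps above, nothing here depends on processing tokens in order, which is what makes transformers parallelisable across an entire sequence.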


AI models designed for natural language processing (NLP) arguably became mainstream after the 2018 release of OpenAI’s GPT. This model used the transformer architecture and was trained on an immense corpus of text data, which enabled it to understand and generate text that can hardly be distinguished from human writing. This milestone opened an era of increasingly sophisticated AI models that broadened the scale of AI applications, especially in the sphere of content creation.

Every new version of GPT improved the capacity to generate coherent and contextually relevant text. This trend is indirectly confirmed by the growing number of parameters in the newer models: while GPT-2 had 1.5 billion parameters, GPT-3 reached 175 billion, gaining the ability to perform tasks it wasn’t explicitly trained on.

GPT-3, for example, can generate comprehensive articles from just topics or keywords. It can also power chatbots and virtual assistants for conversing with users in a truly human-like manner, answering questions and completing simple transactions quickly. 

The AI models keep evolving, becoming faster and learning to process images alongside text. They have also gained safety and bias mitigation capabilities, including content filtering algorithms and feedback loops that let them learn from their interactions.


Another breakthrough in AI development was the release of Bidirectional Encoder Representations from Transformers (BERT) in 2018. This is an NLP model that can understand the context of words in search queries thanks to bidirectional transformer training. This method takes into account both the preceding and succeeding context within a sequence, greatly improving predictive accuracy.

BERT and GPT differ primarily in their training approaches and applications. In the process of BERT’s training, portions of the input data get masked in order to train the model to predict these masked words. GPT, on the other hand, learns from previous words, allowing it to perform well in text generation tasks. While BERT is intended to handle tasks that require a thorough understanding of language context, GPT is ideal for producing coherent and contextually relevant text.
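The two training approaches can be contrasted with a small sketch. A GPT-style model sees only earlier positions (a causal mask), while BERT-style training hides random tokens and asks the model to recover them from both sides. The sentence and mask count below are arbitrary illustrative choices.

```python
import numpy as np

tokens = ["the", "model", "predicts", "the", "next", "word"]
n = len(tokens)

# GPT-style causal mask: position i may attend only to positions <= i,
# so the model learns to predict each word from the words before it.
causal = np.tril(np.ones((n, n), dtype=bool))

# BERT-style masked-language-model input: hide a random subset of
# tokens and train the model to recover them using context from
# BOTH directions.
rng = np.random.default_rng(4)
masked = tokens.copy()
for i in rng.choice(n, size=2, replace=False):
    masked[i] = "[MASK]"

print(causal.astype(int))
print(masked)
```

The causal mask is what makes GPT a natural text generator (it always predicts the next token), while the bidirectional masking objective is what gives BERT its strength on understanding tasks.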

Key Trends

Bigger, Faster, Stronger… and More Specialised 

The advent of the transformer architecture drove an unprecedented increase in the size of LLMs, ushering in a new era of generating text indistinguishable from human writing and solving complex problems such as code snippet creation.

As LLMs evolved, so did the development of models tailored to specific industries or tasks, such as medical diagnosis, legal analysis, and financial forecasting. This specialization enables more precise and relevant AI applications across various business sectors.

Fight Against Bias

The proliferation of LLMs has brought to light the growing dangers of automated systems perpetuating societal prejudice. AI algorithms trained on historical data can unintentionally embed and amplify social injustices and inequalities, influencing decision-making in a wide range of areas, including employment, healthcare, and criminal justice.

While communities around the world raise the alarm about the bias problem, transparent AI models and diverse training datasets that would ensure fairness have yet to be developed.

Calls for Regulation

Another risk that has alarmed millions of people around the world is the possibility of AI rendering many existing jobs obsolete. As LLMs penetrate an increasing number of economic sectors, the debate heats up over how to use the technology responsibly so that it does not leave hundreds of millions of people unemployed and destabilize entire countries.

The proposed measures include developing policies that strike a balance between innovation and ethics, as well as protecting individuals’ rights while promoting technological advancement.


As we stand on the precipice of a new era, the evolution of Large Language Models heralds a transformative phase in artificial intelligence. These multimodal marvels, proficient in parsing text, image, and voice, not only redefine speed and quality but also foreshadow an imminent surge in AI’s societal, economic, and personal impact. This monumental trajectory underscores the imperative for ethically grounded, responsible AI development and deployment, ensuring that as we advance technologically, we do so with mindfulness and integrity.
