The image above was created using AI. More specifically, it was the third image generated by Stable Diffusion when given the prompt "How GPT3 Works".
It has now been half a year since ChatGPT took the internet and the world by storm, becoming an instant, and highly controversial, success. Hundreds of millions of users have reportedly used the machine learning model to generate enormous amounts of content, for purposes ranging from writing Python code, to outlining a graduate essay, to passing a law exam. The tool is known worldwide for its versatility and the quality of its writing, and it can generate content significantly faster than any human.
As ChatGPT itself puts it, “In a rapidly evolving digital landscape, one technological marvel has captivated the attention of millions and reshaped the way we communicate and interact online: ChatGPT. Born out of the groundbreaking GPT-3.5 architecture developed by OpenAI, ChatGPT has swiftly risen to prominence, becoming a cultural phenomenon in its own right. From answering burning questions to providing creative writing prompts, this AI-powered language model has seamlessly integrated itself into our daily lives, revolutionizing the way we seek information, entertain ourselves, and even converse with virtual companions.”
However, despite its massive popularity, how the machine learning model actually works remains a mystery to many people. Today, we clear up that mystery.
What is ChatGPT’s Goal?
ChatGPT, like many other language models, generates text based on the input it receives from users. In ChatGPT's case, that input comes from someone typing into the chat box. ChatGPT takes this input, processes it through the model it has been trained on, and outputs text.
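To make this concrete, here is a minimal sketch of that input-to-output loop in Python, using the open-source GPT-2 model as a stand-in (ChatGPT's own weights are not public, so this only illustrates the general pattern, not ChatGPT itself):

```python
# A minimal sketch of the input -> model -> output loop, using the
# open-source GPT-2 model as a stand-in for ChatGPT.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

user_input = "How does a language model work?"   # text from the "chat box"
result = generator(user_input, max_new_tokens=40)

print(result[0]["generated_text"])               # the model's continuation
```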
How is ChatGPT Trained?
In layman’s terms, ChatGPT is first trained by looking at an immense sample of text. The idea behind this is that, with enough data, ChatGPT’s algorithms can make statistical generalizations and apply them to make statistics-based predictions when generating text. More specifically, as ChatGPT works through massive amounts of data, it retains many important patterns, such as where words tend to appear in a sentence, which groups of characters form a word, and so on. The way ChatGPT is trained is particularly interesting, as the data it learns from is a combination of internet content and user input.
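For a taste of what "which groups of characters form a word" means in practice, here is a short sketch using tiktoken, OpenAI's open-source tokenizer library. The "gpt2" encoding is chosen purely for illustration:

```python
# A small illustration of tokenization: the model never sees raw text,
# only integer IDs that stand for chunks of characters.
import tiktoken

enc = tiktoken.get_encoding("gpt2")

tokens = enc.encode("ChatGPT generates text one token at a time.")
print(tokens)                              # a list of integer token IDs
print([enc.decode([t]) for t in tokens])   # the character chunk each ID maps to
```

Notice that common words usually become a single token, while rarer words get split into several smaller chunks.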
The second phase of training involves making preliminary predictions. In this phase, ChatGPT is fed some input material and asked to complete it. The first predictions will almost always be wrong. However, ChatGPT learns from its mistakes, automatically updating its parameters to statistically account for them. Gradually, ChatGPT refines its prediction process, becoming more and more consistent the longer it is trained.
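Here is a toy sketch of that predict-check-adjust loop in PyTorch. The real training run is vastly larger and more sophisticated, but the basic mechanics are the same; all of the sizes and data below are made up for illustration:

```python
# A toy "predict the next token, measure the error, adjust the weights" loop.
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32

# A deliberately tiny "language model": embed a token, predict the next one.
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One fake training pair: given token 5, the "correct" next token is 7.
inputs = torch.tensor([5])
targets = torch.tensor([7])

for step in range(100):
    logits = model(inputs)            # scores for every token in the vocabulary
    loss = loss_fn(logits, targets)   # how wrong was the prediction?
    optimizer.zero_grad()
    loss.backward()                   # work out how each weight contributed
    optimizer.step()                  # nudge the weights to do better next time
```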
All in all, the ChatGPT pipeline looks something like this:
Input words > Input number vectors > Magical Algorithms > Output number vectors > Output words
In this pipeline, the most important, but also the most complicated, section is the ‘Magical Algorithms’ step. Although not quite magical, this process is quite perplexing: it passes the input number vectors through 96 transformer decoder layers (which are essentially extremely complicated functions), and the vectors coming out of the final layer are converted back into text that we, humans, are able to understand.
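To tie the whole pipeline together, here is a purely schematic sketch in Python. The helper functions (tokenize, embed, unembed, pick_most_likely) are hypothetical names, not ChatGPT's actual code; they just show where each stage of the pipeline fits:

```python
# Schematic only: the helpers below are hypothetical placeholders, not
# ChatGPT's real implementation. GPT-3 stacks 96 transformer decoder
# layers; everything between embedding and unembedding is the
# "Magical Algorithms" box from the pipeline above.
def generate_next_token(input_text):
    vectors = embed(tokenize(input_text))   # input words -> input number vectors

    for layer in decoder_layers:            # the 96 transformer decoder layers
        vectors = layer(vectors)            # each layer transforms the vectors

    logits = unembed(vectors[-1])           # output vectors -> token scores
    return pick_most_likely(logits)         # token scores -> an output word
```

Generating a full reply is just this step repeated: each new token is appended to the input, and the whole process runs again.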
I hope that this clarifies things!
