Decoding AI Jargon: The Inner Workings of GPT and Transformer Models 🤖


AI is a hot topic of discussion, but do you understand how models like ChatGPT and Gemini actually operate? Are they truly a threat to humans, or is that just a myth? Let's explore these questions and delve into how OpenAI’s ChatGPT functions.
What does GPT mean?
GPT stands for Generative Pre-trained Transformer. In other words, it generates (predicts) the next most likely word for you, using the patterns learned during pre-training, with the help of the Transformer architecture.
What are Transformers?
In 2017, Google published a paper titled "Attention Is All You Need", which introduced the concept of Transformers. These models capture the semantic meaning of words and power applications like Google Translate and other Natural Language Processing (NLP) systems. Transformers are a specific type of neural network architecture that takes the entire context of a conversation into account, enabling a deeper understanding of the user's intent. GPT took this architecture and applied it to predicting the next token in a sequence.
The architecture uses two modules: an encoder and a decoder.
Encoding Process:
Initially, we take the input data and divide it into tokens. This process essentially transforms your data into a sequence of numerical representations organized in an array. Example:
Input: AI is taking our job :(
The tokenizer breaks the input into pieces and assigns a number to each one (every GPT model uses a different tokenization scheme).
Note: These numbers are not exact; they are shown for illustration purposes only.
In this way, the tokenizer uses a predefined vocabulary where each piece of text is assigned a unique number. This transforms words into numerical representations for further processing. You can experiment with a tokenizer on this website: https://tiktokenizer.vercel.app
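To make this concrete, here is a minimal sketch using OpenAI's open-source tiktoken library (the same kind of tokenizer the site above lets you play with); the exact token IDs you get depend on which encoding a model uses:

```python
# pip install tiktoken  (OpenAI's open-source tokenizer library)
import tiktoken

# cl100k_base is the encoding used by the GPT-3.5 / GPT-4 family
enc = tiktoken.get_encoding("cl100k_base")

tokens = enc.encode("AI is taking our job :(")
print(tokens)              # a list of integer token IDs
print(enc.decode(tokens))  # back to the original text: "AI is taking our job :("
```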
Vector Embedding:
After tokenization, each token is mapped to a vector embedding, which captures its semantic meaning so the model can understand and process the data more effectively. After all, how would anyone know what the number "10421" means on its own? The model needs a representation that connects that number to related words and concepts.
In the above figure, if we move 5 units to the left from France, we discover that French people like to eat croissants. Similarly, if we move the same distance to the left from India, we might land on biscuits. This illustrates how vector embeddings capture relationships between words: related words sit close to each other, and similar relationships point in similar directions.
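Here is a tiny sketch of the idea. The numbers below are invented toy embeddings (real models learn vectors with hundreds or thousands of dimensions), but they show how closeness between vectors can stand in for relatedness between words:

```python
import numpy as np

# Made-up 3-dimensional embeddings purely for illustration;
# real models use vectors with hundreds or thousands of dimensions.
embeddings = {
    "France":    np.array([0.9, 0.1, 0.3]),
    "croissant": np.array([0.8, 0.2, 0.3]),
    "India":     np.array([0.1, 0.9, 0.4]),
    "biscuit":   np.array([0.2, 0.8, 0.4]),
}

def cosine_similarity(a, b):
    """How 'close' two word vectors are in meaning (1 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Related words end up near each other in the embedding space
print(cosine_similarity(embeddings["France"], embeddings["croissant"]))  # high
print(cosine_similarity(embeddings["India"],  embeddings["biscuit"]))    # high
print(cosine_similarity(embeddings["France"], embeddings["India"]))      # lower
```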
Example:
Input 1: The code broke the server.
Input 2: The server broke the code.
After tokenization, these sentences contain exactly the same tokens, so their embeddings alone look identical. How do we differentiate between them when they clearly mean different things?
Positional Encoding:
This is where positional encoding comes in. Positional encoding gives each token a representation of its position in the sequence, because the order of words affects a sentence's meaning. By adding position-dependent values to each token's embedding, it ensures the two sentences above end up with different representations.
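The original Transformer paper does this with fixed sine and cosine waves of different frequencies (many GPT-style models instead learn their position embeddings). A small NumPy sketch of the sinusoidal version, using toy sizes:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding from "Attention Is All You Need"."""
    positions = np.arange(seq_len)[:, None]        # shape (seq_len, 1)
    dims = np.arange(d_model)[None, :]             # shape (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates               # a different wave per dimension
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])          # sine on even dimensions
    pe[:, 1::2] = np.cos(angles[:, 1::2])          # cosine on odd dimensions
    return pe

# Toy token embeddings: 6 tokens, each a vector of size 8
token_embeddings = np.random.rand(6, 8)
# Adding the encoding makes the same token look different at different positions
with_positions = token_embeddings + positional_encoding(6, 8)
```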
Self-Attention Mechanism:
Next, the tokens pass through the self-attention mechanism, where every token in a sentence interacts with every other token to refine its meaning based on the context of the entire sentence.
Example:
Input 1: The river Bank
Input 2: The HDFC Bank
In these examples, the word "Bank" has different meanings, and it is the surrounding words ("river" vs. "HDFC") that let the model interpret each sentence correctly.
This self-attention is applied through multi-head attention, which runs several attention computations in parallel so the model can look at the same sentence from different perspectives. This process is GPU-intensive.
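Below is a minimal NumPy sketch of a single attention head using the scaled dot-product formula from the paper; multi-head attention simply runs several of these in parallel with different weight matrices. The token count, embedding size, and random weights here are toy assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product attention for a single head (toy sketch)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # how much each token attends to the others
    weights = softmax(scores, axis=-1)         # one probability distribution per token
    return weights @ V                         # each token becomes a context-aware mix

# Toy example: 3 tokens (e.g. "The", "river", "Bank"), embedding size 4
np.random.seed(0)
X = np.random.rand(3, 4)
Wq, Wk, Wv = (np.random.rand(4, 4) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)     # (3, 4): new, context-aware vectors
```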
Once this is complete, it moves to the decoder part.
Decoding Process:
The decoder begins the process of inference, generating predictions for the next word in a sequence. For example, when you input a message to GPT, such as "How are you?", the model predicts the next word by evaluating several possible options and their associated probabilities.
This is achieved through a final linear layer followed by the softmax function, which turns the model's scores into probabilities; the word with the highest probability is then selected and appended to the input sequence. For instance, given the input "How are you?", the model might predict "I" as the next word, resulting in "How are you? I".
This process repeats iteratively, adding words one by one, such as "How are you? I am", until a complete response is generated. This is how GPT operates, using mathematical models and algorithms.
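Here is a toy sketch of that loop. The vocab list and toy_model function are hypothetical stand-ins (a real GPT computes the scores with its full Transformer stack), but the generate-append-repeat structure is the same:

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# Hypothetical stand-ins purely for illustration: a tiny vocabulary and a
# fake "model" that returns one score (logit) per vocabulary word.
vocab = ["How", "are", "you", "?", "I", "am", "fine", "."]

def toy_model(tokens):
    # A real GPT computes these scores with its Transformer layers;
    # here we return random numbers just so the loop runs end to end.
    rng = np.random.default_rng(len(tokens))
    return rng.normal(size=len(vocab))

sequence = ["How", "are", "you", "?"]           # the user's prompt
for _ in range(4):                              # generate 4 more tokens
    probs = softmax(toy_model(sequence))        # linear layer + softmax -> probabilities
    next_token = vocab[int(np.argmax(probs))]   # greedy: pick the most likely token
    sequence.append(next_token)                 # append it and feed everything back in
print(" ".join(sequence))
```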
AI = Data + Algorithm.
Summary
GPT is a type of AI model that helps predict what word should come next in a sentence. It uses a special system called a "Transformer" to understand the context of the words you type. First, it breaks down your input into smaller parts called tokens, which are like pieces of a puzzle. Then, it figures out the meaning of these tokens and how they relate to each other.
When you ask GPT a question or give it a prompt, it uses this understanding to guess the next word, one at a time, until it forms a complete response. This process involves a lot of math and algorithms, but to you, it feels like magic because it can generate human-like text based on your input.
Written by

Ashish Dabral
I am a student who loves writing technology articles and learning in public.