Decoding AI Jargons with Chai

GPT (Generative Pre-trained Transformer) models have revolutionized artificial intelligence by creating human-like text based on input prompts. This article breaks down complex terminology into simple explanations with practical examples to help you understand how these powerful AI systems work.
What is GPT?
GPT stands for Generative Pre-trained Transformer, a type of advanced AI model designed to understand and generate human-like text based on input prompts. As the name suggests, it consists of three core elements:
Generative: It generates new content (text, code, images, etc.)
Pre-trained: It learns from massive amounts of text data before it is ever used for a specific task; the heavy lifting of training happens up front.
Transformer: It uses a special architecture that helps it understand relationships between words; this is where most of the model's real work happens.
GPT models have evolved through several versions (GPT-2, GPT-3, GPT-4, etc.), with each iteration offering greater capabilities and understanding.
The Basic Building Blocks of GPT
Transformers
It's the architecture that powers GPT models. Think of it as the engine that allows the AI to process and generate text.
In simple terms, transformers are components within AI models that process information by paying attention to relationships between different parts of the input data. Unlike earlier AI models that processed text sequentially (word by word), transformers can look at an entire text sequence at once and understand how each part relates to the others.
Encoder and Decoder
The original transformer architecture has two main components:
Encoder: Converts human text into a form the AI can understand deeply.
It's like a translator that takes your words and transforms them into a special AI language that captures the meaning and context.
Decoder: Takes the AI's internal representations and converts them back into human-readable text.
It's the reverse translator that produces the text you actually see.
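If you're curious what this looks like in code, PyTorch ships an implementation of the original encoder-decoder transformer. The sketch below (assuming PyTorch is installed, and using made-up sizes) just instantiates it and pushes random tensors through to show the two halves working together; GPT models, by contrast, keep only the decoder side of this design.

```python
import torch
import torch.nn as nn

# The original encoder-decoder transformer from "Attention Is All You Need".
# GPT models keep only the decoder half of this design.
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

src = torch.rand(10, 1, 512)  # 10 source tokens, batch of 1, 512-dim embeddings
tgt = torch.rand(7, 1, 512)   # 7 target tokens generated so far

out = model(src, tgt)         # encoder reads src, decoder builds representations for tgt
print(out.shape)              # torch.Size([7, 1, 512])
```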
From Words to Numbers: How GPT Processes Text
Tokenization and Vocabulary Size
Before an AI can work with text, it needs to break it down into manageable pieces:
Tokenization is the process of splitting text into smaller units called tokens. These can be words, parts of words, or even individual characters.
For example, the sentence "I love machine learning" might become tokens ["I", "love", "machine", "learning"].
Vocabulary Size refers to the total number of unique tokens the model knows. Think of this as the AI's dictionary - the larger it is, the more nuanced the AI's understanding can be.
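To see tokenization in practice, OpenAI's tiktoken library exposes the tokenizers used by GPT models. A minimal sketch (assuming tiktoken is installed; the exact splits and IDs depend on the encoding you pick):

```python
import tiktoken

# Load the tokenizer used by recent GPT models
enc = tiktoken.get_encoding("cl100k_base")

tokens = enc.encode("I love machine learning")
print(tokens)                               # a list of integer token IDs
print([enc.decode([t]) for t in tokens])    # the text piece each ID maps back to

print(enc.n_vocab)                          # vocabulary size: roughly 100k tokens here
```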
Vectors and Embeddings
Once text is tokenized, it needs to be converted into numbers that the AI can process:
Vectors are lists of numbers that represent tokens mathematically. For instance, the word "king" might be represented as [0.2, 0.8, 0.1, 0.5].
Embeddings take these vectors and place them in a multi-dimensional space where similar words are positioned closer together. In this space:
"Cat" and "kitten" would be near each other
"Hot" and "cold" might be far apart
"River" and "bank" might have a complex relationship
In short, this mathematical representation allows the AI to "understand" relationships between words.
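Here is a toy numpy sketch of that idea. The vectors below are made up purely for illustration (real embeddings have hundreds or thousands of dimensions), but the cosine-similarity comparison is how "closeness" in embedding space is usually measured.

```python
import numpy as np

# Made-up 4-dimensional embeddings, purely for illustration
embeddings = {
    "cat":    np.array([0.90, 0.80, 0.10, 0.00]),
    "kitten": np.array([0.85, 0.75, 0.20, 0.05]),
    "river":  np.array([0.10, 0.00, 0.90, 0.70]),
}

def cosine_similarity(a, b):
    # Close to 1.0 means the vectors point the same way (very similar meaning)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(embeddings["cat"], embeddings["kitten"]))  # high: close in space
print(cosine_similarity(embeddings["cat"], embeddings["river"]))   # low: far apart
```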
Position Encoding
The transformer architecture processes all tokens simultaneously, which creates a problem: it could lose track of word order. Position encoding solves this by adding information about where each token appears in the text.
Consider this pair of sentences:
"Krunal loves Sachita but Sachita does not"
"Sachita loves Krunal but Krunal does not"
These sentences contain identical words but have opposite meanings because of word order. Position encoding ensures the model understands this distinction by adding position information to each token's representation.
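One common scheme is the sinusoidal encoding from the original transformer paper: each position gets a unique pattern of sine and cosine values that is added to the token's embedding. A small numpy sketch:

```python
import numpy as np

def sinusoidal_position_encoding(num_positions, d_model):
    # Each position gets a d_model-long vector of sines and cosines at
    # different frequencies, as in "Attention Is All You Need".
    positions = np.arange(num_positions)[:, np.newaxis]      # (num_positions, 1)
    dims = np.arange(d_model)[np.newaxis, :]                  # (1, d_model)
    angle_rates = 1 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((num_positions, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])                # even dimensions: sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])                # odd dimensions: cosine
    return encoding

pe = sinusoidal_position_encoding(num_positions=8, d_model=16)
print(pe.shape)  # (8, 16): one position vector per token slot
# In a transformer, pe[i] is added to the embedding of the token at position i.
```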
The Attention Mechanism: The Heart of GPT
Self-Attention
Self-attention is the revolutionary mechanism that allows GPT to understand context. For each word in a sentence, self-attention asks: "How much should I focus on every other word (including myself) to understand this word's meaning in this context?"
For example, in "The elephant couldn't cross the bridge because it was too heavy":
When processing "it," the model needs to figure out what "it" refers to
Through self-attention, it focuses heavily on "elephant" (rather than "bridge")
This helps the model understand that "it" refers to "the elephant"
This mechanism is what gives GPT its impressive context awareness.
How Self-Attention Works
Self-attention operates through three key vectors created for each token:
Query vector: Represents what the current word is "asking about"
Key vector: Represents what other words "offer" in response
Value vector: Contains the actual information to be passed along
The process works like this:
Calculate attention scores between words by comparing queries and keys
Convert these scores to weights using the softmax function
Create a weighted sum of value vectors based on these weights
This produces a new representation of each word that incorporates context
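Below is a minimal numpy sketch of that recipe (scaled dot-product attention). The projection matrices are random here just to show the shapes; in a real model they are learned during training.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                       # 4 tokens, 8-dimensional embeddings
x = rng.normal(size=(seq_len, d_model))       # token embeddings (plus position encodings)

# Learned projection matrices in a real model; random here for illustration
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = x @ W_q, x @ W_k, x @ W_v           # query, key, value vectors per token

scores = Q @ K.T / np.sqrt(d_model)           # 1. compare queries with keys
weights = softmax(scores, axis=-1)            # 2. turn scores into attention weights
output = weights @ V                          # 3. weighted sum of value vectors

print(weights.shape)  # (4, 4): how much each token attends to every token
print(output.shape)   # (4, 8): context-aware representation of each token
```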
Softmax Function
The softmax function converts a set of numbers into a probability distribution where all values are between 0 and 1 and sum to 1.
In self-attention, if the raw attention scores for a word with respect to three other words are [5.0, 2.0, 1.0], softmax converts these to roughly [0.94, 0.05, 0.02], indicating:
The first word gets about 94% of the attention
The second word gets about 5% of the attention
The third word gets about 2% of the attention
The name "softmax" comes from the fact that it's a "softer" version of simply taking the maximum value โ it emphasizes the highest values while still considering others.
Multi-Head Attention
Multi-head attention runs the self-attention mechanism multiple times in parallel. Each "head" might learn to focus on different aspects of language:
One might focus on subject-verb relationships
Another might focus on pronouns and their referents
Another might track temporal relationships
This is like having several people read the same text, each paying attention to different aspects, then combining their insights for a deeper understanding.
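A compact sketch of the idea, again with random (illustration-only) projections: split the model dimension into several heads, run the same attention recipe independently in each, then concatenate the results.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 4, 8, 2
d_head = d_model // num_heads                  # each head works in a smaller subspace
x = rng.normal(size=(seq_len, d_model))

heads = []
for h in range(num_heads):
    # Separate projections per head (learned in a real model, random here)
    W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    heads.append(attention(x @ W_q, x @ W_k, x @ W_v))

multi_head_output = np.concatenate(heads, axis=-1)
print(multi_head_output.shape)                 # (4, 8): all heads combined per token
```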
Important Generation Parameters
Temperature
Temperature controls the randomness of the model's outputs:
Low temperature (0.2-0.5): More predictable, conservative outputs
High temperature (0.8-1.0+): More random, diverse, and potentially creative outputs
Think of temperature as the "creativity dial" - higher settings produce more surprising and varied responses, while lower settings keep the AI more focused and predictable.
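Under the hood, temperature divides the model's raw scores (logits) before softmax. A small numpy sketch with made-up logits shows the effect:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5, 0.1])   # made-up scores for four candidate tokens

for temperature in (0.2, 1.0):
    probs = softmax(logits / temperature)
    print(temperature, np.round(probs, 3))

# At temperature 0.2, almost all probability lands on the top token (predictable output);
# at 1.0 the distribution is flatter, so sampling produces more varied text.
```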
Knowledge Cutoff
The knowledge cutoff refers to the date until which the model has been trained on data. Models don't continuously learn from the internet - they have a specific cutoff date for their knowledge.
For example, if a model has a knowledge cutoff of January 2024, it won't know about events that happened after that date unless you tell it about them in your prompt.
Semantic Meaning
GPT's ability to understand semantic meaning involves recognizing the context and relationships between words rather than just analyzing words in isolation. It tries to predict the next word by understanding what makes sense conceptually in the given context.
For example, in "I went to the hospital because I was _____", the model understands that words like "sick," "injured," or "bleeding" would make semantic sense, while "happy" or "delicious" would not.
Conclusion
GPT models represent a remarkable achievement in artificial intelligence. By understanding key concepts like tokenization, embeddings, position encoding, and self-attention, we can better appreciate how these systems work.
At their core, GPT models learn patterns from vast amounts of text and use these patterns to predict what should come next, token by token. This approach, combined with the transformer architecture's attention mechanism, has created AI systems capable of producing human-like text that was unimaginable just a few years ago.
When working with these models, remember that they're not truly "understanding" text as humans do; they're making sophisticated predictions based on patterns they've observed. This knowledge helps us use these tools more effectively and responsibly in our writing, coding, and communication tasks.
To learn more, read the original transformer paper:
Attention Is All You Need (Vaswani et al., Google, 2017)