Decoding AI Jargons with Chai

Table of contents
- 🔄 What Is a Transformer?
- 🧩 Step 1: Tokenization (Making Language Machine-Readable)
- 🧭 But Wait! What About Word Order?
- 🧠 Self-Attention: Letting Words "Talk" to Each Other
- 🔄 Enter GPT: Predicting the Next Token
- 🔥 What Does Temperature Do?
- 🧪 Training vs. Inference: Two Phases of an AI Model
- 🌐 But What About Real-Time Info Like Weather?
- 📚 What's Vocab Size?
- 🔚 Final Step: Decoding the Response
- 🎉 Bringing It All Together
Ever wondered how ChatGPT or other AI tools magically understand what you're typing and respond almost like a human? It's not magic—it's math, models, and a lot of machine learning. One of the most important building blocks behind these tools is a special model called a Transformer.
Let’s break it down in simple terms.
🔄 What Is a Transformer?
A Transformer is a type of model that reads and understands entire sentences all at once, rather than going word by word like earlier models did.
Imagine reading a whole paragraph before trying to answer a question about it. That’s what makes Transformers powerful—they understand context as a whole, not just in bits and pieces.
🧩 Step 1: Tokenization (Making Language Machine-Readable)
When you type something like:
“What’s the weather like in Delhi today?”
An AI doesn’t understand this directly. The first step is to convert your sentence into tokens—small chunks of text that the machine can work with.
This process is called tokenization, and it’s performed by the model’s tokenizer.
Think of tokens as words or even parts of words, depending on the model.
These tokens are specific to the AI model you're using.
Different models = different token definitions.
Once tokenized, these tokens are converted into vector embeddings—basically, numerical representations of words. These vectors capture the semantic meaning of the prompt.
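To make this concrete, here is a minimal sketch of tokenization and embedding lookup, assuming a hand-made word-level vocabulary (real models like GPT use learned subword vocabularies such as BPE, and their embedding tables are trained, not random):

```python
import random

# Toy word-level vocabulary (real tokenizers use learned subword vocabularies).
vocab = {"what": 0, "is": 1, "the": 2, "weather": 3, "in": 4, "delhi": 5, "today": 6}

def tokenize(text):
    """Split text into tokens and map each token to its vocabulary id."""
    return [vocab[w] for w in text.lower().replace("?", "").split()]

# Each token id indexes a row in the embedding table: a vector that
# (in a trained model) captures the token's semantic meaning.
random.seed(0)
EMBED_DIM = 4
embedding_table = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)]
                   for _ in range(len(vocab))]

token_ids = tokenize("What is the weather in Delhi today?")
embeddings = [embedding_table[i] for i in token_ids]

print(token_ids)  # [0, 1, 2, 3, 4, 5, 6]
```

Note how the text becomes a list of ids first, and only then a list of vectors—every later step in the model operates on those vectors, never on raw text.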
🧭 But Wait! What About Word Order?
Here’s a catch:
Two different sentences like:
“The cat chased the dog.”
“The dog chased the cat.”
...would produce the exact same set of tokens—and, without any position information, the same vector embeddings—even though the meanings are completely different.
That’s where positional encoding comes in. It helps the model understand the order of words, ensuring that:
- “The cat chased the dog” ≠ “The dog chased the cat.”
It’s like giving each word a GPS coordinate so the AI knows where each word sits in the sentence.
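One common scheme (the sinusoidal encoding from the original Transformer paper) can be sketched in a few lines—this is an illustration of the idea, not the only way models encode position:

```python
import math

def positional_encoding(position, dim):
    """Sinusoidal positional encoding: even indices use sin, odd indices
    use cos, at geometrically spaced frequencies."""
    pe = []
    for i in range(dim):
        angle = position / (10000 ** (2 * (i // 2) / dim))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

# The same word at different positions gets a different positional vector,
# so "cat chased dog" and "dog chased cat" stop being interchangeable.
pe0 = positional_encoding(0, 4)
pe1 = positional_encoding(1, 4)
print(pe0)  # [0.0, 1.0, 0.0, 1.0]
```

The positional vector is added to each token’s embedding, so the model sees “this word, at this spot.”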
🧠 Self-Attention: Letting Words "Talk" to Each Other
Words often change their meaning based on context. For example:
"Apple" in “I ate an apple” = fruit
"Apple" in “Apple released a new iPhone” = company
Even though the token is the same, the meaning is not.
That’s why we use something called a self-attention model, where tokens interact with each other. This helps them adjust their vector embeddings based on their surroundings—just like how humans interpret meaning from context.
To make this even more effective, models use a multi-head attention mechanism. It’s like having multiple “perspectives” or “views” on the same sentence to understand it deeply at different layers.
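The core computation behind self-attention is scaled dot-product attention. Here is a minimal single-head sketch in pure Python, assuming for simplicity that queries, keys, and values all equal the input embeddings (real models first project the input through learned Q/K/V weight matrices):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X for simplicity."""
    d = len(X[0])
    out = []
    for q in X:
        # Scores: similarity of this token's query to every token's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        # New embedding: a weighted mix of every token's value vector,
        # i.e. the token absorbs context from its neighbours.
        out.append([sum(w * v[j] for w, v in zip(weights, X))
                    for j in range(d)])
    return out

# Three toy token embeddings; each output row blends all three inputs.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = self_attention(X)
```

Multi-head attention simply runs several of these in parallel with different learned projections and concatenates the results.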
🔄 Enter GPT: Predicting the Next Token
At the heart of models like ChatGPT (which stands for Generative Pre-Trained Transformer) is a simple goal:
Predict the next token—roughly, the next word or word-piece—in the text.
But don’t mistake “simple” for “easy.”
GPT does this by being trained on massive amounts of data, learning how language works based on context.
Here’s how the process works:
After tokenization and positional encoding, your prompt goes into the multi-head attention model.
It runs multiple rounds of attention to refine its understanding.
The result is passed through a linear layer, which produces a score (a logit) for every token in the vocabulary.
Those scores go through the softmax function, which turns them into probabilities, and the next token is then picked (or sampled) from that distribution.
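The hand-off from logits to probabilities to a chosen token can be sketched like this, using a hypothetical three-word vocabulary and made-up logit values:

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits from the final linear layer, one per vocabulary token.
vocab = ["sunny", "rainy", "cloudy"]
logits = [2.0, 0.5, 1.0]

probs = softmax(logits)
# Greedy decoding: always pick the highest-probability token.
next_token = vocab[probs.index(max(probs))]
print(next_token)  # sunny
```

Greedy decoding is the simplest strategy; real systems often sample from the distribution instead, which is where temperature (below) comes in.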
🔥 What Does Temperature Do?
Ever noticed how AI answers can be either very to-the-point or very creative?
That depends on the temperature setting.
Low temperature → chooses high-probability tokens → short, factual responses
High temperature → explores less likely tokens → more creative, diverse responses
It’s kind of like adjusting the “spice level” in your food—more spice (temperature), more variety!
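Mechanically, temperature just divides the logits before softmax. A quick sketch with made-up logit values shows the effect:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature before softmax:
    low T sharpens the distribution, high T flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.2)  # near-deterministic
hot = softmax_with_temperature(logits, 2.0)   # much closer to uniform

# The top token's probability shrinks as the temperature rises.
print(round(cold[0], 3), round(hot[0], 3))
```

At low temperature the top token dominates (factual, repetitive output); at high temperature the probability mass spreads out, so less likely tokens get picked more often (creative, diverse output).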
🧪 Training vs. Inference: Two Phases of an AI Model
Every AI model has two phases:
Training Phase
This is when the model learns patterns from huge datasets.
During training, the model repeatedly predicts the next token, compares its prediction with the real text, and adjusts its internal weights to reduce the error—over billions of examples, its predictions get better and better.
Inference Phase
This is when you use the model.
By this point, the model already knows how to process language and just applies what it has learned.
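The split between the two phases can be illustrated with a deliberately tiny stand-in: a frequency-counting bigram model instead of a neural network. The analogy is only to the phase structure—learn from data once, then apply the frozen result:

```python
from collections import Counter, defaultdict

# --- Training phase: learn patterns from a (tiny) dataset. ---
corpus = "the cat chased the dog and the dog chased the cat".split()
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1  # record which word tends to follow which

# --- Inference phase: the learned counts are frozen; we just apply them. ---
def predict_next(word):
    """Return the most frequent follower of `word` seen during training."""
    return counts[word].most_common(1)[0][0]

print(predict_next("chased"))  # the
```

A real LLM learns weights via gradient descent rather than counting, but the shape is the same: an expensive one-time learning phase, then cheap repeated use.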
🌐 But What About Real-Time Info Like Weather?
Here’s a fun fact:
Most large language models (LLMs) like GPT can’t access live data like the current weather—unless they are specifically trained on it or connected to tools that can.
This training process is expensive, so full retraining happens only occasionally.
The last time data was fed into the model is called its knowledge cutoff date.
So if you ask an AI:
“What’s the weather in Delhi today?”
...unless it's connected to real-time data, it won’t know. But if you trained it on weather data from yesterday, it could answer about yesterday’s weather just fine.
📚 What's Vocab Size?
The vocab size of a model refers to how many unique tokens it can recognize and generate.
A bigger vocab = more language variety
A smaller vocab = more limitations in expression
🔚 Final Step: Decoding the Response
Once the model picks the best token using softmax, that token is still just a number (a token id).
To turn that number into something humans can read, it goes through a decoder—the reverse of tokenization—which maps token ids back into words, sentences, and paragraphs.
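For the toy word-level vocabulary used earlier, decoding is just a reverse lookup—real tokenizers additionally handle subword merging, whitespace, and special tokens:

```python
# Toy reverse-vocabulary lookup: the decode step maps token ids back to text.
vocab = {"the": 0, "weather": 1, "is": 2, "sunny": 3}
id_to_token = {i: tok for tok, i in vocab.items()}

def decode(token_ids):
    """Join the text of each token id back into a readable string."""
    return " ".join(id_to_token[i] for i in token_ids)

print(decode([0, 1, 2, 3]))  # the weather is sunny
```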
🎉 Bringing It All Together
Here's a quick summary of how your words turn into a smart AI response:
You type something → gets tokenized by the encoder
Tokens → converted into vector embeddings
Embeddings → processed with positional encoding
Tokens talk to each other via self-attention
Multiple layers of multi-head attention refine the context
A linear layer scores every possible next token
Softmax (scaled by temperature) turns those scores into probabilities, and one token is picked
Final token → converted back into text via the decoder
You get a human-like response. Boom. 💥
AI may seem magical on the surface, but underneath, it’s a clever symphony of math, language patterns, and training.
Next time you chat with an AI, remember: it’s a Transformer doing some serious brainwork in milliseconds. ⚡
Written by Rahul Kapoor