Decoding AI Jargons with Chai

Greetings to everyone!

This article is for everyone curious about AI jargons—you don't need to be an AI student to understand them.

In today’s world, everyone is trying to use AI to make their work easier. Some are curious to know how it works—how ChatGPT or similar tools generate text that feels like a human is chatting with us. Others start to explore but get stuck on understanding terms like tokenization, knowledge cutoff, etc. They think something magical is happening that they can’t understand—but that’s not true.

Just stay with me till the end, and you’ll have a good grasp of these AI jargons. So, today’s article is all about decoding these terms while enjoying a nice sip of Chai (Tea).


Knowledge Cutoff 🧠📅

Have you noticed that earlier, ChatGPT-like tools didn’t respond well to recent events or research? Sometimes they said things like: "I have knowledge up to July 2022." What does this mean?

It means these tools are trained on data only up to a certain date—they don’t know about anything after that. This is called the Knowledge Cutoff.

And there you go! You now understand your first AI jargon: Knowledge Cutoff. 👏

Now, you might wonder: Why don’t developers update the model daily if it helps give better results?

Great question! But currently, it's not feasible. The data these models train on is scraped from the web and must go through multiple stages before training. Training on huge datasets also requires massive computing resources. So retraining every day just isn’t practical.


How ChatGPT Talks About Current Events 🧠🔍

So, if ChatGPT isn’t updated daily, how does it know about recent events?

The answer: ChatGPT is just a tool. It uses a model behind the scenes. That model may not know about current events, but if we give it the right context, it can answer our questions.

So what ChatGPT does is: it first searches the web, grabs content related to your question, and feeds both the content and your question to the model. Now the model has enough context to respond meaningfully.
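The idea above can be sketched in a few lines of Python. This is a simplified illustration, not ChatGPT's actual pipeline; `search_web` and `model` are hypothetical stand-ins for a real search tool and a real language model.

```python
def answer_with_context(question, search_web, model):
    # search_web: returns a list of text snippets relevant to the question.
    # model: generates an answer from a prompt.
    # Both are placeholders for whatever real components a tool plugs in.
    snippets = search_web(question)
    prompt = "Context:\n" + "\n".join(snippets) + "\n\nQuestion: " + question
    return model(prompt)
```

The key point: the model doesn't learn anything new here. It simply reads the fresh context we paste into the prompt alongside the question.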


Let’s Get Technical (But Easy!) 🤖

Let’s say you ask ChatGPT: “Can you tell me about the future of AI?”

You probably know that computers don’t understand human language like English or Hindi. They understand numbers.

So yes—you guessed it—we need to convert your sentence into numbers!

Step 1: Tokenization 🔤

First, the sentence is broken into smaller chunks. Real tokenizers usually split text into subwords, but for easier understanding, let’s break it into words:

Sentence: “Can you tell me about the future of AI?”

Tokens: [‘Can’, ‘you’, ‘tell’, ‘me’, ‘about’, ‘the’, ‘future’, ‘of’, ‘AI’, ‘?’]

This step is called Tokenization—dividing a sentence into smaller parts called tokens.
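Here is a tiny word-level tokenizer to make this concrete. It's only a sketch: real models use subword schemes like Byte Pair Encoding, not a simple split.

```python
import re

def tokenize(sentence):
    # Grab runs of word characters, or single punctuation marks.
    # (A word-level sketch; real tokenizers split into subwords.)
    return re.findall(r"\w+|[^\w\s]", sentence)

print(tokenize("Can you tell me about the future of AI?"))
# → ['Can', 'you', 'tell', 'me', 'about', 'the', 'future', 'of', 'AI', '?']
```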


Step 2: Vectorization 🔢

Now we need to convert these tokens into numbers. Every token is mapped to a unique ID using the model’s vocabulary (a dictionary built during training):

Example mapping:

  • ‘Can’ → 23

  • ‘you’ → 48

  • ‘tell’ → 97

  • ... and so on

So we get something like: [23, 48, 97, 12, 64, 9, 102, 8, 3, 77]

This is called Vectorization—converting text into machine-readable numbers.

Vocab Size refers to the number of unique tokens the model learned during training. The larger the vocab size, the more distinct pieces of text the model can represent.
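The mapping can be sketched as a plain Python dictionary. The IDs here are illustrative (built on the fly), not the actual numbers a trained model would assign:

```python
# Build a toy vocabulary from the tokens.
# A real model's vocabulary is fixed during training, not built per sentence.
tokens = ['Can', 'you', 'tell', 'me', 'about', 'the', 'future', 'of', 'AI', '?']
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}

ids = [vocab[tok] for tok in tokens]
print(ids)  # one integer ID per token

vocab_size = len(vocab)  # number of unique tokens this toy model "knows"
```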


Step 3: Word Embeddings 🧬

Now, those numbers are still too simple. Each number is converted into a high-dimensional vector (often hundreds or even thousands of dimensions) using embedding techniques.

Example: ‘AI’ → 3 → [0.23, -0.76, 0.14, ..., 0.87]

This vector captures hidden features like “technology,” “intelligence,” “modern,” etc. Similar words (like “robotics” or “machine”) will have similar vectors.
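A minimal sketch of an embedding lookup, plus the cosine similarity that is commonly used to compare vectors. The vectors here are random placeholders; in a real model they are learned during training, which is what makes similar words end up close together.

```python
import random

random.seed(0)
DIM = 8           # toy dimension; real models use hundreds or thousands
VOCAB_SIZE = 10

# A toy embedding table: one vector per token ID.
# Random here; learned in a real model.
embeddings = [[random.uniform(-1, 1) for _ in range(DIM)]
              for _ in range(VOCAB_SIZE)]

def embed(token_id):
    return embeddings[token_id]

def cosine(a, b):
    # Cosine similarity: 1.0 means identical direction, -1.0 opposite.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

vec = embed(3)     # e.g. the vector for the token with ID 3
print(len(vec))    # → 8
```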


Step 4: Positional Encoding 📍

We don’t just care about the words—we care about their order in the sentence.

So, we add Positional Encoding to each word’s vector to represent where it appears in the sentence.

Now we understand not just the word but also where it appears!
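One common scheme is the sinusoidal encoding from the original Transformer paper, sketched below. (Many models instead learn their position vectors during training; this is just one concrete example.)

```python
import math

def positional_encoding(pos, dim):
    # Sinusoidal positional encoding:
    # even indices use sine, odd indices use cosine,
    # at frequencies that vary across the dimensions.
    pe = []
    for i in range(dim):
        angle = pos / (10000 ** (2 * (i // 2) / dim))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

# Each word's embedding gets its position's encoding added to it.
word_vector = [0.1] * 8                  # toy embedding
pos_vector = positional_encoding(2, 8)   # encoding for position 2
combined = [w + p for w, p in zip(word_vector, pos_vector)]
```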


Step 5: Attention Mechanism 👀

The meaning of a word often depends on the words around it. For example:

  • Phrase 1: river bank

  • Phrase 2: ICICI bank

“Bank” means something different in each phrase. So, we let each token look at all the other tokens and adjust its meaning based on context. This is called Self-Attention.

Multi-Head Attention means doing this in multiple parallel “heads.” Each head focuses on different relationships like grammar, context, or sentiment.

This step makes the model understand the true meaning of each word in context.
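The "look at every other token" step can be sketched as below. This is a bare-bones version: a real model first projects each vector into separate query, key, and value vectors with learned weights, which this sketch skips.

```python
import math

def softmax(xs):
    # Turn raw scores into probabilities that sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    # Each token scores itself against every token (dot product),
    # then takes a weighted average of all the vectors.
    dim = len(vectors[0])
    output = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in vectors]
        weights = softmax(scores)
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors))
                 for i in range(dim)]
        output.append(mixed)
    return output

# Three toy 2-dimensional token vectors.
out = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Each output vector is a blend of all the input vectors, weighted by how strongly the tokens relate. That blending is what lets “bank” shift its meaning depending on its neighbours.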


Final Steps: Training and Inference 🏋️‍♂️➡️🧠

Now that the sentence has meaningful vectors, the model can use them.

  • Training Phase: The model learns by predicting the next token. If it gets it wrong, it updates itself and tries again.

  • Inference Phase: When we use the model, it predicts the next token step by step. Softmax turns the model’s raw scores into probabilities, and the model then picks a likely token from that distribution.

The temperature setting decides how “creative” the model gets. A low temperature means it picks the most likely word; a high one adds randomness.
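Temperature is just a division applied to the scores before Softmax. The logits below are made-up scores for three candidate tokens, purely for illustration:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Lower temperature sharpens the distribution (more greedy);
    # higher temperature flattens it (more random).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]   # hypothetical scores for 3 candidate tokens
cold = softmax_with_temperature(logits, 0.2)
hot = softmax_with_temperature(logits, 2.0)
# At low temperature the top token dominates;
# at high temperature the probabilities are closer together.
```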

Once all tokens are predicted, it converts the vectors back to text—and there’s your answer!


In Conclusion 🌟

ChatGPT-like models just predict the next word—but because they’re trained on such massive data, it feels magical.

So, now you know:

  • What tokenization is

  • What a knowledge cutoff means

  • What embeddings are

  • What attention does

All while sipping on your chai ☕

Hope you found this article helpful!

Thank you!

Written by Himanshu Upadhyay