Simplifying GenAI: A Beginner's Guide


Imagine Babu Bhaiya trying to run a tech startup instead of a garage. Raju wants to build an AI model to earn quick money, and Shyam is stuck trying to understand how it all works.
If that sounds chaotic but hilarious—welcome to the world of Generative AI and Large Language Models (LLMs).
What is GenAI?
Let’s say Raju wants to write love letters automatically to multiple girls (classic Raju move). He doesn’t want to write each one manually, so he uses Generative AI (GenAI).
GenAI is a type of artificial intelligence that can generate new content like:
Text (love letters ✅)
Images (fake lottery tickets ✅)
Music, code, even videos!
What is an LLM?
Now, the tool Raju uses is called a Large Language Model (LLM)—like ChatGPT.
It’s like showing Babu Bhaiya a million love letters and asking him to write his own. He can now guess which word should come next, based on everything he has read.
LLMs are a part of GenAI, focused only on text-based content.
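To make that "guess the next word" idea concrete, here is a toy sketch that predicts the next word using nothing but counts of which word follows which. The mini love-letter corpus is made up for illustration; real LLMs use neural networks trained on billions of examples, not simple counting.

from collections import Counter, defaultdict

# A tiny "training corpus" of love-letter lines (made up for illustration)
corpus = [
    "meri jaan tum bahut sundar ho",
    "meri jaan tum meri duniya ho",
    "meri jaan tum bahut pyari ho",
]

# Count which word follows which: the simplest possible "language model"
next_word_counts = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for current, nxt in zip(words, words[1:]):
        next_word_counts[current][nxt] += 1

# Predict the most likely word after "tum"
print(next_word_counts["tum"].most_common(1))  # [('bahut', 2)]

An LLM does the same job, but instead of counting pairs of words it learns patterns across entire paragraphs.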
Tokenization – "Utha le re deva… lekin words ko!"
Before an LLM can understand text, it needs to break it down. That’s called tokenization.
Let’s take this dialogue:
"Utha le re deva, utha le."
Tokenization breaks this sentence into smaller parts (called tokens), like:
["Utha", "le", "re", "deva", ",", "utha", "le", "."]
For example, here is how the GPT-4o & GPT-4o mini tokenizer splits this sentence.
How to create tokens using tiktoken (used by GPT models like GPT-4o):
import tiktoken

# Load the tokenizer used by GPT-4o
enc = tiktoken.encoding_for_model("gpt-4o")

text = "Utha le re deva, utha le."
tokens = enc.encode(text)
print(tokens)

# Output:
# [52, 18819, 505, 322, 334, 2873, 11, 337, 18819, 505, 13]
This tokenizer breaks the sentence into subword units or byte-pair encodings (BPE), which is how models like GPT-4 handle large vocabularies efficiently.
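If you want to peek at those subword pieces, you can decode each token ID back into text, continuing from the tiktoken snippet above. The exact splits depend on the model's BPE vocabulary.

# Decode each token ID individually to see the subword pieces
pieces = [enc.decode([t]) for t in tokens]
print(pieces)  # the exact pieces depend on GPT-4o's BPE vocabulary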
Vector Embeddings – “Paise ka rang gulabi hai!”
Once we have tokens, we need to convert them into numbers (because computers don’t understand text, only numbers). This is called embedding.
Each word is turned into a vector — a list of numbers that represent its meaning and context.
For example, “Paise” and “Rupees” will have similar embeddings because they’re used similarly.
Imagine Babu Bhaiya sees a pink note and says, “Paise ka rang gulabi hai!” That “gulabi” is like the vector capturing the value, context, and meaning of that money.
Concretely, an embedding is just a long list of floating-point numbers, as the example below shows.
How to create vector embeddings using text-embedding-3-small
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # loads OPENAI_API_KEY from a .env file
client = OpenAI()

dialogs = "Utha le re baba"

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=dialogs
)

print("Vector embeddings:", response.data[0].embedding)  # the full list of numbers
print("Length of embedding vector for first dialogue:", len(response.data[0].embedding))

# Output:
# Length of embedding vector for first dialogue: 1536
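To check the "Paise and Rupees" claim yourself, you can embed a few words and compare them with cosine similarity. A minimal sketch, reusing the client from above; the word choices ("kela" as the odd one out) are just for illustration.

import numpy as np

def cosine_similarity(a, b):
    # 1.0 = pointing the same way (similar meaning), near 0 = unrelated
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=["paise", "rupees", "kela"]
)
paise, rupees, kela = (d.embedding for d in resp.data)

print("paise vs rupees:", cosine_similarity(paise, rupees))  # expected: higher
print("paise vs kela:", cosine_similarity(paise, kela))      # expected: lower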
Positional Encoding – “Time kya ho raha hai?”
Now let’s talk about positional encoding—because word order matters.
“150 Rs dega” ≠ “Dega 150 Rs”
To make sure the model knows the position of each word, we add positional encoding to each token's embedding. It’s like giving every word a watch so it knows when to speak.
In the LLM world, positional encodings help models understand who said what and when—like keeping track of Babu Bhaiya’s endless rants.
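Here is a small sketch of the classic sinusoidal positional encoding from the original Transformer paper, just to show that a word's "watch" is literally a vector of numbers added to its embedding. Modern LLMs often use other schemes (learned or rotary position embeddings), so treat this as an illustration, not the only way.

import numpy as np

def positional_encoding(seq_len, d_model):
    # One d_model-sized vector per position, built from sines and cosines
    positions = np.arange(seq_len)[:, np.newaxis]        # (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]             # (1, d_model)
    angle_rates = 1 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions use sin
    pe[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions use cos
    return pe

# 8 tokens, embedding size 16: this gets added element-wise to the token embeddings
pe = positional_encoding(8, 16)
print(pe.shape)  # (8, 16)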
Transformers – "Attention is all you need"
In 2017, Google published the groundbreaking paper "Attention Is All You Need", introducing the Transformer model.
It revolutionized AI by building entirely on the self-attention mechanism, which lets a model focus on the relevant parts of a sentence no matter how far apart the words are.
Example:
In the sentence:
"Raju said to Babu Bhaiya that Shyam ate the banana."
To understand who ate the banana, the model uses attention to focus on “Shyam,” not “Raju” or “Babu Bhaiya.”
It’s like in Hera Pheri—when the phone rings and everyone is fighting, but only Shyam pays attention to the caller ID. He attends to what matters.
What is Self-Attention?
In GenAI, especially in Transformers, self-attention helps the model decide which words in a sentence to focus on, when trying to understand the meaning or generate a response.
Let’s break it down with scenes from Phir Hera Pheri:
Scene Setup: The Golden Gun & The Fake Deal
Remember the scene where:
Raju gets involved in the "double-your-money" scheme.
Shyam is skeptical.
Babu Bhaiya is just excited to become a crorepati.
Everyone is talking over each other. Now imagine an AI trying to understand what’s important in that conversation.
Self-attention allows a model to look at all words in a sentence and decide which ones are most important when processing each individual word.
Technically, self-attention gives the vector embeddings a chance to talk to each other.
It’s like Babu Bhaiya hearing “gun,” and suddenly paying extra attention—even if that word was spoken much earlier or later in the sentence.
Example: Dialogue
“Us aadmi ke paas golden gun hai. Agar paisa nahi diya toh goli maar dega.”
Let’s say the AI is processing the word: “goli” (bullet).
To understand it, the model uses self-attention to focus on:
“gun”
“golden gun”
“aadmi”
Even if “gun” appeared earlier, self-attention connects the dots. Just like how Shyam starts panicking after realizing what "golden gun" really means—he mentally links that to "goli."
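Here is a bare-bones version of scaled dot-product self-attention in NumPy. Every token's vector is scored against every other token's vector, and those scores (after a softmax) decide how much of each word gets mixed into the output. The toy vectors below are made up, and real Transformers first project the embeddings through learned query, key and value matrices; this sketch skips that step to show the core mechanic.

import numpy as np

def self_attention(X):
    # X: (seq_len, d) token embeddings; queries, keys and values are all X here
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # how much each token matches every other
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ X                              # each output is a weighted mix of all tokens

# 4 toy "token embeddings" of size 3 (think: "golden", "gun", "goli", "maar")
X = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.9, 1.1, 0.1],
              [0.0, 0.2, 0.3]])
out = self_attention(X)
print(out.shape)  # (4, 3): every token now carries context from the others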
What is Backpropagation?
Backpropagation (short for “backward propagation of errors”) is an algorithm used to train artificial neural networks by adjusting the network’s weights to minimize the difference between the predicted output and the actual output (the error).
Imagine the three main characters — Baburao, Raju, and Shyam — are trying to deliver a message (like a prediction) to their friend but it often goes wrong.
The message they deliver is the network’s output.
The friend’s reaction tells them if the message was right or wrong — this is like the error or loss.
They want to improve their message delivery to get a perfect reaction next time.
Step 1: Forward Pass — Message delivery
Baburao tells a joke → Raju repeats it → Shyam delivers it to the friend.
The friend reacts (laughs or not). This is the prediction.
Step 2: Calculate Error — Friend’s reaction
If the friend doesn’t laugh (bad reaction), the error is high.
If the friend laughs (good reaction), the error is low.
So, the characters see how far off their message was from the perfect joke.
Step 3: Backward Pass (Backpropagation) — Who messed up?
They ask, “Who messed up the joke?”
Maybe Baburao told it wrong, or Raju repeated it badly, or Shyam delivered it with a bad tone.
Backpropagation is like tracing the error backward through the chain:
Shyam thinks, “Maybe I should deliver more clearly next time.”
Raju thinks, “Maybe I should repeat the joke correctly.”
Baburao thinks, “Maybe I should tell a better joke.”
Each character adjusts their part to improve the overall message.
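Here is the smallest possible version of that loop in code: one weight, one input, one target. The forward pass delivers the "joke", the error measures the friend's reaction, and the gradient (the backward pass) tells the weight how to adjust. Real networks repeat this across millions of weights and many layers, but the idea is the same; this is a toy example for illustration only.

# Toy backpropagation: learn w so that prediction = w * x matches the target
x, target = 2.0, 10.0   # the "perfect joke" we want to deliver
w = 0.5                 # Baburao's current (bad) version of the joke
lr = 0.05               # how strongly each person corrects themselves

for step in range(20):
    prediction = w * x              # forward pass: deliver the message
    error = prediction - target     # friend's reaction: how far off are we?
    loss = error ** 2
    grad_w = 2 * error * x          # backward pass: how much did w contribute to the error?
    w -= lr * grad_w                # adjust to do better next time

print(round(w, 3))  # close to 5.0, since 5.0 * 2 = 10

Putting it all together, here is how each concept maps back to Hera Pheri: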
| Concept | Hera Pheri Example | AI Equivalent |
| --- | --- | --- |
| GenAI | Writing fake love letters | Creating text/images/videos |
| LLM | Raju memorizing and generating letters | Text generation model |
| Tokenization | Babu splitting words like chhutta | Breaking text into smaller parts |
| Embeddings | "Gulabi note" meaning value | Word meanings as vectors |
| Positional Encoding | Knowing who spoke when | Order of words in a sentence |
| Attention | Shyam noticing the caller ID | Focusing on important words |
| Transformers | All characters working (somehow) together | Powerful architecture for GenAI |