Simplifying GenAI: A Beginner's Guide

Ujjawal Kant
6 min read

Imagine Babu Bhaiya trying to run a tech startup instead of his garage. Raju wants to build an AI model to earn quick money, and Shyam is stuck trying to understand how it all works.

If that sounds chaotic but hilarious—welcome to the world of Generative AI and Large Language Models (LLMs).

What is GenAI?

Let’s say Raju wants to write love letters automatically to multiple girls (classic Raju move). He doesn’t want to write each one manually, so he uses Generative AI (GenAI).

GenAI is a type of artificial intelligence that can generate new content like:

  • Text (love letters ✅)

  • Images (fake lottery tickets ✅)

  • Music, code, even videos!
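To make it concrete, here's a minimal sketch of Raju's trick using the OpenAI Python SDK (the model name and the prompt are just examples, not anything specific to this article):

from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in your environment

# Ask a text model to generate brand-new content, Raju-style
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Write a short, funny love letter."}
    ],
)

print(response.choices[0].message.content)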

What is an LLM?

Now, the tool Raju uses is called a Large Language Model (LLM)—like ChatGPT.

It’s like making Babu Bhaiya read a million love letters and then asking him to write one of his own. He can now guess which word should come next, based on what he has read.

LLMs are a part of GenAI, focused only on text-based content.

Tokenization – "Utha le re deva… lekin words ko!"

Before an LLM can understand text, it needs to break it down. That’s called tokenization.

Let’s take this dialogue:

"Utha le re deva, utha le."

Tokenization breaks this sentence into smaller parts (called tokens), like:

["Utha", "le", "re", "deva", ",", "utha", "le", "."]

For example, here is how the GPT-4o / GPT-4o mini tokenizer handles it.

How to create tokens using tiktoken (the tokenizer used by GPT models like GPT-4o)

import tiktoken

# Load the tokenizer that GPT-4o uses
enc = tiktoken.encoding_for_model("gpt-4o")

text = "Utha le re deva, utha le."
tokens = enc.encode(text)  # text -> list of token IDs
print(tokens)

# Output
[52, 18819, 505, 322, 334, 2873, 11, 337, 18819, 505, 13]

This tokenizer breaks the sentence into subword units using byte-pair encoding (BPE), which is how models like GPT-4 handle large vocabularies efficiently.
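To actually see those subword pieces, you can decode each token ID on its own:

import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")
tokens = enc.encode("Utha le re deva, utha le.")

# Decode each token ID individually to reveal the subword pieces
print([enc.decode([t]) for t in tokens])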

Vector Embeddings – “Paise ka rang gulabi hai!”

Once we have tokens, we need to convert them into numbers (because computers don’t understand text, only numbers). This is called embedding.
Each word is turned into a vector — a list of numbers that represent its meaning and context.
For example, “Paise” and “Rupees” will have similar embeddings because they’re used similarly.
Imagine Babu Bhaiya sees a pink note and says, “Paise ka rang gulabi hai!” That “gulabi” (pink) is like the vector capturing the value, context, and meaning of that money.


How to create vector embeddings using text-embedding-3-small

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # loads OPENAI_API_KEY from a .env file

client = OpenAI()

dialogs = "Utha le re baba"

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=dialogs
)

# The embedding itself: a long list of floats
print("Vector embedding:", response.data[0].embedding)

# text-embedding-3-small returns 1536-dimensional vectors
print("Length of embedding vector for first dialogue:", len(response.data[0].embedding))
# Output: Length of embedding vector for first dialogue: 1536
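Earlier we said “Paise” and “Rupees” should have similar embeddings. Here's a minimal sketch that checks this claim with cosine similarity (the comparison words are my own picks; the client setup is the same as above):

import numpy as np
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI()

def cosine_similarity(a, b):
    # 1.0 means "same direction" (very similar meaning); near 0 means unrelated
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=["paise", "rupees", "banana"]
)
paise, rupees, banana = (d.embedding for d in resp.data)

print("paise vs rupees:", cosine_similarity(paise, rupees))  # noticeably higher
print("paise vs banana:", cosine_similarity(paise, banana))  # lower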

Positional Encoding – “Time kya ho raha hai?”

Now let’s talk about positional encoding—because word order matters.
“150 Rs dega” ≠ “Dega 150 Rs”
To make sure the model knows the position of each word, we add positional encoding to each token's embedding. It’s like giving every word a watch so it knows when to speak.
In the LLM world, positional encodings help models understand who said what and when—like keeping track of Babu Bhaiya’s endless rants.
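Here's a minimal sketch of the classic sinusoidal positional encoding from the Transformer paper, with toy sizes chosen just for the demo:

import numpy as np

# Sinusoidal positional encoding, as in "Attention Is All You Need"
def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]    # positions 0..seq_len-1
    i = np.arange(d_model)[None, :]      # embedding dimensions
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])  # even dims get sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])  # odd dims get cosine
    return pe

# "150 Rs dega" -> 3 tokens, each with a toy 8-dim embedding
pe = positional_encoding(seq_len=3, d_model=8)
print(pe.round(2))  # each row gets added to that token's embedding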

Transformers – "Attention is all you need"

In 2017, Google published the groundbreaking paper "Attention Is All You Need", introducing the Transformer model.

It revolutionized AI by introducing the self-attention mechanism, allowing models to focus on relevant parts of a sentence, no matter how far apart the words are.

Example:

In the sentence:

"Raju said to Babu Bhaiya that Shyam ate the banana."

To understand who ate the banana, the model uses attention to focus on “Shyam,” not “Raju” or “Babu Bhaiya.”

It’s like in Hera Pheri—when the phone rings and everyone is fighting, but only Shyam pays attention to the caller ID. He attends to what matters.

What is Self-Attention?

In GenAI, especially in Transformers, self-attention helps the model decide which words in a sentence to focus on when understanding meaning or generating a response.

Let’s break it down with scenes from Phir Hera Pheri:

Scene Setup: The Golden Gun & The Fake Deal

Remember the scene where:

  • Raju gets involved in the "double-your-money" scheme.

  • Shyam is skeptical.

  • Babu Bhaiya is just excited to become a crorepati.

Everyone is talking over each other. Now imagine an AI trying to understand what’s important in that conversation.

Self-attention allows a model to look at all words in a sentence and decide which ones are most important when processing each individual word.

Technically, self-attention gives the vector embeddings a chance to talk to each other.

It’s like Babu Bhaiya hearing “gun,” and suddenly paying extra attention—even if that word was spoken much earlier or later in the sentence.

Example: Dialogue

“Us aadmi ke paas golden gun hai. Agar paisa nahi diya toh goli maar dega.” (That man has a golden gun. If we don't pay up, he'll shoot.)

Let’s say the AI is processing the word: “goli” (bullet).
To understand it, the model uses self-attention to focus on:

  • “gun”

  • “golden gun”

  • “aadmi”

Even if “gun” appeared earlier, self-attention connects the dots. Just like how Shyam starts panicking after realizing what "golden gun" really means—he mentally links that to "goli."
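Here's a toy NumPy sketch of the scaled dot-product self-attention behind this. The embeddings and weight matrices are random stand-ins (a real model learns them during training), but the mechanics are the same: every token scores every other token, then mixes in their values.

import numpy as np

np.random.seed(0)
tokens = ["aadmi", "golden", "gun", "goli"]
d = 8                                 # toy embedding size
X = np.random.randn(len(tokens), d)   # stand-in token embeddings

# In a real Transformer, these are learned weight matrices
Wq, Wk, Wv = np.random.randn(3, d, d)
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
output = weights @ V                  # context-aware token representations

# How much "goli" attends to every token in the sentence
row = weights[tokens.index("goli")]
print(dict(zip(tokens, row.round(2))))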

What is Backpropagation?

Backpropagation (short for “backward propagation of errors”) is an algorithm used to train artificial neural networks by adjusting the network’s weights to minimize the difference between the predicted output and the actual output (the error).

Imagine the three main characters — Baburao, Raju, and Shyam — are trying to deliver a message (like a prediction) to their friend but it often goes wrong.

  • The message they deliver is the network’s output.

  • The friend’s reaction tells them if the message was right or wrong — this is like the error or loss.

  • They want to improve their message delivery to get a perfect reaction next time.

Step 1: Forward Pass — Message delivery

  • Baburao tells a joke → Raju repeats it → Shyam delivers it to the friend.

  • The friend reacts (laughs or not). This is the prediction.

Step 2: Calculate Error — Friend’s reaction

  • If the friend doesn’t laugh (bad reaction), the error is high.

  • If the friend laughs (good reaction), the error is low.

  • So, the characters see how far off their message was from the perfect joke.

Step 3: Backward Pass (Backpropagation) — Who messed up?

  • They ask, “Who messed up the joke?”

  • Maybe Baburao told it wrong, or Raju repeated it badly, or Shyam delivered it with a bad tone.

  • Backpropagation is like tracing the error backward through the chain:

    • Shyam thinks, “Maybe I should deliver more clearly next time.”

    • Raju thinks, “Maybe I should repeat the joke correctly.”

    • Baburao thinks, “Maybe I should tell a better joke.”

Each character adjusts their part to improve the overall message.
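In code, that “trace the error backward and adjust” loop is just gradient descent. Here's a minimal one-weight sketch (all the numbers are made up for illustration):

# A minimal backpropagation sketch: one weight, one input, squared error
x, y_true = 2.0, 10.0   # input (the joke) and the perfect reaction
w = 1.0                 # the "delivery skill" we want to learn
lr = 0.05               # learning rate

for step in range(20):
    y_pred = w * x                    # forward pass: deliver the message
    loss = (y_pred - y_true) ** 2     # how far off the reaction was
    grad = 2 * (y_pred - y_true) * x  # backward pass: who contributed how much
    w -= lr * grad                    # adjust to do better next time

print(round(w, 3))  # converges toward 5.0, since 5.0 * 2 = 10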

| Concept | Hera Pheri Example | AI Equivalent |
| --- | --- | --- |
| GenAI | Writing fake love letters | Creating text/images/videos |
| LLM | Raju memorizing and generating letters | Text generation model |
| Tokenization | Babu splitting words like chhutta | Breaking text into smaller parts |
| Embeddings | “Gulabi note” meaning value | Word meanings as vectors |
| Positional Encoding | Knowing who spoke when | Order of words in a sentence |
| Attention | Shyam noticing the caller ID | Focusing on important words |
| Transformers | All characters working (somehow) together | Powerful architecture for GenAI |
