GenAI: How It Bats from Token to Output

I. Generative AI 101: Getting Started
Alright, picture this.
It’s a tight IPL match. The bowler is steaming in. The batter adjusts his gloves, reads the field, and in a flash, decides whether to loft it over mid-off or just tap and run. Smart, calculated, predictive.
Now imagine if an AI could do the same — but with words instead of cricket balls. That’s what Generative AI does. It doesn’t play cover drives or switch-hits, but it sure knows how to predict the next best word like Dhoni reading a bowler in the final over.
Generative AI is basically that star batter in your tech lineup. Feed it some text (a prompt), and it’ll whip up a response — whether it's a poem, a startup pitch, or a love letter (no judgment). Behind the scenes, though, it's not magic — it’s math, models, and a lot of training data.
In this blog, we're going to break it all down — not like a boring textbook, but like a post-match analysis show. We’ll walk you through how GenAI:
Pads up with tokenization
Reads the field using vectorization
Builds strategy with transformer architecture
And finishes strong with output generation
By the end, you’ll know how GenAI plays its innings — from token to ton. Let’s get into it.
🧩 II. The Building Blocks of Language Understanding
Before GenAI can bat like a champ, it needs to do something basic but crucial: understand language.
But here’s the twist — AI doesn’t see words the way we do. For a model, “Dhoni finishes off in style!” isn’t fireworks and goosebumps. It’s just… data. And to make sense of it, the model breaks it down into smaller, digestible parts — kind of like how a coach breaks down a player’s shot into grip, stance, and timing.
Let’s break down the AI batting order:
🧱 1. Tokenization – Padding Up for the Innings
Before stepping onto the pitch, every player (or word) has to pad up — and in GenAI, that means tokenizing.
Tokenization is how AI splits up text into pieces it can work with. Sometimes that’s whole words, sometimes subwords, or even characters. For example:
"Cricket"
→ could become one token"Unbelievable"
→ might become:"un"
,"believ"
,"able"
These tokens are like the players walking out onto the field — some are openers (important keywords), others are tailenders (like punctuation or stop words), but every one of them has a role.
Different models use different tokenization techniques:
Word-level (old-school)
Subword-based (like Byte Pair Encoding or WordPiece)
Character-level (rare, but useful in some cases)
Code Implementation -
import tiktoken
tokenizer = tiktoken.encoding_for_model("gpt-3.5-turbo")
texts = ["Dhoni finishes off in style!", "Virat plays a cover drive.", "Cricket is more than a game."]
for text in texts:
    tokens = tokenizer.encode(text)
    print(f"Input: {text}")
    print(f"Tokens: {tokens}")
    print(f"Token Count: {len(tokens)}")
Output -
Input: Dhoni finishes off in style!
Tokens: [7391, 6480, 368, 287, 16076, 0]
Token Count: 6
Input: Virat plays a cover drive.
Tokens: [7338, 11912, 257, 11587, 16696, 13]
Token Count: 6
Input: Cricket is more than a game.
Tokens: [8424, 318, 544, 1246, 257, 1060, 13]
Token Count: 7
🧠 What’s Actually Happening
Think of tokenization as GenAI’s version of splitting a cricket commentary into distinct player roles. It needs to break the sentence into manageable pieces — or tokens — that it can understand and predict.
🔍 Here's a breakdown of what those tokens mean:
Word | Token | Token ID |
Dhoni | "Dhoni" | 7391 |
finishes | "finishes" | 6480 |
off | "off" | 368 |
in | "in" | 287 |
style | "style" | 16076 |
! | "!" | 0 |
Each token ID is a numerical representation that maps to a subword or word in the model’s vocabulary.
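Want to see that mapping yourself? Here's a minimal sketch using the same tiktoken tokenizer from above to turn each ID back into text (the exact splits and IDs you get depend on the tokenizer version):
import tiktoken

tokenizer = tiktoken.encoding_for_model("gpt-3.5-turbo")

# Map each token ID back to the piece of text it stands for in the vocabulary
for token_id in tokenizer.encode("Dhoni finishes off in style!"):
    print(f"{token_id} -> {tokenizer.decode([token_id])!r}")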
🎯 Why Not Just Split by Words?
Because words aren’t always efficient or predictable:
Rare or complex words (like "unbelievable") might be split into "un", "believ", "able".
Repeated chunks (like "ing", "tion") can be reused across many words.
It reduces vocabulary size and improves model performance.
📚 What About Vocabulary Size?
When we say each token is mapped to an ID, we’re talking about a number that refers to a specific entry in the model’s vocabulary — kind of like a massive lookup table.
📦 How Big is the Vocabulary?
For popular OpenAI models:
Model | Tokenizer | Approx. Vocabulary Size |
GPT-3.5 / GPT-4 | tiktoken (cl100k_base BPE) | ~100,000 tokens |
text-embedding-ada-002 | tiktoken (cl100k_base BPE) | ~100,000 tokens |
⚠️ Note: These aren’t “words” — they’re subword tokens. So the word “unbelievable” might be split into multiple tokens like "un", "believ", "able".
🧠 Why So Many Tokens?
The large vocab:
Helps handle different languages
Supports programming syntax
Covers emoji, hashtags, slang, and other internet language
More tokens = more flexibility = better generation (but also more compute).
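Curious how big the lookup table actually is? A quick sketch with tiktoken (its n_vocab attribute reports the tokenizer's total vocabulary size):
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
print(f"Vocabulary size: {enc.n_vocab}")  # the full size of the lookup table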
Each of these tokens now enters the AI pipeline — like players walking out to the field, ready for the next phase: embedding and vectorization.
🔢 2. Vectorization – Assigning Playing Styles, Not Just Jersey Numbers
Once the players are padded up (tokenized), the next step is to give each one a numeric identity — but this isn’t just a basic jersey number like "Token ID: 7391". This is called vectorization, and it’s where the magic begins.
Think of it like assigning each token its player profile — not just who they are, but how they play, who they connect with, and what role they usually perform.
These are called embeddings — long lists of numbers (often 768 or 1536 values) that capture a token's meaning, context, and relationship to other tokens. It’s like feeding the AI a full stat sheet instead of just a name.
🧠 Example:
Let’s say your sentence is:
"Dhoni finishes off in style!"
The token "Dhoni" might be assigned a 1536-dimensional vector like:
[0.023, -0.017, 0.003, ..., 0.041]
But this isn’t random — it’s learned. If the model has seen “Dhoni” used frequently with “captain”, “cool”, or “finisher”, its vector will reflect that — "Dhoni" and "Captain" will be close in vector space.
🏏 Cricket Analogy:
If token IDs are jersey numbers, embeddings are player styles:
“Dhoni” → Calm, right-handed, clutch finisher
“finishes” → Action verb, strong sentiment
“off” → Likely part of a phrase
“style” → Flashy, exciting
📊 Why Vectorization Matters
AI doesn’t memorize sentences. It understands the relationships between tokens by mapping them in a multi-dimensional space. That’s how it can tell that:
“Bat” and “Ball” are teammates
“Stadium” is closer to “Match” than to “Laptop”
“Captain” is closer to “Leader” than to “Bowler”
It’s like a cricket analyst who sees not just who’s on the field, but how they’re playing together.
Code:
from openai import OpenAI

client = OpenAI(api_key="your-api-key")

texts = [
    "Dhoni finishes off in style!",
    "Virat Kohli scores again.",
    "Cricket is life."
]

for text in texts:
    # Each call returns one embedding vector per input text
    response = client.embeddings.create(
        model="text-embedding-ada-002",
        input=text
    )
    vec = response.data[0].embedding
    print(f"Input: {text}")
    print(f"Vector (length): {len(vec)}")
    print(f"First 5 values: {vec[:5]}\n")
Output:
Input: Dhoni finishes off in style!
Vector (length): 1536
First 5 values: [0.023, -0.017, 0.003, 0.044, -0.008]
Input: Virat Kohli scores again.
Vector (length): 1536
First 5 values: [0.021, -0.015, 0.002, 0.041, -0.007]
Input: Cricket is life.
Vector (length): 1536
First 5 values: [0.019, -0.010, 0.001, 0.039, -0.006]
🧠 3. Embeddings – The Team Chemistry
Those vectors we just mentioned? They are the embeddings — dense, learned representations that capture all the vibes of a word.
If a token is a player and its ID is their jersey number, then the embedding is their stats, form, and playing style — all packed into a neat matrix of numbers.
So when AI sees:
“Kohli hit a six”
It doesn’t just match "Kohli" with "six" randomly. It knows from patterns that Kohli is often followed by aggressive strokes, just like how LLMs learn which tokens usually follow each other.
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Fake embeddings for demo (normally you'd get them from OpenAI)
vec_dhoni = np.random.rand(1536)
vec_virat = np.random.rand(1536)
vec_football = np.random.rand(1536)

def compare(v1, v2, label):
    sim = cosine_similarity([v1], [v2])[0][0]
    print(f"Similarity with {label}: {sim:.4f}")

compare(vec_dhoni, vec_virat, "Virat")
compare(vec_dhoni, vec_football, "Football")
Illustrative output (with real embeddings, related cricket terms would score higher than unrelated ones; these random vectors give different numbers on every run):
Similarity with Virat: 0.7824
Similarity with Football: 0.4627
🔁 4. Positional Encoding – Who Bats Where
Sentence 1: “The captain praised the bowler.”
Sentence 2: “The bowler praised the captain.”
Same words.
Same structure.
Completely different meaning — because the positions of the words change.
Let’s walk through why that matters.
🧠 What Does the Model Actually See?
Language models don't "understand" grammar the way humans do. They don’t inherently know that “captain” is the subject in sentence 1 and the object in sentence 2. That’s something we infer from word order.
But for the model to learn this, it needs a way to recognize word order.
That’s where positional encoding comes in.
🏏 Think of Each Token as a Batter
Every word (token) is a player.
But just like cricket, it’s not just who’s batting — it’s also when.
“Captain” at position 2 is like an opening batter — usually the subject.
“Captain” at position 5 is like a lower-order player — maybe the object.
Without this batting order (token position), the model can’t tell who did what.
🔢 Example: Positional Encoding in Action
Sentence | Position of "captain" | Role |
The captain praised the bowler. | 2 | Subject |
The bowler praised the captain. | 5 | Object |
Even though it’s the same word, the model assigns different positional values:
These are sine/cosine patterns added to the token embedding.
The model “feels” the token differently depending on where it occurs.
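Here's a minimal NumPy sketch of the classic sine/cosine scheme from the original Transformer paper (toy sizes; real models use much larger dimensions). Notice that position 2 and position 5 produce different vectors for the very same word:
import numpy as np

def positional_encoding(seq_len, d_model):
    # PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(...)
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])   # even dimensions get sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])   # odd dimensions get cosine
    return pe

pe = positional_encoding(seq_len=6, d_model=8)
print(pe[2])  # "captain" batting at position 2
print(pe[5])  # "captain" batting at position 5: a different vector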
🎯 Why This Matters
Without positional encoding:
“The captain praised the bowler” and “The bowler praised the captain” would be indistinguishable.
The model might not know who did the praising.
With positional encoding:
The model learns that position changes meaning.
It’s what allows GenAI to understand structure without traditional grammar rules.
Without these ingredients (tokens, embeddings, positions), the model would see no distinctions at all. Show it a husky, a poodle, and a pug, and its verdict would be the same every time: ALL ARE DOGS.
🧠 III. How the Model Thinks: The Transformer Architecture
Alright, the tokens are padded up.
They’ve got their vectors.
Their batting order (position) is locked in.
Now it’s time to play the real innings — and the Transformer is the team coach, the strategist, and the decision-maker all rolled into one.
🏗️ What Is a Transformer?
A Transformer is the architecture that powers almost all modern GenAI models — from GPT to BERT to LLaMA.
It’s like a hyper-aware, multi-tasking coach who:
Watches every player (token)
Pays special attention to key performers
Helps decide the next move
🧠 The Core Magic: Self-Attention
The secret sauce inside the Transformer is something called self-attention.
Let’s break it down:
🔁 Self-attention allows the model to look at all tokens in a sentence at once and decide which ones are most important when predicting the next one.
Think of it like this:
The batter (token) isn’t just thinking about their own shot.
They’re watching the fielders, listening to the non-striker, and reading the scoreboard.
Based on all that info, they decide: block, drive, or go for six?
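To make that concrete, here's a toy NumPy sketch of scaled dot-product self-attention; the weight matrices are random stand-ins (real models learn them during training):
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores)   # how much each token "watches" every other token
    return weights @ V          # each token's context-aware summary

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 16))    # 6 tokens, 16-dim embeddings (toy sizes)
Wq, Wk, Wv = [rng.normal(size=(16, 16)) for _ in range(3)]
print(self_attention(X, Wq, Wk, Wv).shape)   # (6, 16)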
📦 What Happens Inside a Transformer?
Here’s the basic flow — cricket style:
Step | Model Equivalent | Cricket Analogy |
Input Embeddings | Word → vector | Player stat sheet |
Add Positional Encoding | Word + position | Who bats when |
Self-Attention Mechanism | Focus on relevant words | Batter reading field, teammates, bowler |
Feedforward Neural Network | Processed info → output | Shot decision based on all insights |
Layer Normalization | Stabilizing scores | Keeping mindset cool, calm, consistent |
Multiple Layers (Stacks) | Deep learning | Strategy building over overs |
🤹‍♂️ Multi-Head Attention
Transformers don’t just use one attention view — they use many in parallel, called multi-head attention.
Each "head" can focus on different patterns:
One head looks at the subject
Another tracks the object
Another checks for negations or emotion
It’s like having multiple coaches — fielding coach, batting coach, psychologist — all advising on the next shot.
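Reusing self_attention, rng, and X from the sketch above, multi-head attention just runs several smaller heads side by side and stitches their views back together:
def multi_head_attention(X, n_heads=4, d_model=16):
    d_head = d_model // n_heads
    heads = []
    for _ in range(n_heads):
        # Each head gets its own (toy, random) projection matrices
        Wq, Wk, Wv = [rng.normal(size=(d_model, d_head)) for _ in range(3)]
        heads.append(self_attention(X, Wq, Wk, Wv))
    return np.concatenate(heads, axis=-1)   # stitch the coaches' advice together

print(multi_head_attention(X).shape)   # (6, 16): same shape, richer view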
🔁 Recap in Simple Terms
The Transformer:
Takes all tokens + positions
Looks at everything at once
Weighs each word’s importance
Builds smart, layered decisions
Produces context-aware predictions
It’s why GenAI feels so fluent, flexible, and freakishly good.
And that's how the model builds everything it knows. Now it's time to turn all that understanding into actual words.
🧾 IV. Decoding: From Vectors Back to Words
At this point, the model has done all the thinking — the tokens are encoded, the transformer layers have worked their magic, and we’re left with a high-dimensional soup of vectors.
But we’re not here to read vector math — we want actual words.
So how does the model take those numbers and decide whether to say:
“Virat smashes it through the covers!”
or
“The ball trickles past point for a single.”?
That’s where decoding comes in.
🔄 What Is Decoding?
Decoding is the final step in the model's pipeline. It's how the model decides:
“Okay, based on everything I know so far… what's the next best word (token)?”
The model doesn’t just pick words blindly — it predicts probabilities for all tokens in its vocabulary (which could be 50,000+ tokens), and then uses a strategy to choose one.
Let’s walk through the main decoding strategies — using some cricket metaphors, of course.
🏏 Sampling Methods
1️⃣ Greedy Decoding – The Safe Single
Always pick the word with the highest probability. Every. Single. Time.
It’s like a batter who always taps the ball gently for a single — never risky, never creative.
✅ Pros: Simple, fast, deterministic
❌ Cons: Can get repetitive or bland (e.g., “Virat is a cricketer. He is a cricketer. He is…”)
2️⃣ Beam Search – The Strategic Over
Keep track of the top N possible sequences and pick the best one overall.
It’s like having a group of analysts watching every ball and calculating which series of shots would lead to the highest score.
✅ Pros: Better long-term coherence
❌ Cons: Still rigid, can miss out on creative or rare phrasing
3️⃣ Top-k Sampling – Controlled Power Hitting
Don’t just pick the top 1 — sample randomly from the top k most likely tokens.
For example, with k=10, the model picks one of the top 10 tokens based on probability.
It’s like a batter choosing from their 10 favorite shots based on the bowler’s line.
✅ Pros: Adds creativity, avoids loops
❌ Cons: Needs careful tuning — too high = chaos
4️⃣ Top-p Sampling (Nucleus Sampling) – Smart Shot Selection
Sample from the smallest set of tokens whose total probability mass adds up to p (e.g., 90%).
This way, it’s not a fixed top 10 — it’s whatever tokens collectively make up the most probable outcome.
Think of it like a batter who adapts to the field and plays only the smart, high-probability shots.
✅ Pros: More adaptive, better balance
❌ Cons: Slightly more complex to tune
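Here's a toy sketch of greedy, top-k, and top-p over a made-up five-token vocabulary with made-up probabilities (beam search is skipped since it needs a full model loop):
import numpy as np

rng = np.random.default_rng(42)
vocab = np.array(["single", "four", "six", "dot", "wicket"])
probs = np.array([0.40, 0.25, 0.20, 0.10, 0.05])   # hypothetical next-token probabilities

def greedy(probs):
    return vocab[np.argmax(probs)]                  # always the safest shot

def top_k(probs, k=3):
    idx = np.argsort(probs)[-k:]                    # keep the k most likely shots
    return rng.choice(vocab[idx], p=probs[idx] / probs[idx].sum())

def top_p(probs, p=0.9):
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    keep = order[:cutoff]                           # smallest set covering mass p
    return rng.choice(vocab[keep], p=probs[keep] / probs[keep].sum())

print(greedy(probs), top_k(probs), top_p(probs))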
🔥 Temperature: Risk Meter
Temperature controls how confident or adventurous the model should be.
Temperature | Behavior | Analogy |
0.0 | Totally deterministic | Defensive opener (Dravid style) |
0.7 | Balanced between logic and flair | Kohli in cruise mode |
1.0+ | Very creative, even random | Rishabh Pant improvising |
Use lower temperature for factual answers, higher for brainstorming or poetry.
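Under the hood, temperature divides the model's raw scores (logits) before the softmax. A small sketch with made-up logits shows the effect:
import numpy as np

def apply_temperature(logits, temperature):
    scaled = np.array(logits) / temperature   # low T sharpens, high T flattens
    e = np.exp(scaled - scaled.max())
    return e / e.sum()

logits = [2.0, 1.0, 0.5]   # hypothetical scores for three candidate tokens
for t in (0.2, 0.7, 1.5):
    print(f"T={t}: {np.round(apply_temperature(logits, t), 3)}")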
✅ Final Word
Decoding is the finisher’s job — the moment where the model picks the right shot, in the right context, to win the game.
All that math and memory is useless if the final delivery isn’t well played.
🏋️ V. Training & Fine-Tuning
Before a model can bat like a pro, it has to go through net practice — a lot of it. That’s what training is all about.
🎓 What Does “Pretrained” Mean?
“Pretrained” means the model has already read and learned from massive amounts of data — books, websites, articles, even code. It doesn’t start from zero when you prompt it. It’s like a player who’s watched and analyzed millions of deliveries before stepping onto the field.
Think of GPT as a batter who’s played 10 million practice overs before their first match with you.
🧪 Fine-Tuning vs. Prompt Engineering
Both are ways to get the model to behave the way you want — just different in how deep they go:
Technique | What It Is | Analogy |
Prompt Engineering | Writing smarter inputs to guide output | Giving live match instructions |
Fine-Tuning | Retraining the model on your specific data | Coaching the player during nets |
Prompting = Surface-level guidance (fast, flexible)
Fine-tuning = Deep behavioral change (slow, powerful)
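To see the prompting side in code, here's a sketch against the current OpenAI Python client (the model name and instructions are just examples):
from openai import OpenAI

client = OpenAI(api_key="your-api-key")

# Prompt engineering: steer behavior with instructions alone, no retraining
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a cricket commentator. Keep answers to one punchy sentence."},
        {"role": "user", "content": "Describe a last-ball six."},
    ],
)
print(response.choices[0].message.content)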
🔁 Transfer Learning in Generative Models
Pretraining is general, but fine-tuning is specific. That’s where transfer learning comes in:
The model learns general cricketing sense from millions of matches (pretraining)
Then you train it to play like your team captain (fine-tuning)
This helps GenAI go from "I know everything" to "I know what you want."
🧠 In short:
You don’t have to train a model from scratch — you just teach it how to adapt. Like taking a star player and getting them to fit your team’s strategy.
🧱 VI. Real-World Constraints
Now that we know how the AI bats, bowls, and scores, let’s talk about real-world match conditions — the limits that apply even to the smartest players on the field.
💸 1. Computational Costs of Vectorization
Every token becomes a high-dimensional vector — and the more tokens you process, the more GPU firepower you need.
Think of it like having to train every player in your squad for every possible scenario — expensive, intense, and very energy-hungry.
Big models = Big bills. Running a large LLM at scale isn’t cheap — especially during inference (when it's generating responses).
📏 2. Token Limits and Truncation
Even top-tier models have context windows — like GPT-4 Turbo's 128K-token max. If your input is too long, it gets truncated, and the model forgets the early overs.
If you’re feeding it a full season of stats, it might forget how the match started by the time it gets to the death overs.
Always watch your token count — long prompts ≠ better prompts.
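A practical sketch: count tokens with tiktoken before sending a prompt (128K here is just the example window from above; check your model's actual limit):
import tiktoken

MAX_CONTEXT = 128_000   # example window size; varies by model
enc = tiktoken.encoding_for_model("gpt-4")

prompt = "A full season of ball-by-ball commentary..."   # your long input
n_tokens = len(enc.encode(prompt))
if n_tokens > MAX_CONTEXT:
    print(f"{n_tokens} tokens: the early overs will get truncated!")
else:
    print(f"{n_tokens} tokens: fits in the window.")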
⚠️ 3. Bias Encoded in Embeddings
The model learns from human data. And humans? We're biased — culturally, socially, politically.
Those biases get baked into embeddings:
Stereotypes
Gender associations
Geographic imbalance
Just like a coach who always favors the same player, models can unconsciously do the same — unless we intervene with guardrails.
⚖️ 4. Tradeoffs: Speed vs. Accuracy vs. Creativity
You can’t have it all — not at once.
Setting | What You Get |
Low temperature | Fast, safe, repetitive |
High temperature | Creative, slower, risky |
Big models | Better answers, more cost |
Small models | Cheaper, faster, less fluent |
It’s all about the match strategy — not every game needs a six off the last ball. Sometimes, a quick single is enough.
🧠 VII. Conclusion: Why Understanding These Terms Matters
You don’t need to be a data scientist to understand GenAI — but knowing the basics makes you a far better player in this new digital league.
👨💻 For Developers:
Better prompts = faster, cheaper, more accurate models
Understanding token limits avoids wasted API calls
Helps in choosing the right decoding strategies for different tasks
🧑⚖️ For Policymakers:
Knowing how bias enters the system helps draft better AI regulations
Understand tradeoffs in governance, fairness, and transparency
🧑💼 For Everyday Users:
Makes you smarter with tools like ChatGPT, Copilot, Gemini
You’ll know why it hallucinates, how to prompt better, and when to trust its output
🏁 TL;DR:
GenAI isn’t magic. It’s math + memory + match strategy — and now, you know how it bats from token to output.
The next time you see an AI generate poetry, code, or match commentary, remember it’s not guessing — it’s predicting the next ball using training, position, attention, and a killer finisher at the end.