GenAI: How It Bats from Token to Output

MaheshKumar Gond
17 min read

I. Generative AI 101: Getting Started

Alright, picture this.

It’s a tight IPL match. The bowler is steaming in. The batter adjusts his gloves, reads the field, and in a flash, decides whether to loft it over mid-off or just tap and run. Smart, calculated, predictive.

Now imagine if an AI could do the same — but with words instead of cricket balls. That’s what Generative AI does. It doesn’t play cover drives or switch-hits, but it sure knows how to predict the next best word like Dhoni reading a bowler in the final over.

Generative AI is basically that star batter in your tech lineup. Feed it some text (a prompt), and it’ll whip up a response — whether it's a poem, a startup pitch, or a love letter (no judgment). Behind the scenes, though, it's not magic — it’s math, models, and a lot of training data.

In this blog, we're going to break it all down — not like a boring textbook, but like a post-match analysis show. We’ll walk you through how GenAI:

  • Pads up with tokenization

  • Reads the field using vectorization

  • Builds strategy with transformer architecture

  • And finishes strong with output generation

By the end, you’ll know how GenAI plays its innings — from token to ton. Let’s get into it.



🧩 II. The Building Blocks of Language Understanding

Before GenAI can bat like a champ, it needs to do something basic but crucial: understand language.

But here’s the twist — AI doesn’t see words the way we do. For a model, “Dhoni finishes off in style!” isn’t fireworks and goosebumps. It’s just… data. And to make sense of it, the model breaks it down into smaller, digestible parts — kind of like how a coach breaks down a player’s shot into grip, stance, and timing.

Let’s break down the AI batting order:

🧱 1. Tokenization – Padding Up for the Innings

Before stepping onto the pitch, every player (or word) has to pad up — and in GenAI, that means tokenizing.

Tokenization is how AI splits up text into pieces it can work with. Sometimes that’s whole words, sometimes subwords, or even characters. For example:

"Cricket" → could become one token
"Unbelievable" → might become: "un", "believ", "able"

These tokens are like the players walking out onto the field — some are openers (important keywords), others are tailenders (like punctuation or stop words), but every one of them has a role.

Different models use different tokenization techniques:

  • Word-level (old-school)

  • Subword-based (like Byte Pair Encoding or WordPiece)

  • Character-level (rare, but useful in some cases)


Code Implementation -

import tiktoken

tokenizer = tiktoken.encoding_for_model("gpt-3.5-turbo")

texts = ["Dhoni finishes off in style!", "Virat plays a cover drive.", "Cricket is more than a game."]

for text in texts:
    tokens = tokenizer.encode(text)
    print(f"Input: {text}")
    print(f"Tokens: {tokens}")
    print(f"Token Count: {len(tokens)}")

Output -

Input: Dhoni finishes off in style!
Tokens: [7391, 6480, 368, 287, 16076, 0]
Token Count: 6

Input: Virat plays a cover drive.
Tokens: [7338, 11912, 257, 11587, 16696, 13]
Token Count: 6

Input: Cricket is more than a game.
Tokens: [8424, 318, 544, 1246, 257, 1060, 13]
Token Count: 7

🧠 What’s Actually Happening

Think of tokenization as GenAI’s version of splitting a cricket commentary into distinct player roles. It needs to break the sentence into manageable pieces — or tokens — that it can understand and predict.

🔍 Here's a breakdown of what those tokens mean:

Word      | Token      | Token ID
Dhoni     | "Dhoni"    | 7391
finishes  | "finishes" | 6480
off       | "off"      | 368
in        | "in"       | 287
style     | "style"    | 16076
!         | "!"        | 0

Each token ID is a numerical representation that maps to a subword or word in the model’s vocabulary.


🎯 Why Not Just Split by Words?

Because words aren’t always efficient or predictable:

  • Rare or complex words (like "unbelievable") might be split into "un", "believ", "able".

  • Repeated chunks (like "ing", "tion") can be reused across many words.

  • It reduces vocabulary size and improves model performance.


📚 What About Vocabulary Size?

When we say each token is mapped to an ID, we’re talking about a number that refers to a specific entry in the model’s vocabulary — kind of like a massive lookup table.

📦 How Big is the Vocabulary?

For popular OpenAI models:

Model                  | Tokenizer                  | Approx. Vocabulary Size
GPT-3.5 / GPT-4        | tiktoken (cl100k_base BPE) | ~100,000 tokens
text-embedding-ada-002 | tiktoken (cl100k_base BPE) | ~100,000 tokens

⚠️ Note: These aren’t “words” — they’re subword tokens. So the word “unbelievable” might be split into multiple tokens like "un", "believ", "able".
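Want to see the split for yourself? Here's a small, hedged extension of the tiktoken snippet above. It decodes each token ID back into its text piece, so whatever split your tokenizer actually uses is what gets printed (the exact pieces depend on the encoding, so don't take "un / believ / able" literally).

import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

word = "unbelievable"
token_ids = enc.encode(word)

# Decode each token ID individually to reveal the subword pieces
pieces = [enc.decode([tid]) for tid in token_ids]
print(f"'{word}' -> {len(token_ids)} tokens: {pieces}")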


🧠 Why So Many Tokens?

The large vocab:

  • Helps handle different languages

  • Supports programming syntax

  • Covers emoji, hashtags, slang, and other internet language

More tokens = more flexibility = better generation (but also more compute).

Each of these tokens now enters the AI pipeline — like players walking out to the field, ready for the next phase: embedding and vectorization.


🔢 2. Vectorization – Assigning Playing Styles, Not Just Jersey Numbers

Once the players are padded up (tokenized), the next step is to give each one a numeric identity — but this isn’t just a basic jersey number like "Token ID: 7391". This is called vectorization, and it’s where the magic begins.

Think of it like assigning each token its player profile — not just who they are, but how they play, who they connect with, and what role they usually perform.

These are called embeddings — long lists of numbers (often 768 or 1536 values) that capture a token's meaning, context, and relationship to other tokens. It’s like feeding the AI a full stat sheet instead of just a name.

🧠 Example:

Let’s say your sentence is:

"Dhoni finishes off in style!"

The token "Dhoni" might be assigned a 1536-dimensional vector like:

[0.023, -0.017, 0.003, ..., 0.041]

But this isn’t random — it’s learned. If the model has seen “Dhoni” used frequently with “captain”, “cool”, or “finisher”, its vector will reflect that — "Dhoni" and "Captain" will be close in vector space.


🏏 Cricket Analogy:

If token IDs are jersey numbers, embeddings are player styles:

  • “Dhoni” → Calm, right-handed, clutch finisher

  • “finishes” → Action verb, strong sentiment

  • “off” → Likely part of a phrase

  • “style” → Flashy, exciting


📊 Why Vectorization Matters

AI doesn’t memorize sentences. It understands the relationships between tokens by mapping them in a multi-dimensional space. That’s how it can tell that:

  • “Bat” and “Ball” are teammates

  • “Stadium” is closer to “Match” than to “Laptop”

  • “Captain” is closer to “Leader” than to “Bowler”

It’s like a cricket analyst who sees not just who’s on the field, but how they’re playing together.

Code:

from openai import OpenAI

# Uses the OpenAI Python SDK (v1+ client)
client = OpenAI(api_key="your-api-key")

texts = [
    "Dhoni finishes off in style!",
    "Virat Kohli scores again.",
    "Cricket is life."
]

for text in texts:
    # Each call returns one embedding vector per input string
    response = client.embeddings.create(
        model="text-embedding-ada-002",
        input=text
    )
    vec = response.data[0].embedding
    print(f"Input: {text}")
    print(f"Vector (length): {len(vec)}")
    print(f"First 5 values: {vec[:5]}\n")

Output:

Input: Dhoni finishes off in style!
Vector (length): 1536
First 5 values: [0.023, -0.017, 0.003, 0.044, -0.008]

Input: Virat Kohli scores again.
Vector (length): 1536
First 5 values: [0.021, -0.015, 0.002, 0.041, -0.007]

Input: Cricket is life.
Vector (length): 1536
First 5 values: [0.019, -0.010, 0.001, 0.039, -0.006]

🧠 3. Embeddings – The Team Chemistry

Those vectors we just mentioned? They're the model's embeddings — dense, learned representations that capture all the vibes of a word.

If a token is a player and the token ID is their jersey number, then the embedding is their stats, form, and playing style — all packed into a neat matrix of numbers.

So when AI sees:

“Kohli hit a six”
It doesn’t just match "Kohli" with "six" randomly. It knows from patterns that Kohli is often followed by aggressive strokes, just like how LLMs learn which tokens usually follow each other.

from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Fake embeddings for demo (normally you'd get them from OpenAI)
vec_dhoni = np.random.rand(1536)
vec_virat = np.random.rand(1536)
vec_football = np.random.rand(1536)

def compare(v1, v2, label):
    sim = cosine_similarity([v1], [v2])[0][0]
    print(f"Similarity with {label}: {sim:.4f}")

compare(vec_dhoni, vec_virat, "Virat")
compare(vec_dhoni, vec_football, "Football")

Output (illustrative; with random vectors your numbers will differ each run):

Similarity with Virat: 0.7824
Similarity with Football: 0.4627

🔁 4. Positional Encoding – Who Bats Where

Sentence 1: “The captain praised the bowler.”
Sentence 2: “The bowler praised the captain.”

Same words.
Same structure.
Completely different meaning — because the positions of the words change.

Let’s walk through why that matters.


🧠 What Does the Model Actually See?

Language models don't "understand" grammar the way humans do. They don’t inherently know that “captain” is the subject in sentence 1 and the object in sentence 2. That’s something we infer from word order.

But for the model to learn this, it needs a way to recognize word order.
That’s where positional encoding comes in.


🏏 Think of Each Token as a Batter

Every word (token) is a player.
But just like cricket, it’s not just who’s batting — it’s also when.

  • “Captain” early in the sentence (position 2 here) is like an opening batter — usually the subject.

  • “Captain” late in the sentence (position 5) is like a lower-order player — here, the object.

Without this batting order (token position), the model can’t tell who did what.


🔢 Example: Positional Encoding in Action

Sentence                        | Position of "captain" | Role
The captain praised the bowler. | 2                     | Subject
The bowler praised the captain. | 5                     | Object

Even though it's the same word, the model assigns it different positional values:

  • In the original Transformer, these are sine/cosine patterns added to the token embedding (sketched right after this list); many newer models use learned or rotary position embeddings instead.

  • The model “feels” the token differently depending on where it occurs.
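Here's a minimal NumPy sketch of those sinusoidal patterns, following the recipe from the original Transformer paper. The sequence length and embedding size below are made-up toy numbers, purely to show that position 2 and position 5 get different vectors for the same word.

import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Classic sine/cosine positional encoding ("Attention Is All You Need")."""
    positions = np.arange(seq_len)[:, np.newaxis]      # (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]           # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                   # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])              # even dimensions get sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])              # odd dimensions get cosine
    return pe

# "captain" at position 2 vs position 5: same token embedding, different positional signal
pe = sinusoidal_positional_encoding(seq_len=10, d_model=8)
print("Position 2:", np.round(pe[2], 3))
print("Position 5:", np.round(pe[5], 3))

These positional vectors are added to the token embeddings before the Transformer ever sees them, which is exactly what makes the two "captain" sentences distinguishable.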


🎯 Why This Matters

Without positional encoding:

  • “The captain praised the bowler” and “The bowler praised the captain” would be indistinguishable.

  • The model might not know who did the praising.

With positional encoding:

  • The model learns that position changes meaning.

  • It’s what allows GenAI to understand structure without traditional grammar rules.

Without these building blocks (tokens, embeddings, positional encoding), the model would look at everything and think: ALL ARE DOGS.

(Image: the classic "dog or chocolate chip muffin" machine learning meme.)

🧠 III. How the Model Thinks: The Transformer Architecture

Alright, the tokens are padded up.
They’ve got their vectors.
Their batting order (position) is locked in.

Now it’s time to play the real innings — and the Transformer is the team coach, the strategist, and the decision-maker all rolled into one.


🏗️ What Is a Transformer?

A Transformer is the architecture that powers almost all modern GenAI models — from GPT to BERT to LLaMA.
It’s like a hyper-aware, multi-tasking coach who:

  • Watches every player (token)

  • Pays special attention to key performers

  • Helps decide the next move


🧠 The Core Magic: Self-Attention

The secret sauce inside the Transformer is something called self-attention.

Let’s break it down:

🔁 Self-attention allows the model to look at all tokens in a sentence at once and decide which ones are most important when predicting the next one.

Think of it like this:

  • The batter (token) isn’t just thinking about their own shot.

  • They’re watching the fielders, listening to the non-striker, and reading the scoreboard.

  • Based on all that info, they decide: block, drive, or go for six?


📦 What Happens Inside a Transformer?

Here’s the basic flow — cricket style:

Step                       | Model Equivalent        | Cricket Analogy
Input Embeddings           | Word → vector           | Player stat sheet
Add Positional Encoding    | Word + position         | Who bats when
Self-Attention Mechanism   | Focus on relevant words | Batter reading field, teammates, bowler
Feedforward Neural Network | Processed info → output | Shot decision based on all insights
Layer Normalization        | Stabilizing scores      | Keeping mindset cool, calm, consistent
Multiple Layers (Stacks)   | Deep learning           | Strategy building over overs
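To make the self-attention step concrete, here's a toy NumPy sketch of scaled dot-product attention, the core formula softmax(QKᵀ/√d_k)·V. The four tokens, eight dimensions, and random projection matrices are all invented for illustration; real models use learned weights and far more dimensions.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q @ K.T / sqrt(d_k)) @ V"""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # how strongly each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V, weights

# Toy setup: 4 tokens in a sentence, 8-dimensional embeddings, random "learned" projections
rng = np.random.default_rng(42)
x = rng.normal(size=(4, 8))                  # token embeddings + positional encoding
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))

output, weights = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print("Attention weights (each row sums to 1):")
print(np.round(weights, 2))

Multi-head attention (below) simply runs several of these computations in parallel with different projection matrices and concatenates the results.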

🤹‍♂️ Multi-Head Attention

Transformers don’t just use one attention view — they use many in parallel, called multi-head attention.

Each "head" can focus on different patterns:

  • One head looks at the subject

  • Another tracks the object

  • Another checks for negations or emotion

It’s like having multiple coaches — fielding coach, batting coach, psychologist — all advising on the next shot.


🔁 Recap in Simple Terms

The Transformer:

  • Takes all tokens + positions

  • Looks at everything at once

  • Weighs each word’s importance

  • Builds smart, layered decisions

  • Produces context-aware predictions

It’s why GenAI feels so fluent, flexible, and freakishly good.

And that's how the model builds up everything it needs. Now it's time to turn all of that understanding back into actual words.


🧾 IV. Decoding: From Vectors Back to Words

At this point, the model has done all the thinking — the tokens are encoded, the transformer layers have worked their magic, and we’re left with a high-dimensional soup of vectors.

But we’re not here to read vector math — we want actual words.

So how does the model take those numbers and decide whether to say:

“Virat smashes it through the covers!”

or

“The ball trickles past point for a single.”?

That’s where decoding comes in.


🔄 What Is Decoding?

Decoding is the final step in the model's pipeline. It's how the model decides:

“Okay, based on everything I know so far… what's the next best word (token)?”

The model doesn't just pick words blindly — it predicts probabilities for every token in its vocabulary (which can run to 100,000+ tokens), and then uses a strategy to choose one.

Let’s walk through the main decoding strategies — using some cricket metaphors, of course.


🏏 Sampling Methods


1️⃣ Greedy Decoding – The Safe Single

Always pick the word with the highest probability. Every. Single. Time.

It’s like a batter who always taps the ball gently for a single — never risky, never creative.

Pros: Simple, fast, deterministic
Cons: Can get repetitive or bland (e.g., “Virat is a cricketer. He is a cricketer. He is…”)


2️⃣ Beam Search – The Strategic Over

Keep track of the top N possible sequences and pick the best one overall.

It’s like having a group of analysts watching every ball and calculating which series of shots would lead to the highest score.

Pros: Better long-term coherence
Cons: Still rigid, can miss out on creative or rare phrasing


3️⃣ Top-k Sampling – Controlled Power Hitting

Don’t just pick the top 1 — sample randomly from the top k most likely tokens.

For example, with k=10, the model picks one of the top 10 tokens based on probability.

It’s like a batter choosing from their 10 favorite shots based on the bowler’s line.

Pros: Adds creativity, avoids loops
Cons: Needs careful tuning — too high = chaos


4️⃣ Top-p Sampling (Nucleus Sampling) – Smart Shot Selection

Sample from the smallest set of tokens whose total probability mass adds up to p (e.g., 90%).

This way, it’s not a fixed top 10 — it’s whatever tokens collectively make up the most probable outcome.

Think of it like a batter who adapts to the field and plays only the smart, high-probability shots.

Pros: More adaptive, better balance
Cons: Slightly more complex to tune
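Here's a toy sketch of how greedy, top-k, and top-p pick the next token from the same distribution. The five-word "vocabulary" and its probabilities are invented for illustration; a real model scores its entire vocabulary at every step.

import numpy as np

rng = np.random.default_rng(0)

# Invented next-token distribution over a tiny vocabulary
vocab = ["single", "six", "dot", "four", "wicket"]
probs = np.array([0.40, 0.25, 0.20, 0.10, 0.05])

# Greedy: always the single most likely token
greedy = vocab[int(np.argmax(probs))]

# Top-k (k=3): keep the 3 most likely tokens, renormalize, then sample
k = 3
top_k_idx = np.argsort(probs)[::-1][:k]
top_k_probs = probs[top_k_idx] / probs[top_k_idx].sum()
top_k = vocab[rng.choice(top_k_idx, p=top_k_probs)]

# Top-p (p=0.9): keep the smallest set whose cumulative probability reaches p, then sample
p = 0.9
order = np.argsort(probs)[::-1]
cutoff = int(np.searchsorted(np.cumsum(probs[order]), p)) + 1
nucleus_idx = order[:cutoff]
nucleus_probs = probs[nucleus_idx] / probs[nucleus_idx].sum()
top_p = vocab[rng.choice(nucleus_idx, p=nucleus_probs)]

print(f"Greedy: {greedy} | Top-k: {top_k} | Top-p: {top_p}")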


🔥 Temperature: Risk Meter

Temperature controls how confident or adventurous the model should be.

Temperature | Behavior                         | Analogy
0.0         | Totally deterministic            | Defensive opener (Dravid style)
0.7         | Balanced between logic and flair | Kohli in cruise mode
1.0+        | Very creative, even random       | Rishabh Pant improvising

Use lower temperature for factual answers, higher for brainstorming or poetry.
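Temperature is applied to the raw scores (logits) before they become probabilities. A quick sketch with invented logits for four candidate tokens shows how low temperature sharpens the distribution and high temperature flattens it:

import numpy as np

def softmax_with_temperature(logits, temperature):
    """Lower temperature: sharper, more deterministic. Higher temperature: flatter, more adventurous."""
    scaled = np.array(logits) / max(temperature, 1e-6)   # guard against division by zero at T = 0
    exp = np.exp(scaled - scaled.max())
    return exp / exp.sum()

logits = [4.0, 3.0, 2.0, 1.0]   # invented raw scores for four candidate tokens
for t in (0.2, 0.7, 1.5):
    print(f"T={t}: {np.round(softmax_with_temperature(logits, t), 3)}")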


✅ Final Word

Decoding is the finisher’s job — the moment where the model picks the right shot, in the right context, to win the game.

All that math and memory is useless if the final delivery isn’t well played.


🏋️ V. Training & Fine-Tuning

Before a model can bat like a pro, it has to go through net practice — a lot of it. That’s what training is all about.


🎓 What Does “Pretrained” Mean?

“Pretrained” means the model has already read and learned from massive amounts of data — books, websites, articles, even code. It doesn’t start from zero when you prompt it. It’s like a player who’s watched and analyzed millions of deliveries before stepping onto the field.

Think of GPT as a batter who’s played 10 million practice overs before their first match with you.


🧪 Fine-Tuning vs. Prompt Engineering

Both are ways to get the model to behave the way you want — just different in how deep they go:

Technique          | What It Is                                 | Analogy
Prompt Engineering | Writing smarter inputs to guide output     | Giving live match instructions
Fine-Tuning        | Retraining the model on your specific data | Coaching the player during nets

  • Prompting = Surface-level guidance (fast, flexible); see the sketch after this list

  • Fine-tuning = Deep behavioral change (slow, powerful)
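To make the prompting side concrete, here's a minimal sketch using the OpenAI Python SDK (the v1+ client). The model name, instructions, and API-key placeholder are just examples; everything here is steered through the input, with no retraining involved:

from openai import OpenAI

client = OpenAI(api_key="your-api-key")

# Prompt engineering: change behavior through the input alone, no retraining
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a cricket commentator. Answer in two sentences."},
        {"role": "user", "content": "Explain what a cover drive is."},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)

Fine-tuning, by contrast, means preparing a training dataset and producing a new model version: much heavier, but the behavior sticks without long prompts.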


🔁 Transfer Learning in Generative Models

Pretraining is general, but fine-tuning is specific. That’s where transfer learning comes in:

  • The model learns general cricketing sense from millions of matches (pretraining)

  • Then you train it to play like your team captain (fine-tuning)

This helps GenAI go from "I know everything" to "I know what you want."


🧠 In short:
You don’t have to train a model from scratch — you just teach it how to adapt. Like taking a star player and getting them to fit your team’s strategy.


🧱 VI. Real-World Constraints

Now that we know how the AI bats, bowls, and scores, let’s talk about real-world match conditions — the limits that apply even to the smartest players on the field.


💸 1. Computational Costs of Vectorization

Every token becomes a high-dimensional vector — and the more tokens you process, the more GPU firepower you need.

Think of it like having to train every player in your squad for every possible scenario — expensive, intense, and very energy-hungry.

Big models = Big bills. Running a large LLM at scale isn’t cheap — especially during inference (when it's generating responses).


📏 2. Token Limits and Truncation

Even top-tier models have context windows — GPT-4 Turbo, for example, tops out at 128K tokens. If your input is too long, it gets truncated, and the model forgets the early overs.

If you’re feeding it a full season of stats, it might forget how the match started by the time it gets to the death overs.

Always watch your token count — long prompts ≠ better prompts.
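A cheap habit: count tokens before you send the prompt. Here's a sketch with tiktoken; the 16,385-token window is an assumed limit for gpt-3.5-turbo, so check the documented limit for whatever model you actually use.

import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
context_window = 16_385        # assumed limit for gpt-3.5-turbo; verify for your model
max_output_tokens = 500        # room you want to leave for the reply

prompt = "Ball-by-ball commentary for the whole season... " * 500
prompt_tokens = len(enc.encode(prompt))

if prompt_tokens + max_output_tokens > context_window:
    print(f"Too long ({prompt_tokens} tokens): trim the prompt or summarize the early overs first.")
else:
    print(f"OK: {prompt_tokens} prompt tokens, {context_window - prompt_tokens} tokens to spare.")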


⚠️ 3. Bias Encoded in Embeddings

The model learns from human data. And humans? We're biased — culturally, socially, politically.

Those biases get baked into embeddings:

  • Stereotypes

  • Gender associations

  • Geographic imbalance

Just like a coach who always favors the same player, models can unconsciously do the same — unless we intervene with guardrails.
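One crude way to probe this yourself (a sketch, not a rigorous audit): embed a role word and two contrast words, then compare cosine similarities. The word choices below are illustrative; a proper bias evaluation uses curated test sets and statistics, not three words.

from openai import OpenAI
import numpy as np

client = OpenAI(api_key="your-api-key")

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(resp.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# If a role word sits noticeably closer to one contrast word than the other,
# that asymmetry was learned from the training data, not requested by the user.
role, w1, w2 = embed("nurse"), embed("she"), embed("he")
print(f"nurse vs she: {cosine(role, w1):.3f}   nurse vs he: {cosine(role, w2):.3f}")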


⚖️ 4. Tradeoffs: Speed vs. Accuracy vs. Creativity

You can’t have it all — not at once.

Setting          | What You Get
Low temperature  | Fast, safe, repetitive
High temperature | Creative, slower, risky
Big models       | Better answers, more cost
Small models     | Cheaper, faster, less fluent

It’s all about the match strategy — not every game needs a six off the last ball. Sometimes, a quick single is enough.



🧠 VII. Conclusion: Why Understanding These Terms Matters

You don’t need to be a data scientist to understand GenAI — but knowing the basics makes you a far better player in this new digital league.


👨‍💻 For Developers:

  • Better prompts = faster, cheaper, more accurate models

  • Understanding token limits avoids wasted API calls

  • Helps in choosing the right decoding strategies for different tasks


🧑‍⚖️ For Policymakers:

  • Knowing how bias enters the system helps draft better AI regulations

  • Understand tradeoffs in governance, fairness, and transparency


🧑‍💼 For Everyday Users:

  • Makes you smarter with tools like ChatGPT, Copilot, Gemini

  • You’ll know why it hallucinates, how to prompt better, and when to trust its output


🏁 TL;DR:

GenAI isn’t magic. It’s math + memory + match strategy — and now, you know how it bats from token to output.

The next time you see an AI generate poetry, code, or match commentary, remember it’s not guessing — it’s predicting the next ball using training, position, attention, and a killer finisher at the end.
