AI Under the Hood: Decoding AI Jargon: Vectors, Tokens, and the Magic of LLMs

Table of contents
- 🟢 Vectors: How Text Turns Into Numbers
- 🟢 Tokens: The Building Blocks
- 🟢 Positional Encoding: Why Order of Words Matters
- 🟢 Self-Attention: Rearranging Focus
- 🟢 Semantic Meaning vs. Lexical Search
- 🟢 Vocab Size: The Universe of Tokens
- 🟢 Tokenization: Not Encryption, But Mapping
- 🟢 Temperature: Controlling Creativity
- 🟢 Multi-Head Attention: Like Multi-Threading in the Brain
- 🟢 Softmax: Tuning Probability
- 🟢 Knowledge Cutoff: The Model’s Memory Timeline
- Wrapping Up

Have you ever wondered how GPT and other large language models actually work behind the scenes?
When I started learning about Generative AI, I kept stumbling on complex jargon—so here’s a no-fluff guide that breaks it down with relatable analogies.
Here are some prominent terms you’ll want to know before reading landmark papers like “Attention Is All You Need”.
Let’s gooo!!
🟢 Vectors: How Text Turns Into Numbers
In AI models, all text is ultimately represented as vector embeddings.
A vector embedding is like a unique numeric fingerprint of a word or phrase, capturing its meaning in many dimensions.
Different models use different embedding schemes to turn words into these vectors, which is why they capture context and nuance a little differently.
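Here’s a tiny sketch of what “text turns into numbers” looks like in practice. I’m assuming the open-source sentence-transformers library and the all-MiniLM-L6-v2 model purely for illustration; this is not how GPT does it internally.

```python
# pip install sentence-transformers  (assumption: an open-source library used here only for illustration)
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small, commonly used embedding model

embedding = model.encode("river bank")  # text in, vector of numbers out
print(embedding.shape)   # (384,) -> a 384-dimensional numeric "fingerprint"
print(embedding[:5])     # the first few dimensions of that fingerprint
```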
🟢 Tokens: The Building Blocks
Words don’t go straight into models as raw text.
Instead, words are converted into tokens: numeric IDs, or units, that the model can process.
They also hold positional importance, meaning their place in a sentence changes how they are understood.
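To see real tokens, here’s a minimal sketch assuming OpenAI’s open-source tiktoken library (just one tokenizer among many):

```python
# pip install tiktoken  (assumption: using OpenAI's tiktoken tokenizer for illustration)
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # a tokenizer used by several GPT models
tokens = enc.encode("ICICI bank near the river.")
print(tokens)  # a list of integer token IDs, roughly one per word or subword
```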
🟢 Positional Encoding: Why Order of Words Matters
Imagine you have 5 words:
ICICI bank near the river.
Swap them around:
The river near ICICI bank.
The same words but different meaning.
Positional encoding assigns each token a position-aware signal, allowing the model to distinguish such differences.
This encoding shifts the final vector embedding based on word order, helping the model grasp semantic meaning.
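Here’s a small NumPy sketch of the sinusoidal positional encoding described in “Attention Is All You Need” (a toy version for intuition, not a production implementation):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding from "Attention Is All You Need"."""
    positions = np.arange(seq_len)[:, None]                # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]               # (1, d_model/2)
    angles = positions / np.power(10000, dims / d_model)   # one angle per (position, dimension pair)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions get cosine
    return pe

# Each of the 5 tokens gets a different position-aware signal added to its embedding.
pe = sinusoidal_positional_encoding(seq_len=5, d_model=8)
print(pe.shape)  # (5, 8)
```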
🟢 Self-Attention: Rearranging Focus
Self-attention is like re-organizing vector embeddings based on meaning.
When the model reads a sentence, it decides which words to focus on more, dynamically weighting them to understand context.
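Here’s a minimal NumPy sketch of scaled dot-product attention, the formula at the heart of self-attention (toy dimensions, random data):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the core of self-attention."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # how much each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ V, weights

# 5 tokens, each an 8-dimensional embedding; in self-attention Q, K and V all come from the same sentence.
x = np.random.randn(5, 8)
output, weights = scaled_dot_product_attention(x, x, x)
print(weights.shape)  # (5, 5): one attention weight for every pair of tokens
```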
🟢 Semantic Meaning vs. Lexical Search
Lexical (keyword) search: Looks for exact word matches.
Semantic search: Looks for the meaning behind your query.
This is where Natural Language Understanding (NLU) comes in.
Example:
“How to open a bank account?”
“How to open a savings account at ICICI?”
Even though the words differ, semantic search identifies similar intent.
LLMs rely on this to figure out what you’re actually asking.
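As a rough sketch, here’s how similar intent can be measured with embeddings and cosine similarity, again assuming the sentence-transformers library only for illustration:

```python
# pip install sentence-transformers  (assumption: same illustrative library as above)
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "How to open a bank account?"
doc   = "How to open a savings account at ICICI?"

q_emb, d_emb = model.encode([query, doc])
print(util.cos_sim(q_emb, d_emb))  # a high cosine similarity signals similar intent, despite different words
```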
🟢 Vocab Size: The Universe of Tokens
You might think of vocab size as the number of unique parameters, but actually:
Vocab size = the total number of unique tokens the model recognizes (words and subwords).
Model parameters = the weights the model learns during training (e.g., 10 billion).
Unlike the 26 letters of the alphabet, LLMs work with vocabularies of 50,000+ tokens to cover all linguistic possibilities.
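For a concrete number, here’s a quick check using tiktoken’s GPT-2 tokenizer (one example vocabulary; other models differ):

```python
# pip install tiktoken  (assumption: illustrative, same library as above)
import tiktoken

enc = tiktoken.get_encoding("gpt2")   # the tokenizer GPT-2 used
print(enc.n_vocab)                    # 50257 -- the "universe" of tokens this model recognizes
```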
🟢 Tokenization: Not Encryption, But Mapping
I used to think tokenization was like encryption, assigning a secret code to each word.
It’s more accurate to say tokenization is mapping text to numeric IDs, with no secret keys.
While tokens can be “decoded” back to text, it’s not encryption in the security sense.
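A quick round-trip with tiktoken makes the point: encoding is a reversible mapping, not a cipher.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("Tokenization is just mapping.")
print(ids)              # plain integer IDs, not ciphertext
print(enc.decode(ids))  # anyone with the same tokenizer maps them straight back -- no secret key involved
```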
🟢 Temperature: Controlling Creativity
Temperature is a setting that influences the randomness of outputs:
Low temperature (0–0.2): Precise, deterministic answers
High temperature (0.7–1): More creative, diverse, surprising outputs
Think of it as telling the model whether to “play it safe” or “brainstorm.”
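Here’s a toy NumPy sketch of how temperature changes sampling; the logits are made-up numbers, not from a real model:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_next_token(logits, temperature=1.0):
    """Sample a next-token ID from raw scores; temperature controls the randomness."""
    scaled = np.asarray(logits) / max(temperature, 1e-6)  # low T sharpens, high T flattens the distribution
    probs = np.exp(scaled - scaled.max())
    probs = probs / probs.sum()                           # softmax over the scaled scores
    return rng.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.5, 0.1]           # made-up scores for 4 candidate next tokens
print(sample_next_token(logits, 0.1))   # near-deterministic: almost always picks token 0
print(sample_next_token(logits, 1.0))   # more diverse, "brainstorming" picks
```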
🟢 Multi-Head Attention: Like Multi-Threading in the Brain
I love this analogy:
When our brain tackles a problem, we think through different lenses at once: what, why, how, when.
Similarly, multi-head attention lets the model look at the same sentence from multiple perspectives simultaneously, like multi-threading for deeper understanding (I hope you have studied multi-threading by now 👀).
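If you want to poke at it, PyTorch ships a ready-made multi-head attention layer; here’s a minimal sketch with made-up dimensions:

```python
# pip install torch  (assumption: using PyTorch's built-in layer for illustration)
import torch
import torch.nn as nn

embed_dim, num_heads = 64, 8                     # 8 "perspectives" on the same sentence
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(1, 5, embed_dim)                 # 1 sentence, 5 tokens, 64-dim embeddings
out, attn_weights = mha(x, x, x)                 # self-attention: query = key = value = x
print(out.shape)           # torch.Size([1, 5, 64])
print(attn_weights.shape)  # torch.Size([1, 5, 5]) -- attention weights averaged over the 8 heads
```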
🟢 Softmax: Tuning Probability
Softmax transforms raw scores into probabilities.
It’s related to temperature, but they’re not the same thing: softmax turns the model’s raw scores (logits) into a probability distribution over possible next words, while temperature scales those scores before softmax is applied.
Think of a flatter distribution as letting the model consider more options (more creative outputs), while a sharper one keeps it conservative.
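Here’s softmax in a few lines of NumPy, just to show that the output is a proper probability distribution:

```python
import numpy as np

def softmax(scores):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    exps = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)        # roughly [0.66, 0.24, 0.10] -- higher scores get higher probability
print(probs.sum())  # 1.0
```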
🟢 Knowledge Cutoff: The Model’s Memory Timeline
Knowledge cutoff is simply the date up to which the model was trained.
For example, some GPT-4 variants have a knowledge cutoff around April 2023.
Anything after that, it won’t “know” unless you feed it new data.
Wrapping Up
These terms might feel overwhelming, but together they power the capabilities of modern AI.
From tokenization to multi-head attention, each part contributes to how models can understand, generate, and reason with language.
If you’re also learning Generative AI, keep digging deeper; each of these concepts is a building block.
Follow me or connect here on LinkedIn to catch the next articles in this learning series.
If you’ve read this far, do give me feedback; I’d love to learn and improve.
Let’s grow our understanding of AI together!
Thank you!
#GenerativeAI #LLM #AI #LearningSeries