Essential Terminologies in LLMs for Beginners


Ever wondered how AI tools like ChatGPT, Bard, or Claude can write poems, solve math, and even crack jokes?
Behind the scenes, these tools use something called LLMs — Large Language Models. While the name sounds fancy, the magic is built on a bunch of concepts that are actually easy to understand when explained right.
Let’s break it all down step by step — like building blocks — using real-life analogies and simple language.
1. Tokenization — Cutting Sentences into Small Pieces
What it means:
Tokenization is the process of splitting up a sentence into smaller parts called tokens (words, sub-words, or characters).
Analogy:
Think of a sentence as a pizza. Before eating it, you slice it into pieces. Each slice is a token.
Example:
The sentence “I love pizza” becomes:
["I", "love", "pizza"]
Rarer words often get sliced into even smaller sub-word pieces:
"tokenization" → ["token", "ization"]
The model doesn’t understand full sentences. It understands slices — tokens.
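If you want to see real tokens, here’s a minimal sketch using OpenAI’s tiktoken library (an assumption: you’ve installed it with pip install tiktoken; plenty of other tokenizers exist too):

```python
# pip install tiktoken
import tiktoken

# Load the tokenizer used by recent OpenAI models
enc = tiktoken.get_encoding("cl100k_base")

token_ids = enc.encode("I love pizza")
print(token_ids)                                  # a short list of integer token IDs
print([enc.decode([t]) for t in token_ids])       # the text slice behind each ID
```

Notice the model never sees "I love pizza" as text at all: it sees the integer IDs.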
2. Vocabulary Size — How Many Words the Model Knows
What it means:
It’s the total number of unique tokens (words/pieces) the model can recognize.
Analogy:
Imagine a dictionary. The bigger the dictionary, the more words you know.
Example:
GPT-3’s dictionary has around 50,000 tokens (50,257, to be exact). That’s like having 50,000 puzzle pieces to express any sentence.
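You can check this yourself: a tokenizer exposes its own vocabulary size. A minimal sketch with tiktoken (exact counts depend on which encoding you load):

```python
import tiktoken

# "r50k_base" is the GPT-3-era encoding; "cl100k_base" is a newer, larger one
for name in ["r50k_base", "cl100k_base"]:
    enc = tiktoken.get_encoding(name)
    print(name, enc.n_vocab)  # roughly 50k and 100k tokens respectively
```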
3. Embeddings — Giving Numbers to Words (So Computers Understand)
What it means:
Embeddings are how we turn words (tokens) into numbers so computers can process and “understand” them.
Analogy:
Think of embeddings as GPS coordinates for words. Words with similar meanings live close together on the map. For example, “dog” and “puppy” might be neighbors.
Example:
The word “happy” might be turned into a long list of numbers like:
[0.21, -0.55, 1.34, ...]
Computers don’t understand words — they understand numbers. Embeddings give words a mathematical meaning.
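Under the hood, an embedding is just a row lookup in a big table of numbers. Here’s a toy NumPy sketch (the vocabulary, table size, and values are made up for illustration; a real model learns these values during training):

```python
import numpy as np

vocab = {"I": 0, "love": 1, "pizza": 2}   # toy vocabulary: token -> ID
d_model = 4                                # real models use hundreds or thousands of dims

# One row of numbers per token; random here, learned in a real model
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), d_model))

token_ids = [vocab[t] for t in ["I", "love", "pizza"]]
embeddings = embedding_table[token_ids]    # look up one vector per token
print(embeddings.shape)                    # (3, 4): three tokens, four numbers each
```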
4. Positional Encoding — Telling the Model the Order of Words
What it means:
It tells the model where each word appears in the sentence.
Analogy:
Imagine a band where everyone plays at once. If no one knows when to play, the music is a mess. Positional encoding gives timing to each instrument (word).
Example:
In “I love you” vs “You love I”, the words are the same, but the order matters. This helps the model understand the difference.
The position information is added to the token embeddings, and the result (the input embeddings) is what the model actually processes.
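One classic recipe, from the original Transformer paper, builds the position signal out of sine and cosine waves of different frequencies. A minimal NumPy sketch:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding from 'Attention Is All You Need'."""
    pos = np.arange(seq_len)[:, None]        # word positions 0..seq_len-1
    i = np.arange(d_model)[None, :]          # embedding dimensions
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    # Even dimensions get sine, odd dimensions get cosine
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

token_embeddings = np.zeros((3, 8))          # stand-in for "I love you", 3 tokens
input_embeddings = token_embeddings + positional_encoding(3, 8)
print(input_embeddings.shape)                # (3, 8)
```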
5. Vectors — The Thoughts of the Model
What it means:
A vector is just a list of numbers representing a word, sentence, or meaning.
Each dimension in a vector captures some kind of information about the word, though we usually can’t say exactly what.
Analogy:
Think of vectors as the Lego blocks behind each word. When you combine them in clever ways, you build whole ideas and thoughts.
Example:
“Cat” might become a vector like [0.2, 1.3, -0.5, ...]
Everything inside an AI model is a big pile of numbers (vectors), shaped to mean something.
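A common way to compare two of these vectors is cosine similarity: a value near 1 means the vectors point the same way, i.e. related meanings. A toy sketch with made-up 3-dimensional vectors (real embedding values come from training):

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up vectors purely for illustration
cat   = np.array([0.9, 0.8, 0.1])
puppy = np.array([0.8, 0.9, 0.2])
car   = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(cat, puppy))  # high: related meanings
print(cosine_similarity(cat, car))    # low: unrelated meanings
```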
6. Encoder — The Listener
7. Decoder — The Speaker
What they do:
The encoder reads and understands input.
The decoder generates output (answers, translations, text).
Analogy:
Imagine telling a story to a translator:
Encoder = translator listening and taking notes.
Decoder = translator retelling your story in another language.
In ChatGPT, only the decoder is used, but in translation models (like Google Translate), both are important.
8. Semantic Meaning — Understanding Similar Meanings
What it means:
It’s knowing that two different words or sentences can mean the same thing, or be closely related, even when they use completely different words.
Analogy:
“I’m feeling great” and “I’m on top of the world” — we humans know they both mean someone is happy. LLMs try to do the same by comparing vectors.
AI doesn’t just look at words — it tries to grasp the feeling behind them.
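You can watch this happen with an off-the-shelf embedding model. A minimal sketch using the sentence-transformers library (an assumption: it’s installed, and "all-MiniLM-L6-v2" is just one popular model choice):

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

a, b = model.encode(["I'm feeling great", "I'm on top of the world"])
similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(similarity)  # noticeably higher than for two unrelated sentences
```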
9. Self-Attention — Focusing on What Matters
What it means:
The model looks at every word in a sentence and decides which other words are important for understanding it.
Analogy:
When reading “The cat, which was chased by the dog, ran away,” you focus on “dog” to understand what “chased” means. That’s self-attention.
Self-attention lets the model read like a smart human — knowing what to focus on.
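Mathematically, self-attention scores every word against every other word, then replaces each word with a weighted mix of all of them. A minimal NumPy sketch (random vectors stand in for learned ones, and the learned query/key/value projection matrices a real transformer uses are skipped to keep it short):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x):
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)   # how strongly each word relates to each other word
    weights = softmax(scores)       # each row sums to 1: "where should I focus?"
    return weights @ x              # each word becomes a weighted mix of all words

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))         # 5 words, 8 numbers each
print(self_attention(x).shape)      # (5, 8): same shape, now context-aware
```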
10. Multi-Head Attention — Seeing Things in Many Ways
What it means:
The model looks at each word from multiple perspectives at once.
Analogy:
Imagine wearing 8 different types of glasses — one sees grammar, one sees emotion, one sees logic, etc. Each gives a new view.
Example:
Multi-head attention helps the model understand “he” refers to “John” while also noticing the tone of the sentence.
Multiple attentions = better understanding.
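In code, "multiple perspectives" just means splitting each word’s vector into chunks and running attention on each chunk independently. A toy NumPy sketch (again skipping the learned projections a real transformer would apply):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, num_heads):
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Split each word's vector into num_heads smaller chunks (the "glasses")
    heads = x.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    scores = heads @ heads.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ heads                      # attention per head
    # Glue the heads back together into one vector per word
    return out.transpose(1, 0, 2).reshape(seq_len, d_model)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
print(multi_head_attention(x, num_heads=2).shape)      # (5, 8)
```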
11. Softmax — Picking the Most Likely Word
What it means:
Softmax turns the model’s raw scores for every possible next word into probabilities, so the most likely one can be picked.
Analogy:
It’s like rolling a weighted die. If "cat" has a 90% chance, it’ll likely be picked over "banana" with 1%.
The model doesn’t guess blindly — it uses Softmax to make smart predictions.
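Softmax itself is a tiny formula: exponentiate each score, then divide by the total so everything sums to 1. A minimal sketch with made-up scores:

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))   # subtract max for numerical stability
    return e / e.sum()

# Made-up raw scores (logits) for possible next words after "I love"
words  = ["pizza", "you", "banana"]
logits = np.array([4.0, 3.5, -1.0])

for word, p in zip(words, softmax(logits)):
    print(f"{word}: {p:.2f}")  # "pizza" and "you" get almost all the probability
```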
12. Temperature — Controlling How Creative the Model Is
What it means:
Temperature controls how random or safe the output is.
Analogy:
It’s like choosing how spicy your food is:
Low temp (0.2) = boring, safe
High temp (1.0) = creative, surprising
Example:
Prompt: “The moon is”
Temp 0.2 → “bright at night.”
Temp 1.0 → “a silver balloon floating through dreams.”
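Temperature works by dividing the scores before the softmax step. A low temperature exaggerates the gap between likely and unlikely words; a high temperature flattens it out. A minimal sketch reusing made-up scores:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = logits / temperature          # small temp -> bigger gaps between scores
    e = np.exp(scaled - np.max(scaled))
    return e / e.sum()

logits = np.array([4.0, 3.5, -1.0])        # made-up scores for 3 candidate words

print(softmax_with_temperature(logits, 0.2))  # near-certain top pick: "safe"
print(softmax_with_temperature(logits, 1.0))  # probability spread out: "creative"
```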
13. Knowledge Cutoff — The Model’s Last Day at School
What it means:
The model doesn’t know anything that happened after a certain date.
Analogy:
It’s like a student who stopped reading newspapers in 2023. Don’t ask them about 2024 news!
Example:
Ask ChatGPT about the 2025 IPL, and it’ll say: “Sorry, I don’t have info after April 2023.”
Recap: Cheat Sheet Table
| Term | Simple Meaning | Analogy |
| --- | --- | --- |
| Tokenization | Splitting text | Slicing a pizza |
| Vocab Size | Known tokens | Model’s dictionary |
| Embeddings | Word to numbers | GPS for words |
| Positional Encoding | Word order | Timing in music |
| Vectors | Represent meaning | Lego blocks |
| Encoder | Understands input | Listener |
| Decoder | Generates output | Speaker |
| Semantic Meaning | True meaning | "Happy" ≈ "Joyful" |
| Self-Attention | Focus finder | Spotlight on key words |
| Multi-Head Attention | Multi views | Glasses with filters |
| Softmax | Picks next word | Weighted dice |
| Temperature | Controls creativity | Spice level |
| Knowledge Cutoff | Info limit | Last day of school |
Final Thoughts
Large Language Models are amazing, but they aren’t magic. They’re built with clear blocks — just like Lego! Once you understand these pieces, you can build your way into AI, prompt engineering, or even building your own AI tools.
If you found this helpful, consider bookmarking or sharing it with a friend who’s curious about AI.
That’s it for today. Let’s connect!