Essential Terminologies in LLMs for Beginners


Ever wondered how AI tools like ChatGPT, Bard, or Claude can write poems, solve math, and even crack jokes?
Behind the scenes, these tools use something called LLMs — Large Language Models. While the name sounds fancy, the magic is built on a bunch of concepts that are actually easy to understand when explained right.
Let’s break it all down step by step — like building blocks — using real-life analogies and simple language.
1. Tokenization — Cutting Sentences into Small Pieces
What it means:
Tokenization is the process of splitting up a sentence into smaller parts called tokens (words, sub-words, or characters).
Analogy:
Think of a sentence as a pizza. Before eating it, you slice it into pieces. Each slice is a token.
Example:
The sentence “I love pizza” becomes:
["I", "love", "pizza"]
Rarer words often get sliced into even smaller sub-word pieces:
"tokenization" → ["token", "ization"]
The model doesn’t understand full sentences. It understands slices — tokens.
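If you want to see real tokens, here’s a minimal sketch using OpenAI’s tiktoken library (an assumption: you’ve installed it with pip install tiktoken; plenty of other tokenizers exist too):

```python
# pip install tiktoken
import tiktoken

# Load the tokenizer used by recent OpenAI models
enc = tiktoken.get_encoding("cl100k_base")

token_ids = enc.encode("I love pizza")
print(token_ids)                                  # a short list of integer token IDs
print([enc.decode([t]) for t in token_ids])       # the text slice behind each ID
```

Notice the model never sees "I love pizza" as text at all: it sees the integer IDs.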
2. Vocabulary Size — How Many Words the Model Knows
What it means:
It’s the total number of unique tokens (words/pieces) the model can recognize.
Analogy:
Imagine a dictionary. The bigger the dictionary, the more words you know.
Example:
GPT-3’s dictionary has around 50,000 tokens (50,257, to be exact). That’s like having 50,000 puzzle pieces to express any sentence.
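You can check this yourself: a tokenizer exposes its own vocabulary size. A minimal sketch with tiktoken (exact counts depend on which encoding you load):

```python
import tiktoken

# "r50k_base" is the GPT-3-era encoding; "cl100k_base" is a newer, larger one
for name in ["r50k_base", "cl100k_base"]:
    enc = tiktoken.get_encoding(name)
    print(name, enc.n_vocab)  # roughly 50k and 100k tokens respectively
```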
3. Embeddings — Giving Numbers to Words (So Computers Understand)
What it means:
Embeddings are how we turn words (tokens) into numbers so computers can process and “understand” them.
Analogy:
Think of embeddings as GPS coordinates for words. Words with similar meanings live close together on the map. For example, “dog” and “puppy” might be neighbors.
Example:
The word “happy” might be turned into a long list of numbers like:
[0.21, -0.55, 1.34, ...]
Computers don’t understand words — they understand numbers. Embeddings give words a mathematical meaning.
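Under the hood, an embedding is just a row lookup in a big table of numbers. Here’s a toy NumPy sketch (the vocabulary, table size, and values are made up for illustration; a real model learns these values during training):

```python
import numpy as np

vocab = {"I": 0, "love": 1, "pizza": 2}   # toy vocabulary: token -> ID
d_model = 4                                # real models use hundreds or thousands of dims

# One row of numbers per token; random here, learned in a real model
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), d_model))

token_ids = [vocab[t] for t in ["I", "love", "pizza"]]
embeddings = embedding_table[token_ids]    # look up one vector per token
print(embeddings.shape)                    # (3, 4): three tokens, four numbers each
```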
4. Positional Encoding — Telling the Model the Order of Words
What it means:
It tells the model where each word appears in the sentence.
Analogy:
Imagine a band where everyone plays at once. If no one knows when to play, the music is a mess. Positional encoding gives timing to each instrument (word).
Example:
In “I love you” vs “You love I”, the words are the same, but the order matters. This helps the model understand the difference.
The position information is added to the token embeddings, and the result (the input embeddings) is what the model actually processes.
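One classic recipe, from the original Transformer paper, builds the position signal out of sine and cosine waves of different frequencies. A minimal NumPy sketch:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding from 'Attention Is All You Need'."""
    pos = np.arange(seq_len)[:, None]        # word positions 0..seq_len-1
    i = np.arange(d_model)[None, :]          # embedding dimensions
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    # Even dimensions get sine, odd dimensions get cosine
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

token_embeddings = np.zeros((3, 8))          # stand-in for "I love you", 3 tokens
input_embeddings = token_embeddings + positional_encoding(3, 8)
print(input_embeddings.shape)                # (3, 8)
```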
5. Vectors — The Thoughts of the Model
What it means:
A vector is just a list of numbers representing a word, sentence, or meaning.
Each dimension in a vector captures some kind of information about the word, though we usually can’t say exactly what.
Analogy:
Think of vectors as the Lego blocks behind each word. When you combine them in clever ways, you build whole ideas and thoughts.
Example:
“Cat” might become a vector like [0.2, 1.3, -0.5, ...]
Everything inside an AI model is a big pile of numbers (vectors), shaped to mean something.
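A common way to compare two of these vectors is cosine similarity: a value near 1 means the vectors point the same way, i.e. related meanings. A toy sketch with made-up 3-dimensional vectors (real embedding values come from training):

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up vectors purely for illustration
cat   = np.array([0.9, 0.8, 0.1])
puppy = np.array([0.8, 0.9, 0.2])
car   = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(cat, puppy))  # high: related meanings
print(cosine_similarity(cat, car))    # low: unrelated meanings
```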
6. Encoder — The Listener
7. Decoder — The Speaker
What they do:
The encoder reads and understands input.
The decoder generates output (answers, translations, text).
Analogy:
Imagine telling a story to a translator:
Encoder = translator listening and taking notes.
Decoder = translator retelling your story in another language.
In ChatGPT, only the decoder is used, but in translation models (like Google Translate), both are important.
8. Semantic Meaning — Understanding Similar Meanings
What it means:
It’s knowing that two different words or sentences can mean the same thing, or be closely related, even when they use completely different words.
Analogy:
“I’m feeling great” and “I’m on top of the world” — we humans know they both mean someone is happy. LLMs try to do the same by comparing vectors.
AI doesn’t just look at words — it tries to grasp the feeling behind them.
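You can watch this happen with an off-the-shelf embedding model. A minimal sketch using the sentence-transformers library (an assumption: it’s installed, and "all-MiniLM-L6-v2" is just one popular model choice):

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

a, b = model.encode(["I'm feeling great", "I'm on top of the world"])
similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(similarity)  # noticeably higher than for two unrelated sentences
```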
9. Self-Attention — Focusing on What Matters
What it means:
The model looks at every word in a sentence and decides which other words are important for understanding it.
Analogy:
When reading “The cat, which was chased by the dog, ran away,” you focus on “dog” to understand what “chased” means. That’s self-attention.
Self-attention lets the model read like a smart human — knowing what to focus on.
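Mathematically, self-attention scores every word against every other word, then replaces each word with a weighted mix of all of them. A minimal NumPy sketch (random vectors stand in for learned ones, and the learned query/key/value projection matrices a real transformer uses are skipped to keep it short):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x):
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)   # how strongly each word relates to each other word
    weights = softmax(scores)       # each row sums to 1: "where should I focus?"
    return weights @ x              # each word becomes a weighted mix of all words

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))         # 5 words, 8 numbers each
print(self_attention(x).shape)      # (5, 8): same shape, now context-aware
```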
10. Multi-Head Attention — Seeing Things in Many Ways
What it means:
The model looks at each word from multiple perspectives at once.
Analogy:
Imagine wearing 8 different types of glasses — one sees grammar, one sees emotion, one sees logic, etc. Each gives a new view.
Example:
Multi-head attention helps the model understand “he” refers to “John” while also noticing the tone of the sentence.
Multiple attentions = better understanding.
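In code, "multiple perspectives" just means splitting each word’s vector into chunks and running attention on each chunk independently. A toy NumPy sketch (again skipping the learned projections a real transformer would apply):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, num_heads):
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Split each word's vector into num_heads smaller chunks (the "glasses")
    heads = x.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    scores = heads @ heads.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ heads                      # attention per head
    # Glue the heads back together into one vector per word
    return out.transpose(1, 0, 2).reshape(seq_len, d_model)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
print(multi_head_attention(x, num_heads=2).shape)      # (5, 8)
```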
11. Softmax — Picking the Most Likely Word
What it means:
Softmax turns the model’s raw scores for every possible next word into probabilities, so the most likely one can be picked.
Analogy:
It’s like rolling a weighted die. If "cat" has a 90% chance, it’ll likely be picked over "banana" with 1%.
The model doesn’t guess blindly — it uses Softmax to make smart predictions.
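Softmax itself is a tiny formula: exponentiate each score, then divide by the total so everything sums to 1. A minimal sketch with made-up scores:

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))   # subtract max for numerical stability
    return e / e.sum()

# Made-up raw scores (logits) for possible next words after "I love"
words  = ["pizza", "you", "banana"]
logits = np.array([4.0, 3.5, -1.0])

for word, p in zip(words, softmax(logits)):
    print(f"{word}: {p:.2f}")  # "pizza" and "you" get almost all the probability
```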
12. Temperature — Controlling How Creative the Model Is
What it means:
Temperature controls how random or safe the output is.
Analogy:
It’s like choosing how spicy your food is:
Low temp (0.2) = boring, safe
High temp (1.0) = creative, surprising
Example:
Prompt: “The moon is”
Temp 0.2 → “bright at night.”
Temp 1.0 → “a silver balloon floating through dreams.”
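Temperature works by dividing the scores before the softmax step. A low temperature exaggerates the gap between likely and unlikely words; a high temperature flattens it out. A minimal sketch reusing made-up scores:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = logits / temperature          # small temp -> bigger gaps between scores
    e = np.exp(scaled - np.max(scaled))
    return e / e.sum()

logits = np.array([4.0, 3.5, -1.0])        # made-up scores for 3 candidate words

print(softmax_with_temperature(logits, 0.2))  # near-certain top pick: "safe"
print(softmax_with_temperature(logits, 1.0))  # probability spread out: "creative"
```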
13. Knowledge Cutoff — The Model’s Last Day at School
What it means:
The model doesn’t know anything that happened after a certain date.
Analogy:
It’s like a student who stopped reading newspapers in 2023. Don’t ask them about 2024 news!
Example:
Ask ChatGPT about the 2025 IPL, and it’ll say: “Sorry, I don’t have info after April 2023.”
Recap: Cheat Sheet Table
| Term | Simple Meaning | Analogy |
| --- | --- | --- |
| Tokenization | Splitting text | Slicing a pizza |
| Vocab Size | Known tokens | Model’s dictionary |
| Embeddings | Word to numbers | GPS for words |
| Positional Encoding | Word order | Timing in music |
| Vectors | Represent meaning | Lego blocks |
| Encoder | Understands input | Listener |
| Decoder | Generates output | Speaker |
| Semantic Meaning | True meaning | "Happy" ≈ "Joyful" |
| Self-Attention | Focus finder | Spotlight on key words |
| Multi-Head Attention | Multi views | Glasses with filters |
| Softmax | Picks next word | Weighted dice |
| Temperature | Controls creativity | Spice level |
| Knowledge Cutoff | Info limit | Last day of school |
Final Thoughts
Large Language Models are amazing, but they aren’t magic. They’re built with clear blocks — just like Lego! Once you understand these pieces, you can build your way into AI, prompt engineering, or even building your own AI tools.
If you found this helpful, consider bookmarking or sharing it with a friend who’s curious about AI.
That’s it for today. Let’s connect!