The Realities of LLMs: No Magic

Table of contents
- Introduction
- 1. You Ask Nobita’s Question — User Input
- 2. Tokenization — Cutting the Request into Pieces
- 3. Embeddings — Turning Words into Numbers
- 4. Attention — Looking at All Clues
- 5. Transformer Layers — The Gadget Pocket
- 6. Training — How Doraemon Learned Everything
- 7. Generation — Producing the Answer
- 8. Hallucinations — When Doraemon Picks the Wrong Gadget
- 9. High-Level Design (HLD) of an LLM
- Conclusion
Introduction
If you’ve ever used AI like ChatGPT, Gemini, Claude, or LLaMA, you might feel they’re magical — type a question, and you get a human-like reply in seconds.
But here’s the truth: there’s no magic.
Large Language Models (LLMs) are like Doraemon — they have a gadget pocket (their neural network) filled with learned tools from past experiences (training data).
Ask them something, and they pick the right “gadget” (word, sentence, or idea) to give you an answer.
Let’s open Doraemon’s pocket and see how all LLMs work.
1. You Ask Nobita’s Question — User Input
In every Doraemon episode, Nobita asks for help:
“Doraemon, I need something to finish my homework!”
Similarly, with an LLM, the user’s prompt is like Nobita’s request.
It’s the starting point of the whole process.
2. Tokenization — Cutting the Request into Pieces
Doraemon doesn’t just run to his pocket; first, he breaks Nobita’s request into smaller chunks.
Example:
“I need help in school homework”
becomes
"I" → token 101
"need" → token 202
"help" → token 503
"in" → token 45
"school" → token 340
"homework" → token 118
Why?
Models can’t understand raw text — they need it split into tokens (subwords, words, or characters).
Doraemon analogy: Doraemon writes each part of the problem on sticky notes before finding the gadget.
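Here is a minimal sketch of the idea in Python. The vocabulary and IDs are made up for illustration; real LLMs use learned subword tokenizers such as BPE, with vocabularies of tens of thousands of tokens.

```python
# Toy tokenizer: map each word to a made-up integer ID.
# Real tokenizers split text into subwords, not whole words.
vocab = {"I": 101, "need": 202, "help": 503, "in": 45, "school": 340, "homework": 118}

def tokenize(text):
    # Split on whitespace and look up each piece's ID.
    return [vocab[word] for word in text.split()]

print(tokenize("I need help in school homework"))
# [101, 202, 503, 45, 340, 118]
```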
3. Embeddings — Turning Words into Numbers
Tokens are just IDs. To work with them, LLMs turn them into vectors — lists of numbers that capture meaning.
Example:
"school" → [0.23, 0.78, -0.12, ...]
"homework" → [0.21, 0.74, -0.11, ...]
Similar meanings have closer vectors — like “school” and “college” being near each other in vector space.
Doraemon analogy: The sticky notes are translated into secret gadget codes so Doraemon can pick the right tool.
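A rough sketch of what “closer vectors” means, using made-up three-dimensional vectors (real models use hundreds or thousands of dimensions):

```python
import math

# Toy embedding table: word -> vector. The numbers are invented for illustration.
embeddings = {
    "school":   [0.23, 0.78, -0.12],
    "homework": [0.21, 0.74, -0.11],
}

def cosine_similarity(a, b):
    # Close to 1.0 means the vectors point in the same direction (similar meaning).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(round(cosine_similarity(embeddings["school"], embeddings["homework"]), 3))
```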
4. Attention — Looking at All Clues
Before choosing a gadget, Doraemon thinks about the entire problem, not just the last word Nobita said.
This is what the attention mechanism in LLMs does — it decides how much weight each word should give to every other word.
Example:
In “help in school homework,” the word “help” connects more strongly with “homework” than with “in.”
Doraemon analogy: Doraemon replays Nobita’s whole story in his head to make sure the gadget works for the situation.
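A minimal sketch of scaled dot-product attention, the core computation behind this. Real models learn separate query, key, and value projections and run many attention heads in parallel; here the same token vectors are reused for all three to keep it short.

```python
import numpy as np

def attention(Q, K, V):
    # Compare every token's query with every token's key, then softmax the scores
    # so each row becomes "how much this token attends to every other token".
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

np.random.seed(0)
tokens = np.random.randn(4, 8)   # 4 tokens ("help", "in", "school", "homework"), 8 dims each
output, weights = attention(tokens, tokens, tokens)
print(weights.round(2))          # row i = attention paid by token i to each token
```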
5. Transformer Layers — The Gadget Pocket
LLMs are built on the Transformer architecture — many layers that refine the information again and again.
Each layer:
- Applies multi-head attention (looking at the context from several angles at once)
- Passes the result through a feed-forward network (processing the information more deeply)
Before the first layer, positional encoding is added to the token embeddings so the model keeps track of word order.
Doraemon analogy: Imagine Doraemon’s gadget pocket having shelves — each shelf improves the gadget before handing it to Nobita.
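A stripped-down sketch of one such layer, stacked a few times. It keeps only the two main sub-steps (self-attention and the feed-forward network) plus residual connections; real layers also use multi-head projections, layer normalization, and positional information, and each layer has its own weights rather than sharing one set as done here.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def transformer_layer(x, W1, W2):
    # Sub-step 1: self-attention -- every token gathers context from the others.
    attn = softmax(x @ x.T / np.sqrt(x.shape[-1])) @ x
    x = x + attn                           # residual connection
    # Sub-step 2: feed-forward network -- process each token's vector more deeply.
    x = x + np.maximum(0, x @ W1) @ W2     # ReLU, project back, residual connection
    return x

np.random.seed(0)
tokens = np.random.randn(4, 8)             # 4 tokens, 8-dimensional embeddings
W1, W2 = np.random.randn(8, 32), np.random.randn(32, 8)
for _ in range(3):                         # each pass = one shelf in the gadget pocket
    tokens = transformer_layer(tokens, W1, W2)
print(tokens.shape)                        # (4, 8): same shape, progressively refined
```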
6. Training — How Doraemon Learned Everything
Doraemon didn’t magically know which gadget to use — he’s from the future, trained by using countless gadgets in many situations.
LLMs are trained on massive datasets — books, articles, websites — learning patterns like:
- “Ice” is often followed by “cream”
- “Hello” is often answered with “How are you?”
During training, the model repeatedly:
- Predicts the next token
- Checks whether its prediction matches the real next token
- Adjusts its internal numbers (weights) to do better next time
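Here is a minimal sketch of the scoring part of that loop: the model outputs a score for every token in its vocabulary, and the cross-entropy loss is small only when the real next token gets high probability. The numbers below are random stand-ins for a real model's output.

```python
import numpy as np

np.random.seed(0)
vocab_size = 5
logits = np.random.randn(vocab_size)   # the model's raw scores for each possible next token
target = 2                             # the token that actually came next in the training text

# Softmax turns scores into probabilities; cross-entropy punishes low probability on the target.
probs = np.exp(logits - logits.max())
probs = probs / probs.sum()
loss = -np.log(probs[target])
print(round(float(loss), 3))

# Training = adjusting the weights (via gradient descent) so this loss shrinks,
# repeated over billions of text snippets.
```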
7. Generation — Producing the Answer
Once trained, an LLM answers the way Doraemon hands Nobita the gadget:
- Looks at the input tokens
- Predicts the most likely next token
- Appends that token to the input and repeats until the response is complete
Example:
Prompt: “Tell me a joke about school”
LLM: “Our school is like a Wi-Fi… slow and only works near the principal’s office.”
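The loop itself is simple. In this sketch, a tiny lookup table stands in for the trained model; a real LLM would compute these probabilities with its Transformer layers at every step.

```python
# Toy next-token "model": maps the last three tokens to probabilities for the next one.
next_token_probs = {
    ("Tell", "me", "a"):         {"joke": 0.9, "story": 0.1},
    ("me", "a", "joke"):         {"about": 0.8, "now": 0.2},
    ("a", "joke", "about"):      {"school": 0.7, "cats": 0.3},
    ("joke", "about", "school"): {"<end>": 1.0},
}

tokens = ["Tell", "me", "a"]
while True:
    context = tuple(tokens[-3:])               # look at the recent context
    probs = next_token_probs[context]
    next_token = max(probs, key=probs.get)     # greedy decoding: pick the most likely token
    if next_token == "<end>":
        break
    tokens.append(next_token)                  # feed it back in and repeat

print(" ".join(tokens))                        # Tell me a joke about school
```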
8. Hallucinations — When Doraemon Picks the Wrong Gadget
Sometimes Doraemon gives Nobita a gadget that makes things worse.
LLMs can also “hallucinate” — producing confident but incorrect answers because they predict plausible text, not verified truth.
9. High-Level Design (HLD) of an LLM
Flow:
1. User prompt (Nobita asks Doraemon)
2. Tokenization (break the request into pieces)
3. Embedding layer (convert tokens to numbers)
4. Transformer layers (attention + processing)
5. Prediction (choose the next word)
6. Generation loop (build the full answer)
7. Output to the user (hand over the gadget/answer)
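Tied together as code, the flow looks roughly like this. Every function here is a stub with an invented name, just to show how the stages hand data to one another.

```python
VOCAB = {"hello": 0, "world": 1, "!": 2, "<end>": 3}
ID_TO_WORD = {i: w for w, i in VOCAB.items()}

def tokenize(text):                      # step 2: break the request into pieces
    return [VOCAB[w] for w in text.split()]

def predict_next(token_ids):             # steps 3-5: embeddings, transformer layers, prediction
    # Stand-in for the whole neural network: a fixed rule instead of learned weights.
    canned_reply = [1, 2, 3]             # "world ! <end>"
    return canned_reply[len(token_ids) - 1] if len(token_ids) <= 3 else VOCAB["<end>"]

def generate(prompt):                    # step 1 in, steps 6-7: loop and output
    token_ids = tokenize(prompt)
    while True:
        next_id = predict_next(token_ids)
        if next_id == VOCAB["<end>"]:
            break
        token_ids.append(next_id)
    return " ".join(ID_TO_WORD[i] for i in token_ids)

print(generate("hello"))                 # hello world !
```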
Conclusion
All LLMs — whether it’s GPT, Claude, Gemini, or LLaMA — work in the same core way:
- Break the text into tokens
- Turn them into numbers
- Use attention to understand context
- Generate answers step by step
Just like Doraemon, they seem magical… but it’s really math, data, and clever engineering.