Defining LLM in Depth


🔍 What is an LLM?
An LLM (Large Language Model) is a type of AI model trained to understand and generate human language. Think of it as a powerful autocomplete engine that can finish your sentences, answer your questions, write essays, generate code, translate languages, and much more.
The word "large" in LLM refers to:
Large dataset used to train it (billions or trillions of words).
Large number of parameters (like weights in a neural network; GPT-3 has 175 billion).
Large computation power needed to train it.
🧠 What is it made of?
LLMs are built on transformer architecture, which is the core neural network design that powers them. Here’s the high-level breakdown.
🔄 Transformer Architecture
The transformer (introduced by Vaswani et al. in the 2017 paper "Attention Is All You Need") is a neural network architecture that:
Takes a sequence of tokens (like words or parts of words).
Processes them in parallel using layers of attention.
Learns the context and relationships between words regardless of distance.
🔑 Key parts:
Embeddings: Convert words/tokens into numerical vectors.
Positional Encoding: Add information about token positions in the sequence.
Multi-Head Self-Attention: Lets the model focus on different parts of the input simultaneously.
Feed-Forward Layers: After attention, information goes through fully connected layers.
Layer Normalization and Residual Connections: Help stabilize and speed up training.
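
To make the parts above concrete, here is a minimal sketch of a single transformer block in PyTorch. The dimensions (d_model, n_heads, d_ff) are arbitrary example values, and a real LLM stacks many such blocks.

```python
# Minimal sketch of one transformer block (PyTorch), for illustration only.
# Dimensions are example values, not those of any specific model.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        # Multi-head self-attention: lets each token attend to every other token.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Feed-forward layers applied to each position after attention.
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        # Layer normalization keeps activations well-behaved during training.
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Residual connection around the attention sublayer.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Residual connection around the feed-forward sublayer.
        x = self.norm2(x + self.ff(x))
        return x

# Before the first block, token embeddings plus positional information are added:
# x = token_embedding(ids) + positional_encoding   # shape: (batch, seq_len, d_model)
```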
📚 Training Process of LLMs
Training an LLM happens in stages:
- Pretraining
The model is trained on a huge, diverse corpus of text (web pages, books, Wikipedia, code, etc.).
Objective: Learn general language patterns and structures.
Common tasks used:
Next Token Prediction (as in GPT-style models): Given a sequence, predict the next token.
Masked Language Modeling (as in BERT): Mask some words and make the model guess them.
This is unsupervised or self-supervised — no human labeling is required.
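
As a rough illustration of the next-token-prediction objective mentioned above, here is a sketch using PyTorch's cross-entropy loss. The `model` and `token_ids` names are placeholders, not a specific library API.

```python
# Sketch of the next-token-prediction objective (GPT-style pretraining).
# `model` and `token_ids` are placeholders for illustration.
import torch
import torch.nn.functional as F

def next_token_loss(model, token_ids):
    # Inputs are all tokens except the last; targets are the same sequence shifted by one.
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
    logits = model(inputs)                    # (batch, seq_len - 1, vocab_size)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten all positions
        targets.reshape(-1),                  # predict each "next" token
    )
```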
💡 By the end of pretraining, the LLM learns:
Grammar
Word meanings
Basic world knowledge
Syntax and some reasoning ability
- Fine-tuning
Once pretrained, the model is fine-tuned on a smaller, more specific dataset for a particular task (like answering questions, summarizing, or coding).
Examples:
GPT fine-tuned on chat data → ChatGPT
BERT fine-tuned on sentiment analysis data → sentiment classifier
✅ Fine-tuning makes the LLM useful in real-world applications.
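
A fine-tuning loop looks much like pretraining, just run on a small task-specific dataset. The sketch below reuses the next_token_loss function from the pretraining sketch; `pretrained_model` and `task_dataloader` are placeholders.

```python
# Illustrative fine-tuning loop: same training step as pretraining,
# but over a small task dataset (e.g., question-answer pairs).
import torch

optimizer = torch.optim.AdamW(pretrained_model.parameters(), lr=1e-5)  # small learning rate

for batch in task_dataloader:              # batches of tokenized task examples
    loss = next_token_loss(pretrained_model, batch)
    loss.backward()                        # backpropagate through the pretrained weights
    optimizer.step()
    optimizer.zero_grad()
```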
- Alignment / Instruction Tuning / RLHF (in advanced models like ChatGPT)
LLMs like ChatGPT go through extra steps to behave better for humans:
Instruction tuning: Fine-tune on datasets of user instructions paired with good responses, so the model learns to follow them.
RLHF (Reinforcement Learning from Human Feedback):
Generate multiple responses.
Humans rate which one is better.
Train a reward model.
Use RL to make the model prefer good responses.
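
Here is a toy sketch of the reward-model step: the model scores a pair of responses, and training pushes the human-preferred response's score above the other's (a pairwise loss in the spirit of the InstructGPT line of work). All names here are illustrative.

```python
# Toy sketch of the reward-model training step used in RLHF.
# `reward_model` maps a (prompt, response) pair to a single score;
# `chosen` is the human-preferred response, `rejected` the less-preferred one.
import torch
import torch.nn.functional as F

def reward_pair_loss(reward_model, prompt, chosen, rejected):
    score_chosen = reward_model(prompt, chosen)      # scalar score
    score_rejected = reward_model(prompt, rejected)  # scalar score
    # Pairwise loss: make the preferred response score higher than the other.
    return -F.logsigmoid(score_chosen - score_rejected).mean()
```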
🔤 How does the model understand language?
LLMs don’t “understand” like humans do. Instead, they:
Convert all input into tokens (e.g., “language” → ['lan', 'gu', 'age'])
Pass them through transformer layers
Output the next most likely token
Repeat until the desired length is reached
It’s essentially doing pattern matching and probability prediction based on training.
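
That loop can be sketched in a few lines. `model` and `tokenizer` below are placeholders for any GPT-style model and its tokenizer; this greedy version simply picks the most likely token at each step.

```python
# Schematic of autoregressive generation: predict a token, append it, repeat.
# `model` and `tokenizer` are placeholders, not a specific library.
import torch

def generate(model, tokenizer, prompt, max_new_tokens=20):
    token_ids = tokenizer.encode(prompt)             # text -> token ids
    for _ in range(max_new_tokens):
        logits = model(torch.tensor([token_ids]))    # (1, seq_len, vocab_size)
        next_id = int(logits[0, -1].argmax())        # most likely next token
        token_ids.append(next_id)                    # feed it back in
    return tokenizer.decode(token_ids)               # token ids -> text
```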
🧪 Example Workflow
You ask: “What is the capital of France?”
The model tokenizes it → [What, is, the, capital, of, France?]
The model has seen similar patterns during training.
It computes the probabilities of possible next tokens:
“Paris” → 0.93
“London” → 0.01
“Berlin” → 0.001
Chooses “Paris”.
Even though it "feels" like reasoning, it's statistical pattern matching at a massive scale.
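
To show how those probabilities arise, here is a toy example that turns raw scores (logits) into probabilities with a softmax. The numbers are invented to roughly match the example above.

```python
# Toy illustration of turning logits into next-token probabilities.
import torch

candidates = ["Paris", "London", "Berlin"]
logits = torch.tensor([9.0, 4.5, 2.2])      # made-up raw scores from the final layer
probs = torch.softmax(logits, dim=0)        # normalize into probabilities
best = candidates[int(probs.argmax())]      # greedy choice: "Paris"
print(dict(zip(candidates, probs.tolist())), "->", best)
```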
🏗️ Model Sizes
GPT-2: 1.5B parameters
GPT-3: 175B parameters
GPT-4 / ChatGPT: Likely over 500B (exact number not public)
Gemini / Claude / LLaMA / Mistral: Other major LLMs
More parameters = more capacity to learn patterns = better performance (usually, but with diminishing returns).
In this blog, we gained a clear understanding of what the term LLM means, its core architecture, and the stages of building an LLM.
Is this blog helpful and easy to follow? Leave a comment!
— END —
Upcoming: Here is a link to my handwritten notes for a better and summarized understanding of the topic.
