Defining LLM in Depth

Mihir Chaple

🔍 What is an LLM?

An LLM (Large Language Model) is a type of AI model trained to understand and generate human language. Think of it as a powerful autocomplete engine that can finish your sentences, answer your questions, write essays, generate code, translate languages, and much more.

The word "large" in LLM refers to:

A large dataset used to train it (billions or trillions of words).

A large number of parameters (the weights of the neural network; GPT-3 has 175 billion).

A large amount of computing power needed to train it.

🧠 What is it made of?

LLMs are built on the transformer architecture, the core neural network design that powers them. Here's a high-level breakdown:

🔄 Transformer Architecture

The transformer (introduced by Vaswani et al. in the 2017 paper "Attention Is All You Need") is a neural network architecture that:

Takes a sequence of tokens (like words or parts of words).

Processes them in parallel using layers of attention.

Learns the context and relationships between words regardless of distance.

🔑 Key parts (a minimal code sketch follows this list):

Embeddings: Convert words/tokens into numerical vectors.

Positional Encoding: Add information about token positions in the sequence.

Multi-Head Self-Attention: Lets the model focus on different parts of the input simultaneously.

Feed-Forward Layers: After attention, information goes through fully connected layers.

Layer Normalization and Residual Connections: Help stabilize and speed up training.
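
To make these parts concrete, here's a minimal sketch of a single transformer block in PyTorch. All sizes are made up for illustration, and it uses simple learned positional embeddings (the original paper used sinusoidal encodings); real models stack dozens of far larger blocks like this one.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One pre-norm transformer block: self-attention, then a feed-forward
    network, each wrapped in a residual connection."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h)   # multi-head self-attention
        x = x + attn_out                   # residual connection
        x = x + self.ffn(self.ln2(x))      # feed-forward layer + residual
        return x

# Embeddings turn token IDs into vectors; positional embeddings add order info.
vocab_size, seq_len, d_model = 100, 8, 64
tok_emb = nn.Embedding(vocab_size, d_model)
pos_emb = nn.Embedding(seq_len, d_model)              # learned positions
tokens = torch.randint(0, vocab_size, (1, seq_len))   # (batch, sequence)
x = tok_emb(tokens) + pos_emb(torch.arange(seq_len))
print(TransformerBlock()(x).shape)                    # torch.Size([1, 8, 64])
```

Stacking many of these blocks, plus a final projection back to vocabulary logits, is essentially the whole model.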

📚 Training Process of LLMs

Training an LLM happens in stages:

1. Pretraining

The model is trained on a huge, diverse corpus of text (web pages, books, Wikipedia, code, etc.).

Objective: Learn general language patterns and structures.

Common tasks used:

Next Token Prediction (as in GPT-style models): Given a sequence, predict the next token.

Masked Language Modeling (as in BERT): Mask some words and make the model guess them.

This is unsupervised (more precisely, self-supervised): no human labeling is required. A minimal sketch of the next-token objective is shown below.
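
Here's a toy rendering of that next-token objective in PyTorch. The "model output" is random numbers standing in for real logits; the point is just how inputs and targets line up.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 100, 8
tokens = torch.randint(0, vocab_size, (1, seq_len))   # one training sequence
logits = torch.randn(1, seq_len, vocab_size)          # stand-in for model output

# Shift by one: the prediction at position i is scored against token i+1.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),   # predictions for positions 0..n-2
    tokens[:, 1:].reshape(-1),                # targets: tokens 1..n-1
)
print(loss)   # pretraining drives this loss down over trillions of tokens
```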

💡 By the end of pretraining, the LLM learns:

Grammar

Word meanings

Basic world knowledge

Syntax and some reasoning ability

2. Fine-tuning

Once pretrained, the model is fine-tuned on a smaller, more specific dataset for a particular task (like answering questions, summarizing, or coding).

Examples:

GPT fine-tuned on chat data → ChatGPT

BERT fine-tuned on sentiment analysis data → sentiment classifier

✅ Fine-tuning makes the LLM useful in real-world applications.
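
As a hypothetical sketch of the idea in PyTorch: freeze a "pretrained" backbone (a single linear layer stands in for it here), attach a new classification head, and train only the head on a small labeled set.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, num_classes = 64, 2
encoder = nn.Linear(d_model, d_model)    # stand-in for a pretrained transformer
head = nn.Linear(d_model, num_classes)   # new, task-specific layer

for p in encoder.parameters():
    p.requires_grad = False              # keep the pretrained weights frozen

opt = torch.optim.AdamW(head.parameters(), lr=1e-4)
features = torch.randn(16, d_model)              # stand-in for encoded text
labels = torch.randint(0, num_classes, (16,))    # e.g. sentiment labels

for _ in range(3):                               # a few fine-tuning steps
    loss = F.cross_entropy(head(encoder(features)), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())
```

In practice you might instead update all the weights, or use parameter-efficient methods; freezing the backbone is just the simplest variant to show.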

3. Alignment / Instruction Tuning / RLHF

Advanced models like ChatGPT go through extra steps to behave better for humans:

Instruction tuning: Fine-tuned on datasets where examples look like user instructions and the model follows them.

RLHF (Reinforcement Learning from Human Feedback):

Generate multiple responses.

Humans rate which one is better.

Train a reward model (sketched in code below).

Use RL to make the model prefer good responses.
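
Here's a minimal, hypothetical sketch of the reward-model step in PyTorch, using the pairwise preference loss common in RLHF pipelines. The response "embeddings" are random stand-ins; a real reward model scores full text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model = 64
reward_model = nn.Linear(d_model, 1)   # stand-in: response embedding -> score

chosen = torch.randn(8, d_model)       # embeddings of human-preferred responses
rejected = torch.randn(8, d_model)     # embeddings of the losing responses

# Pairwise preference loss: shrinks as the chosen response out-scores the other.
loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
loss.backward()
print(loss.item())
```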

🔤 How does the model understand language?

LLMs don’t “understand” like humans. Instead, they:

Convert all input into tokens (e.g., “language” might be split into subword pieces like ['lan', 'gu', 'age'], depending on the tokenizer)

Pass them through transformer layers

Output the next most likely token

Repeat until the desired length is reached

It’s essentially doing pattern matching and probability prediction based on what it saw during training.
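
You can try real tokenization with the tiktoken library (assuming it's installed; the exact subword split depends on the tokenizer's learned vocabulary, so it may not match the toy split above).

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("gpt2")      # GPT-2's BPE tokenizer
ids = enc.encode("language")             # the integer IDs the model actually sees
print(ids)
print([enc.decode([i]) for i in ids])    # the subword pieces they map back to
```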

🧪 Example Workflow

You ask: “What is the capital of France?”

The model tokenizes it → [What, is, the, capital, of, France?]

The model has seen similar patterns during training.

It computes a probability for each possible next token:

“Paris” → 0.93

“London” → 0.01

“Berlin” → 0.001

Chooses “Paris”.

Even though it "feels" like reasoning, it's statistical pattern matching at a massive scale.
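
Here's that last step as a toy calculation: a softmax over made-up logits (chosen to roughly mirror the probabilities above) and a greedy pick of the most likely token.

```python
import torch

candidates = ["Paris", "London", "Berlin"]
logits = torch.tensor([9.0, 4.5, 2.2])   # hypothetical scores from the model
probs = torch.softmax(logits, dim=0)     # normalize into probabilities

for token, p in zip(candidates, probs):
    print(f"{token}: {p:.3f}")
print("chosen:", candidates[probs.argmax()])   # greedy decoding picks "Paris"
```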

🏗️ Model Sizes

GPT-2: 1.5B parameters

GPT-3: 175B parameters

GPT-4 / ChatGPT: Likely over 500B (exact number not public)

Gemini / Claude / LLaMA / Mistral: Other major LLMs

More parameters = more capacity to learn patterns = better performance (usually, but with diminishing returns).
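
For a sense of where those numbers come from: a parameter count is just the total number of scalars across all weight matrices. Counting them on a tiny stand-in model in PyTorch:

```python
import torch.nn as nn

# A tiny stand-in model: an embedding table plus an output projection.
model = nn.Sequential(nn.Embedding(50_000, 512), nn.Linear(512, 50_000))
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")   # 51,250,000 for this toy; GPT-3 has ~175B
```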

In this blog, we built a clear understanding of what the term LLM means, the architecture it is built on, and the stages of training an LLM.

Is this blog helpful and easy to follow? Leave a comment!

— END —

Upcoming: a link to my handwritten notes for a quicker, summarized understanding of the topic.
