AI Explained for Beginners: How ChatGPT Understands and Talks

Ashish Raut
7 min read

With the release of GPT-5, I became curious about how these AI models actually work behind the scenes. I wanted to understand how a model can read a user’s query, make sense of it, and then generate a coherent, relevant response almost instantly.

By exploring the underlying technologies like Natural Language Processing (NLP), Transformers, and Large Language Models (LLMs), I realized that ChatGPT isn’t just magic - it’s a combination of clever algorithms, vast amounts of text data, and powerful computing that together allow it to understand context, track meaning across sentences, and provide human-like answers.

So here we go. . .

Contents

1. Overview – NLP, Transformer, LLM
2. How ChatGPT Sees Your Words
3. How AI Models Make Choices and What They Know
4. ChatGPT is Not an LLM
5. How AI Models Handle Spelling Mistakes

Natural Language Processing (NLP)

Natural Language Processing (NLP) is the foundation of ChatGPT and many other AI models. It’s the field that enables computers to understand, interpret, and generate human language. ChatGPT uses NLP techniques to process user prompts, break them into meaningful units called tokens (more on these later), and generate coherent responses.

Transformers

The Transformer architecture (from the 2017 paper “Attention Is All You Need”) is the backbone of ChatGPT. Transformers use self-attention to allow each word (token) to focus on all other words in a sentence, capturing context and relationships efficiently. They process tokens in parallel, use embeddings and positional encodings to represent language, and stack multiple layers to understand complex patterns in text.

Large Language Model (LLM)

A Large Language Model (LLM) like ChatGPT is essentially a scaled-up Transformer trained on massive amounts of text. The LLM learns language patterns, grammar, reasoning, and factual knowledge. When you interact with ChatGPT, your input is tokenized, fed through the Transformer layers of the LLM, and the model predicts the next token repeatedly to generate natural, human-like responses.

Think of ChatGPT like a Human Being

  • NLP - Brain & Senses 🧠👀

    • NLP is like your brain and senses that allow you to understand language.

    • When someone talks to you, your brain hears/reads the words and starts making sense of them.

    • In ChatGPT, NLP breaks the input into tokens, interprets grammar, and understands meaning.

  • Transformer - Neurons & Connections 🔗

    • The Transformer is like the network of neurons in your brain.

    • Just like neurons pass signals and focus on the most relevant ones, self-attention lets the model figure out which words in a sentence are important and how they relate.

    • Multi-head attention is like different neural pathways working in parallel, processing meaning, context, and relationships at the same time.

  • LLM - Experienced Mind / Knowledge Base 📚

    • The Large Language Model is like your accumulated knowledge and experience.

    • After years of reading and learning, your mind can answer questions, write stories, or summarize information.

    • Similarly, an LLM is a massive, trained Transformer that has read vast amounts of text and can generate intelligent, coherent responses based on patterns it learned.

How ChatGPT Sees Your Words

When you type a question or a sentence, the model doesn’t just read it like a human. Instead, it breaks your text into smaller pieces called tokens, converts them into numerical vectors (embeddings), and adds information about their position in the sentence. These steps allow the model to “understand” your input in a way that it can process mathematically, setting the stage for generating a meaningful response.

pip install openai tiktoken

Step 1 - User Query

user_query = "Hello ChatGPT, explain NLP simply."
print("User Query:", user_query)

Step 2 - Tokenization

Transformers don’t understand raw text - they only process numbers (Token IDs). Each token is mapped to an integer. A token is the smallest unit of text the model understands.

It’s not always a word; it could be -

  • A whole word ("cat")

  • A subword ("play" + "ing")

  • A single character ("a")

import tiktoken

# Load the tokenizer used by GPT-3.5/GPT-4
encoding = tiktoken.get_encoding("cl100k_base")

tokens = encoding.encode(user_query)
print("Tokens:", tokens)
print("Number of tokens:", len(tokens))

Sample output (the exact token IDs depend on the tokenizer and the input text):

Tokens: [15496, 1917, 11, 2210, 7030, 2247, 30]
Number of tokens: 7

Try my Tokenizer Application Link

Step 3 - Vector Embeddings

Each token ID is converted into a vector (a list of numbers). Think of it as a point in space representing the meaning of the token.

"cat" → [0.12, -0.03, 0.45, ...]
"dog" → [0.11, -0.02, 0.48, ...]

Notice "cat" and "dog" vectors are close in space, meaning the model knows they’re related.

Step 4 - Positional Embeddings

Transformers process all tokens in parallel, so by default they don’t know the order of words. Positional embeddings are vectors added to token embeddings to indicate the position of each token in the sentence. This helps the model understand context like: "it" refers to "cat" in "The cat sat because it was tired".

Token   Vector                Position Vector       Combined Vector
The     [0.01, 0.32, 0.45]    [0.01, 0.01, 0.01]    [0.02, 0.33, 0.46]
cat     [0.12, 0.22, 0.55]    [0.02, 0.02, 0.02]    [0.14, 0.24, 0.57]
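In code, the combination is just element-wise addition. A tiny sketch using the same made-up numbers as the table above:

import numpy as np

# Made-up token and position vectors (matching the table above)
token_vecs = np.array([[0.01, 0.32, 0.45],   # "The"
                       [0.12, 0.22, 0.55]])  # "cat"
pos_vecs   = np.array([[0.01, 0.01, 0.01],   # position 0
                       [0.02, 0.02, 0.02]])  # position 1

# Element-wise addition gives the combined vectors the model actually sees
combined = token_vecs + pos_vecs
print(combined)
# [[0.02 0.33 0.46]
#  [0.14 0.24 0.57]]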

Step 5 - Attention

Each token looks at the other tokens to understand context. This helps the model work out relationships -

"bank" → is it the river kind or the money kind? Multi-Head Attention = looking at context in multiple “ways” at once.

Step 6 - Feed Forward + Layers

Each token then passes through a small neural network → this refines its meaning. Residual connections help the model remember earlier information. Layer normalization keeps learning stable.
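Here’s a minimal sketch of one such block, assuming made-up weight matrices (real models learn W1 and W2 during training):

import numpy as np

def layer_norm(x, eps=1e-5):
    # Keeps activations in a stable range
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def feed_forward(x, W1, W2):
    # A small two-layer network applied to each token independently
    return np.maximum(0, x @ W1) @ W2   # ReLU in between

np.random.seed(0)
x = np.random.randn(4, 8)      # 4 tokens, 8-dim embeddings
W1 = np.random.randn(8, 32)    # expand...
W2 = np.random.randn(32, 8)    # ...then project back

# Residual connection: add the input back so earlier info isn't lost
out = layer_norm(x + feed_forward(x, W1, W2))
print(out.shape)               # (4, 8) - same shape, refined meaning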

Step 7 - Text Generation (Predict Next Token)

The model predicts the next word/token based on context.

Example - "Hello, how are"

Model predicts - "you" → adds it to sequence.

Step 8 - Repeat Until Stop Condition

Feed the chosen token back into the model, predict again…
Continue until one of these stop conditions is met (a sketch of this loop follows the list) -

  • special “end of sequence” token appears,

  • max token limit is reached,

or a user-defined stopping rule is triggered.
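Here’s a minimal sketch of that predict-append-repeat loop. predict_next_token is a stand-in for the real model - here it just walks through a canned reply:

canned_reply = [" you", " doing", " today", "?", "<eos>"]

def predict_next_token(sequence):
    # Stand-in for the Transformer's next-token prediction
    return canned_reply[len(sequence)]

prompt = "Hello, how are"
generated = []
MAX_TOKENS = 10   # stop condition 2: max token limit

while len(generated) < MAX_TOKENS:
    next_token = predict_next_token(generated)
    if next_token == "<eos>":        # stop condition 1: end-of-sequence token
        break
    generated.append(next_token)     # feed the chosen token back in

print(prompt + "".join(generated))   # Hello, how are you doing today?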

Step 9 - Detokenization (Final Answer)

Convert tokens back into human-readable text → the final answer.
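With tiktoken, detokenization is just decode - a round trip on our earlier query:

import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

token_ids = encoding.encode("Hello ChatGPT, explain NLP simply.")
print(token_ids)                   # the numeric form the model works with
print(encoding.decode(token_ids))  # back to human-readable text

Putting all nine steps together, here’s the full round trip through the OpenAI API -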

# Uses the current OpenAI Python SDK (v1+); the older openai.ChatCompletion
# interface was removed in v1.0
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

user_input = "Hello AI, can you tell me a fun fact?"

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_input}
    ],
    max_tokens=50
)

# Get the generated text from the model
generated_text = response.choices[0].message.content
print("\nGenerated Text:\n", generated_text)

How AI Models Make Choices and What They Know

Knowledge Cutoff

The latest date up to which an AI model was trained on data. Anything after that date is unknown to the model (unless connected to live data). Example - If a model has a knowledge cutoff of Sept 2021, it won’t know events after that date.

Softmax

Softmax is like the decision-maker inside an AI model. The model looks at all possible next words and gives each one a “score.” Softmax turns those scores into chances (probabilities). Then the model picks the next word based on those chances - usually the highest one, but sometimes another, if randomness is allowed.

Temperature Parameter

Controls how confident vs creative the model is (see the sketch after this list).

  • Low temp (e.g., 0.2) → model sticks to safe/highest-probability words.

  • High temp (e.g., 1.2) → model explores more unusual words.
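Here’s a small sketch of softmax with a temperature knob, using made-up scores (logits) for three candidate next words:

import numpy as np

# Raw scores the model assigns to candidate next words (made up)
words  = ["you", "things", "pizza"]
logits = np.array([4.0, 2.0, 0.5])

def softmax_with_temperature(logits, temperature):
    scaled = logits / temperature
    e = np.exp(scaled - scaled.max())   # subtract max for numerical stability
    return e / e.sum()

for t in [0.2, 1.0, 1.2]:
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}:", dict(zip(words, probs.round(3))))

# Low temperature -> probability piles onto "you" (the safe choice)
# High temperature -> flatter distribution, unusual words get a chance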

🤖 ChatGPT is Not an LLM

ChatGPT is an AI chatbot application, not the underlying Large Language Model (LLM) itself. It is built on top of models like GPT-4, which serve as the core engine that predicts text. ChatGPT adds a conversational interface, context handling, safety filters, and extra tools, making the raw LLM usable and interactive for users.

💡
Unlike the LLM alone, ChatGPT can provide the current date and handle real-time prompts, because the application layer injects extra information and tools on top of the model’s static knowledge cutoff.

✏️ How AI Models Handle Spelling Mistakes

One of the impressive things about AI models like ChatGPT is their ability to understand and correct spelling mistakes. Even if your input has typos, the model can often infer the intended words based on context. This works because during training, the model saw millions of examples with spelling variations, grammatical errors, and real-world text patterns. When generating a response, it predicts the most likely sequence of words that make sense, effectively “auto-correcting” mistakes without explicit spell-checking. This makes conversations feel smooth and natural, even when users make minor errors.
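A small illustration of why typos don’t break the model - the tokenizer still splits a misspelled word into subword pieces it has seen during training, and the surrounding context does the rest:

import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

for text in ["weather", "wether", "wethr"]:
    ids = encoding.encode(text)
    pieces = [encoding.decode([i]) for i in ids]
    print(f"{text!r} -> {pieces}")

# The misspellings become subword pieces rather than failing outright,
# so the model can still infer the intended word from context.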

Wrapping Up 👋

I hope this guide helped you understand how ChatGPT works - from tokens and embeddings to softmax, knowledge cutoff, and text generation.

🌐 Connect with Me

  1. Twitter

  2. LinkedIn

  3. GitHub
