Tokenization: What Happens to Your Words Before AI Responds?

The Food Plate Analogy

Imagine you have a full plate of delicious food in front of you. Do you try to eat everything at once? Of course not! You take small bites with your spoon, breaking down the meal into manageable pieces that you can chew and digest properly.

This is exactly how AI models like ChatGPT handle text. When they encounter a long document or even a short message, they don't try to process everything at once. Instead, they break the whole text down into small, digestible pieces called tokens.

What Are Tokens?

Tokens are like the "spoons" of AI language processing. Just like how you decide the size of each bite when eating, AI models decide how to break down text into tokens. These tokens can be:

  • Single letters: [H], [e], [l], [l], [o]

  • Whole words: [Hello], [world], [how], [are], [you]

  • Parts of words: [un], [happy], [ing]

  • Punctuation: [.], [,], [!], [?]

  • Sometimes even whole sentences (though this is rare)
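To make this concrete, here's a minimal sketch using OpenAI's open-source tiktoken library (the tokenizer behind GPT models; install it with pip install tiktoken). The exact pieces you get depend on which encoding you load:

```python
import tiktoken

# Load the encoding used by GPT-3.5/GPT-4 era models
enc = tiktoken.get_encoding("cl100k_base")

text = "Hello, world! Unhappiness is temporary."
token_ids = enc.encode(text)

# Decode each ID on its own to see the text piece it stands for
pieces = [enc.decode([tid]) for tid in token_ids]
print(pieces)  # a mix of whole words, sub-word chunks, and punctuation
```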

Real-Life Example: The Restaurant Menu

Let's say you're at a restaurant and the waiter gives you this sentence: "I'd like a cheeseburger, please!"

Different AI models might break this down differently:

Model A might tokenize it as: ["I'd", "like", "a", "cheese", "burger", ",", "please", "!"]

Model B might tokenize it as: ["I", "'", "d", "like", "a", "cheeseburger", ",", "please", "!"]

Model C might tokenize it as: ["I'd", "like", "a", "cheeseburger,", "please!"]

Each approach is like different people having different eating styles - some take smaller bites, others take bigger ones!
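You can watch this happen with real tokenizers. Here's a sketch that runs the same sentence through three of tiktoken's built-in encodings, standing in for Models A, B, and C (the actual splits will differ from the hand-written examples above):

```python
import tiktoken

sentence = "I'd like a cheeseburger, please!"

# Three real vocabularies of different sizes and vintages
for name in ["r50k_base", "cl100k_base", "o200k_base"]:
    enc = tiktoken.get_encoding(name)
    ids = enc.encode(sentence)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{name}: {len(ids)} tokens -> {pieces}")
```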

Encryption Phase (Input Processing)

When you type "Hello, how are you?" to ChatGPT:

  1. Original message: "Hello, how are you?"

  2. AI "encrypts" it into tokens: [Hello] [,] [how] [are] [you] [?]

  3. Each token gets a secret number: [1234] [5678] [9012] [3456] [7890] [1111]

It's like converting your English sentence into a secret numeric code that only the AI can understand! (The technical name for this step is encoding, and the numbers aren't truly secret: each one is simply that token's fixed ID in the model's vocabulary. The IDs in step 3 above are made-up placeholders.)
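Here's what the real thing looks like with tiktoken:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

message = "Hello, how are you?"
token_ids = enc.encode(message)

print(token_ids)                             # the integer ID for each token
print([enc.decode([i]) for i in token_ids])  # the text piece behind each ID
```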

Decryption Phase (Output Generation)

Finally, the AI:

  1. Predicts the next secret numbers: [2222] [3333] [4444]

  2. "Decrypts" them back to tokens: [I'm] [doing] [great]

  3. Combines tokens into readable text: "I'm doing great"
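The technical name for this reverse step is decoding. Here's a round-trip sketch with tiktoken (the "predicted" IDs below come from encode() for the demo; in a real model they come from its next-token predictions):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# Stand-in for the IDs a model might predict
predicted_ids = enc.encode("I'm doing great")

reply = enc.decode(predicted_ids)  # IDs -> readable text
print(reply)  # I'm doing great
```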

The Dictionary Analogy

Imagine you're learning a new language and you have a dictionary with exactly 50,000 words. That's it: no more, no less. If someone uses a word that's not in your dictionary, you have to piece it together from smaller words you do know.

This is very close to how AI tokenizers work. The vocabulary is fixed in advance, any word outside it gets split into smaller known pieces, and each model family has its own dictionary:

  • ChatGPT's vocabulary: around 100,000 tokens in current models (older GPT models used about 50,000)

  • Claude's vocabulary: around 100,000 tokens

  • Other models: each has its own vocabulary size
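For the GPT-family encodings, you can check the dictionary size directly (Claude's tokenizer isn't shipped with tiktoken, so it isn't shown here):

```python
import tiktoken

# n_vocab is the total number of distinct token IDs in each encoding
for name in ["r50k_base", "cl100k_base", "o200k_base"]:
    enc = tiktoken.get_encoding(name)
    print(f"{name}: {enc.n_vocab:,} tokens")
```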

How Vocabulary Size Affects Everything

Small Vocabulary (Like a basic English dictionary):

  • Breaks words into more pieces

  • "unhappiness" → [un] [happy] [ness] (3 tokens)

  • More tokens needed = slower processing

Large Vocabulary (Like an advanced dictionary):

  • Keeps more words whole

  • "unhappiness" → [unhappiness] (1 token)

  • Fewer tokens needed = faster processing
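You can compare this yourself. Note that the [un] [happy] [ness] split above is illustrative; real tokenizers may cut the word differently, so this sketch just counts whatever pieces each vocabulary actually produces:

```python
import tiktoken

word = "unhappiness"

# A ~50k-token vocabulary versus a ~100k-token one
for name in ["r50k_base", "cl100k_base"]:
    enc = tiktoken.get_encoding(name)
    ids = enc.encode(word)
    print(f"{name}: {[enc.decode([i]) for i in ids]} ({len(ids)} tokens)")
```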
