The Secret Language of AI Tokens


One day, a friend asked me:
“How does ChatGPT actually read what I type?”
I smiled. “It doesn’t read letters the way we do. It reads numbers.”
The Secret Language Club (What Tokenization Is)
Imagine you’re part of a secret language club.
The rule is:
You can’t speak in full words - only in special code numbers.
“Cat” isn’t “cat” anymore - it might be 532.
“Kitten” might be 1843.
Even a giant word like Supercalifragilisticexpialidocious gets chopped into smaller pieces, each with its own number.
That process - turning words or parts of words into numbers - is called tokenization.
It’s how AI translates human language into something it can do math with.
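To make that concrete, here's a toy sketch in Python. The dictionary and the ID numbers are made up purely for illustration; real models use vocabularies with tens of thousands of entries.

```python
# A toy illustration of tokenization: map pieces of text to numbers,
# so the model can do math on numbers instead of reading letters.
# (These IDs are invented for the example, not from any real model.)
vocab = {"the": 12, "cat": 532, "sat": 87, "kitten": 1843}

def encode(text: str) -> list[int]:
    """Turn a sentence into a list of token IDs using our tiny dictionary."""
    return [vocab[word] for word in text.lower().split()]

print(encode("The cat sat"))  # [12, 532, 87]
```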
Each AI Has Its Own Dictionary
Here’s the twist:
Every secret club has its own dictionary.
In ChatGPT’s dictionary, 532 might mean “cat.”
In Gemini’s dictionary, 532 might mean “banana.”
That’s why tokenization is model-specific - each AI has its own vocabulary and codebook.
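You can see this yourself with OpenAI's open-source tiktoken library, which exposes a few of these vocabularies. The snippet below is a rough sketch assuming tiktoken is installed; other models (Gemini, Llama, and so on) ship their own tokenizers, so their IDs won't match either of these.

```python
# Same word, different codebooks -> different token IDs.
# (Assumes the open-source `tiktoken` package is installed.)
import tiktoken

gpt2 = tiktoken.get_encoding("gpt2")           # vocabulary used by GPT-2
cl100k = tiktoken.get_encoding("cl100k_base")  # vocabulary used by newer OpenAI models

print(gpt2.encode("cat"))    # one list of IDs under GPT-2's dictionary
print(cl100k.encode("cat"))  # a different list under cl100k_base's dictionary
```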
Not Always Whole Words
You might think one token = one word. Nope.
Short words can be one token (“dog” → 912).
Long words get split into many tokens (“playground” → 731 + 485).
Even spaces and punctuation can have their own codes.
It’s like the club giving you a code for “apple” and a code for the space after “apple.”
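Here's a small sketch (again assuming tiktoken is installed) that decodes each token ID back into the text chunk it stands for, so you can watch a real tokenizer chop a sentence up. The exact splits and IDs will differ between models.

```python
# Peek at how a real tokenizer chops text into pieces.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("Supercalifragilisticexpialidocious is an apple tree.")

# Decode each token ID on its own to see which chunk of text it represents.
for token_id in ids:
    print(token_id, repr(enc.decode([token_id])))

# Long words come out as several pieces, and spaces are typically folded
# into the start of the next token (e.g. ' is', ' apple').
```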
Why Not Just Use Words?
Because computers don’t understand words - they understand numbers.
Numbers can be calculated, compared, and stored efficiently.
Tokenization is the bridge between our language and the AI’s mathematical brain.
The Takeaway
Tokenization is:
Breaking text into chunks (tokens).
Assigning each chunk a number from the AI’s vocabulary.
Following the AI’s own private dictionary, which is different for every model.
So the next time you type into ChatGPT, remember - before it “understands” you, it’s busy translating your words into its secret number language.