"Tokenization: Turning Text into Secret Pieces đź§©"

What is Tokenization?
Tokenization means breaking down text into smaller pieces called tokens.
Think of tokens like the little building blocks or puzzle pieces of words or characters.
Imagine This:
You have a sentence:
"Hello, how are you?"
When you tokenize it, you split it into parts, like: ["Hello", ",", "how", "are", "you", "?"]
Each of these parts is a token.
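To make this concrete, here is a minimal sketch of that splitting step in JavaScript. This is a toy word-and-punctuation splitter, not the BPE tokenizer real models use, but it shows the same idea:

```javascript
// Toy tokenizer: splits a sentence into words and punctuation marks.
// Real tokenizers (like BPE) are more sophisticated, but the idea is the same.
function simpleTokenize(text) {
  // Match runs of word characters, or any single non-space symbol.
  return text.match(/\w+|[^\s\w]/g) ?? [];
}

console.log(simpleTokenize("Hello, how are you?"));
// → ["Hello", ",", "how", "are", "you", "?"]
```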
Why tokenize?
Computers don’t understand whole sentences naturally.
They understand numbers better.
Tokenization turns your text into numbers (tokens) that the computer can work with.
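A sketch of that text-to-numbers step, using a made-up mini vocabulary (the token IDs here are invented for illustration; real tokenizers learn vocabularies of 50,000+ entries):

```javascript
// Toy vocabulary mapping token strings to numeric IDs (IDs invented for illustration).
const vocab = { "Hello": 0, ",": 1, "how": 2, "are": 3, "you": 4, "?": 5 };

// Convert a list of token strings into the numbers a computer works with.
function toIds(tokens) {
  return tokens.map(t => vocab[t]);
}

console.log(toIds(["Hello", ",", "how", "are", "you", "?"]));
// → [0, 1, 2, 3, 4, 5]
```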
How your code tokenizes:
You type some text in the textarea (for example, "Hello world!"). When you click "Tokenize & Decode", this happens inside your code:
const tokenized = enc.encode(inputText);
The enc.encode() function splits your text into tokens (numbers). Each token is a small piece representing part of your text.
These tokens look like numbers, for example: [15496, 995] (these numbers correspond to “Hello” and “world”).
You then see these tokens displayed in your app.
Then the app uses:
const decodedText = enc.decode(tokenized);
This turns the tokens back into the original text. It’s like putting the puzzle pieces back together.
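The whole encode/decode round trip can be sketched with a toy vocabulary. The real enc object comes from a tokenizer library and has a much larger learned vocabulary, but conceptually it does the same thing (the IDs and vocabulary below are invented for illustration):

```javascript
// Toy encoder/decoder illustrating the encode → decode round trip.
// The vocabulary and IDs are invented; real libraries learn these from data.
const idToToken = ["Hello", " world", "!"];
const tokenToId = new Map(idToToken.map((t, i) => [t, i]));

function encode(text) {
  // Greedily match the longest known token at the front of the text.
  const ids = [];
  let rest = text;
  while (rest.length > 0) {
    const match = idToToken
      .filter(t => rest.startsWith(t))
      .sort((a, b) => b.length - a.length)[0];
    ids.push(tokenToId.get(match));
    rest = rest.slice(match.length);
  }
  return ids;
}

function decode(ids) {
  // Decoding is just joining the token strings back together.
  return ids.map(i => idToToken[i]).join("");
}

const ids = encode("Hello world!");
console.log(ids);          // → [0, 1, 2]
console.log(decode(ids));  // → "Hello world!"
```

Notice that decoding is much simpler than encoding: once you have the IDs, you just look each one up and concatenate the pieces.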
Summary:
Tokenization breaks text into tokens (small pieces).
Tokens are numbers that computers can understand.
Your app shows these tokens and lets you see the decoded original text again.