Tokens, Vectors, and Broken Dreams

Rimjhim Patidar
5 min read

We came to class thinking GPT was magic — a kind of digital wizard that somehow understood us. But it didn’t take long before the spell was broken. As we peeled back the layers, we found no mystery, no soul — just tokens, vectors, and a lot of math.

The heartbreak? 💔 Realizing that what felt like understanding was actually just prediction. Behind the curtain of conversation is a machine breaking language into tiny parts and calculating what comes next. It’s not magic — it’s logic, algorithms, and some really smart math.

In my first class on Generative AI, I learned that what once felt like magic is actually built on logic and mathematics. For example, I used to think AI understands full sentences the way humans do — grasping meaning and intent. But that’s not the case.

And unlike Google Search, which finds results based on keywords, a language model doesn’t just look things up — it actually generates responses by predicting the next token in a sequence.

And that’s the key difference between searching and generating.
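To see what "predicting the next token" means at its simplest, here's a toy sketch (my own illustration, not how GPT is actually implemented): count which word follows which in a tiny made-up corpus, then "generate" by picking the most frequent follower. Real models learn these patterns over billions of tokens with neural networks, but the core idea is the same.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, already split into word tokens.
corpus = "i love pizza . i love pasta . i love pizza .".split()

# Count which token follows which.
followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1

def predict_next(word):
    # "Generate" by returning the most common token seen after `word`.
    return followers[word].most_common(1)[0][0]

print(predict_next("i"))     # "love"
print(predict_next("love"))  # "pizza" (seen twice, vs. "pasta" once)
```

Unlike a search engine, this doesn't look anything up — it produces the next word from patterns it has absorbed.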

If AI Had a Car, It’d Be Named ‘Car’ – Introducing GPT

Interestingly, OpenAI named its model GPT, which stands for Generative Pre-trained Transformer.

It’s not just a fancy name—each part tells us something about how the model works.

“Generative” means it can create new content by predicting what comes next. This ability to predict is so powerful that some worry it might take over jobs or even replace humans in certain tasks.

“Pre-trained” means it has already learned from a large amount of text data, and

“Transformer” refers to the architecture that helps the model understand the meaning and position of words in sentences.

🧩 Tokenization: Breaking Sentences into Pieces

In short: “We start with tokens, which combine to form sequences.”

Tokenization means splitting a sentence into smaller parts called tokens. These tokens are usually chunks of letters, words, or subwords.

For example, the sentence:
“I love Pizza.”

can be broken into tokens like:
“I”, “love”, “Pizza”, and “.”

A sequence is just a group of these tokens — like puzzle pieces making up a sentence.
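A minimal sketch of this idea, using Python's standard `re` module to split on words and punctuation (real tokenizers like GPT's byte-pair encoding work on subword chunks, but the principle is the same):

```python
import re

def simple_tokenize(text):
    # Grab runs of word characters, or single punctuation marks.
    # Real tokenizers use learned subword vocabularies instead.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenize("I love Pizza.")
print(tokens)  # ['I', 'love', 'Pizza', '.']
```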

Let’s move further to Vector Embeddings: Words to Numbers

  • Start with Tokens

    After tokenization, your input sentence is split into smaller units (tokens) like words.

  • Assign Each Token an ID

    Each token is mapped to a unique number from a predefined vocabulary.

  • Convert Token IDs to Vectors

    These token IDs are then converted into high-dimensional vectors (numbers).

    Each vector represents the meaning of the token.

  • Capture Word Meaning and Relationships

    Similar words get similar vectors.

    For example, the vectors for “Honda” and “BMW” will be closer together than the vectors for “Honda” and “pizza”.

  • Use These Vectors in the Model

    These embeddings are passed into the AI model,

    which uses them to understand the meaning, context, and relationships in your input.
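The steps above can be sketched in a few lines. The vocabulary and 3-dimensional vectors below are made up purely for illustration — real embeddings have hundreds of dimensions and are learned during training — but they show how "closeness" between vectors is measured (here, with cosine similarity):

```python
import math

# Step 1-2: tiny vocabulary mapping each token to an ID.
vocab = {"honda": 0, "bmw": 1, "pizza": 2}

# Step 3: each ID maps to a vector (hand-made toy numbers).
embeddings = [
    [0.90, 0.80, 0.10],  # honda
    [0.85, 0.75, 0.20],  # bmw
    [0.10, 0.20, 0.90],  # pizza
]

def embed(token):
    return embeddings[vocab[token]]

def cosine_similarity(a, b):
    # 1.0 means "pointing the same way"; near 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Step 4: related tokens sit closer together.
print(cosine_similarity(embed("honda"), embed("bmw")))    # high
print(cosine_similarity(embed("honda"), embed("pizza")))  # low
```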

Word Order Matters: Positional Encoding

When a language model reads a sentence, it needs to know not just what the words are, but also where they appear. That’s because the meaning of a sentence often depends on the order of the words.

Since vector embeddings only represent the meaning of words but don’t include their position, the model uses positional encoding to add information about each word’s place in the sentence.

Think of it like giving each word a special tag that tells the model, “I’m the first word,” or “I’m the fifth word,” and so on.

This way, the model can understand the correct order of words and make sense of the full sentence.
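One concrete way to build those "position tags" is the sinusoidal scheme from the original Transformer paper: even dimensions use sine and odd dimensions use cosine, at increasing wavelengths, so every position gets a unique vector. A small sketch (the dimension size of 8 is just for illustration):

```python
import math

def positional_encoding(position, d_model=8):
    # Sinusoidal positional encoding from "Attention Is All You Need":
    # pe[2i] = sin(pos / 10000^(2i/d)), pe[2i+1] = cos(pos / 10000^(2i/d)).
    pe = []
    for i in range(d_model):
        angle = position / (10000 ** ((i // 2 * 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

# Position 0 gets sin(0)=0 and cos(0)=1 in alternating slots.
print(positional_encoding(0))  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
# Every other position gets its own distinct tag.
print(positional_encoding(5)[:4])
```

This tag vector is added to the word's embedding, so the model sees both *what* the word is and *where* it sits.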

⭐ Let’s Give a Shoutout to the OG: Self-Attention

Attention is like the model’s built-in spotlight — it helps the model figure out which words in a sentence deserve the most focus when trying to understand each word.

For example, in the sentence “She wore a bright yellow dress to the party,” when the model looks at the word “dress,” it shines its spotlight on words like “bright” and “yellow” to understand what kind of dress it is.

This process, called self-attention, lets the model weigh the importance of every word relative to the others — helping it grasp context and generate responses that actually make sense.
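Here's a bare-bones sketch of that weighing process. For simplicity, each token vector plays the roles of query, key, and value itself — real Transformers first multiply by learned Q, K, and V weight matrices — but the spotlight mechanics (score, softmax, weighted mix) are the same:

```python
import math

def softmax(xs):
    # Turn raw scores into weights that sum to 1.
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    # Scaled dot-product self-attention over a list of token vectors.
    d = len(vectors[0])
    outputs = []
    for query in vectors:
        # How relevant is every token to this one?
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in vectors]
        weights = softmax(scores)  # the "spotlight" for this token
        # Each output is a weighted mix of all the token vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors))
                        for i in range(d)])
    return outputs

# Toy 2-dimensional vectors for three tokens.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
out = self_attention(tokens)
print(out)
```

The first two (similar) vectors score highly against each other, so each one's output leans heavily on the other — that's "bright" and "yellow" informing "dress."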

Huge shoutout to Google for their game-changing paper, “Attention Is All You Need,” which introduced this clever idea and paved the way for powerful language models like GPT.

📜Inference, Training, Backpropagation

The Secret Recipe: Making Machines Talk Like Humans

So far, we’ve seen how tokens, vectors, positional encoding, and attention work together to help AI understand language. But how does the model actually learn and get better over time?

That’s where training comes in — imagine it like a student studying tons of books, practicing over and over to get better at guessing the next word in a sentence.

Backpropagation is like the student’s self-check process: when they make a mistake, they figure out exactly what went wrong and adjust their approach to avoid repeating it next time. It’s learning from errors to improve.

And when the model has learned enough, it’s time for inference — kind of like taking a test or having a conversation, where it uses everything it has practiced to give smart, meaningful answers.
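The whole loop — forward pass, checking the error, adjusting via backpropagation — fits in a few lines if we shrink it to a single weight. This toy (entirely my own illustration) fits `w` so that `w * x` matches a target; real models do the same thing with billions of weights:

```python
def train(x, target, lr=0.1, steps=50):
    # Fit a single weight w so that w * x approximates `target`,
    # by repeatedly descending the gradient of the squared error.
    w = 0.0
    for _ in range(steps):
        pred = w * x              # forward pass (what inference does too)
        error = pred - target     # how wrong was the guess?
        grad = 2 * error * x      # backpropagation: d(error^2)/dw
        w -= lr * grad            # learn from the mistake
    return w

w = train(x=1.0, target=3.0)
print(round(w, 3))  # converges close to 3.0
```

Once `w` has settled, just computing `w * x` with no further updates is the "inference" step — the test after all the studying.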


And this is just the beginning — next, we’ll explore how all these parts come together during training and generation to make AI sound so human.

Special thanks to Hitesh Sir and Piyush Sir for pushing me to dig deeper, ask better questions, and see beyond the surface.
