🎬 From Words to Vectors: The Cinematic Journey of GenAI

🎥 The Opening Scene: Tokenization

Every blockbuster begins with a script — and for GenAI, that script is human language.
But here’s the twist: computers don’t understand words like “love,” “movie,” or “AI.”
They only understand numbers.

This is where tokenization enters, like the hero’s first appearance.

  • Tokenization is the process of breaking text into smaller units called tokens.

  • A token can be a word ("movie"), a sub-word ("mov" + "ie"), or even a character ("m", "o", "v", "i", "e").

    • Example:

        Sentence: "AI is powerful"
        Tokens: ["AI", "is", "power", "ful"]
      

Think of tokenization as the scene where dialogue is broken into shots — each shot capturing meaning in a manageable piece.
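Here's how that opening scene looks in code. This is a minimal sketch using the open-source Hugging Face transformers library (my tool choice, not the article's); the exact splits depend entirely on which tokenizer you load:

    # pip install transformers
    from transformers import AutoTokenizer

    # Load GPT-2's tokenizer; any pretrained tokenizer works the same way
    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    tokens = tokenizer.tokenize("AI is powerful")
    print(tokens)
    # GPT-2 uses byte-pair encoding and marks a leading space with "Ġ",
    # so expect something like ['AI', 'Ġis', 'Ġpowerful'] rather than
    # the illustrative ["AI", "is", "power", "ful"] split above.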


🎞️ The Plot Twist: Mapping Words to Numbers

Now that we have tokens, the next challenge is: how do we make them machine-readable?
This is like turning a film script into a storyboard with numbered scenes.

  • Each token is mapped to a unique numerical ID (like giving actors their role numbers).

  • Example:

      "AI" → 101
      "is" → 202
      "power" → 303
      "ful" → 404
    

So, our sentence "AI is powerful" becomes [101, 202, 303, 404].
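In code, this mapping is nothing more than a lookup table. Here's a minimal sketch in plain Python, reusing the article's illustrative IDs (a real model's vocabulary assigns different numbers across tens of thousands of tokens):

    # Toy vocabulary: token -> unique numerical ID
    # (IDs are illustrative, matching the example above)
    vocab = {"AI": 101, "is": 202, "power": 303, "ful": 404}

    def encode(tokens):
        """Replace each token with its ID, like assigning role numbers."""
        return [vocab[token] for token in tokens]

    print(encode(["AI", "is", "power", "ful"]))  # [101, 202, 303, 404]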

At this stage, the computer understands the text only as a sequence of numbers.
But just like actors are more than their ID numbers, words are more than just token IDs.
We need something deeper…


🌌 The Grand Climax: Vector Embeddings

Here comes the climax of the movie: embeddings.

An embedding is like giving each word its full personality, backstory, and role in the film.

  • Instead of just being "token 101", "AI" is represented as a vector (a list of numbers) that captures its meaning, relationships, and context.

  • Example:

      "AI" → [0.12, -0.98, 0.33, ...]
      "movie" → [0.11, -0.95, 0.31, ...]
    

These vectors live in a high-dimensional space, where:

  • Words with similar meanings appear close together.
    (“king” and “queen” will be neighbors.)

  • Words with unrelated meanings drift far apart.
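“Close together” here usually means a small angle between the vectors, measured with cosine similarity. A minimal sketch with hand-crafted 3-dimensional toy vectors (real embeddings have hundreds or thousands of dimensions):

    import math

    def cosine_similarity(a, b):
        """1.0 = same direction, near 0.0 = unrelated."""
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    # Hand-crafted toy vectors, purely for illustration
    film  = [0.90, 0.10, 0.05]
    movie = [0.88, 0.12, 0.06]
    piano = [0.05, 0.95, 0.10]

    print(cosine_similarity(film, movie))  # ~1.0  -> near neighbors
    print(cosine_similarity(film, piano))  # ~0.17 -> far apart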

This geometry is what allows GenAI to:

  • Understand that “film” ≈ “movie”

  • Capture relationships like “Paris – France ≈ Rome – Italy” (see the sketch after this list)

  • Generate coherent and context-aware text.
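As promised above, here is that analogy arithmetic as a sketch. These toy vectors are built so the relationship holds exactly; in real embedding spaces it only holds approximately:

    # Toy construction: capital = country + a "capital-of" offset
    france = [0.9, 0.1, 0.0]
    italy  = [0.1, 0.9, 0.0]
    offset = [0.0, 0.0, 0.8]  # hypothetical "capital-of" direction

    paris = [c + o for c, o in zip(france, offset)]
    rome  = [c + o for c, o in zip(italy, offset)]

    # Paris - France should match Rome - Italy
    print([p - c for p, c in zip(paris, france)])  # [0.0, 0.0, 0.8]
    print([r - c for r, c in zip(rome, italy)])    # [0.0, 0.0, 0.8]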

Embeddings are the emotional depth of the story — the reason AI understands not just words, but meaning.


🎭 The End Credits: Why This Matters

The entire magic of GenAI — whether it’s writing a story, answering questions, or generating code — rests on this three-act structure:

  1. Tokenization (Breaking the dialogue into shots)

  2. Mapping to Numbers (Assigning scene IDs)

  3. Vector Embeddings (Giving life, meaning, and relationships)

Without tokenization, the script is too messy.
Without numbers, the computer can’t read it.
Without embeddings, there’s no real story.

Together, they turn raw words into something GenAI can understand, learn from, and create with.
