What is a Vector Embedding?


A vector embedding is a way of representing information (words, sentences, images, or even users) as a list of numbers, called a vector, so that a computer can work with its meaning, not just its raw form.
Why do we need embeddings?
Computers don’t understand “cat” or “dog” as text — they understand numbers.
But if we just assign random numbers, the relationships between concepts are lost.
That’s where embeddings come in: they capture semantic meaning so that similar things are close together in this numeric space.
Example:
"cat" → [0.12, -0.87, 0.45, ...]
"dog" → [0.14, -0.85, 0.47, ...]
"car" → [-0.50, 0.10, -0.33, ...]
Here, "cat" and "dog" have similar numbers → they’re “close” in meaning.
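To make "close" concrete, here is a minimal sketch that scores these vectors with cosine similarity, one common way to measure how alike two embeddings are. It uses only the first three numbers shown above, since the full vectors are truncated with "...".

```python
import numpy as np

# Toy vectors: just the first three values shown above
# (real embeddings would have many more dimensions).
embeddings = {
    "cat": np.array([0.12, -0.87, 0.45]),
    "dog": np.array([0.14, -0.85, 0.47]),
    "car": np.array([-0.50, 0.10, -0.33]),
}

def cosine_similarity(a, b):
    # 1.0 means "pointing the same way" (very similar);
    # values near 0 or below mean unrelated or opposite.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # ~0.999, very close
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # ~-0.49, far apart
```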
How are embeddings made?
In GPT (or similar models), embeddings are learned during training:
Each token gets a vector.
The training process adjusts those vectors so words that appear in similar contexts end up close together in vector space.
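As a rough sketch of that idea in PyTorch: the embedding table is just a learnable matrix with one row per token, and the training loss nudges those rows. The vocabulary size and dimension below are made-up toy values, not GPT's real ones.

```python
import torch
import torch.nn as nn

# Made-up toy sizes for illustration.
vocab_size, embedding_dim = 10_000, 64

# One learnable vector per token id. During training, gradients from the
# language-modeling loss adjust these rows so that tokens appearing in
# similar contexts end up with similar vectors.
embedding = nn.Embedding(vocab_size, embedding_dim)

token_ids = torch.tensor([42, 7, 1337])  # hypothetical token ids
vectors = embedding(token_ids)           # shape: (3, 64)
print(vectors.shape)
```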
Where are embeddings used?
Search engines → Find documents with similar meaning, not just exact keywords.
Recommendation systems → Suggest similar movies, songs, or products.
Chatbots → Remember and retrieve relevant past conversations.
Image recognition → Compare image embeddings for similarity.
In GPT specifically:
When you send a prompt, GPT first turns your tokens into input embeddings.
Those embeddings go through all the Transformer layers, get transformed into output embeddings, and then converted back to tokens to generate text.
Simple Explanation of Vector Embeddings
Imagine you have a box of crayons with 1000 different colors. If I ask you to describe the color "sky blue" in numbers, you might say something like:
Amount of red: 120
Amount of green: 200
Amount of blue: 255
Now, instead of colors, think about words.
AI tries to describe each word (like “cat”, “happy”, “pizza”) using numbers — but not just three numbers like colors. It can use hundreds or even thousands of numbers to capture its meaning.
That list of numbers for each word is called a vector embedding.
Why do we need this?
Because computers don’t understand words or sentences like humans do.
You see the word "apple" and think of a fruit.
A computer sees "apple" and only understands it if we turn it into numbers that represent its meaning.
Embeddings are how we turn meaning into math.
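If you want to see real embeddings, one option (my choice of library and model, not something this article prescribes) is the open-source sentence-transformers package:

```python
# Assumes `pip install sentence-transformers`; the model name is one
# common public choice, picked here only for illustration.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Each word (or whole sentence) becomes a 384-dimensional vector.
vectors = model.encode(["apple", "fruit", "laptop"])
print(vectors.shape)  # (3, 384)
```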
Example:
Let’s say we have three words:
“king”
“queen”
“apple”
When converted to embeddings (numbers), the AI notices:
“king” and “queen” are close in meaning (both royalty)
“apple” is far from them (fruit)
So in the "number space", king and queen will be near each other, while apple is somewhere else.
Example of Vector Embeddings with Numbers
Let’s say our AI uses 3 numbers (in reality it might use 300+, but we’ll keep it small for understanding).
| Word  | Embedding (Vector of Numbers) |
| ----- | ----------------------------- |
| King  | [0.9, 0.8, 0.7]               |
| Queen | [0.88, 0.82, 0.69]            |
| Apple | [0.1, 0.4, 0.9]               |
Step 1: See who is closer
AI measures the distance between these sets of numbers.
King → Queen: Very small distance → means they are very similar.
King → Apple: Big distance → means they are very different.
Step 2: Why are they close/far?
King & Queen: Both relate to royalty, so their numbers (features) are similar.
Apple: Related to fruit, so its numbers are different.
Think of it like putting them on a 3D map:
“King” and “Queen” stand near each other in the royal corner.
“Apple” is way over in the fruit corner.
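Here is a tiny sketch of that distance check, using the exact numbers from the table above and plain Euclidean (straight-line) distance:

```python
import math

king  = [0.9, 0.8, 0.7]
queen = [0.88, 0.82, 0.69]
apple = [0.1, 0.4, 0.9]

def distance(a, b):
    # Straight-line (Euclidean) distance between two points in 3D space.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(distance(king, queen))  # ~0.03 -> very similar
print(distance(king, apple))  # ~0.92 -> very different
```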
Positional Encoding – Where is the word in the sentence?
Transformers don’t read words one-by-one in order like humans. They look at all words at once.
So, positional encoding is a way to add “position” info to the embeddings.
🔍 Why?
These two sentences:
"The cat chased the dog"
"The dog chased the cat"
use the same words but mean different things because of the order.
How it works:
We take the vector for the word and add another vector that represents its position (1st word, 2nd word, etc.).
It’s like giving each word a GPS location in the sentence.
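One common recipe for those position vectors is the sinusoidal encoding from the original Transformer paper ("Attention Is All You Need"). A minimal sketch with toy sizes:

```python
import numpy as np

def positional_encoding(seq_len, dim):
    # Each position gets a unique pattern of sines and cosines
    # at different frequencies.
    positions = np.arange(seq_len)[:, None]              # (seq_len, 1)
    div = np.power(10000.0, np.arange(0, dim, 2) / dim)  # (dim/2,)
    pe = np.zeros((seq_len, dim))
    pe[:, 0::2] = np.sin(positions / div)
    pe[:, 1::2] = np.cos(positions / div)
    return pe

seq_len, dim = 5, 8                       # 5 words, 8-dim embeddings (toy sizes)
word_embeddings = np.random.randn(seq_len, dim)

# "Add position info to the embeddings": simple element-wise addition.
with_positions = word_embeddings + positional_encoding(seq_len, dim)
print(with_positions.shape)               # (5, 8)
```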
Self-Attention – Words talking to each other
This is where the magic happens.
Self-attention lets every word look at all the other words in the sentence to figure out what’s important.
Example:
Sentence: “The bank raised interest rates.”
“Bank” should look at nearby words to decide if it’s a river bank or a financial bank.
It finds “interest rates” → knows this is about finance.
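A stripped-down sketch of the underlying computation, scaled dot-product self-attention, using random toy weights. In a real model the Wq, Wk, and Wv matrices are learned during training; here they are just placeholders to show the mechanics.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Every word produces a query, key, and value vector...
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # ...then each word scores every other word (including itself)
    # and takes a weighted mix of their values.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores)          # each row sums to 1
    return weights @ V

# Toy sizes: 5 words ("The bank raised interest rates"), 8-dim embeddings.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))

print(self_attention(X, Wq, Wk, Wv).shape)  # (5, 8)
```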
Multi-Head Attention – Parallel thinking
Instead of doing self-attention just once, Transformers do it many times in parallel, each focusing on a different relationship.
Example:
Sentence: “The cat sat on the mat.”
Head 1: Focus on which word is the subject.
Head 2: Focus on where things are happening.
Head 3: Focus on time or tense.
This parallel processing = richer understanding.
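A simplified sketch of that split, attend in parallel, recombine pattern. It deliberately skips the learned per-head projection matrices a real Transformer uses, so treat it as a shape-level illustration only.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

def multi_head_attention(X, num_heads):
    # Split each word's vector into `num_heads` smaller chunks, run
    # attention independently on each chunk (each head can focus on a
    # different relationship), then glue the results back together.
    seq_len, dim = X.shape
    head_dim = dim // num_heads
    heads = []
    for h in range(num_heads):
        chunk = X[:, h * head_dim:(h + 1) * head_dim]
        heads.append(attention(chunk, chunk, chunk))
    return np.concatenate(heads, axis=-1)   # back to (seq_len, dim)

X = np.random.randn(6, 8)   # "The cat sat on the mat": 6 tokens, toy 8-dim vectors
print(multi_head_attention(X, num_heads=2).shape)  # (6, 8)
```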
Inference – "Producing the answer"
After all these steps:
Word → Embedding
Add Positional Encoding
Self-Attention (words talk)
Multi-Head Attention (multiple views)
Feed-Forward layers (more processing)
The model finally predicts the next word or answer.
This process at prediction time = Inference.
Example:
Input: “Once upon a” → Model predicts “time” (highest probability word).
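A toy sketch of that final step: turn the model's raw scores (logits) into probabilities with softmax and pick the most likely next token. The vocabulary and scores below are made up for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical scores (logits) a model might produce for a tiny
# vocabulary after reading "Once upon a".
vocab  = ["time", "cat", "mountain", "pizza"]
logits = np.array([6.2, 1.1, 0.3, -0.5])    # made-up numbers

probs = softmax(logits)
next_word = vocab[int(np.argmax(probs))]     # greedy decoding: pick the top one
print(next_word, probs.max())                # "time", with the highest probability
```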
Conclusion:
Vector embedding is the first step in a Transformer where words are converted into dense numerical vectors.
These vectors capture the meaning, context, and relationships between words in a way that the model can process mathematically.
This transformation is like giving each word a unique "address" in a high-dimensional space so similar words are positioned closer together, enabling the model to understand meaning beyond exact spelling.