How Vectors Are Constructed for AI Systems

Chinmay Mhatre
4 min read

As large language models (LLMs) ascend in capability, so too does our capacity to interpret and navigate intricate real-world data. At the very foundation of this transformation lies the vector embedding, a mechanism that transposes diverse inputs such as text, imagery, or code into structured, high-dimensional representations.

This article offers a succinct exploration of how these vectors are formulated, and why they constitute a pivotal element of modern AI infrastructure, from semantic search and intelligent retrieval to large-scale recommendation systems. How are objects converted into vectors, and why should you care? Because vectors are how AI remembers, compares, and searches information efficiently.

Think of them like GPS coordinates for meaning.

What Precisely Is a Vector?

Within the realm of artificial intelligence, a vector is a numerical abstraction: a one-dimensional array of real values (e.g., [2.4, 3.4, 4.6, …]) that serves as a distilled encapsulation of a given input.

These figures are not haphazard. Rather, they exist in a semantic space of considerable dimensionality, wherein proximity denotes semantic affinity. In essence, objects of kindred nature reside nearer to one another, while disparate entities diverge across axes.

Example: Imagine you and your friend both love SRK movies. Now if AI turns your reviews into vectors, they’ll be close to each other, because your tastes are similar!

So in AI, similar meaning = closer vectors.
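
To make this concrete, here is a toy sketch in Python. The three vectors and their "meanings" are invented for illustration (real embeddings come from a trained model), but the geometry is the point: cosine similarity compares direction, so vectors with similar meaning score near 1.0.

```python
# A toy illustration of "similar meaning = closer vectors".
# The numbers are made up; real embeddings come from a trained model.
import numpy as np

your_review   = np.array([2.4, 3.4, 4.6])    # "Loved the new SRK movie!"
friend_review = np.array([2.5, 3.3, 4.7])    # "SRK was brilliant, great film"
weather_note  = np.array([-4.0, 0.1, -2.2])  # "It rained all day in Mumbai"

def cosine_similarity(a, b):
    """1.0 = same direction (similar meaning); lower = less related."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(your_review, friend_review))  # ~0.999 -> similar tastes
print(cosine_similarity(your_review, weather_note))   # ~-0.68 -> unrelated
```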

The Process of Vector Construction

Consider, for instance, a document. When processed through an embedding model (typically a transformer-based architecture), it undergoes semantic compression into a fixed-length vector:

Let’s say you type the prompt “Kantara was a powerful and spiritual film” into ChatGPT.

The embedding model will process this and output something like:

Document ➝ [2.4, 3.4, 4.6, …]

Each constituent value corresponds to a latent attribute, be it sentiment, temporal relevance, structural complexity, or thematic alignment. Modern models, such as LLaMA 3, may utilise embeddings with 4,096 dimensions, allowing for a remarkably nuanced representation.
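
If you want to try this yourself, here is a minimal sketch using the open-source sentence-transformers library. The library and the all-MiniLM-L6-v2 model (which produces 384-dimensional vectors) are this example's choice, not something the article prescribes:

```python
# A minimal sketch of turning text into a vector, assuming the open-source
# sentence-transformers library (pip install sentence-transformers).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small model, 384-dim embeddings
vector = model.encode("Kantara was a powerful and spiritual film")

print(vector.shape)  # (384,)
print(vector[:5])    # first few components of the embedding
```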

Desi Analogy: Imagine a food court. Dosa and Idli sit close together. Butter chicken is somewhere far. Vector embeddings do this: they group similar things, even if you never told them to.

🌐 Visualising Semantic Topography

To conceptualise this, imagine a two-dimensional plane:

  • X-axis: Document length (brief → expansive)

  • Y-axis: Nature (fictive → factual)

Plotted accordingly:

  • Bottom-left: Succinct, imaginative texts (e.g., aphorisms, microfiction)

  • Top-left: Concise, factual entries (e.g., dispatches, briefs)

  • Bottom-right: Lengthy narratives (e.g., novels)

  • Top-right: Comprehensive, non-fictional works (e.g., academic monographs)

AI doesn’t stop at 2 dimensions: it can use hundreds or thousands of them, each capturing one hidden trait (like author tone, date, or even “vibe”!)

Modern Analogy: Think of Instagram filters. Each slider (brightness, contrast, saturation) is a dimension. Vector space is like having 4,096 sliders, each tweaking a hidden quality.

This schema, though illustrative, is vastly simplified. In practice, semantic spaces extend into hundreds or thousands of dimensions, with each axis encoding a subtle but salient feature: authorial tone, readership demographics, ideological leaning, and more.
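
To draw a 2-D map like the one above from real high-dimensional vectors, a common trick is dimensionality reduction. Here is a minimal sketch using scikit-learn's PCA (an assumed tool choice), with synthetic 8-dimensional vectors standing in for two kinds of documents:

```python
# A minimal sketch of flattening high-dimensional embeddings onto a 2-D
# "semantic map" with PCA, assuming scikit-learn. The 8-dimensional
# vectors are synthetic stand-ins for two document clusters.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
aphorisms  = rng.normal(loc=[-2.0, -2.0, 0, 0, 0, 0, 0, 0], scale=0.3, size=(5, 8))
monographs = rng.normal(loc=[ 2.0,  2.0, 0, 0, 0, 0, 0, 0], scale=0.3, size=(5, 8))
embeddings = np.vstack([aphorisms, monographs])

coords = PCA(n_components=2).fit_transform(embeddings)  # 10 points on a plane
print(coords.round(2))  # two visibly separated clusters
```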

Why Vector Embeddings Are Indispensable

  • Semantic retrieval: Identify relevance not through keywords, but through inherent meaning.

  • Natural clustering: Group akin items via emergent structure rather than explicit labels.

  • Context-aware generation: Feed context into LLMs via similarity-based retrieval, as in RAG pipelines (see the sketch after this list).

  • Zero-shot adaptability: Leverage semantic similarity to reason about previously unseen inputs.
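
Here is a minimal sketch of that retrieval step, the lookup behind RAG pipelines. The document names and 4-dimensional vectors are invented for readability; a real system would embed both the query and the documents with the same model:

```python
# A minimal sketch of similarity-based retrieval: rank documents by how
# close they sit to the query in vector space. Vectors are hypothetical.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical document embeddings (4-D for readability)
docs = {
    "Kantara review":   np.array([2.4, 3.4, 4.6, 0.2]),
    "SRK movie review": np.array([2.1, 3.6, 4.4, 0.3]),
    "Dosa recipe":      np.array([-1.0, 0.2, -3.5, 2.8]),
}

query = np.array([2.3, 3.5, 4.5, 0.1])  # embedding of the user's question

# The closest documents become the context handed to the LLM
for name, _ in sorted(docs.items(),
                      key=lambda kv: cosine_similarity(query, kv[1]),
                      reverse=True):
    print(name)
```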

In short, vector embeddings make it possible for machines to internalise and navigate meaning, an essential precondition for intelligent computation.

Why This Matters (For AI, Not Just Nerds)

  • You ask ChatGPT something, and it finds similar questions in its memory using vectors.

  • Netflix recommends shows like the one you just watched? Yep, vectors behind the scenes.

  • Google gives better results when you search in Hinglish, because it understands meaning, not just keywords.

Summary: Vector embeddings help AI understand what we mean, not just what we type.

Key Insights

  • A vector is a numerical proxy for an input’s conceptual identity.

  • Embeddings capture multifaceted semantic dimensions—often imperceptible to humans.

  • Spatial proximity within the vector space signifies semantic proximity.

  • This principle underpins modern AI systems, from intelligent assistants to personalised discovery engines.

Thank you for reading.
