Exploring Vector Databases: A Deep Dive into Vector Search Algorithms


"The real magic of AI lies not just in understanding language, but in remembering it wisely."

🚀 Continuing the Journey: Blog #2 of My AI Engineering Path

In my first blog, I explored how fine-tuning large models can sometimes do more harm than good if not handled carefully. It was the beginning of my transition from Android engineering to Gen AI development.

Now, as I dive deeper into the world of LLMs, RAG, and embeddings, I keep bumping into one major player behind the scenes:

Vector Databases

And trust me, understanding how they work (and how we search inside them) feels like learning the memory tricks of a genius mind.

So, this blog is about all things Vector DBs, including two brilliant search algorithms, ANNoy and HNSW: how they evolved from earlier methods like the K-D Tree, and how they fit alongside other approaches like Inverted File Indexing.
Let's begin!


Why Not Use Traditional Databases?

Traditional databases like MySQL or MongoDB work best with exact matches and structured queries:

  • Want a user's profile with ID 102? Perfect.

  • Need all records from 2022? Simple.

But let’s say you ask a system:

"Give me documents most similar to this paragraph about quantum computing."

That’s not exact. That’s semantic.

In such cases, you don’t want exact matches. You want closest matches, conceptually.

That’s where vector embeddings and vector databases come in.


What is a Vector Database (Vector DB)?

Imagine compressing the meaning of text, images, or audio into a high-dimensional list of numbers, referred to as a vector.

“Quantum computing is the future.” → [0.14, -0.92, ..., 0.58]
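
In practice, an embedding model does this conversion for you. Here's a minimal sketch using the sentence-transformers library (the model name below is just one popular choice I'm using as an example, not a requirement):

```python
# Minimal sketch: turning text into an embedding vector.
# Assumes the sentence-transformers package; "all-MiniLM-L6-v2" is just
# one commonly used model, swap in whatever embedding model you prefer.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vector = model.encode("Quantum computing is the future.")

print(vector.shape)  # (384,) -> a 384-dimensional vector for this model
print(vector[:5])    # the first few numbers of the embedding
```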

Vector DBs store these high-dimensional vectors and allow you to:

  • Add new vectors

  • Search for similar ones

  • Filter by metadata

All while being optimized for fast similarity search, not exact lookups.

Similarity Matters

Most vector DBs use cosine similarity or Euclidean distance to determine which vectors are "close" to a query vector.
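
To make "close" concrete, here's a tiny NumPy sketch of both metrics on made-up vectors:

```python
import numpy as np

# Toy 4-dimensional "embeddings" (made up for illustration)
query = np.array([0.1, 0.9, 0.3, 0.4])
docs = np.array([
    [0.1, 0.8, 0.3, 0.5],   # similar direction -> semantically "close"
    [0.9, 0.1, 0.7, 0.0],   # different direction -> "far"
])

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    return np.linalg.norm(a - b)

for i, d in enumerate(docs):
    print(i, round(cosine_similarity(query, d), 3), round(euclidean_distance(query, d), 3))
# Higher cosine similarity (or lower Euclidean distance) = closer in meaning.
```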


Embedding Compression: Scaling Down Without Losing Meaning

With millions (or billions) of vectors being stored and queried, storage and memory bandwidth quickly become bottlenecks. To solve this, vector databases adopt compression techniques that reduce the size of embeddings while trying to retain their semantic meaning.

Scalar Quantization

  • This is like lowering the quality of an image to save space.

  • Converts high-precision float values (like 32-bit floats) into lower-precision integers (like 8-bit or 16-bit).

  • Each element in the vector is scaled and quantized, reducing its memory footprint.

  • ⚠️ Downside: This may reduce accuracy due to loss of granularity.

👉 Use Case: When space is a constraint and approximate results are acceptable, such as mobile inference or edge device storage.
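
A minimal NumPy sketch of the idea (a real vector DB tunes the quantization ranges much more carefully; this is just to show the mechanics):

```python
import numpy as np

# Scalar quantization sketch: float32 -> uint8 (4x smaller).
vec = np.random.randn(128).astype(np.float32)   # 128 * 4 bytes = 512 bytes

lo, hi = vec.min(), vec.max()
scale = (hi - lo) / 255.0                        # map [lo, hi] onto 0..255

quantized = np.round((vec - lo) / scale).astype(np.uint8)   # 128 bytes
restored = quantized.astype(np.float32) * scale + lo        # approximate values

print("max reconstruction error:", np.abs(vec - restored).max())
```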

Product Quantization (PQ)

  • Think of PQ as intelligent vector compression.

  • Splits vectors into sub-vectors. A long vector (say, 128D) is split into smaller parts (e.g., 4 groups of 32D).

  • Each sub-vector is replaced with an index from a codebook (like dictionary-based compression) — a precomputed list of centroids learned using clustering (like k-means).

  • Dramatically reduces space while preserving distance properties. Instead of storing raw numbers, we store the index of the closest centroid.

Why It Works:

  • PQ enables fast approximate nearest neighbor (ANN) search in compressed space.

  • Distances can be estimated between compressed vectors without full decompression.

📌 Meta’s FAISS uses PQ heavily to scale to billions of embeddings, enabling billion-scale search without breaking RAM limits.
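
To make the mechanics concrete, here's a from-scratch PQ sketch using scikit-learn's k-means. It only illustrates the idea, it is not FAISS's optimized implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

# Product Quantization sketch: 128-D vectors -> 4 sub-vectors of 32-D,
# each replaced by the index of its nearest centroid in a 256-entry codebook.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 128)).astype(np.float32)

n_sub, n_centroids = 4, 256
sub_dim = data.shape[1] // n_sub                   # 32 dimensions per sub-vector

codebooks, codes = [], []
for i in range(n_sub):
    sub = data[:, i * sub_dim:(i + 1) * sub_dim]
    km = KMeans(n_clusters=n_centroids, n_init=4, random_state=0).fit(sub)
    codebooks.append(km.cluster_centers_)          # 256 x 32 codebook for this chunk
    codes.append(km.labels_.astype(np.uint8))      # 1 byte per sub-vector

codes = np.stack(codes, axis=1)                    # 1000 x 4 bytes vs. 1000 x 512 bytes raw
print("compressed codes shape:", codes.shape)

# Approximate reconstruction of the first vector from its 4 codes
approx = np.concatenate([codebooks[i][codes[0, i]] for i in range(n_sub)])
print("reconstruction error:", np.linalg.norm(data[0] - approx))
```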

Real Life Analogy:

  • Scalar Quantization = Compressing a high-res movie to 480p.

  • Product Quantization = Breaking the movie into scenes, assigning each scene a tag, and stitching it back with only the tags.


So How Do We Search These Huge Spaces?

Brute force? Too slow.

Instead, we use Approximate Nearest Neighbor (ANN) algorithms like ANNoy and HNSW, optimized for fast, approximate search in high dimensions.

Let’s unpack them.

A Timeline of Search Algorithms in Vector DBs

The evolution of search techniques has been a story of balancing speed, accuracy, and scalability.

K-D Tree (k-dimensional tree)

  • A binary space-partitioning tree structure.

  • Efficient for low-dimensional data (under 30 dimensions).

  • Splits space based on axis-aligned hyperplanes.
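
If you want to poke at one yourself, SciPy ships a K-D tree. A quick sketch on random 2D points:

```python
import numpy as np
from scipy.spatial import KDTree

# Build a K-D tree over random 2D points and query its nearest neighbors.
rng = np.random.default_rng(42)
points = rng.random((1000, 2))                 # 1000 points in 2D space

tree = KDTree(points)                          # recursive axis-aligned splits

query = np.array([0.5, 0.5])
distances, indices = tree.query(query, k=3)    # 3 nearest neighbors

print("nearest indices:", indices)
print("distances:", distances)
```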

Visualizing the K-D Tree Search Process

The image above illustrates how K-D Trees (k-dimensional trees) work to organize and search in a 2D (or higher dimensional) space.

Left Side: Tree Structure

  • The left side represents the hierarchical binary tree built by recursively splitting the data space.

  • Each node makes a decision based on one coordinate axis (X or Y). For example:

    • At level 0, split by X

    • At level 1, split by Y

    • At level 2, split by X again and so on (alternating).

  • The tree divides the space into quadrants (like top-left or bottom-right), narrowing down the possible location of your query point.

Right Side: Spatial Partitioning

  • The right side shows how the space looks after recursive splits.

  • The numbered green circles are data points, and the orange dot is the query point.

  • Only a subset of nearby nodes is searched by following the tree traversal, drastically reducing comparisons.

What's Happening?

Imagine you want to find the nearest neighbor to the orange dot (query point):

  1. The tree directs the search to a small spatial region.

  2. It then backtracks slightly to nearby nodes if they might contain closer points.

  3. Ultimately, it avoids brute force by ruling out large sections of space.

Trade-off Recap:

  • Works well for low-dimensional data (e.g., 2D, 3D).

  • Fails in high-dimensional spaces due to exponential growth of possible partitions, the “curse of dimensionality.”

  • Accuracy and speed drop drastically as dimensions increase.

➡️ Led to the creation of ANNoy

ANNoy (Approximate Nearest Neighbors Oh Yeah)

🎬 Imagine This:

You’re in a dark maze (Spotify’s song collection). You want to find a room (song) that sounds closest to a song in your head.

ANNoy builds multiple trees where each tree cuts the space differently.

  • Like trying 50 different paths through the maze simultaneously.

  • At query time, it quickly picks the best door in each tree and aggregates results.

Visualizing ANNoy: Random Projection Trees in Action

In this image, we visualize how ANNoy (Approximate Nearest Neighbors Oh Yeah) partitions a high-dimensional vector space (shown here in 2D) using random hyperplanes.

  • The yellow dots represent your stored vectors: think of them as songs, documents, or images turned into vectors.

  • ANNoy begins by randomly selecting two vectors and drawing a line (or hyperplane) between them.

  • This splits the space into two parts, assuming a roughly even distribution of points on either side.

  • It repeats this process recursively, generating multiple partitions until each cluster has a manageable size (below a given threshold).

  • The green region highlights one such cluster, a “leaf node” in the decision tree, which likely contains semantically similar vectors.

Now, when a query (black dot) comes in:

  • ANNoy quickly traverses this decision tree to locate the region (cluster) where the query is most likely to belong.

  • A local search is performed within that region to retrieve the nearest neighbors.

📌 Why it works: Instead of scanning all vectors, ANNoy prunes down the search space quickly — enabling fast, approximate retrieval at scale.
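
Spotify open-sourced Annoy as a small library, and using it looks roughly like this (the dimensionality and tree count below are just illustrative):

```python
import random
from annoy import AnnoyIndex

dim = 64                                   # dimensionality of our toy vectors
index = AnnoyIndex(dim, "angular")         # "angular" is Annoy's cosine-style metric

# Add 10,000 random vectors (stand-ins for song embeddings)
for i in range(10_000):
    index.add_item(i, [random.gauss(0, 1) for _ in range(dim)])

index.build(50)                            # 50 trees: more trees = better recall, bigger index
index.save("songs.ann")                    # the index can be memory-mapped from disk

query = [random.gauss(0, 1) for _ in range(dim)]
neighbor_ids = index.get_nns_by_vector(query, 10)   # 10 approximate nearest neighbors
print(neighbor_ids)
```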

👶 Birth Story:

Developed by Spotify to power music recommendations.

🔻 Limitations:

  • Slower to build and less accurate than newer algorithms at scale.

  • Doesn’t dynamically adapt well to updates or streaming data.

➡️ Motivated the need for hierarchical, graph-based search: HNSW

HNSW (Hierarchical Navigable Small World)

🎬 Movie Analogy:

Imagine Inception-like dream levels: Level 2 is the top level with broad, sparse paths; Level 1 is more connected; Level 0 is full of detail.

  • You fall from top → middle → bottom layer to search.

  • Starts at a sparse, high-level graph (fast search).

  • Refines search at lower levels (denser connections).

  • Think of Google Maps zooming from world → country → city.

Visualizing HNSW: Navigating Through Layers Like Google Maps

This diagram beautifully illustrates how the HNSW algorithm performs fast and efficient vector search by organizing data into multiple layered graphs.

  • Each blue dot is a vector (a document, image, etc.) stored in the database.

  • The green dot is the query vector we’re searching for.

  • The red arrows show the greedy path the algorithm takes to locate the closest vectors.

How It Works:

  1. Top Layer (Layer 2):
    The search starts at a high, sparse layer with long-range connections, great for quickly skipping irrelevant regions.

  2. Middle Layer (Layer 1):
    As we descend, the graph gets denser and the connections become more local, helping us refine our search.

  3. Bottom Layer (Layer 0):
    This is the most detailed level. The algorithm continues searching among the most similar candidates to home in on the nearest neighbors.

📌 Key Insight:
Just like zooming from a world map to a street view, HNSW uses multi-resolution graphs to find answers quickly — with a search time complexity of just O(log N).

This hierarchical structure is what makes HNSW both scalable and accurate, and a favorite in vector search tools like FAISS, Weaviate, and Qdrant.

Core Idea:

  • Builds a multi-layer graph.

  • Uses greedy search on each level.

  • Time complexity = O(log N)

Why It’s Powerful:

  • Super fast even at billion-scale

  • Better accuracy and recall than ANNoy

  • Optimized for both read & write operations
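
Here's roughly what that looks like in code with the hnswlib library (the M and ef values below are illustrative, not tuned recommendations):

```python
import numpy as np
import hnswlib

dim, num_elements = 128, 10_000
data = np.random.random((num_elements, dim)).astype(np.float32)

# Build the multi-layer graph
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_elements, ef_construction=200, M=16)
index.add_items(data, np.arange(num_elements))

index.set_ef(50)        # search-time knob: higher ef = better recall, slower queries

query = np.random.random((1, dim)).astype(np.float32)
labels, distances = index.knn_query(query, k=5)     # greedy descent through the layers
print(labels, distances)
```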

Origin Story:

HNSW was an evolution beyond ANNoy and has since been adopted by many vector search tools.

➡️ As data scales further, even HNSW needs a boost — enter Inverted Indexing.

Inverted File Index (IVF)

What is IVF?

  • Instead of comparing your query to every vector, you group vectors into clusters.

  • During search, you only compare against the closest few clusters.

Think of it as:

  • Dividing a library into topics (clusters)

  • Then searching only in the most relevant shelf (cluster) for your query

How It Works:

  • Performs coarse quantization to divide dataset into k clusters

  • Each cluster (inverted list) stores the vectors closest to that centroid

  • During search, only a few lists are visited

Often used in combination with Product Quantization for extreme scale (as seen in FAISS).

This image shows how Inverted File Indexing (IVF) works by dividing the vector space into clusters — making large-scale search efficient and configurable.

🔵 Blue Regions: Clusters

  • The entire space is split into clusters, each formed using a clustering algorithm like K-Means.

  • Each cluster (blue region) has a centroid, a representative vector (shown in red) that summarizes that region.

🔴 Red Dots: Centroids

  • These are the "addresses" or OIDs (Object IDs) for each cluster.

  • During preprocessing, all document vectors (yellow dots) are assigned to the closest centroid, making lookup faster later.

⚫ The Query:

  • When a user inputs a query (the large black dot), it is first converted to a vector using an embedding model.

  • The algorithm then checks which cluster centroid is closest to the query.

  • In this case, the black query vector falls nearest to a specific cluster, so only that region is searched for the best match.

Adjustable Search Breadth

  • You can configure the number of clusters (K) to search:

    • K = 1 → Search just the closest cluster (faster, less accurate)

    • K = 8 → Search top 8 clusters (slower, but higher recall)

  • This is like deciding how many neighborhoods to search for a missing person: start where they're most likely to be, then expand.

Why Use IVF?

  • Scalable — Works well for millions to billions of vectors

  • Flexible — Balance accuracy vs. speed with K

  • Efficient — Avoids brute-force scanning of the entire dataset

Caveat:
Sometimes, the true closest vector might lie in a different cluster, so IVF is approximate, not perfect. But it’s good enough and blazing fast for most real-world use cases.

📌 IVF is widely used in Meta’s FAISS and many large-scale vector search systems today.
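
Here's a rough sketch of an IVF index in FAISS, including the adjustable search breadth (FAISS calls that knob nprobe; the sizes below are just illustrative):

```python
import numpy as np
import faiss

dim, n_vectors, n_clusters = 64, 100_000, 1024
data = np.random.random((n_vectors, dim)).astype(np.float32)

quantizer = faiss.IndexFlatL2(dim)                  # used to assign vectors to centroids
index = faiss.IndexIVFFlat(quantizer, dim, n_clusters)

index.train(data)                                   # learn the k-means centroids
index.add(data)                                     # place each vector in its inverted list

query = np.random.random((1, dim)).astype(np.float32)

index.nprobe = 1                                    # search only the single closest cluster
distances, ids = index.search(query, 5)

index.nprobe = 8                                    # widen to the top 8 clusters for better recall
distances, ids = index.search(query, 5)
print(ids)
```

FAISS also offers IndexIVFPQ, which layers product quantization on top of IVF, the combination mentioned earlier for extreme scale.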


Why This Blew My Mind

When I first saw how fast vector DBs work, finding semantic answers in milliseconds, I realized:

“This is memory for machines. Structured like a mind palace.”

Imagine if Iron Man’s JARVIS could instantly scan every conversation, file, or image to find what you need - that’s what Vector DBs allow us to build.


Which Should You Use?

  • Low-dimensional, small scale → K-D Tree

  • Fast search, static data → ANNoy

  • Large-scale, real-time → HNSW

  • Billions of embeddings → IVF + PQ (FAISS)

  • Open source + scalable infra → Qdrant / Weaviate

💬 Let’s Keep Growing Together

This is only my second blog in this growing Gen AI journey.

As I learn more, I'll revisit these topics and write deeper follow-ups. I may not get everything right the first time, and I welcome feedback and corrections. 🎯

Let’s build and learn one vector at a time.


Follow me on Medium & Hashnode to keep up with the rest of my journey. If you missed Blog #1, check it here: Fine-Tuning Isn't always fine

The learning continues... 🚀✨
