Vector Search Over PostgreSQL: A Comparative Analysis of Memory and Disk Solutions

Junyu ChenJunyu Chen
11 min read

Introduction: real-world challenges

In the rapidly evolving landscape of LLM and RAG technologies, organizations and developers increasingly recognize that specialized vector databases are not always necessary, and in some cases, using PostgreSQL with vector extensions can be a viable and cost-effective alternative. Instead, extending relational databases with vector search capabilities offers a more versatile and affordable approach to handling structured and unstructured data.

Take PostgreSQL, for example: its extension mechanism has spawned several vector plugins such as pgvector, pgvectorscale, VectorChord (which evolved from pgvecto.rs). However, users often face challenges in implementation and selection. Here are some stories from users:

  • The query takes 30 seconds for the first time and 100ms if it's a repeat query. The problem is very slow IO.

  • Is it normal to take 3 hours to build the index for 12 million rows with 75 dimension vector?

  • After 13 hours of building index for 200 million vectors of 75 dimensions each, I have noticed a heavy memory usage at 100GB/124GB at 30% of building index, and server will eventually crash.

  • We're hitting some challenges, particularly around need for frequent index updates, throughput constraints, lack of parallel index scans and no built-in quantization.

In this article, we'll take a deep dive into these questions and help you make informed decisions by understanding not only the "what" but also the "why" behind these tools. Let's explore how to navigate these complexities and choose the right solution for your needs.

What do we value most in a vector database?

In PostgreSQL, shared_buffers is a crucial memory area acting as the primary cache for data pages, including table data and indexes. By storing frequently accessed vector index blocks in this memory buffer, PostgreSQL significantly reduces slow disk I/O. When a query needs data, PostgreSQL first checks shared_buffers. If the data is present, it's retrieved quickly without disk access. If not, the data is fetched from disk and loaded into shared_buffers for future queries. To configure this, use ALTER SYSTEM SET shared_buffers = 'xGB' in psql and restart the server.

Let's say you have some vectors in a table, after the index is built, it's time to start a query. Even though you have set enough shared_buffers for PostgreSQL, it still needs to read the index from disk for the first query. However, for the repeated query, since the vector can be retrieved directly from shared_buffers, it can be much faster.

While the first query after a cold start or index change matters, sustained performance often depends on how quickly queries execute once the 'hot' or frequently accessed parts of the index are cached in memory (shared_buffers), mimicking the repeated query scenario. Therefore, optimizing for this cached performance is crucial for many real-world applications.

In this blog, we will compare the performance of vector similarity search extensions in these situations:

  • Query performance when memory is sufficient

  • Query performance when memory is low and disk becomes a bottleneck

We will also discuss other aspects related to index building, such as:

  • Index build speed: How long does it take to build the index? If I have multiple cores in the instance, can it use them all to speed things up?

  • Memory usage: How much memory should be reserved for index creation? This is the most important consideration when choosing an instance.

  • Disk usage: How many vectors can be hosted on a given amount of disk space? This is also a consideration when choosing an instance that uses non-expandable NVMe SSD (a type of high-performance solid state drive) storage.

  • Ease of use: Can I use a simple CREATE INDEX command to create an index? How difficult is it to tune for query performance?

Finally, we will discuss the insertion performance after index building is complete. This is important if the user has streamed data that will trigger frequent index updates.

Experimental setup

Here we present information about the hardware and datasets used in all of the following experiments.

Vectors in memoryVectors on diskIndex buildingInsertion performance
Instance typeI4I Extra LargeC6ID LargeI4I Extra LargeI4I Extra Large
Vcpus4244
Storage937 GB NVMe SSD118 GB NVMe SSD937 GB NVMe SSD937 GB NVMe SSD
Memory32.0 GiB4.0 GiB32.0 GiB32.0 GiB
PostgreSQL shared_buffers24.0 GiB2.0 GiB1.0 GiB24.0 GiB
Inserted rows5,000,0005,000,0005,000,000100 after the index is created
Distance metricL2 distanceL2 distance//
Test queries10,00010,000//

In this table, we show the version of all the extensions we use in these experiments:

VersionImage
VectorChordv0.2.2tensorchord/vchord-postgres:pg17-v0.2.2
pgvectorv0.8.0tensorchord/vchord-postgres:pg17-v0.2.2 (with a pgvector v0.8.0 installed)
pgvectorscalev0.6.0timescale/timescaledb-ha:pg17.4-ts2.18.2-oss

Vectors in memory: for small scale data

For a vector search system, if you want the best performance, it's important to make sure that the entire index is cached in shared_buffers, which is more appropriate on a memory-optimized instance. In this situation, the size of shared_buffers should ideal be large enough to hold the entire index.

For our experiment, we choose a medium-sized dataset, LAION-5m, which consists of 5 million vectors with 768 dimensions. To host LAION-5m in shared_buffers, we can use the AWS instance I4I Extra Large, with a 937GB NVMe SSD and 32GB of memory.

On this machine, we set the shared_buffers to 24GB and measure the performance of the first query and then the repeated query. The following is the result of the repeated query performance over pgvector, pgvectorscale and VectorChord.

The following graphs illustrate the trade-off between query speed (QPS) and search quality (Recall@10/100). Recall@10 measures the percentage of true nearest neighbors found within the top 10 results. In general, higher QPS can be achieved at the cost of lower recall, and vice versa. A better system may achieve higher recall at the same QPS, or higher QPS at the same recall level. Our goal is typically to maximize QPS while maintaining a high recall target (e.g., 95%).

Why VectorChord is so fast?
VectorChord's RabitQ index achieves its speed through optimized vector compression, which allows for faster distance calculations. This compression technique minimizes data movement, resulting in efficient hardware utilization and improved query performance.

To measure first query performance, we cleared shared_buffers before running the 10,000 test queries sequentially. Although the buffer starts empty, index/data pages loaded into the cache by earlier queries within the same test run can sometimes be reused by subsequent queries. This can cause the instantaneous QPS to increase during the run. We report the average QPS across all 10,000 queries to provide a representative measure of performance with a cold cache.

In conclusion, VectorChord consistently achieves higher QPS at high recall levels (e.g., >95% Recall@10) compared to the other extensions. When it comes to the other extensions, pgvectorscale performs better for first query, but once data is cached, pgvector will outperform it.

Vectors in disk: for large scale data

There are times when things are different. For billions of vectors, it is not possible to use memory to store the entire index on a single machine. When the size of the vectors to be read exceeds the size of memory, the speedup effect of shared_buffers is greatly reduced, and most of the data is fetched from disk.

While shared_buffers becomes less effective, the operating system's page cache can still provide some caching benefits, although its performance is generally lower than shared_buffers.

Indexing and Migration
VectorChord's internal build and pgvectorscale require additional memory to build the index, so it is not possible to build the index on such a small instance with 4GB of memory. To solve this problem, we build the index on another I4I Extra Large instance with 24GB of memory, and then backup and restore the database data directory from AWS S3 for all extensions.

To illustrate this situation, we can host LAION-5m on a much smaller machine, C6ID Large, with 4GB of memory and set shared_buffers to 2GB. This configuration ensures that memory will be insufficient and most queries will be forced to go to disk.

In these cases, the difference in performance between the first and repeated queries will be much smaller due to insufficient memory. Therefore, it's acceptable to evaluate only the repeated query performance of each extension.

As always, VectorChord consistently achieves higher QPS at high recall levels (e.g., >95% Recall@10) compared to the other extensions, followed by pgvector and pgvectorscale with similar QPS.

Experience in index building

As we discussed earlier, users also care about some index building metrics: Index build time, memory usage, disk usage and usability. We measured these metrics during the CREATE INDEX step of the above experiment on the Laion-5m dataset:

  • Core utilization: Percentage of available CPU cores effectively used during index creation. Higher values indicate better parallelism.

  • Index build time: The time taken by the SQL command to build the index. If the index build requires an additional step (VectorChord external build), this should also be included.

  • Memory usage: Peak memory usage during the index build, including the 1G shared_buffers allocated for PostgreSQL.

  • Disk usage: The size of PGDATA on disk after index building is finished, including vectors and the index.

External build vs internal build in VectorChord
Vectorchord supports two different ways of build indexing, internal build and external build. While internal build is similar to pgvector and pgvectorscale, external build requires an additional k-means clustering. External build usually has better query performance in query and it's the recommended way for indexing large vectors. If you want to know more about external build in VectorChord, please read our previous blog.

All query performance is measured after an external build, on a standalone machine with A10 GPU. If you insist on using the same I4I Extra Large instance for external builds, the index build time will increase to 50 minutes, which is still competitive.

In this case, we set shared_buffers to 1GB and logged the peak memory usage of each extension. Both VectorChord's internal build and VectorChord's external build are set to 25 iters for K-means clustering. For ease of use, pgvector, pgvectorscale and VectorChord's internal build have similar syntax with a single SQL to build the index, while VectorChord's external build is more complicated.

For most aspects, VectorChord's external build gained a leading position, but this partly sacrificed the ease of use, as developers need to implement an additional K-means clustering step outside of PostgreSQL and manage the resulting centroids.

If you want to start with a simple CREATE INDEX command for a straightforward approach, then VectorChord's external build may not be the best choice , you can turn to other extensions / methods:

  • Choose VectorChord's internal build for a faster index build

  • Choose pgvector if you really don't have enough memory

  • Choose pgvectorscale to reduce disk requirements or get better capacity

Insertion performance: for stream-generated data

With streaming vector data, insertion speed can become a new concern. In real-time data analysis systems, the number of new vectors written per second can be very high. Each time new vectors are added to an index, the index itself must be updated. This capability is usually referred to as insertion performance.

Why there is difference in insertion performance between indexes?
For graph-based indexes(pgvector hnsw, pgvectorscale), an index update usually performs multiple searches in the graph and creates new edges for the inserted vector. While clustering-based indexes(VectorChord) just find the right group and add the new vector to it.

So, we set up an experiment to see how well the insertion worked. After building the index for 5 million vectors, we insert another 100,000 random generated vectors into the table one after the other and calculate the insertion time of all.

As you can see, VectorChord (1565 Insert/Sec) performs much better than pgvector (246 Insert/Sec) and pgvectorscale (107 Insert/Sec) when inserting data. For most workloads, 100 Insert/Sec is more than enough. However, if you find that a lot of vectors are continuously inserted in your production (for example, 500 Insert/Sec), this could be a problem.

Summary

Based on the results above, this table shows the differences between the three extensions in various aspects. We can see that the main advantage of VectorChord is concentrated in performance, while pgvector and pgvectorscale have their own advantages in terms of usability and capacity.

If you want to see more detailed differences, this table shows them between the three extensions in various aspects. We can see that the main advantage of VectorChord is concentrated in performance, while pgvector and pgvectorscale have their own advantages in terms of usability and capacity.

pgvectorpgvectorscaleVectorChord
Ease of useEasyEasy, similar to pgvectorEasy (internal), hard (external) with 2-stage build
Max dimension can index20001600016000
Ease of tuningEasyModerate, need manually tune query_rescore for different queriesEasy
Highest Achievable RecallBottom-tierMid-tierTop-tier
QPS of repeated queryMid-tierBottom-tierTop-tier
QPS of first queryMid-tierTop-tierTop-tier
QPS of disk indexMid-tierMid-tierTop-tier
Speed of index buildingMid-tier, support multiple coresBottom-tier, does not support multiple coresTop-tier, support multiple cores
Peak Index Build Memory Usage2.1 GB15.2 GB1.1 GB (external), 9.8 GB (internal)
Total Disk UsageNearly largestSmallestLargest

Want to know more about how VectorChord compares to other extensions? In future blogs, we will explore the differences of PostgreSQL vector extensions in more scenarios, such as bm25 hybrid search (combining traditional keyword search with vector similarity search) and filtered vector search. Ready to try VectorChord for your vector search needs? Download the extension and feel free to join our Discord community or contact us at support@tensorchord.ai.

0
Subscribe to my newsletter

Read articles from Junyu Chen directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Junyu Chen
Junyu Chen