BRIN vs B-tree Indexes in PostgreSQL: A Definitive Guide

John M Harper

In PostgreSQL indexing, B-tree indexes have long been the default choice for most use cases. However, BRIN indexes are often considered a more space-efficient alternative when dealing with multi-billion-row tables. But do they live up to the hype? This post will systematically benchmark B-tree and BRIN indexes in PostgreSQL using various query patterns. We'll also cover essential optimization techniques to maximize the performance of both index types.


📦 Data Setup: A Billion-Row Web Log Table

For our benchmarks, we generated a synthetic web log table with 1 billion rows, mimicking common web server logs. See the creation script for the full column list; the key columns are:

  • created_at: timestamps spanning 180 days

  • status_code: HTTP status codes (e.g., 200, 404, 500)

  • http_method: HTTP methods (e.g., GET, POST)

  • id: UUID values

We created the following indices:

  • brin_web_log: BRIN index on (created_at)

  • btree_web_log_target: B-tree index on (status_code, http_method, created_at)
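The two indexes can be created with statements along these lines (a sketch based on the names above; the downloadable scripts contain the exact definitions):

-- BRIN index: tiny summary of created_at per block range
CREATE INDEX brin_web_log
    ON web_log_brin USING brin (created_at);

-- B-tree index: precise lookups on the full predicate
CREATE INDEX btree_web_log_target
    ON web_log_btree (status_code, http_method, created_at);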

Download the Scripts


🛠️ Benchmarking Queries and Scripts

Note that we created three copies of the web_log table, suffixed _brin, _btree, and _noix (no index), and ran each of the following queries against all three.

Query 1: Last 7 Days of GET Requests with Specific Status Codes

SELECT id, user_ip, path, http_method, status_code, created_at
FROM web_log_brin
WHERE created_at >= NOW() - INTERVAL '7 days'
  AND status_code IN (500, 404)
  AND http_method = 'GET'
ORDER BY created_at DESC
LIMIT 100;

Query 2: Specific Status Code and HTTP Method

SELECT id, user_ip, path, http_method, status_code, created_at
FROM web_log_brin
WHERE status_code = 404
  AND http_method = 'GET';

Query 3: Full Range Scan with Sorting

SELECT id, user_ip, path, http_method, status_code, created_at
FROM web_log_brin
ORDER BY created_at DESC
LIMIT 10000;

Query 4: Large Range Scan (Last 180 Days)

SELECT id, user_ip, path, http_method, status_code, created_at
FROM web_log_brin
WHERE created_at >= NOW() - INTERVAL '180 days';
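One way to capture per-query timings like those in the table below is EXPLAIN (ANALYZE, BUFFERS), which reports execution time and buffer usage (a sketch; the benchmark scripts may measure differently), e.g. for Query 1:

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, user_ip, path, http_method, status_code, created_at
FROM web_log_brin
WHERE created_at >= NOW() - INTERVAL '7 days'
  AND status_code IN (500, 404)
  AND http_method = 'GET'
ORDER BY created_at DESC
LIMIT 100;

The "Execution Time" line at the bottom of the plan gives the elapsed milliseconds.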

Benchmark Results:

| Query | No Index (ms) | B-tree (ms) | BRIN (ms) | Best Performer |
| --- | --- | --- | --- | --- |
| Query 1 - Last 7 Days | 11863.683 | 29.261 | 1472.956 | B-tree |
| Query 2 - 404 + GET | 3166.488 | 11503.168 | 11836.658 | No Index |
| Query 3 - Full Range Scan | 11763.525 | 1996.299 | 15997.571 | B-tree |
| Query 4 - 180 Days Range | 30939.196 | 36847.486 | 34870.932 | No Index |

⚡ Optimization Recommendations

  • For Targeted Queries: Use B-tree for precise lookups and backward scans.

  • For Large Range Scans: Partition by created_at and apply BRIN indexes to each partition.

  • For High Cardinality Data: Consider partial indexes or specialized BRIN configurations with smaller pages_per_range values.

  • Order/Cluster Data: BRIN indexes do not perform well when the data is not physically ordered. I'll put together another script that you can download from my Git in a future entry. (This is blog entry 002.)
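The order/cluster step can be sketched as follows (btree_web_log_created_at is a hypothetical B-tree used only to drive the reordering; CLUSTER takes an ACCESS EXCLUSIVE lock, so schedule it in a maintenance window):

-- One-time physical reorder of the table by created_at
CREATE INDEX btree_web_log_created_at ON web_log_brin (created_at);
CLUSTER web_log_brin USING btree_web_log_created_at;

-- Refresh statistics and BRIN range summaries after the rewrite
VACUUM ANALYZE web_log_brin;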


📖 Further Reading and Documentation

Other Notes:

UUIDs

I used UUID v1 because that method generates time-aware GUIDs; however, you can use UUID v4, which is superior at avoiding key collisions. If you use v4 UUIDs, add a time attribute so the data can be sorted correctly. Also note that with v4 UUIDs you cannot run VACUUM FULL ANALYZE without destroying the critical time order that BRIN indexes rely on. The way around this is to add a time attribute and use that column when clustering the table.
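For reference, v1 UUIDs can be generated with the uuid-ossp extension, and v4 with the built-in gen_random_uuid() (PostgreSQL 13+). The table below is a hypothetical illustration of pairing a random v4 key with an explicit time column:

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE uuid_demo (
    id_v1      uuid DEFAULT uuid_generate_v1(),    -- v1: embeds a timestamp
    id_v4      uuid DEFAULT gen_random_uuid(),     -- v4: random, collision-resistant
    created_at timestamptz NOT NULL DEFAULT now()  -- explicit time column for ordering/clustering
);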

Native v7 UUID generation is not available in every PostgreSQL version. PostgreSQL 17 is a fantastic technology, but if you, like me, have to work a version behind by policy, you cannot use v7 UUIDs. v7 UUIDs carry a time element, just as ULIDs do. I am eager to use v7 as soon as the enterprise I work for moves to PostgreSQL 17.

Pages Per Range

I performed extensive testing with pages_per_range. The documentation warns that the BRIN index grows if pages_per_range is too small; however, I did not see the index size increase much. When I decreased pages_per_range, the index performed better. I have not fully explored this part of BRIN indexing, but I believe it may be key to making BRIN indexes perform faster, as it did for me.
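pages_per_range is set when the index is created (the PostgreSQL default is 128; the 32 below is just an illustrative value):

DROP INDEX IF EXISTS brin_web_log;
CREATE INDEX brin_web_log
    ON web_log_brin USING brin (created_at)
    WITH (pages_per_range = 32);  -- smaller ranges: more precise pruning, slightly larger index

Smaller ranges let the planner skip more heap pages per query, at the cost of more range summaries to store and scan.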

Vacuum Full vs Vacuum

When maintaining PostgreSQL databases, it's crucial to understand how to manage table bloat and reclaim disk space. Two standard maintenance commands are VACUUM ANALYZE and VACUUM FULL ANALYZE. Although they may seem similar, they serve different purposes and have distinct impacts on your database.

🔍 Key Differences

| Aspect | VACUUM ANALYZE | VACUUM FULL ANALYZE |
| --- | --- | --- |
| Function | Marks dead tuples as reusable. Updates table statistics. Does not shrink the table size. | Removes dead tuples and physically rewrites the table. Shrinks the table size significantly. Updates table statistics. |
| Locks | Acquires a SHARE UPDATE EXCLUSIVE lock – allows reads and writes during execution. | Acquires an ACCESS EXCLUSIVE lock – blocks all reads and writes during execution. |
| Disk Space | Does not reduce table size, but reclaims space for future inserts. | Reduces table size by removing dead tuples and compacting data. |
| Index Rebuild | Does not rebuild indexes. Only marks dead tuples for removal. | Rebuilds all indexes, potentially improving index performance. |
| Execution Time | Generally faster, as it only marks dead tuples without rewriting the table. | Slower due to the full table rewrite and index rebuild. |
| Use Cases | Routine maintenance to reduce bloat and improve performance. | Severe table bloat or massive deletions where disk space must be reclaimed. |
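In SQL, the two commands compared above are simply (web_log used as an example table):

-- Routine maintenance: marks dead tuples reusable and refreshes statistics
VACUUM (ANALYZE) web_log;

-- Heavy-weight: rewrites the table and rebuilds its indexes under an ACCESS EXCLUSIVE lock
VACUUM (FULL, ANALYZE) web_log;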

Why Use VACUUM FULL ANALYZE?

While VACUUM ANALYZE is typically sufficient for routine maintenance, there are scenarios where VACUUM FULL ANALYZE is the better choice:

  1. Disk Space Reclamation:

    • Physically shrinks the table size by removing dead tuples and consolidating the remaining tuples.
  2. Enhanced Index Performance:

    • Rebuilding indexes can improve query performance, especially if the table was previously bloated.
  3. Reduced Fragmentation:

    • Compacting the table reduces fragmentation, leading to faster sequential scans.
  4. Improved Visibility Map:

    • The visibility map is updated, helping autovacuum work more accurately.

📌 Best Practice:

Avoid running VACUUM FULL as part of your regular maintenance routine: it locks the table, preventing reads and writes. Instead, reserve it for cases where significant bloat impacts performance or disk space. Running the bloat-detection query below and executing VACUUM FULL only as often as required will keep your database in top shape. If you run large truncate/load, insert, delete, or update transactions against a table, reducing bloat and defragmenting it via VACUUM FULL is a good idea.


🛠️ Finding Tables That Need VACUUM FULL

Here’s a handy query to detect tables with high bloat:

-- do I need to vacuum full?
  SELECT  n.nspname AS schemaname, c.relname AS tablename,
          pg_size_pretty(pg_table_size(c.oid)) AS table_size,
          pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
          (pg_total_relation_size(c.oid) - pg_relation_size(c.oid)) AS bloat_bytes
    FROM  pg_class c
          JOIN pg_namespace n ON n.oid = c.relnamespace
   WHERE  c.relkind = 'r'  -- ordinary tables only
     AND  n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY  bloat_bytes DESC
   LIMIT  10;

This query lists the top 10 tables with the most bloat, helping you prioritize maintenance.

Choosing between VACUUM ANALYZE and VACUUM FULL ANALYZE comes down to your use case. If your goal is routine maintenance, stick with VACUUM ANALYZE. If you need to reclaim disk space and compact a highly bloated table, VACUUM FULL ANALYZE is the right choice. Understanding the differences and best practices allows you to maintain optimal database performance without accidentally locking your tables.


🛠️ Final Thoughts: BRIN or B-tree?

BRIN indexes are a fantastic tool for reducing index size and scan ranges in large, well-ordered datasets. However, when data is not physically ordered or queries are highly targeted, B-tree remains the optimal choice. For mixed workloads, a hybrid approach — combining BRIN for range scans and B-tree for targeted lookups — offers the best balance of performance and storage efficiency.

✅ Credits and References

This post was developed with the assistance of ChatGPT and provides comprehensive insights into PostgreSQL indexing strategies and performance analysis. Stay tuned for more PostgreSQL optimization tips and tricks!
