Everything Everywhere Postgres

Aatir Nadim

The world revolves around data. Economies are driven by data, intelligence is shaped by data, conglomerates rise and fall on data, and, most importantly, social mindsets are constructed on data, whether the consumer realises it or not.

You need to persist the ingested data so you can recall it later for intelligent decision-making, analysis, spotting general trends, or simply serving information.
Data is ultimately stored as binary on storage devices: hard disks, memory cards, optical drives, and so on. Depending on your use case, the traffic these devices must handle, and the sensitivity, vitality and nature of the data, you can represent it in a file system or a database (which provides storage, access and management of the raw data).
You need only deal with this data management system, and need not concern yourself with its internal dealings with the kernel. This is an excellent example of abstraction.

“We, all of us, as an organisation, as a team, as an individual, we all deal in abstraction on some or more levels.”

Based on the factors listed below, you might differentiate between a file system and a Database Management System (DBMS):

Feature / Concern | File System Approach | DBMS Approach
Data Structure | Unstructured or loosely structured. | Schema-on-write: a schema (the "blueprint" of the data) is strictly defined and enforced.
Data Integrity & Rules | None. | High. Integrity is enforced through constraints such as PRIMARY KEY, FOREIGN KEY, UNIQUE and NOT NULL.
Concurrency Control | Multiple processes accessing the same files can cause race conditions and data corruption. | Most databases weigh strongly on consistency and are ACID-compliant.

Further distinctions can be drawn around efficient search algorithms, data relations, security, and so on.

For industry usage, which requires fine-grained security, concurrency management, data consistency and relation mapping, a DBMS is the clear winner.

What is a DBMS?

On a technical level, you can think of a DBMS as an expert negotiator between your application's data needs and the computer's operating system (OS) and hardware.

Its primary goal is to persist and retrieve data from secondary storage (like SSDs or hard disks) in the most robust and efficient way possible (create, maintain and control access). It achieves this through highly optimised conversations with the OS kernel.

A DBMS does not leave data access and manipulation to the kernel itself; it orchestrates how the data is stored, where it is stored, and in what manner (perhaps grouping several writes together). It works with database pointers and write-ahead logs to sync the data and maintain atomicity.

In essence, a DBMS is engineered to be a master of storage efficiency and concurrency control, all in service of the simple, universal need to Create, Read, Update, and Delete data.

Types of DBMS

Asking the right (and complete) questions is half the answer. Once your requirements and your resources are clear, choosing the right DBMS becomes a logical decision.
There is a reason a specific kind of DBMS exists and is preferred over another.

  1. What is the nature of the data? Is it structured like a spreadsheet, a nested document like a JSON file, or a network of relationships like a social graph?

  2. What are the access patterns? Will the application perform many small reads and writes (OLTP - Online Transaction Processing), or a few complex queries across huge datasets (OLAP - Online Analytical Processing)?

  3. What are the scaling and reliability requirements? Does the system need to run on a single, powerful server, or be distributed across hundreds of machines around the world? What is the significance and sensitivity of the data? Does it require strong consistency, high availability, or both?
    This brings in the crucial CAP theorem: in the presence of a network partition, a distributed system can preserve only one of consistency or availability; the popular shorthand is that you can pick only two of the three.

  4. How much work is involved in a single operation? Is an UPDATE a simple value swap, or does it require complex calculations and trigger a cascade of other changes?

This outline is only for general understanding; it helps you comprehend the rationale behind a given DBMS. Why is it the way it is? Could it be made better?
For the purpose of this blog, we will not delve much into the different kinds of DBMS at your disposal. For further details, you may research them on your own. (In fact, I recommend it. In your own time, of course.)

1. Relational Databases (SQL)

Data is stored in strictly defined tables with rows and columns: highly structured, predictable data, where the schema (the blueprint) is king. Strong suits: integrity checks and complex queries.
Examples: PostgreSQL, MySQL, Oracle, SQL Server.
Ventures: Banking, E-Commerce, ERP systems

2. NoSQL Databases (Non-relational)

Volume, Velocity, Variety

The best-known flavour is the document store: semi-structured data held in flexible, self-describing documents, typically JSON or BSON. Perfect for user profiles, product catalogs, and content management where the schema can evolve.
Examples: MongoDB, Couchbase.

3. Key-Value Store

A giant dictionary where every piece of data (the "value") is accessed by a unique "key." The database is agnostic about what the value is.
Examples: Redis, Memcached, Amazon DynamoDB.

4. Wide Column Databases

Stores data in rows that can each carry a large, flexible set of columns grouped into column families; optimised for massive write throughput and horizontal scale rather than for rigid, uniform rows.
Examples: Apache Cassandra, HBase.

5. Graph Databases

Stores data as nodes (entities) and edges (relationships). The relationships are first-class citizens, just as important as the data itself.
Examples: Neo4j, Amazon Neptune.

The Mighty Postgres

The Elephant in the Room

While the "polyglot persistence" model (using many specialized databases within a single application) was gaining traction, PostgreSQL was quietly taking a different path. Instead of forcing developers to stitch together multiple systems, it posed a powerful question: "What if one system could do most of it, and do it exceptionally well?"

From Academic Project to Global Standard

PostgreSQL's story began in academia, which is core to its DNA of correctness and innovation.

  • The Genesis (Ingres): The project's lineage traces back to the Ingres project at the University of California, Berkeley, in the 1970s, led by computer science luminary Michael Stonebraker.

  • The Rebirth (Postgres): In 1986, Stonebraker led a post-Ingres project, aptly named "Postgres," to incorporate concepts of an "object-relational" model, aiming to overcome the limitations of the other RDBMS specimens of that era.

  • The Open-Source Era (PostgreSQL): In 1996, the project was renamed to PostgreSQL to reflect its modern SQL capabilities. It was released under a liberal open-source license, setting the stage for community-driven development.

The Architectural Secret Weapon: Extensibility

Many relational databases are powerful, but PostgreSQL has a fundamental architectural design that sets it apart: extreme extensibility.

This isn't about just adding plugins or extensions as an afterthought. Extensibility is woven into the very fabric of Postgres. It was designed from its "Post-Ingres" days as an object-relational system. This means almost everything inside the database is treated like an object that can be extended:

  • Data Types: Not happy with INTEGER or TEXT? You can create your own custom data types with unique properties and functions.

  • Functions: You can define new functions in various languages (PL/pgSQL, Python, C, etc.) that run inside the database, right next to the data.

  • Operators: You can define your own operators (e.g., a ~= operator for a custom data type); a small sketch follows this list.

  • Index Methods: This is the killer feature. Don't think the standard B-tree index is right for your data? You can define entirely new kinds of indexes. The now-famous GiST (Generalized Search Tree), GIN (Generalized Inverted Index), and other index types are a direct result of this architecture.
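
To make the operator point concrete, here is a small hypothetical sketch: an approximate-equality operator for built-in NUMERIC values (the same mechanism works for custom types). The function name, operator symbol and tolerance are made up purely for illustration.

-- A helper that treats two numerics as "equal" within a small tolerance
CREATE FUNCTION approx_equal(numeric, numeric) RETURNS boolean AS $$
    SELECT abs($1 - $2) < 0.001;
$$ LANGUAGE SQL IMMUTABLE;

-- Bind it to a new operator symbol
-- (older PostgreSQL versions spell the FUNCTION parameter as PROCEDURE)
CREATE OPERATOR ~= (
    LEFTARG  = numeric,
    RIGHTARG = numeric,
    FUNCTION = approx_equal
);

SELECT 19.9999 ~= 20.0;   -- true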

This architectural choice—to be a powerful RDBMS platform rather than just a product—is the single most important reason for its modern-day dominance.

The Power of a True Community and Its Backers

PostgreSQL's governance model is another key differentiator. It is not controlled by a single company.

  • The Community: The PostgreSQL Global Development Group is a diverse, international community of individuals and companies who collaborate on its development. The development process is famously transparent, meritocratic, and conservative, prioritizing stability and correctness over flashy, unstable features. This ensures the project's longevity and neutrality.

  • The Backers: This community-led approach has fostered a vibrant commercial ecosystem. Rather than competing with the core product, companies build businesses around it. Major players like Crunchy Data and EDB (EnterpriseDB) provide enterprise-grade support, tooling, and certified distributions. Furthermore, every major cloud provider—AWS (with RDS and Aurora), Microsoft Azure, and Google Cloud—has made PostgreSQL a first-class, managed offering, pouring immense resources into its operational excellence and making it accessible to everyone.

This combination of an extensible core, a thriving open source community and a robust commercial ecosystem has allowed PostgreSQL to maintain its reputation as the most reliable RDBMS while simultaneously evolving to meet the challenges of the modern data landscape.

What makes it reliable?

To understand why it's so trusted for mission-critical data, we need to look at how it handles three fundamental challenges: managing simultaneous users (concurrency), ensuring data integrity (ACID compliance), and gracefully handling load.

Concurrency

At the most basic level, you would want a dedicated postgres process to handle every client connection, whether concurrent or serial, and that is exactly how Postgres' architecture works.

This process-per-connection model is a classic design with profound implications:

  • Pros (The Fortress):

    • Exceptional Stability: If a single connection's backend process encounters a critical error and crashes, it does not affect any other connections or the main database server.

    • Simplicity & Security: It leverages the OS's mature and secure process management and memory protection, simplifying the database's internal code.

  • Cons (The Cost):

    • Resource Heavy: Each process consumes a non-trivial amount of RAM. A server with thousands of active connections would require an enormous amount of memory, which is inefficient.

    • Connection Overhead: The cost of setting up and tearing down a new connection (the process fork, authentication) is relatively high.

This leads to a practical limit on the number of simultaneous connections a PostgreSQL server can handle, often configured via the max_connections setting (typically in the low hundreds, not tens of thousands).

Enter PgBouncer

You don't necessarily have to tear down a database connection just because its transaction has completed. Within a reasonable time window, you may as well reuse it for another transaction; the previous transaction does not affect the new one in any way. All you really want is the established connection process itself, i.e. the ability to keep making intelligent calls to the OS kernel without paying the setup cost again.

This forms the basis of connection pooling.

Enter the connection pooler, the most famous of which is PgBouncer.

PgBouncer is not part of the core PostgreSQL server; it's a lightweight, highly-optimised external program that sits between your application and the database. It acts like an efficient, intelligent doorman at an exclusive club.
PgBouncer maintains a "pool" of real, persistent connections to the actual PostgreSQL server. Your application, instead of connecting to Postgres directly, connects to PgBouncer (which is very fast and cheap). When your app sends a query, PgBouncer "borrows" an actual connection from its pool, sends the query to Postgres, gets the result, and returns it to your app. Crucially, it then immediately returns the real connection to the pool, making it available for another application's request (like a bucket of tokens).

Most importantly, if all the pooled connections are in use, PgBouncer will make new incoming clients wait in a queue until a connection becomes free (with an element of expiry, of course), which is a far more graceful way to handle load than your application receiving a "too many connections" error.
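
As a sketch of how this looks in practice, a minimal pgbouncer.ini in transaction-pooling mode might resemble the following; the host, file paths and pool sizes are illustrative, not prescriptive.

; pgbouncer.ini (illustrative values)
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432             ; the application connects here instead of 5432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction        ; the server connection returns to the pool after each transaction
default_pool_size = 20         ; real connections kept open to PostgreSQL
max_client_conn = 1000         ; clients allowed to connect and queue on the pooler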

ACID Compliance

The first pillar is Multi-Version Concurrency Control (MVCC): writers don't block readers, and readers don't block writers.

When you UPDATE a row, Postgres does not overwrite the existing row data.
Instead, it creates a new version of that row elsewhere and marks the old version as "expired" for any transactions that begin after this one.
Each transaction gets a "snapshot" of the database at the moment it begins. It can only "see" the row versions that were valid at that time.
This means a long-running SELECT query will continue to see the old, consistent data, completely unaware of any UPDATEs or DELETEs that happen while it's running.

The only time blocking occurs is when two transactions try to modify the exact same row at the same time. A background process called VACUUM is responsible for later cleaning up the old, expired row versions to reclaim space.
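
A minimal sketch of this snapshot behaviour, using a purely hypothetical accounts table and two separate connections. (REPEATABLE READ makes the snapshot last for the whole transaction; the default READ COMMITTED takes a fresh snapshot per statement.)

-- Session 1: start a transaction with a transaction-wide snapshot
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE account_id = 1;   -- say it returns 100

-- Session 2 (a different connection), meanwhile:
UPDATE accounts SET balance = 50 WHERE account_id = 1;   -- writes a new row version, no blocking

-- Session 1, still on its original snapshot:
SELECT balance FROM accounts WHERE account_id = 1;   -- still 100
COMMIT;

-- A fresh transaction in Session 1 now sees the committed change:
SELECT balance FROM accounts WHERE account_id = 1;   -- 50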

Postgres uses a Write-Ahead Log (WAL) for durability: every change is first recorded in a sequential log before the main data files are modified. (A few related settings are sketched after the list below.)

This ensures:

  • Durability: If the server crashes, upon restart, PostgreSQL simply reads the WAL and "replays" any transactions that were logged but hadn't yet been written to the main data files. No committed data is ever lost.

  • Atomicity: A transaction is only considered complete if its COMMIT record is in the WAL. If a crash happens mid-transaction, there's no commit record, and the partial changes are simply discarded during recovery. The transaction either happened completely or not at all.
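
A few ways to peek at this machinery from SQL; these functions and settings exist in modern PostgreSQL (10 and later), though defaults vary by installation:

-- The current write position in the write-ahead log
SELECT pg_current_wal_lsn();

-- How much detail the WAL records (minimal, replica or logical)
SHOW wal_level;

-- Whether COMMIT waits for its WAL record to be flushed to disk
SHOW synchronous_commit;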

Basics

For the purpose of demonstration, we will consider some basic entities and their corresponding relations.

CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    price NUMERIC(10, 2) NOT NULL CHECK (price > 0)
);

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    user_id INT NOT NULL REFERENCES users(user_id),
    product_id INT NOT NULL REFERENCES products(product_id),
    quantity INT NOT NULL CHECK (quantity > 0),
    ordered_at TIMESTAMPTZ DEFAULT now()
);

Querying the data:

SELECT
    u.email,
    p.name,
    o.quantity
FROM orders o
JOIN users u ON o.user_id = u.user_id
JOIN products p ON o.product_id = p.product_id
WHERE p.name = 'Quantum Keyboard';

The Star: JSON Handling

ALTER TABLE products ADD COLUMN attributes JSONB;

-- Now let's add a product with rich, nested attributes
INSERT INTO products (name, price, attributes) VALUES
('Noise-Cancelling Headphones', 249.99,
 '{
    "brand": "AudioPhile",
    "color": "Midnight Blue",
    "wireless": true,
    "specs": {
        "driver_size_mm": 40,
        "battery_life_hrs": 30
    },
    "features": ["Active Noise Cancellation", "Ambient Mode"]
 }');

-- Find all products that are 'Midnight Blue'
-- The `->>` operator extracts a field as text.
SELECT name, price FROM products
WHERE attributes ->> 'color' = 'Midnight Blue';

-- Find all products that have Active Noise Cancellation
-- The `@>` operator checks if the left JSON contains the right JSON.
SELECT name FROM products
WHERE attributes -> 'features' @> '["Active Noise Cancellation"]';

To make these queries lightning-fast on millions of rows, you simply create a GIN (Generalized Inverted Index).

CREATE INDEX idx_products_attributes_gin ON products USING GIN (attributes);

Extensibility:

Custom functions in select languages

-- First, enable the Python language extension (once per database)
CREATE EXTENSION IF NOT EXISTS plpython3u;

CREATE OR REPLACE FUNCTION calculate_discount(price NUMERIC, discount_pct INT)
RETURNS NUMERIC AS $$
    if price is None or discount_pct is None:
        return None
    if not (0 <= discount_pct <= 100):
        raise ValueError("Discount percentage must be within 0 and 100")
    # PL/Python passes NUMERIC arguments in as Python Decimal values,
    # so keep the arithmetic in Decimal rather than mixing in a float
    from decimal import Decimal
    discounted_price = price * (Decimal(100 - discount_pct) / Decimal(100))
    return round(discounted_price, 2)
$$ LANGUAGE plpython3u;

-- Use it directly in a query!
SELECT name, price, calculate_discount(price, 15) AS discounted_price
FROM products;

Elevate Performance:

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(calculate_discount_c);

Datum
calculate_discount_c(PG_FUNCTION_ARGS)
{
    float8 price = PG_GETARG_FLOAT8(0);
    int32 discount_pct = PG_GETARG_INT32(1);
    float8 result;

    result = price * (1.0 - (discount_pct / 100.0));

    PG_RETURN_FLOAT8(result);
}

Note the postgres header file.

Compile and link into a shared library (.so file).

CREATE FUNCTION calculate_discount_c(float8, integer)
RETURNS float8
AS '/path/to/your/library/discounts.so', 'calculate_discount_c'
LANGUAGE C STRICT;
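
Assuming the library compiled and the function was registered as above, it can be called like any built-in; the explicit cast simply matches the float8 signature:

SELECT name, price, calculate_discount_c(price::float8, 15) AS discounted_price
FROM products;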

Custom indices

-- Create an index on the *result* of the lower() function
CREATE INDEX idx_users_email_lower ON users (lower(email));

-- Now, this query becomes incredibly fast, as it can use the index directly!
SELECT user_id, email FROM users
WHERE lower(email) = 'some.user@example.com';

Custom Data Type

The attitude of treating every value as an object pays off here.

-- Define a new composite type for addresses
CREATE TYPE full_address AS (
    street_address TEXT,
    city TEXT,
    postal_code VARCHAR(10)
);

-- Use the new type in a table
ALTER TABLE users ADD COLUMN shipping_address full_address;

-- Insert data using the ROW constructor
UPDATE users
SET shipping_address = ROW('123 Elephant Way', 'Postgresville', '54321')
WHERE user_id = 1;

-- Query the fields using dot notation, just like an object
SELECT
    email,
    (shipping_address).city,
    (shipping_address).street_address
FROM users
WHERE user_id = 1;

This ability to mold the database to the shape of your data—not the other way around—is the hallmark of modern PostgreSQL.

Next, we will explore the incredible results of this extensibility, seeing how PostgreSQL has natively integrated the capabilities of Document, Geospatial, and even AI/Vector databases.

Immense Pluggability

Postgres is a platform built for evolution. Period.

By now, we've seen that PostgreSQL was architecturally designed for extensibility. The reason it has established itself as a multi-model data platform is due to its other, equally important characteristic: it is a truly community-driven, open-source project.

The entire PostgreSQL codebase is open for inspection.

The general process of this evolution is:

  1. A Niche Need Arises: A company needs to store and query highly specialized data, like geospatial coordinates or time-series metrics, and finds the existing tools lacking.

  2. Innovation is Possible: Instead of building a whole new database from scratch, they see that Postgres's extensible architecture provides the "hooks" (like custom index types and data types) to add this new functionality.

  3. Community Development: They develop an extension. Because the project is open, they often release their extension to the public.

  4. Battle-Hardening: Other developers and companies with the same problem discover the extension. They use it, find bugs, contribute patches, and suggest improvements. The extension matures rapidly through collective effort.

  5. Ecosystem Maturity: The extension becomes a de facto standard, trusted by the industry and often backed by commercial and community support.

Have a glance at this site to get the picture: Postgres - Software Catalogue

A Showcase of Game-Changing Extensions

1. PostGIS: The Geospatial Powerhouse

  • Use Case: Turns PostgreSQL into a full-featured, standards-compliant Geographic Information System (GIS). It introduces new data types (geometry, geography) for storing points, lines, and polygons, and hundreds of functions for querying spatial relationships.

  • Origin: Developed initially by Paul Ramsey and Refractions Research, now maintained by a wide group of contributors.

  • Industry Usage: Ubiquitous. It is the de facto standard for storing and querying geospatial data in open-source stacks, logistics and delivery apps, etc.

-- This command loads all the PostGIS types, functions, and operators
-- into the current database.
CREATE EXTENSION IF NOT EXISTS postgis;

CREATE TABLE famous_landmarks (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    -- The 'geography' type is provided by PostGIS.
    -- 'Point' specifies we are storing single points.
    -- '4326' is the Spatial Reference System Identifier (SRID) for WGS 84,
    -- the standard used by GPS systems worldwide.
    location geography(Point, 4326)
);
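
With the table in place, a couple of illustrative rows and a proximity query show the point of it all; the coordinates are approximate and the 5 km radius is arbitrary, but ST_GeogFromText and ST_DWithin are standard PostGIS functions:

INSERT INTO famous_landmarks (name, location) VALUES
    ('Eiffel Tower',  ST_GeogFromText('POINT(2.2945 48.8584)')),
    ('Louvre Museum', ST_GeogFromText('POINT(2.3376 48.8606)'));

-- Find landmarks within 5 km of a point in central Paris (longitude, latitude order)
SELECT name
FROM famous_landmarks
WHERE ST_DWithin(
    location,
    ST_GeogFromText('POINT(2.3522 48.8566)'),
    5000   -- metres, because the column is of type 'geography'
);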

2. TimescaleDB: The Time-Series Titan

  • Use Case: Optimizes PostgreSQL for time-series data—data points indexed by a timestamp. This includes IoT sensor readings, financial market data, and DevOps metrics.

  • Origin: Developed by the company Timescale.

  • Industry Usage: Widespread in IoT, industrial monitoring, FinTech, and observability platforms.

Set timescaledb as an entry in shared_preload_libraries in the postgresql.conf file. This is a special requirement for TimescaleDB, which allows it to hook deeply into PostgreSQL's core.

# In your postgresql.conf file:
shared_preload_libraries = 'timescaledb'

-- Then, connected to your database, create the extension
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- 1. Create the regular SQL table
CREATE TABLE device_metrics (
    -- The time column is essential. TIMESTAMPTZ is recommended.
    time        TIMESTAMPTZ       NOT NULL,
    device_id   TEXT              NOT NULL,
    temperature  DECIMAL(5, 2),
    humidity     DECIMAL(5, 2)
);

-- 2. Convert it into a hypertable, partitioned by the 'time' column. 
-- <Look up hypertable, it is an interesting concept>
SELECT create_hypertable('device_metrics', 'time');

INSERT INTO device_metrics (time, device_id, temperature, humidity) VALUES
    (NOW() - INTERVAL '7 minutes', 'device_A', 24.5, 60.1),
    (NOW() - INTERVAL '6 minutes', 'device_A', 24.6, 60.3),
    (NOW() - INTERVAL '1 minute',  'device_A', 25.1, 61.0),
    (NOW(),                       'device_B', 22.1, 56.0);
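
Once the data is in a hypertable, time-series rollups become one-liners. A sketch using time_bucket(), which TimescaleDB provides for exactly this purpose:

-- Average temperature per device in 5-minute buckets, newest first
SELECT
    time_bucket('5 minutes', time) AS bucket,
    device_id,
    AVG(temperature) AS avg_temp
FROM device_metrics
GROUP BY bucket, device_id
ORDER BY bucket DESC;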

3. pgvector: The AI/ML Enabler

  • Use Case: Adds a new vector data type and the ability to perform efficient similarity searches on it. Used for storing "embeddings"—numerical representations of text, images, or other data.

  • Origin: Developed by Andrew Kane (ankane), a prolific developer in the database space.

  • Industry Usage: Exploding in popularity. It has become one of the go-to solutions for developers building AI-native applications.

CREATE EXTENSION vector;

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    -- The vector type, with 3 dimensions.
    embedding vector(3)
);

INSERT INTO documents (content, embedding) VALUES
    ('An orange is a fruit', '[0.9, 0.1, 0.1]'),
    ('An apple is a fruit', '[0.8, 0.15, 0.2]'),
    ('A car is a vehicle', '[0.1, 0.2, 0.9]');

-- The query vector representing "What is a common fruit?"
SELECT
    id,
    content,
    -- Calculate the cosine distance between the stored embedding and our query vector
    embedding <=> '[0.85, 0.1, 0.15]' AS distance
FROM
    documents
ORDER BY
    -- Order by the distance to find the closest matches first
    distance
LIMIT 5;
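
On millions of rows you would not scan every vector; pgvector also ships index methods for approximate nearest-neighbour search. A sketch (the lists value is just a starting point to tune):

-- An IVFFlat index using cosine distance, matching the <=> operator above
CREATE INDEX idx_documents_embedding
    ON documents USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);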

4. Citus: The Scaling Specialist

  • Use Case: Transforms PostgreSQL into a distributed, horizontally scalable database by sharding tables across many nodes (a minimal sketch follows this list).

  • Origin: Developed by Citus Data, which was later acquired by Microsoft.

  • Industry Usage: Heavily used in multi-tenant SaaS applications (where each customer's data can be sharded), real-time analytics dashboards, and any application needing to process terabytes of data with high throughput.
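
A minimal sketch of the core Citus workflow, assuming a coordinator node with workers already registered; create_distributed_table() is the central API, and the table and column names here echo the earlier examples:

-- On the coordinator (citus must also be in shared_preload_libraries)
CREATE EXTENSION citus;

-- Shard the orders table across the worker nodes by user_id
SELECT create_distributed_table('orders', 'user_id');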

5. pg_stat_statements: The Performance Detective

  • Use Case: An operational extension rather than a new data type. It records normalized execution statistics (call counts, total and mean execution time, rows returned) for every statement the server runs (a sample query follows this list).

  • Origin: Part of the official PostgreSQL "contrib" modules, meaning it's developed and maintained by the core PostgreSQL community.

  • Industry Usage: Universal. It is considered an essential, non-negotiable tool for any production PostgreSQL database. DBAs and developers use it as the first step in diagnosing performance problems.
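
A sketch of how it is typically used; the extension must be preloaded, and the column names below are those of PostgreSQL 13 and later (older releases call them total_time and mean_time):

# In your postgresql.conf file:
shared_preload_libraries = 'pg_stat_statements'

-- Then, in the database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- The five statements consuming the most total execution time
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;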

The PostgreSQL Extension Network (PGXN) hosts hundreds of other such extensions.

Conclusion

Why bother setting up multiple vendors, when you can do it all in one?

This is the general attitude among teams that don't want to, or cannot prioritise, explicitly configuring polyglot persistence. For specialised software this may not even make sense, since for the CRUD of a given kind of data you would want a dedicated service.

The case is similar for Postgres, except that the "multiple other vendors" are simply extensions that can be plugged into a database. Every extension provides a dedicated service for the kind of data it is built around, and thanks to the active open-source community and strong corporate backing, Postgres has evolved into a platform for all your needs. It is the ultimate moat.
It ensures that no matter what the next big trend in data is, a community of brilliant developers is likely already working on an extension to make Postgres master it.
