Modern Database Architecture


In this blog, we will explore the critical database and communication patterns that power today's high-performance applications. From the foundational concepts of pessimistic locking in relational databases to the architectural decisions behind Slack's real-time messaging system, we'll navigate through the complex trade-offs that define modern system design.

Whether you're looking to build a scalable key-value store on top of a traditional SQL backend, understand when to adopt NoSQL solutions versus scaling your relational databases, or implement websocket architectures that can support millions of concurrent connections, this comprehensive guide will provide both the theoretical framework and practical implementation details.

We'll also demystify graph databases, examining their specific use cases, inherent advantages, and situations where alternative solutions prove more effective. Join us as we dissect these interconnected technologies through real-world examples, performance benchmarks, and architectural patterns that can be applied to your next system design challenge.

A. Pessimistic Locking on Relational DBs

1. Relational Databases

  • Data is stored and represented in rows and columns.

  • The key highlight of relational databases is the relations between tables.

  • Relational databases are known for ACID properties.

Database indexes

Indexes make reads faster and writes slower.

2. How do indexes make reads faster?

Indexes dramatically accelerate reads by creating efficient search structures that avoid scanning the entire table:

  1. Organised Data Structures:

    • Indexes typically use B-trees, hash tables, or similar structures optimised for searching

    • These maintain data in a sorted or hashed order for quick lookups

  2. Direct Access Path:

    • Without indexes: Database must scan every row (O(n) complexity)

    • With indexes: Database can jump directly to relevant data (often O(log n) or O(1))

  3. Query Optimisation:

    • Covering indexes can satisfy queries without accessing the table at all

    • The query optimiser uses statistics about indexes to choose efficient execution plans

  4. Real-world Example:

    • Finding "Smith" in a phone book with 100,000 names:

      • Without index: Check each name sequentially (up to 100,000 comparisons)

      • With index: Use alphabetical ordering to find it in ~17 comparisons (log₂(100,000))

  5. Specialised Index Types:

    • Full-text indexes optimise text searches

    • Spatial indexes accelerate geographic queries

    • Composite indexes speed up multi-column conditions

The performance difference can be dramatic—turning queries that would take minutes into ones that complete in milliseconds.
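To see this in action, here is a small self-contained sketch using SQLite (chosen only because it ships with Python; MySQL and PostgreSQL behave analogously with their own EXPLAIN output). The table and column names are made up for the illustration; the point is the query plan flipping from a full table scan to an index search once the index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phone_book (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO phone_book (name, phone) VALUES (?, ?)",
    ((f"person_{i}", f"555-{i:06d}") for i in range(100_000)),
)

# Without an index on name, the planner has no choice but a full table SCAN.
print(conn.execute(
    "EXPLAIN QUERY PLAN SELECT phone FROM phone_book WHERE name = 'person_99999'"
).fetchall())

conn.execute("CREATE INDEX idx_phone_book_name ON phone_book (name)")

# With the index, the plan becomes a SEARCH using idx_phone_book_name (a B-tree lookup).
print(conn.execute(
    "EXPLAIN QUERY PLAN SELECT phone FROM phone_book WHERE name = 'person_99999'"
).fetchall())
```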

3. Why do indexes make writes slower?

When you write to a database with indexes, several additional operations must occur:

  1. Index Updates: For each index on a table, the database must:

    • Calculate where in the index structure the new entry belongs

    • Insert the new entry into the index data structure (B-tree, hash table, etc.)

    • Potentially rebalance the index structure

  2. Write Amplification: What would be a single write operation becomes multiple write operations:

    • One write to the main table

    • Additional writes to each index

  3. Storage Overhead:

    • Each index requires additional disk I/O operations

    • For some storage engines, this means additional random disk seeks

  4. Locking Considerations:

    • Indexes may require additional locks during updates

    • This can impact concurrency in high-throughput systems

It's essentially a classic speed trade-off: the organisation that makes reading faster (pre-computed structures sorted for quick lookups) requires maintenance work during writes to keep those structures updated and properly organised.

4. Database Locking (Pessimistic Locking)

Core idea: You acquire the lock before proceeding.

Typical Flow:

ACQUIRE_LOCK()
    READ / UPDATE    -- critical section
RELEASE_LOCK()

Two types of locking strategies:

  • shared lock

  • exclusive lock

Why do we need locks?

  • To protect the sanity of the data. Here sanity refers to consistency and integrity.

  • Protecting the data against concurrent updates.

  • Risk: Transactional deadlock. The transaction that detects the deadlock kills itself.

Explanation:

  • T1 → R1: T1 is waiting for R1, which is locked by T2.

  • T2 → R2: T2 is waiting for R2, which is locked by T1.

  • R1 → T2: R1 is held by T2.

  • R2 → T1: R2 is held by T1.

  • The cycle T1 → R1 → T2 → R2 → T1 indicates a deadlock.

Let’s say transaction T1 wants to acquire a lock on row R1, but R1 is already locked by transaction T2. T2, in turn, wants to acquire a lock on row R2, which is held by T1, so the two transactions are waiting on each other. Before the database grants a lock on a row to any transaction, it runs a deadlock-detection algorithm that checks whether granting the lock would create such a cycle of waiting transactions. If a deadlock is detected, the database kills one of the transactions so that the cycle is broken and the remaining transactions can proceed.

  T1 -----> R1 (held by T2)
   ^         |
   |         v
  R2 <------ T2
   (held by T1)
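To make the scenario concrete, the sketch below (assuming a local MySQL instance containing a table with rows 1 and 2, plus the mysql-connector-python driver; host, credentials, and the table name are placeholders) runs two transactions that lock the two rows in opposite order. The server detects the cycle and aborts one of them with a deadlock error (MySQL error 1213), exactly as described above.

```python
import threading
import time
import mysql.connector

def locker(first_id, second_id, label):
    # Each transaction locks one row, pauses, then tries to lock the other row.
    conn = mysql.connector.connect(
        host="localhost", user="app", password="secret", database="demo"  # placeholders
    )
    cur = conn.cursor()
    try:
        conn.start_transaction()
        cur.execute("SELECT id FROM seats WHERE id = %s FOR UPDATE", (first_id,))
        time.sleep(1)  # let the other transaction grab its first lock
        cur.execute("SELECT id FROM seats WHERE id = %s FOR UPDATE", (second_id,))
        conn.commit()
        print(f"{label}: committed")
    except mysql.connector.Error as err:
        # One of the two transactions is chosen as the deadlock victim and rolled back.
        print(f"{label}: aborted -> {err}")
        conn.rollback()
    finally:
        conn.close()

t1 = threading.Thread(target=locker, args=(1, 2, "T1"))  # locks R1, then wants R2
t2 = threading.Thread(target=locker, args=(2, 1, "T2"))  # locks R2, then wants R1
t1.start(); t2.start(); t1.join(); t2.join()
```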

5. Shared Locks

  • reserved for read by the current transaction

  • other transactions can read the locked rows

  • other transactions cannot modify the locked rows

  • if the current transaction wants to modify then the locks will be elevated to an exclusive lock

Implementation:

SELECT * FROM table_name FOR SHARE;

6. Exclusive locks

  • reserved for write by the current transaction

  • other transactions cannot read the locked rows via locking reads, nor modify them (plain MVCC snapshot reads may still return the previously committed version, depending on the isolation level)

Implementation:

SELECT * FROM table_name FOR UPDATE;

7. Skip locked

Removes the locked rows from the result set.

Implementation:

SELECT * FROM table_name FOR UPDATE SKIP LOCKED;

8. No wait

A locking read with NOWAIT does not wait for the lock to become available: if the row is already locked, the statement fails immediately with an error, and the application can roll back and retry instead of blocking.

SELECT * FROM table_name FOR UPDATE NOWAIT;

ERROR 3572: Do not wait for lock.
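In application code, NOWAIT is usually paired with a retry: since the statement fails instantly instead of blocking, the caller can back off briefly and try again. A minimal sketch (a generic DB-API connection, the document's table_name placeholder, and a status column are assumptions; the exact exception to catch differs per database, e.g. MySQL error 3572 or PostgreSQL's "lock not available"):

```python
import random
import time

def update_with_nowait(conn, row_id, new_status, max_attempts=5):
    # Try to lock the row without waiting; on lock failure, back off with jitter and retry.
    for attempt in range(max_attempts):
        cur = conn.cursor()
        try:
            cur.execute("SELECT * FROM table_name WHERE id = %s FOR UPDATE NOWAIT", (row_id,))
            cur.execute("UPDATE table_name SET status = %s WHERE id = %s", (new_status, row_id))
            conn.commit()
            return True
        except Exception:  # narrow this to the driver's "lock not available" error in real code
            conn.rollback()
            time.sleep(random.uniform(0.01, 0.05) * (attempt + 1))
    return False
```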

9. Design: Airline Check-in system

To understand the above concepts we will be designing an airline check-in system, here are the requirements:

  • Multiple airlines

  • Every airline has multiple flights

  • Each flight has 120 seats

  • Each flight has multiple trips

  • User books a seat in one trip of a flight

  • Handle multiple people trying to pick seats on the flight

airlines

| id | name |
| --- | --- |
| 1 | AIRINDIA |
| 2 | INDIGO |
| 3 | GOAIR |

flights

| id | airline_id | name |
| --- | --- | --- |
| 1 | 1 | AIRINDIA 101 |
| 2 | 1 | AIRINDIA 279 |
| 3 | 2 | INDIGO 6E101 |

trips

| id | flight_id | fly_time |
| --- | --- | --- |
| 1 | 1 | 08/05/2025 10:00 |
| 2 | 1 | 09/05/2025 10:00 |
| 3 | 2 | 08/05/2025 09:00 |
| 4 | 2 | 09/05/2025 12:00 |
| 5 | 3 | 08/05/2025 07:00 |

users

| id | name |
| --- | --- |
| 1 | Ruban |
| 2 | Shikhar |
| 3 | Shrey |
| 4 | Ram |
| 5 | Krishna |

seats

| id | name | trip_id | user_id |
| --- | --- | --- | --- |
| 1 | 1A | 1 | 1 |
| 2 | 1B | 1 | 2 |
| 3 | 1C | 1 | 3 |
| 4 | 1D | 1 | 4 |
| 5 | 1E | 1 | 5 |

Locking demonstration:

How do we handle 120 passengers on the same trip of a flight all trying to pick seats at the same time?

Fixed inventory + Contention → Locking

Similar systems: CoWin, IRCTC, BookMyShow, Flash Sale

-- Selecting the first available seat and exclusively locking it
SELECT id, name, trip_id, user_id FROM seats
WHERE trip_id = 1 and user_id IS null
ORDER BY id LIMIT 1
FOR UPDATE;

Let’s say we have 120 threads that are concurrently scheduled to run the above transaction, each thread wants to book a seat for a user. After execution gets completed, we see that all the seats got booked.
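Each of those threads runs a transaction roughly like the sketch below (DB-API style; connection setup is omitted, and the column names follow the seats table above). The essential property is that the SELECT ... FOR UPDATE and the UPDATE commit together, so a locked seat can never be assigned twice.

```python
def book_seat(conn, trip_id, user_id):
    """Book the first free seat on a trip; returns the seat id, or None if sold out."""
    cur = conn.cursor()
    try:
        # Lock the first unassigned seat; concurrent bookers queue up behind this lock.
        cur.execute(
            """SELECT id FROM seats
               WHERE trip_id = %s AND user_id IS NULL
               ORDER BY id LIMIT 1
               FOR UPDATE""",
            (trip_id,),
        )
        row = cur.fetchone()
        if row is None:
            conn.rollback()
            return None  # no seats left
        seat_id = row[0]
        cur.execute("UPDATE seats SET user_id = %s WHERE id = %s", (user_id, seat_id))
        conn.commit()  # releases the lock, waking the next waiting transaction
        return seat_id
    except Exception:
        conn.rollback()
        raise
```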

10. Concurrency Control Mechanism

The system successfully allocates exactly 120 seats to 120 users through the following process:

  1. Lock Acquisition Phase

    • All 120 transactions concurrently execute the SELECT ... FOR UPDATE statement

    • One transaction (T₁) successfully acquires an exclusive lock on the first available row

    • The remaining 119 transactions are blocked, waiting for this lock to be released

  2. Resource Modification Phase

    • Transaction T₁ updates the row, setting user_id to its passenger identifier

    • T₁ commits the transaction, which releases the lock on the modified row

  3. Lock Release and Notification Phase

    • The database notifies waiting transactions that the lock has been released

    • The waiting transactions are awakened and compete for the next available row

  4. Sequential Resolution

    • Another transaction (T₂) acquires a lock on the next available row

    • Steps 2-3 repeat with T₂

    • This process continues sequentially until all 120 transactions have completed

  5. Final State

    • All 120 seats have been allocated to 120 different users

    • No overbooking occurs despite the concurrent execution attempts

    • All 120 transactions complete successfully

11. Key Database Mechanisms in Play

  1. Row-Level Locking: The FOR UPDATE clause ensures exclusive access to the selected row

  2. Transaction Isolation: Each transaction operates independently within its isolation level

  3. Lock Management: The database maintains a queue of waiting transactions

  4. Transaction Scheduling: After a lock release, the database scheduler determines which waiting transaction proceeds next

This approach ensures that even with massive concurrency, each seat is allocated exactly once, maintaining the integrity of the booking system.

12. Performance Tuning: The Magic of SKIP LOCKED

When designing high-concurrency systems like our airline check-in platform, performance bottlenecks often emerge from lock contention. One particularly powerful but underutilised feature for addressing this issue is SQL's SKIP LOCKED clause. Let's examine how a simple modification to our seat allocation query can yield dramatic performance improvements.

The Bottleneck Problem

With our original query, when 120 concurrent transactions execute:

  1. The first transaction acquires a lock on the first available row

  2. The remaining 119 transactions all wait in line for this specific row

  3. After the first transaction completes, the second transaction processes the same row, discovers it's now taken, and must re-evaluate the query

  4. This pattern continues, creating a sequential processing situation despite concurrent attempts

This results in what we call "lock convoy" behaviour - threads lining up for the same resource, dramatically reducing throughput.

The SKIP LOCKED Solution

By modifying our query to include SKIP LOCKED:

SELECT id, name, trip_id, user_id FROM seats 
WHERE trip_id = 1 AND user_id IS NULL 
ORDER BY id LIMIT 1 
FOR UPDATE SKIP LOCKED;

We completely transform the concurrent behaviour:

  1. Transaction A attempts to lock row 1

  2. Simultaneously, Transaction B skips row 1 (since it's locked) and locks row 2

  3. Transaction C skips rows 1 and 2 and locks row 3

  4. And so on...

Instead of a queue of transactions waiting for the same row, we have transactions working in parallel on different rows.
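The booking routine only needs the extra locking clause plus a small retry, because an empty result can now mean either "sold out" or "every free seat is locked at this instant". A sketch under the same assumptions as the booking function above:

```python
def book_seat_skip_locked(conn, trip_id, user_id, max_attempts=3):
    """Book the first seat that is both free and not currently locked by another transaction."""
    for _ in range(max_attempts):
        cur = conn.cursor()
        cur.execute(
            """SELECT id FROM seats
               WHERE trip_id = %s AND user_id IS NULL
               ORDER BY id LIMIT 1
               FOR UPDATE SKIP LOCKED""",
            (trip_id,),
        )
        row = cur.fetchone()
        if row is None:
            conn.rollback()
            continue  # all free seats were locked at this instant (or none remain); retry
        cur.execute("UPDATE seats SET user_id = %s WHERE id = %s", (user_id, row[0]))
        conn.commit()
        return row[0]
    return None
```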

Performance Improvement: 90% Reduction in Processing Time

Our benchmarks show this simple change reduces total processing time by approximately 90% when handling 120 concurrent seat bookings:

| Approach | Time to Process 120 Concurrent Bookings |
| --- | --- |
| FOR UPDATE | ~1.79 s |
| FOR UPDATE SKIP LOCKED | ~147.47 ms |

This dramatic improvement occurs because:

  1. Parallel Processing: Multiple transactions can proceed simultaneously instead of waiting in a queue

  2. Reduced Lock Wait Time: Transactions don't waste time waiting for locks on rows they'll never use

  3. Optimised Resource Utilisation: Database resources are used more efficiently with distributed locks

Implementation Considerations

When implementing SKIP LOCKED, keep in mind:

  • It's available in PostgreSQL 9.5+, MySQL 8.0+, and Oracle databases

  • It slightly changes the semantics of your query (it will skip locked rows rather than wait for them)

  • It may not be appropriate for all use cases (e.g., when processing must occur in a specific order)

For high-concurrency systems like our airline check-in where order of seat allocation isn't critical, SKIP LOCKED represents one of the most impactful performance optimisations available. This small change transforms a sequential bottleneck into a parallel processing opportunity, delivering near-linear scalability as concurrency increases.

Remember: Sometimes the most significant performance gains come not from hardware upgrades or complex code refactoring, but from understanding and properly utilising the database features designed specifically for your concurrency pattern.

B. Designing a scalable SQL-backed Key-Value Store

Building a scalable key-value store on top of a relational database combines the familiar ACID guarantees of SQL with the simplicity of NoSQL interfaces. This hybrid approach allows us to leverage existing database infrastructure while providing the flexibility and performance characteristics that modern applications demand. Let's walk through designing a single-node KV store that can serve as the foundation for distributed systems like Apache Ignite.

1. System Requirements

Our KV store must support four fundamental operations with infinite scalability potential:

  • GET: Retrieve value by key

  • PUT: Store/update key-value pairs

  • DELETE: Remove key-value pairs

  • TTL: Time-based expiration of keys

2. Storage Layer Design

Initial Schema Structure

We start with MySQL as our storage backend, designed for easy horizontal scaling as demand grows:

CREATE TABLE store (
    key VARCHAR(255) PRIMARY KEY,
    value TEXT,
    ttl BIGINT,           -- Absolute expiration time in epochs
    is_deleted BOOLEAN DEFAULT FALSE
);

Storage Optimisation: Eliminating Redundant Columns

The is_deleted column introduces unnecessary storage overhead. Instead, we can use a special TTL value to indicate deletion:

CREATE TABLE store (
    key VARCHAR(255) PRIMARY KEY,
    value TEXT,
    ttl BIGINT            -- Absolute expiration time in epochs OR -1 for soft deletion
);

This optimisation:

  • Reduces the per-row storage footprint by dropping one of the four columns

  • Simplifies queries by using a single condition check

  • Maintains the same functional behaviour

Hard deletion is handled through batch cleanup processes that run periodically, minimising I/O operations and database rebalancing overhead.

3. Core Operations Implementation

PUT Operation: Leveraging Database Upserts

Rather than implementing separate INSERT and UPDATE logic, we use database-native upsert operations:

MySQL Implementation:

REPLACE INTO store VALUES (?, ?, ?);

PostgreSQL Implementation:

INSERT INTO store VALUES (?, ?, ?) 
ON CONFLICT (key) 
DO UPDATE SET value = EXCLUDED.value, ttl = EXCLUDED.ttl;

This approach eliminates race conditions and reduces the complexity of handling existing vs. new keys.

4. Handling Concurrent Updates

For scenarios requiring strong consistency guarantees during updates, we implement pessimistic locking:

-- Acquire lock with immediate failure if unavailable
SELECT * FROM store WHERE key = ? FOR UPDATE NOWAIT;
UPDATE store SET value = ?, ttl = ? WHERE key = ?;

The choice between NOWAIT and SKIP LOCKED depends on your consistency requirements:

  • NOWAIT: Fails fast if lock unavailable (strong consistency)

  • SKIP LOCKED: Continues with next available resource (eventual consistency)

5. TTL Implementation: Three Strategic Approaches

Approach 1: Batch Deletion with CRON Jobs (Disk-based Databases)

Ideal for traditional disk-based storage systems:

-- Periodic cleanup job
DELETE FROM store WHERE ttl <= UNIX_TIMESTAMP(); -- also removes soft-deleted keys (ttl = -1)

Key optimisation: All GET operations must filter expired keys:

SELECT value FROM store WHERE key = ? AND ttl > UNIX_TIMESTAMP();

This server-side filtering saves:

  • Network I/O bandwidth

  • Client-side processing overhead

  • Unnecessary data transfer for large values

Approach 2: Lazy Deletion (In-memory Databases)

Perfect for memory-constrained environments:

  • Delete expired keys only when accessed

  • Immediate memory reclamation (similar to free() in C)

  • Backup CRON job handles never-accessed keys

Approach 3: Random Sampling & Deletion (Redis style Optimisation)

Based on statistical sampling principles:

def cleanup_expired_keys():
    # Redis-style active expiry. randomly_select_keys_with_ttl, delete_keys and now()
    # are backend-specific placeholders.
    while True:
        sample = randomly_select_keys_with_ttl(20)  # sample size motivated by the Central Limit Theorem
        if not sample:
            break
        expired_keys = [k for k in sample if k.ttl <= now()]
        delete_keys(expired_keys)

        if len(expired_keys) / len(sample) <= 0.25:
            break  # fewer than 25% expired suggests the overall population is mostly clean

This approach leverages the Central Limit Theorem: if our sample has <25% expired keys, the overall population likely has <25% expired keys.

6. DELETE and GET Operations

Soft Delete Implementation

UPDATE store SET ttl = -1 WHERE key = ? AND ttl > UNIX_TIMESTAMP();

Benefits:

  • Reduces immediate disk I/O load

  • Maintains referential integrity during high-traffic periods

  • Allows for potential key recovery before hard deletion

Batch Cleanup Process

DELETE FROM store WHERE ttl <= UNIX_TIMESTAMP(); -- Handles both expired and soft-deleted keys

Optimised GET Operation

SELECT value FROM store WHERE key = ? AND ttl > UNIX_TIMESTAMP();

This single query handles both key existence and expiration checks efficiently.
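Putting the PUT/GET/DELETE/TTL pieces together, here is a compact single-node sketch. It uses SQLite purely so it runs out of the box; on MySQL you would use REPLACE INTO and UNIX_TIMESTAMP() as shown earlier. The far-future sentinel for keys stored without a TTL is an assumption added for this example.

```python
import sqlite3
import time

class KVStore:
    """Single-node sketch of the GET/PUT/DELETE/TTL semantics described above (SQLite backend)."""

    NO_EXPIRY = 2**62  # far-future epoch for keys without a TTL (an assumption of this sketch)

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS store (key TEXT PRIMARY KEY, value TEXT, ttl INTEGER)"
        )

    def put(self, key, value, ttl_seconds=None):
        # ttl holds the absolute expiry epoch, or -1 once the key is soft-deleted.
        expires_at = int(time.time()) + ttl_seconds if ttl_seconds else self.NO_EXPIRY
        self.db.execute(
            "REPLACE INTO store (key, value, ttl) VALUES (?, ?, ?)", (key, value, expires_at)
        )
        self.db.commit()

    def get(self, key):
        row = self.db.execute(
            "SELECT value FROM store WHERE key = ? AND ttl > ?", (key, int(time.time()))
        ).fetchone()
        return row[0] if row else None

    def delete(self, key):
        # Soft delete: hard removal happens later in cleanup().
        self.db.execute("UPDATE store SET ttl = -1 WHERE key = ?", (key,))
        self.db.commit()

    def cleanup(self):
        # Batch job: removes both expired and soft-deleted keys.
        self.db.execute("DELETE FROM store WHERE ttl <= ?", (int(time.time()),))
        self.db.commit()

kv = KVStore()
kv.put("session:abc", "user123", ttl_seconds=60)
print(kv.get("session:abc"))   # -> user123
kv.delete("session:abc")
print(kv.get("session:abc"))   # -> None
```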

7. Performance Characteristics

This design achieves high performance through several key optimisations:

  1. Reduced Network Overhead: Server-side expiration filtering

  2. Optimised Storage: Elimination of redundant columns

  3. Concurrent Safety: Strategic use of database locking mechanisms

  4. Efficient Cleanup: Multiple TTL strategies based on storage medium

  5. Native Database Features: Leveraging upserts and atomic operations

The foundation we've built here scales vertically within a single node and provides the architectural patterns necessary for horizontal scaling across distributed systems.

8. High level architecture

As our single-node KV store reaches capacity limits, we need to scale horizontally. The scaling strategy follows a deliberate progression based on traffic patterns and load characteristics. Now, we'll explore how these principles extend to multi-node architectures and the additional complexities of distributed consensus and data partitioning.

Application Layer Scaling

The first scaling bottleneck typically occurs at the application layer. Since our KV store operations are stateless, we can add multiple API servers behind a load balancer:

Load Balancer → [API Server 1, API Server 2, API Server 3, ...] → MySQL

Each API server handles the same operations (GET, PUT, DELETE, TTL) independently, with all servers connecting to the same MySQL instance.

Read Scaling with Replicas

When MySQL becomes the bottleneck and we observe a high read-to-write ratio, we introduce read replicas:

Conditions for Adding Read Replicas:

  • Read-to-write ratio is approximately 99:1

  • Application can tolerate eventual consistency for read operations

  • Master database CPU/memory is saturated primarily by read queries

Architecture with Read Replicas:

Load Balancer → [API Servers] → Master (Writes) 
                             ↘ Replica 1 (Reads)
                             ↘ Replica 2 (Reads)
                             ↘ Replica 3 (Reads)

Implementation Strategy:

# Route writes to the master
WRITE_DB_CONNECTION.execute("REPLACE INTO store VALUES (?, ?, ?)")

# Route reads to replicas (round-robin or least-connections)
READ_DB_CONNECTION.execute("SELECT value FROM store WHERE key = ? AND ttl > UNIX_TIMESTAMP()")
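A thin routing layer in the application can hide the split from callers. The sketch below assumes plain DB-API connection objects and a naive round-robin policy; in practice a proxy such as ProxySQL, or driver-level read/write splitting, does the same job.

```python
import itertools

class RoutedStore:
    def __init__(self, master_conn, replica_conns):
        self.master = master_conn
        self.replicas = itertools.cycle(replica_conns)  # simple round-robin over read replicas

    def put(self, key, value, ttl):
        cur = self.master.cursor()
        cur.execute("REPLACE INTO store VALUES (%s, %s, %s)", (key, value, ttl))
        self.master.commit()

    def get(self, key):
        # Reads tolerate replica lag (eventual consistency), so any replica will do.
        replica = next(self.replicas)
        cur = replica.cursor()
        cur.execute(
            "SELECT value FROM store WHERE key = %s AND ttl > UNIX_TIMESTAMP()", (key,)
        )
        row = cur.fetchone()
        return row[0] if row else None
```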

Write Scaling Through Vertical and Horizontal Partitioning

When the master database becomes write-bound despite read replica scaling:

Phase 1: Vertical Scaling

  • Increase CPU cores and memory on the master instance

  • Optimize disk I/O with faster storage (NVMe SSDs)

  • Fine-tune database parameters for write-heavy workloads

Phase 2: Horizontal Partitioning (Sharding)

When vertical scaling reaches economic or technical limits, we implement sharding:

Sharding Strategy: Each master node owns an exclusive fragment of the keyspace, ensuring no data overlap between shards.

def get_shard(key, num_shards):
    return hash(key) % num_shards

def route_request(operation, key, value=None, ttl=None):
    shard_id = get_shard(key, NUM_SHARDS)
    shard_master = SHARD_MASTERS[shard_id]

    if operation == "GET":
        return shard_master.execute("SELECT value FROM store WHERE key = ? AND ttl > UNIX_TIMESTAMP()", [key])
    elif operation == "PUT":
        return shard_master.execute("REPLACE INTO store VALUES (?, ?, ?)", [key, value, ttl])
    elif operation == "DELETE":
        return shard_master.execute("UPDATE store SET ttl = -1 WHERE key = ?", [key])

Complete Sharded Architecture:

                    Load Balancer
                         |
              [API Servers with Consistent Hashing]
                         |
    ┌───────────────────┼───────────────────┐
    |                   |                   |
Shard 1            Shard 2            Shard 3
Master + Replicas  Master + Replicas  Master + Replicas

Trade-offs and Considerations

Read Replicas:

  • Pros: Simple to implement, handles read-heavy workloads effectively

  • Cons: Eventual consistency, replica lag during high write periods

Sharding:

  • Pros: True horizontal scalability for both reads and writes

  • Cons: Increased complexity, cross-shard operations become expensive, rebalancing challenges

This distributed architecture maintains the core principles of our single-node design while addressing the scale requirements of modern applications. Each scaling phase is triggered by specific performance characteristics, ensuring we only add complexity when necessary.

C. NoSQL Databases and their Trade-offs

1. NoSQL

While SQL databases have dominated the data storage landscape for decades, the rise of web-scale applications has brought NoSQL databases into prominence. Understanding when and why to choose NoSQL over traditional relational databases requires examining the fundamental trade-offs each approach offers.

Understanding NoSQL: Beyond Relational Constraints

NoSQL databases store data in non-relational structures, fundamentally changing how we think about data organisation and access patterns. The key distinguishing characteristic isn't the absence of SQL (many NoSQL databases support SQL-like query languages), but rather the departure from rigid relational schemas and ACID guarantees.

The Central Trade-off: Most NoSQL databases achieve massive scalability and high availability by compromising on strong consistency. They typically operate under eventual consistency models, where data changes propagate across distributed nodes over time rather than immediately.

Important Note: The ability to scale isn't exclusive to NoSQL. Any database that can be effectively sharded can achieve horizontal scalability, including traditional SQL databases.

2. Types of NoSQL databases:

  1. Document databases

    Document databases store data as semi-structured documents, typically in JSON format:

     {
       "_id": "user123",
       "name": "John Doe",
       "email": "john@example.com",
       "address": {
         "street": "123 Main St",
         "city": "San Francisco",
         "zipcode": "94105"
       },
       "preferences": ["sports", "technology", "travel"]
     }
    

    Characteristics:

    • Support for complex, nested data structures

    • Flexible schema evolution

    • Partial document updates possible

    • Rich query capabilities including filtering, sorting, and aggregation

    • Closest to relational databases in terms of query flexibility

Examples: MongoDB, CouchDB, Amazon DocumentDB, ElasticSearch

Use Cases: Content management, catalogs, user profiles, real-time analytics

  2. Key-Value Store

    The simplest NoSQL model, storing data as key-value pairs:

     user:123 → {"name": "John", "email": "john@example.com"}
     session:abc → {"user_id": 123, "expires": 1640995200}
     cache:page:home → "<html>...</html>"
    

    Characteristics:

    • Extremely simple access patterns (GET, PUT, DELETE by key)

    • Heavily partitioned and distributed

    • Limited query capabilities (typically only key-based lookups)

    • Optimised for high throughput and low latency

Examples: Redis, DynamoDB, Aerospike, Riak

Use Cases: Caching, session storage, shopping carts, real-time recommendations

  3. Column Oriented Databases

    Column-oriented databases store data by columns rather than rows, optimising for analytical workloads:

    Table: stock_ticks; Columns: symbol, price, name, exchange, timestamp

    Traditional Row-Oriented Storage:

     Row 1: [AAPL, 150.25, Apple Inc, NASDAQ, 2023-01-15]
     Row 2: [GOOGL, 2800.50, Alphabet Inc, NASDAQ, 2023-01-15]
     Row 3: [MSFT, 310.75, Microsoft Corp, NASDAQ, 2023-01-15]
    

    Column-Oriented Storage:

     symbol: [AAPL, GOOGL, MSFT, ...]
     price: [150.25, 2800.50, 310.75, ...]
     name: [Apple Inc, Alphabet Inc, Microsoft Corp, ...]
    

    The Analytics Advantage:

    Consider this analytical query on a stock trading table:

     SELECT AVG(price) FROM stock_ticks WHERE timestamp = '2023-01-15';
    
    • Row-oriented approach: Reads every row in full, then discards every column except price and timestamp

    • Column-oriented approach: Reads only the price and timestamp columns, ignoring all the other columns entirely

This results in dramatically reduced I/O for analytical workloads, often achieving 10x-100x performance improvements for aggregation queries.
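The effect is easy to simulate in plain Python: with a row layout the aggregation touches every field of every row, while with a column layout it touches a single array. This only illustrates the access pattern; in a real column store the savings come from reading fewer disk pages and from per-column compression, not from Python objects.

```python
# Row-oriented: one tuple per tick, every column materialised for every row.
rows = [("AAPL", 150.25, "Apple Inc", "NASDAQ", "2023-01-15") for _ in range(100_000)]
avg_from_rows = sum(r[1] for r in rows) / len(rows)          # walks whole rows

# Column-oriented: one array per column; the query reads only the column it needs.
columns = {
    "symbol":    [r[0] for r in rows],
    "price":     [r[1] for r in rows],
    "name":      [r[2] for r in rows],
    "exchange":  [r[3] for r in rows],
    "timestamp": [r[4] for r in rows],
}
avg_from_columns = sum(columns["price"]) / len(columns["price"])  # touches one column only
```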

Examples: Amazon Redshift, Apache Cassandra, HBase, ClickHouse

Use Cases: Data warehousing, business intelligence, real-time analytics, time-series data

Do read the foundational paper on column-oriented databases:

C-Store: A Column Oriented DBMS

  4. Graph Databases

    Graph databases model data as nodes (entities) and edges (relationships):

     (User:John)-[FRIENDS_WITH]->(User:Alice)
     (User:John)-[PURCHASED]->(Product:Laptop)
     (Product:Laptop)-[BELONGS_TO]->(Category:Electronics)
    

    Characteristics:

    • Native support for complex relationship queries

    • Optimised for traversing connections and paths

    • ACID properties typically maintained

    • Query languages optimised for graph traversal (Cypher, Gremlin)

Compelling Use Case - Fraud Detection: Graph databases excel at detecting fraudulent patterns by analysing relationship networks:

  • Multiple accounts sharing the same device fingerprint

  • Rapid fund transfers through connected accounts

  • Social network analysis for suspicious behaviour patterns

Examples: Neo4j, Amazon Neptune, ArangoDB, TigerGraph, DGraph

Use Cases: Social networks, recommendation engines, fraud detection, network analysis

3. Why NoSQL Databases Scale Effectively

The scalability advantages of NoSQL databases stem from several architectural decisions:

  1. Absence of Complex Relationships: Without foreign keys and JOIN operations, data can be distributed more easily across nodes

  2. Denormalised Data Models: Data is often duplicated across documents/records, eliminating the need for cross-partition operations

  3. Shard-Friendly Design: Data models are specifically designed with partitioning in mind

Critical Clarification: SQL databases can also achieve horizontal scaling through sharding techniques. The distinction lies in design philosophy rather than technical impossibility.

4. Decision Framework: SQL vs NoSQL

Choose SQL When You Need:

  • ACID Guarantees: Strong consistency requirements for financial transactions, inventory management

  • Complex Relationships: Data with intricate foreign key relationships and complex JOIN operations

  • Fixed Schema: Well-defined data structures that benefit from schema enforcement

  • Mature Tooling: Established ecosystem of tools, ORMs, and developer expertise

Choose NoSQL When You Have:

  • Massive Scale Requirements: Need to handle millions of operations per second across distributed infrastructure

  • Flexible Schema Needs: Rapidly evolving data structures or semi-structured data

  • Denormalised Access Patterns: Can optimise data models for specific query patterns

  • Eventual Consistency Tolerance: Application can handle temporary data inconsistencies

5. The Modern Reality

Today's database landscape isn't about choosing sides in an SQL vs NoSQL battle. Many organisations employ polyglot persistence—using the right database for each specific use case. A typical modern application might use:

  • PostgreSQL for transactional data requiring ACID properties

  • Redis for caching and session storage

  • Elasticsearch for full-text search and logging

  • Neo4j for recommendation algorithms

The key is understanding the trade-offs and choosing the technology that best aligns with your specific scalability, consistency, and query requirements.

D. Why are graph databases preferred over relational databases for fraud detection?

Graph databases are often preferred over relational databases for fraud detection due to their ability to efficiently model, store, and query complex relationships and patterns in data, which are central to identifying fraudulent activities. Below are the key reasons why graph databases excel in this domain:

1. Natural Representation of Relationships

  • Graph Databases: Fraud detection relies heavily on analysing relationships between entities (e.g., users, transactions, accounts, or devices). Graph databases store data as nodes (entities) and edges (relationships), making it intuitive to model and traverse connections like "User A sent money to User B" or "Device X was used in multiple suspicious transactions."

  • Relational Databases: Relationships are stored in tables joined by foreign keys, requiring complex and often slow JOIN operations to reconstruct networks. For fraud detection, where relationships may span multiple degrees (e.g., friends of friends), relational databases struggle to efficiently query deep connections.

  • Advantage: Graph databases allow real-time traversal of relationships (e.g., finding patterns like "account shared across multiple users") without the overhead of multiple JOINs, which is critical for detecting fraud patterns like money laundering or account takeovers.

2. Efficient Pattern Matching

  • Graph Databases: They support fast querying of complex patterns using graph traversal algorithms or query languages like Cypher (Neo4j) or Gremlin. For example, detecting a fraud ring (a cycle of transactions between accounts) is straightforward with graph queries that look for cycles or clusters.

  • Relational Databases: Pattern matching requires constructing complex SQL queries with multiple JOINs, subqueries, or recursive CTEs (Common Table Expressions), which are computationally expensive and harder to maintain for intricate fraud patterns.

  • Advantage: Graph databases can quickly identify suspicious patterns, such as circular transactions or shared device usage, which are common in fraud detection.

3. Scalability for Connected Data

  • Graph Databases: Designed to handle highly connected data, graph databases scale well for traversing large networks. They use index-free adjacency, meaning relationships are stored directly with nodes, enabling constant-time traversal regardless of dataset size.

  • Relational Databases: Performance degrades as the dataset grows due to the cost of JOIN operations, especially when analysing deep relationships (e.g., second- or third-degree connections). Indexing helps, but it doesn’t fully address the complexity of fraud networks.

  • Advantage: Graph databases maintain performance for real-time fraud detection, even with large, interconnected datasets, such as tracking transactions across millions of accounts.

4. Flexibility for Dynamic Data

  • Graph Databases: They are schema-less or have flexible schemas, allowing easy addition of new node types (e.g., new fraud indicators like IP addresses) or edge types (e.g., new relationship types like "shared phone number") without altering the database structure.

  • Relational Databases: Schema changes (e.g., adding new tables or columns) are rigid and require migrations, which can be cumbersome when fraud detection models evolve to include new data points.

  • Advantage: Graph databases adapt quickly to new fraud detection requirements, such as incorporating emerging fraud patterns or new data sources (e.g., social media connections).

5. Real-Time Fraud Detection

  • Graph Databases: Their ability to traverse relationships quickly supports real-time analysis, critical for stopping fraud as it happens (e.g., flagging a suspicious transaction during processing). Queries like "find all accounts linked to a flagged IP within three degrees" execute efficiently.

  • Relational Databases: Real-time queries are slower due to the need for multiple JOINs or aggregations, often requiring precomputed views or materialised tables, which can delay fraud detection.

  • Advantage: Graph databases enable faster decision-making in time-sensitive fraud scenarios, such as blocking a transaction before it completes.

6. Uncovering Hidden Patterns

  • Graph Databases: They excel at uncovering hidden or non-obvious relationships, such as fraud rings, sybil attacks (multiple fake identities), or collusive networks, by leveraging graph algorithms like community detection, centrality measures, or shortest paths.

  • Relational Databases: Identifying such patterns requires complex SQL queries or exporting data to external tools for analysis, which is less efficient and harder to integrate into a live system.

  • Advantage: Graph databases provide built-in tools for advanced analytics, making it easier to detect sophisticated fraud schemes.

Example Use Case

In fraud detection, a graph database might represent:

  • Nodes: Users, accounts, transactions, devices, IP addresses.

  • Edges: "owns account," "performed transaction," "used device."

  • Fraud Query: Find all accounts sharing the same device or IP within two degrees of a known fraudulent account. A graph database can execute this query in milliseconds, while a relational database might require multiple JOINs and take significantly longer.
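As a sketch of what that query could look like with the Neo4j Python driver (the node labels, relationship types, and connection details are assumptions, not a fixed schema):

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))  # placeholders

FRAUD_RING_QUERY = """
MATCH (flagged:Account {flagged: true})-[*1..2]-(shared)
WHERE shared:Device OR shared:IP
MATCH (shared)<-[:USED|OWNS]-(suspect:Account)
WHERE suspect <> flagged
RETURN DISTINCT suspect.id AS suspect_account
"""

with driver.session() as session:
    for record in session.run(FRAUD_RING_QUERY):
        print(record["suspect_account"])

driver.close()
```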

When Relational Databases Might Still Be Used

  • Structured Data with Simple Relationships: If fraud detection involves mostly tabular data with minimal relationships (e.g., checking individual transaction amounts against thresholds), relational databases may suffice.

  • Existing Infrastructure: Organisations with mature relational database systems may prefer to stick with SQL-based solutions, using techniques like materialised views or stored procedures, despite performance trade-offs.

  • Small-Scale Needs: For smaller datasets or simpler fraud rules, the overhead of setting up a graph database may not be justified.

Graph databases are preferred for fraud detection because they excel at modelling and querying complex relationships, detecting patterns, and operating in real-time on highly connected data. Their flexibility and performance for network analysis make them ideal for uncovering sophisticated fraud schemes, such as money laundering or identity fraud, compared to the JOIN-heavy, less flexible relational databases.

E. Designing Slack's Realtime Text Communication

Building a realtime messaging system like Slack requires careful consideration of persistence guarantees, scalability patterns, and user experience expectations. Unlike consumer messaging apps, enterprise communication platforms prioritise message durability and formal communication patterns over instant delivery.

1. System Requirements

Our messaging system must support:

  • Multi-user, multi-channel communication with both public channels and direct messages

  • Real-time message delivery with low latency

  • Historical message persistence with efficient scrolling and search

  • Enterprise-grade reliability ensuring no message loss

Similar Systems: This architecture pattern applies broadly to multiplayer games, real-time polls, creator tools, and any system requiring instant bidirectional communication.

2. Database Schema Design

Core Tables Structure

-- User management
CREATE TABLE users (
    id BIGINT PRIMARY KEY,
    name VARCHAR(255),
    email VARCHAR(255) UNIQUE,
    created_at TIMESTAMP
);

-- Channel management (handles both channels and DMs)
CREATE TABLE channels (
    id BIGINT PRIMARY KEY,
    name VARCHAR(255),
    org_id BIGINT,
    type ENUM('group', 'direct_message', 'public_channel', 'private_channel'),
    created_at TIMESTAMP
);

-- Membership tracking
CREATE TABLE memberships (
    user_id BIGINT,
    channel_id BIGINT,
    joined_at TIMESTAMP,
    last_read_message_id BIGINT, -- For read receipts and notifications
    PRIMARY KEY (user_id, channel_id)
);

-- Message storage
CREATE TABLE messages (
    id BIGINT PRIMARY KEY,
    from_user_id BIGINT,
    channel_id BIGINT,
    content TEXT,
    created_at TIMESTAMP,
    message_type ENUM('text', 'file', 'system'),
    INDEX idx_channel_created (channel_id, created_at)
);

Direct Message Implementation

Direct messages are implemented as special channels with exactly two members:

-- Creating a DM creates a channel
INSERT INTO channels (name, type) VALUES ('DM', 'direct_message');
INSERT INTO memberships (user_id, channel_id) VALUES (user1_id, channel_id), (user2_id, channel_id);
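Looking up, or lazily creating, the DM channel for a pair of users is then a membership query: find a channel of type direct_message that both users belong to. A rough DB-API sketch against the schema above (it assumes channels.id is auto-generated by the database):

```python
def get_or_create_dm(conn, user_a, user_b):
    """Return the id of the direct-message channel between two users, creating it if needed."""
    cur = conn.cursor()
    cur.execute(
        """SELECT c.id FROM channels c
           JOIN memberships m1 ON m1.channel_id = c.id AND m1.user_id = %s
           JOIN memberships m2 ON m2.channel_id = c.id AND m2.user_id = %s
           WHERE c.type = 'direct_message'""",
        (user_a, user_b),
    )
    row = cur.fetchone()
    if row:
        return row[0]

    cur.execute("INSERT INTO channels (name, type) VALUES ('DM', 'direct_message')")
    channel_id = cur.lastrowid  # assumes an AUTO_INCREMENT id
    cur.execute(
        "INSERT INTO memberships (user_id, channel_id, joined_at) VALUES (%s, %s, NOW()), (%s, %s, NOW())",
        (user_a, channel_id, user_b, channel_id),
    )
    conn.commit()
    return channel_id
```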

3. Database Scaling Strategy

Given the massive volume of messages in enterprise environments, horizontal scaling is essential:

Sharding by Channel ID

def get_message_shard(channel_id, num_shards):
    return hash(channel_id) % num_shards

def store_message(from_user, channel_id, content):
    shard = get_message_shard(channel_id, NUM_MESSAGE_SHARDS)
    message_db = MESSAGE_SHARDS[shard]

    return message_db.execute("""
        INSERT INTO messages (from_user_id, channel_id, content, created_at)
        VALUES (?, ?, ?, NOW())
    """, [from_user, channel_id, content])

Benefits of Channel-based Sharding:

  • All messages for a channel reside on the same shard

  • Historical message retrieval is efficient (single shard query)

  • Channel-specific operations don't require cross-shard coordination

Database Options:

  • MySQL: Strong consistency, ACID guarantees, familiar tooling

  • Cassandra: Excellent write performance, built-in sharding

  • MongoDB: Document flexibility, good horizontal scaling

4. Message Delivery Architecture

Enterprise vs Consumer Messaging: Different Persistence Guarantees

The choice of message delivery architecture depends entirely on your persistence requirements and user expectations:

1. Slack's Enterprise Approach: Synchronous Persistence

Architecture Flow:

User → HTTPS API Server → Database (Persist) → Message Queue → WebSocket Servers → Recipients

Implementation:

@app.route('/send_message', methods=['POST'])
def send_message():
    # 1. Validate and authenticate user
    user_id = authenticate_user(request.headers.get('Authorization'))

    # 2. Synchronously persist to database
    message_id = database.store_message(
        from_user=user_id,
        channel_id=request.json['channel_id'],
        content=request.json['content']
    )

    # 3. Only after successful persistence, broadcast
    if message_id:
        message_queue.publish('message_broadcast', {
            'message_id': message_id,
            'channel_id': request.json['channel_id'],
            'content': request.json['content'],
            'from_user': user_id
        })

        return {'status': 'success', 'message_id': message_id}
    else:
        return {'status': 'error'}, 500

Why This Approach for Enterprise:

  • Formal Communication: Enterprise users expect message durability

  • Compliance Requirements: Legal and regulatory needs for message retention

  • Lower Frequency: Less frequent messaging allows for synchronous persistence overhead

  • Refresh Reliability: Users can refresh and see all historical messages reliably

2. Consumer Apps Approach: Asynchronous Persistence

Facebook Messenger/WhatsApp Architecture:

User → WebSocket Server → Broadcast (immediate) + Message Queue → Database (async)

Implementation:

@websocket.on('send_message')
def handle_message(data):
    # 1. Immediately broadcast to recipients
    broadcast_to_channel_members(data['channel_id'], data)

    # 2. Asynchronously queue for persistence
    message_queue.publish('persist_message', data)

    # User sees message immediately, persistence happens in background

@message_queue.consumer('persist_message')
def persist_message_async(message_data):
    # Eventual persistence - may fail, but user experience isn't affected
    database.store_message(message_data)

Why This Approach for Consumers:

  • High Frequency: Users send messages very frequently

  • Real-time Priority: Instant delivery more important than guaranteed persistence

  • Mobile-First: Optimised for mobile app experience

  • Eventual Consistency: Users accept occasional message loss for speed

3. Zoom's Ephemeral Approach: No Persistence

Architecture:

User → WebSocket Server → Broadcast (immediate only)

  • Messages exist only during active sessions

  • No database storage whatsoever

  • Optimised for temporary, meeting-based communication

  • Lowest latency possible

5. WebSocket Implementation for Real-time Delivery

Connection Management

class WebSocketManager:
    def __init__(self):
        self.user_connections = {}  # user_id -> [websocket_connections]
        self.channel_subscriptions = {}  # channel_id -> [user_ids]

    def subscribe_to_channel(self, user_id, channel_id, websocket):
        # Add user to channel subscription
        if channel_id not in self.channel_subscriptions:
            self.channel_subscriptions[channel_id] = set()
        self.channel_subscriptions[channel_id].add(user_id)

        # Track user's websocket connection
        if user_id not in self.user_connections:
            self.user_connections[user_id] = []
        self.user_connections[user_id].append(websocket)

    def broadcast_to_channel(self, channel_id, message):
        if channel_id in self.channel_subscriptions:
            for user_id in self.channel_subscriptions[channel_id]:
                if user_id in self.user_connections:
                    for websocket in self.user_connections[user_id]:
                        websocket.send(json.dumps(message))

Message Broadcasting

@message_queue.consumer('message_broadcast')
def broadcast_message(message_data):
    # Get channel members
    members = database.get_channel_members(message_data['channel_id'])

    # Broadcast to all online members
    websocket_manager.broadcast_to_channel(
        message_data['channel_id'],
        {
            'type': 'new_message',
            'message': message_data,
            'timestamp': datetime.now().isoformat()
        }
    )

6. Key Architectural Decisions

  1. Persistence-First Design: Slack prioritises message durability over millisecond-level latency

  2. Channel-Based Sharding: Optimises for historical message retrieval patterns

  3. Hybrid Communication: HTTPS for sending (reliability), WebSockets for receiving (real-time)

  4. Separation of Concerns: API servers handle persistence, dedicated WebSocket servers handle real-time delivery

This architecture scales to millions of concurrent users while maintaining the reliability expectations of enterprise customers. The key insight is matching your persistence guarantees to your user base's expectations and communication patterns.

F. Scaling WebSockets

While REST-based short polling might work for traditional web applications, it creates a jarring user experience for real-time messaging. The constant delay between sending a message and seeing it appear creates friction that breaks the natural flow of conversation. For truly real-time communication like Slack, we need a persistent connection that can push messages instantly to users—enter WebSockets.

1. The WebSocket Foundation

Our approach centers on giving every user exactly one WebSocket connection to our backend infrastructure. This single connection handles all real-time communication needs, from chat messages to notifications to presence updates. This design choice stems from a fundamental browser limitation: most browsers restrict each domain to only 6 concurrent TCP connections. Since WebSockets maintain persistent TCP connections, we must multiplex all real-time features through this single precious connection.

2. Edge Server Architecture

To handle the persistent nature of WebSocket connections at scale, we deploy a fleet of edge servers. These servers act as the first point of contact for user WebSocket connections and serve as the intelligent routing layer for all real-time communication. When any service—whether it's our chat system, notification service, or presence tracker—needs to communicate with users in real-time, the information flows through these edge servers.

Each edge server maintains a local registry of connected users and their communication preferences. Libraries like Socket.IO help manage these connection pools, tracking which users are connected and maintaining the health of each WebSocket connection. When user A sends a message to user B, the edge server orchestrates a three-step process: first, it persists the message to Kafka for durability; then it locates user B in its local connection pool; finally, it delivers the message through B's WebSocket connection.

3. Handling Broadcast Messages

The architecture becomes more complex when dealing with channel messages that need to reach multiple recipients. Consider a channel with 50 members—the edge server needs to efficiently deliver the same message to all connected channel participants. However, real-time delivery isn't always guaranteed due to network issues or temporary disconnections. To handle message delivery failures, we maintain our channel scroll API as a fallback mechanism. When users open a channel, this REST endpoint ensures they receive any messages that might have been missed during real-time delivery.
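That fallback is ordinary cursor-based pagination over the messages table, keyed on message id, so a reconnecting client can ask for everything in a channel before (or after) the last message it already has. A rough Flask-style sketch in the spirit of the earlier API-server snippets; the route, parameters, and the database.fetch_all helper (returning dict rows) are assumptions:

```python
@app.route('/channels/<int:channel_id>/messages', methods=['GET'])
def scroll_channel(channel_id):
    # Cursor-based pagination: fetch messages older than `before_id`, or the latest page.
    before_id = request.args.get('before_id', type=int)
    limit = min(request.args.get('limit', default=50, type=int), 200)

    query = """SELECT id, from_user_id, content, created_at
               FROM messages
               WHERE channel_id = %s {cursor}
               ORDER BY id DESC
               LIMIT %s"""
    if before_id:
        rows = database.fetch_all(query.format(cursor="AND id < %s"),
                                  [channel_id, before_id, limit])
    else:
        rows = database.fetch_all(query.format(cursor=""), [channel_id, limit])

    return {'messages': rows, 'next_before_id': rows[-1]['id'] if rows else None}
```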

4. Horizontal Scaling Challenges

A single edge server quickly becomes a bottleneck. Suppose, for illustration, that each server can handle only four concurrent connections: what happens when a fifth user tries to connect? The natural solution is horizontal scaling—deploying multiple edge servers and distributing users across them. User E might connect to WebSocket Server 2 while users A through D remain on WebSocket Server 1.

This distribution creates a new challenge: cross-server communication. When user A (on Server 1) sends a message to user E (on Server 2), how does the message traverse the server boundary? The naive approach of creating direct TCP connections between every pair of servers creates an unmanageable mesh topology that doesn't scale beyond a handful of servers.

5. Pub/Sub for Server Communication

The elegant solution lies in introducing a real-time pub/sub system, typically Redis. Instead of direct server-to-server connections, we create a publish/subscribe model where each Slack channel corresponds to a Redis pub/sub channel. Edge servers subscribe only to the channels that their connected users participate in, creating an efficient message distribution network.

Consider this scenario: User A belongs to channels C1, C2, and C3; User B belongs to C2, C4, and C5; User C belongs to C1 and C3; User D belongs to C4. When user E joins the system and participates in channel C3, Edge Server 2 automatically subscribes to the C3 pub/sub channel in Redis.

Now when user A sends a message to channel C3, the process flows seamlessly: the message gets persisted asynchronously to our message store for durability; Edge Server 1 immediately delivers the message to user C through their local WebSocket connection; simultaneously, it publishes the message to Redis on the C3 pub/sub channel; Edge Server 2, having subscribed to C3, receives the message and forwards it to user E. This architecture ensures that user E receives user A's message despite being connected to a different edge server.
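With the redis-py client, the subscribe and publish sides of that flow look roughly like the sketch below (the channel-naming scheme and message format are assumptions; websocket_manager refers to the connection manager sketched earlier):

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

# Edge Server 2: subscribe to the Slack channels its connected users belong to.
pubsub = r.pubsub()
pubsub.subscribe("channel:C3")

def pump_messages(websocket_manager):
    # Forward every cross-server message to the locally connected members of that channel.
    for event in pubsub.listen():
        if event["type"] != "message":
            continue
        payload = json.loads(event["data"])
        websocket_manager.broadcast_to_channel(payload["channel_id"], payload)

# Edge Server 1: after delivering locally, publish so other edge servers can fan out too.
def publish_to_channel(channel_id, message):
    r.publish(f"channel:{channel_id}", json.dumps({"channel_id": channel_id, **message}))
```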

6. Handling Transient Failures

Real-time systems must gracefully handle temporary failures. Network hiccups, server restarts, or client-side issues can cause messages to fail delivery through WebSocket connections. Our system addresses this through multiple resilience mechanisms.

We maintain correction jobs that periodically verify message delivery and fill any gaps. Additionally, failed messages are buffered in a separate Kafka topic. When WebSocket connections recover, we replay these buffered messages to ensure users eventually receive all communications, even if not in real-time.
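Using kafka-python, buffering a failed delivery and replaying it when the user reconnects might look like the following sketch (the topic name, message shape, and per-user filtering are assumptions; a production system would key or partition the topic by user rather than scan it):

```python
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda m: json.dumps(m).encode(),
)

def deliver_or_buffer(websocket, message):
    # Try the real-time path first; on failure, park the message for later replay.
    try:
        websocket.send(json.dumps(message))
    except Exception:
        producer.send("failed_deliveries", {"user_id": message["to_user"], "message": message})

def replay_buffered(user_id, websocket):
    # Called when the user's WebSocket reconnects: drain their buffered messages.
    consumer = KafkaConsumer(
        "failed_deliveries",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        consumer_timeout_ms=1000,  # stop iterating once the backlog is drained
        value_deserializer=lambda m: json.loads(m.decode()),
    )
    for record in consumer:
        if record.value["user_id"] == user_id:
            websocket.send(json.dumps(record.value["message"]))
    consumer.close()
```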

7. Complete Architecture Overview

The final architecture brings together several specialised components. Our API servers continue handling non-real-time operations like channel scrolling, managing muted channels, and other traditional REST endpoints. The edge servers focus exclusively on maintaining WebSocket connections and routing real-time messages.

A connection balancer sits at the front of our edge server fleet, making intelligent routing decisions for new WebSocket connections. Rather than randomly distributing users, it attempts to colocate users who frequently communicate on the same edge servers. This optimisation reduces cross-server pub/sub traffic by keeping channel conversations local whenever possible.

The connection balancer also performs periodic rebalancing, strategically terminating connections to redistribute load across edge servers. While this creates temporary disruptions for affected users, it prevents the system from developing hotspots that could degrade performance for entire user cohorts.

This layered approach, combining persistent WebSocket connections, intelligent edge server routing, Redis pub/sub for cross-server communication, and robust failure recovery mechanisms, creates a real-time messaging system that maintains the instant responsiveness users expect while scaling to support millions of concurrent connections.

That's all for now folks. See you in the next blog!
