Modern Database Architecture

Table of contents
- A. Pessimistic Locking on Relational DBs
- 1. Relational Databases
- 2. How do indexes make reads faster?
- 3. Why do indexes make writes slower?
- 4. Database Locking (Pessimistic Locking)
- 5. Shared Locks
- 6. Exclusive locks
- 7. Skip locked
- 8. No wait
- 9. Design: Airline Check-in system
- 10. Concurrency Control Mechanism
- 11. Key Database Mechanisms in Play
- 12. Performance Tuning: The Magic of SKIP LOCKED
- B. Designing a scalable SQL-backed Key-Value Store
- C. NoSQL Databases and their Trade-offs
- D. Why are graph databases preferred over relational databases for fraud detection?
- E. Designing Slack's Realtime Text Communication
- F. Scaling WebSockets

In this blog, we will explore the critical database and communication patterns that power today's high-performance applications. From the foundational concepts of pessimistic locking in relational databases to the architectural decisions behind Slack's real-time messaging system, we'll navigate through the complex trade-offs that define modern system design.
Whether you're looking to build a scalable key-value store on top of a traditional SQL backend, understand when to adopt NoSQL solutions versus scaling your relational databases, or implement websocket architectures that can support millions of concurrent connections, this comprehensive guide will provide both the theoretical framework and practical implementation details.
We'll also demystify graph databases, examining their specific use cases, inherent advantages, and situations where alternative solutions prove more effective. Join us as we dissect these interconnected technologies through real-world examples, performance benchmarks, and architectural patterns that can be applied to your next system design challenge.
A. Pessimistic Locking on Relational DBs
1. Relational Databases
Data is stored and represented in rows and columns.
The key highlight of a relational database is relations. Relational databases are known for their ACID properties.
Database indexes
Indexes make reads faster and writes slower.
2. How do indexes make reads faster?
Indexes dramatically accelerate reads by creating efficient search structures that avoid scanning the entire table:
Organised Data Structures:
Indexes typically use B-trees, hash tables, or similar structures optimised for searching
These maintain data in a sorted or hashed order for quick lookups
Direct Access Path:
Without indexes: Database must scan every row (O(n) complexity)
With indexes: Database can jump directly to relevant data (often O(log n) or O(1))
Query Optimisation:
Covering indexes can satisfy queries without accessing the table at all
The query optimiser uses statistics about indexes to choose efficient execution plans
Real-world Example:
Finding "Smith" in a phone book with 100,000 names:
Without index: Check each name sequentially (up to 100,000 comparisons)
With index: Use alphabetical ordering to find it in ~17 comparisons (log₂(100,000))
Specialised Index Types:
Full-text indexes optimise text searches
Spatial indexes accelerate geographic queries
Composite indexes speed up multi-column conditions
The performance difference can be dramatic—turning queries that would take minutes into ones that complete in milliseconds.
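To make this concrete, here is a minimal sketch (table, column, and index names are illustrative, not from the original design):
-- Hypothetical table with ~100,000 rows
CREATE TABLE people (
  id BIGINT PRIMARY KEY,
  last_name VARCHAR(255),
  email VARCHAR(255)
);
-- Without an index, this query scans every row: O(n)
SELECT * FROM people WHERE last_name = 'Smith';
-- A B-tree index lets the database seek directly to the matching entries: O(log n)
CREATE INDEX idx_people_last_name ON people (last_name);
-- EXPLAIN should now show an index lookup instead of a full table scan
EXPLAIN SELECT * FROM people WHERE last_name = 'Smith';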
3. Why do indexes make writes slower?
When you write to a database with indexes, several additional operations must occur:
Index Updates: For each index on a table, the database must:
Calculate where in the index structure the new entry belongs
Insert the new entry into the index data structure (B-tree, hash table, etc.)
Potentially rebalance the index structure
Write Amplification: What would be a single write operation becomes multiple write operations:
One write to the main table
Additional writes to each index
Storage Overhead:
Each index requires additional disk I/O operations
For some storage engines, this means additional random disk seeks
Locking Considerations:
Indexes may require additional locks during updates
This can impact concurrency in high-throughput systems
It's essentially a classic read/write trade-off: the organisation that makes reading faster (pre-computed structures sorted for quick lookups) requires maintenance work during every write to keep those structures updated and properly organised.
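A quick sketch of that write amplification (table and index names are illustrative): every secondary index below is one more structure the database must update on each insert.
CREATE TABLE orders (
  id BIGINT PRIMARY KEY,
  customer_id BIGINT,
  status VARCHAR(32),
  created_at TIMESTAMP
);
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_status ON orders (status);
CREATE INDEX idx_orders_created ON orders (created_at);
-- This single logical write now updates four structures:
-- the table itself plus the three B-trees above
INSERT INTO orders (id, customer_id, status, created_at)
VALUES (1, 42, 'NEW', NOW());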
4. Database Locking (Pessimistic Locking)
Core idea: You acquire the lock before proceeding.
Typical Flow:
ACQUIRE_LOCK()
READ UPDATE -> Critical Section
RELEASE_LOCK()
Two types of locking strategies:
shared lock
exclusive lock
Why do we need locks?
To protect the sanity of the data. Here sanity refers to consistency and integrity.
Protecting the data against concurrent updates.
Risk: Transactional deadlock. The transaction that detects the deadlock kills itself.
Explanation:
T1 → R1: T1 is waiting for R1, which is locked by T2.
T2 → R2: T2 is waiting for R2, which is locked by T1.
R1 → T2: R1 is held by T2.
R2 → T1: R2 is held by T1.
The cycle T1 → R1 → T2 → R2 → T1 indicates a deadlock.
Let's say transaction T1 already holds a lock on row R2 and now wants to acquire a lock on row R1, which is held by transaction T2. Meanwhile, T2 wants to acquire a lock on row R2. Before the database grants a row lock to any transaction, it runs a deadlock-detection algorithm that checks whether granting the lock would put the database into a deadlock. If a deadlock is detected, the database kills the requesting transaction so that it never acquires the lock, thereby preventing the deadlock.
T1 ------> R1 (held by T2)
 ^          |
 |          v
R2 <------ T2
(held by T1)
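A minimal way to reproduce such a deadlock (using a hypothetical accounts table with rows id = 1 and id = 2 standing in for R1 and R2) is to run two sessions that lock the rows in opposite order:
-- Session 1 (T1)
BEGIN;
SELECT * FROM accounts WHERE id = 2 FOR UPDATE;  -- T1 locks R2
-- Session 2 (T2), in a separate connection
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;  -- T2 locks R1
-- Back in Session 1
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;  -- T1 blocks, waiting for T2
-- Back in Session 2
SELECT * FROM accounts WHERE id = 2 FOR UPDATE;  -- cycle T1 → R1 → T2 → R2 → T1:
                                                 -- the database detects the deadlock and rolls one transaction back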
5. Shared Locks
reserved for read by the current transaction
other transactions can read the locked rows
other transactions cannot modify the locked rows
if the current transaction wants to modify the locked rows, the shared lock is elevated to an exclusive lock
Implementation:
SELECT * FROM table_name FOR SHARE;
6. Exclusive locks
reserved for write by the current transaction
other transactions cannot read (with a locking read) or modify the locked rows
Implementation:
SELECT * FROM table_name FOR UPDATE;
7. Skip locked
Removes the locked rows from the result set.
Implementation:
SELECT * FROM table_name FOR UPDATE SKIP LOCKED;
8. No wait
A locking read with NOWAIT does not wait for the lock to become available; it fails immediately if the row is already locked, and the statement errors out so the application can retry or roll back.
SELECT * FROM table_name FOR UPDATE NOWAIT;
ERROR 3572 (HY000): Statement aborted because lock(s) could not be acquired immediately and NOWAIT is set.
9. Design: Airline Check-in system
To understand the above concepts, we will design an airline check-in system. Here are the requirements:
Multiple airlines
Every airline has multiple flights
Each flight has 120 seats
Each flight has multiple trips
User books a seat in one trip of a flight
Handle multiple people trying to pick seats on the flight
airlines
id | name |
1 | AIRINDIA |
2 | INDIGO |
3 | GOAIR |
flights
id | airline_id | name |
1 | 1 | AIRINDIA 101 |
2 | 1 | AIRINDIA 279 |
3 | 2 | INDIGO 6E101 |
trips
id | flight_id | fly_time |
1 | 1 | 08/05/2025 10:00 |
2 | 1 | 09/05/2025 10:00 |
3 | 2 | 08/05/2025 09:00 |
4 | 2 | 09/05/2025 12:00 |
5 | 3 | 08/05/2025 07:00 |
users
id | name |
1 | Ruban |
2 | Shikhar |
3 | Shrey |
4 | Ram |
5 | Krishna |
seats
id | name | trip_id | user_id |
1 | 1A | 1 | 1 |
2 | 1B | 1 | 2 |
3 | 1C | 1 | 3 |
4 | 1D | 1 | 4 |
5 | 1E | 1 | 5 |
Locking demonstration:
How will we handle it when all 120 people on the same trip of a flight check in at the same time?
Fixed inventory + Contention → Locking
Similar systems: CoWin, IRCTC, BookMyShow, Flash Sale
-- Selecting the first available seat and exclusively locking it
SELECT id, name, trip_id, user_id FROM seats
WHERE trip_id = 1 and user_id IS null
ORDER BY id LIMIT 1
FOR UPDATE;
Let's say we have 120 threads concurrently scheduled to run the above transaction, each booking a seat for a different user. After execution completes, we see that all the seats get booked.
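Each of those threads effectively runs a transaction along the following lines (a sketch; the UPDATE reuses the seat id returned by the locking SELECT):
BEGIN;
-- Lock the first free seat for this trip
SELECT id, name, trip_id, user_id FROM seats
WHERE trip_id = 1 AND user_id IS NULL
ORDER BY id LIMIT 1
FOR UPDATE;
-- Assign the locked seat to the current user (using the id returned above)
UPDATE seats SET user_id = ? WHERE id = ?;
COMMIT; -- releases the lock; waiting transactions re-evaluate the SELECT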
10. Concurrency Control Mechanism
The system successfully allocates exactly 120 seats to 120 users through the following process:
Lock Acquisition Phase
All 120 transactions concurrently execute the SELECT ... FOR UPDATE statement
One transaction (T₁) successfully acquires an exclusive lock on the first available row
The remaining 119 transactions are blocked, waiting for this lock to be released
Resource Modification Phase
Transaction T₁ updates the row, setting user_id to its passenger identifier
T₁ commits the transaction, which releases the lock on the modified row
Lock Release and Notification Phase
The database notifies waiting transactions that the lock has been released
The waiting transactions are awakened and compete for the next available row
Sequential Resolution
Another transaction (T₂) acquires a lock on the next available row
Steps 2-3 repeat with T₂
This process continues sequentially until all 120 transactions have completed
Final State
All 120 seats have been allocated to 120 different users
No overbooking occurs despite the concurrent execution attempts
All 120 transactions complete successfully
11. Key Database Mechanisms in Play
Row-Level Locking: The FOR UPDATE clause ensures exclusive access to the selected row
Transaction Isolation: Each transaction operates independently within its isolation level
Lock Management: The database maintains a queue of waiting transactions
Transaction Scheduling: After a lock release, the database scheduler determines which waiting transaction proceeds next
This approach ensures that even with massive concurrency, each seat is allocated exactly once, maintaining the integrity of the booking system.
12. Performance Tuning: The Magic of SKIP LOCKED
When designing high-concurrency systems like our airline check-in platform, performance bottlenecks often emerge from lock contention. One particularly powerful but underutilised feature for addressing this issue is SQL's SKIP LOCKED clause. Let's examine how a simple modification to our seat allocation query can yield dramatic performance improvements.
The Bottleneck Problem
With our original query, when 120 concurrent transactions execute:
The first transaction acquires a lock on the first available row
The remaining 119 transactions all wait in line for this specific row
After the first transaction completes, the second transaction processes the same row, discovers it's now taken, and must re-evaluate the query
This pattern continues, creating a sequential processing situation despite concurrent attempts
This results in what we call "lock convoy" behaviour - threads lining up for the same resource, dramatically reducing throughput.
The SKIP LOCKED Solution
By modifying our query to include SKIP LOCKED:
SELECT id, name, trip_id, user_id FROM seats
WHERE trip_id = 1 AND user_id IS NULL
ORDER BY id LIMIT 1
FOR UPDATE SKIP LOCKED;
We completely transform the concurrent behaviour:
Transaction A attempts to lock row 1
Simultaneously, Transaction B skips row 1 (since it's locked) and locks row 2
Transaction C skips rows 1 and 2 and locks row 3
And so on...
Instead of a queue of transactions waiting for the same row, we have transactions working in parallel on different rows.
Performance Improvement: 90% Reduction in Processing Time
Our benchmarks show this simple change reduces total processing time by approximately 90% when handling 120 concurrent seat bookings:
Approach | Time to Process 120 Concurrent Bookings |
FOR UPDATE | ~1.79 s |
FOR UPDATE SKIP LOCKED | ~147.47 ms |
This dramatic improvement occurs because:
Parallel Processing: Multiple transactions can proceed simultaneously instead of waiting in a queue
Reduced Lock Wait Time: Transactions don't waste time waiting for locks on rows they'll never use
Optimised Resource Utilisation: Database resources are used more efficiently with distributed locks
Implementation Considerations
When implementing SKIP LOCKED, keep in mind:
It's available in PostgreSQL 9.5+, MySQL 8.0+, and Oracle databases
It slightly changes the semantics of your query (it will skip locked rows rather than wait for them)
It may not be appropriate for all use cases (e.g., when processing must occur in a specific order)
For high-concurrency systems like our airline check-in, where order of seat allocation isn't critical, SKIP LOCKED represents one of the most impactful performance optimisations available. This small change transforms a sequential bottleneck into a parallel processing opportunity, delivering near-linear scalability as concurrency increases.
Remember: Sometimes the most significant performance gains come not from hardware upgrades or complex code refactoring, but from understanding and properly utilising the database features designed specifically for your concurrency pattern.
B. Designing a scalable SQL-backed Key-Value Store
Building a scalable key-value store on top of a relational database combines the familiar ACID guarantees of SQL with the simplicity of NoSQL interfaces. This hybrid approach allows us to leverage existing database infrastructure while providing the flexibility and performance characteristics that modern applications demand. Let's walk through designing a single-node KV store that can serve as the foundation for distributed systems like Apache Ignite.
1. System Requirements
Our KV store must support four fundamental operations and be designed so it can scale out as demand grows:
GET: Retrieve value by key
PUT: Store/update key-value pairs
DELETE: Remove key-value pairs
TTL: Time-based expiration of keys
2. Storage Layer Design
Initial Schema Structure
We start with MySQL as our storage backend, designed for easy horizontal scaling as demand grows:
CREATE TABLE store (
  `key` VARCHAR(255) PRIMARY KEY, -- backticked: KEY is a reserved word in MySQL
  value TEXT,
  ttl BIGINT, -- Absolute expiration time as a Unix epoch timestamp
  is_deleted BOOLEAN DEFAULT FALSE
);
Storage Optimisation: Eliminating Redundant Columns
The is_deleted column introduces unnecessary storage overhead. Instead, we can use a special TTL value to indicate deletion:
CREATE TABLE store (
  `key` VARCHAR(255) PRIMARY KEY,
  value TEXT,
  ttl BIGINT -- Absolute expiration time as a Unix epoch timestamp, OR -1 for soft deletion
);
This optimisation:
Reduces the storage footprint (one fewer column to store and index per row)
Simplifies queries by using a single condition check
Maintains the same functional behaviour
Hard deletion is handled through batch cleanup processes that run periodically, minimising I/O operations and database rebalancing overhead.
3. Core Operations Implementation
PUT Operation: Leveraging Database Upserts
Rather than implementing separate INSERT and UPDATE logic, we use database-native upsert operations:
MySQL Implementation:
REPLACE INTO store VALUES (?, ?, ?);
PostgreSQL Implementation:
INSERT INTO store VALUES (?, ?, ?)
ON CONFLICT (key)
DO UPDATE SET value = EXCLUDED.value, ttl = EXCLUDED.ttl;
This approach eliminates race conditions and reduces the complexity of handling existing vs. new keys.
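One caveat worth noting: MySQL's REPLACE resolves a conflict as a delete of the old row followed by an insert, which rewrites the entire row. An in-place alternative (a sketch using our column names) is INSERT ... ON DUPLICATE KEY UPDATE:
INSERT INTO store (`key`, value, ttl) VALUES (?, ?, ?)
ON DUPLICATE KEY UPDATE value = VALUES(value), ttl = VALUES(ttl);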
4. Handling Concurrent Updates
For scenarios requiring strong consistency guarantees during updates, we implement pessimistic locking:
-- Acquire lock with immediate failure if unavailable
SELECT * FROM store WHERE `key` = ? FOR UPDATE NOWAIT;
UPDATE store SET value = ?, ttl = ? WHERE `key` = ?;
The choice between NOWAIT and SKIP LOCKED depends on your consistency requirements:
NOWAIT: Fails fast if lock unavailable (strong consistency)
SKIP LOCKED: Continues with next available resource (eventual consistency)
5. TTL Implementation: Three Strategic Approaches
Approach 1: Batch Deletion with CRON Jobs (Disk-based Databases)
Ideal for traditional disk-based storage systems:
-- Periodic cleanup job
DELETE FROM store WHERE ttl <= UNIX_TIMESTAMP() AND ttl != -1;
Key optimisation: All GET operations must filter expired keys:
SELECT value FROM store WHERE `key` = ? AND ttl > UNIX_TIMESTAMP();
This server-side filtering saves:
Network I/O bandwidth
Client-side processing overhead
Unnecessary data transfer for large values
Approach 2: Lazy Deletion (In-memory Databases)
Perfect for memory-constrained environments:
Delete expired keys only when accessed
Immediate memory reclamation (similar to free() in C)
A backup CRON job handles never-accessed keys
Approach 3: Random Sampling & Deletion (Redis style Optimisation)
Based on statistical sampling principles:
def cleanup_expired_keys():
    while True:
        sample = randomly_select_keys_with_ttl(20)  # Sample size from Central Limit Theorem
        expired_keys = [k for k in sample if k.ttl <= now()]
        delete_keys(expired_keys)
        if len(expired_keys) / len(sample) <= 0.25:
            break  # Less than 25% expired indicates a clean population
This approach leverages the Central Limit Theorem: if our sample has <25% expired keys, the overall population likely has <25% expired keys.
6. DELETE and GET Operations
Soft Delete Implementation
UPDATE store SET ttl = -1 WHERE `key` = ? AND ttl > UNIX_TIMESTAMP();
Benefits:
Reduces immediate disk I/O load
Maintains referential integrity during high-traffic periods
Allows for potential key recovery before hard deletion
Batch Cleanup Process
DELETE FROM store WHERE ttl <= UNIX_TIMESTAMP(); -- Handles both expired and soft-deleted keys
Optimised GET Operation
SELECT value FROM store WHERE `key` = ? AND ttl > UNIX_TIMESTAMP();
This single query handles both key existence and expiration checks efficiently.
7. Performance Characteristics
This design achieves high performance through several key optimisations:
Reduced Network Overhead: Server-side expiration filtering
Optimised Storage: Elimination of redundant columns
Concurrent Safety: Strategic use of database locking mechanisms
Efficient Cleanup: Multiple TTL strategies based on storage medium
Native Database Features: Leveraging upserts and atomic operations
The foundation we've built here scales vertically within a single node and provides the architectural patterns necessary for horizontal scaling across distributed systems.
8. High level architecture
As our single-node KV store reaches capacity limits, we need to scale horizontally. The scaling strategy follows a deliberate progression based on traffic patterns and load characteristics. Now, we'll explore how these principles extend to multi-node architectures and the additional complexities of distributed consensus and data partitioning.
Application Layer Scaling
The first scaling bottleneck typically occurs at the application layer. Since our KV store operations are stateless, we can add multiple API servers behind a load balancer:
Load Balancer → [API Server 1, API Server 2, API Server 3, ...] → MySQL
Each API server handles the same operations (GET, PUT, DELETE, TTL) independently, with all servers connecting to the same MySQL instance.
Read Scaling with Replicas
When MySQL becomes the bottleneck and we observe a high read-to-write ratio, we introduce read replicas:
Conditions for Adding Read Replicas:
Read-to-write ratio is approximately 99:1
Application can tolerate eventual consistency for read operations
Master database CPU/memory is saturated primarily by read queries
Architecture with Read Replicas:
Load Balancer → [API Servers] → Master (Writes)
                                  ↘ Replica 1 (Reads)
                                  ↘ Replica 2 (Reads)
                                  ↘ Replica 3 (Reads)
Implementation Strategy:
-- Route writes to master
WRITE_DB_CONNECTION.execute("REPLACE INTO store VALUES (?, ?, ?)")
-- Route reads to replicas (with round-robin or least-connections)
READ_DB_CONNECTION.execute("SELECT value FROM store WHERE `key` = ? AND ttl > UNIX_TIMESTAMP()")
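A minimal routing sketch for this split (the connect_to helper and connection objects are assumptions, not a specific driver API):
import itertools

# Assumed pre-configured connections from whichever MySQL driver you use
write_conn = connect_to("mysql://master")  # hypothetical helper
read_conns = [connect_to(f"mysql://replica-{i}") for i in range(3)]
read_cycle = itertools.cycle(read_conns)   # simple round-robin over replicas

def put(key, value, ttl):
    # Writes always go to the master
    write_conn.execute("REPLACE INTO store VALUES (?, ?, ?)", [key, value, ttl])

def get(key):
    # Reads rotate across replicas and tolerate replication lag (eventual consistency)
    conn = next(read_cycle)
    return conn.execute(
        "SELECT value FROM store WHERE `key` = ? AND ttl > UNIX_TIMESTAMP()", [key])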
Write Scaling Through Vertical and Horizontal Partitioning
When the master database becomes write-bound despite read replica scaling:
Phase 1: Vertical Scaling
Increase CPU cores and memory on the master instance
Optimize disk I/O with faster storage (NVMe SSDs)
Fine-tune database parameters for write-heavy workloads
Phase 2: Horizontal Partitioning (Sharding)
When vertical scaling reaches economic or technical limits, we implement sharding:
Sharding Strategy: Each master node owns an exclusive fragment of the keyspace, ensuring no data overlap between shards.
def get_shard(key, num_shards):
    return hash(key) % num_shards

def route_request(operation, key, value=None, ttl=None):
    shard_id = get_shard(key, NUM_SHARDS)
    shard_master = SHARD_MASTERS[shard_id]
    if operation == "GET":
        return shard_master.execute(
            "SELECT value FROM store WHERE `key` = ? AND ttl > UNIX_TIMESTAMP()", [key])
    elif operation == "PUT":
        return shard_master.execute(
            "REPLACE INTO store VALUES (?, ?, ?)", [key, value, ttl])
    elif operation == "DELETE":
        return shard_master.execute(
            "UPDATE store SET ttl = -1 WHERE `key` = ?", [key])
Complete Sharded Architecture:
                  Load Balancer
                        |
      [API Servers with Consistent Hashing]
                        |
      ┌─────────────────┼─────────────────┐
      |                 |                 |
   Shard 1           Shard 2           Shard 3
Master + Replicas Master + Replicas Master + Replicas
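The diagram mentions consistent hashing, whereas the earlier routing sketch uses a plain hash(key) % num_shards. A minimal consistent-hash ring (illustrative, with virtual nodes) keeps most keys on their existing shard when shards are added or removed:
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, shards, vnodes=100):
        # Each physical shard gets `vnodes` points on the ring to smooth the distribution
        self.ring = []  # sorted list of (hash, shard) pairs
        for shard in shards:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{shard}#{i}"), shard))
        self.ring.sort()
        self.hashes = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_shard(self, key):
        # Walk clockwise to the first ring point at or after the key's hash
        idx = bisect.bisect(self.hashes, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["shard-1", "shard-2", "shard-3"])
print(ring.get_shard("user:123"))  # routes the key to one of the three shards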
Trade-offs and Considerations
Read Replicas:
Pros: Simple to implement, handles read-heavy workloads effectively
Cons: Eventual consistency, replica lag during high write periods
Sharding:
Pros: True horizontal scalability for both reads and writes
Cons: Increased complexity, cross-shard operations become expensive, rebalancing challenges
This distributed architecture maintains the core principles of our single-node design while addressing the scale requirements of modern applications. Each scaling phase is triggered by specific performance characteristics, ensuring we only add complexity when necessary.
C. NoSQL Databases and their Trade-offs
1. NoSQL
While SQL databases have dominated the data storage landscape for decades, the rise of web-scale applications has brought NoSQL databases into prominence. Understanding when and why to choose NoSQL over traditional relational databases requires examining the fundamental trade-offs each approach offers.
Understanding NoSQL: Beyond Relational Constraints
NoSQL databases store data in non-relational structures, fundamentally changing how we think about data organisation and access patterns. The key distinguishing characteristic isn't the absence of SQL (many NoSQL databases support SQL-like query languages), but rather the departure from rigid relational schemas and ACID guarantees.
The Central Trade-off: Most NoSQL databases achieve massive scalability and high availability by compromising on strong consistency. They typically operate under eventual consistency models, where data changes propagate across distributed nodes over time rather than immediately.
Important Note: The ability to scale isn't exclusive to NoSQL. Any database that can be effectively sharded can achieve horizontal scalability, including traditional SQL databases.
2. Types of NoSQL databases:
Document databases
Document databases store data as semi-structured documents, typically in JSON format:
{ "_id": "user123", "name": "John Doe", "email": "john@example.com", "address": { "street": "123 Main St", "city": "San Francisco", "zipcode": "94105" }, "preferences": ["sports", "technology", "travel"] }
Characteristics:
Support for complex, nested data structures
Flexible schema evolution
Partial document updates possible
Rich query capabilities including filtering, sorting, and aggregation
Closest to relational databases in terms of query flexibility
Examples: MongoDB, CouchDB, Amazon DocumentDB, ElasticSearch
Use Cases: Content management, catalogs, user profiles, real-time analytics
Key-Value Store
The simplest NoSQL model, storing data as key-value pairs:
user:123 → {"name": "John", "email": "john@example.com"} session:abc → {"user_id": 123, "expires": 1640995200} cache:page:home → "<html>...</html>"
Characteristics:
Extremely simple access patterns (GET, PUT, DELETE by key)
Heavily partitioned and distributed
Limited query capabilities (typically only key-based lookups)
Optimised for high throughput and low latency
Examples: Redis, DynamoDB, Aerospike, Riak
Use Cases: Caching, session storage, shopping carts, real-time recommendations
Column Oriented Databases
Column-oriented databases store data by columns rather than rows, optimising for analytical workloads:
Table: stock_ticks; Columns: symbol, price, name, exchange, timestamp
Traditional Row-Oriented Storage:
Row 1: [AAPL, 150.25, Apple Inc, NASDAQ, 2023-01-15]
Row 2: [GOOGL, 2800.50, Alphabet Inc, NASDAQ, 2023-01-15]
Row 3: [MSFT, 310.75, Microsoft Corp, NASDAQ, 2023-01-15]
Column-Oriented Storage:
symbol: [AAPL, GOOGL, MSFT, ...]
price:  [150.25, 2800.50, 310.75, ...]
name:   [Apple Inc, Alphabet Inc, Microsoft Corp, ...]
The Analytics Advantage:
Consider this analytical query on a stock trading table:
SELECT AVG(price) FROM stock_ticks WHERE timestamp = '2023-01-15';
Imagine the stock_ticks table is wide, with 100 columns:
Row-oriented approach: reads entire rows (all 100 columns) and discards the 98 unnecessary ones
Column-oriented approach: reads only the price and timestamp columns, ignoring the other 98 entirely
This results in dramatically reduced I/O for analytical workloads, often achieving 10x-100x performance improvements for aggregation queries.
Examples: Amazon Redshift, Apache Cassandra, HBase, ClickHouse
Use Cases: Data warehousing, business intelligence, real-time analytics, time-series data
Do read the foundational paper on column-oriented databases:
C-Store: A Column Oriented DBMS
Graph Databases
Graph databases model data as nodes (entities) and edges (relationships):
(User:John)-[FRIENDS_WITH]->(User:Alice)
(User:John)-[PURCHASED]->(Product:Laptop)
(Product:Laptop)-[BELONGS_TO]->(Category:Electronics)
Characteristics:
Native support for complex relationship queries
Optimised for traversing connections and paths
ACID properties typically maintained
Query languages optimised for graph traversal (Cypher, Gremlin)
Compelling Use Case - Fraud Detection: Graph databases excel at detecting fraudulent patterns by analysing relationship networks:
Multiple accounts sharing the same device fingerprint
Rapid fund transfers through connected accounts
Social network analysis for suspicious behaviour patterns
Examples: Neo4j, Amazon Neptune, ArangoDB, TigerGraph, DGraph
Use Cases: Social networks, recommendation engines, fraud detection, network analysis
3. Why NoSQL Databases Scale Effectively
The scalability advantages of NoSQL databases stem from several architectural decisions:
Absence of Complex Relationships: Without foreign keys and JOIN operations, data can be distributed more easily across nodes
Denormalised Data Models: Data is often duplicated across documents/records, eliminating the need for cross-partition operations
Shard-Friendly Design: Data models are specifically designed with partitioning in mind
Critical Clarification: SQL databases can also achieve horizontal scaling through sharding techniques. The distinction lies in design philosophy rather than technical impossibility.
4. Decision Framework: SQL vs NoSQL
Choose SQL When You Need:
ACID Guarantees: Strong consistency requirements for financial transactions, inventory management
Complex Relationships: Data with intricate foreign key relationships and complex JOIN operations
Fixed Schema: Well-defined data structures that benefit from schema enforcement
Mature Tooling: Established ecosystem of tools, ORMs, and developer expertise
Choose NoSQL When You Have:
Massive Scale Requirements: Need to handle millions of operations per second across distributed infrastructure
Flexible Schema Needs: Rapidly evolving data structures or semi-structured data
Denormalised Access Patterns: Can optimise data models for specific query patterns
Eventual Consistency Tolerance: Application can handle temporary data inconsistencies
5. The Modern Reality
Today's database landscape isn't about choosing sides in an SQL vs NoSQL battle. Many organisations employ polyglot persistence—using the right database for each specific use case. A typical modern application might use:
PostgreSQL for transactional data requiring ACID properties
Redis for caching and session storage
Elasticsearch for full-text search and logging
Neo4j for recommendation algorithms
The key is understanding the trade-offs and choosing the technology that best aligns with your specific scalability, consistency, and query requirements.
D. Why are graph databases preferred over relational databases for fraud detection?
Graph databases are often preferred over relational databases for fraud detection due to their ability to efficiently model, store, and query complex relationships and patterns in data, which are central to identifying fraudulent activities. Below are the key reasons why graph databases excel in this domain:
1. Natural Representation of Relationships
Graph Databases: Fraud detection relies heavily on analysing relationships between entities (e.g., users, transactions, accounts, or devices). Graph databases store data as nodes (entities) and edges (relationships), making it intuitive to model and traverse connections like "User A sent money to User B" or "Device X was used in multiple suspicious transactions."
Relational Databases: Relationships are stored in tables joined by foreign keys, requiring complex and often slow JOIN operations to reconstruct networks. For fraud detection, where relationships may span multiple degrees (e.g., friends of friends), relational databases struggle to efficiently query deep connections.
Advantage: Graph databases allow real-time traversal of relationships (e.g., finding patterns like "account shared across multiple users") without the overhead of multiple JOINs, which is critical for detecting fraud patterns like money laundering or account takeovers.
2. Efficient Pattern Matching
Graph Databases: They support fast querying of complex patterns using graph traversal algorithms or query languages like Cypher (Neo4j) or Gremlin. For example, detecting a fraud ring (a cycle of transactions between accounts) is straightforward with graph queries that look for cycles or clusters.
Relational Databases: Pattern matching requires constructing complex SQL queries with multiple JOINs, subqueries, or recursive CTEs (Common Table Expressions), which are computationally expensive and harder to maintain for intricate fraud patterns.
Advantage: Graph databases can quickly identify suspicious patterns, such as circular transactions or shared device usage, which are common in fraud detection.
3. Scalability for Connected Data
Graph Databases: Designed to handle highly connected data, graph databases scale well for traversing large networks. They use index-free adjacency, meaning relationships are stored directly with nodes, enabling constant-time traversal regardless of dataset size.
Relational Databases: Performance degrades as the dataset grows due to the cost of JOIN operations, especially when analysing deep relationships (e.g., second- or third-degree connections). Indexing helps, but it doesn’t fully address the complexity of fraud networks.
Advantage: Graph databases maintain performance for real-time fraud detection, even with large, interconnected datasets, such as tracking transactions across millions of accounts.
4. Flexibility for Dynamic Data
Graph Databases: They are schema-less or have flexible schemas, allowing easy addition of new node types (e.g., new fraud indicators like IP addresses) or edge types (e.g., new relationship types like "shared phone number") without altering the database structure.
Relational Databases: Schema changes (e.g., adding new tables or columns) are rigid and require migrations, which can be cumbersome when fraud detection models evolve to include new data points.
Advantage: Graph databases adapt quickly to new fraud detection requirements, such as incorporating emerging fraud patterns or new data sources (e.g., social media connections).
5. Real-Time Fraud Detection
Graph Databases: Their ability to traverse relationships quickly supports real-time analysis, critical for stopping fraud as it happens (e.g., flagging a suspicious transaction during processing). Queries like "find all accounts linked to a flagged IP within three degrees" execute efficiently.
Relational Databases: Real-time queries are slower due to the need for multiple JOINs or aggregations, often requiring precomputed views or materialised tables, which can delay fraud detection.
Advantage: Graph databases enable faster decision-making in time-sensitive fraud scenarios, such as blocking a transaction before it completes.
6. Uncovering Hidden Patterns
Graph Databases: They excel at uncovering hidden or non-obvious relationships, such as fraud rings, sybil attacks (multiple fake identities), or collusive networks, by leveraging graph algorithms like community detection, centrality measures, or shortest paths.
Relational Databases: Identifying such patterns requires complex SQL queries or exporting data to external tools for analysis, which is less efficient and harder to integrate into a live system.
Advantage: Graph databases provide built-in tools for advanced analytics, making it easier to detect sophisticated fraud schemes.
Example Use Case
In fraud detection, a graph database might represent:
Nodes: Users, accounts, transactions, devices, IP addresses.
Edges: "owns account," "performed transaction," "used device."
Fraud Query: Find all accounts sharing the same device or IP within two degrees of a known fraudulent account. A graph database can execute this query in milliseconds, while a relational database might require multiple JOINs and take significantly longer.
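For contrast, here is roughly what that relational version looks like (a sketch with illustrative table names such as accounts and account_devices, using a PostgreSQL-style recursive CTE). Each additional degree of separation adds another pair of JOINs, which is exactly the cost graph traversal avoids:
-- Accounts reachable within two hops of a flagged account via shared devices
WITH RECURSIVE linked_accounts (account_id, depth) AS (
  SELECT id, 0 FROM accounts WHERE is_flagged = TRUE
  UNION
  SELECT ad2.account_id, la.depth + 1
  FROM linked_accounts la
  JOIN account_devices ad1 ON ad1.account_id = la.account_id
  JOIN account_devices ad2 ON ad2.device_id = ad1.device_id
  WHERE la.depth < 2
)
SELECT DISTINCT account_id FROM linked_accounts;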
When Relational Databases Might Still Be Used
Structured Data with Simple Relationships: If fraud detection involves mostly tabular data with minimal relationships (e.g., checking individual transaction amounts against thresholds), relational databases may suffice.
Existing Infrastructure: Organisations with mature relational database systems may prefer to stick with SQL-based solutions, using techniques like materialised views or stored procedures, despite performance trade-offs.
Small-Scale Needs: For smaller datasets or simpler fraud rules, the overhead of setting up a graph database may not be justified.
Graph databases are preferred for fraud detection because they excel at modelling and querying complex relationships, detecting patterns, and operating in real-time on highly connected data. Their flexibility and performance for network analysis make them ideal for uncovering sophisticated fraud schemes, such as money laundering or identity fraud, compared to the JOIN-heavy, less flexible relational databases.
E. Designing Slack's Realtime Text Communication
Building a realtime messaging system like Slack requires careful consideration of persistence guarantees, scalability patterns, and user experience expectations. Unlike consumer messaging apps, enterprise communication platforms prioritise message durability and formal communication patterns over instant delivery.
1. System Requirements
Our messaging system must support:
Multi-user, multi-channel communication with both public channels and direct messages
Real-time message delivery with low latency
Historical message persistence with efficient scrolling and search
Enterprise-grade reliability ensuring no message loss
Similar Systems: This architecture pattern applies broadly to multiplayer games, real-time polls, creator tools, and any system requiring instant bidirectional communication.
2. Database Schema Design
Core Tables Structure
-- User management
CREATE TABLE users (
id BIGINT PRIMARY KEY,
name VARCHAR(255),
email VARCHAR(255) UNIQUE,
created_at TIMESTAMP
);
-- Channel management (handles both channels and DMs)
CREATE TABLE channels (
id BIGINT PRIMARY KEY,
name VARCHAR(255),
org_id BIGINT,
type ENUM('group', 'direct_message', 'public_channel', 'private_channel'),
created_at TIMESTAMP
);
-- Membership tracking
CREATE TABLE memberships (
user_id BIGINT,
channel_id BIGINT,
joined_at TIMESTAMP,
last_read_message_id BIGINT, -- For read receipts and notifications
PRIMARY KEY (user_id, channel_id)
);
-- Message storage
CREATE TABLE messages (
id BIGINT PRIMARY KEY,
from_user_id BIGINT,
channel_id BIGINT,
content TEXT,
created_at TIMESTAMP,
message_type ENUM('text', 'file', 'system'),
INDEX idx_channel_created (channel_id, created_at)
);
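The composite index on (channel_id, created_at) exists to serve the channel scroll: fetching a page of history becomes a single index range scan. A sketch of keyset pagination over it:
-- Latest page of a channel
SELECT id, from_user_id, content, created_at
FROM messages
WHERE channel_id = ?
ORDER BY created_at DESC
LIMIT 50;
-- Older page: pass the created_at of the oldest message already shown
SELECT id, from_user_id, content, created_at
FROM messages
WHERE channel_id = ? AND created_at < ?
ORDER BY created_at DESC
LIMIT 50;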
Direct Message Implementation
Direct messages are implemented as special channels with exactly two members:
-- Creating a DM creates a channel
INSERT INTO channels (name, type) VALUES ('DM', 'direct_message');
INSERT INTO memberships (user_id, channel_id) VALUES (user1_id, channel_id), (user2_id, channel_id);
3. Database Scaling Strategy
Given the massive volume of messages in enterprise environments, horizontal scaling is essential:
Sharding by Channel ID
def get_message_shard(channel_id, num_shards):
    return hash(channel_id) % num_shards

def store_message(from_user, channel_id, content):
    shard = get_message_shard(channel_id, NUM_MESSAGE_SHARDS)
    message_db = MESSAGE_SHARDS[shard]
    return message_db.execute("""
        INSERT INTO messages (from_user_id, channel_id, content, created_at)
        VALUES (?, ?, ?, NOW())
    """, [from_user, channel_id, content])
Benefits of Channel-based Sharding:
All messages for a channel reside on the same shard
Historical message retrieval is efficient (single shard query)
Channel-specific operations don't require cross-shard coordination
Database Options:
MySQL: Strong consistency, ACID guarantees, familiar tooling
Cassandra: Excellent write performance, built-in sharding
MongoDB: Document flexibility, good horizontal scaling
4. Message Delivery Architecture
Enterprise vs Consumer Messaging: Different Persistence Guarantees
The choice of message delivery architecture depends entirely on your persistence requirements and user expectations:
1. Slack's Enterprise Approach: Synchronous Persistence
Architecture Flow:
User → HTTPS API Server → Database (Persist) → Message Queue → WebSocket Servers → Recipients
Implementation:
@app.route('/send_message', methods=['POST'])
def send_message():
    # 1. Validate and authenticate user
    user_id = authenticate_user(request.headers.get('Authorization'))

    # 2. Synchronously persist to database
    message_id = database.store_message(
        from_user=user_id,
        channel_id=request.json['channel_id'],
        content=request.json['content']
    )

    # 3. Only after successful persistence, broadcast
    if message_id:
        message_queue.publish('message_broadcast', {
            'message_id': message_id,
            'channel_id': request.json['channel_id'],
            'content': request.json['content'],
            'from_user': user_id
        })
        return {'status': 'success', 'message_id': message_id}
    else:
        return {'status': 'error'}, 500
Why This Approach for Enterprise:
Formal Communication: Enterprise users expect message durability
Compliance Requirements: Legal and regulatory needs for message retention
Lower Frequency: Less frequent messaging allows for synchronous persistence overhead
Refresh Reliability: Users can refresh and see all historical messages reliably
2. Consumer Apps Approach: Asynchronous Persistence
Facebook Messenger/WhatsApp Architecture:
User → WebSocket Server → Broadcast (immediate) + Message Queue → Database (async)
Implementation:
@websocket.on('send_message')
def handle_message(data):
    # 1. Immediately broadcast to recipients
    broadcast_to_channel_members(data['channel_id'], data)
    # 2. Asynchronously queue for persistence
    message_queue.publish('persist_message', data)
    # User sees message immediately, persistence happens in background

@message_queue.consumer('persist_message')
def persist_message_async(message_data):
    # Eventual persistence - may fail, but user experience isn't affected
    database.store_message(message_data)
Why This Approach for Consumers:
High Frequency: Users send messages very frequently
Real-time Priority: Instant delivery more important than guaranteed persistence
Mobile-First: Optimised for mobile app experience
Eventual Consistency: Users accept occasional message loss for speed
3. Zoom's Ephemeral Approach: No Persistence
Architecture:
User → WebSocket Server → Broadcast (immediate only)
Messages exist only during active sessions
No database storage whatsoever
Optimised for temporary, meeting-based communication
Lowest latency possible
5. WebSocket Implementation for Real-time Delivery
Connection Management
class WebSocketManager:
    def __init__(self):
        self.user_connections = {}       # user_id -> [websocket_connections]
        self.channel_subscriptions = {}  # channel_id -> [user_ids]

    def subscribe_to_channel(self, user_id, channel_id, websocket):
        # Add user to channel subscription
        if channel_id not in self.channel_subscriptions:
            self.channel_subscriptions[channel_id] = set()
        self.channel_subscriptions[channel_id].add(user_id)

        # Track user's websocket connection
        if user_id not in self.user_connections:
            self.user_connections[user_id] = []
        self.user_connections[user_id].append(websocket)

    def broadcast_to_channel(self, channel_id, message):
        if channel_id in self.channel_subscriptions:
            for user_id in self.channel_subscriptions[channel_id]:
                if user_id in self.user_connections:
                    for websocket in self.user_connections[user_id]:
                        websocket.send(json.dumps(message))
Message Broadcasting
@message_queue.consumer('message_broadcast')
def broadcast_message(message_data):
    # Get channel members
    members = database.get_channel_members(message_data['channel_id'])
    # Broadcast to all online members
    websocket_manager.broadcast_to_channel(
        message_data['channel_id'],
        {
            'type': 'new_message',
            'message': message_data,
            'timestamp': datetime.now().isoformat()
        }
    )
6. Key Architectural Decisions
Persistence-First Design: Slack prioritises message durability over millisecond-level latency
Channel-Based Sharding: Optimises for historical message retrieval patterns
Hybrid Communication: HTTPS for sending (reliability), WebSockets for receiving (real-time)
Separation of Concerns: API servers handle persistence, dedicated WebSocket servers handle real-time delivery
This architecture scales to millions of concurrent users while maintaining the reliability expectations of enterprise customers. The key insight is matching your persistence guarantees to your user base's expectations and communication patterns.
F. Scaling WebSockets
While REST-based short polling might work for traditional web applications, it creates a jarring user experience for real-time messaging. The constant delay between sending a message and seeing it appear creates friction that breaks the natural flow of conversation. For truly real-time communication like Slack, we need a persistent connection that can push messages instantly to users—enter WebSockets.
1. The WebSocket Foundation
Our approach centers on giving every user exactly one WebSocket connection to our backend infrastructure. This single connection handles all real-time communication needs, from chat messages to notifications to presence updates. This design choice stems from a fundamental browser limitation: most browsers restrict each domain to only 6 concurrent TCP connections. Since WebSockets maintain persistent TCP connections, we must multiplex all real-time features through this single precious connection.
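In practice the multiplexing is usually just an envelope with a type field, so a single socket can carry chat messages, presence updates, notifications, and so on. A small illustrative sketch (not Slack's actual wire format):
import json

# Hypothetical client-side handlers for the features sharing the one connection
def render_message(payload):
    print("chat:", payload)

def update_presence(payload):
    print("presence:", payload)

def handle_frame(raw_frame):
    # Dispatch every frame arriving on the single WebSocket by its type field
    envelope = json.loads(raw_frame)
    handlers = {
        "chat_message": render_message,
        "presence_update": update_presence,
    }
    handlers[envelope["type"]](envelope["payload"])

handle_frame(json.dumps({"type": "chat_message",
                         "payload": {"channel": "C3", "text": "hi"}}))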
2. Edge Server Architecture
To handle the persistent nature of WebSocket connections at scale, we deploy a fleet of edge servers. These servers act as the first point of contact for user WebSocket connections and serve as the intelligent routing layer for all real-time communication. When any service—whether it's our chat system, notification service, or presence tracker—needs to communicate with users in real-time, the information flows through these edge servers.
Each edge server maintains a local registry of connected users and their communication preferences. Libraries like Socket.IO help manage these connection pools, tracking which users are connected and maintaining the health of each WebSocket connection. When user A sends a message to user B, the edge server orchestrates a three-step process: first, it persists the message to Kafka for durability; then it locates user B in its local connection pool; finally, it delivers the message through B's WebSocket connection.
3. Handling Broadcast Messages
The architecture becomes more complex when dealing with channel messages that need to reach multiple recipients. Consider a channel with 50 members—the edge server needs to efficiently deliver the same message to all connected channel participants. However, real-time delivery isn't always guaranteed due to network issues or temporary disconnections. To handle message delivery failures, we maintain our channel scroll API as a fallback mechanism. When users open a channel, this REST endpoint ensures they receive any messages that might have been missed during real-time delivery.
4. Horizontal Scaling Challenges
A single edge server quickly becomes a bottleneck. If each server can handle 4 concurrent connections, what happens when user 5 tries to connect? The natural solution is horizontal scaling—deploying multiple edge servers and distributing users across them. User E might connect to WebSocket Server 2 while users A through D remain on WebSocket Server 1.
This distribution creates a new challenge: cross-server communication. When user A (on Server 1) sends a message to user E (on Server 2), how does the message traverse the server boundary? The naive approach of creating direct TCP connections between every pair of servers creates an unmanageable mesh topology that doesn't scale beyond a handful of servers.
5. Pub/Sub for Server Communication
The elegant solution lies in introducing a real-time pub/sub system, typically Redis. Instead of direct server-to-server connections, we create a publish/subscribe model where each Slack channel corresponds to a Redis pub/sub channel. Edge servers subscribe only to the channels that their connected users participate in, creating an efficient message distribution network.
Consider this scenario: User A belongs to channels C1, C2, and C3; User B belongs to C2, C4, and C5; User C belongs to C1 and C3; User D belongs to C4. When user E joins the system and participates in channel C3, Edge Server 2 automatically subscribes to the C3 pub/sub channel in Redis.
Now when user A sends a message to channel C3, the process flows seamlessly: the message gets persisted asynchronously to our message store for durability; Edge Server 1 immediately delivers the message to user C through their local WebSocket connection; simultaneously, it publishes the message to Redis on the C3 pub/sub channel; Edge Server 2, having subscribed to C3, receives the message and forwards it to user E. This architecture ensures that user E receives user A's message despite being connected to a different edge server.
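A minimal sketch of that flow with redis-py (channel naming and the local delivery helper are illustrative; persistence to the message store happens separately, as described above):
import json
import redis

r = redis.Redis()

def deliver_to_local_members(channel_id, message):
    # Placeholder: push through the WebSocket connections this edge server holds
    print(f"deliver to local members of {channel_id}: {message}")

# Edge Server 1: user A sends a message to channel C3
def on_message_from_user(channel_id, message):
    deliver_to_local_members(channel_id, message)            # e.g. user C, connected locally
    r.publish(f"channel:{channel_id}", json.dumps(message))  # fan out to other edge servers

# Edge Server 2: subscribed to C3 because user E participates in it
def run_subscriber(subscribed_channels):
    pubsub = r.pubsub()
    pubsub.subscribe(*[f"channel:{c}" for c in subscribed_channels])
    for item in pubsub.listen():
        if item["type"] == "message":
            channel_id = item["channel"].decode().removeprefix("channel:")
            deliver_to_local_members(channel_id, json.loads(item["data"]))  # e.g. user E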
6. Handling Transient Failures
Real-time systems must gracefully handle temporary failures. Network hiccups, server restarts, or client-side issues can cause messages to fail delivery through WebSocket connections. Our system addresses this through multiple resilience mechanisms.
We maintain correction jobs that periodically verify message delivery and fill any gaps. Additionally, failed messages are buffered in a separate Kafka topic. When WebSocket connections recover, we replay these buffered messages to ensure users eventually receive all communications, even if not in real-time.
7. Complete Architecture Overview
The final architecture brings together several specialised components. Our API servers continue handling non-real-time operations like channel scrolling, managing muted channels, and other traditional REST endpoints. The edge servers focus exclusively on maintaining WebSocket connections and routing real-time messages.
A connection balancer sits at the front of our edge server fleet, making intelligent routing decisions for new WebSocket connections. Rather than randomly distributing users, it attempts to colocate users who frequently communicate on the same edge servers. This optimisation reduces cross-server pub/sub traffic by keeping channel conversations local whenever possible.
The connection balancer also performs periodic rebalancing, strategically terminating connections to redistribute load across edge servers. While this creates temporary disruptions for affected users, it prevents the system from developing hotspots that could degrade performance for entire user cohorts.
This layered approach, combining persistent WebSocket connections, intelligent edge-server routing, Redis pub/sub for cross-server communication, and robust failure-recovery mechanisms, creates a real-time messaging system that maintains the instant responsiveness users expect while scaling to support millions of concurrent connections.
That's all for now folks. See you in the next blog!