Why Systems Slow Down — and What Smart Caching Teaches Us About Scalability


Modern applications usually start fast. But as traffic grows, so does the load on the backend — and somewhere along the way, things slow down.
Often, it’s not bad code or poor DB design — it’s the volume of repeated reads hitting your database like a DDoS. Caching becomes the first (and sometimes only) line of defense.
But caching isn’t just about speed. It’s about trade-offs — consistency, durability, and failure recovery.
So… what exactly is a cache?
A cache is memory that stores frequently accessed data, so you don’t have to hit your database or expensive downstream systems every time.
But in real systems, a cache is not just a faster version of your database. It’s a separate layer that has its own lifecycle, consistency rules, and edge cases.
Let’s start with the typical, unoptimized request flow:
sequenceDiagram
participant User
participant Frontend
participant Backend
participant DB
User->>Frontend: Request data
Frontend->>Backend: API Call
Backend->>DB: Query for data
DB-->>Backend: Response
Backend-->>Frontend: Render data
Repeat this for every user, every second, and your DB will cry for help.
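In code, that path is just a handler that queries the database on every single call. A minimal sketch, assuming a hypothetical `db.query` helper:

```python
def get_user(user_id, db):
    # No cache in front of the query: a thousand requests for the same
    # user mean a thousand identical round-trips to the database.
    return db.query("SELECT * FROM users WHERE id = %s", (user_id,))
```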
Choosing the Right Cache Strategy
1. Local (In-Process) Cache
With local caching, each server instance stores data in its own memory. It’s blazingly fast — there are no network hops, just RAM access.
But this comes at a cost. Since every instance has its own copy of the cache, data updates don’t automatically sync across them. This can lead to inconsistencies.
In setups with multiple instances or containers, keeping those copies in sync quickly becomes a fan-out problem: every update has to reach every instance.
graph TD
A[App Instance 1] --> C1[Local Cache]
B[App Instance 2] --> C2[Local Cache]
C[App Instance 3] --> C3[Local Cache]
To make this work reliably, you'd need sharding, coordination, and sometimes even replication logic — adding operational complexity.
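A local cache can be as simple as a module-level dictionary with a TTL. The sketch below is illustrative (the `get_user_cached` name and `db.query` helper are assumptions), and it also shows the weakness: the dictionary lives inside one process, so other instances never see it.

```python
import time

_local_cache = {}      # lives inside this single process
TTL_SECONDS = 60

def get_user_cached(user_id, db):
    entry = _local_cache.get(user_id)
    if entry is not None:
        value, stored_at = entry
        if time.time() - stored_at < TTL_SECONDS:
            return value                          # pure RAM access, no network hop
    value = db.query("SELECT * FROM users WHERE id = %s", (user_id,))
    _local_cache[user_id] = (value, time.time())  # other instances won't see this
    return value
```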
2. Global (Centralized) Cache
This is where tools like Redis or Memcached shine. Instead of each node caching data independently, all instances talk to a shared in-memory store.
graph TD
App1 --> Redis
App2 --> Redis
App3 --> Redis
Redis --> PostgreSQL
Now, if a value is updated, it’s immediately visible to all instances — solving the consistency problem. The downside? Every cache access is a network call. Still fast, but not as instant as a local memory lookup.
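With Redis in front, the same lookup becomes a network call to a store every instance shares. A sketch using redis-py, assuming a `users:{id}` key convention and JSON-serialized rows:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def get_user(user_id, db):
    key = f"users:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)       # one network hop, shared by all instances
    row = db.query("SELECT * FROM users WHERE id = %s", (user_id,))
    r.set(key, json.dumps(row), ex=60)  # TTL caps how long a stale value can live
    return row
```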
3. Distributed Cache with Sharding + Replication
This setup partitions the cache across nodes (sharding), and replicates data across machines for fault tolerance.
graph TD
Client --> Coordinator
Coordinator --> N1[Cache Node 1]
Coordinator --> N2[Cache Node 2]
Coordinator --> N3[Cache Node 3]
To maintain consistency, you typically use quorum logic:
If total nodes N = 3 and you write to 2 nodes (W = 2), then you must read from at least 2 nodes (R = 2) to be safe, because R + W > N guarantees the read set overlaps the write set.
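How a key lands on its shards, and why the 2 + 2 > 3 math works, can be sketched in a few lines. This is purely illustrative: real systems use consistent hashing and versioned replicas rather than a naive modulo.

```python
import hashlib

NODES = ["cache-node-1", "cache-node-2", "cache-node-3"]   # N = 3
W, R = 2, 2                                                # R + W > N

def replicas_for(key, count):
    # Hash the key onto the ring of nodes and take `count` consecutive nodes.
    start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(count)]

# A write goes to W nodes and a read queries R nodes. Because 2 + 2 > 3,
# at least one node in the read set also received the latest write.
print(replicas_for("user:42", W))
print(replicas_for("user:42", R))
```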
Handling Writes: Where Things Start Getting Real
➔ Write-Through Cache
Every write goes to both the cache and the database, synchronously.
sequenceDiagram
participant Client
participant Cache
participant DB
Client->>Cache: Write(key, value)
Cache->>DB: Write-through
DB-->>Cache: Ack
Cache-->>Client: Success
Reliable and consistent, but adds latency.
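The diagram shows the cache layer forwarding the write; at the application level, a common equivalent writes both stores synchronously before acknowledging. A sketch with redis-py and a hypothetical `db.execute` (names are illustrative; DB first, so the cache never holds data the database rejected):

```python
import json
import redis

r = redis.Redis()

def save_profile(user_id, profile, db):
    # Synchronous write-through: the caller waits for both stores,
    # so write latency = DB write + cache write.
    db.execute("UPDATE profiles SET data = %s WHERE user_id = %s",
               (json.dumps(profile), user_id))
    r.set(f"profiles:{user_id}", json.dumps(profile))
    return True
```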
➔ Write-Back (Write-Behind) Cache
Here, the write is stored in cache and acknowledged immediately. The database is updated later, often asynchronously.
sequenceDiagram
participant Client
participant Cache
participant DB
Client->>Cache: Write(key, value)
Cache-->>Client: Ack
Cache->>DB: Async flush
Fast, but if the cache crashes, you lose data unless you persist elsewhere.
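A write-behind sketch: update the cache, remember the pending write in a queue, and acknowledge immediately while a background worker flushes to the database. Illustrative only; a real implementation needs retries and a durable queue, which is exactly the data-loss risk above.

```python
import queue
import threading

_cache = {}
_pending = queue.Queue()

def write_back(key, value):
    _cache[key] = value          # fast path: cache is updated...
    _pending.put((key, value))   # ...and the DB write is deferred
    return True                  # acknowledge immediately

def flusher(db):
    # Background worker drains the queue and writes to the database later.
    while True:
        key, value = _pending.get()
        db.execute("UPDATE kv SET value = %s WHERE key = %s", (value, key))
        _pending.task_done()

# Started once at boot, e.g.:
# threading.Thread(target=flusher, args=(db,), daemon=True).start()
```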
➔ Write-Around Cache
Skip the cache entirely for writes. Cache only comes into play during reads.
sequenceDiagram
participant Client
participant DB
participant Cache
Client->>DB: Write
Client->>Cache: Read
Cache-->>Client: Miss
Cache->>DB: Fetch from DB
DB-->>Cache: Result
Cache-->>Client: Return data
Good for cold data, but every first read is a miss.
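In code, write-around simply means writes never touch the cache. A sketch with a hypothetical `db.execute`; the read side fills the cache lazily, as in the cache-aside sketch in the next section:

```python
import json

def write_around_update(product_id, data, db):
    # Write-around: only the database is written; the cache isn't touched.
    # The next read misses, fetches from the DB, and fills the cache lazily,
    # and any existing cached entry stays stale until its TTL expires.
    db.execute("UPDATE products SET data = %s WHERE id = %s",
               (json.dumps(data), product_id))
```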
➔ Cache-Aside (Lazy Loading)
App explicitly manages reads and writes. On a cache miss, fetch from DB and then write to cache. On writes, update DB and invalidate cache.
It gives full control but demands discipline.
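Cache-aside in a sketch: the read path is get, miss, load, set; the write path updates the database and then invalidates the key rather than updating it. Names and the key scheme are illustrative:

```python
import json
import redis

r = redis.Redis()

def get_product(product_id, db):
    key = f"products:{product_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)                  # hit
    row = db.query("SELECT * FROM products WHERE id = %s", (product_id,))
    r.set(key, json.dumps(row), ex=600)            # lazy load on miss
    return row

def update_product(product_id, data, db):
    db.execute("UPDATE products SET data = %s WHERE id = %s",
               (json.dumps(data), product_id))
    r.delete(f"products:{product_id}")             # invalidate, don't overwrite
```

Deleting the key instead of writing the new value keeps the pattern simple: the next read repopulates the cache from the source of truth.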
Ensuring Consistency with Quorum Reads
If you’re using a distributed cache, ensure R + W > N so that every read set overlaps at least one up-to-date node. Otherwise, you might serve stale data from a node that hasn’t received the latest write.
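A toy illustration of the read side: query R replicas and keep the value with the highest version, so the node that overlaps the write set always wins. The replica interface and versioning scheme here are assumptions:

```python
def quorum_read(key, replicas, r_count):
    # Each replica returns (version, value) or None. With R + W > N,
    # at least one of the R answers carries the latest write, and
    # max() on the version picks it.
    answers = [node.get(key) for node in replicas[:r_count]]
    answers = [a for a in answers if a is not None]
    return max(answers, key=lambda a: a[0])[1]
```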
Cache Invalidation: The Real Headache
Common patterns:
TTL: Keys expire after a fixed time.
Manual Invalidation: Delete the cache entry after DB write.
Pub/Sub: Broadcast cache bust messages.
Versioned Keys: Use versioning in keys to force reads to new data (sketched below).
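Versioned keys are easy to sketch with Redis: bump a version counter on every write and build read keys from the current version, so stale entries are simply never read again and age out via their TTL. Key names here are illustrative:

```python
import json
import redis

r = redis.Redis()

def product_cache_key(product_id):
    # Reads embed the current version, so bumping it "invalidates" old
    # entries without having to find and delete them.
    version = int(r.get(f"products:{product_id}:version") or 0)
    return f"products:{product_id}:v{version}"

def on_product_update(product_id, data, db):
    db.execute("UPDATE products SET data = %s WHERE id = %s",
               (json.dumps(data), product_id))
    r.incr(f"products:{product_id}:version")   # old keys are orphaned; TTLs reap them
```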
Eviction Strategies: When Memory Runs Out
Common Strategies
LRU (Least Recently Used)
LFU (Least Frequently Used)
Segmented LRU (used in Memcached)
flowchart LR
subgraph Cold Region
C1[Key 4] --> C2[Key 5]
end
subgraph Hot Region
H1[Key 1] --> H2[Key 2]
end
C2 -->|Used again| H3[Key 5]
H1 -->|Evicted| C3[Key 1]
Choose based on your app's access patterns; a minimal LRU sketch follows.
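An LRU cache is only a few lines with an ordered dictionary. A sketch of the idea, not a production eviction engine:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)           # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)    # evict the least recently used key
```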
Summary: Pick Based on Trade-offs
| Strategy | Consistency | Speed | Risk | Best For |
| --- | --- | --- | --- | --- |
| Write-Through | Strong | Medium | DB latency affects writes | Profiles, settings, payments |
| Write-Back | Eventual | Fast | Data loss if cache crashes | Logs, counters, analytics |
| Write-Around | Eventual | Medium | Cache misses on fresh data | Product catalogs, meta info |
| Cache-Aside | Manual | Flexible | Devs must invalidate cache | API-driven, GraphQL, mixed reads |
Before You Cache Anything...
Ask yourself:
Is the data read-heavy or write-heavy?
Can you tolerate eventual consistency?
How will you handle invalidation?
What’s your eviction strategy under load?
Final Thoughts
Caching is not just a performance trick — it’s a system design decision.
Used right, it can speed up systems by 10x. Used wrong, it silently causes data bugs that surface only in production.
Plan your cache like you're planning your database. Design for failure. Test for staleness.
Let’s build systems that scale and stay correct.
What caching patterns or disasters have you seen in production?
#Caching #SystemDesign #Redis #BackendEngineering #PerformanceOptimization #Microservices #Scalability #Architecture