Why Systems Slow Down — and What Smart Caching Teaches Us About Scalability

Pushkar kumar
5 min read

Modern applications usually start fast. But as traffic grows, so does the load on the backend — and somewhere along the way, things slow down.

Often, it’s not bad code or poor DB design — it’s the volume of repeated reads hitting your database like a DDoS. Caching becomes the first (and sometimes only) line of defense.

But caching isn’t just about speed. It’s about trade-offs — consistency, durability, and failure recovery.


So… what exactly is a cache?

A cache is a fast storage layer, usually in memory, that keeps frequently accessed data so you don’t have to hit your database or expensive downstream systems on every request.

But in real systems, a cache is not just a faster version of your database. It’s a separate layer that has its own lifecycle, consistency rules, and edge cases.

Let’s start with the typical, unoptimized request flow:

sequenceDiagram
    participant User
    participant Frontend
    participant Backend
    participant DB

    User->>Frontend: Request data
    Frontend->>Backend: API Call
    Backend->>DB: Query for data
    DB-->>Backend: Response
    Backend-->>Frontend: Response
    Frontend-->>User: Render data

Repeat this for every user, every second, and your DB will cry for help.


Choosing the Right Cache Strategy

1. Local (In-Process) Cache

With local caching, each server instance stores data in its own memory. It’s blazingly fast — there are no network hops, just RAM access.

But this comes at a cost. Since every instance has its own copy of the cache, data updates don’t automatically sync across them. This can lead to inconsistencies.

In setups with multiple services or containers, this quickly becomes a coherence problem: an update on one instance has to fan out to every other instance, or their copies drift apart.

graph TD
    A[App Instance 1] --> C1[Local Cache]
    B[App Instance 2] --> C2[Local Cache]
    C[App Instance 3] --> C3[Local Cache]

To make this work reliably, you'd need sharding, coordination, and sometimes even replication logic — adding operational complexity.
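
For reference, an in-process cache can be as small as the sketch below: a dictionary plus a TTL check behind a lock. The function names and the 60-second TTL are illustrative, not from any particular framework.

import time
import threading

_local_cache = {}          # key -> (value, expires_at)
_lock = threading.Lock()
TTL_SECONDS = 60

def cache_get(key):
    """Return a cached value if present and not expired, else None."""
    with _lock:
        entry = _local_cache.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() > expires_at:
            del _local_cache[key]   # lazily evict expired entries
            return None
        return value

def cache_set(key, value):
    with _lock:
        _local_cache[key] = (value, time.time() + TTL_SECONDS)

Every instance holds its own copy of this dictionary, which is exactly why updates on one node stay invisible to the others.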


2. Global (Centralized) Cache

This is where tools like Redis or Memcached shine. Instead of each node caching data independently, all instances talk to a shared in-memory store.

graph TD
    App1 --> Redis
    App2 --> Redis
    App3 --> Redis
    Redis --> PostgreSQL

Now, if a value is updated, it’s immediately visible to all instances — solving the consistency problem. The downside? Every cache access is a network call. Still fast, but not as instant as a local memory lookup.
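
As a rough sketch with the redis-py client: every instance connects to the same store, so a value written by one is visible to the rest on their next lookup. The hostname and key are made up for illustration.

import redis

# every app instance points at the same shared store
r = redis.Redis(host="cache.internal", port=6379, db=0)  # hypothetical shared hostname

# instance 1 updates a value ...
r.set("feature_flags:checkout", "v2")

# ... and instances 2 and 3 see it on their next lookup
print(r.get("feature_flags:checkout"))  # b"v2"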


3. Distributed Cache with Sharding + Replication

This setup partitions the cache across nodes (sharding), and replicates data across machines for fault tolerance.

graph TD
    Client --> Coordinator
    Coordinator --> N1[Cache Node 1]
    Coordinator --> N2[Cache Node 2]
    Coordinator --> N3[Cache Node 3]

To maintain consistency, you typically use quorum logic:

If total nodes = 3, and you write to 2 (W=2), then you must read from at least 2 (R=2) to be safe, because R + W > N.
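
The arithmetic behind that rule fits in a few lines: if the read and write quorums must overlap in at least one node, every read intersects the most recent successful write.

def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True when any read quorum must intersect any write quorum (R + W > N)."""
    return r + w > n

print(quorums_overlap(n=3, w=2, r=2))  # True:  every read sees the latest write
print(quorums_overlap(n=3, w=1, r=1))  # False: a read may land on a stale node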


Handling Writes: Where Things Start Getting Real

➔ Write-Through Cache

Every write goes to both the cache and the database, synchronously.

sequenceDiagram
    participant Client
    participant Cache
    participant DB

    Client->>Cache: Write(key, value)
    Cache->>DB: Write-through
    DB-->>Cache: Ack
    Cache-->>Client: Success

Reliable and consistent, but adds latency.
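
An application-level approximation of write-through might look like the sketch below: the call returns only after both the database and the cache have been written. The db accessor and key scheme are assumptions.

import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def save_user(user_id, data, db):
    """Write the database and the cache in the same synchronous call path."""
    db.update_user(user_id, data)               # assumed DB accessor
    r.set(f"user:{user_id}", json.dumps(data))  # cache updated before we acknowledge
    return True                                 # caller waits for both, hence the added latency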


➔ Write-Back (Write-Behind) Cache

Here, the write is stored in cache and acknowledged immediately. The database is updated later, often asynchronously.

sequenceDiagram
    participant Client
    participant Cache
    participant DB

    Client->>Cache: Write(key, value)
    Cache-->>Client: Ack
    Cache->>DB: Async flush

Fast, but if the cache crashes, you lose data unless you persist elsewhere.
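
A write-behind sketch: the write lands in the cache and on an in-memory queue, and a background worker flushes it to the database later. The queue, worker, and db accessor are illustrative; production systems usually back the pending writes with something durable.

import queue
import threading
import redis

r = redis.Redis(host="localhost", port=6379, db=0)
pending_writes = queue.Queue()

def save_counter(counter_id, value):
    """Acknowledge from the cache immediately; defer the DB write."""
    r.set(f"counter:{counter_id}", value)
    pending_writes.put((counter_id, value))

def flush_worker(db):
    """Runs in a background thread and drains queued writes into the database."""
    while True:
        counter_id, value = pending_writes.get()
        db.update_counter(counter_id, value)  # assumed DB accessor
        pending_writes.task_done()

# In the app's startup code (db is your database handle):
# threading.Thread(target=flush_worker, args=(db,), daemon=True).start()

If the process dies while entries are still sitting in pending_writes, those writes are gone, which is exactly the durability risk this pattern trades for speed.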


➔ Write-Around Cache

Skip the cache entirely for writes. Cache only comes into play during reads.

sequenceDiagram
    participant Client
    participant DB
    participant Cache

    Client->>DB: Write
    Client->>Cache: Read
    Cache-->>Client: Miss
    Cache->>DB: Fetch from DB
    DB-->>Cache: Result
    Cache-->>Client: Return data

Good for cold data, but every first read is a miss.
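
A write-around sketch is short, because the write path deliberately ignores the cache; the cache fills in later on the first read, as in the diagram above. The db accessor and key are made up.

import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def save_article(article_id, data, db):
    """Write straight to the database; the cache only gets populated on a later read."""
    db.update_article(article_id, data)  # assumed DB accessor
    r.delete(f"article:{article_id}")    # optional: drop any stale copy so the next read misses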


➔ Cache-Aside (Lazy Loading)

The app explicitly manages reads and writes. On a cache miss, fetch from the DB and then write to the cache. On writes, update the DB and invalidate the cache entry.

It gives full control but demands discipline.
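
A minimal cache-aside sketch, with the miss-then-fill read path and the write-then-invalidate write path both handled by the application. Key scheme, TTL, and db accessors are assumptions.

import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def get_order(order_id, db):
    key = f"order:{order_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)         # hit: serve straight from the cache
    order = db.fetch_order(order_id)      # miss: load from the source of truth
    r.setex(key, 600, json.dumps(order))  # populate, with a TTL as a safety net
    return order

def update_order(order_id, data, db):
    db.update_order(order_id, data)       # write the DB first
    r.delete(f"order:{order_id}")         # then invalidate so the next read re-fills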


Ensuring Consistency with Quorum Reads

If you’re using a distributed cache, ensure R + W > N so that every read quorum overlaps at least one node holding the latest write. Otherwise, you might serve stale data from a node that hasn’t received it yet.


Cache Invalidation: The Real Headache

Common patterns (a short Redis sketch follows the list):

  • TTL: Keys expire after a fixed time.

  • Manual Invalidation: Delete the cache entry after DB write.

  • Pub/Sub: Broadcast cache bust messages.

  • Versioned Keys: Embed a version in the key so readers automatically pick up fresh data.
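
Three of these, sketched with redis-py (a pub/sub bust would use r.publish on a channel the other instances subscribe to). Key names are illustrative.

import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# TTL: the entry simply disappears after 60 seconds
r.setex("price:sku-123", 60, "19.99")

# Manual invalidation: delete right after the DB write succeeds
r.delete("price:sku-123")

# Versioned keys: bump a version counter so readers build a brand-new key
version = r.incr("price:version")
r.set(f"price:sku-123:v{version}", "21.49")  # old-version keys age out via TTL or eviction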


Eviction Strategies: When Memory Runs Out

Common Strategies

  • LRU (Least Recently Used)

  • LFU (Least Frequently Used)

  • Segmented LRU (used in Memcached)

flowchart LR
    subgraph Cold Region
        C1[Key 4] --> C2[Key 5]
    end

    subgraph Hot Region
        H1[Key 1] --> H2[Key 2]
    end

    C2 -->|Used again| H3[Key 5]
    H1 -->|Evicted| C3[Key 1]

Choose based on your app's access patterns.
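
For a feel of how LRU behaves, here is a tiny illustrative implementation on top of OrderedDict; real systems approximate the same idea (Redis with an allkeys-lru policy, Memcached with its segmented LRU).

from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used key once capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the oldest (least recently used) entry

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used
cache.put("c", 3)  # evicts "b"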


Summary: Pick Based on Trade-offs

Strategy      | Consistency | Speed    | Risk                        | Best For
Write-Through | Strong      | Medium   | DB latency affects writes   | Profiles, settings, payments
Write-Back    | Eventual    | Fast     | Data loss if cache crashes  | Logs, counters, analytics
Write-Around  | Eventual    | Medium   | Cache misses on fresh data  | Product catalogs, meta info
Cache-Aside   | Manual      | Flexible | Devs must invalidate cache  | API-driven, GraphQL, mixed reads

Before You Cache Anything...

Ask yourself:

  • Is the data read-heavy or write-heavy?

  • Can you tolerate eventual consistency?

  • How will you handle invalidation?

  • What’s your eviction strategy under load?


Final Thoughts

Caching is not just a performance trick — it’s a system design decision.

Used right, it can speed up systems by 10x. Used wrong, it silently causes data bugs that surface only in production.

Plan your cache like you're planning your database. Design for failure. Test for staleness.

Let’s build systems that scale and stay correct.


What caching patterns or disasters have you seen in production?

#Caching #SystemDesign #Redis #BackendEngineering #PerformanceOptimization #Microservices #Scalability #Architecture
