25 Golden Rules of System Design

Nitin Singh

🟡 Rule #1: For Read-Heavy Systems – Use Caching

In read-heavy systems—like social feeds, product pages, or dashboards—the same data is often requested over and over. Hitting the database for every request is wasteful and can lead to latency and scaling issues.

Caching is your go-to strategy here.

By storing frequently accessed data in-memory (using tools like Redis or Memcached), you can dramatically reduce response times and offload pressure from your primary database.

✅ When to Use Caching:

  • High read-to-write ratio

  • Expensive or repetitive database queries

  • Performance-critical endpoints (e.g., homepage, trending section)

⚙️ Common Tools:

  • Redis – fast, in-memory data store

  • CDNs – cache static content (images, videos, stylesheets)

  • Local in-app cache – for small-scale or single-node apps

💡 Interview Tip: If asked about scaling read-heavy workloads, caching should be one of the first solutions you mention. Be prepared to discuss cache invalidation strategies and data freshness.
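The usual shape of this is the cache-aside pattern: check the cache, fall back to the database on a miss, then populate the cache. A minimal in-process sketch in Python (the dict-backed "database", key names, and TTL are illustrative; in production the cache would typically be Redis or Memcached):

```python
import time

class TTLCache:
    """Tiny in-process cache with per-entry expiry."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, inserted_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None and time.monotonic() - entry[1] < self.ttl:
            return entry[0]
        return None  # miss or expired

    def set(self, key, value):
        self.store[key] = (value, time.monotonic())

def get_user(user_id, cache, db, stats):
    """Cache-aside read path: only misses ever touch the database."""
    value = cache.get(user_id)
    if value is not None:
        return value
    stats["db_reads"] += 1
    value = db[user_id]       # the "expensive" database read
    cache.set(user_id, value)
    return value
```

The TTL doubles as a simple invalidation strategy: it bounds how stale cached data can get, which is one honest answer to the data-freshness question above.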


🟡 Rule #2: For Low-Latency Requirements – Use Cache + CDN

In latency-sensitive applications—like media streaming, global dashboards, or news portals—delivering content quickly is critical. Even minor delays can impact user experience.

To minimize latency, especially for geographically distributed users, combine in-memory caching with a Content Delivery Network (CDN).

While caching accelerates dynamic content, CDNs serve static assets (like images, JS, and CSS) from edge servers close to the user’s location.

✅ When to Use:

  • Global or regionally distributed users

  • Static content (media, stylesheets, scripts)

  • Performance-critical first-load experiences

⚙️ Common Tools:

  • Cloudflare, CloudFront, Akamai – for static asset delivery

  • Redis, browser cache, or service worker cache – for dynamic content

💡 Interview Tip: Mention Cache + CDN when optimizing for speed in global systems or user-facing apps where latency directly affects UX.


🟡 Rule #3: For Write-Heavy Systems – Use Message Queues

In systems where write operations are frequent and bursty—like payment logs, user events, or order processing—direct writes to the database can become a bottleneck.

Message Queues help decouple the write path. Instead of writing directly to the DB, incoming data is pushed to a queue and processed asynchronously by background workers.

This improves system reliability, absorbs traffic spikes, and prevents overload on the database or downstream services.

✅ When to Use:

  • Systems with high write volumes

  • Spiky traffic patterns (e.g., flash sales, uploads)

  • Event-driven architectures (e.g., activity logs, metrics)

⚙️ Common Tools:

  • Apache Kafka, RabbitMQ, Amazon SQS

  • Kafka Streams, Celery, or custom consumers for processing

💡 Interview Tip: Use message queues when you need to buffer write load, ensure durability, or decouple services for better fault tolerance.
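The decoupling can be sketched with the standard library, where `queue.Queue` stands in for a real broker like Kafka or SQS and a plain list stands in for the database (event payloads are made up):

```python
import queue
import threading

events = queue.Queue()   # stands in for Kafka / RabbitMQ / SQS
db_rows = []             # stands in for the real database

def worker():
    """Background consumer: drains the queue and does the slow write."""
    while True:
        item = events.get()
        if item is None:          # sentinel: shut down
            events.task_done()
            break
        db_rows.append(item)      # the expensive write happens off the hot path
        events.task_done()

consumer = threading.Thread(target=worker)
consumer.start()

# The write path just enqueues and returns immediately,
# so a burst of requests never blocks on the database.
for i in range(1000):
    events.put({"event_id": i, "type": "order_placed"})

events.put(None)   # signal shutdown
consumer.join()
```

The producer side is constant-time regardless of how slow the consumer is, which is exactly how a queue absorbs traffic spikes.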


🟡 Rule #4: For ACID-Compliant Needs – Use Relational Databases

When your system requires strong consistency, integrity, and transactional support, a relational database (SQL) is the ideal choice.

ACID properties (Atomicity, Consistency, Isolation, Durability) are crucial in use cases like financial transactions, inventory systems, or user account management—where even a single corrupted write can lead to serious issues.

Relational databases ensure that all operations happen reliably and in the right order, using structured schemas and constraints.

✅ When to Use:

  • Financial systems (e.g., banking, billing)

  • Systems requiring multi-step transactions

  • Data with strong relational integrity (e.g., users, orders, items)

⚙️ Common Tools:

  • PostgreSQL, MySQL, Oracle, SQL Server

  • Use with ORMs like Hibernate or JPA in Java-based stacks

💡 Interview Tip: Emphasize SQL databases for systems where correctness, referential integrity, and rollback support are non-negotiable.
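Atomicity is easy to demonstrate with SQLite, which ships with Python (account names and amounts are invented for the example). The `CHECK` constraint rejects overdrafts, and a failed transfer rolls back the credit that already executed inside the same transaction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically: either both updates commit, or neither does."""
    try:
        with conn:  # opens a transaction; rolls back automatically on exception
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK failed: the debit went negative
        return False

def balance(conn, who):
    return conn.execute("SELECT balance FROM accounts WHERE id = ?", (who,)).fetchone()[0]
```

Note the ordering: the credit runs first, so the failing debit proves the rollback actually undoes committed-looking work within the transaction.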


🟡 Rule #5: For Unstructured, Non-ACID Data – Use NoSQL Databases

Not all data fits neatly into tables. When dealing with unstructured, semi-structured, or high-volume data that doesn’t require strict ACID guarantees, NoSQL databases are often a better fit.

They offer flexible schemas, horizontal scalability, and high performance—ideal for use cases like analytics, user activity logs, or content storage.

Depending on your needs, you can choose from document stores, key-value stores, wide-column stores, or graph databases.

✅ When to Use:

  • Schema-less or frequently changing data

  • High write throughput, low consistency needs

  • User preferences, analytics, activity logs, product catalogs

⚙️ Common Tools:

  • MongoDB (Document Store)

  • Cassandra, HBase (wide-column); DynamoDB (key-value/document)

  • Redis, Riak (Key-value)

💡 Interview Tip: NoSQL is ideal when flexibility and scalability outweigh the need for strict transactional integrity.


🟡 Rule #6: For Storing Large Binary Files – Use Blob/Object Storage

Databases are not optimized for handling large binary files like images, videos, PDFs, or backups. Storing these directly in a database can slow down queries, increase storage costs, and complicate scaling.

Instead, use object storage services designed for handling large, unstructured binary data efficiently.

These services store files as objects in a flat namespace and offer features like versioning, metadata, and lifecycle management—ideal for media-heavy applications.

✅ When to Use:

  • User-uploaded media (images, videos, resumes)

  • File backups, logs, data exports

  • Any content > a few MB that doesn’t need querying

⚙️ Common Tools:

  • AWS S3, Google Cloud Storage, Azure Blob Storage

  • Integrate with CDNs for faster delivery

💡 Interview Tip: Always separate media storage from your core database—mention object storage for scalability, cost-efficiency, and performance.


🟡 Rule #7: For Precomputed, Personalized Data – Use Message Queues + Cache

When serving personalized feeds or recommendation lists (like newsfeeds, dashboards, or notifications), computing data on the fly can be slow and expensive.

A better approach: precompute data asynchronously using message queues, then store the result in a cache for fast access.

This decouples the heavy lifting from the real-time response cycle and ensures a snappy user experience.

✅ When to Use:

  • Personalized homefeeds, activity streams, or dashboards

  • Data aggregation (likes, followers, scores)

  • Systems with delayed consistency needs

⚙️ Common Tools:

  • Kafka, RabbitMQ – for background processing

  • Redis, Memcached – to cache precomputed results

💡 Interview Tip: For feed-based apps, suggest precomputing via a queue and caching the result for quick reads—this shows a strong grasp of performance trade-offs.


🟡 Rule #8: For High-Volume Search Use Cases – Use a Search Index or Engine

When your system needs to support fast, full-text search across large datasets—like product catalogs, documents, or user profiles—a standard database query won’t cut it.

Instead, use a search engine or custom search index to handle queries like keyword matching, filtering, and ranking efficiently.

These engines are optimized for text analysis, fuzzy matching, autocomplete, and relevance scoring.

✅ When to Use:

  • E-commerce search, site-wide search bars

  • Document search, tagging systems, user search

  • Features like autocomplete, typo-tolerance, and filters

⚙️ Common Tools:

  • Elasticsearch, OpenSearch, Apache Solr

  • Custom Tries or inverted indexes for lightweight/local use

💡 Interview Tip: Mention Elasticsearch or similar tools when designing anything involving full-text search, autocomplete, or filtering at scale.
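At their core, these engines are built on inverted indexes: a map from each token to the set of documents containing it. A toy version in Python (the documents and whitespace tokenization are deliberately simplistic; real engines add stemming, ranking, and typo tolerance):

```python
from collections import defaultdict

class InvertedIndex:
    """Maps each token to the set of document IDs containing it."""
    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for token in text.lower().split():
            self.postings[token].add(doc_id)

    def search(self, query):
        """AND-semantics keyword search: docs containing every query token."""
        tokens = query.lower().split()
        if not tokens:
            return set()
        results = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            results &= self.postings.get(token, set())
        return results

idx = InvertedIndex()
idx.add(1, "Red running shoes")
idx.add(2, "Blue running jacket")
idx.add(3, "Red rain jacket")
```

A multi-word query becomes a set intersection over postings lists, which is why keyword search stays fast even on large corpora.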


🟡 Rule #9: For Scaling SQL Databases – Use Database Sharding

As your system grows, a single SQL database can become a performance bottleneck. Instead of scaling vertically (adding more CPU/RAM), shard your database—split it horizontally across multiple machines.

Each shard handles a portion of the data (e.g., by user ID or region), reducing the load on any single server and allowing for parallel processing.

✅ When to Use:

  • Large-scale systems with millions of users or records

  • Uneven load distribution across data

  • Need to scale read/write throughput beyond a single DB

⚙️ Common Approaches:

  • Range-based sharding (e.g., user ID ranges)

  • Hash-based sharding

  • Geo-based sharding (e.g., region/country)

💡 Interview Tip: When your SQL database can’t scale further, sharding is your go-to solution. Be ready to explain how you’d split data and handle cross-shard queries.
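Hash-based sharding fits in a few lines (shard names and count are illustrative). Be ready to name its known drawback: with plain hash-mod-N routing, changing the shard count remaps most keys, which is the problem consistent hashing (Rule #24) addresses:

```python
import hashlib

NUM_SHARDS = 8
SHARDS = [f"db-shard-{i}" for i in range(NUM_SHARDS)]

def shard_for(user_id):
    """Stable hash of the shard key, so a given user always routes to the same shard."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % NUM_SHARDS]
```

Using a hash (rather than raw `user_id % N`) also evens out skew when IDs are sequential or clustered.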


🟡 Rule #10: For High Availability & Load Handling – Use a Load Balancer

When traffic increases, you need multiple servers to handle the load. But how do you distribute traffic across them efficiently?

Use a Load Balancer to route requests evenly across multiple instances. This improves both availability and scalability by preventing any single server from becoming a bottleneck or point of failure.

Load balancers also help with automatic failover, health checks, and SSL termination.

✅ When to Use:

  • Applications deployed on multiple servers

  • Auto-scaling systems in cloud environments

  • Any system that requires high uptime and fault tolerance

⚙️ Common Tools:

  • HAProxy, NGINX, AWS Elastic Load Balancer (ELB)

  • Round-robin, least-connections, and IP-hash strategies

💡 Interview Tip: Mention load balancers when asked how to scale an app horizontally or handle failover in multi-node deployments.
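Round-robin with health checks is the easiest strategy to reason about. A minimal sketch (server names are placeholders; real balancers also probe health endpoints on a timer rather than being told explicitly):

```python
class RoundRobinBalancer:
    """Cycles through a server pool, skipping instances that failed health checks."""
    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)
        self._next = 0

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
```

Swapping in least-connections or IP-hash only changes `pick()`; the failover behavior (skip unhealthy nodes) stays the same.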


🟡 Rule #11: For Faster Global Delivery – Use a CDN

If your users are spread across the globe, latency becomes a real challenge. Static assets like images, scripts, and videos should not be served from a single origin server.

A Content Delivery Network (CDN) distributes static files to edge servers located closer to users, drastically reducing load times and improving performance.

CDNs also reduce traffic on your origin server and absorb sudden traffic spikes.

✅ When to Use:

  • Websites or apps with global users

  • Heavy static content like media, fonts, CSS, and JS

  • SEO and performance-sensitive web apps

⚙️ Common Tools:

  • Cloudflare, AWS CloudFront, Akamai, Fastly

  • Integrate with object storage for seamless delivery

💡 Interview Tip: Suggest a CDN when optimizing global response times or offloading static asset traffic from your backend.


🟡 Rule #12: For Relationship-Based Data – Use a Graph Database

When your system needs to model and query complex relationships—like social connections, recommendation networks, or maps—a traditional relational or NoSQL database can become inefficient.

A Graph Database is purpose-built for this. It stores data as nodes and edges, making it easier and faster to traverse relationships in real time.

Queries like “Who are the mutual friends of A and B?” or “What’s the shortest path between two users?” are what graph databases excel at.

✅ When to Use:

  • Social networks (friends, followers, mutuals)

  • Recommendation engines (people you may know, similar items)

  • Network graphs (routing, maps, dependencies)

⚙️ Common Tools:

  • Neo4j, Amazon Neptune, ArangoDB

  • Query languages like Cypher (Neo4j) or Gremlin for graph traversal

💡 Interview Tip: Use graph databases when your queries are relationship-driven and involve many hops—especially in social or networked systems.
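Conceptually, those two queries are a set intersection and a breadth-first traversal over nodes and edges. A toy adjacency-set version in Python (the graph is invented; in Neo4j you would express the same queries declaratively in Cypher instead of hand-rolling BFS):

```python
from collections import deque

# Toy social graph as adjacency sets; a graph DB stores nodes and edges natively.
friends = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B", "E"},
    "E": {"D"},
}

def mutual_friends(graph, a, b):
    """'Who are the mutual friends of A and B?' is just a set intersection."""
    return graph[a] & graph[b]

def degrees_of_separation(graph, start, goal):
    """BFS shortest path: the minimum number of hops between two users."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if node == goal:
            return hops
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return None  # unreachable
```

The point of a graph database is that multi-hop traversals like this stay fast without the repeated self-joins a relational schema would need.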


🟡 Rule #13: For Scaling Individual System Components – Use Horizontal Scaling

When one part of your system—like your web server or database—is overwhelmed, the best way to scale is to add more instances of that component rather than making a single machine bigger.

This is known as horizontal scaling, and it improves availability, distributes load, and supports auto-scaling in cloud environments.

Unlike vertical scaling (adding CPU/RAM), horizontal scaling is more fault-tolerant and cost-effective over time.

✅ When to Use:

  • Web servers, app servers, microservices

  • Systems under dynamic or growing load

  • Cloud-native or containerized infrastructure

⚙️ Common Tools:

  • Kubernetes, Docker Swarm – for service orchestration

  • AWS Auto Scaling, GCP Instance Groups

💡 Interview Tip: Always suggest horizontal scaling over vertical scaling in system design questions—it's more resilient, scalable, and cloud-friendly.


🟡 Rule #14: For Fast Query Performance – Use Database Indexes

When your application needs to query large datasets quickly, scanning every row becomes inefficient. That’s where database indexes come in.

An index is like a lookup table that helps the database find rows faster—just like an index in a book helps you find a topic without flipping through every page.

Used correctly, indexes can drastically reduce query time, especially for filters, sorts, and joins.

✅ When to Use:

  • Queries on large tables with filters (e.g., WHERE, JOIN, ORDER BY)

  • Frequently searched fields (e.g., user ID, email, timestamps)

  • Read-heavy analytics or reporting dashboards

⚙️ Common Tools:

  • B-Tree Index (default in most databases)

  • Hash Index, GIN, Composite Indexes

  • Available in PostgreSQL, MySQL, MongoDB, and others

💡 Interview Tip: If asked about optimizing slow queries, mention indexes early—but also highlight trade-offs like increased storage and slower writes.
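You can watch an index change the query plan directly in SQLite, which ships with Python (table and index names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(10_000)],
)

# Without this index the lookup scans all 10,000 rows;
# with it, the query becomes a B-tree search.
conn.execute("CREATE INDEX idx_users_email ON users (email)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE email = ?",
    ("user42@example.com",),
).fetchall()
print(plan[0][-1])  # plan detail should mention idx_users_email
```

The trade-off shows up on the write path: every `INSERT` and `UPDATE` now maintains the index too, which is why you index the columns you filter on rather than everything.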


🟡 Rule #15: For Bulk Processing Jobs – Use Batch Processing + Message Queues

Some operations—like sending emails to millions of users, generating reports, or processing logs—are too large or slow for real-time handling.

The best approach? Batch processing using background workers that consume jobs from a message queue.

This decouples the work from your user-facing services, improves system responsiveness, and handles large workloads efficiently over time.

✅ When to Use:

  • Scheduled or bulk operations (e.g., data cleanup, daily summaries)

  • High-volume event logs or metrics pipelines

  • Background email/SMS delivery systems

⚙️ Common Tools:

  • Apache Kafka, RabbitMQ, Amazon SQS – for queuing

  • Airflow, Celery, Spark, or custom worker scripts – for batch processing

💡 Interview Tip: When asked how to handle large-scale background tasks, suggest queue-based batch processing to demonstrate separation of concerns and resilience.
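Inside most batch workers, the core move is just chunking: split one huge job into fixed-size batches so each unit of work stays small, bounded, and retryable. A minimal helper (the batch size and job count are arbitrary):

```python
def batches(items, size):
    """Yield consecutive fixed-size chunks; the last chunk may be smaller."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# e.g. 10,050 queued emails become 11 batches of at most 1,000 each,
# so a failure only forces a retry of one batch, not the whole job.
jobs = list(batches(list(range(10_050)), size=1_000))
```

Each chunk then becomes one message on the queue, which is what makes a failed batch cheap to retry.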


🟡 Rule #16: For Controlling Excessive Requests – Implement a Rate Limiter

APIs and services exposed to the public or high-traffic environments are vulnerable to abuse, accidental flooding, or denial-of-service (DoS) attacks.

A rate limiter protects your system by restricting how many requests a user or client can make within a given time frame. It also ensures fair usage and helps maintain system stability under load.

✅ When to Use:

  • Public APIs and login systems

  • Expensive or sensitive endpoints (e.g., payment, search)

  • Preventing brute-force or bot attacks

⚙️ Common Strategies:

  • Token Bucket, Leaky Bucket, Fixed Window, Sliding Window

  • Enforced at API Gateway, load balancer, or app level

⚙️ Common Tools:

  • NGINX, Envoy, Kong, Rate-limiting middleware in frameworks

  • Redis-backed counters for distributed rate limiting

💡 Interview Tip: Rate limiting is a strong follow-up to any question on system reliability, abuse prevention, or public API design.


🟡 Rule #17: For Microservice Architectures – Use an API Gateway

In microservices, each service handles a specific function—but exposing them directly to the client leads to complexity and security risks.

An API Gateway acts as a single entry point for all client requests. It routes traffic to the correct microservice, handles authentication, rate limiting, request transformation, and can even serve cached responses.

It simplifies communication and centralizes cross-cutting concerns.

✅ When to Use:

  • Microservice-based systems

  • When you need centralized control over authentication, logging, or throttling

  • Public-facing APIs with multiple internal services behind them

⚙️ Common Tools:

  • Kong, NGINX, AWS API Gateway, Istio, Zuul

  • Integrated with OAuth2, JWT, rate limiters, and logging systems

💡 Interview Tip: When discussing microservices, always mention using an API Gateway to manage external access, reduce exposure, and enforce common policies.
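The routing-plus-authentication core of a gateway fits in a few lines. A deliberately simplified sketch (paths, API keys, and services are invented; real gateways add TLS termination, rate limiting, retries, and request transformation):

```python
class ApiGateway:
    """Single entry point: authenticates, then routes by path prefix to a backend."""
    def __init__(self, routes, api_keys):
        self.routes = routes        # path prefix -> service handler
        self.api_keys = api_keys    # valid client credentials

    def handle(self, path, api_key):
        if api_key not in self.api_keys:
            return 401, "unauthorized"   # cross-cutting concern handled once, centrally
        for prefix, service in self.routes.items():
            if path.startswith(prefix):
                return 200, service(path)
        return 404, "no such route"

gateway = ApiGateway(
    routes={
        "/orders": lambda p: f"orders-service handled {p}",
        "/users": lambda p: f"users-service handled {p}",
    },
    api_keys={"secret-key-123"},
)
```

Clients see one host and one auth scheme; the internal service topology can change freely behind the gateway.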


🟡 Rule #18: For Avoiding Single Points of Failure – Add Redundancy

A system is only as strong as its most fragile component. If a single server, database, or network link goes down and takes the whole system with it—you have a single point of failure.

To build resilient systems, introduce redundancy at every critical layer. This means duplicating resources—like having multiple servers, replicas, or network paths—so the system can continue operating even if one part fails.

✅ When to Use:

  • Any production system requiring high availability

  • Database layers, application servers, load balancers

  • Multi-AZ or multi-region deployments

⚙️ Common Strategies:

  • Database replication, multi-node clusters

  • Redundant network paths, failover servers, hot/cold backups

  • Cloud auto-scaling groups with health checks

💡 Interview Tip: Always bring up redundancy when asked about availability, fault tolerance, or disaster recovery in system design.


🟡 Rule #19: For Fault Tolerance and Durability – Use Data Replication

In distributed systems, losing data during a crash or network failure can be catastrophic. Data replication ensures that copies of your data exist across multiple servers or locations—so if one node goes down, your system keeps running.

Replication increases both durability and availability. It also helps balance read operations across nodes and enables disaster recovery.

✅ When to Use:

  • Systems requiring high data durability and uptime

  • Distributed databases and file storage systems

  • Read-heavy applications (with replicated read nodes)

⚙️ Common Strategies:

  • Leader-follower (master-slave) replication

  • Multi-leader (multi-master) replication

  • Quorum-based writes and reads

⚙️ Tools:

  • PostgreSQL, MongoDB, Cassandra, MySQL, HDFS

💡 Interview Tip: When discussing data safety or failover, mention replication strategies and consistency trade-offs (eventual vs strong).
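The quorum rule is worth internalizing: with N replicas, writing to W of them and reading from R guarantees that every read overlaps some up-to-date replica exactly when R + W > N. A small brute-force check of that claim:

```python
from itertools import combinations

def quorum_rule(n, w, r):
    """The textbook condition for read-your-writes overlap across quorums."""
    return r + w > n

def quorums_always_overlap(n, w, r):
    """Brute force: does every possible write quorum intersect every read quorum?"""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )
```

For example, N=3 with W=2 and R=2 is consistent, while W=1 and R=1 lets a read miss the only replica that saw the write.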


🟡 Rule #20: For Real-Time, Bi-Directional Communication – Use WebSockets

Traditional HTTP is request-response only, meaning the client must initiate every interaction. But for real-time apps—like chat, live notifications, or collaborative tools—you need two-way, persistent communication.

That’s where WebSockets come in.

WebSockets allow clients and servers to open a continuous connection, enabling real-time, bi-directional data flow without repeatedly polling the server.

✅ When to Use:

  • Chat applications, multiplayer games, live dashboards

  • Real-time notifications and event updates

  • Collaborative editing tools

⚙️ Common Tools:

  • Socket.IO, STOMP, Spring WebSocket, SignalR

  • Built into modern browsers and supported by most backend frameworks

💡 Interview Tip: Mention WebSockets when asked how to push updates from server to client instantly without polling.


🟡 Rule #21: For Detecting Failures in Distributed Systems – Implement a Heartbeat Mechanism

In a distributed system, simply assuming that nodes or services are alive can lead to cascading failures. Instead, implement a heartbeat mechanism—a lightweight, periodic signal sent between nodes to verify health and availability.

If a service or server stops sending heartbeats within a defined interval, it can be marked as unavailable, triggering failover or recovery logic.

✅ When to Use:

  • Distributed systems and microservices

  • Cluster coordination, leader election, and failover detection

  • Systems requiring high uptime and automated recovery

⚙️ Common Tools:

  • Consul, Zookeeper, Eureka

  • Custom heartbeats using health check endpoints + schedulers

💡 Interview Tip: When asked how your system detects failures or manages node health, heartbeat mechanisms show you understand system resilience in depth.
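The bookkeeping side of a heartbeat detector is small. A sketch with the clock passed in explicitly so it stays deterministic (the timeout and node names are illustrative; a real system would use a monotonic clock, a scheduler, and usually a few missed intervals before declaring a node dead):

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per node and flags the ones that went silent."""
    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}  # node -> timestamp of its last heartbeat

    def record_beat(self, node, now):
        self.last_seen[node] = now

    def suspected_down(self, now):
        """Nodes whose last beat is older than the timeout window."""
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)

monitor = HeartbeatMonitor(timeout_seconds=15)
monitor.record_beat("node-a", now=100)
monitor.record_beat("node-b", now=100)
monitor.record_beat("node-a", now=110)   # node-b stops sending beats
```

Whatever `suspected_down` returns would feed the failover logic: remove the node from rotation, trigger leader election, or page an operator.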


🟡 Rule #22: For Ensuring Data Integrity – Use Checksums

When transmitting or storing critical data—especially across unreliable networks or disks—you need to verify that it hasn’t been altered or corrupted.

Checksums are small, fixed-size digests (e.g., MD5, SHA-256) computed from the original data. By comparing the stored checksum with one recalculated after transmission, you can detect corruption or tampering.

✅ When to Use:

  • File transfers, data replication, backups

  • Network packet validation

  • Verifying software downloads or uploads

⚙️ Common Tools:

  • MD5, SHA-1, SHA-256 (prefer SHA-256 when deliberate tampering, not just accidental corruption, is a concern)

  • Built into most programming languages and database engines

💡 Interview Tip: Mention checksums when asked about data integrity in replication, backups, or network transfer reliability.
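Verifying a transfer is a couple of lines with `hashlib` (the payload bytes here are made up; SHA-256 is a reasonable default when you care about tampering as well as corruption):

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Fixed-size digest: any change to the input changes the output."""
    return hashlib.sha256(data).hexdigest()

payload = b"important backup contents"
checksum = sha256_hex(payload)          # stored or sent alongside the data

# On the receiving side, recompute and compare.
received_ok = payload
received_corrupt = b"important backup c0ntents"   # a single flipped character
```

The receiver never trusts the bytes alone: a digest mismatch means re-fetch or restore from another replica.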


🟡 Rule #23: For Decentralized Communication – Use Gossip Protocol

In peer-to-peer or decentralized systems, you can’t rely on a central node for coordination. That’s where Gossip Protocols shine.

Each node periodically shares information with a random peer. Over time, data like membership, health, or state spreads through the system—just like gossip in a social group.

It’s scalable, fault-tolerant, and eventually consistent.

✅ When to Use:

  • Peer-to-peer networks

  • Cluster membership tracking and health checking

  • Distributed databases and consensus mechanisms

⚙️ Common Tools:

  • Serf, Consul, Cassandra, ScyllaDB

💡 Interview Tip: Bring up gossip protocols when discussing decentralized systems or cluster coordination without a single point of control.
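You can watch the epidemic spread in a tiny simulation (cluster size and random seed are arbitrary; real implementations such as SWIM-style protocols gossip membership and failure suspicions, not a single rumor):

```python
import random

def gossip_until_converged(num_nodes, seed=42, max_rounds=1000):
    """Each round, every node that knows the rumor tells one random peer.
    Returns the number of rounds until every node knows it."""
    rng = random.Random(seed)   # fixed seed keeps the simulation reproducible
    knows = [False] * num_nodes
    knows[0] = True             # the rumor starts at node 0
    for round_no in range(1, max_rounds + 1):
        for node in [n for n in range(num_nodes) if knows[n]]:
            knows[rng.randrange(num_nodes)] = True
        if all(knows):
            return round_no
    return None
```

The number of informed nodes roughly doubles each round, which is why gossip converges in O(log N) rounds with no coordinator.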


🟡 Rule #24: For Efficient Node-Based Load Distribution – Use Consistent Hashing

In systems with a dynamic set of servers or caches, you need a way to assign keys (like user sessions or cache items) to nodes without causing massive re-distribution when nodes are added or removed.

Consistent Hashing solves this. It maps nodes and keys to a circular hash space, minimizing reassignments and keeping the system stable under change.

✅ When to Use:

  • Distributed caching (e.g., Memcached, Redis clusters)

  • Sharded databases or DHTs (Distributed Hash Tables)

  • Load balancing with dynamic server pools

⚙️ Common Tools:

  • Built into Cassandra, Voldemort, Envoy, Akka Cluster

💡 Interview Tip: Use consistent hashing when your system has dynamic scaling needs or distributed key-value stores.
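A compact ring with virtual nodes looks like this (the vnode count and node names are illustrative). The property worth demonstrating: removing a node only remaps the keys that node owned, leaving everything else in place:

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing with virtual nodes on a circular hash space."""
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []   # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _position(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Many virtual points per node smooths out the key distribution.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._position(f"{node}#vn{i}"), node))

    def remove_node(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key):
        """Walk clockwise from the key's position to the next node on the ring."""
        pos = self._position(key)
        idx = bisect.bisect(self._ring, (pos, ""))
        if idx == len(self._ring):
            idx = 0   # wrap around the circle
        return self._ring[idx][1]
```

Contrast this with hash-mod-N routing, where shrinking the pool by one node reshuffles the majority of keys.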


🟡 Rule #25: For Location-Based Systems – Use Quadtree or Geohash

When building features like “Find nearby restaurants” or “Map search within 10km,” you need spatial indexing to query by location efficiently.

Use structures like Quadtrees or Geohashes to break the world into grid-like zones that allow fast range queries.

They outperform brute-force distance checks at scale and are commonly used in mapping, delivery, and geolocation apps.

✅ When to Use:

  • Location-based search or filtering

  • Maps, rideshare apps, delivery tracking

  • Any app using latitude/longitude queries

⚙️ Common Tools:

  • Geohash, Quadtree, R-Tree, PostGIS, Elasticsearch Geo

💡 Interview Tip: If asked to design a location-aware system, always mention spatial indexing techniques like Geohash or Quadtree.
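Geohash itself is a short algorithm: alternately bisect longitude and latitude, interleave the resulting bits, and emit base-32 characters. A sketch of the encoder (precision-7 cells are roughly 150 m across; the test coordinates are the well-known 57.64911, 10.40744 example that encodes to "u4pruydqqvj"):

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"   # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=7):
    """Interleave lon/lat bisection bits, 5 bits per base-32 character.
    Nearby points share a common prefix, so prefix scans find neighbors fast."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:               # every 5 bits becomes one character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)
```

The shared-prefix property is what makes "everything within ~10 km" a cheap range query on an ordinary sorted index.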


🙌 Enjoyed These 25 Golden Rules of System Design?

If this guide helped simplify core architectural concepts or gave you practical ideas for building scalable systems, feel free to:

  • Share it with your peers

  • Bookmark it for future reference

  • Leave a ❤️ to support the work behind this series

This isn’t the end—just the beginning of deeper, more thoughtful system design.

📩 Subscribe now to join the journey. I’ll keep your inbox learning-ready—one principle at a time.

Nitin
Hashnode | Substack | LinkedIn | GIT


Written by

Nitin Singh

I'm a passionate Software Engineer with over 12 years of experience working with leading MNCs and big tech companies. I specialize in Java, microservices, system design, data structures, problem solving, and distributed systems. Through this blog, I share my learnings, real-world engineering challenges, and insights into building scalable, maintainable backend systems. Whether it’s Java internals, cloud-native architecture, or system design patterns, my goal is to help engineers grow through practical, experience-backed content.