System Design - A Gentle Introduction

Mishal Alexander
10 min read

Chapter 1: The Humble Beginnings – A Single Server

Imagine you’re launching "notamazon.com", your own e-commerce store. Some people ignore it, calling it a ‘poor man’s Amazon’, but you stick with it, believing in yourself. At first, on a good day you get maybe 10 users. A single server (like an AWS EC2 instance) works fine: it handles the website, database, and payments. All in one (cloud) computer. Life is simple.

Problem: There is no problem. You convince yourself that you are in your ‘growth’ phase.


Chapter 2: DNS – The Address Book

You start asking friends and family to check out the new ‘notamazon.com’ you have built and share feedback. Then that one curious uncle of yours asks -

when users type notamazon.com into their browser and hit enter, how does it find the server, which has an IP like 192.168.1.1?!

Well, then you tell your uncle that there is something known as DNS (Domain Name System), which acts like a phonebook for servers. The browser first, quietly and ‘under the table’, asks a DNS server where notamazon.com’s server is located; the DNS server looks it up (if it exists, that is) and returns the IP to the browser, which then connects to it. The whole process is generally referred to as DNS resolution. Since your uncle is curious, you also tell him that AWS Route 53 (Amazon’s DNS service) maps notamazon.com to your server’s IP. You then refer your uncle to the article you have written about it.
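If you want to watch DNS resolution happen yourself, here is a tiny sketch using Python’s standard library (example.com stands in for our fictional store, since notamazon.com is made up):

```python
import socket

# Ask the operating system's DNS resolver for the IP behind a domain name.
domain = "example.com"  # placeholder; notamazon.com is fictional
ip_address = socket.gethostbyname(domain)
print(f"{domain} resolves to {ip_address}")
```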

QA Tip: If DNS fails, your site disappears. Always test DNS configurations!


Chapter 3: Vertical Scaling – Bigger Shoes

Problem: Out of the blue, some A-list celebrity becomes notamazon.com’s self-proclaimed ‘fan’ and tweets its link to their followers. Traffic spikes to 1,000 users. There are far more incoming requests than you had anticipated. The server cannot sustain the load, and shortly after, the inevitable happens: it crashes. Orders, as well as potential new users, vanish. Chaos. Your server is clearly overloaded.

Now what do you do? Instead of adding more servers, you upgrade the existing one: more CPU, RAM, and storage. This is called vertical scaling. It has a drawback: every time you add new resources to the machine, it needs a restart, which translates to application downtime. But that’s okay for now. You do it at midnight so that tomorrow’s user load is handled. Your server, with the upgraded muscle, now works like a workhorse and manages the spiked user load.

Example: AWS lets you resize an EC2 instance from a t2.micro (Toyota) to a c5.4xlarge (Ferrari) in minutes.
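Here is a rough sketch of that resize using boto3, AWS’s Python SDK. The instance ID below is a placeholder, and note that the instance has to be stopped before its type can be changed, which is exactly the downtime mentioned above:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
instance_id = "i-0123456789abcdef0"  # placeholder instance ID

# Vertical scaling: stop the instance, change its type, start it again.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

ec2.modify_instance_attribute(
    InstanceId=instance_id,
    InstanceType={"Value": "c5.4xlarge"},  # the "Ferrari"
)

ec2.start_instances(InstanceIds=[instance_id])
```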

But… even a Ferrari has a speed limit. Upon deeper reflection, you realise that at 10,000 users, even your upgraded server struggles. So you start to think about strategies to handle higher user load.


Chapter 4: Horizontal Scaling – Army of Clones

Time to clone your server! Rather than upgrading the existing server with new muscle, let’s connect more computers to share the user load. This is called horizontal scaling: adding more servers (in our case, more EC2 instances) to share the load. We can spawn clones during traffic spikes (whenever required) and remove the extra instances when they are no longer needed.

AWS Auto Scaling Groups automate this activity.
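A rough sketch of what setting that up could look like with boto3; the group name, launch template ID and CPU target below are made-up values for illustration:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Keep between 2 and 10 clones behind the scenes; the launch template is a placeholder.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="notamazon-web",
    MinSize=2,
    MaxSize=10,
    DesiredCapacity=2,
    LaunchTemplate={"LaunchTemplateId": "lt-0123456789abcdef0"},
    AvailabilityZones=["us-east-1a", "us-east-1b"],
)

# Add and remove clones automatically to keep average CPU around 60%.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="notamazon-web",
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 60.0,
    },
)
```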


Chapter 5: Load Balancer – The Traffic Manager

Problem: Now, with 10 servers, how do users know which one to hit?

Enter the Elastic Load Balancer (ELB):

  • Routes traffic evenly.

  • Checks server health (kicks out sick servers).

  • Handles HTTPS encryption (so your servers don’t have to).

You just need to update the DNS record to return the load balancer’s address instead of the single EC2 instance where you had initially hosted everything.

ELB is an offering from AWS; you can think of it as a guide at the mall entrance directing crowds to the shortest checkout line.
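To build intuition for what the load balancer is doing, here is a toy round-robin sketch in plain Python. The server IPs are made up, and a real ELB is far smarter about health checks, connection counts and so on:

```python
import itertools

# Keep a list of servers, drop the unhealthy ones, and hand out
# requests to the healthy ones in turn (round robin).
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]        # hypothetical instance IPs
healthy = [s for s in servers if s != "10.0.0.3"]     # pretend one failed its health check

next_server = itertools.cycle(healthy)

for request_id in range(5):
    target = next(next_server)
    print(f"request {request_id} -> {target}")
```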


Chapter 6: Microservices – Divide and Conquer

People really seem to like your application. You are starting to get more than 100k daily users. Your code base is now a giant monolith. Changing or updating a single feature (for example, the search feature) can break an unrelated part of your application (for example, the payment system). Moreover, if you want to introduce breaking changes in one part, you need to bring down other unrelated parts as well.

Solution: Split the monolith into microservices (small, independent services owned by small teams). Each microservice has its own load balancer to route traffic and multiple EC2 instances to handle user load. Third-party integrations in your application, like the payment handler, exist as they were, but they now appear as just another service.

In this setup, the API Gateway (e.g. AWS API Gateway) becomes the main receptionist and the load balancer of each service becomes a sub-receptionist. For instance, it routes /products requests to the Products Service and /upi-pay requests to the third-party Payment Service. Now, at large scale, we can easily change, update or remove services without bringing down the whole application.
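To make the “receptionist” idea concrete, here is a toy routing table in Python. The internal URLs are invented for illustration, and a real API Gateway does this with configured routes rather than a dictionary:

```python
# Each path prefix points at a different microservice (URLs are made up).
ROUTES = {
    "/products": "http://products.internal.notamazon.com",
    "/orders": "http://orders.internal.notamazon.com",
    "/upi-pay": "https://pay.example-third-party.com",
}

def route(path: str) -> str:
    """Pick the backend service whose prefix matches the request path."""
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return backend
    raise ValueError(f"no service registered for {path}")

print(route("/products/123"))   # -> Products Service
print(route("/upi-pay/start"))  # -> third-party Payment Service
```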


Chapter 7: Batch Processing – The Night Crew

Imagine that at midnight, while users sleep, notamazon.com needs to:

  • Generate daily sales reports.

  • Update product prices.

How do you do this? You need dedicated compute running these tasks in the background.

AWS Batch or AWS Lambda runs these jobs quietly, like janitors cleaning up after a party. This is part of an event-driven microservices architecture.

Example: A Lambda function triggers at 2 AM, scans the database, and emails you: “Sold 500 toothbrushes today!”
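A minimal sketch of what such a Lambda handler could look like, assuming a hypothetical fetch_daily_sales helper and an SNS topic (placeholder ARN) for delivering the email; the 2 AM trigger itself would come from something like an EventBridge schedule:

```python
import boto3

sns = boto3.client("sns")
REPORT_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:daily-report"  # placeholder

def fetch_daily_sales():
    # Placeholder for the real database scan.
    return {"toothbrushes": 500}

def handler(event, context):
    # Runs on a schedule, summarises the day's sales and
    # notifies the subscribers of the report topic.
    sales = fetch_daily_sales()
    message = ", ".join(f"Sold {count} {item} today!" for item, count in sales.items())
    sns.publish(TopicArn=REPORT_TOPIC_ARN, Message=message, Subject="Daily sales report")
    return {"status": "sent", "summary": message}
```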


Chapter 8: Queues – The Waiting Room

When a user clicks “Buy Now,” the Payment Service shouldn’t wait for the Inventory Service.

AWS SQS (Simple Queue Service) to the rescue:

  1. The order goes into a queue.

  2. Payment Service processes it.

  3. Inventory Service updates stock later.

Now what if the payment fails due to issues at the third-party service? In that case, the failed message can be moved to a ‘dead letter queue’ so it can be inspected and retried later.

Dead Letter Queue (DLQ): If an order fails (e.g., payment declines 3 times), it moves here.
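A sketch of this flow with boto3; the queue URL and DLQ ARN are placeholders for queues you would have created beforehand:

```python
import json
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")

ORDERS_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"  # placeholder
DLQ_ARN = "arn:aws:sqs:us-east-1:123456789012:orders-dlq"                     # placeholder

# Attach the dead letter queue: after 3 failed receives, the message moves there.
sqs.set_queue_attributes(
    QueueUrl=ORDERS_QUEUE_URL,
    Attributes={
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": DLQ_ARN, "maxReceiveCount": "3"}
        )
    },
)

# Producer: the checkout flow drops the order into the queue and moves on.
sqs.send_message(QueueUrl=ORDERS_QUEUE_URL, MessageBody=json.dumps({"order_id": 1234}))

# Consumer: the Payment Service pulls messages at its own pace.
response = sqs.receive_message(QueueUrl=ORDERS_QUEUE_URL, MaxNumberOfMessages=1)
for message in response.get("Messages", []):
    order = json.loads(message["Body"])
    # ... charge the customer here ...
    sqs.delete_message(QueueUrl=ORDERS_QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
```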


Chapter 9: Event-Driven Alerts – “Hey, Listen!”

When an order succeeds, multiple services need to know about it. For example, in our case, when an order is placed and payment is done, we need to:

  • Send a confirmation WhatsApp message.

  • Send a confirmation SMS.

  • Update recommendations.

  • And so on.

In such cases, a queue alone will not do, because a queue message is delivered to only one consumer (one-to-one communication). We need something that can reach all the connected services in one shot. A notification system comes into the picture at this stage.

AWS SNS (Simple Notification Service), for instance, broadcasts events like a town crier: “Order #1234 placed!” Subscribers (the WhatsApp worker, the SMS worker, etc.) can then react to these messages independently.
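A sketch of that with boto3; the topic name, phone number and worker endpoint are placeholders:

```python
import boto3

sns = boto3.client("sns", region_name="us-east-1")

# Create (or look up) the topic that carries "order placed" events.
topic = sns.create_topic(Name="order-placed")
topic_arn = topic["TopicArn"]

# Each interested worker subscribes independently.
sns.subscribe(TopicArn=topic_arn, Protocol="sms", Endpoint="+911234567890")
sns.subscribe(TopicArn=topic_arn, Protocol="https",
              Endpoint="https://whatsapp-worker.notamazon.com/events")

# One publish, and every subscriber hears about it.
sns.publish(TopicArn=topic_arn, Message="Order #1234 placed!")
```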


Chapter 10: Fan-Out – One to Many

The drawback of the notification system is that there is no feedback (acknowledgement) from the service that consumed the message. Queues do give feedback, but a queue only carries messages from one service to another. If we somehow combine these two strategies, we get the best of both worlds!

For example: instead of the Order Service calling 5 services directly, SNS + SQS together fan out the event to multiple queues.

Example: Order → Inventory Queue + Email Queue + Analytics Queue.

This is like a celebrity tweeting once, and each follower who reads it retweeting to their own followers.
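A sketch of wiring that up with boto3; the topic and queue ARNs are placeholders, and in practice each queue also needs an access policy that allows SNS to deliver to it:

```python
import boto3

sns = boto3.client("sns", region_name="us-east-1")

topic_arn = "arn:aws:sns:us-east-1:123456789012:order-placed"  # placeholder

# Fan-out: each downstream service gets its own queue subscribed to the topic,
# so one published event lands durably in every queue.
for queue_arn in [
    "arn:aws:sqs:us-east-1:123456789012:inventory-queue",
    "arn:aws:sqs:us-east-1:123456789012:email-queue",
    "arn:aws:sqs:us-east-1:123456789012:analytics-queue",
]:
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

# The Order Service still publishes exactly once.
sns.publish(TopicArn=topic_arn, Message='{"order_id": 1234, "status": "paid"}')
```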


Chapter 11: Rate Limiting – “Slow Down, Buddy!”

Once you start gaining serious market share, envy strikes: forces united against you try to throw you off by bringing down your servers, overwhelming your application with artificial traffic at unprecedented levels. For instance, bots try anything and everything to brute-force the login page. Or the trouble need not come from them at all; ordinary users spam the “Refresh” button during a sale on an already slow-loading page. Either way, it is a problem. So how do you deal with this?

Solution: Rate limiting at the API Gateway:

  • Fixed Window: Allow X requests per window per user session (e.g. 100 requests per minute).

  • Token Bucket: Let users burst X requests (e.g. 10) in Y time (e.g. 2 seconds), then slow down; see the sketch below.
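Here is a toy token-bucket limiter in Python to make the idea concrete. A real gateway enforces this per user or per API key, usually backed by a shared store:

```python
import time

class TokenBucket:
    """Toy token bucket: allow short bursts, then throttle to a steady rate."""

    def __init__(self, capacity: int, refill_per_second: float):
        self.capacity = capacity                  # maximum burst size (e.g. 10 requests)
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Top up tokens earned since the last request, capped at capacity.
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last_refill) * self.refill_per_second,
        )
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: reply with HTTP 429

bucket = TokenBucket(capacity=10, refill_per_second=5)  # burst of 10, then 5 req/s
print([bucket.allow() for _ in range(12)])  # first 10 pass, the rest are throttled
```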

API rate limits need to be checked and verified before every major release; missing or misconfigured limits could potentially cause huge losses.


Chapter 12: Database Scaling – The Brain

When push comes to shove, you notice that it is your database that slows down your application. In other words, the database is the bottleneck. Why? All these different services read from and write to the same single database; many parallel reads and writes from different services choke its performance.

So how do we deal with this?

We need to spin up multiple database instances: keep one primary database that the services write to, and several replicas that the services can read from. We can set it up like this:

  • Primary Node: Handles writes (e.g., updating stock).

  • Read Replicas: Handle reads (e.g., product searches). AWS RDS automates this.

If the primary node crashes, writes stop. So make sure to set up a standby of the primary node (for example, RDS Multi-AZ) so it can take over!
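A sketch of creating a read replica with boto3, plus a naive routing helper the application could use; all identifiers and hostnames below are placeholders:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Spin up a read replica of the primary (identifiers are placeholders).
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="notamazon-db-replica-1",
    SourceDBInstanceIdentifier="notamazon-db-primary",
)

# In the application, route queries by intent: writes go to the primary endpoint,
# reads (product searches, listings) go to a replica endpoint (hostnames made up).
PRIMARY_ENDPOINT = "notamazon-db-primary.abc123.us-east-1.rds.amazonaws.com"
REPLICA_ENDPOINT = "notamazon-db-replica-1.abc123.us-east-1.rds.amazonaws.com"

def endpoint_for(query: str) -> str:
    is_write = query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
    return PRIMARY_ENDPOINT if is_write else REPLICA_ENDPOINT

print(endpoint_for("SELECT * FROM products WHERE name LIKE '%toothbrush%'"))
```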


Chapter 13: Redis – The Speedster

Alright, so with things set up based on your learning, one of your analysts tells you that although there are a lot of products for sale in your application, users mostly search for and buy only certain products. And each of those queries, at scale, has an impact on application response time: many users querying the database for the same product details at the same time.

When you present this problem to your engineers, they come up with a brilliant solution: temporarily store frequently searched items in a separate, fast store and serve users from there rather than from the database. This is referred to as caching.

Caching is a common technique used to improve performance, reduce server load, and enhance user experience across various applications and systems.

AWS ElastiCache (Redis) stores data in memory rather than on disk in the database. Now if someone has searched for product X, the details can be fetched once, kept in the cache server, and served to anyone else who needs them, saving a trip to the database.
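A sketch of the cache-aside pattern with the redis-py client; the cache hostname and the database helper are placeholders:

```python
import json
import redis  # the redis-py client

cache = redis.Redis(host="my-cache.abc123.cache.amazonaws.com", port=6379)  # placeholder host

def fetch_product_from_db(product_id: int) -> dict:
    # Placeholder for the real (slow) database query.
    return {"id": product_id, "name": "toothbrush", "price": 99}

def get_product(product_id: int) -> dict:
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:                       # cache hit: served from memory
        return json.loads(cached)
    product = fetch_product_from_db(product_id)  # cache miss: go to the database
    cache.setex(key, 300, json.dumps(product))   # keep it warm for 5 minutes
    return product
```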

Once implemented, you observe that a product page loads in 2 ms instead of 200 ms. Users don’t bounce. You sell more.


Chapter 14: CDN – The Global Courier

Your company is a big success, so you go global to serve users from across the world. Now you face another problem: users in certain countries complain that the product images on display in your application load slowly.

Solution: a CDN, or Content Delivery Network, which caches static resources like images close to users.

AWS CloudFront (a CDN) caches images and videos at edge locations worldwide. It also offers anycast IPs, which route users to the nearest edge location while keeping the IP address the same everywhere. Now a user in Mumbai gets images from Singapore instead of Ohio, where your origin servers are located.


Conclusion

By joining the journey of ‘building’ and scaling notamazon.com, we have understood some key concepts of system design. We’ve seen vertical and horizontal scaling, how a load balancer helps, the transition to a microservices architecture, batch processing, queues, event notification systems, rate limiting of user requests, database scaling, and serving requests from all over the world using a CDN. These basics give us the foundation we need to jump into advanced topics in system design. There are more areas to consider, like virtualization, containerization and container orchestration, but we will cover those in a separate article.


Reference

System Design for beginners by Piyush Garg
