Scaling Systems: Vertical vs Horizontal + Load Balancing

In the last blog, we explored how your browser connects to servers and what happens behind the scenes. But what if your app suddenly goes viral and thousands of users start hitting your server at once?
This is where scaling and load balancing come into play.
In this blog, I’ll break down how modern systems handle growing traffic, avoid crashes, and stay reliable — using concepts like vertical scaling, horizontal scaling, and load balancers.

  1. What Happens When Your Server Gets Too Many Requests?

When too many users try to access your server at the same time, it can get overloaded — just like a small shop flooded with too many customers.

Here’s what can happen:

  • High latency: Responses become slow or delayed.

  • Request timeouts: Some users might not get any response.

  • Crashes: If the server runs out of CPU, memory, or bandwidth, it may go down entirely.

  • Bad user experience: Users see errors or loading spinners instead of your app.

That’s why scaling your system — either vertically or horizontally — is critical once your user base or traffic starts to grow.
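To make the overload symptoms above concrete, here is a minimal sketch of a server that can clear a fixed number of requests per second. The traffic numbers and the 100 requests/sec capacity are made up for illustration; real servers degrade in messier ways, but the growing backlog is the core idea behind rising latency and timeouts.

```python
def simulate_backlog(arrivals_per_sec, capacity_per_sec):
    """Return how many requests are still waiting at the end of each second."""
    backlog = 0
    history = []
    for arriving in arrivals_per_sec:
        backlog += arriving                         # new requests join the queue
        backlog -= min(backlog, capacity_per_sec)   # the server clears what it can
        history.append(backlog)
    return history


# Hypothetical traffic: steady at first, then a viral spike, then back to normal.
traffic = [80, 90, 100, 300, 300, 300, 100]
waiting = simulate_backlog(traffic, capacity_per_sec=100)
print(waiting)
```

In this sketch the backlog stays at zero while traffic is within capacity, then climbs during the spike and does not drain afterwards, because normal traffic already uses all the capacity. Requests stuck in that backlog are the ones that see slow responses or time out.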

  2. What is Scaling?

    Scaling is the process of increasing a system's capacity so that it can handle more traffic, users, or data as demand grows.

Why Systems Need to Scale:

  1. Growth management: A scalable system can handle more users and more simultaneous traffic without sacrificing speed or reliability.

  2. Better performance: Dividing the load across multiple sub-systems reduces the work each one has to do, which improves overall performance.

  3. Cost-effectiveness: Scalable systems adjust to changes in demand by adding or removing resources as needed. This flexibility avoids over-provisioning and can lead to significant cost savings, because you use only the resources that are actually required.
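The cost-effectiveness point can be illustrated with some rough arithmetic. The sketch below compares running enough servers for peak traffic all day against resizing every hour. The demand figures and the per-server capacity are assumptions chosen for the example, not measurements.

```python
import math


def servers_needed(requests_per_sec, per_server_capacity):
    """Smallest number of servers that can absorb the load (always at least one)."""
    return max(1, math.ceil(requests_per_sec / per_server_capacity))


# Hypothetical demand in requests/sec for six consecutive hours.
hourly_demand = [50, 50, 200, 800, 800, 200]
PER_SERVER_CAPACITY = 100  # assumed requests/sec that one server can handle

# Fixed provisioning: run enough servers for the peak, all the time.
peak_servers = servers_needed(max(hourly_demand), PER_SERVER_CAPACITY)
fixed_server_hours = peak_servers * len(hourly_demand)

# Elastic provisioning: resize each hour to match that hour's demand.
elastic_server_hours = sum(
    servers_needed(demand, PER_SERVER_CAPACITY) for demand in hourly_demand
)

print(fixed_server_hours, elastic_server_hours)
```

With these made-up numbers, the elastic setup uses less than half the server-hours of the fixed one. The real saving depends on how spiky your traffic is and how quickly new servers can start.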

Types of Scalability:

  • Vertical Scaling (Scale Up)

  • Horizontal Scaling (Scale Out)

  3. Vertical Scaling (Scale Up):

Vertical scaling means increasing the capacity of a single server by giving it more powerful hardware.

This can be done by:

  • Increasing the RAM

  • Adding more processing power (a faster CPU or more cores)

Example: Suppose your server has 4GB of RAM. Your website suddenly goes viral, traffic increases rapidly, and performance starts to degrade. To prevent this, you can upgrade the server to 8GB or 16GB of RAM, depending on your requirements.

Advantages of Vertical Scaling:

  1. Increased capacity: Installing more RAM or a faster CPU lets the server handle more requests.

  2. Simple to implement: There is still only one server to manage, so no extra tools or software are usually needed.

  3. No code changes required: The application typically runs unchanged on the bigger machine.

Disadvantages of Vertical Scaling:

  1. Limited scalability: Every machine has a hardware ceiling. Once you reach it, the server cannot be upgraded any further.

  2. Increased costs: Scaling vertically means spending heavily on more powerful hardware, which drives up the cost of growth.

  3. Single point of failure: Everything runs on one server, so if it is overwhelmed by requests or fails, the whole system goes down.

  4. Horizontal Scaling (Scale Out):

Horizontal scaling means increasing the capacity of the system by adding more machines / servers and dividing the load among them, instead of placing it all on a single server.

In this approach, there is no need to change the capacity of an existing server or replace it with a bigger one.

Also, unlike vertical scaling, which often requires taking the server offline for the upgrade, new servers can usually be added without downtime.

Example: In the real world, horizontal scaling is like a restaurant chain that opens new branches at different locations instead of making a single restaurant bigger and bigger.

Advantages of Horizontal Scaling:

  • Fault-tolerant: If one server fails, the others keep serving requests. Capacity can also be added without taking existing servers offline, because you add servers instead of upgrading a single one.

  • Can scale massively: Capacity is not capped by the hardware limits of a single machine, so you can keep adding servers as demand grows.

  • Enhanced performance: The load is divided among multiple servers, so each one handles a smaller share.
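The fault-tolerance benefit can be estimated with a little probability. The sketch below assumes each server is up 99% of the time and that servers fail independently of each other. Both are simplifying assumptions: in practice, failures are often correlated (shared network, shared power, or the load balancer itself), so treat the numbers as an upper bound.

```python
def fleet_availability(single_server_availability, server_count):
    """Probability that at least one server is up, assuming independent failures."""
    all_down = (1 - single_server_availability) ** server_count
    return 1 - all_down


# Assumed availability of 99% per server, for fleets of one to three servers.
for count in (1, 2, 3):
    print(count, round(fleet_availability(0.99, count), 6))
```

Under these assumptions, going from one server to two cuts the chance of a total outage from 1% to 0.01%, which is why removing the single point of failure matters so much.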

Disadvantages of Horizontal Scaling:

  • Increased complexity: Adding servers sounds easy, but the team now has to deploy, monitor, and maintain multiple machines.

  • Needs coordination / load balancing: The servers must be coordinated effectively, and a load balancer is needed to divide incoming requests among them.

| Parameter | Horizontal Scaling | Vertical Scaling |
| --- | --- | --- |
| Definition | Adding multiple servers to distribute load | Enhancing the resources of an individual server |
| Flexibility | Highly flexible | Limited flexibility |
| Fault Tolerance | Enhances fault tolerance by distributing the workload | Limited fault tolerance, as it relies on a single unit |
| Complexity | Complexity increases as the number of servers grows | Easier to manage, as fewer components are involved |
| Load Balancing | Requires load balancing | Load balancing is rarely needed |

  5. What is a Load Balancer?

In simple words, it acts as a traffic manager that distributes client requests across multiple servers.

In technical terms, a load balancer is a networking device or software application that distributes incoming traffic among servers to provide high availability, efficient utilization of servers, and high performance.

Benefits of a Load Balancer:

  • Traffic Distribution

  • Scalability

  • Optimization

  • High Availability

  • Helps Horizontal Scaling

  6. Round-Robin Algorithm:

    → Round Robin is the simplest static load balancing algorithm: requests are distributed across the servers sequentially, in rotation. Once the last server has received a request, the next request goes back to the first server.

For example: Suppose you are handing out candies to a group of friends. You give one candy to each friend in turn, then start again from the first friend. That is the Round-Robin approach, and you use it without even thinking about it.
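The rotation described above can be sketched in a few lines of Python. The class name `RoundRobinBalancer` and the server names are hypothetical, chosen only for this illustration. A real load balancer also has to deal with connections, timeouts, and servers that go down.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers one after another, wrapping around at the end."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)  # endless iterator over the server list

    def next_server(self):
        return next(self._rotation)


# Hypothetical server names; seven requests spread over three servers.
balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
assignments = [balancer.next_server() for _ in range(7)]
print(assignments)
```

Each server receives every third request, and the seventh request wraps back around to the first server. Note that plain Round Robin ignores how busy each server is, which is what makes it a static algorithm.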

  7. Elastic Load Balancer (ELB in AWS):

    When managing traffic manually becomes too complex, cloud providers like AWS offer managed services to handle it for you. One of the most popular tools is the Elastic Load Balancer (ELB).

Elastic Load Balancer is a service provided by AWS that automatically distributes incoming application traffic across multiple targets — like EC2 instances, containers, or even IP addresses.

Why is it called “Elastic”?

Because it automatically scales its own capacity with the incoming traffic. Whether you're serving 100 users or 10,000, ELB can adapt without requiring manual intervention.

What ELB Does :

  • Routes traffic intelligently to healthy instances

  • Performs health checks on servers

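  • Automatically scales its own capacity up or down based on traffic (adding or removing the servers behind it is the job of a separate AWS service, EC2 Auto Scaling)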

  • Supports failover — rerouting traffic if a server fails
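To show how health checks and failover fit together, here is a small, generic sketch. It is not the ELB API and does not reflect how AWS implements it internally. The class `HealthAwareBalancer`, its methods, and the server names are all hypothetical. The idea is only that the balancer remembers which servers passed their last health check and skips the rest.

```python
class HealthAwareBalancer:
    """Round-robin routing that skips servers marked unhealthy."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)  # assume every server starts healthy
        self._next_index = 0

    def record_health_check(self, server, is_healthy):
        """Store the result of the latest health check for one server."""
        if is_healthy:
            self._healthy.add(server)
        else:
            self._healthy.discard(server)

    def route(self):
        """Return the next healthy server, continuing the rotation."""
        for _ in range(len(self._servers)):  # try each server at most once
            candidate = self._servers[self._next_index]
            self._next_index = (self._next_index + 1) % len(self._servers)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


lb = HealthAwareBalancer(["server-a", "server-b", "server-c"])
lb.record_health_check("server-b", is_healthy=False)  # server-b fails its check
routed = [lb.route() for _ in range(4)]
print(routed)
```

While `server-b` is marked unhealthy, requests alternate between the other two servers. Once a later health check reports it healthy again, it rejoins the rotation automatically.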

Real-World Analogy:

Think of ELB like a smart traffic cop who:

  • Watches how many cars (users) are coming in,

  • Directs them to open lanes (servers),

  • Closes off lanes (servers) that are broken,

  • Keeps traffic flowing as more lanes open when traffic builds up (in AWS, Auto Scaling is what adds the servers).

Key Takeaways :

  • Scaling helps systems handle growing traffic without breaking.

  • Vertical Scaling = Upgrading a single server (more RAM, CPU).

    • Easy to set up, but limited and prone to single points of failure.
  • Horizontal Scaling = Adding more servers and distributing load.

    • More fault-tolerant and scalable, but adds complexity.
  • Load Balancers are essential when scaling horizontally — they ensure smooth traffic distribution and help maintain system reliability.

  • ELB (Elastic Load Balancer) on AWS simplifies load balancing and, paired with Auto Scaling, helps the system grow and shrink with traffic.

What’s Next?

In the next blog, we’ll explore how systems deal with unwanted traffic surges, like bot attacks or sudden traffic spikes — using strategies like Rate Limiting, Token Buckets, and protection against DDoS attacks.

Written by

developer_nikhil