What is Load Balancing and Why It's Important


Load balancing, as the name suggests, helps distribute incoming traffic (load) across multiple resources. It ensures high availability and reliability by sending requests only to resources that are ready to handle them. This allows flexibility to add or remove resources based on demand.
Why is it needed?
Today, high-traffic websites need to handle hundreds of thousands of concurrent requests from users or clients. To manage these requests, we must scale horizontally by adding more servers to handle the high volume of traffic.
A load balancer can sit in front of the servers and distribute client requests across all servers capable of fulfilling them. This maximizes speed, utilizes capacity, and ensures no server is overloaded. If one server goes down, the load balancer redirects traffic to the remaining online servers. When a new server is added, the load balancer automatically starts sending requests to it.
Workload Distribution
This is the core function provided by the load balancer and includes three different types:
Host-based: Distributes requests based on the requested hostname.
Path-based: Distributes requests based on the entire URL endpoint.
Content-based: Distributes requests based on the message content, such as the value of a request parameter.
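The three distribution types above can be sketched as a single routing decision. This is a minimal illustration, not a real load balancer: the backend pool names, hostnames, and the `tier` parameter are all assumptions made for the example.

```python
# Sketch of host-, path-, and content-based routing decisions.
# Pool names, hostnames, and the "tier" parameter are illustrative.

from urllib.parse import urlparse, parse_qs

BACKENDS = {
    "static":  ["10.0.0.10", "10.0.0.11"],  # serves images.example.com
    "api":     ["10.0.1.10", "10.0.1.11"],  # serves /api/* paths
    "premium": ["10.0.2.10"],               # serves requests with tier=premium
    "default": ["10.0.3.10"],
}

def pick_pool(url: str) -> str:
    parsed = urlparse(url)
    # Host-based: route on the requested hostname.
    if parsed.hostname == "images.example.com":
        return "static"
    # Path-based: route on the URL path.
    if parsed.path.startswith("/api/"):
        return "api"
    # Content-based: route on the value of a request parameter.
    if parse_qs(parsed.query).get("tier") == ["premium"]:
        return "premium"
    return "default"

print(pick_pool("https://images.example.com/logo.png"))      # static
print(pick_pool("https://example.com/api/v1/users"))         # api
print(pick_pool("https://example.com/report?tier=premium"))  # premium
```

A real load balancer makes the same kind of decision per request, then forwards the connection to a server in the chosen pool.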
Operation Layers
Load balancers can operate at one of two layers:
Network Layer
This load balancer operates at the transport layer (Layer 4). It routes based on network information, such as IP addresses and ports, and cannot perform content-based routing. These are primarily dedicated hardware devices that work at very high speeds.
Application Layer
This load balancer operates at the application layer (Layer 7). It can read requests completely and perform content-based routing, enabling load management based on a full understanding of the request.
Types
There are mainly two types of load balancers: software and hardware. Software load balancers can be further divided into categories such as self-hosted, cloud-based, and DNS-based.
| Feature | Software Load Balancer | Hardware Load Balancer |
| --- | --- | --- |
| Definition | Load balancing is done using software installed on the server. | A dedicated physical device that manages traffic distribution. |
| Deployment | Runs on general-purpose hardware or virtual machines. | Requires dedicated proprietary hardware. |
| Performance | Depends on the hardware and OS performance. | High performance with specialized processors and optimized firmware. |
| Scalability | Easily scalable by adding more instances. | Requires purchasing additional hardware. |
| Cost | Lower cost (server cost and software license). | Expensive due to hardware and support costs. |
| Flexibility | Highly configurable and customizable. | Limited customization due to vendor restrictions. |
| Management | Can be managed remotely and automated. | Requires manual setup and vendor-specific management. |
| Redundancy | Achieved using multiple instances and clustering. | Built-in redundancy with specialized failover mechanisms. |
| Example | Nginx, AWS ELB, Azure LB | Citrix ADC, Radware, F5 BIG-IP |
Types of Software Load Balancers
Self-Hosted
These are software applications that we can install and manage on our own servers.
Deployment: Linux/Windows servers, virtual machines (VMs)
Customization: Highly configurable, open-source options available
Cost: Free or low-cost (infrastructure required)
Scalability: Scales with additional instances but requires manual setup
Examples: Nginx, HAProxy, Traefik
Best for: Full control over configurations, on-premises or hybrid environments, or a cost-effective solution without cloud dependency.
Cloud-Based (Managed)
These are provided by cloud providers as fully managed services, eliminating the need for manual installation and maintenance.
Deployment: Cloud provider infrastructure like AWS, Azure, Google Cloud
Customization: Limited to provider’s features
Cost: Pay-as-you-go pricing
Scalability: Automatic scaling based on traffic load.
Examples: AWS Elastic Load Balancer, Azure Load Balancer, Google Cloud Load Balancer
Best for: A hassle-free managed solution, cloud-native architectures, or on-demand auto-scaling.
DNS-Based
DNS-based load balancers distribute traffic by resolving a domain name to multiple IP addresses.
Deployment: Cloud or dedicated DNS providers
Customization: Uses DNS policies & geo-routing
Cost: Varies based on queries & records
Scalability: Highly scalable but slower response compared to Layer 7 load balancers.
Example: AWS Route 53, Cloudflare DNS
Best for: Handling global traffic distribution across multiple data centers, need geo-based routing for low latency.
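From the client's point of view, DNS-based load balancing works because the DNS server returns several A records for one name and each client ends up at one of them. The sketch below simulates that with a hard-coded record set (the name and addresses are made up); a real setup would involve an actual DNS zone.

```python
# Sketch of DNS-based distribution: one name resolves to multiple
# addresses, and each lookup lands on one of them. The record set
# below is a stand-in for a real DNS response.

import random

# Simulated DNS zone: one name, multiple A records across data centers.
DNS_RECORDS = {
    "app.example.com": ["192.0.2.10", "198.51.100.10", "203.0.113.10"],
}

def resolve(name: str) -> str:
    """Return one address from the record set, like a rotating resolver."""
    return random.choice(DNS_RECORDS[name])

# Successive lookups spread clients across the listed addresses.
for _ in range(3):
    print(resolve("app.example.com"))
```

This also shows why DNS-based balancing reacts slowly to failures: clients keep using a cached address until the DNS record's TTL expires.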
Routing Algorithms
Load balancers use different routing (or scheduling) algorithms to distribute traffic among servers efficiently. These algorithms can be categorized as static or dynamic.
Static Load Balancing Algorithms
Traffic distribution is predefined and does not consider real-time server load. Best for predictable and consistent workloads.
| Algorithm | Description | Pros | Cons |
| --- | --- | --- | --- |
| Round Robin | Assigns requests cyclically to servers (Server 1 → Server 2 → Server 3 → Server 1 …). | Simple; evenly distributes workloads. | Does not consider server load, so an overloaded server can still receive requests. |
| Weighted Round Robin | Assigns a higher weight to more powerful servers (e.g., a powerful server gets 2x more requests). | Balances resources based on capability. | Needs manual weight adjustment. |
| IP Hashing | Uses a hash function on the client's IP to assign them to a specific server. | Ensures session persistence (sticky sessions). | If a server fails, affected users may experience disruptions. |
| URL Hash | Uses a hash of the requested URL to direct requests to the same backend server. | Cache optimization and CDN friendliness. | Same drawback as IP Hashing. |
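The static algorithms are simple enough to sketch in a few lines each. The server names and weights below are illustrative; a real load balancer would hold connections, not strings.

```python
# Minimal sketches of static routing algorithms.
# Server names and weights are illustrative assumptions.

import hashlib
from itertools import cycle

SERVERS = ["server1", "server2", "server3"]

# Round Robin: cycle through the servers in order.
rr = cycle(SERVERS)
assert [next(rr) for _ in range(4)] == ["server1", "server2", "server3", "server1"]

# Weighted Round Robin: repeat each server according to its weight.
weights = {"server1": 2, "server2": 1}  # server1 is twice as powerful
wrr = cycle([s for s, w in weights.items() for _ in range(w)])
assert [next(wrr) for _ in range(3)] == ["server1", "server1", "server2"]

# IP Hashing: hash the client IP so the same client always lands
# on the same server (sticky sessions).
def by_ip(client_ip: str) -> str:
    digest = hashlib.md5(client_ip.encode()).digest()
    return SERVERS[int.from_bytes(digest, "big") % len(SERVERS)]

assert by_ip("203.0.113.7") == by_ip("203.0.113.7")  # deterministic
```

URL Hash works the same way as IP Hashing, just with the requested URL as the hash input instead of the client address.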
Dynamic Load Balancing Algorithms
Traffic distribution is based on real-time server performance (CPU load, active connections, response time). Best for highly dynamic workloads.
| Algorithm | Description | Pros | Cons |
| --- | --- | --- | --- |
| Least Connections | Routes traffic to the server with the fewest active connections. | Prevents overloading servers under high traffic. | Does not consider server capacity. |
| Least Response Time | Directs traffic to the server with the fastest response time. | Optimizes for low latency. | Response time can fluctuate, which may cause instability. |
| Least Bandwidth | Sends traffic to the server with the least amount of active data transfer. | Handles data-intensive apps well. | Does not consider whether a request requires high CPU, so it may overload weaker servers. |
| Least Load / Least CPU | Routes traffic to the server with the lowest CPU or memory usage. | Handles compute-heavy traffic. | Requires real-time monitoring of CPU, memory, and disk I/O, which increases load on the load balancer itself. |
| Adaptive Load Balancing | Uses AI/ML to analyze server health and dynamically adjust the distribution. | Makes intelligent traffic decisions; best for cloud-native, scalable applications. | More complex to implement. |
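Dynamic algorithms differ from static ones in that the choice depends on live state. Least Connections, the most common, reduces to picking the minimum of a counter table. The connection counts below are made up for illustration; in practice the load balancer updates them as connections open and close.

```python
# Sketch of Least Connections: route each new request to the server
# currently holding the fewest active connections. Counts are illustrative.

active = {"server1": 12, "server2": 3, "server3": 7}

def least_connections() -> str:
    return min(active, key=active.get)

server = least_connections()
active[server] += 1   # the chosen server takes the new connection
print(server)         # server2
```

The other dynamic algorithms follow the same pattern with a different metric: replace the connection counts with measured response times, bandwidth, or CPU usage.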
Which Algorithm to use?
For evenly distributed traffic - Round Robin
For mixed-performance servers - Weighted Round Robin
For session persistence - IP Hash
For long-lived connections (DB, WebSockets) - Least Connections
For fast response times - Least Response Time
For modern cloud apps - Adaptive Load Balancing
Features in Load Balancers
Autoscaling: Automatically starting and stopping resources based on demand.
Sticky Session: Keeping the same user or device connected to the same resource to maintain session state.
Health Checks: Checking if a resource is down or underperforming to remove it from the load balancing pool.
Persistent Connections: Allowing a server to maintain a continuous connection with a client, such as through WebSocket.
Encryption: Managing encrypted connections like TLS and SSL.
Certificates: Presenting certificates to a client and verifying client certificates.
Compression: Compressing responses to reduce bandwidth usage.
Caching: Storing responses at the application layer.
Logging: Recording request and response metadata, which is useful for audits or analytics.
Request Tracing: Assigning a unique ID to each request for logging, monitoring, and troubleshooting.
Redirects: Redirecting incoming requests based on factors like the requested path.
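Among these features, health checks are the one most directly tied to routing: unhealthy backends are removed from the pool so no traffic reaches them. A minimal sketch, assuming a stub probe function (a real check would open a TCP connection or request a `/health` endpoint on a timer):

```python
# Sketch of the health-check feature: probe each backend and keep only
# the healthy ones in the routing pool. The probe is a stub; a real
# check would hit a /health endpoint or attempt a TCP connect.

def probe(server: str) -> bool:
    """Stub health probe; pretend server2 is down."""
    return server != "server2"

POOL = ["server1", "server2", "server3"]

def healthy_pool(servers):
    return [s for s in servers if probe(s)]

print(healthy_pool(POOL))   # ['server1', 'server3']
```

The routing algorithm then runs only over the healthy pool, and a backend that starts passing probes again is added back automatically.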
Implementation Strategy
For additional scalability and redundancy, we can load balance at each layer of our system.
Since the load balancer itself can be a single point of failure, we can run two or more load balancers in a cluster. If the active load balancer fails, a passive one takes over, making the system more fault-tolerant.
Summary
Load balancing is essential for distributing traffic across multiple resources to ensure high availability and reliability. It manages requests by redirecting them to ready servers, allowing for scalability and efficiency. Key functions include workload distribution based on host, path, or content, and operation at either the network or application layer. Load balancers can be hardware or software-based, with software options further categorized into self-hosted, cloud-based, or DNS-based. Various static (e.g., round robin) and dynamic (e.g., least connections) routing algorithms optimize traffic distribution. Essential features include autoscaling, health checks, encryption, and caching.
Written by

Chinu Anand
OpenSource crusader, Full Stack Web Developer by day, DevOps enthusiast by night. Saving the world one commit at a time.