Real-Time Distributed Load Balancer Simulation

What Inspired the Project
Today, I was hit with a burst of inspiration. While studying for my AWS Security Specialty exam, I started working with ELBs to understand which type is best for specific use cases. Along the way, I found myself wondering how exactly Elastic Load Balancers work. I knew from previous AWS courses that they distribute traffic across multiple servers so that no single server receives too much, but I wanted to understand the underlying logic that goes into building a load balancer.
A professor of mine once told me, ‘If you want to know whether you truly understand something, try to code it. If you can code it flawlessly, you truly understand it.’ That’s exactly what I set out to do. The result is this project: a Real-Time Distributed Load Balancer Simulation.
Simulating Servers
The reason it’s a simulation is that spinning up multiple real servers to test whether my system correctly scales and descales would be very expensive. So instead, I created an object called Server that can handle a specific number of concurrent requests (between 2 and 10) as its maximum capacity. I also used Spring Boot to build a LoadBalancerController, which serves as the ‘point of arrival’ for all requests and handles the logic of what to do with each one depending on the status of the servers currently deployed.
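Concretely, a minimal sketch of such a Server object might look like this (the field and method names here are my assumptions, not the project's actual code):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of a simulated server; names are assumptions.
class Server {
    private final int maxCapacity;                              // e.g. between 2 and 10
    private final AtomicInteger activeRequests = new AtomicInteger(0);

    Server(int maxCapacity) {
        this.maxCapacity = maxCapacity;
    }

    // Atomically claim a slot; returns false if the server is at capacity.
    boolean tryAcquire() {
        while (true) {
            int current = activeRequests.get();
            if (current >= maxCapacity) {
                return false;
            }
            if (activeRequests.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Called when a simulated request finishes processing.
    void release() {
        activeRequests.decrementAndGet();
    }

    int getLoad() {
        return activeRequests.get();
    }

    boolean isIdle() {
        return activeRequests.get() == 0;
    }
}
```

The compare-and-set loop keeps the load counter consistent when many request threads hit the same server at once, which matters once processing happens on separate threads.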
Request Handling Logic
When my LoadBalancerController receives a request, it does the following:
Incoming Request Logic:
Filter servers not at full capacity.
Route the request to the first available server.
Inside the server:
Increment the load counter.
Launch a thread to simulate processing (0.5 to 5 seconds per request).
Once done, decrement the counter.
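The steps above can be sketched roughly as follows. This is a self-contained illustration; in the real project this logic lives in the Spring Boot LoadBalancerController, and all names here are hypothetical:

```java
import java.util.List;
import java.util.Optional;

// Tiny stand-in for the simulated server, just enough to show routing.
class SimServer {
    final int maxCapacity;
    int load = 0;
    SimServer(int maxCapacity) { this.maxCapacity = maxCapacity; }
    boolean atCapacity() { return load >= maxCapacity; }
}

class Router {
    // Step 1: filter servers not at full capacity.
    // Step 2: route the request to the first available server.
    static Optional<SimServer> route(List<SimServer> servers) {
        return servers.stream()
                .filter(s -> !s.atCapacity())
                .findFirst();
    }

    // Inside the server: increment the load counter, simulate 0.5-5 s of
    // work on a separate thread, then decrement the counter when done.
    static Thread handle(SimServer server) {
        server.load++;
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(500 + (long) (Math.random() * 4500));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                server.load--;
            }
        });
        worker.start();
        return worker;
    }
}
```

The stream filter-then-findFirst mirrors steps 1 and 2 directly; the worker thread mirrors the increment, sleep, decrement cycle inside the server.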
Autoscaling and Health Checks
I’m sure you’re asking yourself: how does the controller handle a request if all servers are at capacity? In that case, it creates a new server and routes the request there. That alone wouldn’t be efficient in a real-world scenario, since constantly creating servers is expensive. This is why I implemented a scheduled method that runs every 10 minutes, checking for servers that are not currently working on any requests. If the load balancer finds an idle server, it shuts it down.
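As a sketch, the idle-server sweep could look like this. It's plain Java for illustration; in Spring Boot the method would be wired up with a scheduling annotation such as @Scheduled, and all names here are my assumptions:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in server with just the load counter the sweep cares about.
class IdleCheckServer {
    int load;
    IdleCheckServer(int load) { this.load = load; }
}

class Autoscaler {
    // In the real project this would run on a 10-minute schedule,
    // e.g. @Scheduled(fixedRate = 600_000) in Spring Boot.
    static void shutDownIdleServers(List<IdleCheckServer> fleet) {
        fleet.removeIf(server -> server.load == 0);   // shut down idle servers
    }
}
```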
Another aspect I introduced here was health. Servers don’t always run perfectly; sometimes they face issues like network timeouts, full disks, or memory leaks. Since I was simulating servers rather than using real ones, I couldn’t reproduce these failures outright. Instead, I introduced a 10% chance that a server becomes ‘unhealthy,’ in which case the load balancer stops sending it requests and shuts it down completely.
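The simulated health check reduces to a simple random draw. The 10% figure comes from the description above, but this Random-based implementation and its names are my assumptions:

```java
import java.util.Random;

class HealthCheck {
    // Each check, a server has a 10% chance of being marked unhealthy,
    // after which the load balancer stops routing to it and shuts it down.
    static boolean isHealthy(Random rng) {
        return rng.nextDouble() >= 0.10;
    }
}
```

Passing the Random in, rather than creating one inside, keeps the check easy to seed and test deterministically.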
Results
To evaluate the scalability, availability, and resilience of the system, I developed a Python-based load testing script that sent approximately 1,000 HTTP requests over the span of an hour. During peak load, individual servers handled up to 38 concurrent requests. The system maintained a 90% request processing success rate, and the load balancer achieved 100% accuracy in scaling and descaling, seamlessly spinning up new servers when capacity was reached and shutting down idle or unhealthy ones.
What I Learned
Now that’s my Real-Time Distributed Load Balancer Simulation. I’m really proud of this project because it was the first time I introduced and used threads in a system I built. It also simulates something I learnt about from Amazon, which, as I’ve mentioned before, is the company I dream of working at one day.
Written by Mohamed Badawi