The Token Bucket Algorithm: Keeping Traffic Flowing Without Chaos 🚦

Darsh Patel
6 min read

Picture this: your API is running along smoothly, serving requests at a steady pace. Suddenly, clients start bombarding it with thousands of calls per second. Within minutes, servers are overwhelmed, legitimate users are locked out, and your carefully architected system is on the edge of collapse.

Sounds dramatic? Sadly, it’s not uncommon. This is why rate limiting is a cornerstone of modern system design. And among all the techniques available, one stands out for being both intuitive and practical: the Token Bucket Algorithm.

I recently encountered an issue at my job where one of our old company systems was overwhelming a third-party service with requests, leading to bulk rejections. This caused many cases to fail during bulk asynchronous processing. To address this, I built a Token Bucket module internally to regulate the flow of requests to the third-party service more carefully.

Today, I want to share what I learned from this experience. I also plan to develop a reusable module based on this algorithm and document the complete process in a blog. For now, let's delve into the basics.


Why Rate Limiting?

In large-scale systems - think payment gateways on sale days, SaaS APIs serving thousands of clients, or streaming platforms handling spikes during big matches - traffic can be unpredictable.

Without limits, a single abusive or malfunctioning client (or a small group of them) could starve everyone else. Rate limiting solves this by enforcing fairness and protecting systems from being overwhelmed.


The Token Bucket Intuition 🎟️

The Token Bucket algorithm is exactly what it sounds like:

  • You need tokens to perform any task (for example, hitting an API).

  • Imagine a bucket that holds these tokens. You can't accumulate unlimited tokens — you can only store as many as the bucket holds.

  • Tokens are added at a fixed rate (say, 100 per second).

  • The bucket has a capacity limit (say, 1000 tokens).

  • Every request “spends” a token.

  • If no tokens are available, the request either waits or gets denied (depending on your implementation).

That’s the whole trick. It’s simple, but it gives us a powerful balance:

  • You can control the average rate of requests (via refill rate).

  • You can allow short bursts of requests (via bucket capacity).

Choosing these two parameters well is crucial for any system. How to decide them — and the factors that drive that decision — is a topic for later. For now, let's continue with the algorithm itself:


A Better Analogy: Toll Booth on a Highway 🚗

Think of a busy toll plaza on a highway:

  • Cars (requests) want to pass.

  • Each car needs a ticket (token) to go through.

  • The machine prints tickets steadily over time.

  • If traffic is light, unused tickets pile up in the machine (but only to a certain maximum amount).

  • Later, when a sudden rush of cars arrives, they can all pass quickly—as long as tickets are available.

But once the saved tickets run out, new arrivals must wait until the printer generates more.

That’s exactly how the Token Bucket handles both steady traffic and spikes gracefully.


How It Works (Step by Step)

  1. Initialization
    Start with a bucket of capacity b and a refill rate r tokens per second.

  2. Token Refill
    Every passing moment, new tokens are added:

     tokens_to_add = elapsed_time × refill_rate
     current_tokens = min(current_tokens + tokens_to_add, bucket_capacity)
    
  3. Request Arrival
    When a request comes in, check if there are enough tokens.

  4. Consumption

    • If yes → deduct tokens and process the request.

    • If no → reject or queue the request.
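The four steps above can be sketched as a small class. This is a minimal, single-threaded illustration (class and method names are my own, not from any particular library); a production version would also need locking for concurrent callers.

```python
import time

class TokenBucket:
    """Minimal token bucket: capacity caps bursts, refill_rate caps the average rate."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity          # maximum tokens the bucket can hold
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = capacity            # start full, so an initial burst is allowed
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        # tokens_to_add = elapsed_time × refill_rate, capped at bucket capacity
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.tokens + elapsed * self.refill_rate, self.capacity)
        self.last_refill = now

    def allow(self, tokens: float = 1.0) -> bool:
        """Spend `tokens` if available; otherwise reject the request."""
        self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False
```

With `TokenBucket(capacity=1000, refill_rate=100)`, a client can burst up to 1000 requests at once, but over time it averages at most 100 requests per second.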


Why It’s So Helpful

  • Prevents overload: no client can exceed the allowed rate.

  • Allows natural bursts: saved tokens make short spikes possible.

  • Simple to reason about: just count tokens, no complex state.

  • Predictable behavior: the average request rate never exceeds the refill rate.

That’s why you’ll find Token Bucket everywhere—from API gateways to ISPs shaping bandwidth.


Modification and Uses for Real-World Scenarios

  • Tier-based API Rate Limiting:

    Free tier → 100 requests/hour
    Premium → 1000 requests/hour
    Enterprise → 10,000 requests/hour

    Tier-based access is straightforward to implement with this algorithm. One simple method is to give each user key its own bucket, looking up the bucket's capacity and refill rate from the user's tier. That is just one idea; more advanced schemes can be layered on top for specific use cases.
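A sketch of the per-user-key idea described above (the tier names, quotas, and helper names here are illustrative assumptions, not a real API):

```python
import time

# Hypothetical per-hour quotas for each tier.
TIER_CAPACITY = {"free": 100, "premium": 1000, "enterprise": 10_000}

class Bucket:
    """Tiny token bucket used per user key."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity, self.refill_rate = capacity, refill_rate
        self.tokens, self.last = capacity, time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.tokens + (now - self.last) * self.refill_rate,
                          self.capacity)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets: dict[str, Bucket] = {}

def allow_request(user_key: str, tier: str) -> bool:
    """Look up (or create) the caller's bucket sized by tier, then try to spend a token."""
    if user_key not in buckets:
        capacity = TIER_CAPACITY[tier]
        # Spread the hourly quota evenly: refill the full capacity over 3600 seconds.
        buckets[user_key] = Bucket(capacity=capacity, refill_rate=capacity / 3600)
    return buckets[user_key].allow()
```

Each user's average rate is bounded by their tier's hourly quota, while the full capacity is available up front for bursts.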

  • ISP Bandwidth Control:
    Internet Service Providers (ISPs) use the same idea to manage bandwidth. A 100 Mbps plan can be designed to allow temporary bursts up to, say, 150 Mbps — useful when a user briefly needs extra throughput, such as starting a high-definition stream or downloading a large file. The token bucket controls how long such a burst can last, so temporary speed-ups never exceed the network's overall capacity. This lets ISPs offer a more responsive service without compromising network stability or fair resource sharing among users.
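For bandwidth shaping, the same bucket works with tokens denominated in bytes instead of requests: the refill rate sets sustained throughput and the capacity sets the burst allowance. A rough sketch (class and parameter names are my own):

```python
import time

class ByteBucket:
    """Token bucket where one token = one byte: rate shapes sustained throughput,
    capacity allows short bursts above the base rate."""

    def __init__(self, rate_bytes_per_sec: float, burst_bytes: float):
        self.rate = rate_bytes_per_sec    # sustained bytes per second
        self.capacity = burst_bytes       # burst allowance in bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def try_send(self, packet_size: int) -> bool:
        """Permit the packet only if enough byte-tokens have accumulated."""
        now = time.monotonic()
        self.tokens = min(self.tokens + (now - self.last) * self.rate, self.capacity)
        self.last = now
        if self.tokens >= packet_size:
            self.tokens -= packet_size
            return True
        return False  # caller queues or drops the packet
```

A link shaped to 100 Mbps sustained with some burst headroom might be configured as `ByteBucket(rate_bytes_per_sec=100e6 / 8, burst_bytes=...)`, with the burst size chosen to match how much over-rate traffic the network can briefly absorb.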

  • Microservices Communication:
    In a microservices architecture, the same mechanism can manage communication between services. Critical services — user authentication, payment processing, data retrieval — are allocated larger buckets so they can keep operating smoothly under high demand. Non-critical services such as logging, analytics, or notifications get smaller buckets; they matter, but they shouldn't consume resources that the critical path needs.

    Assigning different bucket sizes this way prioritizes resources effectively: the most important services get the capacity they need to run reliably, while less critical ones are still served in a balanced manner.


Wrapping Up

The Token Bucket algorithm is like a toll booth that keeps the traffic flowing smoothly. It enforces limits without being too rigid, allowing both fairness and flexibility.

In the real world, traffic is messy and unpredictable - but with Token Bucket, your systems can stay resilient, efficient, and user-friendly.

So next time you hear “rate limit exceeded,” picture that bucket of tokens running dry - it’s not about blocking you, it’s about keeping the whole system healthy.

I am also beginning to implement this algorithm. I want to consider all the edge cases during the process. My goal is to build a module that is as production-ready and robust as possible. Additionally, I have started writing a blog to document everything properly. Stay tuned! ☺️
