Mastering Rate Limiting


What is Rate Limiting?
Rate limiting is a technique used to control the number of requests a client can make to a server within a specific time period. Think of it as a “speed limit” for APIs and web services — it prevents excessive or abusive use that could degrade performance or crash systems.
For example, an API might allow 100 requests per minute per user. If a client exceeds that threshold, the server can delay, reject, or throttle further requests until the time window resets.
Rate limiting is essential for:
Preventing abuse such as DDoS attacks or brute-force login attempts.
Maintaining fair usage across all clients.
Protecting infrastructure from sudden traffic spikes.
Without rate limiting, even a single misbehaving client could overwhelm a system and disrupt service for everyone else. It’s a foundational safeguard in modern backend and API design.
Types of Rate Limiting Algorithms
Rate limiting can be implemented using different algorithms, each with its own strengths, weaknesses, and best-fit scenarios. Here are the most common ones:
Fixed Window Counter
Requests are counted in a fixed time window (e.g., 1 minute).
Once the limit is reached, all additional requests in that window are blocked until the next window starts.
Pros: Simple to implement.
Cons: Can cause traffic bursts at window boundaries.
Sliding Window Log
Maintains a timestamp log of each request and removes timestamps older than the defined time window.
Checks if the number of requests in the log exceeds the limit.
Pros: More accurate and smoother than a fixed window.
Cons: Higher memory usage for storing timestamps.
Sliding Window Counter
A hybrid approach that combines fixed windows with partial counting from the previous window.
Reduces burst issues while keeping memory use low.
Pros: Efficient and balanced.
Cons: Slightly more complex to implement.
Token Bucket
Tokens are added to a “bucket” at a fixed rate. Each request consumes one token; if no tokens are available, the request is denied or delayed.
Pros: Allows short bursts while enforcing an average rate.
Cons: Needs careful configuration of bucket size and refill rate.
Leaky Bucket
Requests enter a queue (bucket) and are processed at a fixed rate, “leaking” out steadily.
Excess requests are dropped if the bucket overflows.
Pros: Smooth, predictable request flow.
Cons: Can delay burst traffic even if the system can handle it.
Token Bucket Algorithm
The Token Bucket algorithm is one of the most widely used rate limiting strategies because it offers a good balance between controlling request rates and allowing short bursts of traffic.
Here’s how it works:
Imagine a bucket that can hold a fixed number of tokens.
Tokens are added to the bucket at a constant rate (e.g., 5 tokens per second).
Each incoming request “consumes” one token.
If the bucket has tokens available, the request is processed.
If the bucket is empty, the request is either delayed until tokens are available again or rejected outright.
Key advantages:
Supports bursts: If the bucket is full, a client can make multiple requests quickly until tokens run out.
Smooth long-term control: The token refill rate ensures an average request limit over time.
Flexibility: You can control burst capacity with bucket size and steady rate with token refill speed.
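To make this concrete, here is a minimal Python sketch of a token bucket, assuming a single-threaded caller; the class and parameter names (TokenBucket, capacity, refill_rate) are illustrative, not taken from any particular library.

```python
import time

class TokenBucket:
    """Illustrative sketch of a token bucket: `capacity` caps bursts,
    `refill_rate` (tokens per second) sets the long-term average rate."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity           # start full so an initial burst is allowed
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on the time elapsed since the last check,
        # never exceeding the bucket's capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1             # consume one token for this request
            return True
        return False                     # bucket empty: reject (or delay) the request

# Usage: bursts of up to 10 requests, refilling at 5 tokens per second.
bucket = TokenBucket(capacity=10, refill_rate=5)
print("processed" if bucket.allow() else "rate limited")
```

Note the two independent knobs: `capacity` bounds how large a burst can be, while `refill_rate` bounds the sustained average, which is exactly the flexibility described above.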
Leaky Bucket Algorithm
The Leaky Bucket algorithm is another popular rate limiting technique, but unlike the Token Bucket, it focuses on making request flow steady and predictable by smoothing out bursts.
Here’s how it works:
Imagine a bucket with a small hole at the bottom.
Incoming requests are added to the bucket like drops of water.
Water leaks out at a constant rate, representing the fixed rate at which requests are processed.
If requests arrive faster than the leak rate and the bucket becomes full, excess requests are discarded (or queued in some implementations).
Key characteristics:
Fixed output rate: Regardless of traffic spikes, requests are processed at the same steady pace.
No burst allowance: Even if the system is idle, it won’t process faster than the leak rate.
Simple and predictable: Great for systems that need a constant processing speed.
Example:
If the leak rate is set to 5 requests per second and the bucket can hold 20 requests, the system drains requests at a steady pace of at most 5 per second, no matter how fast they arrive. If more than 20 requests are waiting in the bucket, the extras are dropped.
Use cases:
Network traffic shaping.
Systems where stability and predictability are more important than handling bursts.
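Here is a minimal Python sketch of the counter-based ("bucket as a meter") variant, which drops overflow rather than queueing requests for later processing; the names (LeakyBucket, leak_rate) are illustrative.

```python
import time

class LeakyBucket:
    """Illustrative sketch of a leaky bucket: requests fill the bucket,
    which drains at `leak_rate` requests per second. Overflow is dropped."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.water = 0.0                 # current fill level (pending requests)
        self.last_leak = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain the bucket at the fixed leak rate since the last check.
        elapsed = now - self.last_leak
        self.water = max(0.0, self.water - elapsed * self.leak_rate)
        self.last_leak = now
        if self.water < self.capacity:
            self.water += 1              # room left: admit this request
            return True
        return False                     # bucket full: drop the request

# Usage: bucket holds 20 requests and drains at 5 per second.
bucket = LeakyBucket(capacity=20, leak_rate=5)
print("accepted" if bucket.allow() else "dropped")
```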
Fixed Window Counting Algorithm
The Fixed Window Counting algorithm is one of the simplest ways to implement rate limiting. It divides time into fixed intervals (windows) — such as seconds, minutes, or hours — and counts how many requests a client makes during each window.
Here’s how it works:
A counter is initialized for each client at the start of the time window.
Each incoming request increases the counter.
If the counter exceeds the allowed limit within that window, further requests are rejected until the next window begins.
When the window resets, the counter is cleared, and the process starts over.
Example:
If the limit is 100 requests per minute, the system counts requests from 12:00:00 to 12:00:59. If a user hits 100 requests at 12:00:40, all additional requests until 12:01:00 are blocked. At 12:01:00, the counter resets to zero.
Pros:
Very easy to implement.
Low memory usage.
Cons:
Burst problem: Users can make the maximum allowed requests at the end of one window and immediately at the start of the next, effectively doubling the short-term rate.
Less smooth traffic control compared to sliding window or token bucket.
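A minimal Python sketch of a fixed window counter follows. It keeps counts in an in-memory dict keyed by client and window, whereas a production system would typically use a shared store such as Redis and evict old windows; all names are illustrative.

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Illustrative sketch: counts requests per client within fixed
    windows of `window_size` seconds. A real deployment would evict
    old window keys and keep counters in a shared store."""

    def __init__(self, limit: int, window_size: float = 60.0):
        self.limit = limit
        self.window_size = window_size
        self.counters = defaultdict(int)  # (client_id, window_index) -> count

    def allow(self, client_id: str) -> bool:
        # The current window is identified by integer division of the clock,
        # so every request between 12:00:00 and 12:00:59 shares one counter.
        window_index = int(time.time() // self.window_size)
        key = (client_id, window_index)
        if self.counters[key] >= self.limit:
            return False                  # limit already reached in this window
        self.counters[key] += 1
        return True

# Usage: 100 requests per minute per client.
limiter = FixedWindowLimiter(limit=100, window_size=60)
print("allowed" if limiter.allow("user-42") else "blocked")
```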
Sliding Window Log Algorithm
The Sliding Window Log algorithm improves accuracy over fixed window counting by tracking exact request timestamps rather than relying on rigid time boundaries. This approach eliminates the burst problem seen in fixed windows.
Here’s how it works:
For each client, the system stores the timestamp of every incoming request in a log (usually in memory or a fast data store).
When a new request arrives:
Remove all timestamps older than the defined time window (e.g., last 60 seconds).
Check how many timestamps remain in the log.
If the count is below the limit, accept the request and add its timestamp to the log.
If the count has already reached the limit, reject the request.
Example:
If the limit is 100 requests per minute, the log only contains timestamps from the past 60 seconds. Even if a client sends 100 requests in the last second of a minute, those timestamps remain in the log for the next 60 seconds, so the client cannot immediately fire another burst at the start of the next minute.
Pros:
Eliminates the boundary burst issue of fixed windows.
Very accurate request tracking.
Cons:
Requires storing timestamps for every request (higher memory usage).
Slightly more computationally expensive than counter-based methods.
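A minimal Python sketch, assuming per-client deques of timestamps held in memory; the names (SlidingWindowLog, window) are illustrative.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLog:
    """Illustrative sketch: keeps a per-client log of request timestamps
    from the last `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.logs = defaultdict(deque)   # client_id -> deque of timestamps

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        log = self.logs[client_id]
        # Drop timestamps that have fallen out of the sliding window.
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) >= self.limit:
            return False                 # limit reached within the window
        log.append(now)
        return True

# Usage: 100 requests per rolling 60-second window.
limiter = SlidingWindowLog(limit=100, window=60)
print("allowed" if limiter.allow("user-42") else "rejected")
```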
Sliding Window Counting Algorithm
The Sliding Window Counting algorithm is a hybrid approach that combines the efficiency of fixed window counters with the fairness of sliding windows. Instead of storing every request timestamp like in the Sliding Window Log, it estimates usage by blending the counts from the current and previous time windows.
Here’s how it works:
Maintain a counter for the current fixed time window (e.g., this minute) and for the previous window.
When a request arrives:
Calculate how far into the current window you are (as a fraction).
Estimate the effective request count as the current window's count plus the previous window's count, weighted by the fraction of the previous window that still overlaps the sliding window (i.e., 1 minus the fraction of the current window that has elapsed).
If this estimated rate exceeds the limit, reject the request; otherwise, accept it and increment the current counter.
Example:
If the limit is 100 requests per minute, and:
Previous window had 80 requests.
Current window (30 seconds in) has 40 requests.
Estimated usage = 40 + (0.5 × 80) = 80 requests.
If this is below the limit, the request is allowed.
Pros:
Smooths out bursts at window boundaries.
More memory-efficient than storing full logs.
Cons:
Less precise than the Sliding Window Log method.
Slightly more complex than simple fixed window counting.
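A minimal Python sketch of this estimation, assuming in-memory state per client and a wall clock that only moves forward; the names (SlidingWindowCounter, window) are illustrative.

```python
import time
from collections import defaultdict

class SlidingWindowCounter:
    """Illustrative sketch: approximates a sliding window by blending the
    current fixed window's count with a weighted share of the previous one."""

    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit
        self.window = window
        # client_id -> (window_index, current_count, previous_count)
        self.state = defaultdict(lambda: (0, 0, 0))

    def allow(self, client_id: str) -> bool:
        now = time.time()
        window_index = int(now // self.window)
        elapsed_fraction = (now % self.window) / self.window
        index, current, previous = self.state[client_id]
        if window_index == index + 1:
            previous, current = current, 0   # clock rolled into the next window
        elif window_index > index + 1:
            previous, current = 0, 0         # idle long enough to reset both
        # Weight the previous window by the share of it still inside
        # the sliding window: (1 - elapsed_fraction).
        estimated = current + (1 - elapsed_fraction) * previous
        if estimated >= self.limit:
            self.state[client_id] = (window_index, current, previous)
            return False
        self.state[client_id] = (window_index, current + 1, previous)
        return True

# Usage: 100 requests per minute, estimated over a sliding window.
limiter = SlidingWindowCounter(limit=100, window=60)
print("allowed" if limiter.allow("user-42") else "rejected")
```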