Everything You Need to Know About Rate Limiting
In our interconnected world, websites and applications are accessed by countless users and systems at the same time. This heavy traffic can strain resources, causing slower service or, in extreme cases, complete service failure.
Rate limiting is a crucial technique for securing system resources and managing the flow of user requests. It lets us control how quickly user requests are processed by our server.
It is a way to control the amount of traffic a system, network, database, or application handles over time. It helps prevent network congestion, improves performance, and ensures systems are not overwhelmed.
For example, we might limit an unsubscribed user of a public API to 1,000 requests per month. If they go over this limit, we reject the request and return an error.
Here’s Why Rate Limiting is Essential
Mitigates Bot Attacks: Rate limiting helps prevent malicious bot activity, such as brute-force attacks and DoS (Denial-of-Service) or DDoS attacks, by limiting the number of requests a user or IP address can make within a specific time period.
Prevents Overuse: It ensures efficient allocation and usage of resources by preventing any single user or component from consuming an excessive share of resources, such as bandwidth, CPU, or memory, which could lead to performance degradation or system failures.
Fair Use Enforcement: By restricting the number of requests, rate limiting ensures fair use of services and prevents any single user or bot from monopolising resources.
Compliance and Policy Enforcement: Rate limiting allows organisations to enforce usage policies and comply with service level agreements (SLAs). It helps in managing the distribution of resources according to predefined rules and agreements.
Controls Operational Costs: When resources auto-scale on a pay-per-use model, rate limiting puts a virtual cap on scaling to help control operational costs. Without it, resources might scale out of proportion, leading to runaway bills.
Different Levels at Which to Implement Rate Limiting
Rate limiting can be implemented at various levels within a system, depending on the specific requirements and characteristics of the application. Here are common levels at which rate limiting can be applied:
Application Level
Implementing rate limiting directly within the application code or middleware. This allows for fine-grained control over specific functionalities or features. For example, limiting the rate of API calls, login attempts, or specific user actions.
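To make this concrete, here is a minimal sketch of an application-level limiter written as Express middleware (assuming Express is installed; the per-IP budget, the in-memory Map, and the route are illustrative choices, and a production setup would typically use a shared store such as Redis):
const express = require('express');
const app = express();

const WINDOW_MS = 60 * 1000; // 1-minute window (illustrative)
const MAX_REQUESTS = 100;    // per-IP budget (illustrative)
const hits = new Map();      // ip -> { windowStart, count }

app.use((req, res, next) => {
  const now = Date.now();
  let entry = hits.get(req.ip);
  // Start a fresh window for this IP if none exists or it has expired
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    entry = { windowStart: now, count: 0 };
    hits.set(req.ip, entry);
  }
  entry.count++;
  if (entry.count > MAX_REQUESTS) {
    return res.status(429).send('Too Many Requests');
  }
  next();
});

app.get('/', (req, res) => res.send('Hello'));
app.listen(3000);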
API Gateway Level
Rate limiting can be enforced at the API gateway, which serves as a centralised entry point for all incoming API requests. This is particularly relevant in microservices architectures, where a gateway can control the rate of requests to various services.
Web Server Level
Web servers can be configured to enforce rate limits. This is often effective for controlling the rate of HTTP requests before they reach the application layer. Web server modules or plugins can be used to implement this type of rate limiting.
Proxy Level
Proxies, whether they are reverse proxies or load balancers, can be configured to implement rate limiting. This approach is useful for controlling the rate of traffic entering a network or reaching specific backend servers.
Network Level
Network devices, such as firewalls or routers, can be configured to enforce rate limits on incoming or outgoing traffic. This approach is helpful for protecting against certain types of network-based attacks.
Cloud Service Level
Cloud service providers often offer built-in rate limiting features as part of their services. For example, cloud-based APIs, storage services, or serverless functions may have rate limiting options that can be configured.
Middleware Level
Middleware components, such as message brokers or queues, can incorporate rate limiting mechanisms. This is essential to control the rate at which messages or tasks are processed.
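For instance, here is a minimal, self-contained sketch of rate limiting the consumer side of an in-memory queue; the queue, the five-messages-per-second budget, and the console output are stand-ins for a real broker integration:
const queue = [];
const MESSAGES_PER_SECOND = 5; // illustrative processing budget

function enqueue(message) {
  queue.push(message);
}

// Drain the queue at a fixed rate so downstream work is never flooded
setInterval(() => {
  const message = queue.shift();
  if (message !== undefined) {
    console.log('Processing:', message); // stand-in for real work
  }
}, 1000 / MESSAGES_PER_SECOND);

// Usage example
for (let i = 0; i < 20; i++) enqueue(`task-${i}`);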
Database Level
Rate limiting can be applied to database queries to prevent excessive database access. This is crucial in scenarios where there’s a risk of database overload due to high query rates.
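As a rough sketch (the wrapper name and the per-second budget are hypothetical), a thin gate can cap how many queries reach the database each second:
const MAX_QUERIES_PER_SECOND = 50; // illustrative budget
let windowStart = Date.now();
let queryCount = 0;

async function limitedQuery(runQuery) {
  const now = Date.now();
  // Reset the one-second window when it expires
  if (now - windowStart >= 1000) {
    windowStart = now;
    queryCount = 0;
  }
  if (++queryCount > MAX_QUERIES_PER_SECOND) {
    throw new Error('Query rate limit exceeded');
  }
  return runQuery(); // the caller supplies the actual database call
}

// Usage example (the query function here is a stand-in)
limitedQuery(() => Promise.resolve(['row1', 'row2']))
  .then(rows => console.log(rows))
  .catch(err => console.error(err.message));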
The choice of the implementation level depends on the specific use case, system architecture, and the nature of the resources or functionalities being protected. In many cases, a combination of these levels might be employed to create a robust and comprehensive rate limiting strategy.
Different Types of Rate Limiting
There are several types of rate limiting, including:
User Rate Limiting: This is one of the most common approaches to protecting resources. Here, we limit the number of requests a single user is allowed to make within a given period.
Concurrency Rate Limiting: Here, we restrict the number of concurrent sessions a user is allowed within a given timeframe, which helps mitigate the risk of DDoS attacks (a minimal sketch appears after this list).
Location and ID Rate Limiting: In some scenarios, such as a campaign targeted at a specific demographic, we want to reject requests coming from outside the preferred geography. This approach blocks such requests and preserves system availability for valid ones.
Server Rate Limiting: Rate limiting at the server level is a specialised approach. It’s typically applied in scenarios where certain servers are tasked with processing the majority of requests, particularly those servers dedicated to performing distinct, critical functions.
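To make the concurrency type concrete, here is a minimal sketch that caps in-flight requests per user; the limit, the function names, and the simulated work are illustrative assumptions:
const MAX_CONCURRENT = 3;   // illustrative per-user cap
const inFlight = new Map(); // userId -> number of active requests

async function withConcurrencyLimit(userId, handler) {
  const active = inFlight.get(userId) || 0;
  if (active >= MAX_CONCURRENT) {
    throw new Error('Too many concurrent requests');
  }
  inFlight.set(userId, active + 1);
  try {
    return await handler(); // the caller's actual work
  } finally {
    inFlight.set(userId, inFlight.get(userId) - 1); // release the slot
  }
}

// Usage example: simulate 100 ms of work
withConcurrencyLimit('user-42', () => new Promise(resolve => setTimeout(() => resolve('done'), 100)))
  .then(console.log)
  .catch(err => console.error(err.message));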
Common Algorithms for Implementing Rate Limiting
There are several rate limiting algorithms, including:
Fixed Window Counter
Sliding Logs
Sliding Window Counter
Token Bucket
Leaky Bucket
1. Fixed Window Counter:
In the Fixed Window Counter rate-limiting algorithm, a fixed time window is established, and a counter is incremented for each request made within that window. If the counter exceeds the predefined limit, subsequent requests are rejected until the window resets.
Pros:
Simple and easy to implement.
Predictable and deterministic behaviour.
Minimal resource overhead.
Cons:
Vulnerable to bursts at window boundaries: a burst at the end of one window followed by another at the start of the next can briefly allow up to twice the limit.
Coarse for long-duration windows, since a single early burst can exhaust the budget for the remainder of the window.
May lead to uneven distribution of requests if not carefully tuned.
Here’s a simple example using a single counter that resets at each window boundary:
const WINDOW_SIZE = 60 * 1000; // 1 minute
const MAX_REQUESTS = 100;      // limit to 100 requests per minute
let windowStart = Date.now();
let requestCount = 0;
function isRateLimited() {
  const now = Date.now();
  // Start a fresh window once the current one has expired
  if (now - windowStart >= WINDOW_SIZE) {
    windowStart = now;
    requestCount = 0;
  }
  requestCount++;
  return requestCount > MAX_REQUESTS;
}
// Usage example
if (isRateLimited()) {
console.log('Rate limit exceeded');
} else {
console.log('Request allowed');
}
2. Sliding Logs:
The sliding logs algorithm maintains a sliding window of request timestamps. If the number of requests within the window exceeds a threshold, further requests are denied.
Pros:
Precision in tracking request timestamps.
Effective for distributed systems with synchronised clocks.
Adaptable to varying traffic patterns.
Cons:
Increased storage requirements for maintaining logs.
Pruning can lead to data loss and may require careful tuning.
Sensitive to clock synchronisation issues in distributed environments.
Here’s an example:
class SlidingWindow {
  constructor(capacity, timeUnit) {
    this.capacity = capacity; // max requests allowed per window
    this.timeUnit = timeUnit; // window length in milliseconds
    this.queue = [];          // timestamps of accepted requests
  }
  checkRequest() {
    const now = Date.now();
    // Drop timestamps that have fallen out of the sliding window
    while (this.queue.length && this.queue[0].in_time <= now - this.timeUnit) {
      this.queue.shift();
    }
    if (this.queue.length >= this.capacity) {
      return false; // window is full, reject
    }
    this.queue.push({ in_time: now });
    return true;
  }
}
// Usage example
const limiter = new SlidingWindow(10, 10000); // 10 requests per 10 seconds
if (limiter.checkRequest()) {
console.log('Request allowed');
} else {
console.log('Rate limit exceeded');
}
3. Sliding Window Counter:
Similar to sliding logs, the sliding window counter algorithm keeps track of request counts within a moving time window.
Pros:
Handles burst requests more gracefully than Fixed Window Counter.
Provides a more continuous and balanced rate limiting approach.
Reduces the impact of sudden resets seen in Fixed Window Counter.
Cons:
Complexity increases compared to Fixed Window Counter.
May require additional mechanisms for window management.
Still vulnerable to burst requests within a sliding window.
Here’s an example using a circular buffer:
class SlidingWindowCounter {
  constructor(capacity, timeUnit) {
    this.capacity = capacity;                  // max requests per window
    this.timeUnit = timeUnit;                  // window length in milliseconds
    this.buffer = new Array(capacity).fill(0); // ring buffer of request timestamps
    this.index = 0;                            // slot holding the oldest timestamp
  }
  checkRequest() {
    const now = Date.now();
    const windowStart = now - this.timeUnit;
    // If the oldest recorded request is still inside the window,
    // all capacity slots are in use, so reject this request
    if (this.buffer[this.index] >= windowStart) {
      return false;
    }
    // Otherwise overwrite the oldest slot with the current timestamp
    this.buffer[this.index] = now;
    this.index = (this.index + 1) % this.capacity;
    return true;
  }
}
// Usage example
const limiter = new SlidingWindowCounter(10, 10000); // 10 requests per 10 seconds
if (limiter.checkRequest()) {
console.log('Request allowed');
} else {
console.log('Rate limit exceeded');
}
4. Token Bucket:
The token bucket algorithm maintains a bucket of tokens. Each request consumes a token, and the bucket refills over time.
Pros:
Smooth handling of bursts due to token accumulation.
Predictable and controllable rate of request processing.
Simplicity in implementation and understanding.
Cons:
Requires continuous maintenance of the token bucket.
Can lead to latency for requests when the bucket is empty.
May require careful tuning of token generation rate and bucket capacity.
Here’s a basic implementation:
class TokenBucket {
  constructor(capacity, refillRate) {
    this.tokens = capacity;       // start with a full bucket
    this.capacity = capacity;     // maximum tokens the bucket can hold
    this.refillRate = refillRate; // tokens added per millisecond
    this.lastRefillTime = Date.now();
  }
  checkRequest() {
    const now = Date.now();
    const timePassed = now - this.lastRefillTime;
    // Refill in proportion to the elapsed time, capped at capacity
    this.tokens = Math.min(this.capacity, this.tokens + timePassed * this.refillRate);
    this.lastRefillTime = now;
    if (this.tokens >= 1) {
      this.tokens--; // consume one token for this request
      return true;
    }
    return false;
  }
}
// Usage example
const limiter = new TokenBucket(100, 0.1); // 100 tokens, refill 0.1 tokens per millisecond
if (limiter.checkRequest()) {
console.log('Request allowed');
} else {
console.log('Rate limit exceeded');
}
5. Leaky Bucket:
The leaky bucket algorithm maintains a bucket with a fixed capacity. Requests are processed at a constant rate, and excess requests overflow.
Pros:
Smooth traffic shaping thanks to the constant leak rate.
Simplicity in implementation and understanding.
Predictable and controlled rate of request processing.
Cons:
Continuous maintenance of the leaky bucket.
Potential for burst requests to accumulate during idle periods.
May require careful tuning of leak rate and bucket capacity.
Here’s a simple example:
class LeakyBucket {
  constructor(capacity, leakRate) {
    this.bucket = 0;          // current number of queued requests
    this.capacity = capacity; // maximum requests the bucket can hold
    this.leakRate = leakRate; // requests leaked (processed) per millisecond
    this.lastLeakTime = Date.now();
  }
  processRequest() {
    const now = Date.now();
    const timePassed = now - this.lastLeakTime;
    // Leak in proportion to the elapsed time since the last check
    this.bucket = Math.max(0, this.bucket - timePassed * this.leakRate);
    this.lastLeakTime = now;
    if (this.bucket < this.capacity) {
      this.bucket++; // admit the request into the bucket
      return true;
    }
    return false; // bucket full: the request overflows and is rejected
  }
}
// Usage example
const limiter = new LeakyBucket(10, 0.001); // hold up to 10 requests, leak 0.001 requests per millisecond (1 per second)
if (limiter.processRequest()) {
  console.log('Request allowed');
} else {
  console.log('Rate limit exceeded');
}
In summary, rate limiting is a fundamental aspect of system security, resource management, and ensuring a reliable and fair user experience. It plays a critical role in safeguarding against abuse, optimising resource utilisation, and maintaining the overall stability of applications and services.
I hope you liked this blog and were able to understand the concepts of rate limiting.
In my next article, we will see how to implement application-level rate limiting in an Express.js backend API.
Thank you. I appreciate your feedback and comments.