Protecting Your System: Rate Limiting & DDoS Defense

Introduction

In the last blog, we discussed how to scale your systems using vertical and horizontal strategies, and how load balancers help distribute traffic.

But what if the traffic is not legitimate?

Imagine your system gets flooded with thousands of fake requests every second — not from users, but from bots or attackers trying to crash your servers. This is where system protection mechanisms come into play.

In this blog, we’ll explore:

  • How DoS and DDoS attacks affect your backend

  • What Rate Limiting is and how it prevents abuse

  • The two most common rate-limiting strategies: Leaky Bucket and Token Bucket

  • How platforms like AWS help defend against these threats

Let’s break it all down.

What is a DoS/DDoS Attack?

When your system is overwhelmed with too many incoming requests — especially from illegitimate sources — it can slow down or crash. This kind of attack is called a Denial of Service (DoS).

DoS (Denial of Service):

A DoS attack involves a single machine sending a large number of requests to your server — faster than it can handle.

  • Goal: Exhaust resources like CPU, memory, or bandwidth

  • Result: Real users can't access your service.

DDoS (Distributed Denial of Service):

DDoS is a distributed version of the same idea — but here, thousands (or even millions) of machines are used to attack the system at the same time.

  • These machines are often compromised devices (part of a botnet)

  • Much harder to block because requests come from many IPs.

Why It’s Dangerous:

  • Increases latency

  • Causes downtime

  • Blocks access for real users

  • Can lead to reputation loss and revenue drop

In short: DoS/DDoS attacks are like a traffic jam on your server’s highway — they block real drivers (users) from reaching their destination.

What is Rate Limiting?

Rate Limiting is a technique used to control how many requests a client can make to a server within a specific time frame.

It acts like a gatekeeper — ensuring that no single user (or attacker) can overwhelm your system by sending too many requests too quickly.
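To make the gatekeeper idea concrete, here is a minimal fixed-window counter in Python. The class name, limits, and client IDs are illustrative assumptions, not taken from any particular library:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` requests per client per `window` seconds."""

    def __init__(self, limit, window=60):
        self.limit = limit
        self.window = window
        self.counters = defaultdict(lambda: [0, 0.0])  # client -> [count, window start]

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        count, start = self.counters[client_id]
        if now - start >= self.window:       # window expired: start a fresh one
            self.counters[client_id] = [1, now]
            return True
        if count < self.limit:               # still under quota in this window
            self.counters[client_id][0] += 1
            return True
        return False                         # over the limit: reject

limiter = FixedWindowLimiter(limit=3, window=60)
print([limiter.allow("1.2.3.4", now=t) for t in (0, 1, 2, 3)])
# → [True, True, True, False]
```

The fourth request lands in the same 60-second window after the quota of 3 is spent, so it is rejected; a request after the window rolls over would be allowed again.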


Why Use Rate Limiting?

  • 🔐 Security: Prevent brute-force attacks (e.g., repeated login attempts)

  • 💡 Fairness: Ensure all users get a fair share of system resources

  • 🧹 Stability: Avoid server crashes caused by traffic spikes or abuse

  • 🛡️ DDoS Defense: Throttle suspicious or abnormal traffic patterns


Common Use Cases:

  • Limiting API requests per user/IP

  • Restricting login attempts

  • Preventing spam form submissions or scraping bots

  • Enforcing usage tiers (free vs. premium users)

Rate Limiting Strategies

There are multiple ways to implement rate limiting behind the scenes. The two most commonly used strategies are:

  • Leaky Bucket Algorithm

  • Token Bucket Algorithm

Let’s explore both with simple explanations and real-world analogies.


a. Leaky Bucket Algorithm

Concept:
Imagine pouring water into a bucket with a small hole at the bottom — the water leaks out at a fixed rate, no matter how fast you pour it in.

  • Incoming requests go into a queue (the “bucket”).

  • Requests are processed at a steady rate.

  • If requests come in too fast and the bucket overflows → extra requests are dropped.

Good For:

  • Smoothing traffic (steady flow)

  • Avoiding sudden load spikes

Limitation:

  • Doesn’t allow burst traffic (even short, legit bursts get dropped)
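The leaky-bucket behaviour can be sketched in a few lines of Python. This is the "bucket as meter" variant, where the queue depth is tracked as a number rather than an actual queue; the class name, capacity, and leak rate are illustrative:

```python
class LeakyBucket:
    """Requests fill a fixed-size bucket that drains at a constant rate."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity    # max queued requests (bucket size)
        self.leak_rate = leak_rate  # requests drained per second
        self.water = 0.0            # current queue depth
        self.last = 0.0             # time of the last update

    def allow(self, now):
        # Drain for the elapsed time, then try to add this request.
        self.water = max(0.0, self.water - (now - self.last) * self.leak_rate)
        self.last = now
        if self.water + 1 <= self.capacity:
            self.water += 1
            return True
        return False                # bucket overflowed -> request dropped
```

Note how a burst is punished: with capacity 2, a third simultaneous request is dropped even though the server might have been idle moments before — exactly the limitation described above.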


b. Token Bucket Algorithm

Concept:
Imagine a bucket being filled with tokens at a fixed rate.

  • Each request requires a token to be processed.

  • If tokens are available → request proceeds.

  • If no tokens → request is throttled or delayed.

Good For:

  • Handling bursty traffic gracefully

  • Offering flexibility in how many requests are allowed in short bursts

Limitation:

  • Slightly more complex to implement
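A matching Python sketch of the token bucket, again with illustrative names and rates. Starting with a full bucket is what lets short bursts through:

```python
class TokenBucket:
    """Tokens refill at a fixed rate; each request spends one token."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # max tokens (allowed burst size)
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start full, so bursts are allowed
        self.last = 0.0

    def allow(self, now):
        # Refill for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                    # no tokens left -> throttle
```

With capacity 3, a burst of three requests at the same instant all succeed; the bucket then refills at the steady rate, so sustained traffic is still bounded.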


Summary:

  Strategy       Handles Bursts   Processing Rate   Ideal Use Case
  Leaky Bucket   No               Fixed             Smooth, steady traffic control
  Token Bucket   Yes              Flexible          Bursty or inconsistent traffic

Rate Limiting in Real Systems

Now that we understand how Leaky Bucket and Token Bucket work, let’s look at how rate limiting is actually applied in real-world systems.


1. Per-IP Rate Limiting

  • Each client (IP address) is allowed only a certain number of requests per minute/second.

  • Common in public APIs to prevent abuse from a single IP.


2. Per-User Rate Limiting

  • Limits based on user identity (e.g., user ID or API key).

  • Useful for tiered services:

    • Free users → 60 requests/min

    • Pro users → 600 requests/min
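A tiered per-user limit can be as simple as a quota lookup feeding a fixed one-minute window. The tier table below mirrors the example numbers above; everything else (class and key names) is an illustrative sketch:

```python
from collections import defaultdict

TIER_LIMITS = {"free": 60, "pro": 600}   # requests per minute

class PerUserLimiter:
    """Fixed one-minute window per user, with a quota based on their tier."""

    def __init__(self):
        self.counts = defaultdict(int)   # (user_id, minute) -> request count

    def allow(self, user_id, tier, now):
        key = (user_id, int(now // 60))  # bucket requests by wall-clock minute
        if self.counts[key] < TIER_LIMITS.get(tier, TIER_LIMITS["free"]):
            self.counts[key] += 1
            return True
        return False
```

Keying the counter by (user, minute) rather than by IP means a user is limited consistently even if they switch networks, which is why identity-based limiting suits tiered services.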


3. Route-Specific Limiting

  • Different endpoints have different limits.

  • Examples:

    • /login: 5 attempts/minute (to prevent brute force)

    • /feed: 100 requests/minute (to avoid scraping)
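Route-specific limiting is often just a per-route quota table consulted by one shared counter. The limits below mirror the examples above; the default fallback value is a made-up assumption:

```python
from collections import defaultdict

ROUTE_LIMITS = {"/login": 5, "/feed": 100}   # requests per minute, per client
DEFAULT_LIMIT = 1000                         # hypothetical fallback for other routes

counts = defaultdict(int)                    # (client, route, minute) -> count

def allow(client, route, now):
    key = (client, route, int(now // 60))
    if counts[key] < ROUTE_LIMITS.get(route, DEFAULT_LIMIT):
        counts[key] += 1
        return True
    return False
```

Because the counter is keyed by (client, route), exhausting the tight /login budget does not affect the same client's access to /feed.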


4. Global Rate Limits

  • Protect overall infrastructure.

  • Useful in edge services like CDNs or reverse proxies where traffic control is needed globally.


5. Queue-based Throttling

  • Some systems queue excess requests instead of dropping them.

  • Useful in backend workers or API gateways.
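A queue-based throttler can be sketched with Python's standard queue module: excess requests wait in a bounded queue and are rejected only when it fills, while a worker drains them at its own pace. Names and sizes are illustrative:

```python
import queue

class ThrottlingQueue:
    """Hold excess requests in a bounded queue instead of dropping them."""

    def __init__(self, max_pending):
        self.pending = queue.Queue(maxsize=max_pending)

    def submit(self, request):
        try:
            self.pending.put_nowait(request)  # queued rather than dropped
            return True
        except queue.Full:
            return False                      # backpressure: reject only when full

    def drain_one(self):
        # Called by a worker at a steady rate (e.g. on a timer).
        return self.pending.get_nowait() if not self.pending.empty() else None
```

The trade-off versus dropping: queued requests eventually succeed, but they wait, so this fits background workers and gateways better than latency-sensitive user-facing calls.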


Rate limiting can be custom-coded, or configured directly using cloud services — which we’ll explore in the next section.

How AWS Helps with Rate Limiting & DDoS Protection

Cloud platforms like AWS provide built-in tools that help you defend your systems against traffic abuse, bot attacks, and DDoS incidents — without writing everything from scratch.


1. AWS API Gateway

  • Built-in rate limiting per API key, user, or method.

  • Easily configure:

    • Requests per second (RPS)

    • Burst limits (max short-term traffic)

  • Example:
    Limit /login to 5 reqs/sec/user
    Limit /public-feed to 50 reqs/sec/IP


2. AWS WAF (Web Application Firewall)

  • Blocks malicious traffic at the edge before it reaches your app.

  • Helps detect and block:

    • SQL injections

    • Cross-site scripting (XSS)

    • DDoS patterns

  • You can define custom rules or use AWS managed rule groups.


3. Amazon CloudFront (CDN)

  • Rate limiting and geographic filtering at edge locations.

  • Protects origin servers by serving cached responses.

  • Integrates with WAF for filtering and blocking at global scale.


Bonus:

  • AWS Shield (DDoS protection; the Standard tier is enabled automatically for AWS customers, while Shield Advanced adds enhanced detection and response)

  • Amazon GuardDuty (a threat detection service that monitors logs for malicious activity and raises alerts)

Together, these services offer a layered defense strategy — letting you secure your system with minimal custom setup.

Key Takeaways

  • DoS/DDoS attacks aim to flood your system with fake traffic, making it unusable for real users.

  • Rate Limiting is a key technique to restrict how often users can access your system, protecting against abuse and overload.

  • Two common strategies:

    • Leaky Bucket: Smooth, steady traffic control.

    • Token Bucket: Flexible and burst-friendly.

  • In real-world systems, rate limiting is applied per-IP, per-user, or per-route.

  • AWS tools like API Gateway, WAF, and CloudFront provide built-in support for rate limiting and DDoS mitigation.


What’s Next?

In the next blog, we’ll dive into how to speed up your applications and reduce server load using caching strategies — with tools like Redis and CDNs (Content Delivery Networks).

You’ll learn how to store frequently accessed data closer to the user, reduce latency, and make your system more efficient.

Written by developer_nikhil