Protecting Your System: Rate Limiting & DDoS Defense

Introduction

In the last blog, we discussed how to scale your systems using vertical and horizontal strategies, and how load balancers help distribute traffic.

But what if the traffic is not legitimate?

Imagine your system gets flooded with thousands of fake requests every second — not from users, but from bots or attackers trying to crash your servers. This is where system protection mechanisms come into play.

In this blog, we’ll explore:

  • How DoS and DDoS attacks affect your backend

  • What Rate Limiting is and how it prevents abuse

  • The two most common rate-limiting strategies: Leaky Bucket and Token Bucket

  • How platforms like AWS help defend against these threats

Let’s break it all down.

What is a DoS/DDoS Attack?

When your system is overwhelmed with too many incoming requests — especially from illegitimate sources — it can slow down or crash. This kind of attack is called a Denial of Service (DoS).

DoS (Denial of Service):

A DoS attack involves a single machine sending a large number of requests to your server — faster than it can handle.

  • Goal: Exhaust resources like CPU, memory, or bandwidth

  • Result: Real users can't access your service.

DDoS (Distributed Denial of Service):

DDoS is a distributed version of the same idea — but here, thousands (or even millions) of machines are used to attack the system at the same time.

  • These machines are often compromised devices (part of a botnet)

  • Much harder to block because requests come from many IPs.

Why It’s Dangerous:

  • Increases latency

  • Causes downtime

  • Blocks access for real users

  • Can lead to reputation loss and revenue drop

In short: DoS/DDoS attacks are like a traffic jam on your server’s highway — they block real drivers (users) from reaching their destination.

What is Rate Limiting?

Rate Limiting is a technique used to control how many requests a client can make to a server within a specific time frame.

It acts like a gatekeeper — ensuring that no single user (or attacker) can overwhelm your system by sending too many requests too quickly.
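To make the gatekeeper idea concrete, here is a minimal fixed-window counter in Python. The class name, limits, and client IDs are illustrative assumptions, not taken from any particular library:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` requests per client per `window` seconds."""

    def __init__(self, limit, window=60):
        self.limit = limit
        self.window = window
        self.counters = defaultdict(lambda: [0, 0.0])  # client -> [count, window start]

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        count, start = self.counters[client_id]
        if now - start >= self.window:       # window expired: start a fresh one
            self.counters[client_id] = [1, now]
            return True
        if count < self.limit:               # still under quota in this window
            self.counters[client_id][0] += 1
            return True
        return False                         # over the limit: reject

limiter = FixedWindowLimiter(limit=3, window=60)
print([limiter.allow("1.2.3.4", now=t) for t in (0, 1, 2, 3)])
# → [True, True, True, False]
```

The fourth request lands in the same 60-second window after the quota of 3 is spent, so it is rejected; a request after the window rolls over would be allowed again.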


Why Use Rate Limiting?

  • 🔐 Security: Prevent brute-force attacks (e.g., repeated login attempts)

  • 💡 Fairness: Ensure all users get a fair share of system resources

  • 🧹 Stability: Avoid server crashes caused by traffic spikes or abuse

  • 🛡️ DDoS Defense: Throttle suspicious or abnormal traffic patterns


Common Use Cases:

  • Limiting API requests per user/IP

  • Restricting login attempts

  • Preventing spam form submissions or scraping bots

  • Enforcing usage tiers (free vs. premium users)

Rate Limiting Strategies

There are multiple ways to implement rate limiting behind the scenes. The two most commonly used strategies are:

  • Leaky Bucket Algorithm

  • Token Bucket Algorithm

Let’s explore both with simple explanations and real-world analogies.


a. Leaky Bucket Algorithm

Concept:
Imagine pouring water into a bucket with a small hole at the bottom — the water leaks out at a fixed rate, no matter how fast you pour it in.

  • Incoming requests go into a queue (the “bucket”).

  • Requests are processed at a steady rate.

  • If requests come in too fast and the bucket overflows → extra requests are dropped.

Good For:

  • Smoothing traffic (steady flow)

  • Avoiding sudden load spikes

Limitation:

  • Doesn’t allow burst traffic (even short, legit bursts get dropped)
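The leaky-bucket behaviour can be sketched in a few lines of Python. This is the "bucket as meter" variant, where the queue depth is tracked as a number rather than an actual queue; the class name, capacity, and leak rate are illustrative:

```python
class LeakyBucket:
    """Requests fill a fixed-size bucket that drains at a constant rate."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity    # max queued requests (bucket size)
        self.leak_rate = leak_rate  # requests drained per second
        self.water = 0.0            # current queue depth
        self.last = 0.0             # time of the last update

    def allow(self, now):
        # Drain for the elapsed time, then try to add this request.
        self.water = max(0.0, self.water - (now - self.last) * self.leak_rate)
        self.last = now
        if self.water + 1 <= self.capacity:
            self.water += 1
            return True
        return False                # bucket overflowed -> request dropped
```

Note how a burst is punished: with capacity 2, a third simultaneous request is dropped even though the server might have been idle moments before — exactly the limitation described above.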


b. Token Bucket Algorithm

Concept:
Imagine a bucket being filled with tokens at a fixed rate.

  • Each request requires a token to be processed.

  • If tokens are available → request proceeds.

  • If no tokens → request is throttled or delayed.

Good For:

  • Handling bursty traffic gracefully

  • Offering flexibility in how many requests are allowed in short bursts

Limitation:

  • Slightly more complex to implement
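A matching Python sketch of the token bucket, again with illustrative names and rates. Starting with a full bucket is what lets short bursts through:

```python
class TokenBucket:
    """Tokens refill at a fixed rate; each request spends one token."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # max tokens (allowed burst size)
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start full, so bursts are allowed
        self.last = 0.0

    def allow(self, now):
        # Refill for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                    # no tokens left -> throttle
```

With capacity 3, a burst of three requests at the same instant all succeed; the bucket then refills at the steady rate, so sustained traffic is still bounded.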


Summary:

  Strategy       Handles Bursts   Processing Rate   Ideal Use Case
  Leaky Bucket   No               Fixed             Smooth, steady traffic control
  Token Bucket   Yes              Flexible          Bursty or inconsistent traffic

Rate Limiting in Real Systems

Now that we understand how Leaky Bucket and Token Bucket work, let’s look at how rate limiting is actually applied in real-world systems.


1. Per-IP Rate Limiting

  • Each client (IP address) is allowed only a certain number of requests per minute/second.

  • Common in public APIs to prevent abuse from a single IP.


2. Per-User Rate Limiting

  • Limits based on user identity (e.g., user ID or API key).

  • Useful for tiered services:

    • Free users → 60 requests/min

    • Pro users → 600 requests/min
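A tiered per-user limit can be as simple as a quota lookup feeding a fixed one-minute window. The tier table below mirrors the example numbers above; everything else (class and key names) is an illustrative sketch:

```python
from collections import defaultdict

TIER_LIMITS = {"free": 60, "pro": 600}   # requests per minute

class PerUserLimiter:
    """Fixed one-minute window per user, with a quota based on their tier."""

    def __init__(self):
        self.counts = defaultdict(int)   # (user_id, minute) -> request count

    def allow(self, user_id, tier, now):
        key = (user_id, int(now // 60))  # bucket requests by wall-clock minute
        if self.counts[key] < TIER_LIMITS.get(tier, TIER_LIMITS["free"]):
            self.counts[key] += 1
            return True
        return False
```

Keying the counter by (user, minute) rather than by IP means a user is limited consistently even if they switch networks, which is why identity-based limiting suits tiered services.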


3. Route-Specific Limiting

  • Different endpoints have different limits.

  • Examples:

    • /login: 5 attempts/minute (to prevent brute force)

    • /feed: 100 requests/minute (to avoid scraping)
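Route-specific limiting is often just a per-route quota table consulted by one shared counter. The limits below mirror the examples above; the default fallback value is a made-up assumption:

```python
from collections import defaultdict

ROUTE_LIMITS = {"/login": 5, "/feed": 100}   # requests per minute, per client
DEFAULT_LIMIT = 1000                         # hypothetical fallback for other routes

counts = defaultdict(int)                    # (client, route, minute) -> count

def allow(client, route, now):
    key = (client, route, int(now // 60))
    if counts[key] < ROUTE_LIMITS.get(route, DEFAULT_LIMIT):
        counts[key] += 1
        return True
    return False
```

Because the counter is keyed by (client, route), exhausting the tight /login budget does not affect the same client's access to /feed.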


4. Global Rate Limits

  • Protect overall infrastructure.

  • Useful in edge services like CDNs or reverse proxies where traffic control is needed globally.


5. Queue-based Throttling

  • Some systems queue excess requests instead of dropping them.

  • Useful in backend workers or API gateways.
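A queue-based throttler can be sketched with Python's standard queue module: excess requests wait in a bounded queue and are rejected only when it fills, while a worker drains them at its own pace. Names and sizes are illustrative:

```python
import queue

class ThrottlingQueue:
    """Hold excess requests in a bounded queue instead of dropping them."""

    def __init__(self, max_pending):
        self.pending = queue.Queue(maxsize=max_pending)

    def submit(self, request):
        try:
            self.pending.put_nowait(request)  # queued rather than dropped
            return True
        except queue.Full:
            return False                      # backpressure: reject only when full

    def drain_one(self):
        # Called by a worker at a steady rate (e.g. on a timer).
        return self.pending.get_nowait() if not self.pending.empty() else None
```

The trade-off versus dropping: queued requests eventually succeed, but they wait, so this fits background workers and gateways better than latency-sensitive user-facing calls.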


Rate limiting can be custom-coded, or configured directly using cloud services — which we’ll explore in the next section.

How AWS Helps with Rate Limiting & DDoS Protection

Cloud platforms like AWS provide built-in tools that help you defend your systems against traffic abuse, bot attacks, and DDoS incidents — without writing everything from scratch.


1. AWS API Gateway

  • Built-in rate limiting per API key, user, or method.

  • Easily configure:

    • Requests per second (RPS)

    • Burst limits (max short-term traffic)

  • Example:
    Limit /login to 5 reqs/sec/user
    Limit /public-feed to 50 reqs/sec/IP


2. AWS WAF (Web Application Firewall)

  • Blocks malicious traffic at the edge before it reaches your app.

  • Helps detect and block:

    • SQL injections

    • Cross-site scripting (XSS)

    • DDoS patterns

  • You can define custom rules or use AWS managed rule groups.


3. Amazon CloudFront (CDN)

  • Rate limiting and geographic filtering at edge locations.

  • Protects origin servers by serving cached responses.

  • Integrates with WAF for filtering and blocking at global scale.


Bonus:

  • AWS Shield (DDoS protection; the Standard tier is enabled automatically for AWS customers, while Shield Advanced adds enhanced detection and response)

  • Amazon GuardDuty (a threat detection service that monitors logs for malicious activity and raises alerts)

Together, these services offer a layered defense strategy — letting you secure your system with minimal custom setup.

Key Takeaways

  • DoS/DDoS attacks aim to flood your system with fake traffic, making it unusable for real users.

  • Rate Limiting is a key technique to restrict how often users can access your system, protecting against abuse and overload.

  • Two common strategies:

    • Leaky Bucket: Smooth, steady traffic control.

    • Token Bucket: Flexible and burst-friendly.

  • In real-world systems, rate limiting is applied per-IP, per-user, or per-route.

  • AWS tools like API Gateway, WAF, and CloudFront provide built-in support for rate limiting and DDoS mitigation.


What’s Next?

In the next blog, we’ll dive into how to speed up your applications and reduce server load using caching strategies — with tools like Redis and CDNs (Content Delivery Networks).

You’ll learn how to store frequently accessed data closer to the user, reduce latency, and make your system more efficient.

Written by developer_nikhil