Scalable URL Shortener

Introduction
Hey everyone!
In this blog, we're excited to share a deep dive into a project we recently built: a Scalable URL Shortener — kind of like Bitly.
We built it using:
Rust (for the backend)
Tokio (for handling async requests)
Warp (for building the web server)
Redis (for caching short links)
Nginx (for load balancing)
And we even used Snowflake IDs to generate unique short URLs in a highly scalable way.
This project is designed to be horizontally scalable, fault tolerant, and insanely fast.
Let's jump in!
Architecture Overview
The system has 3 main parts:
Frontend: A simple React interface for users to submit URLs.
Backend: Rust server exposing REST APIs.
Storage: Redis stores the short URL → original URL mappings.
Plus, we added Nginx in front of everything for load balancing and health checks.
Here's the flow:
User → Nginx (Load Balancer) → Rust Backend (Warp Server) → Redis
Core Technologies Used
Rust + Tokio + Warp
Rust gives us blazing fast performance and memory safety with zero-cost abstractions.
Tokio is Rust’s asynchronous runtime — it lets us handle thousands of simultaneous requests using lightweight tasks instead of heavy OS threads.
Warp is a minimal, type-safe, and composable web framework built on Tokio and Hyper.
Why We Chose This Stack
Async programming with Rust + Tokio means:
We can serve tons of concurrent users efficiently
No thread explosion → lower memory usage, better CPU scaling
Rust’s ownership model makes data races and segfaults a thing of the past
And Warp makes building REST APIs feel clean and expressive — it’s:
Fast
Safe
Modular
What It Looks Like
Here's a sneak peek of what a simple route looks like in Warp:
use warp::Filter;

#[tokio::main]
async fn main() {
    // Match GET /hello and reply with a static string
    let hello = warp::path("hello").map(|| "Hello, world!");

    // Serve the filter on localhost:3030
    warp::serve(hello).run(([127, 0, 0, 1], 3030)).await;
}
Snowflake Hashing for Scalable ID Generation
When building a scalable URL shortener, we needed to generate unique, fast, and sortable IDs without relying on central coordination.
Random strings and auto-increment IDs didn’t scale well.
Instead, we used Snowflake Hashing — a technique inspired by Twitter's Snowflake ID system.
What is a Snowflake ID?
A Snowflake ID is a 64-bit number that is:
Globally unique
Time-sortable (newer IDs are bigger)
Distributed (works across multiple servers without coordination)
High throughput (up to 4,096 IDs per machine per millisecond, i.e. millions per second)
Structure of a Snowflake ID
Each Snowflake ID is composed of several parts:
| 1 bit | 41 bits   | 10 bits    | 12 bits      |
|-------|-----------|------------|--------------|
| sign  | timestamp | machine ID | sequence num |
41 bits – Timestamp (in milliseconds since a custom epoch) → ensures time-based sorting
10 bits – Machine/Node ID → allows 1,024 different servers
12 bits – Sequence number → allows 4,096 IDs per server per millisecond
1 bit – Sign bit (always 0)
Fun Fact: because the timestamp sits in the high-order bits, a newer Snowflake ID is always numerically bigger than an older one. That built-in time ordering is perfect for distributed systems, since servers can generate IDs without any complex coordination between them.
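To make the bit layout concrete, here's a minimal sketch of how such an ID can be packed together (illustrative only: the custom epoch and constant names are assumptions, not our exact implementation):

use std::time::{SystemTime, UNIX_EPOCH};

// Assumed custom epoch (2020-01-01 UTC); any fixed past instant works.
const CUSTOM_EPOCH_MS: u64 = 1_577_836_800_000;
const MACHINE_ID_BITS: u64 = 10;
const SEQUENCE_BITS: u64 = 12;

/// Packs a 41-bit timestamp, a 10-bit machine ID, and a 12-bit sequence
/// number into a single 64-bit ID (the sign bit stays 0).
fn make_snowflake(machine_id: u64, sequence: u64) -> u64 {
    let now_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_millis() as u64;
    let timestamp = now_ms - CUSTOM_EPOCH_MS;

    (timestamp << (MACHINE_ID_BITS + SEQUENCE_BITS))
        | (machine_id << SEQUENCE_BITS)
        | sequence
}

fn main() {
    println!("snowflake id: {}", make_snowflake(1, 0));
}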
How We Use It
When a user shortens a URL:
A Snowflake ID is generated immediately (no waiting for DB).
The ID is encoded (like Base62) to create the short URL.
This happens asynchronously, so it doesn’t block the main request.
The short link looks something like:
https://rustyshortener/8zF3kT
This keeps our system:
Fast under heavy load
Consistent without collisions
Time-ordered for analytics
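To make the encoding step concrete, here's a minimal Base62 helper (an illustrative sketch, not our exact code):

const ALPHABET: &[u8] = b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

/// Encodes a 64-bit ID as a short Base62 string.
fn base62_encode(mut n: u64) -> String {
    if n == 0 {
        return "0".to_string();
    }
    let mut out = Vec::new();
    while n > 0 {
        out.push(ALPHABET[(n % 62) as usize]);
        n /= 62;
    }
    out.reverse();
    String::from_utf8(out).unwrap()
}

fn main() {
    // A full 64-bit Snowflake ID encodes to at most 11 Base62 characters.
    println!("{}", base62_encode(1_234_567_890_123));
}

Because the ID is unique by construction, the encoded short code is collision-free without ever checking the database.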
Why Not Random or Auto-Increment?
Random IDs → Can collide, not sortable
Auto-increment IDs → Need central coordination, slow
UUIDs → Unique, but long and hard to index
Redis Database
Redis is used as the main database here (no slow disk-based database!).
We use it to:
Store (short URL → original URL) mappings
Retrieve original URLs fast
We configured Redis to:
Use 2GB of memory
Apply LRU (Least Recently Used) eviction when full
Be TCP optimized with a large backlog size
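In redis.conf terms, that configuration looks roughly like this (a sketch; every value except the 2GB limit is an assumption):

maxmemory 2gb
maxmemory-policy allkeys-lru
tcp-backlog 4096
tcp-keepalive 300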
Why Redis as Primary Database?
Redis is incredibly fast due to its in-memory architecture, which eliminates the latency of disk I/O. It optimizes performance through advanced data structures like strings, lists, sets, and hashes, each designed for high-speed access and manipulation. Additionally, Redis employs techniques like pipelining and lazy eviction, which further reduce delays in data processing. And since it excels at read-heavy workloads, it's a natural fit for a URL shortener, where redirect lookups vastly outnumber new-link writes.
Backend Details
We expose these endpoints:
POST /generate_url : To shorten a new URL
GET /:short_code : To redirect a short code to the original URL
GET /ping : For health checks (basic debugging endpoint)
Important Features:
Environment Variables for configuration (port, Redis address, API key)
API Key Authentication to prevent misuse
Async Redis Integration via redis crate
Error Handling (timeouts, 404s, etc.)
Connection Pooling using keep-alive
And everything runs inside Docker containers for easy deployment!
This also leaves the door open to migrating to Kubernetes in the future!
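For a flavor of how the redirect endpoint can be wired up with Warp and the async redis crate, here's a minimal sketch (the Redis address, port, key layout, and error handling are simplified assumptions, not our exact code):

use redis::AsyncCommands;
use warp::{http::Uri, Filter};

#[tokio::main]
async fn main() {
    let client = redis::Client::open("redis://127.0.0.1/").unwrap();

    // GET /:short_code → look up the original URL in Redis and redirect.
    let redirect = warp::path!(String).and_then(move |code: String| {
        let client = client.clone();
        async move {
            let mut con = client
                .get_multiplexed_async_connection()
                .await
                .map_err(|_| warp::reject::not_found())?;
            // In this sketch, the short code itself is the Redis key.
            let url: Option<String> = con
                .get(&code)
                .await
                .map_err(|_| warp::reject::not_found())?;
            match url.and_then(|u| u.parse::<Uri>().ok()) {
                Some(uri) => Ok::<_, warp::Rejection>(warp::redirect::temporary(uri)),
                None => Err(warp::reject::not_found()),
            }
        }
    });

    warp::serve(redirect).run(([127, 0, 0, 1], 8000)).await;
}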
Load Balancing with Nginx
We use Nginx as a smart load balancer that sits in front of multiple Rust backend instances (we chose three).
upstream backend_servers {
    hash $request_uri consistent;
    server backend1:8000 max_fails=3 fail_timeout=30s;
    server backend2:8000 max_fails=3 fail_timeout=30s;
    server backend3:8000 max_fails=3 fail_timeout=30s;
    keepalive 32;
}
Why Consistent Hashing?
We chose consistent hashing to route incoming requests based on the $request_uri.
This ensures that requests for the same shortened URL will (almost always) be routed to the same backend, which:
Improves cache hit rates
Reduces cold starts
Helps with stateful optimizations (if any)
What Nginx Does for Us
Nginx isn’t just doing round-robin load balancing — it’s a battle-hardened traffic router that also:
Monitors health of backend services
Retries failed requests automatically
Buffers and queues connections during spikes
Keeps connections alive with keepalive, reducing latency
Nginx uses an event-driven architecture, meaning it can handle thousands of concurrent connections with a single thread. Unlike traditional servers that create a new thread or process for each request, Nginx operates asynchronously, allowing it to efficiently manage high traffic loads without consuming excessive system resources. This is one of the reasons Nginx is so fast and scalable.
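Here's roughly what the server block in front of that upstream can look like (an illustrative sketch; the timeout values are assumptions):

server {
    listen 80;

    location / {
        proxy_pass http://backend_servers;
        proxy_http_version 1.1;
        proxy_set_header Connection "";     # required for upstream keepalive
        proxy_next_upstream error timeout;  # retry failed requests on another backend
        proxy_connect_timeout 5s;
        proxy_read_timeout 10s;
    }
}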
With this setup:
Our Rust backends stay stateless and fast
Traffic gets distributed predictably
We gain resilience and scalability out of the box
System Tuning
For better performance, we tuned several system and network parameters, each detailed below.
All backend containers have resource limits:
resources:
  limits:
    cpus: '2'
    memory: 1G
The optimizations in detail:
File descriptor limit raised to 65536: Increases the number of simultaneous connections the system can handle, ensuring stability during high traffic.
TCP backlog size increased: Helps manage incoming requests more effectively, preventing connection drops during traffic spikes.
Memory settings tuned for Redis: Ensures optimal memory usage, maintaining high throughput and low latency for caching.
Connection timeouts adjusted: Prevents resource hogging by terminating slow connections and ensuring efficient resource usage.
Error retries handled smartly: Reduces the risk of system overload by applying controlled retry mechanisms during failures.
Backend container resource limits: Keeps resource allocation predictable and balanced, maintaining system stability under load.
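On a Linux host, these tweaks typically translate to settings like the following (illustrative values; only the 65536 descriptor limit comes from our setup):

# /etc/security/limits.conf: raise the open file descriptor limit
*  soft  nofile  65536
*  hard  nofile  65536

# /etc/sysctl.conf: enlarge the TCP accept backlog
net.core.somaxconn = 4096

# Recommended for Redis so background saves don't fail under memory pressure
vm.overcommit_memory = 1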
Load Testing Results with wrk
We used wrk, a modern HTTP benchmarking tool, along with a custom Lua script to simulate high-throughput POST requests.
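We haven't reproduced wrk_post.lua here, but a minimal script in that style might look like this (the JSON payload and header names are assumptions):

-- Configure every request as a POST with a JSON body
wrk.method = "POST"
wrk.body   = '{"url": "https://example.com/some/long/path"}'
wrk.headers["Content-Type"] = "application/json"
wrk.headers["x-api-key"]    = "test-key"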
Some wrk runs:
wrk -t8 -c1000 -d30s -s wrk_post.lua http://127.0.0.1:80/generate_url
Result:
324,758 requests in 30s
10,808.79 requests/sec
2.38 MB/s transfer
And:
wrk -t6 -c500 -d30s -s wrk_post.lua http://127.0.0.1:15555/generate_url
Result:
339,250 requests in 30s
11,293.93 requests/sec
1.99 MB/s transfer
Average latency stayed under 100ms — even with 1000 concurrent connections!
Security Measures
API key authentication to protect shortening endpoint
Header sanitization at Nginx
Connection limits to avoid abuse
Timeouts for every operation
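Connection limits and timeouts at the Nginx layer can be expressed like this (a sketch with assumed values):

# In the http block: allow each client IP at most 20 concurrent connections
limit_conn_zone $binary_remote_addr zone=per_ip:10m;

server {
    limit_conn per_ip 20;

    # Fail fast on slow clients
    client_body_timeout 10s;
    send_timeout 10s;
}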
Future Improvements
Some things we want to add later:
Redis Cluster for even better horizontal scaling
Rate limiting (to stop spamming)
More monitoring (Grafana dashboards)
Advanced analytics (like top clicked links)
Better caching strategies
Using Redis purely as a cache, with a separate durable database as the primary store
Conclusion
This project taught us SO MUCH about building production-grade systems:
How async Rust works under the hood
Why consistent hashing is magical for load balancing
How Snowflake IDs can eliminate the need for database locking
And why system tuning matters even more than writing "fast code"
Rust + Warp + Tokio gave us good performance — comparable to Go or even C++ web servers!
Thanks for reading! Feel free to reach out if you want the full code or setup scripts!
Github Repo : https://github.com/Cioraz/URL_Shortener_Scalable