Scaling your Apps: The Why, How, and When

Tanmay BansalTanmay Bansal
7 min read

As I’ve been diving deeper into backend systems and real-world web applications, I’ve come across terms like horizontal scaling, vertical scaling, and capacity estimation more times than I can count. These aren't just buzzwords—they're essential for building scalable, high-performance systems.

Today, let’s start exploring these concepts one by one, beginning with Vertical Scaling.


🏗️ What Is Vertical Scaling?

Vertical scaling (aka scaling up) means upgrading your existing server by adding:

  • 🧠 More RAM

  • 🧮 More CPU cores

  • ⚡ Faster storage (SSD over HDD)

How vertical scaling works

Image Source: Vertical vs Horizontal Scaling


🔍 Real Example in Deployment

Say you're deploying a Node.js app on a DigitalOcean droplet. You start with:

Basic Plan: 1 vCPU, 1 GB RAM

As traffic increases, you upgrade to:

Upgraded Plan: 4 vCPU, 8 GB RAM

No code changes needed — your server just has more horsepower to handle requests. This is vertical scaling in action: same code, better infrastructure.


⚙️ How Apps Should React to Vertical Scaling

While you don’t scale vertically in code, your application should be designed to efficiently use the increased resources. For example:

  • Use asynchronous I/O

  • Optimize database queries

  • Use memory and CPU wisely (avoid memory leaks, unbounded loops, etc.)

The most important point? With vertical scaling, your code doesn’t need to know the machine is upgraded — it just runs better.


✅ Pros of Vertical Scaling

+ Simple to implement
+ No change to application code
+ Great for monolithic apps

❌ Cons of Vertical Scaling

- Hardware limitations (you can’t scale forever)
- Downtime during upgrades
- Cost increases rapidly at higher tiers

👨‍🍳 Analogy: Your Burger Stall

You’re running a small burger stall. Orders start piling up. You:

  • Get a bigger grill 🍳

  • Hire more people 👨‍🍳

  • Add more tables 🪑

This is vertical scaling — making the existing setup more powerful.

But… what if there's no more space? That’s when horizontal scaling comes in — multiple burger stalls! 🍔🍔🍔 (We’ll get to that next.)


📈 When to Consider Vertical Scaling

Use monitoring tools like:

  • htop / top (Linux CPU usage)

  • pm2 (Node.js process manager)

  • NewRelic / Datadog (Application metrics)

Example with htop:

sudo apt install htop
htop

If your CPU is constantly above 80–90% or RAM is fully used, it’s a clear sign to scale vertically.


Let me know if you’ve ever hit a performance wall and what your fix was—I’d love to hear your stories!

Capacity Estimation: Planning for Growth

Before scaling — vertically or horizontally — it’s essential to estimate how much capacity you’ll actually need.

Think of this as resource budgeting for your system. Without it, you’re either overspending or under-preparing.


🧠 What Is Capacity Estimation?

Capacity estimation helps answer:

  • 🧾 How many requests can my current setup handle?

  • 📈 What happens when traffic grows 10x?

  • 🧯 When will my system crash under load?

It’s the process of calculating the maximum throughput of your system and identifying bottlenecks before they cause outages.


📐 Estimating Capacity Step-by-Step

Let’s go through a practical framework using a backend API as an example:

1. Measure Baseline Performance

Start with a single machine setup. Use tools like:

  • wrk – HTTP benchmarking

  • ab (Apache Bench)

  • artillery, k6, or locust

bashCopyEditwrk -t12 -c400 -d30s http://localhost:3000

Let’s say this outputs:

bashCopyEditRequests/sec: 500

That means 1 server can handle 500 RPS (requests per second).


2. Factor in Peak Load

If your app gets ~100,000 daily users, and on average each makes 20 requests:

txtCopyEdit100,000 users × 20 reqs = 2,000,000 total requests/day

Spread across 24 hours:

txtCopyEdit2,000,000 ÷ 86,400 seconds ≈ ~23 RPS average

But traffic is not uniform — you’ll likely have peak hours. Multiply by a safety factor (say 10x):

txtCopyEditPeak RPS ≈ 230

3. Decide Based on Throughput

If 1 server = 500 RPS, and you need 230 peak RPS, you’re good.

But if you're expecting a marketing campaign or viral event to push this to 2,000 RPS, you’ll need:

txtCopyEdit2000 ÷ 500 = 4 servers (horizontally scaled)

OR upgrade a single machine vertically if the cost and performance align.


4. Monitor and Revisit

Capacity estimation isn’t a one-time thing.

✅ Use:

  • Grafana + Prometheus for dashboards

  • Datadog, New Relic, Elastic APM

  • Alerts for CPU/RAM usage, request latency, DB slow queries

📉 Regularly revisit as your feature set, traffic patterns, or codebase evolve.


📦 Capacity Estimation for Databases

Don’t forget your database!

  • Measure average query time under load

  • Identify slow queries with indexes (more on that in the next section!)

  • Use read replicas or DB sharding if DB is the bottleneck


🧮 Example Capacity Chart

ComponentCapacity (RPS)Bottleneck?
Node.js Server500
DB (Postgres)300
Redis Cache5,000

In this example, even if the Node server can handle more, the DB becomes the limiting factor. This is where indexing or horizontal DB scaling comes in.


🌐 What Is Horizontal Scaling?

Horizontal scaling (aka scaling out) means running multiple instances of your application on different servers or machines.

Instead of upgrading the size of one machine, you add more machines to distribute the load.

How horizontal scaling works

Image Source: Vertical vs Horizontal Scaling


🚀 Real Example in Deployment

Let’s say you have a web server receiving a ton of traffic. One instance can’t handle it alone. Instead of just upgrading it (vertical), you:

  • Deploy 3 instances of the server

  • Put a load balancer (like NGINX or HAProxy) in front

  • Distribute incoming traffic across instances

Incoming Request ─▶ Load Balancer ─▶ Server A
                                └▶ Server B
                                └▶ Server C

🔄 Load Balancer Code Example (NGINX)

Here’s a super simple nginx.conf snippet:

http {
    upstream backend {
        server 127.0.0.1:3000;
        server 127.0.0.1:3001;
        server 127.0.0.1:3002;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://backend;
        }
    }
}

👨‍💻 Node.js Cluster Example for Horizontal Scaling

Node.js also allows horizontal scaling via its built-in cluster module:

const cluster = require('cluster');
const os = require('os');
const http = require('http');

if (cluster.isMaster) {
  const cpuCount = os.cpus().length;
  for (let i = 0; i < cpuCount; i++) {
    cluster.fork();
  }
} else {
  http.createServer((req, res) => {
    res.writeHead(200);
    res.end(`Handled by worker ${process.pid}\n`);
  }).listen(8000);
}

This code runs multiple worker processes across available CPU cores, effectively simulating horizontal scaling on a single machine.


✅ Pros of Horizontal Scaling

+ High availability (one instance goes down, others stay up)
+ Can scale beyond hardware limits
+ Better fault tolerance

❌ Cons of Horizontal Scaling

- Requires load balancers
- Adds complexity to deployment and state management
- Needs inter-service communication strategies

👩‍🍳 Analogy: Expanding the Burger Empire

Remember your burger stall? You’ve maxed out your space. So, you:

  • Open another stall down the street 🍔

  • Then another in the next neighborhood 🍔

This is horizontal scaling — adding more stalls (servers) to meet demand.

You might even hire a delivery service (load balancer!) to route orders intelligently.


⚖️ When to Choose Horizontal Over Vertical

  • Vertical scaling hits hardware or cost limits

  • You need redundancy and high availability

  • You’re designing a distributed system

📌 Often, companies start with vertical scaling (it’s easier), and move to horizontal scaling as they grow.


🏁 Conclusion: Building For Tomorrow, Today!

Scaling is not just about brute-forcing hardware or adding more servers—it's about planning, measuring, and evolving as your system grows.

Whether you're vertically scaling a hobby project, horizontally scaling a startup backend, or capacity planning for a production-grade system—understanding these concepts helps you make smart, cost-effective, and reliable decisions.

🔁 Keep measuring. 📈 Keep optimizing. 🧠 Keep learning.

If you’ve found this article helpful, stay tuned for the next one where we’ll cover Indexing in Databases and how it plays a massive role in scaling efficiently!

Follow along, share your thoughts in the comments, and happy scaling! 🚀

0
Subscribe to my newsletter

Read articles from Tanmay Bansal directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Tanmay Bansal
Tanmay Bansal

build. ship, publish