Deployment Strategies in Microservices


What deployment strategies are commonly employed in the domain of microservices, and how do you deploy them?

Common deployment strategies in the domain of microservices include:

Recreate Deployment

  • We stop the old version of the app.

  • Then we deploy the new version.

  • It is simple but has downtime (app is not available for some time).

  • Best for non-critical apps where small downtime is okay.

1. Development Phase

Step 1: Code Development

  • Developers write new code (features or bug fixes).

  • Changes are saved in Git (version control system).

Step 2: Local Testing

  • Run the app on the local machine.

  • Check if it works properly.

  • Run:

    • Unit tests (check individual parts)

    • Integration tests (check how parts work together)

  • Make sure all environment variables and configs are correct.

Step 3: Setup CI (Continuous Integration)

Continuous Integration means using tools like GitHub Actions, Jenkins, or GitLab CI to automate the build-and-test process. When a developer pushes code to GitHub, the CI tool automatically builds the application (for example, creating a .jar file or Docker image) and then runs tests (such as JUnit tests) to make sure everything works correctly. This catches errors early and avoids manual testing on every change; everything happens automatically in the background.
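As a sketch, a minimal GitHub Actions workflow for this could look as follows (assuming a Maven-based Java service and a file at .github/workflows/ci.yml; the step details will vary per project):

```yaml
name: ci
on:
  push:
    branches: [main]
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: '17'
      - name: Build and run JUnit tests
        run: mvn --batch-mode verify  # builds the .jar and runs the test suite
```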

2. Pre-Production Phase

Step 4: Build Artifacts

Here we take the build artifact created by CI (such as a Docker image or .jar file) and push it to a central repository like Docker Hub or JFrog Artifactory.

Note: That image or .jar is pushed to a repository (so it can be used in later steps like staging or production deployment).

Step 5: Test in Staging

  • Deploy the build to a staging environment that mirrors production.

  • Make sure everything works fine.

Step 6: UAT (User Acceptance Testing)

  • Check if the app behaves correctly.

  • Perform manual or automated testing.

  • CI = automated, fast, code-level checks

  • UAT = final human testing to make sure app is ready for users

3. Production Deployment (Recreate Style)

Step 7: Inform Everyone

  • Let your team or users know about the upcoming downtime.

Step 8: Backup First

  • Save current database/data.

  • Plan rollback (how to go back if something breaks).

Step 9: Do the Deployment

  • Stop current app in production.

  • Deploy the new version.

  • Start the new app.

Production Environment Instances in Recreate Deployment

In Recreate Deployment, we usually run only one instance of the application in production. Here's why:

  • When we deploy, we stop all old instances of the app.

  • Only after stopping the old ones, we start the new version.

  • So at any moment, only one version is running — either old or new, not both.

Important Points to Remember:

  • Number of Instances: Just one (for simplicity). → Only one version of the application is running at a time in production.

  • Downtime: Yes, there will be a short time when the app is down — because we stop old before starting new.

  • Backup Plan: Always take a backup and prepare a rollback plan in case something breaks after deployment.

  • Best Use Case: For apps that don’t need to run all the time (non-critical systems), like internal tools or apps with fewer users.
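In Kubernetes, this behaviour is declared with the Recreate strategy type, which terminates every old pod before starting any new one. A minimal sketch (names follow the examples later in this article):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1  # a single instance, as described above
  strategy:
    type: Recreate  # stop all old pods first, then start new ones (brief downtime)
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:v2  # new version
```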

Rolling Deployment

  • Update the app one pod at a time instead of stopping everything.

  • While some pods run the old version (v1), others start running the new version (v2).

  • Users feel little or no downtime because most pods stay online.

Example:

If you have 5 pods, Kubernetes will update them one by one (or in small batches), based on the values of:

  • maxUnavailable: how many old pods can be stopped at once

  • maxSurge: how many new pods can be added temporarily

For example, with this configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # Max number of pods unavailable at a time
      maxSurge: 1         # 1 additional pod above the desired replicas, which means up to 6 pods can run at a time
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:v2  # New version

➡️ Kubernetes updates 1 pod at a time:

  • Adds 1 new (v2) pod —> 4 old (v1) + 1 new (v2)

  • Deletes 1 old (v1) pod —> 3 old + 2 new

  • Repeats until all 5 pods run v2 —> 0 old + 5 new (v2)

Kubernetes handles these steps automatically.

Blue-Green Deployment

In Blue-Green Deployment, we keep two environments:

  • Blue: The current live version (v1)

  • Green: The new version (v2), fully ready but not yet receiving traffic

👉 We test Green first.
👉 If it’s working fine, we switch all traffic from Blue to Green.
👉 If something goes wrong, we switch traffic back to Blue quickly (rollback).

Why Use It?

  • No downtime — users don’t feel the switch

  • Safe rollback — just point traffic back to the old one

  • Perfect for critical systems — where app must always be available

# Blue environment (current production)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: blue
  template:
    metadata:
      labels:
        app: my-app
        version: blue
    spec:
      containers:
      - name: my-app
        image: my-app:v1  # Old version

# Green environment (new version)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-green
spec:
  replicas: 3  # 3 pods run the new version, but they receive no user traffic initially
  selector:
    matchLabels:
      app: my-app
      version: green
  template:
    metadata:
      labels:
        app: my-app
        version: green
    spec:
      containers:
      - name: my-app
        image: my-app:v2  # New version

How It Works in Kubernetes

  1. Deploy two apps:

    • my-app-blue → runs v1 (live now)

    • my-app-green → runs v2 (not live yet, just running)

  2. Green also runs 3 pods (same as Blue), but no traffic goes to Green yet.

  3. Use a load balancer or service to send traffic only to the Blue pods.

  4. You test the Green version (v2) safely. No user is affected.

  5. If Green is working fine → you change the traffic to go to Green.

  6. If something fails → quickly send traffic back to Blue.

  7. After full success → you can delete Blue or keep it for future rollback.
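The switch in step 5 is typically a one-line change to the Service's selector. A sketch (Service name assumed; labels taken from the Deployments above):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
    version: blue  # change to "green" to switch all traffic; change back to roll back
  ports:
    - port: 80
      targetPort: 80
```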

Canary Deployment

In Canary Deployment, we release the new version (v2) to a small number of users first.

  • Most users still use the stable version (v1).

  • Only a few users get the canary version (v2).

  • If the canary works well, we slowly give it to more users.

  • If there are problems, we stop the rollout and fix the issue.

Why use Canary?

  • Helps catch bugs early without affecting all users.

  • Used in high-traffic apps to reduce risk.

  • Easy to rollback, since most users still use the stable version.

# Stable Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 10  # Initial stable version pods
  selector:
    matchLabels:
      app: my-app
      version: stable
  template:
    metadata:
      labels:
        app: my-app
        version: stable
    spec:
      containers:
      - name: my-app
        image: my-app:v1  # Stable version

# Canary Deployment (small subset of pods for canary release)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-canary
spec:
  replicas: 2  # A small number for canary testing
  selector:
    matchLabels:
      app: my-app
      version: canary
  template:
    metadata:
      labels:
        app: my-app
        version: canary
    spec:
      containers:
      - name: my-app
        image: my-app:v2  # New version (canary)

Steps in Simple Terms

  1. Start with v1 (stable) running in most pods.

  2. Deploy a few pods with v2 (canary).

  3. Route a small part of user traffic to the canary pods.

  4. Monitor the new version:

    • Check logs, errors, and performance

  5. If everything is good:

    • Slowly increase canary pods (or traffic)

  6. Finally, when confident:

    • Replace all v1 pods with v2
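One simple way to get the split above is to front both Deployments with a single Service that selects only the shared app label: with 10 stable and 2 canary pods, roughly 1 in 6 requests hits the canary. A sketch (Service name assumed):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app  # matches both version: stable and version: canary pods
  ports:
    - port: 80
      targetPort: 80
```

Note that finer-grained splits (e.g. exactly 5% of traffic) need an Ingress controller or service mesh rather than pod counts.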

A/B Testing

In A/B Testing, you run two different versions of your app at the same time:

  • Version A (e.g., v1)

  • Version B (e.g., v2)

Each version is shown to a different group of users.
This helps you compare how users react to each version.

Why use it?

  • To test which version users like more

  • Common for UI/UX changes, feature experiments, and user behavior tracking

  • Helps make data-driven decisions

# Version A deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-version-a
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: a
  template:
    metadata:
      labels:
        app: my-app
        version: a
    spec:
      containers:
      - name: my-app
        image: my-app:v1

# Version B deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-version-b
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: b
  template:
    metadata:
      labels:
        app: my-app
        version: b
    spec:
      containers:
      - name: my-app
        image: my-app:v2

How It Works in Kubernetes

  1. You create two deployments:

    • One for Version A

    • One for Version B

  2. Both versions run at the same time with separate pods.

  3. You use a load balancer (like AWS ELB, GCP Load Balancer) or Ingress controller to:

    • Split traffic 50–50

    • Or send certain users to A and others to B

    • Or even route based on headers (ex: country, user group)
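As one concrete example, the NGINX Ingress controller can route by header using its canary annotations. The sketch below sends requests carrying X-Group: b to Version B, and assumes a primary Ingress already routes everything else to a my-app-version-a Service (hostname and Service names are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-version-b
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "X-Group"
    nginx.ingress.kubernetes.io/canary-by-header-value: "b"
spec:
  rules:
    - host: my-app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app-version-b
                port:
                  number: 80
```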

Steps in Simple Terms

  1. Deploy Version A and Version B side by side.

  2. Route traffic using:

    • Cloud load balancer (e.g., AWS, GCP)

    • Kubernetes Ingress with rules

  3. Users see different versions randomly (or based on logic).

  4. Collect feedback or metrics (like click rate, performance).

  5. Decide which version performs better.

  6. Keep the better one and remove the other.

Data-driven decisions → you don't guess which version is better; you collect real user data (clicks, signups, errors, time spent, etc.) and use that to decide which version to keep.

Shadow Deployment

Shadow deployment is a deployment strategy where:

  • The new version of the application (v2) is deployed alongside the current stable version (v1).

  • Live traffic that is going to the current version (v1) is cloned and sent to the new version (v2), but:

    • The new version does not respond to users.

    • It only processes the traffic in the background — used for testing, monitoring, and validation.

Why Use Shadow Deployment?

  • To test new features or code changes under real-world traffic.

  • To catch performance bottlenecks, memory leaks, or unexpected behavior in production without affecting users.

  • To compare logs, performance, and outputs between old and new versions.

  • To prepare for a full rollout with confidence.

How It Works in Kubernetes

We deploy two versions of the same app:

Old Version (v1) — Stable

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-old
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: old
  template:
    metadata:
      labels:
        app: my-app
        version: old
    spec:
      containers:
      - name: my-app
        image: my-app:v1  # Old stable version
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-old-service
spec:
  selector:
    app: my-app
    version: old
  ports:
    - port: 80
      targetPort: 80

New Version (v2) — Shadow

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-new-shadow
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: new-shadow
  template:
    metadata:
      labels:
        app: my-app
        version: new-shadow
    spec:
      containers:
      - name: my-app
        image: my-app:v2  # New version under test
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-new-shadow-service
spec:
  selector:
    app: my-app
    version: new-shadow
  ports:
    - port: 80
      targetPort: 80

  • This version does not receive user traffic directly.

  • It receives cloned traffic using a traffic mirroring tool (like Istio or service mesh).

  • This version does not send a response to the user.

How to Clone/Mirror Traffic?

Kubernetes does not mirror traffic by default. You need a service mesh or proxy, such as:

  • Istio

  • Linkerd

  • NGINX with Lua scripting

🧪 Example: with Istio, a VirtualService (together with a DestinationRule) can mirror traffic to the shadow service.
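A hedged sketch of what such an Istio VirtualService could look like, reusing the Service names above (mirrored responses are discarded, so users only ever see v1's answers):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts:
    - my-app-old-service
  http:
    - route:
        - destination:
            host: my-app-old-service  # real responses come from v1
      mirror:
        host: my-app-new-shadow-service  # cloned requests go here
      mirrorPercentage:
        value: 100.0  # mirror all traffic to the shadow version
```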

Feature Toggles (Flags)

What are Feature Toggles?

Feature Toggles allow you to deploy new features along with your code but keep them hidden (disabled) for users. You can control when and for whom the feature is visible using a configuration setting — without changing code or redeploying the app.

Why Use It:

  • Safely test new features in production, without exposing them to everyone.

  • Gradually enable a feature for selected users (e.g., 5%, 10%, internal team).

  • If something goes wrong, you can instantly disable the feature with a switch.

  • Helps in A/B testing or gathering user feedback before a full release.

Example:

Let’s say you’re building a new "dark mode" for your app.

You deploy the code, but wrap it with a toggle:

if (featureToggle.isEnabled("dark-mode")) {
    enableDarkMode();
} else {
    enableLightMode();
}

Now, even though the feature is deployed, no one sees it until the toggle is ON.
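To make the idea concrete, here is a minimal, self-contained sketch of a toggle store. It is an in-memory stand-in: real systems (LaunchDarkly, Unleash, or plain config servers) read flags from a central service, and the class and method names here are made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class FeatureToggle {
    // In-memory flag store; a real system would read this from config or a flag service.
    private static final Map<String, Boolean> flags = new ConcurrentHashMap<>();

    static void set(String name, boolean enabled) {
        flags.put(name, enabled);
    }

    static boolean isEnabled(String name) {
        return flags.getOrDefault(name, false);  // unknown flags default to OFF
    }
}

public class ToggleDemo {
    public static void main(String[] args) {
        // The dark-mode code is deployed, but hidden until the toggle is flipped.
        System.out.println(FeatureToggle.isEnabled("dark-mode") ? "dark" : "light");  // light
        FeatureToggle.set("dark-mode", true);  // flip at runtime, no redeploy
        System.out.println(FeatureToggle.isEnabled("dark-mode") ? "dark" : "light");  // dark
    }
}
```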

Microservices are developed and deployed quickly, in most cases automatically as part of a CI/CD pipeline. They can be deployed in virtual machines or containers, which can run on-premise or in the cloud.

There are different deployment approaches available for Microservices. Some of the possible deployment approaches for microservices are mentioned below.

  • Multiple service instances per host

  • Service instance per host

  • Service instance per VM

  • Service instance per Container

  • Serverless deployment

  • Service deployment platform

Asynchronous Communication Without Kafka or a Traditional Message Broker

Asynchronous means:
→ The sender does not wait for the receiver to respond immediately.
→ It lets services work independently and in parallel, improving performance and fault tolerance.

Now let’s go through each method clearly:

1. HTTP Polling

How it Works:
Service A keeps asking Service B: "Do you have any new updates?" at regular intervals.

Example:
A weather app (Service A) polls the weather server (Service B) every 10 minutes to fetch the latest weather data.

✅ Easy to implement
❌ Wasteful if there are no updates (still keeps asking)
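A tiny sketch of the polling loop. The remote endpoint is stood in for by a local queue so this runs without a network; the names are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class PollingDemo {
    // Stands in for Service B's HTTP endpoint.
    static Queue<String> remote = new ArrayDeque<>();

    // One poll: ask for an update; null means "nothing new".
    static String poll() {
        return remote.poll();
    }

    public static void main(String[] args) throws InterruptedException {
        remote.add("weather: 21C");
        for (int i = 0; i < 3; i++) {
            String update = poll();
            System.out.println(update == null ? "no update" : "got " + update);
            Thread.sleep(100);  // wait before asking again -- wasteful when nothing changes
        }
    }
}
```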

2. Webhooks

How it Works:
Service A tells Service B: "When something happens, send me the update."

Example:
When a user places an order (event) in Service B (Order service), it instantly sends an HTTP request to Service A (Notification service) to send a confirmation email.

✅ Real-time and lightweight
❌ Needs the target service to be always available
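The pattern can be sketched in-process, with the HTTP POST to a callback URL modelled as a plain function call (names illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class WebhookDemo {
    // Registered callbacks; in a real system these would be callback URLs.
    static List<Consumer<String>> subscribers = new ArrayList<>();

    static void register(Consumer<String> callback) {
        subscribers.add(callback);
    }

    // Service B pushes the event out the moment it happens -- no polling needed.
    static void orderPlaced(String orderId) {
        for (Consumer<String> cb : subscribers) {
            cb.accept(orderId);
        }
    }

    public static void main(String[] args) {
        // Notification service registers its "webhook".
        register(id -> System.out.println("confirmation email sent for order " + id));
        orderPlaced("42");  // Order service fires the event
    }
}
```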

3. Database Polling

How it Works:
Service A writes something into a shared database. Service B regularly checks that table for changes.

Example:
Inventory Service inserts an entry when stock is updated. Billing Service checks that table every 5 minutes to adjust prices.

✅ Easy if database is already shared
❌ Polling adds load, slower detection

4. Server-Sent Events (SSE)

How it Works:
Server keeps an open connection to the client and keeps sending updates (one-way only).

Example:
Live cricket score updates from backend to frontend UI in real time.

✅ Works well for real-time one-way notifications
❌ Only server-to-client communication (not bidirectional)

5. WebSockets

How it Works:
Server and client (or two services) keep a long-lasting connection, and they can both send messages anytime.

Example:
Chat service — both sender and receiver can send and receive messages instantly via a single open connection.

✅ Real-time, two-way communication
❌ Needs more infra to manage connections at scale

6. Redis Pub/Sub (via Distributed Cache)

How it Works:
Service A "publishes" an event to Redis. Service B "subscribes" to that event and gets notified instantly.

Example:
Order Service publishes an “Order Placed” event. The Inventory Service, subscribed to that channel, immediately updates the stock.

✅ Very fast; real-time
❌ Redis doesn’t persist events — they are lost if no one listens

7. HTTP/REST with Async Processing

How it Works:
Service A sends a request. Service B accepts it, starts processing in background, and returns “OK, I received it”.

Example:
Image Upload API — User uploads an image (Service A sends request), and Service B processes it in background (resizing, watermarking).

✅ Quick response to user
❌ You don’t know when the task finishes unless handled via notification
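A minimal sketch of the “accept now, process later” shape using CompletableFuture (the processing step is simulated, and the names are illustrative):

```java
import java.util.concurrent.CompletableFuture;

public class AsyncAcceptDemo {
    // Simulated heavy task, e.g. resizing or watermarking an image.
    static String process(String file) {
        return "processed:" + file;
    }

    // Accepts the request, starts background work, and acknowledges immediately.
    static CompletableFuture<String> handleUpload(String file) {
        CompletableFuture<String> work = CompletableFuture.supplyAsync(() -> process(file));
        System.out.println("202 Accepted");  // the caller is not kept waiting
        return work;
    }

    public static void main(String[] args) {
        CompletableFuture<String> result = handleUpload("cat.png");
        // In a real service a webhook or notification would fire on completion;
        // here we simply wait for the background work to finish.
        System.out.println(result.join());
    }
}
```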

8. HTTP Callbacks

How it Works:
Service A sends a request with a callback URL to Service B. After processing, Service B calls back Service A with the result.

Example:
Payment Service sends transaction data to Bank API and provides its own callback URL. Bank notifies success or failure later using that URL.

✅ Works well for asynchronous APIs
❌ Error handling and retry logic must be custom coded

9. Background Jobs

How it Works:
Service handles a request and internally offloads heavy tasks (e.g., data processing) to background threads or worker queues.

Example:
User uploads a large CSV file. The app saves it and starts a background job to process rows, update DB, etc., without blocking the user.

✅ Keeps UI/API responsive
❌ Not suitable for cross-service communication

Summary: Async Communication Without Kafka or Message Queues

In a microservices architecture, services often need to talk to each other asynchronously, meaning they don’t wait for an immediate response. Normally, systems like Kafka, RabbitMQ, or ActiveMQ are used for this. These tools are highly reliable, scalable, and perfect for handling large volumes of messages.

But the good news is:

You don’t always need Kafka or MQs to implement async communication — especially for small or medium systems, or specific use cases.

No single approach is "best for all cases".
Instead, pick the right one based on your needs:

  • Use Webhooks or Redis Pub/Sub for simple real-time event sharing

  • Use WebSockets for live two-way data

  • Use HTTP callbacks if the receiver takes time but should notify you back

  • Use background workers for internal, heavy, async tasks
