Zero Downtime Deployments in .NET with Blue Green and Canary Techniques

Patrick Kearns
6 min read

True zero downtime deployment means your users experience no failed requests, broken sessions, or visual inconsistencies during a new release. For stateless services, this may sound trivial, but when persistent connections, background jobs, distributed caches, and database migrations enter the frame, the challenge becomes multidimensional.

In .NET specifically, .NET Core apps often maintain in memory caches, in flight background tasks, or transient and scoped services that may hold references to now defunct resources. Rolling over such an app while it is still serving traffic can cause sharp failures if done carelessly. Add EF Core to the mix, and schema mismatches between application and database versions can result in runtime exceptions that affect only some users, leading to mysterious “it only happens in production” bugs.

This is where blue green and canary strategies come in handy: they allow you to test in production without fully releasing to production.

Blue Green Deployment: Like a Theatre Swap

Imagine your application environment as a stage set. The blue version is the one currently in front of the audience: stable, rehearsed, and live. Behind the curtain, you prepare a green version with new scenes, updated lighting, and maybe a better script. Once ready, you swap the sets in a single move. The audience never sees the switch happen; all they notice is that suddenly the show looks better.

In Azure Container Apps (ACA), this is facilitated through revision based deployment. Your container app maintains multiple active or inactive revisions. The existing revision continues serving 100% of traffic (blue), while you deploy a new revision (green) in the background. Once you’ve verified readiness, either manually or through automated smoke tests, you route all traffic to the new revision in one action. The system instantly updates its internal ingress router, and traffic flows with zero interruption.
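As a rough illustration, an automated smoke test gate can be as simple as a small script in your pipeline that hits the green revision's own URL before you flip traffic. ACA gives each revision its own fully qualified domain name, so you can probe it directly; the URL and /healthz endpoint below are placeholders, not real values.

using System.Net;

// Minimal pre-switch smoke test: call the green revision's revision-specific URL
// directly and only allow the traffic switch if the health endpoint returns 200.
// The URL and endpoint path are illustrative placeholders.
using var client = new HttpClient { Timeout = TimeSpan.FromSeconds(10) };

var response = await client.GetAsync(
    "https://productservice--green.kindocean-12345.westeurope.azurecontainerapps.io/healthz");

if (response.StatusCode != HttpStatusCode.OK)
{
    Console.Error.WriteLine($"Green revision failed smoke test: {(int)response.StatusCode}");
    Environment.Exit(1); // block the traffic switch step in the pipeline
}

Console.WriteLine("Green revision is healthy, safe to route traffic to it.");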

A similar pattern exists in Kubernetes using services and deployment resources. You can spin up a second deployment, wire it to the same service selector when ready, and scale down the original.

One critical caveat is database changes. In a blue green scenario, the new version might expect a schema the old one doesn't understand, or vice versa. The safest approach is to apply additive, backward compatible database migrations before switching traffic, then clean up deprecated schema objects in a later release. This two phase migration pattern should be a part of your deployment script, not a manual afterthought.
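As a concrete example of that first additive phase, the sketch below is an EF Core migration that only adds a nullable column, so blue instances keep running against the updated schema untouched; the table and column names are purely illustrative.

using Microsoft.EntityFrameworkCore.Migrations;

// Expand phase of an expand/contract migration: purely additive and backward compatible.
// The old (blue) version ignores the new column; the new (green) version starts using it.
public partial class AddProductDisplayName : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.AddColumn<string>(
            name: "DisplayName",
            table: "Products",
            nullable: true);
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.DropColumn(name: "DisplayName", table: "Products");
    }
}

Dropping the old column, or tightening the new one to NOT NULL, belongs in a later release once no blue instances remain.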

Canary Releases: Controlled Exposure

Canary deployments take their name from the early 20th century practice of sending canaries into coal mines to detect toxic gas. If the bird thrived, humans could follow. In software, the "canary" is a small percentage of traffic, say 1%, routed to a new release while the rest remains on the old one. If metrics remain healthy, more traffic follows gradually.

In .NET systems, especially those serving APIs or websites under variable load, this approach enables measured risk. A canary deployment to 5% of production users might reveal performance regressions, broken CSS, or subtle API contract violations, issues unlikely to surface in lower environments. Observability here is crucial.

Tools like Azure Front Door or YARP (Yet Another Reverse Proxy) can perform percentage based traffic routing. With YARP, you can configure route clusters with custom load balancing policies that steer only a fraction of requests to the new backend instance. Azure App Gateway or Kubernetes ingress controllers also support this via annotations or configuration maps.

Let’s say you’re deploying version 3.1 of a ProductService API that introduces stricter validation logic. Instead of sending 100% of traffic to the new version, you use a YARP route rule:

"routes": [
  {
    "routeId": "product-api",
    "match": { "path": "/api/products/{*}" },
    "clusterId": "products-v3",
    "transforms": [ { "RequestHeader": "X-Canary", "Set": "true" } ]
  }
]

Then a custom load balancing policy in YARP can use this header, or other request signals, to decide which backend a given request reaches. For web apps, you can base the decision on cookies, IP ranges, or even logged in user IDs if you're managing canaries internally.
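Here is a rough sketch of what such a policy might look like, assuming YARP's ILoadBalancingPolicy extension point; the 10 percent share, the X-Canary header check, and the convention that the canary destination's id contains "canary" are illustrative choices for this sketch, not YARP defaults.

using Microsoft.AspNetCore.Http;
using Yarp.ReverseProxy.LoadBalancing;
using Yarp.ReverseProxy.Model;

// Sends requests flagged with X-Canary (for example by internal testers or an upstream
// gateway), plus a small random slice of everything else, to the canary destination.
public sealed class CanaryLoadBalancingPolicy : ILoadBalancingPolicy
{
    private const double CanaryShare = 0.10; // hypothetical rollout percentage

    public string Name => "Canary";

    public DestinationState? PickDestination(HttpContext context, ClusterState cluster,
        IReadOnlyList<DestinationState> availableDestinations)
    {
        if (availableDestinations.Count == 0)
        {
            return null;
        }

        // Naming convention assumed here: the canary destination id contains "canary".
        var canary = availableDestinations.FirstOrDefault(d =>
            d.DestinationId.Contains("canary", StringComparison.OrdinalIgnoreCase));
        var stable = availableDestinations.FirstOrDefault(d => !ReferenceEquals(d, canary)) ?? canary;

        if (canary is null)
        {
            return stable;
        }

        var flagged = context.Request.Headers.ContainsKey("X-Canary");
        var inCanarySlice = Random.Shared.NextDouble() < CanaryShare;

        return flagged || inCanarySlice ? canary : stable;
    }
}

The policy is registered as a singleton ILoadBalancingPolicy in Program.cs and selected by setting the cluster's LoadBalancingPolicy to "Canary" in the YARP configuration.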

When telemetry shows confidence in the new version (requests per second, error rate, and latency percentiles all holding steady), you gradually shift more users until the old version can be retired.

How to Handle EF Core Migrations in Live Systems

One of the most common causes of downtime in .NET apps stems from uncoordinated EF Core migrations. If your application auto applies migrations on startup (for example, calling dbContext.Database.Migrate() from Program.cs, or a custom extension such as app.MigrateDatabase()), then blue green or canary strategies will likely cause version mismatches during rollout.

The correct approach is to decouple schema migration from application bootstrapping. Run migrations manually or as a separate job, validate their success, and then deploy your application containers. This ensures that a new version isn't assuming a schema that doesn’t yet exist. For larger systems, you could try applying the Expand and Contract pattern - introduce new schema elements (columns, tables), let both old and new app versions coexist, and only later remove obsolete parts.
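A lightweight way to run migrations as their own step is a tiny console project that the pipeline executes (or a Kubernetes Job runs) before the new app containers start. The sketch below assumes a DbContext named AppDbContext with an options constructor, SQL Server via the Microsoft.EntityFrameworkCore.SqlServer package, and a connection string supplied through an environment variable; EF Core's migration bundles (dotnet ef migrations bundle) give you the same separation if you prefer a prebuilt executable.

using Microsoft.EntityFrameworkCore;

// Standalone migration runner, executed by the pipeline before app containers roll out,
// so the schema is already in place when the new version starts taking traffic.
// AppDbContext and the APP_DB_CONNECTION variable are assumptions for this sketch.
var connectionString = Environment.GetEnvironmentVariable("APP_DB_CONNECTION")
    ?? throw new InvalidOperationException("APP_DB_CONNECTION is not set.");

var options = new DbContextOptionsBuilder<AppDbContext>()
    .UseSqlServer(connectionString)
    .Options;

await using var context = new AppDbContext(options);
await context.Database.MigrateAsync();

Console.WriteLine("Migrations applied, safe to deploy application containers.");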

Real World Observability: Seeing the Canary Sing or Die

No zero downtime strategy works without feedback. It’s essential to plug your deployment pipeline into real time metrics, whether it's Application Insights, Prometheus/Grafana, or OpenTelemetry.

In ACA, revisions can report per revision metrics like request count, failure rate, and response time. By tagging your deployments and correlating them with logs and telemetry, you can quickly spot regressions.

.NET 8 has first class hooks for OpenTelemetry through its built in ActivitySource and Meter APIs, with the OpenTelemetry SDK packages handling collection and export. Configure your API to emit traces, spans, and structured logs to capture user flow. If a canary release sees higher 5xx responses or latency spikes, your rollback logic can kick in before widespread impact. Log enrichment with deployment revision identifiers, such as a Git commit SHA or container image tag, makes it easy to correlate errors with specific releases.
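A minimal tracing setup along those lines might look like the following. The service name, the deployment.revision attribute, and the REVISION_ID variable are conventions chosen for this sketch, and the packages assumed are OpenTelemetry.Extensions.Hosting, OpenTelemetry.Instrumentation.AspNetCore, and OpenTelemetry.Exporter.OpenTelemetryProtocol.

using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

// Stamp every trace with the deployment revision (Git SHA or image tag) so canary
// and stable traffic can be compared side by side in your observability backend.
var revision = Environment.GetEnvironmentVariable("REVISION_ID") ?? "unknown";

builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource
        .AddService("ProductService", serviceVersion: revision)
        .AddAttributes(new Dictionary<string, object> { ["deployment.revision"] = revision }))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddOtlpExporter()); // endpoint comes from OTEL_EXPORTER_OTLP_ENDPOINT

var app = builder.Build();
app.MapGet("/healthz", () => Results.Ok());
app.Run();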

Rollbacks: Fast, Safe, and Quiet

The final test of a deployment strategy is its rollback path. Can you revert with confidence, speed, and no disruption?

With blue green, rollback is a simple switch of traffic back to the previous version. Because both versions remain live (at least temporarily), no container restart or redeployment is needed, just a router configuration update.

Canary rollbacks require halting the progressive rollout and returning 100% of traffic to the stable version. Tools like Argo Rollouts (for Kubernetes) automate this decision making based on metric thresholds. ACA also supports traffic weighting via CLI or ARM templates, making rollback nearly instantaneous. But rollbacks only work if state remains compatible. Always test your rollback flow in staging: deploy forward, apply a database change, then roll back and see what fails. The earlier you detect incompatibility, the safer your production will be.

Blue green and canary deployments aren’t magic; they’re controlled processes that reduce risk when paired with solid monitoring, rollback paths, and compatible code and schema evolution. In most teams, blue green works best for stateless or easily swappable services, while canary is preferable for high traffic APIs or changes with unknown risk. Combining both, such as blue green with a canary ramp up within the green environment, offers additional resilience.
