NATS Cluster Architectures: Regional Clusters - Building Reliable Messaging Foundations


Why does NATS topology matter?
NATS topology refers to the way your NATS servers are arranged and connected. Unlike many messaging systems with more rigid deployment models, NATS offers remarkable flexibility. You're not shoehorned into a one-size-fits-all approach; instead, you become the architect of your messaging fabric.
Your choice of topology, whether a single server, a resilient cluster, leaf nodes, or a supercluster, will directly shape fault tolerance, message delivery guarantees, network segmentation, and overall system complexity. This means you can tailor your NATS deployment precisely to your needs, but it requires careful consideration.
This miniseries of posts will explore a variety of NATS server topologies, increasing in complexity. Here, we'll be covering:
The single instance deployment
A resilient three-node cluster
And for each:
Key characteristics of the topology
Tradeoffs in consistency, latency, and availability
Potential use cases
Impact on application integration, particularly within C#/.NET clients
The Singularity: Single NATS Server
A single NATS server is the most fundamental setup, deployed as one instance of the NATS server process. It's the easiest to grasp and get started with, making it a natural entry point for development and simple use cases. This is the same deployment we used in "Getting Started with NATS in C#".
In this topology, a single NATS server process handles all client connections, message routing, and any configured persistence, i.e. JetStream, if enabled. All publishers and subscribers within our system connect directly to this single endpoint.
Characteristics at a Glance
Simplicity: Straightforward to configure and manage
Low Overhead: Requires the least amount of infrastructure resources, e.g. CPU, memory, network
Single Point of Failure: If the server becomes unavailable due to hardware failure, network issues, or software problems, the entire messaging system halts
Limited Scalability: The performance and throughput of the system are bounded by the capacity of the single server instance
- Note: A single NATS server can still be quite performant, to the tune of millions of messages/sec depending on resources
No Redundancy: Availability limited to the single node, no automatic failover or backup
No JetStream Replication: Of course, with a single server, no JetStream resources can be replicated
Ideal Use Cases - Dev/Local/Non-Prod
Local Dev and Testing: Perfect for a local dev NATS environment and local integration testing
Demos, POCs: Proving new messaging patterns among microservices, demoing new features, etc.
Learning and Exploration: A low barrier to entry for understanding the basic concepts of NATS without the complexities of clustering
NATS Server Config
Couldn't be simpler! This specifies that clients should connect on the default port 4222, and the server will be listening on 0.0.0.0:4222.

port: 4222

Mounting a file for just this might be overkill; you can specify many options via the command line as well. While 4222 is the default, just for example, with Docker, the following is equivalent to the server.conf above:
docker run --rm --name my-nats -p "4222:4222" nats -p 4222
NATS.Net Integration
Connecting to a single NATS server uses a connection string pointing to its address, just as we saw in "Getting Started with NATS in C#". For completeness, the NATS.NET client library provides the tools for this. Importantly, in this topology, our app becomes directly reliant on the availability of this single server. As a result, several factors will impact our app design.
Connection Management: The NATS.Client.Core library does provide robust connection management, but with a single server, it's primarily focused on the initial connection and handling disconnects
Initial Connection: By default, the client will lazily connect to the server and fail fast on the initial connection attempt, i.e. the initial attempt is not retried and a NatsException is thrown
Reconnects: After an initial successful connection, any drop will result in reconnect attempts up to a configured limit via the NatsOpts.*Reconnect* properties
No Automatic Failover: Reconnection attempts have no other route except the same single address
using Microsoft.Extensions.Logging;
using NATS.Client.Core;

var loggerFactory = LoggerFactory.Create(builder => builder.AddConsole());
var logger = loggerFactory.CreateLogger<Program>();

var opts = new NatsOpts
{
    Url = "nats://localhost:4222",
    LoggerFactory = loggerFactory
};

await using var connection = new NatsConnection(opts);

// The first ping forces the lazy connection to actually be established
var rtt = await connection.PingAsync(CancellationToken.None);
logger.LogInformation("Ping successful - {RTT}ms to {@ServerInfo}",
    rtt.TotalMilliseconds,
    connection.ServerInfo);
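With no failover target available, tuning reconnect behavior is the main lever you have. Here's a minimal sketch of the relevant NatsOpts properties, assuming the NATS.Net 2.x names (verify against your client version):

using NATS.Client.Core;

var opts = new NatsOpts
{
    Url = "nats://localhost:4222",

    // Keep retrying the same single address; -1 is assumed to retry indefinitely
    MaxReconnectRetry = 10,

    // Bound the backoff between attempts
    ReconnectWaitMin = TimeSpan.FromSeconds(1),
    ReconnectWaitMax = TimeSpan.FromSeconds(30),

    // Jitter keeps a fleet of clients from reconnecting in lockstep
    ReconnectJitter = TimeSpan.FromMilliseconds(500)
};

Remember these only apply after a successful initial connection; the fail-fast first attempt still needs retry handling in your own startup path.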
Resilience Through Redundancy: Three Node Cluster
Clustering in NATS provides fault tolerance and increased capacity by connecting multiple NATS servers. A cluster of three nodes is a common starting point for production deployments and offers a balance between resilience and complexity.
In this topology, three NATS servers are interconnected via routes, and form a full mesh. Each server maintains bidirectional connections to the others, allowing them to fully propagate the Subject interest graph. Clients can connect to any server in the cluster, and the cluster will route messages appropriately, even if a server becomes unavailable.
Characteristics at a Glance
Fault Tolerance & Increased Availability: The cluster can still serve Core NATS traffic even in the face of a two node failure, i.e. operates with only a single node
Improved Scalability: Distributes the load across multiple servers, increasing the overall capacity of the system.
Automatic Discovery: NATS servers in a cluster will automatically discover each other after reaching any 'seed' server; cluster gossip ensures a full mesh is maintained (see the sketch after this list)
Message Routing: Servers route messages to the appropriate destinations within the cluster; the cluster is transparent to publishers/subscribers beyond the connection string
Increased Complexity: Although still minimal, configuring and managing a cluster does involve more complexity than a single server
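One way to see discovery from the client side: after connecting to any one node, the server advertises the URLs of its known peers. A small sketch, assuming NATS.Net exposes these via ServerInfo.ClientConnectUrls (check your client version):

using NATS.Client.Core;

await using var connection = new NatsConnection(new NatsOpts { Url = "nats://server1:4222" });
await connection.ConnectAsync();

// connect_urls are gossip-populated as servers join the cluster
var urls = connection.ServerInfo?.ClientConnectUrls;
if (urls is not null)
{
    foreach (var url in urls)
    {
        Console.WriteLine($"Discovered cluster member: {url}");
    }
}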
JetStream Considerations
Clustering is crucial for JetStream's fault tolerance and data durability. JetStream leverages the Raft consensus algorithm to replicate data across multiple servers within a cluster. This ensures, to the limit of consensus, that the data remains available and consistent.
Quorum: JetStream requires a quorum, \(\lfloor n/2 \rfloor + 1\) nodes, to be available in order to handle requests, e.g. reads, writes, acks, etc. For a three-node cluster, that's \(\lfloor 3/2 \rfloor + 1 = 2\) nodes
Storage: Each server in the cluster needs its own storage for JetStream data
Replication Latency: Replication adds some network overhead, low latency network connections are critical
Replication and Consistency
Replication Factor: With three nodes, 3x replication of JetStream Stream and Consumer data/metadata can be achieved
Consistency: Raft ensures strong consistency, meaning that all replicas agree on the order of the data
- Note: Replication is set at the Stream level and either inherited or optionally set at the Consumer level, “In Sync Replicas” will always be equal to Replicas, i.e. acks = all (see the sketch after this list)
Fault Tolerance: A replication factor of 3 allows the cluster to tolerate the failure of one server without losing any data, while continuing to service requests
Increased Durability: Storing multiple copies of the data significantly reduces the risk of data loss due to hardware failures or other issues
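As a concrete sketch of that replication note, a Consumer can inherit the Stream's replicas or set its own via NATS.Client.JetStream; the connection setup is elided and the names here are illustrative:

using NATS.Client.Core;
using NATS.Client.JetStream;
using NATS.Client.JetStream.Models;

await using var connection = new NatsConnection();
var context = new NatsJSContext(connection);

// Omitting NumReplicas inherits the Stream's replication factor;
// setting it explicitly pins this Consumer's state to its own R3 Raft group
var consumerConfig = new ConsumerConfig("my-durable-consumer")
{
    NumReplicas = 3
};
var consumer = await context.CreateOrUpdateConsumerAsync(
    "my-first-stream", consumerConfig, CancellationToken.None);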
Ideal Use Cases - Single Region Production
Single Region Production Environments: Essential for any production system that requires high availability and fault tolerance
Durable, Highly Available Apps: Applications where data loss or downtime is unacceptable
Scalable Systems: Systems that need to handle a high volume of messages and clients
NATS Server Config
Here's a basic example of a server.conf. You could create three separate configuration files (server1.conf, server2.conf, and server3.conf), each with the appropriate settings for that particular server, though the config can often be reused depending on your networking environment. Note that routes pointing to "self" are smartly ignored, making config sharing a bit easier.
listen: 0.0.0.0:4222

cluster {
  name: my-cluster
  listen: 0.0.0.0:6222
  routes: [
    "nats://server1:6222",
    "nats://server2:6222",
    "nats://server3:6222"
  ]
}

jetstream {
  store_dir: /data/jetstream
  max_memory_store: 1GB
  max_file_store: 10GB
}
Key Differences
cluster: Section defines the details of our cluster
cluster.name: Uniquely identifies the cluster
cluster.listen: The host and port to listen on for incoming clustering connections
cluster.routes: A list of dedicated Routes to other servers
jetstream: Configures JetStream, including storage target and limits
NATS.Net Integration
Connecting to a NATS cluster with the NATS.Client.Core and NATS.Client.JetStream libraries is straightforward. This time, you can provide a list of server URLs, and the client will automatically connect to the first available server. If that server becomes unavailable, the client will attempt to connect to another server in the list.
using Microsoft.Extensions.Logging;
using NATS.Client.Core;
using NATS.Client.JetStream;
using NATS.Client.JetStream.Models;

var loggerFactory = LoggerFactory.Create(builder => builder.AddConsole());
var logger = loggerFactory.CreateLogger<Program>();

var opts = new NatsOpts
{
    Url = "nats://server1:4222,nats://server2:4222,nats://server3:4222",
    LoggerFactory = loggerFactory
};

// The caller owns (and ultimately disposes) the connection
await using var connection = await GetConnectionAsync(logger, opts);
var stream = await GetStreamAsync(logger, connection);

// static local function, so its parameters may shadow the outer locals
static async Task<INatsConnection> GetConnectionAsync(ILogger logger, NatsOpts opts)
{
    // No 'await using' here; disposing before returning would hand
    // the caller a dead connection
    var connection = new NatsConnection(opts);
    var rtt = await connection.PingAsync(CancellationToken.None);
    logger.LogInformation("Ping successful - {RTT}ms to {@ServerInfo}",
        rtt.TotalMilliseconds,
        connection.ServerInfo);
    return connection;
}

static async Task<INatsJSStream> GetStreamAsync(ILogger logger, INatsConnection connection)
{
    var context = new NatsJSContext(connection);
    var config = new StreamConfig("my-first-stream", ["some.subjects.>"])
    {
        NumReplicas = 3
    };
    var stream = await context.CreateOrUpdateStreamAsync(config, CancellationToken.None);
    logger.LogInformation("Stream created/updated - {@StreamInfo}", stream.Info);
    return stream;
}
Key Differences
NatsOpts.Url: Set to a comma separated list of server URLs
NATS.Client.Core: Handles connecting among multiple servers, starting with the first available server and reconnecting to another server if the connection is lost
NatsJSContext: Building on top of Core NATS, the JetStream Context accepts an existing NatsConnection
my-first-stream: Configured with a replication factor of 3, ensuring that data is replicated across all three servers in the cluster
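To close the loop, here's a minimal sketch of publishing through the JetStream context set up above; the subject and payload are illustrative:

// Publish on a subject bound to my-first-stream; the ack returns once
// the stream has committed the message across a quorum of replicas
var context = new NatsJSContext(connection);
var ack = await context.PublishAsync("some.subjects.hello", "hello, cluster");
ack.EnsureSuccess(); // throws if the stream did not positively acknowledge

A successful ack here means the write survived quorum replication, so losing any single node won't lose the message.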
Wrap Up
In this first part exploring NATS topologies, from a single server to a three-node cluster, we've laid the groundwork for building robust and scalable messaging systems with NATS.
Next Up - Multiregional Clusters!
Moving beyond single region clusters, we're now ready to take our NATS deployments to the next level. In Part 2: Multiregional Clusters, we'll dive into how Superclusters and Leaf Nodes enable you to connect NATS deployments across geographical distances, building resilient and scalable messaging solutions that span the globe. Get ready to unleash the full potential of NATS for distributed, mission-critical applications!
Have a specific question about NATS? Want a specific topic covered? Drop it in the comments!
Written by

Joshua Steward
Engineering leader specializing in distributed event driven architectures, with a proven track record of building and mentoring high performing teams. My core expertise lies in dotnet/C#, modern messaging platforms, and Microsoft Azure, where I've architected and implemented highly scalable and available solutions. Explore my insights and deep dives into event driven architecture, patterns, and practices on my platform https://concurrentflows.com/. Having led engineering teams in collaborative and remote-first environments, I prioritize mentorship, clear communication, and aligning technical roadmaps with stakeholder needs. My leadership foundation was strengthened through experience as a Sergeant in the U.S. Marine Corps, where I honed skills in team building and operational excellence.