NATS Cluster Architectures: Multiregional Clusters - Connecting the Globe


NATS Topologies for Distributed Systems
In the first part of our NATS Cluster Architectures series, we established robust foundations with regional clusters, covering single-node and three-node NATS deployments. Now, we're ready to expand our horizons and tackle the complexities of connecting your systems across geographical divides.
As applications scale and user bases globalize, bridging data centers or cloud regions becomes essential. NATS excels in these scenarios, offering uniquely flexible multiregional topologies. This installment moves beyond local networks to explore how NATS helps you truly connect the globe.
Here, we'll be diving into:
Three-Region Superclusters
Leaf Node Deployments
And for each topology, we'll examine:
Key characteristics and design considerations
Implications for JetStream persistence
Tradeoffs in consistency, latency, and availability
Ideal use cases
Impact on application integration, particularly for C#/.NET clients
Disaster Ready: Multiregional Supercluster
Superclusters in NATS provide a very high degree of fault tolerance and scalability by interconnecting multiple independent NATS clusters, enabling resilience across geographically dispersed regions. Here, we'll explore a Supercluster with three regions: Region1, Region2, and Region3.
In this topology, each region operates as a distinct, three-node NATS cluster, as described in the first installment. These regional clusters are then interconnected through Gateways, which establish connections between the regional clusters and allow for the exchange of messages and subject interest information.
Characteristics at a Glance
High Availability: Failure of an entire region can be tolerated, although some data loss is possible depending on JetStream configuration
Global Scalability: Scales horizontally by adding more regions or adding servers to existing regions
Geographic Distribution: Connects NATS deployments across wide-area networks and different geographical locations
Increased Complexity: Multiple clusters across multiple regions demand careful, thorough design and configuration, along with a high level of operational observability and expertise
Potential for Increased Latency: Cross-region communication will naturally have higher latency than local communication
The nature of a Gateway
Gateways operate on a dedicated port, separate from client and route ports. Servers gossip their known gateway nodes, so newly discovered gateways are learned automatically. Gateway connections have several characteristics that distinguish them from route connections.
Named Connections: Gateway connections themselves are named, specifying their cluster
Gateway Mesh: A full mesh is formed between clusters instead of between all individual servers
Unidirectional Connections: Gateway connections are unidirectional, though discovery makes them appear bidirectional
Client Transparency: Gateway connections are not gossiped or propagated to clients
Gateway connections are designed to reduce the number of connections required between servers when joining clusters, as compared to a full mesh of all servers across all clusters. For instance, with three clusters of three nodes each, a full route mesh across every server requires 36 connections, while only 18 gateway connections are needed.
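The arithmetic behind those counts can be sketched in a few lines of C#. This is a simplified model, not a NATS API: it assumes a full route mesh links every server pair once, while each server dials one gateway connection per remote cluster.

```csharp
using System;

// Hypothetical sizing: three clusters, three servers per cluster.
int clusters = 3;
int serversPerCluster = 3;
int totalServers = clusters * serversPerCluster; // 9

// Full route mesh across all servers: one connection per server pair.
int fullMeshRoutes = totalServers * (totalServers - 1) / 2;

// Gateways: each server dials one connection to each remote cluster.
int gatewayConnections = totalServers * (clusters - 1);

Console.WriteLine($"Full mesh: {fullMeshRoutes}, Gateways: {gatewayConnections}");
// Full mesh: 36, Gateways: 18
```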
Another key aspect of gateway connections is the optimization of interest graph propagation. This limits unnecessary cross region chatter. Interest propagation across gateways uses mechanisms like Interest-only Mode and Queue Subscriptions. Interest-only requires remote gateways to have explicitly registered interest in a subject. Queue Subscription semantics are honored globally, i.e. messages are delivered to a single subscriber in a queue group across the Supercluster, prioritizing local queue subscribers before failing over to the next lowest RTT cluster.
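Those queue semantics require no special client configuration; an ordinary queue subscription behaves this way across the Supercluster. A minimal NATS.Net sketch, where the server URL, subject, and group name are illustrative assumptions:

```csharp
using System;
using NATS.Client.Core;

// Connect to a local server; assumes a reachable NATS server at this address.
await using var connection = new NatsConnection(new NatsOpts { Url = "nats://region1-server1:4222" });

// Members of the same queue group share work. Across a Supercluster,
// NATS delivers each message to a single group member, preferring local
// members before failing over to the lowest-RTT remote cluster.
await foreach (var msg in connection.SubscribeAsync<string>("orders.created", queueGroup: "order-workers"))
{
    Console.WriteLine($"Processing: {msg.Data}");
}
```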
JetStream Considerations
JetStream deployed across a Supercluster provides global data distribution, high availability, and fault tolerance. However, resource placement, consistency boundaries and their guarantees must be well understood.
Local Placement: Streams and Consumers can only be placed on servers within a single cluster; placement tags that span regions will fail to deploy
Local Replication: Streams and Consumers can only replicate data within a cluster, i.e. across routes, not gateways
Meta Group Dependence: The JetStream Meta Group spans the entire Supercluster; if quorum cannot be reached, new resources, i.e. Streams and Consumers, cannot be created
Regional Resilience: Each region operates its own JetStream resources independently, each with its own RAFT group; even if Meta Group quorum is lost, reads/writes to existing resources succeed
Replication and Consistency
Within a Region/Cluster: Strong consistency is ensured within each regional cluster with consensus among all replicas
Mirrored & Sourced Streams: To achieve fault tolerance across regions, Streams can be Mirrored or Sourced; a mirror is a read-only copy of exactly one Stream, while a sourced stream is a locally writeable copy of one or many Streams
Eventual Consistency: Mirrored & Sourced Streams provide eventual consistency; data written to the source stream(s) will eventually be replicated to the mirrored/sourced stream
Data Loss Possible: In the event of a catastrophic regional failure, data loss is possible even with mirroring/sourcing, although loss is limited to the replication window
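As a concrete illustration, a cross-region mirror can be declared from the client. This is a hedged sketch using NATS.Net; the server URL and stream names are assumptions, and it relies on the StreamConfig.Mirror property taking a StreamSource, per the NATS.Net v2 API as I understand it:

```csharp
using NATS.Client.Core;
using NATS.Client.JetStream;
using NATS.Client.JetStream.Models;

// Assumes a reachable region2 server and an existing region1-stream.
await using var connection = new NatsConnection(new NatsOpts { Url = "nats://region2-server1:4222" });
var context = new NatsJSContext(connection);

// A mirror carries no subjects of its own; it is a read-only,
// eventually consistent copy of exactly one origin stream.
var mirrorConfig = new StreamConfig("region1-stream-mirror", [])
{
    NumReplicas = 3,
    Placement = new Placement { Cluster = "region2" },
    Mirror = new StreamSource { Name = "region1-stream" }
};

await context.CreateOrUpdateStreamAsync(mirrorConfig, CancellationToken.None);
```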
Ideal Use Cases - Globally Available and Disaster Tolerant
Global Applications: When users and/or services are distributed across multiple regions
Disaster Tolerance Scenarios: Where you need to ensure business continuity in the event of a regional outage
Multi Cloud Deployments: When multiple cloud providers are required, supercluster gateways bridge independent provider clusters
NATS Server Config
Getting interesting! This shows how you might configure the servers within one region, region1. Extrapolating to other regions is trivial, and again “self” gateways are smartly ignored.
port: 4222

cluster: {
  name: region1
  listen: 0.0.0.0:6222
  routes: [
    "nats://region1-server1:6222",
    "nats://region1-server2:6222",
    "nats://region1-server3:6222"
  ]
}

gateway: {
  name: region1
  port: 7222
  gateways: [
    { name: "region1", url: "nats://region1-gateway:7222" },
    { name: "region2", url: "nats://region2-gateway:7222" },
    { name: "region3", url: "nats://region3-gateway:7222" }
  ]
}

jetstream: {
  store_dir: /data/jetstream
  max_memory_store: 1GB
  max_file_store: 10GB
}
Key Differences
cluster.*: Each region has its own cluster configuration, i.e. cluster.name, cluster.listen, routes
gateway: This section defines how the regional clusters connect
gateway.name: Names the cluster; all gateways within a cluster must define the same name
gateway.port: The port for incoming gateway connections from other regions
gateway.gateways: A list of the gateway addresses of the other regions
NATS.Net Integration
Connecting to a NATS Supercluster from a C# application is similar to connecting to a single cluster. Clients connect to a server within their local region. The Supercluster handles the routing of messages across regions.
using Microsoft.Extensions.Logging;
using NATS.Client.Core;
using NATS.Client.JetStream;
using NATS.Client.JetStream.Models;

const string Region1 = "region1";
const string Region2 = "region2";

var loggerFactory = LoggerFactory.Create(builder => builder.AddConsole());
var logger = loggerFactory.CreateLogger<Program>();

using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));

var subscribing = Region2JetStreamSubscribeAsync(cts.Token);
var publishing = Region1JetStreamPublishAsync(cts.Token);

await Task.WhenAll(publishing, subscribing);

async Task Region1JetStreamPublishAsync(CancellationToken token)
{
    var context = GetRegion1JetStreamContext();

    Placement placement = new() { Cluster = Region1 };
    var streamConfig = new StreamConfig($"{Region1}-stream", [$"{Region1}.>"])
    {
        NumReplicas = 3,
        Placement = placement
    };

    _ = await context.CreateOrUpdateStreamAsync(streamConfig, token);

    foreach (var id in Enumerable.Range(1, 10))
        await context.PublishAsync($"{Region1}.data", $"From {Region1}: {id}", cancellationToken: token);

    logger.LogInformation("{Region} client completed publishing", Region1);
}

async Task Region2JetStreamSubscribeAsync(CancellationToken token)
{
    var context = GetRegion2JetStreamContext();

    Placement placement = new() { Cluster = Region2 };
    SubjectTransform transform = new() { Src = $"{Region1}.>", Dest = $"{Region2}.from-{Region1}.>" };
    StreamSource source = new()
    {
        Name = $"{Region1}-stream",
        SubjectTransforms = [transform]
    };
    var streamConfig = new StreamConfig($"{Region2}-stream", [$"{Region2}.>"])
    {
        NumReplicas = 3,
        Placement = placement,
        Sources = [source]
    };

    var stream = await context.CreateOrUpdateStreamAsync(streamConfig, token);
    var consumer = await stream.CreateOrderedConsumerAsync(cancellationToken: token);

    await foreach (var msg in consumer.ConsumeAsync<string>(cancellationToken: token))
    {
        logger.LogInformation("{Region} client received: {Message}", Region2, msg.Data);
        await msg.AckAsync(cancellationToken: token);
    }
}

INatsJSContext GetRegion1JetStreamContext()
    => new NatsJSContext(BuildNatsConnection(Region1));

INatsJSContext GetRegion2JetStreamContext()
    => new NatsJSContext(BuildNatsConnection(Region2));

INatsConnection BuildNatsConnection(string region)
    => new NatsConnection(new()
    {
        Url = $"nats://{region}1:4222,nats://{region}2:4222,nats://{region}3:4222",
        LoggerFactory = loggerFactory
    });
Key Differences
This sample becomes a bit more complex but showcases much more functionality.
Region1JetStreamPublishAsync
Creates a region local stream, region1-stream
Consumes all subjects matching region1.>
Finally, publishes 10 messages to region1.data that get ingested into region1-stream
Region2JetStreamSubscribeAsync
Creates a region local stream, region2-stream
Consumes all subjects matching region2.>
Also, Sources region1-stream, replicating all data locally
Then Transforms all inbound region1.> subjects to a region2 local form, region2.from-region1.>
Finally, consumes all available messages via an ephemeral ordered Consumer
Localized Reliability: Leaf Nodes
Leaf Nodes extend an existing NATS deployment, be it a single server, cluster, or supercluster, to what are conceptually edge deployments. This creates an isolation boundary and local resiliency with respect to the remote Hub. Leaf Nodes provide local communication connectivity within their edge boundary even when disconnected from the remote. Unlike servers in the remote hub cluster, Leaf Nodes do not participate in the same routing mesh; instead, they establish one or more account scoped point-to-point connections to the remote. Further, Leaf Nodes themselves do not need to be reachable by clients, and can instead be used to form any acyclic topology.
Client reachable Leaf Nodes present a discrete NATS context, distinct from that of the remote. Clients connect against the Leaf Node's local AuthN/AuthZ policy, and the Leaf Node forwards messages to and from the remote. Only the credentials required to form the Leaf Node to hub connection need be shared between the domains, allowing separate security, and generally Operator, domains to be bridged. The remote hub sees the Leaf Node as a single account scoped connection, carrying with it the permissions, exports, and imports of that account.
Characteristics at a Glance
Network Segmentation: Leaf Nodes represent isolated segments, preventing unnecessary traffic from crossing network boundaries
Connection Schemes: Leaf Nodes can connect to their remote via plain TCP, TLS, and even WebSockets
Controlled Connectivity: Account/User level configuration provides fine grained control over which subjects are exchanged with the upstream
Resilience: Leaf Nodes provide localized resilience, if the connection to the remote is lost, clients within the Leaf Node's network segment can still communicate with each other
Distinct AuthN/AuthZ: Clients of the Leaf Node connect against the local auth policy, while the Leaf Node connection to the remote is a separate shared policy
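The WebSocket option from the bullets above only changes the remote's URL scheme. A hedged config sketch for the Leaf Node side; the hostname and port are illustrative, and the LEAF account mirrors the shared account used later in this article:

```
leafnodes: {
  remotes: [
    {
      # Dial the hub over WebSockets (wss) instead of plain TCP.
      urls: ["wss://hub.example.com:443"]
      account: "LEAF"
    }
  ]
}
```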
JetStream Considerations
Leaf Nodes can either extend or isolate a JetStream Domain. A JetStream Domain is an independent namespace within JetStream that disambiguates requests between NATS deployments. Extending a central JetStream deployment through a leaf deployment means both map to the same domain, while isolating JetStream deployments accessible to the same client is done via unique JetStream domain names.
Isolated Domains: Leaf Nodes can isolate JetStream domains, selectively sharing between the independent namespaces
Data Resiliency: JetStream data is stored locally on the Leaf Node, ensuring that data is preserved even if the connection to a remote is temporarily lost
Data Locality: Ensures data locality for compliance or performance reasons
Consistency Separation: Leaf local streams can be immediately consistent, while streams shared across the Leaf node connection can be eventually consistent via sourcing/mirroring to/from the remote
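To address a specific domain from a client, the JetStream context can be scoped to it. A sketch under the assumption that NatsJSOpts accepts an optional domain argument, as in recent NATS.Net versions; the URLs and domain names match the config shown later in this article:

```csharp
using NATS.Client.Core;
using NATS.Client.JetStream;

// Assumes a reachable leaf-node server with the shared LEAF account.
await using var connection = new NatsConnection(new NatsOpts { Url = "nats://leaf:leaf@leaf-node:4222" });

// Scope one context to the leaf's own domain and another to the hub's;
// a client connected to the leaf can address either namespace explicitly.
var leafJs = new NatsJSContext(connection, new NatsJSOpts(connection.Opts, domain: "leaf"));
var hubJs = new NatsJSContext(connection, new NatsJSOpts(connection.Opts, domain: "hub"));
```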
Ideal Use Cases - Segmented or Intermittent Connectivity
Remote Distribution: Connecting NATS servers in different, remote locations, such as disparate local offices, remote sites, etc.
Organization Separation: Leaf Nodes bridge security and operator domains allowing zero trust between separate entities
Edge Computing: Deploying Leaf Nodes at the edge of the network to collect and process data locally before forwarding it to a central system
Network Segmentation: Isolating different parts of your network for security or performance reasons
Data Localization: Keeping data within a specific region or network segment for compliance or performance
NATS Server Config
Leaf Node server config becomes a bit more involved and needs to be well planned. Here, we see a shared account between the Hub and Leaf, while each may have other respective local accounts.
accounts.shared.conf
accounts: {
  SYS: {
    users: [{ user: sys, password: sys }]
  },
  LEAF: {
    users: [{ user: leaf, password: leaf }],
    jetstream: enabled
  }
}

system_account: SYS
server.hub.conf
port: 4222
server_name: hub
include ./accounts.shared.conf

jetstream: {
  store_dir: /data/jetstream
  domain: hub
  max_memory_store: 1GB
  max_file_store: 10GB
}

leafnodes: {
  port: 7422
}
server.leaf.conf
port: 4222
server_name: leaf-node
include ./accounts.shared.conf

jetstream: {
  store_dir: /data/jetstream
  domain: leaf
  max_memory_store: 1GB
  max_file_store: 10GB
}

leafnodes: {
  remotes: [
    {
      urls: ["nats://leaf:leaf@hub:7422"]
      account: "LEAF"
    }
  ]
}
Key Differences
accounts.shared.conf: LEAF account is shared and used to connect the Leaf Node back to the Hub
jetstream.domain: Defines separate isolated JetStream Domains, namespaces, for the Leaf Node and Hub
Hub leafnodes.*: Enables the server to listen for Leaf Node connections on the specified port
Leaf leafnodes.*: Configures a Leaf Node connection back to the Hub on the specified port, using the shared Account and User
NATS.Net Integration
Connecting to a Leaf Node from a C# application is much the same as connecting to any NATS server. The client authenticates with a local account and is unaware that it's connected to a Leaf Node or that there is anything ‘upstream’.
using Microsoft.Extensions.Logging;
using NATS.Client.Core;
using NATS.Client.JetStream;
using NATS.Client.JetStream.Models;

var loggerFactory = LoggerFactory.Create(builder => builder.AddConsole());
var logger = loggerFactory.CreateLogger<Program>();

using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));

var opts = new NatsOpts
{
    Url = "nats://leaf:leaf@leaf-node:4222",
    LoggerFactory = loggerFactory
};

await using var connection = new NatsConnection(opts);
var context = new NatsJSContext(connection);

var streamConfig = new StreamConfig("local-stream", ["leaf.>"]);
var stream = await context.CreateStreamAsync(streamConfig, cts.Token);
var consumer = await stream.CreateOrderedConsumerAsync(cancellationToken: cts.Token);

await context.PublishAsync<string>("leaf.data", "Hello World", cancellationToken: cts.Token);

var msg = await consumer.NextAsync<string>(cancellationToken: cts.Token);
logger.LogInformation("Leaf received: {Message}", msg?.Data);
await (msg?.AckAsync(cancellationToken: cts.Token) ?? ValueTask.CompletedTask);

await connection.PublishAsync("remote.data", "Message to Remote", cancellationToken: cts.Token);
Wrap Up
The second part of the NATS Topologies set expanded our focus and explored two key NATS topologies:
Superclusters: We saw how Superclusters interconnect independent NATS clusters across regions to provide high availability, and examined JetStream replication and consistency in a multiregional context
Leaf Nodes: We then discussed how Leaf Nodes extend a central NATS deployment across different networks and isolated security domains, highlighting the unique implications of isolated JetStream domains
But there's still more to discover!
Next Up - Advanced Topologies!
In Part 3: Advanced Topologies, we'll push the boundaries of NATS even further, literally! First, we’ll be exploring the nature of a Stretch Cluster and how it differs from a Supercluster. Then, we’ll look at the Leaf Cluster, trading immediate consistency for greater resiliency. Get ready to master some of the most powerful and intricate NATS architectures!
Have a specific question about NATS? Want a specific topic covered? Drop it in the comments!
Written by

Joshua Steward
Engineering leader specializing in distributed event driven architectures, with a proven track record of building and mentoring high performing teams. My core expertise lies in dotnet/C#, modern messaging platforms, and Microsoft Azure, where I've architected and implemented highly scalable and available solutions. Explore my insights and deep dives into event driven architecture, patterns, and practices on my platform https://concurrentflows.com/. Having led engineering teams in collaborative and remote-first environments, I prioritize mentorship, clear communication, and aligning technical roadmaps with stakeholder needs. My leadership foundation was strengthened through experience as a Sergeant in the U.S. Marine Corps, where I honed skills in team building and operational excellence.