NATS Cluster Architectures: Regional Clusters - Building Reliable Messaging Foundations

Joshua Steward
8 min read

Why does NATS topology matter?

NATS topology refers to the way your NATS servers are arranged and connected. Unlike many messaging systems with more rigid deployment models, NATS offers remarkable flexibility. You're not shoehorned into a one-size-fits-all approach; instead, you become the architect of your messaging fabric.

Your choice of topology, whether a single server, a resilient cluster, leaf nodes, or a supercluster, will directly shape fault tolerance, message delivery guarantees, network segmentation, and overall system complexity. This means you can tailor your NATS deployment precisely to your needs, but it requires careful consideration.

This miniseries of posts will explore a variety of NATS server topologies, increasing in complexity. Here, we’ll be covering:

  1. The single instance deployment

  2. A resilient three-node cluster

And for each:

  • Key characteristics of the topology

  • Tradeoffs in consistency, latency, and availability

  • Potential use cases

  • Impact on application integration, particularly within C#/.NET clients


The Singularity: Single NATS Server

A single NATS server is the most fundamental setup: one instance of the NATS server process. It's the easiest to grasp and get started with, making it a natural entry point for development and simple use cases. This is the same deployment we used in “Getting Started with NATS in C#”.

In this topology, a single NATS server process handles all client connections, message routing, and any configured persistence, i.e. JetStream, if enabled. All publishers and subscribers within our system connect directly to this single endpoint.

Characteristics at a Glance

  • Simplicity: Straightforward to configure and manage

  • Low Overhead: Requires the least amount of infrastructure resources, e.g. CPU, memory, network

  • Single Point of Failure: If the server becomes unavailable due to hardware failure, network issues, or software problems, the entire messaging system halts

  • Limited Scalability: The performance and throughput of the system are bounded by the capacity of the single server instance

    • Note: Even so, a single NATS server can still be quite performant, to the tune of millions of messages/sec depending on resources

  • No Redundancy: Availability is limited to the single node, with no automatic failover or backup

  • No JetStream Replication: Of course, with a single server, no JetStream resources can be replicated

Ideal Use Cases - Dev/Local/Non-Prod

  • Local Dev and Testing: Perfect for a local dev NATS environment and local integration testing

  • Demos, POCs: Proving new messaging patterns among microservices, demoing new features, etc.

  • Learning and Exploration: Low barrier to entry to understand the basic concepts of NATS without the complexities of clustering

NATS Server Config

Couldn’t be simpler! This specifies that clients should connect on the default port 4222, with the server listening on 0.0.0.0:4222.

port: 4222

Mounting a file for just this might be overkill; many options can be specified via the command line as well. While 4222 is already the default, just as an example with Docker, this is equivalent to the server.conf above:

docker run --rm --name my-nats -p "4222:4222" nats -p 4222
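And if you’d rather mount the config file, the official nats image can take it as well. A minimal sketch, assuming the image’s default config path of /etc/nats/nats-server.conf (worth verifying against the image docs for your version):

docker run --rm --name my-nats -p "4222:4222" \
  -v "$(pwd)/server.conf:/etc/nats/nats-server.conf" nats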

NATS.Net Integration

Connecting to a single NATS server uses a connection string pointing to its address, just as we saw in “Getting Started with NATS in C#”. For completeness, the NATS.NET client library provides the tools for this. Importantly, in this topology, our app becomes directly reliant on the availability of this single server. As a result, several factors will impact our app design.

  • Connection Management: The NATS.Client.Core library does provide robust connection management, but with a single server, it's primarily focused on the initial connection and handling disconnects

  • Initial Connection: By default, the client will lazily connect to the server, and fail fast on the initial connection attempt, i.e. the initial attempt is not retried and a NatsException is thrown

  • Reconnects: After an initial successful connection, any drop will result in a reconnect attempt up to a configured limit via NatsOpts.*Reconnect* properties

  • No Automatic Failover: Reconnection attempts have no other route except the same single address

var loggerFactory = LoggerFactory.Create(builder => builder.AddConsole());
var logger = loggerFactory.CreateLogger<Program>();

var opts = new NatsOpts
{
    Url = "nats://localhost:4222", // single server, single point of failure
    LoggerFactory = loggerFactory
};

await using var connection = new NatsConnection(opts);

// Round-trip to the server; also triggers the lazy initial connection
var rtt = await connection.PingAsync(CancellationToken.None);

logger.LogInformation("Ping successful - {RTT}ms to {@ServerInfo}",
    rtt.TotalMilliseconds,
    connection.ServerInfo);
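Given the fail-fast and reconnect behavior above, it's worth being deliberate with the reconnect options even against a single server. A minimal sketch, assuming the NatsOpts reconnect properties found in recent NATS.Net versions (verify the names against your client version):

var opts = new NatsOpts
{
    Url = "nats://localhost:4222",
    MaxReconnectRetry = 10,                      // cap attempts; -1 retries forever
    ReconnectWaitMin = TimeSpan.FromSeconds(1),  // initial backoff between attempts
    ReconnectWaitMax = TimeSpan.FromSeconds(30), // backoff ceiling
    LoggerFactory = loggerFactory
};

await using var connection = new NatsConnection(opts);

// Connect eagerly so a bad address fails at startup rather than on first use
await connection.ConnectAsync();

No matter how generous the retry policy though, every attempt lands on the same single address; there's no failover target until we add more servers.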

Resilience Through Redundancy: Three Node Cluster

Clustering in NATS provides fault tolerance and increased capacity by connecting multiple NATS servers. A cluster of three nodes is a common starting point for production deployments and offers a balance between resilience and complexity.

In this topology, three NATS servers are interconnected via routes, forming a full mesh. Each server maintains bidirectional connections to the others, allowing them to fully propagate the Subject interest graph. Clients can connect to any server in the cluster, and the cluster will route messages appropriately, even if a server becomes unavailable.

Characteristics at a Glance

  • Fault Tolerance & Increased Availability: The cluster can still serve Core NATS traffic even in the face of a two-node failure, i.e. it can operate with only a single node

  • Improved Scalability: Distributes the load across multiple servers, increasing the overall capacity of the system.

  • Automatic Discovery: NATS servers in a cluster will automatically discover each other after reaching any ‘seed’ server; cluster gossip ensures a full mesh is maintained

  • Message Routing: Servers route messages to the appropriate destinations within the cluster; the cluster is transparent to publishers/subscribers beyond the connection string

  • Increased Complexity: Although still minimal, configuring and managing a cluster does involve more complexity than a single server

JetStream Considerations

Clustering is crucial for JetStream's fault tolerance and data durability. JetStream leverages the Raft consensus algorithm to replicate data across multiple servers within a cluster. This ensures, to the limit of consensus, that the data remains available and consistent.

  • Quorum: JetStream requires a quorum, \(\lfloor n/2 \rfloor + 1\), to be available in order to handle requests, e.g. reads, writes, acks, etc.

  • Storage: Each server in the cluster needs its own storage for JetStream data

  • Replication Latency: Replication adds some network overhead; low latency connections between servers are critical
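To make the quorum math concrete: with \(n = 3\), quorum is \(\lfloor 3/2 \rfloor + 1 = 2\), so JetStream keeps handling requests with one node down. Lose a second node and JetStream halts, even though Core NATS traffic can continue on the lone survivor.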

Replication and Consistency

  • Replication Factor: With three nodes, 3x replication of JetStream Stream and Consumer data/metadata can be achieved

  • Consistency: Raft ensures strong consistency, meaning that all replicas agree on the order of the data

    • Note: Replication is set at the Stream level and either inherited or optionally set at the Consumer level; “In Sync Replicas” will always equal Replicas, i.e. acks = all

  • Fault Tolerance: A replication factor of 3 allows the cluster to tolerate the failure of one server without losing any data while continuing to service requests

  • Increased Durability: Storing multiple copies of the data significantly reduces the risk of data loss due to hardware failures or other issues

Ideal Use Cases - Single Region Production

  • Single Region Production Environments: Essential for any production system that requires high availability and fault tolerance

  • Durable, Highly Available Apps: Applications where data loss or downtime is unacceptable

  • Scalable Systems: Systems that need to handle a high volume of messages and clients

NATS Server Config

A basic example of a server.conf. You could create three separate configuration files (server1.conf, server2.conf, and server3.conf), each with the appropriate settings for that particular server, although the config can often be reused depending on your networking environment. Note that routes pointing to “self” are smartly ignored, making config sharing a bit easier.

listen: 0.0.0.0:4222
cluster {
  name: my-cluster
  listen: 0.0.0.0:6222
  routes: [
    "nats://server1:6222",
    "nats://server2:6222",
    "nats://server3:6222"
  ]
}
jetstream {
  store_dir: /data/jetstream
  max_memory_store: 1GB
  max_file_store: 10GB
}
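One caveat if you do share a single file: servers in a JetStream-enabled cluster should each carry a unique server_name (treat the exact requirement as version-dependent). A per-server override, shown here for a hypothetical server1, keeps the rest of the file shareable:

server_name: server1
# ...remainder identical to the server.conf above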

Key Differences

  • cluster: Section defines the details of our cluster

  • cluster.name: Uniquely identifies the cluster

  • cluster.listen: The host and port to listen on for incoming clustering connections

  • cluster.routes: A list of dedicated Routes to other servers

  • jetstream: Configures JetStream, including storage target and limits

NATS.Net Integration

Connecting to a NATS cluster with the NATS.Client.Core and NATS.Client.JetStream libraries is straightforward. This time, you can provide a list of server URLs, and the client will automatically connect to an available server. If that server becomes unavailable, the client will attempt to connect to another server in the list.

var loggerFactory = LoggerFactory.Create(builder => builder.AddConsole());
var logger = loggerFactory.CreateLogger<Program>();

var opts = new NatsOpts
{
    Url = "nats://server1:4222,nats://server2:4222,nats://server3:4222",
    LoggerFactory = loggerFactory
};

await using var connection = await GetConnectionAsync(logger, opts);
var stream = await GetStreamAsync(logger, connection);

async Task<INatsConnection> GetConnectionAsync(ILogger logger, NatsOpts opts)
{
    // Don't dispose here; the caller owns the connection's lifetime
    var connection = new NatsConnection(opts);

    var rtt = await connection.PingAsync(CancellationToken.None);

    logger.LogInformation("Ping successful - {RTT}ms to {@ServerInfo}",
        rtt.TotalMilliseconds,
        connection.ServerInfo);
    return connection;
}

async Task<INatsJSStream> GetStreamAsync(ILogger logger, INatsConnection connection)
{
    var context = new NatsJSContext(connection);
    var config = new StreamConfig("my-first-stream", ["some.subjects.>"])
    {
        NumReplicas = 3
    };

    var stream = await context.CreateOrUpdateStreamAsync(config, CancellationToken.None);
    logger.LogInformation("Stream created/updated - {@StreamInfo}", stream.Info);
    return stream;
}

Key Differences

  • NatsOpts.Url: Set to a comma separated list of server URLs

  • NATS.Client.Core: Handles connecting across multiple servers, choosing an available server initially and reconnecting to another if the connection is lost

  • NatsJSContext: Building on top of Core NATS, the JetStream Context accepts an existing NatsConnection

  • my-first-stream: Configured with a replication factor of 3, ensuring that data is replicated across all three servers in the cluster
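To round things out, a quick usage sketch against the replicated stream: publish a message and confirm the cluster acknowledged it, then attach a durable consumer. The subject, payload, and consumer name are made up for illustration, and the calls assume recent NATS.Client.JetStream versions:

var context = new NatsJSContext(connection);

// Publish through JetStream so the write is acknowledged by the quorum
var ack = await context.PublishAsync("some.subjects.hello", "hello cluster");
ack.EnsureSuccess(); // throws if the cluster did not persist the message

logger.LogInformation("Persisted to {Stream} at seq {Seq}", ack.Stream, ack.Seq);

// Consumers inherit the stream's replication factor unless overridden via NumReplicas
var consumer = await stream.CreateOrUpdateConsumerAsync(
    new ConsumerConfig("my-durable-consumer"),
    CancellationToken.None);

With NumReplicas = 3, that publish is only acked once a majority of the replicas have persisted it; that's precisely the durability this topology buys.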


Wrap Up

In this first part exploring NATS topologies, from a single server to a three-node cluster, we've laid the groundwork for building robust and scalable messaging systems with NATS.

Next Up - Multiregional Clusters!

Moving beyond single region clusters, we're now ready to take our NATS deployments to the next level. In Part 2: Multiregional Clusters, we'll dive into how Superclusters and Leaf Nodes enable you to connect NATS deployments across geographical distances, building resilient and scalable messaging solutions that span the globe. Get ready to unleash the full potential of NATS for distributed, mission-critical applications!

Have a specific question about NATS? Want a specific topic covered? Drop it in the comments!


Written by

Joshua Steward

Engineering leader specializing in distributed event driven architectures, with a proven track record of building and mentoring high performing teams. My core expertise lies in dotnet/C#, modern messaging platforms, and Microsoft Azure, where I've architected and implemented highly scalable and available solutions. Explore my insights and deep dives into event driven architecture, patterns, and practices on my platform https://concurrentflows.com/. Having led engineering teams in collaborative and remote-first environments, I prioritize mentorship, clear communication, and aligning technical roadmaps with stakeholder needs. My leadership foundation was strengthened through experience as a Sergeant in the U.S. Marine Corps, where I honed skills in team building and operational excellence.