Benchmarking IBM Storage Ceph Object Gateway (RGW) - Part 3: Security at Scale, Encryption in Transit (TLS and Secure Messenger v2) Performance

Daniel Parkes

Previously, in Part 2 of our Performance Blog Post Series

We explored how IBM Storage Ceph RGW scales with node count, revealing near-linear throughput improvements for both PUT and GET operations across four, eight, and twelve nodes. We also examined how resource usage trends evolve with cluster size, confirming that scaling out improves efficiency even for resource-intensive erasure-coded workloads. If you haven’t checked it out, here is the link for Part 2.

TLS Termination Performance: S3 Endpoint Encryption in Transit

To evaluate the impact of TLS on IBM Storage Ceph Object Gateway (RGW) S3 endpoint performance, we compared three common deployment strategies: end-to-end encryption (SSL at RGW), SSL termination at the cephadm-deployed ingress service (HAProxy), and an unencrypted baseline. The HAProxy services run colocated with the Ceph Object Gateway (RGW) services. A virtual IP address (VIP) is configured per node, and the benchmark clients balance requests across all configured VIPs. These tests were performed on a twelve-node cluster using EC 8+3 with both mid/large (4 MiB) and small (64 KiB) object workloads.
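For reference, the sketch below shows roughly how the two encrypted configurations can be expressed as cephadm service specifications. The service IDs, placement labels, VIP, ports, and certificate contents are illustrative placeholders, not the exact specs used in this benchmark.

```bash
# Option 1: terminate SSL directly at RGW (end-to-end encryption).
# Names, labels, ports, and certificate contents are placeholders.
cat <<'EOF' > rgw-ssl.yaml
service_type: rgw
service_id: benchmark
placement:
  label: rgw
spec:
  rgw_frontend_port: 8443
  ssl: true
  rgw_frontend_ssl_certificate: |
    -----BEGIN CERTIFICATE-----
    ...certificate and private key, concatenated...
    -----END PRIVATE KEY-----
EOF
ceph orch apply -i rgw-ssl.yaml

# Option 2: terminate SSL at the cephadm ingress service (HAProxy +
# keepalived) in front of plain-HTTP RGW daemons. The benchmark used one
# VIP per node; a single VIP is shown here for brevity.
cat <<'EOF' > ingress-ssl.yaml
service_type: ingress
service_id: rgw.benchmark
placement:
  label: rgw
spec:
  backend_service: rgw.benchmark
  virtual_ip: 192.168.122.100/24
  frontend_port: 443
  monitor_port: 1967
  ssl_cert: |
    -----BEGIN CERTIFICATE-----
    ...certificate and private key, concatenated...
    -----END PRIVATE KEY-----
EOF
ceph orch apply -i ingress-ssl.yaml
```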

Medium/Large Object Workloads (4 MiB)

This section focuses on 4 MiB object size results, which serve as a representative case for medium to large object sizes. For even larger objects, the trends observed here generally hold or improve further due to greater transmission efficiency with lower per-byte overhead.

SSL at the Ceph Object Gateway (RGW) delivered nearly identical throughput to the no-SSL configuration. For GET requests, the difference was just 0.4% lower, while PUT throughput showed a slightly larger decline of 4.2%. This small delta is expected, as the cluster was network-bound. Although average Ceph Object Gateway (RGW) CPU usage increased by 40% for GETs and 71% for PUTs during these tests, and maximum CPU utilization per Ceph host reached ~83%, the additional load had no material impact on performance, confirming that SSL at RGW is a secure-by-default option with a negligible performance penalty for large object workloads.

| Metric | No SSL (Baseline) | SSL at RGW | % Change (vs. No SSL) |
| --- | --- | --- | --- |
| GET Throughput (MiB/s) | 96,351 | 95,965 | -0.4% |
| PUT Throughput (MiB/s) | 58,086 | 55,651 | -4.2% |
| GET Latency (ms) | 42 | 42 | 0% |
| PUT Latency (ms) | 64 | 70 | +9.4% |
| RGW CPU (GET) | 3.19 cores | 4.46 cores | +40% |
| RGW CPU (PUT) | 3.62 cores | 6.19 cores | +71% |

In contrast, terminating SSL at the ingress service (HAProxy) showed more noticeable effects. Throughput declined by approximately 27% for GETs and 19% for PUTs, and latency increased accordingly. This drop was not due to SSL overhead itself, but rather to the shift of the encryption workload onto the HAProxy layer. Under heavy load, each HAProxy daemon consumed 3 to 6 vCPUs on average as the object size scaled from 64 KiB to 1 GB. Peak CPU utilization on the Ceph hosts reached up to 90%, highlighting the need for appropriate HAProxy tuning and scaling to prevent it from becoming a bottleneck.
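One way to give HAProxy more headroom is to adjust the placement of the ingress service, for example by moving it off the Ceph hosts onto dedicated front-end nodes. The snippet below is a hypothetical sketch of that approach; the host names, the `ingress` label, and the service ID are placeholders, and the TLS fields are omitted for brevity.

```bash
# Label the hosts that should run the ingress daemons (placeholder names).
ceph orch host label add frontend-01 ingress
ceph orch host label add frontend-02 ingress

# Re-apply the ingress spec with a placement that targets those hosts
# instead of the RGW hosts.
cat <<'EOF' > ingress-scale.yaml
service_type: ingress
service_id: rgw.benchmark
placement:
  label: ingress
spec:
  backend_service: rgw.benchmark
  virtual_ip: 192.168.122.100/24
  frontend_port: 443
  monitor_port: 1967
EOF
ceph orch apply -i ingress-scale.yaml

# Confirm where the haproxy/keepalived daemons are now running.
ceph orch ps | grep -E 'haproxy|keepalived'
```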

Small Object Workloads (64 KiB)

With small objects, throughput naturally shifts from being network-bound to CPU-bound, making encryption overhead more apparent. Still, the impact of enabling SSL at the Ceph Object Gateway (RGW) was manageable. GET IOPS dropped by 5.2%, and PUT IOPS declined by 10.7% relative to the no-SSL baseline. Ceph Object Gateway (RGW) CPU usage increased by 4.2% for GETs and 11.3% for PUTs, indicating that the encryption work was well-distributed across the cluster. Despite the higher sensitivity of small object workloads to CPU usage, end-to-end SSL remains practical, with performance degradation kept within single-digit percentages for most cases.

| Metric | No SSL (Baseline) | SSL at RGW | % Change (vs. No SSL) |
| --- | --- | --- | --- |
| GET IOPS | 137,226 | 130,089 | -5.2% |
| PUT IOPS | 75,074 | 67,013 | -10.7% |
| GET Latency (ms) | 3.05 | 3.25 | +6.6% |
| PUT Latency (ms) | 9.75 | 10.00 | +2.5% |
| RGW CPU (GET) | 6.11 cores | 6.37 cores | +4.2% |
| RGW CPU (PUT) | 2.20 cores | 2.45 cores | +11.3% |

Ingress-terminated SSL again introduced a larger performance gap. GET IOPS fell by approximately 18%, and PUT IOPS by 8%, compared to the no-SSL case. This was accompanied by increased ingress CPU consumption and slightly higher request latency. While the numbers suggest a larger performance delta, this setup remains valid for production deployments where security policies dictate TLS offload, provided ingress scaling is aligned with concurrency and throughput goals.

Conclusion – SSL/TLS S3 Endpoint Security

In summary, SSL/TLS at the Ceph Object Gateway (RGW) offers an outstanding balance between security and performance for most object workloads. It delivers near-baseline throughput for large objects and modest performance degradation for small ones, all while preserving end-to-end encryption.
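As a practical aside, it is straightforward to confirm from the client side which certificate the S3 endpoint presents, regardless of whether TLS terminates at RGW or at the ingress layer. The endpoint hostname, VIP, and CA bundle path below are placeholders.

```bash
# Inspect the certificate presented by the S3 endpoint (VIP is a placeholder).
openssl s_client -connect 192.168.122.100:443 -servername s3.example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates

# Run a quick S3 request over HTTPS, trusting the internal CA bundle
# (endpoint, bucket access, and paths are placeholders).
aws --endpoint-url https://s3.example.com \
    --ca-bundle /etc/pki/ca-trust/source/anchors/internal-ca.pem \
    s3api list-buckets
```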

Cluster Services Encryption In Transit: Safeguarding Internal Traffic at Scale

As security standards continue to evolve, securing internal communication between Ceph daemons is becoming a best practice for production deployments, especially in regulated environments. In IBM Storage Ceph, this internal encryption is enabled via Messenger v2 Secure Mode, also referred to as cluster network encryption or internal encryption in transit. Unlike TLS, which secures traffic between external clients and the S3 Ceph Object Gateway (RGW) endpoint, Messenger v2 ensures that all inter-daemon traffic — including RGW-to-OSD, OSD-to-Monitor, and Manager communications — is encrypted and authenticated.
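For context, secure mode is controlled through the messenger connection-mode options. The following is a minimal sketch of pinning those options to `secure` cluster-wide and verifying the result; in practice, changing them on a running cluster should be planned carefully, since daemons renegotiate their connections.

```bash
# Require the encrypted msgr2 mode for intra-cluster, service, and
# client connections (sketch; defaults normally allow both crc and secure).
ceph config set global ms_cluster_mode secure
ceph config set global ms_service_mode secure
ceph config set global ms_client_mode secure

# The monitor-specific connection modes can be pinned the same way.
ceph config set global ms_mon_cluster_mode secure
ceph config set global ms_mon_service_mode secure
ceph config set global ms_mon_client_mode secure

# Verify the effective setting on a daemon type.
ceph config get osd ms_cluster_mode
```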

This section examines the performance impact of enabling Messenger v2 Secure Mode on top of a Ceph Object Gateway (RGW) SSL-enabled baseline. Both configurations — with and without secure mode — used SSL at the RGW for client-facing encryption. The tests were conducted on a twelve-node cluster using 8+3 erasure coding with medium-to-large object sizes (1 MiB to 256 MiB).

High-Level Overview: Minimal Overhead for Stronger Security

We evaluated the throughput and latency of GET and PUT operations across configurations with and without Messenger v2 security enabled. As shown in the graph and table below, the performance delta was negligible for both read and write operations, demonstrating that internal encryption in transit is compatible with high-throughput object storage use cases.

The table below provides a complete comparison of the percentage change for the 4 MiB object size we are using as a reference:

| Metric | No Encryption | Secure Messenger v2 | % Change |
| --- | --- | --- | --- |
| GET Throughput | 95,965 MiB/s | 95,671 MiB/s | -0.3% |
| PUT Throughput | 55,651 MiB/s | 52,994 MiB/s | -4.8% |
| GET Latency | 42 ms | 42 ms | 0% |
| PUT Latency | 70 ms | 71 ms | +1.4% |
| RGW CPU (GET) | 4.46 cores | 4.38 cores | -1.8% |
| RGW CPU (PUT) | 6.19 cores | 6.55 cores | +5.8% |
| RGW Memory (GET) | 313 MiB | 336 MiB | +7.3% |
| RGW Memory (PUT) | 308 MiB | 329 MiB | +6.8% |

Analysis:

  • Throughput impact:
    Across all tested object sizes (1 MiB to 256 MiB), GET throughput remained effectively unchanged after enabling Messenger v2 secure mode. PUT throughput showed a modest drop, most notably −3.1% with 1 MiB objects, flattening out to near-zero impact at larger sizes (e.g., −0.4% at 64 MiB and 256 MiB). This trend aligns with expectations, as smaller objects amplify coordination and encryption overhead, while larger objects are more throughput-bound and amortize the cost.

  • Latency impact:
    Both GET and PUT latencies remained stable across the board. Variations observed were minimal (typically ±1 ms), confirming that enabling Messenger v2 secure mode does not introduce meaningful queuing or processing delay, even under high concurrency and across varying object sizes.

  • Resource utilization:
    CPU usage at the RGW level increased slightly for PUT operations (~2–6% depending on object size), while GET CPU usage stayed essentially flat. Memory consumption showed similarly modest changes (within ~5–7%), with no signs of resource exhaustion or saturation.

Conclusion – Messenger v2 for Internal Encryption

Enabling Messenger v2 Secure Mode adds cryptographic protection to internal Ceph daemon communications with negligible performance impact. Our testing showed stable throughput and latency across all object sizes, with only modest increases in RGW memory and CPU usage, primarily for PUT operations. The design of Messenger v2 ensures strong security guarantees with minimal trade-offs, making it highly compatible with high-throughput, enterprise-grade object storage deployments.

Final Recommendation – Secure by Design: TLS + Messenger v2

For environments that require strong data protection both in transit to clients and internally between cluster services, the combination of TLS at the S3 endpoint and Messenger v2 for internal encryption provides robust security with minimal impact on performance.

Whether you're securing AI pipelines, analytics platforms, or multi-tenant object storage services, IBM Storage Ceph RGW demonstrates that full-stack encryption can be deployed confidently, without compromising throughput, latency, or scalability.
