Part 2: Seamless DNS Routing with CoreDNS and Virtual-Host Style Bucket Access

In Part 1, we laid the foundation for dynamic, health-aware S3 ingress by deploying per-node Ingress HAProxy concentrators in IBM Storage Ceph and integrating them with Consul for real-time service registration and health monitoring. Each node now acts as an independent ingress point, and Consul continuously tracks their availability.
What is CoreDNS?
CoreDNS is a flexible and extensible DNS server, written in Go and widely adopted across cloud-native environments. It is the default DNS service in Kubernetes (including OpenShift), where it handles dynamic service resolution based on pod and service metadata.
CoreDNS can serve traditional DNS zone files, act as a forwarder, perform DNS rewrites, and even integrate with external sources like etcd, Consul, or Vault.
In our case, CoreDNS is the glue between standard DNS clients and Consul’s service-aware DNS backend.
The Problem CoreDNS Solves in our Configuration
Without CoreDNS, clients trying to resolve s3.cephlab.com would need to:
Know about .consul as a DNS suffix
Talk to Consul over port 8600 (which many libraries and resolvers don’t support)
Accept DNS responses that don’t match the hostname they requested (breaking S3-style virtual host access)
CoreDNS solves this cleanly by:
Listening on port 53, so clients don’t need to change their DNS settings
Forwarding specific domains (like s3.cephlab.com) to Consul on port 8600
Rewriting queries and responses so they remain compatible with S3 SDK expectations
For example, a client querying mybucket.s3.cephlab.com will be transparently mapped to the Consul service ingress-rgw-s3.service.consul, and the response will be rewritten back to mybucket.s3.cephlab.com, preserving DNS validation in tools like the AWS CLI or SDKs.
Enabling Virtual-Host-Style Bucket Access
Many S3-compatible tools and SDKs use virtual-host-style addressing, where the bucket name appears in the hostname:
GET https://mybucket.s3.cephlab.com
Instead of:
GET https://s3.cephlab.com/mybucket
This is particularly important for S3 V4 signatures, which include the hostname in the signature calculation. If the DNS response doesn’t match the requested name exactly, authentication may fail.
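As a quick illustration, most S3 clients can be told explicitly to use virtual-host-style addressing. A minimal sketch for the AWS CLI, assuming a profile named test like the one used later in this article:

# ~/.aws/config (sketch): force virtual-host-style addressing for the test
# profile, so the bucket name is placed in the hostname rather than the path.
[profile test]
s3 =
    addressing_style = virtual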
Our CoreDNS configuration handles this via DNS rewrite rules, mapping *.s3.cephlab.com to the Consul service ingress-rgw-s3.service.consul, and vice versa for the response. This enables virtual-host-style access across all healthy nodes, with no extra configuration on the client side.
High Availability with Per-Node CoreDNS Containers
Just like with Consul, we’ll run CoreDNS on the same three Ceph nodes:
ceph-node-00 → 192.168.122.12
ceph-node-01 → 192.168.122.179
ceph-node-02 → 192.168.122.94
Each node runs a CoreDNS container via cephadm. These services:
Listen on port 53 (standard DNS)
Serve the cephlab.com zone from a zone file
Forward *.s3.cephlab.com queries to Consul on port 8600
Handle DNS rewrites transparently
With this setup, DNS resolution remains distributed and highly available. Any Ceph node can resolve S3 bucket names locally and route clients to healthy ingress endpoints registered in Consul.
Integrating with Enterprise DNS via Stub Zones
In an enterprise environment, s3.cephlab.com is unlikely to be the root DNS domain. More likely, Active Directory, BIND, Infoblox, or a cloud DNS provider manages your primary DNS.
To make CoreDNS part of that ecosystem, we configure the enterprise DNS to delegate the s3.cephlab.com subdomain to our CoreDNS nodes. This is commonly done using stub zones or conditional forwarding.
For example, the authoritative DNS server for cephlab.com delegates s3.cephlab.com to:
192.168.122.12
192.168.122.179
192.168.122.94
Once this delegation is in place, all client queries for *.s3.cephlab.com will be handled by CoreDNS, which transparently maps them to the healthy ingress nodes registered in Consul.
This allows enterprise clients to benefit from dynamic, health-aware DNS load balancing, without needing to run Consul on every client or modify their DNS resolvers.
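What this looks like depends on your DNS platform. As a rough sketch, a conditional-forwarding stanza on a BIND-based enterprise DNS server pointing the subdomain at the three CoreDNS nodes might be (hypothetical snippet, adapt names and ACLs to your environment):

// named.conf fragment: forward all queries for s3.cephlab.com
// to the CoreDNS instances running on the Ceph nodes.
zone "s3.cephlab.com" {
    type forward;
    forward only;
    forwarders { 192.168.122.12; 192.168.122.179; 192.168.122.94; };
};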
CoreDNS File Structure
On each node where CoreDNS is going to run, we will have two files: the Corefile for configuration and a cephlab.com zone file for static records:
/etc/coredns/Corefile
/etc/coredns/cephlab.com
CoreDNS configuration /etc/coredns/Corefile
In this example, we forward unmatched queries to a public DNS server (8.8.8.8), but you could configure your upstream enterprise DNS here for internal resolution.
.:53 {
    log
    errors
    forward . 8.8.8.8
}

cephlab.com {
    file /etc/coredns/cephlab.com
    prometheus
    errors
    log
    debug
}

consul {
    forward . 192.168.122.12:8600 192.168.122.179:8600 192.168.122.94:8600
    log
    errors
}

s3.cephlab.com {
    rewrite stop {
        name exact s3.cephlab.com ingress-rgw-s3.service.consul.
        answer name ingress-rgw-s3.service.consul. s3.cephlab.com.
    }
    rewrite stop {
        name regex (.*)\.s3\.cephlab\.com ingress-rgw-s3.service.consul.
        answer auto
    }
    forward . 192.168.122.12:8600 192.168.122.179:8600 192.168.122.94:8600
    log
    errors
    debug
}
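Before relying on the s3.cephlab.com zone, it can be useful to confirm that the Consul DNS interface the Corefile forwards to is answering. A minimal sketch, querying Consul directly on port 8600:

# Ask Consul (not CoreDNS) for the ingress service; it should return the
# IPs of the nodes whose health checks are currently passing.
$ dig @192.168.122.12 -p 8600 ingress-rgw-s3.service.consul +short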
Not directly relevant for this example, but we also keep a static file to manage the cephlab.com zone:
$ORIGIN cephlab.com.
@   3600 IN SOA ns.cephlab.com. admin.cephlab.com. (
        2017042745 ; serial
        7200       ; refresh
        3600       ; retry
        1209600    ; expire
        3600       ; minimum
        )
    3600 IN NS ns.cephlab.com.
ns            IN A 127.0.0.1
ceph-node-00  IN A 192.168.122.12
ceph-node-01  IN A 192.168.122.179
ceph-node-02  IN A 192.168.122.94
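If you have BIND’s named-checkzone utility installed, it is an optional but quick way to sanity-check the zone file syntax before handing it to CoreDNS (sketch):

# Validate the static zone file; "OK" at the end means the zone loads cleanly.
$ named-checkzone cephlab.com /etc/coredns/cephlab.com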
We copy the files to all three nodes, as we want to run three CoreDNS instances for HA:
for i in 00 01 02 ; do scp -pr /etc/coredns/* ceph-node-$i:/etc/coredns ; done
Deploy CoreDNS via the cephadm custom containers feature. This is our example spec for the CoreDNS containers:
# core-dns.yaml
service_type: container
service_id: coredns
placement:
  hosts:
    - ceph-node-00
    - ceph-node-01
    - ceph-node-02
spec:
  image: docker.io/coredns/coredns:latest
  entrypoint: '["/coredns", "-conf", "/etc/coredns/Corefile"]'
  args:
    - "--net=host"
  ports:
    - 53
  bind_mounts:
    - ['type=bind', 'source=/etc/coredns/Corefile', 'destination=/etc/coredns/Corefile', 'ro=false']
Apply the cephadm spec and check that the services have started successfully:
$ ceph orch apply -i core-dns.yaml
$ ceph orch ps --daemon_type container | grep core
container.coredns.ceph-node-00 ceph-node-00 *:53,53 running (25h) 7m ago 25h 13.8M - <unknown> 0392ee038903 feaa8305273a
container.coredns.ceph-node-01 ceph-node-01 *:53,53 running (24h) 10m ago 24h 12.0M - <unknown> 0392ee038903 c86ee93dab60
container.coredns.ceph-node-02 ceph-node-02 *:53,53 running (24h) 10m ago 24h 12.1M - <unknown> 0392ee038903 06b73ad257ca
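As an optional sanity check, you can confirm on each node that the CoreDNS container is actually listening on port 53 (sketch, run on any of the three nodes):

# Show TCP/UDP listeners on port 53; the coredns process should appear.
$ ss -tulnp | grep ':53 '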
Validating DNS-Based Load Balancing
With CoreDNS and Consul in place, your DNS queries for s3.cephlab.com (or any bucket-style subdomain like bucket1.s3.cephlab.com) should now resolve dynamically to only healthy Ceph nodes. Each of these nodes runs a local HAProxy concentrator, which routes traffic to local IBM Storage Ceph Object Gateway (RGW) daemons.
Let’s walk through how to verify that DNS is functioning correctly and load-balancing client requests as expected.
Step 1: Resolve the S3 Endpoint with dig
On our client node, from which we will connect to the S3 endpoint to work with object storage, we have configured the DNS servers with the IPs of the three Ceph nodes where CoreDNS is running:
$ cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 192.168.122.12
nameserver 192.168.122.179
nameserver 192.168.122.94
We’ll first use dig to directly query one of the CoreDNS servers running on the Ceph nodes. This will confirm that the DNS rewrites are working and that the Consul-registered service ingress-rgw-s3 is returning healthy IPs.
$ dig @192.168.122.12 s3.cephlab.com
;; ANSWER SECTION:
s3.cephlab.com. 0 IN A 192.168.122.179
s3.cephlab.com. 0 IN A 192.168.122.12
s3.cephlab.com. 0 IN A 192.168.122.94
Step 2: Resolve a Bucket-Style Hostname with getent
$ getent hosts test.s3.cephlab.com
192.168.122.12 test.s3.cephlab.com
192.168.122.94 test.s3.cephlab.com
192.168.122.179 test.s3.cephlab.com
This shows that DNS is resolving to the three healthy nodes, based on the current Consul health checks. getent hosts queries the system’s resolver, which points to CoreDNS listening on port 53.
This also confirms that virtual-host-style DNS resolution is working: the query is resolved through CoreDNS, which rewrote test.s3.cephlab.com to the ingress-rgw-s3 Consul service, and only healthy endpoints were returned.
Step 3: Check Load Balancing Distribution
To validate that round-robin load balancing is working across the healthy nodes, let’s perform 100 sequential queries and count which IP was returned first most often. This is a simple way to test the effectiveness of the DNS-level distribution across your ingress points.
$ for i in {1..100}; do getent hosts s3.cephlab.com | awk '{print $1}' | head -n1; done | sort | uniq -c | sort -nr
37 192.168.122.179
34 192.168.122.94
29 192.168.122.12
This shows a fairly even distribution across three Ceph nodes, meaning CoreDNS (backed by Consul) is providing round-robin results, and the system resolver is consuming those in a balanced way.
This confirms:
DNS queries are returning only healthy nodes
Load balancing is functioning correctly across nodes
Your CoreDNS + Consul setup is dynamically and intelligently routing traffic
Validating Request Distribution with an S3 Client and RGW Ops Logs
Now that we've confirmed DNS-based load balancing works at the resolution level, it's time to validate that actual S3 operations are being distributed across IBM Storage Ceph Object Gateway (RGW) endpoints as expected.
To do this, we’ll enable RGW Ops Logs, which provide detailed information about incoming S3 requests, including the RGW daemon that handled each one. We'll then run a few S3 operations using the AWS CLI and inspect the logs to see which RGWs were used.
This approach gives us concrete visibility into how CoreDNS and Consul are affecting real client traffic.
Step 1: Enable Ops Logging in Ceph
To capture S3 operations, enable RGW ops logs and ensure they’re written to file rather than RADOS. These settings can be applied directly using the ceph config CLI:
$ ceph config set client.rgw rgw_enable_ops_log true
$ ceph config set client.rgw rgw_ops_log_rados false
$ ceph config set global log_to_file true
$ ceph config set global mon_cluster_log_to_file true
$ ceph orch restart client.rgw
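As a quick, optional sanity check, you can read the settings back to confirm they were applied (sketch):

# Confirm the ops log options are set as expected for the RGW daemons.
$ ceph config get client.rgw rgw_enable_ops_log
$ ceph config get client.rgw rgw_ops_log_rados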
Step 2: Run a Few S3 Operations
Use the AWS CLI (or any compatible S3 client) to list your buckets and upload an object through the load-balanced endpoint:
$ aws --profile test --endpoint http://s3.cephlab.com:8080 s3 ls
$ aws --profile test --endpoint http://s3.cephlab.com:8080 s3 ls
$ aws --profile test --endpoint http://s3.cephlab.com:8080 s3 cp /etc/hosts s3://bucket1/
These requests will be routed through CoreDNS and Consul to a healthy ingress node, and forwarded by HAProxy to one of the local RGW daemons on that node.
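If you want a very quick connectivity check without S3 credentials, an anonymous request against the load-balanced endpoint should already return an XML response from one of the RGW daemons (sketch; confirms the DNS → HAProxy → RGW path end to end):

# Anonymous GET against the S3 endpoint; expect an HTTP 200 with an XML body.
$ curl -i http://s3.cephlab.com:8080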
Step 3: Inspect the Ops Logs
Each RGW daemon has its own ops log file. You can enable centralized logging in the dashboard; in my case, I’m using a script to centralize the logs on one node. Here is the output:
2025-08-19T11:44:19.296500Z 192.168.122.12 list_buckets GET / HTTP/1.1 ceph-node-00-RGW5
2025-08-19T11:47:29.213482Z 192.168.122.12 list_buckets GET / HTTP/1.1 ceph-node-00-RGW6
2025-08-19T12:50:32.375753Z 192.168.122.12 put_obj PUT /bucket1/hosts HTTP/1.1 ceph-node-00-RGW5
HAProxy received the requests on ceph-node-00 (IP 192.168.122.12).
The load was distributed between RGW5 and RGW6, the two Object Gateway daemons running locally on that node.
The PUT operation confirms that object uploads are also being routed through the dynamic DNS path.
Simulating Node Failure and Observing Dynamic Recovery
With Consul and CoreDNS in place, our DNS-based load balancing is now aware of service health. Let’s test how the system behaves when an ingress node fails, and how traffic automatically reroutes to the remaining healthy nodes, with no manual intervention.
This scenario validates that:
Consul health checks are actively monitoring ingress services.
DNS answers only include healthy nodes.
Clients continue accessing S3 without disruption.
Node recovery triggers automatic reintegration into DNS.
We’ll simulate a failure by stopping an IBM Storage Ceph node through the hypervisor manager (in this case, kcli), but the behavior would be identical in any real-world bare-metal failure scenario, whether due to hardware, power, or maintenance.
Step 1: Simulate Failure of an Ingress Node
Let’s stop ceph-node-00, which currently hosts HAProxy and local IBM Storage Ceph Object Gateway (RGW) daemons.
$ kcli stop vm ceph-node-00
This immediately removes the node from the network — and Consul will detect the missing service via its failed health check.
After a few seconds, we can verify that CoreDNS (backed by Consul) no longer includes ceph-node-00 in its DNS responses.
$ getent hosts s3.cephlab.com
192.168.122.179 s3.cephlab.com
192.168.122.94 s3.cephlab.com
As expected, 192.168.122.12 (ceph-node-00) is no longer returned, because its HAProxy health check failed and Consul deregistered it.
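You can see the same view from the Consul side by asking the Consul HTTP API which instances of the ingress service still pass their health checks. A minimal sketch, assuming the API is reachable on its default port 8500 on a surviving node and jq is installed:

# List the node addresses Consul still considers healthy for the ingress service.
$ curl -s 'http://192.168.122.179:8500/v1/health/service/ingress-rgw-s3?passing' | jq -r '.[].Node.Address'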
Step 2: Continue Accessing S3
Let’s now perform a normal S3 operation using the same load-balanced endpoint. Despite one node being down, traffic should route to healthy nodes transparently.
$ aws --profile test --endpoint http://s3.cephlab.com:8080 s3 ls
Step 3: Check Which RGW Handled the Request
We can now check the RGW ops logs to see which node processed the S3 request.Example ops log output:
2025-08-19T15:24:54.179558Z 192.168.122.94 list_buckets GET / HTTP/1.1 ceph-node-02-RGW1
2025-08-19T15:27:03.889488Z 192.168.122.94 list_buckets GET / HTTP/1.1 ceph-node-02-RGW1
As we can see, requests are now routed to ceph-node-02, which is healthy and running its local HAProxy and RGW services.
Step 4: Restore the Failed Node
Let’s now bring back ceph-node-00 and verify that it automatically rejoins the cluster and DNS responses.
$ kcli start vm ceph-node-00
Consul will re-register the local ingress service after the health check passes. A few seconds later, let’s re-run the DNS query:
$ getent hosts s3.cephlab.com
192.168.122.94 s3.cephlab.com
192.168.122.12 s3.cephlab.com
192.168.122.179 s3.cephlab.com
Everything is working as expected! The node is back online and healthy, and has been automatically reintegrated into the load-balancing pool. No manual DNS update, failback, or cluster reconfiguration is required.
Conclusion: Modernizing S3 Load Balancing with Consul and CoreDNS
As storage environments scale, especially those serving high-throughput, S3-compatible workloads, traditional load balancing methods begin to show their limits. Static HAProxy stacks and fixed VIPs are reliable for smaller clusters but quickly become operational bottlenecks when dealing with failure domains, elasticity, and traffic distribution at scale.
By combining:
Ingress terminators (HAProxy concentrators) per node.
Consul for dynamic, health-aware service discovery.
CoreDNS for DNS-based routing and virtual-host bucket support.
We’ve built a distributed, self-healing, and highly scalable solution that meets modern object storage demands.