Bridging the Gap: GitOps for Network Engineers - Part 2


ArgoCD Is Amazing—But Let’s Make It Do Something!
Intro
In Part 1, we laid the foundation by installing ArgoCD and setting up the basic structure for a GitOps-driven platform. If you've followed along, you should now have a working Kubernetes cluster, ArgoCD deployed and accessible, and your first project created in the UI.
Now it's time to turn that foundation into something usable.
In Part 2, we'll start deploying the critical infrastructure pieces that power everything else. That includes MetalLB for external load balancing, Traefik for ingress, persistent storage using Rook + Ceph, and secrets management with External Secrets and HashiCorp Vault. All of these will be deployed through ArgoCD, GitOps-style.
We’ll kick things off with MetalLB, which enables us to expose services outside the cluster, an essential first step in making your platform actually accessible. Let’s get into it.
MetalLB: Load Balancing for Bare Metal and Home Labs
If you're running Kubernetes in a cloud environment, you typically get a load balancer as part of the package, something like an AWS ELB or an Azure Load Balancer that magically routes traffic to your services. But when you're running on bare metal, in a lab, or on-prem (which, let’s be real, a lot of network engineers are), you're on your own. That's where MetalLB comes in.
What is MetalLB?
MetalLB is a load balancer implementation for Kubernetes clusters that don’t have access to cloud-native load balancer resources. It allows you to assign external IP addresses to your Kubernetes services so that they can be accessed from outside the cluster, exactly what you'd expect from a "real" load balancer, just built for the DIY crowd.
Why You Need It
In any Kubernetes-based GitOps platform, exposing services to the outside world is non-negotiable. Whether it’s ArgoCD, Traefik, Vault, or any of your network automation tools, they all need to be reachable by users, APIs, or other systems. While NodePorts can get the job done in a lab, they’re clunky, inconsistent, and definitely not production-grade.
MetalLB solves this by handling Service type: LoadBalancer in environments where a cloud load balancer doesn’t exist, like bare metal or your home lab. You define a pool of IP addresses from your local network, and MetalLB assigns those IPs to services that request them.
Here’s where the networking magic comes in: MetalLB (when running in Layer 2 mode) announces those external IPs using ARP. If a device outside of the cluster ARPs for an exposed service IP, MetalLB replies with the MAC address of the node running the service. It’s simple, reliable, and doesn’t require BGP or complex router configs.
So when a LoadBalancer service is created, for example, to expose ArgoCD or Traefik, MetalLB makes that service’s external IP reachable from anywhere on your local network, just like a real load balancer would in a cloud environment.
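For reference, a plain Kubernetes Service of type LoadBalancer, the kind of object MetalLB reacts to, looks roughly like this (the names and ports here are placeholders, not something specific to this platform):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-web          # placeholder name
spec:
  type: LoadBalancer         # MetalLB assigns an external IP from its pool
  selector:
    app: example-web         # matches the pods backing the service
  ports:
    - port: 80               # external port on the assigned IP
      targetPort: 8080       # container port the traffic is forwarded to
```

When MetalLB sees a service like this, it picks a free address from its pool, sets it as the service's external IP, and starts answering ARP for it.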
How It Powers the Platform
MetalLB becomes one of the core enablers of our GitOps stack. It allows you to:
Expose ArgoCD with a proper external IP
Route external traffic to Traefik, our ingress controller
Provide consistent access to internal services that need to be reachable from your network
Maintain a production-like networking experience, even in a lab or homelab environment
Without MetalLB, you’d either be stuck manually forwarding ports, messing with IP tables, or leaning on NodePorts. With it, your platform starts acting like it belongs in a real, routable network, and that’s exactly what we want.
Now that we understand what MetalLB does and how it fits into the big picture, let's deploy it the GitOps way, starting with adding the Helm chart repository to our configuration.
Quick Review: Helm Charts and How They Fit into ArgoCD
Before we deploy MetalLB, let’s quickly go over how Helm works, especially how it integrates with ArgoCD.
Helm is a package manager for Kubernetes. Instead of manually writing and applying a bunch of YAML files, Helm lets you deploy versioned, configurable "charts", pre-packaged bundles of Kubernetes manifests that define an application. These charts live in remote Helm repositories, similar to how `apt` or `yum` fetch packages on a Linux system.
In a GitOps workflow, Helm charts are referenced as part of an ArgoCD Application manifest, specifically as a `source`. ArgoCD uses this source definition to pull the chart directly from the repo, apply any custom `values.yaml` overrides you've stored in Git, and deploy everything into your cluster automatically.
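To make that concrete, here's a rough sketch of what such an Application manifest looks like when written out by hand; it mirrors the MetalLB example coming up, and the ArgoCD UI generates essentially the same object for you:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: metallb
  namespace: argocd            # assumes ArgoCD runs in its default namespace
spec:
  project: prod-home
  source:
    repoURL: https://metallb.github.io/metallb   # Helm repository
    chart: metallb                                # chart name within that repo
    targetRevision: 0.14.9                        # chart version to deploy
  destination:
    server: https://kubernetes.default.svc
    namespace: metallb-system
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
```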
Using the MetalLB Helm Chart with ArgoCD
The official MetalLB Helm chart is hosted at:
https://metallb.github.io/metallb
When creating your ArgoCD Application, one of your `sources` will look like this:
- Type: Helm
- Chart: metallb
- Repo URL: https://metallb.github.io/metallb
- Target Revision: Usually the latest
ArgoCD will then treat this Helm chart as part of the desired state. It will sync the chart, merge in your values (if you’re overriding anything), and deploy MetalLB as part of your platform, all driven from Git.
MetalLB Installation
These initial steps, adding the Helm repo or other base sources, creating the app in ArgoCD, and wiring up the basic Helm configuration, are mostly the same for every application we’ll deploy. Because of that, I’ll only walk through this process in detail once (here), and only call out major differences for other apps later in the post. Screenshots are included below where it helps, but once you’ve done it once, you’ll be able to rinse and repeat for everything else.
Step 1: Add the Helm Repo
ArgoCD needs to know where to fetch the Helm chart from. For MetalLB, we'll be using the GitHub-hosted chart:
- Helm Repo URL: https://metallb.github.io/metallb
In the ArgoCD UI:
- Go to Settings → Repositories
- Click + CONNECT REPO
- Enter the Helm repo URL
- Choose Helm as the type
- Give the repo a name (optional)
- Choose the project you created earlier to associate this repo with (mine was 'prod-home')
- No authentication is needed for this public repo
- Once done, click CONNECT
Once added, ArgoCD can now pull charts from this source.
Note: You'll also need to add the GitHub repo that contains your custom configuration files, like Helm `values.yml` files and Kustomize overlays. If you're using my example repo, add https://github.com/leothelyon17/kubernetes-gitops-playground.git as another source, of type Git. If you're using your own repo, just make sure it's added in the same way so ArgoCD can pull your values and overlays when syncing.
Step 2: Create the ArgoCD Application
Head to the Applications tab and click + NEW APP to start the deployment.
Here’s how to fill it out:
- Application Name: metallb
- Project: Select your project (e.g., lab-home)
- Sync Policy: Manual for now (we'll automate later)
- Repository URL: Select the Helm repo you just added
- Chart Name: metallb
- Target Revision: Use the latest or specify a version (recommended once things are stable)
- Cluster URL: Use https://kubernetes.default.svc if deploying to the same cluster (mine might be different from the default, don't worry)
- Namespace: metallb-system (check the option to create it if it doesn't exist)
Click CREATE when finished.
If everything is in order you should see the app created like the screenshot below, though yours will show an all-yellow 'OutOfSync' status.
Click into the app and you'll see that ArgoCD has pulled in all the Kubernetes objects defined by the Helm chart. Everything will show as OutOfSync for now; ArgoCD knows what needs to be deployed, but we're not quite ready to hit sync just yet. You're doing great. Let's move on to the next step.
Step 3: Add the Kustomize Configuration Layer
For MetalLB, we're keeping things straightforward (kind of): the Helm chart gets deployed using its default values, so there's no need to touch `values.yml` here. But MetalLB still needs to be told how to operate: what IP ranges it can assign, and how it should advertise them. We handle that using a second source: a Kustomize overlay.
Here’s what to do next:
- In the ArgoCD UI, go to the Application you just created for MetalLB.
- Click the App details (🖉 edit) icon in the top right to open the manifest editor.
- Scroll down to the `source` section. You'll now be editing this app to include a second source.
- Add the following, under `sources:`, to include the Kustomize overlay for your MetalLB custom resources:
```yaml
project: prod-home
destination:
  server: https://prod-kube-vip.jjland.local:6443
  namespace: metallb-prod
syncPolicy:
  syncOptions:
    - CreateNamespace=true
sources:
  - repoURL: https://metallb.github.io/metallb
    targetRevision: 0.14.9
    chart: metallb
  - repoURL: https://github.com/leothelyon17/kubernetes-gitops-playground.git
    path: apps/metallb/overlays/lab
    targetRevision: HEAD
```
NOTE: `source` needs to be changed to `sources`, since there is now more than one source.
This tells ArgoCD to deploy not just the Helm chart, but also the additional Kubernetes objects (like IPAddressPool and L2Advertisement) defined in your overlay. These are located in your apps/metallb directory and should include a kustomization.yml that pulls them together.
Once saved, ArgoCD will treat both the Helm install and the Kustomize overlay as part of the same application, and sync them together.
Step 4: Sync the App
Once everything looks good, hit Sync. ArgoCD will pull the chart, merge/build your kustomize files, and deploy MetalLB into the cluster.
You can click into the app to watch MetalLB's resources come online: Deployments, ConfigMaps, the speaker DaemonSet, and more. If the sync fails on the first try, don't panic; just retry it. This can happen if the chart includes CRDs (Custom Resource Definitions), which sometimes cause the sync to complete out of order while the CRDs are still registering.
Once things settle, you should see the application status show “Healthy” and “Synced”. You’ll also see multiple healthy MetalLB pods running in your cluster, just like the screenshot above.
Congrats! MetalLB is now deployed and ready to hand out external IPs like a proper load balancer.
MetalLB Custom Configuration
I wanted to provide a breakdown of the custom MetalLB files I’m using and why. This directory contains a Kustomize overlay used to deploy MetalLB's custom configuration in a lab environment. It layers environment-specific resources, like IP pools and advertisements, on top of the base Helm chart deployment, following GitOps best practices.
File Breakdown
ip-address-pool.yml
Defines an `IPAddressPool` custom resource (a minimal example follows below):
- Specifies a range of IP addresses MetalLB can assign to LoadBalancer services
- Ensures services are reachable from the local network
- Helps avoid IP conflicts in your lab environment
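A minimal IPAddressPool might look like the following; the pool name, namespace, and address range are placeholders, so substitute a free range from your own network and the namespace you deployed MetalLB into:

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lab-pool               # placeholder name
  namespace: metallb-system    # match your MetalLB namespace
spec:
  addresses:
    - 192.168.1.240-192.168.1.250   # placeholder range on your LAN
```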
l2-advertisement.yml
Defines an `L2Advertisement` custom resource (sketch below):
- Tells MetalLB to advertise the IPs via Layer 2 (e.g., ARP)
- Perfect for home labs and bare metal where BGP isn't in use
- Allows MetalLB to function like a basic network-aware load balancer
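A matching L2Advertisement that announces the pool above might look like this (again, the names are placeholders; check the repo's own l2-advertisement.yml for the real values):

```yaml
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: lab-l2                 # placeholder name
  namespace: metallb-system
spec:
  ipAddressPools:
    - lab-pool                 # references the IPAddressPool defined above
```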
kustomization.yml
Kustomize overlay file (example below):
- Combines and applies the above resources
- Enables clean separation between base and environment-specific config
- Keeps your repo organized and scalable
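Given the file names above, the overlay's kustomization.yml can be as simple as something like:

```yaml
resources:
  - ip-address-pool.yml
  - l2-advertisement.yml
```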
Why It Matters
This overlay is what makes MetalLB actually work in your lab. While the Helm chart installs the MetalLB controller and speaker pods, these custom resources tell MetalLB what IPs to use and how to announce them to your network.
By keeping these files in Git and applying them via ArgoCD, you’re not just deploying MetalLB, you’re making your configuration declarative, version-controlled, and repeatable across environments.
Moving on…
Traefik: Ingress Routing Built for GitOps
Once MetalLB is in place and capable of handing out external IPs, we need something that can route incoming HTTP and HTTPS traffic to the right service inside the cluster. That’s where an ingress controller comes in, and for our GitOps setup, Traefik is a perfect fit.
What is Traefik?
Traefik is a modern, Kubernetes-native ingress controller that handles routing external traffic into your cluster based on rules you define in Kubernetes. It supports things like:
Routing traffic based on hostname or path
TLS termination (including Let’s Encrypt support)
Load balancing between multiple pods
Middleware support for things like authentication, redirects, rate limiting, etc.
Traefik is also highly compatible with GitOps workflows. It uses Kubernetes Custom Resource Definitions (CRDs) like `IngressRoute` and `Middleware`, which makes it easy to manage all of your ingress behavior declaratively, right from your Git repo.
Why You Need It
Without an ingress controller, every service you want to expose needs its own LoadBalancer service (i.e., a dedicated external IP). That scales poorly, especially in a lab environment with limited IP space.
Traefik solves that problem by letting you expose multiple services through a single external IP, usually on ports 80 and 443, by routing requests based on hostnames or paths. This means:
- You can access services like argocd.yourdomain.local and vault.yourdomain.local through the same IP (a short sketch follows this list).
- You get clean, centralized HTTPS management with built-in TLS support.
- You dramatically reduce the number of open ports and public IPs you need.
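To make that concrete, here's a hedged sketch of host-based routing: two IngressRoutes sharing Traefik's single external IP. The namespaces, service names, and ports are assumptions based on typical ArgoCD and Vault installs, not values taken from this post:

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: argocd
  namespace: argocd
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`argocd.yourdomain.local`)
      kind: Rule
      services:
        - name: argocd-server    # assumed ArgoCD service name
          port: 80
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: vault
  namespace: vault
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`vault.yourdomain.local`)
      kind: Rule
      services:
        - name: vault            # assumed Vault service name
          port: 8200
```

Both hostnames resolve to the same MetalLB-assigned IP; Traefik inspects the Host header and forwards each request to the right backend.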
Paired with MetalLB, Traefik becomes the front door to your entire GitOps platform.
How It Powers the Platform
Traefik is the gateway that makes all the services behind it easily and securely accessible. It enables you to:
Route HTTP/HTTPS traffic to services like ArgoCD, Vault, and your internal tools
Handle TLS (with optional Let’s Encrypt integration)
Define ingress behavior declaratively via CRDs
Share a single external IP across multiple services, using hostnames or paths
All of this is deployed using ArgoCD, meaning every route, certificate, and service exposure is version-controlled and reproducible.
Traefik Installation
As we covered during the MetalLB install, adding Helm repositories, creating the app in ArgoCD, and configuring the basic Helm parameters is mostly the same for each app we deploy. Because we've already gone through that in detail with MetalLB, I'll just briefly outline the steps again here. No detailed screenshots needed unless there’s a significant difference.
Step 1: Add the Traefik Helm Repo
ArgoCD needs to know where to pull the Traefik Helm chart from. For Traefik, we’ll use the official Traefik Helm repository:
Helm Repo URL: https://helm.traefik.io/traefik
In the ArgoCD UI:
- Navigate to Settings → Repositories
- Click + CONNECT REPO
- Enter the Traefik Helm repo URL listed above
- Select Helm as the repository type
- Provide a name (optional, something like traefik-charts)
- Associate the repo with the appropriate ArgoCD project (mine was lab-home)
- No authentication is required since this repo is publicly accessible
- Click CONNECT to finish
Once connected, ArgoCD is ready to deploy the Traefik Helm chart into your cluster.
Step 2: Create the ArgoCD Application (Traefik)
Head to the Applications tab in ArgoCD, and click + NEW APP to start deploying Traefik.
Here's how you'll fill it out:
- Application Name: traefik
- Project: Select your ArgoCD project
- Sync Policy: Manual (for now)
- Repository URL: Select the Traefik Helm repo you just connected
- Chart Name: traefik
- Target Revision: Use latest, or specify a stable version once you've tested and confirmed compatibility
- Cluster URL: Typically https://kubernetes.default.svc for an in-cluster deploy (if yours differs, just use the appropriate URL)
- Namespace: Use kube-system (check the option to create it if it doesn't exist yet)
Why the kube-system namespace?
Deploying Traefik to the `kube-system` namespace makes sense because Traefik is essentially a core infrastructure service. Placing it here aligns with Kubernetes best practices: core infrastructure and networking-related services belong in this namespace, clearly separated from user or application workloads.
When finished, click CREATE to finalize the setup.
Step 3: Add Custom Helm Values for Traefik
Unlike MetalLB, our Traefik deployment uses custom Helm values directly from our Git repository, without Kustomize. We'll define these custom values as a second source within our ArgoCD Application manifest.
Here's how you'll set this up in the ArgoCD UI:
- Navigate to the Traefik Application you created earlier.
- Click the App details (🖉 edit) icon in the top-right corner to open the manifest editor.
- Scroll down in the manifest and ensure you're using `sources:` (plural), since we're adding an additional source.
- Modify your ArgoCD Application manifest to look similar to this:
```yaml
project: home-lab
destination:
  server: https://172.16.99.25:6443
  namespace: kube-system
syncPolicy:
  syncOptions:
    - CreateNamespace=true
sources:
  - repoURL: https://helm.traefik.io/traefik
    targetRevision: 35.0.1
    helm:
      valueFiles:
        - $values/apps/traefik/values-lab.yml
    chart: traefik
  - repoURL: https://github.com/leothelyon17/kubernetes-gitops-playground.git
    targetRevision: HEAD
    ref: values
```
Explanation:
- The first source references the official Traefik Helm repository, specifying the chart version.
- The second source references my GitHub repo (or your own), where your custom Helm values (values-lab.yml) are stored.
- ArgoCD merges these values when syncing Traefik, allowing environment-specific customizations such as ingress rules, TLS settings, dashboard exposure, middleware options, and other important configurations.
Once you've updated and saved this manifest, ArgoCD will apply the changes, and Traefik will deploy using your customized configuration, all neatly managed by GitOps.
Step 4: Sync the Traefik Application
Once everything looks good, click Sync in ArgoCD. It will pull the Traefik Helm chart, merge your custom Helm values (values-lab.yml), and deploy Traefik into your cluster.
You can click into the application details to watch Traefik's resources spin up: Deployments, Services, IngressRoutes, and more. If the sync fails initially, don't worry; just retry it.
After a short period, you should see Traefik showing a status of “Healthy” and “Synced”. Verify that Traefik pods are running successfully in your cluster (similar to MetalLB earlier).
Congratulations! Traefik is now up and running as your ingress controller, ready to handle external HTTP(S) traffic into your cluster.
Traefik Custom Helm Values
Let's take a look at the custom Helm values we're using for Traefik, pulled from apps/traefik/values-lab.yml. These provide a simple but functional starting point for ingress, dashboard access, and authentication in a lab environment.
Key Configuration Highlights
IngressRoute for the Traefik Dashboard
```yaml
ingressRoute:
  dashboard:
    enabled: true
    matchRule: Host(`YOUR-URL`)
    entryPoints: ["web", "websecure"]
    middlewares:
      - name: traefik-dashboard-auth
```
- Enables the Traefik dashboard and exposes it via both HTTP and HTTPS.
- Routes traffic based on hostname (e.g., traefik-dashboard-lab.jjland.local).
- Adds a middleware for basic authentication to protect access.
Basic Authentication Middleware
```yaml
extraObjects:
  - apiVersion: v1                          # apiVersion/metadata added here for completeness
    kind: Secret
    metadata:
      name: traefik-dashboard-auth-secret   # name matches the middleware reference below
    type: kubernetes.io/basic-auth
    stringData:
      username: admin
      password: changeme
  - apiVersion: traefik.io/v1alpha1
    kind: Middleware
    metadata:
      name: traefik-dashboard-auth          # referenced by the dashboard IngressRoute above
    spec:
      basicAuth:
        secret: traefik-dashboard-auth-secret
```
- Creates a Kubernetes Secret with hardcoded credentials (admin / changeme).
- Defines a Traefik Middleware that references the secret and applies HTTP basic auth to protected routes.
NOTE: These credentials are hardcoded and intended only for lab/demo use. You should absolutely replace "changeme" with a strong, securely managed password, or better yet, use a more robust authentication mechanism in production.
Static LoadBalancer IP Assignment
```yaml
service:
  spec:
    loadBalancerIP: <YOUR IP SET ASIDE BY METALLB>
```
This assigns a specific external IP to Traefik's LoadBalancer service, ensuring stable access through MetalLB.
Accessing the Dashboard
Once deployed and synced in ArgoCD, you can access the Traefik dashboard by visiting the URL set in the custom values file.
To make this work:
- Add a DNS record (or local /etc/hosts entry) pointing to your Traefik service IP (in my case, 172.16.99.30).
- Use the credentials you set in the values file (admin / changeme) to log in via the basic auth prompt.
Why It Matters
This configuration gives you:
A working Traefik dashboard protected by basic auth
A predictable IP address exposed by MetalLB
A GitOps-managed ingress setup, all stored in Git and synced automatically via ArgoCD
These are just starter settings. They work great in a lab, but you’ll want to harden and expand them for production use. Still, even at this basic level, you’re getting all the core benefits: visibility, consistency, and version-controlled configuration.
Let’s move on to the next part of the platform.
Rook + Ceph: Persistent Storage for Stateful Applications
So far, we’ve deployed the pieces that make your platform accessible, MetalLB for external IPs, and Traefik for routing traffic. But modern platforms don’t just serve traffic, they store data. If you’re planning to run apps like Nautobot, NetBox, or Postgres, you’ll need reliable, persistent storage to keep data alive across restarts and node failures.
That’s where Rook + Ceph comes in.
What is Rook + Ceph?
Ceph is a distributed storage system that provides block, object, and file storage, all highly available and scalable. It’s used in enterprise environments for cloud-native storage, and it’s rock solid.
Rook is the Kubernetes operator that makes deploying and managing Ceph clusters easier and more native to the Kubernetes ecosystem. Together, they turn a set of disks across your nodes into a resilient, self-healing storage platform.
Why You Need It
Kubernetes doesn't come with a built-in storage backend. While it allows you to declare PersistentVolumeClaims, it's up to you to provide the actual storage behind them. In cloud environments, that's easy: just hook into EBS, Azure Disks, or whatever your platform provides. But in a lab or on-prem cluster? You're on your own.
Rook + Ceph fills that gap. Once deployed, it becomes your cluster's dynamic, self-healing storage layer. You can provision persistent volumes for any stateful workload—databases, internal tooling, monitoring stacks, and more—without having to manually manage local disks or worry about data loss.
How It Powers the Platform
Rook + Ceph is the backbone of persistent infrastructure in this setup. It enables you to:
- Create PersistentVolumes dynamically, on demand, using StorageClass definitions
- Run stateful apps like NetBox, Nautobot, PostgreSQL, and Prometheus with confidence
- Survive pod restarts and node reboots; your data stays intact and available
- Manage it all declaratively, deployed and version-controlled with ArgoCD, just like everything else
What This Looks Like When Deployed
Once your Rook + Ceph configuration is applied and the cluster becomes active, you’ll effectively have a resilient, distributed storage system spanning all your nodes. In this setup:
- Ceph stores data redundantly across all three nodes, similar in concept to a 3-node RAID-1 (mirrored) configuration.
- When one node goes offline or a disk fails, your data is still accessible and safe.
- The Ceph monitor daemons ensure quorum and cluster health, while OSDs (Object Storage Daemons) replicate data across your available storage devices (e.g., /dev/vdb on each node).
This redundancy is built-in and automatically managed by the Ceph cluster itself, no manual RAID configuration needed. It’s a core reason why Ceph is trusted in both enterprise and lab-scale deployments.
What We’re Deploying: The Operator + StorageCluster
As with many Kubernetes-native tools, Rook uses the Operator pattern to manage Ceph. We’ll be deploying two key components:
- The Rook-Ceph Operator – acts as a controller that manages Ceph-specific resources and keeps everything in the desired state.
- A CephCluster resource – defines how the storage backend should be built using the disks available across your nodes.
What’s an Operator?
A Kubernetes Operator is a purpose-built controller that manages complex stateful applications by watching for custom resources (like CephCluster) and continuously reconciling their desired state—creating, healing, scaling, and configuring everything automatically.
By deploying both the operator and the cluster config together, we get a hands-off, fully declarative storage setup. Everything is defined in Git, synced by ArgoCD, and managed by the operator—including provisioning, recovery, and upgrades.
Step 1: Add the Rook-Ceph Helm Repo
ArgoCD needs to know where to pull the Rook-Ceph Helm chart from. For this, we’ll use the official Rook Helm repository:
Helm Repo URL: https://charts.rook.io/release
In the ArgoCD UI:
- Navigate to Settings → Repositories
- Click + CONNECT REPO
- Enter the Helm repo URL listed above
- Select Helm as the repository type
- Optionally give it a name (e.g., rook-ceph-charts)
- Associate the repo with your ArgoCD project (mine was lab-home)
- No authentication is required since it's publicly accessible
- Click CONNECT to finish
Once connected, ArgoCD will be able to deploy both the Rook-Ceph operator and storage cluster using this chart.
Step 2: Create the ArgoCD Application (Rook-Ceph)
Now that the repo is connected, head to the Applications tab in ArgoCD and click + NEW APP to start the deployment.
Here’s how to fill it out:
- Application Name: rook-ceph
- Project: Select your ArgoCD project (e.g., lab-home)
- Sync Policy: Manual (for now)
- Repository URL: Select the Rook Helm repo you just connected
- Chart Name: rook-ceph
- Target Revision: Use latest, or pin to a stable version you've tested
- Cluster URL: Typically https://kubernetes.default.svc if deploying in-cluster
- Namespace: rook-ceph (check the box to create it if it doesn't exist)
Why the rook-ceph Namespace?
Rook and Ceph manage a lot of moving parts—monitors, OSDs, managers, etc.—and isolating those components into their own namespace (rook-ceph) helps keep your cluster clean and easier to troubleshoot. It also aligns with common community best practices and makes upgrades and deletions much safer.
Once you’ve filled everything out, click CREATE to finish provisioning the application.
Step 3: Add Custom Helm Values + Kustomize Overlay for Rook-Ceph
Rook-Ceph is one of the more complex components in our GitOps platform. It’s not just a single deployment, it involves multiple controllers, CRDs, and cluster-level storage logic. Because of that, we’ll be using both a Helm chart (with custom values) and a Kustomize overlay to deploy it cleanly and maintainably.
This dual-source approach lets us:
- Use the Helm chart to install the Rook-Ceph operator and core components
- Apply custom values to tailor behavior for our environment (resource tuning, monitor placement, dashboard settings, etc.)
- Layer in Kustomize-based manifests for complex resources like CephCluster, StorageClass, and CephFilesystem, resources that often require more precise control
ArgoCD Application Sources
When editing your ArgoCD Application manifest, your `sources` block will look similar to this:
```yaml
sources:
  - repoURL: https://charts.rook.io/release
    targetRevision: v1.17.0
    helm:
      valueFiles:
        - $values/apps/rook-ceph/values-lab.yml
    chart: rook-ceph
  - repoURL: https://github.com/leothelyon17/kubernetes-gitops-playground.git
    path: apps/rook-ceph/overlays/lab
    targetRevision: HEAD
    ref: values
```
Why Both Sources?
- The Helm chart deploys the operator and all required CRDs in the correct order.
- The Kustomize overlay (from your Git repo) contains environment-specific resources like:
  - CephCluster – the main storage cluster definition
  - StorageClass – so other apps can request storage using PersistentVolumeClaims
  - CephFilesystem – enables shared POSIX-compliant volumes for apps needing ReadWriteMany access
  - Optional extras like CephBlockPool or a toolbox deployment for CLI-based Ceph management (a CephBlockPool sketch follows below)
You can find these manifests in the repo under:
apps/rook-ceph/overlays/lab/
Once saved, ArgoCD will treat both sources as part of the same application and sync them together, ensuring everything is deployed in the right order and stays in sync with Git.
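For completeness, here's a hedged sketch of one of those optional extras, a CephBlockPool. Note that this lab's values-lab.yml disables the RBD CSI driver, so you'd only add a block pool (and re-enable that driver) if you actually need block storage:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool        # hypothetical name
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3                # mirror data across the three nodes
```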
Understanding the Rook-Ceph Overlay: Managing Complexity with GitOps
I wanted to cover this now before we try and sync. Setting up Rook-Ceph in a GitOps workflow involves more than just deploying a Helm chart. You’re orchestrating a sophisticated storage platform made up of tightly coupled components: an operator, CRDs, a distributed Ceph cluster, storage classes, ingress routes, and more. Each piece needs to be configured correctly and deployed in the proper order.
To keep all of this manageable and repeatable, we separate concerns using a combination of custom Helm values and a Kustomize overlay. The overlay found in apps/rook-ceph/overlays/lab brings together the critical resources required for a working Ceph deployment—block pools, shared filesystems, storage classes, and even a dashboard ingress.
The sections below break down each of these files so you can understand what’s happening, why it’s needed, and how it fits into the larger GitOps puzzle.
apps/rook-ceph/values-lab.yml
```yaml
csi:
  enableRbdDriver: false
```
Purpose: Disables the RBD (block-device) CSI driver in this lab setup, since we’re only using CephFS here.
Why it matters: Keeps the cluster lean by not installing unused CSI components.
apps/rook-ceph/overlays/lab/
ceph-cluster.yml
```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v19.2.1
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
  dashboard:
    enabled: true
  storage:
    useAllNodes: true
    useAllDevices: false
    deviceFilter: vdb
```
Defines the core CephCluster resource. Key settings:
- Runs 3 monitors for quorum.
- Uses each node's vdb device for OSDs (fits your lab VM disk layout).
- Enables the Ceph dashboard for visual health checks.
⚠️ NOTE: These settings are specific to my 3-node lab cluster, where each node has:
- One OS disk (vda)
- One dedicated Ceph data disk (vdb)
Example disk layout (lsblk output from one node):
```bash
[jeff@rocky9-lab-node1 ~]$ lsblk
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sr0            11:0    1  1.7G  0 rom
vda           252:0    0   50G  0 disk
├─vda1        252:1    0    1G  0 part /boot
└─vda2        252:2    0   49G  0 part
  ├─rl-root   253:0    0   44G  0 lvm  /
  └─rl-swap   253:1    0    5G  0 lvm
vdb           252:16   0  250G  0 disk
```
Your disk layout will likely be different. I've configured Ceph to use only the vdb disk via the deviceFilter setting to avoid accidentally wiping the OS disk.
⚠️ Be careful: If you don’t tailor these values to your hardware, you could unintentionally destroy existing data. Always verify your node’s disk setup and adjust your configuration accordingly.
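If your nodes don't all share the same disk layout, one hedged alternative is to drop useAllNodes/deviceFilter and list nodes and devices explicitly in the CephCluster spec. The node and device names below are examples only; substitute your own:

```yaml
# Alternative storage section for the CephCluster spec (example values)
storage:
  useAllNodes: false
  useAllDevices: false
  nodes:
    - name: rocky9-lab-node1   # must match the Kubernetes node name
      devices:
        - name: vdb            # dedicated data disk on that node
    - name: rocky9-lab-node2
      devices:
        - name: sdb            # a node whose data disk has a different name
```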
ceph-filesystem.yml
```yaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: k8s-ceph-fs
  namespace: rook-ceph
spec:
  metadataPool:
    failureDomain: host
    replicated:
      size: 3
  dataPools:
    - name: replicated
      failureDomain: host
      replicated:
        size: 3
  preserveFilesystemOnDelete: true
  metadataServer:
    activeCount: 1
    activeStandby: true
```
- Creates a CephFilesystem (CephFS) for shared, POSIX-style volumes.
- Why CephFS? Enables ReadWriteMany storage, which block pools alone can't provide.
ceph-storageclass-delete.yml & ceph-storageclass-retain.yml
Both define Kubernetes StorageClass objects that front the CephFS CSI driver:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs-delete   # or rook-cephfs-retain
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: k8s-ceph-fs
  pool: k8s-ceph-fs-replicated
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
reclaimPolicy: Delete        # or Retain
allowVolumeExpansion: true
```
Difference:
- rook-cephfs-delete will delete PV data when PVCs are removed.
- rook-cephfs-retain will retain data for manual cleanup or backup.
Why two classes? It gives you flexibility for different workloads (ephemeral test vs. persistent data); a sample PVC using one of these classes follows below.
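For example, a workload could request a shared volume from one of these classes with a PVC like this (the name and size are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-shared-data        # placeholder name
spec:
  accessModes:
    - ReadWriteMany                # possible because CephFS backs the class
  resources:
    requests:
      storage: 5Gi
  storageClassName: rook-cephfs-retain   # or rook-cephfs-delete
```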
ingress-route-gui.yml
```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: ceph-ingressroute-gui
  namespace: rook-ceph
spec:
  entryPoints:
    - web
    - websecure
  routes:
    - match: Host(`ceph-dashboard-lab.jjland.local`) # EXAMPLE
      kind: Rule
      services:
        - name: rook-ceph-mgr-dashboard
          port: 7000
```
Exposes the Ceph dashboard through Traefik on your chosen host.
Why: Lets you reach the Ceph UI (after DNS/hosts setup) without manually port-forwarding.
kustomization.yml
```yaml
resources:
  - ceph-cluster.yml
  - ingress-route-gui.yml
  - ceph-filesystem.yml
  - ceph-storageclass-delete.yml
  - ceph-storageclass-retain.yml
```
Aggregates all the above files into a single overlay that ArgoCD can sync.
Why Kustomize? Keeps base Helm installs separate from environment-specific definitions, making updates cleaner and more maintainable.
Step 4: Sync the Rook-Ceph Application
Ready? Go ahead and click Sync in ArgoCD for the rook-ceph application.
This one’s going to take a little more time, and for good reason. There’s a lot happening under the hood.
When you sync, ArgoCD will:
- Deploy the Rook-Ceph Operator, which is responsible for watching and managing Ceph resources in your cluster
- Install CephFS CSI drivers, RBAC roles, and CRDs needed to support persistent volumes
- Apply your CephCluster, CephFilesystem, and StorageClass definitions via the Kustomize overlay
But the real magic starts after the operator is running.
Once the operator is up, it will immediately start watching for additional Ceph custom resources in the rook-ceph namespace. When it discovers the CephCluster definition, it will:
- Initialize the monitors (MONs) for quorum
- Deploy the manager (MGR) for handling cluster state and the dashboard
- Start spinning up the OSDs (Object Storage Daemons) using the storage devices you specified (in this case, vdb on each node)
This process can take several minutes depending on your hardware, node performance, and the size of your disks.
How do you know it worked?
The cluster is healthy when you see:
- 3 running OSD pods, one for each disk across your 3 nodes
- The rook-ceph application status in ArgoCD shows "Healthy" and "Synced"
- Optionally: access the Ceph dashboard and verify health checks (covered earlier)
Troubleshooting Tips
Rook-Ceph is powerful, but complex. And with that complexity comes the potential for a lot of things to go sideways. I won’t dive into every failure mode here, but I’ll leave you with a few quick tips that can help when something’s not working as expected:
- Use the ArgoCD UI to inspect pod logs. Click into the rook-ceph application, navigate to the "PODS" tab, and use the logs view to get real-time output from key components like the operator, mons, OSDs, and mgr. Most issues will reveal themselves here.
- Resync the operator app to restart it. If the cluster gets stuck or fails to initialize certain pieces, manually syncing the operator application in ArgoCD will redeploy the pod. This is often enough to force a retry or pull in updated CRDs.
- Disk issues? If Ceph is skipping disks or refusing to reuse them, it's usually leftover metadata. Try running a full zap with ceph-volume, or fall back to wipefs, sgdisk, and dd to fully clean the disk.
Congratulations! Once everything is green, you now have a fully functional Ceph storage backend—redundant, self-healing, and fully managed through GitOps.
Secrets Management: External Secrets + HashiCorp Vault
In any production platform, secrets management isn’t optional, it’s foundational. We're talking about things like API tokens, database passwords, SSH keys, and TLS certs. Storing these directly in your Git repo? Not an option. Hardcoding them into manifests? Definitely not.
That’s where External Secrets and HashiCorp Vault come in, and together, they solve this problem the right way.
What is HashiCorp Vault?
Vault is a centralized secrets manager that securely stores, encrypts, and dynamically serves secrets to applications and users. It supports access control, auditing, and integration with identity systems and cloud providers. In this stack, Vault acts as the secure system of record for all sensitive data.
What is External Secrets?
External Secrets is a Kubernetes operator that bridges external secret stores (like Vault) with native Kubernetes Secret objects. It watches for custom resources like ExternalSecret and automatically pulls values from Vault into the cluster, keeping them updated and consistent without manual intervention.
Why Network Automation Needs This
Network automation platforms—like NetBox, Nautobot, and custom Python tooling—frequently need access to sensitive data:
Device credentials for SSH or API-based provisioning
Authentication tokens for systems like GitHub, Slack, or ServiceNow
Vaulted credentials for orchestrating changes via Ansible or Nornir
You don’t want these values floating around in plaintext in Git. But you still want to declare your intent (what secrets are needed and where) in version control. This is especially critical when you're deploying infrastructure with GitOps and need environments to be reproducible and secure.
With Vault + External Secrets, you can:
- Keep the actual secret values outside of Git
- Still declare your ExternalSecret manifests in Git as part of your ArgoCD-managed platform
- Let Kubernetes handle syncing and refreshing secrets automatically
This pattern ensures your network automation stack is secure, scalable, and compliant, without losing any GitOps benefits.
Installing External Secrets Operator
Setting up External Secrets is straightforward and follows the same pattern we've used throughout this platform. In this section, we'll deploy the External Secrets Operator using its official Helm chart with default values; no custom overlays or secret stores just yet.
Step 1: Add the Helm Repo
First, add the External Secrets Helm repository to ArgoCD:
- In the ArgoCD UI, go to Settings → Repositories
- Click + CONNECT REPO
- Fill in the following:
  - Type: Helm
  - URL: https://charts.external-secrets.io (the official External Secrets chart repo)
  - Name (optional): external-secrets
  - Project: Choose your ArgoCD project (e.g., lab-home)
  - Authentication: Leave empty (this is a public repo)
- Click CONNECT to save
Step 2: Create the ArgoCD Application
Navigate to Applications → + NEW APP, and fill out the form like this:
- Application Name: external-secrets
- Project: lab-home (or your equivalent)
- Sync Policy: Manual
- Repository URL: Select the Helm repo you just added
- Chart: external-secrets
- Target Revision: latest (or a specific version like 0.16.1)
- Cluster URL: https://kubernetes.default.svc
- Namespace: external-secrets (check the box to create the namespace if it doesn't exist)
Click CREATE to finish.
Step 3: Sync the Application
Once the app is created, hit SYNC in the ArgoCD UI. This will:
- Deploy the External Secrets Operator into your cluster
- Create the necessary CRDs and controller components
- Make the ExternalSecret, SecretStore, and ClusterSecretStore resource types available
You should see the app enter a Synced and Healthy state once everything is up and running. No custom values or overlays are needed at this stage.
Installing HashiCorp Vault
Vault is our centralized secrets store, and in this setup we’re deploying it with two main goals in mind:
Enable its built-in GUI for easy inspection and management
Ensure secret data is persisted using our Rook-Ceph-backed StorageClass
To accomplish this, we'll combine a Helm-based deployment with a Kustomize overlay that adds a Traefik IngressRoute for secure browser access.
Step 1: Add the Helm Repo
Add the official HashiCorp Helm chart repo to ArgoCD:
- In the ArgoCD UI, go to Settings → Repositories
- Click + CONNECT REPO
- Fill in:
  - Type: Helm
  - URL: https://helm.releases.hashicorp.com
  - Project: lab-home (or whatever you're using)
  - Authentication: Leave blank (public repo)
- Click CONNECT to save
Step 2: Prepare Your Vault Application
Vault is more stateful and config-heavy than most apps, so we’re using two sources in our ArgoCD Application:
A Helm chart to install Vault and enable persistent storage
A Kustomize overlay that exposes the Vault UI through Traefik
Here’s an example Application manifest (adjust values as needed for your setup):
```yaml
project: lab-home
destination:
  server: https://kubernetes.default.svc
  namespace: vault
syncPolicy:
  syncOptions:
    - CreateNamespace=true
sources:
  - repoURL: https://helm.releases.hashicorp.com
    chart: vault
    targetRevision: 0.30.0 # or latest stable
    helm:
      valueFiles:
        - $values/apps/hashicorp-vault/values-lab.yml
  - repoURL: https://github.com/leothelyon17/kubernetes-gitops-playground.git
    targetRevision: HEAD
    path: apps/hashicorp-vault/overlays/lab
    ref: values
```
Note: The Git repo and folder structure here are based on my kubernetes-gitops-playground. If you're using your own repo, be sure to adjust the repoURL, path, and valueFiles references accordingly.
Step 3: Custom Helm Values
In your Git repo, the file at apps/hashicorp-vault/values-lab.yml (matching the path referenced in the manifest above) should enable:
- The Vault UI (ui.enabled: true)
- Persistent storage via the Rook-Ceph-backed StorageClass you created earlier
Example configuration:
```yaml
server:
  dataStorage:
    enabled: true
    # Size of the PVC created
    size: 1Gi
    # Location where the PVC will be mounted.
    mountPath: "/vault/data"
    # Name of the storage class to use. If null it will use the
    # configured default Storage Class.
    storageClass: rook-cephfs-retain
    # Access Mode of the storage device being used for the PVC
    accessMode: ReadWriteOnce

# Vault UI
ui:
  enabled: true
```
Step 4: Expose Vault Securely with Traefik
In your apps/hashicorp-vault/overlays/lab directory (again matching the manifest above), define a Kustomize file to expose the UI via Traefik.
Example: kustomization.yml
```yaml
resources:
  - ingress-route-gui.yml
```
And in ingress-route-gui.yml:
```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: vault-dashboard
  namespace: vault
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`vault-lab.jjland.local`) # EXAMPLE
      kind: Rule
      services:
        - name: vault
          port: 8200
```
Note: vault-lab.jjland.local is an example hostname used in my lab. If you're following along exactly, feel free to use it; just be sure to add a local DNS or /etc/hosts entry that maps it to your cluster's ingress IP. Otherwise, replace this hostname with one appropriate for your environment.
Step 5: Sync the Application
Once your Helm values and Kustomize overlay are in place and committed to Git, go ahead and sync the Vault application from ArgoCD.
ArgoCD will deploy all Vault components into the vault namespace, including:
The StatefulSet for the Vault server
The service account, RBAC roles, and services
A PersistentVolumeClaim (PVC) for storing Vault data
Your custom IngressRoute for exposing the GUI
After syncing, head to the Vault app in ArgoCD to verify the following:
The app status should be Synced
The PVC should be Bound and Healthy
The main Vault pod will likely remain in a Progressing state; this is expected
That “Progressing” status is normal because Vault isn’t fully initialized yet. It won’t report itself as ready until it has been manually initialized and unsealed for the first time.
Before moving forward, it’s a good idea to:
- Inspect the pod logs in the ArgoCD UI if anything seems stuck
- Check kubectl get pvc -n vault to confirm the PVC is attached and healthy
- Use kubectl describe pod or kubectl describe pvc to troubleshoot issues
If all looks good, navigate to the Vault UI in your browser:
https://vault-lab.jjland.local # EXAMPLE
If you're using a different hostname, be sure you've created the appropriate DNS or /etc/hosts entry.
From the web UI, you can initialize Vault, generate unseal keys, and perform the first unseal operation, all interactively.
Initializing Vault Through the GUI
Once the Vault UI is accessible, it’s time to initialize the system. Vault doesn’t become “ready” until this step is completed, and it only needs to be done once per cluster.
Step 1: Open the Vault UI
Navigate to the Vault dashboard in your browser:
https://vault-lab.jjland.local
(Or your custom hostname if you’re using a different setup.)
You’ll be presented with a message that Vault has not yet been initialized. Click the “Initialize” button to begin the process.
Step 2: Generate Unseal Keys
The GUI will prompt you to configure key shares and key threshold. Leave these at the defaults unless you have a specific security model in mind:
- Key Shares: 5
- Key Threshold: 3
This means Vault will generate 5 unseal keys, and any 3 of them will be required to unseal the Vault.
Click "Initialize" to proceed. Vault will generate a JSON file containing:
The root token (used to log in as admin)
All 5 unseal keys
Download this file immediately and store it in a secure location. These keys cannot be recovered later.
⚠️ Do not skip this download. If you lose these keys before unsealing, you’ll have to wipe and redeploy Vault from scratch.
Step 3: Unseal the Vault
After downloading the key file, Vault will prompt you to enter the unseal keys one by one.
Copy a single unseal key from the JSON file
Paste it into the field and click “Unseal”
Repeat with two more keys (for a total of 3)
Once the required threshold is met, Vault will unlock and become active.
Step 4: Log In with the Root Token
After unsealing, return to the login screen and paste in the root token from your downloaded JSON file.
Once logged in, you’ll have full admin access to Vault.
Step 5: Verify in ArgoCD
Flip back to the ArgoCD UI and check the status of the Vault application. At this point, the main pod should switch from Progressing to Healthy, and your application should show as fully operational.
You're now ready to configure Vault as a backend for External Secrets, so your GitOps-managed workloads can securely retrieve credentials, tokens, and other sensitive data on demand.
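As a preview, wiring the two together usually means a (Cluster)SecretStore that points at Vault plus an ExternalSecret that maps a Vault path into a Kubernetes Secret. The sketch below is illustrative only; the auth method, paths, and names are assumptions rather than values from this post's repo, and the API version may differ slightly between operator releases:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault-backend                       # hypothetical name
spec:
  provider:
    vault:
      server: http://vault.vault.svc:8200   # in-cluster Vault service
      path: secret                          # KV mount to read from
      version: v2
      auth:
        tokenSecretRef:                     # token auth for simplicity; Kubernetes auth is more common in production
          name: vault-token
          key: token
          namespace: external-secrets
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: device-credentials                  # hypothetical
  namespace: default
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend
  target:
    name: device-credentials                # Kubernetes Secret the operator creates
  data:
    - secretKey: password
      remoteRef:
        key: network/devices                # example Vault path
        property: password
```

Once applied, the operator keeps the resulting Kubernetes Secret in sync with whatever is stored in Vault.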
This completes Part 2 of this series.
Summary & What’s Next
In Part 2, we took our GitOps foundation and turned it into a functional, production-capable platform. We integrated critical infrastructure components like MetalLB for external access, Traefik for routing, Rook-Ceph for persistent storage, and a full-fledged secrets management stack using External Secrets and HashiCorp Vault, all deployed declaratively using ArgoCD.
At this point, you have a GitOps-powered Kubernetes environment that’s capable of:
Exposing services securely with external IPs and ingress rules
Persisting data across workloads using Ceph-backed volumes
Managing secrets securely without embedding them in Git
Deploying and managing infrastructure the same way you'll deploy apps: as code
This platform is now ready to host real-world applications, whether it’s NetBox, Nautobot, or custom tooling built for your network automation workflows.
In Part 3, we’ll finally do just that: deploy a real application on top of everything we’ve built. I haven’t finalized which app we’ll use yet, but it’ll be something practical and network-engineer focused. Stay tuned and thank you for reading!