Nomad is dead, long live Kubernetes


The time has come: my homelab, for which I have been writing how-tos and occasional status updates on this blog, has been serving me very well, but the k8s ecosystem has reached maturity and broad acceptance, and my Nomad cluster raises questions and eyebrows more often than I'd like to admit. I think the time has come to face the music…
Here is my Nomad cluster, spread around Oracle Cloud and my basement.
Actually, “basement” is not really accurate as of today, since we are still converting it into a fitness + movie room, so my remote-working office (in the attic) is the actual “basement” in this story, temporarily.
Nodes I use:
cloud4 is a big arm64 machine that Oracle gives for free;
cloud5 is a small amd64 machine that Oracle gives for free;
ixion and pluto are RPi 4s;
oberon is a RPi 5;
io is a VM on my Proxmox (explained later).
What are the steps I have taken so far in the migration process?
Introduce an empty k8s cluster
This has been done. I got myself a Ryzen 7 mini PC last year, configured Proxmox on it and segmented its 64 GB / 16 vCPUs into:
one more Linux VM for Nomad, since I was already short on resources (that's the io from the screenshot);
one “control” k3s node (using Proxmox support for LXC);
one “worker” k3s node where work will be scheduled (also LXC; see the bootstrap sketch below).
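For reference, the k3s part of that setup boils down to the standard server/agent bootstrap. A minimal sketch, assuming the stock install script (I followed a tutorial, and the LXC-specific tweaks are not shown here):

```sh
# On the "control" node: install the k3s server
curl -sfL https://get.k3s.io | sh -

# Grab the join token from the control node
sudo cat /var/lib/rancher/k3s/server/node-token

# On the "worker" node: join the cluster as an agent
curl -sfL https://get.k3s.io | K3S_URL=https://<control-node-ip>:6443 K3S_TOKEN=<node-token> sh -
```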
Basic infra setup
Here are some of the steps I have taken so far, after configuring the k3s nodes on my Proxmox using a tutorial I found. Not in a strictly chronological order, but close enough to reality that it makes sense…
I installed the latest Helm as of today. It's a pretty important part of the modern k8s experience and it just makes sense to have it from the start.
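Installing it is a one-liner if you use the official script (this is the documented get-helm-3 script, nothing specific to my setup):

```sh
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
```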
I then installed cert-manager, using Helm as well. Very cool, but it cost me a couple of hairs until I made it work. It made me rethink the entire damn migration, since so many sources talked about different aspects/approaches/ingresses. But I persevered and it… just works now. Certificates are provided via Cloudflare, which was already there, so I can easily access the server.
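The gist of it, as a sketch (the issuer name, email and secret names are placeholders, and the chart flag for installing CRDs differs between cert-manager versions):

```sh
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --set crds.enabled=true
```

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-cloudflare          # hypothetical issuer name
spec:
  acme:
    email: me@example.com               # placeholder
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-cloudflare-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token  # Secret with a Cloudflare API token, created separately
              key: api-token
```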
Then I installed ArgoCD with an nginx ingress, since my understanding is that it has basically become super-popular for GitOps. I think there's also Flux, but we're using ArgoCD at work, so I thought “no, Milan, you will not again choose another non-mainstream approach just for the sake of a principle”.
In hindsight, I should’ve used Helm for setting up ArgoCD. I just used their default install scripts. I will migrate later, I guess.
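Those default install scripts are, I believe, just the documented getting-started manifests, i.e. something along these lines:

```sh
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
```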
Gitea remains my personal self-hosted git VCS, also for this new ArgoCD use case - so that's where Argo will find its Application manifests. It's still deployed in my homelab, just using Ansible, as one of those core things that for some specific reason I couldn't configure as a Nomad job (I forgot why; it's been like that for years in my homelab, which translates into centuries in normal-people time).
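As a sketch, an ArgoCD Application pointing at a repo in that Gitea looks roughly like this (repo URL, path and app name are made up for illustration):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: n8n                       # hypothetical example app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitea.example.lan/milan/homelab-k8s.git   # placeholder Gitea URL
    targetRevision: main
    path: apps/n8n
  destination:
    server: https://kubernetes.default.svc
    namespace: n8n
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```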
I then added the latest HashiCorp Vault as well, using Helm. I am still not happy about this, since I probably simplified the Helm values too much and at this time it still can’t provide secrets for other pods, but that’s just because I needed to learn Vault for a project at work (I needed the service up and running with its API exposed on an ingress, and that's it).
Interesting tidbit: I used Cursor for the first time in my life for Vault Helm setup 🎉.
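The values I ended up with are roughly in this spirit - a minimal sketch, not my actual file (the hostname is a placeholder; these are standard keys of the hashicorp/vault chart):

```yaml
# values.yaml for the hashicorp/vault chart
server:
  standalone:
    enabled: true
  ingress:
    enabled: true
    hosts:
      - host: vault.example.lan   # placeholder internal hostname
ui:
  enabled: true
injector:
  enabled: false                  # disabled for now, hence no secret injection into other pods yet
```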
I added the Tailscale operator so that I can access the cluster from outside my home as well. This is a cool thing, since I can access it even from my Android smartphone using e.g. the kubenav app. This is where I was actually saying to myself…
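The operator install is documented by Tailscale and boils down to a Helm chart plus an OAuth client (the client id/secret placeholders below come from the Tailscale admin console):

```sh
helm repo add tailscale https://pkgs.tailscale.com/helmcharts
helm repo update
helm upgrade --install tailscale-operator tailscale/tailscale-operator \
  --namespace=tailscale --create-namespace \
  --set-string oauth.clientId=<oauth-client-id> \
  --set-string oauth.clientSecret=<oauth-client-secret>
```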
I added Grafana Alloy so that it can push all the logs into my existing Grafana Loki (self-hosted in my main Nomad cluster). I was already using Promtail in my (soon-to-be-deprecated) cluster for system stuff and Vector for Nomad logs, but apparently Alloy is the new, future-proof thing in the Grafana stack, so I went with it.
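The relevant part of the Alloy pipeline is short - a sketch, with the Loki address as a placeholder: discover pods, tail their logs through the Kubernetes API, push to the Loki that still lives on the Nomad side.

```alloy
discovery.kubernetes "pods" {
  role = "pod"
}

loki.source.kubernetes "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [loki.write.nomad_loki.receiver]
}

loki.write "nomad_loki" {
  // Loki still running in the Nomad cluster; placeholder address
  endpoint {
    url = "http://loki.example.lan:3100/loki/api/v1/push"
  }
}
```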
Finally, I thought about what to do with my Grafana and my InfluxDB. I thought about these for a long time… the decision I made is: migrate Grafana into k8s gradually, just like any other app, but don’t carry InfluxDB over. I already have Prometheus, so I will go with that for my k8s metrics monitoring. I just exposed it on the internal ingress and registered it in the existing Grafana as a data source.
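Exposing Prometheus internally is just a plain Ingress - a sketch with placeholder hostnames, reusing the cert-manager issuer from above (the Service name depends on how Prometheus was installed):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus
  namespace: monitoring                         # assumed namespace
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-cloudflare
spec:
  ingressClassName: nginx
  rules:
    - host: prometheus.example.lan              # placeholder internal hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-server         # depends on the Prometheus install
                port:
                  number: 80
  tls:
    - hosts:
        - prometheus.example.lan
      secretName: prometheus-tls
```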
Set up a PoC with a small service
I wasn't sure what to migrate. A simple thing? At some point nothing is simple: even stateless services appear in my ingress Caddyfile via Consul Templates and get registered into Cloudflare DNS for Tailscale IPs and into local dnsmasq for LAN IP overrides.
I decided to go with the n8n service, since it is just complex enough to holistically check many things:
logs must appear in my Grafana Loki;
a certificate needs to be issued;
external DNS and Caddy should forward webhooks into n8n;
internal DNS should expose the Web UI;
deployment should work automatically using ArgoCD;
bugs like this one should be fixed by renaming the service from n8n;
NFS on my ancient Synology should be used for the persistent volume.
I still have a small but important optimization left: instead of using NFS only as a backup destination (to which I push the locally mounted files once per day), I should “just trust the cluster” and let it use NFS directly all the time (a sketch of that volume setup follows).
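For reference, the NFS-backed volume for n8n looks more or less like this - a sketch, with the Synology address, export path and size made up:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: n8n-data
spec:
  capacity:
    storage: 5Gi                     # placeholder size
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: synology.example.lan     # the ancient Synology, placeholder address
    path: /volume1/k8s/n8n           # placeholder export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: n8n-data
  namespace: n8n
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""               # bind to the static PV above, no dynamic provisioning
  volumeName: n8n-data
  resources:
    requests:
      storage: 5Gi
```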
Future plan
The master plan is to drain the ixion node of all Nomad jobs and introduce it into the cluster as a new k3s node. Basically, let it disappear from Consul and from Nomad node listings.
To do that, I will have to migrate complex work onto another node (there's still space for that extra work, luckily) and migrate simple work into k8s, just as I did with the n8n service. Finally, I will then be able to introduce ixion as a new worker k3s node into the Kubernetes cluster.
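The Nomad side of that step is standard - a sketch with a placeholder node ID: mark ixion ineligible for new allocations, then drain whatever is still running on it.

```sh
nomad node eligibility -disable <ixion-node-id>
nomad node drain -enable -yes <ixion-node-id>
```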
Then, one by one, all the rest. I think this will be done gradually during 2025, whenever I find time, but the idea is to go into 2026 without Consul/Nomad. That’s the plan, at least…