How I Recovered a Broken MariaDB Replica in Kubernetes (Bitnami + K8s)

Recently, I ran into a frustrating issue with our MariaDB setup in Kubernetes, deployed with the Bitnami Helm chart. One of the read replicas (the secondary node) kept restarting with exit code 139, which usually indicates a segmentation fault. Meanwhile, the primary node was unaffected and kept working fine.
The Situation
We had a Bitnami MariaDB setup running in a Kubernetes cluster with a primary-secondary replication topology. Everything had been running fine for months until one day, I noticed that the secondary pod was crash-looping with this:
Exit Code: 139 (Segmentation fault)
Not super helpful.
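If you want to see where an exit code like that is actually coming from, the pod's last state and its previous container logs are the first things to check. Something like this, where the namespace and pod name are placeholders for whatever your Helm release created:

# Hypothetical namespace/pod name; adjust to your release
kubectl -n databases describe pod mariadb-secondary-0 | grep -A 5 'Last State'
kubectl -n databases logs mariadb-secondary-0 --previous

The describe output shows the Last State block with the exit code, and --previous gives you the logs from the container that just crashed.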
What I Tried (That Didn't Work)
Since the primary was healthy, I figured I could just rebuild the secondary.
I mounted the secondary PVC (which was still referenced, but whose data I had already cleared earlier in a panic).
Tried doing a mysqldump from the primary and restoring it into the secondary.
Problem: The dump was way too big and took too long.
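For context, that attempt looked roughly like this; the hostnames and the password variable are placeholders, not our real values:

# The slow path: logical dump and restore (hypothetical hosts/credentials)
# Run from somewhere that can reach both MariaDB services
mysqldump -h mariadb-primary -u root -p"$MARIADB_ROOT_PASSWORD" \
  --all-databases --single-transaction --master-data=2 > full-dump.sql
mysql -h mariadb-secondary -u root -p"$MARIADB_ROOT_PASSWORD" < full-dump.sql

On a dataset this size it was simply too slow to be practical.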
What Finally Worked
Here's the step-by-step of what actually worked to recover the replica:
1. Identify Master Status on the Primary
Logged into the primary MariaDB instance and captured the binlog position:
SHOW MASTER STATUS;
This gave me:
File: mysql-bin.000123
Position: 456789
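If you'd rather not open a shell inside the pod, the same thing works from outside. The namespace, pod name, and secret name below are assumptions based on the Bitnami chart's usual naming, so check yours first:

# Hypothetical namespace/release names; the Bitnami chart keeps the root password in a Secret
MARIADB_ROOT_PASSWORD=$(kubectl -n databases get secret mariadb \
  -o jsonpath='{.data.mariadb-root-password}' | base64 -d)
kubectl -n databases exec mariadb-primary-0 -- \
  mysql -u root -p"$MARIADB_ROOT_PASSWORD" -e 'SHOW MASTER STATUS;'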
2. Lock Tables on the Primary
FLUSH TABLES WITH READ LOCK;
Leave this session open; it holds the read lock.
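In Kubernetes terms, that means keeping an interactive exec session open in a separate terminal for the whole copy, because the lock is released the moment that client disconnects. Roughly (pod name is a placeholder):

# Keep this terminal open until the rsync in step 3 has finished;
# the read lock is dropped as soon as this mysql session ends
kubectl -n databases exec -it mariadb-primary-0 -- mysql -u root -p

Then run FLUSH TABLES WITH READ LOCK; at that prompt and leave it alone.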
3. Use rsync to Copy the Data
With both primary and secondary PVCs mounted in a helper pod, I ran:
rsync -a /mnt/primary-data/ /mnt/secondary-data/
This is faster than mysqldump and preserves internal replication metadata.
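The helper pod was nothing fancy: a throwaway pod with both PVCs mounted and rsync installed. Here's a minimal sketch, assuming PVC names like the ones the Bitnami chart typically generates (check kubectl get pvc for your actual claim names, and note that with ReadWriteOnce storage you may need to schedule this pod onto the right node or scale the broken secondary down first):

# Hypothetical PVC/namespace names; verify with: kubectl -n databases get pvc
kubectl -n databases apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: mariadb-rescue
spec:
  containers:
  - name: rescue
    image: debian:stable-slim
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: primary-data
      mountPath: /mnt/primary-data
    - name: secondary-data
      mountPath: /mnt/secondary-data
  volumes:
  - name: primary-data
    persistentVolumeClaim:
      claimName: data-mariadb-primary-0
  - name: secondary-data
    persistentVolumeClaim:
      claimName: data-mariadb-secondary-0
EOF
kubectl -n databases wait --for=condition=Ready pod/mariadb-rescue
kubectl -n databases exec mariadb-rescue -- \
  bash -c 'apt-get update && apt-get install -y rsync && rsync -a /mnt/primary-data/ /mnt/secondary-data/'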
4. Configure the Replica
Once the copy completed, I configured the replica to start replication:
CHANGE MASTER TO
MASTER_HOST='primary-hostname',
MASTER_USER='replicator',
MASTER_PASSWORD='repl-password',
MASTER_LOG_FILE='mysql-bin.000123',
MASTER_LOG_POS=456789;
START SLAVE;
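If you're driving this from outside the pod, the same statements can go through a one-shot exec; a sketch, with the pod name and credentials as placeholders:

# Hypothetical pod name; MARIADB_ROOT_PASSWORD as fetched in step 1
kubectl -n databases exec mariadb-secondary-0 -- \
  mysql -u root -p"$MARIADB_ROOT_PASSWORD" -e "
    CHANGE MASTER TO
      MASTER_HOST='mariadb-primary',
      MASTER_USER='replicator',
      MASTER_PASSWORD='repl-password',
      MASTER_LOG_FILE='mysql-bin.000123',
      MASTER_LOG_POS=456789;
    START SLAVE;"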
5. Unlock the Primary
Go back to the terminal holding the read lock and run:
UNLOCK TABLES;
6. Confirm Replication is Working
On the secondary:
SHOW SLAVE STATUS\G
You should see:
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0
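The same check from outside the pod, trimmed down to the three fields that matter (pod name is again a placeholder):

# Hypothetical pod name; filters SHOW SLAVE STATUS down to the interesting fields
kubectl -n databases exec mariadb-secondary-0 -- \
  mysql -u root -p"$MARIADB_ROOT_PASSWORD" -e 'SHOW SLAVE STATUS\G' \
  | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'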
Key Takeaways
Segfault (code 139) can be a symptom of a broken MariaDB datadir, especially after PVC loss
rsync is faster and more robust than mysqldump for full-state syncing
Lock the primary when syncing to avoid inconsistent data
Don't forget to note the binlog file and position before syncing
Bitnami MariaDB replicas work beautifully if the initial state is consistent
Hopefully, this helps someone avoid a full day of trial and error!
Let me know if you've had similar horror stories in Kubernetes + databases!