How I Recovered a Broken MariaDB Replica in Kubernetes (Bitnami + K8s)

Recently, I ran into a frustrating issue with our MariaDB setup in Kubernetes using the Bitnami Helm chart. One of the read replicas (the secondary node) was stuck in a restart loop with exit code 139, which usually means a segmentation fault. Meanwhile, the primary node was unaffected and kept working fine.

πŸ” The Situation

We had a Bitnami MariaDB setup running in a Kubernetes cluster with a primary-secondary replication topology. Everything had been running fine for months until one day, I noticed that the secondary pod was crash-looping with this:

Exit Code: 139 (Segmentation fault)

Not super helpful.
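
In case it's useful, here's roughly how to confirm where that exit code comes from. The pod name and label are what a default Bitnami install would give you; yours may differ.

# Watch the secondary crash-loop (Bitnami chart label assumed)
kubectl get pods -l app.kubernetes.io/name=mariadb

# Inspect the last terminated state of the container (pod name assumed)
kubectl describe pod mariadb-secondary-0 | grep -A 5 "Last State"

# Logs from the previous (crashed) container instance
kubectl logs mariadb-secondary-0 --previous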

πŸ› οΈ What I Tried (That Didn't Work)

Since the primary was healthy, I figured I could just rebuild the secondary.

  1. I mounted the secondary PVC (it was still referenced, but I had already cleared its data earlier in a panic).

  2. Tried doing a mysqldump from the primary and restoring into the secondary.

    • Problem: The dump was way too big and took too long.
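
For reference, the dump-and-restore attempt looked roughly like this (host names and credentials are placeholders, not the actual values from our setup):

# Dump everything from the primary
mysqldump -h mariadb-primary -u root -p --all-databases --single-transaction > /tmp/full-dump.sql

# Restore into the (empty) secondary
mysql -h mariadb-secondary -u root -p < /tmp/full-dump.sql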

βœ… What Finally Worked

Here’s the step-by-step of what actually worked to recover the replica:

1. πŸ”‘ Identify Master Status on the Primary

Logged into the primary MariaDB instance and captured the binlog position:

SHOW MASTER STATUS;

This gave me:

File: mysql-bin.000123
Position: 456789
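
If you're going through kubectl, something along these lines works. The secret name, secret key, and pod name are assumptions based on a default Bitnami install.

# Pull the root password from the chart's secret (secret name/key assumed)
ROOT_PW=$(kubectl get secret mariadb -o jsonpath='{.data.mariadb-root-password}' | base64 -d)

# Read the binlog coordinates from the primary (pod name assumed)
kubectl exec mariadb-primary-0 -- mysql -u root -p"$ROOT_PW" -e "SHOW MASTER STATUS\G"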

2. 🧊 Lock Tables on the Primary

FLUSH TABLES WITH READ LOCK;

Leave this session open; it holds the read lock. It's also worth re-running SHOW MASTER STATUS once the lock is in place, since writes can move the binlog position between steps 1 and 2.
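
One Kubernetes-specific gotcha: the lock only lives as long as the client session, so if the kubectl exec session that took it drops, the lock is released. The safest way I know is to keep a dedicated interactive session open in a separate terminal, roughly like this (pod name assumed):

# Open an interactive session on the primary and keep this terminal open
kubectl exec -it mariadb-primary-0 -- mysql -u root -p

# Inside that session, take the lock and do not exit until the copy is done
FLUSH TABLES WITH READ LOCK;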

3. 🚚 Use rsync to Copy the Data

With both primary and secondary PVCs mounted in a helper pod, I ran:

rsync -a /mnt/primary-data/ /mnt/secondary-data/

βœ… This is faster than mysqldump and preserves internal replication metadata.
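
If it helps, here's a minimal sketch of what such a helper pod can look like. The StatefulSet, PVC, and image names are assumptions based on a default Bitnami install, so adjust them to your release; with ReadWriteOnce volumes the helper pod may also need to land on the same node as the primary.

# Free up the secondary's PVC first (StatefulSet name assumed)
kubectl scale statefulset mariadb-secondary --replicas=0

# Throwaway pod that mounts both PVCs (PVC names assumed)
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: mariadb-copy-helper
spec:
  containers:
    - name: copy
      image: alpine:3.19
      command: ["sleep", "3600"]   # keep the pod alive long enough to run the copy
      volumeMounts:
        - name: primary-data
          mountPath: /mnt/primary-data
        - name: secondary-data
          mountPath: /mnt/secondary-data
  volumes:
    - name: primary-data
      persistentVolumeClaim:
        claimName: data-mariadb-primary-0
    - name: secondary-data
      persistentVolumeClaim:
        claimName: data-mariadb-secondary-0
EOF

# Install rsync inside the helper and run the copy (add --delete if the target isn't empty)
kubectl exec -it mariadb-copy-helper -- sh -c 'apk add --no-cache rsync && rsync -a /mnt/primary-data/ /mnt/secondary-data/'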

4. πŸ”„ Configure the Replica

Once the copy completed, I configured the replica to start replication:

CHANGE MASTER TO
  MASTER_HOST='primary-hostname',
  MASTER_USER='replicator',
  MASTER_PASSWORD='repl-password',
  MASTER_LOG_FILE='mysql-bin.000123',
  MASTER_LOG_POS=456789;

START SLAVE;
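
In kubectl terms, that means cleaning up the helper, bringing the secondary back up, and running those statements inside it. The names below match the helper-pod sketch from step 3 and are assumptions; the replication user and password are whatever you configured in the chart.

# Remove the helper pod and bring the secondary back
kubectl delete pod mariadb-copy-helper
kubectl scale statefulset mariadb-secondary --replicas=1

# Open a mysql session on the secondary and run the CHANGE MASTER TO / START SLAVE above
kubectl exec -it mariadb-secondary-0 -- mysql -u root -p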

5. πŸ”“ Unlock the Primary

Go back to the terminal holding the read lock and run:

UNLOCK TABLES;

6. βœ… Confirm Replication is Working

On the secondary:

SHOW SLAVE STATUS\G

You should see:

Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0
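
A quick non-interactive way to check the same fields (reusing the ROOT_PW variable from the step 1 sketch; pod name assumed):

kubectl exec mariadb-secondary-0 -- mysql -u root -p"$ROOT_PW" -e "SHOW SLAVE STATUS\G" \
  | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'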

πŸ’‘ Key Takeaways

  • Segfault (code 139) can be a symptom of a broken MariaDB datadir, especially after PVC loss

  • rsync is faster and more robust than mysqldump for full-state syncing

  • Lock the primary when syncing to avoid inconsistent data

  • Don't forget to note the binlog file and position before syncing

  • Bitnami MariaDB replicas work beautifully if the initial state is consistent


Hopefully, this helps someone avoid a full day of trial and error!

Let me know if you've had similar horror stories in Kubernetes + databases!
