ETCD Backup and Restore Explained: Day 35 of 40daysofkubernetes
Introduction
In any Kubernetes cluster, etcd plays a vital role as it stores all the cluster's critical data, including configuration, state, and secrets. Since it acts as the "source of truth" for the entire cluster, ensuring the safety and integrity of etcd data is crucial for maintaining the health and continuity of your environment. A proper backup and restore strategy is essential for disaster recovery, cluster migrations, and maintaining data consistency.
Let’s dive into why and how we can back up and restore etcd in a Kubernetes cluster!
What is ETCD in Kubernetes?
etcd is a distributed key-value store that holds the critical configuration data and state of a Kubernetes cluster. This includes:
Cluster state
API objects (nodes, pods, services, secrets, etc.)
Configuration data required to run and manage the cluster.
It is the "source of truth" for the entire Kubernetes cluster, making it one of the most important components.
Why Do We Need to Backup and Restore ETCD?
Disaster Recovery: In case of hardware failures, accidental deletions, or corruption, you can restore the cluster to a previous state.
Cluster Migration: When moving the Kubernetes cluster to another environment or upgrading to a new version, you need the backup to migrate data.
Audit and Compliance: Regular backups give you point-in-time copies of cluster data that can be restored when an audit requires them.
Maintain State: To prevent loss of the cluster’s entire state (configurations, deployments, etc.), backups are crucial.
How to Backup and Restore ETCD
Prerequisites
Access to the Master Node: We should be able to SSH into our Kubernetes master node.
ETCD CLI tool: Ensure etcdctl (the etcd command-line client) is installed on the node where etcd runs. To install it, run:
sudo apt install etcd-client
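Before moving on, it can help to confirm that the client is actually on the PATH. The snippet below is a minimal sketch of such a check; the ETCDCTL_FOUND variable is only an illustrative name and is not used by etcdctl itself.

```shell
# Report whether etcdctl is on PATH and, if so, print its version.
# ETCDCTL_FOUND is a hypothetical helper variable for this sketch.
if command -v etcdctl >/dev/null 2>&1; then
  ETCDCTL_FOUND=yes
  # Force API v3 for this call so the "version" subcommand is available.
  ETCDCTL_API=3 etcdctl version
else
  ETCDCTL_FOUND=no
  echo "etcdctl not found; install it before continuing" >&2
fi
```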
Backup Steps
Set the Environment Variables
First, set the environment variable ETCDCTL_API=3 to specify the correct etcd API version:
export ETCDCTL_API=3
Provide the Required Certificates and Endpoints
Extract the necessary endpoint, CA certificate, and keys from the /etc/kubernetes/manifests/etcd.yaml file.
Run the Backup Command
Use etcdctl to back up etcd:
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /opt/etcd-backup.db
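The endpoint and certificate paths passed to etcdctl correspond to flags of the etcd server in the static pod manifest: --listen-client-urls for the endpoint, --trusted-ca-file for --cacert, --cert-file for --cert, and --key-file for --key. The sketch below shows one way to list those flags. The function name etcd_client_flags and the sample file are made up for illustration, and the sample only imitates the relevant lines of a kubeadm-style manifest.

```shell
# Print the etcd flags that hold the client endpoint and TLS file paths.
# Takes the manifest path as its only argument.
etcd_client_flags() {
  grep -E -- '--(listen-client-urls|trusted-ca-file|cert-file|key-file)=' "$1"
}

# Demonstration on a small sample so the sketch runs anywhere.
sample_manifest=$(mktemp)
cat > "$sample_manifest" <<'EOF'
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
EOF

# The peer certificate line is skipped; only client-facing flags are printed.
etcd_client_flags "$sample_manifest"
```

On a control plane node, the same function could be pointed at /etc/kubernetes/manifests/etcd.yaml, which may need root privileges to read.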
Verify the Backup
Check the size of the backup file:
du -sh /opt/etcd-backup.db
To get detailed information about the snapshot, use:
sudo etcdctl --write-out=table snapshot status /opt/etcd-backup.db
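If the backup runs unattended, a small guard can catch a missing or empty snapshot file before anyone relies on it. This is a rough sketch only; the function name snapshot_nonempty is hypothetical, and it checks just the file's presence and size, not its integrity.

```shell
# Return success only if the snapshot file exists and is not empty.
snapshot_nonempty() {
  [ -s "$1" ]
}

# Path taken from the backup command above.
SNAPSHOT=/opt/etcd-backup.db
if snapshot_nonempty "$SNAPSHOT"; then
  echo "snapshot present: $SNAPSHOT"
else
  echo "snapshot missing or empty: $SNAPSHOT" >&2
fi
```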
Restore Steps
Simulate a Failure
For demonstration, delete some resources such as deployments or services.
Run the Restore Command
Restore the etcd backup file using:
sudo etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot restore /opt/etcd-backup.db --data-dir=/var/lib/etcd-restore-from-backup
This writes the snapshot into a new data directory; the running etcd keeps using its old directory until the manifest is updated.
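A quick way to see whether the restore produced a usable directory is to look for the member/snap and member/wal subdirectories that an etcd data directory normally contains. The check below is a sketch under that assumption; the function name is hypothetical, and reading the directory may require root privileges.

```shell
# Return success if DIR has the member/snap and member/wal layout
# that an etcd data directory normally contains.
looks_like_etcd_data_dir() {
  [ -d "$1/member/snap" ] && [ -d "$1/member/wal" ]
}

# Path taken from the restore command above.
RESTORE_DIR=/var/lib/etcd-restore-from-backup
if looks_like_etcd_data_dir "$RESTORE_DIR"; then
  echo "restored data directory looks complete: $RESTORE_DIR"
else
  echo "restored data directory not found or incomplete: $RESTORE_DIR" >&2
fi
```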
Update the ETCD Configuration
After restoring, update the etcd.yaml manifest file so that the data directory (--data-dir), the volume mountPath, and the hostPath all point to the restored data directory.
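The manifest change can be previewed with sed before touching the real file. The sketch below assumes the manifest currently points at /var/lib/etcd (the kubeadm default); the function name repoint_etcd_data_dir and the sample file are illustrative only, and the sample imitates just the three relevant lines plus one unrelated path.

```shell
# Rewrite every line that ends in the old data directory so it ends in the new one.
# Prints the result to stdout and leaves the input file unchanged.
# Usage: repoint_etcd_data_dir MANIFEST OLD_DIR NEW_DIR
repoint_etcd_data_dir() {
  sed "s#$2\$#$3#" "$1"
}

# Sample lines standing in for the real manifest (assumed old path: /var/lib/etcd).
sample_manifest=$(mktemp)
cat > "$sample_manifest" <<'EOF'
    - --data-dir=/var/lib/etcd
    - mountPath: /var/lib/etcd
      path: /var/lib/etcd
      path: /etc/kubernetes/pki/etcd
EOF

# The certificate path on the last line does not end in /var/lib/etcd, so it is kept.
repoint_etcd_data_dir "$sample_manifest" /var/lib/etcd /var/lib/etcd-restore-from-backup
```

Because the function only prints the result, the output can be compared with the original (for example with diff) before it replaces /etc/kubernetes/manifests/etcd.yaml.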
Restart the kubelet
Move all manifests temporarily to a directory under /tmp, then back to their original location to refresh the components:
sudo systemctl stop kubelet
sudo mkdir -p /tmp/manifests
sudo mv /etc/kubernetes/manifests/* /tmp/manifests/
sudo mv /tmp/manifests/* /etc/kubernetes/manifests/
sudo systemctl daemon-reload
sudo systemctl start kubelet
Verify the Restoration
After restarting the services, our pods and services should be up and running, confirming that the restoration was successful.
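One possible way to run that check is with a few kubectl queries. The sketch below wraps them in a function; verify_restore and the KUBECTL variable are hypothetical names, with KUBECTL added only so the commands can be previewed without a cluster.

```shell
# Binary used for the checks; override it (for example KUBECTL=echo) for a dry run.
KUBECTL=${KUBECTL:-kubectl}

# List nodes, pods, and the deployments and services across all namespaces.
# Stops at the first command that fails.
verify_restore() {
  "$KUBECTL" get nodes &&
  "$KUBECTL" get pods --all-namespaces &&
  "$KUBECTL" get deployments,services --all-namespaces
}
```

Calling verify_restore on the control plane node should list the resources that were deleted during the simulated failure if the restore worked.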
Conclusion
Regular backups of etcd are crucial for ensuring the health, recoverability, and continuity of a Kubernetes cluster. Whether for disaster recovery, migration, or maintaining consistency, the backup and restore process helps safeguard the critical state of your cluster.
Written by
Shivam Gautam
DevOps & AWS Learner | Sharing my insights and progress 📚💡 | 1X AWS Certified | AWS Cloud Club Captain