ETCD Backup and Restore Explained: Day 35 of 40daysofkubernetes

Shivam Gautam

Introduction

In any Kubernetes cluster, etcd plays a vital role as it stores all the cluster's critical data, including configuration, state, and secrets. Since it acts as the "source of truth" for the entire cluster, ensuring the safety and integrity of etcd data is crucial for maintaining the health and continuity of your environment. A proper backup and restore strategy is essential for disaster recovery, cluster migrations, and maintaining data consistency.

Let’s dive into why and how we can back up and restore etcd in a Kubernetes cluster!

What is ETCD in Kubernetes?

etcd is a distributed key-value store that holds the critical configuration data and state of a Kubernetes cluster. This includes information such as:

  • Cluster state

  • API objects (nodes, pods, services, secrets, etc.)

  • Configuration data required to run and manage the cluster.

It is the "source of truth" for the entire Kubernetes cluster, making it one of the most important components.

Why Do We Need to Back Up and Restore ETCD?

  • Disaster Recovery: In case of hardware failures, accidental deletions, or corruption, you can restore the cluster to a previous state.

  • Cluster Migration: When moving the Kubernetes cluster to another environment or upgrading to a new version, you need the backup to migrate data.

  • Audit and Compliance: Regular backups give you restorable copies of the cluster state, which audits may require.

  • Maintain State: To prevent loss of the cluster’s entire state (configurations, deployments, etc.), backups are crucial.

How to Back Up and Restore ETCD

Prerequisites

  1. Access to the Master Node: You need SSH access to the Kubernetes master (control plane) node.

  2. ETCD CLI tool: Ensure etcdctl (the etcd command-line client) is installed on the node where etcd runs.

    To install it on Debian or Ubuntu, run:

     sudo apt install etcd-client
    

Backup Steps

  1. Set the Environment Variables
    First, set the environment variable ETCDCTL_API=3 to specify the correct etcd API version:

     export ETCDCTL_API=3
    

  2. Provide the Required Certificates and Endpoints
    Extract the necessary endpoint, CA certificate, and keys from the /etc/kubernetes/manifests/etcd.yaml file.
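
    As a rough sketch of that extraction, the helper below prints the value of a single flag from the manifest. It assumes a kubeadm-style manifest in which each flag sits on its own line (for example "- --cert-file=..."); the function name etcd_flag is hypothetical and not part of any tool.

```shell
# Hypothetical helper: print the value of one --flag from an etcd static pod manifest.
# Usage: etcd_flag <flag-name> [manifest-path]
# Assumes kubeadm-style lines such as "    - --cert-file=/etc/kubernetes/pki/etcd/server.crt".
etcd_flag() {
  flag="$1"
  manifest="${2:-/etc/kubernetes/manifests/etcd.yaml}"
  # Strip the leading "- --<flag>=" and print only the lines where that substitution happened.
  sed -n "s/^[[:space:]]*-[[:space:]]*--${flag}=//p" "$manifest"
}

# Example (run on the node that hosts etcd):
#   etcd_flag listen-client-urls   # candidate value(s) for --endpoints
#   etcd_flag trusted-ca-file      # value for --cacert
#   etcd_flag cert-file            # value for --cert
#   etcd_flag key-file             # value for --key
```

    Because the pattern is anchored right after the leading dashes, asking for cert-file does not also return the peer-cert-file entry.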

  3. Run the Backup Command
    Use etcdctl to back up etcd:

     etcdctl --endpoints=https://127.0.0.1:2379 \
     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
     --cert=/etc/kubernetes/pki/etcd/server.crt \
     --key=/etc/kubernetes/pki/etcd/server.key \
     snapshot save /opt/etcd-backup.db
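
    Since backups are meant to be taken regularly, it can help to avoid overwriting the previous snapshot. The following is a minimal sketch that wraps the same command and adds a timestamp to the file name. BACKUP_DIR, snapshot_path, and take_snapshot are hypothetical names chosen for illustration, and the endpoint and certificate paths are the ones used above.

```shell
# Sketch only: timestamped etcd snapshots. BACKUP_DIR is a hypothetical variable.
BACKUP_DIR="${BACKUP_DIR:-/opt}"

# Build a file name of the form <BACKUP_DIR>/etcd-backup-YYYYMMDD-HHMMSS.db
snapshot_path() {
  printf '%s/etcd-backup-%s.db\n' "$BACKUP_DIR" "$(date +%Y%m%d-%H%M%S)"
}

# Run the same backup command as above, writing to the timestamped path.
# Needs read access to the etcd certificates, so run it as root.
take_snapshot() {
  target="$(snapshot_path)"
  ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    snapshot save "$target" && echo "$target"
}

# Example:
#   take_snapshot
```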
    

  4. Verify the Backup
    Check the size of the backup file:

     du -sh /opt/etcd-backup.db
    

    To get detailed information about the snapshot, use:

     sudo ETCDCTL_API=3 etcdctl --write-out=table snapshot status /opt/etcd-backup.db
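
    Before relying on a snapshot, a quick guard against a missing or zero-byte file can catch a failed backup early. This is a small sketch; check_snapshot is a hypothetical helper name, and it only checks that the file exists and is non-empty, not that its contents are valid.

```shell
# Hypothetical helper: fail early if the snapshot file is missing or empty.
# Usage: check_snapshot <snapshot-file>
check_snapshot() {
  snapshot="$1"
  # -s is true only when the file exists and has a size greater than zero.
  if [ ! -s "$snapshot" ]; then
    echo "snapshot missing or empty: $snapshot" >&2
    return 1
  fi
  echo "snapshot present: $snapshot"
}

# Example:
#   check_snapshot /opt/etcd-backup.db
```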
    

Restore Steps

  1. Simulate a Failure
    For demonstration, delete some resources such as deployments or services.

  2. Run the Restore Command
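    Restore the etcd backup file into a new data directory. The restore reads the local snapshot file rather than contacting the running etcd server, so the endpoint and certificate flags are not strictly required here:

     sudo ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
     --cert=/etc/kubernetes/pki/etcd/server.crt \
     --key=/etc/kubernetes/pki/etcd/server.key \
     snapshot restore /opt/etcd-backup.db --data-dir=/var/lib/etcd-restore-from-backup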
    
  3. Update the ETCD Configuration
    After restoring, update the etcd.yaml manifest file so that the --data-dir flag, the volume mountPath, and the hostPath path all point to the restored data directory.
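
    One way to make that change without editing by hand is sketched below. It assumes the original data directory is /var/lib/etcd (the usual kubeadm default) and that each path ends its line in the manifest. OLD_DIR, NEW_DIR, and point_manifest_at are illustrative names. The sketch writes to a separate output file so the result can be reviewed before it replaces the live manifest.

```shell
# Sketch: rewrite the etcd data directory in a COPY of the manifest.
# OLD_DIR is an assumption (kubeadm default); NEW_DIR matches the restore command above.
OLD_DIR="/var/lib/etcd"
NEW_DIR="/var/lib/etcd-restore-from-backup"

# Usage: point_manifest_at <source-manifest> <output-file>
point_manifest_at() {
  # Replace OLD_DIR only where it ends the line, so certificate paths such as
  # /etc/kubernetes/pki/etcd/... and lines already set to NEW_DIR are left alone.
  sed "s#${OLD_DIR}\$#${NEW_DIR}#" "$1" > "$2"
}

# Example (write the copy outside the manifests directory, then compare):
#   point_manifest_at /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml.new
#   diff /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml.new
```

    The copy is kept outside /etc/kubernetes/manifests on purpose, because the kubelet treats every file in that directory as a static pod manifest.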


  4. Restart the kubelet
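    Stop the kubelet, move all manifests temporarily to a dedicated directory under /tmp, then move them back to their original location to refresh the components. Using a dedicated directory avoids sweeping unrelated files from /tmp into the manifests directory:

     sudo systemctl stop kubelet
     sudo mkdir -p /tmp/k8s-manifests
     sudo mv /etc/kubernetes/manifests/* /tmp/k8s-manifests/
     sudo mv /tmp/k8s-manifests/* /etc/kubernetes/manifests/
     sudo systemctl daemon-reload
     sudo systemctl start kubelet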
    

  5. Verify the Restoration
    After restarting the services, our pods and services should be up and running, confirming that the restoration was successful.

Conclusion

Regular backups of etcd are crucial for ensuring the health, recoverability, and continuity of a Kubernetes cluster. Whether for disaster recovery, migration, or maintaining consistency, the backup and restore process helps safeguard the critical state of your cluster.
