Backup and Restore ETCD

Rohit Pagote

Backup and Restore Method 1

Basic ETCD configuration

  • View the ETCD version
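
    • One way to check (assuming the default static-pod name etcd-controlplane; yours may differ) is to describe the etcd pod and look at the Image tag:

      kubectl describe pod -n kube-system etcd-controlplane | grep Image: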

  • Address to reach the ETCD cluster from the controlplane node
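
    • For example, check the --listen-client-urls or --advertise-client-urls option in the etcd pod's command section; a stacked setup typically listens on https://127.0.0.1:2379:

      kubectl describe pod -n kube-system etcd-controlplane | grep client-urls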

  • ETCD server certificate and key file location

    • Describe the etcd pod and check the --cert-file option for the certificate and the --key-file option for the key in the command section

      kubectl describe pod -n kube-system etcd-controlplane

      --cert-file=/etc/kubernetes/pki/etcd/server.crt

      --key-file=/etc/kubernetes/pki/etcd/server.key

  • ETCD CA certificate file location

    • Describe the etcd pod and check the --trusted-ca-file option for the CA certificate in the command section

      kubectl describe pod -n kube-system etcd-controlplane

      --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

ETCD Backup and Restore

  • Install etcdctl

    apt-get install etcd-client
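
  • Optionally, confirm the installation by checking the client version (with ETCDCTL_API=3 the subcommand is version; the older v2 client uses --version instead):

    ETCDCTL_API=3 etcdctl version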

  • Command to take a snapshot of the ETCD database

    • First, set the etcdctl API version to 3

      export ETCDCTL_API=3

    • Use the etcdctl snapshot save -h command to view all the options

    • The --cacert, --cert, --key, and --endpoints options are mandatory when taking a snapshot

      etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \

      --cert=/etc/kubernetes/pki/etcd/server.crt \

      --key=/etc/kubernetes/pki/etcd/server.key \

      --endpoints=127.0.0.1:2379 \

      snapshot save /opt/snapshot-pre-boot.db
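
    • Optionally, verify the snapshot with the snapshot status subcommand (it reads the file directly, so no certificates are needed; --write-out=table just formats the output):

      etcdctl --write-out=table snapshot status /opt/snapshot-pre-boot.db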

  • Steps to restore the ETCD database from a snapshot

    • First, set the etcdctl API version to 3

      export ETCDCTL_API=3

    • Use the etcdctl snapshot restore -h command to view all the options

    • The --cacert, --cert, --key, and --endpoints options are optional when restoring a snapshot; only --data-dir is required

      etcdctl --data-dir /var/lib/etcd-from-backup \

      snapshot restore /opt/snapshot-pre-boot.db

    • Note: In this case, we are restoring the snapshot to a different directory but on the same server where we took the backup (the controlplane node). As a result, the only required option for the restore command is --data-dir.
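
    • You can optionally confirm that the restore created the new data directory:

      ls -l /var/lib/etcd-from-backup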

    • Next, update the /etc/kubernetes/manifests/etcd.yaml file.

    • We have now restored the etcd snapshot to a new path on the controlplane (/var/lib/etcd-from-backup), so the only change to be made in the YAML file is to update the hostPath for the volume called etcd-data from the old directory (/var/lib/etcd) to the new directory (/var/lib/etcd-from-backup).

          volumes:
          - hostPath:
              path: /var/lib/etcd-from-backup
              type: DirectoryOrCreate
            name: etcd-data
      
    • With this change, /var/lib/etcd in the container points to /var/lib/etcd-from-backup on the controlplane (which is what we want).
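
    • Since etcd runs as a static pod, kubelet recreates it automatically once the manifest changes. You can watch the pod come back up (this may take a minute or two):

      kubectl get pods -n kube-system --watch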


Backup and Restore Method 2

ETCD Backup and Restore - Stacked ETCD

“A backup can only be taken from the controlplane node.”

  • Steps to check how ETCD is configured on the cluster

    • Option 1

      • Run the kubectl get pods -A command to view all the pods running in the cluster.

      • If a pod with the etcd name prefix (e.g. etcd-controlplane) is present in the pod list, it is a stacked ETCD.

    • Option 2

      • Describe the kube-apiserver pod and check the --etcd-servers option in the command section.

      • If it is set to --etcd-servers=127.0.0.1:2379, ETCD is running on the same machine as the master/controlplane node, i.e. it is a stacked ETCD.
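
      • For example (assuming the default pod name kube-apiserver-controlplane; it may differ in your cluster):

        kubectl describe pod -n kube-system kube-apiserver-controlplane | grep etcd-servers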

  • Default data directory used for the stacked ETCD

    • Describe the etcd pod and check the --data-dir option in the command section

      kubectl describe pod -n kube-system etcd-cluster1-controlplane

      --data-dir=/var/lib/etcd

  • Take a backup of ETCD on cluster1 and save it on the student-node at the path /opt/cluster1.db

    • On the student-node: First set the context to cluster1:

        student-node ~ ➜  kubectl config use-context cluster1
        Switched to context "cluster1".
      
    • Next, inspect the endpoints and certificates used by the etcd pod.

        student-node ~ ✖ kubectl describe  pods -n kube-system etcd-cluster1-controlplane  | grep advertise-client-urls
              --advertise-client-urls=https://192.160.244.10:2379
      
        student-node ~ ➜  
      
        student-node ~ ➜  kubectl describe  pods -n kube-system etcd-cluster1-controlplane  | grep pki
              --cert-file=/etc/kubernetes/pki/etcd/server.crt
              --key-file=/etc/kubernetes/pki/etcd/server.key
              --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
              --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
              --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
              --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
              /etc/kubernetes/pki/etcd from etcd-certs (rw)
            Path:          /etc/kubernetes/pki/etcd
      
        student-node ~ ➜
      
    • NOTE: The IP address (192.160.244.10) shown in the above command could be different in your lab environment. Make sure to note the correct IP address before taking the backup of the cluster.

    • SSH to the controlplane node of cluster1 and then take the backup using the endpoints and certificates we identified above:

        cluster1-controlplane ~   ETCDCTL_API=3 etcdctl --endpoints=https://192.160.244.10:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save /opt/cluster1.db
        Snapshot saved at /opt/cluster1.db
      
        cluster1-controlplane ~ 
      
    • Finally, copy the backup to the student-node. To do this, go back to the student-node and use scp as shown below:

        student-node ~   scp cluster1-controlplane:/opt/cluster1.db /opt
        cluster1.db                                                                                                        100% 2088KB 112.3MB/s   00:00    
      
        student-node ~ 
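
    • Optionally, verify the copied snapshot on the student-node (assuming etcdctl is installed there); snapshot status reads the file directly, so no certificates or endpoints are needed:

        ETCDCTL_API=3 etcdctl --write-out=table snapshot status /opt/cluster1.db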
      

ETCD Backup and Restore - External ETCD

  • Steps to check how ETCD is configured on the cluster

    • Option 1

      • Run the kubectl get pods -A command to view all the pods running in the cluster.

      • If no pod with the etcd name prefix is present in the pod list, it is an external ETCD.

    • Option 2

      • Describe the kube-apiserver pod and check the --etcd-servers option in the command section.

      • If it is set to an address other than --etcd-servers=127.0.0.1:2379, ETCD is not running on the same machine as the master/controlplane node, i.e. it is an external ETCD.

  • Steps to log in to the external ETCD server

    • Describe the kube-apiserver pod and check the --etcd-servers option in the command section.

    • You will find the IP of the ETCD server, e.g. --etcd-servers=https://192.6.129.22:2379.

    • SSH into it to log in to the ETCD server:

      ssh 192.6.129.22

  • Default data directory used for the external ETCD

    • Run the below command on the ETCD server after logging in and look for the --data-dir option:

      ps -ef | grep etcd    # or: ps aux | grep etcd

  • Number of nodes in the ETCD cluster that the etcd-server is a part of

    • First, set the etcdctl API version to 3

      export ETCDCTL_API=3

    • Run the etcdctl member list command to view the members and count the number of rows.

    • The --cacert, --cert, --key, and --endpoints options are mandatory here as well

      etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \

      --cert=/etc/kubernetes/pki/etcd/server.crt \

      --key=/etc/kubernetes/pki/etcd/server.key \

      --endpoints=127.0.0.1:2379 \

      member list
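
    • Tip: appending --write-out=table prints one row per member, which makes counting easier:

      etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379 member list --write-out=table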

Restore backup of external ETCD (Important)

Step 1: Copy the snapshot file from the student-node to the etcd-server. In the example below, we are copying it to the /root directory:

student-node ~  scp /opt/cluster2.db etcd-server:/root
cluster2.db                                                                                                        100% 1108KB 178.5MB/s   00:00    

student-node ~ ➜

Step 2: Restore the snapshot on cluster2. Since we are restoring directly on the etcd-server, we can use the endpoint https://127.0.0.1:2379. Use the same certificates that were identified earlier. Make sure to set --data-dir to /var/lib/etcd-data-new:

etcd-server ~ ➜  ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/etcd/pki/ca.pem --cert=/etc/etcd/pki/etcd.pem --key=/etc/etcd/pki/etcd-key.pem snapshot restore /root/cluster2.db --data-dir /var/lib/etcd-data-new
{"level":"info","ts":1721940922.0441437,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/root/cluster2.db","wal-dir":"/var/lib/etcd-data-new/member/wal","data-dir":"/var/lib/etcd-data-new","snap-dir":"/var/lib/etcd-data-new/member/snap"}
{"level":"info","ts":1721940922.060755,"caller":"mvcc/kvstore.go:388","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":951}
{"level":"info","ts":1721940922.0667593,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"cdf818194e3a8c32","local-member-id":"0","added-peer-id":"8e9e05c52164694d","added-peer-peer-urls":["http://localhost:2380"]}
{"level":"info","ts":1721940922.0732546,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/root/cluster2.db","wal-dir":"/var/lib/etcd-data-new/member/wal","data-dir":"/var/lib/etcd-data-new","snap-dir":"/var/lib/etcd-data-new/member/snap"}

etcd-server ~ ➜

Step 3: Update the systemd service unit file for etcd by running vi /etc/systemd/system/etcd.service and set the new value of --data-dir:

[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target

[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd \
  --name etcd-server \
  --data-dir=/var/lib/etcd-data-new \
---End of Snippet---

Step 4: Make sure the permissions on the new directory are correct (it should be owned by the etcd user):

etcd-server /var/lib ➜  chown -R etcd:etcd /var/lib/etcd-data-new

etcd-server /var/lib ➜ 


etcd-server /var/lib ➜  ls -ld /var/lib/etcd-data-new/
drwx------ 3 etcd etcd 4096 Jul 15 20:55 /var/lib/etcd-data-new/
etcd-server /var/lib ➜

Step 5: Finally, reload and restart the etcd service.

etcd-server ~ ➜  systemctl daemon-reload 
etcd-server ~ ➜  systemctl restart etcd
etcd-server ~ ➜

Step 6 (optional): It is recommended to restart the controlplane components (e.g. kube-scheduler, kube-controller-manager, kubelet) to ensure that they don't rely on stale data.
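
For example, assuming the default static-pod names on the cluster2 controlplane node (the exact pod and node names may differ in your environment), you can delete the mirror pods so that kubelet recreates them, and then restart kubelet on that node:

kubectl delete pod -n kube-system kube-scheduler-cluster2-controlplane kube-controller-manager-cluster2-controlplane

systemctl restart kubelet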
