Backup and Restore ETCD

Rohit Pagote

Backup and Restore Method 1

Basic ETCD configuration

  • View the ETCD version
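
    • One way to check (assuming the default static-pod name etcd-controlplane; yours may differ) is to describe the etcd pod and look at the Image tag:

      kubectl describe pod -n kube-system etcd-controlplane | grep Image: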

  • Address to reach the ETCD cluster from the controlplane node
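
    • For example, check the --listen-client-urls or --advertise-client-urls option in the etcd pod's command section; a stacked setup typically listens on https://127.0.0.1:2379:

      kubectl describe pod -n kube-system etcd-controlplane | grep client-urls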

  • ETCD server certificate and key file location

    • Describe the etcd pod and check the --cert-file option for the certificate and the --key-file option for the key in the command section

      kubectl describe pod -n kube-system etcd-controlplane

      --cert-file=/etc/kubernetes/pki/etcd/server.crt

      --key-file=/etc/kubernetes/pki/etcd/server.key

  • ETCD CA certificate file location

    • Describe the etcd pod and check the --trusted-ca-file option for the CA certificate in the command section

      kubectl describe pod -n kube-system etcd-controlplane

      --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

ETCD Backup and Restore

  • Install etcdctl

    apt-get install etcd-client
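
  • Optionally, confirm the installation by checking the client version (with ETCDCTL_API=3 the subcommand is version; the older v2 client uses --version instead):

    ETCDCTL_API=3 etcdctl version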

  • Command to take a snapshot of the ETCD database

    • First, set the etcdctl API version to 3

      export ETCDCTL_API=3

    • Use the etcdctl snapshot save -h command to view all the options

    • The --cacert, --cert, --key, and --endpoints options are mandatory when taking a snapshot

      etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \

      --cert=/etc/kubernetes/pki/etcd/server.crt \

      --key=/etc/kubernetes/pki/etcd/server.key \

      --endpoints=127.0.0.1:2379 \

      snapshot save /opt/snapshot-pre-boot.db
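
    • Optionally, verify the snapshot with the snapshot status subcommand (it reads the file directly, so no certificates are needed; --write-out=table just formats the output):

      etcdctl --write-out=table snapshot status /opt/snapshot-pre-boot.db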

  • Steps to restore the ETCD database from a snapshot

    • First, set the etcdctl API version to 3

      export ETCDCTL_API=3

    • Use the etcdctl snapshot restore -h command to view all the options

    • The --cacert, --cert, --key, and --endpoints options are optional when restoring a snapshot; only --data-dir is required

      etcdctl --data-dir /var/lib/etcd-from-backup \

      snapshot restore /opt/snapshot-pre-boot.db

    • Note: In this case, we are restoring the snapshot to a different directory but on the same server where we took the backup (the controlplane node). As a result, the only required option for the restore command is --data-dir.
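
    • You can optionally confirm that the restore created the new data directory:

      ls -l /var/lib/etcd-from-backup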

    • Next, update the /etc/kubernetes/manifests/etcd.yaml file.

    • We have now restored the etcd snapshot to a new path on the controlplane (/var/lib/etcd-from-backup), so the only change to be made in the YAML file is to update the hostPath for the volume called etcd-data from the old directory (/var/lib/etcd) to the new directory (/var/lib/etcd-from-backup).

          volumes:
          - hostPath:
              path: /var/lib/etcd-from-backup
              type: DirectoryOrCreate
            name: etcd-data
      
    • With this change, /var/lib/etcd in the container points to /var/lib/etcd-from-backup on the controlplane (which is what we want).
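
    • Since etcd runs as a static pod, kubelet recreates it automatically once the manifest changes. You can watch the pod come back up (this may take a minute or two):

      kubectl get pods -n kube-system --watch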


Backup and Restore Method 2

ETCD Backup and Restore - Stacked ETCD

“A backup can only be taken from the controlplane node.”

  • Steps to check how ETCD is configured on the cluster

    • Option 1

      • Run the kubectl get pods -A command to view all the pods running in the cluster.

      • If a pod with the etcd name prefix (e.g. etcd-controlplane) is present in the pod list, it is a stacked ETCD.

    • Option 2

      • Describe the kube-apiserver pod and check the --etcd-servers option in the command section.

      • If it is set to --etcd-servers=127.0.0.1:2379, ETCD is running on the same machine as the master/controlplane node, i.e. it is a stacked ETCD.
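
      • For example (assuming the default pod name kube-apiserver-controlplane; it may differ in your cluster):

        kubectl describe pod -n kube-system kube-apiserver-controlplane | grep etcd-servers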

  • Default data directory used for the stacked ETCD

    • Describe the etcd pod and check the --data-dir option in the command section

      kubectl describe pod -n kube-system etcd-cluster1-controlplane

      --data-dir=/var/lib/etcd

  • Take a backup of ETCD on cluster1 and save it on the student-node at the path /opt/cluster1.db

    • On the student-node: First set the context to cluster1:

        student-node ~ ➜  kubectl config use-context cluster1
        Switched to context "cluster1".
      
    • Next, inspect the endpoints and certificates used by the etcd pod.

        student-node ~ ✖ kubectl describe  pods -n kube-system etcd-cluster1-controlplane  | grep advertise-client-urls
              --advertise-client-urls=https://192.160.244.10:2379
      
        student-node ~ ➜  
      
        student-node ~ ➜  kubectl describe  pods -n kube-system etcd-cluster1-controlplane  | grep pki
              --cert-file=/etc/kubernetes/pki/etcd/server.crt
              --key-file=/etc/kubernetes/pki/etcd/server.key
              --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
              --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
              --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
              --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
              /etc/kubernetes/pki/etcd from etcd-certs (rw)
            Path:          /etc/kubernetes/pki/etcd
      
        student-node ~ ➜
      
    • NOTE: The IP address (192.160.244.10) shown in the above command could be different in your lab environment. Make sure to note the correct IP address before taking the backup of the cluster.

    • SSH to the controlplane node of cluster1 and then take the backup using the endpoints and certificates we identified above:

        cluster1-controlplane ~   ETCDCTL_API=3 etcdctl --endpoints=https://192.160.244.10:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save /opt/cluster1.db
        Snapshot saved at /opt/cluster1.db
      
        cluster1-controlplane ~ 
      
    • Finally, copy the backup to the student-node. To do this, go back to the student-node and use scp as shown below:

        student-node ~   scp cluster1-controlplane:/opt/cluster1.db /opt
        cluster1.db                                                                                                        100% 2088KB 112.3MB/s   00:00    
      
        student-node ~ 
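
    • Optionally, verify the copied snapshot on the student-node (assuming etcdctl is installed there); snapshot status reads the file directly, so no certificates or endpoints are needed:

        ETCDCTL_API=3 etcdctl --write-out=table snapshot status /opt/cluster1.db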
      

ETCD Backup and Restore - External ETCD

  • Steps to check how ETCD is configured on the cluster

    • Option 1

      • Run the kubectl get pods -A command to view all the pods running in the cluster.

      • If no pod with the etcd name prefix is present in the pod list, it is an external ETCD.

    • Option 2

      • Describe the kube-apiserver pod and check the --etcd-servers option in the command section.

      • If it is set to an address other than --etcd-servers=127.0.0.1:2379, ETCD is not running on the same machine as the master/controlplane node, i.e. it is an external ETCD.

  • Steps to log in to the external ETCD server

    • Describe the kube-apiserver pod and check the --etcd-servers option in the command section.

    • You will find the IP of the ETCD server, e.g. --etcd-servers=https://192.6.129.22:2379.

    • SSH into it to log in to the ETCD server:

      ssh 192.6.129.22

  • Default data directory used for the external ETCD

    • Run the below command on the ETCD server after logging in and look for the --data-dir option:

      ps -ef | grep etcd    # or: ps aux | grep etcd

  • Number of nodes in the ETCD cluster that the etcd-server is a part of

    • First, set the etcdctl API version to 3

      export ETCDCTL_API=3

    • Run the etcdctl member list command to view the members and count the number of rows.

    • The --cacert, --cert, --key, and --endpoints options are mandatory here as well

      etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \

      --cert=/etc/kubernetes/pki/etcd/server.crt \

      --key=/etc/kubernetes/pki/etcd/server.key \

      --endpoints=127.0.0.1:2379 \

      member list
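
    • Tip: appending --write-out=table prints one row per member, which makes counting easier:

      etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379 member list --write-out=table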

Restore backup of external ETCD (Important)

Step 1: Copy the snapshot file from the student-node to the etcd-server. In the example below, we are copying it to the /root directory:

student-node ~  scp /opt/cluster2.db etcd-server:/root
cluster2.db                                                                                                        100% 1108KB 178.5MB/s   00:00    

student-node ~ ➜

Step 2: Restore the snapshot on cluster2. Since we are restoring directly on the etcd-server, we can use the endpoint https://127.0.0.1:2379. Use the same certificates that were identified earlier. Make sure to set --data-dir to /var/lib/etcd-data-new:

etcd-server ~ ➜  ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/etcd/pki/ca.pem --cert=/etc/etcd/pki/etcd.pem --key=/etc/etcd/pki/etcd-key.pem snapshot restore /root/cluster2.db --data-dir /var/lib/etcd-data-new
{"level":"info","ts":1721940922.0441437,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/root/cluster2.db","wal-dir":"/var/lib/etcd-data-new/member/wal","data-dir":"/var/lib/etcd-data-new","snap-dir":"/var/lib/etcd-data-new/member/snap"}
{"level":"info","ts":1721940922.060755,"caller":"mvcc/kvstore.go:388","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":951}
{"level":"info","ts":1721940922.0667593,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"cdf818194e3a8c32","local-member-id":"0","added-peer-id":"8e9e05c52164694d","added-peer-peer-urls":["http://localhost:2380"]}
{"level":"info","ts":1721940922.0732546,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/root/cluster2.db","wal-dir":"/var/lib/etcd-data-new/member/wal","data-dir":"/var/lib/etcd-data-new","snap-dir":"/var/lib/etcd-data-new/member/snap"}

etcd-server ~ ➜

Step 3: Update the systemd service unit file for etcd by running vi /etc/systemd/system/etcd.service and set the new value of --data-dir:

[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target

[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd \
  --name etcd-server \
  --data-dir=/var/lib/etcd-data-new \
---End of Snippet---

Step 4: Make sure the permissions on the new directory are correct (it should be owned by the etcd user):

etcd-server /var/lib ➜  chown -R etcd:etcd /var/lib/etcd-data-new

etcd-server /var/lib ➜ 


etcd-server /var/lib ➜  ls -ld /var/lib/etcd-data-new/
drwx------ 3 etcd etcd 4096 Jul 15 20:55 /var/lib/etcd-data-new/
etcd-server /var/lib ➜

Step 5: Finally, reload and restart the etcd service.

etcd-server ~ ➜  systemctl daemon-reload 
etcd-server ~ ➜  systemctl restart etcd
etcd-server ~ ➜

Step 6 (optional): It is recommended to restart the controlplane components (e.g. kube-scheduler, kube-controller-manager, kubelet) to ensure that they don't rely on stale data.
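
For example, assuming the default static-pod names on the cluster2 controlplane node (the exact pod and node names may differ in your environment), you can delete the mirror pods so that kubelet recreates them, and then restart kubelet on that node:

kubectl delete pod -n kube-system kube-scheduler-cluster2-controlplane kube-controller-manager-cluster2-controlplane

systemctl restart kubelet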
