Backup and Restore ETCD
Backup and Restore Method 1
Basic ETCD configuration
View the ETCD version
Describe the etcd pod and check the image name; the ETCD version is part of the image tag.
kubectl describe pod -n kube-system etcd-controlplane
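A quick hedged way to pull out just the image line from the describe output (the pod name etcd-controlplane assumes a default kubeadm setup):
kubectl describe pod -n kube-system etcd-controlplane | grep Image: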
Address to reach the ETCD cluster from the controlplane node
Describe the etcd pod and check for the --listen-client-urls option in the command section.
kubectl describe pod -n kube-system etcd-controlplane
--listen-client-urls=https://127.0.0.1:2379,https://192.6.63.9:2379
ETCD server certificate and key file location
Describe the etcd pod and check for the --cert-file option for the certificate and the --key-file option for the key in the command section.
kubectl describe pod -n kube-system etcd-controlplane
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--key-file=/etc/kubernetes/pki/etcd/server.key
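A hedged one-liner to filter the describe output down to these two flags (pod name assumes a default kubeadm setup):
kubectl describe pod -n kube-system etcd-controlplane | grep -E 'cert-file|key-file'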
ETCD CA certificate file location
Describe the etcd pod and check for the --trusted-ca-file option for the CA certificate in the command section.
kubectl describe pod -n kube-system etcd-controlplane
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
ETCD Backup and Restore
Install etcdctl:
apt-get install etcd-client
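Once installed, you can confirm the client is available and check its version (a minimal sketch):
ETCDCTL_API=3 etcdctl version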
Command to take a snapshot of the ETCD database
First, set the etcdctl API version to 3:
export ETCDCTL_API=3
Use the etcdctl snapshot save -h command to view all the options.
The --cacert, --cert, --key and --endpoints options are mandatory to pass while taking a snapshot.
etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--endpoints=127.0.0.1:2379 \
snapshot save /opt/snapshot-pre-boot.db
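You can verify that the snapshot file was written correctly with the snapshot status subcommand (a hedged sketch, assuming the path used above; the table output shows the hash, revision, total keys and size):
ETCDCTL_API=3 etcdctl snapshot status /opt/snapshot-pre-boot.db --write-out=table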
Steps to restore the ETCD database from a snapshot
First, set the etcdctl API version to 3:
export ETCDCTL_API=3
Use the etcdctl snapshot restore -h command to view all the options.
The --cacert, --cert, --key and --endpoints options are optional; only --data-dir is required to pass while restoring a snapshot.
etcdctl --data-dir /var/lib/etcd-from-backup \
snapshot restore /opt/snapshot-pre-boot.db
Note: In this case, we are restoring the snapshot to a different directory but on the same server where we took the backup (the controlplane node). As a result, the only required option for the restore command is --data-dir.
Next, update the /etc/kubernetes/manifests/etcd.yaml file.
We have now restored the etcd snapshot to a new path on the controlplane, /var/lib/etcd-from-backup, so the only change to be made in the YAML file is the hostPath for the volume called etcd-data: change it from the old directory (/var/lib/etcd) to the new directory (/var/lib/etcd-from-backup).
volumes:
  - hostPath:
      path: /var/lib/etcd-from-backup
      type: DirectoryOrCreate
    name: etcd-data
With this change, /var/lib/etcd on the container points to /var/lib/etcd-from-backup on the controlplane (which is what we want).
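Since etcd is a static pod, the kubelet recreates it automatically once the manifest changes. A hedged way to watch it come back up:
kubectl get pods -n kube-system --watch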
Backup and Restore Method 2
ETCD Backup and Restore - Stacked ETCD
“Backup can only be taken from the controlplane node”
Steps to check how ETCD is configured on the cluster
Option 1
Run the kubectl get pods -A command to view all the pods running in the cluster, as shown below. If a pod whose name starts with etcd (e.g. etcd-controlplane) is present in the pod list, it is a stacked ETCD.
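A minimal sketch of this check (the grep simply filters the full pod list):
kubectl get pods -A | grep etcd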
Option 2
Describe the kube-apiserver pod and check for the --etcd-servers option in the command section, as shown in the sketch below. If it points to 127.0.0.1:2379 (the loopback address), ETCD is running on the same machine as the master/controlplane node, i.e. it is a stacked ETCD.
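A hedged one-liner to inspect this flag (the pod name kube-apiserver-controlplane assumes a default kubeadm setup):
kubectl describe pod -n kube-system kube-apiserver-controlplane | grep etcd-servers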
Default data directory used for the stacked ETCD
Describe the etcd pod and check for the --data-dir option in the command section.
kubectl describe pod -n kube-system etcd-cluster1-controlplane
--data-dir=/var/lib/etcd
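A hedged grep for the same information (the pod name matches the lab's cluster1 controlplane; adjust it for your cluster):
kubectl describe pod -n kube-system etcd-cluster1-controlplane | grep data-dir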
Take a backup of ETCD on cluster1 and save it on the student-node at the path /opt/cluster1.db
On the student-node: First set the context to cluster1:
student-node ~ ➜ kubectl config use-context cluster1
Switched to context "cluster1".
Next, inspect the endpoints and certificates used by the etcd pod.
student-node ~ ➜ kubectl describe pods -n kube-system etcd-cluster1-controlplane | grep advertise-client-urls
--advertise-client-urls=https://192.160.244.10:2379
student-node ~ ➜ kubectl describe pods -n kube-system etcd-cluster1-controlplane | grep pki
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--key-file=/etc/kubernetes/pki/etcd/server.key
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
/etc/kubernetes/pki/etcd from etcd-certs (rw)
Path: /etc/kubernetes/pki/etcd
NOTE: The IP address (192.160.244.10) shown in the above command could be different in your lab environment. Make sure to note the correct IP address before taking the backup of the cluster.
SSH to the controlplane node of cluster1 and then take the backup using the endpoints and certificates we identified above:
cluster1-controlplane ~ ➜ ETCDCTL_API=3 etcdctl --endpoints=https://192.160.244.10:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save /opt/cluster1.db
Snapshot saved at /opt/cluster1.db
Finally, copy the backup to the student-node. To do this, go back to the student-node and use scp as shown below:
student-node ~ ➜ scp cluster1-controlplane:/opt/cluster1.db /opt
cluster1.db                              100% 2088KB 112.3MB/s   00:00
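To confirm the copy, you can inspect the file on the student-node (a minimal sketch; the snapshot status check additionally assumes etcdctl is installed there):
student-node ~ ➜ ls -l /opt/cluster1.db
student-node ~ ➜ ETCDCTL_API=3 etcdctl snapshot status /opt/cluster1.db --write-out=table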
ETCD Backup and Restore - External ETCD
Steps to check how ETCD is configured on the cluster
Option 1
Run the kubectl get pods -A command to view all the pods running in the cluster. If no pod whose name starts with etcd is present in the pod list, it is an external ETCD.
Option 2
Describe the kube-apiserver pod and check for the --etcd-servers option in the command section. If it points to an address other than 127.0.0.1:2379, ETCD is not running on the same machine as the master/controlplane node, i.e. it is an external ETCD.
Steps to log in to the external ETCD server
Describe the kube-apiserver pod and check for the --etcd-servers option in the command section. You will get the IP of the ETCD server, e.g. --etcd-servers=https://192.6.129.22:2379. SSH into it to log in to the ETCD server.
ssh 192.6.129.22
Default data directory used for the external ETCD
Run one of the below commands on the ETCD server after logging into it and look for the --data-dir option.
ps -ef | grep etcd
ps -aux | grep etcd
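A hedged one-liner that extracts just the flag from the process list (assuming the etcd process was started with --data-dir=<path> on its command line):
ps -ef | grep etcd | grep -o -- '--data-dir=[^ ]*'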
Number of nodes that are part of the ETCD cluster that etcd-server is a part of
First, set the etcdctl API version to 3:
export ETCDCTL_API=3
Run the etcdctl member list command to view the members and count the number of rows (see the sketch after this command).
The --cacert, --cert, --key and --endpoints options are mandatory to pass while listing the members.
etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--endpoints=127.0.0.1:2379 \
member list
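A hedged way to count the members directly, piping the same command through wc -l (each member prints one row):
etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379 member list | wc -l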
Restore backup of external ETCD (IMP)
Step 1: Copy the snapshot file from the student-node to the etcd-server. In the example below, we are copying it to the /root directory:
student-node ~ ➜ scp /opt/cluster2.db etcd-server:/root
cluster2.db 100% 1108KB 178.5MB/s 00:00
student-node ~ ➜
Step 2: Restore the snapshot on cluster2. Since we are restoring directly on the etcd-server, we can use the endpoint https://127.0.0.1:2379. Use the same certificates that were identified earlier. Make sure to use the data-dir as /var/lib/etcd-data-new:
etcd-server ~ ➜ ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/etcd/pki/ca.pem --cert=/etc/etcd/pki/etcd.pem --key=/etc/etcd/pki/etcd-key.pem snapshot restore /root/cluster2.db --data-dir /var/lib/etcd-data-new
{"level":"info","ts":1721940922.0441437,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/root/cluster2.db","wal-dir":"/var/lib/etcd-data-new/member/wal","data-dir":"/var/lib/etcd-data-new","snap-dir":"/var/lib/etcd-data-new/member/snap"}
{"level":"info","ts":1721940922.060755,"caller":"mvcc/kvstore.go:388","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":951}
{"level":"info","ts":1721940922.0667593,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"cdf818194e3a8c32","local-member-id":"0","added-peer-id":"8e9e05c52164694d","added-peer-peer-urls":["http://localhost:2380"]}
{"level":"info","ts":1721940922.0732546,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/root/cluster2.db","wal-dir":"/var/lib/etcd-data-new/member/wal","data-dir":"/var/lib/etcd-data-new","snap-dir":"/var/lib/etcd-data-new/member/snap"}
etcd-server ~ ➜
Step 3: Update the systemd service unit file for etcd by running vi /etc/systemd/system/etcd.service and add the new value for --data-dir:
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd \
--name etcd-server \
--data-dir=/var/lib/etcd-data-new \
---End of Snippet---
Step 4: Make sure the permissions on the new directory are correct (it should be owned by the etcd user):
etcd-server /var/lib ➜ chown -R etcd:etcd /var/lib/etcd-data-new
etcd-server /var/lib ➜
etcd-server /var/lib ➜ ls -ld /var/lib/etcd-data-new/
drwx------ 3 etcd etcd 4096 Jul 15 20:55 /var/lib/etcd-data-new/
etcd-server /var/lib ➜
Step 5: Finally, reload the systemd configuration and restart the etcd service.
etcd-server ~ ➜ systemctl daemon-reload
etcd-server ~ ➜ systemctl restart etcd
etcd-server ~ ➜
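To confirm the restore took effect, you can check the service and probe the endpoint (a hedged sketch; the certificate paths are the ones used in the restore command above):
etcd-server ~ ➜ systemctl status etcd
etcd-server ~ ➜ ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/etcd/pki/ca.pem --cert=/etc/etcd/pki/etcd.pem --key=/etc/etcd/pki/etcd-key.pem endpoint health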
Step 6 (optional): It is recommended to restart the controlplane components (e.g. kube-scheduler, kube-controller-manager, kubelet) to ensure that they don't rely on stale data.
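One hedged way to do this on the cluster's controlplane node is to move the static pod manifests out of the manifests directory, wait for the kubelet to stop those pods, move them back so they are recreated, and then restart the kubelet itself (the node name and default kubeadm paths below are illustrative):
cluster2-controlplane ~ ➜ mv /etc/kubernetes/manifests/kube-scheduler.yaml /etc/kubernetes/manifests/kube-controller-manager.yaml /tmp/
cluster2-controlplane ~ ➜ sleep 30
cluster2-controlplane ~ ➜ mv /tmp/kube-scheduler.yaml /tmp/kube-controller-manager.yaml /etc/kubernetes/manifests/
cluster2-controlplane ~ ➜ systemctl restart kubelet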