Deep dive into Kubernetes Cluster Maintenance
Kubernetes clusters can be complex and mission-critical systems, which means that it's important to have a solid backup and restore plan in place.
Upgrading kubeadm Clusters :
Here is a bash script to upgrade a Kubernetes cluster using Kubeadm:
```bash
#!/bin/bash

# Check the current version of Kubernetes
echo "Current Kubernetes version:"
kubectl version

# Check the available versions of kubeadm
echo "Available versions of kubeadm:"
apt list -a kubeadm

# Upgrade kubeadm to the latest version
echo "Upgrading kubeadm..."
apt-get update && apt-get install -y kubeadm=<latest-version>

# Drain the nodes in the Kubernetes cluster
# (in production, drain and upgrade one node at a time so workloads stay available)
echo "Draining the nodes..."
for node in $(kubectl get nodes --no-headers | awk '{print $1}'); do
  kubectl drain "$node" --ignore-daemonsets
done

# Upgrade the control plane components
echo "Upgrading the control plane components..."
kubeadm upgrade apply <new-version>

# Upgrade the kubelet and kubectl packages on each worker node
echo "Upgrading kubelet and kubectl on each worker node..."
for node in $(kubectl get nodes --no-headers | awk '{print $1}'); do
  ssh "$node" "apt-get update && apt-get install -y kubelet=<new-version> kubectl=<new-version>"
done

# Upgrade the kubelet configuration on each node
# (recent kubeadm releases use `kubeadm upgrade node`; older releases used
#  `kubeadm upgrade node config --kubelet-version <new-version>`)
echo "Upgrading the node configuration..."
for node in $(kubectl get nodes --no-headers | awk '{print $1}'); do
  ssh "$node" "kubeadm upgrade node"
done

# Uncordon the nodes in the Kubernetes cluster
echo "Uncordoning the nodes..."
for node in $(kubectl get nodes --no-headers | awk '{print $1}'); do
  kubectl uncordon "$node"
done

# Verify that the upgrade was successful
echo "Verifying the upgrade..."
kubectl version
```
Note that you will need to replace <latest-version> and <new-version> with the latest version of kubeadm available and the version you want to upgrade to, respectively. You may also need to modify the SSH command to access the worker nodes in your cluster.
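The node loops in the script all depend on extracting node names from `kubectl get nodes` output with awk. Here is a self-contained illustration of that extraction, using simulated output rather than a live cluster (the node names and versions are made-up sample data):

```shell
# Simulated `kubectl get nodes --no-headers` output (sample data, not a live cluster)
nodes_output='master-1   Ready    control-plane   10d   v1.26.1
worker-1   Ready    <none>          10d   v1.26.1'

# awk '{print $1}' keeps only the first whitespace-separated field: the node name
for node in $(echo "$nodes_output" | awk '{print $1}'); do
  echo "would drain: $node"
done
```

The same pattern drives the drain, ssh-upgrade, and uncordon loops; only the command applied to each node changes.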
Backup and Restore a Kubernetes Cluster Using TrilioVault For Kubernetes:
TrilioVault for Kubernetes is a data protection and disaster recovery solution designed specifically for Kubernetes environments. It provides application-centric backup and restore capabilities, as well as the ability to migrate workloads across clusters.
Prerequisites
Before we get started, there are a few prerequisites you'll need to have in place:
- A Kubernetes cluster running version 1.17 or later
- Helm version 3 installed
- Access to a TrilioVault for Kubernetes installation
- A storage location where backups can be stored
Installing TrilioVault for Kubernetes
First, we need to install TrilioVault for Kubernetes. To do this, we'll use Helm.
- Add the Trilio repository to Helm:

```bash
helm repo add trilio https://charts.trilio.io/
```

- Update your local Helm chart repository:

```bash
helm repo update
```

- Install TrilioVault for Kubernetes:

```bash
helm install triliovault trilio/triliovault \
  --namespace triliovault \
  --create-namespace \
  --set credentials.username=<username> \
  --set credentials.password=<password> \
  --set global.deployment.envName=triliovault \
  --set backup.target=<backup-storage-target>
```
Replace <username> and <password> with the credentials for your TrilioVault installation, and <backup-storage-target> with the storage location where backups should be stored.
Accessing the TVK Management Console :
```bash
kubectl get svc -n tvk
kubectl port-forward svc/k8s-triliovault-ingress-nginx-controller 8080:80 -n tvk &
```
Creating a Backup
Now that TrilioVault for Kubernetes is installed, we can create a backup of our cluster. To do this, we'll use the tvctl command-line interface provided by TrilioVault.
- Install the tvctl command-line interface:

```bash
curl -s https://raw.githubusercontent.com/trilioData/tvctl/main/install.sh | bash
```
- Log in to TrilioVault:

```bash
tvctl login --user <username> --password <password> --url <triliovault-url>
```

Replace <username>, <password>, and <triliovault-url> with the appropriate values for your TrilioVault installation.
- Create a backup of the cluster:

```bash
tvctl backup create --cluster <cluster-name> --target <backup-storage-target>
```

Replace <cluster-name> with the name of your Kubernetes cluster, and <backup-storage-target> with the storage location where backups should be stored.
Restoring from a Backup
In the event of a disaster or data loss, we can use TrilioVault for Kubernetes to restore our cluster from a backup.
- Log in to TrilioVault:

```bash
tvctl login --user <username> --password <password> --url <triliovault-url>
```

- List available backups:

```bash
tvctl backup list --cluster <cluster-name>
```

Replace <cluster-name> with the name of your Kubernetes cluster.
- Restore the cluster from a backup:

```bash
tvctl backup restore --cluster <cluster-name> --backup <backup-name>
```

Replace <cluster-name> with the name of your Kubernetes cluster, and <backup-name> with the name of the backup you want to restore from.
Creating a TrilioVault Target to Store Backups
```bash
nano trilio-s3-target-secret.yaml
```

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: trilio-s3-target-secret
  namespace: tvk
type: Opaque
stringData:
  accessKey: your_bucket_access_key
  secretKey: your_bucket_secret_key
```

```bash
kubectl apply -f trilio-s3-target-secret.yaml -n tvk
```
```bash
nano trilio-s3-target.yaml
```

```yaml
apiVersion: triliovault.trilio.io/v1
kind: Target
metadata:
  name: trilio-s3-target
  namespace: tvk
spec:
  type: ObjectStore
  vendor: Other
  enableBrowsing: true
  objectStoreCredentials:
    bucketName: your_bucket_name
    region: your_bucket_region # e.g. nyc1 or us-east-1
    url: https://nyc1.digitaloceanspaces.com # update the region to match your bucket
    credentialSecret:
      name: trilio-s3-target-secret
      namespace: tvk
  thresholdCapacity: 10Gi
```
Creating the Kubernetes Cluster Backup
k8s-cluster-backup-plan.yaml

```yaml
apiVersion: triliovault.trilio.io/v1
kind: ClusterBackupPlan
metadata:
  name: k8s-cluster-backup-plan
  namespace: tvk
spec:
  backupConfig:
    target:
      name: trilio-s3-target
      namespace: tvk
  backupComponents:
    - namespace: wordpress
    - namespace: mysqldb
    - namespace: etcd
```

```bash
kubectl apply -f k8s-cluster-backup-plan.yaml
```

Output:

```
clusterbackupplan.triliovault.trilio.io/k8s-cluster-backup-plan created
```

```bash
kubectl get clusterbackupplan k8s-cluster-backup-plan -n tvk
```

The output looks similar to this:

```
NAME                      TARGET             ...   STATUS
k8s-cluster-backup-plan   trilio-s3-target   ...   Available
```
k8s-cluster-backup.yaml

```yaml
apiVersion: triliovault.trilio.io/v1
kind: ClusterBackup
metadata:
  name: k8s-cluster-backup
  namespace: tvk
spec:
  type: Full
  clusterBackupPlan:
    name: k8s-cluster-backup-plan
    namespace: tvk
```

```bash
kubectl apply -f k8s-cluster-backup.yaml
```

Output:

```
clusterbackup.triliovault.trilio.io/k8s-cluster-backup created
```

```bash
kubectl get clusterbackup k8s-cluster-backup -n tvk
```

Output:

```
NAME                 BACKUPPLAN                BACKUP TYPE   STATUS      ...   PERCENTAGE COMPLETE
k8s-cluster-backup   k8s-cluster-backup-plan   Full          Available   ...   100
```
```bash
kubectl delete ns wordpress
kubectl delete ns mysqldb
kubectl delete ns etcd
```

Output:

```
namespace "wordpress" deleted
namespace "mysqldb" deleted
namespace "etcd" deleted
```
Now that your namespaces are deleted, you’ll restore the backup.
Restoring the Backup with the Management Console :
In this section, you will use the TVK web console to restore all the important applications from your backup. The restore process will validate the target where the backup is stored. TVK will connect to the target repository and pull the backup files using its Datamover and Metamover pods, then recreate the Kubernetes applications that were pulled from the backup storage.
To get started with the restore operation, you'll first need to select the target where your backup is stored.
Checking the DOKS Cluster Applications State :
In this section, you will make sure that the restore operation was successful and that the applications are accessible after the restore. To begin, run the following commands to retrieve all of the objects related to the application from the namespaces listed:
```bash
kubectl get all --namespace wordpress
kubectl get all --namespace mysqldb
kubectl get all --namespace etcd
```
Your output will look similar to the following for each application:
```
NAME                             READY   STATUS    RESTARTS   AGE
pod/wordpress-5dcf55f8fc-72h9q   1/1     Running   1          2m21s
pod/wordpress-mariadb-0          1/1     Running   1          2m20s

NAME                        TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                      AGE
service/wordpress           LoadBalancer   10.120.1.38    34.71.102.21   80:32402/TCP,443:31522/TCP   2m21s
service/wordpress-mariadb   ClusterIP      10.120.7.213   <none>         3306/TCP                     2m21s

NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/wordpress   1/1     1            1           2m21s

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/wordpress-5dcf55f8fc   1         1         1       2m21s

NAME                                 READY   AGE
statefulset.apps/wordpress-mariadb   1/1     2m21s
```
Scheduling Backups :
Creating backups automatically based on a schedule is a very useful feature to have. It allows you to rewind time and restore the system to a previous working state if something goes wrong. By default, TrilioVault creates three scheduled policies: daily, weekly, and monthly.
In the TVK console, you can view the default policies under Backup & Recovery, then Scheduling Policies:
scheduled-backup-every-5min.yaml

```yaml
apiVersion: triliovault.trilio.io/v1
kind: Policy
metadata:
  name: scheduled-backup-every-5min
  namespace: tvk
spec:
  type: Schedule
  scheduleConfig:
    schedule:
      - "*/5 * * * *" # trigger every 5 minutes
```

```bash
kubectl apply -f scheduled-backup-every-5min.yaml
```

Your output will look like this:

```
policy.triliovault.trilio.io/scheduled-backup-every-5min created
```
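The `*/5 * * * *` expression is standard cron syntax: the first field matches any minute evenly divisible by 5, so the policy fires at minute 0, 5, 10, and so on. A quick shell sketch of that matching rule (simulating the arithmetic only, not running cron itself):

```shell
# "*/5" in the minute field matches any minute where minute % 5 == 0
for minute in 0 4 5 10 12; do
  if [ $((minute % 5)) -eq 0 ]; then
    echo "minute $minute: backup triggered"
  else
    echo "minute $minute: no backup"
  fi
done
```

A five-minute interval is aggressive for full backups; it is used here so you can watch the scheduler work without waiting.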
k8s-cluster-backup-plan.yaml

```yaml
apiVersion: triliovault.trilio.io/v1
kind: ClusterBackupPlan
metadata:
  name: k8s-cluster-backup-plan
  namespace: tvk
spec:
  backupConfig:
    target:
      name: trilio-s3-target
      namespace: tvk
    schedulePolicy:
      fullBackupPolicy:
        name: scheduled-backup-every-5min
        namespace: tvk
  backupComponents:
    - namespace: wordpress
    - namespace: mysqldb
    - namespace: etcd
```
TVK also has a default retention policy, which you can view in the TVK console under Backup & Recovery, then Retention Policies:
sample-retention-policy.yaml

```yaml
apiVersion: triliovault.trilio.io/v1
kind: Policy
metadata:
  name: sample-retention-policy
spec:
  type: Retention
  retentionConfig:
    latest: 2
    weekly: 1
    dayOfWeek: Wednesday
    monthly: 1
    dateOfMonth: 15
    monthOfYear: March
```
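To make the `latest: 2` setting concrete, here is a small shell sketch of what "keep only the two most recent backups" means. The timestamps are made-up sample data, and the pruning is simulated with sort; TVK applies the real policy server-side:

```shell
# Simulated backup dates (ISO dates sort chronologically as plain strings)
backups='2023-03-01
2023-03-15
2023-03-08'

# "latest: 2" keeps only the two most recent backups; older ones are pruned
kept=$(echo "$backups" | sort -r | head -n 2)
echo "kept:"
echo "$kept"
```

The weekly and monthly fields work the same way, anchored to the configured day of week and date of month.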
k8s-cluster-backup-plan.yaml :

```yaml
apiVersion: triliovault.trilio.io/v1
kind: ClusterBackupPlan
metadata:
  name: k8s-cluster-backup-plan
  namespace: tvk
spec:
  backupConfig:
    target:
      name: trilio-s3-target
      namespace: tvk
    retentionPolicy:
      fullBackupPolicy:
        name: sample-retention-policy
        namespace: tvk
  backupComponents:
    - namespace: wordpress
    - namespace: mysqldb
    - namespace: etcd
```
Autoscaling in Kubernetes :
Scaling a Kubernetes cluster involves adding or removing nodes to or from the cluster to increase or decrease its capacity. Here are the general steps to scale a Kubernetes cluster:
- Add new worker nodes to the cluster. You can add worker nodes using a cloud provider's console or API, or by provisioning new nodes using a tool like kubeadm.
- Join the new worker nodes to the Kubernetes cluster by running the kubeadm join command with the appropriate flags.
- Verify that the new worker nodes are added to the cluster and functioning correctly. You can use the kubectl get nodes command to verify the nodes are added to the cluster and the kubectl describe node command to check their status.
- If desired, adjust the number of replicas for a Deployment, StatefulSet, or ReplicationController to take advantage of the new capacity provided by the additional worker nodes. This can be done using the kubectl scale command.
- If necessary, scale the control plane components, such as the API server, etcd, scheduler, and controller manager, to handle the increased workload. This can be done by adding more replicas or increasing the resources assigned to each component.
- Monitor the cluster to ensure that everything is functioning correctly and that the new nodes are handling their share of the workload.
- If desired, remove nodes from the cluster to decrease its capacity. This involves draining the node of any running Pods, deleting the node from the cluster using kubectl, and optionally deleting the node from the cloud provider.
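As a small sketch of the verification step, here is one way to count Ready nodes from `kubectl get nodes` output. Simulated output is used so the logic runs without a cluster; the node names are hypothetical:

```shell
# Simulated `kubectl get nodes` output after adding a third worker
nodes='node-1   Ready      control-plane   10d   v1.26.1
node-2   Ready      <none>          10d   v1.26.1
node-3   NotReady   <none>          1m    v1.26.1'

# The second column is the node status; count how many report Ready
ready_count=$(echo "$nodes" | awk '$2 == "Ready" {c++} END {print c}')
echo "Ready nodes: $ready_count"
```

Against a live cluster you would pipe `kubectl get nodes --no-headers` into the same awk filter, and follow up with `kubectl describe node` on any node that is not Ready.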
Setting Up Autoscaling on GCE
First, we set up a cluster with Cluster Autoscaler turned on. The number of nodes in the cluster will start at 2, and autoscale up to a maximum of 5. To implement this, we’ll export the following environment variables:
```bash
export NUM_NODES=2
export KUBE_AUTOSCALER_MIN_NODES=2
export KUBE_AUTOSCALER_MAX_NODES=5
export KUBE_ENABLE_CLUSTER_AUTOSCALER=true
```

Start the cluster by running:

```bash
./cluster/kube-up.sh
```

Let's check our cluster; it should have two nodes:

```bash
kubectl get nodes
```
Run & Expose PHP-Apache Server
Deploy and expose the php-apache server, then check the Deployment (the image and resource-request flags here follow the standard Kubernetes HPA walkthrough):

```bash
kubectl run php-apache \
  --image=k8s.gcr.io/hpa-example \
  --requests=cpu=200m \
  --expose --port=80
kubectl get deployment
```

In a separate terminal, start a busybox pod to generate load. Hit enter for a command prompt, then request the service:

```bash
kubectl run -i --tty service-test --image=busybox /bin/sh
```

```sh
wget -q -O- http://php-apache.default.svc.cluster.local
```

Create the Horizontal Pod Autoscaler and check its status; run kubectl get hpa again once the load generator has been active to watch CPU utilization and the replica count change:

```bash
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
kubectl get hpa
```
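The kubectl autoscale command above is shorthand for creating an HPA object. An equivalent declarative manifest (a sketch assuming the autoscaling/v2 API is available on your cluster, and a hypothetical filename of php-apache-hpa.yaml) would look like this:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```

Applying this with kubectl apply -f keeps the scaling policy in version control alongside the rest of your manifests, rather than as a one-off imperative command.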
Conclusion :
Backing up and restoring a Kubernetes cluster can be a complex task, but with TrilioVault for Kubernetes, it becomes a lot easier. By following the steps outlined in this blog post, you can ensure that your Kubernetes cluster is protected from disasters and data loss.
Written by
Subho Dey
"DevOps engineer with a passion for continuous improvement and a drive to build better software, faster. I'm a strong believer in the power of collaboration, automation, and agile methodologies to transform the world of software development and delivery. My expertise includes continuous integration and delivery, infrastructure as code, Docker, Kubernetes, and configuration management. Follow along as I share my insights and experiences on Hashnode and let's build better software, together!"