HA Kubernetes Cluster and Kubeadm

Rohit Pagote

Hosting Production Applications

  • High availability multi-node cluster with multiple master nodes (important)

  • Deploy with kubeadm, on GCP (GKE), with kops on AWS, or on other supported platforms

  • Up to 5000 nodes

  • Up to 150,000 PODs in the cluster

  • Up to 300,000 total containers

  • Up to 100 PODs per node

K8s on local machine

Minikube

  • Minikube deploys a single node cluster easily.

  • It relies on virtualization software such as Oracle VirtualBox to create the virtual machine that runs the Kubernetes cluster components.

  • Minikube provisions the VM with a supported configuration by itself.
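
  • For example (the driver name below is an assumption and depends on the hypervisor installed locally), a single-node cluster can be started with:

      # Start a single-node cluster; the driver depends on your local hypervisor
      minikube start --driver=virtualbox

      # Verify that the single node is up
      kubectl get nodes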

Kubeadm

  • The kubeadm tool can be used to deploy a single-node or a multi-node cluster quickly.

  • But for this, you must provision the required hosts with a supported configuration yourself.

  • Kubeadm expects the VMs to be provisioned already.


Configure HA

High Availability in Kubernetes

  • What happens when you lose the master node in your cluster? As long as the worker nodes are up and the containers are alive, your applications are still running.

  • Users will be able to access the applications until things start to fail.

  • Ex: A container or a pod on a worker node crashes. If the pod is part of a ReplicaSet, the replication controller on the master needs to instruct the worker to load a new pod, but the master is not available, and neither are the controllers and schedulers running on it. There is no one to create a new pod and no one to schedule it on the nodes. Similarly, since the kube-apiserver is not available, you cannot access the cluster externally through the kubectl tool or the API for management purposes.

  • This is why you must consider multiple master nodes in a high availability configuration.

  • Running a single-node control plane makes every control plane component a single point of failure.

  • A high availability configuration is where you have redundancy across every component in the cluster so as to avoid a single point of failure.

Configure HA

  • The master node hosts the control plane components:

    • ETCD Cluster

    • Kube API Server

    • Kube Controller Manager

    • Kube Scheduler

  • In an HA setup, with an additional master node, you have the same components running on the new master as well.

  • The nature of each control plane component differs when deployed as multiple copies across nodes: some components use leader election, while others sit behind a load balancer.

Kube API Server

  • The API server is a stateless application that primarily interacts with the ETCD data store to store and retrieve cluster information. It works on one request at a time, independently of the other instances.

  • So the API servers on all master nodes can be alive and running at the same time, in an active-active mode.

  • The kubectl utility talks to the API server to get things done, and we point kubectl to the master node using the server address in the kubeconfig file (or the --server option).

  • Now with two API servers, we can send a request to either one of them, but we shouldn't be sending the same request to both.

  • For that purpose, we configure a load balancer in front of the master nodes that splits traffic between the API servers.

  • We then point the kubectl utility to that load balancer.
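
  • As a rough sketch (the load balancer address and port below are placeholder values), pointing kubectl at the load balancer instead of a single master looks like:

      # Point kubectl at the load balancer fronting the kube-apiservers
      # (address and port are placeholder values)
      kubectl config set-cluster kubernetes \
        --server=https://192.168.56.30:6443 \
        --certificate-authority=/etc/kubernetes/pki/ca.crt

      # Requests now go through the load balancer to whichever API server it picks
      kubectl get nodes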

Kube Scheduler and Kube Controller Manager

  • The kube-scheduler is involved in pod scheduling activities, and only one instance can make decisions at a time.

  • The kube-controller-manager consists of controllers, like the replication controller, that constantly watch the state of pods and take the necessary actions, such as creating a new pod when one fails.

  • If multiple instances of these run in parallel, they might duplicate actions, resulting in more pods than actually needed.

  • The same is true with Scheduler.

  • Therefore, they must not run in parallel. They run in an active-standby mode.

  • Active-standby mode is achieved using a leader-election process.

  • Process:

    • Consider the controller manager, for instance.

    • When the controller manager process is configured, you may specify the --leader-elect option, which is set to true by default:

      kube-controller-manager --leader-elect true [other options]

    • When the controller manager process starts, it tries to gain a lease, or lock, on an endpoint object in Kubernetes named kube-controller-manager.

    • Whichever process updates the endpoint with its information first gains the lease/lock and becomes the active instance. The others become passive/standby.

    • It holds the lock for the lease duration specified by the --leader-elect-lease-duration option (default 15s).

    • The active process then renews the lease every 10s, which is the default value of the --leader-elect-renew-deadline option.

    • Both processes try to become the leader every 2s, as set by the --leader-elect-retry-period option.

      kube-controller-manager --leader-elect true [other options] \
                              --leader-elect-lease-duration 15s \
                              --leader-elect-renew-deadline 10s \
                              --leader-elect-retry-period 2s

    • The scheduler follows a similar approach and has the same command-line options.
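
    • As a quick check (assuming the leader identity is still recorded on the Endpoints object; newer clusters record it on a Lease object in the coordination.k8s.io API instead), you can see which instance currently holds the lock:

      # Older clusters: leader identity is an annotation on the Endpoints object
      kubectl -n kube-system get endpoints kube-controller-manager -o yaml

      # Newer clusters: leader election uses a Lease object instead
      kubectl -n kube-system get lease kube-controller-manager -o yaml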

ETCD

  • With ETCD, there are two topologies that you can configure in Kubernetes.

  • Stacked Topology:

    • The ETCD data store resides on, and is part of, the same node as the other control plane components.

    • It is easier to setup and manage and requires fewer nodes.

    • It is riskier during failures: if a node goes down, both an ETCD member and the control plane instance on it are lost.

  • External ETCD Topology:

    • The ETCD data store is separated from the control plane nodes and runs on its own set of servers, known as ETCD servers.

    • Less risky as a failed control plane node does not impact the etcd cluster and the data it stores.

    • It is harder to setup and manage and requires more servers.

  • We specify the list of ETCD servers in the kube-apiserver configuration, as shown below.
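
  • For instance (the IP addresses below are placeholders), the kube-apiserver is pointed at all ETCD members via the --etcd-servers flag:

      # kube-apiserver flags for an external 3-member ETCD cluster
      # (IP addresses and certificate paths are placeholder values)
      kube-apiserver \
        --etcd-servers=https://10.0.0.11:2379,https://10.0.0.12:2379,https://10.0.0.13:2379 \
        --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt \
        --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt \
        --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key \
        [other options]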

ETCD in HA

  • ETCD also follows a leader-election process when working in an HA environment.

  • Say we have a 3-node ETCD cluster. ETCD does not process writes on each node independently. Instead, only one of the instances is responsible for processing writes.

  • Internally, the nodes elect a leader among themselves. One node becomes the leader and the other nodes become followers.

  • If a write comes in through the leader node, the leader processes the write and makes sure the other nodes (followers) are sent a copy of the data.

  • If a write comes in through any of the follower nodes, they forward the write to the leader internally, and the leader then processes it.

  • A write is only considered complete once the leader gets consent from the other members of the cluster (followers).

  • ETCD implements distributed consensus (including leader election) using the RAFT protocol.

  • A write is considered to be complete if it can be written on the majority of the nodes in the cluster.

    Majority = N/2 + 1, where N = number of instances in the ETCD cluster

  • Majority is also known as Quorum.

  • Quorum is a minimum number of nodes that must be available for the cluster to make a successful write.

  • It is recommended to run a minimum of 3 ETCD instances in the cluster in order to achieve HA (the quorum of 3 is 2). This gives a fault tolerance of 1 node, i.e. the cluster keeps working if 1 node fails.

  • Fault tolerance is the number of instances minus the quorum (see the table below).

    Instances   Quorum   Fault Tolerance
    1           1        0
    2           2        0
    3           2        1
    4           3        1
    5           3        2
    6           4        2
    7           4        3

  • When deciding on a number of master nodes, it is recommended to select an odd number, 3 or 5 or 7.

  • An even number of instances (2, 4, 6) has a chance of leaving the cluster without quorum in certain network partition scenarios.
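
  • As a sanity check on a running cluster (the endpoints and certificate paths below are assumptions for a kubeadm-style setup), etcdctl can list the members and show which endpoint is currently the leader:

      # List members and show the current leader (see the IS LEADER column)
      export ETCDCTL_API=3
      etcdctl --endpoints=https://10.0.0.11:2379,https://10.0.0.12:2379,https://10.0.0.13:2379 \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        member list

      etcdctl --endpoints=https://10.0.0.11:2379,https://10.0.0.12:2379,https://10.0.0.13:2379 \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        endpoint status --write-out=table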


Deployment with Kubeadm

  • We have 3 VMs: 1 master node and 2 worker nodes.

Steps:
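
  • A rough sketch of the typical kubeadm flow on these VMs (the pod network CIDR is an assumption and must match the CNI plugin you choose; OS-specific package installation steps are omitted):

      # On all three nodes: install a container runtime, kubeadm, kubelet and kubectl

      # On the master node: initialise the control plane
      kubeadm init --pod-network-cidr=10.244.0.0/16

      # Still on the master: configure kubectl for the admin user
      mkdir -p $HOME/.kube
      sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
      sudo chown $(id -u):$(id -g) $HOME/.kube/config

      # Install a pod network add-on (any supported CNI plugin)

      # On each worker node: join the cluster with the command printed by kubeadm init
      kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>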

