Understanding Kubernetes

What is Kubernetes? (K8s for short)

Definition

Before diving into the definition, let's take a moment to appreciate the weird-ass name "Kubernetes." It comes from the Greek word "κυβερνήτης" (kubernētēs), meaning helmsman or pilot - the one who steers the ship.

So, what ship are we referring to? Containerized applications. Kubernetes is the captain 🧑‍✈️ guiding or orchestrating them in the world of infrastructure.

It is technically an open-source container orchestration tool, originally developed by Google and now maintained by the Cloud Native Computing Foundation (CNCF). 🌐

It helps you manage containerized applications in different deployment environments like physical, virtual, or cloud. ☁️

Why do we need it?

Rise of Microservices - The move to microservices architecture has increased the use of container technologies, so we need Kubernetes to orchestrate and manage these containers.

Features of Kubernetes

  • High Availability and Zero Downtime ⏱️

  • Seamless Scalability and High Performance 🚀

  • Robust Disaster Recovery with Backup and Restore 🔄

Kubernetes Main Components

  1. Pod

    A pod consists of one or more containers, with one container running the main application (the others are usually helpers).

    It is an abstraction over containers.

    Each pod gets its own IP address, and pods communicate with each other over these IP addresses. Pods are ephemeral, though: when a pod dies, it is replaced by a new pod with a new IP address. Chasing those changing IPs would be tedious, which is why we use Services & Ingress (a minimal Pod manifest is sketched right after this component list).

  2. Service

    Each pod (or group of pods) gets a 'Service' with a permanent IP address, so if a pod dies, we don't have to worry about communication between pods breaking.

    Service has 2 functionalities

    - Permanent address of pods

    - Load balancing ⚖️

    Example to open the service on browser

    http://124.89.101.2:8080

    Here, 124.89.101.2 is the node's IP address and 8080 is the port the Service is exposed on.

    By the way, this is an external Service. Exposing a Service externally like this is typically only recommended for development or testing purposes, not for production.

    Fundamentally there are two types of Services

    1. Internal Service - Which are used to communicate within the Kubernetes cluster and among the pods.

    2. External Service - Used for testing purposes, the URL is exposed and can be accessed through a browser.

  3. Ingress

    Kubernetes Ingress is an API object that manages external access to services within the K8s cluster, typically over HTTP and HTTPS. 🌐 It provides a way to expose services to the outside world, route requests based on defined rules, and enable features like load balancing ⚖️, SSL termination 🔒, and path-based routing 🛤️ (a minimal Ingress sketch follows the component overview below).

  4. ConfigMap

    External configuration of the application like DB_URL or any environment variables. It is a key-value store used to store configuration data separately from application code.

    Keeping these settings separate from your app's code means you can change them without rebuilding the application image.

    Note - Do not put credentials into ConfigMap 🚫🔑

  5. Secret

    It is like a ConfigMap but designed specifically for sensitive data, such as passwords, API keys, or tokens, stored base64-encoded. 🔐 Keep in mind that base64 is encoding, not encryption; the protection comes from restricting access to the Secret (and optionally enabling encryption at rest).

    ⚠️ Encryption at rest for Secrets is not enabled by default; you have to configure it yourself.

  6. Volume

    Just like Docker containers, pods lose their data when they restart. To counter this, we use Kubernetes Volumes.

    A volume allows data to persist. It can come from different sources, like the host machine, cloud storage, or network drives, and is shared across containers in a pod.

    ⚠️ K8s doesn't manage data persistence for you; you are responsible for backing up and managing the data in your volumes.

  7. Deployment

    K8s can run replicas of your application across nodes, so if one node (or a pod on it) dies, the Service, which also acts as a load balancer, sends traffic to a replica on the least busy node instead.

    To create those replicas, we need to define them. For that we use a “Deployment“, a blueprint for your pods.

    You can scale up / scale down the number of pod replications.

    In practice, you would mostly work with Deployments, not with pods directly.

    Databases can't be replicated via a Deployment, because database pods have state: their replicas would need to share and coordinate access to the same storage.

  8. Statefulset

    A StatefulSet is designed for stateful applications: it keeps a stable identity and persistent storage for each pod across restarts, which is exactly what replicated DB pods need. 💾

    Deployment for State LESS apps like a normal web application 🌐

    Statefulset for State FULL apps like Databases 🗄️

    Deploying a StatefulSet is tedious work, so in practice databases are often hosted outside of the K8s cluster.
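Before moving on to the overview, here is (as mentioned in the Pod section) roughly what a bare Pod manifest looks like. This is only a minimal sketch with hypothetical names; in practice you rarely write Pod manifests directly and use Deployments instead.

apiVersion: v1
kind: Pod
metadata:
  name: my-nginx-pod     # hypothetical pod name, for illustration only
  labels:
    app: my-nginx        # label used by Services/Deployments to find this pod
spec:
  containers:
  - name: nginx          # the main application container
    image: nginx:1.25    # image pulled from Docker Hub
    ports:
    - containerPort: 80  # port the application listens on inside the container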

Overview of Main components of K8s

  • Pod - Abstraction of Containers

  • Service - Communication between Pods

  • Ingress - Route traffic into cluster

  • ConfigMap & Secret - External configuration

  • Volume - Data persistence

  • Deployment - Pod blueprint with a replication mechanism for stateless use cases.

  • Statefulset - A pod blueprint for stateful applications such as databases.

With these components, we can build a pretty powerful Kubernetes Cluster.
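And, as promised in the Ingress section, here is a minimal Ingress sketch. The host name and service name are hypothetical; it assumes an internal Service called my-app-service listening on port 8080 and an Ingress controller installed in the cluster.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress             # hypothetical name
spec:
  rules:
  - host: my-app.example.com       # requests for this host...
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app-service   # ...get routed to this internal Service
            port:
              number: 8080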

Kubernetes Architecture

There are two types of nodes that Kubernetes operates on:

  • Slave / Worker Node 🛠️

  • Master Node 🧠

Worker Node ⚒️

Each node has multiple pods in it.

Worker nodes do the actual work

3 Processes must be installed on every node.

  1. Container runtime (e.g., containerd or Docker)

  2. Kubelet - Interacts with both the container runtime and the node. Kubelet also starts the pod with its container(s) inside.

  3. Kube Proxy - Forwards requests from Services to the correct pods.
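If you are curious, you can see some of this node-level information (including which container runtime a node uses) once a cluster is running:

kubectl get nodes -o wide   # shows each node's internal IP, OS image and container runtime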

So how do you interact with this cluster? How do you schedule a pod, monitor it, restart it, or join a new node?

Answer - All of this managing is done by the Master Node.

Master Node 🧠

4 Processes must run on every master node.

  1. API Server - The component clients interact with (e.g., via the K8s dashboard, kubectl, or the K8s API). It is the cluster gateway and also acts as a gatekeeper for authentication.

    You can also make an update or query through API Server.

    It is the only entry point into the cluster.

  2. Scheduler - Used for scheduling pods.

    The Scheduler only decides which node the new pod should run on; the Kubelet on that node actually starts it.

  3. Controller manager

    Detects cluster state changes; for example, if a pod crashes or dies, it tries to recover the desired state as soon as possible.

  4. Etcd

    It is the cluster brain: every cluster change gets stored here in key-value format.

    How is etcd the brain? - Data.

    The Scheduler & Controller Manager rely on the cluster state data that etcd stores.

    Application data is not stored in etcd.

When there are multiple master nodes, the API Server is load balanced and etcd forms a distributed store across all masters.
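A side note: in a Minikube or kubeadm-based cluster, these master processes themselves run as pods in the kube-system namespace, so once your cluster is up you can list them:

kubectl get pods -n kube-system   # shows kube-apiserver, kube-scheduler, kube-controller-manager, etcd and kube-proxy pods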

Minikube 📦 & Kubectl

Minikube - A local K8s cluster in a virtual machine 📦

It is a test/local cluster setup where Master and Node processes run on ONE machine with docker pre-installed.

Minikube creates a virtual machine (or container) on your laptop, and the node runs inside it. It is a ONE-NODE K8s cluster for testing purposes only.

Kubectl - Command-line tool for K8 cluster

There are 3 ways to access the entry point of the K8s cluster, i.e., the API Server:

  • Kubernetes UI Dashboard - A web-based interface for managing and monitoring the cluster.

  • API - Directly interacting with the Kubernetes API via RESTful endpoints.

  • CLI (Kubectl) - The command-line interface is widely used for its flexibility and scripting capabilities.

The CLI is the most powerful of the three.

Kubectl works not just with a Minikube cluster but also with cloud clusters like AWS EKS, Azure AKS, etc.

Installation ℹ️

You can install Minikube from here🔗, and Kubectl from here🔗.

After installing Minikube and Kubectl, run these commands to verify the installation.

minikube start --driver=docker or minikube start --driver=hyperv

Here, Minikube needs a driver to run on: Docker, or a hypervisor such as Hyper-V.

minikube status

kubectl get nodes

Kubectl CLI - For configuring the Minikube cluster.
Minikube CLI - For starting up or deleting the cluster.

Main Kubectl Commands 🧑🏼‍💻

Basic CRUD operations using Kubectl

Note - We usually don't create pods directly in K8s; instead we use an abstraction called a “Deployment“. A pod is the smallest deployable unit in the K8s cluster.

  1. kubectl get nodes ➡️ Gets the status of nodes

  2. kubectl get pod ➡️ Gets all the pods

  3. kubectl get services ➡️ Gets all the services

  4. kubectl create deployment nginx-depl --image=nginx ➡️ Here, deployment is the component (an abstraction over pods), nginx-depl is the name given to the deployment, and --image=nginx pulls the nginx Docker image from Docker Hub

  5. kubectl get deployment ➡️ Gets all the deployment units

  6. kubectl get replicaset ➡️ Gets all the ReplicaSets (a ReplicaSet manages the replicas of a pod)

  7. kubectl edit deployment nginx-depl ➡️ Opens the deployment's live configuration YAML in an editor, with all the default values filled in. nginx-depl is the deployment name here; replace it with your own deployment's name.

  8. kubectl delete deployment nginx-depl ➡️ Deletes the deployment
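The Deployment section mentioned scaling replicas up and down; here is what that looks like with the nginx-depl deployment created above (a small, optional experiment):

kubectl scale deployment nginx-depl --replicas=3   # scale up to 3 pod replicas
kubectl get replicaset                             # the ReplicaSet now shows 3 desired replicas
kubectl scale deployment nginx-depl --replicas=1   # scale back down to 1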

Debugging a Pod

  1. kubectl logs <POD_NAME> ➡️ Prints the logs of the application running in the pod; <POD_NAME> can be obtained from kubectl get pod

  2. kubectl describe pod <POD_NAME> ➡️ Gives a detailed description of the pod, including its state changes (events) over time.

  3. kubectl exec -it <POD_NAME> -- /bin/bash ➡️ Opens a bash terminal inside the pod's container (-it stands for interactive terminal)
    or

    kubectl exec -it <POD_NAME> -- /bin/sh ➡️ Opens a plain shell (sh) inside the pod's container

  4. kubectl delete -f nginx-depl.yaml ➡️ Deletes the resources defined in the file (-f stands for file); nginx-depl.yaml is, as usual, the example file name.

  5. nano nginx-depl.yaml ➡️ Nano is a simple and easy-to-use text editor, we can use this command to open our yaml file for editing or debugging.

  6. kubectl apply -f nginx-depl.yaml ➡️ After making changes in the YAML file, run this command so that the changes are applied to the cluster.

YAML Configuration File

There are 3 parts to a config file:

  1. Metadata - Consists of name, label, etc.

  2. Specification - The attributes under spec are specific to the kind, whether it's a Deployment, a Service, or any other component.

  3. Status - Automatically generated & added by K8

apiVersion: apps/v1  # Specifies the API version for Deployment.
kind: Deployment     # Indicates this resource is a Deployment.
metadata:
  name: mongodb-deployment  # Name of the Deployment, used to identify it.
spec:
  replicas: 1  # Defines the number of pod replicas to create.
  selector:
    matchLabels:
      app: mongodb  # Ensures the Deployment manages pods with this label.
  template:
    metadata:
      labels:
        app: mongodb  # Labels applied to pods created by this Deployment.
    spec:
      containers:
      - name: mongodb  # Name of the container inside the pod.
        image: mongo:4.2.1  # Specifies the MongoDB image and version to use.
        ports:
        - containerPort: 27017  # The port MongoDB listens on within the container.

K8s continuously works to make the actual state match the desired state (K8s is self-healing).
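You can watch this self-healing behaviour yourself. Assuming the nginx-depl deployment from earlier is still running, delete one of its pods and K8s immediately creates a replacement to get back to the desired replica count:

kubectl get pod                 # note the current pod name
kubectl delete pod <POD_NAME>   # delete that pod
kubectl get pod                 # a new pod (with a new name and IP) is already being created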

Where does the status data come from? Etcd - the brain, it holds the current status of K8s components.

Note - YAML is strict with indentations

In spec there's an attribute called template, which has its own metadata & spec sections.

The template's spec is the blueprint of the pod: container image, name, ports, etc.

Connections are established using labels & selectors. Labels are present in metadata and Selectors are present in spec.

To get detailed information, including the IP address of the pod:

kubectl get pod -o wide   (-o stands for output)

To get the deployment in YAML format:

kubectl get deployment nginx-depl -o yaml   (with this you also get the status section)
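If you want to keep or inspect that live configuration, you can redirect the output to a file (the file name here is just an example):

kubectl get deployment nginx-depl -o yaml > nginx-depl-result.yaml   # saves the live config, including status, to a file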

Mini Project 💻

Now, let’s put all that we’ve learnt to practice!

The diagram below shows the flow of data between the browser and the pods. For demonstration purposes I have used the MongoDB and Mongo Express container applications.

Here, the browser sends a request to the Mongo Express dashboard, for example, to create or update a database or a collection.

To create this mini project, we'll start in reverse order by first setting up the components for MongoDB and then for Mongo Express.

MongoDB

First, we write the YAML for the MongoDB Deployment and save it as mongodb-deployment.yaml.

Remember, Pod = Deployment here: we create the pod through a Deployment.

apiVersion: apps/v1 # Specifies the API version for the Deployment resource
kind: Deployment # Declares this resource as a Deployment
metadata:
  name: mongodb-deployment # Names the Deployment as mongodb-deployment
  labels:
    app: mongodb # Labels used for identifying resources
spec:
  replicas: 1 # Ensures only one replica (pod) of MongoDB is running
  selector:
    matchLabels:
      app: mongodb # Matches pods with the label app: mongodb
  template:
    metadata:
      labels:
        app: mongodb # Assigns the app: mongodb label to the pods
    spec:
      containers: # Container Blueprint starts from here
      - name: mongodb # Name of the container
        image: mongo:4.2.1 # Specifies the MongoDB image version
        ports:
        - containerPort: 27017 # MongoDB container listens on this port
        env:
        - name: MONGO_INITDB_ROOT_USERNAME # Environment variable for MongoDB root username
          valueFrom:
            secretKeyRef:
              name: mongodb-secret # Refers to the Kubernetes Secret containing the username
              key: mongo-root-username # The specific key in the Secret for the username
        - name: MONGO_INITDB_ROOT_PASSWORD # Environment variable for MongoDB root password
          valueFrom:
            secretKeyRef:
              name: mongodb-secret # Refers to the Kubernetes Secret containing the password
              key: mongo-root-password # The specific key in the Secret for the password
--- # This separator starts a new YAML document (a different 'kind' of component) within the same file.
apiVersion: v1 # Specifies the API version for the Service resource
kind: Service # Declares this resource as a Service
metadata:
  name: mongodb-service # Names the Service as mongodb-service
spec:
  selector:
    app: mongodb # Connects to pods with the app: mongodb label
  ports:
    - protocol: TCP # Uses TCP protocol for communication
      port: 27017 # Exposes the Service on port 27017
      targetPort: 27017 # Maps to the container's port 27017

The above file includes both components, the Deployment and the internal Service, separated by ---.

apiVersion: v1 # Specifies the API version for the Secret resource
kind: Secret # Declares this resource as a Secret
metadata:
  name: mongodb-secret # Names the Secret as mongodb-secret
type: Opaque # Indicates the Secret contains arbitrary user-defined data
data:
  mongo-root-username: dXNlcm5hbWU= # Base64-encoded value for the MongoDB root username
  mongo-root-password: cGFzc3dvcmQ= # Base64-encoded value for the MongoDB root password

The above file is a Secret for storing the database's root username and root password, which are referenced in the main deployment file.

You can generate the Base64-encoded values in your Linux terminal -

echo -n <USERNAME OR PASSWORD> | base64
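For example, the values used in the Secret above were generated like this, and base64 -d decodes them right back, which is why base64 alone is not real protection:

echo -n 'username' | base64        # prints dXNlcm5hbWU=
echo -n 'password' | base64        # prints cGFzc3dvcmQ=
echo 'dXNlcm5hbWU=' | base64 -d    # prints username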

After writing both the deployment and the secret, we need to apply the secret first and then the deployment, because the deployment references the secret.

kubectl apply -f mongodb-secret.yaml

kubectl apply -f mongodb-deployment.yaml
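To quickly confirm that both resources were created (assuming the names used above), you can check the Secret and filter the pods by their label:

kubectl get secret mongodb-secret   # the Secret should be listed with type Opaque
kubectl get pod -l app=mongodb      # -l filters pods by label; the MongoDB pod should be Running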

Mongo Express

Creating the Mongo Express deployment is similar to creating the MongoDB deployment. However, I used the Mongo Express 0.49 container image and different environment variables.

apiVersion: apps/v1 # Specifies the API version for the Deployment resource
kind: Deployment # Declares this resource as a Deployment
metadata:
  name: mongo-express # Names the Deployment as mongo-express
  labels:
    app: mongo-express # Labels used for identifying resources
spec:
  replicas: 1 # Ensures only one replica (pod) of Mongo Express is running
  selector:
    matchLabels:
      app: mongo-express # Matches pods with the label app: mongo-express
  template:
    metadata:
      labels:
        app: mongo-express # Assigns the app: mongo-express label to the pods
    spec:
      containers:
      - name: mongo-express # Name of the container
        image: mongo-express:0.49 # Specifies the Mongo Express image version
        ports:
        - containerPort: 8081 # The container listens on this port
        env:
        - name: ME_CONFIG_MONGODB_ADMINUSERNAME # Environment variable for the MongoDB admin username
          valueFrom:
            secretKeyRef:
              name: mongodb-secret # Refers to the Kubernetes Secret containing the username
              key: mongo-root-username # The specific key in the Secret for the username
        - name: ME_CONFIG_MONGODB_ADMINPASSWORD # Environment variable for the MongoDB admin password
          valueFrom:
            secretKeyRef:
              name: mongodb-secret # Refers to the Kubernetes Secret containing the password
              key: mongo-root-password # The specific key in the Secret for the password
        - name: ME_CONFIG_MONGODB_SERVER # Environment variable for the MongoDB server URL
          valueFrom:
            configMapKeyRef:
              name: mongodb-configmap # Refers to the ConfigMap containing the database URL
              key: database_url # The specific key in the ConfigMap for the URL
--- # This separator starts a new YAML document (a different 'kind' of component) within the same file.
apiVersion: v1 # Specifies the API version for the Service resource
kind: Service # Declares this resource as a Service
metadata:
  name: mongo-express-service # Names the Service as mongo-express-service
spec:
  selector:
    app: mongo-express # Connects to pods with the app: mongo-express label
  type: LoadBalancer # Exposes the Service externally via a LoadBalancer
  ports:
    - protocol: TCP # Uses TCP protocol for communication
      port: 8081 # Exposes the Service on port 8081
      targetPort: 8081 # Maps to the container's port 8081
      nodePort: 30000 # Assigns a static NodePort (30000) for external access

The Service defined in this YAML file is external: we just set the type to LoadBalancer and add an extra attribute under ports called nodePort (more on this soon).

Note - the nodePort must be within the range 30000 - 32767 (the default NodePort range).

apiVersion: v1
kind: ConfigMap
metadata:
  name: mongodb-configmap
data:
  database_url: mongodb-service

The above is the ConfigMap YAML file, where the DB URL refers to the MongoDB internal service. 🗂️

Like the Secret, the ConfigMap should also be applied before the pod/deployment that references it, as shown below.
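So, assuming you saved the two files above as mongo-configmap.yaml and mongo-express-deployment.yaml (the file names are just examples), the apply order would be:

kubectl apply -f mongo-configmap.yaml           # ConfigMap first, because the deployment references it
kubectl apply -f mongo-express-deployment.yaml  # then the Mongo Express deployment and its service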

Finally, to open Mongo Express in your browser after applying the deployment files, use this command in the terminal -

minikube service mongo-express-service

This command gives the service a reachable URL (creating a tunnel through Minikube if needed) and prints it. You can open this URL in your browser to access the Mongo Express dashboard running on the Kubernetes cluster. 🌐
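If you prefer to do it manually instead of using the minikube service shortcut, you can look up the NodePort and the node IP yourself and open http://<MINIKUBE_IP>:30000 in the browser (with the Docker driver you may still need the minikube service tunnel):

kubectl get service mongo-express-service   # shows type LoadBalancer and the 8081:30000 port mapping
minikube ip                                 # prints the IP address of the Minikube node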

The image below shows an overview of how data flows in the Kubernetes cluster.

That’s it!

Conclusion 🎉

If you've made it this far, congratulations! 🎊 You've gained impressive knowledge about container orchestration with Kubernetes. I hope this article helped you understand the basics of managing multiple pods as an abstraction of containers. I'm planning to write another article on Kubernetes that will explore deeper concepts like K8s Ingress, Namespaces, Helm, Volumes, Statefulset, and more. 🚀

Stay relevant and keep learning! 📚

Adios 👋
