Learning Kubernetes: Week 1 - Core Concepts & Cluster Architecture


To be honest, this is not only what I learned this week; it combines what I started a few months ago with what I learned in the last two days.
Core Concepts
Cluster Architecture
The purpose of K8s is to deploy your application in the form of containers, in an automated fashion, so that you can easily run as many instances of your application as required
and easily enable communication between the different parts of your application
Worker Node: hosts applications as containers
Master Node: Manage, plan, schedule, monitor nodes
uses the ETCD cluster to store information about the cluster, such as which applications are deployed, when they were deployed, and on which nodes, all stored as key-value pairs
A kube-scheduler → identifies the right node to place a container on based on the container’s resource requirement, the worker node’s capacity, and any other configurations
Controllers → take care of different areas
Node Controller → responsible for onboarding new nodes to the cluster
and for handling situations where nodes become unavailable or get destroyed
Replication Controller → makes sure that the desired number of containers are running at all times in a replication group
Kube API-server → responsible for orchestrating all operations within the cluster
- It exposes the K8s API that external users use to perform management operations on the cluster, and that the various controllers use to monitor the state of the cluster and make necessary changes as required
As everything is run as a container, we need a Container Engine to run those containers
DOCKER → Container Engine installed on all the nodes (worker and master)
It doesn’t always have to be Docker; k8s supports other container runtimes as well, like containerd or rkt (Rocket)
kubelet (captain)
It is an agent that runs on each node in a cluster.
It listens for instructions from the kube-apiserver and deploys or destroys containers on the node as required
kube-api-server periodically fetches data from the kubelet to monitor the status of nodes with containers on them
Kube-proxy
This service ensures that the necessary rules are in place on the worker nodes so that the containers running on them can reach each other
Docker vs Containerd
In the beginning, K8s was built to orchestrate Docker specifically
As K8s grew in popularity, users wanted to be able to use K8s with other container engines like RKT(Rocket)
So Kubernetes introduced the CRI (Container Runtime Interface)
CRI allows any vendor to plug in as a container runtime, as long as they adhere to the OCI standards
OPEN CONTAINER INITIATIVE (OCI)
imagespec → specifications on how an image should be built
runtimespec → standards on how any container runtime should be developed
But Docker wasn’t built to support the CRI standard, as it existed before CRI was introduced, and since it was the dominant container runtime at the time, Kubernetes had to keep supporting it
K8s came up with dockershim → a hacky, temporary way to continue supporting Docker outside the CRI
Docker is made up of many components; one of them is containerd, the daemon that actually runs the containers → containerd is CRI-compatible and can be used as a runtime on its own, separate from Docker
In v1.24, Kubernetes removed dockershim completely.
If you don’t require Docker’s other features, you can use containerd directly (a graduated CNCF project)
Containerd
It has its own CLI called ctr
Not very user-friendly
It only supports a limited set of features
For anything else, you have to make API calls directly, which is not very convenient
ctr
ctr images pull <image-name>
ctr run <image-name>
NerdCTL
A better alternative is nerdctl
Provide a Docker-like CLI for ContainerD
nerdctl supports Docker Compose
nerdctl supports the newest features in containerd
Encrypted container images
Lazy pulling
P2P image distribution
Image signing and verifying
Namespaces in Kubernetes
nerdctl
nerdctl run --name redis redis:alpine
nerdctl run --name webserver -p 80:80 -d nginx
CRICTL
crictl provides a CLI for CRI-compatible container runtimes
Installed separately
Used to inspect and debug container runtimes
- Ideally not used to create containers
Works across different runtimes
crictl
crictl pull <image-name>
crictl images
crictl ps -a
crictl exec -i -t <container-id> ls
crictl logs <container-id>
crictl pods # list pods
In v1.24
- The dockershim.sock was replaced by the containerd socket
unix:///run/containerd/containerd.sock
unix:///run/crio/crio.sock
unix:///var/run/cri-dockerd.sock
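crictl needs to be pointed at the runtime's socket; a minimal sketch using the containerd endpoint listed above (you can also persist this in /etc/crictl.yaml):
crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a
# or, in /etc/crictl.yaml:
# runtime-endpoint: unix:///run/containerd/containerd.sock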
ETCD
It is a distributed, reliable key-value store that is Simple, Secure, & Fast
key-value store
stores information in the form of keys and values, like a small document or file for each key
You can add additional information to one of those documents without having to change all the others
The default client that comes with etcd is the etcdctl client
./etcdctl set key1 value1 # creates entry in the DB
./etcdctl get key1 # get value of key
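Note that set and get above are etcd v2-style commands; with the v3 API (the default in etcdctl 3.4 and later), the equivalent operations are:
./etcdctl put key1 value1
./etcdctl get key1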
It is a leader-based distributed system. Ensure that the leader periodically sends heartbeats on time to all followers to keep the cluster stable
You should run etcd as a cluster with an odd number of members. Any resource starvation can lead to a heartbeat timeout, causing instability of the cluster. An unstable etcd indicates that no leader is elected. Under such circumstances, a cluster cannot make any changes to its current state, which implies that no new pods can be scheduled.
etcdctl and etcdutl
→ Both are command-line tools for interacting with etcd clusters, but they serve different purposes
etcdctl → the primary CLI client for interacting with etcd over a network
- used for day-to-day operations → managing keys and values, administering the cluster, checking health, and more
etcdutl → an administration utility designed to operate directly on etcd data files, including:
- migrating data between etcd versions
- defragmenting the database
- restoring snapshots
- validating data consistency
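A hedged sketch of etcdutl usage (available from etcd 3.5 onwards; paths are illustrative):
etcdutl defrag --data-dir /var/lib/etcd                                  # defragment the database files
etcdutl snapshot restore snapshot.db --data-dir /var/lib/etcd-restored   # restore a snapshot into a new data dir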
Commands
backup → backup an etcd directory
cluster-health → check the health of the etcd cluster
mk → make a new key with a given value
mkdir → make a new directory
rm → remove a key or a directory
rmdir → remove an empty directory
get → retrieve the value of key
ls → retrieve a directory
set → set the value of a key
setdir → create a new directory or update an existing directory's TTL
update → update an existing key with a given value
updatedir → update an existing directory
watch → watch a key for changes
exec-watch → watch a key for changes and exec an executable
member → member add, remove, and list subcommands
user → add, grant, and revoke subcommands
role → role add, grant, and revoke subcommands
| KIND | Version |
| --- | --- |
| POD | v1 |
| Service | v1 |
| ReplicaSet | apps/v1 |
| Deployment | apps/v1 |
Kube Controller Manager
A controller is like an office or department on the master ship, with its own set of responsibilities, that takes important action whenever a “ship” enters, leaves, changes, or is destroyed
These offices are:
- on a continuous lookout for the status of the ships
- taking necessary actions to remediate the situation
In K8s terms, a controller is a process that continuously monitors the state of various components within the system and works towards bringing the whole system to the desired state
The Node Controller
The Node Controller is responsible for monitoring the status of nodes & taking the necessary action to keep the application running → It does that via kube-apiserver
The node controller checks the status of the nodes every 5 seconds so that it can monitor their health
If it stops receiving heartbeats from a node, it waits for 40 seconds before marking the node as UNREACHABLE
After the node is marked UNREACHABLE, it gives it 5 minutes to come back up. After that, it removes the pods assigned to that node and provisions them on another node, if they are part of a ReplicaSet
The Replication Controller
It monitors the status of replica sets and ensures that the desired number of pods are available in all sets
If a pod dies, it creates another one
How do you see these controllers, and where are they located in your cluster?
They are all packaged into a single process known as Kube-Controller-Manager
When you install the Kube-controller-manager, the different controllers get installed as well
Download the kube-controller-manager binary from the Kubernetes release page, install it, and run it as a service. When you run it, you will see a list of options you can configure, including the ones we discussed (a hedged example of these flags follows the list below):
node-monitor-period
node-monitor-grace-period
pod-eviction-timeout
There is a specific option ‘controllers’ to set which controller to enable
By default, all of them are enabled
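As a hedged illustration, the options above map to kube-controller-manager flags like these (values shown are the defaults discussed earlier):
kube-controller-manager \
  --node-monitor-period=5s \
  --node-monitor-grace-period=40s \
  --pod-eviction-timeout=5m0s \
  --controllers='*'   # enable all controllers (the default)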
How do you view your kube-controller-manager server options?
Depends on how you have set it up. If you set it up via the kubeadm tool, kubeadm deploys the kube-controller-manager as a pod in the kube-system namespace on the master node
You can see the options within the pod definition file created at
/etc/kubernetes/manifests/kube-controller-manager.yaml
In a non-kubeadm setup, you can inspect the options at the following path:
/etc/systemd/system/kube-controller-manager.service
You can also see the running process and its effective options by searching for the process on the master node
ps -aux | grep kube-controller-manager
Kube Scheduler
Responsible for scheduling pods on nodes (only deciding which pod goes on which node); it doesn’t actually place them there, that is the job of the kubelet
Kubelet creates pod on the ship
Why need a Scheduler?
Because there are many pods, you want to make sure that the right container goes on the right ship
In K8s , the scheduler decides which node the pods are placed on, depending on certain criteria.
You may have a pod with different resource requirements
You can have nodes in a cluster dedicated to certain applications
Scheduler looks at each pod and tries to find the best node for it
It has a set of memory and CPU requirements
Scheduler goes through two phases to identify the best node for the pod
Filter Nodes
Rank Nodes
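The resource numbers the scheduler filters and ranks on come from the pod's resource requests; a minimal sketch of such a pod spec (name and numbers are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: big-app-pod
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests:      # nodes that cannot satisfy these requests are filtered out
          cpu: "10"
          memory: 4Gi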
Install kube-scheduler
Get the kube-scheduler binary from Kubernetes docs page
Run it as a service
View kube-scheduler options via kubeadm
/etc/kubernetes/manifests/kube-scheduler.yaml
ps -aux | grep kube-scheduler
Kubelet
It’s like a captain on a ship
Leads all activities on the ship, is the sole point of contact with the master ship, and sends back reports at regular intervals
The kubelet on a k8s worker node registers the node with the Kubernetes cluster
When it receives instructions to load a container or a pod on the node, it requests the container run-time engine to pull the required image and run an instance
The kubelet then monitors the node and the pod and sends reports to the kube-api server on a regular basis
Unlike the other components, the kubelet is NOT deployed automatically by kubeadm → YOU MUST ALWAYS INSTALL THE KUBELET MANUALLY on your worker nodes
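As with the other components, you can inspect the running kubelet and its options on a node:
ps -aux | grep kubelet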
Kube Proxy
Within a k8s cluster, every pod can reach every other pod. This is accomplished by deploying a pod networking solution to a cluster
Pod Network: an internal virtual network that spans all the nodes in the cluster and to which all the pods connect.
- Through this network, they are able to communicate with each other
Eg → a web application deployed on the first node and a DB on the second
The web app can reach the DB simply by using the IP of the DB pod
but there is no guarantee that the IP of the DB pod will always remain the same
A better way for the web app to access the DB is by using a service.
Create a Service to expose the DB application across the cluster
web app can now access the DB using the name of the service
Service also gets an IP address assigned to it
Whenever a pod tries to reach the service, using its IP or name, the service forwards the traffic to the DB pod (the backend)
The service cannot join the pod network, because the service is not an actual thing → it is not a container like a pod; it is a virtual component that lives only in Kubernetes memory. It does not have any actively listening process
So, how is service accessible across the cluster from any node?
Via kube-proxy → a process that runs on each node in a k8s cluster; its job is to look for new services
Every time a new service is created, it creates the appropriate rules on each node to forward traffic destined for that service to the backend pods
One way it does this is by creating iptables rules on each node in the cluster, forwarding traffic heading to the IP of the service to the IP of the actual pod
Install kube-proxy
Download the binary from the Kubernetes release page and run it as a service
The kubeadm tool deploys kube-proxy as pods on each node; in fact, it is deployed as a DaemonSet, so a single kube-proxy pod always runs on every node in the cluster
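To see this on a kubeadm cluster (the DaemonSet name kube-proxy is the kubeadm default):
kubectl get daemonset kube-proxy -n kube-system
kubectl get pods -n kube-system -o wide | grep kube-proxy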
Pods
Assumption:
Docker images have already been created
The Kubernetes cluster has already been set up and is running
All services are in a running state
With K8s, our ultimate aim is to deploy our application in the form of containers on a set of machines that are configured as worker nodes in a cluster
K8s does not deploy containers directly on the worker nodes
Containers are encapsulated into a Kubernetes object known as pods
A pod is a single instance of an application
A pod is the smallest object that you can create in Kubernetes
pod-definition via YAML
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels: # to mark the pod for later use (can have any number of key-value pairs)
    app: myapp
    type: front-end
spec:
  containers: # List/Array
    - name: nginx-container
      image: nginx
# To create the pod from the file
kubectl create -f <filename>
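To check that the pod from the definition above came up:
kubectl get pods
kubectl describe pod myapp-pod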
Controllers → brain behind k8s
- They are the processes that monitor the k8s objects and respond accordingly
ReplicaSets
What is a Replica? Why do we need a replication controller?
If there is only a single pod running our application and that pod fails, the entire application goes down
- In order to prevent users from losing access to our application, we would like to have more than one instance of our application at the same time (Fault Tolerance)
High Availability: Replication Controller allows us to be able to run multiple instances of our application at the same time
Do we still need a replication controller if we have a single pod? → Yes
- Even if we have a single pod, in case that pod fails, the replication controller will automatically bring up a new pod
Load Balancing & Scaling: We need a Replication Controller to run multiple pods to share the load across them
Eg: If the number of users accessing the app increases, we increase the number of pods. If users increase further and we run out of space on the node, the replication controller allows us to run additional pods across multiple nodes in the cluster
Replication controller
- It is the older technology that is being replaced by ReplicaSet
ReplicaSet
The newer, recommended way to set up replication
rc-definition.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: myapp-src
  labels:
    app: myapp
    type: front-end
spec:
  template: # pod template
    metadata:
      name: myapp-pod
      labels:
        app: myapp
        type: front-pod
    spec:
      containers:
        - name: nginx-container
          image: nginx
  replicas: 3
kubectl create -f rc-definition.yaml
kubectl get replicationcontroller
kubectl get replicaset
kubectl get pods
replicaset.yaml
(selector is optional in ReplicationController, but required here)
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: myapp-replicaset
  labels:
    app: myapp
    type: front-end
spec:
  template:
    metadata:
      name: myapp-pod
      labels:
        app: myapp
        type: front-pod
    spec:
      containers:
        - name: nginx-container
          image: nginx
  replicas: 3
  selector: # identifies which pods fall under this ReplicaSet, since it can also manage pods that were not created by this YAML file
    matchLabels:
      type: front-pod
Labels and Selectors
The role of the ReplicaSet is to make sure the desired number of replicas are running in the system at all times. In case any pod fails, it immediately deploys a new one to replace it
ReplicaSet is in fact a process that monitors the pods
How does ReplicaSet know which pod to monitor
- Labelling works as a filter to query the pods that we want to monitor
If pods already exist that we filter and monitor via the ReplicaSet, why do we need to define a template for the pod in the ReplicaSet?
So that in case the ReplicaSet wants to deploy a new pod, it has the information it needs to create one
How to update the replicas from a Replicaset
- Change the number of replicas in the YAML file and then apply
kubectl replace -f replicaset-definition.yaml
kubectl scale --replicas=6 -f replicaset-definition.yaml
Setting it via type and name (this won’t change anything in the definition file):
kubectl scale --replicas=6 replicaset myapp-replicaset
Automatically scaling based on load
kubectl delete replicaset myapp-replicaset # Also deletes all underlying PODs
kubectl replace -f replicaset-definition.yaml
Deployments
If you want to deploy your application in a production environment, you will want many instances of the application running, for obvious reasons
Whenever a new version of the builds is updated on the Docker registry, you would like to upgrade your instances seamlessly.
However, when you want to upgrade your instances, you don’t want to upgrade them all at once, as this may impact users accessing your application (Rolling Updates)
In case any of the updates causes an issue in your instances, you would like to roll back your changes.
Suppose you want to make multiple changes to your environment. You don’t want to apply each change immediately after the command is run; instead, you would like to pause your environment, make the changes, and then roll them out together
All of these capabilities are available in K8s Deployments
Deployment: Kubernetes Object that comes higher in the hierarchy
- Provides us with the capability to upgrade the underlying instance seamlessly using Rolling Updates (which allow for undo changes, pause and resume changes as required)
How do we create a deployment?
Create a Deployment definition file; its contents are exactly similar to those of a ReplicaSet, except for kind: Deployment
deployment-definition.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deployment
  labels:
    app: myapp
    type: front-end
spec:
  template:
    metadata:
      name: myapp-pod
      labels:
        app: myapp
        type: front-pod
    spec:
      containers:
        - name: nginx-container
          image: nginx
  replicas: 3
  selector: # identifies which pods fall under this Deployment, since it can also manage pods that were not created by this YAML file
    matchLabels:
      type: front-pod
kubectl create -f deployment-definition.yaml
kubectl get deployments
kubectl get all # to see all the created resources at once
- This creates a ReplicaSet, which in turn creates pods, so you can view them too
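The rolling update, rollback, and pause/resume capabilities mentioned above are driven through the kubectl rollout subcommands; a quick sketch using the deployment created above:
kubectl rollout status deployment/myapp-deployment
kubectl rollout history deployment/myapp-deployment
kubectl rollout undo deployment/myapp-deployment
kubectl rollout pause deployment/myapp-deployment
kubectl rollout resume deployment/myapp-deployment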
Services
K8s services enable communication between various components
It helps us connect applications together
Services make it possible for the frontend application to be made available to end users
It helps communication between backend and frontend pods and helps in connectivity to an external datasource
Services enable loose coupling between microservices in our Application
Service Type
NodePort: The service makes an internal pod port accessible on a port on the node
Cluster IP: The service creates a virtual IP inside the cluster to enable communication between different services
Load Balancer: It provisions a load balancer for our application in a supported cloud provider
NodePort
A service can help us by mapping a port on the node to a port on the pod
There are 3 ports involved: the port on the pod, where the actual web server is running ⇒ targetPort
the port on the service itself ⇒ port
- These terms are from the viewpoint of the service
Service → is like a virtual server inside the node
Inside the cluster, it has its own IP address, and that IP address is called the ClusterIP of the service
And finally, we have the port on the node itself, which we use to access the web server externally ⇒ Node PORT
A NodePort can only be in a valid range, which by default is from 30000 to 32767
How to create a service?
service-definition.yaml
If you don’t provide a targetPort, it is assumed to be the same as port
If you don’t provide a nodePort, a free port within the valid range is allotted automatically
You can have multiple port mappings within a single service, as ‘ports’ is an array
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  type: NodePort
  ports:
    - targetPort: 80
      port: 80 # port on the service object
      nodePort: 30008
  selector:
    app: myapp
    type: front-end # taken from the pod we want to expose
kubectl create -f service-definition.yaml
kubectl get services
# when the service is created, it looks for matching pods with the given labels; it then selects all of them as endpoints to forward external traffic to
# it uses a random algorithm to select which pod to send the request to
What if the pods are distributed across multiple nodes?
- In this case, we have a web application on pods on separate nodes in a cluster
When we create a service, without us having to do any additional configuration, Kubernetes automatically creates a service that spans across all nodes in the cluster and maps the target port to the same node port on all the nodes
This way, you can access your application using the IP of any node in the cluster and using the same port number, which in this case is 30,008
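For example (node IPs are illustrative):
curl http://192.168.1.2:30008
curl http://192.168.1.3:30008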
To summarize, in any case, whether it be a single Pod on a single node, multiple Pods on a single node, or multiple Pods on multiple nodes, the service is created exactly the same, without you having to do any additional steps during the service creation.
When Pods are removed or added, the service is automatically updated, making it highly flexible and adaptive.
Once created, you won't typically have to make any additional configuration changes.
Services → Cluster IP
A full-stack application has frontend, backend, db , and datastore pods; they all need to communicate with each other
What is the best way to do so?
Pods have IP addresses assigned to them, but these IPs, as we know, are not static
What if one pod needs to connect to the backend service? Which backend pod would the request go to? And who makes that decision?
A k8s service can help us group the pods together and provide a single interface to access the pods
The requests are forwarded to one of the pods under the service randomly
This enables us to easily & effectively deploy a microservices-based application on k8s cluster
Each layer can now scale or move as required without impacting communication
Each service gets an IP and name assigned to it inside the cluster, and that is the name that other pods should use to access the service ⇒ CLUSTER IP
service-definition.yaml
apiVersion: v1
kind: Service
metadata:
  name: back-end
spec:
  type: ClusterIP # default type
  ports:
    - targetPort: 80 # backend is exposed
      port: 80 # service is exposed
  selector:
    app: myapp
    type: back-end
Services → Load Balancer
The services with type Nodeport help in receiving traffic on the ports on the nodes and routing the traffic to the respective ports
But what URL would you give your end users to access the application? (you only have node IP and port combinations)
One way to achieve this is to create a new VM for load-balancing purposes and install a suitable load balancer on it, like HAProxy or NGINX, then configure the load balancer to route traffic to the underlying nodes
Another method is to use the native load balancer of a supported cloud platform, as Kubernetes supports integrating with the native load balancers of certain cloud providers and configuring them for us
Set the service type to LoadBalancer instead of NodePort
Remember, this only works with supported cloud platforms: GCP, AWS, Azure
In an unsupported environment, it works exactly like NodePort, where the service is exposed on a high port of the nodes
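A minimal sketch of the same service switched to the LoadBalancer type (reusing the labels from the earlier examples):
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  type: LoadBalancer
  ports:
    - targetPort: 80
      port: 80
      nodePort: 30008
  selector:
    app: myapp
    type: front-end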
Namespaces
Whatever we do in k8s, we do in a namespace (house)
If we don’t specify a namespace, our objects go into the “default” namespace, which is created automatically when the cluster is first set up
K8s creates a set of pods and services for internal purposes, such as those required by the network solution, the DNS service, etc.
- To isolate these from the user and to prevent you from accidentally deleting or modifying these services, Kubernetes creates them under the namespace “kube-system” → also created at cluster startup
Another namespace created by k8s is kube-public; this is where resources that should be made available to all users are created
You can create your own namespaces as well
Each of these ns can have its own set of policies, which define who can do what
- You can also assign a quota of resources to each of these namespaces; that way, each namespace is guaranteed a certain amount and does not use more than its allowed limit
The resources within a namespace can refer to each other simply by using their name
If required, to reach a resource in another namespace, you must append the name of ns to the name of the resource
- Eg→ servicename.namespace.svc.cluster.local
You are able to do this because when a service is created, a DNS entry is added in this format
cluster.local → default domain name of K8s cluster
svc → subdomain of service
kubectl get pods # list pods in default ns
kubectl get pods -n dev # list pods in dev ns
kubectl get pods --namespace=kube-system
kubectl create -f pod-definition.yml --namespace=dev
# create pod in ns = dev
# you can also add namespace: dev under metadata of pod-definition.yml
kubectl create namespace dev
# to switch to another namespace, so you don't have to specify the namespace with each command, use this
kubectl config set-context $(kubectl config current-context) --namespace=dev
kubectl get pods --all-namespaces # list all pods in all namespace
namespace-def.yml
apiVersion: v1
kind: Namespace
metadata:
  name: dev
Imperative vs Declarative
Specifying what to do and how to do it is an Imperative approach
Specifying the final destination without going over any step-by-step instructions, the system figures out the right path (specifying what to do, not how to do) is the Declarative Approach
In Kubernetes, this translates into two ways of managing objects
Imperatively → with many kubectl commands
- good for learning and interactive experimentation
kubectl edit pod <pod-name> → make changes in the k8s memory object
- Make changes in the pod-definition file and then perform kubectl replace -f nginx.yml
Declaratively → by writing manifests and using kubectl apply
The latter is good for reproducible deployments
In this approach, instead of creating or replacing the object, we use the kubectl apply command to manage the object
This command is intelligent enough to create the object if it doesn’t exist. If there are multiple object configuration files, as you usually would have, you may specify the directory path instead of a single file
That way, all the objects are created at once
If the object exists, make updates to the object
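For instance (the directory path is illustrative):
kubectl apply -f nginx.yml              # a single definition file
kubectl apply -f /path/to/config-files/ # every definition file in the directory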
resource-quota.yml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: dev
spec:
  hard:
    pods: "10"
    requests.cpu: "4"
    requests.memory: 5Gi
    limits.cpu: "10"
    limits.memory: 10Gi
kubectl apply
The apply command takes into consideration the local configuration file, the live object definition on K8s, and the last applied configuration before making a decision on what changes are to be made
So, when you run the apply command
If the object doesn’t exist, it gets created.
When an object is created, an object configuration, similar to what we created locally, is created within Kubernetes → with additional fields to store the status of the object → live configuration of the object on the k8s cluster
When you run a kubectl apply command, the YAML version of the local object configuration file we wrote is converted to a JSON format, and it is then stored as the last applied configuration.
Going forward, for any updates to the object, all three are compared to identify what changes are to be made to the live object
Once I make changes, → run kubectl apply → live configuration is updated, and then last applied configuration (JSON one) is updated
Why do we need the last applied configuration?
If a field is deleted from the local file and we then run the kubectl apply command, we see that the last applied configuration had that field → meaning the field needs to be removed from the live configuration
The last applied configuration helps us figure out what fields have been removed from the local file
We know that the local file is stored on our system, the live configuration is stored in Kubernetes memory, and the last applied configuration (the JSON one) is stored within the live configuration itself, under the annotation kubectl.kubernetes.io/last-applied-configuration
Only kubectl apply stores this last-applied-configuration annotation; the create and replace commands do not
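A quick way to see it (assuming the pod from earlier was created with kubectl apply):
kubectl apply view-last-applied pod myapp-pod
kubectl get pod myapp-pod -o yaml | grep -A1 last-applied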