Kubernetes Learning Week Series 5
Introduction to Kubernetes Service
This article introduces the Kubernetes Service, an essential component for exposing and load-balancing applications running in a cluster.
Key points:
A Kubernetes Service provides a stable IP and load balancing in front of a set of pods, so clients are unaffected as the application scales within the cluster.
ClusterIP is the default Service type; it forwards traffic inside the cluster (for example, from an Ingress controller) to the pods matched by the Service's selector.
Headless Services allow applications to communicate directly with specific pods, which is useful for database clusters that need to access specific master nodes.
NodePort exposes a port on each node to allow external access to applications, though it is not recommended for production due to security concerns.
LoadBalancer is an extension of NodePort that utilizes the load balancer provided by the cloud provider where the Kubernetes cluster is deployed.
My added point: the ExternalName Service type lets applications inside the cluster reach services running outside of it by mapping the Service name to an external DNS name.
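The following is a minimal sketch of a ClusterIP Service, the default type; the name, label, and ports are hypothetical and only illustrate how the selector ties the Service to its pods.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-backend            # hypothetical Service name
spec:
  # type: ClusterIP is the default and may be omitted
  selector:
    app: web-backend           # pods carrying this label receive the traffic
  ports:
    - port: 80                 # port exposed on the Service's cluster IP
      targetPort: 8080         # port the selected pods listen on
```

Setting clusterIP: None on the same definition produces a Headless Service, while type: NodePort or type: LoadBalancer switches to the externally reachable variants described above.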
Kubernetes Silent Pod Killer
This article discusses the ‘invisible OOM kill’ issue in Kubernetes, where a child process inside a container can be terminated by the kernel’s OOM (Out of Memory) killer without Kubernetes being aware of it, and provides insights and solutions for the problem.
Key points:
An ‘invisible’ OOM kill occurs when a child process inside a container (any process that is not the main process, PID 1) is killed due to OOM, and this event is not visible to Kubernetes.
Starting with Kubernetes 1.28, on cgroup v2 the kubelet enables the kernel’s OOM group-kill behavior (memory.oom.group) by default, which causes the entire container to be killed if any process inside it is OOM-killed.
For certain workloads (such as PostgreSQL or ML workloads), this new behavior may be undesirable, as these workloads handle child process OOM events independently.
Prior to Kubernetes 1.28, you could use tools like mtail to monitor kernel log messages and detect OOM termination of child processes.
To monitor ‘normal OOM termination’ (root process termination), you can use the metric kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}.
When writing Dockerfiles, use ENTRYPOINT / CMD so that the main workload runs as the container’s primary process (PID 1), as in the sketch below.
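As a minimal illustration of the last point, here is a hedged Dockerfile sketch; the base image and app.py are hypothetical, and the point is the exec form of ENTRYPOINT, which keeps the workload itself as PID 1.

```dockerfile
FROM python:3.12-slim

COPY app.py /app/app.py                  # hypothetical application file

# Exec form: python runs as PID 1, so an OOM kill of the workload is the
# "normal" case that Kubernetes observes and reports.
ENTRYPOINT ["python", "/app/app.py"]

# By contrast, the shell form
#   ENTRYPOINT python /app/app.py
# makes /bin/sh PID 1 and the workload a child process, whose OOM kill may
# go unnoticed by Kubernetes (the "invisible" case described above).
```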
Argo CD and Flux CD
This article provides a detailed comparison of two popular GitOps tools, Argo CD and Flux CD, discussing their features, use cases, and suitability for different scenarios.
Key points:
Argo CD is more popular than Flux CD, likely due to its user-friendly graphical interface and additional features like pod logs, manifest editing, and custom actions that go beyond standard GitOps practices.
Flux CD focuses on implementing the most standard Kubernetes patterns and can serve as a foundation for building other systems, with many third-party solutions based on it.
Argo CD has a more complex deployment model, calling a configuration plugin to render manifests before applying them to the cluster, which can sometimes lead to drift issues with Helm charts. Flux CD, on the other hand, uses native Kubernetes tools for deployment, making it simpler and more straightforward.
Argo CD has its own access control model, while Flux CD leverages the standard Kubernetes RBAC model. Argo CD’s model makes it easier to restrict developers’ access to Kubernetes.
Argo CD is easier to extend with custom plugins, while extending Flux CD requires more effort.
Argo CD stores all resources in a single namespace, while Flux CD can operate on resources using labels, making it more scalable.
The choice between Argo CD and Flux CD depends on the specific problems you aim to solve, whether you need a tool to streamline cluster management or facilitate developer application delivery.
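To make the comparison more concrete, below are two rough, hedged sketches pointing at the same hypothetical Git repository: an Argo CD Application and a Flux Kustomization. The repository URL, paths, and names are placeholders, and neither manifest is taken from the article.

```yaml
# Argo CD: Application objects typically live in the argocd namespace.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/repo.git   # hypothetical repository
    targetRevision: main
    path: manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated: {}                                  # keep the cluster in sync automatically
---
# Flux CD: a Kustomization reconciled from a previously defined GitRepository source.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: my-repo                                  # hypothetical source object
  path: ./manifests
  prune: true
```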
Understanding Docker Layers Through Examples: The Impact of the RUN Instruction
https://pointbase.hashnode.dev/understand-docker-layers-by-example-run-instructions-impact
This article discusses the impact of using multiple RUN instructions in a Dockerfile and the importance of minimizing their use. It demonstrates how each RUN instruction creates a new layer in the Docker image, which can sometimes introduce risks when combined with other poor practices.
Key points:
Minimizing the number of RUN instructions in a Dockerfile is essential for reducing the final image size, improving build and deployment performance, and enhancing security by avoiding the creation of unnecessary layers that may contain sensitive or redundant data.
The best practice is to combine related commands into a single RUN instruction, thereby reducing the number of layers and optimizing the image’s efficiency.
The article provides examples of two Dockerfiles, one with a single RUN instruction and another with two RUN instructions, and demonstrates how to use the dive tool to inspect the different layers of the resulting images.
It shows how to retrieve files from Docker layers and explains how unnecessary layers may unintentionally retain sensitive information.
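The two Dockerfiles below are a hedged reconstruction of the kind of example the article describes, not its exact files. In the first, a credentials file removed by a later RUN instruction still survives in the earlier layer and can be extracted from the image; in the second, creating and deleting it within a single RUN keeps it out of every layer.

```dockerfile
# Anti-pattern: the later RUN hides the file from the final filesystem,
# but the layer created by the first RUN still contains it.
FROM alpine:3.20
RUN echo "super-secret-token" > /root/credentials   # hypothetical build-time secret
RUN rm /root/credentials
```

```dockerfile
# Better: create and remove the file within one RUN, producing a single
# layer that never persists it.
FROM alpine:3.20
RUN echo "super-secret-token" > /root/credentials \
    && rm /root/credentials
```

Inspecting both images with dive shows the extra layer in the first case and that the deleted file is still recoverable from it.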
BGP, Cilium, and FRR: ToR Architecture
https://blog.miraco.la/bgp-cilium-and-frr-top-of-rack-for-all
This article discusses how to use Cilium’s BGP functionality to expose services or export Pod CIDRs and advertise them to peers. The author walks through the setup, which uses FRR on a UDM-SE and a Raspberry Pi running K3s with Cilium.
Key points:
The author explains the concept of Top-of-Rack (ToR) and how it can be used to connect directly to applications without a load balancer or NodePort.
The author showcases the FRR configuration, which includes setting up BGP and using route maps to allow all connections.
The author explains the steps to install Cilium on a K3s cluster and disable flannel, servicelb, and network-policy.
The author provides an example of a CiliumBGPPeeringPolicy that enables Pod CIDR advertisement and sets the local ASN (a similar hedged sketch follows this list).
The author demonstrates how to verify the setup by checking BGP peers and making test requests to the application.
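For reference, a CiliumBGPPeeringPolicy along the lines of the one in the article might look like the sketch below; the ASNs, peer address, and node label are hypothetical, and the exact fields should be checked against the Cilium version in use.

```yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: tor-policy                        # hypothetical policy name
spec:
  nodeSelector:
    matchLabels:
      bgp-policy: tor                     # hypothetical label selecting the K3s node(s)
  virtualRouters:
    - localASN: 64512                     # hypothetical local ASN
      exportPodCIDR: true                 # advertise the node's Pod CIDR to the peer
      neighbors:
        - peerAddress: "192.168.1.1/32"   # hypothetical FRR peer (e.g., the router)
          peerASN: 64513                  # hypothetical peer ASN
```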
A Tragedy Triggered by a Single Kubernetes Command
https://zouyee.medium.com/a-tragedy-caused-by-a-single-kubernetes-command-7b6126b06513
This article discusses an issue that arose during the migration from cgroup v1 to cgroup v2 in Kubernetes, where a single command led to a tragedy. It provides technical background on how container metrics are generated, how Kubernetes integrates container monitoring, and how CPU load is calculated. The article also traces the root cause of the issue, explaining the roles of cAdvisor and Kubelet in this process.
Key points:
Due to CentOS reaching end-of-life (EOL), the team was busy migrating to a new operating system and transitioning from cgroup v1 to cgroup v2.
In a cgroup v2 environment, using the -enable_load_reader configuration causes kubelet to crash.
cAdvisor is a powerful monitoring tool for Docker containers, and Kubelet uses it to collect container resource and performance metrics on nodes.
cAdvisor supports enabling or disabling specific metrics, including CPU load metrics.
The generation of CPU load metrics is controlled by the -enable_load_reader command-line flag in cAdvisor.
Kubelet embeds the cAdvisor service and exposes all related runtime metrics in Prometheus format under /stats/.
The container_cpu_load_average_10s metric, which reflects the average CPU load, is calculated by blending the previous value with the newly collected sample (a sketch of this smoothing follows the list).
The issue relates to the lack of cgroup v2 handling in the cgroupstats_build function, which is used to retrieve CPU load information.
The kernel community recommends using Pressure Stall Information (PSI) instead of the CGROUPSTATS_CMD_GET netlink API to obtain CPU statistics.
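The smoothing behind container_cpu_load_average_10s is essentially an exponential moving average; as a rough sketch of the idea (not a quote of the cAdvisor source), each collection blends the previous average with the new sample:

    load_avg_new = load_avg_old * decay + current_sample * (1 - decay)

where decay is a constant derived from the collection (housekeeping) interval, so the reported load changes gradually rather than jumping to each instantaneous reading.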