Introduction to Kubernetes monitoring

With the growing need for scalable applications, Kubernetes emerged as the standard for managing containerized workloads and services. It makes deploying and running applications on distributed instances easy, but monitoring the infrastructure can be challenging.

Kubernetes monitoring is the practice of tracking the performance, health and behavior of your applications and of the infrastructure that runs them. It involves collecting and analyzing metrics and logs so you can detect and troubleshoot issues, and even optimize your clusters for better resource management.

Because Kubernetes is such a complex environment, various tools have arisen to make monitoring it easier. Here we’ll explore the advantages and differences between the main solutions, to help you choose according to your needs.

Key metrics to monitor

First, let’s divide our metrics into two groups:

Resource Utilization

These include CPU, memory and disk usage at the cluster, node, pod and container levels, and help you decide whether to shrink or grow your cluster. It is also important to track more general metrics such as node availability and health.

A cluster is a set of nodes that run containerized applications. It is the highest level of abstraction in Kubernetes.
A node is a physical or virtual machine that is part of your cluster, for instance a virtual machine in the cloud.
A pod is a group of one or more containers that share storage and network resources. It is the smallest deployable unit in Kubernetes.
A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.
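
To make the four levels concrete, here is a minimal Python sketch that rolls container-level usage up to the pod, node and cluster levels. The node names, pod names and numbers are made up for illustration, and the `ContainerUsage` type and `roll_up` helper are hypothetical, not part of any Kubernetes API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerUsage:
    """One usage sample for a container (all values are made up)."""
    node: str
    pod: str
    container: str
    cpu_millicores: int
    memory_mib: int


def roll_up(samples, level):
    """Sum CPU and memory per 'pod', per 'node', or for the whole 'cluster'."""
    if level not in ("pod", "node", "cluster"):
        raise ValueError(f"unknown level: {level}")
    totals = defaultdict(lambda: {"cpu_millicores": 0, "memory_mib": 0})
    for sample in samples:
        # At cluster level every sample lands in the same bucket.
        key = "cluster" if level == "cluster" else getattr(sample, level)
        totals[key]["cpu_millicores"] += sample.cpu_millicores
        totals[key]["memory_mib"] += sample.memory_mib
    return dict(totals)


# Hypothetical samples: two containers in one pod, one container in another.
SAMPLES = [
    ContainerUsage("node-a", "api-1", "app", 250, 300),
    ContainerUsage("node-a", "api-1", "sidecar", 50, 64),
    ContainerUsage("node-b", "worker-1", "app", 400, 512),
]

if __name__ == "__main__":
    for level in ("pod", "node", "cluster"):
        print(level, roll_up(SAMPLES, level))
```

The monitoring tools discussed in this article do this kind of aggregation for you; the sketch only shows why container-level data is enough to answer questions at every level above it.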

Application Performance

These depend on the type of application and business. For example, for an API you would track response times (latency), error rates and throughput.
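
As a rough illustration, the sketch below computes these API metrics from a list of request records in Python. The `Request` type, the sample data and the choice to count only 5xx statuses as errors are assumptions made for the example; a real service would normally get these numbers from its metrics library or monitoring tool.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """One handled request (hypothetical record shape)."""
    duration_ms: float
    status: int


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(requests, window_seconds):
    """Throughput, error rate and latency for requests seen in one time window."""
    if not requests:
        raise ValueError("no requests in the window")
    durations = [r.duration_ms for r in requests]
    # Assumption: only server-side failures (5xx) count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "throughput_rps": len(requests) / window_seconds,
        "error_rate": errors / len(requests),
        "avg_latency_ms": sum(durations) / len(durations),
        "p95_latency_ms": percentile(durations, 95),
    }


# Made-up traffic for a 10-second window.
REQUESTS = [
    Request(20, 200),
    Request(35, 200),
    Request(40, 200),
    Request(500, 503),
    Request(25, 404),
]

if __name__ == "__main__":
    print(summarize(REQUESTS, window_seconds=10))
```

Note how a single slow request dominates both the average and the 95th percentile in such a small sample, which is why latency is usually tracked with percentiles over larger windows.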

With that in mind, let’s get to know some of the solutions we can use.

Tools

Kubernetes Dashboard

https://github.com/kubernetes/dashboard

The Kubernetes Dashboard is a web-based user interface made for monitoring and managing Kubernetes clusters. You can access essential information such as CPU and memory utilization, deploy and manage applications running in the pods, and change the amount of resources in the cluster.

It gives you a basic overview of your cluster and makes it easy to perform common actions, and it is maintained by the Kubernetes community.

But being that simple also means it offers few visualization options and no advanced resource metrics.

cAdvisor

https://github.com/google/cadvisor

cAdvisor is an open-source tool developed to monitor containers, and since Kubernetes is a container orchestrator, we can use it too. It collects, processes and exports container metrics such as CPU and memory usage. It is built into the kubelet, so it runs on every Kubernetes node by default, and it can expose its metrics in the Prometheus format. It is one of the more basic Kubernetes-native monitoring tools.

It needs no extra setup and is easy to use, but it has limited functionality, so it is usually used together with Prometheus and Grafana.
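
To give an idea of what those Prometheus metrics look like, here is a minimal Python sketch that parses a few lines of the Prometheus text format. The sample text is made up (the namespace, pod name and values are assumptions), and the parser is deliberately simplified: it ignores timestamps and does not unescape label values, so treat it as an illustration rather than a complete implementation of the format.

```python
import re

# metric_name{label="value",...} value [timestamp]
SAMPLE_LINE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Made-up excerpt in the style of what cAdvisor exposes.
SAMPLE_TEXT = """
# HELP container_cpu_usage_seconds_total Cumulative cpu time consumed in seconds.
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{namespace="shop",pod="api-1",container="app"} 1234.5
# TYPE container_memory_working_set_bytes gauge
container_memory_working_set_bytes{namespace="shop",pod="api-1",container="app"} 1.048576e+08
"""


def parse_metrics(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # Not a sample line this simplified parser understands.
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_PAIR.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE_TEXT):
        print(name, labels["pod"], value)
```

In practice you rarely parse this format yourself, since Prometheus reads it directly, but seeing the raw lines helps when you debug why a metric or label is missing.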

Prometheus

https://prometheus.io/

Prometheus is the most widely used open-source monitoring solution, and it is the de facto standard for monitoring Kubernetes. It is a graduated project of the Cloud Native Computing Foundation (CNCF).

Prometheus is divided into three main components: the server, the Alertmanager and the exporters. The exporters expose metrics from your applications and infrastructure, the server periodically scrapes (pulls) those metrics and stores them in its time-series database, and the Alertmanager routes the alerts that the server fires to channels such as email or chat.
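
The relationship between exporters and the server can be sketched as a toy model in Python. The `ToyExporter` and `ToyServer` classes below are hypothetical and exist only for this example: a real Prometheus server scrapes its targets over HTTP and stores the results in a time-series database, and the Alertmanager is left out entirely.

```python
class ToyExporter:
    """Hypothetical exporter: holds current metric values and renders them on request."""

    def __init__(self):
        self.values = {}

    def set(self, name, value):
        self.values[name] = value

    def render(self):
        # One "name value" pair per line, loosely modeled on the Prometheus text format.
        return "\n".join(f"{name} {value}" for name, value in sorted(self.values.items()))


class ToyServer:
    """Hypothetical server: pulls from every target and appends timestamped samples."""

    def __init__(self, targets):
        self.targets = targets  # {target_name: exporter}
        self.series = {}        # {(target_name, metric_name): [(timestamp, value), ...]}

    def scrape_all(self, timestamp):
        for target, exporter in self.targets.items():
            for line in exporter.render().splitlines():
                metric, value = line.rsplit(" ", 1)
                self.series.setdefault((target, metric), []).append((timestamp, float(value)))


def demo():
    """Scrape one exporter twice, 15 seconds apart, and return the stored series."""
    exporter = ToyExporter()
    server = ToyServer({"api": exporter})
    exporter.set("http_requests_total", 10)
    server.scrape_all(timestamp=0)
    exporter.set("http_requests_total", 25)
    server.scrape_all(timestamp=15)
    return server.series


if __name__ == "__main__":
    print(demo())
```

The point of the pull model is that exporters stay simple: they only report their current values, while the server decides when to collect them and keeps the history.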

It has an easy-to-use query language (PromQL), built-in alerts, and a large community.

Its built-in web interface is limited to running queries and drawing basic graphs, so it is common to pair it with Grafana, another open-source project, which offers ready-made dashboards for Kubernetes and lets you build the visualizations you want.
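
To give a feel for what a query does, a PromQL expression such as rate(http_requests_total[5m]) asks how fast a counter grew per second over the last five minutes (the metric name is just an example). The Python sketch below is a simplified version of that idea: it handles counter resets, but it skips the extrapolation that Prometheus applies at the window boundaries, so its results will not match Prometheus exactly. The `simple_rate` helper is hypothetical.

```python
def simple_rate(samples):
    """Per-second increase of a counter over samples [(timestamp_s, value), ...].

    Simplified: accounts for counter resets, but does not extrapolate
    to the window boundaries the way Prometheus does.
    """
    if len(samples) < 2:
        return None  # A rate needs at least two samples.
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The counter dropped, so the process restarted from zero:
            # the whole current value counts as increase.
            increase += current
    return increase / elapsed


if __name__ == "__main__":
    steady = [(0, 100), (30, 160), (60, 220)]
    restarted = [(0, 100), (30, 160), (60, 20)]
    print(simple_rate(steady))
    print(simple_rate(restarted))
```

Counters only ever go up until the process restarts, which is why the reset handling matters: without it, a restart would show up as a large negative rate.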

ELK stack (and OpenSearch)

https://www.elastic.co/pt/elastic-stack

https://opensearch.org/

The ELK stack used to be an open-source monitoring solution for Kubernetes, but in 2021 Elastic relicensed Elasticsearch and Kibana under the Server Side Public License (SSPL) and the Elastic License, which are not open-source licenses.

The name is an acronym for Elasticsearch (an engine for storing and searching data), Logstash (which captures and processes logs and sends them to Elasticsearch) and Kibana (a data visualization tool).

AWS forked Elasticsearch and Kibana to create OpenSearch and OpenSearch Dashboards, and for now they are still relatively similar. The main advantage of the open-source option is that it includes some security and analysis features that are paid in the ELK stack.

Both these options have a good community and are easy to deploy and use with Kubernetes, while providing the capability to do rich analysis.

The main disadvantage is that they can be difficult to maintain at scale; with massive amounts of data, they are often paired with Apache Kafka to buffer it. And although the Elastic stack has a free tier, you'll need to pay to access some features.

Datadog

https://www.datadoghq.com/

Moving away from the open-source options, Datadog is a full-stack monitoring solution.

It has great infrastructure, security and application monitoring features, end-to-end. You can monitor requests, traces, logs and even correlate all these different sets of data to create insights. And of course, you’ll have every metric about your resource utilization.

Its initial setup may be a bit complex (it requires editing some configuration files), but once it is done, it collects data from all over your architecture. That is great, but it can also hurt your budget if you collect data that you won’t use.

Dynatrace

https://www.dynatrace.com/

Dynatrace is also a full-stack, paid monitoring solution. It can monitor the availability and health of everything in your Kubernetes clusters, and it lets you unify monitoring data from a large pool of tools (such as other services in AWS or Google Cloud).

It is easy to set up and use, and it is great for tracking metrics from complex, distributed systems. But all that requires a sizable investment. If you don’t want to set up your monitoring (or the infrastructure it needs) yourself, it is a good option.

Usually, it is chosen over Datadog when the priority is application monitoring rather than infrastructure resource usage.

Wrapping up

As we saw, there are many options for monitoring Kubernetes, and the best one will depend on your needs. We've introduced you to some of the main tools, but there are many others that you can explore.

Now that you know the basics, you can choose the one that fits your needs and start monitoring your Kubernetes clusters.

Comparing monitoring solutions for Kubernetes