Day 45 of 90 Days of DevOps Challenge: Setting Up Fluentd as a DaemonSet for Kubernetes Log Collection


Yesterday, on Day 44 of my #90DaysOfDevOps journey, I introduced the EFK stack (Elasticsearch, Fluentd, and Kibana) and explained how it helps turn raw container logs into structured, searchable insights, complementing the metrics and alerts already in place.

Today on Day 45, I put that into action by deploying Fluentd as a DaemonSet in my Kubernetes cluster. Fluentd now runs on each node, collects logs from /var/log/containers/, and forwards them to Elasticsearch. With a custom ConfigMap in place and logs flowing in, I’m all set to visualize them tomorrow using Kibana dashboards.

What is Fluentd?

Fluentd is an open-source data collector and log forwarder designed to unify data collection and consumption across various sources and destinations. It was originally developed by Treasure Data and is part of the Cloud Native Computing Foundation (CNCF). Fluentd is widely used in cloud-native environments, especially Kubernetes, for log aggregation, transformation, and routing.

Key Responsibilities of Fluentd:

  1. Collect Logs

    • Fluentd can gather logs from a wide variety of sources:

      • Local file paths (e.g., /var/log/containers/)

      • Syslog

      • Containers

      • Cloud services

      • Application output streams

    • In Kubernetes, it commonly reads logs from the container log files written to disk (/var/log/containers/*.log), which are symlinks created by the container runtime (Docker, containerd, etc.).

  2. Parse & Transform Logs

    • Fluentd supports a plugin-based architecture with over 500 plugins.

    • It can:

      • Parse logs (JSON, regex, CSV, etc.)

      • Add or remove fields

      • Mask sensitive data

      • Convert timestamps

      • Format logs for specific outputs (e.g., logstash_format for Elasticsearch)

    • This transformation makes logs structured and consistent, which is critical for searchability and analysis.

  3. Route Logs

    • Fluentd can send logs to multiple outputs, including:

      • Elasticsearch (for search and dashboards via Kibana)

      • Amazon S3 (for storage and archival)

      • Kafka (for real-time processing)

      • MongoDB, MySQL, HDFS, CloudWatch, and more

    • It allows conditional routing, meaning logs can be routed to different backends based on tags, severity, or source (see the configuration sketch after this list).
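
To make the last two points concrete, here is a minimal Fluentd configuration sketch showing a transform filter and tag-based conditional routing. The app.* tags, the environment field, and the S3 bucket name are hypothetical placeholders for illustration only; they are not part of the setup used later in this post.

# Add an environment field to every record tagged app.*
<filter app.**>
  @type record_transformer
  <record>
    environment production
  </record>
</filter>

# Audit logs go to S3 for archival (matched first, so they skip Elasticsearch)
<match app.audit.**>
  @type s3
  s3_bucket my-audit-log-archive   # hypothetical bucket; AWS credentials/IAM setup not shown
  s3_region us-east-1
  path audit/
  <buffer>
    @type memory
    flush_interval 60s
  </buffer>
</match>

# Everything else goes to Elasticsearch
<match app.**>
  @type elasticsearch
  host elasticsearch.logging.svc.cluster.local
  port 9200
  logstash_format true
</match>

Because Fluentd evaluates match blocks in order, the more specific app.audit.** tag is routed before the catch-all app.** block, which is what makes the conditional routing work.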

Fluentd in Kubernetes — Why DaemonSet?

In Kubernetes, Fluentd is commonly deployed as a DaemonSet. This ensures that:

  • One Fluentd pod runs on each node

  • It collects logs from all containers running on that node

  • It has access to the node’s /var/log and /var/lib/docker/containers/ paths via hostPath volumes

  • Logs are continuously collected, even as pods come and go

This setup guarantees full log coverage across your cluster without needing to modify individual application containers.

Fluentd Log Flow in Kubernetes

Here’s a typical flow of how logs move:

[ Kubernetes Pod ]
       |
[ /var/log/containers/*.log ]
       |
[ Fluentd DaemonSet (tail input plugin) ]
       |
[ Parsing, Filtering, Tagging ]
       |
[ Output: Elasticsearch / S3 / Kafka ]

Why Fluentd?

  • Lightweight and resource-efficient

  • Highly extensible via plugins

  • Reliable and fault-tolerant

  • Cloud-native friendly

  • Supports structured logging, which is essential for modern observability

Fluentd as a DaemonSet — Setup Overview

Here’s how I deployed Fluentd on my Kubernetes cluster:

Step 1: Create Namespace (Optional)

kubectl create namespace logging

Step 2: Create ConfigMap for Fluentd Configuration

# fluentd-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: logging
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      format json
    </source>

    <match kubernetes.**>
      @type elasticsearch
      host elasticsearch.logging.svc.cluster.local
      port 9200
      logstash_format true
      flush_interval 5s
    </match>

NOTE: Make sure the host matches the internal DNS name of your Elasticsearch service.
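
If you are not sure what that DNS name is, one quick way to check (assuming Elasticsearch runs in the logging namespace) is to list its Service; the name follows the pattern <service>.<namespace>.svc.cluster.local:

kubectl get svc -n logging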

Apply the config:

kubectl apply -f fluentd-configmap.yaml

Step 3: Deploy Fluentd DaemonSet

# fluentd-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "elasticsearch.logging.svc.cluster.local"
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            # Container log files under /var/log/containers are symlinks into
            # /var/lib/docker/containers, so that path is mounted read-only as well
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: config-volume
              mountPath: /fluentd/etc
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: config-volume
          configMap:
            name: fluentd-config

Apply it:

kubectl apply -f fluentd-daemonset.yaml
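
Once applied, it is worth confirming that the rollout finished and that one Fluentd pod landed on each node (these commands assume the logging namespace and the app=fluentd label from the manifest above):

kubectl rollout status daemonset/fluentd -n logging
kubectl get pods -n logging -l app=fluentd -o wide

The NODE column of the second command should list every node in the cluster exactly once.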

Validation: Is Fluentd Working?

You can check Fluentd logs to ensure it’s collecting and forwarding data:

kubectl logs -n logging -l app=fluentd

Look for logs showing that it's sending data to Elasticsearch successfully.
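
To go a step further, you can check whether indices are actually being created in Elasticsearch. One option (assuming the elasticsearch Service referenced in the ConfigMap above, with no authentication or TLS) is to port-forward it locally and query the _cat/indices API; with logstash_format enabled, you should see date-based indices such as logstash-2025.06.01:

kubectl port-forward svc/elasticsearch 9200:9200 -n logging

Then, in another terminal:

curl "http://localhost:9200/_cat/indices?v"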

Bonus: What Happens Under the Hood?

Fluentd reads container logs from:

/var/log/containers/*.log

These are symbolic links to:

/var/lib/docker/containers/<container-id>/<container-id>-json.log

Fluentd then parses these logs (typically JSON) and pushes them to Elasticsearch using the Logstash format.
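
For a concrete picture, a single line in one of those -json.log files written by Docker's json-file logging driver looks roughly like this (the values are illustrative):

{"log":"GET /healthz 200\n","stream":"stdout","time":"2025-06-01T10:15:30.123456789Z"}

Fluentd tails each such line, parses the JSON into a structured record, and the Elasticsearch output (with logstash_format true) writes it into a date-based index such as logstash-2025.06.01.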

Final Thoughts

With Fluentd successfully deployed as a DaemonSet, I now have a powerful log collector running on every node in my Kubernetes cluster. It’s quietly gathering logs from all containers and forwarding them to Elasticsearch, creating a centralized log pipeline. This setup ensures that no log goes unnoticed and sets the stage for powerful search, filtering, and debugging capabilities.

Tomorrow on Day 46, I’ll move to the final piece of the EFK puzzle: Kibana. I’ll deploy Kibana, connect it to Elasticsearch, and begin exploring logs visually. From building custom dashboards to crafting saved searches and filters, I’m excited to bring observability full circle with a clean, insightful UI that makes navigating logs easier and more intuitive.
