Application Logging with FluentD

Hey everyone! Welcome back 👋. In this blog, we will see what FluentD is and why we actually need it. To start with some simple information, FluentD comes under the "Observability and Analysis" part of the cloud-native application cycle and is very helpful for collecting the logs from your applications.

What is FluentD:


As I mentioned, it is an open-source logging tool for cloud-native applications. Rather than calling it just a logging tool, we can describe it as an open-source data collector that also helps you unify your data in order to make more sense of it. FluentD comes with a lot of unique features which really help while you are working with different sets of tools. All the features will be introduced in the later part of the blog.

Why do we log data:

Before jumping directly into FluentD, let's consider a simple scenario first to better understand why we need logging and appreciate what FluentD offers us.

Let's say you have a microservice application deployed on a Kubernetes cluster, with services written in different languages, for example some in JavaScript and some in Python, alongside different databases, message brokers, and other services. Now, as these applications run and communicate with each other, they generate data, also called logs. And each application will generate this data in a different format depending on the language and plugins you are using.

Now this data can be anything. You might need to log for compliance, or log specific data depending on the industry you are working in or the product you are working on.

These logs can also help ensure the security of your cluster and server (access logs) and can help in detecting suspicious access to your application data.

The logs can also be used in the traditional way: to find an error and debug your application in case anything goes wrong.

Now that we know why we need logging, let's see how the data is logged.

How is data logged?

Once the data is emitted by the application, it is generally stored in one of three ways:

  • File - The log data can be written directly to files. The issue with this method is that raw log files are not easy for humans to read, and there can be so many log files that going through them all becomes impossible. Also, as mentioned before, these files may be in different formats for different applications.

  • Log into a DB - The logs can be stored in a log database like Elasticsearch so that they can be visualized easily with the help of different applications or bundled visualizers. But in this case, each application has to be configured to log its data to Elasticsearch.

  • Third-party application - We can use third-party applications to log the data generated by our applications, but we can't control how they log the data, which makes the logs inconsistent when the applications are written in different languages and structures.

How does FluentD solve this?

Now that we know how the data is logged and what the shortcomings of all the above scenarios are, it is easier to understand how FluentD comes in as a solution.

  • FluentD acts as a unified logging layer, which means that no matter how many different ways it gets logs in from source applications, it will convert all the logs into a single unified format which can then be distributed for analysis, alerting, etc.

  • FluentD collects data from different data sources like apps, access logs, system logs, and databases, processes it into a unified format, and then sends these logs to their destinations to be used for alerting, analysis, or archiving.
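
To make the "unified format" idea concrete: internally, every FluentD event is just a tag (used for routing), a timestamp, and a JSON-like record. For example, a line printed by FluentD's stdout output plugin looks roughly like the sketch below; the tag and fields are made up for illustration.

    # time                      tag           record
    2023-01-01 12:00:00 +0000 myapp.access: {"method":"GET","path":"/login","status":200}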

How does it work?

Now let's see how FluentD works. First of all, like the tools we have seen in previous tutorials, it is deployed on the cluster you will be collecting logs from. FluentD collects the logs from every application on the server, and this also includes the logs from the third-party applications installed on the server.

After receiving the logs from the applications, FluentD converts them into a unified format. Conversion to a unified format makes it possible to work with logs coming from different applications and of different types. In addition to the conversion to the unified format, you can also enrich the data with additional information, like details about the pod, namespace, container names, etc. FluentD also helps us modify the data being logged. After the conversion and the necessary filtering and modification, the logs are sent to the destination, which can be any of the ones mentioned earlier. The interesting part is that you can choose where your logs go, so you can define which destination a particular type of log goes to. This is called "routing".
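
As a small illustration of the enrichment step, here is a minimal sketch that attaches pod, namespace, and container details to every record tagged kubernetes.*. It assumes the fluent-plugin-kubernetes_metadata_filter plugin is installed, which is not covered in this blog.

    # Enrich each record with pod, namespace, and container metadata
    # (assumes fluent-plugin-kubernetes_metadata_filter is installed)
    <filter kubernetes.**>
        @type kubernetes_metadata
    </filter>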

Features of FluentD

  • Not tied to any specific backend: This feature gives FluentD the flexibility to function with all types of backend services

  • No vendor lock-in: Because FluentD is not dependent on any one backend, there is no vendor lock-in.

  • Apart from this, FluentD saves data on the hard drive until it has parsed and sent the data to the destination. The data will still be there if the server restarts, and FluentD will pick up execution where it was halted. Even while facilitating this, it doesn't need additional storage configuration.

  • It will keep on trying to push the logs to the destination in case the destination (e.g. a database) fails, and will keep retrying until the destination is available again, as sketched below.
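
Here is a minimal sketch of what this buffering and retry behaviour can look like when configured explicitly, assuming an Elasticsearch destination; the host and buffer path are made-up values.

    # Buffer records on disk and keep retrying until the destination is back
    <match myapp.**>
        @type elasticsearch
        host elasticsearch.logging.svc   # made-up hostname
        port 9200
        <buffer>
            @type file                      # persist the buffer on the hard drive
            path /var/log/fluentd/buffer    # survives a FluentD or server restart
            retry_forever true              # retry until the destination is up again
        </buffer>
    </match>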

How to configure

  • We have to install the FluentD DaemonSet. We can find the installation guide here

  • FluentD then runs on each Kubernetes node and thus receives the logs from the applications residing on those nodes

  • We have to configure FluentD using the configuration file, stating the rules, source, and destination configurations respectively. We use FluentD plugins to configure how FluentD works on the cluster

Fluentd plugins are classified as:

  • Input: The sources and types of input you want to log; these can be of different types like http, tcp, syslog, etc.

  • Parser: How the data is processed into key-value pairs, for example csv, tsv, json

  • Filter: You can enrich and modify the data, as discussed above, using record_transformer (see the sketch after the examples below)

  • Output: You can configure the destination where the logs go, for example Elasticsearch or MongoDB

  • Then we use tags to group the logs. We essentially use the source block to bring the logs in as input and parse them, then use the filter block to help us enrich and modify the logs:

      # All logs with tag myapp.* to be parsed as JSON
      <filter myapp.*>
          @type parser        # the parser filter re-parses one field of each record
          key_name log        # the field to parse (assumed to be called "log" here)
          <parse>
              @type json
          </parse>
      </filter>
    

    Similarly, we use the match block to define the destination where we want to send the logs:

      # All logs from services tagged myservice.* should go to Elasticsearch
      <match myservice.*>
          @type elasticsearch
          ...
      </match>
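
    And to cover the remaining plugin types, here is a hedged sketch of a source block that tails a hypothetical application log file, plus the record_transformer filter promised above, which enriches every record; the paths and field names are made up for illustration.

      # Input: tail a hypothetical app log file and tag its events myapp.access
      <source>
          @type tail
          path /var/log/myapp/access.log        # made-up path
          pos_file /var/log/fluentd/myapp.pos   # remembers how far the file has been read
          tag myapp.access
          <parse>
              @type json
          </parse>
      </source>

      # Filter: enrich every myapp.* record with the node's hostname
      <filter myapp.*>
          @type record_transformer
          <record>
              hostname "#{Socket.gethostname}"
          </record>
      </filter>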
    

Difference between FluentD and Fluent Bit

FluentD is similar to Fluent Bit in terms of functionality, but the difference is that Fluent Bit is lightweight and built for high efficiency and low cost. Fluent Bit is known as "high scale, low resource" and is preferred for containerized applications

Conclusion:

Although there are many logging tools, and we will be learning about them as well in the future, that's it for the introduction to FluentD in this blog. In upcoming blogs, we will look into how you can use FluentD with a demo.


Thank you so much for reading 💖

Connect: https://link.kaiwalyakoparkar.com/
