Introduction to OpenTelemetry

K8sCloudDev

OpenTelemetry is an open-source observability framework for modern distributed applications, maintained as a CNCF project. It is a collection of tools, APIs, and SDKs that provides a unified approach to collecting traces, logs, and metrics.

Why OpenTelemetry?

Suppose a particular microservice is failing. Logs tell us which error or exception occurred, and metrics show high CPU utilization. But it is still difficult to determine whether all calls to this service are failing or only those from a particular consumer. The goal of observability is to surface as much relevant information as possible so that the issue can be resolved faster.

A trace represents the entire path of a request across the different components of a distributed system, enabling end-to-end visibility. A trace can also contain logs and metrics.

https://www.aspecto.io/wp-content/uploads/2022/01/how-the-opentelemetry-sdk-works-1536x677.png

  • A trace can have logs inside it.

  • Logs can point to a trace.

  • Metrics can be correlated to a trace and log via time.
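
One way to picture "logs can point to a trace" is to stamp every log line with the IDs of the trace and span that were active when it was written. The sketch below uses only Python's standard `logging` module; the `TraceContextFilter` class and the hard-coded IDs are illustrative assumptions, not part of the OpenTelemetry API.

```python
import io
import logging


class TraceContextFilter(logging.Filter):
    """Hypothetical filter that attaches trace and span IDs to each record."""

    def __init__(self, trace_id, span_id):
        super().__init__()
        self.trace_id = trace_id
        self.span_id = span_id

    def filter(self, record):
        # Add the IDs as extra fields so the formatter can print them.
        record.trace_id = self.trace_id
        record.span_id = self.span_id
        return True


# Write to an in-memory stream so the example has no side effects.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter(
        "%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"
    )
)

logger = logging.getLogger("checkout")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

# Example IDs only; a real SDK would supply the IDs of the active span.
logger.addFilter(
    TraceContextFilter("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
)

logger.error("payment declined")
log_line = stream.getvalue().strip()
print(log_line)
```

With the trace ID in every log line, a log search for that ID returns exactly the lines that belong to one request.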

A span is a single operation that occurs within a request, such as a database query or a service call. Spans record the parent-child relationships between the service calls or events within a request.
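
The parent-child relationship can be sketched with a toy data model. This is not the OpenTelemetry SDK: the `Span` class and the helper names `start_trace`, `child_of`, and `render` are assumptions made for illustration. The only idea carried over is that every span shares its trace's ID and points at its parent's span ID.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))
    parent_id: Optional[str] = None  # None marks the root span


def start_trace(name):
    """Create a root span with a fresh trace ID."""
    return Span(name=name, trace_id=secrets.token_hex(16))


def child_of(parent, name):
    """Create a span in the same trace that points at its parent."""
    return Span(name=name, trace_id=parent.trace_id, parent_id=parent.span_id)


def render(spans):
    """Return span names indented by their depth in the parent-child tree."""
    children = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children.get(parent_id, []):
            lines.append("  " * depth + span.name)
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


root = start_trace("GET /checkout")
db = child_of(root, "DBQuery")
call = child_of(root, "call payment-service")
charge = child_of(call, "charge card")

tree = render([root, db, call, charge])
print("\n".join(tree))
```

Tracing backends use the same parent links to draw the familiar waterfall view of a request.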

We use OpenTelemetry to collect the three pillars of observability (traces, logs, and metrics) through a unified SDK. This helps in identifying and resolving errors quickly.

OpenTelemetry SDK

OpenTelemetry integrates with popular frameworks and libraries such as Spring, ASP.NET, and Express. The SDK collects traces, logs, and metrics and exports them. It also propagates context between services and ships the collected data to the collector.
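
Context propagation usually means passing the trace ID and the caller's span ID in a request header. The sketch below follows the W3C Trace Context `traceparent` layout (`version-traceid-parentid-flags`). The `inject` and `extract` helpers are hypothetical stand-ins for what an SDK propagator does, and the validation is deliberately simplified.

```python
import re
import secrets

# version "00", 32 hex chars of trace ID, 16 hex chars of span ID, 2 hex chars of flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers, trace_id, span_id, sampled=True):
    """Caller side: write the current trace context into outgoing headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return headers


def extract(headers):
    """Callee side: read the caller's trace context, or None if absent or malformed."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "sampled": (int(flags, 16) & 1) == 1,
    }


trace_id = secrets.token_hex(16)
span_id = secrets.token_hex(8)

outgoing = inject({}, trace_id, span_id)
incoming = extract(outgoing)
print(outgoing["traceparent"])
```

The receiving service starts its own span with the extracted `trace_id` and uses `parent_span_id` as the parent, which is how spans from different services end up in one trace.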

Components of SDK

Instrumentation: Every service or library has instrumentation attached to it. The instrumentation collects data about the library at runtime and produces spans according to the specification. Each instrumentation has its own configuration, for example for adding custom data to a span. What is described above is automatic instrumentation. Manual instrumentation is also possible: the developer manages the entire process by writing custom code. It is sometimes necessary for libraries that do not support automatic instrumentation.
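
To give a feel for manual instrumentation, here is a minimal sketch in which the developer wraps a piece of work in a span by hand. The `span` context manager and the `finished_spans` list are toy stand-ins written for this example, not the OpenTelemetry API; real SDKs provide their own tracer objects for the same purpose.

```python
import time
from contextlib import contextmanager

finished_spans = []  # stand-in for the processor/exporter that would receive spans


@contextmanager
def span(name, **attributes):
    """Record the name, attributes, duration, and status of a block of code."""
    record = {"name": name, "attributes": dict(attributes), "status": "OK"}
    start = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        # Mark the span as failed, then let the error continue to the caller.
        record["status"] = "ERROR"
        record["attributes"]["exception.type"] = type(exc).__name__
        raise
    finally:
        record["duration_ms"] = (time.perf_counter() - start) * 1000
        finished_spans.append(record)


def lookup_customer(customer_id):
    # Hypothetical business function instrumented by hand.
    with span("lookup_customer", customer_id=customer_id) as current:
        current["attributes"]["cache.hit"] = False  # custom data added to the span
        return {"id": customer_id, "name": "example"}


lookup_customer(42)

try:
    with span("failing_step"):
        raise ValueError("boom")
except ValueError:
    pass

for record in finished_spans:
    print(record["name"], record["status"])
```

Automatic instrumentation does the same wrapping for you around library calls; the manual form is useful when the interesting unit of work is your own code.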

Detectors: Detectors gather default metadata about the environment the service runs in. For example, a detector can discover AWS-specific metadata such as the region or VPC.

Resources: Resources are wrappers that store the metadata gathered by the detectors. They are attached to every span.
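
The relationship between detectors and resources can be sketched as follows. `OTEL_SERVICE_NAME` and `OTEL_RESOURCE_ATTRIBUTES` are standard OpenTelemetry environment variables, and `service.name`, `cloud.provider`, and `cloud.region` are standard attribute names. Everything else is an assumption for illustration: the function names are made up, and the `aws_detector` here only reads an `AWS_REGION` variable, whereas a real detector would query the platform.

```python
def env_detector(environ):
    """Read the OTEL_* environment variables into resource attributes."""
    attributes = {}
    for pair in environ.get("OTEL_RESOURCE_ATTRIBUTES", "").split(","):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()
    # The dedicated service-name variable wins over the generic attribute list.
    if "OTEL_SERVICE_NAME" in environ:
        attributes["service.name"] = environ["OTEL_SERVICE_NAME"]
    return attributes


def aws_detector(environ):
    """Hypothetical, simplified detector for AWS metadata."""
    region = environ.get("AWS_REGION")
    if region is None:
        return {}
    return {"cloud.provider": "aws", "cloud.region": region}


def build_resource(environ, detectors):
    """Merge the output of every detector into one resource dictionary."""
    resource = {}
    for detector in detectors:
        resource.update(detector(environ))
    return resource


# A fake environment keeps the example deterministic; use os.environ in practice.
fake_environ = {
    "OTEL_SERVICE_NAME": "checkout",
    "OTEL_RESOURCE_ATTRIBUTES": "deployment.environment=dev,team=payments",
    "AWS_REGION": "eu-west-1",
}

resource = build_resource(fake_environ, [env_detector, aws_detector])

# The same resource is attached to every span the service produces.
span = {"name": "DBQuery", "resource": resource}
print(span)
```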

Processor: A processor collects the data from the instrumentation and sends it to the exporter. A processor can also sample the data, meaning it can drop or modify spans before they reach the exporter.

Exporter: An exporter sends the data to an external collector. It supports the HTTP and gRPC protocols and the JSON and Protobuf formats for communicating with the external collector.

Provider: A provider is a wrapper for all the behavior that governs how traces are generated.
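
The way the processor, exporter, and provider fit together can be sketched as a small pipeline. The class names below (`InMemoryExporter`, `BatchProcessor`, `Provider`) and their methods are toy stand-ins chosen for this example, not the SDK's classes, and the exporter only serializes to JSON in memory where a real one would send the payload over HTTP or gRPC.

```python
import json


class InMemoryExporter:
    """Stand-in exporter: stores JSON payloads instead of sending them."""

    def __init__(self):
        self.payloads = []

    def export(self, batch):
        self.payloads.append(json.dumps(batch))


class BatchProcessor:
    """Buffers finished spans, optionally drops some, and hands batches to the exporter."""

    def __init__(self, exporter, max_batch_size=2, keep=lambda span: True):
        self.exporter = exporter
        self.max_batch_size = max_batch_size
        self.keep = keep
        self.buffer = []

    def on_end(self, span):
        if not self.keep(span):
            return  # omitted before it ever reaches the exporter
        self.buffer.append(span)
        if len(self.buffer) >= self.max_batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.exporter.export(self.buffer)
            self.buffer = []


class Provider:
    """Ties the resource and the processor together; spans are created through it."""

    def __init__(self, resource, processor):
        self.resource = resource
        self.processor = processor

    def end_span(self, name, **attributes):
        span = {"name": name, "resource": self.resource, "attributes": attributes}
        self.processor.on_end(span)


exporter = InMemoryExporter()
processor = BatchProcessor(
    exporter,
    max_batch_size=2,
    keep=lambda span: span["name"] != "GET /health",  # drop noisy health checks
)
provider = Provider({"service.name": "checkout"}, processor)

for name in ["GET /health", "GET /orders", "DBQuery", "GET /cart"]:
    provider.end_span(name)
processor.flush()  # send whatever is still buffered

exported_names = [
    [span["name"] for span in json.loads(payload)] for payload in exporter.payloads
]
print(exported_names)
```

Batching keeps the number of network calls low, and filtering in the processor is the cheapest place to discard spans nobody will look at.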

Deploying OpenTelemetry

https://www.aspecto.io/wp-content/uploads/2022/01/OpenTelemetry-Stack-1024x466.png

  • Every service of the distributed application has an OTEL SDK running along with it.

  • The SDK collects the data and sends it to the OTEL Collector.

  • The OTEL Collector can modify or omit data before sending it to a database for storage.

  • An open-source tool such as Jaeger visualizes the data by connecting to the database.

This is one example of how OpenTelemetry can be deployed in a production environment. Here the Collector can also modify or omit data, much as the Processor does in the SDK. Sampling that is decided in the SDK, when a trace starts, is called head sampling; sampling that is decided in the Collector, after the spans of a trace have been collected, is called tail sampling.
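
The difference between the two strategies can be sketched in a few lines. Both functions are simplified illustrations with assumed names and thresholds: the head sampler sees only the trace ID, while the tail sampler sees every finished span of the trace.

```python
def head_sample(trace_id, ratio):
    """Decide when the trace starts, using nothing but the trace ID.

    The last 8 hex characters are mapped to a number in [0, 1), so every
    service that sees the same trace ID reaches the same decision.
    """
    bucket = int(trace_id[-8:], 16) / 2**32
    return bucket < ratio


def tail_sample(spans, slow_ms=500):
    """Decide after the trace has finished: keep it if it failed or was slow."""
    has_error = any(span["status"] == "ERROR" for span in spans)
    is_slow = any(span["duration_ms"] > slow_ms for span in spans)
    return has_error or is_slow


# Head sampling: two made-up trace IDs at a 10% sampling ratio.
low_id = "0" * 24 + "00000000"
high_id = "0" * 24 + "ffffffff"
print(head_sample(low_id, 0.1), head_sample(high_id, 0.1))

# Tail sampling: a fast trace containing an error is kept, a healthy one is not.
failed_trace = [
    {"name": "GET /checkout", "status": "OK", "duration_ms": 20},
    {"name": "charge card", "status": "ERROR", "duration_ms": 5},
]
healthy_trace = [
    {"name": "GET /orders", "status": "OK", "duration_ms": 12},
]
print(tail_sample(failed_trace), tail_sample(healthy_trace))
```

Head sampling is cheap because unsampled spans are never transported, but it cannot know in advance which traces will fail. Tail sampling keeps the interesting traces at the cost of shipping and buffering everything first.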

We must decide on the sampling strategy early, because it determines how expensive tracing is. If we decide to trace every action in the system, the cost increases: network cost for transporting the data, storage cost for storing it, and compute and memory cost for processing it.
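
A back-of-the-envelope calculation makes the effect of the sampling ratio concrete. Every number below (request rate, spans per request, bytes per span) is a made-up assumption; substitute measurements from your own system.

```python
SECONDS_PER_DAY = 86_400


def daily_trace_volume_gb(requests_per_second, spans_per_request, bytes_per_span, sampling_ratio):
    """Estimate how many gigabytes of span data are produced per day."""
    spans_per_day = (
        requests_per_second * spans_per_request * SECONDS_PER_DAY * sampling_ratio
    )
    return spans_per_day * bytes_per_span / 1e9


# Hypothetical workload: 1,000 requests/s, 10 spans per request, 500 bytes per span.
everything = daily_trace_volume_gb(1_000, 10, 500, sampling_ratio=1.0)
one_in_ten = daily_trace_volume_gb(1_000, 10, 500, sampling_ratio=0.1)

print(f"trace everything: {everything:.1f} GB/day")
print(f"sample 10%:       {one_in_ten:.1f} GB/day")
```

Under these assumptions, tracing every request produces roughly ten times the data of a 10% sample, and that factor carries through to network, storage, and processing cost.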

An Example project to understand OpenTelemetry

We will take a sample project, deploy it, and see in practice how it all comes together.

  • We will use the Napptive Playground to deploy it as an OAM application.

  • Go to https://playground.napptive.dev and sign up for a free account.

  • Click Deploy apps in the top-right corner.

  • First, deploy Jaeger all-in-one. The all-in-one image bundles the agent, the collector, and the Jaeger query service with its UI, which are all the components necessary to collect and view traces. It is not recommended for production use, but it is fine for learning purposes. Choose the YAML Deploy option and paste the code below.

apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: jaeger
spec:
  components:
    - name: jaeger
      type: webservice
      properties:
        image: jaegertracing/all-in-one
        ports:
          - port: 6831
            expose: true
            type: UDP
          - port: 6832
            expose: true
            type: UDP
          - port: 5778
            expose: true
            type: TCP
          - port: 16686
            expose: true
            type: TCP
          - port: 9411  # Zipkin-compatible endpoint; matches COLLECTOR_ZIPKIN_HOST_PORT below
            expose: true
            type: TCP
          - port: 4317
            expose: true
            type: TCP
          - port: 4318
            expose: true
            type: TCP
          - port: 14250
            expose: true
            type: TCP
          - port: 14268
            expose: true
            type: TCP
          - port: 14269
            expose: true
            type: TCP
      traits:
      - type: napptive-ingress  
        properties:
          name: jaeger-ingress
          port: 16686
          path: /
      - type: napptive-ingress
        properties:
          name: jaeger-http-collector-ingress
          port: 14268
          path: /
      - type: env
        properties:
          env:
            COLLECTOR_ZIPKIN_HOST_PORT: "9411"
            COLLECTOR_OTLP_ENABLED: "true"

  • Now we have a backend to collect the traces. Let's deploy an application and check whether its traces show up.

  • Click Deploy again, choose the YAML Deploy option, and paste the code below.

apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: hotrod
spec:
  components:
    - name: hotrod
      type: webservice
      properties:
        image: jaegertracing/example-hotrod:latest
        args: ["all"]
        ports:
          - port: 8080
            expose: true
          - port: 8081
            expose: true
          - port: 8082
            expose: true
          - port: 8083
            expose: true
      traits:
      - type: napptive-ingress
        properties:
          name: hotrod-ingress
          port: 8080
          path: /
      - type: env
        properties:
          env:
            JAEGER_AGENT_HOST: "http://jaeger-http-collector-ingress-cgrvjjbh51taq9hpjmkg.apps.hackathon.napptive.dev/api/traces"
            JAEGER_AGENT_PORT: "80"

  • Change JAEGER_AGENT_HOST to the endpoint of your own Jaeger deployment. The link is listed in the "endpoints" section of the Napptive UI.

  • Application:

  • Jaeger UI:

  • Napptive is a Kubernetes development playground for creating cloud-native applications fast and at scale. It lets us focus on the application while the platform manages the underlying infrastructure the application needs.

Conclusion

I hope this blog was useful. We learned what OpenTelemetry is, why we need it, what its architecture looks like, and how it works internally. We also saw how to deploy the OpenTelemetry stack in the Napptive Playground as an OAM application.

The images used to explain the architecture of OpenTelemetry in this blog are taken from aspecto.io and belong solely to them.

I would also like to thank Kunal Kushwaha and Napptive for conducting this hackathon on building cloud-native applications.
