Introduction to OpenTelemetry
OpenTelemetry is an open-source observability framework for modern distributed applications, managed by the CNCF. It is a collection of tools, APIs, and SDKs that provides a unified approach to collecting traces, logs, and metrics.
Why OpenTelemetry?
Suppose a particular microservice is failing. Logs tell us which error or exception occurred, and metrics show high CPU utilization. But it is still difficult to determine whether all calls to this service are failing or only those from a particular consumer. The goal of observability is to surface as much relevant information as possible so that the issue can be resolved faster.
A trace represents the entire path of a request across the different components of a distributed system, which gives end-to-end visibility of the system. Traces, logs, and metrics can be related to each other in the following ways:
A trace can have logs inside it.
Logs can point to a trace.
Metrics can be correlated to a trace and log via time.
A span represents a single operation within a request, such as a database query or a call to another service. Each span records its parent, which makes it possible to reconstruct the parent-child relationships between the service calls or events within a request.
We use OpenTelemetry to collect the three pillars of observability (traces, logs, and metrics) through a unified SDK. This helps in identifying and resolving errors quickly.
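To make the parent-child relationship concrete, here is a minimal sketch in plain Python. It does not use the OpenTelemetry SDK; the `Span` class, `child_of`, and `render_tree` are hypothetical names made up for illustration, and the only idea it tries to show is that every span carries the trace ID of its request plus the ID of its parent.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """Toy span: one operation inside a request (not the real SDK class)."""
    name: str
    trace_id: str
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_id: Optional[str] = None


def child_of(parent: Span, name: str) -> Span:
    """Create a span in the same trace, pointing back at its parent."""
    return Span(name=name, trace_id=parent.trace_id, parent_id=parent.span_id)


def render_tree(spans: List[Span], parent_id: Optional[str] = None, depth: int = 0) -> List[str]:
    """Rebuild the call hierarchy from the parent_id links."""
    lines = []
    for span in spans:
        if span.parent_id == parent_id:
            lines.append("  " * depth + span.name)
            lines.extend(render_tree(spans, span.span_id, depth + 1))
    return lines


# One request (one trace) that fans out to two services.
root = Span(name="GET /checkout", trace_id=uuid.uuid4().hex)
auth = child_of(root, "auth service call")
order = child_of(root, "order service call")
db = child_of(order, "DBQuery")

print("\n".join(render_tree([root, auth, order, db])))
```

A tracing backend does essentially the same reconstruction when it draws the timeline of a request: it groups spans by trace ID and nests them by parent ID.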
OpenTelemetry SDK
OpenTelemetry integrates with popular frameworks and libraries such as Spring, ASP.NET, and Express. The SDK collects traces, logs, and metrics and exports them. It also propagates context between services, so that spans created in different services end up in the same trace, and ships the collected data to the Collector.
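Context propagation usually means passing the trace ID and the current span ID along with each outgoing call. As a rough illustration, the sketch below builds and parses a header in the W3C Trace Context `traceparent` format (version, trace ID, parent span ID, flags), which is the format OpenTelemetry propagates by default. The helper names `make_traceparent` and `parse_traceparent` are hypothetical, and this is only a simplified sketch, not the SDK's propagator.

```python
def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build a traceparent value: version-traceid-spanid-flags."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> dict:
    """Extract the IDs the receiving service needs to continue the trace."""
    version, trace_id, span_id, flags = header.split("-")
    if len(trace_id) != 32 or len(span_id) != 16:
        raise ValueError("malformed traceparent header")
    return {
        "trace_id": trace_id,
        "parent_span_id": span_id,
        # Bit 0 of the flags byte marks the trace as sampled.
        "sampled": bool(int(flags, 16) & 1),
    }


# The calling service sends this header with its outgoing request.
header = make_traceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
print(header)

# The called service reads it and creates its spans under the same trace.
print(parse_traceparent(header))
```

Because the receiving service reuses the incoming trace ID and treats the incoming span ID as its parent, the spans from both services join into one trace.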
Components of SDK
Instrumentation: Every service or library has instrumentation attached to it. It collects data about the library at runtime and produces spans according to the specification. Each instrumentation has its own configuration, such as adding custom data to a span. This is automatic instrumentation. Manual instrumentation is also possible: the developer manages the entire process by writing custom code, which is sometimes necessary for libraries that do not support automatic instrumentation.
Detectors: They are used to get default metadata specific to the environment the service runs in. For example, we can have detectors that find AWS-specific metadata such as the region or the VPC.
Resources: Resources are wrappers that are used to store the above metadata. They are attached to every span.
Processor: A processor receives the data from the instrumentation and passes it to the exporter. It can also sample the data, meaning it can omit or modify data before sending it to the exporter.
Exporter: An exporter sends the data to an external collector. It supports the HTTP and gRPC protocols and the JSON and Protobuf formats for communicating with the external collector.
Provider: A provider is a wrapper for all the behaviors of how traces are generated.
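The way these components fit together can be sketched with a small toy model in plain Python. None of the classes below come from the OpenTelemetry SDK; `ListExporter`, `FilteringProcessor`, `Provider`, and `detect_resource` are hypothetical stand-ins, and the resource values are made up. The sketch only shows the flow described above: a detector produces metadata, the resource is attached to every span, the processor may drop or modify a span, and the exporter receives whatever is left.

```python
from dataclasses import dataclass, field


@dataclass
class SpanData:
    name: str
    attributes: dict = field(default_factory=dict)
    resource: dict = field(default_factory=dict)


def detect_resource() -> dict:
    """Hypothetical detector: returns fixed metadata instead of querying a cloud API."""
    return {"service.name": "orders", "cloud.region": "eu-west-1"}


class ListExporter:
    """Stands in for an exporter; a real one would send spans to a collector."""

    def __init__(self):
        self.sent = []

    def export(self, span: SpanData) -> None:
        self.sent.append(span)


class FilteringProcessor:
    """Receives finished spans, may omit or modify them, then exports them."""

    def __init__(self, exporter, keep):
        self.exporter = exporter
        self.keep = keep  # predicate deciding which spans survive

    def on_end(self, span: SpanData) -> None:
        if self.keep(span):
            span.attributes["processed"] = True  # example of modifying data
            self.exporter.export(span)


class Provider:
    """Bundles the resource and the processor that every span goes through."""

    def __init__(self, resource: dict, processor: FilteringProcessor):
        self.resource = resource
        self.processor = processor

    def record(self, name: str, **attributes) -> SpanData:
        span = SpanData(name=name, attributes=dict(attributes), resource=self.resource)
        self.processor.on_end(span)
        return span


exporter = ListExporter()
processor = FilteringProcessor(exporter, keep=lambda s: s.name != "GET /health")
provider = Provider(detect_resource(), processor)

provider.record("GET /orders", user="alice")
provider.record("GET /health")  # dropped by the processor

print([s.name for s in exporter.sent])
print(exporter.sent[0].resource)
```

In this run only the `GET /orders` span reaches the exporter, and it carries the resource metadata that the detector produced.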
Deploying OpenTelemetry
Every service of the distributed application has an OTEL SDK running along with it.
The SDK collects the data and sends it to the OTEL Collector.
The OTEL Collector can modify or omit data and then sends it to a database for storage.
An open-source backend such as Jaeger provides visualization of the data by connecting to the database.
This is one example of how OpenTelemetry can be deployed in a production environment. Here the Collector can also modify or omit data, similar to what the Processor does in the SDK. Sampling done in the SDK, where the decision is made when a trace starts, is called head sampling. Sampling done in the Collector, after the spans of a trace have been gathered, is called tail sampling.
We must decide on the sampling strategy early, because it determines how expensive tracing is. If we decide to trace every action in the system, the cost increases: network cost for transporting the data, storage cost for storing it, and compute and memory cost for processing it.
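The difference between the two sampling styles can be sketched in a few lines of plain Python. The functions `head_sample` and `tail_sample`, the 500 ms threshold, and the span dictionaries are all assumptions made for illustration; real samplers are configured in the SDK or the Collector rather than written like this.

```python
def head_sample(trace_id: int, ratio: float) -> bool:
    """Head sampling: decide when the trace starts, using only the trace ID.

    The same trace ID always gives the same answer, so every service
    that sees the trace makes a consistent decision.
    """
    return (trace_id % 10_000) < ratio * 10_000


def tail_sample(spans: list) -> bool:
    """Tail sampling: decide after the whole trace has been collected.

    Here we keep a trace if any span failed or was slow (over 500 ms,
    an arbitrary threshold chosen for this example).
    """
    return any(s.get("error") or s["duration_ms"] > 500 for s in spans)


# Head sampling knows nothing about what will happen in the request.
print(head_sample(trace_id=1234, ratio=0.25))
print(head_sample(trace_id=9999, ratio=0.25))

# Tail sampling can look at the outcome before deciding.
healthy = [{"duration_ms": 20}, {"duration_ms": 35}]
failed = [{"duration_ms": 20}, {"duration_ms": 30, "error": True}]
print(tail_sample(healthy))
print(tail_sample(failed))
```

The trade-off is that head sampling is cheap but may discard the one trace that contained an error, while tail sampling can keep the interesting traces but has to receive and buffer every span first.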
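A quick back-of-the-envelope calculation shows why the sampling ratio matters. The numbers below (requests per day, spans per request, bytes per span) are purely hypothetical and are not measurements from any real system; the function `daily_trace_volume_gb` is likewise just an illustrative helper.

```python
def daily_trace_volume_gb(requests_per_day: int, spans_per_request: int,
                          bytes_per_span: int, sample_ratio: float) -> float:
    """Estimate how many gigabytes of span data are produced per day."""
    total_bytes = requests_per_day * spans_per_request * bytes_per_span * sample_ratio
    return total_bytes / 1e9


# Hypothetical workload: 10 million requests/day, 8 spans each, 500 bytes per span.
for ratio in (1.0, 0.1, 0.01):
    volume = daily_trace_volume_gb(10_000_000, 8, 500, ratio)
    print(f"sample ratio {ratio:>4}: {volume:.1f} GB/day")
```

Under these assumptions, tracing everything produces about 40 GB of span data per day, and that volume has to be transported, stored, and processed; lowering the ratio scales all three costs down proportionally.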
An Example project to understand OpenTelemetry
We will take a sample project, deploy it, and see in practice how it all comes together.
We will use the Napptive Playground to deploy it as an OAM application.
Go to https://playground.napptive.dev and sign up for a free account.
Click on Deploy apps in the top right corner.
- We will first deploy Jaeger all-in-one. Jaeger all-in-one runs the agent, the collector, and the query service with its UI in a single deployment, which is everything needed to collect and view traces. It is not recommended for production use, but it works well for learning purposes. Choose the YAML Deploy option and paste the code below.
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: jaeger
spec:
  components:
    - name: jaeger
      type: webservice
      properties:
        image: jaegertracing/all-in-one
        ports:
          # Agent ports (UDP)
          - port: 6831
            expose: true
            type: UDP
          - port: 6832
            expose: true
            type: UDP
          - port: 5778
            expose: true
            type: TCP
          # Jaeger UI
          - port: 16686
            expose: true
            type: TCP
          # Zipkin-compatible endpoint, matches COLLECTOR_ZIPKIN_HOST_PORT below
          - port: 9411
            expose: true
            type: TCP
          # OTLP over gRPC (4317) and HTTP (4318)
          - port: 4317
            expose: true
            type: TCP
          - port: 4318
            expose: true
            type: TCP
          # Collector ports
          - port: 14250
            expose: true
            type: TCP
          - port: 14268
            expose: true
            type: TCP
          - port: 14269
            expose: true
            type: TCP
      traits:
        - type: napptive-ingress
          properties:
            name: jaeger-ingress
            port: 16686
            path: /
        - type: napptive-ingress
          properties:
            name: jaeger-http-collector-ingress
            port: 14268
            path: /
        - type: env
          properties:
            env:
              COLLECTOR_ZIPKIN_HOST_PORT: "9411"
              COLLECTOR_OTLP_ENABLED: "true"
Now we have a backend to collect the traces. Let's deploy an application to check whether we can see them.
Click Deploy again, choose the YAML Deploy option, and paste the code below.
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: hotrod
spec:
  components:
    - name: hotrod
      type: webservice
      properties:
        image: jaegertracing/example-hotrod:latest
        args: ["all"]
        ports:
          - port: 8080
            expose: true
          - port: 8081
            expose: true
          - port: 8082
            expose: true
          - port: 8083
            expose: true
      traits:
        - type: napptive-ingress
          properties:
            name: hotrod-ingress
            port: 8080
            path: /
        - type: env
          properties:
            env:
              JAEGER_AGENT_HOST: "http://jaeger-http-collector-ingress-cgrvjjbh51taq9hpjmkg.apps.hackathon.napptive.dev/api/traces"
              JAEGER_AGENT_PORT: "80"
- Change JAEGER_AGENT_HOST to the endpoint of your own Jaeger deployment. You can find the link in the "endpoints" section of the UI:
- Application:
- Jaeger UI:
Napptive is a Kubernetes development playground for creating cloud-native applications fast and at scale. It lets us focus on the application while the underlying infrastructure required to run it is managed for us. If you want more information about the platform, the following links are useful:
Conclusion
I hope this blog was useful. We learned what OpenTelemetry is, why we need it, its architecture, and how it works internally. We also saw how to deploy a tracing stack in the Napptive Playground as an OAM application.
The following are the resources that I used to learn OpenTelemetry and write this blog:
https://www.aspecto.io/blog/what-is-opentelemetry-the-infinitive-guide/
https://www.youtube.com/playlist?list=PLNxnp_rzlqf6z1cC0IkIwp6yjsBboX945
https://github.com/jaegertracing/jaeger/tree/main/examples/hotrod
The images used to explain the architecture of OpenTelemetry in this blog are taken from aspecto.io and belong solely to them.
I would also like to thank Kunal Kushwaha and Napptive for conducting this hackathon on building cloud-native applications.
Written by
K8sCloudDev
Software Engineer, Aspiring tech story teller.