O11y with Grafana Tempo and OpenTelemetry


Intro
In this blog, we would like to share our experience of attending Grafana's recent meetup, where Devarsh gave an insightful talk about how Grafana Tempo is used at his firm. He explained why they chose Tempo over other observability tools and what made it stand out in their setup. Let's dive into what made this session so interesting!
What is Observability?
Observability (o11y) is the ability to understand what a system is doing from the output it emits. That output can come from API calls, databases, or Kubernetes and Docker containers.
The output (the three pillars of o11y) takes the form of Metrics, Logs, and Traces.
Let's understand each:
Metrics help us monitor the performance and resource usage, such as CPU and memory, of the underlying infrastructure.
Logs are the real-time messages a service emits about what is happening inside it. For example, in the logs of a Node.js server we would see a message like "Server is running on port 3000".
Traces are the interesting part, and what we are going to look at today. A trace takes you through the path a request travels. Where logs tell you that a request was sent and that it was received, traces show you how many milliseconds (the latency) it took to reach each route.
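To make the difference between the three pillars concrete, here is a minimal sketch in TypeScript. The `Metric`, `LogLine`, and `Span` shapes and the `describeTrace` helper are hypothetical names made up for illustration; they are not the OpenTelemetry data model. The sketch shows how a trace, unlike a log line, gives the path of a request together with the latency of each hop.

```typescript
// Hypothetical shapes for the three signals (illustrative only,
// not the OpenTelemetry data model).
interface Metric { name: string; value: number; unit: string; }
interface LogLine { timestampMs: number; message: string; }
interface Span {
  traceId: string;
  name: string;      // the hop, e.g. a route or a database call
  startMs: number;
  endMs: number;
}

// A trace is the set of spans sharing a traceId. Sorting them by start
// time gives the path the request travelled, with the latency per hop.
function describeTrace(spans: Span[], traceId: string): string[] {
  return spans
    .filter((s) => s.traceId === traceId)
    .sort((a, b) => a.startMs - b.startMs)
    .map((s) => `${s.name}: ${s.endMs - s.startMs} ms`);
}

// Made-up sample data, one value per pillar.
const sampleMetric: Metric = { name: "cpu_usage", value: 42, unit: "percent" };
const sampleLog: LogLine = { timestampMs: 0, message: "Server is running on port 3000" };
const sampleSpans: Span[] = [
  { traceId: "t1", name: "SELECT orders", startMs: 5, endMs: 25 },
  { traceId: "t1", name: "GET /orders", startMs: 0, endMs: 30 },
  { traceId: "t2", name: "GET /health", startMs: 1, endMs: 2 },
];
```

Calling `describeTrace(sampleSpans, "t1")` lists the route first and the database call second, each with its duration, which is the kind of view a tracing backend builds for us.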
Why O11y?
If we go back a decade, most companies ran a monolithic architecture. When something went wrong, they would just check the logs, see which part was breaking, and remediate it.
But today, things are different. Microservices are attractive and companies are adopting them, and this is where the need for o11y arises.
O11y follows this cycle → Detect → Investigate → Remediate
Detect → Without o11y, we learn about a problem only when a customer calls us or raises an issue saying that a particular microservice is misbehaving. With o11y, we detect the issue beforehand and can investigate and remediate it.
What do I mean by investigating? → We look at which status codes we are seeing and which request is misbehaving, and ultimately we can remediate the issue.
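The detect and investigate steps can be sketched in a few lines of TypeScript. Everything here is an assumption made for illustration: the `RequestRecord` shape, the helper names, the sample data, and the 25% alert threshold are not from the talk.

```typescript
// Hypothetical record of one handled request.
interface RequestRecord { route: string; status: number; }

// Detect: the share of 5xx responses among recent requests.
function errorRate(records: RequestRecord[]): number {
  if (records.length === 0) return 0;
  const failures = records.filter((r) => r.status >= 500).length;
  return failures / records.length;
}

// Investigate: count failing responses per route to see which
// request is misbehaving.
function failuresByRoute(records: RequestRecord[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of records) {
    if (r.status >= 500) {
      counts.set(r.route, (counts.get(r.route) ?? 0) + 1);
    }
  }
  return counts;
}

// Made-up sample data.
const recentRequests: RequestRecord[] = [
  { route: "/cart", status: 200 },
  { route: "/checkout", status: 500 },
  { route: "/checkout", status: 503 },
  { route: "/cart", status: 200 },
];

const ALERT_THRESHOLD = 0.25; // assumed value, tune per service
const shouldAlert = errorRate(recentRequests) > ALERT_THRESHOLD;
```

With this sample, half of the requests fail, so `shouldAlert` is true, and `failuresByRoute` points at `/checkout` as the route to remediate.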
What is OpenTelemetry?
- Using OTel, we examine the internal state of our system. The OTel o11y pipeline can be broken down into three main stages: generation → collection → export of the three pillars that we saw.
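The three stages can be sketched as a toy pipeline in TypeScript. This is a hand-rolled illustration and not the OpenTelemetry SDK or Collector: `Signal`, `Exporter`, `Collector`, `InMemoryExporter`, and the batch size of 2 are all hypothetical.

```typescript
// One telemetry item; a simplified stand-in for the three pillars.
type Signal =
  | { kind: "metric"; name: string; value: number }
  | { kind: "log"; message: string }
  | { kind: "span"; name: string; durationMs: number };

// Export stage: anything that can ship a batch to a backend.
interface Exporter {
  send(batch: Signal[]): void;
}

// Collection stage: buffer signals and hand them over in batches.
class Collector {
  private buffer: Signal[] = [];
  private exporter: Exporter;
  private batchSize: number;

  constructor(exporter: Exporter, batchSize: number) {
    this.exporter = exporter;
    this.batchSize = batchSize;
  }

  collect(signal: Signal): void {
    this.buffer.push(signal);
    if (this.buffer.length >= this.batchSize) this.flush();
  }

  flush(): void {
    if (this.buffer.length === 0) return;
    this.exporter.send(this.buffer);
    this.buffer = [];
  }
}

// In-memory exporter standing in for a real backend.
class InMemoryExporter implements Exporter {
  batches: Signal[][] = [];
  send(batch: Signal[]): void {
    this.batches.push(batch);
  }
}

// Generation stage: the application emits signals as it works.
const demoExporter = new InMemoryExporter();
const demoCollector = new Collector(demoExporter, 2);
demoCollector.collect({ kind: "log", message: "Server is running on port 3000" });
demoCollector.collect({ kind: "span", name: "GET /orders", durationMs: 30 });
demoCollector.collect({ kind: "metric", name: "cpu_usage", value: 42 });
demoCollector.flush(); // push out whatever is still buffered
```

The point of keeping the exporter behind an interface is that the generation and collection code does not change when the backend does.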
What is the need for OpenTelemetry?
It standardises the approach. Multiple tools already do this, each in its own way: Nagios, Jaeger, Zipkin, AWS X-Ray, Grafana, Prometheus. But with multiple choices comes the misery of selecting one. Hence, OTel.
OTel instruments our microservice so that we can measure how it behaves. In our case, we are instrumenting a Node.js service.
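To give a feel for what "instrumenting" means, here is a minimal hand-rolled sketch in TypeScript. It is not the OpenTelemetry API; `instrument`, `RecordedSpan`, and the injectable `now` clock are hypothetical helpers that only show the idea of wrapping a function so that every call is timed and recorded.

```typescript
// What we record for each call of an instrumented function.
interface RecordedSpan { name: string; durationMs: number; ok: boolean; }

// Hypothetical helper: wraps a function so every call appends a span
// to `sink`. The clock is injectable so the timing can be tested.
function instrument<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => R,
  sink: RecordedSpan[],
  now: () => number = Date.now,
): (...args: A) => R {
  return (...args: A): R => {
    const start = now();
    let ok = false;
    try {
      const result = fn(...args);
      ok = true;
      return result;
    } finally {
      // Runs on success and on throw, so failed calls are measured too.
      sink.push({ name, durationMs: now() - start, ok });
    }
  };
}

// Usage: the wrapped function behaves like the original one.
const recorded: RecordedSpan[] = [];
const add = instrument("add", (a: number, b: number) => a + b, recorded);
const sum = add(2, 3);
```

After the call, `sum` holds the normal result and `recorded` holds one span named `add`. A real setup would rely on OTel's instrumentation for Node.js rather than on wrappers like this one.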
Why did we choose Grafana Tempo?
- Simple documentation covering the what and the how
- Integrates with the Grafana ecosystem
- Native support for OpenTelemetry
- Helped us with root cause analysis and with finding slow traces
- Database support and an accurate service graph
- All of this eventually boosted our productivity, and it saved us costs because it is an open-source project
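As a rough illustration of what "slow traces" and a "service graph" mean, the sketch below derives both from a list of spans in TypeScript. This is not how Tempo computes them internally; the `ServiceSpan` shape, the helper names, the sample data, and the assumption that span IDs are unique across traces are all made up for the example.

```typescript
// Hypothetical span shape carrying the service that produced it.
interface ServiceSpan {
  traceId: string;
  spanId: string;        // assumed unique across all traces here
  parentSpanId?: string; // missing on the root span of a trace
  service: string;
  durationMs: number;
}

// A service-graph edge exists when a span's parent belongs to a
// different service, i.e. one service called another.
function serviceGraphEdges(spans: ServiceSpan[]): string[] {
  const byId = new Map<string, ServiceSpan>();
  for (const s of spans) byId.set(s.spanId, s);
  const edges = new Set<string>();
  for (const s of spans) {
    const parent = s.parentSpanId ? byId.get(s.parentSpanId) : undefined;
    if (parent && parent.service !== s.service) {
      edges.add(`${parent.service} -> ${s.service}`);
    }
  }
  return Array.from(edges).sort();
}

// A trace counts as slow when its root span exceeds the threshold.
function slowTraces(spans: ServiceSpan[], thresholdMs: number): string[] {
  return spans
    .filter((s) => s.parentSpanId === undefined && s.durationMs > thresholdMs)
    .map((s) => s.traceId);
}

// Made-up sample data: one slow trace and one fast trace.
const demoSpans: ServiceSpan[] = [
  { traceId: "t1", spanId: "a", service: "frontend", durationMs: 900 },
  { traceId: "t1", spanId: "b", parentSpanId: "a", service: "checkout", durationMs: 850 },
  { traceId: "t1", spanId: "c", parentSpanId: "b", service: "postgres", durationMs: 800 },
  { traceId: "t2", spanId: "d", service: "frontend", durationMs: 40 },
  { traceId: "t2", spanId: "e", parentSpanId: "d", service: "checkout", durationMs: 30 },
];
```

With a threshold of 500 ms, only `t1` is reported as slow, and walking its spans shows most of the time is spent in the database call, which is the root-cause view mentioned above.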
Why not Jaeger?
- Jaeger will certainly give us traces and monitoring, whereas Tempo has native Grafana support along with a service graph and long-term storage support.
Why not Zipkin?
- No doubt, Zipkin is reliable and lightweight. But it doesn't support Grafana and OpenTelemetry natively, and it has limited capabilities in terms of service graphs and long-term storage options.
Please refer to this GitHub repository. I have made changes to it and made it lightweight for my machine.
https://github.com/devarsh10/opentelemetry-demo/tree/lightweight
Finally, we will be able to see something like this,
Key Takeaways
Metrics, logs, and traces are the 3 pillars of observability, all viewable in a unified Grafana dashboard experience.
OpenTelemetry provides a vendor-neutral pipeline to generate, collect, and export telemetry data directly to Grafana's ecosystem.
Grafana Tempo offers a distributed tracing backend that integrates with OpenTelemetry and Prometheus.
References
- Why O11y Image → https://aws-observability.github.io/observability-best-practices/assets/images/Why_is_Observability_Important-88ac959abe0712c1a548537e274501f4.png
Conclusion
Now we know what Grafana Tempo is used for and how it can be useful when we integrate it with our microservices.
Let me know how you are handling observability and which tools you're using.
Lastly, I would like to express my sincere gratitude to Rushabh Shah. Without his support, this blog would not have been possible. Thank you very much.
Written by
Devarsh