🔍 Ultimate Guide to OpenTelemetry for Node.js: Tracing and Metrics Explained

In cloud-native environments, it's crucial to have visibility into your services' performance and behavior. OpenTelemetry provides a standardized framework for collecting traces, metrics, and logs, which can help you monitor the health of your applications and debug performance issues.
In this blog post, we'll walk through integrating OpenTelemetry for tracing and metrics into a Node.js microservices architecture consisting of:
A Core Service: A simple Express.js application.
A Notification Service: A RabbitMQ consumer that processes messages and sends emails.
We'll also set up observability with Grafana, Prometheus, and Tempo for trace and metrics visualization.
🧠 What is OpenTelemetry?
OpenTelemetry is an open-source framework for collecting telemetry data, including traces, metrics, and logs. It allows you to track requests across services, visualize performance, and debug issues. OpenTelemetry supports a wide range of languages, libraries, and backends, making it ideal for distributed systems.
🚀 Why Use OpenTelemetry?
OpenTelemetry offers several benefits:
Unified Observability: It collects traces, metrics, and logs under a single framework.
Vendor-Neutral: You can export telemetry data to different backends, such as Tempo, Prometheus, Jaeger, etc.
Automatic Instrumentation: OpenTelemetry provides auto-instrumentation for popular libraries (HTTP requests, databases, message brokers).
Easy to Integrate: It works seamlessly in different environments, such as Docker, Kubernetes, and local setups.
💡 OpenTelemetry in Popular Tools
Several widely used tools and services leverage OpenTelemetry under the hood, enabling seamless observability integration into your applications. Here are some examples:
1. Sentry
Sentry, a popular error tracking and monitoring service, uses OpenTelemetry for its tracing and telemetry collection. By leveraging OpenTelemetry, Sentry can provide enhanced distributed tracing and performance monitoring. You can seamlessly integrate Sentry into your Node.js apps, and it will automatically instrument your app using OpenTelemetry, allowing you to view both error logs and performance traces in one platform.
2. Datadog
Datadog also uses OpenTelemetry for distributed tracing, metrics, and logs collection. It offers an OpenTelemetry integration that lets you send telemetry data directly from your applications to Datadog for centralized observability.
3. AWS X-Ray
AWS X-Ray is another example of a service that supports OpenTelemetry for tracing. It enables you to track requests, monitor the performance of your microservices, and pinpoint errors or bottlenecks.
4. Jaeger
Jaeger is an open-source distributed tracing system that integrates directly with OpenTelemetry for collecting traces. It provides a user-friendly interface for viewing traces and understanding latency across services in your system.
These tools simplify the process of collecting telemetry data by using OpenTelemetry's standards, making it easier for developers to integrate observability into their applications without locking themselves into a single vendor's ecosystem.
🧑‍💻 Step-by-Step Implementation
Step 1: Create the tracing.js file for the Core Service and Notification Service
First, create a tracing.js file in both the Core Service and the Notification Service. This file will contain the OpenTelemetry configuration for exporting traces and metrics.
Note: For manual instrumentation you won't need to add the instrumentations; you only need to initialize the OpenTelemetry SDK.
/apps/core/tracing.js:
This file contains all of the configuration for the OpenTelemetry instrumentation for traces, metrics, and logs. You can add and remove instrumentations as needed.
// core-service/tracing.js and notification-service/tracing.js
/*
Alternatively, the initOpenTelemetry function can be exported and called at the top of each server's
entry file instead of importing this file directly. Passing the serviceName to it lets a single
tracing.js be reused across the different services.
*/
const { NodeSDK } = require("@opentelemetry/sdk-node");
const { getNodeAutoInstrumentations } = require("@opentelemetry/auto-instrumentations-node");
const { OTLPTraceExporter } = require("@opentelemetry/exporter-trace-otlp-grpc");
const { OTLPMetricExporter } = require("@opentelemetry/exporter-metrics-otlp-grpc");
const { diag, DiagConsoleLogger, DiagLogLevel, SpanStatusCode } = require("@opentelemetry/api");
const { PeriodicExportingMetricReader } = require("@opentelemetry/sdk-metrics");
const { PrismaInstrumentation } = require("@prisma/instrumentation");

const initOpenTelemetry = () => {
  diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.INFO);

  const traceExporter = new OTLPTraceExporter({
    url: process.env.TEMPO_URL,
  });
  const metricsExporter = new OTLPMetricExporter({
    url: process.env.TEMPO_URL,
  });

  // Thin wrapper around the OTLP trace exporter; extend export() with logging
  // if you want to inspect spans before they are sent.
  class LoggingSpanExporter {
    export(spans, resultCallback) {
      traceExporter.export(spans, resultCallback);
    }
    shutdown() {
      return traceExporter.shutdown();
    }
  }

  const sdk = new NodeSDK({
    traceExporter: new LoggingSpanExporter(),
    metricReader: new PeriodicExportingMetricReader({
      exporter: metricsExporter,
      exportIntervalMillis: 5000, // export the metrics every 5 seconds
    }),
    instrumentations: [
      new PrismaInstrumentation(),
      getNodeAutoInstrumentations({
        "@opentelemetry/instrumentation-http": {
          enabled: true,
          requestHook: (span, req) => {
            span.updateName(`${req.method} : ${req.url}`);
          },
        },
        "@opentelemetry/instrumentation-ioredis": {
          enabled: true,
          responseHook: (span, cmd, arg, result) => {
            span.setAttributes({
              "db.argument": arg,
              "db.result": result ? true : false,
            });
            span.updateName(`ioredis:${cmd}`);
            span.setStatus({ code: result ? SpanStatusCode.OK : SpanStatusCode.ERROR });
          },
          dbStatementSerializer: (statement) => {
            return statement;
          },
        },
        "@opentelemetry/instrumentation-dns": {
          enabled: false,
        },
        "@opentelemetry/instrumentation-amqplib": {
          enabled: true,
        },
        "@opentelemetry/instrumentation-net": {
          enabled: false,
        },
      }),
    ],
    // This changes per service. In a monolith it stays the same; with microservices,
    // make sure each service uses a different name so distributed tracing works across them.
    serviceName: "apps/core",
  });

  // Start tracing
  sdk.start();

  // Graceful shutdown
  process.on("SIGTERM", async () => {
    await sdk.shutdown();
    console.log("Tracing terminated");
    process.exit(0);
  });
};

initOpenTelemetry();
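As the comment at the top of the file suggests, a single tracing.js can be shared across services by exporting initOpenTelemetry and passing the service name in from each entry file. The following is a minimal sketch of that variant; the path shared/tracing.js and the reduced option set are illustrative, not part of the repository above.
// shared/tracing.js (hypothetical shared module) — exporters and the full
// instrumentation list from the file above are omitted here for brevity.
const { NodeSDK } = require("@opentelemetry/sdk-node");
const { getNodeAutoInstrumentations } = require("@opentelemetry/auto-instrumentations-node");
const { OTLPTraceExporter } = require("@opentelemetry/exporter-trace-otlp-grpc");

const initOpenTelemetry = (serviceName) => {
  const sdk = new NodeSDK({
    serviceName, // e.g. "apps/core" or "apps/notification"
    traceExporter: new OTLPTraceExporter({ url: process.env.TEMPO_URL }),
    instrumentations: [getNodeAutoInstrumentations()],
  });
  sdk.start();
  return sdk;
};

module.exports = { initOpenTelemetry };

// In each service's entry file, before any other require:
// const { initOpenTelemetry } = require("../shared/tracing");
// initOpenTelemetry("apps/core");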
Step 2: Import tracing.js in the Core and Notification Services
Next, import the tracing.js file at the top of both server.js (Core Service) and emailProcessor.js (Notification Service) to enable tracing.
/apps/core/server.js:
The tracing.js file should be imported at the very top of the application entry file.
require('./tracing'); // Import the tracing configuration
require("./tracing")
const express = require("express");
const mappingRoute = require("./routes/mapping.route");
const userRoute = require("./routes/user.route");
const { metricsExporter } = require("./middlewares/metrics");
const { port } = require("./config/awsSecretsManager");
const app = express();
app.use(metricsExporter);
app.use(express.json());
app.use(mappingRoute);
app.use(userRoute);
app.get("/health", (req, res) => {
  res.send("The application is healthy");
});
app.listen(port, () => {
  console.log(`Server is running on port ${port}`);
});
/apps/core/middlewares/metrics.js:
This middleware generates custom metrics for more valuable insights.
const { metrics } = require("@opentelemetry/api");
const PackageJson = require("../package.json");
const meter = metrics.getMeter(PackageJson.name);
const requestCounter = meter.createCounter("request_count", {
  description: "Count of HTTP requests",
});
const requestDuration = meter.createHistogram("request_duration", {
  description: "Duration of HTTP requests in milliseconds",
});

function metricsExporter(req, res, next) {
  const start = Date.now();
  res.on("finish", () => {
    const duration = Date.now() - start;
    const labels = {
      method: req.method,
      path: req.route?.path,
      status: res.statusCode,
    };
    requestCounter.add(1, labels);
    requestDuration.record(duration, labels);
  });
  next();
}

module.exports = {
  metricsExporter
};
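The same meter can report other instrument types as well. As a hedged sketch of the queue-length idea mentioned in the Bonus Tips below, an observable gauge could look like this; getMailQueueLength is a hypothetical helper, not part of the services above.
// Sketch: an asynchronous (observable) gauge on the same meter.
// getMailQueueLength() is hypothetical — wire it to however your service
// can read the RabbitMQ queue depth.
const queueLengthGauge = meter.createObservableGauge("mail_queue_length", {
  description: "Number of messages waiting in the mail queue",
});
queueLengthGauge.addCallback(async (observableResult) => {
  observableResult.observe(await getMailQueueLength(), { queue: "mail" });
});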
/apps/notification/emailProcessor.js:
This is another microservice where we will use tracing.
require("./tracing")
const { EXCHANGES } = require("static_values");
const emailQueue = require("./controller/email.controller");
const Email = require("./util/email.util");
const emailService = new Email();
emailQueue.channel.consume(EXCHANGES.NOTIFICATION_EXCHANGE.QUEUES.MAIL_QUEUE.NAME, (msg) => {
  if (msg !== null) {
    const emailData = JSON.parse(msg.content.toString());
    emailService.send(emailData);
    emailQueue.channel.ack(msg);
  }
});
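Because the amqplib auto-instrumentation creates a span for each consumed message and keeps it active while the consume callback runs, you can enrich that span from inside the handler through the OpenTelemetry API. A small sketch; the attribute names here are only examples:
// Sketch: inside the consume callback above, attach message metadata to the
// span that the amqplib instrumentation makes active for the current message.
// (Put the require at the top of the file alongside the other imports.)
const { trace } = require("@opentelemetry/api");

const activeSpan = trace.getActiveSpan();
if (activeSpan) {
  activeSpan.setAttribute("mail.queue", EXCHANGES.NOTIFICATION_EXCHANGE.QUEUES.MAIL_QUEUE.NAME);
  activeSpan.setAttribute("mail.size_bytes", msg.content.length);
}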
Note that the service graph is only generated, and distributed tracing only works across services, when each service uses a different service name.
Step 3: Docker & Infrastructure Setup
Once the application is instrumented with OpenTelemetry, it's time to set up the infrastructure. You’ll use Docker Compose to orchestrate the services.
/infra/apm/docker-compose.yaml:
# /infra/apm/docker-compose.yaml
services:
  # Tempo runs as user 10001, and docker compose creates the volume as root.
  # As such, we need to chown the volume in order for Tempo to start correctly.
  init:
    image: &tempoImage grafana/tempo:latest
    user: root
    entrypoint:
      - "chown"
      - "10001:10001"
      - "/var/tempo"
    volumes:
      - ./tempo-data:/var/tempo

  tempo:
    image: *tempoImage
    command: [ "-config.file=/etc/tempo.yaml" ]
    volumes:
      - ../shared/tempo.yaml:/etc/tempo.yaml
      - ./tempo-data:/var/tempo
    ports:
      - "3200" # tempo
      - "4317" # otlp grpc
    depends_on:
      - init

  # And put them in an OTEL collector pipeline...
  otel-collector:
    image: otel/opentelemetry-collector:0.86.0
    command: [ "--config=/etc/otel-collector.yaml" ]
    volumes:
      - ./otel-collector.yaml:/etc/otel-collector.yaml
    ports:
      - "4317:4317"
      - "8889:8889"

  prometheus:
    image: prom/prometheus:latest
    command:
      - --config.file=/etc/prometheus.yaml
      - --web.enable-remote-write-receiver
      - --enable-feature=exemplar-storage
      - --enable-feature=native-histograms
    volumes:
      - ../shared/prometheus.yaml:/etc/prometheus.yaml
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:11.0.0
    volumes:
      - ../shared/grafana-datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
      - GF_AUTH_DISABLE_LOGIN_FORM=true
      - GF_FEATURE_TOGGLES_ENABLE=traceqlEditor
    ports:
      - "4000:3000"
/infra/apm/otel-collector.yaml:
The services push traces and metrics to the OTel Collector; the collector forwards the traces to Tempo and exposes the metrics on port 8889 for Prometheus to scrape. When running the apps against this setup, TEMPO_URL should point to the collector's OTLP gRPC endpoint (for example, http://localhost:4317 when everything runs locally).
# /infra/apm/otel-collector.yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          static_configs:
            - targets: ['localhost:8888']

exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheus:
    endpoint: "0.0.0.0:8889"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
/infra/shared/tempo.yaml:
Tempo stores the traces that the applications export via the OTel Collector.
stream_over_http_enabled: true
server:
  http_listen_port: 3200
  log_level: info

query_frontend:
  search:
    duration_slo: 5s
    throughput_bytes_slo: 1.073741824e+09
    metadata_slo:
      duration_slo: 5s
      throughput_bytes_slo: 1.073741824e+09
  trace_by_id:
    duration_slo: 5s

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: "tempo:4317"

ingester:
  max_block_duration: 5m # cut the headblock when this much time passes. this is being set for demo purposes and should probably be left alone normally

compactor:
  compaction:
    block_retention: 1h # overall Tempo trace retention. set for demo purposes

metrics_generator:
  registry:
    external_labels:
      source: tempo
      cluster: docker-compose
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true
  traces_storage:
    path: /var/tempo/generator/traces

storage:
  trace:
    backend: local # backend configuration to use
    wal:
      path: /var/tempo/wal # where to store the wal locally
    local:
      path: /var/tempo/blocks

overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics, local-blocks] # enables metrics generator
      generate_native_histograms: both
/infra/shared/prometheus.yaml:
Add the scrape config so that Prometheus scrapes the metrics exposed by the otel-collector on port 8889.
Note: We can't directly push metrics to Prometheus. Prometheus scrapes metrics from the endpoints listed in its scrape_configs.
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8889']
📊 Visualizing Traces and Metrics in Grafana
Once everything is set up, you can access Grafana at http://localhost:4000 and configure it to visualize both traces (from Tempo) and metrics (from Prometheus). The Grafana data source configuration can be defined as follows:
/infra/shared/grafana-datasources.yaml:
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    uid: prometheus
    access: proxy
    orgId: 1
    url: http://prometheus:9090
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
    jsonData:
      httpMethod: GET
  - name: Tempo
    type: tempo
    access: proxy
    orgId: 1
    url: http://tempo:3200
    basicAuth: false
    isDefault: true
    version: 1
    editable: false
    apiVersion: 1
    uid: tempo
    jsonData:
      httpMethod: GET
      serviceMap:
        datasourceUid: prometheus
      streamingEnabled:
        search: true
Visualization of the traces and the generated service graph.
🎯 Final Thoughts
Integrating OpenTelemetry with Grafana, Prometheus, and Tempo gives you powerful observability for both tracing and metrics. This setup allows you to monitor and visualize your Node.js microservices in real-time and quickly identify bottlenecks or issues.
💡 Bonus Tips
Custom Metrics: You can add custom metrics, such as queue length or request duration, to monitor business-critical operations.
Service Maps: Grafana's service map visualization helps you track dependencies between services and pinpoint where failures occur.
Mix of Auto and Manual Instrumentation: There are cases where auto-instrumentation does not cover what you need; in those cases you can combine auto-instrumentation with manual instrumentation, as in the sketch below.
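A minimal sketch of such a hybrid: auto-instrumentation keeps covering HTTP, Prisma, Redis, and RabbitMQ, while a manual span wraps a business operation it cannot see. (sendInvoice here is a hypothetical function, not part of the services above.)
// Sketch: manual span around business logic, alongside auto-instrumentation.
const { trace, SpanStatusCode } = require("@opentelemetry/api");
const tracer = trace.getTracer("apps/core");

async function sendInvoice(order) {
  return tracer.startActiveSpan("sendInvoice", async (span) => {
    try {
      span.setAttribute("order.id", order.id);
      // ...business logic that no auto-instrumentation covers...
      span.setStatus({ code: SpanStatusCode.OK });
      return true;
    } catch (err) {
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
      throw err;
    } finally {
      span.end();
    }
  });
}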