🔍 Ultimate Guide to OpenTelemetry for Node.js: Tracing and Metrics Explained

In cloud-native environments, it's crucial to have visibility into your services' performance and behavior. OpenTelemetry provides a standardized framework for collecting traces, metrics, and logs, which can help you monitor the health of your applications and debug performance issues.
In this blog post, we'll walk through integrating OpenTelemetry for tracing and metrics into a Node.js microservices architecture consisting of:
A Core Service: A simple Express.js application.
A Notification Service: A RabbitMQ consumer that processes messages and sends emails.
We'll also set up observability with Grafana, Prometheus, and Tempo for trace and metrics visualization.
🧠 What is OpenTelemetry?
OpenTelemetry is an open-source framework for collecting telemetry data, including traces, metrics, and logs. It allows you to track requests across services, visualize performance, and debug issues. OpenTelemetry supports a wide range of languages, libraries, and backends, making it ideal for distributed systems.
🚀 Why Use OpenTelemetry?
OpenTelemetry offers several benefits:
Unified Observability: It collects traces, metrics, and logs under a single framework.
Vendor-Neutral: You can export telemetry data to different backends, such as Tempo, Prometheus, Jaeger, etc.
Automatic Instrumentation: OpenTelemetry provides auto-instrumentation for popular libraries (HTTP requests, databases, message brokers).
Easy to Integrate: It works seamlessly in different environments, such as Docker, Kubernetes, and local setups.
💡 OpenTelemetry in Popular Tools
Several widely used tools and services leverage OpenTelemetry under the hood, enabling seamless observability integration into your applications. Here are some examples:
1. Sentry
Sentry, a popular error tracking and monitoring service, uses OpenTelemetry for its tracing and telemetry collection. By leveraging OpenTelemetry, Sentry can provide enhanced distributed tracing and performance monitoring. You can seamlessly integrate Sentry into your Node.js apps, and it will automatically instrument your app using OpenTelemetry, allowing you to view both error logs and performance traces in one platform.
2. Datadog
Datadog also uses OpenTelemetry for distributed tracing, metrics, and logs collection. It offers an OpenTelemetry integration that lets you send telemetry data directly from your applications to Datadog for centralized observability.
3. AWS X-Ray
AWS X-Ray is another example of a service that supports OpenTelemetry for tracing. It enables you to track requests, monitor the performance of your microservices, and pinpoint errors or bottlenecks.
4. Jaeger
Jaeger is an open-source distributed tracing system that integrates directly with OpenTelemetry for collecting traces. It provides a user-friendly interface for viewing traces and understanding latency across services in your system.
These tools simplify the process of collecting telemetry data by using OpenTelemetry's standards, making it easier for developers to integrate observability into their applications without locking themselves into a single vendor's ecosystem.
🧑‍💻 Step-by-Step Implementation
Step 1: Create the tracing.js file for the Core Service and Notification Service
First, create a tracing.js file in both the Core Service and the Notification Service. This file will contain the OpenTelemetry configuration for exporting traces and metrics.
Note: For manual instrumentation you won't need to add the instrumentations; you only need to initialize the OpenTelemetry SDK.
/apps/core/tracing.js:
This file contains all of the configuration for the OpenTelemetry instrumentation for traces, metrics, and logs. You can add and remove instrumentations as needed.
// core-service/tracing.js and notification-service/tracing.js
/*
Alternatively, the initOpenTelemetry function can be exported and called at the top of each server's
entry file instead of importing this file directly. Passing the serviceName to it lets a single
tracing.js be reused across the different services.
*/
const { NodeSDK } = require("@opentelemetry/sdk-node");
const { getNodeAutoInstrumentations } = require("@opentelemetry/auto-instrumentations-node");
const { OTLPTraceExporter } = require("@opentelemetry/exporter-trace-otlp-grpc");
const { OTLPMetricExporter } = require("@opentelemetry/exporter-metrics-otlp-grpc");
const { diag, DiagConsoleLogger, DiagLogLevel, SpanStatusCode } = require("@opentelemetry/api");
const { PeriodicExportingMetricReader } = require("@opentelemetry/sdk-metrics");
const { PrismaInstrumentation } = require("@prisma/instrumentation");

const initOpenTelemetry = () => {
  diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.INFO);

  const traceExporter = new OTLPTraceExporter({
    url: process.env.TEMPO_URL,
  });
  const metricsExporter = new OTLPMetricExporter({
    url: process.env.TEMPO_URL,
  });

  // Thin wrapper around the OTLP trace exporter; extend export() with logging
  // if you want to inspect spans before they are sent.
  class LoggingSpanExporter {
    export(spans, resultCallback) {
      traceExporter.export(spans, resultCallback);
    }
    shutdown() {
      return traceExporter.shutdown();
    }
  }

  const sdk = new NodeSDK({
    traceExporter: new LoggingSpanExporter(),
    metricReader: new PeriodicExportingMetricReader({
      exporter: metricsExporter,
      exportIntervalMillis: 5000, // export the metrics every 5 seconds
    }),
    instrumentations: [
      new PrismaInstrumentation(),
      getNodeAutoInstrumentations({
        "@opentelemetry/instrumentation-http": {
          enabled: true,
          requestHook: (span, req) => {
            span.updateName(`${req.method} : ${req.url}`);
          },
        },
        "@opentelemetry/instrumentation-ioredis": {
          enabled: true,
          responseHook: (span, cmd, arg, result) => {
            span.setAttributes({
              "db.argument": arg,
              "db.result": result ? true : false,
            });
            span.updateName(`ioredis:${cmd}`);
            span.setStatus({ code: result ? SpanStatusCode.OK : SpanStatusCode.ERROR });
          },
          dbStatementSerializer: (statement) => {
            return statement;
          },
        },
        "@opentelemetry/instrumentation-dns": {
          enabled: false,
        },
        "@opentelemetry/instrumentation-amqplib": {
          enabled: true,
        },
        "@opentelemetry/instrumentation-net": {
          enabled: false,
        },
      }),
    ],
    // This changes per service. In a monolith it stays the same; with microservices,
    // make sure each service uses a different name so distributed tracing works across them.
    serviceName: "apps/core",
  });

  // Start tracing
  sdk.start();

  // Graceful shutdown
  process.on("SIGTERM", async () => {
    await sdk.shutdown();
    console.log("Tracing terminated");
    process.exit(0);
  });
};

initOpenTelemetry();
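As the comment at the top of the file suggests, a single tracing.js can be shared across services by exporting initOpenTelemetry and passing the service name in from each entry file. The following is a minimal sketch of that variant; the path shared/tracing.js and the reduced option set are illustrative, not part of the repository above.
// shared/tracing.js (hypothetical shared module) — exporters and the full
// instrumentation list from the file above are omitted here for brevity.
const { NodeSDK } = require("@opentelemetry/sdk-node");
const { getNodeAutoInstrumentations } = require("@opentelemetry/auto-instrumentations-node");
const { OTLPTraceExporter } = require("@opentelemetry/exporter-trace-otlp-grpc");

const initOpenTelemetry = (serviceName) => {
  const sdk = new NodeSDK({
    serviceName, // e.g. "apps/core" or "apps/notification"
    traceExporter: new OTLPTraceExporter({ url: process.env.TEMPO_URL }),
    instrumentations: [getNodeAutoInstrumentations()],
  });
  sdk.start();
  return sdk;
};

module.exports = { initOpenTelemetry };

// In each service's entry file, before any other require:
// const { initOpenTelemetry } = require("../shared/tracing");
// initOpenTelemetry("apps/core");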
Step 2: Import tracing.js in the Core and Notification Services
Next, import the tracing.js file at the top of both server.js (Core Service) and emailProcessor.js (Notification Service) to enable tracing.
/apps/core/server.js:
The tracing.js file should be imported at the very top of the application entry file.
require('./tracing'); // Import the tracing configuration
require("./tracing")
const express = require("express");
const mappingRoute = require("./routes/mapping.route");
const userRoute = require("./routes/user.route");
const { metricsExporter } = require("./middlewares/metrics");
const { port } = require("./config/awsSecretsManager");
const app = express();
app.use(metricsExporter);
app.use(express.json());
app.use(mappingRoute);
app.use(userRoute);
app.get("/health", (req, res) => {
  res.send("The application is healthy");
});
app.listen(port, () => {
  console.log(`Server is running on port ${port}`);
});
/apps/core/middlewares/metrics.js:
This middleware generates custom metrics for more valuable insights.
const { metrics } = require("@opentelemetry/api");
const PackageJson = require("../package.json");
const meter = metrics.getMeter(PackageJson.name);
const requestCounter = meter.createCounter("request_count", {
  description: "Count of HTTP requests",
});
const requestDuration = meter.createHistogram("request_duration", {
  description: "Duration of HTTP requests in milliseconds",
});

function metricsExporter(req, res, next) {
  const start = Date.now();
  res.on("finish", () => {
    const duration = Date.now() - start;
    const labels = {
      method: req.method,
      path: req.route?.path,
      status: res.statusCode,
    };
    requestCounter.add(1, labels);
    requestDuration.record(duration, labels);
  });
  next();
}

module.exports = {
  metricsExporter
};
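The same meter can report other instrument types as well. As a hedged sketch of the queue-length idea mentioned in the Bonus Tips below, an observable gauge could look like this; getMailQueueLength is a hypothetical helper, not part of the services above.
// Sketch: an asynchronous (observable) gauge on the same meter.
// getMailQueueLength() is hypothetical — wire it to however your service
// can read the RabbitMQ queue depth.
const queueLengthGauge = meter.createObservableGauge("mail_queue_length", {
  description: "Number of messages waiting in the mail queue",
});
queueLengthGauge.addCallback(async (observableResult) => {
  observableResult.observe(await getMailQueueLength(), { queue: "mail" });
});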
/apps/notification/emailProcessor.js:
This is another microservice where we will use tracing.
require("./tracing")
const { EXCHANGES } = require("static_values");
const emailQueue = require("./controller/email.controller");
const Email = require("./util/email.util");
const emailService = new Email();
emailQueue.channel.consume(EXCHANGES.NOTIFICATION_EXCHANGE.QUEUES.MAIL_QUEUE.NAME, (msg) => {
  if (msg !== null) {
    const emailData = JSON.parse(msg.content.toString());
    emailService.send(emailData);
    emailQueue.channel.ack(msg);
  }
});
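Because the amqplib auto-instrumentation creates a span for each consumed message and keeps it active while the consume callback runs, you can enrich that span from inside the handler through the OpenTelemetry API. A small sketch; the attribute names here are only examples:
// Sketch: inside the consume callback above, attach message metadata to the
// span that the amqplib instrumentation makes active for the current message.
// (Put the require at the top of the file alongside the other imports.)
const { trace } = require("@opentelemetry/api");

const activeSpan = trace.getActiveSpan();
if (activeSpan) {
  activeSpan.setAttribute("mail.queue", EXCHANGES.NOTIFICATION_EXCHANGE.QUEUES.MAIL_QUEUE.NAME);
  activeSpan.setAttribute("mail.size_bytes", msg.content.length);
}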
Note that the service graph is only generated, and distributed tracing only works across services, when each service uses a different service name.
Step 3: Docker & Infrastructure Setup
Once the application is instrumented with OpenTelemetry, it's time to set up the infrastructure. You’ll use Docker Compose to orchestrate the services.
/infra/apm/docker-compose.yaml:
# /infra/apm/docker-compose.yaml
services:
  # Tempo runs as user 10001, and docker compose creates the volume as root.
  # As such, we need to chown the volume in order for Tempo to start correctly.
  init:
    image: &tempoImage grafana/tempo:latest
    user: root
    entrypoint:
      - "chown"
      - "10001:10001"
      - "/var/tempo"
    volumes:
      - ./tempo-data:/var/tempo

  tempo:
    image: *tempoImage
    command: [ "-config.file=/etc/tempo.yaml" ]
    volumes:
      - ../shared/tempo.yaml:/etc/tempo.yaml
      - ./tempo-data:/var/tempo
    ports:
      - "3200" # tempo
      - "4317" # otlp grpc
    depends_on:
      - init

  # And put them in an OTEL collector pipeline...
  otel-collector:
    image: otel/opentelemetry-collector:0.86.0
    command: [ "--config=/etc/otel-collector.yaml" ]
    volumes:
      - ./otel-collector.yaml:/etc/otel-collector.yaml
    ports:
      - "4317:4317"
      - "8889:8889"

  prometheus:
    image: prom/prometheus:latest
    command:
      - --config.file=/etc/prometheus.yaml
      - --web.enable-remote-write-receiver
      - --enable-feature=exemplar-storage
      - --enable-feature=native-histograms
    volumes:
      - ../shared/prometheus.yaml:/etc/prometheus.yaml
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:11.0.0
    volumes:
      - ../shared/grafana-datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
      - GF_AUTH_DISABLE_LOGIN_FORM=true
      - GF_FEATURE_TOGGLES_ENABLE=traceqlEditor
    ports:
      - "4000:3000"
/infra/apm/otel-collector.yaml:
The services push traces and metrics to the OTel Collector; the collector forwards the traces to Tempo and exposes the metrics on port 8889 for Prometheus to scrape. When running the apps against this setup, TEMPO_URL should point to the collector's OTLP gRPC endpoint (for example, http://localhost:4317 when everything runs locally).
# /infra/apm/otel-collector.yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          static_configs:
            - targets: ['localhost:8888']

exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheus:
    endpoint: "0.0.0.0:8889"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
/infra/shared/tempo.yaml:
Tempo stores the traces that the applications export via the OTel Collector.
stream_over_http_enabled: true
server:
  http_listen_port: 3200
  log_level: info

query_frontend:
  search:
    duration_slo: 5s
    throughput_bytes_slo: 1.073741824e+09
    metadata_slo:
      duration_slo: 5s
      throughput_bytes_slo: 1.073741824e+09
  trace_by_id:
    duration_slo: 5s

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: "tempo:4317"

ingester:
  max_block_duration: 5m # cut the headblock when this much time passes. this is being set for demo purposes and should probably be left alone normally

compactor:
  compaction:
    block_retention: 1h # overall Tempo trace retention. set for demo purposes

metrics_generator:
  registry:
    external_labels:
      source: tempo
      cluster: docker-compose
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true
  traces_storage:
    path: /var/tempo/generator/traces

storage:
  trace:
    backend: local # backend configuration to use
    wal:
      path: /var/tempo/wal # where to store the wal locally
    local:
      path: /var/tempo/blocks

overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics, local-blocks] # enables metrics generator
      generate_native_histograms: both
/infra/shared/prometheus.yaml:
Add the scrape config so that Prometheus scrapes the metrics exposed by the otel-collector on port 8889.
Note: We can't directly push metrics to Prometheus. Prometheus scrapes metrics from the endpoints listed in its scrape_configs.
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8889']
📊 Visualizing Traces and Metrics in Grafana
Once everything is set up, you can access Grafana at http://localhost:4000 and configure it to visualize both traces (from Tempo) and metrics (from Prometheus). The Grafana data source configuration can be defined as follows:
/infra/shared/grafana-datasources.yaml:
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    uid: prometheus
    access: proxy
    orgId: 1
    url: http://prometheus:9090
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
    jsonData:
      httpMethod: GET
  - name: Tempo
    type: tempo
    access: proxy
    orgId: 1
    url: http://tempo:3200
    basicAuth: false
    isDefault: true
    version: 1
    editable: false
    apiVersion: 1
    uid: tempo
    jsonData:
      httpMethod: GET
      serviceMap:
        datasourceUid: prometheus
      streamingEnabled:
        search: true
Visualization of the traces and the generated service graph.
🎯 Final Thoughts
Integrating OpenTelemetry with Grafana, Prometheus, and Tempo gives you powerful observability for both tracing and metrics. This setup allows you to monitor and visualize your Node.js microservices in real-time and quickly identify bottlenecks or issues.
💡 Bonus Tips
Custom Metrics: You can add custom metrics, such as queue length or request duration, to monitor business-critical operations.
Service Maps: Grafana's service map visualization helps you track dependencies between services and pinpoint where failures occur.
Mix of Auto and Manual Instrumentation: There are cases where auto-instrumentation does not cover what you need; in those cases you can combine auto-instrumentation with manual instrumentation, as in the sketch below.
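A minimal sketch of such a hybrid: auto-instrumentation keeps covering HTTP, Prisma, Redis, and RabbitMQ, while a manual span wraps a business operation it cannot see. (sendInvoice here is a hypothetical function, not part of the services above.)
// Sketch: manual span around business logic, alongside auto-instrumentation.
const { trace, SpanStatusCode } = require("@opentelemetry/api");
const tracer = trace.getTracer("apps/core");

async function sendInvoice(order) {
  return tracer.startActiveSpan("sendInvoice", async (span) => {
    try {
      span.setAttribute("order.id", order.id);
      // ...business logic that no auto-instrumentation covers...
      span.setStatus({ code: SpanStatusCode.OK });
      return true;
    } catch (err) {
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
      throw err;
    } finally {
      span.end();
    }
  });
}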