Observability with OpenTelemetry and the Grafana Stack, Part 2: OpenTelemetry and Java Agent Instrumentation

Driptaroop Das
8 min read

This is the second part of the series on Observability with OpenTelemetry and the Grafana stack. In this part, we will instrument the services with the OpenTelemetry Java agent. Previously, we set up the services and the auth server.

What is Observability?

Observability lets you understand a system from the outside by letting you ask questions about that system without knowing its inner workings. Furthermore, it allows you to easily troubleshoot and handle novel problems, that is, “unknown unknowns”. It also helps you answer the question “Why is this happening?”

To ask those questions about your system, your application must be properly instrumented. That is, the application code must emit signals such as traces, metrics, and logs. An application is properly instrumented when developers don’t need to add more instrumentation to troubleshoot an issue, because they have all of the information they need.

OpenTelemetry is the mechanism by which application code is instrumented to help make a system observable.

What is OpenTelemetry?

  • OpenTelemetry is an open-source observability framework that provides APIs, SDKs, and tools for instrumenting, generating, and collecting telemetry data such as traces, metrics, and logs. Born from the merger of OpenTracing and OpenCensus, OpenTelemetry has become the de facto standard for collecting telemetry data in modern cloud-native applications.
  • OpenTelemetry supports collecting the three pillars of observability: traces, metrics, and logs. It also adds experimental support for a fourth pillar: profiling.

What are the benefits of OpenTelemetry?

  • Standardization Across Observability Pillars: Observability consists of three core pillars:

    • Tracing: Understanding request flows across distributed systems.
    • Metrics: Capturing quantitative system health indicators.
    • Logging: Storing event details for debugging and forensic analysis.

    OpenTelemetry provides a unified API for all three, ensuring consistency across different observability tools and vendors. It also offers experimental support for profiling, which can surface application performance bottlenecks.

  • Vendor-Agnostic & Open Source: Traditional APM (Application Performance Monitoring) solutions often lock users into proprietary ecosystems. OpenTelemetry, being open-source and vendor-neutral, lets organizations choose their backend (e.g., Prometheus, Jaeger, Zipkin, Datadog) without rewriting instrumentation code.

  • Seamless Integration with the Cloud-Native Ecosystem: OpenTelemetry integrates seamlessly with Kubernetes, Istio, Envoy, AWS X-Ray, Azure Monitor, and other cloud-native services, making it ideal for microservices architectures.
  • Automatic and Manual Instrumentation:
    • Auto-Instrumentation: Many popular libraries (e.g., Spring Boot, Django, Express.js) support automatic telemetry data collection, reducing engineering effort.
    • Manual Instrumentation: Developers can customize instrumentation when deeper visibility is required.
  • Enhanced Debugging & Faster MTTR: With distributed tracing capabilities, OpenTelemetry enables developers to:
    • Identify performance bottlenecks.
    • Pinpoint root causes in complex call chains.
    • Reduce MTTR (Mean Time to Resolution) by quickly correlating logs, metrics, and traces.
  • Future-Proof & Cloud-Native First: As an open-source project under the CNCF (Cloud Native Computing Foundation), OpenTelemetry is rapidly evolving with strong community support. It is designed for serverless, containerized, and microservices-based architectures.
  • Cost Efficiency: Because OpenTelemetry allows for sampling, aggregation, and intelligent data collection, it reduces storage and data-ingestion costs compared to traditional full-fidelity logging solutions (see the sampling sketch below).
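
On the cost point, sampling with the Java agent is configured through the standard SDK environment variables, for example OTEL_TRACES_SAMPLER=parentbased_traceidratio together with OTEL_TRACES_SAMPLER_ARG=0.1 to keep roughly 10% of traces. For code-based setups, a minimal sketch using the opentelemetry-sdk artifact looks like this (the 10% ratio is an arbitrary example, not a recommendation):

import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.samplers.Sampler;

public class SamplingExample {
    public static SdkTracerProvider tracerProvider() {
        // Keep ~10% of new traces; spans joining an existing trace
        // follow their parent's sampling decision
        return SdkTracerProvider.builder()
                .setSampler(Sampler.parentBased(Sampler.traceIdRatioBased(0.1)))
                .build();
    }
}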

Instrumentation using OpenTelemetry

For a system to be observable, it must be instrumented: that is, code from the system's components must emit signals, such as traces, metrics, and logs. There are two ways to instrument a system or application using OpenTelemetry:

  • Code-based instrumentation: This approach gives you deeper insight and rich telemetry from the application itself. You use the OpenTelemetry SDKs (available for different programming languages) to instrument your application code (see the sketch after this list).
  • Zero-code instrumentation: This approach instruments your application without modifying its code. It is great for getting started, or when you can't modify the application you need telemetry from. It provides rich telemetry from the libraries you use and/or the environment your application runs in; another way to think of it is that it tells you what is happening at the edges of your application. There are several ways to do zero-code instrumentation depending on the language; for Java, the most common is the OpenTelemetry Java agent, and that is what we will use for our services.
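
To make the code-based option concrete, here is a minimal sketch using the OpenTelemetry Java API; the tracer name, span name, and attribute are illustrative, not taken from the services in this series:

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class ManualInstrumentationExample {
    private static final Tracer tracer =
            GlobalOpenTelemetry.getTracer("manual-instrumentation-example");

    void handleRequest() {
        // Start a span and make it current for this thread
        Span span = tracer.spanBuilder("handleRequest").startSpan();
        try (Scope scope = span.makeCurrent()) {
            span.setAttribute("example.attribute", "value");
            // ... business logic ...
        } finally {
            // Always end the span, even if the business logic throws
            span.end();
        }
    }
}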

Instrumenting the services with OpenTelemetry Java agent

Back to our services: we will use the OpenTelemetry Java agent to instrument them. A Java agent is just a specially crafted jar file. It uses the Instrumentation API that the JVM provides to alter the bytecode of classes as they are loaded into the JVM. In this case, the OpenTelemetry Java agent adds the necessary instrumentation to the services to collect telemetry data without any changes to the application code. Once downloaded, the agent jar is attached with the -javaagent JVM argument.
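
Under the hood, an agent jar declares a Premain-Class attribute in its manifest, and the JVM calls that class's premain method before the application's main. The following is a minimal sketch of the mechanism, not the OpenTelemetry agent's actual code:

import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class MinimalAgent {
    // Invoked by the JVM before main() when started with -javaagent:agent.jar
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain,
                                    byte[] classfileBuffer) {
                // A real agent rewrites classfileBuffer here; the OpenTelemetry
                // agent injects its tracing hooks this way. Returning null
                // leaves the class unchanged.
                return null;
            }
        });
    }
}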

To use the OpenTelemetry Java agent, we first need to download the agent jar file from the opentelemetry-java-instrumentation releases page. We download the latest version and place it in the root directory of the repo as opentelemetry-javaagent.jar.

Next, we need to add the -javaagent JVM argument to the services. Since we will run the services in Docker containers, we can do it there. Let's write a single Dockerfile for all the services, using a multi-stage build: one builder stage compiles everything, and each service then runs as a separate target with the OpenTelemetry Java agent attached.

# Builder stage
FROM eclipse-temurin:21-alpine AS builder

WORKDIR /app
COPY . .
# If you have any custom CA certificates, add them here; we use this to trust
# self-signed certificates in the services. Ignore if not needed. (Note: in a
# Dockerfile, # only starts a comment at the beginning of a line, so this
# comment must not sit on the COPY line itself.)
COPY ca.crt /usr/local/share/ca-certificates/all-ca-certs.crt

RUN chmod 644 /usr/local/share/ca-certificates/all-ca-certs.crt && update-ca-certificates
RUN keytool -importcert -trustcacerts -cacerts -file /usr/local/share/ca-certificates/all-ca-certs.crt -alias all-ca-certs -storepass changeit -noprompt

RUN ./gradlew build

# user-service
FROM eclipse-temurin:21-alpine AS user-service

WORKDIR /app

EXPOSE 8080
COPY --from=builder /app/application/user-service/build/libs/*.jar /app.jar
COPY opentelemetry-javaagent.jar ./otel.jar

ENTRYPOINT ["java", "-javaagent:/app/otel.jar", "-jar", "/app.jar"]

# notification-service
FROM eclipse-temurin:21-alpine AS notification-service

WORKDIR /app

EXPOSE 8080
COPY --from=builder /app/application/notification-service/build/libs/*.jar /app.jar
COPY opentelemetry-javaagent.jar ./otel.jar

ENTRYPOINT ["java", "-javaagent:/app/otel.jar", "-jar", "/app.jar"]

# account-service
FROM eclipse-temurin:21-alpine AS account-service

WORKDIR /app

EXPOSE 8080
COPY --from=builder /app/application/account-service/build/libs/*.jar /app.jar
COPY opentelemetry-javaagent.jar ./otel.jar

ENTRYPOINT ["java", "-javaagent:/app/otel.jar", "-jar", "/app.jar"]

# transaction-service
FROM eclipse-temurin:21-alpine AS transaction-service

WORKDIR /app

EXPOSE 8080
COPY --from=builder /app/application/transaction-service/build/libs/*.jar /app.jar
COPY opentelemetry-javaagent.jar ./otel.jar

ENTRYPOINT ["java", "-javaagent:/app/otel.jar", "-jar", "/app.jar"]

# auth-server
FROM eclipse-temurin:21-alpine AS auth-server

WORKDIR /app

EXPOSE 9090
COPY --from=builder /app/application/auth-server/build/libs/*.jar /app.jar
COPY opentelemetry-javaagent.jar ./otel.jar

ENTRYPOINT ["java", "-javaagent:/app/otel.jar", "-jar", "/app.jar"]

NOTE: In the Dockerfile, I add a custom CA certificate to the JVM truststore. This is only needed if the services make requests to endpoints with self-signed certificates; skip that part if you don't need it.

Now we can build the services with the OpenTelemetry Java agent baked in. It's time to run the services and the auth server in Docker containers.

Running the services and the auth server with Docker Compose

We will run the services and the auth server in Docker containers using Docker Compose. This is also where we set up the database alongside the services. Let's create the compose.yaml file in the root directory of the repo.

x-common-env-services: &common-env-services # we will use this anchored extension section to set the common environment variables for all the services.
  SPRING_DATASOURCE_URL: jdbc:postgresql://db-postgres:5432/postgres
  SPRING_DATASOURCE_USERNAME: postgres
  SPRING_DATASOURCE_PASSWORD: password
x-common-services-build: &common-services-build # we will use this anchored extension section to set the common build configurations for all the services.
  context: .
  dockerfile: Dockerfile
services:
  db-postgres:
    image: postgres:17
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password
      POSTGRES_DB: postgres
    ports:
      - "5432:5432"
    volumes:
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql # we will use this sql file to create the schemas in the database.
    deploy:
      resources:
        limits:
          cpus: "1"
          memory: 1G
    profiles:
      - db
      - services
  auth-server:
    build:
      <<: *common-services-build
      target: auth-server
    ports:
      - "9090:9090"
    environment:
      <<: *common-env-services
      OTEL_SERVICE_NAME: auth-server
      OTEL_RESOURCE_ATTRIBUTES: "application=auth-server"
    profiles:
      - services
  user-service:
    build:
      <<: *common-services-build
      target: user-service
    environment:
      <<: *common-env-services
      OTEL_SERVICE_NAME: user-service
      OTEL_RESOURCE_ATTRIBUTES: "application=user-service"
    depends_on:
      - db-postgres
      - auth-server
    profiles:
      - services
  account-service:
    build:
      <<: *common-services-build
      target: account-service
    environment:
      <<: *common-env-services
      OTEL_SERVICE_NAME: account-service
      OTEL_RESOURCE_ATTRIBUTES: "application=account-service"
    depends_on:
      - user-service
    profiles:
      - services
  notification-service:
    build:
      <<: *common-services-build
      target: notification-service
    environment:
      <<: *common-env-services
      OTEL_SERVICE_NAME: notification-service
      OTEL_RESOURCE_ATTRIBUTES: "application=notification-service"
    depends_on:
      - user-service
    profiles:
      - services
  transaction-service:
    build:
      <<: *common-services-build
      target: transaction-service
    ports:
      - "8080:8080"
    environment:
      <<: *common-env-services
      OTEL_SERVICE_NAME: transaction-service
      OTEL_RESOURCE_ATTRIBUTES: "application=transaction-service"
    depends_on:
      - notification-service
      - account-service
    profiles:
      - services

In the compose file, we set up the db-postgres, auth-server, user-service, account-service, notification-service, and transaction-service services. Since the services have a lot in common, we use anchored extension sections for the shared environment variables and build configuration. OTEL_SERVICE_NAME sets each service's service.name resource attribute, and OTEL_RESOURCE_ATTRIBUTES attaches extra key=value resource attributes to every signal the agent emits. We also define profiles: the db profile covers only the db-postgres service, while the services profile covers everything (so, for example, docker compose --profile db up would start only the database). We will make use of the profiles later; for now, you can ignore them.

We also set up the db-postgres service to run the PostgreSQL database, using an init.sql file to create one schema per service. Let's create the init.sql file in the root directory of the repo.

create schema if not exists user_data;
create schema if not exists accounts_data;
create schema if not exists notifications_data;
create schema if not exists transactions_data;
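
As an illustration of how one of the services could pin its tables to its own schema, assuming Spring Data JPA is in use (this entity is hypothetical, not from the actual codebase):

import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.Table;

// Hypothetical entity: the schema attribute keeps this service's tables
// inside its own schema in the shared PostgreSQL instance
@Entity
@Table(name = "users", schema = "user_data")
public class UserEntity {
    @Id
    private Long id;
    private String name;
}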

Start the services

Now each service has its own schema in the database. We can start the services and the auth server in Docker containers with the compose file:

docker compose --profile "*" up

The services are now running in Docker containers, instrumented by the OpenTelemetry Java agent. You will see a lot of errors in the logs: the agent is running and capturing telemetry data, but it has nowhere to send it yet. By default it tries to export over OTLP to http://localhost:4317, where nothing is listening.

In the next part, we will set up the OpenTelemetry Collector to collect the telemetry data from the services and export it to the monitoring backends.

Written by

Driptaroop Das
Self diagnosed nerd 🤓 and avid board gamer 🎲