Observability with OpenTelemetry and the Grafana Stack, Part 2: OpenTelemetry and Java Agent Instrumentation


This is the second part of the series on observability with OpenTelemetry and the Grafana stack. In this part, we will instrument the services with the OpenTelemetry Java agent. Previously, we set up the services and the auth server.
What is Observability?
Observability lets you understand a system from the outside by letting you ask questions about that system without knowing its inner workings. Furthermore, it allows you to easily troubleshoot and handle novel problems, that is, “unknown unknowns”. It also helps you answer the question “Why is this happening?”
To ask those questions about your system, your application must be properly instrumented. That is, the application code must emit signals such as traces, metrics, and logs. An application is properly instrumented when developers don’t need to add more instrumentation to troubleshoot an issue, because they have all of the information they need.
OpenTelemetry is the mechanism by which application code is instrumented to help make a system observable.
What is OpenTelemetry?
OpenTelemetry is an open-source observability framework that provides APIs, SDKs, and tools for instrumenting, generating, and collecting telemetry data such as traces, metrics, and logs. Born from the merger of OpenTracing and OpenCensus, OpenTelemetry has become the de facto standard for collecting telemetry data in modern cloud-native applications.
OpenTelemetry supports collecting the three pillars of observability: traces, metrics, and logs. It also adds experimental support for a fourth pillar: profiling.
What are the benefits of OpenTelemetry?
- Standardization Across Observability Pillars: Observability consists of three core pillars:
  - Tracing: understanding request flows across distributed systems.
  - Metrics: capturing quantitative system health indicators.
  - Logging: storing event details for debugging and forensic analysis.
  OpenTelemetry provides a unified API for all three, ensuring consistency across different observability tools and vendors. It also includes experimental support for profiling, which can provide insight into application performance bottlenecks.
- Vendor-Agnostic and Open Source: Traditional APM (Application Performance Monitoring) solutions often lock users into proprietary ecosystems. OpenTelemetry, being open-source and vendor-neutral, lets organizations choose their backend (e.g., Prometheus, Jaeger, Zipkin, Datadog) without rewriting instrumentation code.
- Seamless Integration with the Cloud-Native Ecosystem: OpenTelemetry integrates with Kubernetes, Istio, Envoy, AWS X-Ray, Azure Monitor, and other cloud-native services, making it ideal for microservices architectures.
- Automatic and Manual Instrumentation:
  - Auto-instrumentation: many popular frameworks (e.g., Spring Boot, Django, Express.js) support automatic telemetry collection, reducing engineering effort.
  - Manual instrumentation: developers can customize instrumentation when deeper visibility is required.
- Enhanced Debugging and Faster MTTR: With distributed tracing, OpenTelemetry enables developers to identify performance bottlenecks, pinpoint root causes in complex call chains, and reduce MTTR (Mean Time to Resolution) by quickly correlating logs, metrics, and traces.
- Future-Proof and Cloud-Native First: As an open-source project under the CNCF (Cloud Native Computing Foundation), OpenTelemetry is evolving rapidly with strong community support. It is designed for serverless, containerized, and microservices-based architectures.
- Cost Efficiency: Because OpenTelemetry supports sampling, aggregation, and intelligent data collection, it reduces storage and data-ingestion costs compared to traditional full-fidelity logging solutions.
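The cost-efficiency point is concrete in practice: with the Java agent, sampling can be configured purely through the standard OpenTelemetry SDK autoconfiguration environment variables, no code changes required. A sketch (the 10% ratio is an arbitrary choice for illustration):

```properties
# Keep roughly 10% of traces at the root, while honoring the parent's
# sampling decision for downstream services in the same trace.
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1
```

Because the decision is parent-based, a trace sampled at the entry point stays complete across all services rather than being cut mid-flow.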
Instrumentation using OpenTelemetry
For a system to be observable, it must be instrumented: that is, code from the system's components must emit signals such as traces, metrics, and logs.
There are two ways to instrument a system or application using OpenTelemetry:
- Code-based instrumentation: this gives you deeper insight and rich telemetry from the application itself. You use the OpenTelemetry SDKs (available for many programming languages) to instrument your application code.
- Zero-code instrumentation: this instruments your application without modifying its code. It is great for getting started, or when you can't modify the application you need telemetry from. It provides rich telemetry from the libraries you use and/or the environment your application runs in; another way to think of it is that it reports what's happening at the edges of your application. There are several zero-code mechanisms across languages; for Java, the most common is the OpenTelemetry Java agent, and that is what we will use for our services.
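To give a taste of the code-based approach, here is a minimal sketch using the OpenTelemetry API (it assumes the io.opentelemetry:opentelemetry-api dependency; the tracer name and the attribute are made up for illustration). A nice property of the API is that without an SDK or agent configured, the calls are no-ops, so library code can be instrumented safely:

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;

public class CodeBasedExample {
    public static void main(String[] args) {
        // Obtain a tracer; this is a no-op unless an SDK (or the Java agent)
        // has been configured, in which case it produces real spans.
        Tracer tracer = GlobalOpenTelemetry.getTracer("code-based-example");
        Span span = tracer.spanBuilder("process-request").startSpan();
        try {
            span.setAttribute("request.id", "42"); // illustrative attribute
            // ... business logic ...
        } finally {
            span.end(); // always end the span, even when an exception is thrown
        }
    }
}
```

When the Java agent is attached, such manually created spans are picked up and stitched into the same traces as the auto-instrumented ones.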
Instrumenting the services with OpenTelemetry Java agent
Back to our services: we will use the OpenTelemetry Java agent to instrument them.
A Java agent is just a specially crafted JAR file. It uses the Instrumentation API that the JVM provides to alter bytecode as it is loaded into the JVM.
In this case, the OpenTelemetry Java agent adds the necessary instrumentation to the services to collect telemetry data without modifying the application code.
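To make the mechanism concrete, here is a minimal, hypothetical agent sketch (the class name and message are invented for illustration). The JVM calls premain before the application's main method when the JAR is attached with -javaagent and its manifest declares a Premain-Class entry:

```java
import java.lang.instrument.Instrumentation;

public class MinimalAgent {
    // Invoked by the JVM before the application's main() when this JAR is
    // attached via -javaagent and its manifest declares Premain-Class.
    public static void premain(String agentArgs, Instrumentation inst) {
        System.out.println("MinimalAgent loaded with args: " + agentArgs);
        // A real agent, like the OpenTelemetry one, would now call
        // inst.addTransformer(...) to register ClassFileTransformers that
        // rewrite bytecode as classes load.
    }
}
```

The OpenTelemetry agent does exactly this, just at scale: its transformers inject tracing, metrics, and logging hooks into hundreds of known libraries.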
To use the OpenTelemetry Java agent, we first need to download the agent JAR file from the opentelemetry-java-instrumentation releases page. Once downloaded, it can be attached to a service with the -javaagent JVM argument.
We download the latest version of the agent JAR and place it in the root directory of the repo as opentelemetry-javaagent.jar.
Next, we add the -javaagent JVM argument to each service. Since the services will run in Docker containers, we can do that there.
Let's write a single Dockerfile for all the services. We will use a multi-stage build: one stage builds the services, and each service then runs as a separate target with the OpenTelemetry Java agent.
# Builder stage
FROM eclipse-temurin:21-alpine AS builder
WORKDIR /app
COPY . .
# If you have any custom CA certificates, add them here so the services trust them. Ignore if not needed.
# (Note: a "#" after a Dockerfile instruction is not a comment, so this must be on its own line.)
COPY ca.crt /usr/local/share/ca-certificates/all-ca-certs.crt
RUN chmod 644 /usr/local/share/ca-certificates/all-ca-certs.crt && update-ca-certificates
RUN keytool -importcert -trustcacerts -cacerts -file /usr/local/share/ca-certificates/all-ca-certs.crt -alias all-ca-certs -storepass changeit -noprompt
RUN ./gradlew build
# user-service
FROM eclipse-temurin:21-alpine AS user-service
WORKDIR /app
EXPOSE 8080
COPY --from=builder /app/application/user-service/build/libs/*.jar /app.jar
COPY opentelemetry-javaagent.jar ./otel.jar
ENTRYPOINT ["java", "-javaagent:/app/otel.jar", "-jar", "/app.jar"]
# notification-service
FROM eclipse-temurin:21-alpine AS notification-service
WORKDIR /app
EXPOSE 8080
COPY --from=builder /app/application/notification-service/build/libs/*.jar /app.jar
COPY opentelemetry-javaagent.jar ./otel.jar
ENTRYPOINT ["java", "-javaagent:/app/otel.jar", "-jar", "/app.jar"]
# account-service
FROM eclipse-temurin:21-alpine AS account-service
WORKDIR /app
EXPOSE 8080
COPY --from=builder /app/application/account-service/build/libs/*.jar /app.jar
COPY opentelemetry-javaagent.jar ./otel.jar
ENTRYPOINT ["java", "-javaagent:/app/otel.jar", "-jar", "/app.jar"]
# transaction-service
FROM eclipse-temurin:21-alpine AS transaction-service
WORKDIR /app
EXPOSE 8080
COPY --from=builder /app/application/transaction-service/build/libs/*.jar /app.jar
COPY opentelemetry-javaagent.jar ./otel.jar
ENTRYPOINT ["java", "-javaagent:/app/otel.jar", "-jar", "/app.jar"]
# auth-server
FROM eclipse-temurin:21-alpine AS auth-server
WORKDIR /app
EXPOSE 9090
COPY --from=builder /app/application/auth-server/build/libs/*.jar /app.jar
COPY opentelemetry-javaagent.jar ./otel.jar
ENTRYPOINT ["java", "-javaagent:/app/otel.jar", "-jar", "/app.jar"]
NOTE: In the Dockerfile, I am adding a custom CA certificate to the JVM's truststore. This is needed if the services make requests to endpoints with self-signed certificates. Ignore that part if not needed.
Now we can build and run the services with the OpenTelemetry Java agent attached. It's time to run the services and the auth server in Docker containers.
Running the services and the auth server in docker compose
We will run the services and the auth server in Docker containers using Docker Compose, and set up the database alongside them.
Let's create the compose.yaml file in the root directory of the repo.
x-common-env-services: &common-env-services # we will use this anchored extension section to set the common environment variables for all the services.
SPRING_DATASOURCE_URL: jdbc:postgresql://db-postgres:5432/postgres
SPRING_DATASOURCE_USERNAME: postgres
SPRING_DATASOURCE_PASSWORD: password
x-common-services-build: &common-services-build # we will use this anchored extension section to set the common build configurations for all the services.
context: .
dockerfile: Dockerfile
services:
db-postgres:
image: postgres:17
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: password
POSTGRES_DB: postgres
ports:
- "5432:5432"
volumes:
- ./init.sql:/docker-entrypoint-initdb.d/init.sql # we will use this sql file to create the schemas in the database.
deploy:
resources:
limits:
cpus: 1
memory: 1G
profiles:
- db
- services
auth-server:
build:
<<: *common-services-build
target: auth-server
ports:
- "9090:9090"
environment:
<<: *common-env-services
OTEL_SERVICE_NAME: auth-server
OTEL_RESOURCE_ATTRIBUTES: "application=auth-server"
profiles:
- services
user-service:
build:
<<: *common-services-build
target: user-service
environment:
<<: *common-env-services
OTEL_SERVICE_NAME: user-service
OTEL_RESOURCE_ATTRIBUTES: "application=user-service"
depends_on:
- db-postgres
- auth-server
profiles:
- services
account-service:
build:
<<: *common-services-build
target: account-service
environment:
<<: *common-env-services
OTEL_SERVICE_NAME: account-service
OTEL_RESOURCE_ATTRIBUTES: "application=account-service"
depends_on:
- user-service
profiles:
- services
notification-service:
build:
<<: *common-services-build
target: notification-service
environment:
<<: *common-env-services
OTEL_SERVICE_NAME: notification-service
OTEL_RESOURCE_ATTRIBUTES: "application=notification-service"
depends_on:
- user-service
profiles:
- services
transaction-service:
build:
<<: *common-services-build
target: transaction-service
ports:
- "8080:8080"
environment:
<<: *common-env-services
OTEL_SERVICE_NAME: transaction-service
OTEL_RESOURCE_ATTRIBUTES: "application=transaction-service"
depends_on:
- notification-service
- account-service
profiles:
- services
In the compose file, we set up the db-postgres, auth-server, user-service, account-service, notification-service, and transaction-service services.
Since the services have a lot in common, we use YAML anchored extension sections to share the common environment variables and build configuration across them.
We also define profiles for the services: the db profile starts only the db-postgres service, while the services profile starts the application services along with the database. We will make use of these later; for now, you can ignore them.
The db-postgres service runs the PostgreSQL database, and the init.sql file mounted into docker-entrypoint-initdb.d creates the schemas.
Let's create the init.sql file in the root directory of the repo.
create schema if not exists user_data;
create schema if not exists accounts_data;
create schema if not exists notifications_data;
create schema if not exists transactions_data;
Start the services
Now we have a separate schema in the database for each service. We can run the services and the auth server in Docker containers using the compose file.
docker compose --profile "*" up
The services will run in Docker containers, instrumented by the OpenTelemetry Java agent. You will see a lot of errors in the logs, because the agent, although it is running and capturing telemetry data, has nowhere to send it yet.
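If the exporter errors are too noisy in the meantime, they can be silenced per service with the standard OpenTelemetry autoconfiguration variables (a temporary measure; remove these once the collector is in place). For example, in the environment section of a service in compose.yaml:

```yaml
environment:
  # Temporarily disable all exporters until the collector is available.
  OTEL_TRACES_EXPORTER: none
  OTEL_METRICS_EXPORTER: none
  OTEL_LOGS_EXPORTER: none
```

The agent still instruments the application; it simply drops the telemetry instead of retrying a non-existent endpoint.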
In the next part, we will set up the OpenTelemetry Collector to collect the telemetry data from the services and export it to the monitoring backends.
Written by

Driptaroop Das
Self-diagnosed nerd 🤓 and avid board gamer 🎲