Observability with OpenTelemetry and Grafana stack Part 6: Load testing observability stack with k6

Driptaroop Das

k6 is a modern load testing tool built for the cloud-native era. It is designed to be developer-friendly, scalable, and open source. k6 is built on top of the Go programming language and uses a JavaScript-like syntax for writing test scripts.

In the previous sections, we set up the observability stack for the microservices: log aggregation, metrics monitoring, and distributed tracing are all in place. Now we will use k6 to generate load on the services and compare their performance with and without the observability Java agent. First, we will generate some load by creating random transactions.

The complete component diagram with k6 would look something like this (a little intimidating, I know). The PlantUML code is available in the repo.

Generating load on the services

Let's create the k6 script to generate some load on the services. The script will create random transactions on the transaction service: it first fetches an access_token from the auth server's token endpoint and then uses that token to call the transaction service.

import http from 'k6/http';
import { sleep, check } from 'k6';
import encoding from 'k6/encoding';

sleep(15); // wait for the services to start

export const options = { // stepped load profile
    stages: [
        { duration: '15s', target: 20 },
        { duration: '45s', target: 10 },
        { duration: '15s', target: 0 },
    ]
}

const accounts = JSON.parse(open("./accounts.json")); // load the static accounts from a file. this file is uploaded in the github repo. link at the end of the article.

function getBearerToken(proto, host, port, clientId, clientSecret, grantType, scope) {
    const url = `${proto}://${host}:${port}/oauth2/token`;
    const credentials = `${clientId}:${clientSecret}`;
    const encodedCredentials = encoding.b64encode(credentials);
    const payload = `grant_type=${grantType}&scope=${scope}`;
    const params = {
        headers: {
            'Content-Type': 'application/x-www-form-urlencoded',
            'Authorization': `Basic ${encodedCredentials}`,
        },
    };
    return http.post(url, payload, params);
}

export function setup() {
    const bearerToken = getBearerToken('http',`${__ENV.AUTH_SERVICE}`, 9090, 'k6', 'k6-secret', 'client_credentials', 'local').json('access_token');
    return {
        token: bearerToken
    }
}

export default function (data) {
    const url = `http://${__ENV.TRANSACTION_SERVICE}:8080/transactions/random`;

    const params = {
        headers: {
            'Authorization': `Bearer ${data.token}`,
        },
    };
    const res = http.post(url, {}, params);
    if (res.status !== 201){
        console.log(`Failed to create transaction: ${JSON.stringify(res)}`);
    }
    check(res, { "status is 201": (r) => r.status === 201 });
}
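
Before wiring the script into the compose file, you can also run it directly against locally running services. A quick sketch, assuming the auth server and transaction service are reachable on localhost and the script is saved as populate.js:

k6 run -e AUTH_SERVICE=localhost -e TRANSACTION_SERVICE=localhost populate.js

The -e flag populates the __ENV variables the script reads for the service hostnames.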

Now, we hook up the k6 script with the compose file. We will add the k6 service to the compose file and set the environment variables to point to the auth server and transaction service.

    k6:
      image: grafana/k6:0.57.0
      volumes:
        - ./k6:/config
      environment:
        K6_WEB_DASHBOARD: true
        K6_WEB_DASHBOARD_EXPORT: /config/k6-report.html
        AUTH_SERVICE: auth-server
        TRANSACTION_SERVICE: transaction-service
      command: >
        run /config/populate.js
      depends_on:
        - transaction-service
      profiles:
        - load-test

Results with observability enabled

Now, we can start the entire stack along with the k6 service.

docker compose --profile "*" up --force-recreate --build

This should start the entire stack along with the k6 service. While the test is running, you can access the k6 web dashboard at http://localhost:5665 (the dashboard's default port) and watch the results live. You can also wait for the test to finish and then open the k6-report.html file in the k6 directory for the detailed test results.
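
Note that the dashboard is served from inside the k6 container, so reaching it from the host needs the port published and the dashboard bound to all interfaces. A minimal sketch of the extra settings on the k6 service shown above, assuming the default dashboard host/port options:

    k6:
      # ... same image, volumes and command as above ...
      environment:
        K6_WEB_DASHBOARD: true
        K6_WEB_DASHBOARD_HOST: 0.0.0.0   # listen on all interfaces inside the container
      ports:
        - "5665:5665"                    # publish the default dashboard port to the host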

[k6 report: test run with the observability agent enabled]

Results without observability enabled

Now, let's disable the observability Java agent and run the test again to see the performance impact. To disable it, we remove the agent from the Dockerfile and drop the OTEL_* environment variables from the x-common-env-services section of the compose file (a sketch of the compose change follows below). We keep the rest of the observability stack running even though we are not using it, so that we can compare the results side by side in an equal environment.
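
For reference, here is a hedged sketch of what the compose-side change could look like. The variable names are illustrative; the actual x-common-env-services block in the repo may contain a different set of OTEL_* entries:

    x-common-env-services: &common-env             # shared env block referenced by each service
      # the OTEL_* variables are removed (or commented out) along with the agent:
      # OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4318
      # OTEL_EXPORTER_OTLP_PROTOCOL: http/protobuf
      SPRING_PROFILES_ACTIVE: local                # illustrative: non-OTEL variables stay as they are

With those changes in place, start the entire stack along with the k6 service again: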

docker compose --profile "*" up --force-recreate --build

Now let's look at the same k6 benchmark without observability enabled.

[k6 report: test run with the observability agent disabled]

Conclusion

If we compare the http_req_duration and http_reqs metrics from the k6 test results with and without the observability stack enabled, we can see that the observability stack impacts the services' performance.

Metric                   | With observability | Without observability
http_req_duration (p90)  | 793ms              | 754ms
http_req_duration (p95)  | 861ms              | 821ms
http_req_duration (p99)  | 1s                 | 1s
http_reqs                | 17.07/s            | 17.44/s

We can see that the OTel Java agent has a slight (~4-5%) impact on the HTTP request duration, which levels off at the higher percentiles (at p99 both runs round to 1s). Request throughput is almost the same with and without the observability stack (~2% difference).

So, is this performance impact significant, and is the observability stack worth it? It depends on the use case. If you are running a production system, an observability stack is a must. Whether the performance impact is significant depends on the production SLA for the services; in most cases, the observability stack is a worthwhile trade-off for the performance cost.
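
If you have a concrete SLA in mind, k6 lets you encode it directly in the test as thresholds, so the run fails when the latency or error budget is blown. A small sketch building on the options block from the script above (the numbers are illustrative, not taken from this benchmark):

export const options = {
    stages: [
        { duration: '15s', target: 20 },
        { duration: '45s', target: 10 },
        { duration: '15s', target: 0 },
    ],
    thresholds: {
        http_req_duration: ['p(95)<900'], // fail the run if the p95 request duration exceeds 900ms
        http_req_failed: ['rate<0.01'],   // fail the run if more than 1% of requests fail
    },
};

With thresholds in place, the comparison above turns from a manual before/after reading of the report into an automatic pass/fail signal.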

Are there any alternatives to the Java agent with a lower performance impact? Yes, there are eBPF-based signal exporters with extremely low overhead, like Grafana Beyla. OpenTelemetry is also working on an eBPF signal exporter, but it is still in the experimental stage.

Future enhancements

  • Add a profiling signal to the observability stack to get CPU and memory profiles of the services.
  • Add an eBPF signal exporter (preferably OTEL, but Beyla is also an option) to the observability stack to reduce the performance impact.
  • Run the entire stack in a Kubernetes cluster to see how the observability stack performs in a production-like environment.
  • Connect the services and OTEL collectors to a cloud-based observability platform like Grafana Cloud.

References

  • OpenTelemetry: https://opentelemetry.io/
  • The code repo: https://github.com/driptaroop/local-observability-opentelemetry-grafana