Collecting Java Runtime, System- and Process-Metrics using OpenTelemetry Java Auto Instrumentation
With the release of version 1.0.0, the OpenTelemetry Java Auto Instrumentation now supports collecting system and process metrics. In this blog post we show how to enable the collection of those metrics and how to expose them via a Prometheus endpoint.
Introduction
OpenTelemetry is an open-source observability framework that aims to define standards for collecting distributed traces, metrics and logs, and to provide instrumentations with extensive support for common technologies [https://opentelemetry.io/docs/java/automatic_instrumentation/] that collect them automatically. It resulted from the merger of OpenTracing and OpenCensus and is part of the Cloud Native Computing Foundation. Various cloud and APM providers already offer support for and integration of OpenTelemetry [https://www.dynatrace.com/integrations/opentelemetry/, https://www.dynatrace.com/monitoring/integrations/opentelemetry/].
At our virtual 23rd Meetup last November, we demoed how a Java application can be instrumented with very little effort using the OpenTelemetry Java Auto Instrumentation to collect distributed traces. The demo was based on Willie Wheeler’s article and corresponding demo “Auto-Instrumentation with OpenTelemetry” [https://medium.com/wwblog/auto-instrumentation-with-opentelemetry-3b096fdd068f]. We demonstrated how to collect traces, but did not show how to collect system and process metrics at the time. Recent releases of the OpenTelemetry Java Auto Instrumentation introduced this capability. Here we show how to enable the collection of process and system metrics and how to expose them via a Prometheus endpoint.
A disclaimer: OpenTelemetry is still in relatively early, active development and there are issues to be resolved. In the case of Java process metrics, the consumed memory metric and the consumed CPU time metric are broken [https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/2231]. This article refers to v1.0.0 of the OpenTelemetry Java Auto Instrumentation.
Auto Instrumentation Metrics
As of release 1.0.0, OpenTelemetry’s Java Auto Instrumentation supports collecting three types of metrics: Java runtime metrics, system metrics and process metrics.
Java Runtime Metrics
These metrics provide information on Java garbage collection and memory pools. For garbage collection, the overall time spent collecting is observed, while the memory metrics observe the used, committed, and maximum size of the memory pools of a Java runtime.
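To give a feel for where such values come from, the following minimal sketch reads comparable figures directly from the JDK's java.lang.management API. It is an illustration only: the class and method names (RuntimeMetricsSketch, totalGcTimeMillis, describePool) are made up for this post, and the metric names and output format produced by the agent differ.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical illustration class, not part of the OpenTelemetry agent.
public class RuntimeMetricsSketch {

    // Sums the accumulated collection time (ms) over all garbage collectors.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Formats used, committed and max bytes of one memory pool (max is -1 if undefined).
    static String describePool(MemoryPoolMXBean pool) {
        MemoryUsage usage = pool.getUsage();
        if (usage == null) { // the pool is no longer valid
            return pool.getName() + ": n/a";
        }
        return pool.getName()
                + ": used=" + usage.getUsed()
                + " committed=" + usage.getCommitted()
                + " max=" + usage.getMax();
    }

    public static void main(String[] args) {
        System.out.println("GC time (ms): " + totalGcTimeMillis());
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.println(describePool(pool));
        }
    }
}
```

Running the class prints one line per memory pool. The pool names depend on the JVM and the garbage collector in use.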
System and Process Metrics
System metrics encompass statistics on memory and network I/O. The process metrics cover the memory and CPU usage of the Java process.
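As a rough illustration of the two kinds of process values involved, the sketch below reads a point-in-time value (heap in use) and a cumulative value (CPU time consumed so far) using only JDK calls. This is a stand-in: the agent obtains its figures from OSHI, heap usage is not the same as total process memory, and the class and method names are hypothetical.

```java
import java.time.Duration;

// Hypothetical illustration class; the agent itself uses OSHI rather than these JDK calls.
public class ProcessSnapshotSketch {

    // Heap currently in use by this JVM, in bytes (a point-in-time value).
    static long usedHeapBytes() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    // CPU time consumed by this process so far (a cumulative value);
    // Duration.ZERO if the platform does not report it.
    static Duration cpuTimeSoFar() {
        return ProcessHandle.current().info().totalCpuDuration().orElse(Duration.ZERO);
    }

    public static void main(String[] args) {
        System.out.println("used heap (bytes): " + usedHeapBytes());
        System.out.println("CPU time so far:   " + cpuTimeSoFar());
    }
}
```

The distinction between the two kinds of value matters later on, because a snapshot and a running total have to be reported differently.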
Setting up OpenTelemetry instrumentation
OpenTelemetry’s Java Auto Instrumentation is implemented as a Java agent, which is available from the GitHub repository [https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases]. The agent is attached to a Java process using the -javaagent flag, e.g. java -javaagent:path/to/opentelemetry-javaagent-all.jar -jar myapp.jar (the paths are placeholders).
This agent automatically starts recording traces in your Java process and can be configured via Java system properties (-Dkey=value parameters) to export them, e.g. to Jaeger. For example, the exporter can be pointed at a Jaeger collector listening on port 14250 of host “jaeger” (http://jaeger:14250), with the traces reported under the application name “myapp”.
To “export” metrics to Prometheus, a Prometheus exporter must be configured.
While an exporter in OpenTelemetry normally sends data to the configured endpoint, the Prometheus exporter is an exception and inverts the flow of communication: it exposes a Prometheus endpoint on the configured port and (locally available) host, which can then be scraped by a Prometheus server.
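The inverted, pull-based flow can be sketched with the HTTP server that ships with the JDK. This is not the OpenTelemetry Prometheus exporter, only a minimal stand-in showing that the application merely serves a text document and waits to be scraped. The class name and the metric name demo_events_total are hypothetical, and port 9464 is simply the port used in this article's endpoint URLs.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration class, not the OpenTelemetry Prometheus exporter.
public class PullEndpointSketch {

    // Made-up counter that the application would increment while it runs.
    static final AtomicLong EVENTS = new AtomicLong();

    // Renders the current value in the Prometheus text exposition format.
    static String render() {
        return "# TYPE demo_events_total counter\n"
                + "demo_events_total " + EVENTS.get() + "\n";
    }

    // Starts an HTTP server that answers scrapes on /metrics; nothing is pushed anywhere.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = render().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        EVENTS.incrementAndGet();
        start(9464);
        System.out.println("Serving metrics on http://localhost:9464/metrics");
    }
}
```

A Prometheus server, or simply a browser, can then fetch the current values from /metrics whenever it chooses.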
Traces are collected by the OpenTelemetry Java Auto Instrumentation merely by configuring it as described above, and some debugging metrics are already computed and provided via the configured Prometheus endpoint without any further configuration. To start collecting and exporting Java runtime, system and process metrics, however, a few more steps are necessary.
Configuring metrics collection
Activating Java Runtime Metrics
All that is necessary to start collecting basic Java runtime metrics is to pass the system property otel.instrumentation.runtime-metrics.enabled=true, i.e. -Dotel.instrumentation.runtime-metrics.enabled=true on the command line.
On the host where the application is executed, navigating to the Prometheus endpoint at http://localhost:9464/metrics then returns the garbage collection and memory pool metrics.
Activating System and Process Metrics
The collection of system and process metrics in OpenTelemetry’s Java Auto Instrumentation relies on the OSHI library [Operating System and Hardware Information, https://github.com/oshi/oshi]. In theory, the metrics collection ought to be activated if OSHI is found on the classpath [https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/f8dd8c8f561240392ce3f4b17dd9caeecaa0499b/javaagent-bootstrap/src/main/java/io/opentelemetry/javaagent/OpenTelemetryAgent.java#L71].
Adding the OSHI library to the classpath alone, however, did not work in our tests, and documentation on the feature is pending [https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/1566]. It is still possible to collect these metrics through a workaround with additional configuration, although it requires modifying the application. Two dependencies are required:
opentelemetry-oshi is the part of the OpenTelemetry Java agent that contains the implementation of the process and system metrics collection, while oshi-core adds OSHI itself. To register the metrics observers, call SystemMetrics.registerObservers() and ProcessMetrics.registerObservers() once, ideally at startup, e.g. during the startup of a dummy Spring Boot application.
As with the Java runtime metrics, the system and process metrics can be scraped from the configured Prometheus endpoint, e.g. http://localhost:9464/metrics.
As mentioned in the beginning, runtime_java_cpu_time, which ought to report the CPU time spent in seconds, is broken: it is tracked as a series of snapshots of a value in time, while the underlying value reported by OSHI is the cumulative sum of CPU time consumed by the process. The unit conversion is also incorrect, since the underlying millisecond value is multiplied by a thousand, whereas it ought to be divided. Conversely, runtime_java_memory, which ought to track memory consumption over time, is tracked as a continually rising sum instead of a series of snapshots of the amount of consumed memory [https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/2231].
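The two problems with the CPU time metric can be illustrated with a few lines of plain Java. The sketch below is not the agent's code: the class, the method names and the sample readings are made up. It contrasts the correct millisecond-to-second conversion with the faulty one, and shows how a cumulative reading breaks down into per-interval increments when it is treated as a running sum.

```java
import java.util.Arrays;

// Hypothetical illustration class; names and sample values are made up for this post.
public class CpuTimeSketch {

    // Correct conversion: milliseconds divided by 1000 give seconds.
    static double millisToSeconds(long millis) {
        return millis / 1000.0;
    }

    // The faulty conversion described above: multiplying instead of dividing.
    static double faultyMillisToSeconds(long millis) {
        return millis * 1000.0;
    }

    // Turns cumulative readings into per-interval increments, which is what a
    // backend can derive when the value is reported as a monotonic sum.
    static long[] increments(long[] cumulative) {
        long[] result = new long[cumulative.length];
        long previous = 0;
        for (int i = 0; i < cumulative.length; i++) {
            result[i] = cumulative[i] - previous;
            previous = cumulative[i];
        }
        return result;
    }

    public static void main(String[] args) {
        long[] cumulativeCpuMillis = {1500, 2300, 4000}; // made-up cumulative readings
        System.out.println("correct seconds: " + millisToSeconds(cumulativeCpuMillis[0]));
        System.out.println("faulty seconds:  " + faultyMillisToSeconds(cumulativeCpuMillis[0]));
        System.out.println("increments (ms): " + Arrays.toString(increments(cumulativeCpuMillis)));
    }
}
```

Memory consumption is the opposite case: each reading is already a snapshot, so summing the readings produces a value that rises continually without describing anything meaningful.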
Conclusion
We have shown that collecting distributed traces and Java runtime metrics using the OpenTelemetry Java Auto Instrumentation is possible with minimal setup and requires no code changes. Basic system and process metrics are available as well, but require a few more steps to collect, including minor code changes. This combination of tracing and metrics already provides fundamental insights into your application’s performance characteristics.