Java Prometheus Exporter for Traffic Metrics

Sidath Weerasinghe

Oct 17, 2023, 7:31:32 AM

to Prometheus Developers

Hi Team,

I have written a custom Java Prometheus exporter to export API traffic details such as API name, version, consumer name, response latency, status code, and a lot of other metadata. For this, I have used counters and histograms. Under heavy production traffic, I'm getting an OOM on the client side because of the huge size of the histogram objects.
Prometheus scrapes the data from the client every 3 seconds.

Do you have any other solution for this?

Thank you

Fabian Stäber

Oct 17, 2023, 7:46:52 AM

to Sidath Weerasinghe, Prometheus Developers

Hi Sidath,

Histograms have a limited number of buckets, so they should not grow indefinitely.

The reason for your OOM might be a "cardinality explosion": maybe you keep generating new label values, and each distinct set of label values adds a new histogram.
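As a rough illustration of how quickly this grows, the number of histogram children is the product of the distinct values of each label, and each child exposes one series per bucket plus `+Inf`, `_sum`, and `_count`. The sketch below is plain Java with no Prometheus dependency; the class name, the label counts, and the bucket count are all made-up assumptions, not figures from this thread.

```java
// Hypothetical back-of-the-envelope estimate; not part of any Prometheus library.
class CardinalityEstimate {

    // Number of histogram children = product of distinct values per label.
    static long childCount(long[] distinctValuesPerLabel) {
        long children = 1;
        for (long n : distinctValuesPerLabel) {
            children = Math.multiplyExact(children, n); // throws on overflow
        }
        return children;
    }

    // Each child exposes its explicit buckets, the +Inf bucket, _sum and _count.
    static long seriesCount(long children, int explicitBuckets) {
        long perChild = explicitBuckets + 1L + 2L;
        return Math.multiplyExact(children, perChild);
    }

    public static void main(String[] args) {
        // Assumed label cardinalities: API name, version, consumer name, status code.
        long[] labels = {50, 3, 2000, 10};
        long children = childCount(labels);
        System.out.println("histogram children: " + children);
        System.out.println("time series:        " + seriesCount(children, 14));
    }
}
```

With labels whose values are effectively unbounded (a latency value used as a label, for example), the product has no upper limit at all, which would match memory that keeps growing under live traffic.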

If this is not the case, and you see increasing memory usage with a fixed set of histograms, please open an issue on https://github.com/prometheus/client_java, ideally with a way to reproduce this.

Fabian

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-devel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/76bc4885-181c-46eb-9a9c-03c466607f21n%40googlegroups.com.

Sidath Weerasinghe

Oct 18, 2023, 10:47:39 PM

to Prometheus Developers

Hi Fabian and team,

I'm using the API name, version, consumer name, response latency, status code, and a lot of other metadata as histogram labels. With live traffic, those labels take many different values, so a lot of histogram child objects are created.

That is what causes the JVM to run out of memory.

I would like to know the best and recommended way to export those values.

Matthias Rampke

Oct 20, 2023, 5:46:15 AM

to Sidath Weerasinghe, Prometheus Users

Hi,

I think this discussion is better suited for the -users mailing list, so I'm moving it there.

Metric systems like Prometheus offer you a specific tradeoff: they allow you to count a large number of events by a limited set of dimensions. Fundamentally, for each combination of dimension values, they track a number, and incrementing that number is very cheap. This breaks down if you have too many dimensions, because you end up with a huge amount of numbers that each change only infrequently. In practice, this means metric systems are not suited for tracking all the metadata of an API request, and you will need to remove any dimension that varies a lot from your labels. See this document for some recommendations: https://prometheus.io/docs/practices/instrumentation/#do-not-overuse-labels

To give a concrete example, metrics are not well suited to tracking API latency "by customer" (I assume this is what you mean by consumer name?). You can use Prometheus to track overall latency and break it down by a few low-cardinality dimensions such as the status code. For in-depth breakdowns, record the events (logs) in a separate system that is designed for this, such as a logging system. These have other tradeoffs, notably that querying them is a lot slower and more expensive; typically you would use metrics to tell you that there is a problem, and logs to tell you what the problem is once you start narrowing it down. I hope understanding these tradeoffs will help you design a viable observability stack for your requirements :)
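A minimal sketch of that split, in plain Java rather than the real client library: latency is counted in a fixed set of buckets keyed only by status code, while the high-cardinality details (API name, consumer name) go to an event log. The class, the bucket bounds, and the in-memory list standing in for a logging system are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration; not the Prometheus client_java API.
class LowCardinalityLatency {
    // Assumed bucket upper bounds in seconds; index BOUNDS.length is the overflow bucket.
    static final double[] BOUNDS = {0.1, 0.5, 1.0, 5.0};

    // One fixed-size counter array per status code: memory is bounded by
    // (number of status codes) x (number of buckets).
    private final Map<String, LongAdder[]> byStatus = new ConcurrentHashMap<>();

    // Stand-in for a real logging system that receives the per-request details.
    private final List<String> eventLog = Collections.synchronizedList(new ArrayList<>());

    void observe(String api, String consumer, String status, double seconds) {
        LongAdder[] buckets = byStatus.computeIfAbsent(status, s -> newBuckets());
        buckets[bucketIndex(seconds)].increment();
        // High-cardinality fields are logged, never used as labels.
        eventLog.add("api=" + api + " consumer=" + consumer
                + " status=" + status + " seconds=" + seconds);
    }

    // Index of the first bucket whose bound is >= the observation.
    static int bucketIndex(double seconds) {
        for (int i = 0; i < BOUNDS.length; i++) {
            if (seconds <= BOUNDS[i]) return i;
        }
        return BOUNDS.length;
    }

    private static LongAdder[] newBuckets() {
        LongAdder[] buckets = new LongAdder[BOUNDS.length + 1];
        for (int i = 0; i < buckets.length; i++) buckets[i] = new LongAdder();
        return buckets;
    }

    // Non-cumulative count in one bucket (Prometheus exposes cumulative counts).
    long count(String status, int bucket) {
        LongAdder[] buckets = byStatus.get(status);
        return buckets == null ? 0 : buckets[bucket].sum();
    }

    int seriesGroups() { return byStatus.size(); }

    int loggedEvents() { return eventLog.size(); }

    public static void main(String[] args) {
        LowCardinalityLatency metrics = new LowCardinalityLatency();
        metrics.observe("orders", "alice", "200", 0.05);
        metrics.observe("orders", "bob", "200", 0.7);
        metrics.observe("orders", "carol", "500", 9.0);
        System.out.println("histograms kept: " + metrics.seriesGroups());
        System.out.println("events logged:   " + metrics.loggedEvents());
    }
}
```

The number of counter arrays depends only on how many status codes occur, not on how many consumers send requests, so metric memory stays flat while the log grows with traffic.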

/Matthias
