Hi,
I think this discussion is better suited for the -users mailing list, so I'm moving it there.
Metric systems, like Prometheus, offer you a specific tradeoff: they allow you to count a large number of events by a limited set of dimensions. Fundamentally, they track one number for each combination of dimension values, and incrementing that number is very cheap. This breaks down if you have too many dimensions, or dimensions with too many distinct values, because you end up with a huge number of numbers that each change only infrequently. In practice, this means metric systems are not suited for tracking all the metadata of each API request, and you will need to remove any dimension that varies a lot from your labels. See this document for some recommendations:
https://prometheus.io/docs/practices/instrumentation/#do-not-overuse-labels
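To make the tradeoff concrete, here is a toy model in Java of what a labeled counter does conceptually. LabeledCounterModel is hypothetical and is not how the Prometheus Java client is implemented; it only illustrates that each distinct combination of label values becomes its own number that stays in memory.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical toy model, NOT the Prometheus Java client's implementation:
// a labeled counter keeps one number per distinct combination of label values.
class LabeledCounterModel {
    private final Map<List<String>, LongAdder> children = new ConcurrentHashMap<>();

    // Incrementing an existing combination is cheap; a new combination
    // allocates a new child that stays in memory from then on.
    void inc(String... labelValues) {
        children.computeIfAbsent(List.of(labelValues), k -> new LongAdder()).increment();
    }

    // Current count for one label combination (0 if never seen).
    long get(String... labelValues) {
        LongAdder count = children.get(List.of(labelValues));
        return count == null ? 0 : count.sum();
    }

    // Number of children, i.e. distinct label combinations seen so far.
    int childCount() {
        return children.size();
    }

    public static void main(String[] args) {
        LabeledCounterModel byStatus = new LabeledCounterModel();
        LabeledCounterModel byConsumerAndStatus = new LabeledCounterModel();
        for (int i = 0; i < 100_000; i++) {
            String status = (i % 10 == 0) ? "500" : "200";
            // Assumed worst case: a label value that is unique per request.
            String consumer = "consumer-" + i;
            byStatus.inc(status);
            byConsumerAndStatus.inc(consumer, status);
        }
        System.out.println(byStatus.childCount());            // prints 2
        System.out.println(byConsumerAndStatus.childCount()); // prints 100000
    }
}
```

Counting by status alone keeps two children no matter how much traffic arrives, while adding a label whose value changes per request (an assumed worst case) creates one child per request.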
To give a concrete example, metrics are not well suited to track API latency "by customer" (I assume this is what you mean by consumer name?). You can use Prometheus to track overall latency, and break it down by a few low-cardinality dimensions such as the status code. For in-depth breakdowns, record the events (logs) in a separate system that is designed for this, such as a logging system. These have other tradeoffs, notably that querying them is a lot slower and more expensive; typically you would use metrics to tell you that there is a problem, and logs to tell you what the problem is once you start narrowing it down. I hope understanding these tradeoffs will help you design a viable observability stack for your requirements :)
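A minimal sketch of that split, assuming a plain-Java setup: latency goes into fixed buckets keyed only by status class ("2xx", "5xx", ...), while the high-cardinality fields (API name, version, consumer) go into a log line via java.util.logging. RequestRecorder, the bucket bounds, and the log format are all illustrative assumptions; a real exporter would use the Prometheus client library's histogram with only the low-cardinality label rather than this hand-rolled bucket array.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

// Hypothetical sketch: class name, bucket bounds and log format are assumptions.
class RequestRecorder {
    private static final Logger LOG = Logger.getLogger("api.access");

    // Assumed bucket upper bounds in seconds; the last slot is the implicit +Inf bucket.
    static final double[] BOUNDS = {0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0};

    // Keyed only by status class, so there are at most five children ("1xx".."5xx").
    private final Map<String, long[]> latencyByStatusClass = new ConcurrentHashMap<>();

    static String statusClass(int status) {
        return (status / 100) + "xx";
    }

    // Index of the first bucket whose upper bound is >= the observed latency.
    // Counts are kept per bucket, not cumulatively like Prometheus "le" buckets.
    static int bucketIndex(double seconds) {
        for (int i = 0; i < BOUNDS.length; i++) {
            if (seconds <= BOUNDS[i]) {
                return i;
            }
        }
        return BOUNDS.length; // +Inf
    }

    void record(String api, String version, String consumer, int status, double seconds) {
        // Low-cardinality metric: only the status class acts as a "label".
        long[] buckets = latencyByStatusClass.computeIfAbsent(
                statusClass(status), k -> new long[BOUNDS.length + 1]);
        synchronized (buckets) {
            buckets[bucketIndex(seconds)]++;
        }
        // High-cardinality details go to the log, to be queried in a logging system.
        LOG.info(String.format(Locale.ROOT,
                "api=%s version=%s consumer=%s status=%d latency_s=%.3f",
                api, version, consumer, status, seconds));
    }

    int childCount() {
        return latencyByStatusClass.size();
    }

    // Copy of the bucket counts for one status class (all zeros if unseen).
    long[] snapshot(String statusClass) {
        long[] buckets = latencyByStatusClass.get(statusClass);
        if (buckets == null) {
            return new long[BOUNDS.length + 1];
        }
        synchronized (buckets) {
            return Arrays.copyOf(buckets, buckets.length);
        }
    }

    public static void main(String[] args) {
        RequestRecorder recorder = new RequestRecorder();
        recorder.record("orders", "v1", "consumer-a", 200, 0.07);
        recorder.record("orders", "v1", "consumer-b", 201, 0.30);
        recorder.record("users", "v2", "consumer-c", 503, 1.20);
        System.out.println(recorder.childCount()); // prints 2
        System.out.println(Arrays.toString(recorder.snapshot("2xx"))); // prints [0, 1, 0, 1, 0, 0, 0, 0]
    }
}
```

However many consumers or API versions show up, the metric side stays at a handful of children; the per-consumer breakdown is recovered later by querying the logs.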
/Matthias
Hi Fabian and team,
I'm using the API name, version, consumer name, response latency, status code, and lots of other metadata as histogram labels. With live traffic, those labels take many different values, so a lot of histogram child objects are created.
That is what causes the JVM to run out of memory (OOM).
I would like to know what the best, recommended way to export those values is.
On Tuesday, 17 October 2023 at 17:16:52 UTC+5:30 Fabian Stäber wrote:
Hi Sidath,
Histograms have a limited number of buckets, so they should not grow indefinitely.
The reason for your OOM might be "cardinality explosion": Maybe you generate more and more different label values, each set of label values adds a new histogram.
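As a rough illustration of how fast this grows, here is a back-of-the-envelope estimate. The label counts below are made up, and the formula assumes a classic histogram, which exports one _bucket series per configured upper bound plus one for le="+Inf", and one _sum and one _count series per child. It is an upper bound, since only combinations that actually occur create children. A label holding an effectively continuous value, such as response latency, has practically unbounded distinct values, so the product has no useful limit at all.

```java
// Hypothetical back-of-the-envelope estimate; all label counts are assumptions.
class HistogramCardinality {
    /**
     * Upper bound on the number of time series one classic histogram exports:
     * every label-value combination gets its own child, and each child exposes
     * one _bucket series per configured upper bound, one for le="+Inf",
     * plus _sum and _count.
     */
    static long seriesUpperBound(int bucketBounds, long... distinctValuesPerLabel) {
        long children = 1;
        for (long n : distinctValuesPerLabel) {
            children = Math.multiplyExact(children, n); // throws on overflow
        }
        return Math.multiplyExact(children, bucketBounds + 3L);
    }

    public static void main(String[] args) {
        // Assumed: 10 bucket bounds; 50 APIs, 3 versions, 1000 consumers, 5 status codes.
        System.out.println(seriesUpperBound(10, 50, 3, 1000, 5)); // prints 9750000
        // The same histogram labelled only by status code.
        System.out.println(seriesUpperBound(10, 5)); // prints 65
    }
}
```

Dropping the API, version and consumer labels shrinks the estimate from millions of series to a few dozen, which is the difference between an exporter that fits in memory and one that does not.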
Fabian
Hi Team,
I have written a custom Java Prometheus exporter to export API traffic details such as API name, version, consumer name, response latency, status code, and lots of other metadata. For this, I have used counters and histograms. With the heavy traffic in production, I'm getting an OOM on the client side because of the large number of histogram objects.
Prometheus scrapes the client every 3 seconds.
Do you have any other solution for this?
Thank you
--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/2e5aff68-3796-467f-82ae-2d5f109a889fn%40googlegroups.com.