Hi Team,
We have a situation where each Prometheus holds 8 to 15 million head series, and we run 7 such instances (federated). Our Prometheus instances are constantly flooded handling the incoming metrics and back-end recording rules.
One thought that came to mind: is there something similar to a log level for Prometheus metrics? If so, we could benefit from it by configuring all targets to run at error level in production and at debug/info level in development. This would help control the flood of metrics.
If we write a wrapper on top of the Prometheus Java client API, it is going to be messy, so I wanted to check whether this request makes sense or whether there is another way out.
Let me know your thoughts on how this can be achieved. I would really like to hear from others how this sort of situation is handled and how to tackle it.
FYR: we have raised the same issue with the Prometheus Java client project: https://github.com/prometheus/client_java/issues/815
Many Thanks
Muthuveerappan
--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/d3a1bb24-2d87-48c0-8b01-9f91a71dff7bn%40googlegroups.com.
On 07/10/2022 04:09, Muthuveerappan Periyakaruppan wrote:
> We have a situation where each Prometheus holds 8 to 15 million head
> series, and we run 7 such instances (federated). Our Prometheus
> instances are constantly flooded handling the incoming metrics and
> back-end recording rules.
8-15 million time series on a single Prometheus instance is pretty high.
What spec of machine/pod are these?
When you say "flooded", what do you mean?
> One thought that came to mind: is there something similar to a log
> level for Prometheus metrics? If so, we could benefit from it by
> configuring all targets to run at error level in production and at
> debug/info level in development. This would help control the flood
> of metrics.
>
I'm not sure I understand what you are suggesting. What would be the
difference between these hypothetical "error" and "debug" levels? Do
you mean some metrics would only be exposed in some environments?
--
Stuart Clark
Please find replies inline.
On Friday, 7 October, 2022 at 1:25:27 pm UTC+5:30 Stuart Clark wrote:
> On 07/10/2022 04:09, Muthuveerappan Periyakaruppan wrote:
>> We have a situation where each Prometheus holds 8 to 15 million head
>> series, and we run 7 such instances (federated). Our Prometheus
>> instances are constantly flooded handling the incoming metrics and
>> back-end recording rules.
> 8-15 million time series on a single Prometheus instance is pretty high.
> What spec of machine/pod are these?
90 GB RAM, 5000 millicores.
> When you say "flooded", what do you mean?
RAM usage is always high. There is no OOM, but we see missing metrics and an average scrape duration of around 35 seconds (possibly due to the number of targets/metrics). CPU demand/usage is not that high.
>> One thought that came to mind: is there something similar to a log
>> level for Prometheus metrics? If so, we could benefit from it by
>> configuring all targets to run at error level in production and at
>> debug/info level in development. This would help control the flood
>> of metrics.
> I'm not sure I understand what you are suggesting. What would be the
> difference between these hypothetical "error" and "debug" levels? Do
> you mean some metrics would only be exposed in some environments?
Let's say every pod has close to 100 metrics; we may not need all of them in production.
Before adding a metric, a developer can assess how useful it will be in production and which indicators it covers: Utilization, Saturation, and Errors (USE) or Rate, Errors, and Duration (RED). Based on this, they can choose the metric level.
Based on the metric level, only a few metrics (ERROR/SEVERE level) would be enabled in production, and the rest (INFO/DEBUG level) would be enabled in development, testing, and staging environments. A few metrics should be enough to troubleshoot, and on demand we should have the option to change the metric level at runtime, like a log level, to get more metrics.
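To make the proposal concrete, here is a minimal sketch of what a "metric level" could look like on the application side. It is plain Java with no Prometheus dependency, and everything in it (the LeveledMetrics class, the Level enum, register, inc, setThreshold, expose) is a hypothetical name chosen for illustration; none of it is part of client_java. The output format is also simplified to "name value" lines.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: counters tagged with a log-style level. Not a client_java API. */
final class LeveledMetrics {
    /** Ordered from most to least important, like log levels. */
    enum Level { ERROR, INFO, DEBUG }

    private static final class Entry {
        final Level level;
        final LongAdder value = new LongAdder();
        Entry(Level level) { this.level = level; }
    }

    private final Map<String, Entry> counters = new ConcurrentHashMap<>();
    private volatile Level threshold;

    LeveledMetrics(Level threshold) { this.threshold = threshold; }

    /** Change the active level at runtime, similar to changing a log level. */
    void setThreshold(Level threshold) { this.threshold = threshold; }

    /** Register a counter with the level its author assigned to it. */
    void register(String name, Level level) {
        counters.putIfAbsent(name, new Entry(level));
    }

    /** Always count, so values are already populated when a level is enabled later. */
    void inc(String name) {
        Entry e = counters.get(name);
        if (e != null) e.value.increment();
    }

    /** Return only the counters whose level is enabled by the current threshold. */
    List<String> expose() {
        List<String> lines = new ArrayList<>();
        counters.forEach((name, e) -> {
            if (e.level.compareTo(threshold) <= 0) {
                lines.add(name + " " + e.value.sum());
            }
        });
        Collections.sort(lines); // stable order for readability
        return lines;
    }
}
```

With the threshold set to ERROR (production), only ERROR-level counters would be returned by expose(); switching to DEBUG at runtime would return all of them. The sketch gates exposure rather than counting, on the assumption that incrementing is cheap and the cost to control is the number of series scraped.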