I've been looking into methods to do time series aggregations in Prometheus where I, for example, want to work out a quantile over a time range. The latest version of Prometheus provides the xxx_over_time functions (e.g. quantile_over_time) in order to do this. However, these functions don't allow for aggregations across time series. For example there is no way to say "give me an average over this time series by metric name, but combine all the hosts together in a single dataset."
That's annoying since that's typically the data that I find useful from a dashboarding perspective; if a single host is running slow that is not interesting as long as the overall service performance remains within the latency bounds I've outlined.
Since there is nothing in the APIs to do this, I looked into writing a function to do the timeseries aggregations which I could provide a patch for. However, as I was adding additional functionality to this I came to the realization that what I was doing was essentially rewriting the aggregation logic which already exists but only supports instant points.
From what I've been able to tell, getting the aggregation functions to support timeseries is not that hard, although all "instant" readings need to be silently upconverted to time series with a single entry.
I'd like to understand whether there is a philosophical reason this has not been done already; it seems that all of the existing aggregations would apply to time series just as they do to the existing "instant"s. The benefit of doing it is a much more natural and powerful query interface: all of the xxx_over_time functions could be deprecated, and the facilities available to time series become much more powerful.
Rod.
--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-developers+unsub...@googlegroups.com.
To post to this group, send email to prometheus-developers@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/2a161d64-dfcf-4e33-a8a2-99d6e45c684d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Suppose I have a load of gauge metrics across a fleet of 50 hosts, collected every 15 seconds. For example JMX memory usage. I would like to calculate the p90 JMX memory usage across the fleet.
quantile_over_time(0.9, jmx_memory{}[5m]) will return a set of metrics (one per instance). However, there is no way of statistically combining these to get an accurate p90 over the entire dataset.
The first approach I outlined below involves writing a function: combine() such that I can use:
quantile_over_time(0.9, combine(jmx_memory{}[5m])), which combines the multiple time series into a single one over which I can run a timeseries aggregation function.
However, suppose I want to apply the same approach across multiple metrics: for example, host-based cache hit rate metrics over a number of caches, where I want to alarm if the p90 hit rate falls out of bounds for any one of the caches.
e.g. the following metrics:
cache_1_hitrate{instance="blah"}
cache_2_hitrate{instance="blah"}
cache_3_hitrate{instance="blah"}
The above combine function then requires additional logic to know it needs to aggregate metrics with the same name (or other labels) into the same bucket.
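To make the problem concrete, here is a quick numeric sketch (plain Python, with invented host names and sample values; the quantile helper approximates PromQL's linear-interpolation quantile) of why the per-instance results of quantile_over_time cannot be recombined into an accurate fleet-wide p90, and why something like combine() is needed:

```python
# Sketch only (not Prometheus code): per-series p90s cannot be
# recombined into the pooled p90. Hosts and values are invented.
def quantile(values, q):
    """Linear-interpolation quantile, approximating PromQL's."""
    s = sorted(values)
    if len(s) == 1:
        return s[0]
    rank = q * (len(s) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (rank - lo)

# 5m of samples from three hosts (unequal sample counts):
hosts = {
    "host-1": [100, 110, 120],
    "host-2": [500, 510, 520, 530, 540],
    "host-3": [200],
}

per_host_p90 = [quantile(v, 0.9) for v in hosts.values()]
p90_of_p90s = quantile(per_host_p90, 0.9)            # quantile of quantiles
pooled_p90 = quantile(sum(hosts.values(), []), 0.9)  # combine() first, then quantile

print(p90_of_p90s, pooled_p90)  # the two disagree in general
```

The discrepancy grows as the per-host sample counts diverge, which is exactly the situation with a real fleet.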
What I’d really like to write is:
quantile(0.9, {__name__=~".*_hitrate"}[5m]) by (__name__)
However, the aggregation functions sum(), quantile(), topk(), etc. don't support time series [matrix types], despite the fact that the operations they perform make logical sense on them. Indeed, the workaround in Prometheus is to provide a set of duplicate functions to perform these operations (xxx_over_time).
What I'm proposing is to update the parser/engine to support aggregation functions over matrix types.
Rod.
From:
Brian Brazil <brian....@robustperception.io>
Date: Monday, October 3, 2016 at 10:17
To: Rod Chamberlin <rcham...@zulily.com>
Cc: Prometheus Developers <prometheus...@googlegroups.com>
Subject: Re: Aggregations over timeseries
On 3 October 2016 at 18:09, <rcham...@zulily.com> wrote:
I don't understand the calculation you're trying to perform. Can you provide a concrete example?
Brian
On 3 October 2016 at 18:43, Rod Chamberlin <rcham...@zulily.com> wrote:
Suppose I have a load of gauge metrics across a fleet of 50 hosts, collected every 15 seconds. For example JMX memory usage. I would like to calculate the p90 JMX memory usage across the fleet.
quantile_over_time(0.9, jmx_memory{}[5m]) will return a set of metrics (one per instance). However, there is no way of statistically combining these to get an accurate p90 over the entire dataset.
What is exactly the number you are trying to calculate here? I'm having difficulty understanding what you want in a way that would make sense both statistically and operationally.
The first approach I outlined below involves writing a function: combine() such that I can use:
quantile_over_time(0.9, combine(jmx_memory{}[5m])), which combines the multiple time series into a single one over which I can run a timeseries aggregation function.
However, suppose I want to apply the same approach across multiple metrics (for example, host-based cache hit rate metrics over a number of caches, where I want to alarm if the p90 hit rate falls out of bounds for any one of the caches).
That's not how caches work. You care about the overall hit rate, which is sum(rate(hits))/sum(rate(requests)).
On Monday, October 3, 2016 at 10:52:17 AM UTC-7, Brian Brazil wrote:
What is exactly the number you are trying to calculate here? I'm having difficulty understanding what you want in a way that would make sense both statistically and operationally.

A quantile over a combined set of timeseries which represent similar data points, but have different labels.

That's not how caches work. You care about the overall hit rate, which is sum(rate(hits))/sum(rate(requests)).

Whilst in an ideal world you would be correct, we do not always have the opportunity to add the instrumentation to our services that we might desire. I am avoiding going into the specifics of the system I'm instrumenting because I don't feel it will add a great deal to the discussion and will likely send us off on a tangent.

You asked for examples of what I would like to accomplish, and I thought I had provided some: I have collected time series, and I'd like to be able to perform aggregations over them. The need for this has clearly been identified in the past, because the xxx_over_time functions have been provided. However, I'm surprised this isn't supported as a first-class aggregation, because it appears easy to implement at low cost to the framework, yet it considerably increases flexibility in a way the existing aggregation functions cannot.
Consider the following:
Metric{context="c1",instance="node-5"}
313 @1475518428
Metric{context="c1",instance="node-7"}
321 @1475518367.128
Metric{context="c2",instance="node-5"}
96 @1475518353.948
Metric{context="c2",instance="node-7"}
24 @1475518382
23 @1475518427
24 @1475518442
Metric{context="c3",instance="node-5"}
230 @1475518368
229 @1475518428
230 @1475518444
Metric{context="c3",instance="node-6"}
224 @1475518358
222 @1475518433
220 @1475518434
Metric{context="c3",instance="node-8"}
225 @1475518368
225 @1475518369
223 @1475518370
221 @1475518381
219 @1475518428
221 @1475518442
222 @1475518443
Metric{context="c3",instance="node-4"}
265 @1475518418
Metric{context="c4",instance="node-6"}
647 @1475518358
537 @1475518359
714 @1475518400
512 @1475518410
501 @1475518420
552 @1475518433
553 @1475518434
Metric{context="c4",instance="node-5"}
678 @1475518353
565 @1475518353
589 @1475518354
535 @1475518432
576 @1475518440
523 @1475518442
Metric{context="c4",instance="node-7"}
556 @1475518352
509 @1475518353
547 @1475518355
530 @1475518427
523 @1475518433
554 @1475518434
506 @1475518436
I would like to be able to generate aggregations over:
· Everything (what’s the overall p50 for this data?)
· What is the p50 by context aggregated over all instances?
· What is the p50 over all contexts by instance?
· What is the p50 for each distinct label?
At the moment I can only answer the last of these questions:
quantile_over_time(0.5, Metric{}[5m])
There is no way to combine that result, because the input to the quantile_over_time needs to be the full dataset under consideration. For the “p50 aggregated over all instances” I need the input to quantile_over_time to be:
Metric{context="c1"}
313 @1475518428
321 @1475518367.128
Metric{context="c2"}
96 @1475518353.948
24 @1475518382
23 @1475518427
24 @1475518442
Metric{context="c3"}
230 @1475518368
229 @1475518428
230 @1475518444
224 @1475518358
222 @1475518433
220 @1475518434
225 @1475518368
225 @1475518369
223 @1475518370
221 @1475518381
219 @1475518428
221 @1475518442
222 @1475518443
265 @1475518418
Metric{context="c4"}
647 @1475518358
537 @1475518359
714 @1475518400
512 @1475518410
501 @1475518420
552 @1475518433
553 @1475518434
678 @1475518353
565 @1475518353
589 @1475518354
535 @1475518432
576 @1475518440
523 @1475518442
556 @1475518352
509 @1475518353
547 @1475518355
530 @1475518427
523 @1475518433
554 @1475518434
506 @1475518436
but that cannot be generated (Metric{context="c4"}[5m] corresponds to 3 distinct matrixes). One could add a function which takes a set of matrix arguments and returns a single matrix which is the individual matrixes combined (which is what I originally started on). However, if aggregation functions supported timeseries I could instead write:
quantile(0.5, Metric{}[5m]) by (context)
At present quantile(0.5, Metric{}[5m]) will throw an error ("expected type vector in aggregation expression, got matrix") because it does not support timeseries (matrix) arguments.
Fixing that seems like an overall better approach so that you can treat vectors and matrixes equally in aggregate queries.
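As a sketch of the proposed semantics (plain Python, not Prometheus engine code; the series data is taken from the c1/c2 examples above, and a simplified lower-rank quantile is used rather than Prometheus's interpolating one), grouping a matrix by a label subset and pooling the samples per group might look like:

```python
# Hypothetical sketch of quantile(0.5, Metric{}[5m]) by (context):
# pool every sample of every series sharing the grouping labels,
# then take one quantile per group.
from collections import defaultdict

# (labels, samples) pairs standing in for a range-query matrix
matrix = [
    ({"context": "c1", "instance": "node-5"}, [313]),
    ({"context": "c1", "instance": "node-7"}, [321]),
    ({"context": "c2", "instance": "node-5"}, [96]),
    ({"context": "c2", "instance": "node-7"}, [24, 23, 24]),
]

def quantile_by(matrix, q, by):
    groups = defaultdict(list)
    for labels, samples in matrix:
        key = tuple(labels[l] for l in by)
        groups[key].extend(samples)  # combine all series in the group
    # simplified lower-rank quantile per pooled group
    return {key: sorted(vals)[int(q * (len(vals) - 1))]
            for key, vals in groups.items()}

print(quantile_by(matrix, 0.5, ["context"]))
```

The by/without label matching would follow the same rules as the existing vector aggregations; only the input type (matrix instead of instant vector) changes.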
Rod.
From:
<prometheus...@googlegroups.com> on behalf of Brian Brazil <brian....@robustperception.io>
Date: Monday, October 3, 2016 at 11:55
To: Rod Chamberlin <rcham...@zulily.com>
Cc: Prometheus Developers <prometheus...@googlegroups.com>
Subject: Re: Aggregations over timeseries
On 3 October 2016 at 19:32, <rcham...@zulily.com> wrote:
In what fashion do you want to combine the time series? Can you provide a numeric example?
Brian
On 3 October 2016 at 20:43, Rod Chamberlin <rcham...@zulily.com> wrote:
· Everything (what's the overall p50 for this data?)
I don't think that has any statistical use, each series has different properties.
To clarify:
· The data presented represents a single timeslice (or 'matrix' in the language of the internals)
· The goal is to run aggregation functions over these matrixes, combining them into a single larger matrix based around some label aggregation
· Whether all of the aggregations I've suggested make sense is beyond the scope of the discussion (since I'm hacking together data to generate examples, I'm not going to guarantee it all looks sensible)
· The label aggregations should be able to follow a similar pattern to those available for standard aggregation functions (i.e. BY (label-list) or WITHOUT (label-list))
Does this make sense?
On Monday, October 3, 2016 at 12:56:19 PM UTC-7, Brian Brazil wrote:
I don't think that has any statistical use, each series has different properties.

Surely that's for me to judge based on knowledge of my data. I'll admit that, given the random data pulled from an arbitrary source, it does look like a couple of the series don't make sense to combine together, but it's my job to understand my data and know when and where that's applicable.
· What is the p50 by context aggregated over all instances?
What do you mean by "aggregated"?
· What is the p50 over all contexts by instance?
In what time slice?

I'm not seeing your use case. What's your ultimate goal here?

Brian
“It doesn't make sense to me. If you can't convince me that this has a valid use case that makes sense both operationally and statistically, it has extremely little chance of getting added. We don't add features merely because one user asserts it's useful to them, features cost us maintenance and cost our users cognitive load.”
It isn’t clear to me whether you don’t understand:
1/ what I’m trying to do, or
2/ why I’m trying to do it.
Those two are fundamentally different questions, you potentially need to understand both of them in the longer term, but right now I’m focusing on the “What” side of things.
To be honest, I'm not sure what more I can explain about that, but focusing just on the "what", the following attempts to lay out the situation:
· I have a number of distinct metrics (as determined by their labels)
· I can create timeseries (or timeslices or matrixes) from these metrics
· I can perform analysis on these metrics in a 1x1 fashion using the xxx_over_time aggregation functions
· I cannot combine these metrics prior to aggregation, in order to execute the aggregation function (xxx_over_time) on a combined set comprising the entire set (or some grouped subset) of datapoints from a larger group of such timeslices
Given this explanation do you understand what it is I am trying to accomplish (if not the why)?
Rod.
From:
<prometheus...@googlegroups.com> on behalf of Brian Brazil <brian....@robustperception.io>
Date: Monday, October 3, 2016 at 14:33
To: Rod Chamberlin <rcham...@zulily.com>
Cc: Prometheus Developers <prometheus...@googlegroups.com>
Subject: Re: Aggregations over timeseries
On 3 October 2016 at 22:20, <rcham...@zulily.com> wrote:
I have gauges and histograms recording durations. One of the labels is the marathon_task_id (a unique identifier for each instance of a process). I would like to collect statistics about these metrics within a specific time window aggregating the results of a metric across all marathon_task_id (i.e., the aggregate across all instances of the process that produces the metric).
This can be correctly done for the min or max as follows:
query?query=min(min_over_time())&time=<EVAL_TIME>
query?query=max(max_over_time())&time=<EVAL_TIME>
because the minimum of all the minimums across all marathon_task_id instances is the true minimum. Same for the maximum.
This isn't true for stddev, avg and quantiles.
Did anyone ever figure out a solution to this?
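A small numeric illustration of the asymmetry described above (plain Python; task IDs and values are invented): min commutes with grouping, but avg does not once per-task sample counts differ, and the same failure applies to stddev and quantiles:

```python
# Why the two-step trick is exact for min/max but not for avg.
tasks = {
    "task-a": [10.0, 12.0],
    "task-b": [100.0, 100.0, 100.0, 100.0],
}
all_samples = [v for vals in tasks.values() for v in vals]

# min(min_over_time(...)) is exact: the min of per-task mins is the true min.
assert min(min(v) for v in tasks.values()) == min(all_samples)

def mean(xs):
    return sum(xs) / len(xs)

# avg(avg_over_time(...)) is not, once sample counts differ:
avg_of_avgs = mean([mean(v) for v in tasks.values()])  # (11 + 100) / 2 = 55.5
true_avg = mean(all_samples)                           # 422 / 6 ≈ 70.33

print(avg_of_avgs, true_avg)
```

The per-task averages would need to be weighted by their sample counts to recover the true average, and no such reweighting exists for quantiles at all, which is why the pooled-samples approach discussed in this thread is needed.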
Hi all,
=PERCENTILE(B2:B268,0.95).