Visualising and working with histograms

james.he...@gmail.com

Mar 9, 2018, 3:32:16 PM
to Prometheus Users
Hi all,

Looking for some guidance on working with Prometheus when bucketing data.

There doesn't seem to be much 'out there' on the visualisation of Prometheus histograms, so I thought starting a thread here might be useful to the community.

Essentially I'm trying to put the duration of an operation into predefined buckets and subsequently display the count of 'occurrences of that duration' as a histogram bar chart in Grafana.

As a starter for 10, this is the histogram PHP code (alright, alright, don't judge ha) that we have used to instrument the code. Although it's PHP, it'll hopefully be agnostic enough for people to apply in their own languages.

use Prometheus\CollectorRegistry;

// InMemory storage only lives as long as the current PHP process.
$adapter   = new \Prometheus\Storage\InMemory();
$registry  = new CollectorRegistry($adapter);

$histogram = $registry->getOrRegisterHistogram(
    'some_namespace',
    'my_operation_duration_seconds',
    'Some super useful piece of help regarding the metric',
    ['some_label'],
    [30, 60, 90, 120, 150, 180, 210, 240, 270, 300]
);

// $intDuration is the length of the operation in seconds, normally between 0 and 300;
// anything longer is fine to land in the +Inf bucket.
$histogram->observe($intDuration, ['some_label_value']);


So if we had two operations (in the last 5 minutes), the first taking 35 seconds and the second 47 seconds, we'd hope to see a histogram bar chart where a bar with a Y value of 2 sits above an X value of 60. This is based upon the 'le' of a histogram meaning 'less than or equal to', as described here:

https://www.robustperception.io/why-are-prometheus-histograms-cumulative/

So any ideas/thoughts welcome. Even if it's someone saying it can't be done, ha, at least I'll know!

Thanks in advance
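
To make the expectation above concrete, here is a minimal Python sketch (not the client library's code) of how two observations of 35 and 47 seconds would land in cumulative `le` buckets. The bucket bounds are copied from the PHP snippet; the helper name `cumulative_buckets` is made up for illustration.

```python
import math

# Upper bounds copied from the PHP snippet; client libraries add +Inf implicitly.
BOUNDS = [30, 60, 90, 120, 150, 180, 210, 240, 270, 300, math.inf]


def cumulative_buckets(durations):
    """Return {le: count}, where each count covers every observation <= le."""
    return {le: sum(1 for d in durations if d <= le) for le in BOUNDS}


buckets = cumulative_buckets([35, 47])
for le, count in buckets.items():
    print(f'le="{le}" -> {count}')
```

Because the buckets are cumulative, the value 2 shows up not only at `le="60"` but in every larger bucket as well, which is why the raw series do not look like a classical bar chart on their own.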


Björn Rabenstein

Mar 12, 2018, 11:46:11 AM
to james.he...@gmail.com, Prometheus Users
On 9 March 2018 at 21:32, <james.he...@gmail.com> wrote:
>
> There doesn't seem to be much 'out there' on the visualisation of prometheus
> histograms

Yeah, I would like it if Grafana had direct support for
Prometheus-style histograms. Not just the classical bar graph, but
ideally heat maps (which essentially visualize distributions over
time).

The existing heat maps visualize the distribution of Prometheus
metrics, while the Prometheus histogram has already done the
distribution part. Visualizing Prometheus histograms is “in principle”
very straightforward. The only catch is that they are cumulative, so
for the classical non-cumulative bar graph, you have to do some
subtractions.

> Essentially I'm trying to put the duration of an operation in to predefined
> buckets and subsequently display the count of 'occurrences of that duration'
> as a histogram bar chart in grafana.
>
> As a a starter for 10 this is the histogram PHP code (alright, alright don't
> judge ha) that we have utilised to instrument the code. Although its PHP
> it'll hopefully be agnostic enough for people to leverage in their own
> languages.
>
> $adapter = new \Prometheus\Storage\InMemory();
> $registry = new CollectorRegistry($adapter);
>
> $histogram = $registry->getOrRegisterHistogram(
> "some_namespace",
> "my_operation_duration_seconds",
> 'Some super useful piece of help regarding the metric',
> ['some_label'],
> [30, 60, 90, 120, 150, 180, 210, 240, 270, 300]
> );
>
> // intDuration is the length of the operation will be between 0 and 300
> // anything more is ok to go in the +Inf bucket
> $histogram->observe($intDuration, ['some_label_value']);
>
>
> So if we had two operations (in the last 5 minutes), first was 35 seconds
> and the second was 47 seconds we'd hope to see a histogram bar chart where a
> bar with a Y value of 2 sits above an X value of 60. This is based upon the
> 'le' of a histogram being less than or equal to as described here:
>
> https://www.robustperception.io/why-are-prometheus-histograms-cumulative/

That sounds about right. Note that you have to apply the `rate` or the
`increase` function on a `[5m]` range of the buckets to really only
see the events from the last 5 minutes. Also, as hinted above, the
Prometheus histogram is cumulative, so you have to subtract the
previous bucket to get the “classical” bar graph, cf.
https://en.wikipedia.org/wiki/Histogram#/media/File:Cumulative_vs_normal_histogram.svg
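
The two steps described above (take the growth over the last 5 minutes, then subtract the previous bucket) can be sketched offline in Python. This is only an illustration under simplifying assumptions: the sample values are invented, and the `increase` helper here is a plain difference that ignores the counter resets and extrapolation that PromQL's `increase` deals with.

```python
import math


def increase(earlier, later):
    """Per-bucket growth of the cumulative counters between two samples."""
    return {le: later[le] - earlier.get(le, 0) for le in later}


def per_bucket(cumulative):
    """Subtract each bucket's predecessor to get non-cumulative counts."""
    result, previous = {}, 0
    for le in sorted(cumulative):
        result[le] = cumulative[le] - previous
        previous = cumulative[le]
    return result


# Hypothetical samples of the cumulative bucket counters, taken 5 minutes apart.
five_minutes_ago = {30: 4, 60: 9, 90: 10, math.inf: 10}
now = {30: 4, 60: 11, 90: 12, math.inf: 12}

bars = per_bucket(increase(five_minutes_ago, now))
print(bars)
```

With these made-up numbers, only the bucket with upper bound 60 ends up with a bar of height 2; all other bars are 0, which matches the chart the original post was hoping for.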

--
Björn Rabenstein, Engineer
http://soundcloud.com/brabenstein

jgra...@gmail.com

Mar 12, 2018, 8:42:03 PM
to Prometheus Users
I was also recently looking for a histogram heatmap for Prometheus.

It turns out someone is almost done implementing it. Looks promising:

Simon Pasquier

Mar 14, 2018, 3:00:16 AM
to Björn Rabenstein, james.he...@gmail.com, Prometheus Users
On Mon, Mar 12, 2018 at 4:45 PM, 'Björn Rabenstein' via Prometheus Users <promethe...@googlegroups.com> wrote:
> On 9 March 2018 at 21:32, <james.he...@gmail.com> wrote:
> >
> > There doesn't seem to be much 'out there' on the visualisation of prometheus
> > histograms
>
> Yeah, I would like if Grafana had a direct support for
> Prometheus-style histograms. Not just the classical bar graph, but
> ideally heat maps (which essentially visualizes distributions over
> time).

james.he...@gmail.com

Mar 16, 2018, 5:34:36 AM
to Prometheus Users
This is so useful and especially the link visualising the cumulative version - thank you!

Can I just double-check the subtraction logic (regarding the previous bucket)?

Currently I have this in Grafana, but it's not quite working and it returns NULL:

my_bucket{le=~"60"} - mu_bucket{le=~"30"}

james.he...@gmail.com

Mar 16, 2018, 5:55:41 AM
to Prometheus Users
Also, not sure if other people see this, but I'm only ever seeing 1 for the bucket value, no matter what duration is specified, so I wonder if I need to perform some form of sum?

james.he...@gmail.com

Mar 16, 2018, 6:06:48 AM
to Prometheus Users
Should also add that my bucket_count value never goes above 1.

It feels like the instrumentation code may be wrong - we are also having to use a push gateway for this.

Björn Rabenstein

Mar 19, 2018, 1:44:33 PM
to james.he...@gmail.com, Prometheus Users
On 16 March 2018 at 10:34, <james.he...@gmail.com> wrote:
>
> Can I just double check the subtraction logic (regarding previous bucket) ?
>
> Currently I have this is Grafana but its not quite working and it returns
> NULL
>
> my_bucket{le=~"60"} - mu_bucket{le=~"30"}

This should yield the count of observations between 30 and 60. Perhaps
the `le` label values are not an exact match, e.g. they might be
"60.0" or "6e1" or something. PromQL (sadly) sees the `le` label value
just as an opaque string with no idea of equivalent representations of
floating point numbers.
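
The point about `le` being an opaque string can be illustrated outside Prometheus. The Python sketch below is an assumption-laden illustration, not PromQL: the scrape result and the helper names `normalise_le` and `find_bucket` are invented, and it simply shows that a string comparison against "60" misses a bucket rendered as "6e1", while a numeric comparison finds it.

```python
def normalise_le(value):
    """Parse an le label value so equivalent spellings compare equal."""
    return float(value)  # float("+Inf") parses to infinity


def find_bucket(series, bound):
    """Return the sample whose le label equals bound numerically, or None."""
    for labels, sample in series:
        if normalise_le(labels["le"]) == float(bound):
            return sample
    return None


# Hypothetical scrape result: the bounds were rendered as "30.0" and "6e1".
series = [
    ({"le": "30.0"}, 0.0),
    ({"le": "6e1"}, 2.0),
    ({"le": "+Inf"}, 2.0),
]

# String match, comparable to a selector such as le="60": finds nothing.
exact = [sample for labels, sample in series if labels["le"] == "60"]

# Numeric match: finds both buckets, so the subtraction works.
between_30_and_60 = find_bucket(series, 60) - find_bucket(series, 30)
print(exact, between_30_and_60)
```

Label formatting is only one possible cause of an empty result. PromQL binary operators also pair series by identical label sets, so subtracting a series with one `le` value from a series with another generally needs `ignoring(le)` before the two sides match at all.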

james.he...@gmail.com

Apr 13, 2018, 6:05:46 AM
to Prometheus Users
Thanks for the help on this, Björn. In the end I had to park it, but I will definitely be coming back to this thread in the future.

Hope other people find the above useful and are able to contribute too.

Alin Sînpălean

Apr 13, 2018, 10:11:30 AM
to Prometheus Users
I've rolled my own latency graph in Grafana, where I'm computing some fixed percentiles (0, 50, 90, 99, 100), replacing the names with aliases and using the "Fill below to" option to draw something that looks like a stacked graph:


It's also usable with summaries (just use the percentile directly) and you can choose whether you want a linear or log scale.

Here's the JSON if anyone's interested (it has a couple of Grafana template variables, nothing that a quick find and replace can't fix):

{
  "aliasColors": {},
  "bars": false,
  "dashLength": 10,
  "dashes": false,
  "datasource": "Prometheus",
  "decimals": null,
  "description": "Request duration quantiles across selected resources filtered by response code and instance (all configurable),  aggregated over **$_period** (configurable).",
  "fill": 1,
  "id": 1,
  "legend": {
    "alignAsTable": true,
    "avg": false,
    "current": true,
    "hideZero": false,
    "max": false,
    "min": false,
    "rightSide": true,
    "show": true,
    "sideWidth": 300,
    "total": false,
    "values": true
  },
  "lines": true,
  "linewidth": 1,
  "links": [],
  "nullPointMode": "null",
  "percentage": false,
  "pointradius": 5,
  "points": false,
  "renderer": "flot",
  "seriesOverrides": [
    {
      "alias": "Max",
      "color": "#E24D42",
      "fillBelowTo": "99th percentile",
      "lines": false
    },
    {
      "alias": "99th percentile",
      "color": "#EF843C",
      "fillBelowTo": "90th percentile",
      "lines": false
    },
    {
      "alias": "90th percentile",
      "color": "#EAB839",
      "fillBelowTo": "50th percentile",
      "lines": false
    },
    {
      "alias": "50th percentile",
      "color": "#7EB26D",
      "fillBelowTo": "Min",
      "lines": false
    },
    {
      "alias": "Min",
      "color": "#3F6833",
      "fill": 0,
      "lines": false
    }
  ],
  "spaceLength": 10,
  "span": 12,
  "stack": false,
  "steppedLine": false,
  "targets": [
    {
      "expr": "histogram_quantile(1, sum without(instance,request,code) (rate(request_latency_seconds_bucket{job='$_job',env='$_env',instance=~'$_instance',request=~'$_request',code=~'$_code'}[$_period])))",
      "format": "time_series",
      "interval": "10s",
      "intervalFactor": 1,
      "legendFormat": "Max",
      "refId": "A",
      "step": 10
    },
    {
      "expr": "histogram_quantile(.99, sum without(instance,request,code) (rate(request_latency_seconds_bucket{job='$_job',env='$_env',instance=~'$_instance',request=~'$_request',code=~'$_code'}[$_period])))",
      "format": "time_series",
      "interval": "10s",
      "intervalFactor": 1,
      "legendFormat": "99th percentile",
      "refId": "B",
      "step": 10
    },
    {
      "expr": "histogram_quantile(.9, sum without(instance,request,code) (rate(request_latency_seconds_bucket{job='$_job',env='$_env',instance=~'$_instance',request=~'$_request',code=~'$_code'}[$_period])))",
      "format": "time_series",
      "interval": "10s",
      "intervalFactor": 1,
      "legendFormat": "90th percentile",
      "refId": "C",
      "step": 10
    },
    {
      "expr": "histogram_quantile(.5, sum without(instance,request,code) (rate(request_latency_seconds_bucket{job='$_job',env='$_env',instance=~'$_instance',request=~'$_request',code=~'$_code'}[$_period])))",
      "format": "time_series",
      "interval": "10s",
      "intervalFactor": 1,
      "legendFormat": "50th percentile",
      "refId": "D",
      "step": 10
    },
    {
      "expr": "histogram_quantile(1e-9, sum without(instance,request,code) (rate(request_latency_seconds_bucket{job='$_job',env='$_env',instance=~'$_instance',request=~'$_request',code=~'$_code'}[$_period])))",
      "format": "time_series",
      "interval": "10s",
      "intervalFactor": 1,
      "legendFormat": "Min",
      "refId": "E",
      "step": 10
    }
  ],
  "thresholds": [],
  "timeFrom": null,
  "timeShift": null,
  "title": "Request Latency, Aggregated",
  "tooltip": {
    "shared": true,
    "sort": 2,
    "value_type": "individual"
  },
  "type": "graph",
  "xaxis": {
    "buckets": null,
    "mode": "time",
    "name": null,
    "show": true,
    "values": []
  },
  "yaxes": [
    {
      "format": "s",
      "label": "",
      "logBase": 10,
      "max": null,
      "min": ".001",
      "show": true
    }
  ]
}

Cheers,
Alin.
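
For readers wondering what the `histogram_quantile` expressions in the panel above compute, here is a simplified Python sketch of the linear interpolation that the Prometheus documentation describes. It is not the Prometheus implementation: it assumes positive bucket bounds, a `+Inf` bucket, at least two buckets and 0 < q <= 1, and the demo counts are invented.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from [(upper_bound, cumulative_count), ...]."""
    buckets = sorted(buckets)
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if index == len(buckets) - 1:
        # Rank falls into +Inf: report the highest finite upper bound.
        return buckets[-2][0]
    upper, count = buckets[index]
    lower, below = (0.0, 0) if index == 0 else buckets[index - 1]
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Hypothetical cumulative counts: two observations between 30 and 60 seconds.
demo = [(30, 0), (60, 2), (90, 2), (math.inf, 2)]
print(histogram_quantile(0.5, demo))
print(histogram_quantile(1, demo))
print(histogram_quantile(1e-9, demo))
```

Under these assumptions, quantile 1 lands on the upper bound of the highest populated bucket and a tiny quantile such as 1e-9 lands just above the lower bound of the lowest populated one, so the "Max" and "Min" series are bucket-boundary estimates rather than true extremes.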