client_golang: Troubleshooting duplicate labels in metrics (leading to Gather() errors)


Douglas Reid

Sep 26, 2018, 5:17:34 PM
to Prometheus Users
Prometheus Users:

I'm currently investigating issues reported by other users around Prometheus failures related to "label dimensions inconsistent with previously collected metrics in the same metric family" (see this issue for more context: https://github.com/istio/istio/issues/8906). I have not been able to reproduce the issue locally on any of my test setups, but there have been two independent reports of it. Both involve the `v0.9.0-pre1` (967789050ba94deca04a5e84cce8ad472ce313c1) version of the prometheus client_golang library.

At first, I thought it might be because something was changing in the system to cause inconsistent labeling of metrics. However, closer inspection of the errors reveals that the metrics that are reported as having inconsistent dimensions actually have *duplicated* or *triplicated* labels of the same values. These duplicated entries "replace" other dimensions (meaning cardinality stays the same). Example errors:


* collected metric istio_request_bytes
...
label:<name:"destination_version" value:"8" >
label:<name:"destination_version" value:"8" >
...

* collected metric istio_response_bytes
...
label:<name:"destination_app" value:"payrollsteps" >
label:<name:"destination_app" value:"payrollsteps" >
label:<name:"destination_app" value:"payrollsteps" >
...


I don't really understand how this could happen, based on a cursory look at the code. The metric values are observed via code that looks as follows:


pl := promLabels(val.Dimensions)
vec.With(pl).Observe(amt)
...

func promLabels(l map[string]interface{}) prometheus.Labels {...}

And, `prometheus.Labels` is defined in the codebase as follows:

type Labels map[string]string

So, the metric takes a *map* of label values when observations are being recorded.

Shouldn't this make duplication of labels impossible?

Is there some other way in which labels could be duplicated?  Has anyone ever seen such behavior before?

Thanks in advance for any help or pointers that can be provided,
Doug.

Björn Rabenstein

Sep 27, 2018, 10:46:54 AM
to dougla...@gmail.com, Prometheus Users
On Wed, 26 Sep 2018 at 23:17, Douglas Reid <dougla...@gmail.com> wrote:
>
> pl := promLabels(val.Dimensions)
> vec.With(pl).Observe(amt)
> ...
>
> func promLabels(l map[string]interface{}) prometheus.Labels {...}
>
> And, `prometheus.Labels` is defined in the codebase as follows:
>
> type Labels map[string]string
>
> So, the metric takes a *map* of label values when observations are being recorded.
>
> Shouldn't this make duplication of labels impossible?
>
> Is there some other way in which labels could be duplicated? Has anyone ever seen such behavior before?

If you only use the direct instrumentation parts of the library, it
will not be possible to ever create inconsistent label dimensions. The
Prometheus registry will reject inconsistent metrics at registration
time.

However, there are various ways of creating inconsistent metrics, e.g.
by writing a custom Collector implementation (which Istio might do in
its codebase, but I didn't check the code) or by merging various
registries (see
https://godoc.org/github.com/prometheus/client_golang/prometheus#Gatherers).

However (2nd order... ;), we recently decided to allow inconsistent
label dimensions in those cases; see
https://github.com/prometheus/client_golang/commit/c06fb788be8a05442219295095ee0e51523802f0
. I would recommend pinning the vendoring to the current master (which
is planned to be released as v0.9.0 as soon as my day job leaves me
with a few minutes of spare time).

--
Björn Rabenstein, Engineer
http://soundcloud.com/brabenstein

SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany
Managing Director: Alexander Ljung | Incorporated in England & Wales
with Company No. 6343600 | Local Branch Office | AG Charlottenburg |
HRB 110657B

Douglas Reid

Sep 27, 2018, 12:21:09 PM
to Prometheus Users
Björn,

Thank you for your response.  I don't believe that the code is doing anything other than using the direct instrumentation parts of the library.

The code uses a pedantic registry (via prometheus.NewPedanticRegistry) and standard collectors (via constructors like prometheus.NewHistogramVec). The collectors themselves are registered via the following snippet:

func registerOrGet(registry *prometheus.Registry, c prometheus.Collector) (prometheus.Collector, error) {
	if err := registry.Register(c); err != nil {
		if are, ok := err.(prometheus.AlreadyRegisteredError); ok {
			return are.ExistingCollector, nil
		}
		return nil, err
	}
	return c, nil
}

This doesn't seem to be a widespread problem fwiw.

I'm less concerned about the inconsistency of the labels, however, than I am about the apparent duplication of labels in the metrics. Naively, that looks like memory corruption.

Is there something that could cause that to happen -- even with inconsistent labels?

I will happily pin to 0.9.0 when that happens.

Thanks again for your time and consideration,
Doug.

Björn Rabenstein

Sep 27, 2018, 8:07:13 PM
to dougla...@gmail.com, Prometheus Users
On Thu, 27 Sep 2018 at 18:21, Douglas Reid <dougla...@gmail.com> wrote:
>
> I'm less concerned about the inconsistency of the labels, however, than I am about the apparent duplication of labels in the metrics. Naively, that looks like memory corruption.
>
> Is there something that could cause that to happen -- even with inconsistent labels?

That's very weird indeed. I'll think about ways this could happen.
Perhaps it's even a bug in client_golang. (It definitely is if you can
create that just by using the normal direct instrumentation.)

> I will happily pin to 0.9.0 when that happens.

It would be helpful if you could just try to reproduce with the
current state of master. (Version tags really don't mean a lot prior
to 1.0, and we always try to keep master in good working state.)

Björn Rabenstein

Sep 29, 2018, 3:55:11 AM
to dougla...@gmail.com, Prometheus Users
I looked a bit more at the Istio code and re-evaluated the client_golang code.

I still have no theory as to how this can happen at all, which is deeply disturbing.

I'll follow up directly on the istio issue https://github.com/istio/istio/issues/8906 to get to the bottom of this.


Douglas Reid

Oct 1, 2018, 12:12:37 PM
to bjo...@soundcloud.com, promethe...@googlegroups.com
Thanks for continuing to help troubleshoot this issue. That is really above and beyond. I very much appreciate it.

Please let me know how I can help with the debugging.

Thanks again,
Doug.