FR/Discuss: Enable non-collection of metrics based on curried labels

14 views
Skip to first unread message

Aaron Gable

unread,
Mar 30, 2021, 8:42:01 PM3/30/21
to Prometheus Developers
I'm currently running into a problem closely related to https://github.com/prometheus/client_golang/issues/834: the inability to exclude metrics from collection based on partially-specified labels. I'm sending this as an email rather than as a github issue based on the advice in the new issue template; please let me know if you'd prefer I move this conversation to a github issue.

Suppose I have a `CounterVec` that I call `requestsCounter`, with labels `method` (whose values are only `GET` and `POST`) and `success` (whose values are only `true` and `false`). Maybe I have a set of unittests which implement a purposefully-broken handler, and I'd like to assert that we incremented the counter for unsuccessful requests the appropriate number of times, _regardless_ of whether those requests were GETs or POSTs.

Here's what I wrote in my first attempt to accomplish this goal:

```go
// AssertMetricEquals determines whether the value held by a prometheus Collector
// (e.g. Gauge, Counter, CounterVec, etc) is equal to the expected integer.
// In the case of vector collectors, it collects and sums the values from all
// metrics in the vector prior to comparison; this is so that users can easily
// make assertions about one of many field dimensions (e.g. for a CounterVec with
// fields "host" and "valid", being able to assert that two "valid": "true"
// incremets occurred, without caring which host did each increment). To make
// assertions about subsets of a vector's metrics, use the MetricVec's .CurryWith
// or .GetMetricWith methods. Only works for integer metrics (Counters and Gauges),
// does not work for Histograms.
func AssertMetricEquals(t *testing.T, c prometheus.Collector, expected int) {
ch := make(chan prometheus.Metric)
done := make(chan struct{})
go func() {
c.Collect(ch)
close(done)
}()
var total int
timeout := time.After(time.Second)
loop:
for {
select {
case <-timeout:
t.Fatal("timed out collecting metrics")
case <-done:
break loop
case m := <-ch:
var mpb prometheus_client_pb.Metric
_ = m.Write(&mpb)
// Exactly one of the Counter or Gauge values will be populated by the
// .Write() operation, so just add both because the other will be 0.
total += int(mpb.Counter.GetValue())
total += int(mpb.Gauge.GetValue())
}
}
AssertEquals(t, total, expected)
}

<snip>

AssertMetricEquals(t, requestsCounter.MustCurryWith(prometheus.Labels{"success": "false"}), 10)
```

Unfortunately this fails, because the curried `MetricVec` still sends *all* metrics to the `.Collect()` channel, even those that would be excluded by the curried labels.

The only real solution that I see here is to perform the currying ourselves -- rather than passing a curried `MetricVec` into the `AssertMetricEquals` function, pass a set of `Labels` into it and do the filtering ourselves based on the contents of `iom.Label`. This would be functionally the same as the "very hacky and weird" workaround in #834.

From an outsider perspective, the most elegant solution here would be to change the behavior of `.Collect()` to not send metrics which are excluded by the current set of curried labels. But I understand that that may be sufficiently backwards incompatible as to be a non-starter. What are the options here?

Thanks,
Aaron

Bjoern Rabenstein

unread,
Mar 31, 2021, 4:36:05 PM3/31/21
to Aaron Gable, Prometheus Developers
On 30.03.21 17:42, Aaron Gable wrote:
>
> Suppose I have a `CounterVec` that I call `requestsCounter`, with labels
> `method` (whose values are only `GET` and `POST`) and `success` (whose
> values are only `true` and `false`). Maybe I have a set of unittests which
> implement a purposefully-broken handler, and I'd like to assert that we
> incremented the counter for unsuccessful requests the appropriate number of
> times, _regardless_ of whether those requests were GETs or POSTs.

In very general, I'd break this down, following the philosophy of
letting a unit test just exercise the code it is supposed to test.

With your current approach, you are almost doing and end-to-end test
by invoking the whole `Collect` and `Write` machinery. I guess that's
fine for true end-to-end tests. (See
https://pkg.go.dev/github.com/prometheus/client...@v1.10.0/prometheus/testutil
for utilities to write those tests.) In your case, as you said, you
want to know if "we incremented the counter for unsuccessful requests
the appropriate number of times". Ideally, you inject a CounterVec
mock that doesn't really do anything with the metrics, but just
records that the right counter child was retrieved and the appropriate
increments were performed on it. Go doesn't lend itself to this kind
of monkey-patching, and the client_golang library misses good support
for this kind of approach right now, which has to do with certain
design problems that cannot be fixed without a breaking change, see
https://github.com/prometheus/client_golang/issues/230 . But it's
still possible. In your case, your code probably would have to act on
an interface, which is designed in a way that both `CounterVec` as
well as your injected mock are implementing it.

> Unfortunately this fails, because the curried `MetricVec` still sends *all*
> metrics to the `.Collect()` channel, even those that would be excluded by
> the curried labels.

Yeah, that's another design problem of the library. The excuse is that
currying was introduced relatively late in the design process. In an
ideal world (and in v2 of the library), the curried CounterVec
wouldn't even have a `Collect` method.

You could also follow the "inject a mock" approach half-way and
inject a CounterVec that only contains the relevant metrics instead of
using the full CounterVec with all the other metrics plus currying.

> The only real solution that I see here is to perform the currying ourselves
> -- rather than passing a curried `MetricVec` into the `AssertMetricEquals`
> function, pass a set of `Labels` into it and do the filtering ourselves
> based on the contents of `iom.Label`. This would be functionally the same
> as the "very hacky and weird" workaround in #834.

Yes, in case you want to stick with the "let's collect the metrics and
inspect the protobuf" approach, that's probably the most
straightforward way to go.

--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in
Reply all
Reply to author
Forward
0 new messages