what's experimental .. again


Julien Pivotto

Sep 3, 2021, 3:44:34 AM
to prometheus-developers
Dear developers,

TL;DR: I'd like to be able to mark features as experimental in the
documentation without a feature flag.




I am bringing to your attention a discussion we're having in GitHub
pull request https://github.com/prometheus/prometheus/pull/9248. That
pull request is about having atan2 support in Prometheus as a binary
operator.

Because it is strange for users to have atan2 written as foo atan2 bar,
instead of atan2(foo, bar), I've asked to mark the feature as
experimental. Should we in the future add syntactic sugar or add binary
ops to functions, we would be able to do so.
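
To make the two spellings concrete, here is a minimal Python sketch of
what foo atan2 bar computes. It assumes a heavily simplified model of
one-to-one vector matching (exact label-set equality, no metric name
handling), so it illustrates the semantics rather than Prometheus's
actual implementation. The helper name atan2_binary and the sample
data are made up for illustration.

```python
import math

# Simplified model: an instant vector is a dict that maps a label set
# (a frozenset of (name, value) pairs) to a sample value.
def atan2_binary(lhs, rhs):
    """Model `lhs atan2 rhs`: atan2(y, x) with y from lhs and x from rhs.

    Only samples whose label sets match exactly on both sides are kept.
    """
    return {
        labels: math.atan2(y, rhs[labels])
        for labels, y in lhs.items()
        if labels in rhs
    }

# Hypothetical sample data.
foo = {
    frozenset({("instance", "a")}): 1.0,
    frozenset({("instance", "b")}): 1.0,
}
bar = {
    frozenset({("instance", "a")}): 1.0,
}

# Equivalent to the function-style spelling atan2(foo, bar), in radians.
result = atan2_binary(foo, bar)
```

The left-hand operand plays the role of y and the right-hand operand
the role of x, which is the same argument order as atan2(foo, bar).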

I also did not want a feature flag for this. In my vision, and that's
how I acted as maintainer, feature flags should be used when:

- We change an existing behaviour.
  => PromQL query range semantics, with the @ modifier
  => Expanding env variables for external labels
- We introduce very risky features that bring additional memory /
  storage requirements.
  => Remote write receiver
  => Exemplars
  => Snapshots on shutdown
  => Additional scrape metrics (next to up etc)

However, I am not a believer in a feature-flag-driven Prometheus.

Feature flags were introduced in Prometheus 2.25. Since then, we
have added a few new features without a feature flag:

- body_size_limit in scrape configs
- setting timeout and interval via relabeling.

In general, I think it does not benefit users to launch Prometheus with
lots of feature flags. Our users should be able to assess the risk they
take by using a feature, without always requiring feature flags.
Especially for relatively small features like atan2. There is no
intention to drop atan2 in Prometheus 2.x anyway; we just might find a
better way to call it.

I try to draw a line between what's a useful feature flag, and where
just marking experimental in documentation is fine. Prometheus is very
conservative anyway, and I value the continuity of our features,
including the "experimental" ones.

Just to give you an idea, if we had a very strong feature flags policy
in Prometheus, here is what it could have looked like, based on
https://prometheus.io/docs/prometheus/latest/stability/#api-stability-guarantees

--enable-feature=promql-at-modifier
--enable-feature=expand-external-labels
--enable-feature=promql-negative-offset
--enable-feature=remote-write-receiver
--enable-feature=exemplar-storage
--enable-feature=body-size-limit
--enable-feature=relabel-intervals
--enable-feature=remote-read
--enable-feature=https-basic-auth
--enable-feature=web-ui
--enable-feature=service-discovery-k8s
--enable-feature=service-discovery-consul
--enable-feature=remote-write-retry-on-429
--enable-feature=target-limit

And that's what I want to avoid.


Concretely, my proposal is to continue to be able to mark features as
experimental in the documentation, without requiring feature flags.
Feature flags can be introduced when some conditions are met, to the
appreciation of the maintainers. In some cases (breaking changes or
extra unexpected resource consumption), they are mandatory.

--
Julien Pivotto
@roidelapluie

l.mi...@gmail.com

Sep 3, 2021, 9:11:37 AM
to Prometheus Developers
Having feature flags for extra features that a user needs to act on to use anyway does feel unnecessary from my POV. It feels more like ticking an "I acknowledge this is beta" box.
I think it mostly makes sense if one is enabling something that changes existing behaviour (and so can break existing use cases), possibly with the assumption that the flag will be removed in the next major release and the behind-the-flag behaviour would become default.

I do think flags might also make sense for enabling extra metrics that can be expensive (like scrape timeouts), mostly because the alternative would be a release notes line announcing the new metrics and advising users to drop them via relabelling if not used. That only works when a user is upgrading to that release and reads the notes; on any new install it's likely to be missed.
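
For readers unfamiliar with the "drop via relabelling" alternative, here is a small Python sketch of how a drop rule behaves on scraped samples. It only models the idea: the regex must match the whole metric name (Prometheus anchors relabel regexes), and matching samples are discarded. The function apply_drop_rule and the myapp_* metric names are hypothetical, and Python's re module is used as a stand-in for the regex engine Prometheus uses.

```python
import re

def apply_drop_rule(samples, regex, source_label="__name__"):
    """Keep only samples whose source label does NOT fully match regex.

    This mirrors the idea of a relabel rule with a drop action: the
    regex is anchored at both ends, so a partial match drops nothing.
    """
    pattern = re.compile(regex)
    return [
        sample
        for sample in samples
        if not pattern.fullmatch(sample.get(source_label, ""))
    ]

# Hypothetical scraped samples, each represented by its label set.
samples = [
    {"__name__": "myapp_requests_total", "job": "myapp"},
    {"__name__": "myapp_debug_queue_depth", "job": "myapp"},
    {"__name__": "myapp_debug_gc_pauses", "job": "myapp"},
]

# Drop every metric whose name starts with the hypothetical debug prefix.
kept = apply_drop_rule(samples, r"myapp_debug_.*")
```

The weakness described above is visible here too: the rule has to be written by each user who knows the new metrics exist, whereas an opt-in flag defaults to not producing them at all.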

Bjoern Rabenstein

Sep 16, 2021, 12:16:18 PM
to prometheus-developers
Thanks, Julien, for bringing this to the mailing list, and apologies
for my late reply.

Despite the long time I needed to reply, only one other reply (by
l.mierzwa) has happened. Not sure if that means there is not much
interest in the topic, or everyone else agrees with Julien.

Anyway, here is my take:

Before we had feature flags, we already had the option of introducing
features declared as experimental and thus not covered by our semantic
versioning.

Before we had feature flags, we already had the option of hiding risky
features or those that introduced additional resource usage behind
config settings or flags (e.g. --storage.tsdb... flags for overlapping
blocks or WAL compression and many more examples). And of course,
there was nothing keeping us from turning breaking changes into
features that needed to be turned on explicitly via a flag or a config
setting.

Julien's ideas about feature flags are the following:
> In my vision, and that's how I acted as maintainer, feature flags
> should be used when:
>
> - We change an existing behaviour. [...]
> - We introduce very risky features, that introduce additional memory /
> storage requirements.

If that's the case, why did we introduce feature flags at all? Nothing
really changed, right?

I think we introduced feature flags for more reasons, and those were
crucial for the liberating effect on our velocity:

(1) We shied away from experimental features because we got burned by
too many users using experimental features without being aware of
them being experimental and thus being angry at us if we broke
them (or, in reverse, us being reluctant to change an experimental
feature because too many users were already relying on it). And in
fairness, it is hard as a user to keep track of what features are
experimental if we have many of them. Feature flags make it very
explicit to the user if they are using an experimental feature and
which.

(2) We did not want an explosion of flags or config settings to gate
features or behaviors (as had happened in v1.x). Feature flags are
a lightweight alternative (because they are not separate flags but
just a comma separated list, which also implies that we don't have
to keep old flags around as no-ops once the experimental feature
is declared stable).

(3) We often got caught in long-winded discussions if a certain
feature is even desirable, if it perhaps encourages anti-patterns
or discourages best practices, etc. A feature flag is both light
weight but also very explicit that the feature is not yet
recommended/endorsed. It allows us to shortcut the long-winded
discussion and just try something out without throwing our users
under the bus.

(4) Even the question if a feature is actually breaking is more or
less hard to answer (obligatory reference: https://xkcd.com/1172/).
Feature flags allow us to postpone that discussion to the point
where we consider graduating a feature to stable.

In sum, it's all about "worry less, use more feature flags". But that
only works if we are liberal with using feature flags. Being
restrictive about the cases when to use feature flags will create a
whole new type of long-winded discussion (whether a particular feature
deserves a feature flag or not), and worse, it might just subtly bring
back all those blockers above (if we consciously or sub-consciously
avoid the discussion if that feature deserves a feature flag, we are
back to square one).

I think this whole line of argument is a bit of a red herring.

First of all, you can just provide the feature flag as a
comma-separated list, which makes it appear much less scary than the
list above. But that's just syntax.
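
As an illustration of the comma-separated form, here is a minimal
Python sketch of how such a flag could be parsed. It is not
Prometheus's actual flag handling: the function name
parse_enabled_features and the small KNOWN_FEATURES set are
assumptions made for the example, with feature names taken from the
list quoted in this thread.

```python
import argparse

# Hypothetical set of known experimental features, for illustration only.
KNOWN_FEATURES = {
    "promql-at-modifier",
    "exemplar-storage",
    "remote-write-receiver",
}

def parse_enabled_features(argv):
    """Return (enabled, unknown) feature names from --enable-feature flags.

    The flag may be repeated, and each occurrence may carry a
    comma-separated list, so both spellings end up in the same set.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable-feature", action="append", default=None)
    args = parser.parse_args(argv)

    enabled, unknown = set(), set()
    for value in args.enable_feature or []:
        for name in value.split(","):
            name = name.strip()
            if not name:
                continue  # tolerate stray commas
            if name in KNOWN_FEATURES:
                enabled.add(name)
            else:
                unknown.add(name)
    return enabled, unknown

# One flag with a list instead of one flag per feature.
enabled, unknown = parse_enabled_features(
    ["--enable-feature=promql-at-modifier,exemplar-storage"]
)
```

In a design like this, a feature that graduates to stable can simply
be removed from the known set, so no dedicated flag has to be kept
around as a no-op.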

More importantly, feature flags are not necessarily there to stay. If
the feature is of the non-breaking kind, just behind a feature flag
because it is prone to change or risky or just needs some experience
before we can decide if we even want the feature, we expect that
feature to either "graduate" to stable quite soon or get
removed. From that time on, you don't need to include it in the list
of feature flags anymore. Only in the case where we continue to
consider the feature breaking in a prohibitive way do we need to keep
hiding it behind a flag (but it could then become a regular flag or
config setting, and it would still go away with the next major
release). I could even see a beneficial effect here: In the past, we
often just forgot to graduate an experimental feature to stable. By
linking the experimental state to a feature flag, we get constantly
reminded about the pending decision.

Finally, not all features behind feature flags are generally
needed. Many, if not most of them are fairly niche, so that it is rare
that a user really needs or wants to activate all the features.

In sum, a scarily long list of feature flags will happen rarely, and
even if it does, it is not as long and scary as suggested by the
format above.

> Concretely, my proposal is to continue to be able to mark features as
> experimental in the documentation, without requiring feature flags.
> Feature flags can be introduced when some conditions are met, to the
> appreciation of the maintainers. In some cases (breaking changes or
> extra unexpected resource consumption), they are mandatory.

I would not go as far as making feature flags mandatory for each and
every experimental feature, but I would suggest a policy of "in doubt,
use a feature flag" or in other words, a feature flag should be the
default for an experimental feature, and only if certain conditions
are met, we can introduce it without a feature flag (e.g. a feature
that is unlikely to be used in an uninformed way, or we already have
some confidence that nothing fundamental will change with the feature,
etc.). What Julien suggests is the opposite: You have to justify
introducing a feature flag. Which has the danger of what I said above:
When we avoid feature flags, we are easily back at avoiding features
altogether.

--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

l.mi...@gmail.com

Nov 12, 2021, 9:34:05 AM
to Prometheus Developers
I recently stumbled upon a ticket that mentioned adding something behind a feature flag, then making that the default in Prometheus 3.x and removing the flag.
This approach might work well for changes in behaviour or default configuration values, but it doesn't sound like it would work for flags like --enable-feature=extra-scrape-metrics, and I wonder if those would always stay under the --enable-feature flag.
I guess there are a few options:
- promote it to a dedicated flag
- expose those extra metrics under a different path (/metrics/debug ?) so user still needs to opt-in, it's just that the mechanism is different
- enable them if /metrics endpoint receives some query params (/metrics?extra-scrape-metrics=true), just a different version of the above
- always produce extra metrics and document how to drop it if it's too much

I don't think there's anything wrong with having that under a feature flag, but I do agree that having an excessive number of feature flags means that each user runs a slightly different version of Prometheus, with a unique set of "features" enabled, which won't improve usability and might be confusing (which flags should I enable and which shouldn't I?).
Plus, I don't think this use case is unique in having some metrics that are considered "debug" or "optional", so there might be a benefit in a best practice for exposing such metrics when writing instrumentation code.
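
To sketch what the path-based and query-parameter opt-in options could look like in instrumentation code, here is a small Python example. Everything in it is hypothetical: the metric names, the debug=true parameter, the choice that /metrics/debug returns only the optional metrics, and the function metrics_for_request itself. It only decides which metric names a request would receive and does not serve HTTP.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical metric names, split into always-on and opt-in groups.
CORE_METRICS = ["myapp_requests_total", "myapp_errors_total"]
DEBUG_METRICS = ["myapp_debug_queue_depth"]

def metrics_for_request(url):
    """Return the metric names to expose for a request URL, or None.

    /metrics              -> core metrics only
    /metrics?debug=true   -> core plus debug metrics
    /metrics/debug        -> debug metrics only (separate opt-in path)
    anything else         -> None (not a metrics endpoint)
    """
    parts = urlsplit(url)
    if parts.path == "/metrics/debug":
        return list(DEBUG_METRICS)
    if parts.path == "/metrics":
        query = parse_qs(parts.query)
        # If the parameter is repeated, the last value wins.
        if query.get("debug", ["false"])[-1] == "true":
            return CORE_METRICS + DEBUG_METRICS
        return list(CORE_METRICS)
    return None

default_view = metrics_for_request("/metrics")
debug_view = metrics_for_request("/metrics?debug=true")
```

Either way the opt-in lives in the scrape configuration of whoever wants the extra metrics, rather than in a flag on the process that exposes them.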

Julien Pivotto

Nov 12, 2021, 9:40:31 AM
to l.mi...@gmail.com, Prometheus Developers
If we decide to make extra scrape metrics optional, they should not be a flag, but a per-job option (with a global value) changeable at runtime.

If you think they should be optional, it's better to open an issue in the repository to discuss it.
