Multi-Tenant ACLs with Prometheus


Conrad Wood

Feb 11, 2020, 3:41:54 AM
to Prometheus Developers
Hi,

I have some thoughts on how Prometheus could support multi-tenant setups (without going down the rabbit hole of authentication/authorisation).
I frequently find myself wanting to expose data in Prometheus to external parties. These parties are commercially independent of each other, so I need to control precisely which users may access which data.
Ideally, I would like Prometheus to intercept queries for data not belonging to a given user.
That would allow me (and others) to expose the Prometheus API to users through an authenticating reverse proxy (and to run Grafana for them, or even let them run it themselves).

A more detailed proposal is in this google doc: Multi-Tenant ACLs with Prometheus

Would a patch along those lines be something the Prometheus development team would accept?
If not, I would kindly request some comments on what ought to be addressed for it to become acceptable. I think this would be a useful feature for quite a few parties.

Conrad



Brian Brazil

Feb 11, 2020, 3:51:53 AM
to Conrad Wood, Prometheus Developers
This is out of scope for Prometheus itself; among other things, PromQL evaluation should never have a hard dependency on the network.

However there are already solutions in this space such as https://github.com/weeco/cortex-gateway/ and https://github.com/rancher/prometheus-auth that do this entirely via a reverse proxy.


Conrad Wood

Feb 11, 2020, 4:13:27 AM
to Brian Brazil, Prometheus Developers
Hi,

thank you for your swift response.
I fully understand the desire to keep it out of prometheus and move
this in front of prometheus, like prometheus-auth does.

AFAICT this requires a duplication of the PromQL parser code, which I
consider troubling from a security perspective. If prometheus' promql
and that of the proxy diverge, it may be possible that data is being
leaked between parties. It is also quite hard to verify correctness.

In the use case I described, IMHO the issue of PromQL network access is
perhaps less relevant, given that (parts of) the network must work for
the user to authenticate (ldap) and access the api (http) and the
reverse proxy too. The disks may well be on a NAS too.

In my proposal, the PromQL rule evaluation, etc. would not require any
more network access than it does currently. *Only* API calls would.

I would definitely consider it good practice to install the
ACLEvaluator on the same machine as prometheus and deploy caching
strategies within it.

Might it be preferable to use a local socket interface instead of gRPC
to mitigate the issue of network access?

Conrad

Brian Brazil

Feb 11, 2020, 4:34:04 AM
to Conrad Wood, Prometheus Developers
Selectors aren't that hard to spot, and you can always choose to fail closed.

> In the use case I described, IMHO the issue of PromQL network access is
> perhaps less relevant, given that (parts of) the network must work for
> the user to authenticate (ldap) and access the api (http) and the
> reverse proxy too. The disks may well be on a NAS too.

Using a NAS for Prometheus storage would not be recommended for reliability reasons.

> In my proposal, the PromQL rule evaluation, etc. would not require any
> more network access than currently. *Only* Api calls.
>
> I would definitely consider it good practice to install the
> ACLEvaluator on the same machine as prometheus and deploy caching
> strategies within it.
>
> Might it be preferable to use a local socket interface instead of gRPC
> to mitigate the issue of network access?

That's still network access. As long as I can get to a running Prometheus, I should be able to successfully execute PromQL - no matter what else is broken network wise.

This is best done entirely in a reverse proxy.


Conrad Wood

Feb 11, 2020, 4:48:42 AM
to Brian Brazil, Prometheus Developers
I see your point re "must successfully execute PromQL"; that is indeed
rather important for a monitoring system that might aid in finding out
why something is borked.

In the case of remote_read, surely that is a bit relaxed. Would it thus
be an option to pass the received headers and cookies to the remote_read
backend? That way the backend could do network magic if it needs to,
but the local storage would still satisfy the above requirement.

Brian Brazil

Feb 11, 2020, 4:55:21 AM
to Conrad Wood, Prometheus Developers
That's getting a bit magic. A remote storage's results should only really depend on the query itself, otherwise it'd be interesting to debug.
The standard way to approach this is by injecting a label.


Conrad Wood

Feb 11, 2020, 6:02:51 AM
to Brian Brazil, Prometheus Developers
Thank you for your time. I understand that there is little to no chance
of getting such a feature merged.
I will pursue alternate routes.

Thank you.

Conrad


Julius Volz

Feb 11, 2020, 9:59:54 AM
to Conrad Wood, Brian Brazil, Prometheus Developers
Just to add to this, I would unfortunately agree that this has always been considered as out of scope for Prometheus, whereas Cortex explicitly was designed around multi-tenancy, and could be a good option here.

Regarding this piece of your proposal doc (just to keep the discussion here):

----
1. Call ACLEvaluator with the result of the query
2. Filter out results which are not granted
3. Return all remaining results
4. Return to the caller
----

Note that this would not be correct. The result of a query might have already aggregated away or changed labels that you might need to use for evaluating ACLs. You'd need to validate queries at the point where they load series data, which currently is the two AST nodes VectorSelector and MatrixSelector. That's what a gateway in front of Prometheus would have to do:

1. Parse the PromQL query
2. Find any vector / matrix selector nodes
3. Verify that a tenant label is provided in them (or add one)
4. Forward the query
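
The gateway steps above can be sketched roughly as follows. This is a toy illustration, not the Prometheus parser: the `Selector`, `Call`, `Forbidden`, and `enforce_tenant` names and the `tenant` label are all hypothetical, and step 1 (parsing) is assumed to have already produced the tree. A real gateway would presumably walk the AST produced by Prometheus's own "promql" Go package instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Selector:
    """Toy stand-in for a vector/matrix selector: metric name plus equality matchers."""
    metric: str
    matchers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Call:
    """Toy stand-in for any non-leaf node (function, aggregation, binary operator)."""
    op: str
    args: List["Node"]


Node = Union[Selector, Call]


class Forbidden(Exception):
    """Raised when a query cannot be proven safe for the caller (fail closed)."""


def enforce_tenant(node: Node, tenant: str, label: str = "tenant") -> Node:
    """Steps 2 and 3: visit every selector and pin it to the caller's tenant."""
    if isinstance(node, Selector):
        requested = node.matchers.get(label)
        if requested is not None and requested != tenant:
            raise Forbidden(f"{node.metric}: {label}={requested!r} not allowed")
        # Add the tenant matcher (or keep it if it was already correct).
        return Selector(node.metric, {**node.matchers, label: tenant})
    if isinstance(node, Call):
        return Call(node.op, [enforce_tenant(a, tenant, label) for a in node.args])
    # Unknown node type: refuse rather than forward something unchecked.
    raise Forbidden(f"unsupported node {type(node).__name__}")


# Roughly: sum(rate(http_requests_total{job="api"}[5m]))
query = Call("sum", [Call("rate", [Selector("http_requests_total", {"job": "api"})])])
rewritten = enforce_tenant(query, tenant="acme")
```

The sketch only models equality matchers; regex and negative matchers on the tenant label would need the same fail-closed treatment before the rewritten query is forwarded (step 4).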

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-devel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/72a09823702c2354363cf90ee70c33533fceb6f7.camel%40conradwood.net.

Julius Volz

Feb 14, 2020, 9:59:13 AM
to Conrad Wood, Brian Brazil, Prometheus Developers
On Fri, Feb 14, 2020 at 2:01 PM Conrad Wood <c...@conradwood.net> wrote:
> That is a good point, I did not consider that labels might have been
> aggregated away. Clearly that needs to be considered.
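
A toy illustration of the aggregated-away problem, in case it helps. The series, the `tenant` label, and the helper names are made up for this sketch; it models something like `sum by (job)` over in-memory samples rather than calling Prometheus.

```python
from collections import defaultdict

# Hypothetical instant-vector samples: (labels, value).
samples = [
    ({"job": "api", "tenant": "acme"}, 10.0),
    ({"job": "api", "tenant": "globex"}, 32.0),
]


def sum_by(samples, keep):
    """Mimic `sum by (<keep>)`: every label not listed in `keep` is dropped."""
    totals = defaultdict(float)
    for labels, value in samples:
        key = tuple((name, labels[name]) for name in keep)
        totals[key] += value
    return [(dict(key), total) for key, total in totals.items()]


def select(samples, **matchers):
    """Mimic a selector with equality matchers, applied before any aggregation."""
    return [
        (labels, value)
        for labels, value in samples
        if all(labels.get(name) == want for name, want in matchers.items())
    ]


# Filtering after the fact: the result carries no tenant label to check,
# and its value already mixes both tenants' data.
unfiltered = sum_by(samples, keep=["job"])

# Restricting at the selector: only acme's series are ever loaded.
restricted = sum_by(select(samples, tenant="acme"), keep=["job"])
```

Here `unfiltered` holds a single series labelled only with `job`, so an evaluator looking at the result has nothing left to decide on, whereas `restricted` never touched the other tenant's data.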

> However, the ACLEvaluator also needs to look at the label values once
> the query returns (see Example 2). Or is there a way that one can query

Note that putting IP addresses or similar high-cardinality items into Prometheus label values doesn't work well unless the possible set of values is restricted to a reasonable number. Otherwise you'll blow up your Prometheus server immediately: a big server can handle a couple of million series that are present at the same time, and every unique combination of label values creates one series. So just putting public IPs into label values is usually a non-starter, since it multiplies up with other labels very quickly.

> for all values of a set of labels for a given metric without the
> datapoints in a given timerange?


...but again, you'll probably run into cardinality overload if you have an unbounded number of IPs in label values.

> Also, the "parsing of promql" - is that available in a library or as an
> RPC? If not, would that be also considered out-of-scope?

You would use Prometheus's "promql" Go package: https://godoc.org/github.com/prometheus/prometheus/promql

> Conrad

Conrad Wood

Feb 14, 2020, 10:13:07 AM
to Julius Volz, Brian Brazil, Prometheus Developers
Thank you,

I hear you re unbounded labels, especially with IPs. In this case there
is quite a small set of IPs (<2048), probably even fewer.
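
A rough back-of-the-envelope check of that, for what it is worth: the worst-case number of series for one metric is the product of the distinct values of each label. Only the 2048 IP bound comes from this thread; the other labels, their counts, and the reading of "a couple million" as 2,000,000 are assumptions made up for the sketch.

```python
from math import prod


def worst_case_series(label_cardinalities):
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical label set; only the 2048 IP bound is taken from the thread.
labels = {"ip": 2048, "direction": 2, "protocol": 3}
series = worst_case_series(labels)  # 2048 * 2 * 3

# Assumed reading of "a couple million" concurrently present series.
budget = 2_000_000
fraction_of_budget = series / budget
```

With these made-up numbers a single metric stays well under one percent of the assumed budget, but every additional label multiplies the total.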

The information you sent is most helpful. I believe I now have
sufficient information to put something together that will address my
use case outside of Prometheus.

Thanks again,

Conrad


Simon Pasquier

Feb 14, 2020, 11:43:55 AM
to Conrad Wood, Julius Volz, Brian Brazil, Prometheus Developers
You can have a look at those 2 projects too:
https://github.com/hoffie/prometheus-filter-proxy
https://github.com/openshift/prom-label-proxy

Paul Traylor

Feb 14, 2020, 6:52:48 PM
to Simon Pasquier, Conrad Wood, Julius Volz, Brian Brazil, Prometheus Developers
I built this based on prom-label-proxy to support mapping basicAuth to
a set of tenant label matchers
https://github.com/kfdm/promql-guard



--
Paul Traylor
http://kungfudiscomonkey.net/