[Proposal] User authentication and Datasource level authorization in Druid

Himanshu Gupta

Jan 15, 2016, 11:36:50 AM
to Druid Development
Hi,

If you want to operate a Druid cluster with multiple users, you would need some level of authentication and authorization in place. One option to enable that would be the following.

1) Add an abstract class/interface "AuthenticationFilter" that authenticates the "user" making the request, finds its authorization information (which dataSources it has read/write access to) and puts that in the request context.
2) All the HTTP endpoints retrieve the authorization information from the request context and accept or reject the request based on the user's privileges.

Druid core would only have the abstractions necessary to enable authentication and authorization; the actual implementation should be plugged in via extensions.
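The two steps above, and the split between core abstractions and extension implementations, could look roughly like the following plain-Java sketch. It is hypothetical, not Druid's actual API: Request stands in for a servlet request so the example runs without a container, and Authenticator, AuthFilter, StaticAuthenticator, the "auth.info" attribute key and the X-User header are all illustrative names. Core would own only the interfaces and the filter step; StaticAuthenticator plays the role of an extension.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Stand-in for an HTTP request: headers come in, attributes act as the request context.
class Request {
  final Map<String, String> headers = new HashMap<>();
  final Map<String, Object> attributes = new HashMap<>();
}

// The only thing endpoints see: what the authenticated user may do.
interface AuthorizationInfo {
  boolean canRead(String dataSource);

  boolean canWrite(String dataSource);
}

// Supplied by an extension: identify the user and look up their privileges.
interface Authenticator {
  Optional<AuthorizationInfo> authenticate(Request request);
}

// Core filter step: reject requests with no identifiable user, else attach privileges.
class AuthFilter {
  static final String AUTH_INFO_KEY = "auth.info"; // hypothetical attribute name

  private final Authenticator authenticator;

  AuthFilter(Authenticator authenticator) {
    this.authenticator = authenticator;
  }

  /** Returns true if the request may continue to the endpoint; false means respond 401. */
  boolean doFilter(Request request) {
    Optional<AuthorizationInfo> info = authenticator.authenticate(request);
    if (!info.isPresent()) {
      return false;
    }
    request.attributes.put(AUTH_INFO_KEY, info.get());
    return true;
  }
}

// Toy extension: trusts an "X-User" header and uses fixed privilege tables.
// Illustration only; a real extension must verify credentials, not trust a header.
class StaticAuthenticator implements Authenticator {
  private final Map<String, Set<String>> readable; // user -> readable dataSources
  private final Map<String, Set<String>> writable; // user -> writable dataSources

  StaticAuthenticator(Map<String, Set<String>> readable, Map<String, Set<String>> writable) {
    this.readable = readable;
    this.writable = writable;
  }

  @Override
  public Optional<AuthorizationInfo> authenticate(Request request) {
    String user = request.headers.get("X-User");
    if (user == null || !readable.containsKey(user)) {
      return Optional.empty(); // unknown user: the filter will reject the request
    }
    Set<String> readSet = readable.get(user);
    Set<String> writeSet = writable.getOrDefault(user, Set.of());
    return Optional.of(new AuthorizationInfo() {
      @Override
      public boolean canRead(String dataSource) {
        return readSet.contains(dataSource);
      }

      @Override
      public boolean canWrite(String dataSource) {
        return writeSet.contains(dataSource);
      }
    });
  }
}
```

Keeping the filter in core and only the Authenticator in an extension means every endpoint reads privileges from the same request attribute, however the user was identified.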

We will soon be working on this. Please provide any thoughts/concerns you might have.

-- Himanshu

Gian Merlino

Jan 15, 2016, 12:41:22 PM
to druid-de...@googlegroups.com
Hey Himanshu,

I'm wondering what the rationale is for adding the framework to Druid core rather than having it be 100% an extension at your site. I think you could already add request filters via extensions.

The reason I'm asking is that for a universal authorization framework, I think it would be nice to support something more fine-grained than per-datasource. There are use cases for both column- and row-level authorization (column: only certain teams can see privacy-sensitive dimensions; row: multitenant datasources where each tenant can only see rows with tenant = "some-tenant-id"). This would definitely be more work, though.

Perhaps one way to make that possible without doing all the work right now would be to set up the interfaces such that the implementation doesn't just return a whitelist of datasources, but actually makes the decision based on the user and the query. Some implementations might then look only at the query's datasource, while others might look at more parts of the query.
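One way to read this suggestion, as a hypothetical sketch: the authorizer is a function of (user, query) instead of a dataSource whitelist. QuerySpec is a drastically simplified query (a dataSource plus equality filters assumed to be ANDed at the top level), and all class names and the "tenant" dimension are made up for illustration. A datasource-level implementation looks only at the dataSource; a row-level one also requires the user's own tenant filter.

```java
import java.util.Map;
import java.util.Set;

// Simplified stand-in for a query: its dataSource plus equality filters on dimensions,
// assumed to be ANDed together at the top level of the query's filter.
final class QuerySpec {
  final String dataSource;
  final Map<String, String> equalityFilters; // dimension -> required value

  QuerySpec(String dataSource, Map<String, String> equalityFilters) {
    this.dataSource = dataSource;
    this.equalityFilters = equalityFilters;
  }
}

// The authorizer sees the user and the whole query, not just a list of dataSources.
interface QueryAuthorizer {
  boolean isAllowed(String user, QuerySpec query);
}

// Datasource-level implementation: only looks at the query's dataSource.
final class DataSourceAuthorizer implements QueryAuthorizer {
  private final Map<String, Set<String>> allowed; // user -> dataSources

  DataSourceAuthorizer(Map<String, Set<String>> allowed) {
    this.allowed = allowed;
  }

  @Override
  public boolean isAllowed(String user, QuerySpec query) {
    return allowed.getOrDefault(user, Set.of()).contains(query.dataSource);
  }
}

// Row-level implementation for multitenant dataSources: additionally requires the query
// to filter the tenant dimension on the user's own tenant id.
final class TenantAuthorizer implements QueryAuthorizer {
  private final QueryAuthorizer base;
  private final Map<String, String> tenantOf; // user -> tenant id
  private final String tenantDimension;

  TenantAuthorizer(QueryAuthorizer base, Map<String, String> tenantOf, String tenantDimension) {
    this.base = base;
    this.tenantOf = tenantOf;
    this.tenantDimension = tenantDimension;
  }

  @Override
  public boolean isAllowed(String user, QuerySpec query) {
    String tenant = tenantOf.get(user);
    return tenant != null
        && base.isAllowed(user, query)
        && tenant.equals(query.equalityFilters.get(tenantDimension));
  }
}
```

Rejecting unscoped queries is only one choice; an implementation could instead rewrite the query to add the tenant filter. Either way the interface does not change.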

Gian

--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-developm...@googlegroups.com.
To post to this group, send email to druid-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/0b41dbd6-347e-4c6a-9d49-f78ecc2d7580%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Himanshu

Jan 15, 2016, 2:04:06 PM
to druid-de...@googlegroups.com
Hi Gian,

Abstractions are necessary inside druid-core to do the authorization checks. The respective HTTP endpoint implementations are in the best position to identify which dataSource (or lower-level entity) is being accessed and whether the request is a read or a write on it. I believe we can keep the authorization abstractions generic enough to extend them to the column/row level when necessary.
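A hypothetical sketch of the endpoint side: only the resource knows whether the dataSource comes from a path parameter or from the JSON payload, and whether the call is a read or a write. Action, Privileges, DataSourceEndpoint and the paths in the comments are illustrative names, and integer status codes stand in for JAX-RS responses. The authEnabled flag is an assumed configuration switch; when it is off, the check short-circuits before the privileges are consulted.

```java
import java.util.Map;

enum Action { READ, WRITE }

// Hypothetical privileges object placed on the request by an authentication filter.
interface Privileges {
  boolean isAuthorized(String dataSource, Action action);
}

// Each endpoint knows where its dataSource comes from and whether the call reads or writes.
// Integer status codes stand in for JAX-RS Response objects.
final class DataSourceEndpoint {
  private final boolean authEnabled; // assumed config switch; when false, no check runs

  DataSourceEndpoint(boolean authEnabled) {
    this.authEnabled = authEnabled;
  }

  private boolean denied(String dataSource, Action action, Privileges privileges) {
    // Short-circuits when auth is disabled, so privileges may even be null.
    return authEnabled && !privileges.isAuthorized(dataSource, action);
  }

  // e.g. GET .../datasources/{name}: dataSource from the path, a READ.
  int getDataSource(String name, Privileges privileges) {
    return denied(name, Action.READ, privileges) ? 403 : 200;
  }

  // e.g. DELETE .../datasources/{name}: same path parameter, but a WRITE.
  int disableDataSource(String name, Privileges privileges) {
    return denied(name, Action.WRITE, privileges) ? 403 : 200;
  }

  // e.g. POST of a query: dataSource comes from the JSON payload, a READ.
  int postQuery(Map<String, Object> body, Privileges privileges) {
    Object dataSource = body.get("dataSource");
    if (!(dataSource instanceof String)) {
      return 400; // malformed request: no dataSource to authorize against
    }
    return denied((String) dataSource, Action.READ, privileges) ? 403 : 200;
  }
}
```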

-- Himanshu

Charles Allen

Jan 15, 2016, 2:15:04 PM
to Druid Development
What level are you wanting to do the authorization at?

My initial thought would be only at the router level or query optimization level?

On a macro level there are lots of different authorization schemes we can imagine. For example: dataSource, dimension within a datasource, only being able to select particular filters, or only seeing a particular subset of dimension values in a particular dimension in a datasource.

Limiting datasource access via authorization seems reasonable, but there are multiple authentication paradigms (straight-up user, user-roles, iam) which can be used to determine authorization. Can you elaborate a little on any limits of the kinds of authorization you are wanting to enable (this is probably tied to "what level shall we enforce it")?

Additionally, I'm concerned that any authorization we add deeper than "at the very top of the query stack" would be artificial. What level of "authorization" are you hoping to be able to enforce? We could go all the way to using Java security manager stuff, classloader isolation, and somehow tying to OS permissions (example: file permissions of segments stored on disk tied to an OS-knowledgeable user/group). How deep down the rabbit hole are you envisioning going?

Himanshu

Jan 15, 2016, 3:05:17 PM
to druid-de...@googlegroups.com


Hi Charles,

Please see inline.


On Fri, Jan 15, 2016 at 1:15 PM, Charles Allen <charle...@metamarkets.com> wrote:
What level are you wanting to do the authorization at?

My initial thought would be only at the router level or query optimization level?
[H] If we do the checks only at the router, then all possible interactions with Druid must happen via the router (not just query forwarding, but also the other things exposed by the coordinator, overlord and broker). Also, I don't want to make the router node a prerequisite for enabling authorization; most of our clusters don't use router nodes. Finally, given a request, the individual Jersey resources are the best judge of which dataSource is being accessed and whether it is being read or updated/written (e.g. sometimes the dataSource is in the request parameters, sometimes in the JSON request payload, etc.).

On a macro level there are lots of different authorization schemes we can imagine. For example: dataSource, dimension within a datasource, only being able to select particular filters, or only seeing a particular subset of dimension values in a particular dimension in a datasource.
[H] I'm planning on the dataSource level. Maybe in the future, when the need really arises, we can think about extending it to other things.

Limiting datasource access via authorization seems reasonable, but there are multiple authentication paradigms (straight-up user, user-roles, iam) which can be used to determine authorization. Can you elaborate a little on any limits of the kinds of authorization you are wanting to enable (this is probably tied to "what level shall we enforce it")?
[H] The authentication abstraction will only care about being able to identify a "user" from the request. If no user can be identified, the request is rejected immediately by the AuthFilter.
The authorization abstraction should be able to provide the privileges that the user has.
The implementations in individual extensions are then free to have the concepts of both users and user-groups (for ease of management, so that you assign privileges to user-groups instead of users, and add/remove users to/from user-groups).
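A sketch of how an extension might resolve privileges through user-groups, assuming read privileges only; GroupPrivileges and its maps are hypothetical names. Core would never see groups, only the per-user answer.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical extension-side store: dataSource privileges are granted to groups,
// and users get privileges through group membership.
final class GroupPrivileges {
  private final Map<String, Set<String>> groupsOfUser;    // user -> groups
  private final Map<String, Set<String>> readableByGroup; // group -> readable dataSources

  GroupPrivileges(Map<String, Set<String>> groupsOfUser, Map<String, Set<String>> readableByGroup) {
    this.groupsOfUser = groupsOfUser;
    this.readableByGroup = readableByGroup;
  }

  // Union of the dataSources readable by any of the user's groups.
  Set<String> readableDataSources(String user) {
    Set<String> result = new HashSet<>();
    for (String group : groupsOfUser.getOrDefault(user, Set.of())) {
      result.addAll(readableByGroup.getOrDefault(group, Set.of()));
    }
    return result;
  }

  boolean canRead(String user, String dataSource) {
    return readableDataSources(user).contains(dataSource);
  }
}
```

Revoking a user's access then means removing them from a group rather than editing per-user grants.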


Additionally, I'm concerned that any authorization we add deeper than "at the very top of the query stack" would be artificial. What level of "authorization" are you hoping to be able to enforce? We could go all the way to using Java security manager stuff, classloader isolation, and somehow tying to OS permissions (example: file permissions of segments stored on disk tied to an OS-knowledgeable user/group). How deep down the rabbit hole are you envisioning going?
[H] "User" identification will be totally implementation-dependent, and Druid core will only ascertain that the dataSource being accessed by the request is indeed allowed, given the authorization information provided.

Alexander Makarenko

Jan 15, 2016, 5:17:10 PM
to Druid Development
Hi guys,

I believe Druid follows, and should follow, the same security model as Redis does (http://antirez.com/news/96): it's totally insecure to let untrusted clients access the system, so protect it from the outside world yourself. Druid is a data store, and performance is critical, so any additional computation or allocation would be no good, IMHO. If you care about restricting fields or access to data, it would be better to put authentication and restriction functionality in the business logic of the actual web (or other) application.

Charles Allen

Jan 15, 2016, 5:56:52 PM
to Druid Development
Is it reasonable to pull auth information from the headers without relying on parsing the message body?
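For header-based schemes this works without touching the body. As one illustration (chosen here, not proposed in the thread), HTTP Basic authentication (RFC 7617) carries the credentials entirely in the Authorization header. Credentials and BasicAuthHeader are hypothetical names, and verifying the password is left to the extension. This covers authentication only: the dataSource being accessed may still be in the JSON payload, so the per-endpoint authorization check remains separate.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;

// Username and password as sent by the client; verifying them is left to the extension.
final class Credentials {
  final String user;
  final String password;

  Credentials(String user, String password) {
    this.user = user;
    this.password = password;
  }
}

final class BasicAuthHeader {
  private static final String SCHEME = "Basic ";

  // Parses an "Authorization: Basic <base64(user:password)>" header value (RFC 7617).
  // Only the header value is inspected; the request body is never read.
  static Optional<Credentials> parse(String headerValue) {
    // The auth scheme name is case-insensitive, hence regionMatches(ignoreCase = true).
    if (headerValue == null
        || !headerValue.regionMatches(true, 0, SCHEME, 0, SCHEME.length())) {
      return Optional.empty(); // missing header or a different scheme (e.g. Bearer)
    }
    byte[] decoded;
    try {
      decoded = Base64.getDecoder().decode(headerValue.substring(SCHEME.length()).trim());
    } catch (IllegalArgumentException e) {
      return Optional.empty(); // not valid Base64
    }
    String pair = new String(decoded, StandardCharsets.UTF_8);
    int colon = pair.indexOf(':'); // user ids cannot contain ':', passwords can
    if (colon < 0) {
      return Optional.empty();
    }
    return Optional.of(new Credentials(pair.substring(0, colon), pair.substring(colon + 1)));
  }
}
```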

Himanshu

Jan 19, 2016, 12:32:09 AM
to druid-de...@googlegroups.com
Hi Alexander,

We would like to operate a hosted multi-tenant Druid cluster where all the customers would be "trusted"; however, they should not be able to step on each other. This can be enabled by supporting the proposed authentication and dataSource-level authorization inside Druid. Whether the AuthFilter exists at runtime would depend on a configuration, so it wouldn't impact performance at all when security is disabled. Individual HTTP endpoint implementations (Jersey resources) would, however, include a check for whether to enforce authorization, which amounts to one if-check per query received (this should be negligible compared to the rest of query processing).
It is true that a Druid cluster can be hidden behind a web service that provides security; however, that introduces yet another component to be deployed, one that wraps pretty much all Druid interactions and needs to be kept up to date with Druid. Also, for our customers, we would like to keep the experience the same as if they had their own Druid cluster (for example, they should be able to visit the coordinator and overlord consoles, view information and take actions from there). It becomes much simpler when the auth implementation code can run as part of Druid itself.
Also, from time to time, we have seen requests in the community for the same.

-- Himanshu

Charles Allen

Jan 19, 2016, 12:41:19 PM
to Druid Development
Overall I think a basic level of auth (assuming it can be set to have the current behavior without performance impact) is a very reasonable ask and something that is required or at least heavily desired in many enterprise environments.

@Himanshu, are you wanting to break this up into small implementations (ex: queries first, metadata control second, ui third) or are you thinking of trying to get everything in with the same PR?

Himanshu

Jan 19, 2016, 10:31:30 PM
to druid-de...@googlegroups.com
I think one PR would be OK for all the Jersey HTTP endpoints, which should cover all interactions with Druid.

-- Himanshu

Charles Allen

Jan 21, 2016, 1:57:08 PM
to Druid Development
cool, looking forward to seeing it!

Udayakumar Pandurangan

Dec 19, 2016, 6:36:34 PM
to Druid Development
Hi Himanshu,
Any further updates on this? I'm keen to know if there is a security add-on that can do basic authentication.

Thanks,
Uday.

Himanshu

Dec 20, 2016, 2:48:19 PM
to druid-de...@googlegroups.com
There is no out-of-the-box security module, but Druid core has been updated to make it possible to plug in security: it is possible to write extensions that do authentication and authorization. Note, however, that this is still considered an experimental feature and things might change in the future.

At Yahoo, we wrote an extension that integrates with Yahoo's internal authentication and authorization system.

Create a Filter (for example, https://github.com/druid-io/druid/blob/master/server/src/main/java/io/druid/server/initialization/jetty/ResponseHeaderFilterHolder.java) that does the necessary authentication and then sets a concrete instance of AuthorizationInfo (https://github.com/druid-io/druid/blob/master/server/src/main/java/io/druid/server/security/AuthorizationInfo.java) in the request attribute AuthConfig.DRUID_AUTH_TOKEN.

Also, you would have to set "druid.auth.enabled=true" in your runtime.properties.

-- Himanshu

pja...@yahoo-inc.com

Dec 21, 2016, 1:52:33 AM
to Druid Development
To elaborate on what Himanshu mentioned, please read a detailed answer here - https://groups.google.com/d/msg/druid-user/23nVku3G4Rw/DZXYHy2vAgAJ

zhw...@gmail.com

Dec 25, 2016, 11:23:53 PM
to Druid Development
Is there any further plan to support "row level security" in Druid? The typical scenario is that personnel from department A should only see aggregations of data scoped to department A.