Add view support in Druid

655 views
Skip to first unread message

pala.m...@airbnb.com

unread,
Dec 1, 2017, 1:28:59 PM12/1/17
to Druid Development
Hi,

We are trying to write a Presto connector for Druid and in our use case, it would help to have view support in Druid. Internally, it seems there is support for view definition and sql rewrites (ViewManager) but there is no interface to define a view and implementation of a persisted view store (I am looking at v 0.10.1 sources).

We are thinking of the following approach:
1. Support view creation/update/deletion through the query REST endpoint, with a different query type.

2. Implement view persistence as a mysql table in the metadata storage, as tuples of view name

3. Implement a new ViewManager that reads from persisted views in metadata storage during initialization. 

Wanted to solicit comments on related work happening if any, and feedback on the above proposal.


Thanks.

pala.m...@airbnb.com

unread,
Dec 5, 2017, 4:56:37 PM12/5/17
to Druid Development
I am going through the implementation right now. Does anybody here have any thoughts on this?

Also, one question i have is that the ViewManager interface has dependency on PlannerFactory. Is that a correct dependency to maintain? Or should ViewManager deal with view name and view sql attributes and defer the DruidViewMacro creation to DruidSchema? Regardless, within DruidSchema we'd need PlannerFactory object it looks like - leading to circular dependency of objects. @Gian Merlino, any thoughts on this?

Finally, it will be ideal if we can support view creation within the SQL itself, but i haven't started looking into Calcite implementation and what we need to do within Druid to enable this. Please share any details if you have already looked into this.

Thank you.

Himanshu Gupta

unread,
Dec 7, 2017, 11:33:25 AM12/7/17
to Druid Development
Thanks, Presto integration is nice. However why does that have a dependency on view support?

When you say, "View persistence in metadata storage"? Do you mean storing materialized view or just the view definition? In general, use of metadata store is restricted  to be very small and to store configs and essential metadata only and also no feature uses it in a way requiring a high read/write throughput . My main concern is that, It appears that proposed view implementation would bring metadata store in the query path and expecting high read throughput on metadata store.

-- Himanshu

pala.m...@airbnb.com

unread,
Dec 7, 2017, 12:39:36 PM12/7/17
to Druid Development
Thanks for getting back Himanshu!

The reason we need view definition is to support pushdown of aggregate queries into Druid. Currently Presto's connector abstraction only allows purely select queries to be sent to underlying storage (by providing just table name and columns to project). So aggregation would then be happening in Presto, which defeats the purpose here. One suggested approach is to define a view in Druid that captures aggregations, then treat it as table in Presto engine and perform 'select *' query.

View persistence: I mean persist the view sql definition in a metadata table. If there was only one broker node, we can address your concern by caching the view definitions in memory. Given there could be multiple broker nodes, we do need the metadata table to be read on each query. We can employ more complicated approaches but its unclear if its worth it. Let me know if you have any other suggestions here.

It looks like you guys did start with some view support. Did you have any future plans on proceeding with view support?

Himanshu Gupta

unread,
Dec 7, 2017, 3:40:11 PM12/7/17
to Druid Development
I see, so its only for translating a plain select query that presto core can give to druid-presto-connector into a query with aggregations.

For now, given the use case, It might be better to just explore how well presto connector works and store the "view definition and other query translation info" in a separate place outside of Druid that is accessible to presto connector , e.g. in static configuration files deployed on each presto node , in a database somewhere that presto connector directly interacts with (maybe same database that you're using for Druid), in a json file stored on a shared storage system (hdfs ... ) etc.

Once you have the presto working and it proves really useful , Druid then should be enhanced for the specific support or maybe Presto should be enhanced to make it possible for connector writers to push down all the necessary things including aggregation information. But, all this should be after Presto connector is there and tried at scale and proves really valuable.


-- Himanshu

Himanshu Gupta

unread,
Dec 7, 2017, 3:58:52 PM12/7/17
to Druid Development
Also, related to existing view support and plans .. maybe Gian can answer that better as he has been working on the sql layer mostly.

Slim Bouguerra

unread,
Dec 7, 2017, 4:44:11 PM12/7/17
to Druid Development
Hi,
Am Wondering why this View Structor is not stored/maintained in Presto metadata node since most of the Query Planing is done by presto anyway that considers druid as a simple Query eng/data store with special capabilites.
In fact that is how the Hive Connector works, it maintains a view/metadata in the Hive Metastore about tables in Druid and after planing the query it generates a Calcite plan that gets converted to Druid native query.

 

On Friday, December 1, 2017 at 10:28:59 AM UTC-8, pala.m...@airbnb.com wrote:

pala.m...@airbnb.com

unread,
Dec 7, 2017, 6:03:25 PM12/7/17
to Druid Development
We have already tried the approach you propose here, with view metadata maintained purely within the presto connector, on a database. The problem is this approach doesn't help us because Presto coordinates with the connector but ultimately replaces the view with its definition, and then asks just for the raw column data of the view's underlying table from the connector, and executes aggregation itself, instead of pushing it down to druid. We can hack it in the Presto connector to get around this, but clearly is not a proper solution.

If Druid supports views, then it would be a cleaner solution given Presto connector model. Alternative would be for Presto to change its connector model to support pushdown, but it will have to be a rather big change and probably why not much traction on that side.

Will wait for Gian's response. Thanks.

pala.m...@airbnb.com

unread,
Dec 7, 2017, 6:07:20 PM12/7/17
to Druid Development
Presto's view support does all of this already, as you say. The problem is Presto connector interface has no notion of pushing down query operators at all, so connector will have no idea about the aggregation operators present in the query. To force pushdown, we are employing a trick in which we independently create a view on Druid, use that as a table within presto and thus have aggregation operations executed within Druid itself, instead of in Presto.

Gian Merlino

unread,
Dec 7, 2017, 7:03:59 PM12/7/17
to druid-de...@googlegroups.com
Hi Pala,

Sorry for the delay in getting back to you.

The view code is not fully wired up right now -- contributions are welcome and if you want to start working on it, I can help out with advice. As you noticed the main issue is that there is no way to use a ViewManager in production. It's only used in the unit tests. I think probably there are three approaches that would work in production.

1) store views as a key in the druid_config table, something like 'sql_views' with a value that is a json array of views.
2) store views in their own metadata table, like druid_sql_views.
3) store views in hardcoded configuration using runtime properties.

(1) is nice since it prevents the need to have a dedicated table, but (2) would scale better to a large number of views. and (3) is definitely the easiest, although it's not very flexible. I think probably I'd pick (2). As for how to manage views, I think ideally it should be possible to do it using SQL, like CREATE/ALTER/DROP VIEW commands. But REST would be easier to implement and is a good start.

It's also important to make sure views integrate well with the security layer. Ideally how it'd work is that a user could be granted permission to query a view but not to query the underlying datasource (this is how you'd build row/column level security). But there isn't a "Resource" type for views yet, just for datasources. So that should be sorted out eventually, but, it doesn't have to be right now. The next iteration of views could just be written such that a user can only query them if they have read permission to the underlying datasource. There should be unit tests for this to make sure the permission checking works properly.

Gian

--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-development+unsubscribe@googlegroups.com.
To post to this group, send email to druid-development@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/b4aea3e4-3de0-4a1c-9558-99bef08dcaff%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Gian Merlino

unread,
Dec 7, 2017, 7:12:04 PM12/7/17
to druid-de...@googlegroups.com
Btw, one other thing. You may (?) be able to get Presto doing what you want using common table expressions, a SQL feature that is like an inline view. The syntax is like:

WITH t AS (SELECT a, b FROM t1 GROUP BY a, b) 
  SELECT * FROM t;

But getting views working is still useful :)

Gian

Pala Muthiah

unread,
Dec 11, 2017, 6:48:09 PM12/11/17
to druid-de...@googlegroups.com
Hi Gian & others,

Thanks for responses.

Gian, your approach #2 was what we had started with. I have submitted a pull request to illustrate the approach. Security is not something i came across as i made changes. I will look more into that. My current changes, which works when manually tested, i have submitted as https://github.com/druid-io/druid/pull/5151



--
You received this message because you are subscribed to a topic in the Google Groups "Druid Development" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/druid-development/4j3F93ZIU9U/unsubscribe.
To unsubscribe from this group and all its topics, send an email to druid-development+unsubscribe@googlegroups.com.

To post to this group, send email to druid-development@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages