Handling large queries

to...@alloy.co

Feb 27, 2017, 11:13:24 AM
to airbnb_superset
Hey Superset team, I've tested Superset with some reasonably large queries that would have blocked my database for a few seconds if sent as SQL directly. In Superset, the same query takes a bit longer but doesn't put the same load on the DB. That's awesome; how is that being done? I've read through a bunch of discussions on the topic but can't pin down the exact method being used. Is it something about the pandas implementation?

Maxime Beauchemin

Feb 27, 2017, 7:34:11 PM
to airbnb_superset
Maybe the perceived improvement comes from the fact that we wrap user queries to limit how many rows they return. The config element is called ROW_LIMIT and defaults to 50000.

Max
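For readers wondering what "wrapping" a query looks like, here is a minimal sketch, assuming the limit is applied by nesting the user's SQL inside an outer `SELECT ... LIMIT`. `wrap_with_limit` is a hypothetical helper, not Superset's actual code, and `sqlite3` is used only so the example runs standalone.

```python
import sqlite3

# Illustrative value mirroring the ROW_LIMIT default mentioned above.
ROW_LIMIT = 50000


def wrap_with_limit(sql: str, limit: int = ROW_LIMIT) -> str:
    """Nest a user's query inside an outer SELECT that caps the row count.

    Hypothetical helper for illustration; not Superset's actual code.
    """
    inner = sql.strip().rstrip(";")  # a trailing semicolon would break the subquery
    return f"SELECT * FROM ({inner}) AS user_query LIMIT {int(limit)}"


if __name__ == "__main__":
    # Throwaway in-memory table with 100 rows, just to exercise the helper.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?)", [(i,) for i in range(100)])
    rows = conn.execute(wrap_with_limit("SELECT id FROM events;", limit=10)).fetchall()
    print(len(rows))  # 10
    conn.close()
```

A `LIMIT` lets many engines stop producing rows early, which may explain the lighter load; a query that sorts or aggregates a large table can still do most of its work before the limit applies.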

to...@alloy.co

Feb 28, 2017, 9:16:35 PM
to airbnb_superset
I think you may be right. Do you have a concept (even just high level) for allowing larger queries to be run in an efficient / safe manner? I've been thinking about this a lot and would be interested in working on it but can't think of the right way to go about it. 

Maxime Beauchemin

Mar 2, 2017, 2:29:47 PM
to airbnb_superset
What do you mean by efficient? Isn't that the database engine's job?

Max

to...@alloy.co

Mar 5, 2017, 1:56:45 PM
to airbnb_superset
Oh, of course. What I mean is: say a user wants to enable a larger row limit, maybe even an unlimited one. I'm trying to think of a way to implement, within Superset, a method for allowing that safely.

For example, if I enable a larger row limit and then give a few users access, I want to prevent them from locking up the database with large queries. I understand there may be no way to do this, but I was trying to think of strategies that could make it work.

to...@alloy.co

Mar 5, 2017, 2:01:13 PM
to airbnb_superset
Oh, check it out: this is the kind of thing I'm thinking about, and it looks like you're working on it for SQL Lab:


Very cool. If this is implemented as an option, what is the impact? Will longer-running queries just write their results out to an S3 bucket, so you can see them when they're done?
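The asynchronous pattern being asked about can be sketched roughly like this: the request only submits the query and gets back an id, a bounded pool of background workers runs it, and the results land in a separate store that the client reads once the status changes. This is a minimal, standard-library-only sketch under those assumptions. `InMemoryResultsBackend` and `AsyncQueryRunner` are hypothetical names, the dict stands in for something like an S3 bucket, and SQLite stands in for the real database. Superset's SQL Lab async mode is built on Celery workers and a configurable results backend, not on this code.

```python
import os
import sqlite3
import tempfile
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class InMemoryResultsBackend:
    """Hypothetical stand-in for an external results store such as an S3 bucket."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)


class AsyncQueryRunner:
    """Hypothetical runner: executes queries on worker threads, stores results by id."""

    def __init__(self, db_path, backend, max_workers=2):
        self.db_path = db_path
        self.backend = backend
        self._status = {}
        self._lock = threading.Lock()
        # max_workers caps how many queries hit the database at the same time.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, sql):
        query_id = str(uuid.uuid4())
        self._set_status(query_id, "running")
        self._pool.submit(self._run, query_id, sql)
        return query_id  # the client polls with this id instead of blocking

    def get_status(self, query_id):
        with self._lock:
            return self._status.get(query_id)

    def shutdown(self):
        self._pool.shutdown(wait=True)  # wait for all queued queries to finish

    def _set_status(self, query_id, status):
        with self._lock:
            self._status[query_id] = status

    def _run(self, query_id, sql):
        try:
            conn = sqlite3.connect(self.db_path)  # own connection per worker call
            try:
                rows = conn.execute(sql).fetchall()
            finally:
                conn.close()
            self.backend.set(query_id, rows)
            self._set_status(query_id, "success")
        except Exception as exc:  # record any failure so pollers don't wait forever
            self.backend.set(query_id, str(exc))
            self._set_status(query_id, "failed")


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE events (id INTEGER)")
    setup.executemany("INSERT INTO events VALUES (?)", [(i,) for i in range(1000)])
    setup.commit()
    setup.close()

    backend = InMemoryResultsBackend()
    runner = AsyncQueryRunner(path, backend)
    qid = runner.submit("SELECT COUNT(*) FROM events")
    runner.shutdown()  # the demo just waits; a real client would poll get_status
    print(runner.get_status(qid), backend.get(qid))  # success [(1000,)]
    os.remove(path)
```

Capping `max_workers` is one possible answer to the "don't lock up the database" concern above: extra submissions queue up instead of all running at once. It doesn't make a single expensive query any cheaper, so whether it is enough depends on the database.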