--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-developm...@googlegroups.com.
To post to this group, send email to druid-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/CACZNdYAw-%3DBeqeObkrUZqMTUD59Xu9vGfxgZM1SDjTuSJYqGWA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
+1
On Tue, Sep 12, 2017 at 2:10 AM Niketh Sabbineni <niketh.s...@gmail.com> wrote:
+1
We had our fair share of troubles using the select query.
On Friday, 28 July 2017 15:18:24 UTC-7, Gian Merlino wrote:

I was thinking about doing a patch to bring the scan-query contrib extension into core. Not a core extension, but actually into core itself (druid-processing). The main motivation to have it built in is so Druid SQL's default rules can use it instead of the Select query.

The motivation for this, in turn, is to avoid the memory use and performance issues with the Select query. It works well for returning small numbers of rows, but for larger numbers of rows it falls down. The Scan code was added in https://github.com/druid-io/druid/pull/3307 and that PR explains what is wrong with Select:

> select query cost lots of memory because it has to buffer a huge list of events in
> memory, and flushes until the list is ready. scan query flushes when a small batch
> is ready, the client can get the batch while the server is preparing the next batch

Even with a limit, Select can still use surprisingly large amounts of memory, since due to its parallel execution it generates potentially a lot more rows than are actually needed. Scan is single threaded, which works better for this kind of thing.

Along with porting over the Scan query I'd like to change it to return the __time column as "__time" rather than "timestamp". It's more authentic, and will play better with dataSources that actually have a column named "timestamp". We could add a flag to opt-in to the legacy behavior.

I'd also like to keep the name "scan", which means that people that were formerly using the contrib extension would have to do the following to migrate:

1. Change their queries to set the legacy-behavior flag
2. Modify Druid configs to unload the scan-query extension and do a rolling update to the new version

If I see no objections then I'll raise a PR.

Gian
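
For readers skimming the thread, the buffering difference described in the quoted PR text can be illustrated with a small Python sketch. This is not Druid code: `select_style`, `scan_style`, `fake_segment`, and the batch size are hypothetical names chosen for illustration, and the row shape is made up. It only shows the general idea of "buffer everything, then return" versus "hand over each small batch as soon as it is ready".

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, int]


def fake_segment(n: int) -> Iterator[Row]:
    """Hypothetical row source standing in for a segment scan."""
    for i in range(n):
        yield {"__time": i, "value": i * 10}


def select_style(rows: Iterable[Row]) -> List[Row]:
    """Buffer every row before returning anything.

    Memory use grows with the full result size, which mirrors the
    problem the thread attributes to Select.
    """
    return list(rows)


def scan_style(rows: Iterable[Row], batch_size: int = 3) -> Iterator[List[Row]]:
    """Yield each small batch as soon as it is full.

    At most one batch is held in memory at a time, and the consumer can
    process a batch while the next one is being prepared.
    """
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


if __name__ == "__main__":
    for b in scan_style(fake_segment(7), batch_size=3):
        print(len(b), "rows in batch")
```

The generator form is a deliberate choice here: the caller pulls batches one at a time, so peak memory is bounded by the batch size rather than by the total number of rows.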