GroupBy v2 strategy not merging results correctly when queried through broker


Daniel Cook

Jan 4, 2017, 1:19:25 PM
to Druid User
Hey, I recently upgraded from 0.9.1.1 to 0.9.2 and wanted to try out the new groupBy strategy, but it doesn't appear to be aggregating correctly when queries are sent through the broker.

If we send the query directly to the historical then it does return the correct result.

It sounds similar to another topic (https://groups.google.com/d/topic/druid-user/TVyS-B-QQ2E/discussion), but all of our nodes are running 0.9.2 and have been restarted several times, and the problem persists.

In a groupBy on two dimensions (key1, key2), multiple events with the same keys come back.

If I add a filter for a specific (key1, key2) then it will aggregate correctly.

When I query immediately after re-indexing, before any merge tasks on that datasource have run, I get more unmerged events than when I query after the merge tasks have run for that datasource.

The datasource and the query are both pretty basic (not even a nested groupBy or anything), so I feel like it's more likely that I'm missing something rather than running into a bug.
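For illustration, the query is shaped roughly like this (the datasource name, aggregator, and interval here are placeholders, not the real ones):

  {
    "queryType": "groupBy",
    "dataSource": "some_datasource",
    "granularity": "all",
    "dimensions": ["key1", "key2"],
    "aggregations": [
      { "type": "longSum", "name": "count", "fieldName": "count" }
    ],
    "intervals": ["2016-12-01/2017-01-01"]
  }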

Any help would be appreciated.

Gian Merlino

Jan 4, 2017, 2:08:13 PM
to druid...@googlegroups.com
Hey Daniel,

Could you attach the query you're using and the results you're getting?

Also, do you have any realtime stuff going on in your setup? If so: what kind (realtime node, tranquility, kafka indexing service)? And do you still have this problem if you exclude the realtime interval (query for older intervals only)?

Gian

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+unsubscribe@googlegroups.com.
To post to this group, send email to druid...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/8b783983-0bc8-4540-b70a-72dff1c82a57%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Daniel Cook

Jan 4, 2017, 2:30:47 PM
to Druid User
Attached is the query and some of the results.

There isn't any realtime stuff going on.

I believe each event has one other event it should have been merged with (I didn't check all of them, but every one I did check showed up twice). When I queried before the merge tasks ran, I was seeing more than 5 events that should have been merged together.


groupby_results.json
groupBy_query.json

Gian Merlino

Jan 4, 2017, 5:33:20 PM
to druid...@googlegroups.com
I see you don't have "groupByStrategy" in your query context… are you setting it through runtime properties? If so, what property are you setting and are you setting it on the brokers, or historicals, or both?

Gian


Daniel Cook

Jan 4, 2017, 6:35:42 PM
to Druid User
I have the below in the common.runtime.properties:
druid.query.groupBy.defaultStrategy=v2
druid.processing.numMergeBuffers=4

If I specify groupByStrategy=v1 in the query context then it works as expected.

The Broker has 3 processing threads and the Historicals have 2 processing threads. (druid.processing.numThreads)
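To be clear, forcing v1 just means adding it to the query context, e.g.:

  "context": { "groupByStrategy": "v1" }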


Gian Merlino

Jan 4, 2017, 6:44:55 PM
to druid...@googlegroups.com
What happens if you put "groupByStrategy": "v2" in the query context?

Gian


Daniel Cook

Jan 4, 2017, 6:52:12 PM
to Druid User
The same behavior as when I don't specify anything in the context.
I believe I've also tested specifying v2 in the context when I don't have the defaultStrategy runtime property set and saw the same behavior.


Gian Merlino

Jan 4, 2017, 7:18:49 PM
to druid...@googlegroups.com
Hmm, your query is pretty straightforward; I don't see any reason why it should be breaking.

Could you please attach your broker and historical runtime properties (and common properties)?

And are you totally sure everything is running 0.9.2? Not even any "unsupervised" daemons hanging around?

Gian


Daniel Cook

Jan 5, 2017, 9:14:11 AM
to Druid User
Everything appears to be running 0.9.2 and I don't see any old druid processes hanging around. Also, looking around in ZooKeeper, all of the announcements and listeners in there point to the currently running hosts/ports.


The other mysql and zookeeper properties are set from the command line.


broker.runtime.properties
historical.runtime.properties
common.runtime.properties

Gian Merlino

Jan 5, 2017, 10:30:26 AM
to druid...@googlegroups.com
I see you have caching enabled; if you add "useCache": false and "populateCache": false to your query context then do you get the correct results? If so, then I bet there is some problem with the caching.
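That is, something like this in the query context:

  "context": {
    "useCache": false,
    "populateCache": false
  }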

Gian


Daniel Cook

Jan 5, 2017, 10:54:44 AM
to Druid User
Ah, yeah, that did it! It also makes sense why it was working correctly when querying the historical directly, since caching was off on the historical.

Is there a need to flush the cache or something before using v2 or is caching itself just not playing nicely?


Daniel Cook

Jan 5, 2017, 11:14:27 AM
to Druid User
I know that groupBys are in the "uncacheable" list by default, but it looks like that's because they were overwhelming the cache (https://github.com/druid-io/druid/pull/638), not because caching wasn't functioning. Still interesting that v1 was working correctly.

Gian Merlino

Jan 5, 2017, 11:21:34 AM
to druid...@googlegroups.com
I'm 99% sure this is a bug somewhere in the caching for groupBy and not anything you're doing wrong. I'll raise an issue for it in a bit and investigate.

Could you try doing the query with "populateCache": true but "useCache": false? What kind of results do you get then?

Gian


Daniel Cook

Jan 5, 2017, 11:32:29 AM
to Druid User
"populateCache": "true", "useCache": "false"
and
"populateCache":"false", "useCache": "true"

both have the incorrect behavior.

Thanks for taking a look; if there's anything I can help with, just let me know.



Gian Merlino

Jan 5, 2017, 2:31:59 PM
to druid...@googlegroups.com
Hey Daniel,

I just raised this bug that I think you're hitting: https://github.com/druid-io/druid/issues/3820

I haven't tested this yet, but I think that if you move caching from broker to historical then groupBy v2 should work fine. Historical caching tends to scale better in large clusters anyway (it allows historicals to handle some of the merging work) so you might actually prefer this. If you have a chance to try that then please let me know.

The way to do that would be to set these properties on the historicals:

druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.historical.cache.unCacheable=[]

And then set useCache, populateCache to false on the broker.
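On the broker, that would be:

  druid.broker.cache.useCache=false
  druid.broker.cache.populateCache=false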

Thanks for reporting this issue.

Gian


Daniel Cook

Jan 5, 2017, 2:36:58 PM
to Druid User
Thanks! 

I'm following the GitHub issue and was going to do some testing tonight to see if {historical cache + v2} performs better than {broker cache + v1}.
I think when we first started we saw that caching on the brokers worked better for us than caching on historicals but maybe the boost from v2 is enough to offset that now.

