Historical vs Broker caching


mcap...@kochava.com

unread,
Sep 18, 2015, 12:58:21 AM9/18/15
to Druid User
After running some load testing I've found caching with memcached on broker nodes to be more performant than memcached on historicals, at least in the query load per second that it can handle. I'm curious why the production configs suggest caching on the historicals instead, and whether there is something that I overlooked that would make me want to switch back. Any insight on the matter would be awesome.

Thanks!

Michael
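For context, broker-side caching backed by memcached is configured with runtime properties along these lines (the hosts here are placeholders, not the poster's actual setup):

```properties
# broker/runtime.properties -- enable query result caching on the broker
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
# back the cache with memcached instead of the default local heap cache
druid.cache.type=memcached
druid.cache.hosts=memcached1:11211,memcached2:11211
```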

何文斌

unread,
Sep 20, 2015, 10:38:11 PM9/20/15
to Druid User
I have the same confusion. Is there any guide? Thanks.

On Friday, September 18, 2015 at 12:58:21 PM UTC+8, mcap...@kochava.com wrote:

Xavier Léauté

unread,
Sep 21, 2015, 6:35:50 PM9/21/15
to druid...@googlegroups.com
Thanks Michael. Do you have some numbers to share? It would be nice to see where the difference comes from, and whether it's an artifact of your benchmark setup/config or something else we can improve.

Typically the answer to your question is "it depends". It depends a lot on the types of queries you run, and it depends on your data. The broker can become a bottleneck if you have lots of segment results to merge, in which case it can be beneficial to off-load caching to the historical nodes in order to distribute the merging load.
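The historical-side alternative is symmetric: each historical caches its own per-segment results, which also distributes cache population across the cluster. A sketch (host values are placeholders):

```properties
# historical/runtime.properties -- cache per-segment results on the historicals
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=memcached
druid.cache.hosts=memcached1:11211
```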



--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+...@googlegroups.com.
To post to this group, send email to druid...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/0b53fe60-4e4f-487c-ab69-657b8ae945fc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jakub Liska

unread,
May 2, 2016, 12:18:22 PM5/2/16
to Druid User
What would you recommend for a small cluster:

master m4.large
middleManager m4.large
broker-1 m4.large
broker-2 m4.large
historical-1 m4.large
historical-2 m4.large

If the only production queries are of the form "select ... where ... group by ...", and only administrators use Pivot:

Since the broker cache doesn't cache "select" and "groupBy" queries by default, it should probably use only the historical cache, right?

Or do you think it is a good idea to use only the broker cache and make "select" and "groupBy" queries cacheable by removing them from druid.broker.cache.unCacheable?
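For reference, groupBy and select sit in the broker's uncacheable list by default, so making them cacheable means overriding that list with an empty one. A sketch, assuming broker-only caching:

```properties
# broker/runtime.properties -- cache all query types on the broker
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
# default is ["groupBy", "select"]; an empty list makes everything cacheable
druid.broker.cache.unCacheable=[]
```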

Xavier Léauté

unread,
May 2, 2016, 1:26:45 PM5/2/16
to druid...@googlegroups.com
Hi Jakub, given the small size of your cluster, caching on the broker will typically perform better.
However, I still encourage you to benchmark different configurations and draw your own conclusions.


Jakub Liska

unread,
May 2, 2016, 2:24:53 PM5/2/16
to druid...@googlegroups.com

Hi Xavier, but we issue only queries that are not cached by default on the broker node. Do you think it is wise to enable caching of select and groupBy queries?


Xavier Léauté

unread,
May 2, 2016, 2:29:17 PM5/2/16
to druid...@googlegroups.com
Sure, I would give it a try. Assuming the query results are not too large, I don't see any reason not to.

Jakub Liska

unread,
May 2, 2016, 5:12:53 PM5/2/16
to Druid User
Perfect, I'll give it a shot and write a query benchmark to see how it works out.


it is not recommended to enable caching on both Broker and Historical nodes

But it is not clear why, or what problems it might cause. I'll blindly follow this recommendation and disable caching on the historical nodes then :-)

Fangjin Yang

unread,
May 3, 2016, 7:50:32 PM5/3/16
to Druid User
There's no need to enable caching on both. If you turned on caching on the broker for example, the historical cache would never get used.

Xavier Léauté

unread,
May 3, 2016, 8:08:52 PM5/3/16
to druid...@googlegroups.com
Actually, you can enable caching on both, and also set cacheBulkMergeLimit, which limits the number of cache fetches the broker will attempt before falling back to querying the historical nodes.
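A hybrid sketch along those lines (the limit value here is arbitrary, for illustration only):

```properties
# broker/runtime.properties -- cache on the broker for small queries only
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
# past this many segment-level cache fetches, defer to the historicals
druid.broker.cache.cacheBulkMergeLimit=5

# historical/runtime.properties -- historicals cache everything beyond that
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
```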


Jakub Liska

unread,
Aug 10, 2016, 9:31:47 AM8/10/16
to Druid User
After 3 months in production I had to disable caching on the historical nodes, leaving only broker caching enabled, because we noticed that sometimes queries return empty results, even queries that span 20+ segments. I hope this helps.

Charles Allen

unread,
Aug 10, 2016, 10:43:53 AM8/10/16
to Druid User
That really shouldn't happen as a result of caching. Also, what kind of cache are you using?

On Wed, Aug 10, 2016 at 6:31 AM Jakub Liska <liska...@gmail.com> wrote:
After 3 months in production I had to disable caching on the historical nodes, leaving only broker caching enabled. Because we've noticed that sometimes queries return empty results. Even queries that span 20+ segments. I hope it helps.


Jakub Liska

unread,
Aug 10, 2016, 10:55:50 AM8/10/16
to Druid User
These used to be my docker-compose settings; otherwise I stuck to the defaults:

        - HISTORICAL_DRUID_HISTORICAL_CACHE_USECACHE=true
        - HISTORICAL_DRUID_HISTORICAL_CACHE_POPULATECACHE=true
        - HISTORICAL_DRUID_CACHE_SIZEINBYTES=1000000000
        - HISTORICAL_DRUID_CACHE_TYPE=local
        - BROKER_DRUID_BROKER_CACHE_USECACHE=true
        - BROKER_DRUID_BROKER_CACHE_POPULATECACHE=true
        - BROKER_DRUID_BROKER_CACHE_UNCACHEABLE=[]

And with these settings, all of a sudden some queries like:

SELECT COUNT(DISTINCT foo) WHERE blabla GROUP BY bar

that spanned 20+ segments started returning empty results ...

But now I've noticed that it started returning correct results after the plyql server was restarted:

plyql -c 2 -h broker:8082 -i P2Y --json-server 8099

I had submitted some illegal queries that crashed it:

/usr/local/lib/node_modules/plyql/node_modules/q/q.js:155
                throw e;
                      ^
Error: can not serialize an approximate unique value
    at UniqueAttributeInfo.serialize (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:715:19)
    at DruidExternal.makeSelectorFilter (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:3779:38)
    at DruidExternal.timelessFilterToDruid (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:3833:37)
    at DruidExternal.timelessFilterToDruid (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:3823:37)
    at DruidExternal.makeNativeAggregateFilter (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:4534:30)
    at DruidExternal.applyToAggregation (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:4686:36)
    at DruidExternal.getAggregationsAndPostAggregations (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:4701:22)
    at DruidExternal.getQueryAndPostProcess (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:4978:64)
    at DruidExternal.External.queryValue (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:3170:48)
    at ExternalExpression._computeResolved (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:6788:29)
PlyQL server listening on port: 8099


The plyql docker container restarted, and after that queries started returning correct results, so I suspect the problem could be caused by the plyql server. I'm using the release from the imply 1.2.1 distribution ...

Gian Merlino

unread,
Aug 10, 2016, 11:09:04 AM8/10/16
to druid...@googlegroups.com
Hey Jakub,

Like Charles said, it would be alarming if the Druid cache caused incorrect query results. We'd definitely treat that as a serious bug to fix asap. There aren't any correctness issues I'm currently aware of.

If your issue does look more like a plyql problem, would you mind reporting that through one of the imply channels? e.g. https://github.com/implydata/plyql/issues or https://groups.google.com/forum/#!forum/imply-user-group.

Gian


Jakub Liska

unread,
Aug 10, 2016, 11:17:14 AM8/10/16
to Druid User