should broker nodes be behind a load balancer?


Prashant Deva

Feb 23, 2015, 4:21:58 PM
to druid-de...@googlegroups.com
I am putting together a High Availability document on Druid that I will contribute to the official documentation.

There is something I am not entirely clear about:
It seems like the broker nodes should be behind a load balancer to achieve High Availability.
Or is there some other built-in Druid mechanism (like that for realtime or historical nodes) to cluster broker nodes?

Hagen Rother

Feb 23, 2015, 4:26:57 PM
to druid-de...@googlegroups.com
Brokers announce themselves in ZooKeeper, so an HA client can just go through that list. E.g. https://github.com/liquidm/ruby-druid/ does it.

This way, all you have to do is fire up more than one broker and be done.
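
A minimal sketch of that client-side approach, in Python rather than Ruby. It is not taken from ruby-druid. It assumes each ZooKeeper announcement is a JSON payload with "address" and "port" keys, and that the caller supplies a `send` function that raises `OSError` on connection failure; `broker_urls` and `query_with_failover` are names made up for this example.

```python
import json
from typing import Any, Callable, Iterable, List


def broker_urls(payloads: Iterable[bytes]) -> List[str]:
    """Turn announcement payloads into base URLs.

    Assumption: every payload is JSON with "address" and "port" keys.
    """
    urls = []
    for raw in payloads:
        entry = json.loads(raw)
        urls.append(f"http://{entry['address']}:{entry['port']}")
    return urls


def query_with_failover(
    urls: Iterable[str],
    query: dict,
    send: Callable[[str, dict], Any],
) -> Any:
    """Try each broker in turn and return the first successful answer."""
    last_error = None
    for url in urls:
        try:
            return send(url, query)
        except OSError as exc:  # covers connection errors and urllib's URLError
            last_error = exc
    raise ConnectionError("no broker answered") from last_error
```

With two brokers announced, a query only fails if both are unreachable, which is the point of running more than one.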

(Btw: is the hack of making an HTTP request for datasources still necessary to reliably identify a broker in ZooKeeper?)


--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-developm...@googlegroups.com.
To post to this group, send email to druid-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/c392975f-d54f-4b9c-9ae6-9d5d67fcc639%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Hagen Rother
Lead Architect | LiquidM

LiquidM Technology GmbH
Straße der Pariser Kommune 8 | 10243 Berlin | Germany
LinkedIn
Phone:+49 176 15 00 38 77
Internet:www.liquidm.com

Managing Directors | André Bräuer, Philipp Simon, Thomas Hille
Jurisdiction | Local Court Berlin-Charlottenburg HRB 152426 B

The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination, distribution, forwarding, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited without the express permission of the sender. If you received this communication in error, please contact the sender and delete the material from any computer.

Xavier Léauté

Feb 23, 2015, 4:31:09 PM
to druid-de...@googlegroups.com
Prashant, you can run as many brokers as you'd like. Either use your favorite load balancer, or use a Druid router node.

The router is a little more intelligent than your average load balancer in that it can be given rules to route queries to brokers in specific tiers based on query parameters. Have a look at the various routing strategies here: http://druid.io/docs/0.6.171/Router.html


Xavier Léauté

Feb 23, 2015, 4:35:20 PM
to druid-de...@googlegroups.com
Hagen, the service name broker nodes use to announce themselves is configurable, so you should use whatever you defined as the druid.service configuration parameter to determine broker nodes.

Starting with 0.7.0, all Druid nodes will have a default service name (e.g. druid/broker), which translates to druid:broker in ZooKeeper (':' is substituted for '/' in ZK names).
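
As a small illustration of that name mapping (the helper name `zk_service_name` is made up for this example and is not part of Druid):

```python
def zk_service_name(druid_service: str) -> str:
    """Map a druid.service value to the name it appears under in ZooKeeper.

    '/' is not usable inside a single ZK node name, so ':' takes its place.
    """
    return druid_service.replace("/", ":")
```

A client looking for the default brokers would therefore search for `zk_service_name("druid/broker")`, or for whatever custom druid.service value the cluster uses.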

Hagen Rother

Feb 23, 2015, 4:38:49 PM
to druid-de...@googlegroups.com
I read that as "no, you still need the hack". A configurable string is in no way a reliable way to detect the type of node.

A simple (static) type field in the zk entry is all it would take.



Xavier Léauté

Feb 23, 2015, 5:06:57 PM
to druid-de...@googlegroups.com
The reason the name is configurable is to allow for different tiers of brokers with different service names.
This allows someone to define which tier they would like to query.
By default the router uses the default broker service name, but it is possible to configure different tiers of brokers and tell the router which queries to send where.

Using just the datasources endpoint is not a reliable way to identify the node type if you run a router node, since the router node will also respond to that endpoint.
We could certainly add a node type to the zk entries if there is a need for it, but hopefully the router node will help alleviate that need down the road. 
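
To picture the tiered setup described above, here is a rough sketch of the idea only. It is not the router's actual implementation or configuration format. The two lookup tables, the function name `pick_broker_service`, and the choice to route on the query's dataSource are all assumptions made for illustration.

```python
from typing import Dict

# Default service name mentioned earlier in this thread.
DEFAULT_BROKER_SERVICE = "druid/broker"


def pick_broker_service(
    query: dict,
    datasource_tiers: Dict[str, str],
    tier_to_service: Dict[str, str],
    default: str = DEFAULT_BROKER_SERVICE,
) -> str:
    """Choose which broker service name should receive a query.

    Hypothetical rule: look up the tier for the query's dataSource, then the
    broker service configured for that tier; otherwise use the default.
    """
    datasource = query.get("dataSource")
    if not isinstance(datasource, str):
        return default  # e.g. nested or missing dataSource
    tier = datasource_tiers.get(datasource)
    return tier_to_service.get(tier, default)
```

Each distinct service name returned here would correspond to a separately announced group of brokers.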

Hagen Rother

Feb 23, 2015, 5:10:55 PM
to druid-de...@googlegroups.com
Like I said, ruby-druid uses zookeeper to get all available nodes (and provides a query api for available service names).

However, coordinators announce themselves there too, so I need the hack to filter them out. A static type in zookeeper would help to avoid that check.
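
For readers wondering what that check looks like, here is a rough Python sketch (not the ruby-druid code). The path `/druid/v2/datasources` is my assumption for the datasources endpoint, and `http_probe` and `filter_query_nodes` are names made up for this example. As noted earlier in the thread, a router would pass this check too.

```python
import urllib.request
from typing import Callable, Iterable, List

# Assumed path of the datasources endpoint used for the check.
DATASOURCES_PATH = "/druid/v2/datasources"


def http_probe(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the node answers the datasources request with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + DATASOURCES_PATH, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return False


def filter_query_nodes(
    base_urls: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> List[str]:
    """Keep only the announced nodes that pass the probe."""
    return [url for url in base_urls if probe(url)]
```

The probe is injectable so the filtering can be exercised without a running cluster.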



abhish...@media.net

Sep 22, 2017, 3:30:38 AM
to Druid Development

Could you please explain how the router decides which tier a query should be forwarded to?