Introducing HTTP based Zookeeper alternatives for Segment and Task management.

Himanshu Gupta

Jan 9, 2018, 3:56:25 PM

to Druid Development

Please see https://groups.google.com/d/msg/druid-development/eIWDPfhpM_U/AzMRxSQGAgAJ for the work on reducing usage of Zookeeper inside Druid and eventually making it optional.

The latest Druid code can now run a Druid cluster with Zookeeper usage reduced to the bare minimum, that is, only for discovering other Druid nodes and for leader election of the Overlord/Coordinator. If you’re already running a Druid cluster on the latest code in Druid master (to be released in Druid-0.13.0), the following options can be enabled.

1. Using HTTP based segment discovery on Coordinators and Brokers.
Set ‘druid.announcer.type=http’ and restart the Druid process on the Coordinators and Brokers.
Verification: On process start, "Starting HttpServerInventoryView." should be printed in the log.

2. Using HTTP based segment load/drop management at Coordinators.
Set ‘druid.coordinator.loadqueuepeon.type=http’ and restart the Druid process on the Coordinators. You can optionally configure ‘druid.coordinator.loadqueuepeon.http.batchSize’ on the Coordinator to process multiple load/drop requests in parallel. See its description in the doc at https://github.com/druid-io/druid/blob/83c6c48bed5312f20d7344358dd9804d080f68c1/docs/content/configuration/coordinator.md for more details.
Verification: In the Coordinator logs, the “HttpLoadQueuePeon” class should print messages as segment load/drop requests are prepared to be sent to historical nodes.

3. Using HTTP based Task management at Overlord running in remote mode, i.e., using MiddleManagers to run the tasks.
Set ‘druid.indexer.runner.type=remoteHttp’ and restart the Druid process on the Overlords. Note that all Overlords must use the same taskrunner type config at all times. So, update the configuration on all Overlords, then stop all of them, and then start all of them.
Verification: In the Overlord logs, something like “io.druid.indexing.overlord.hrtr.HttpRemoteTaskRunner - Starting...” should be printed on process start.

More configuration options exist to tweak certain behavior, but the defaults should work fine.
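
The verification lines in the three steps above can be checked mechanically. The following is only a minimal sketch: the marker strings are the ones quoted in the steps, while the names `EXPECTED_MARKERS` and `check_log` are made up for illustration and are not part of Druid.

```python
# Log markers quoted in the steps above, keyed by the option they verify.
# The dictionary and function names are hypothetical helpers, not Druid APIs.
EXPECTED_MARKERS = {
    "druid.announcer.type": "Starting HttpServerInventoryView.",
    "druid.coordinator.loadqueuepeon.type": "HttpLoadQueuePeon",
    "druid.indexer.runner.type": "HttpRemoteTaskRunner - Starting",
}


def check_log(log_text, option):
    """Return True if the marker for `option` appears anywhere in `log_text`."""
    return EXPECTED_MARKERS[option] in log_text
```

For example, `check_log(open(path).read(), "druid.announcer.type")` could be run against a Broker or Coordinator log after a restart, where `path` is wherever your deployment writes that log.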

For curious readers wondering how Coordinator HTTP based segment discovery manages to keep segment state up-to-date in realtime, see https://github.com/druid-io/druid/blob/83c6c48bed5312f20d7344358dd9804d080f68c1/server/src/main/java/io/druid/server/http/SegmentListerResource.java#L87 (basically an implementation of “long polling” via HTTP). Overlord HTTP based Task management uses the same mechanism to keep task state up-to-date.
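
To make the “long polling” idea concrete, here is a small in-memory sketch. It is not Druid's implementation and involves no HTTP: `ChangeLog`, `poll`, and `sync_once` are hypothetical names, and a `threading.Condition` stands in for a server that holds a request open until something changes or a timeout expires. The client remembers a counter and only ever asks for changes after it.

```python
import threading


class ChangeLog:
    """In-memory stand-in for a server that answers long-poll requests."""

    def __init__(self):
        self._changes = []  # ordered list of (action, segment) events
        self._cond = threading.Condition()

    def append(self, change):
        # Record a change and wake up every poller currently waiting.
        with self._cond:
            self._changes.append(change)
            self._cond.notify_all()

    def poll(self, last_seen, timeout):
        """Block until there are changes after index `last_seen`, or until
        `timeout` seconds pass. Returns (new_counter, list_of_changes)."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._changes) > last_seen, timeout)
            return len(self._changes), self._changes[last_seen:]


def sync_once(log, view, counter, timeout=1.0):
    """One long-poll round trip: apply whatever changed to the local view."""
    counter, changes = log.poll(counter, timeout)
    for action, segment in changes:
        if action == "load":
            view.add(segment)
        else:  # treat anything else as a drop
            view.discard(segment)
    return counter
```

A client would call `sync_once` in a loop, passing back the counter it got from the previous call. When nothing has changed, the call simply returns after the timeout with the same counter and an unchanged view, so the client can immediately poll again.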

Note: The above options are available only for testing bleeding edge Druid code at this time and should not be enabled in production. Once the above options are verified in multiple test clusters, we will start removing the hard coded Zookeeper based Segment and Task management.

Thanks,
Himanshu

Himanshu Gupta

Jan 9, 2018, 4:00:42 PM

to Druid Development

This post is a result of the discussion in dev-sync about starting to test these features on the test clusters used by various Druid developers.

Xavier Léauté

Jan 11, 2018, 1:23:22 PM

to druid-de...@googlegroups.com

Thanks Himanshu, very exciting! Is there any specific procedure to follow to upgrade from ZK to HTTP announcement, or is it as simple as following the steps you described, in that order?

--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-developm...@googlegroups.com.
To post to this group, send email to druid-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/aa09da79-54fc-4c51-ae10-7edcbc4cbadb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Himanshu Gupta

Jan 11, 2018, 9:21:33 PM

to Druid Development

Thanks Xavier. The steps described here are enough. All 3 things are completely independent of each other and can be done in any order.

For Brokers, you can even have a setup where some Brokers use the old Zookeeper based code for segment discovery while other Brokers use the HTTP based one.

Gian Merlino

Feb 2, 2018, 5:50:17 AM

to druid-de...@googlegroups.com

I see you put a trap here to check if people are actually reading your instructions! I tried "remoteHttp" but found that it should really be spelled "httpRemote".

I just set all three of these properties in our test cluster, and it worked after this patch: https://github.com/druid-io/druid/pull/5329.

One thing I noticed is that my broker took a long time to start up (almost 10 minutes). I raised this issue describing what I saw: https://github.com/druid-io/druid/issues/5331.

Gian

Himanshu Gupta

Feb 2, 2018, 1:04:51 PM

to Druid Development

> I see you put a trap here to check if people are actually reading your instructions! I tried "remoteHttp" but found that it should really be spelled "httpRemote".

Yes, the trap was that you have to read the code to figure out the correct incantation, because it is wrong even in the Druid documentation. Fixing it now: https://github.com/druid-io/druid/pull/5334
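
For anyone collecting the settings from this thread in one place, the fragment below is a sketch of how they might look, using the "httpRemote" spelling from the correction above. It assumes the usual per-service runtime.properties files. The batchSize line is commented out and its value is only a placeholder, not a recommendation.

```properties
# Coordinators and Brokers: HTTP based segment discovery
druid.announcer.type=http

# Coordinators: HTTP based segment load/drop management
druid.coordinator.loadqueuepeon.type=http
# Optional. Placeholder value only; see the linked coordinator doc for its meaning.
# druid.coordinator.loadqueuepeon.http.batchSize=1

# Overlords: HTTP based task management.
# All Overlords must use the same value at all times.
druid.indexer.runner.type=httpRemote
```

Each property only needs to be set on the node types named in its comment, and the three options are independent of each other.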