historical servers stopped downloading segments from HDFS


Venkatesh Kavuluri

Nov 4, 2014, 3:15:52 PM
to druid-de...@googlegroups.com
Hi All,

I have set up a Druid cluster with the latest RC (0.6.160) and am using HDFS (hadoop-2.4.x) as deep storage. I am running my hourly index jobs using the local firehose.

The indexed segments for about 5 days were successfully uploaded to HDFS, and the compute nodes downloaded them without any issues.

However, the compute nodes now seem to be stuck: I don't see them downloading new segments from HDFS even though I see the corresponding entries in the coordinator logs. I don't see any relevant entries in the ZooKeeper logs either.

I didn't configure any custom rules - the default rule is active:
id dataSource version payload
_default_2014-10-31T21:48:25.308Z _default 2014-10-31T21:48:25.308Z [{"tieredReplicants":{"_default_tier":2},"type":"loadForever"}]

Here are the messages I see in the logs. I'm not sure why the compute nodes suddenly stopped downloading new segments. Can someone please help me understand the issue?

---- coordinator log requesting a compute node to download segment data. I don't see any response in the compute node or ZK logs.

2014-11-04 19:33:39,991 INFO [Coordinator-Exec--0] io.druid.server.coordinator.LoadQueuePeon - Asking server peon[/druid/prod/loadQueue/druid-historical4-413974.XXXX:8080] to load segment[mors_2014-10-29T10:00:00.000Z_2014-10-29T11:00:00.000Z_2014-11-04T19:18:52.733Z]

---- queued segment loads which don't get processed

2014-11-04 19:32:39,982 INFO [Coordinator-Exec--0] io.druid.server.coordinator.helper.DruidCoordinatorLogger - Server[druid-historical4-413974.XXXX:8080, historical, _default_tier] has 20 left to load, 5 left to drop, 428,516,430 bytes queued, 234,046,722 bytes served.
2014-11-04 19:32:39,982 INFO [Coordinator-Exec--0] io.druid.server.coordinator.helper.DruidCoordinatorLogger - Server[druid-historical3-413973.XXXX:8080, historical, _default_tier] has 37 left to load, 4 left to drop, 616,042,313 bytes queued, 60,321,960 bytes served.
2014-11-04 19:32:39,982 INFO [Coordinator-Exec--0] io.druid.server.coordinator.helper.DruidCoordinatorLogger - Server[druid-historical2-413396.XXXX:8080, historical, _default_tier] has 0 left to load, 0 left to drop, 0 bytes queued, 687,011,066 bytes served.
2014-11-04 19:32:39,982 INFO [Coordinator-Exec--0] io.druid.server.coordinator.helper.DruidCoordinatorLogger - Server[druid-historical1-410557.XXXX:8080, historical, _default_tier] has 8 left to load, 3 left to drop, 174,584,976 bytes queued, 501,216,378 bytes served.


I also see the errors below:

2014-11-04 19:33:39,988 ERROR [Coordinator-Exec--0] io.druid.server.coordinator.ReplicationThrottler - [_default_tier]: Replicant create queue stuck after 15+ runs!: {class=io.druid.server.coordinator.ReplicationThrottler, segments=[mors_2014-10-31T22:00:00.000Z_2014-10-31T23:00:00.000Z_2014-11-03T05:04:17.356Z ON druid-historical3-413973.XXXX:8080]}
2014-11-04 19:33:39,988 INFO [Coordinator-Exec--0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"alerts","timestamp":"2014-11-04T19:33:39.988Z","service":"druid/prod/coordinator","host":"druid-coordinator1-410556.XXXX:8080","severity":"component-failure","description":"[_default_tier]: Replicant create queue stuck after 15+ runs!","data":{"class":"io.druid.server.coordinator.ReplicationThrottler","segments":["mors_2014-10-31T22:00:00.000Z_2014-10-31T23:00:00.000Z_2014-11-03T05:04:17.356Z ON druid-historical3-413973.XXXX:8080"]}}]



2014-11-04 19:34:40,404 ERROR [Coordinator-Exec--0] io.druid.server.coordinator.helper.DruidCoordinatorBalancer - [_default_tier]: Balancer move segments queue has a segment stuck: {class=io.druid.server.coordinator.helper.DruidCoordinatorBalancer, segment=mors_2014-10-30T18:00:00.000Z_2014-10-30T19:00:00.000Z_2014-11-04T04:35:05.122Z, server=DruidServerMetadata{name='druid-historical2-413396.XXXX:8080', host='druid-historical2-413396.XXXX:8080', maxSize=20000000000, tier='_default_tier', type='historical', priority='0'}}


2014-11-04 19:44:17,672 ERROR [Master-PeonExec--0] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/prod/loadQueue/druid-historical4-413974.XXXX:8080], throwable caught when submitting [SegmentChangeRequestLoad{segment=DataSegment{size=30151395, shardSpec=NoneShardSpec, metrics=[rowcount], dimensions=[channel, e, incc, mpid, oid, page, ppid], version='2014-11-04T04:53:12.006Z', loadSpec={type=hdfs, path=hdfs://XXXX:8020/user/dgdeploy/druid/segments/mors/20141031T160000.000Z_20141031T170000.000Z/2014-11-04T04_53_12.006Z/0/index.zip}, interval=2014-10-31T16:00:00.000Z/2014-10-31T17:00:00.000Z, dataSource='mors', binaryVersion='9'}}].
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /druid/prod/loadQueue/druid-historical4-413974.XXXX:8080/mors_2014-10-31T16:00:00.000Z_2014-10-31T17:00:00.000Z_2014-11-04T04:53:12.006Z
at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
at io.druid.server.coordinator.LoadQueuePeon$1.run(LoadQueuePeon.java:224)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)



In zookeeper logs

10557.XXXX:8080/mors_2014-11-01T16:00:00.000Z_2014-11-01T17:00:00.000Z_2014-11-03T06:04:55.648Z
2014-11-04 12:01:23,995 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1497b91f74d0010 type:create cxid:0x279 zxid:0x4b3 txntype:-1 reqpath:n/a Error Path:/druid/prod/loadQueue/druid-historical4-413974.XXXX:8080/mors_2014-10-30T15:00:00.000Z_2014-10-30T16:00:00.000Z_2014-11-03T01:17:18.413Z Error:KeeperErrorCode = NodeExists for /druid/prod/loadQueue/druid-historical4-413974.XXXX:8080/mors_2014-10-30T15:00:00.000Z_2014-10-30T16:00:00.000Z_2014-11-03T01:17:18.413Z
2014-11-04 12:01:24,009 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1497b91f74d0010 type:create cxid:0x27a zxid:0x4b4 txntype:-1 reqpath:n/a Error Path:/druid/prod/loadQueue/druid-historical1-410557.XXXX:8080/mors_2014-11-01T09:00:00.000Z_2014-11-01T10:00:00.000Z_2014-11-03T05:40:07.589Z Error:KeeperErrorCode = NodeExists for /druid/prod/loadQueue/druid-historical1-410557.XXXX:8080/mors_2014-11-01T09:00:00.000Z_2014-11-01T10:00:00.000Z_2014-11-03T05:40:07.589Z
2014-11-04 12:01:24,022 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1497b91f74d0010 type:create cxid:0x27b zxid:0x4b5 txntype:-1 reqpath:n/a Error Path:/druid/prod/loadQueue/druid-historical4-413974.XXXX:8080/mors_2014-10-30T07:00:00.000Z_2014-10-30T08:00:00.000Z_2014-11-02T19:11:15.806Z Error:KeeperErrorCode = NodeExists for /druid/prod/loadQueue/druid-historical4-413974.XXXX:8080/mors_2014-10-30T07:00:00.000Z_2014-10-30T08:00:00.000Z_2014-11-02T19:11:15.806Z
2014-11-04 12:01:24,035 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1497b91f74d0010 type:create cxid:0x27c zxid:0x4b6 txntype:-1 reqpath:n/a Error Path:/druid/prod/loadQueue/druid-historical1-410557.XXXX:8080/mors_2014-10-30T17:00:00.000Z_2014-10-30T18:00:00.000Z_2014-11-03T01:27:38.793Z Error:KeeperErrorCode = NodeExists for /druid/prod/loadQueue/druid-historical1-410557.XXXX:8080/mors_2014-10-30T17:00:00.000Z_2014-10-30T18:00:00.000Z_2014-11-03T01:27:38.793Z

Venkatesh Kavuluri

Nov 4, 2014, 8:30:14 PM
to druid-de...@googlegroups.com
I am retrying with "0.6.144". I scaled down my cluster from 3 middle managers and 4 historicals to 1 middle manager and 2 historicals, and cleaned up all caches and the segments table.

After running the first index job, the coordinator asked both historicals to load the segments (as the replication factor is 2). While one of the historical nodes loaded the segments successfully, the other historical node got stuck during the download process (I do see an attempt to initiate the download in the logs). After restarting the historical service on the second node, it was able to load the segment successfully.

What could be causing the second historical node to get stuck during the first attempt (before the process restart)? Both historical nodes are identical instances, with the processes running with the same memory parameters.

-- historical service
java -server -Xmx5g -Xms5g -XX:NewSize=1g -XX:MaxNewSize=1g -XX:MaxDirectMemorySize=8g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp -classpath /opt/druid/lib/*:/opt/druid/config/historical:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/conf:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/common/lib/*:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/common/*:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/hdfs:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/hdfs/lib/*:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/hdfs/*:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/*:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/*:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/mapreduce/lib/*:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/mapreduce/*:/apache/hbase/lib/*.jar::/apache/hadoop/share/hadoop/yarn:/apache/hadoop/share/hadoop/mapreduce io.druid.cli.Main server historical

druid.host=druid-historical1-410557.XXXX
druid.service=druid/prod/historical
druid.port=8080

druid.zk.service.host=zk1-3810.XXXX
druid.zk.paths.base=/druid/prod1
druid.discovery.curator.path=/prod/discovery1

druid.extensions.coordinates=["io.druid.extensions:druid-hdfs-storage:0.6.160"]

druid.server.maxSize=20000000000

# Change these to make Druid faster
druid.processing.buffer.sizeBytes=1073741824
druid.processing.numThreads=7
druid.server.http.numThreads=50

druid.segmentCache.locations=[{"path": "/opt/druid/working/druidCache1/indexCache", "maxSize": 20000000000}]

#druid.monitoring.monitors=["io.druid.server.metrics.ServerMonitor", "com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor"]

# Emit metrics over http
#idruid.emitter=http
#druid.emitter.http.recipientBaseUrl=#{EMITTER_URL}

# If you choose to compress ZK announcements, you must do so for every node type
druid.announcer.type=batch
druid.curator.compress=true

druid.query.groupBy.maxResults=1000000

druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://XXXX:8020/apps/hdmi-paypal/dgdeploy/druid/segments

-- zookeeper service
java -Dzookeeper.log.dir=/tmp/zoo_logs -Dzookeeper.root.logger=INFO,CONSOLE -cp /opt/zookeeper/bin/../build/classes:/opt/zookeeper/bin/../build/lib/*.jar:/opt/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/opt/zookeeper/bin/../lib/netty-3.7.0.Final.jar:/opt/zookeeper/bin/../lib/log4j-1.2.16.jar:/opt/zookeeper/bin/../lib/jline-0.9.94.jar:/opt/zookeeper/bin/../zookeeper-3.4.6.jar:/opt/zookeeper/bin/../src/java/lib/*.jar:/opt/zookeeper/bin/../conf: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /opt/zookeeper/bin/../conf/zoo.cfg

cat ../conf/zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/dgdeploy/zookeeper
# the port at which the clients will connect
clientPort=2182
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1


Fangjin Yang

Nov 4, 2014, 8:44:13 PM
to druid-de...@googlegroups.com
Hi Venkatesh, what version of ZooKeeper are you using? We've seen problems before with Curator missing announcements when the versions of Curator and ZooKeeper did not match. If you are using ZK 3.4.6, then something else very strange is happening.

Fangjin Yang

Nov 4, 2014, 9:01:52 PM
to druid-de...@googlegroups.com
As a quick recovery, you can restart the coordinator. Does your broker seem correct or does it have a bad view of the world?

Venkatesh Kavuluri

Nov 4, 2014, 10:06:45 PM
to druid-de...@googlegroups.com
Hi Fangjin,

Thanks for the reply. I am using ZK 3.4.6. This issue doesn't seem to happen all the time: after I saw it with the first hour's index file load, things have been fine for the next three runs. I will get back for help if the issue appears again.

Thanks,
Venkatesh

Venkatesh Kavuluri

Nov 4, 2014, 10:12:25 PM
to druid-de...@googlegroups.com
The broker has a correct view of the segments, no issues there.

Fangjin Yang

Nov 4, 2014, 10:23:41 PM
to druid-de...@googlegroups.com

The coordinator is on 0.6.160 as well?


Venkatesh Kavuluri

Nov 5, 2014, 11:54:28 AM
to druid-de...@googlegroups.com
Yes, the coordinator was on 0.6.160 (all the services were). Before I heard from you, I thought I should try the stable version, which was 0.6.144, and see if the historicals load segments. After I downgraded to 0.6.144, I noticed that 0.6.160 had been promoted from RC to stable. Now all the services are on 0.6.144.

I will start crunching data with 0.6.144 today and get back to you.

Fangjin Yang

Nov 5, 2014, 11:56:06 AM
to druid-de...@googlegroups.com
We've been running 0.6.160 for several weeks in a cluster with several hundred nodes and haven't seen this problem yet. FWIW, the last stable was 0.6.146, not 0.6.144, and I'm not sure why the problem would exist in one version and not the other...

Venkatesh Kavuluri

Nov 5, 2014, 12:33:46 PM
to druid-de...@googlegroups.com
Right, as you mentioned earlier this could be due to some other weird issue. I will upgrade the cluster back to 0.6.160 and try.

Venkatesh Kavuluri

Nov 5, 2014, 10:32:42 PM
to druid-de...@googlegroups.com
Hi Fangjin,

I am stuck with the same issue again. I was able to trace the actual error messages in the logs. While trying to load a segment with replication factor 2, one historical node loaded it fine and the other failed. After this, both historical nodes stopped loading segments. Also, I don't see any info messages in the ZooKeeper logs. Do you think it's a problem with ZooKeeper communication?

zookeeper
====================
2014-11-05 18:56:07,145 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1497d5454da0004 type:create cxid:0x2b0 zxid:0x299 txntype:-1 reqpath:n/a Error Path:/druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080/mors_hadoop_2014-10-30T05:00:00.000Z_2014-10-30T06:00:00.000Z_2014-11-06T01:16:37.269Z Error:KeeperErrorCode = NodeExists for /druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080/mors_hadoop_2014-10-30T05:00:00.000Z_2014-10-30T06:00:00.000Z_2014-11-06T01:16:37.269Z


coordinator
==================================
2014-11-06 02:11:07,892 INFO [Coordinator-Exec--0] io.druid.server.coordinator.LoadQueuePeon - Asking server peon[/druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080] to load segment[mors_hadoop_2014-10-30T05:00:00.000Z_2014-10-30T06:00:00.000Z_2014-11-06T01:16:37.269Z]



2014-11-06 02:11:07,011 ERROR [Master-PeonExec--0] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080], throwable caught when submitting [SegmentChangeRequestLoad{segment=DataSegment{size=562968, shardSpec=NoneShardSpec, metrics=[rowcount], dimensions=[mktgcaids, e, ppid, page, oid, incc, channel, mpid], version='2014-11-06T01:43:20.227Z', loadSpec={type=hdfs, path=/apps/hdmi-paypal/dgdeploy/druid/segments/mors_hadoop/mors_hadoop/20141030T060000.000Z_20141030T070000.000Z/2014-11-06T01_43_20.227Z/0/index.zip}, interval=2014-10-30T06:00:00.000Z/2014-10-30T07:00:00.000Z, dataSource='mors_hadoop', binaryVersion='9'}}].
        at io.druid.server.coordinator.LoadQueuePeon$1$1.run(LoadQueuePeon.java:234)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2014-11-06 02:11:07,012 INFO [Master-PeonExec--0] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080] loading [mors_hadoop_2014-10-30T05:00:00.000Z_2014-10-30T06:00:00.000Z_2014-11-06T01:16:37.269Z]
2014-11-06 02:11:07,012 INFO [Master-PeonExec--0] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080] processing segment[mors_hadoop_2014-10-30T05:00:00.000Z_2014-10-30T06:00:00.000Z_2014-11-06T01:16:37.269Z]
2014-11-06 02:11:07,026 ERROR [Master-PeonExec--0] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080], throwable caught when submitting [SegmentChangeRequestLoad{segment=DataSegment{size=647107, shardSpec=NoneShardSpec, metrics=[rowcount], dimensions=[mktgcaids, e, ppid, page, oid, incc, channel, mpid], version='2014-11-06T01:16:37.269Z', loadSpec={type=hdfs, path=/apps/hdmi-paypal/dgdeploy/druid/segments/mors_hadoop/mors_hadoop/20141030T050000.000Z_20141030T060000.000Z/2014-11-06T01_16_37.269Z/0/index.zip}, interval=2014-10-30T05:00:00.000Z/2014-10-30T06:00:00.000Z, dataSource='mors_hadoop', binaryVersion='9'}}].
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080/mors_hadoop_2014-10-30T05:00:00.000Z_2014-10-30T06:00:00.000Z_2014-11-06T01:16:37.269Z
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
        at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
        at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
        at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
        at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
        at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
        at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
        at io.druid.server.coordinator.LoadQueuePeon$1.run(LoadQueuePeon.java:224)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

historical
====================
2014-11-06 01:26:06,965 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - New request[LOAD: mors_hadoop_2014-10-30T05:00:00.000Z_2014-10-30T06:00:00.000Z_2014-11-06T01:16:37.269Z] with node[/druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080/mors_hadoop_2014-10-30T05:00:00.000Z_2014-10-30T06:00:00.000Z_2014-11-06T01:16:37.269Z].
2014-11-06 01:26:06,966 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Loading segment mors_hadoop_2014-10-30T05:00:00.000Z_2014-10-30T06:00:00.000Z_2014-11-06T01:16:37.269Z

(and I don't see the acknowledgement of segment load completion message).

Venkatesh Kavuluri

Nov 5, 2014, 10:38:02 PM
to druid-de...@googlegroups.com
Forgot to mention: I saw this issue appear after 11 segments were loaded successfully on a fresh cluster. No more newly indexed segments are getting loaded, and all I see in the ZooKeeper logs are the error messages below. I am running the index jobs on batch data in parallel (7 jobs on individual hours at a time, local firehose), and all the segments would be loaded to the historical nodes. Could this be the reason?

2014-11-05 19:26:07,209 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1497d5454da0004 type:create cxid:0x2c6 zxid:0x2ae txntype:-1 reqpath:n/a Error Path:/druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080/mors_hadoop_2014-10-29T22:00:00.000Z_2014-10-29T23:00:00.000Z_2014-11-06T02:37:52.947Z Error:KeeperErrorCode = NodeExists for /druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080/mors_hadoop_2014-10-29T22:00:00.000Z_2014-10-29T23:00:00.000Z_2014-11-06T02:37:52.947Z
2014-11-05 19:35:08,434 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1497d5454da0004 type:create cxid:0x2ca zxid:0x2b0 txntype:-1 reqpath:n/a Error Path:/druid/prod1/loadQueue/druid-historical2-413396.phx01.dev.ebayc3.com:8080/mors_hadoop_2014-10-29T23:00:00.000Z_2014-10-30T00:00:00.000Z_2014-11-06T02:37:57.836Z Error:KeeperErrorCode = NodeExists for /druid/prod1/loadQueue/druid-historical2-413396.phx01.dev.ebayc3.com:8080/mors_hadoop_2014-10-29T23:00:00.000Z_2014-10-30T00:00:00.000Z_2014-11-06T02:37:57.836Z

Fangjin Yang

Nov 5, 2014, 10:42:29 PM
to druid-de...@googlegroups.com

Hi, can you provide a thread dump of the stuck historical?
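
For example, something like "jstack <historical pid> > dump.txt", or "kill -3 <pid>" (which makes the JVM print the thread dump to its stdout/stderr), should work.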

Fangjin Yang

Nov 5, 2014, 10:44:27 PM
to druid-de...@googlegroups.com

Your ZK cluster is located in the same data center as your Druid cluster, right?

Venkatesh Kavuluri

Nov 5, 2014, 11:14:04 PM
to druid-de...@googlegroups.com
That could be the reason. ZK is in LVS and all the other VMs in the cluster are in PHX. I will move it to the PHX data center and try. If I still see the issue, I will send you the thread dump - but I think this will solve the issue. How can ZK being in a different data center cause this issue?

Venkatesh Kavuluri

Nov 5, 2014, 11:17:08 PM
to druid-de...@googlegroups.com
Fangjin, I have just a single node running ZK. Is it preferable to have multiple nodes in the ZK cluster?

Fangjin Yang

Nov 6, 2014, 12:17:17 AM
to druid-de...@googlegroups.com
For high availability, you should consider 3 ZK nodes.
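
As a rough sketch (hostnames are placeholders), a 3-node ensemble adds entries like the following to zoo.cfg on every node, plus a matching myid file in each node's dataDir:

server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888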

We have no experience running Druid and ZK in different data centers. I suspect the latency of communication is triggering timeout problems and putting things in a bad state.

One other issue that may be at play here is that the deep storage and the Druid nodes are in different data centers and downloads are just taking a long time, potentially generating that message, but I suspect the problem is with ZK.





Venkatesh Kavuluri

Nov 6, 2014, 1:14:34 AM
to druid-de...@googlegroups.com
Thanks Fangjin, I will update you once I test with a co-located ZK cluster.

Venkatesh Kavuluri

Nov 6, 2014, 5:28:36 PM
to druid-de...@googlegroups.com
Hi Fangjin,

I have moved ZK to the same data center as the rest of the Druid cluster. The historical nodes are still getting stuck randomly, blocking segment loads until I restart the historical service. After the restart, a few segments get loaded and then the process gets blocked again.

I see Thread.State.WAITING in the thread dump log. I'm sending you the entire thread dump of the historical process after it got stuck, along with the historical process log, which shows that the service started, loaded a few segments, threw an exception while trying to load a segment, then loaded a few more segments before eventually getting stuck while loading a segment.

Can you please go through the logs and let me know why the process is getting stuck?



Thanks,
Venkatesh

Venkatesh Kavuluri

Nov 6, 2014, 5:44:40 PM
to druid-de...@googlegroups.com
Hi Fangjin,

I have moved the ZK instance to the same data center as the rest of the Hadoop cluster (still running only one ZK node though). However, I am still experiencing problems with the historical nodes getting stuck.

When I restart a historical node after it gets stuck, it loads a few segments as told by the coordinator, throws an exception while loading a segment, then loads a few more segments, eventually getting stuck while loading a different segment. The thread dump shows the process is stuck in the WAITING state.

  1. "main" prio=10 tid=0x00007ffbbc012800 nid=0x3ad1 in Object.wait() [0x00007ffbc0f9b000]
  2.    java.lang.Thread.State: WAITING (on object monitor)
  3.         at java.lang.Object.wait(Native Method)
  4.         - waiting on <0x00000005fae046c0> (a java.lang.Thread)
  5.         at java.lang.Thread.join(Thread.java:1281)
  6.         - locked <0x00000005fae046c0> (a java.lang.Thread)
  7.         at java.lang.Thread.join(Thread.java:1355)
  8.         at com.metamx.common.lifecycle.Lifecycle.join(Lifecycle.java:299)
  9.         at io.druid.cli.ServerRunnable.run(ServerRunnable.java:43)
  10.         at io.druid.cli.Main.main(Main.java:90)

Here is the full thread dump: http://pastebin.com/BLrU0LMf
Here is the process log which shows the conditions described above: http://pastebin.com/4uiGtNx0 [line #341 shows the last segment-load attempt before getting stuck]

Can you please go over the logs and let me know why the process is getting stuck?

I am running the entire cluster on 0.6.160 with HDFS as deep storage. Our Hadoop cluster is in the same data center as the Druid cluster.

Venkatesh Kavuluri

Nov 6, 2014, 6:12:23 PM
to druid-de...@googlegroups.com
After one of the historical service restarts, I see the below error - timeouts connecting to our Hadoop cluster. Could the issue be related to network timeouts?


I don't see network connectivity problems in the previous attempts which got stuck.

Venkatesh Kavuluri

Nov 6, 2014, 6:24:48 PM
to druid-de...@googlegroups.com
Another flow where the historical process is getting stuck:

2014-11-06 23:21:50,039 INFO [Announcer-0] io.druid.curator.announcement.Announcer - Node[/druid/prod1/announcements/druid-historical2-413396.phx01.dev.ebayc3.com:8080] dropped, reinstating.
2014-11-06 23:22:07,652 INFO [ZkCoordinator-Exec--0] io.druid.server.coordination.ServerManager - Told to delete a queryable on dataSource[mors_hadoop] for interval[2014-11-01T23:00:00.000Z/2014-11-02T00:00:00.000Z] and version [2014-11-06T20:32:25.152Z] that I don't have.
2014-11-06 23:22:07,653 INFO [ZkCoordinator-Exec--0] io.druid.segment.loading.OmniSegmentLoader - Asked to cleanup something[DataSegment{size=765413, shardSpec=NoneShardSpec, metrics=[rowcount], dimensions=[mktgcaids, e, ppid, page, oid, incc, channel, mpid], version='2014-11-06T20:32:25.152Z', loadSpec={type=hdfs, path=hdfs://apollo-phx-nn.vip.ebay.com:8020/apps/hdmi-paypal/dgdeploy/druid/segments/mors_hadoop/mors_hadoop/20141101T230000.000Z_20141102T000000.000Z/2014-11-06T20_32_25.152Z/0/index.zip}, interval=2014-11-01T23:00:00.000Z/2014-11-02T00:00:00.000Z, dataSource='mors_hadoop', binaryVersion='9'}] that didn't exist.  Skipping.
2014-11-06 23:22:07,653 WARN [ZkCoordinator-Exec--0] io.druid.server.coordination.ZkCoordinator - Unable to delete segmentInfoCacheFile[/opt/druid/working/druidCache1/indexCache/info_dir/mors_hadoop_2014-11-01T23:00:00.000Z_2014-11-02T00:00:00.000Z_2014-11-06T20:32:25.152Z]
2014-11-06 23:22:34,409 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-11-06T23:22:34.403Z","service":"druid/prod/historical","host":"druid-historical2-413396.phx01.dev.ebayc3.com:8080","metric":"cache/delta/numEntries","value":0}]
2014-11-06 23:22:34,411 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-11-06T23:22:34.411Z","service":"druid/prod/historical","host":"druid-historical2-413396.phx01.dev.ebayc3.com:8080","metric":"cache/delta/sizeBytes","value":0}]
2014-11-06 23:22:34,412 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-11-06T23:22:34.411Z","service":"druid/prod/historical","host":"druid-historical2-413396.phx01.dev.ebayc3.com:8080","metric":"cache/delta/hits","value":0}]
2014-11-06 23:22:34,412 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-11-06T23:22:34.412Z","service":"druid/prod/historical","host":"druid-historical2-413396.phx01.dev.ebayc3.com:8080","metric":"cache/delta/misses","value":0}]
2014-11-06 23:22:34,413 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-11-06T23:22:34.413Z","service":"druid/prod/historical","host":"druid-historical2-413396.phx01.dev.ebayc3.com:8080","metric":"cache/delta/evictions","value":0}]
2014-11-06 23:22:34,415 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-11-06T23:22:34.413Z","service":"druid/prod/historical","host":"druid-historical2-413396.phx01.dev.ebayc3.com:8080","metric":"cache/delta/hitRate","value":0.0}]
2014-11-06 23:22:34,416 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-11-06T23:22:34.415Z","service":"druid/prod/historical","host":"druid-historical2-413396.phx01.dev.ebayc3.com:8080","metric":"cache/delta/averageBytes","value":0}]
2014-11-06 23:22:34,416 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-11-06T23:22:34.416Z","service":"druid/prod/historical","host":"druid-historical2-413396.phx01.dev.ebayc3.com:8080","metric":"cache/delta/timeouts","value":0}]

Xavier

Nov 6, 2014, 7:28:57 PM
to druid-de...@googlegroups.com
Hi Venkatesh, it looks like your node is stuck downloading segments from HDFS (see line 71 in your thread dump).
This appears to be happening inside of the Hadoop HDFS client and is out of the control of Druid.
Maybe there is a Hadoop configuration parameter you can set to time out when reading from HDFS? That would prevent things from getting stuck.

Alternatively, you can increase the number of loading threads (druid.segmentCache.numLoadingThreads) so that a single stuck HDFS read does not stop all segment loading.
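
As a sketch (the property name and value should be checked against your Hadoop version - I am assuming dfs.client.socket-timeout is the relevant client read timeout here), something along these lines in hdfs-site.xml on the historicals, plus the extra loading threads in the historical runtime.properties:

<property>
  <name>dfs.client.socket-timeout</name>
  <value>60000</value>
</property>

druid.segmentCache.numLoadingThreads=4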


Fangjin Yang

Nov 6, 2014, 8:34:03 PM
to druid-de...@googlegroups.com
Looking through the logs again, there just seem to be a bunch of network problems.

2014-11-06 23:04:17,286 WARN [ZkCoordinator-0] org.apache.hadoop.ipc.Client - Exception encountered while connecting to the server : java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : 

A segment fails to be downloaded. Druid tries to clean up anything partially downloaded when an exception occurs. In this case there is nothing to clean up and the cleanup process logs an error message. This is expected.

2014-11-06 23:05:17,529 INFO [ZkCoordinator-0] org.apache.hadoop.ipc.Client - Retrying connect to server: apollo-phx-nn.vip.ebay.com/10.115.201.75:8020. Already tried 2 time(s); maxRetries=45

Here it just looks like we are going in a loop trying to connect to HDFS and failing over and over again. Most of the code doing this is outside of Druid, as Xavier mentioned.
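
If you want the Hadoop client to give up sooner instead of retrying 45 times, the retry behaviour should be tunable on the Hadoop side - I believe (worth double-checking for your Hadoop version) it is controlled by something like this in core-site.xml:

<property>
  <name>ipc.client.connect.max.retries.on.timeouts</name>
  <value>5</value>
</property>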

Venkatesh Kavuluri

Nov 6, 2014, 10:51:19 PM
to druid-de...@googlegroups.com
Thanks Xavier, Fangjin. I think the issue is related to RemoteBlockReader2 in the HDFS client we are using: https://issues.apache.org/jira/browse/HDFS-6999

@Xavier, I have set druid.segmentCache.numLoadingThreads to 7, and all the threads were eventually blocked forever after some download activity.

I am wondering why there is no timeout on the caller side on the compute node while downloading, e.g. here: io.druid.storage.hdfs.HdfsDataSegmentPuller.getSegmentFiles(HdfsDataSegmentPuller.java:57)
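
Something like the rough sketch below is what I have in mind - purely illustrative (the class and wiring are hypothetical, not the actual Druid code path), just wrapping the blocking pull in a bounded wait so a hung HDFS read cannot block the loading thread forever:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedSegmentPull
{
  private final ExecutorService exec = Executors.newSingleThreadExecutor();

  // 'pull' stands in for whatever actually fetches the segment from deep storage,
  // e.g. the work done by HdfsDataSegmentPuller.getSegmentFiles.
  public void pullWithTimeout(Runnable pull, long timeoutMillis) throws Exception
  {
    Future<?> future = exec.submit(pull);
    try {
      // Fail the load (and let the coordinator retry) if the download takes too long.
      future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    }
    catch (TimeoutException e) {
      future.cancel(true); // interrupt the stuck read
      throw e;
    }
  }
}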

Fangjin Yang

Nov 7, 2014, 1:34:28 PM
to druid-de...@googlegroups.com
I'd be on board with a compute-side timeout. Happy to take a PR for the improvement :)