historical servers stopped downloading segments from HDFS


Venkatesh Kavuluri

Nov 4, 2014, 3:15:52 PM
to druid-de...@googlegroups.com
Hi All,

I have set up a Druid cluster with the latest RC (0.6.160) and am using HDFS (hadoop-2.4.x) as deep storage. I am running my hourly index jobs using the local firehose.

The indexed segments for about 5 days were successfully uploaded to HDFS, and the compute nodes downloaded them without any issues.

However, the compute nodes now seem to be stuck: I don't see them downloading new segments from HDFS even though I see the corresponding entries in the coordinator logs. I don't see any relevant entries in the ZooKeeper logs either.

I didn't configure any custom rules - the default rule is active:
id dataSource version payload
_default_2014-10-31T21:48:25.308Z _default 2014-10-31T21:48:25.308Z [{"tieredReplicants":{"_default_tier":2},"type":"loadForever"}]

Here are the messages I see in the logs. I'm not sure why the compute nodes suddenly stopped downloading new segments. Can someone please help me understand the issue?

---- coordinator log requesting a compute node to download segment data. I don't see any response in the compute node or ZK logs.

2014-11-04 19:33:39,991 INFO [Coordinator-Exec--0] io.druid.server.coordinator.LoadQueuePeon - Asking server peon[/druid/prod/loadQueue/druid-historical4-413974.XXXX:8080] to load segment[mors_2014-10-29T10:00:00.000Z_2014-10-29T11:00:00.000Z_2014-11-04T19:18:52.733Z]

---- queued segment loads which don't get processed

2014-11-04 19:32:39,982 INFO [Coordinator-Exec--0] io.druid.server.coordinator.helper.DruidCoordinatorLogger - Server[druid-historical4-413974.XXXX:8080, historical, _default_tier] has 20 left to load, 5 left to drop, 428,516,430 bytes queued, 234,046,722 bytes served.
2014-11-04 19:32:39,982 INFO [Coordinator-Exec--0] io.druid.server.coordinator.helper.DruidCoordinatorLogger - Server[druid-historical3-413973.XXXX:8080, historical, _default_tier] has 37 left to load, 4 left to drop, 616,042,313 bytes queued, 60,321,960 bytes served.
2014-11-04 19:32:39,982 INFO [Coordinator-Exec--0] io.druid.server.coordinator.helper.DruidCoordinatorLogger - Server[druid-historical2-413396.XXXX:8080, historical, _default_tier] has 0 left to load, 0 left to drop, 0 bytes queued, 687,011,066 bytes served.
2014-11-04 19:32:39,982 INFO [Coordinator-Exec--0] io.druid.server.coordinator.helper.DruidCoordinatorLogger - Server[druid-historical1-410557.XXXX:8080, historical, _default_tier] has 8 left to load, 3 left to drop, 174,584,976 bytes queued, 501,216,378 bytes served.


I also see the errors below:

2014-11-04 19:33:39,988 ERROR [Coordinator-Exec--0] io.druid.server.coordinator.ReplicationThrottler - [_default_tier]: Replicant create queue stuck after 15+ runs!: {class=io.druid.server.coordinator.ReplicationThrottler, segments=[mors_2014-10-31T22:00:00.000Z_2014-10-31T23:00:00.000Z_2014-11-03T05:04:17.356Z ON druid-historical3-413973.XXXX:8080]}
2014-11-04 19:33:39,988 INFO [Coordinator-Exec--0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"alerts","timestamp":"2014-11-04T19:33:39.988Z","service":"druid/prod/coordinator","host":"druid-coordinator1-410556.XXXX:8080","severity":"component-failure","description":"[_default_tier]: Replicant create queue stuck after 15+ runs!","data":{"class":"io.druid.server.coordinator.ReplicationThrottler","segments":["mors_2014-10-31T22:00:00.000Z_2014-10-31T23:00:00.000Z_2014-11-03T05:04:17.356Z ON druid-historical3-413973.XXXX:8080"]}}]



2014-11-04 19:34:40,404 ERROR [Coordinator-Exec--0] io.druid.server.coordinator.helper.DruidCoordinatorBalancer - [_default_tier]: Balancer move segments queue has a segment stuck: {class=io.druid.server.coordinator.helper.DruidCoordinatorBalancer, segment=mors_2014-10-30T18:00:00.000Z_2014-10-30T19:00:00.000Z_2014-11-04T04:35:05.122Z, server=DruidServerMetadata{name='druid-historical2-413396.XXXX:8080', host='druid-historical2-413396.XXXX:8080', maxSize=20000000000, tier='_default_tier', type='historical', priority='0'}}


2014-11-04 19:44:17,672 ERROR [Master-PeonExec--0] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/prod/loadQueue/druid-historical4-413974.XXXX:8080], throwable caught when submitting [SegmentChangeRequestLoad{segment=DataSegment{size=30151395, shardSpec=NoneShardSpec, metrics=[rowcount], dimensions=[channel, e, incc, mpid, oid, page, ppid], version='2014-11-04T04:53:12.006Z', loadSpec={type=hdfs, path=hdfs://XXXX:8020/user/dgdeploy/druid/segments/mors/20141031T160000.000Z_20141031T170000.000Z/2014-11-04T04_53_12.006Z/0/index.zip}, interval=2014-10-31T16:00:00.000Z/2014-10-31T17:00:00.000Z, dataSource='mors', binaryVersion='9'}}].
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /druid/prod/loadQueue/druid-historical4-413974.XXXX:8080/mors_2014-10-31T16:00:00.000Z_2014-10-31T17:00:00.000Z_2014-11-04T04:53:12.006Z
at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
at io.druid.server.coordinator.LoadQueuePeon$1.run(LoadQueuePeon.java:224)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)



In zookeeper logs

10557.XXXX:8080/mors_2014-11-01T16:00:00.000Z_2014-11-01T17:00:00.000Z_2014-11-03T06:04:55.648Z
2014-11-04 12:01:23,995 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1497b91f74d0010 type:create cxid:0x279 zxid:0x4b3 txntype:-1 reqpath:n/a Error Path:/druid/prod/loadQueue/druid-historical4-413974.XXXX:8080/mors_2014-10-30T15:00:00.000Z_2014-10-30T16:00:00.000Z_2014-11-03T01:17:18.413Z Error:KeeperErrorCode = NodeExists for /druid/prod/loadQueue/druid-historical4-413974.XXXX:8080/mors_2014-10-30T15:00:00.000Z_2014-10-30T16:00:00.000Z_2014-11-03T01:17:18.413Z
2014-11-04 12:01:24,009 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1497b91f74d0010 type:create cxid:0x27a zxid:0x4b4 txntype:-1 reqpath:n/a Error Path:/druid/prod/loadQueue/druid-historical1-410557.XXXX:8080/mors_2014-11-01T09:00:00.000Z_2014-11-01T10:00:00.000Z_2014-11-03T05:40:07.589Z Error:KeeperErrorCode = NodeExists for /druid/prod/loadQueue/druid-historical1-410557.XXXX:8080/mors_2014-11-01T09:00:00.000Z_2014-11-01T10:00:00.000Z_2014-11-03T05:40:07.589Z
2014-11-04 12:01:24,022 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1497b91f74d0010 type:create cxid:0x27b zxid:0x4b5 txntype:-1 reqpath:n/a Error Path:/druid/prod/loadQueue/druid-historical4-413974.XXXX:8080/mors_2014-10-30T07:00:00.000Z_2014-10-30T08:00:00.000Z_2014-11-02T19:11:15.806Z Error:KeeperErrorCode = NodeExists for /druid/prod/loadQueue/druid-historical4-413974.XXXX:8080/mors_2014-10-30T07:00:00.000Z_2014-10-30T08:00:00.000Z_2014-11-02T19:11:15.806Z
2014-11-04 12:01:24,035 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1497b91f74d0010 type:create cxid:0x27c zxid:0x4b6 txntype:-1 reqpath:n/a Error Path:/druid/prod/loadQueue/druid-historical1-410557.XXXX:8080/mors_2014-10-30T17:00:00.000Z_2014-10-30T18:00:00.000Z_2014-11-03T01:27:38.793Z Error:KeeperErrorCode = NodeExists for /druid/prod/loadQueue/druid-historical1-410557.XXXX:8080/mors_2014-10-30T17:00:00.000Z_2014-10-30T18:00:00.000Z_2014-11-03T01:27:38.793Z

Venkatesh Kavuluri

Nov 4, 2014, 8:30:14 PM
to druid-de...@googlegroups.com
I am retrying with "0.6.144". I scaled down my cluster from 3 middle managers and 4 historicals to 1 middle manager and 2 historicals, and cleaned up all caches and the segments table.

After running the first index job, the coordinator asked both historicals to load the segments (as the replication factor is 2). While one of the historical nodes loaded the segments successfully, the other historical node got stuck during the download process (I do see an attempt to initiate the download in the logs). After restarting the historical service on the second node, it was able to load the segment successfully.

What could be causing the second historical node to get stuck during the first attempt (before the process restart)? Both historical nodes are identical instances, with the processes running with the same memory parameters.

-- historical service
java -server -Xmx5g -Xms5g -XX:NewSize=1g -XX:MaxNewSize=1g -XX:MaxDirectMemorySize=8g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp -classpath /opt/druid/lib/*:/opt/druid/config/historical:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/conf:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/common/lib/*:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/common/*:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/hdfs:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/hdfs/lib/*:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/hdfs/*:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/*:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/*:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/mapreduce/lib/*:/opt/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/mapreduce/*:/apache/hbase/lib/*.jar::/apache/hadoop/share/hadoop/yarn:/apache/hadoop/share/hadoop/mapreduce io.druid.cli.Main server historical

druid.host=druid-historical1-410557.XXXX
druid.service=druid/prod/historical
druid.port=8080

druid.zk.service.host=zk1-3810.XXXX
druid.zk.paths.base=/druid/prod1
druid.discovery.curator.path=/prod/discovery1

druid.extensions.coordinates=["io.druid.extensions:druid-hdfs-storage:0.6.160"]

druid.server.maxSize=20000000000

# Change these to make Druid faster
druid.processing.buffer.sizeBytes=1073741824
druid.processing.numThreads=7
druid.server.http.numThreads=50

druid.segmentCache.locations=[{"path": "/opt/druid/working/druidCache1/indexCache", "maxSize": 20000000000}]

#druid.monitoring.monitors=["io.druid.server.metrics.ServerMonitor", "com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor"]

# Emit metrics over http
#idruid.emitter=http
#druid.emitter.http.recipientBaseUrl=#{EMITTER_URL}

# If you choose to compress ZK announcements, you must do so for every node type
druid.announcer.type=batch
druid.curator.compress=true

druid.query.groupBy.maxResults=1000000

druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://XXXX:8020/apps/hdmi-paypal/dgdeploy/druid/segments

-- zookeeper service
java -Dzookeeper.log.dir=/tmp/zoo_logs -Dzookeeper.root.logger=INFO,CONSOLE -cp /opt/zookeeper/bin/../build/classes:/opt/zookeeper/bin/../build/lib/*.jar:/opt/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/opt/zookeeper/bin/../lib/netty-3.7.0.Final.jar:/opt/zookeeper/bin/../lib/log4j-1.2.16.jar:/opt/zookeeper/bin/../lib/jline-0.9.94.jar:/opt/zookeeper/bin/../zookeeper-3.4.6.jar:/opt/zookeeper/bin/../src/java/lib/*.jar:/opt/zookeeper/bin/../conf: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /opt/zookeeper/bin/../conf/zoo.cfg

cat ../conf/zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/dgdeploy/zookeeper
# the port at which the clients will connect
clientPort=2182
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1


Fangjin Yang

Nov 4, 2014, 8:44:13 PM
to druid-de...@googlegroups.com
Hi Venkatesh, what version of ZooKeeper are you using? We've seen problems before with Curator missing announcements when the versions of Curator and ZooKeeper did not match. If you are using ZK 3.4.6, then something else very strange is happening.

Fangjin Yang

Nov 4, 2014, 9:01:52 PM
to druid-de...@googlegroups.com
As a quick recovery, you can restart the coordinator. Does your broker seem correct or does it have a bad view of the world?

Venkatesh Kavuluri

Nov 4, 2014, 10:06:45 PM
to druid-de...@googlegroups.com
Hi Fangjin,

Thanks for the reply. I am using ZK 3.4.6. This issue doesn't seem to happen all the time: after I saw it with the first hour's index file load, things have been fine for the next three runs. I will get back for help if the issue appears again.

Thanks,
Venkatesh

Venkatesh Kavuluri

Nov 4, 2014, 10:12:25 PM
to druid-de...@googlegroups.com
The broker has a correct view of the segments, no issues there.

Fangjin Yang

Nov 4, 2014, 10:23:41 PM
to druid-de...@googlegroups.com

The coordinator is on 0.6.160 as well?


Venkatesh Kavuluri

Nov 5, 2014, 11:54:28 AM
to druid-de...@googlegroups.com
Yes, the coordinator was on 0.6.160 (all the services were). Before I heard from you, I thought I should try the stable version, which was 0.6.144, and see if the historicals load segments. After I downgraded to 0.6.144, I noticed that 0.6.160 had been promoted from RC to stable. Now all the services are on 0.6.144.

I will start crunching data with 0.6.144 today and get back to you.

Fangjin Yang

Nov 5, 2014, 11:56:06 AM
to druid-de...@googlegroups.com
We've been running 0.6.160 for several weeks in a cluster with several hundred nodes and haven't seen this problem yet. FWIW, the last stable was 0.6.146, not 0.6.144, and I'm not sure why the problem would exist in one version and not the other...

Venkatesh Kavuluri

Nov 5, 2014, 12:33:46 PM
to druid-de...@googlegroups.com
Right, as you mentioned earlier this could be due to some other weird issue. I will upgrade the cluster back to 0.6.160 and try.

Venkatesh Kavuluri

Nov 5, 2014, 10:32:42 PM
to druid-de...@googlegroups.com
Hi Fangjin,

I am stuck with the same issue again. I was able to trace the actual error messages in the logs. While trying to load a segment with replication factor 2, one historical node loaded it fine and the other failed. After this, both historical nodes stopped loading segments. Also, I don't see any info messages in the ZooKeeper logs. Do you think it's a problem with ZooKeeper communication?

zookeeper
====================
2014-11-05 18:56:07,145 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1497d5454da0004 type:create cxid:0x2b0 zxid:0x299 txntype:-1 reqpath:n/a Error Path:/druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080/mors_hadoop_2014-10-30T05:00:00.000Z_2014-10-30T06:00:00.000Z_2014-11-06T01:16:37.269Z Error:KeeperErrorCode = NodeExists for /druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080/mors_hadoop_2014-10-30T05:00:00.000Z_2014-10-30T06:00:00.000Z_2014-11-06T01:16:37.269Z


coordinator
==================================
2014-11-06 02:11:07,892 INFO [Coordinator-Exec--0] io.druid.server.coordinator.LoadQueuePeon - Asking server peon[/druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080] to load segment[mors_hadoop_2014-10-30T05:00:00.000Z_2014-10-30T06:00:00.000Z_2014-11-06T01:16:37.269Z]



2014-11-06 02:11:07,011 ERROR [Master-PeonExec--0] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080], throwable caught when submitting [SegmentChangeRequestLoad{segment=DataSegment{size=562968, shardSpec=NoneShardSpec, metrics=[rowcount], dimensions=[mktgcaids, e, ppid, page, oid, incc, channel, mpid], version='2014-11-06T01:43:20.227Z', loadSpec={type=hdfs, path=/apps/hdmi-paypal/dgdeploy/druid/segments/mors_hadoop/mors_hadoop/20141030T060000.000Z_20141030T070000.000Z/2014-11-06T01_43_20.227Z/0/index.zip}, interval=2014-10-30T06:00:00.000Z/2014-10-30T07:00:00.000Z, dataSource='mors_hadoop', binaryVersion='9'}}].
        at io.druid.server.coordinator.LoadQueuePeon$1$1.run(LoadQueuePeon.java:234)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2014-11-06 02:11:07,012 INFO [Master-PeonExec--0] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080] loading [mors_hadoop_2014-10-30T05:00:00.000Z_2014-10-30T06:00:00.000Z_2014-11-06T01:16:37.269Z]
2014-11-06 02:11:07,012 INFO [Master-PeonExec--0] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080] processing segment[mors_hadoop_2014-10-30T05:00:00.000Z_2014-10-30T06:00:00.000Z_2014-11-06T01:16:37.269Z]
2014-11-06 02:11:07,026 ERROR [Master-PeonExec--0] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080], throwable caught when submitting [SegmentChangeRequestLoad{segment=DataSegment{size=647107, shardSpec=NoneShardSpec, metrics=[rowcount], dimensions=[mktgcaids, e, ppid, page, oid, incc, channel, mpid], version='2014-11-06T01:16:37.269Z', loadSpec={type=hdfs, path=/apps/hdmi-paypal/dgdeploy/druid/segments/mors_hadoop/mors_hadoop/20141030T050000.000Z_20141030T060000.000Z/2014-11-06T01_16_37.269Z/0/index.zip}, interval=2014-10-30T05:00:00.000Z/2014-10-30T06:00:00.000Z, dataSource='mors_hadoop', binaryVersion='9'}}].
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080/mors_hadoop_2014-10-30T05:00:00.000Z_2014-10-30T06:00:00.000Z_2014-11-06T01:16:37.269Z
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
        at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:688)
        at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:672)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
        at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:668)
        at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
        at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
        at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
        at io.druid.server.coordinator.LoadQueuePeon$1.run(LoadQueuePeon.java:224)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

historical
====================
2014-11-06 01:26:06,965 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - New request[LOAD: mors_hadoop_2014-10-30T05:00:00.000Z_2014-10-30T06:00:00.000Z_2014-11-06T01:16:37.269Z] with node[/druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080/mors_hadoop_2014-10-30T05:00:00.000Z_2014-10-30T06:00:00.000Z_2014-11-06T01:16:37.269Z].
2014-11-06 01:26:06,966 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Loading segment mors_hadoop_2014-10-30T05:00:00.000Z_2014-10-30T06:00:00.000Z_2014-11-06T01:16:37.269Z

(and I don't see the acknowledgement of segment load completion message).

Venkatesh Kavuluri

Nov 5, 2014, 10:38:02 PM
to druid-de...@googlegroups.com
Forgot to mention: I saw this issue appear after 11 segments were loaded successfully on a fresh cluster. No more newly indexed segments are getting loaded, and all I see in the ZooKeeper logs are the error messages below. I am running the index jobs on batch data in parallel (7 jobs on individual hours at a time, local firehose), and all the segments would be loaded to the historical nodes. Could this be the reason?

2014-11-05 19:26:07,209 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1497d5454da0004 type:create cxid:0x2c6 zxid:0x2ae txntype:-1 reqpath:n/a Error Path:/druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080/mors_hadoop_2014-10-29T22:00:00.000Z_2014-10-29T23:00:00.000Z_2014-11-06T02:37:52.947Z Error:KeeperErrorCode = NodeExists for /druid/prod1/loadQueue/druid-historical1-410557.phx01.dev.ebayc3.com:8080/mors_hadoop_2014-10-29T22:00:00.000Z_2014-10-29T23:00:00.000Z_2014-11-06T02:37:52.947Z
2014-11-05 19:35:08,434 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1497d5454da0004 type:create cxid:0x2ca zxid:0x2b0 txntype:-1 reqpath:n/a Error Path:/druid/prod1/loadQueue/druid-historical2-413396.phx01.dev.ebayc3.com:8080/mors_hadoop_2014-10-29T23:00:00.000Z_2014-10-30T00:00:00.000Z_2014-11-06T02:37:57.836Z Error:KeeperErrorCode = NodeExists for /druid/prod1/loadQueue/druid-historical2-413396.phx01.dev.ebayc3.com:8080/mors_hadoop_2014-10-29T23:00:00.000Z_2014-10-30T00:00:00.000Z_2014-11-06T02:37:57.836Z

Fangjin Yang

Nov 5, 2014, 10:42:29 PM
to druid-de...@googlegroups.com

Hi, can you provide a thread dump of the stuck historical?
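
For example, something like "jstack <historical pid> > dump.txt", or "kill -3 <pid>" (which makes the JVM print the thread dump to its stdout/stderr), should work.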

Fangjin Yang

Nov 5, 2014, 10:44:27 PM
to druid-de...@googlegroups.com

Your ZK cluster is located in the same data center as your Druid cluster, right?

Venkatesh Kavuluri

Nov 5, 2014, 11:14:04 PM
to druid-de...@googlegroups.com
That could be the reason. ZK is in LVS and all the other VMs in the cluster are in PHX. I will move it to the PHX data center and try. If I still see the issue, I will send you the thread dump - but I think this will solve the issue. How can ZK being in a different data center cause this issue?

Venkatesh Kavuluri

Nov 5, 2014, 11:17:08 PM
to druid-de...@googlegroups.com
Fangjin, I have just a single node running ZK. Is it preferable to have multiple nodes in the ZK cluster?

Fangjin Yang

Nov 6, 2014, 12:17:17 AM
to druid-de...@googlegroups.com
For high availability, you should consider 3 ZK nodes.
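
As a rough sketch (hostnames are placeholders), a 3-node ensemble adds entries like the following to zoo.cfg on every node, plus a matching myid file in each node's dataDir:

server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888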

We have no experience running Druid and ZK in different data centers. I suspect the latency of communication is triggering timeout problems and putting things in a bad state.

One other issue that may be at play here is that the deep storage and the Druid nodes are in different data centers and downloads are just taking a long time, potentially generating that message, but I suspect the problem is with ZK.





Venkatesh Kavuluri

Nov 6, 2014, 1:14:34 AM
to druid-de...@googlegroups.com
Thanks Fangjin, I will update you once I test with a co-located ZK cluster.

Venkatesh Kavuluri

Nov 6, 2014, 5:28:36 PM
to druid-de...@googlegroups.com
Hi Fangjin,

I have moved ZK to the same data center as the rest of the Druid cluster. The historical nodes are still getting stuck randomly, blocking segment loads until I restart the historical service. After the restart, a few segments get loaded and then the process gets blocked again.

I see Thread.State.WAITING in the thread dump log. I'm sending you the entire thread dump of the historical process after it got stuck, along with the historical process log, which shows that the service started, loaded a few segments, threw an exception while trying to load a segment, then loaded a few more segments before eventually getting stuck while loading a segment.

Can you please go through the logs and let me know why the process is getting stuck?



Thanks,
Venkatesh

Venkatesh Kavuluri

Nov 6, 2014, 5:44:40 PM
to druid-de...@googlegroups.com
Hi Fangjin,

I have moved the ZK instance to the same data center as the rest of the Hadoop cluster (still running only one ZK node though). However, I am still experiencing problems with the historical nodes getting stuck.

When I restart a historical node after it gets stuck, it loads a few segments as told by the coordinator, throws an exception while loading a segment, then loads a few more segments, eventually getting stuck while loading a different segment. The thread dump shows the process is stuck in the WAITING state.

  1. "main" prio=10 tid=0x00007ffbbc012800 nid=0x3ad1 in Object.wait() [0x00007ffbc0f9b000]
  2.    java.lang.Thread.State: WAITING (on object monitor)
  3.         at java.lang.Object.wait(Native Method)
  4.         - waiting on <0x00000005fae046c0> (a java.lang.Thread)
  5.         at java.lang.Thread.join(Thread.java:1281)
  6.         - locked <0x00000005fae046c0> (a java.lang.Thread)
  7.         at java.lang.Thread.join(Thread.java:1355)
  8.         at com.metamx.common.lifecycle.Lifecycle.join(Lifecycle.java:299)
  9.         at io.druid.cli.ServerRunnable.run(ServerRunnable.java:43)
  10.         at io.druid.cli.Main.main(Main.java:90)

Here is the full thread dump: http://pastebin.com/BLrU0LMf
Here is the process log which shows the conditions described above: http://pastebin.com/4uiGtNx0 [line #341 shows the last segment-load attempt before getting stuck]

Can you please go over the logs and let me know why the process is getting stuck?

I am running the entire cluster on 0.6.160 with HDFS as deep storage. Our Hadoop cluster is in the same data center as the Druid cluster.

Venkatesh Kavuluri

Nov 6, 2014, 6:12:23 PM
to druid-de...@googlegroups.com
After one of the historical service restarts, I see the below error - timeouts connecting to our Hadoop cluster. Could the issue be related to network timeouts?


I don't see network connectivity problems in the previous attempts which got stuck.

Venkatesh Kavuluri

Nov 6, 2014, 6:24:48 PM
to druid-de...@googlegroups.com
Another flow where the historical process is getting stuck:

2014-11-06 23:21:50,039 INFO [Announcer-0] io.druid.curator.announcement.Announcer - Node[/druid/prod1/announcements/druid-historical2-413396.phx01.dev.ebayc3.com:8080] dropped, reinstating.
2014-11-06 23:22:07,652 INFO [ZkCoordinator-Exec--0] io.druid.server.coordination.ServerManager - Told to delete a queryable on dataSource[mors_hadoop] for interval[2014-11-01T23:00:00.000Z/2014-11-02T00:00:00.000Z] and version [2014-11-06T20:32:25.152Z] that I don't have.
2014-11-06 23:22:07,653 INFO [ZkCoordinator-Exec--0] io.druid.segment.loading.OmniSegmentLoader - Asked to cleanup something[DataSegment{size=765413, shardSpec=NoneShardSpec, metrics=[rowcount], dimensions=[mktgcaids, e, ppid, page, oid, incc, channel, mpid], version='2014-11-06T20:32:25.152Z', loadSpec={type=hdfs, path=hdfs://apollo-phx-nn.vip.ebay.com:8020/apps/hdmi-paypal/dgdeploy/druid/segments/mors_hadoop/mors_hadoop/20141101T230000.000Z_20141102T000000.000Z/2014-11-06T20_32_25.152Z/0/index.zip}, interval=2014-11-01T23:00:00.000Z/2014-11-02T00:00:00.000Z, dataSource='mors_hadoop', binaryVersion='9'}] that didn't exist.  Skipping.
2014-11-06 23:22:07,653 WARN [ZkCoordinator-Exec--0] io.druid.server.coordination.ZkCoordinator - Unable to delete segmentInfoCacheFile[/opt/druid/working/druidCache1/indexCache/info_dir/mors_hadoop_2014-11-01T23:00:00.000Z_2014-11-02T00:00:00.000Z_2014-11-06T20:32:25.152Z]
2014-11-06 23:22:34,409 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-11-06T23:22:34.403Z","service":"druid/prod/historical","host":"druid-historical2-413396.phx01.dev.ebayc3.com:8080","metric":"cache/delta/numEntries","value":0}]
2014-11-06 23:22:34,411 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-11-06T23:22:34.411Z","service":"druid/prod/historical","host":"druid-historical2-413396.phx01.dev.ebayc3.com:8080","metric":"cache/delta/sizeBytes","value":0}]
2014-11-06 23:22:34,412 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-11-06T23:22:34.411Z","service":"druid/prod/historical","host":"druid-historical2-413396.phx01.dev.ebayc3.com:8080","metric":"cache/delta/hits","value":0}]
2014-11-06 23:22:34,412 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-11-06T23:22:34.412Z","service":"druid/prod/historical","host":"druid-historical2-413396.phx01.dev.ebayc3.com:8080","metric":"cache/delta/misses","value":0}]
2014-11-06 23:22:34,413 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-11-06T23:22:34.413Z","service":"druid/prod/historical","host":"druid-historical2-413396.phx01.dev.ebayc3.com:8080","metric":"cache/delta/evictions","value":0}]
2014-11-06 23:22:34,415 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-11-06T23:22:34.413Z","service":"druid/prod/historical","host":"druid-historical2-413396.phx01.dev.ebayc3.com:8080","metric":"cache/delta/hitRate","value":0.0}]
2014-11-06 23:22:34,416 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-11-06T23:22:34.415Z","service":"druid/prod/historical","host":"druid-historical2-413396.phx01.dev.ebayc3.com:8080","metric":"cache/delta/averageBytes","value":0}]
2014-11-06 23:22:34,416 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-11-06T23:22:34.416Z","service":"druid/prod/historical","host":"druid-historical2-413396.phx01.dev.ebayc3.com:8080","metric":"cache/delta/timeouts","value":0}]

Xavier

Nov 6, 2014, 7:28:57 PM
to druid-de...@googlegroups.com
Hi Venkatesh, it looks like your node is stuck downloading segments from HDFS (see line 71 in your thread dump).
This appears to be happening inside of the Hadoop HDFS client and is out of the control of Druid.
Maybe there is a Hadoop configuration parameter you can set to time out when reading from HDFS? That would prevent things from getting stuck.

Alternatively, you can increase the number of loading threads (druid.segmentCache.numLoadingThreads) so that a single stuck HDFS read does not stop all segment loading.
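
As a sketch (the property name and value should be checked against your Hadoop version - I am assuming dfs.client.socket-timeout is the relevant client read timeout here), something along these lines in hdfs-site.xml on the historicals, plus the extra loading threads in the historical runtime.properties:

<property>
  <name>dfs.client.socket-timeout</name>
  <value>60000</value>
</property>

druid.segmentCache.numLoadingThreads=4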


Fangjin Yang

Nov 6, 2014, 8:34:03 PM
to druid-de...@googlegroups.com
Looking through the logs again, there just seem to be a bunch of network problems.

2014-11-06 23:04:17,286 WARN [ZkCoordinator-0] org.apache.hadoop.ipc.Client - Exception encountered while connecting to the server : java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : 

A segment fails to be downloaded. Druid tries to clean up anything partially downloaded when an exception occurs. In this case there is nothing to clean up and the cleanup process logs an error message. This is expected.

2014-11-06 23:05:17,529 INFO [ZkCoordinator-0] org.apache.hadoop.ipc.Client - Retrying connect to server: apollo-phx-nn.vip.ebay.com/10.115.201.75:8020. Already tried 2 time(s); maxRetries=45

Here it just looks like we are going in a loop trying to connect to HDFS and failing over and over again. Most of the code doing this is outside of Druid, as Xavier mentioned.
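
If you want the Hadoop client to give up sooner instead of retrying 45 times, the retry behaviour should be tunable on the Hadoop side - I believe (worth double-checking for your Hadoop version) it is controlled by something like this in core-site.xml:

<property>
  <name>ipc.client.connect.max.retries.on.timeouts</name>
  <value>5</value>
</property>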

Venkatesh Kavuluri

Nov 6, 2014, 10:51:19 PM
to druid-de...@googlegroups.com
Thanks Xavier, Fangjin. I think the issue is related to RemoteBlockReader2 in the HDFS client we are using: https://issues.apache.org/jira/browse/HDFS-6999

@Xavier, I have set druid.segmentCache.numLoadingThreads to 7, and all the threads were eventually blocked forever after some download activity.

I am wondering why there is no timeout on the caller side on the compute node while downloading, e.g. here: io.druid.storage.hdfs.HdfsDataSegmentPuller.getSegmentFiles(HdfsDataSegmentPuller.java:57)
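
Something like the rough sketch below is what I have in mind - purely illustrative (the class and wiring are hypothetical, not the actual Druid code path), just wrapping the blocking pull in a bounded wait so a hung HDFS read cannot block the loading thread forever:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedSegmentPull
{
  private final ExecutorService exec = Executors.newSingleThreadExecutor();

  // 'pull' stands in for whatever actually fetches the segment from deep storage,
  // e.g. the work done by HdfsDataSegmentPuller.getSegmentFiles.
  public void pullWithTimeout(Runnable pull, long timeoutMillis) throws Exception
  {
    Future<?> future = exec.submit(pull);
    try {
      // Fail the load (and let the coordinator retry) if the download takes too long.
      future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    }
    catch (TimeoutException e) {
      future.cancel(true); // interrupt the stuck read
      throw e;
    }
  }
}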

Fangjin Yang

Nov 7, 2014, 1:34:28 PM
to druid-de...@googlegroups.com
I'd be on board with a compute-side timeout. Happy to take a PR for the improvement :)