Zookeeper CancelledKeyException while indexing


Jakub Liska

Aug 31, 2016, 8:30:14 PM
to Druid User
Hey,

I'm using the latest implydata quickstart setup to prototype new things in a staging environment. I used to submit 4 Hadoop indexing tasks in parallel on an `m4.2xlarge` instance with S3 storage without any problems, but currently if I submit just 2 tasks concurrently on the identical data I keep getting these Zookeeper exceptions. The only warning sign is that IoWait increases dramatically, even though the tasks barely touch the disks and bandwidth is at 50% ...

java.nio.channels.CancelledKeyException
	at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
	at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
	at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
	at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
	at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1119)
	at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:120)
	at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:92)
	at org.apache.zookeeper.server.DataTree.setData(DataTree.java:620)
	at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:807)
	at org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:329)
	at org.apache.zookeeper.server.ZooKeeperServer.processTxn(ZooKeeperServer.java:1026)
	at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:116)
	at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
	at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)

From that point on, all the indexing tasks start failing without any error or exception in the task logs or in any of Druid's logs.

I was digging into it for a few hours, but none of my changes seem to have any effect ... Any ideas?

Fangjin Yang

Aug 31, 2016, 8:35:29 PM
to Druid User
Hi Jakub, where are you seeing this error?

Are there no errors in the Druid overlord or task logs?

Jakub Liska

Sep 5, 2016, 12:04:01 PM
to Druid User
Hi Fangjin,

I'm using the implydata quickstart setup, which runs everything in a single container; this exception comes from its stdout, where all logs are forwarded, including Zookeeper's.

As I mentioned, there are no additional errors anywhere, not even in the task logs. But every time I saw this exception, the task that was executing at that moment ended up with FAILED status...

However, it stopped happening later; I guess it could have been caused by ingesting some "heavier" segments. Afterwards I was able to run 4 tasks concurrently without a problem...

So it's ok now.

Fangjin Yang

Sep 7, 2016, 5:33:42 PM
to Druid User
The imply quickstart should have dedicated logs for every Druid process, stored as files in per-process directories. For example, if a task fails, can you access the overlord console at http://localhost:8090/console.html, click on the task's log, and include it here?
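For readers who prefer the command line, the same task log can be pulled over the overlord's HTTP API. This is a hedged sketch: the task id below is a placeholder and the host/port assume the quickstart defaults, but `/druid/indexer/v1/completeTasks` and `/druid/indexer/v1/task/{taskId}/log` are the standard overlord endpoints.

```shell
# Sketch: fetching a failed task's log from the overlord HTTP API
# instead of clicking through the console.
OVERLORD="http://localhost:8090"
TASK_ID="index_hadoop_wikiticker_2016-08-31"   # placeholder: substitute your failed task's id

# List recently completed tasks to find the failed one:
#   curl -s "$OVERLORD/druid/indexer/v1/completeTasks"

# Download that task's full log to a file:
#   curl -s -o task.log "$OVERLORD/druid/indexer/v1/task/$TASK_ID/log"
echo "$OVERLORD/druid/indexer/v1/task/$TASK_ID/log"
```

The actual curl calls are commented out so the sketch runs without a live cluster; it just prints the log URL it would fetch.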