Storm stops processing tuples in local mode

Brad Heller

Sep 4, 2012, 6:51:47 PM
to storm...@googlegroups.com
Hello list,

This problem has rather suddenly manifested itself when running in local mode. Storm will be chugging along, then suddenly all processing stops. nextTuple does still appear to be getting called, but nothing is emitted by the spout and the bolts aren't emitting anything either.

Then, suddenly, ZooKeeper exceptions appear!

577557 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN  org.apache.zookeeper.server.NIOServerCnxn  - EndOfStreamException: Unable to read additional data from client sessionid 0x139936a474c0007, likely client has closed socket
578673 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN  org.apache.zookeeper.server.NIOServerCnxn  - EndOfStreamException: Unable to read additional data from client sessionid 0x139936a474c0003, likely client has closed socket
579304 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager  - State change: SUSPENDED
579304 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager  - State change: SUSPENDED
579327 [ConnectionStateManager-0] WARN  com.netflix.curator.framework.state.ConnectionStateManager  - There are no ConnectionStateListeners registered.
579327 [ConnectionStateManager-0] WARN  com.netflix.curator.framework.state.ConnectionStateManager  - There are no ConnectionStateListeners registered.
579962 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN  org.apache.zookeeper.server.NIOServerCnxn  - EndOfStreamException: Unable to read additional data from client sessionid 0x139936a474c0005, likely client has closed socket
579962 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager  - State change: SUSPENDED
579962 [ConnectionStateManager-0] WARN  com.netflix.curator.framework.state.ConnectionStateManager  - There are no ConnectionStateListeners registered.
580496 [SyncThread:0] ERROR org.apache.zookeeper.server.NIOServerCnxn  - Unexpected Exception:.
java.nio.channels.CancelledKeyException
  at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
  at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
  at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
  at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
  at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:171)
  at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:161)
  at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98)
581129 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager  - State change: SUSPENDED
580497 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN  org.apache.zookeeper.server.NIOServerCnxn  - Exception causing close of session 0x139936a474c0001 due to java.io.IOException: Connection reset by peer

Is it possible that Storm can overwhelm ZooKeeper? I *did* adjust parallelism on some bolts…but I wouldn't think that could break anything.

Thanks,

Brad Heller | Engineering Lead | Cloudability.com | 541-231-1514 | Skype: brad.heller | @bradhe | @cloudability


Nathan Marz

Sep 5, 2012, 1:05:58 AM
to storm...@googlegroups.com
How long are you running it in local mode before this happens?
--
Twitter: @nathanmarz
http://nathanmarz.com

Brad Heller

Sep 5, 2012, 1:09:43 AM
to storm...@googlegroups.com
It's a bit random. A few hundred tuples are processed...but sometimes it doesn't happen at all.

I don't have a huge amount of parallelism in my topology. 4 bolts have 10 threads each, 3 have about 5 threads each, and all the other tasks have the default setting (1 I think?).
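For reference, a layout like that corresponds to the parallelism hint passed to setBolt. A minimal sketch, with hypothetical spout/bolt names, using the string component ids of Storm 0.8 (0.7.x used integer ids):

import backtype.storm.topology.TopologyBuilder;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("events", new EventSpout());    // default parallelism: 1 thread
builder.setBolt("heavy", new HeavyBolt(), 10)    // one of the 10-thread bolts
       .shuffleGrouping("events");
builder.setBolt("light", new LightBolt(), 5)     // one of the ~5-thread bolts
       .shuffleGrouping("heavy");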

Sent with my thumbs

Nathan Marz

Sep 5, 2012, 1:11:34 AM
to storm...@googlegroups.com
OK. So this is just a couple of seconds?

Nathan Marz

Sep 5, 2012, 1:11:55 AM
to storm...@googlegroups.com
Also, what version of Storm?

Brad Heller

Sep 5, 2012, 1:12:43 AM
to storm...@googlegroups.com
Eeeyup, pretty quick when it occurs. Seconds to perhaps a minute or two. Storm 0.7.4.

Sent with my thumbs

Nathan Marz

Sep 5, 2012, 1:14:39 AM
to storm...@googlegroups.com
Moving to 0.8.0 might help, as it puts far less load on ZK and also spawns fewer threads.
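A common local-mode mitigation (not a fix) is to cap task parallelism in the topology config, which keeps the thread count and ZK chatter down. A minimal sketch, assuming a TopologyBuilder named builder and an arbitrarily chosen cap:

import backtype.storm.Config;
import backtype.storm.LocalCluster;

Config conf = new Config();
conf.setMaxTaskParallelism(3);   // cap every component at 3 threads for this run

LocalCluster cluster = new LocalCluster();
cluster.submitTopology("test-topology", conf, builder.createTopology());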

Moshe Bixenshpaner

Oct 13, 2012, 3:43:31 AM
to storm...@googlegroups.com
This also happens to me from time to time (inconsistently), and when it does, it's after a few minutes.
I'm using Storm 0.8.1. I have two topologies running in the same LocalCluster: one creates 2 threads, and the other creates 42 threads.
I have a quad-core (8 threads) processor with 8GB of RAM.
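For reference, a minimal sketch of that setup, with hypothetical names; both topologies run in one JVM and share the single in-process ZooKeeper that LocalCluster starts, so they compete for it:

import backtype.storm.Config;
import backtype.storm.LocalCluster;

// smallBuilder and largeBuilder are TopologyBuilder instances wired up elsewhere.
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("small", new Config(), smallBuilder.createTopology());   // ~2 threads
cluster.submitTopology("large", new Config(), largeBuilder.createTopology());   // ~42 threads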

Naresh Kosgi

Nov 2, 2012, 4:24:04 PM
to storm-user
Brad,

Were you able to solve this problem? I am currently having the same issue.

Thanks,
Naresh

Richard Chen

Apr 2, 2013, 2:05:07 AM
to storm...@googlegroups.com, br...@cloudability.com
Has anyone solved the problem?


I'm getting exactly the same issue with my topology.

My machine has 16 GB of RAM, and I have 16 bolts with around 30,000 threads.

After processing a few hundred tuples, the Storm warnings and my results come out mixed together:

The minimum distance=13.04 [count:97]: GPS Point falls into Road No. :383816
The minimum distance=21.87
The minimum distance=26.28
The minimum distance=14.38
The minimum distance=35.49
The minimum distance=3.898 [count:98]: GPS Point falls into Road No. :53565
The minimum distance=41.31
The minimum distance=64.25 165392 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
165392 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
165393 [main-EventThread] WARN  backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
165393 [ConnectionStateManager-0] WARN  com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
165393 [main-EventThread] WARN  backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
165393 [ConnectionStateManager-0] WARN  com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
176156 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager - State change: RECONNECTED
176156 [ConnectionStateManager-0] WARN  com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
176157 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager - State change: RECONNECTED
176157 [ConnectionStateManager-0] WARN  com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.

The minimum distance=32.72
The minimum distance=38.74
The minimum distance=4.679 [count:99]: GPS Point falls into Road No. :13619196298 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
196298 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
196298 [main-EventThread] WARN  backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
196298 [ConnectionStateManager-0] WARN  com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
196298 [main-EventThread] WARN  backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
196298 [ConnectionStateManager-0] WARN  com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.

The minimum distance=96.80


Hope someone can help?

Richard

aboubakr

Jun 21, 2013, 9:53:20 AM
to storm...@googlegroups.com, br...@cloudability.com
Did anyone solve this problem? I'm having the same problem: I want to process a large amount of data, and it hangs in local cluster mode. Please help.

Jiaqi Liu

Oct 18, 2013, 5:46:11 AM
to storm...@googlegroups.com, br...@cloudability.com
Has this problem been solved?
I am getting the same problem!
My Storm version is 0.8.1.


On Wednesday, September 5, 2012 at 6:51:53 AM UTC+8, Brad Heller wrote:

Roberto Coluccio

Nov 25, 2013, 9:39:24 PM
to storm...@googlegroups.com, br...@cloudability.com

Storm 0.8.2, SAME PROBLEM! It's driving me crazy... PLEASE, has anybody solved this problem??

Ilidio Gomes

Nov 29, 2013, 12:50:12 PM
to storm...@googlegroups.com, br...@cloudability.com
Hello,

I am using Storm 0.8.2 and I got the same error.
I launch one topology (not in local mode), and after some hours this error happens:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:72)
        at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:74)
        at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:353)
        at com.netflix.curator.framework.imps.BackgroundSyncImpl.performBackgroundOperation(BackgroundSyncImpl.java:39)
        at com.netflix.curator.framework.imps.OperationAndData.callPerformBackgroundOperation(OperationAndData.java:40)
        at com.netflix.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:547)
        at com.netflix.curator.framework.imps.CuratorFrameworkImpl.access$200(CuratorFrameworkImpl.java:50)
        at com.netflix.curator.framework.imps.CuratorFrameworkImpl$2.call(CuratorFrameworkImpl.java:177)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
2013-11-29 17:11:16 ConnectionState [ERROR] Connection timed out

In Kafka (0.7.2), I got this error:
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 6000
        at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)
        at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
        at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:84)
        at kafka.producer.ZKBrokerPartitionInfo.<init>(ZKBrokerPartitionInfo.scala:63)
        at kafka.producer.Producer.<init>(Producer.scala:53)
        at kafka.javaapi.producer.Producer.<init>(Producer.scala:33)
        at kafka.javaapi.producer.Producer.<init>(Producer.scala:40)

I don't know if the error is in ZooKeeper itself, or in ZooKeeper because of Kafka!
I'm using 5 ZooKeeper servers (3.4.5).
I am concerned about this, because the error happens randomly and I don't know what else I can do.
Another thing: I'm also using storm-signals, and they work nicely, but after a while the signals are not delivered to the topology. I think it may be related to the same problem...
Does anyone have any suggestions?

Regards,
Ilídio
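On the Kafka side, the 6000 in that ZkTimeoutException looks like the producer's default ZooKeeper connection timeout. If I'm remembering the 0.7-era property names correctly, it can be raised in the producer config; a sketch, with the ZK address assumed:

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.ProducerConfig;

Properties props = new Properties();
props.put("zk.connect", "localhost:2181");        // assumed ZK ensemble address
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("zk.connectiontimeout.ms", "30000");    // raise from the 6000 ms default
Producer<String, String> producer =
        new Producer<String, String>(new ProducerConfig(props));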

Patricio Echagüe

Nov 29, 2013, 12:55:49 PM
to storm-user, br...@cloudability.com

Is your ZooKeeper on another box or colocated with Storm?

Also, you can check the CPU and load on ZK to make sure it's not starving for resources.

Are you running in EC2?

Sent from my Nexus 4.


Ilidio Gomes

Nov 29, 2013, 1:04:39 PM
to storm...@googlegroups.com
Hello Patricio Echagüe,
Thanks for the fast reply.
The ZooKeeper is colocated with Storm.
The cluster is simulated on one machine, pointing at 5 ZK servers, but in reality they are all on the same machine. And that machine always shows low CPU usage and load.
 
Regards,
Ilídio

Patricio Echagüe

Nov 29, 2013, 2:56:58 PM
to storm-user

5 ZK servers in one box? That is definitely your problem.

ZK is synchronous on disk, so your 5 instances are competing for I/O. Use 1 ZK server, or 3 but with an increased tick timeout.

Sent from my Nexus 4.
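On the Storm side of the same idea, the client's ZooKeeper timeouts can also be raised from the topology config; a minimal sketch, assuming the Storm 0.8 Config keys:

import backtype.storm.Config;

Config conf = new Config();
// Give the ZK session more breathing room before it is declared dead.
conf.put(Config.STORM_ZOOKEEPER_SESSION_TIMEOUT, 30000);     // ms; default is 20000
conf.put(Config.STORM_ZOOKEEPER_CONNECTION_TIMEOUT, 30000);  // ms; default is 15000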

Ilidio Gomes

Dec 2, 2013, 5:13:01 AM
to storm...@googlegroups.com
Hi,
 
I did what you said, and for now the topology is running without errors, and all the storm-signals that I send are delivered to the topology.

I'm using kafka-spout as my spout, and in the Storm UI the kafka-spout shows lots of failed tuples. I can't find any error in the logs. Where can I check why these failures happen?

I'm confused about "Emitted", "Transferred", "Acked", "Failed"... I'm reading N events from CSV files into Kafka, and at the moment I don't know when they finish, because "Emitted" keeps increasing... If I send 1000 events to Kafka, how can I check whether all 1000 events were processed from Kafka and sent to the bolts?
 
Thank you
 
Regards,
Ilídio
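About the counts: Emitted is every tuple a component emits, Transferred counts tuples actually sent to other tasks, and Acked counts tuples whose whole tuple tree completed; Failed usually means a tuple wasn't acked before the message timeout, so failures with no errors in the logs are often just timeouts, or a bolt that never acks. A minimal sketch of a bolt that anchors and acks (class and field names hypothetical):

import java.util.Map;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class AckingBolt extends BaseRichBolt {
    private OutputCollector collector;

    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    public void execute(Tuple input) {
        // Anchor the outgoing tuple to the input so the tuple tree is tracked...
        collector.emit(input, new Values(input.getValue(0)));
        // ...and ack it; a tuple that is never acked times out and counts as Failed.
        collector.ack(input);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("event"));
    }
}

If the bolts do ack, raising the timeout with conf.setMessageTimeoutSecs(60) (the default is 30 seconds) gives slow tuples more time before they are counted as Failed; and for the 1000-event check, comparing the spout's Acked count against the number of events sent is the usual approach.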
 
 
[Attachment: kafka-spout-failed.png]