Storm stops processing tuples in local mode

Brad Heller

Sep 4, 2012, 6:51:47 PM
to storm...@googlegroups.com
Hello list,

This problem has rather suddenly manifested itself when running in local mode. Storm will be chugging along, then suddenly all processing stops. nextTuple does still appear to be getting called, but nothing is emitted by the spout and the bolts aren't emitting anything either.

Then, suddenly, ZooKeeper exceptions appear!

577557 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN  org.apache.zookeeper.server.NIOServerCnxn  - EndOfStreamException: Unable to read additional data from client sessionid 0x139936a474c0007, likely client has closed socket
578673 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN  org.apache.zookeeper.server.NIOServerCnxn  - EndOfStreamException: Unable to read additional data from client sessionid 0x139936a474c0003, likely client has closed socket
579304 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager  - State change: SUSPENDED
579304 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager  - State change: SUSPENDED
579327 [ConnectionStateManager-0] WARN  com.netflix.curator.framework.state.ConnectionStateManager  - There are no ConnectionStateListeners registered.
579327 [ConnectionStateManager-0] WARN  com.netflix.curator.framework.state.ConnectionStateManager  - There are no ConnectionStateListeners registered.
579962 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN  org.apache.zookeeper.server.NIOServerCnxn  - EndOfStreamException: Unable to read additional data from client sessionid 0x139936a474c0005, likely client has closed socket
579962 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager  - State change: SUSPENDED
579962 [ConnectionStateManager-0] WARN  com.netflix.curator.framework.state.ConnectionStateManager  - There are no ConnectionStateListeners registered.
580496 [SyncThread:0] ERROR org.apache.zookeeper.server.NIOServerCnxn  - Unexpected Exception:.
java.nio.channels.CancelledKeyException
  at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
  at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
  at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
  at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
  at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:171)
  at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:161)
  at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98)
581129 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager  - State change: SUSPENDED
580497 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN  org.apache.zookeeper.server.NIOServerCnxn  - Exception causing close of session 0x139936a474c0001 due to java.io.IOException: Connection reset by peer

Is it possible that Storm can overwhelm ZooKeeper? I *did* adjust parallelism on some bolts…but I wouldn't think that could break anything.

Thanks,

Brad Heller | Engineering Lead | Cloudability.com | 541-231-1514 | Skype: brad.heller | @bradhe | @cloudability


Nathan Marz

Sep 5, 2012, 1:05:58 AM
to storm...@googlegroups.com
How long are you running it in local mode before this happens?
--
Twitter: @nathanmarz
http://nathanmarz.com

Brad Heller

Sep 5, 2012, 1:09:43 AM
to storm...@googlegroups.com
It's a bit random. A few hundred tuples are processed...but sometimes it doesn't happen at all.

I don't have a huge amount of parallelism in my topology. 4 bolts have 10 threads each, 3 have about 5 threads each, and all the other tasks have the default setting (1 I think?).
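For reference, a layout like that corresponds to the parallelism hint passed to setBolt. A minimal sketch, with hypothetical spout/bolt names, using the string component ids of Storm 0.8 (0.7.x used integer ids):

import backtype.storm.topology.TopologyBuilder;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("events", new EventSpout());    // default parallelism: 1 thread
builder.setBolt("heavy", new HeavyBolt(), 10)    // one of the 10-thread bolts
       .shuffleGrouping("events");
builder.setBolt("light", new LightBolt(), 5)     // one of the ~5-thread bolts
       .shuffleGrouping("heavy");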

Sent with my thumbs

Nathan Marz

Sep 5, 2012, 1:11:34 AM
to storm...@googlegroups.com
OK. So this is just a couple of seconds?

Nathan Marz

Sep 5, 2012, 1:11:55 AM
to storm...@googlegroups.com
Also, what version of Storm?

Brad Heller

Sep 5, 2012, 1:12:43 AM
to storm...@googlegroups.com
Eeeyup, pretty quick when it occurs. Seconds to perhaps a minute or two. Storm 0.7.4.

Sent with my thumbs

Nathan Marz

Sep 5, 2012, 1:14:39 AM
to storm...@googlegroups.com
Moving to 0.8.0 might help, as it puts far less load on ZK and also spawns fewer threads.
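A common local-mode mitigation (not a fix) is to cap task parallelism in the topology config, which keeps the thread count and ZK chatter down. A minimal sketch, assuming a TopologyBuilder named builder and an arbitrarily chosen cap:

import backtype.storm.Config;
import backtype.storm.LocalCluster;

Config conf = new Config();
conf.setMaxTaskParallelism(3);   // cap every component at 3 threads for this run

LocalCluster cluster = new LocalCluster();
cluster.submitTopology("test-topology", conf, builder.createTopology());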

Moshe Bixenshpaner

Oct 13, 2012, 3:43:31 AM
to storm...@googlegroups.com
This also happens to me from time to time (inconsistently), and when it does, it's after a few minutes.
I'm using Storm 0.8.1. I have two topologies running in the same LocalCluster: one creates 2 threads, and the other creates 42 threads.
I have a quad-core (8 threads) processor with 8GB of RAM.
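For reference, a minimal sketch of that setup, with hypothetical names; both topologies run in one JVM and share the single in-process ZooKeeper that LocalCluster starts, so they compete for it:

import backtype.storm.Config;
import backtype.storm.LocalCluster;

// smallBuilder and largeBuilder are TopologyBuilder instances wired up elsewhere.
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("small", new Config(), smallBuilder.createTopology());   // ~2 threads
cluster.submitTopology("large", new Config(), largeBuilder.createTopology());   // ~42 threads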

Naresh Kosgi

Nov 2, 2012, 4:24:04 PM
to storm-user
Brad,

Were you able to solve this problem? I am currently having the same issue.

Thanks,
Naresh

Richard Chen

Apr 2, 2013, 2:05:07 AM
to storm...@googlegroups.com, br...@cloudability.com
Has anyone solved the problem?


I'm getting exactly the same issue with my topology.

My machine has 16 GB of RAM, and I have 16 bolts with around 30,000 threads.

After processing a few hundred tuples, the Storm warnings and my results come out mixed together:

The minimum distance=13.04 [count:97]: GPS Point falls into Road No. :383816
The minimum distance=21.87
The minimum distance=26.28
The minimum distance=14.38
The minimum distance=35.49
The minimum distance=3.898 [count:98]: GPS Point falls into Road No. :53565
The minimum distance=41.31
The minimum distance=64.25 165392 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
165392 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
165393 [main-EventThread] WARN  backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
165393 [ConnectionStateManager-0] WARN  com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
165393 [main-EventThread] WARN  backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
165393 [ConnectionStateManager-0] WARN  com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
176156 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager - State change: RECONNECTED
176156 [ConnectionStateManager-0] WARN  com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
176157 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager - State change: RECONNECTED
176157 [ConnectionStateManager-0] WARN  com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.

The minimum distance=32.72
The minimum distance=38.74
The minimum distance=4.679 [count:99]: GPS Point falls into Road No. :13619196298 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
196298 [main-EventThread] INFO  com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
196298 [main-EventThread] WARN  backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
196298 [ConnectionStateManager-0] WARN  com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
196298 [main-EventThread] WARN  backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
196298 [ConnectionStateManager-0] WARN  com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.

The minimum distance=96.80


Hope someone can help?

Richard

aboubakr

Jun 21, 2013, 9:53:20 AM
to storm...@googlegroups.com, br...@cloudability.com
Did anyone solve this problem? I'm having the same problem: I want to process a large amount of data, and it hangs in local cluster mode. Please help.

Jiaqi Liu

Oct 18, 2013, 5:46:11 AM
to storm...@googlegroups.com, br...@cloudability.com
Has this problem been solved?
I am getting the same problem!
My Storm version is 0.8.1.


On Wednesday, September 5, 2012 at 6:51:53 AM UTC+8, Brad Heller wrote:

Roberto Coluccio

Nov 25, 2013, 9:39:24 PM
to storm...@googlegroups.com, br...@cloudability.com

Storm 0.8.2, SAME PROBLEM! It's driving me crazy... PLEASE, has anybody solved this problem??

Ilidio Gomes

Nov 29, 2013, 12:50:12 PM
to storm...@googlegroups.com, br...@cloudability.com
Hello,

I am using Storm 0.8.2 and I got the same error.
I launch one topology (not in local mode), and after some hours this error happens:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:72)
        at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:74)
        at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:353)
        at com.netflix.curator.framework.imps.BackgroundSyncImpl.performBackgroundOperation(BackgroundSyncImpl.java:39)
        at com.netflix.curator.framework.imps.OperationAndData.callPerformBackgroundOperation(OperationAndData.java:40)
        at com.netflix.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:547)
        at com.netflix.curator.framework.imps.CuratorFrameworkImpl.access$200(CuratorFrameworkImpl.java:50)
        at com.netflix.curator.framework.imps.CuratorFrameworkImpl$2.call(CuratorFrameworkImpl.java:177)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
2013-11-29 17:11:16 ConnectionState [ERROR] Connection timed out

In Kafka (0.7.2), I got this error:
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 6000
        at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)
        at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
        at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:84)
        at kafka.producer.ZKBrokerPartitionInfo.<init>(ZKBrokerPartitionInfo.scala:63)
        at kafka.producer.Producer.<init>(Producer.scala:53)
        at kafka.javaapi.producer.Producer.<init>(Producer.scala:33)
        at kafka.javaapi.producer.Producer.<init>(Producer.scala:40)

I don't know if the error is in ZooKeeper itself, or in ZooKeeper because of Kafka!
I'm using 5 ZooKeeper servers (3.4.5).
I am concerned about this, because the error happens randomly and I don't know what else I can do.
Another thing: I'm also using storm-signals, and they work nicely, but after a while the signals are not delivered to the topology. I think it may be related to the same problem...
Does anyone have any suggestions?

Regards,
Ilídio
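On the Kafka side, the 6000 in that ZkTimeoutException looks like the producer's default ZooKeeper connection timeout. If I'm remembering the 0.7-era property names correctly, it can be raised in the producer config; a sketch, with the ZK address assumed:

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.ProducerConfig;

Properties props = new Properties();
props.put("zk.connect", "localhost:2181");        // assumed ZK ensemble address
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("zk.connectiontimeout.ms", "30000");    // raise from the 6000 ms default
Producer<String, String> producer =
        new Producer<String, String>(new ProducerConfig(props));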

Patricio Echagüe

Nov 29, 2013, 12:55:49 PM
to storm-user, br...@cloudability.com

Is your ZooKeeper on another box or colocated with Storm?

Also, you can check the CPU and load on ZK to make sure it's not starving for resources.

Are you running in EC2?

Sent from my Nexus 4.


Ilidio Gomes

Nov 29, 2013, 1:04:39 PM
to storm...@googlegroups.com
Hello Patricio Echagüe,
Thanks for the fast reply.
The ZooKeeper is colocated with Storm.
The cluster is simulated on one machine, pointing at 5 ZK servers, but in reality they are all on the same machine. And that machine always shows low CPU usage and load.
 
Regards,
Ilídio

Patricio Echagüe

Nov 29, 2013, 2:56:58 PM
to storm-user

5 ZK servers in one box? That is definitely your problem.

ZK is synchronous on disk, so your 5 instances are competing for I/O. Use 1 ZK server, or 3 but with an increased tick timeout.

Sent from my Nexus 4.
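On the Storm side of the same idea, the client's ZooKeeper timeouts can also be raised from the topology config; a minimal sketch, assuming the Storm 0.8 Config keys:

import backtype.storm.Config;

Config conf = new Config();
// Give the ZK session more breathing room before it is declared dead.
conf.put(Config.STORM_ZOOKEEPER_SESSION_TIMEOUT, 30000);     // ms; default is 20000
conf.put(Config.STORM_ZOOKEEPER_CONNECTION_TIMEOUT, 30000);  // ms; default is 15000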

Ilidio Gomes

Dec 2, 2013, 5:13:01 AM
to storm...@googlegroups.com
Hi,
 
I did what you said, and for now the topology is running without errors, and all the storm-signals that I send are delivered to the topology.

I'm using kafka-spout as my spout, and in the Storm UI the kafka-spout shows lots of failed tuples. I can't find any error in the logs. Where can I check why these failures happen?

I'm confused about "Emitted", "Transferred", "Acked", "Failed"... I'm reading N events from CSV files into Kafka, and at the moment I don't know when they finish, because "Emitted" keeps increasing... If I send 1000 events to Kafka, how can I check whether all 1000 events were processed from Kafka and sent to the bolts?
 
Thank you
 
Regards,
Ilídio
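About the counts: Emitted is every tuple a component emits, Transferred counts tuples actually sent to other tasks, and Acked counts tuples whose whole tuple tree completed; Failed usually means a tuple wasn't acked before the message timeout, so failures with no errors in the logs are often just timeouts, or a bolt that never acks. A minimal sketch of a bolt that anchors and acks (class and field names hypothetical):

import java.util.Map;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class AckingBolt extends BaseRichBolt {
    private OutputCollector collector;

    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    public void execute(Tuple input) {
        // Anchor the outgoing tuple to the input so the tuple tree is tracked...
        collector.emit(input, new Values(input.getValue(0)));
        // ...and ack it; a tuple that is never acked times out and counts as Failed.
        collector.ack(input);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("event"));
    }
}

If the bolts do ack, raising the timeout with conf.setMessageTimeoutSecs(60) (the default is 30 seconds) gives slow tuples more time before they are counted as Failed; and for the 1000-event check, comparing the spout's Acked count against the number of events sent is the usual approach.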
 
 
[Attachment: kafka-spout-failed.png]