2013-06-27 11:41:43 worker [ERROR] Error when processing event
org.zeromq.ZMQException: Too many open files(0x18)
at org.zeromq.ZMQ$Socket.construct(Native Method)
at org.zeromq.ZMQ$Socket.<init>(ZMQ.java:886)
at org.zeromq.ZMQ$Context.socket(ZMQ.java:276)
at zilch.mq$socket.invoke(mq.clj:52)
at backtype.storm.messaging.zmq.ZMQContext.connect(zmq.clj:62)
at backtype.storm.daemon.worker$mk_refresh_connections$this__4289$iter__4296__4300$fn__4301.invoke(worker.clj:244)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.Cons.next(Cons.java:39)
at clojure.lang.RT.next(RT.java:587)
at clojure.core$next.invoke(core.clj:64)
at clojure.core$dorun.invoke(core.clj:2726)
at clojure.core$doall.invoke(core.clj:2741)
at backtype.storm.daemon.worker$mk_refresh_connections$this__4289.invoke(worker.clj:238)
at backtype.storm.daemon.worker$mk_refresh_connections$this__4289.invoke(worker.clj:218)
at backtype.storm.timer$mk_timer$fn__1820$fn__1821.invoke(timer.clj:33)
at backtype.storm.timer$mk_timer$fn__1820.invoke(timer.clj:26)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Thread.java:722)
2013-06-27 11:41:43 util [INFO] Halting process: ("Error when processing an event")
I suspect that the Storm worker's refresh-connections timer is causing the problem.
We are using the default task.refresh.poll.secs (10 sec). When a machine is loaded
with large task threads (12 workers), a timer thread for bolt1/bolt2 may not be able
to completely establish ZMQ connections to the 400+ bolt3 tasks before the next poll
fires. Another timer thread may then kick in, open a fresh batch of sockets, and
eventually cause "too many open files".
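To illustrate the failure mode I have in mind, here is a minimal Java sketch (not Storm's actual code; the fd limit of 1024 and the 400-peer count are assumptions taken from this thread and a typical default `ulimit -n`). Each "refresh pass" opens one socket per downstream peer; if an overlapping pass never closes the previous pass's sockets, the descriptor count blows past the limit after only a few ticks:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FdLeakSketch {
    // Simulated count of open file descriptors (stand-in for real ZMQ sockets).
    static final AtomicInteger openFds = new AtomicInteger(0);
    static final int FD_LIMIT = 1024;  // typical default `ulimit -n` (assumption)
    static final int PEERS = 400;      // ~400 downstream bolt3 tasks, per this thread

    // One refresh pass: open a connection to every peer. If the previous
    // pass's sockets are never closed (e.g. the pass was still in flight
    // when the next timer fired), descriptors accumulate.
    static void refreshConnections(boolean closePrevious) {
        if (closePrevious) openFds.set(0);  // well-behaved: release old sockets first
        openFds.addAndGet(PEERS);           // open one socket per peer
    }

    public static void main(String[] args) {
        // Overlapping refreshes that never release old sockets exhaust
        // the descriptor limit within a handful of timer ticks.
        int ticks = 0;
        while (openFds.get() <= FD_LIMIT) {
            refreshConnections(false);
            ticks++;
        }
        System.out.println("exhausted after " + ticks + " ticks");
    }
}
```

With these numbers the simulation crosses the limit on the third tick, which matches how quickly we see EMFILE after a worker starts thrashing.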
Nathan, do you see any other potential causes? I assume that we don't need to worry
about thread safety of ZMQ here.
Andy
--
You received this message because you are subscribed to the Google Groups "storm-user" group.