Nathan,
We have now reproduced it in our local dev environment on Windows.
The stack trace is somewhat different, but the symptoms are exactly
the same. Maybe the new stack trace will give you some ideas?
2012-02-23 19:24:17,242 INFO [main|] backtype.storm.daemon.supervisor
=> Shutting down supervisor caf34abc-db32-45fb-aaad-1bad05fd2517
2012-02-23 19:24:17,242 INFO [Thread-11|] backtype.storm.util
=> Async loop interrupted!
2012-02-23 19:24:17,243 INFO [Thread-12|] backtype.storm.util
=> Async loop interrupted!
2012-02-23 19:24:17,243 INFO [Thread-13|] backtype.storm.util
=> Async loop interrupted!
2012-02-23 19:24:17,243 INFO [Thread-9|] backtype.storm.event
=> Event manager interrupted
2012-02-23 19:24:17,246 INFO [Thread-10|] backtype.storm.daemon.supervisor
=> Launching worker with assignment
#:backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
"testETL-1-1330043033", :task-ids (34 4 40 10 46 16 52 22 28)} for
this supervisor caf34abc-db32-45fb-aaad-1bad05fd2517 on port 2 with id
7770f731-256e-430f-a21b-20f2e4c9a090
2012-02-23 19:24:17,246 INFO [Thread-10|] backtype.storm.daemon.worker
=> Launching worker for testETL-1-1330043033 on
caf34abc-db32-45fb-aaad-1bad05fd2517:2 with id
7770f731-256e-430f-a21b-20f2e4c9a090 and
conf {"topology.fall.back.on.java.serialization" true,
"zmq.linger.millis" 0, "topology.skip.missing.kryo.registrations"
true, "ui.childopts" "-Xmx768m", "storm.zookeeper.session.timeout"
20000, "nimbus.reassign" true, "nimbus.monitor.freq.secs" 10,
"java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
"storm.local.dir"
"C:\\cygwin\\tmp\\/d813eb61-2f81-4922-b10c-bba2078c58ed",
"supervisor.worker.start.timeout.secs" 240,
"nimbus.cleanup.inbox.freq.secs" 600,
"nimbus.inbox.jar.expiration.secs" 3600, "storm.zookeeper.port" 2181,
"transactional.zookeeper.port" nil, "transactional.zookeeper.servers"
nil, "storm.zookeeper.root" "/storm", "supervisor.enable" true,
"storm.zookeeper.servers" ["localhost"],
"transactional.zookeeper.root" "/transactional",
"topology.worker.childopts" nil, "worker.childopts" "-Xmx768m",
"supervisor.heartbeat.frequency.secs" 5, "drpc.port" 3772,
"supervisor.monitor.frequency.secs" 3, "task.heartbeat.frequency.secs"
3, "topology.max.spout.pending" nil, "storm.zookeeper.retry.interval"
1000, "supervisor.slots.ports" (1 2 3), "topology.debug" false,
"nimbus.task.launch.secs" 120, "nimbus.supervisor.timeout.secs" 60,
"topology.message.timeout.secs" 30, "task.refresh.poll.secs" 10,
"topology.workers" 1, "supervisor.childopts" "-Xmx1024m",
"nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
"worker.heartbeat.frequency.secs" 1, "nimbus.task.timeout.secs" 30,
"drpc.invocations.port" 3773, "zmq.threads" 1,
"storm.zookeeper.retry.times" 5,
"topology.state.synchronization.timeout.secs" 60,
"supervisor.worker.timeout.secs" 30,
"nimbus.file.copy.expiration.secs" 600, "storm.local.mode.zmq" false,
"ui.port" 8080, "nimbus.childopts" "-Xmx1024m", "topology.ackers" 1,
"storm.cluster.mode" "local", "topology.optimize" true,
"topology.max.task.parallelism" nil}
2012-02-23 19:24:17,248 ERROR [Thread-10|] backtype.storm.zookeeper
=> Unrecoverable Zookeeper error, halting process: Background
exception was not retry-able or retry gave up
java.io.IOException: Unable to establish loopback connection
at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:106)
at java.security.AccessController.doPrivileged(Native Method)
at sun.nio.ch.PipeImpl.<init>(PipeImpl.java:122)
at sun.nio.ch.SelectorProviderImpl.openPipe(SelectorProviderImpl.java:27)
at java.nio.channels.Pipe.open(Pipe.java:133)
at sun.nio.ch.WindowsSelectorImpl.<init>(WindowsSelectorImpl.java:104)
at sun.nio.ch.WindowsSelectorProvider.openSelector(WindowsSelectorProvider.java:26)
at java.nio.channels.Selector.open(Selector.java:209)
at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:160)
at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:331)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:377)
at com.netflix.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:72)
at com.netflix.curator.HandleHolder.getZooKeeper(HandleHolder.java:46)
at com.netflix.curator.ConnectionState.reset(ConnectionState.java:122)
at com.netflix.curator.ConnectionState.start(ConnectionState.java:95)
at com.netflix.curator.CuratorZookeeperClient.start(CuratorZookeeperClient.java:137)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.start(CuratorFrameworkImpl.java:167)
at backtype.storm.zookeeper$mk_client.doInvoke(zookeeper.clj:53)
at clojure.lang.RestFn.invoke(RestFn.java:445)
at backtype.storm.cluster$mk_distributed_cluster_state.invoke(cluster.clj:24)
at backtype.storm.daemon.worker$fn__3630$exec_fn__1021__auto____3631.invoke(worker.clj:74)
Caused by: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:511)
at java.nio.channels.SocketChannel.open(SocketChannel.java:146)
at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:78)
... 39 more
2012-02-23 19:24:17,250 INFO [Thread-10|] backtype.storm.util
=> Halting process: ("Unrecoverable Zookeeper error")
2012-02-23 19:24:17,250 ERROR [Thread-10|] backtype.storm.daemon.worker
=> Error on initialization of server mk-worker
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at backtype.storm.util$halt_process_BANG_.doInvoke(util.clj:157)
at clojure.lang.RestFn.invoke(RestFn.java:423)
at backtype.storm.zookeeper$mk_client$reify__1613.unhandledError(zookeeper.clj:52)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl$4.apply(CuratorFrameworkImpl.java:430)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl$4.apply(CuratorFrameworkImpl.java:426)
at com.netflix.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:80)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.logError(CuratorFrameworkImpl.java:423)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.handleBackgroundOperationException(CuratorFrameworkImpl.java:512)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.start(CuratorFrameworkImpl.java:185)
at backtype.storm.zookeeper$mk_client.doInvoke(zookeeper.clj:53)
at clojure.lang.RestFn.invoke(RestFn.java:445)
at backtype.storm.cluster$mk_distributed_cluster_state.invoke(cluster.clj:24)
at backtype.storm.daemon.worker$fn__3630$exec_fn__1021__auto____3631.invoke(worker.clj:74)
2012-02-23 19:24:17,250 INFO [Thread-10|] backtype.storm.util
=> Halting process: ("Error on initialization")
=============================
Thank you so much!