* I'm now running the zookeeper from CDH4, and have built out and distributed the zeromq and jzmq versions recommended in the wiki.
* I've rebuilt the nodes from kickstart to shake out any residual cruft.
* I've run it both as an unprivileged storm user with open permissions on the local dir and as root.
* I've explicitly disabled IPv6 via the childopts, and set the nimbus and zookeeper config to use IPs rather than hostnames to eliminate DNS issues.
* I built RPMs and init scripts for Storm, but to eliminate those as a potential source of issues, I rebuilt the whole 3-node cluster by hand from the binary distro and got the same problems.
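For reference, the IPv6 and IP-address settings described in the list above would look roughly like the following in storm.yaml. This is only a sketch, not the actual file from the cluster: the 10.0.0.x addresses and the /var/storm path are placeholders, the heap sizes are copied from the defaults visible in the log below, and passing -Djava.net.preferIPv4Stack=true is assumed to be how IPv6 was disabled.

```yaml
# Hypothetical storm.yaml sketch; addresses and paths are placeholders.
storm.zookeeper.servers:
  - "10.0.0.11"   # placeholder IP, used instead of a hostname to rule out DNS
  - "10.0.0.12"
  - "10.0.0.13"
nimbus.host: "10.0.0.11"        # placeholder IP of the nimbus node
storm.local.dir: "/var/storm"   # assumed path; must be writable by the storm user

# Assumed method of disabling IPv6: force each JVM onto the IPv4 stack.
nimbus.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
supervisor.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
worker.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
```

Settings like these only take effect for daemons and clients that actually read this storm.yaml, so it is worth comparing them against the conf map each daemon prints at startup.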
I'll be at Velocity next week. Happy to buy someone some beer and food for some advice.
(ro...@storm01.nydc1:~)# storm jar /outbrain/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar storm.starter.ExclamationTopology
Running: java -client -Dstorm.home=/opt/storm-0.7.3 -Dstorm.options= -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dstorm.jar=/outbrain/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar -cp /opt/storm-0.7.3/storm-0.7.3.jar:/opt/storm-0.7.3/lib/asm-3.2.jar:/opt/storm-0.7.3/lib/commons-fileupload-1.2.1.jar:/opt/storm-0.7.3/lib/zookeeper-3.3.3.jar:/opt/storm-0.7.3/lib/commons-logging-1.1.1.jar:/opt/storm-0.7.3/lib/tools.logging-0.2.3.jar:/opt/storm-0.7.3/lib/tools.cli-0.2.1.jar:/opt/storm-0.7.3/lib/core.incubator-0.1.0.jar:/opt/storm-0.7.3/lib/math.numeric-tower-0.0.1.jar:/opt/storm-0.7.3/lib/clj-time-0.4.1.jar:/opt/storm-0.7.3/lib/slf4j-api-1.5.8.jar:/opt/storm-0.7.3/lib/tools.macro-0.1.0.jar:/opt/storm-0.7.3/lib/servlet-api-2.5.jar:/opt/storm-0.7.3/lib/log4j-1.2.16.jar:/opt/storm-0.7.3/lib/jetty-6.1.26.jar:/opt/storm-0.7.3/lib/guava-10.0.1.jar:/opt/storm-0.7.3/lib/jsr305-1.3.9.jar:/opt/storm-0.7.3/lib/jzmq-2.1.0.jar:/opt/storm-0.7.3/lib/slf4j-log4j12-1.5.8.jar:/opt/storm-0.7.3/lib/hiccup-0.3.6.jar:/opt/storm-0.7.3/lib/commons-lang-2.5.jar:/opt/storm-0.7.3/lib/servlet-api-2.5-20081211.jar:/opt/storm-0.7.3/lib/curator-client-1.0.1.jar:/opt/storm-0.7.3/lib/curator-framework-1.0.1.jar:/opt/storm-0.7.3/lib/commons-exec-1.1.jar:/opt/storm-0.7.3/lib/commons-codec-1.4.jar:/opt/storm-0.7.3/lib/joda-time-2.0.jar:/opt/storm-0.7.3/lib/reflectasm-1.01.jar:/opt/storm-0.7.3/lib/jline-0.9.94.jar:/opt/storm-0.7.3/lib/ring-core-0.3.10.jar:/opt/storm-0.7.3/lib/minlog-1.2.jar:/opt/storm-0.7.3/lib/kryo-1.04.jar:/opt/storm-0.7.3/lib/commons-io-1.4.jar:/opt/storm-0.7.3/lib/snakeyaml-1.9.jar:/opt/storm-0.7.3/lib/carbonite-1.0.1.jar:/opt/storm-0.7.3/lib/libthrift7-0.7.0.jar:/opt/storm-0.7.3/lib/ring-jetty-adapter-0.3.11.jar:/opt/storm-0.7.3/lib/compojure-0.6.4.jar:/opt/storm-0.7.3/lib/jetty-util-6.1.26.jar:/opt/storm-0.7.3/lib/httpcore-4.1.jar:/opt/storm-0.7.3/lib/httpclient-4.1.1.jar:/opt/storm-0.7.3/lib/ring-servlet-0.3.11.jar:/opt/storm-0.7.3/lib/clojure-1.4.0.jar:/opt/storm-0.7.3/lib/json-simple-1.1.jar:/opt/storm-0.7.3/lib/clout-0.4.1.jar:/opt/storm-0.7.3/lib/junit-3.8.1.jar:/outbrain/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar:/root/.storm:/opt/storm-0.7.3/bin storm.starter.ExclamationTopology
0 [main] INFO backtype.storm.zookeeper - Starting inprocess zookeeper at port 2000 and dir /tmp/04379999-74ae-49ea-95fc-3d048e520ca6
204 [main] INFO backtype.storm.daemon.nimbus - Starting Nimbus with conf {"dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.fall.back.on.java.serialization" true, "zmq.linger.millis" 0, "topology.skip.missing.kryo.registrations" true, "ui.childopts" "-Xmx768m", "storm.zookeeper.session.timeout" 20000, "nimbus.reassign" true, "nimbus.monitor.freq.secs" 10, "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib", "storm.local.dir" "/tmp/1a9a2d49-cf65-4ae3-8555-285c510ebcee", "supervisor.worker.start.timeout.secs" 120, "nimbus.cleanup.inbox.freq.secs" 600, "nimbus.inbox.jar.expiration.secs" 3600, "nimbus.host" "localhost", "storm.zookeeper.port" 2000, "transactional.zookeeper.port" nil, "transactional.zookeeper.servers" nil, "storm.zookeeper.root" "/storm", "supervisor.enable" true, "storm.zookeeper.servers" ["localhost"], "transactional.zookeeper.root" "/transactional", "topology.worker.childopts" nil, "worker.childopts" "-Xmx768m", "supervisor.heartbeat.frequency.secs" 5, "drpc.port" 3772, "supervisor.monitor.frequency.secs" 3, "task.heartbeat.frequency.secs" 3, "topology.max.spout.pending" nil, "storm.zookeeper.retry.interval" 1000, "supervisor.slots.ports" [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs" 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs" 30, "task.refresh.poll.secs" 10, "topology.workers" 1, "supervisor.childopts" "-Xmx1024m", "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05, "worker.heartbeat.frequency.secs" 1, "nimbus.task.timeout.secs" 30, "drpc.invocations.port" 3773, "zmq.threads" 1, "storm.zookeeper.retry.times" 5, "topology.state.synchronization.timeout.secs" 60, "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" 600, "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port" 8080, "nimbus.childopts" "-Xmx1024m", "topology.ackers" 1, "storm.cluster.mode" "local", "topology.optimize" true, "topology.max.task.parallelism" nil}
236 [main] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting
281 [main-EventThread] INFO backtype.storm.zookeeper - Zookeeper state update: :connected:none
315 [main] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting
360 [main] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting
368 [main-EventThread] INFO backtype.storm.zookeeper - Zookeeper state update: :connected:none
373 [main] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting
375 [main] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting
379 [main-EventThread] INFO backtype.storm.zookeeper - Zookeeper state update: :connected:none
382 [main] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting
404 [main] INFO backtype.storm.daemon.supervisor - Starting Supervisor with conf {"dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.fall.back.on.java.serialization" true, "zmq.linger.millis" 0, "topology.skip.missing.kryo.registrations" true, "ui.childopts" "-Xmx768m", "storm.zookeeper.session.timeout" 20000, "nimbus.reassign" true, "nimbus.monitor.freq.secs" 10, "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib", "storm.local.dir" "/tmp/c0e92e8e-640f-42f5-9cc7-221d53c4f203", "supervisor.worker.start.timeout.secs" 120, "nimbus.cleanup.inbox.freq.secs" 600, "nimbus.inbox.jar.expiration.secs" 3600, "nimbus.host" "localhost", "storm.zookeeper.port" 2000, "transactional.zookeeper.port" nil, "transactional.zookeeper.servers" nil, "storm.zookeeper.root" "/storm", "supervisor.enable" true, "storm.zookeeper.servers" ["localhost"], "transactional.zookeeper.root" "/transactional", "topology.worker.childopts" nil, "worker.childopts" "-Xmx768m", "supervisor.heartbeat.frequency.secs" 5, "drpc.port" 3772, "supervisor.monitor.frequency.secs" 3, "task.heartbeat.frequency.secs" 3, "topology.max.spout.pending" nil, "storm.zookeeper.retry.interval" 1000, "supervisor.slots.ports" (1 2 3), "topology.debug" false, "nimbus.task.launch.secs" 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs" 30, "task.refresh.poll.secs" 10, "topology.workers" 1, "supervisor.childopts" "-Xmx1024m", "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05, "worker.heartbeat.frequency.secs" 1, "nimbus.task.timeout.secs" 30, "drpc.invocations.port" 3773, "zmq.threads" 1, "storm.zookeeper.retry.times" 5, "topology.state.synchronization.timeout.secs" 60, "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" 600, "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port" 8080, "nimbus.childopts" "-Xmx1024m", "topology.ackers" 1, "storm.cluster.mode" "local", "topology.optimize" true, "topology.max.task.parallelism" nil}
419 [main] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting
422 [main-EventThread] INFO backtype.storm.zookeeper - Zookeeper state update: :connected:none
434 [main] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting
471 [main] INFO backtype.storm.daemon.supervisor - Starting supervisor with id 169977fe-d7db-40e1-b6f1-2d6f4501ad82 at host ob1065046.nydc1.outbrain.com
474 [main] INFO backtype.storm.daemon.supervisor - Starting Supervisor with conf {"dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.fall.back.on.java.serialization" true, "zmq.linger.millis" 0, "topology.skip.missing.kryo.registrations" true, "ui.childopts" "-Xmx768m", "storm.zookeeper.session.timeout" 20000, "nimbus.reassign" true, "nimbus.monitor.freq.secs" 10, "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib", "storm.local.dir" "/tmp/b795d26b-7726-42c8-8a59-67455e1098b1", "supervisor.worker.start.timeout.secs" 120, "nimbus.cleanup.inbox.freq.secs" 600, "nimbus.inbox.jar.expiration.secs" 3600, "nimbus.host" "localhost", "storm.zookeeper.port" 2000, "transactional.zookeeper.port" nil, "transactional.zookeeper.servers" nil, "storm.zookeeper.root" "/storm", "supervisor.enable" true, "storm.zookeeper.servers" ["localhost"], "transactional.zookeeper.root" "/transactional", "topology.worker.childopts" nil, "worker.childopts" "-Xmx768m", "supervisor.heartbeat.frequency.secs" 5, "drpc.port" 3772, "supervisor.monitor.frequency.secs" 3, "task.heartbeat.frequency.secs" 3, "topology.max.spout.pending" nil, "storm.zookeeper.retry.interval" 1000, "supervisor.slots.ports" (4 5 6), "topology.debug" false, "nimbus.task.launch.secs" 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs" 30, "task.refresh.poll.secs" 10, "topology.workers" 1, "supervisor.childopts" "-Xmx1024m", "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05, "worker.heartbeat.frequency.secs" 1, "nimbus.task.timeout.secs" 30, "drpc.invocations.port" 3773, "zmq.threads" 1, "storm.zookeeper.retry.times" 5, "topology.state.synchronization.timeout.secs" 60, "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" 600, "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port" 8080, "nimbus.childopts" "-Xmx1024m", "topology.ackers" 1, "storm.cluster.mode" "local", "topology.optimize" true, "topology.max.task.parallelism" nil}
476 [main] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting
479 [main-EventThread] INFO backtype.storm.zookeeper - Zookeeper state update: :connected:none
482 [main] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting
501 [main] INFO backtype.storm.daemon.supervisor - Starting supervisor with id 9700cb2f-3e8e-40f9-88f5-cadb8c65a4d5 at host ob1065046.nydc1.outbrain.com
555 [main] INFO backtype.storm.daemon.nimbus - Received topology submission for test with conf {"topology.ackers" 1, "topology.kryo.register" nil, "topology.name" "test", "storm.id" "test-1-1340315323", "topology.debug" true}
684 [main] INFO backtype.storm.daemon.nimbus - Task test-1-1340315323:1 timed out
686 [main] INFO backtype.storm.daemon.nimbus - Task test-1-1340315323:2 timed out
687 [main] INFO backtype.storm.daemon.nimbus - Task test-1-1340315323:3 timed out
689 [main] INFO backtype.storm.daemon.nimbus - Task test-1-1340315323:4 timed out
690 [main] INFO backtype.storm.daemon.nimbus - Task test-1-1340315323:5 timed out
691 [main] INFO backtype.storm.daemon.nimbus - Task test-1-1340315323:6 timed out
692 [main] INFO backtype.storm.daemon.nimbus - Task test-1-1340315323:7 timed out
693 [main] INFO backtype.storm.daemon.nimbus - Task test-1-1340315323:8 timed out
694 [main] INFO backtype.storm.daemon.nimbus - Task test-1-1340315323:9 timed out
696 [main] INFO backtype.storm.daemon.nimbus - Task test-1-1340315323:10 timed out
697 [main] INFO backtype.storm.daemon.nimbus - Task test-1-1340315323:11 timed out
698 [main] INFO backtype.storm.daemon.nimbus - Task test-1-1340315323:12 timed out
699 [main] INFO backtype.storm.daemon.nimbus - Task test-1-1340315323:13 timed out
700 [main] INFO backtype.storm.daemon.nimbus - Task test-1-1340315323:14 timed out
702 [main] INFO backtype.storm.daemon.nimbus - Task test-1-1340315323:15 timed out
703 [main] INFO backtype.storm.daemon.nimbus - Task test-1-1340315323:16 timed out
713 [main] INFO backtype.storm.daemon.nimbus - Reassigning test-1-1340315323 to 1 slots
713 [main] INFO backtype.storm.daemon.nimbus - Reassign ids: [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16]
714 [main] INFO backtype.storm.daemon.nimbus - Available slots: (["169977fe-d7db-40e1-b6f1-2d6f4501ad82" 3] ["9700cb2f-3e8e-40f9-88f5-cadb8c65a4d5" 6] ["169977fe-d7db-40e1-b6f1-2d6f4501ad82" 1] ["9700cb2f-3e8e-40f9-88f5-cadb8c65a4d5" 4] ["169977fe-d7db-40e1-b6f1-2d6f4501ad82" 2] ["9700cb2f-3e8e-40f9-88f5-cadb8c65a4d5" 5])
734 [main] INFO backtype.storm.daemon.nimbus - Setting new assignment for storm id test-1-1340315323: #backtype.storm.daemon.common.Assignment{:master-code-dir "/tmp/1a9a2d49-cf65-4ae3-8555-285c510ebcee/nimbus/stormdist/test-1-1340315323", :node->host {"169977fe-d7db-40e1-b6f1-2d6f4501ad82" "ob1065046.nydc1.outbrain.com"}, :task->node+port {1 ["169977fe-d7db-40e1-b6f1-2d6f4501ad82" 3], 2 ["169977fe-d7db-40e1-b6f1-2d6f4501ad82" 3], 3 ["169977fe-d7db-40e1-b6f1-2d6f4501ad82" 3], 4 ["169977fe-d7db-40e1-b6f1-2d6f4501ad82" 3], 5 ["169977fe-d7db-40e1-b6f1-2d6f4501ad82" 3], 6 ["169977fe-d7db-40e1-b6f1-2d6f4501ad82" 3], 7 ["169977fe-d7db-40e1-b6f1-2d6f4501ad82" 3], 8 ["169977fe-d7db-40e1-b6f1-2d6f4501ad82" 3], 9 ["169977fe-d7db-40e1-b6f1-2d6f4501ad82" 3], 10 ["169977fe-d7db-40e1-b6f1-2d6f4501ad82" 3], 11 ["169977fe-d7db-40e1-b6f1-2d6f4501ad82" 3], 12 ["169977fe-d7db-40e1-b6f1-2d6f4501ad82" 3], 13 ["169977fe-d7db-40e1-b6f1-2d6f4501ad82" 3], 14 ["169977fe-d7db-40e1-b6f1-2d6f4501ad82" 3], 15 ["169977fe-d7db-40e1-b6f1-2d6f4501ad82" 3], 16 ["169977fe-d7db-40e1-b6f1-2d6f4501ad82" 3]}, :task->start-time-secs {1 1340315324, 2 1340315324, 3 1340315324, 4 1340315324, 5 1340315324, 6 1340315324, 7 1340315324, 8 1340315324, 9 1340315324, 10 1340315324, 11 1340315324, 12 1340315324, 13 1340315324, 14 1340315324, 15 1340315324, 16 1340315324}}
748 [main] INFO backtype.storm.daemon.nimbus - Activating test: test-1-1340315323
1456 [Thread-5] INFO backtype.storm.daemon.supervisor - Downloading code for storm id test-1-1340315323 from /tmp/1a9a2d49-cf65-4ae3-8555-285c510ebcee/nimbus/stormdist/test-1-1340315323
1499 [Thread-8] INFO backtype.storm.daemon.supervisor - Downloading code for storm id test-1-1340315323 from /tmp/1a9a2d49-cf65-4ae3-8555-285c510ebcee/nimbus/stormdist/test-1-1340315323
1704 [Thread-5] INFO backtype.storm.daemon.supervisor - Extracting resources from jar at /outbrain/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar to /tmp/c0e92e8e-640f-42f5-9cc7-221d53c4f203/supervisor/stormdist/test-1-1340315323/resources
1710 [Thread-8] INFO backtype.storm.daemon.supervisor - Extracting resources from jar at /outbrain/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar to /tmp/b795d26b-7726-42c8-8a59-67455e1098b1/supervisor/stormdist/test-1-1340315323/resources
1717 [Thread-5] INFO backtype.storm.daemon.supervisor - Finished downloading code for storm id test-1-1340315323 from /tmp/1a9a2d49-cf65-4ae3-8555-285c510ebcee/nimbus/stormdist/test-1-1340315323
1717 [Thread-8] INFO backtype.storm.daemon.supervisor - Finished downloading code for storm id test-1-1340315323 from /tmp/1a9a2d49-cf65-4ae3-8555-285c510ebcee/nimbus/stormdist/test-1-1340315323
1730 [Thread-6] ERROR backtype.storm.event - Error when processing event