I was able to connect the workers to the master; they now show up in the master web UI. I did have to specify exactly the same IP address, though.
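Because the connection only worked with the exact address the master advertises, it may help to compare the URL you pass against the one the master prints (for example `spark://u9-r1.mtv:7077` in the master log below) before starting anything. The following is a minimal Java sketch with a hypothetical class name. It assumes master URLs of the form `spark://host:port` and deliberately compares host strings literally, with no DNS lookup, since an address that merely resolved to the same machine did not seem to be enough here.

```java
import java.net.URI;

// Hypothetical helper: checks that a master URL matches the one shown by the
// master exactly (same host string and port), without resolving DNS.
class MasterUrlCheck {

    // Returns "host:port" for a URL of the form spark://host:port.
    // Throws IllegalArgumentException if the URL does not have that shape.
    static String hostPort(String masterUrl) {
        URI uri = URI.create(masterUrl);
        if (!"spark".equals(uri.getScheme()) || uri.getHost() == null || uri.getPort() == -1) {
            throw new IllegalArgumentException("expected spark://host:port, got " + masterUrl);
        }
        return uri.getHost() + ":" + uri.getPort();
    }

    // True only if both URLs use the same host string and port.
    // No DNS lookup is done, so an IP and a hostname for the same machine do NOT match.
    static boolean sameMaster(String fromMaster, String used) {
        return hostPort(fromMaster).equals(hostPort(used));
    }

    public static void main(String[] args) {
        String fromMaster = "spark://u9-r1.mtv:7077"; // URL printed in the master log
        System.out.println(sameMaster(fromMaster, "spark://u9-r1.mtv:7077")); // true
        // 10.0.0.9 is a made-up address standing in for the master's IP.
        System.out.println(sameMaster(fromMaster, "spark://10.0.0.9:7077"));  // false
    }
}
```

The literal comparison is the point of the sketch: it flags a hostname/IP mismatch that a resolving comparison would hide.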
I got further by passing spark-shell exactly the same master URL as the one listed on the master web UI. It still errors out, though, and it looks like the master restarted after a failure; its web UI is now no longer working:
13/03/15 21:06:27 INFO Master: Registering job Spark shell
13/03/15 21:06:27 INFO Master: Registered job Spark shell with ID job-20130315210627-0000
13/03/15 21:06:27 INFO Master: Launching executor job-20130315210627-0000/0 on worker worker-20130314192829-u11-r1.mtv-46401
13/03/15 21:06:27 INFO Master: Launching executor job-20130315210627-0000/1 on worker worker-20130314192749-u10-r1.mtv-43827
13/03/15 21:06:27 INFO Master: Launching executor job-20130315210627-0000/2 on worker worker-20130314192342-u9-r1.mtv-52555
13/03/15 21:06:27 INFO Master: Removing executor job-20130315210627-0000/0 because it is FAILED
13/03/15 21:06:27 INFO Master: Launching executor job-20130315210627-0000/3 on worker worker-20130314192829-u11-r1.mtv-46401
13/03/15 21:06:27 INFO Master: Removing executor job-20130315210627-0000/1 because it is FAILED
13/03/15 21:06:27 INFO Master: Launching executor job-20130315210627-0000/4 on worker worker-20130314192749-u10-r1.mtv-43827
13/03/15 21:06:27 INFO Master: Removing executor job-20130315210627-0000/2 because it is FAILED
13/03/15 21:06:27 INFO Master: Launching executor job-20130315210627-0000/5 on worker worker-20130314192342-u9-r1.mtv-52555
13/03/15 21:06:27 INFO Master: Removing executor job-20130315210627-0000/3 because it is FAILED
13/03/15 21:06:27 INFO Master: Launching executor job-20130315210627-0000/6 on worker worker-20130314192829-u11-r1.mtv-46401
13/03/15 21:06:27 INFO Master: Removing executor job-20130315210627-0000/4 because it is FAILED
13/03/15 21:06:27 INFO Master: Launching executor job-20130315210627-0000/7 on worker worker-20130314192749-u10-r1.mtv-43827
13/03/15 21:06:27 INFO Master: Removing executor job-20130315210627-0000/5 because it is FAILED
13/03/15 21:06:27 INFO Master: Launching executor job-20130315210627-0000/8 on worker worker-20130314192342-u9-r1.mtv-52555
13/03/15 21:06:27 INFO Master: Removing executor job-20130315210627-0000/6 because it is FAILED
13/03/15 21:06:27 INFO Master: Launching executor job-20130315210627-0000/9 on worker worker-20130314192829-u11-r1.mtv-46401
13/03/15 21:06:27 INFO Master: Removing executor job-20130315210627-0000/7 because it is FAILED
13/03/15 21:06:27 INFO Master: Launching executor job-20130315210627-0000/10 on worker worker-20130314192749-u10-r1.mtv-43827
13/03/15 21:06:27 INFO Master: Removing executor job-20130315210627-0000/8 because it is FAILED
13/03/15 21:06:27 INFO Master: Launching executor job-20130315210627-0000/11 on worker worker-20130314192342-u9-r1.mtv-52555
13/03/15 21:06:27 INFO Master: Removing executor job-20130315210627-0000/9 because it is FAILED
13/03/15 21:06:27 INFO Master: Launching executor job-20130315210627-0000/12 on worker worker-20130314192829-u11-r1.mtv-46401
13/03/15 21:06:27 INFO Master: Removing executor job-20130315210627-0000/10 because it is FAILED
13/03/15 21:06:27 ERROR Master: Job Spark shell wth ID job-20130315210627-0000 failed 11 times.
spark.SparkException: Job Spark shell wth ID job-20130315210627-0000 failed 11 times.
at spark.deploy.master.Master$$anonfun$receive$1.apply(Master.scala:106)
at spark.deploy.master.Master$$anonfun$receive$1.apply(Master.scala:65)
at akka.actor.Actor$class.apply(Actor.scala:318)
at spark.deploy.master.Master.apply(Master.scala:18)
at akka.actor.ActorCell.invoke(ActorCell.scala:626)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:197)
at akka.dispatch.Mailbox.run(Mailbox.scala:179)
at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
13/03/15 21:06:27 ERROR Master: Job Spark shell wth ID job-20130315210627-0000 failed 11 times.
spark.SparkException: Job Spark shell wth ID job-20130315210627-0000 failed 11 times.
at spark.deploy.master.Master$$anonfun$receive$1.apply(Master.scala:106)
at spark.deploy.master.Master$$anonfun$receive$1.apply(Master.scala:65)
at akka.actor.Actor$class.apply(Actor.scala:318)
at spark.deploy.master.Master.apply(Master.scala:18)
at akka.actor.ActorCell.invoke(ActorCell.scala:626)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:197)
at akka.dispatch.Mailbox.run(Mailbox.scala:179)
at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
13/03/15 21:06:27 INFO Master: Starting Spark master at spark://u9-r1.mtv:7077
13/03/15 21:06:27 INFO IoWorker: IoWorker thread 'spray-io-worker-1' started
13/03/15 21:06:27 ERROR Master: Failed to create web UI
akka.actor.InvalidActorNameException:actor name HttpServer is not unique!
[339049e0-8db4-11e2-900b-003048c63b0c]
at akka.actor.ActorCell.actorOf(ActorCell.scala:392)
at akka.actor.LocalActorRefProvider$Guardian$$anonfun$receive$1.liftedTree1$1(ActorRefProvider.scala:394)
at akka.actor.LocalActorRefProvider$Guardian$$anonfun$receive$1.apply(ActorRefProvider.scala:394)
at akka.actor.LocalActorRefProvider$Guardian$$anonfun$receive$1.apply(ActorRefProvider.scala:392)
at akka.actor.Actor$class.apply(Actor.scala:318)
at akka.actor.LocalActorRefProvider$Guardian.apply(ActorRefProvider.scala:388)
at akka.actor.ActorCell.invoke(ActorCell.scala:626)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:197)
at akka.dispatch.Mailbox.run(Mailbox.scala:179)
at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Each worker logs a number of errors like this one:
java.io.IOException: Cannot run program "/Users/ev/src/vendor/spark-0.6.2/run" (in directory "/opt/spark-0.6.2/work/job-20130315210627-0000/2"): java.io.IOException: error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
at spark.deploy.worker.ExecutorRunner.fetchAndRunExecutor(ExecutorRunner.scala:126)
at spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:36)
Caused by: java.io.IOException: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
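It seems that the Spark workers on the cluster somehow received the path to the run script of my local Spark installation (/Users/ev/src/vendor/spark-0.6.2/run), which is odd, since Spark on the cluster lives under /opt/spark-0.6.2. Does the local Spark installation used by spark-shell need to be at the same path as the one on the remote workers?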
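The worker trace suggests the executor command was built from the driver machine's Spark directory (/Users/ev/src/vendor/spark-0.6.2) rather than the worker's own (/opt/spark-0.6.2). The Java sketch below reproduces that failure mode, assuming the worker launches `<sparkHome>/run` with `java.lang.ProcessBuilder` inside a per-executor work directory, as the stack trace indicates. It checks that the script exists and is executable before launching, and otherwise reports which path was wrong. `RunScriptCheck`, `checkRunScript` and `tryLaunch` are hypothetical names; this is not Spark's actual `ExecutorRunner` code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check: verify that <sparkHome>/run is launchable
// on THIS machine before handing it to ProcessBuilder.
class RunScriptCheck {

    // Returns null if <sparkHome>/run exists and is executable here,
    // otherwise a human-readable description of the problem.
    static String checkRunScript(String sparkHome) {
        Path script = Paths.get(sparkHome, "run");
        if (!Files.exists(script)) {
            return "run script not found: " + script;
        }
        if (!Files.isExecutable(script)) {
            return "run script not executable: " + script;
        }
        return null;
    }

    // Tries to start <sparkHome>/run in the given work directory and returns
    // the IOException message if the launch fails, or null if it started.
    static String tryLaunch(String sparkHome, Path workDir) {
        ProcessBuilder pb = new ProcessBuilder(Paths.get(sparkHome, "run").toString());
        pb.directory(workDir.toFile());
        try {
            Process p = pb.start();
            p.destroy(); // only probing whether the launch itself succeeds
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // Spark path on the driver machine, as it appears in the worker's error.
        String driverSparkHome = "/Users/ev/src/vendor/spark-0.6.2";
        Path workDir = Files.createTempDirectory("executor-work");

        String problem = checkRunScript(driverSparkHome);
        System.out.println(problem == null ? "ok" : problem);
        // On a machine without that directory, this fails the same way as the
        // worker log ("Cannot run program ... No such file or directory").
        System.out.println(tryLaunch(driverSparkHome, workDir));
    }
}
```

Running `checkRunScript` on a worker with both paths (the driver's and /opt/spark-0.6.2) would show whether the problem is the path the worker is given rather than the worker's own installation.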
thanks,
Evan