worker connect to master failed


Jinliang Wei

Dec 3, 2012, 6:22:15 PM
to spark...@googlegroups.com
Hi,

While starting Spark on a cluster, I found that I couldn't connect the worker node to the master.

I started the master with "./run spark.deploy.master.Master". It succeeded and printed out:

12/12/03 18:05:45 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
12/12/03 18:05:45 INFO master.Master: Starting Spark master at spark://127.0.1.1:7077
12/12/03 18:05:46 INFO io.IoWorker: IoWorker thread 'spray-io-worker-0' started
12/12/03 18:05:46 INFO server.HttpServer: akka://spark/user/HttpServer started on /0.0.0.0:8080


Then I started the worker on another VM (using the same copy of the VM disk as the master) with "./run spark.deploy.worker.Worker spark://127.0.1.1:7077", but it failed to connect to the master.
I noticed that the worker also started with the same IP, 127.0.1.1. Here's the output from the worker:

12/12/03 18:11:45 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
12/12/03 18:11:45 INFO worker.Worker: Starting Spark worker 127.0.1.1:35754 with 1 cores, 989.0 MB RAM
12/12/03 18:11:45 INFO worker.Worker: Spark home: /h/jinlianw/ml-framework/spark/spark-0.6.0/spark-0.6.0
12/12/03 18:11:45 INFO worker.Worker: Connecting to master spark://127.0.1.1:7077
12/12/03 18:11:45 INFO io.IoWorker: IoWorker thread 'spray-io-worker-0' started
12/12/03 18:11:46 INFO server.HttpServer: akka://spark/user/HttpServer started on /0.0.0.0:8081
12/12/03 18:11:46 ERROR worker.Worker: Connection to master failed! Shutting down.

I was wondering if you could help me see what I did wrong. Thanks!

Jinliang

Josh Rosen

Dec 3, 2012, 6:34:56 PM
to spark...@googlegroups.com
It looks like the master is listening on 127.0.1.1, which maps to the local loopback device.

Try setting the SPARK_MASTER_IP environment variable to bind the master to a public address, e.g. 

SPARK_MASTER_IP=a_public_ip_or_hostname ./run spark.deploy.master.Master

If you use Spark's standalone cluster launch scripts, then you can set this option in spark-env.sh (see http://www.spark-project.org/docs/0.6.0/spark-standalone.html).
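For example, a minimal spark-env.sh sketch (the address is just a placeholder for your master's real, non-loopback IP or hostname):

# in conf/spark-env.sh on the master node
export SPARK_MASTER_IP=192.168.1.10    # placeholder: use the master's public IP or hostname

The launch scripts source this file before starting the daemons, so the master should then bind to that address instead of whatever its hostname happens to resolve to.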

- Josh

Jinliang Wei

Dec 3, 2012, 11:00:38 PM
to spark...@googlegroups.com
Thanks.

I did that, but it did not work.

However, when I tried passing the IP address with the -i argument instead, it worked. I'm just curious why.
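For reference, a rough sketch of that approach (the addresses are placeholders, and the exact flag ordering may differ slightly in your version; -i sets the IP the process binds to):

# on the master VM
./run spark.deploy.master.Master -i 192.168.1.10

# on the worker VM, pointing at the master's real address
./run spark.deploy.worker.Worker -i 192.168.1.11 spark://192.168.1.10:7077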

Matei Zaharia

Dec 4, 2012, 6:07:13 PM
to spark...@googlegroups.com
The problem is that your machine is configured (in /etc/hosts) to resolve its own hostname to 127.0.1.1 instead of its external IP address. Ubuntu, and maybe Debian, does this by default for some reason. We recently added another way to configure this, a SPARK_LOCAL_IP environment variable (https://github.com/mesos/spark/pull/304), but the way you did it manually also works.
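For context, a stock Ubuntu/Debian /etc/hosts usually looks something like this (the hostname is a placeholder), and it's the second line that makes the machine resolve its own name to loopback:

127.0.0.1    localhost
127.0.1.1    my-vm-hostname

With the new variable, the bind address can be overridden per process, e.g. (placeholder IPs):

SPARK_LOCAL_IP=192.168.1.11 ./run spark.deploy.worker.Worker spark://192.168.1.10:7077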

Matei

Jinliang Wei

Dec 5, 2012, 4:37:57 AM
to spark...@googlegroups.com

Thanks.

Does that mean I have to set SPARK_LOCAL_IP on each machine if I want to use that option? If so, that's tedious, because I would have to make a separate image for each virtual instance.

Matei Zaharia

Dec 6, 2012, 8:52:55 PM
to spark...@googlegroups.com
You need to either do that or remove the entry for 127.0.1.1 from /etc/hosts. Maybe there is a Debian command that makes this easy. There's not much we can do otherwise, because that's what gethostbyname() returns. Other software running on machines configured this way has the same problem.
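A sketch of that /etc/hosts edit on each machine (hostname and address are placeholders):

# before: the machine's own hostname resolves to a loopback address
127.0.1.1      my-vm-hostname

# after: either delete/comment out that line, or point the hostname at the machine's real address
192.168.1.11   my-vm-hostname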

Matei

Johnpaul Ci

Jul 18, 2013, 9:06:04 AM
to spark...@googlegroups.com
Hello,

How do I start the worker with the -i option so that it connects to the master?

kind regards
johnpaul ci