Spark Standalone - Master + Worker + Driver connectivity


Grega Kešpret

unread,
Dec 9, 2013, 9:23:25 AM12/9/13
to spark...@googlegroups.com
Hi,
From our experience, Master, Worker, and Driver all need connectivity among each other.

However, we have the following use case. We would like to run our application/spark-shell from a driver node and use a Spark Standalone cluster as the master endpoint. The driver node will be able to establish a connection to the Master, but not the other way around. The Master and Worker nodes will have connectivity among each other. The only missing links are:
1. Master -> Driver
2. Workers -> Driver
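(For context: those reverse connections are needed because the driver advertises an address that the Master and Workers connect back to. A minimal sketch of how one might pin that address, assuming the standard `spark.driver.host`/`spark.driver.port` properties; the hostname, port, and master URL below are hypothetical examples:)

```shell
# Hedged sketch: the Master and Workers must be able to reach the address
# the driver advertises. These Spark properties control it; the host, port,
# and master URL here are made-up examples, not values from this thread.
MASTER=spark://master-host:7077 \
SPARK_JAVA_OPTS="-Dspark.driver.host=dev-vm.example.com -Dspark.driver.port=51000" \
./spark-shell
```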

We cannot easily guarantee connectivity in those directions, but we want each developer to be able to run Spark jobs locally from their dev VM at the same time.

How do others in the industry handle this? Do you have your developers ssh to some staging machine from where it is "safe" to run jobs? Is there a better solution?

Thanks,

Grega
--
Grega Kešpret
Analytics engineer

Celtra — Rich Media Mobile Advertising
celtra.com | @celtramobile

MLnick

unread,
Dec 9, 2013, 1:34:05 PM12/9/13
to spark...@googlegroups.com
I'm not sure things will work if the Master cannot communicate with the driver program.

Typically I colocate driver programs on the master node (actually, I submit jobs to a job server similar to this one: https://github.com/apache/incubator-spark/pull/222).

Jobs can be tested locally, and for production either submitted via the job server interface or by ssh'ing into the master node and running them from there. I store jars in S3, download them, and add them to the Spark classpath as required.
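(That jar workflow could be sketched roughly as below; the bucket and paths are hypothetical, `s3cmd` stands in for whatever S3 client you use, and `ADD_JARS` is the spark-shell mechanism for putting jars on the classpath:)

```shell
# Hedged sketch of the S3 jar workflow described above.
# Bucket, paths, and master URL are made-up examples.
s3cmd get s3://my-bucket/jobs/my-job.jar /tmp/my-job.jar   # fetch the job jar
ADD_JARS=/tmp/my-job.jar MASTER=spark://master-host:7077 ./spark-shell
```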

This is similar to how I used to do things with Hadoop.
