Spark Standalone - Master + Worker + Driver connectivity


Grega Kešpret

unread,
Dec 9, 2013, 9:23:25 AM12/9/13
to spark...@googlegroups.com
Hi,
From our experience, Master, Worker, and Driver all need connectivity among each other.

However, we have the following use case. We would like to run our application/spark-shell from a driver node and use a Spark Standalone cluster as the master endpoint. The driver node will be able to establish a connection to the Master, but not the other way around. The Master and Worker nodes will have connectivity among each other. The only missing links are:
1. Master -> Driver
2. Workers -> Driver
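(For context: those reverse connections are needed because the driver advertises an address that the Master and Workers connect back to. A minimal sketch of how one might pin that address, assuming the standard `spark.driver.host`/`spark.driver.port` properties; the hostname, port, and master URL below are hypothetical examples:)

```shell
# Hedged sketch: the Master and Workers must be able to reach the address
# the driver advertises. These Spark properties control it; the host, port,
# and master URL here are made-up examples, not values from this thread.
MASTER=spark://master-host:7077 \
SPARK_JAVA_OPTS="-Dspark.driver.host=dev-vm.example.com -Dspark.driver.port=51000" \
./spark-shell
```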

We cannot easily guarantee connectivity in those directions, but we want each developer to be able to run Spark jobs locally from their dev VM at the same time.

How do others in the industry handle this? Do you have your developers ssh to some staging machine from where it is "safe" to run jobs? Is there a better solution?

Thanks,

Grega
--
Grega Kešpret
Analytics engineer

Celtra — Rich Media Mobile Advertising
celtra.com | @celtramobile

MLnick

unread,
Dec 9, 2013, 1:34:05 PM12/9/13
to spark...@googlegroups.com
I'm not sure things will work if the Master cannot communicate with the driver program.

Typically I colocate driver programs on the master node (actually, I submit jobs to a job server similar to this one: https://github.com/apache/incubator-spark/pull/222).

Jobs can be tested locally, and for production either submitted via the job server interface or by ssh'ing into the master node and running them from there. I store jars in S3, download them, and add them to the Spark classpath as required.
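(That jar workflow could be sketched roughly as below; the bucket and paths are hypothetical, `s3cmd` stands in for whatever S3 client you use, and `ADD_JARS` is the spark-shell mechanism for putting jars on the classpath:)

```shell
# Hedged sketch of the S3 jar workflow described above.
# Bucket, paths, and master URL are made-up examples.
s3cmd get s3://my-bucket/jobs/my-job.jar /tmp/my-job.jar   # fetch the job jar
ADD_JARS=/tmp/my-job.jar MASTER=spark://master-host:7077 ./spark-shell
```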

This is similar to how I used to do things with Hadoop.
