How to configure Spark to run with Sparrow?


Junfeng Liu

Oct 16, 2014, 11:10:05 AM
to sparrow-sch...@googlegroups.com
I did not find any related code in the Spark project.

Kay Ousterhout

Oct 16, 2014, 1:12:53 PM
to sparrow-sch...@googlegroups.com


On Thu, Oct 16, 2014 at 10:10 AM, Kay Ousterhout <k...@eecs.berkeley.edu> wrote:
Hi,

Unfortunately, right now, to run Spark with Sparrow, you need to use a forked version of Spark (available here: https://github.com/kayousterhout/spark/tree/sparrow) that includes the code to interface with Sparrow (the Sparrow code itself is available here: https://github.com/radlab/sparrow).

-Kay


--
You received this message because you are subscribed to the Google Groups "Sparrow Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sparrow-scheduler...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Junfeng Liu

Oct 16, 2014, 9:52:05 PM
to sparrow-sch...@googlegroups.com
Kay, thanks for the quick reply. Are there any guidelines on how I can make this work?

Another question: I watched the presentation at Spark Summit, and it looks like the motivation for Sparrow is to create a decentralized scheduler. But when I played with Spark on YARN today, each Spark driver always created its own SparkContext with a scheduler inside, even when two users submitted the same driver program, and those schedulers did not cooperate with each other. I can only use the YARN resource policy to keep a driver from holding too many resources; the scheduler works greedily, asking for as many resources as possible. That does not match the assumption that all users share a single SparkContext and a single scheduler to distribute tasks. Am I misunderstanding something here?



Kay Ousterhout

Oct 17, 2014, 3:56:30 PM
to sparrow-sch...@googlegroups.com

You're completely right that this is how Spark works with all schedulers *except* Sparrow. Sparrow works differently: resource sharing across different drivers is much more fluid. With Sparrow, exactly one Spark executor is launched on each worker machine, and that executor may be used for jobs submitted by many different Spark drivers.

With a typical Spark scheduler, when you start a new driver, you pass in the URL of the centralized scheduler, from which the driver greedily takes resources, and the driver then holds those resources for the entire time it's running. With Sparrow, instead, when you start a driver, you pass in a list of all of the workers (there is no centralized scheduler). Each driver communicates directly with the workers to do its own scheduling, so a given worker may be running 8 tasks for user A and 0 for user B at one point, and later be shared equally between A and B. The Sparrow scheduling mechanism handles this, so resources are shared based on current user demands.
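For intuition, the probing that the Sparrow paper calls batch sampling can be sketched in a few lines of Python. This is an illustrative simulation only, not the actual Sparrow code (which is Java); the function name `sparrow_place`, the probe ratio `d`, and the load list are all invented for the example, and the real system's late binding of tasks to probes is omitted:

```python
import random

def sparrow_place(tasks, worker_loads, d=2, rng=random):
    """Sparrow-style batch sampling: for a job of m tasks, probe
    d * m randomly chosen workers, then run the m tasks on the
    least-loaded workers among those probed."""
    m = len(tasks)
    n = len(worker_loads)
    probed = rng.sample(range(n), min(d * m, n))
    probed.sort(key=lambda w: worker_loads[w])  # least loaded first
    placement = {}
    for task, worker in zip(tasks, probed):
        placement[task] = worker
        worker_loads[worker] += 1  # the task joins that worker's queue
    return placement

# Two independent "drivers" place jobs with no central scheduler;
# the shared load list stands in for the workers' actual queues:
loads = [0] * 10
rng = random.Random(0)
p_a = sparrow_place(["a1", "a2"], loads, d=2, rng=rng)  # user A's job
p_b = sparrow_place(["b1", "b2"], loads, d=2, rng=rng)  # user B's job
```

Because each driver only probes and reacts to current queue lengths, the workers end up shared according to momentary demand rather than static reservations.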



Junfeng Liu

Oct 18, 2014, 7:06:48 AM
to sparrow-sch...@googlegroups.com
I see, so the scheduler here refers to YARN/Mesos or the Spark master, not the scheduler inside the driver like TaskScheduler, is that correct? Does that mean Sparrow combines resource management and task management together?

In the presentation (https://www.youtube.com/watch?v=ayjH_bG-RC0), slide 3 shows multiple users submitting tasks to the same SparkContext. Does that refer to the same resource manager, like YARN/Mesos? I am a little confused about whether we can really have multiple drivers share the same SparkContext.


Kay Ousterhout

Oct 20, 2014, 1:03:22 AM
to sparrow-sch...@googlegroups.com
You're right that Sparrow combines resource management and task management.

In Spark, you can have multiple users submit tasks to the same Spark context (and the same driver) when running Shark or SparkSQL (with a JDBC/ODBC driver); that's the case shown in the slides.
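As a toy illustration of that shared-context model, here is a plain-Python sketch (not the Spark API; the queue, thread, and user names are invented for the example) of many clients feeding one driver-side scheduler, the way JDBC/ODBC clients share the single SparkContext inside the Spark SQL thrift server:

```python
import queue
import threading

job_queue = queue.Queue()
results = {}

def shared_scheduler():
    # One scheduler thread drains jobs from all users (FIFO here;
    # Spark's FAIR scheduler pools would interleave users instead).
    while True:
        item = job_queue.get()
        if item is None:  # shutdown sentinel
            break
        user, job_id, work = item
        results[(user, job_id)] = work()

sched = threading.Thread(target=shared_scheduler)
sched.start()

# Two users submit jobs to the same shared "context":
job_queue.put(("alice", 1, lambda: sum(range(10))))
job_queue.put(("bob", 1, lambda: max([3, 7, 5])))
job_queue.put(None)
sched.join()
```

The point of the sketch is only that one scheduler sees all users' jobs, so it can arbitrate between them, which a per-driver scheduler cannot.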

-Kay


Junfeng Liu

Nov 17, 2014, 10:46:08 PM
to sparrow-sch...@googlegroups.com
Kay, thanks for the information. Is Spark SQL the only service that shares the SparkContext and does centralized scheduling? What about other workloads like Streaming, GraphX, MLlib, and BlinkDB?

Kay Ousterhout

Nov 18, 2014, 2:03:17 PM
to sparrow-sch...@googlegroups.com
For all of these workloads, it's certainly theoretically possible to share a Spark context.  However, I'm not sure to what extent that's possible with the current APIs.

-Kay


Jahid Hasan

Dec 19, 2021, 8:45:01 PM
to Sparrow Users
I just recently read the material available on Sparrow. May I know how I can run the Sparrow code and integrate it with Spark? Some steps or instructions to follow would be appreciated.

Kay Ousterhout

Dec 20, 2021, 1:52:16 PM
to sparrow-sch...@googlegroups.com
The instructions to run Sparrow are now for a very old version of Spark, and I'm guessing it will be nearly impossible to get this all working because all of the dependencies are very out of date.

-Kay
