Hi,

Unfortunately, right now, to run Spark with Sparrow, you need to use a forked version of Spark (available here: https://github.com/kayousterhout/spark/tree/sparrow) that includes the code to interface with Sparrow (the Sparrow code is available here: https://github.com/radlab/sparrow).

-Kay
On Thu, Oct 16, 2014 at 8:10 AM, Junfeng Liu <junfe...@gmail.com> wrote:
I did not find any related code in spark project
You're completely right that this is how Spark works with all schedulers *except* Sparrow. Sparrow works differently: resource sharing across different drivers is much more fluid. With Sparrow, exactly one Spark executor is launched on each worker machine, and that executor may be used for jobs submitted by many different Spark drivers.

With a typical Spark scheduler, when you start a new driver, you pass it the URL of the centralized scheduler; the driver greedily takes resources from that scheduler and then uses them for the entire time it's running. With Sparrow, instead, when you start a driver, you pass it a list of all of the workers (there is no centralized scheduler). Each driver communicates directly with the workers to do its own scheduling, so a given worker may be running 8 tasks for user A and 0 for user B at one point, and later be shared equally between A and B. The Sparrow scheduling mechanism handles this, so resources are shared based on current user demands.
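To make the contrast concrete, here is a minimal sketch in Scala of the two configuration styles. The Sparrow-side property name and worker-list format below are hypothetical, for illustration only; the forked Spark branch linked earlier has the actual interface.

    import org.apache.spark.SparkConf

    object SchedulerConfigSketch {
      // Typical Spark: the driver is pointed at a single centralized
      // scheduler and holds whatever resources it is granted for its
      // entire lifetime.
      val centralized = new SparkConf()
        .setAppName("user-a-job")
        .setMaster("spark://central-master:7077") // one scheduler URL

      // Sparrow-style (hypothetical key, not the fork's actual API):
      // the driver is handed the full list of workers and schedules its
      // tasks directly on them, so many drivers can share the same
      // workers at once.
      val decentralized = new SparkConf()
        .setAppName("user-a-job")
        .set("sparrow.workers", "worker1,worker2,worker3")
    }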
On Thu, Oct 16, 2014 at 6:52 PM, Junfeng Liu <junfe...@gmail.com> wrote:
Kay, thanks for the quick reply. Any guidelines on how I can make this work?
Another question: I listened to the presentation at Spark Summit, and it looks like the motivation of Sparrow is to create a decentralized scheduler. But when I played with Spark on YARN today, each Spark driver always created its own SparkContext with a scheduler inside, even when two users submitted the same driver, and those schedulers did not cooperate with each other. I can only use the YARN resource policy to constrain a driver from keeping too many resources; otherwise the scheduler works greedily, asking for as many resources as possible. That does not match the assumption that all users share a single SparkContext and a single scheduler to distribute tasks. Am I misunderstanding something here?