Hi there,
Just two short notes here:
1. For the comparison between Sparrow and Spark, my (educated) guess is that the cluster should contain at least a hundred nodes, and possibly many more; otherwise the overhead of the decentralized approach will be prohibitive. Moreover, I don't think the Sparrow setup you described would give a fair comparison against Spark, e.g. because too many schedulers are involved.
2. Essentially, what else distinguishes the Spark cluster scheduler from Sparrow is the scheduling mechanism each one adopts. The combination of virtual reservations and batch sampling yields very good scheduling performance when a single resource is being considered.
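To make point 2 concrete, here is a toy sketch of the batch-sampling idea (not Sparrow's actual code; all names and parameters are illustrative): to place m tasks, the scheduler probes d*m randomly chosen workers and assigns the tasks to the m least-loaded ones, rather than probing workers one task at a time.

```python
import random

def batch_sample(queues, m, d=2):
    """Toy batch sampling: probe d*m random workers and place the
    m tasks on the probed workers with the shortest queues."""
    probed = random.sample(range(len(queues)), d * m)
    chosen = sorted(probed, key=lambda w: queues[w])[:m]
    for w in chosen:
        queues[w] += 1          # enqueue one task per chosen worker
    return chosen

queues = [0] * 100              # 100 idle workers
placed = batch_sample(queues, m=4)
print(sorted(queues, reverse=True)[:5])   # -> [1, 1, 1, 1, 0]
```

Virtual reservations (late binding) then add to this: instead of workers reporting queue lengths up front, the probed workers respond only when they are actually ready to run a task, so the scheduler's load information is never stale.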
Btw, my understanding of the Sparrow backend is that it acts as the communication layer between the Sparrow node monitor and the Spark executors. There may also be a correlation between the number of Spark executors and the number of Sparrow frontends (i.e. Shark), at least judging by the current GitHub version, which has gone unchanged for two-plus years.
Best,
Lou
--
You received this message because you are subscribed to the Google Groups "Sparrow Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sparrow-scheduler...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Ah okay. I was not certain because I was under the impression that the number of frontends correlates to the number of Sparrow schedulers and the number of backends correlates to the number of workers (according to figure 6 in the paper).
But since a single job should use a single frontend, is there a way to set the number of schedulers being used?
Or are all tasks of a given job being distributed to workers by a single scheduler?
Sounds good, thank you for the clarification!
You're correct that the number of frontends corresponds to the number of Sparrow schedulers and the number of backends corresponds to the number of workers. If you're only running one job, there won't be any benefit from Sparrow, because Sparrow allows *different* jobs to be scheduled by different frontends (but each job needs to be scheduled by a single frontend).
On Apr 4, 2016 3:10 PM, "Kay Ousterhout" <kayous...@gmail.com> wrote:
There's no single "analogous" setup with Sparrow. With Sparrow, performance will be the same if all jobs are scheduled from a single frontend as it would be if each job were scheduled using its own frontend. Each Sparrow frontend corresponds to a Spark driver, so all jobs that need to share data should be scheduled from the same frontend.
-Kay
On Sun, Apr 3, 2016 at 9:16 PM, gkumar7 <gku...@hawk.iit.edu> wrote:
I would like to run various Spark workloads using the Sparrow distributed scheduling model and compare with the traditional Spark approach. In the Sparrow model, it seems that each frontend is a scheduler, while the backends can be considered the workers.

Therefore, for example, to compare against a Spark cluster of 1 master and 8 slaves, an "analogous" setup with Sparrow would consist of 8 frontends and 8 backends. Would this be correct?

Thank you for the assistance.
Hi Kay,

Could you clarify some ideas for me? When you said "If you're only running one job, there won't be any benefits of Sparrow, because Sparrow allows *different* jobs to be scheduled by different frontends", by job did you mean the set of tasks in a Spark stage? That is, would the improvement from distributed scheduling in Sparrow only occur when you have jobs (stages) that can be executed in parallel within an application? If that is correct, is it safe to assume that such a scenario would only occur when input data (from HDFS or any other source) is being fetched into more than one initial RDD?

With all of that said, is there any way to make simple applications like Grep and WordCount take advantage of the distributed scheduling in Sparrow?

Thank you so much for the attention,
Henrique
[info] Compiling 1 Scala source to /home/bishwa/Documents/spark-sparrow/project/project/target/scala-2.9.2/sbt-0.12/classes...
[error] error while loading CharSequence, class file '/usr/lib/jvm/java-8-oracle/jre/lib/rt.jar(java/lang/CharSequence.class)' is broken
[error] (bad constant pool tag 18 at byte 10)
[error] error while loading Comparator, class file '/usr/lib/jvm/java-8-oracle/jre/lib/rt.jar(java/util/Comparator.class)' is broken
[error] (bad constant pool tag 18 at byte 20)
[error] two errors found
[error] (compile:compile) Compilation failed
Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore?
I am using the Java version shown below.
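In case it helps, a likely cause and fix (a sketch, based on the log above): constant pool tag 18 is CONSTANT_InvokeDynamic, which appears in Java 8's rt.jar but is not understood by the Scala 2.9.2 compiler bundled with sbt 0.12, so the build fails while reading Java's class files. Running sbt under a Java 7 JDK usually resolves this. The JDK path below is an assumption; substitute wherever a JDK 7 lives on your machine.

```shell
# Point sbt at a Java 7 JDK so Scala 2.9.2 can parse rt.jar.
# Assumed path -- check what is actually installed, e.g. with: ls /usr/lib/jvm
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export PATH="$JAVA_HOME/bin:$PATH"
echo "$JAVA_HOME"    # confirm the JDK in use before re-running sbt
```

Alternatively, on Debian/Ubuntu systems `sudo update-alternatives --config java` switches the system-wide default.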