Hi all,
Here are my two cents on the topic; I hope (at least some of) them make sense.
First, assume that the MapReduce simulator referred to in this context is SLS (the Yarn Scheduler Load Simulator). To add Sparrow schedulers to such a simulator, the high-level guideline would be to replace Yarn's resource manager with a Sparrow scheduling node, since the two work in very different ways. One trick here: from the scheduling viewpoint, Yarn is essentially a monolithic scheduler, so it is straightforward to swap its own scheduler out for Sparrow schedulers. Next, the low-level changes to make include, but are not limited to:
1) a means to support the communication protocol between a Sparrow node monitor and Yarn containers (running on various nodes), or between a Yarn NodeManager and Sparrow schedulers; note that Apache Thrift is a fine choice for a PoC, and a better transport can always be chosen later (see the sketch after this list);
2) if you are interested in evaluating against realistic workloads, a way to generate DAGs (with different constraints w.r.t. dependencies) also needs to be designed and built;
3) some nontrivial coding techniques are applied in Sparrow; take note of them early, or they will quietly break your heart along the way;
4) Yarn is container-oriented, while Sparrow/Spark is oriented toward physical or virtual machines;
5) anything else not mentioned above.
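To make 1) a bit more concrete, below is a minimal sketch of the scheduler <-> node monitor message flow that such a transport would have to carry, written as plain Java interfaces. The method and type names are my own paraphrases of Sparrow's probe/reservation design with late binding, not the actual Sparrow Thrift IDL:

import java.util.List;

/** What a node monitor would expose to schedulers. */
interface NodeMonitorService {
    // Place lightweight task reservations in this node's queue; the actual
    // task is bound later, when a reservation reaches the head of the queue.
    void enqueueReservations(String jobId, int numReservations, String schedulerAddr);
}

/** What a scheduler would expose to node monitors (the late-binding callback). */
interface SchedulerService {
    // Called when a reservation reaches the head of a node's queue; returns a
    // concrete task to launch, or an empty list if the job's tasks have all
    // been launched elsewhere already.
    List<TaskSpec> getTask(String jobId, String nodeId);
}

/** Hypothetical task description shared by both sides. */
record TaskSpec(String taskId, int cpuVcores, int memoryMb) {}

In a PoC these two interfaces would simply be generated from a Thrift definition rather than hand-written.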
In addition, as shown in the Sparrow paper, it works best for short-lived jobs, e.g. query-based data processing via Shark (a predecessor of Spark SQL). When it comes to batch processing (e.g. MapReduce or other multi-stage jobs), some of its fundamentals may not be the best fit, e.g. batch sampling and/or the push mechanism adopted by its virtual reservations, whether hypothetically or practically.
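For reference, here is a self-contained, simplified sketch of what batch sampling boils down to: for a job of m tasks, probe d*m randomly chosen workers and place the tasks on the m least-loaded ones probed. Queue lengths are simulated locally here (in Sparrow the probes are RPCs, and late binding replaces the direct queue-length comparison); the probe ratio d = 2 follows the paper, everything else is made up for illustration:

import java.util.*;

public class BatchSampling {
    public static void main(String[] args) {
        int numWorkers = 100;   // simulated cluster size
        int m = 4;              // tasks in the job
        int d = 2;              // probe ratio from the Sparrow paper
        Random rng = new Random(42);

        // Simulated per-worker queue lengths.
        int[] queueLength = new int[numWorkers];
        for (int w = 0; w < numWorkers; w++) queueLength[w] = rng.nextInt(10);

        // Probe d*m distinct workers chosen uniformly at random.
        List<Integer> workers = new ArrayList<>();
        for (int w = 0; w < numWorkers; w++) workers.add(w);
        Collections.shuffle(workers, rng);
        List<Integer> probed = workers.subList(0, d * m);

        // Place the m tasks on the m least-loaded probed workers.
        probed.sort(Comparator.comparingInt(w -> queueLength[w]));
        for (int t = 0; t < m; t++) {
            System.out.printf("task %d -> worker %d (queue length %d)%n",
                    t, probed.get(t), queueLength[probed.get(t)]);
        }
    }
}

The point of the sampling is that the scheduler only ever talks to d*m workers rather than the whole cluster, which is exactly what makes it cheap for short tasks and, as said above, potentially less ideal for long batch stages.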
One more thing: building on the diligent efforts of the Spark development team, one may consider running Spark on Yarn, in either yarn-cluster or yarn-client mode, on a cluster of one's own. For a lightweight virtualized environment, Google's Kubernetes might be of interest. Willing to add a bit of sugar on top? Then just go for Mesos, on which Sparrow schedulers may be more than happy to chip in.
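If it helps, a typical submission in yarn-cluster mode looks roughly like the following (the class name and jar are placeholders, and the exact flags depend on the Spark version; older releases spelled it --master yarn-cluster / --master yarn-client):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  my-app.jar

Switch --deploy-mode to client for the yarn-client behavior.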
Cheers,
Lou