Paper Title |
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center |
Author(s) |
Benjamin Hindman, Andy Konwinski, Matei Zaharia, |
Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, Ion Stoica |
Date | 2010 |
Novel Idea | As cluster computing becomes more popular, individual organizations increasingly want to run many different kinds of distributed jobs, but may only have access to a single cluster. Mesos addresses this by acting as a resource allocator for multiple distributed processing frameworks running on the same cluster. |
Main Result(s) |
To run on Mesos, a framework must be modified to use the Mesos API. Once that is done, Mesos allocates resources by sending resource offers to the frameworks. A framework either rejects an offer or accepts it and specifies a task; Mesos then instructs the node whose resources were offered to run that task for the accepting framework. Mesos distinguishes between long and short jobs, and guarantees frameworks running long jobs that it will not kill them as long as they stay within a certain resource quota. Short jobs, however, may be killed if Mesos thinks it can allocate their resources more efficiently, or if it suspects they are misrepresented long jobs. Long jobs can also be killed if they exceed their guaranteed quota. Mesos allocates resources using a scheme called Dominant Resource Fairness (DRF). DRF attempts to give each framework the same fraction of its particular dominant resource - that is, whatever resource each framework needs most of (CPU, RAM, network bandwidth, etc.). So a DRF-fair result might be to give one framework 80% of the available CPU and another 80% of the available RAM. |
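The DRF allocation described above can be sketched as a progressive-filling loop: repeatedly grant one task to the framework with the lowest dominant share. The cluster capacities and per-task demands below are illustrative numbers of my own, not figures from the paper.

```python
# Hedged sketch of Dominant Resource Fairness via progressive filling.
# All capacities and demand vectors are made-up illustrative numbers.

CAPACITY = {"cpu": 9.0, "ram": 18.0}  # total cluster resources

# Per-task demand vectors for two hypothetical frameworks.
demands = {
    "A": {"cpu": 1.0, "ram": 4.0},   # RAM-heavy: dominant resource is RAM
    "B": {"cpu": 3.0, "ram": 1.0},   # CPU-heavy: dominant resource is CPU
}

allocated = {f: {r: 0.0 for r in CAPACITY} for f in demands}

def dominant_share(framework):
    """Largest fraction of any single resource this framework holds."""
    return max(allocated[framework][r] / CAPACITY[r] for r in CAPACITY)

def fits(framework):
    """Can the cluster still accommodate one more of this framework's tasks?"""
    used = {r: sum(allocated[f][r] for f in demands) for r in CAPACITY}
    return all(used[r] + demands[framework][r] <= CAPACITY[r] for r in CAPACITY)

# Progressive filling: repeatedly give one task to the framework with the
# lowest dominant share, until no further task fits anywhere.
while True:
    runnable = [f for f in demands if fits(f)]
    if not runnable:
        break
    f = min(runnable, key=dominant_share)
    for r in CAPACITY:
        allocated[f][r] += demands[f][r]

print(allocated)
```

With these numbers the loop converges to both frameworks holding the same dominant share: A ends up with 3 CPUs and 12 GB of RAM, B with 6 CPUs and 2 GB, so each holds two-thirds of its dominant resource.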
Impact | I never have anything interesting to write in this section. |
Evidence |
They show that Mesos is able to keep cluster utilization at 100% with a random assortment of jobs on a few frameworks, but they don't test many different kinds of loads or kinds of frameworks. Instead they have a lengthy section of mathematical proofs of scheduling guarantees under a variety of conditions, though it would be nice if these were borne out in the lab. They also support the kind of specialized framework they hope to promote with Mesos by showing how their own framework, Spark, can use its specialized knowledge of certain data-intensive problems to perform jobs several orders of magnitude faster than Hadoop. |
Prior Work | They specifically mention Quincy as an inspiration, and indirectly imply that it was the source of their task-killing mechanism. |
Reproducibility | They don't really discuss the mechanism by which Mesos keeps nodes decoupled from the frameworks enough to control where jobs run, yet connected enough that the master doesn't become a bottleneck. It seems like there are some nontrivial issues here that would make reproducing this quite a job. |
Question | How does DRF avoid unnecessarily limiting access to uncontested resources? If we add a third framework to the example that needs a 1-gigabit Ethernet connection per job but barely any RAM and only a tiny fraction of a CPU, won't DRF cap it at 80% of the available network capacity, even though no one else is competing for it? |
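One way to probe this question is to simulate DRF's progressive-filling rule with a hypothetical third, network-bound framework. The simplified model and all numbers below are my own assumptions, not the paper's; it suggests that once the contended resources are exhausted, the network-bound framework keeps receiving tasks, so uncontested capacity is not stranded.

```python
# Toy progressive-filling DRF with a hypothetical "net" resource.
# Capacities and demands are made-up numbers for illustration only.

CAPACITY = {"cpu": 10.0, "ram": 10.0, "net": 10.0}

demands = {
    "A": {"cpu": 2.0, "ram": 1.0, "net": 0.0},   # CPU-heavy
    "B": {"cpu": 1.0, "ram": 2.0, "net": 0.0},   # RAM-heavy
    "C": {"cpu": 0.1, "ram": 0.1, "net": 1.0},   # network-bound
}

allocated = {f: {r: 0.0 for r in CAPACITY} for f in demands}

def dominant_share(f):
    """Largest fraction of any single resource this framework holds."""
    return max(allocated[f][r] / CAPACITY[r] for r in CAPACITY)

def fits(f):
    """Can the cluster still fit one more of this framework's tasks?"""
    used = {r: sum(allocated[g][r] for g in demands) for r in CAPACITY}
    return all(used[r] + demands[f][r] <= CAPACITY[r] for r in CAPACITY)

# Keep granting one task to the fitting framework with the lowest
# dominant share; frameworks whose tasks no longer fit drop out.
while any(fits(f) for f in demands):
    f = min((g for g in demands if fits(g)), key=dominant_share)
    for r in CAPACITY:
        allocated[f][r] += demands[f][r]

# Once A and B can no longer fit (CPU and RAM exhausted), C alone keeps
# receiving tasks until the network capacity itself runs out.
print(round(allocated["C"]["net"] / CAPACITY["net"], 2))  # → 1.0
```

Under this toy model C ends up with 100% of the network, because progressive filling only stops granting a framework tasks when its tasks no longer fit, not when its dominant share passes the others'. Whether the Mesos implementation behaves this way in practice is not something the paper's evaluation settles.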
Criticism | Is a scheduler that requires custom frameworks really practical? They have implemented Hadoop and a few others themselves, but as those frameworks evolve, these implementations won't evolve with them. Can Mesos really gain the traction it would need to be supported by every major distributed computation framework? Or can it make it easy enough to do the modifications in house that using Mesos doesn't mean forgoing most of the available frameworks? |
Ideas for further work | I wonder if it is possible to make something almost as effective that does all the resource management at the system level, without having to interface with the frameworks at all. You lose a lot of easy customizability, but it may be possible to extrapolate the necessary data by simply observing jobs created by various processes, rather than by asking them explicitly what they need, and you gain a lot in developer time and end-user usefulness if you support frameworks that have never heard of you. |
Authors: Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, Ion Stoica
Date: 2010, UC Berkeley Tech Report
Novel idea: The authors elegantly summarize the purpose of Mesos: "to enable fine-grained sharing between multiple cluster computing frameworks, while giving these frameworks enough control to achieve placement goals such as data locality."
Main result: The authors provide an implementation that delivers on the above promise by offering frameworks resources and allowing them to choose which computations to run. In practice, this model achieved near-optimal locality when sharing a cluster among diverse frameworks, in addition to offering scalability and fault-tolerance benefits.
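The offer-based control flow described above can be sketched with a toy master and framework. The class names, fields, and the `resource_offer` callback below are illustrative stand-ins of my own, not the real Mesos API.

```python
# Hedged sketch of the Mesos resource-offer model. A framework is offered
# a node's free resources and decides whether to run a task there, e.g.
# to preserve data locality. All names here are hypothetical.

class Offer:
    def __init__(self, node, resources):
        self.node = node
        self.resources = resources  # e.g. {"cpu": 4, "mem_gb": 8}

class Framework:
    """A framework decides which offers to accept, e.g. for data locality."""
    def __init__(self, name, preferred_nodes, task_demand):
        self.name = name
        self.preferred_nodes = preferred_nodes  # nodes holding its input data
        self.task_demand = task_demand

    def resource_offer(self, offer):
        # Decline offers on nodes that don't hold our input data.
        if offer.node not in self.preferred_nodes:
            return None
        # Accept only if the offered resources cover one task's demand.
        if all(offer.resources[r] >= need for r, need in self.task_demand.items()):
            return {"run_on": offer.node, "uses": self.task_demand}
        return None

def master_offer(frameworks, offer):
    """The master presents free resources to frameworks in turn; the first
    acceptance results in a task launched on the offered node."""
    for fw in frameworks:
        task = fw.resource_offer(offer)
        if task is not None:
            return fw.name, task
    return None, None  # everyone declined; the resources stay free

hadoop = Framework("hadoop", {"node1", "node2"}, {"cpu": 2, "mem_gb": 4})
mpi = Framework("mpi", {"node3"}, {"cpu": 4, "mem_gb": 2})

winner, task = master_offer([hadoop, mpi], Offer("node3", {"cpu": 4, "mem_gb": 8}))
print(winner, task)  # hadoop declines for lack of locality; mpi accepts
```

The point of the design, as the review notes, is that placement decisions stay with the frameworks: the master only needs to know what is free, not why a framework wants a particular node.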
Impact: Practical concerns often necessitate the utilization of multiple disparate systems. Being able to run these various systems on a single set of compute resources may provide significant cost savings.
Evidence: The authors ported MPI and Hadoop to Mesos, and they also built a new framework on top of Mesos called Spark. Mesos was able to keep cluster utilization at 100% without affecting the standalone running time of each framework. They also measured the data locality these frameworks achieved while running on top of Mesos. They additionally scaled Mesos up to 50,000 emulated nodes on EC2 to demonstrate its scalability.
Prior work: Mesos obviously builds on the successes of distributed compute frameworks such as Hadoop and MPI. As far as their approach is concerned, the authors cite inspiration from prior work on microkernels, exokernels, and hypervisors.
Competitive work: Mesos is similar to HPC and grid schedulers; however, it gives frameworks flexibility when it comes to storing and retrieving data on the cluster. Mesos also shares common goals with virtual machine clouds such as EC2; however, the abstraction in virtual machine clouds (the VM) is too coarse-grained to serve the needs of distributed computing frameworks, compared with the resource offers Mesos employs.
Reproducibility: Mesos is open source software, and is available via github.com/mesos. Given adequate compute resources it should be possible to reproduce the authors' results.
Praise: (Yet again.) The authors exploit operating system container technologies, specifically Linux containers and Solaris Projects, to achieve isolation. It's great to see the authors of distributed systems working with existing OS mechanisms rather than reinventing the wheel.
Ideas for further work: The authors mention that although Mesos currently isolates CPU cores and memory, they plan to extend their implementation to isolate network and IO using new features in Linux kernel 2.6.33. The same should be possible using Solaris' Project Crossbow.