Fwd: Talk on Tuesday at noon: A Model of Computation for MapReduce

1 view

Skip to first unread message

Rodrigo Fonseca

unread,

Nov 15, 2009, 11:30:28 AM11/15/09

to brown-cs...@googlegroups.com

Hi,

This talk should be very relevant to some of you, it's right before
class on Tuesday.
Sergei is looking, from a theory point of view, at what can be
efficiently computed by the MapReduce model.

Cheers,
Rodrigo

---------- Forwarded message ----------
From: Eli Upfal <e...@cs.brown.edu>
Date: Sun, Nov 15, 2009 at 10:54 AM
Subject: Talk on Tuesday at noon: A Model of Computation for MapReduce
To: theory-...@list.cs.brown.edu, fa...@cs.brown.edu,
p...@cs.brown.edu, mas...@cs.brown.edu, the...@cs.brown.edu

Sergei Vassilvitskii, (Yahoo! Research New York) will visit the
department and give a talk at the theory lunch this Tuesday - though
the subject of his talk (MapReduce) should be of interest to a broader
audience. Sergei also works on biding and auction problems,
recommender systems, web search, and much more. Let me know if you
want to meet with Sergei on Tuesday afternoon.

- Eli

-------------------------------------------------------------------------------------------------------------------------------------------------------------------

Title: A Model of Computation for MapReduce

Sergei Vassilvitskii

Yahoo! Research New York http://research.yahoo.com/Sergei_Vassilvitskii

Abstract:
In recent years the MapReduce framework has emerged as one of the most
widely used parallel computing platforms for processing data on the
terabyte and petabyte scales. Used daily at companies such Yahoo!, Google,
Amazon, and Facebook and adopted more recently by several universities, it
allows for easy parallelization of data intensive computations over many
machines. One key feature of MapReduce that differentiates it from
previous models of parallel computation is that it interleaves sequential
and parallel computation. We propose a model of efficient computation using
the MapReduce paradigm. Our model allows each machine to perform
sequential computations in time polynomial in the size of the input that
machine receives. Moreover, since MapReduce is designed for computations
over massive data sets, our model limits the number of machines and the
memory per machine to be (substantially) sublinear in the size of the input.

We compare MapReduce to the PRAM model of computation. We prove a
simulation lemma showing that a large class of PRAM algorithms can be
efficiently simulated via MapReduce. The strength of MapReduce, however,
lies in the fact that it uses both sequential and parallel computation. We
show how algorithms can take advantage of this fact to compute an MST of a
dense graph in only two rounds, as opposed to $O(\log(n))$ rounds needed by a
standard PRAM. We also show to evaluate a wide class of functions using
the MapReduce framework. We conclude by applying this result to show how
to compute many basic algorithmic problems such as undirected connectivity
in the MapReduce framework.

Rodrigo Fonseca

unread,

Nov 15, 2009, 11:42:51 AM11/15/09

to brown-cs...@googlegroups.com

Here's a link to the paper he'll be talking about, to be presented
this January at the 2010 ACM-SIAM Symposium on Discrete Algorithms
(SODA10):

http://www.sidsuri.com/About_Me_files/mrc.pdf

Reply all

Reply to author

Forward

0 new messages