How MapReduce fits in the game


Per Eckerdal

Apr 16, 2008, 11:17:59 AM
to Hypertable User
Hi,

I'm currently searching for technology I will be using in an
application that will require storing, processing and searching rather
large amounts of data. I hope (and think) Hypertable will fit quite
well. However, I've got a question regarding MapReduce. I know that
Hypertable can be run on HDFS, and that Hadoop has a MapReduce
implementation, but I can't find information about how one would use
MapReduce on data that is in a Hypertable database.

Is it possible to run the Hadoop MapReduce implementation
out-of-the-box, or would some configuration/tweaking of the code be
required? Is creating a Hypertable-specific implementation the best
thing to do? Or does Hypertable already provide an implementation?

If an implementation has to be written from scratch, how difficult or
time-consuming would it be? (Approximately, of course)

I don't think I will be using MapReduce in the next couple of months,
but it (or something equivalent) will be needed as soon as the most
basic things are done.

Any help or pointers are appreciated.

/Per

Doug Judd

Apr 16, 2008, 12:10:50 PM
to hyperta...@googlegroups.com
Hi Per,

A Map-reduce connector to Hypertable is definitely on the roadmap. Here is how we're thinking of implementing it:

- Provide a table "snapshot" mechanism that does a major compaction of all the Ranges in a table
- Run Hadoop Map-reduce directly on the table's CellStore files in HDFS

The CellStore file format has been designed so that a Hadoop Map-reduce InputFormat class can easily be written for it. Our plates are full right now with higher-priority work, so it will be a few months before we tackle this one.

- Doug
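
For illustration, the two-step plan above (snapshot via major compaction, then map-reduce over the resulting files) can be modeled roughly as follows. This is a toy, in-memory Python sketch, not Hypertable or Hadoop code. The names snapshot, map_reduce, count_by_column and total are hypothetical, each "range" is assumed to be a list of cell fragments, and a major compaction is assumed to merge those fragments into one sorted list in which later writes replace earlier ones.

```python
from collections import defaultdict

# Hypothetical model: a table is a list of ranges, and each range is a list
# of fragments holding (row_key, column, value) cells in write order.
def snapshot(ranges):
    """Model a major compaction: one sorted, deduplicated cell list per range."""
    compacted = []
    for fragments in ranges:
        latest = {}
        for fragment in fragments:
            for row, column, value in fragment:
                # Assumption: a later fragment overrides an earlier one.
                latest[(row, column)] = value
        compacted.append(sorted((r, c, v) for (r, c), v in latest.items()))
    return compacted

def map_reduce(splits, mapper, reducer):
    """Apply mapper to every cell of every split, then reduce values per key."""
    grouped = defaultdict(list)
    for split in splits:  # one split per compacted range file
        for cell in split:
            for key, value in mapper(cell):
                grouped[key].append(value)
    return {key: reducer(key, values) for key, values in sorted(grouped.items())}

def count_by_column(cell):
    """Mapper: emit (column, 1) for each cell."""
    _row, column, _value = cell
    yield column, 1

def total(_key, values):
    """Reducer: sum the counts for one column."""
    return sum(values)

# Two ranges; the first has two fragments with an overwritten cell.
ranges = [
    [
        [("a", "clicks", "1"), ("b", "clicks", "2")],
        [("a", "clicks", "5"), ("a", "views", "9")],
    ],
    [
        [("m", "views", "3")],
    ],
]
files = snapshot(ranges)
result = map_reduce(files, count_by_column, total)
print(result)
```

In a real connector, each compacted list would correspond to CellStore files in HDFS, and the loop over splits would be replaced by a Hadoop InputFormat that hands those files to map tasks.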

Per Eckerdal

Apr 16, 2008, 2:21:22 PM
to hyperta...@googlegroups.com
> A Map-reduce connector to Hypertable is definitely on the roadmap.

Thanks for the quick reply.
Hopefully it will be there, or close to being done, when we need it.
Otherwise we will implement it, I guess.

/Per

Doug Judd

Apr 16, 2008, 2:25:26 PM
to hyperta...@googlegroups.com
If it's not ready when you need it and you'd like to collaborate on it, that would be great.  We can help get you pointed in the right direction.

- Doug

Gordon

Apr 16, 2008, 3:31:08 PM
to hyperta...@googlegroups.com
Per,

You're absolutely on target with this. Doug brought map/reduce into our company (via Hadoop), and the genesis of this project was in bringing real-time, web-scale analytics capability into open source. So, map/reduce integration is key.

Thanks,
Gordon

Phillip B Oldham

Jun 5, 2008, 5:24:52 PM
to Hypertable User
Further to this, will there be any implementations of map/reduce
(possibly native) other than the one provided by Hadoop? I'm thinking
of cases where someone needs map/reduce but is using a DFS other than
HDFS.

Doug Judd

Jun 5, 2008, 5:55:10 PM
to hyperta...@googlegroups.com
Well, that would be a whole other project.  We're currently focused on making Hypertable a highly useful standalone database.

- Doug