Using elephant bird to read/write protobuf data


Paul A. Steckler

Feb 13, 2014, 1:37:48 PM
to elephant...@googlegroups.com
I just learned of elephant bird, and it may do what I want.

I have some data that I'm writing in one Hadoop map-reduce job, and it's read in by a subsequent job.
Currently, I'm using JSON, which I treat as Text in Hadoop. Parsing the JSON seems to be a
bottleneck, so I'm looking for alternatives.

Can I define my data structures with protocol buffers, and then use elephant bird to write and read
the data? If so, I believe that would be much faster than writing/reading/parsing JSON.

Also, is the elephant bird library pretty stable at this point? I want to use it for production code.

Thanks for any help!

-- Paul

Dmitriy Ryaboy

Feb 13, 2014, 1:45:11 PM
to elephant...@googlegroups.com
Yes, EB is what we have used for protocol buffers and Thrift in production at Twitter for 4 years now, so I'd say it's pretty good for most of our use cases :).

One caveat is that the input/output formats EB provides are unfortunately tied to lzo compression. It's probably a good idea to use lzo anyway, so it may not be a big issue. 

D


--
Dmitriy V Ryaboy
Twitter Analytics
http://twitter.com/squarecog

Raghu Angadi

Feb 13, 2014, 1:48:43 PM
to elephant...@googlegroups.com
Yeah, EB has been in production at Twitter for quite some time.

lzo is not required. In fact, EB supports sequencefiles (which let you choose your compression) and a couple of other options. How are your applications built? MR, Pig, Hive, etc.?

Paul A. Steckler

Feb 13, 2014, 2:27:39 PM
to elephant...@googlegroups.com
I'm using standard Map-Reduce jobs.

Yes, avoiding LZO is desirable.

-- Paul
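
For readers following the thread, here is a minimal sketch of the idea being discussed: storing records as framed binary payloads rather than as lines of JSON text. It uses only the Java standard library and is not Elephant Bird's or Hadoop's API. The class name RecordFraming and its methods are hypothetical, and the opaque byte arrays stand in for serialized protocol buffer messages.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: length-prefixed framing of opaque binary records.
public class RecordFraming {

    // Writes each record as a 4-byte big-endian length followed by its raw bytes.
    public static byte[] writeRecords(List<byte[]> records) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (byte[] record : records) {
                out.writeInt(record.length);
                out.write(record);
            }
        }
        return buffer.toByteArray();
    }

    // Reads records back by consuming a length prefix, then exactly that many bytes.
    public static List<byte[]> readRecords(byte[] data) throws IOException {
        List<byte[]> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                int length = in.readInt();
                byte[] record = new byte[length];
                in.readFully(record);
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder payloads; a real job would use serialized messages instead.
        List<byte[]> input = Arrays.asList(
                "first".getBytes(StandardCharsets.UTF_8),
                "second".getBytes(StandardCharsets.UTF_8));

        byte[] framed = writeRecords(input);
        List<byte[]> output = readRecords(framed);

        for (byte[] record : output) {
            System.out.println(new String(record, StandardCharsets.UTF_8));
        }
    }
}
```

In a real job, each payload would typically come from a generated protobuf class via toByteArray() and be decoded with parseFrom(byte[]), while a container format such as a SequenceFile would take care of the framing and compression that this sketch does by hand.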