Scalding with Protocol Buffer-encoded messages?


Stone

Mar 29, 2012, 3:57:36 AM
to cascadi...@googlegroups.com
Does Scalding (https://github.com/twitter/scalding) support Protocol Buffer-encoded messages? Some examples would be nice.

Oscar Boykin

Mar 29, 2012, 4:18:53 PM
to cascadi...@googlegroups.com
Yes.

Scalding uses Kryo to serialize objects in Cascading tuples, and protobuf messages are serialized very efficiently that way. It should be transparent to you.

If you want to optimize it further, you can hook protobuf serialization directly into Scalding (which we do internally at Twitter) using Elephant Bird:


You can tell Scalding about it by adding:

    // Prepend ProtobufSerialization: Hadoop tries the listed serializations in order.
    override def ioSerializations =
      List("com.twitter.elephantbird.cascading2.io.protobuf.ProtobufSerialization") ++
        super.ioSerializations

to your base Job class (and linking in Elephant Bird).
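To show what the override does to the list, here is a minimal, self-contained Scala sketch. It deliberately avoids Scalding and Hadoop: `BaseJob` and `ProtobufJob` are hypothetical stand-ins for Scalding's `Job` and your own base job, and the two `placeholder.*` entries stand in for whatever defaults the real `ioSerializations` returns in your Scalding version. Only the `List(...) ++ super.ioSerializations` pattern is taken from the snippet above.

```scala
// Hypothetical stand-in for Scalding's Job: only ioSerializations is modeled,
// and the default entries are placeholders, not real Scalding defaults.
class BaseJob {
  def ioSerializations: List[String] =
    List("placeholder.WritableSerialization", "placeholder.KryoSerialization")
}

// Same pattern as the override above: put the protobuf serialization first
// and keep every inherited default after it.
class ProtobufJob extends BaseJob {
  override def ioSerializations: List[String] =
    List("com.twitter.elephantbird.cascading2.io.protobuf.ProtobufSerialization") ++
      super.ioSerializations
}

object OverrideDemo {
  def main(args: Array[String]): Unit =
    new ProtobufJob().ioSerializations.foreach(println)
}
```

Running `OverrideDemo` prints the protobuf entry first, followed by the two placeholder defaults, so nothing the parent class registered is lost.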

You can do the same with Thrift from this project:


That said, I'd try the default serialization first and only switch to something like this if you run into problems (protobuf should already be fine).


--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To view this discussion on the web visit https://groups.google.com/d/msg/cascading-user/-/2cqkWyzUoIIJ.
To post to this group, send email to cascadi...@googlegroups.com.
To unsubscribe from this group, send email to cascading-use...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/cascading-user?hl=en.



--
Oscar Boykin :: @posco :: https://twitter.com/intent/user?screen_name=posco

Sam Ritchie

Mar 29, 2012, 4:25:06 PM
to cascadi...@googlegroups.com
Just to clarify: it's important that you put ProtobufSerialization at the beginning of the list, as Hadoop scans the sequence from left to right and uses the first serialization that accepts a given class.
Sam Ritchie, Twitter Inc
@sritchie09

(Too brief? Here's why! http://emailcharter.org)
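The ordering point above can be illustrated with a toy Scala model of an ordered lookup: each serialization is asked, in list order, whether it accepts a class, and the first one that does is used. The `Ser` type, the hypothetical catch-all fallback and the `FakeProtobufMessage` marker trait are simplifying assumptions, not Hadoop's `SerializationFactory` code; the sketch only shows why a protobuf serializer placed after an entry that already accepts the class is never reached.

```scala
// Toy model of an ordered serialization lookup. All names and accept rules
// here are hypothetical; this is not Hadoop's implementation.
final case class Ser(name: String, accept: Class[_] => Boolean)

object FirstMatchDemo {
  // Hypothetical marker type standing in for a generated protobuf message.
  trait FakeProtobufMessage
  final class UserEvent extends FakeProtobufMessage

  // A hypothetical catch-all fallback that accepts every class.
  val generic = Ser("generic-fallback", _ => true)
  // The protobuf serializer only accepts protobuf message classes.
  val protobuf = Ser("protobuf", c => classOf[FakeProtobufMessage].isAssignableFrom(c))

  // Name of the first serialization in the list that accepts the class, if any.
  def pick(serializations: List[Ser], c: Class[_]): Option[String] =
    serializations.find(_.accept(c)).map(_.name)

  def main(args: Array[String]): Unit = {
    println(pick(List(protobuf, generic), classOf[UserEvent])) // Some(protobuf)
    println(pick(List(generic, protobuf), classOf[UserEvent])) // Some(generic-fallback)
  }
}
```

With the protobuf entry prepended, protobuf classes reach it while everything else still falls through to the defaults; appended, it is shadowed by the earlier match.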

C.V.Krishnakumar Iyer

Jan 13, 2014, 11:21:51 PM
to cascadi...@googlegroups.com
Can you please post an example? That would really help.