Scalding with Protocol Buffer encoded messages?

Stone

Mar 29, 2012, 3:57:36 AM

to cascadi...@googlegroups.com

Does Scalding (https://github.com/twitter/scalding) support Protocol Buffer encoded messages? Some examples would be nice.

Oscar Boykin

Mar 29, 2012, 4:18:53 PM

to cascadi...@googlegroups.com

Yes.

Scalding uses Kryo to serialize objects in Cascading tuples, and protobufs are serialized very efficiently that way. It should be transparent to you.

If you want to optimize it further, you can hook protobuf serialization directly into Scalding (which we do internally at Twitter) using elephant-bird:

https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/cascading2/io/protobuf/ProtobufSerializer.java

You can tell scalding about it by adding:

override def ioSerializations =
  List("com.twitter.elephantbird.cascading2.io.protobuf.ProtobufSerialization") ++ super.ioSerializations

to your base Job class (and linking in elephantbird).

You can do the same with thrift from this project:

https://github.com/Cascading/cascading-thrift

That said, I'd just try the default serialization first, and only move to something like this if you run into problems (protobuf should be fine already).

--
Oscar Boykin :: @posco :: https://twitter.com/intent/user?screen_name=posco

Sam Ritchie

Mar 29, 2012, 4:25:06 PM

to cascadi...@googlegroups.com

Just to clarify, it's important that you concatenate the ProtobufSerialization onto the beginning of the list, as Hadoop scans the sequence from left to right.
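A minimal, self-contained sketch of why the ordering matters, assuming a first-match scan over the list. Nothing here is Hadoop or Scalding code: `Ser`, `Message`, `Tweet`, and `pick` are hypothetical names made up for illustration, and `catchAll` only plays the role of a general-purpose fallback serialization that accepts every class.

```scala
// Hypothetical stand-in for a serialization entry; only the accept check is modeled.
final case class Ser(name: String, accepts: Class[_] => Boolean) {
  def accept(c: Class[_]): Boolean = accepts(c)
}

object SerializationOrder {
  // Hypothetical types standing in for a generated protobuf class hierarchy.
  trait Message
  final class Tweet extends Message

  // General-purpose fallback: accepts every class (an assumption for this sketch).
  val catchAll: Ser = Ser("catch-all", _ => true)

  // Specialized entry: accepts only subclasses of Message.
  val protobuf: Ser = Ser("protobuf", c => classOf[Message].isAssignableFrom(c))

  // Scan left to right and return the name of the first entry that accepts the class.
  def pick(sers: List[Ser], c: Class[_]): Option[String] =
    sers.find(_.accept(c)).map(_.name)

  def main(args: Array[String]): Unit = {
    val defaults = List(catchAll)
    // Specialized entry prepended: it is reached before the fallback.
    println(pick(protobuf :: defaults, classOf[Tweet]))
    // Specialized entry appended: the fallback wins and the specialized entry is never consulted.
    println(pick(defaults :+ protobuf, classOf[Tweet]))
  }
}
```

With the specialized entry at the front, `pick` selects it for `Tweet`; with it at the end, the catch-all entry is selected instead, which is the situation the prepend avoids.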

Sam Ritchie, Twitter Inc

703.662.1337

@sritchie09

(Too brief? Here's why! http://emailcharter.org)

C.V.Krishnakumar Iyer

Jan 13, 2014, 11:21:51 PM

to cascadi...@googlegroups.com

Can you please post an example? That would really help.
