Thanks for responding, Sam. I'm in the middle of Nathan's book, and deep in the bowels of the scalding/scalding-commons/bijection code :-) There's a serious lack of samples for the pail work in this path, but I'm happy to contribute some. Liking the bijection protobuf support, clean!
In the sample, the write is essentially:
def writejob = {
  val pipe = IterableSource((1 to 100), "src").read
  val sink = PailSource.sink[Int]("pailtest", structure)
  pipe.write(sink)
}
I am trying to figure out how to work with IterableSource, or whether it is even the correct Source implementation to use with protobuf + pail. Any pointers on Source usage would be greatly appreciated. Here is my version:
def writejob: Pipe = {
  // So far this creates MyEvent/2013/1/0/part-000000, which is wrong (the 'part-000000' part)
  val pipe = IterableSource(Seq(event), "src").read
  val sink = PailSource.sink[MyBaseProtobuf]("myrootpath", new MyPailStructure)
  pipe.write(sink)
}
It breaks during pail structure validation, which makes sense given that I haven't yet worked out the proper Source / IterableSource configuration:
Caused by: java.lang.IllegalArgumentException: MyEvent/2013/1/0/part-000000 is not valid with the pail structure {structure=com.foo.MyPailStructure, args={}, format=SequenceFile} --> [MyEvent, 2013, 1, 0]
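To make the failure concrete, here is a minimal sketch of what I assume is going on. It is plain Scala with no Pail dependency: `PailTargetSketch`, `targetDirs` and the `<eventType>/<year>/<month>` layout are all hypothetical stand-ins, not my real MyPailStructure. I'm also assuming, from the `--> [MyEvent, 2013, 1, 0]` part of the message, that the file name is dropped and only the directory components are handed to the structure's validity check. If that holds, the check is rejecting the four directory components rather than the 'part-000000' name itself.

```scala
object PailTargetSketch {
  // Assumption: only the directory components are validated,
  // so the trailing file name is dropped before the check.
  def targetDirs(path: String): Seq[String] =
    path.split("/").toSeq.dropRight(1)

  // Hypothetical structure that accepts exactly <eventType>/<year>/<month>,
  // where year and month must be non-empty digit strings.
  def isValidTarget(dirs: String*): Boolean = dirs match {
    case Seq(eventType, year, month) =>
      eventType.nonEmpty &&
        Seq(year, month).forall(s => s.nonEmpty && s.forall(_.isDigit))
    case _ => false // any other depth is rejected
  }
}
```

With this layout, `PailTargetSketch.isValidTarget(PailTargetSketch.targetDirs("MyEvent/2013/1/0/part-000000"): _*)` is false because there are four components instead of three, which would produce the same kind of complaint as above.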
Also, WRT serialization, I did have to add this override in my base Job:
override def ioSerializations = List("com.twitter.elephantbird.cascading2.io.protobuf.ProtobufSerialization") ++ super.ioSerializations
but I'd written a custom
class MyEventSerialization extends com.esotericsoftware.kryo.Serializer[MyBaseEvent]
and expected to configure the override as List("com.foo.MyEventSerialization") ++ super.ioSerializations
However, that blew up with "cannot be cast to org.apache.hadoop.io.serializer.Serialization", whereas the ProtobufSerialization works so far.
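My reading of the cast error is that every class listed in ioSerializations has to implement Hadoop's Serialization interface, which a Kryo Serializer does not. The sketch below illustrates that difference in shape. To keep it dependency-free, the three traits are local stand-ins that mirror org.apache.hadoop.io.serializer.{Serialization, Serializer, Deserializer}; they are NOT the real Hadoop types, and `DemoEvent`, `DemoEventSerialization` and `KryoStyleEventSerializer` are hypothetical names for illustration only.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream, InputStream, OutputStream}

// Local stand-ins mirroring the shape of the Hadoop interfaces (not the real ones).
trait Serializer[T] { def open(out: OutputStream): Unit; def serialize(t: T): Unit; def close(): Unit }
trait Deserializer[T] { def open(in: InputStream): Unit; def deserialize(reuse: T): T; def close(): Unit }
trait Serialization[T] {
  def accept(c: Class[_]): Boolean
  def getSerializer(c: Class[T]): Serializer[T]
  def getDeserializer(c: Class[T]): Deserializer[T]
}

// Hypothetical event type standing in for MyBaseEvent.
case class DemoEvent(name: String, year: Int)

// This has the shape the framework expects: a factory for stream-based (de)serializers.
class DemoEventSerialization extends Serialization[DemoEvent] {
  def accept(c: Class[_]): Boolean = classOf[DemoEvent].isAssignableFrom(c)

  def getSerializer(c: Class[DemoEvent]): Serializer[DemoEvent] = new Serializer[DemoEvent] {
    private var out: DataOutputStream = null
    def open(o: OutputStream): Unit = { out = new DataOutputStream(o) }
    def serialize(e: DemoEvent): Unit = { out.writeUTF(e.name); out.writeInt(e.year) }
    def close(): Unit = { out.close() }
  }

  def getDeserializer(c: Class[DemoEvent]): Deserializer[DemoEvent] = new Deserializer[DemoEvent] {
    private var in: DataInputStream = null
    def open(i: InputStream): Unit = { in = new DataInputStream(i) }
    // The reuse argument is ignored; a fresh immutable value is returned.
    def deserialize(reuse: DemoEvent): DemoEvent = DemoEvent(in.readUTF(), in.readInt())
    def close(): Unit = { in.close() }
  }
}

// Simplified Kryo-style shape: it can write an event, but it is not a Serialization,
// so casting an instance of it to Serialization would fail.
class KryoStyleEventSerializer {
  def write(out: DataOutputStream, e: DemoEvent): Unit = { out.writeUTF(e.name); out.writeInt(e.year) }
}
```

If this reading is right, a custom Kryo serializer would need to be wrapped in (or registered through) something that implements the Serialization interface, rather than being listed in ioSerializations directly.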
Thanks :)
Helena
@helenaedelson