Updating cascading.avro


Christopher Severs

Apr 27, 2012, 4:09:18 PM
to cascadi...@googlegroups.com
Is anyone interested in working on an updated cascading.avro? Ideally it would use something like Kryo to be able to pass around proper Maps and Lists in the Tuple fields (instead of the current workaround of using a Tuple as a list or map) and be able to handle nested Maps/Lists. In fact, all it really needs to do is take standard Java objects that Avro knows about and do the correct thing with them; the user can decide how they want to handle the serialization in the tuples.

I've taken a quick first pass at it but I'm running into a lot of problems when it gets to the actual sink call. Passing maps or lists around with cascading.kryo works fine if I sink to TextLine or TextDelimited, but it throws a bunch of cast errors when I try to use a modified version of the current cascading.avro sink. At this point I'm a bit stuck, and if anyone is interested I would really appreciate some expert help (or if there is a much better way to do this I would also love to know it).

Regards,
Chris

Chris K Wensel

Apr 27, 2012, 5:00:43 PM
to cascadi...@googlegroups.com
fwiw, here is an alternate avro implementation.

ckw

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To view this discussion on the web visit https://groups.google.com/d/msg/cascading-user/-/7Uh9IJJQRDwJ.
To post to this group, send email to cascadi...@googlegroups.com.
To unsubscribe from this group, send email to cascading-use...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/cascading-user?hl=en.


Christopher Severs

Apr 27, 2012, 6:14:10 PM
to cascadi...@googlegroups.com
Hi Chris,

Thanks for the quick reply. It looks like that project is currently active so maybe I can talk with them about adding maps/lists.

Regards,
Chris




Christopher Severs

May 10, 2012, 9:27:24 PM
to cascadi...@googlegroups.com
Quick update:

I forked cascading-avro to play with, and I have it using Java lists and maps just fine now. I had to make a small change in the sink function, since Tuple.get() returns a Comparable and most lists and maps don't implement that. Is there a particular reason Tuple.get() returns Comparable and not Object?

Next step is to test nested lists and maps, maybe even records too. I haven't updated the github fork yet but I'll do so soon.

Thanks,
Chris


Chris K Wensel

May 11, 2012, 11:31:53 AM
to cascadi...@googlegroups.com
> Is there a particular reason Tuple.get() returns Comparable and not Object?

This is a legacy from when Tuples could only hold a Comparable. See getObject if you want an Object.

I should probably deprecate that method now.

ckw
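
For readers following along, the cast errors described above can be reproduced with a minimal sketch in plain Java. The MiniTuple class below is a hypothetical stand-in written for illustration, not Cascading's Tuple; it only assumes that a Comparable-returning accessor casts the stored value, which is why a list or map in a field fails while an Object-returning accessor does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only; this is NOT Cascading's Tuple class.
class MiniTuple {
    private final List<Object> elements = new ArrayList<>();

    MiniTuple(Object... values) {
        elements.addAll(Arrays.asList(values));
    }

    // Legacy-style accessor: the cast fails at runtime for values that are not Comparable.
    @SuppressWarnings("rawtypes")
    Comparable get(int pos) {
        return (Comparable) elements.get(pos);
    }

    // Object-returning accessor: works for any value, including lists and maps.
    Object getObject(int pos) {
        return elements.get(pos);
    }

    public static void main(String[] args) {
        MiniTuple tuple = new MiniTuple("id-1", Arrays.asList(1, 2, 3));
        System.out.println(tuple.get(0));       // a String is Comparable, so the cast succeeds
        System.out.println(tuple.getObject(1)); // the list comes back as a plain Object
        try {
            tuple.get(1);                       // a List is not Comparable
        } catch (ClassCastException e) {
            System.out.println("get(1) failed with ClassCastException");
        }
    }
}
```

In a sink that may see lists or maps, the Object-returning accessor is the one to reach for.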

Christopher Severs

May 11, 2012, 12:23:33 PM
to cascadi...@googlegroups.com
I can't believe I missed getObject, thanks again Chris.

On a related note, is there a preferred way to access the fields in a tuple in a sink? I'm currently just copying the sink for TextDelimited, which does something like:
    Object[] buffer = Tuples.asArray( tuple, getBuffer( tuple ) );
and then walks down the array.

Is there any reason I would want to use getObject instead of doing the asArray call?



Thanks,
Chris

Chris K Wensel

May 11, 2012, 12:36:50 PM
to cascadi...@googlegroups.com
Tuple is Iterable, so you can just use an Iterator.

ckw
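
As a rough illustration of the iterator approach, here is a minimal sketch in plain Java. IterableTuple and describe are hypothetical names invented for this example, not part of Cascading; the sketch only assumes that the tuple exposes its fields in order through Iterable, so a for-each loop can stand in for copying the fields into an array first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for an Iterable tuple; this is NOT Cascading's Tuple class.
class IterableTuple implements Iterable<Object> {
    private final List<Object> elements;

    IterableTuple(Object... values) {
        this.elements = new ArrayList<>(Arrays.asList(values));
    }

    @Override
    public Iterator<Object> iterator() {
        return elements.iterator();
    }

    // Walks the fields in order, the way a sink might before writing a record.
    static List<String> describe(IterableTuple tuple) {
        List<String> out = new ArrayList<>();
        for (Object field : tuple) {
            // A real sink would convert each field to its output type here (assumption).
            out.add(field == null ? "null" : field.getClass().getSimpleName() + "=" + field);
        }
        return out;
    }

    public static void main(String[] args) {
        IterableTuple tuple = new IterableTuple("id-1", 42, Arrays.asList("a", "b"));
        for (String line : describe(tuple)) {
            System.out.println(line);
        }
    }
}
```

Iterating avoids allocating or reusing an intermediate buffer, at the cost of losing direct positional access to the fields.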

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To view this discussion on the web visit https://groups.google.com/d/msg/cascading-user/-/LaIKjhvXtaYJ.
To post to this group, send email to cascadi...@googlegroups.com.
To unsubscribe from this group, send email to cascading-use...@googlegroups.com.

For more options, visit this group at http://groups.google.com/group/cascading-user?hl=en.

Christopher Severs

May 11, 2012, 2:56:21 PM
to cascadi...@googlegroups.com
Changes are up at:
https://github.com/ccsevers/cascading-avro
if anyone wants to give it a whirl and find problems.

Nested maps and arrays work fine. Tests included to verify.

The actual code change was minimal; Avro has evolved to be like magic.
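
For anyone wondering what handling nested maps and lists involves in general, the sketch below shows one way to walk such values recursively in plain Java. It is not the code from the fork: NestedValues and normalize are hypothetical names, and the only Avro fact relied on is that Avro map keys are always strings, which is why keys are converted with String.valueOf.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not code from cascading-avro or from Avro itself.
class NestedValues {
    // Recursively copies nested maps and lists, turning map keys into Strings
    // because Avro map keys are always strings. Other values pass through unchanged.
    static Object normalize(Object value) {
        if (value instanceof Map) {
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                out.put(String.valueOf(entry.getKey()), normalize(entry.getValue()));
            }
            return out;
        }
        if (value instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) value) {
                out.add(normalize(item));
            }
            return out;
        }
        return value;
    }

    public static void main(String[] args) {
        Map<Object, Object> inner = new LinkedHashMap<>();
        inner.put(1, Arrays.asList("a", "b"));
        List<Object> nested = new ArrayList<>();
        nested.add(inner);
        System.out.println(normalize(nested));
    }
}
```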

Chris K Wensel

May 11, 2012, 3:04:24 PM
to cascadi...@googlegroups.com
don't forget you can push your jars to conjars.org (and note that they are there in your README)

ckw


Christopher Severs

May 12, 2012, 4:04:29 PM
to cascadi...@googlegroups.com
I changed the Avro dependency to 1.5.1 so I can run it on my cluster, and it works nicely. I also submitted a pull request to the main cascading-avro project, which is already on conjars.

Sven Duzont

May 21, 2012, 1:46:26 PM
to cascadi...@googlegroups.com
Hello,

I'm glad to inform you that we integrated your changes into another fork, which also includes the following:
- upgrade to cascading-2.0.0-wip
- implementation of LocalAvroScheme
- automatic schema discovery with retrieveSourceFields method

Everything is located here: https://github.com/svenduzont/cascading-avro

Cheers

-- Sven

