Unable to load serializer for enum?


PaulON

May 10, 2016, 5:46:42 AM
to cascading-user
Hey,

we are trying to use an enum during our flow and are hitting a serialization error.
Since enums are Serializable by default, I would have expected this to just work. Is there something we need to do?

Cheers!
Paul

 Caused by: cascading.CascadingException: unable to load serializer for: com.myEnum.Enumype from: org.apache.hadoop.io.serializer.SerializationFactory

Andre Kelpe

May 10, 2016, 6:47:00 AM
to cascading-user
We currently do not support enums. We looked into ways of supporting
them efficiently, but could not find a way that made sense. For more
info, see here:
http://docs.cascading.org/cascading/3.0/userguide/ch10-hadoop-common.html#custom-types

- André



--
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com

PaulON

May 17, 2016, 7:01:11 AM
to cascading-user
Thanks André,
so does this mean that even if I make my enum implement Writable, it's not going to work via Cascading?

What's the preferred way of achieving the same thing without an enum? Mostly we were using it to ensure the correct sort order for some fields.

PaulON

May 18, 2016, 2:20:25 PM
to cascading-user
I tried implementing my own WritableComparable, but was running into lots of issues until I also implemented Serializable and overrode writeObject() and readObject().
Is this expected?

From the link above I understood that I just needed a Writable and the rest would take care of itself.
At this point I'm not sure if the WritableComparable interface is buying me anything, and if so, why does it also need to be Serializable?

I'm also confused as to whether or not I need to be creating my own Comparator instance and setting it on the sort Fields in my GroupBy.

Are there any examples of how to use a custom type in a tuple and how to sort by that type?

Cheers
P.

Chris K Wensel

May 18, 2016, 3:24:48 PM
to cascadi...@googlegroups.com
Cascading does not support Java serialization by default; you need to add a Java serialization Hadoop serializer to your config. We do support the Hadoop Writable interface by default.

I don’t remember the issues (there have been previous threads on this), but we have been unable to cleanly support enums (in general) without affecting the performance of every type.
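For reference, wiring in Java serialization means adding Hadoop's JavaSerialization class to the `io.serializations` property in the properties you hand to your flow connector. A minimal sketch (the property name and class names are standard Hadoop; how you pass the Properties on to Cascading is up to your setup):

```java
import java.util.Properties;

public class SerializationConfig {
    public static Properties withJavaSerialization(Properties props) {
        // Hadoop consults "io.serializations" to find Serialization implementations.
        // WritableSerialization is the default; JavaSerialization adds support for
        // plain java.io.Serializable types.
        props.setProperty("io.serializations",
            "org.apache.hadoop.io.serializer.WritableSerialization,"
          + "org.apache.hadoop.io.serializer.JavaSerialization");
        return props;
    }

    public static void main(String[] args) {
        Properties props = withJavaSerialization(new Properties());
        System.out.println(props.getProperty("io.serializations"));
    }
}
```

These Properties would then be passed to your FlowConnector as usual.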

The recommendation is to not use them.

That said, going down the path of making your enum Writable should work (haven’t tried it), but you have now coupled your enum to the Hadoop libraries.
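One wrinkle with that path: Writable#readFields has to mutate the object in place, and an enum constant is a singleton, so in practice you would wrap the enum in a mutable holder. A sketch with a hypothetical enum, kept free of the Hadoop dependency; the write/readFields signatures below match the Writable interface, so with Hadoop on the classpath the wrapper would just add `implements Writable`:

```java
import java.io.*;

// Hypothetical enum standing in for your own type.
enum Priority { LOW, MEDIUM, HIGH }

// A mutable wrapper: readFields must overwrite state in place,
// which a bare enum constant (a singleton) cannot do.
class PriorityWritable {
    private Priority value;

    PriorityWritable() {}
    PriorityWritable(Priority value) { this.value = value; }

    public Priority get() { return value; }

    public void write(DataOutput out) throws IOException {
        out.writeInt(value.ordinal());           // serialize as the ordinal
    }

    public void readFields(DataInput in) throws IOException {
        value = Priority.values()[in.readInt()]; // rebuild from the ordinal
    }
}

public class WritableEnumDemo {
    public static void main(String[] args) throws IOException {
        // Round-trip through a byte stream, as Hadoop would during a shuffle.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new PriorityWritable(Priority.HIGH).write(new DataOutputStream(bytes));

        PriorityWritable read = new PriorityWritable();
        read.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(read.get());          // prints HIGH
    }
}
```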

One other option might be to create a CoercibleType for your enum that keeps it as an integer (via the ordinal) in the tuple, while TupleEntry#getObject returns it as an enum. A proper CoercibleType will solve any comparator issues.
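The real cascading.tuple.type.CoercibleType interface declares getCanonicalType(), canonicalize(Object) and coerce(Object, Type); the sketch below shows only the conversion logic such an implementation would perform for an enum stored as its ordinal, with a hypothetical enum and without the Cascading dependency:

```java
// Hypothetical enum standing in for your own type.
enum Status { NEW, ACTIVE, CLOSED }

public class EnumCoercionDemo {
    // canonicalize: whatever comes in, store the ordinal Integer in the tuple
    static Integer canonicalize(Object value) {
        if (value == null) return null;
        if (value instanceof Status) return ((Status) value).ordinal();
        if (value instanceof Integer) return (Integer) value;
        return Status.valueOf(value.toString()).ordinal();
    }

    // coerce: turn the stored ordinal back into the enum on the way out
    static Status toStatus(Object canonical) {
        return canonical == null ? null : Status.values()[(Integer) canonical];
    }

    public static void main(String[] args) {
        Integer stored = canonicalize(Status.ACTIVE);
        System.out.println(stored);            // the ordinal, 1
        System.out.println(toStatus(stored));  // back to ACTIVE
        // Sorting the stored Integers yields the enum's declaration order,
        // which is why this approach sidesteps custom comparators.
    }
}
```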

fwiw, using types fully in Cascading 3.1 will unlock some nice performance gains.

ckw


PaulON

May 19, 2016, 6:01:40 AM
to cascading-user
Thanks Chris.

I'm still a little confused though. I had understood that I just needed my CustomType to implement WritableComparable; what's forcing me to also make it Serializable?

I'll also look into CoercibleType, thanks!

Chris K Wensel

May 19, 2016, 12:44:15 PM
to cascadi...@googlegroups.com
If you want to make a type Serializable, you must tell Hadoop to support it via the Java serializer Serialization implementation. Cascading does not register it by default, since it's generally a bad idea.

Writable is supported by default, as most complex types used on Hadoop are also Writable (and the Hadoop library ships with a few implementations). You should probably use Writable for any custom types.

CoercibleType allows you to convert one type to another; in this case, it converts enums into integers and back, with no need for a serialization implementation.

ckw


PaulON

May 19, 2016, 6:08:59 PM
to cascading-user
That's what's confusing me: I don't want it to be Serializable, but I was getting non-Serializable errors when the class was only implementing Writable.

It seems that unless I implement both, I get errors.

I'm on 2.7.8 btw, in case something has changed in 3.0 around this?