Potential issue in serializing collections

Regunath Balasubramanian

Mar 3, 2014, 6:22:42 AM
to netfli...@googlegroups.com
Hi,

I use the latest code pulled off github [version netflix-zeno-2.3-SNAPSHOT]. I have a user defined composite POJO that has a few referenced types held in Java Collection classes - List, Map and Set [POJO here : https://github.com/regunathb/aesop/blob/master/sample-snapshot-serializer/src/main/java/org/aesop/serializer/model/UserInfo.java]

I have created a SerializerFactory that returns the root NFTypeSerializer and individual instances for each referenced type. These are returned in NFTypeSerializer#requiredSubSerializers() [ SerializerFactory here : https://github.com/regunathb/aesop/blob/master/sample-snapshot-serializer/src/main/java/org/aesop/serializer/serializers/RootSerializerFactory.java] and the NFTypeSerializer instances are here : https://github.com/regunathb/aesop/tree/master/sample-snapshot-serializer/src/main/java/org/aesop/serializer/serializers

The serializeObject(NFSerializationRecord rec, String fieldName, Object obj) method fails for all Collection fields. I therefore had to change them to use Collection-specific variants like serializeMap(...), serializeSet(...) and serializeList(...) from FastBlobFrameworkSerializer. These methods work as expected, except for serializeList(...). I get a ClassCastException between the element type and the collection type when the method add(T data, boolean imageMembershipsFlags[]) on FastBlobTypeSerializationState is invoked. The issue is at line 120 : serializer.serialize(data, rec).

In order to work around this problem, I add the List field into another Collection before making the call to serializeList(FastBlobSerializationRecord rec, String fieldName, String typeName, Collection<T> collection) on FastBlobFrameworkSerializer. Is this a bug or an incorrect usage? - see lines 145 to 147 in this NFTypeSerializer : https://github.com/regunathb/aesop/blob/master/sample-snapshot-serializer/src/main/java/org/aesop/serializer/serializers/UserInfoSerializer.java

Thanks

Drew Koszewnik

Mar 3, 2014, 3:10:04 PM
to netfli...@googlegroups.com
Hi Regunath,

I hope you are finding Zeno useful.  Let me try to address where the confusion might be coming from, and answer your questions.

First, the confusion.  We are currently refactoring the format for declaring NFTypeSerializers.  The latest release, 1.5, required that serializers be defined slightly differently than the latest snapshot, 2.3.  The reasoning behind this refactor is to provide greater insight into what the blob data contains via inspection of the FastBlobSchema, without having to deserialize the data into Objects first.  It looks like you picked up the latest snapshot, and it also looks like you discovered on your own most of the proper way to define a serializer before the documentation is even ready :).

Here's what's happening in your serializer:  In your schema, these fields have been defined as "mapField, setField, listField".  Based on this schema, Zeno expects that the binary representation for these lists, sets, and maps will be "inlined" into the record.  The way you would like to serialize them instead, I gather, is to serialize a reference to the Map, Set, or List objects as their own types, and refer to those representations separately.

In order to do this, you've already declared the required sub serializers.  Now, you'll need to define the FastBlobSchema so that your corresponding fields are simply objects of those types.  Instead of using mapField, setField, and listField, try using just field(String fieldName, String objectTypeName).  The objectTypeName field should be the same as the name you declare in your MapSerializer, ListSerializer, or SetSerializer definition in the requiredSubSerializers method.  I have attached a modified UserInfoSerializer from your github repository to this message with this modification.

It's unlikely that you'll need to use the "mapField, setField, listField" field declaration methods.  I recommend sticking to the pattern in the attached file.  Thanks for asking about this, we may want to make this difference more obvious, or hide these field declaration methods somehow in a future release to avoid any confusion.
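
As a rough illustration of the name matching described above, here is a toy model in plain Java. It does not use any Zeno classes: NamedSerializer, Registry, and the type names "ListOfAddresses" and "MapOfPreferences" are all hypothetical, made up for this sketch. It only shows why the type name given in a schema field has to be exactly the name the sub serializer was declared with.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these classes are Zeno classes.
final class TypeNameLookupSketch {

    /** Hypothetical stand-in for a serializer that is identified by a declared name. */
    static final class NamedSerializer {
        final String name;

        NamedSerializer(String name) {
            this.name = name;
        }
    }

    /** Hypothetical registry that resolves serializers by their declared name. */
    static final class Registry {
        private final Map<String, NamedSerializer> byName = new HashMap<>();

        void register(NamedSerializer serializer) {
            byName.put(serializer.name, serializer);
        }

        NamedSerializer resolve(String objectTypeName) {
            NamedSerializer serializer = byName.get(objectTypeName);
            if (serializer == null) {
                throw new IllegalArgumentException(
                        "No serializer registered under name: " + objectTypeName);
            }
            return serializer;
        }
    }

    /** Registers two sub serializers under made-up names. */
    static Registry sampleRegistry() {
        Registry registry = new Registry();
        registry.register(new NamedSerializer("ListOfAddresses"));
        registry.register(new NamedSerializer("MapOfPreferences"));
        return registry;
    }

    public static void main(String[] args) {
        Registry registry = sampleRegistry();

        // A field that refers to the declared name resolves.
        System.out.println(registry.resolve("ListOfAddresses").name);

        // A field that refers to a Java class name does not, because the
        // lookup key is the declared name, not the class.
        try {
            registry.resolve("java.util.List");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this toy model, a reference to "java.util.List" fails simply because nothing was registered under that string.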

Thanks again,
Drew.
UserInfoSerializer.java

Regunath Balasubramanian

Mar 4, 2014, 12:53:36 AM
to Drew Koszewnik, netfli...@googlegroups.com
Hello again Drew,

Thanks for taking the effort to explain the changes in the serializers and also sending across the edited serializer.
I will follow the pattern as recommended by you.

I agree that the method signatures are a bit confusing at present. I arrived at my changes by attaching a debugger to the serializer and then looking for alternative methods in the respective classes.

"The objectTypeName field should be the same as the name you declare in your MapSerializer, ListSerializer, or SetSerializer definition in the requiredSubSerializers method." - IMHO it is a bit confusing that the framework establishes type information from String literals. Would you consider using the type, i.e. the Class object, instead? For example, I quite intuitively used java.lang.String instead of StringSerializer.NAME in mapField(String name, String keyType, String valueType), which did not work, while the same approach did work for my custom types.

I am working on a change propagation system that can support "push" and "pull" change event producers. I intend to use Zeno as the "pull" producer. The pipeline would look something like:

Pull Producer                         Streaming Client 1       Slow/Catchup client 1
(Zeno based)   \                    /                        /
                ____ Relay ________/____ Bootstrap _________/
               /   (Databus)        \    (Databus)           \
              /                      \                        \
Push Producer                         Streaming Client 2       Slow/Catchup client 2
(e.g. HBase WAL edits listener)

The flow is from left to right. The change event serving part is built on Databus. I will share my experiences once I have a working prototype of the "pull" producer integrated into this pipeline.

Thanks
Regu



Drew Koszewnik

Mar 4, 2014, 2:15:06 PM
to netfli...@googlegroups.com, Drew Koszewnik
We experimented with using the classes instead of user-defined type names in the past.  The problem we ran into was that a class can be ambiguous as a type identifier.  You may have multiple Zeno types corresponding to the same Java class.  This occurs most often for Maps or Collections, e.g. a List<String> is the same class as a List<Integer> (List.class).
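
The ambiguity can be seen in plain Java, independent of Zeno. The sketch below is only an illustration: the class ErasureAmbiguitySketch and the type names "ListOfStrings" and "ListOfIntegers" are made up. It shows that two differently parameterized lists share one runtime class, so a lookup keyed by Class cannot tell them apart, while a lookup keyed by a user-chosen name can.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ErasureAmbiguitySketch {

    /** Generic type arguments are erased, so only the raw class is compared here. */
    static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    /** Number of distinct entries when two list types are keyed by their Class. */
    static int entriesWhenKeyedByClass(List<?> a, List<?> b) {
        Map<Class<?>, String> byClass = new HashMap<>();
        byClass.put(a.getClass(), "ListOfStrings");
        byClass.put(b.getClass(), "ListOfIntegers"); // overwrites the first entry
        return byClass.size();
    }

    /** Number of distinct entries when the same two types are keyed by a chosen name. */
    static int entriesWhenKeyedByName() {
        Map<String, String> byName = new HashMap<>();
        byName.put("ListOfStrings", "elements are String");
        byName.put("ListOfIntegers", "elements are Integer");
        return byName.size();
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        List<Integer> counts = new ArrayList<>();

        System.out.println(sameRuntimeClass(names, counts));        // true
        System.out.println(entriesWhenKeyedByClass(names, counts)); // 1
        System.out.println(entriesWhenKeyedByName());               // 2
    }
}
```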

I look forward to hearing about your experiences.

Thanks,
Drew.