Support for forward and/or backward compatibility


martin.grotzke

Oct 24, 2009, 8:21:09 AM
to kryo-users
Hi,

I'm currently evaluating alternative serialization mechanisms for the
memcached-session-manager
(http://code.google.com/p/memcached-session-manager/). It serializes
HTTP sessions (Tomcat/Catalina sessions, with their own private fields
and the session attributes, which are application specific) and stores
them for backup in memcached nodes.

As I want to support changing application software versions and code
upgrades (of pojos stored in session attributes), the serialization
strategy should support forward compatibility, so that I can
deserialize an older version of a class to a newer version, even if
the newer version does not contain a field that the former version
contained. E.g. if the person class in a newer version no longer
contains the salutation field (as it's no longer needed), this shall
just be ignored during deserialization.

As the application code is not under my control, I cannot tell the
field serializer to ignore this field, as I have no knowledge which
application classes are serialized/deserialized.

The documentation right now says that "the class definition during
deserialization must be identical to when the class was serialized. In
the future Kryo may support optional forward and/or backward
compatibility.". Does this refer to what I want to have? Is there a
timeframe/roadmap for this?

Btw: this lib looks really nice and I would be happy if I could use it
for the memcached session manager!

Thanx && cheers,
Martin

Nate

Oct 30, 2009, 7:14:59 AM
to kryo-users
Hi Martin,

Sorry I didn't see your message sooner!

Yes, that bit in the documentation is relevant to your scenario. The
only way FieldSerializer knows what fields to read from the serialized
data is by the Java class definition of the object it is
deserializing. This means if you have added a field since you
serialized, it will expect to read that field.
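
To illustrate why a positional format depends on the class definition, here is a minimal sketch using plain java.io streams rather than Kryo itself. The class name PositionalDemo and the name/salutation/city fields are hypothetical; the point is only that values written without names or tags are interpreted purely by position.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration, not Kryo code: values are written in field
// order with no names or tags, so the reader must know the exact layout.
final class PositionalDemo {

    // "Old" class layout: name, salutation, city.
    static byte[] writeOld(String name, String salutation, String city) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(name);
            out.writeUTF(salutation);
            out.writeUTF(city);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // "New" class layout: name, city (salutation was removed).
    // Returns {name, city} as the new class would see them.
    static String[] readNew(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            String name = in.readUTF();
            String city = in.readUTF(); // actually consumes the old salutation
            return new String[] { name, city };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] person = readNew(writeOld("Martin", "Mr", "Hamburg"));
        // The second slot holds "Mr", not "Hamburg": the data was misread silently.
        System.out.println(person[0] + " lives in " + person[1]);
    }
}
```

Nothing fails loudly here; the reader simply assigns the old salutation to the city field, which is why stored data needs some form of metadata to survive class changes.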

This is fine in many scenarios, e.g. networked games that serialize to
send objects, and has the advantage that no metadata needs to be
stored. However, any time you are serializing to store an object for
later, possibly much later, then forward compatibility becomes very
important.

I do feel that this is a needed feature for Kryo, there just hasn't
been a pressing need for it. Until now. :) I will put it on my list of
things to get done. I will definitely get to it in the next week or
two, if not sooner.

Thinking about it briefly, there seem to be a couple of ways to go about
it.

We could store the field names so we know what field each serialized
value belongs to. The downside is that the serialized data becomes
larger. The upside is that the programmer doesn't have to do anything
extra.
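
A minimal sketch of this first option, assuming plain java.io streams and string-only values to keep it short. NamedFieldDemo and its methods are hypothetical names, not part of Kryo. Each value is written together with its field name, and the reader keeps only the names the current class still declares.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not Kryo code.
final class NamedFieldDemo {

    // Writes a field count followed by (name, value) pairs.
    static byte[] write(Map<String, String> fields) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(fields.size());
            for (Map.Entry<String, String> field : fields.entrySet()) {
                out.writeUTF(field.getKey());   // the per-field size overhead
                out.writeUTF(field.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads every pair but keeps only fields the current class declares.
    static Map<String, String> read(byte[] data, Set<String> knownFields) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            Map<String, String> result = new LinkedHashMap<>();
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                String value = in.readUTF();
                if (knownFields.contains(name)) {
                    result.put(name, value);
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Fields that exist only in the newer class are simply absent from the result, so the caller would leave them at their defaults.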

We could require a schema to be provided (like Protobuf does). The
schema would define a unique integer "id" for each field. This
minimizes the metadata that needs to be written for each field, but
requires the programmer to maintain a single schema.
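
A rough sketch of the integer-id option, again with plain java.io and string-only values. This is not Protobuf's actual wire format; TaggedFieldDemo, the id numbers, and the (id, length, payload) layout are assumptions made for illustration. The schema is a map from id to field name, and unknown ids are skipped using the length prefix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not Kryo or Protobuf code.
final class TaggedFieldDemo {

    // Writes a count, then (id, length, payload) for each value.
    static byte[] write(Map<Integer, String> valuesById) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(valuesById.size());
            for (Map.Entry<Integer, String> entry : valuesById.entrySet()) {
                byte[] payload = entry.getValue().getBytes(StandardCharsets.UTF_8);
                out.writeInt(entry.getKey());
                out.writeInt(payload.length);
                out.write(payload);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Resolves ids through the reader's schema; ids missing from the schema are ignored.
    static Map<String, String> read(byte[] data, Map<Integer, String> schema) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            Map<String, String> result = new LinkedHashMap<>();
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                int id = in.readInt();
                byte[] payload = new byte[in.readInt()];
                in.readFully(payload); // consume the payload even if the id is unknown
                String fieldName = schema.get(id);
                if (fieldName != null) {
                    result.put(fieldName, new String(payload, StandardCharsets.UTF_8));
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The per-field overhead is a fixed-size id instead of a name string, at the cost of someone having to assign the ids and never reuse them.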

We could require a schema for each possible deserialization (like Avro
does). This means metadata doesn't have to be written per field, but
requires the programmer to maintain multiple schemas.
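
A simplified sketch of the writer-schema/reader-schema idea, assuming a schema is just an ordered list of field names and all values are non-null strings. This is not Avro's API; SchemaResolutionDemo is a hypothetical name. The data carries no per-field metadata, so the reader must be handed the schema the data was written with and then projects it onto its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Kryo or Avro code.
final class SchemaResolutionDemo {

    // Writes values only, in writer-schema order. Every schema field needs a non-null value.
    static byte[] write(List<String> writerSchema, Map<String, String> values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (String field : writerSchema) {
                out.writeUTF(values.get(field));
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes with the writer's schema, then keeps only what the reader's schema asks for.
    static Map<String, String> read(byte[] data, List<String> writerSchema, List<String> readerSchema) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            Map<String, String> decoded = new HashMap<>();
            for (String field : writerSchema) {
                decoded.put(field, in.readUTF());
            }
            Map<String, String> result = new LinkedHashMap<>();
            for (String field : readerSchema) {
                result.put(field, decoded.get(field)); // null when the writer lacked the field
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The bytes stay as small as in the positional case, but every historical writer schema has to be kept available to whoever reads the data.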

Note that implementing this doesn't require any core changes to Kryo, only
the addition of a new Serializer (or maybe it could be an optional
feature of FieldSerializer). You could even copy and paste
FieldSerializer and just add writing/reading the field name instead of
relying on the Java class definition. Hardest part is naming the new
serializer class! ;)

Unrelated to forward compatibility, you may find it annoying to have
to register each class to serialize. Because classes must be
registered in the same, specific order for deserialization, you will
probably have to expose the Kryo API to your users and require they
register their classes. A way around this might be to register classes
in the order they are encountered, and to persist this order so that
subsequent runs of your application can register the classes in the
same order. This falls apart if you remove one of the classes though.

Kryo could be enhanced to allow for the fully qualified class name to
be serialized instead of the registered class ID. This would remove
the need to register classes.
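
The difference between the two class headers can be sketched as follows. This is a hypothetical illustration, not Kryo's registration API: Registry assigns ids by registration order, while the name-based variant writes the fully qualified class name and resolves it with Class.forName.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Kryo code.
final class ClassHeaderDemo {

    // Order-dependent registry: a class's id is its position in the list.
    static final class Registry {
        private final List<Class<?>> classes = new ArrayList<>();

        int register(Class<?> type) {
            classes.add(type);
            return classes.size() - 1;
        }

        Class<?> lookup(int id) {
            return classes.get(id);
        }
    }

    // Name-based header: larger on the wire, but needs no registration.
    static String headerByName(Class<?> type) {
        return type.getName();
    }

    static Class<?> resolveByName(String header) {
        try {
            return Class.forName(header);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Unknown class: " + header, e);
        }
    }

    public static void main(String[] args) {
        Registry writer = new Registry();
        writer.register(String.class);
        int id = writer.register(Integer.class);

        Registry reader = new Registry();
        reader.register(Integer.class); // registered in a different order
        reader.register(String.class);

        // The same id now resolves to a different class on the reading side.
        System.out.println(reader.lookup(id));
        // The name-based header resolves correctly without any registry.
        System.out.println(resolveByName(headerByName(Integer.class)));
    }
}
```

The trade-off is the one described above: an integer id is only a few bytes but ties both sides to an identical registration order, whereas a class name costs more bytes per object and removes that coupling.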

These issues are coming up because thus far Kryo has only been used
for network communication, where instead of these being issues, they
are features that keep the serialized size to a minimum. I am all for
making Kryo more general purpose to better suit your needs. I will
give it some more thought.

-Nate



Martin Grotzke

Oct 30, 2009, 8:42:35 AM
to kryo-...@googlegroups.com
Hi Nathan,

On Fri, Oct 30, 2009 at 12:14 PM, Nate <nathan...@gmail.com> wrote:

> Hi Martin,
>
> Sorry I didn't see your message sooner!

No problem :)
 
> Thinking about it briefly, there seem to be a couple of ways to go about
> it.
>
> We could store the field names so we know what field each serialized
> value belongs to. The downside is that the serialized data becomes
> larger. The upside is that the programmer doesn't have to do anything
> extra.
>
> We could require a schema to be provided (like Protobuf does). The
> schema would define a unique integer "id" for each field. This
> minimizes the metadata that needs to be written for each field, but
> requires the programmer to maintain a single schema.
>
> We could require a schema for each possible deserialization (like Avro
> does). This means metadata doesn't have to be written per field, but
> requires the programmer to maintain multiple schemas.

For me a mixed approach would be suitable:
- The Tomcat StandardSession is fully "under my control", so I know what to serialize. I could therefore provide a schema or just use a SimpleSerializer (as I want to explicitly control which fields get serialized, and I want to write the deserialized data into an instance of StandardSession that I created myself).
- The user classes are not under my control, and users shouldn't be forced to provide a schema for their classes. For these classes the field names should be stored.


> Note that implementing this doesn't require any core changes to Kryo, only
> the addition of a new Serializer (or maybe it could be an optional
> feature of FieldSerializer). You could even copy and paste
> FieldSerializer and just add writing/reading the field name instead of
> relying on the Java class definition. Hardest part is naming the new
> serializer class! ;)

Hehe, indeed Kryo seems really nice! The first page shows all the things that I want to have. With other serialization libs it's hard to find out if/how something is supported. Nice work! :)
 

> Unrelated to forward compatibility, you may find it annoying to have
> to register each class to serialize. Because classes must be
> registered in the same, specific order for deserialization, you will
> probably have to expose the Kryo API to your users and require they
> register their classes. A way around this might be to register classes
> in the order they are encountered, and to persist this order so that
> subsequent runs of your application can register the classes in the
> same order. This falls apart if you remove one of the classes though.
>
> Kryo could be enhanced to allow for the fully qualified class name to
> be serialized instead of the registered class ID. This would remove
> the need to register classes.

OK, it seems I didn't read the docs well enough regarding the required order of registration. Something like storing the class name would be good for me.


> These issues are coming up because thus far Kryo has only been used
> for network communication, where instead of these being issues, they
> are features that keep the serialized size to a minimum. I am all for
> making Kryo more general purpose to better suit your needs. I will
> give it some more thought.

Great! :)

Cheers,
Martin



--
Martin Grotzke
http://www.javakaffee.de/blog/

Nate

Nov 30, 2009, 7:20:54 PM
to kryo-users
I didn't forget about this. I've given it some thought and it is a bit
tricky. I'll get to it eventually!

-Nate



Martin Grotzke

Dec 1, 2009, 7:07:28 PM
to kryo-...@googlegroups.com
Great. In the meantime I managed to implement some XML-based serialization strategies and to set up a first performance comparison:

Appreciating your solution and looking forward to integrating it,
cheers,
Martin



--
You received this message because you are subscribed to the "kryo-users" group.
http://groups.google.com/group/kryo-users

Nate

Mar 29, 2010, 3:59:50 AM
to kryo-...@googlegroups.com
Hi Martin,

I have filed an issue to track the need for forward/backward compatibility:
http://code.google.com/p/kryo/issues/detail?id=12

A few days ago I implemented a serializer that has limited forward and backward compatibility. The class is called CompatibleFieldSerializer and is currently only available in SVN. It serializes similarly to FieldSerializer, except that it supports adding and/or removing fields on classes without invalidating any previously serialized bytes. Note that it does not support changing the type of a field at all, not even something like long to int.

Size-wise, it of course outputs more serialized bytes than FieldSerializer, since it has to write a header with field name strings and the length of each value. This is described in more detail in the issue.
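
To make the size trade-off concrete, here is a rough sketch of a header-plus-lengths layout next to a purely positional one. It is not the actual CompatibleFieldSerializer format, whose details are in the issue; CompatibleLayoutDemo, the int length prefixes, and the string-only values are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not the real CompatibleFieldSerializer.
final class CompatibleLayoutDemo {

    // Positional layout: values only, in field order.
    static byte[] writePositional(Map<String, String> fields) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (String value : fields.values()) {
                out.writeUTF(value);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Header layout: field count, all field names, then one length-prefixed chunk per field.
    static byte[] writeWithHeader(Map<String, String> fields) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(fields.size());
            for (String name : fields.keySet()) {
                out.writeUTF(name);
            }
            for (String value : fields.values()) {
                byte[] chunk = value.getBytes(StandardCharsets.UTF_8);
                out.writeInt(chunk.length);
                out.write(chunk);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Matches header names against the current class; chunks of removed fields are discarded.
    static Map<String, String> readWithHeader(byte[] data, Set<String> currentFields) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            String[] names = new String[in.readInt()];
            for (int i = 0; i < names.length; i++) {
                names[i] = in.readUTF();
            }
            Map<String, String> result = new LinkedHashMap<>();
            for (String name : names) {
                byte[] chunk = new byte[in.readInt()];
                in.readFully(chunk); // the length lets us step over values we cannot interpret
                if (currentFields.contains(name)) {
                    result.put(name, new String(chunk, StandardCharsets.UTF_8));
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Comparing writeWithHeader(fields).length with writePositional(fields).length for the same map shows the overhead directly: the header grows with the length of the field names, and each value carries an extra length prefix.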

Speed-wise, the current implementation uses only reflection. It doesn't attempt to use bytecode generation on public fields like FieldSerializer does. Bytecode generation provides a decent speedup, but only on public fields. I intend to refactor FieldSerializer so this functionality can be shared.

I wonder if I should add another dependency JAR so I can make use of my ReflectASM project?
http://code.google.com/p/reflectasm/
I know some people are annoyed by projects needing a bunch of JARs. I'll probably do it anyway though, maybe it will help spread the word about how cool ReflectASM is! ;)

Anyway, if you get a chance to implement it, I would be interested in how Kryo compares to the other serialization solutions on your performance wiki page.

-Nate

Martin Grotzke

Mar 29, 2010, 4:26:37 AM
to kryo-...@googlegroups.com
Hi Nate,

really cool!

Regarding ReflectASM: would it be possible to allow the user to use
it, e.g. if it's in the class path, then use reflect-asm, otherwise
java reflection? Although, you're right with the
force-spreading-the-word approach, I can absolutely follow you here
:-)
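
The optional-dependency idea could look roughly like this: probe the classpath once and fall back to java.lang.reflect when the library is absent. OptionalAccelerator and its methods are hypothetical, and the probed class name com.esotericsoftware.reflectasm.FieldAccess is an assumption about ReflectASM's entry point rather than something confirmed in this thread.

```java
import java.lang.reflect.Field;

// Hypothetical illustration of classpath-based feature detection.
final class OptionalAccelerator {

    // Assumed name of the ReflectASM class to probe for.
    static final String REFLECTASM_CLASS = "com.esotericsoftware.reflectasm.FieldAccess";

    // Sample bean with a private field, used only for the demo.
    static final class Person {
        private String name = "Martin";
    }

    // Returns true if the named class can be located, without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, OptionalAccelerator.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Picks an access strategy based on what is available at runtime.
    static String chooseStrategy(String acceleratorClass) {
        return isOnClasspath(acceleratorClass) ? "bytecode" : "reflection";
    }

    // Fallback path: plain Java reflection, which also reaches private fields.
    static Object readWithReflection(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Strategy: " + chooseStrategy(REFLECTASM_CLASS));
        System.out.println("name = " + readWithReflection(new Person(), "name"));
    }
}
```

With this shape the extra JAR becomes a pure speed option: code that calls into the library directly would have to live in a separate class that is only loaded when the probe succeeds.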

I want to add the option to use reflect-asm to existing (javolution)
serializers and I'm really looking forward to seeing what speed
improvements it brings! Unfortunately I don't know yet how much time
is spent in reflection compared to e.g. writing serialized data.

As soon as I find some time I'll play with kryo again and try to
integrate it as a serializer. Right now I'm working on a release of
the memcached-session-manager (besides daily paid work :)), which I
hope to finish this or the next week...

Thanx && cheers,
Martin

