Nathan,
I've been investigating serialization frameworks for Java for a
project I'm working on and came across your Kryo project. It looks
decently close to what I'm looking for (object graph serialization),
but there are two things that I'd like to ask you about.
First, there are a couple of pretty obvious bugs listed in the
project's issue tracker that I don't see fixes for in svn. Would it be
possible to get those fixes into svn? Test cases too?
Second, I found a discussion
(http://tech.groups.yahoo.com/group/seajug/message/15555) where you
have done some preliminary work with asm as a replacement for
reflection. I can't seem to find that code in svn - would it be
possible to check that code into svn (say a branch for now) so that I
can test it out? I'm quite interested in the performance increase that
asm allows - even given the private field issues that asm seems to
have. I'd likely take the route of introspecting the class to
determine if the field is public/has a setter and use ASM, otherwise
fall back to reflection. Compile-time addition of setters might be an
option as well, though I think it might be a bit of a bad idea.
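A minimal sketch of the introspection step described above, using only java.lang.reflect. The names FieldAccessPlanner, Access and SampleBean are hypothetical and are not part of Kryo or ReflectASM. It only classifies fields; the generated-bytecode access itself is left out, and a real implementation would presumably also need to check that the declaring class is accessible.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: decides per field whether generated code could touch it
// directly, go through a public setter, or must fall back to reflection.
final class FieldAccessPlanner {
    enum Access { DIRECT, SETTER, REFLECTION }

    static Map<String, Access> plan(Class<?> type) {
        Map<String, Access> plan = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            // Static, transient and compiler-generated fields are not serialized here.
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || field.isSynthetic()) {
                continue;
            }
            if (Modifier.isPublic(mods) && !Modifier.isFinal(mods)) {
                plan.put(field.getName(), Access.DIRECT);
            } else if (hasPublicSetter(type, field)) {
                plan.put(field.getName(), Access.SETTER);
            } else {
                plan.put(field.getName(), Access.REFLECTION);
            }
        }
        return plan;
    }

    // Looks for a public setXxx(fieldType) method following the JavaBeans naming habit.
    private static boolean hasPublicSetter(Class<?> type, Field field) {
        String name = field.getName();
        String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            type.getMethod(setter, field.getType()); // getMethod only sees public methods
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}

// Sample class used only to illustrate the three outcomes.
final class SampleBean {
    public int id;          // public field: direct access
    private String name;    // private with a setter: setter access
    private long secret;    // private without a setter: reflection fallback

    public void setName(String name) { this.name = name; }
}
```

Calling FieldAccessPlanner.plan(SampleBean.class) maps "id" to DIRECT, "name" to SETTER and "secret" to REFLECTION.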
All the best,
Bruce Ritchie
> Gah! Stupid Google Code wasn't sending me notifications that the project had
> issues. They have all been resolved.
Much appreciated.
> It can actually be used easily without changes to Kryo. Attached is a
> Serializer implementation that uses ReflectASM. I copied and pasted
> FieldSerializer and replaced reflection with ReflectASM. It hasn't been
> tested much, but has a main method that shows it working. You will need
> Kryo, ReflectASM, and the ASM JAR. Here are the last two:
> http://reflectasm.googlecode.com/files/reflectasm-0.8.jar
> http://reflectasm.googlecode.com/svn/trunk/lib/asm-3.2.jar
Awesome, pretty much exactly what I was looking for.
> Mind if I ask what your project is?
Of course. I work for Jive Software and I'm in the midst of replacing
a cache / clustering subsystem based on an expensive commercial
solution with a homegrown solution. I've handled the distributed
caching portion using a customized version of Voldemort
(project-voldemort.com) and I now need to tackle serialization.
Currently we use an Externalizable-like system which can't be reused
(it depends too much on the commercial library) and the prototype is
using Java's plain serialization. I'm in the process of finding (or
creating) a good serialization scheme for our cacheable objects -
preferably one that doesn't require fragile Externalizable-type code
in every class. Fragile in the sense that developers often forget to
update the serialization methods when they modify a class which leads
to all sorts of interesting issues.
> Please note that Kryo doesn't support forward or backward compatibility. I
> hope to add this eventually but just haven't found the time. More info about
> the issue here:
> http://groups.google.com/group/kryo-users/browse_thread/thread/a6832dd24213dc06
> In a nutshell, Kryo is especially great for network communication (see the
> KryoNet project), but less so for long term storage. The reason is that the
> Java class definitions are used as a schema, and if the schema changes the
> serialized objects are invalidated.
Understood. That's not really an issue for us as we're planning on
requiring cache servers to be flushed anytime the code base changes.
Regards,
Bruce Ritchie
> Neat. It would be cool if Voldemort mentioned Kryo on the front page along
> with the other serialization frameworks! :)
I can't speak for Voldemort as my code changes are outside of their
repository - though I may add support for Kryo as one of their
serialization schemes if I find I have time.
> It sounds like the ObjectBuffer class will be your friend.
Very much so :)
> > I'm in the process of finding (or
> > creating) a good serialization scheme for our cacheable objects -
> > preferably one that doesn't require fragile Externalizable-type code
> > in every class. Fragile in the sense that developers often forget to
> > update the serialization methods when they modify a class which leads
> > to all sorts of interesting issues.
>
> Sounds good. I wonder if registering classes will be an issue?
To a degree, yes. I'm thinking of adding support for an optional
annotation and/or interface that can specify the serializer to be used
for a class. If it's a simple serializer that doesn't require
configuration then it could be something as simple as adding
@Serializer("com.abc.serializer.ExampleSerializer")
If it requires a custom serializer (say an optimized one ala
http://code.google.com/p/thrift-protobuf-compare/source/browse/trunk/tpc/src/serializers/kryo/KryoOptimizedSerializer.java)
then an interface would be implemented:
public interface Cacheable {
public Serializer getSerializer();
}
When the cache system sees a new object it'll check the registry for
the object's class and if not found it'll first check for the
annotation & interface before just registering the class with the
FieldSerializer. Of course, I'll have to register a whole whack of
serializers by default for all of our most common field values
(arrays, dates, etc) but that's simple to do.
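The lookup order described above (registry first, then annotation, then interface, then the default serializer) could be sketched roughly as follows. Every type here is a hypothetical stand-in and none of it is Kryo code: ObjectSerializer plays the role of a serializer, SerializedBy plays the role of the proposed annotation (taking a Class rather than a class-name String, purely to keep the sketch short), and DefaultFieldSerializer stands in for the FieldSerializer fallback.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical serializer abstraction; name() exists only so the result can be inspected.
interface ObjectSerializer { String name(); }

// Hypothetical annotation naming the serializer class for a type.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SerializedBy { Class<? extends ObjectSerializer> value(); }

// Interface variant from the message above: the object supplies a configured serializer.
interface Cacheable { ObjectSerializer getSerializer(); }

final class DefaultFieldSerializer implements ObjectSerializer {
    public String name() { return "default"; }
}

final class CompactSerializer implements ObjectSerializer {
    public String name() { return "compact"; }
}

final class SerializerLookup {
    private final Map<Class<?>, ObjectSerializer> registry = new ConcurrentHashMap<>();

    ObjectSerializer serializerFor(Object value) {
        Class<?> type = value.getClass();
        ObjectSerializer known = registry.get(type);
        if (known != null) return known;                 // 1. already registered
        ObjectSerializer resolved = resolve(value, type);
        ObjectSerializer raced = registry.putIfAbsent(type, resolved);
        return raced != null ? raced : resolved;
    }

    private ObjectSerializer resolve(Object value, Class<?> type) {
        SerializedBy annotation = type.getAnnotation(SerializedBy.class);
        if (annotation != null) {                        // 2. annotation on the class
            try {
                return annotation.value().getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot create serializer for " + type.getName(), e);
            }
        }
        if (value instanceof Cacheable) {                // 3. interface on the object
            return ((Cacheable) value).getSerializer();
        }
        return new DefaultFieldSerializer();             // 4. default fallback
    }
}

// Sample types used only to exercise the lookup.
@SerializedBy(CompactSerializer.class)
final class AnnotatedValue { }

final class SelfDescribingValue implements Cacheable {
    public ObjectSerializer getSerializer() { return () -> "custom"; }
}
```

The result is cached per class after the first lookup, so the annotation and interface checks only run once for each new type.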
Regards,
Bruce Ritchie
> There is more to registration than just which serializer to use. When a
> class is registered, it is assigned an ordinal number. This integer is used
> in the serialized bytes to identify what class to instantiate when the
> object is deserialized. It is very important that the exact same classes are
> registered in the exact same order when the object is deserialized.
>
> Eg, when using the KryoNet library that uses Kryo, typically there is one
> class available on both sides of the communication that registers
> everything: http://code.google.com/p/kryonet/source/browse/trunk/kryonet/examples...
>
> Let me know how much of an issue this is and maybe it can be worked around.
Ah, yes, that will be an issue without using a cluster-wide lock
alongside a distributed task (yuck). I'll dive into that code to see
if I can think of a good solution. I assume you did this to reduce
serialized object size?
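To make the ordering requirement concrete, here is a small stand-alone sketch of an ordinal-based registry. OrdinalRegistry is a hypothetical class written for illustration, not Kryo's implementation; it assumes ordinals start at 1 and are handed out in registration order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a registry that identifies classes by ordinal.
final class OrdinalRegistry {
    private final Map<Class<?>, Integer> idByClass = new HashMap<>();
    private final List<Class<?>> classById = new ArrayList<>();

    // Assigns the next free ordinal; registering the same class twice is a no-op.
    int register(Class<?> type) {
        Integer existing = idByClass.get(type);
        if (existing != null) return existing;
        classById.add(type);
        int id = classById.size(); // first class gets 1, second gets 2, ...
        idByClass.put(type, id);
        return id;
    }

    // The ordinal a writer would put into the serialized bytes.
    int idOf(Class<?> type) {
        Integer id = idByClass.get(type);
        if (id == null) throw new IllegalArgumentException("Not registered: " + type.getName());
        return id;
    }

    // The class a reader would instantiate for a given ordinal.
    Class<?> classOf(int id) {
        if (id < 1 || id > classById.size()) {
            throw new IllegalArgumentException("Unknown class ID: " + id);
        }
        return classById.get(id - 1);
    }
}
```

If one node registers String and then Date while another registers Date and then String, the ordinal written for a String on the first node resolves to Date on the second, which is why both sides must register the same classes in the same order.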
Regards,
Bruce Ritchie
> Ah, yes, that will be an issue without using cluster-wide lock
> alongside a distributed task (yuck). I'll dive into that code to see
> if I can think of a good solution.
Since knowing all the classes up front is basically impossible in our
system (given that we have plugins into our app) I think the quickest
solution for this is type.toString().hashCode(). That's not foolproof,
though: it's prone to hash collisions and thus not really workable. Perhaps an
alternate method that accepts an ordinal calculated or retrieved
outside of Kryo ala
public void register (Class type, int ordinal, Serializer serializer)
It could throw a runtime exception if the ordinal is already
registered to another class. Mixing the various register methods could
produce OrdinalAlreadyExistsExceptions if the spread between
automatically generated ids and manually provided ones isn't large
enough, but I think that's something that is easy enough to deal with
(e.g. ordinal = ordinal < BILLION ? ordinal + BILLION : ordinal;)
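A rough sketch of the explicit-ordinal idea, kept independent of Kryo. ExplicitOrdinalRegistry and MANUAL_OFFSET are hypothetical names; the sketch throws a plain IllegalStateException where the message above imagines an OrdinalAlreadyExistsException, and it leaves the serializer argument out for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry where the caller supplies the ordinal.
final class ExplicitOrdinalRegistry {
    // Plays the role of BILLION in the message above: keeps manually supplied
    // ordinals away from the low range used by automatically generated IDs.
    static final int MANUAL_OFFSET = 1_000_000_000;

    private final Map<Integer, Class<?>> classByOrdinal = new HashMap<>();

    static int normalize(int ordinal) {
        return ordinal < MANUAL_OFFSET ? ordinal + MANUAL_OFFSET : ordinal;
    }

    // Returns the ordinal actually stored; fails if another class already owns it.
    int register(Class<?> type, int ordinal) {
        int id = normalize(ordinal);
        Class<?> existing = classByOrdinal.putIfAbsent(id, type);
        if (existing != null && existing != type) {
            throw new IllegalStateException(
                "Ordinal " + id + " already registered to " + existing.getName());
        }
        return id;
    }

    Class<?> classOf(int id) {
        return classByOrdinal.get(id);
    }
}
```

The collision check matters if ordinals are derived from hash codes: in Java, "Aa".hashCode() and "BB".hashCode() are equal, so two distinct names can map to the same int.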
Of course, this brings up another related issue. If node A puts
ObjectA into the cache and node B retrieves it (where node B has not
yet registered ObjectA) then it won't be able to deserialize the
object. While rare I think I can handle this fairly easily by
automatically sending new registration events to all cluster nodes.
Regards,
Bruce Ritchie
>> Ah, yes, that will be an issue without using a cluster-wide lock
>> alongside a distributed task (yuck). I'll dive into that code to see
>> if I can think of a good solution. I assume you did this to reduce
>> serialized object size?
>
> Yes. Kryo was originally for a client/server network library, specifically
> for games. Sending an int instead of a String classname is ideal, and in
> this application registering all classes up front is reasonable.
>
> I think Kryo would be useful to a wider audience if it could also work without
> registration. Storing the classname as a String is the easiest solution. I
> will look at adding this option to Kryo.
I've started some work on this and I'm not sure the no-registration
route is a no-brainer. Two reasons:
1. class name to Class might require classloader knowledge -
Class.forName() won't always cut it. We'd have to either inject a
resolver or a list of classloaders to try.
2. If node A happens to have an object registered in a non-standard
way e.g.
FieldSerializer serializer = new FieldSerializer(kryo);
serializer.removeField(TestClass.class, "optional");
kryo.register(TestClass.class, serializer);
then node B won't be able to deserialize that object without knowledge
of the exact serializer used. The best way I can think of to make this
work is to optionally allow objects that are being used in this manner
to implement an interface/annotation that defines the serializer to be
used. If the class doesn't have the annotation nor implements the
interface then we try with the default serializer for the class. By
doing this (and *not* using the register(Class, Serializer) method) I
think it should work well enough.
Does that sound right or am I missing something obvious (quite
possible, it's been a very long week)?
Regards,
Bruce Ritchie
>> 1. class name to Class might require classloader knowledge -
>> Class.forName() won't always cut it. We'd have to either inject a
>> resolver or a list of classloaders to try.
>
> We could use the thread context classloader. Or we could allow you to set a
> classloader on the Kryo instance. I think it would be uncommon to need a
> list of classloaders, and in that case you could write a classloader to
> delegate to the list if needed.
True. I took the route in my instance to inject a class resolver to
delegate the class resolving to.
Class resolveClass(String classname) throws ClassNotFoundException;
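One possible shape for such a resolver, combining both suggestions (thread context classloader first, then an explicit list). Only the resolveClass signature comes from the message above; the interface name ClassResolver and the ChainedClassResolver class are assumptions made for this sketch, and the return type is written as Class<?> rather than the raw Class.

```java
import java.util.Arrays;
import java.util.List;

// Interface name is an assumption; the method signature follows the message above.
interface ClassResolver {
    Class<?> resolveClass(String classname) throws ClassNotFoundException;
}

// Hypothetical resolver that tries the thread context classloader, then each
// explicitly supplied classloader in order.
final class ChainedClassResolver implements ClassResolver {
    private final List<ClassLoader> loaders;

    ChainedClassResolver(ClassLoader... loaders) {
        this.loaders = Arrays.asList(loaders.clone());
    }

    @Override
    public Class<?> resolveClass(String classname) throws ClassNotFoundException {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        if (context != null) {
            try {
                // 'false' avoids running static initializers during lookup.
                return Class.forName(classname, false, context);
            } catch (ClassNotFoundException ignored) {
                // fall through to the explicit loaders
            }
        }
        for (ClassLoader loader : loaders) {
            try {
                return Class.forName(classname, false, loader);
            } catch (ClassNotFoundException ignored) {
                // try the next loader
            }
        }
        throw new ClassNotFoundException(classname);
    }
}
```

In a plugin-based application, the plugin classloaders would be the ones passed to the constructor.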
>> 2. If node A happens to have an object registered in a non-standard
>> way e.g.
>
> The CustomSerialization interface that allows a class to do its own
> serialization:
> http://code.google.com/p/kryo/source/browse/trunk/src/com/esotericsoftware/kryo/CustomSerialization.java
> You don't have to only use the get/put methods on ByteBuffer, you can make
> use of other serializers inside of the read/writeObjectData methods.
>
> I'll try to take a look at implementing String classnames and a setClassloader
> method today.
Ok. I can send you what I've got locally if you're interested. It's
not really what I'd like to see in the finished version (i.e. I'd like
to disallow register(class, serializer) when using classnames instead
of ordinals) but it's a start. The Kryo class might need a nice
refactor to make supporting this cleaner.
Regards,
Bruce Ritchie
On Fri, Jan 8, 2010 at 8:00 PM, Nate <nathan...@gmail.com> wrote:
> Thanks for sending the code Bruce. I've checked in a similar implementation
> and included some of your changes. Good to see others digging in to the code! :)
I'm glad the code is available for me to dig into! I really appreciate
the effort you've taken in making this library available.
> * Kryo#setAllowUnregisteredClasses(boolean) allows classes to be serialized,
> even if they are not registered. You can still register classes you know
> will always be around and those will use an int class ID. Unregistered
> classes will store a String class name. I think being able to mix the two
> usages is nice. If you don't care too much about serialized size, you can
> ignore registering any classes.
Sounds good.
> * I included DateSerializer but didn't register it by default. When class
> IDs are ints, the first 127 classes registered can be stored using just 1
> byte. I don't want to use up one of these slots when some apps may not
> serialize dates.
Right. I added DateSerializer because by default I don't think Dates
would be serialized properly by Kryo as there are no non-transient
fields in Date (it's all handled in writeObject/readObject). I didn't
add a test case to prove that though - I will try to do that this
weekend.
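A quick way to check the claim about Date, plus the usual workaround of serializing the epoch milliseconds. DateFieldCheck is a hypothetical helper for illustration; it is not the DateSerializer mentioned above, whose code is not shown in this thread.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Date;

// Hypothetical helper illustrating why a field-by-field serializer sees nothing in Date.
final class DateFieldCheck {
    // True if the class declares at least one instance field that is not transient,
    // i.e. something a field-based serializer would actually copy.
    static boolean hasSerializableInstanceFields(Class<?> type) {
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            if (!Modifier.isStatic(mods) && !Modifier.isTransient(mods)) {
                return true;
            }
        }
        return false;
    }

    // The whole state of a Date fits in one long: milliseconds since the epoch.
    static long write(Date date) {
        return date.getTime();
    }

    static Date read(long millis) {
        return new Date(millis);
    }
}
```

On the JDKs I'm aware of, hasSerializableInstanceFields(Date.class) returns false, so a dedicated serializer that writes getTime() is the simplest fix.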
> * The way I did the unregistered classes is, we store a class ID anyway. 0
> is special and means null, otherwise it is a class ID. I added another
> special value that means a class name follows. I chose 16383 so it doesn't
> use up one of the first 127 one byte slots. So 16383 will take 2 bytes and a
> String class name follows. I figured 2 bytes is fine since if you are
> storing class name Strings you are already being inefficient.
That makes more sense than my 1 byte boolean hack.
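To illustrate the byte counts being discussed, here is a sketch of the class ID scheme. The special values 0 and 16383 come from the message above; the 7-bits-per-byte variable-length int and the length-prefixed UTF-8 class name are assumptions made for this sketch and may not match Kryo's actual wire format. ClassIdWriter is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical encoder for the class ID prefix of a serialized object.
final class ClassIdWriter {
    static final int NULL_ID = 0;          // special: the value is null
    static final int NAME_FOLLOWS = 16383; // special: a class name String follows

    // Assumed encoding: 7 payload bits per byte, high bit set when more bytes follow.
    static void writeVarInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static byte[] encodeNull() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarInt(out, NULL_ID);
        return out.toByteArray();
    }

    // Registered classes are identified by their int ID only.
    static byte[] encodeRegistered(int classId) {
        if (classId <= NULL_ID || classId == NAME_FOLLOWS) {
            throw new IllegalArgumentException("Reserved or invalid class ID: " + classId);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarInt(out, classId);
        return out.toByteArray();
    }

    // Unregistered classes write the marker, then the class name.
    static byte[] encodeUnregistered(Class<?> type) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarInt(out, NAME_FOLLOWS);
        byte[] name = type.getName().getBytes(StandardCharsets.UTF_8);
        writeVarInt(out, name.length);
        out.write(name, 0, name.length);
        return out.toByteArray();
    }
}
```

Under these assumptions IDs 1 through 127 cost one byte, the 16383 marker costs two, and an unregistered java.lang.String costs 19 bytes before any field data is written.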
> * I optimized LongSerializer similar to IntSerializer.
Ah, nice. I tried to do that initially but I couldn't for the life of
me find the bug that was causing it to fail to properly serialize
things like -1. Since most longs in my system would fit just fine as
an int I kinda cheated :)
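The thread does not say what the bug with -1 actually was. One common pitfall in variable-length encodings is using a signed shift (>>) on negative values, which never reaches zero; the sketch below uses the unsigned shift (>>>) and optionally a zig-zag mapping so small negative numbers stay short. VarLong is a hypothetical class written for illustration and is not Kryo's LongSerializer.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical variable-length long codec: 7 payload bits per byte,
// high bit set when more bytes follow.
final class VarLong {
    static byte[] write(long value, boolean zigZag) {
        // Zig-zag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so small magnitudes stay small.
        long v = zigZag ? (value << 1) ^ (value >> 63) : value;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7; // unsigned shift: with >> a negative value would loop forever
        }
        out.write((int) v);
        return out.toByteArray();
    }

    static long read(byte[] bytes, boolean zigZag) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) break; // last byte of this value
        }
        return zigZag ? (result >>> 1) ^ -(result & 1) : result;
    }
}
```

Without zig-zag, -1 takes the maximum of 10 bytes; with it, -1 fits in a single byte.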
> * I added a setClassLoader method. Kryo will use this to resolve class
> names. I like this over a special interface just for simplicity. They do the
> same thing.
>
> * I updated some tests.
>
> Let me know how that works for you!
Sounds good - I'll check it out this weekend. I added one other thing
locally today that I'm finding useful and allows cleaner code (for me
anyways). Basically I added the ability to specify the serializer as
an annotation on a class. That way, if you use an (unregistered) class
the serializer to be used will be picked up by Kryo automatically.
It's similar in idea to CustomSerialization; however, it has some
nice benefits:
1. It makes it really easy to use one (or a few) serializers for many
classes without having to delegate calls through read/writeObjectData
methods (and not have to register those classes + serializers
manually)
2. You can more easily specify custom serialization behavior for
unregistered classes than otherwise may be possible. For example, the
main custom serializer I'm using allows classes to have methods for
pre-serialization and pre/post-deserialization calls (for transient
data initialization, reregistering classes, etc).
3. It allows me to have my main codebase not have any direct
dependencies on Kryo code - rather all the direct kryo dependencies
are held in a pluggable serialization implementation in my caching
library.
If you're interested in this I can pass it along after I merge your
changes into my checkout.
Regards,
Bruce Ritchie
> Sounds good - I'll check it out this weekend. I added one other thing
> locally today that I'm finding useful and allows cleaner code (for me
> anyways). Basically I added the ability to specify the serializer as
> an annotation on a class. That way, if you use an (unregistered) class
> the serializer to be used will be picked up by Kryo automatically.
> It's similar in idea to the CustomSerialization however it has some
> nice benefits:
Or not ... I just realized that you had written things so that Kryo
could be easily extended. That will likely be more than enough for my
needs.
Regards,
Bruce Ritchie
-Nate
On Jan 6, 12:02 pm, Nate <nathan.sw...@gmail.com> wrote:
> Hi Bruce,
>
> (Note: I added the Kryo discussion group to my reply).
>
> Gah! Stupid Google Code wasn't sending me notifications that the project had
> issues. They have all been resolved.
>
> I haven't gotten around to implementing the bytecode manipulation in Kryo.
> My initial efforts, as you saw on the SeaJUG mailing list, resulted in a new
> project I called ReflectASM: http://code.google.com/p/reflectasm/
> It is an easy to use, general purpose reflection replacement that does
> bytecode generation. I plan to use ReflectASM directly in Kryo. If it needs
> some customization I may copy ReflectASM's classes into Kryo (ReflectASM is
> very small).
>
> It can actually be used easily without changes to Kryo. Attached is a
> Serializer implementation that uses ReflectASM. I copied and pasted
> FieldSerializer and replaced reflection with ReflectASM. It hasn't been
> tested much, but has a main method that shows it working. You will need
> Kryo, ReflectASM, and the ASM JAR. Here are the last two:
> http://reflectasm.googlecode.com/files/reflectasm-0.8.jar
> http://reflectasm.googlecode.com/svn/trunk/lib/asm-3.2.jar
>
> Mind if I ask what your project is?
>
> Please note that Kryo doesn't support forward or backward compatibility. I
> hope to add this eventually but just haven't found the time. More info about
> the issue here: http://groups.google.com/group/kryo-users/browse_thread/thread/a6832d...
> In a nutshell, Kryo is especially great for network communication (see the
> KryoNet project), but less so for long term storage. The reason is that the
> Java class definitions are used as a schema, and if the schema changes the
> serialized objects are invalidated.
>
> -Nate
>
> Attachment: ReflectASMSerializer.java (11K)