[kryo] Deserialization incredibly slow

1,947 views
Skip to first unread message

Martin Grotzke

unread,
May 4, 2010, 7:54:42 PM5/4/10
to kryo-users
Hi,

these days I finally integrated the kryo based serialization strategy
for memcached-session-manager in our project, and deserialization is
incredibly slow (10 - 30 times slower than java deserialization!).
Still, serialization time is good compared to java serialization.

I just assembled a benchmark that compares java serialization,
javolution (xml) based serialization and kryo serialization.

The benchmark (the code at github: http://is.gd/bUib7) creates a
simple object graph with some sample data like e.g. a list of persons
that have name, birthday, connected persons (friends) etc,.
This data is serialized 1000 times, this is done in 10 rounds, to get
min/avg/max time needed for 1000 serializations. The same is done for
deserialization.
To see how numbers depend on object (graph) size, this was done for 3
different samples (20 person objects, 100 persons, 500 persons).

You can see the results here (size in bytes, time in millis):
http://spreadsheets.google.com/pub?key=0AspDnQzTrkhddDRWeEl3LWs5R0UxbFB6MURmdHY1cGc&gid=0

I also created charts for the sheats, so that you can have a looks at
the chart besides a sheat to get a quick overview.

Does anyone experience the same? Has anybody a clue what might be the
reason for this (ok, I know I could dig deeper for myself)?

Cheers,
Martin

--
You received this message because you are subscribed to the "kryo-users" group.
http://groups.google.com/group/kryo-users

martin.grotzke

unread,
May 4, 2010, 8:47:21 PM5/4/10
to kryo-users
Hi,

I created a very simple test that just compares java and kryo
deserialization, with both just a string and a list of Person classes
(FieldSerializer), this is the result:

kryo time for String: 140 ms (serialized size: 10243)
kryo time for ArrayList: 5836 ms (serialized size: 37503)
java time for String: 184 ms (serialized size: 10247)
java time for ArrayList: 589 ms (serialized size: 8182)

(the code is pasted at the end of the email)

It shows, that for String deserialization kryo is faster than java,
but the FieldSerializer is 10x slower (serialized data also more than
java).

Any clue what's causing this?

Cheers,
Martin


The code:

public static void main( final String[] args ) throws
InterruptedException, IOException, ClassNotFoundException {
benchDeserializationKryo( newString( 10 ) );
recover();
benchDeserializationKryo( createPersons( 500 ) );
recover();
benchDeserializationJava( newString( 10 ) );
recover();
benchDeserializationJava( createPersons( 500 ) );
}

private static void recover() throws InterruptedException {
System.gc();
Thread.sleep( 500 );
}

private static void benchDeserializationKryo( final Object obj ) {
final Kryo kryo = new Kryo();
kryo.setRegistrationOptional( true );

final byte[] data = new ObjectBuffer(kryo, 10 * 1024, 100 *
1024 ).writeObject( obj );

final long start = System.currentTimeMillis();
for( int i = 0; i < 1000; i++ ) {
new ObjectBuffer(kryo).readObject( data, obj.getClass() );
}
System.out.println( "kryo time for "+
obj.getClass().getSimpleName() +": " + ( System.currentTimeMillis() -
start) + " ms (serialized size: " + data.length + ")");
}

private static void benchDeserializationJava( final Object obj )
throws IOException, ClassNotFoundException {
final ByteArrayOutputStream bos = new ByteArrayOutputStream();
final ObjectOutputStream oos = new ObjectOutputStream( bos );
oos.writeObject( obj );
final byte[] data = bos.toByteArray();
oos.close();

final long start = System.currentTimeMillis();
for( int i = 0; i < 1000; i++ ) {
final ObjectInputStream ois = new ObjectInputStream( new
ByteArrayInputStream( data ) );
ois.readObject();
ois.close();
}
System.out.println( "java time for "+
obj.getClass().getSimpleName() +": " + ( System.currentTimeMillis() -
start) + " ms (serialized size: " + data.length + ")");
}

private static List<Person> createPersons( final int numPersons )
{
final List<Person> items = new ArrayList<Person>(numPersons);
for ( int i = 0; i < numPersons; i++ ) {
items.add( new Person( "foo", "bar" ) );
}
return items;
}

private static String newString( final int lengthInKb ) {
final StringBuilder sb = new StringBuilder( lengthInKb );
final Random random = new Random();
for( int i = 0; i < lengthInKb * 1024; i++ ) {
sb.append( random.nextInt( 9 ) );
}
return sb.toString();
}

private static class Person implements Serializable {
private String firstName;
private String lastName;
public Person() {
}
public Person( final String firstName, final String lastName )
{
this.firstName = firstName;
this.lastName = lastName;
}
}



On May 5, 1:54 am, Martin Grotzke <martin.grot...@googlemail.com>
wrote:
> Hi,
>
> these days I finally integrated the kryo based serialization strategy
> for memcached-session-manager in our project, and deserialization is
> incredibly slow (10 - 30 times slower than java deserialization!).
> Still, serialization time is good compared to java serialization.
>
> I just assembled a benchmark that compares java serialization,
> javolution (xml) based serialization and kryo serialization.
>
> The benchmark (the code at github:http://is.gd/bUib7) creates a
> simple object graph with some sample data like e.g. a list of persons
> that have name, birthday, connected persons (friends) etc,.
> This data is serialized 1000 times, this is done in 10 rounds, to get
> min/avg/max time needed for 1000 serializations. The same is done for
> deserialization.
> To see how numbers depend on object (graph) size, this was done for 3
> different samples (20 person objects, 100 persons, 500 persons).
>
> You can see the results here (size in bytes, time in millis):http://spreadsheets.google.com/pub?key=0AspDnQzTrkhddDRWeEl3LWs5R0Uxb...

martin.grotzke

unread,
May 4, 2010, 8:55:04 PM5/4/10
to kryo-users
Hi,

a first finding: changing Kryo.newInstance from
return type.newInstance();
to
Constructor<T> constructor = type.getConstructor();
constructor.setAccessible( true );
return constructor.newInstance();
changes improves it like this:
kryo time for ArrayList: 2337 ms (serialized size: 37503)

This is a reduction to ~50% (!), but it's still 5x slower than java...

Cheers,
Martin




On May 5, 2:47 am, "martin.grotzke" <martin.grot...@googlemail.com>
wrote:

Nate

unread,
May 5, 2010, 12:50:08 AM5/5/10
to kryo-...@googlegroups.com
By enabling unregistered classes, the Person class name will be written for each Person instance. Now that Kryo has the "reset context" notion, this could be improved to only write the class name for the first instance. I will get to this soon. For now, just register the classes for the benchmark. This explains the larger size.

No need to create a new ObjectBuffer for each deserialization.

Optimizations can be had if the class and fields are public.

Below is an updated version of your test. I see Kryo is always smaller. I see Kryo is faster for the Person list up to ~100 instances. More instances than this and the test shows Java as faster. Eg, here is 300:
kryo time for ArrayList: 328 ms (serialized size: 3302)
java time for ArrayList: 222 ms (serialized size: 4961)

However, comment out all the tests except for the Kryo Persons list test and the results for 300 instances becomes:
kryo time for ArrayList: 321 ms (serialized size: 3302)
kryo time for ArrayList: 184 ms (serialized size: 3302)
kryo time for ArrayList: 321 ms (serialized size: 3302)
kryo time for ArrayList: 185 ms (serialized size: 3302)
kryo time for ArrayList: 321 ms (serialized size: 3302)
kryo time for ArrayList: 184 ms (serialized size: 3302)

This shows the test is flawed, for whatever reason. GC is happening during the test?

You might check out these two JVM parameters:
-agentlib:hprof=cpu=samples
-verbose:gc

-Nate


public static void main (final String[] args) throws Exception {
    String stringValue = newString(10);
    List<Person> personList = createPersons(500);

    for (int i = 0; i < 10; i++) {
        benchDeserializationKryo(stringValue);
        recover();
        benchDeserializationKryo(personList);
        recover();
        benchDeserializationJava(stringValue);
        recover();
        benchDeserializationJava(personList);
        recover();
        System.out.println();
    }
}

private static void recover () throws InterruptedException {
    System.gc();
    System.gc();
    Thread.sleep(100);

}

private static void benchDeserializationKryo (final Object obj) {
    final Kryo kryo = new Kryo();
    kryo.register(ArrayList.class);
    kryo.register(Person.class);
    // kryo.setRegistrationOptional(true);

    ObjectBuffer buffer = new ObjectBuffer(kryo, 10 * 1024, 100 * 1024);
    final byte[] data = buffer.writeObjectData(obj);
    Class c = obj.getClass();

    Object[] results = new Object[1000];


    final long start = System.currentTimeMillis();
    for (int i = 0; i < 1000; i++) {
        results[i] = buffer.readObjectData(data, c);
    }
    final long end = System.currentTimeMillis();
    System.out.println("kryo time for " + obj.getClass().getSimpleName() + ": " + (end - start) + " ms (serialized size: "

        + data.length + ")");
}

private static void benchDeserializationJava (final Object obj) throws IOException, ClassNotFoundException {

    final ByteArrayOutputStream bos = new ByteArrayOutputStream();
    final ObjectOutputStream oos = new ObjectOutputStream(bos);
    oos.writeObject(obj);
    final byte[] data = bos.toByteArray();
    oos.close();

    Object[] results = new Object[1000];


    final long start = System.currentTimeMillis();
    for (int i = 0; i < 1000; i++) {
        final ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data));
        results[i] = ois.readObject();
        ois.close();
    }
    final long end = System.currentTimeMillis();
    System.out.println("java time for " + obj.getClass().getSimpleName() + ": " + (System.currentTimeMillis() - start)

        + " ms (serialized size: " + data.length + ")");
}

private static List<Person> createPersons (final int numPersons) {
    final List<Person> items = new ArrayList<Person>(numPersons);
    for (int i = 0; i < numPersons; i++) {

        items.add(new Person("foo", "bar"));
    }
    return items;
}

private static String newString (final int lengthInKb) {
    final StringBuilder sb = new StringBuilder(lengthInKb);
    final Random random = new Random();
    for (int i = 0; i < lengthInKb * 1024; i++) {
        sb.append(random.nextInt(9));
    }
    return sb.toString();
}

public static class Person implements Serializable {
    public String firstName;
    public String lastName;

    public Person () {

    }

    public Person (final String firstName, final String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }
}


Martin Grotzke

unread,
May 5, 2010, 3:12:53 AM5/5/10
to kryo-...@googlegroups.com
Hi Nate,

thanx a lot for your fast response!

The change of Person fields to public fields does not reflect my use
case, as in the memcached-session-manager I serialize objects that are
not under my control.
Reusing the ObjectBuffer is valid in this simple test case, but does
not seem to be a valid optimization as it's not thread safe. If
creating object buffers would be that expensive I could introduce
pooling, but a quick comparison showed that it doesn't make a
significant difference (see numbers at the end of the email).

On Wed, May 5, 2010 at 6:50 AM, Nate <nathan...@gmail.com> wrote:
> By enabling unregistered classes, the Person class name will be written for
> each Person instance. Now that Kryo has the "reset context" notion, this
> could be improved to only write the class name for the first instance. I
> will get to this soon.
Sounds great!

Do you have an idea btw, why kryo performance is dropping for larger
object graphs? To me fast serialization of large object graphs is
especially important, as session tend to grow large as users often
don't even know what stuff makes it into their session :-)

Cheers,
Martin


= Creating new ObjectBuffers vs. Reusing ObjectBuffers =
== New ObjectBuffer ==
kryo time for ArrayList: 1088 ms (serialized size: 5503)
kryo time for ArrayList: 514 ms (serialized size: 5503)
kryo time for ArrayList: 508 ms (serialized size: 5503)
kryo time for ArrayList: 382 ms (serialized size: 5503)
kryo time for ArrayList: 460 ms (serialized size: 5503)
java time for ArrayList: 633 ms (serialized size: 8182)
java time for ArrayList: 363 ms (serialized size: 8182)
java time for ArrayList: 310 ms (serialized size: 8182)
java time for ArrayList: 294 ms (serialized size: 8182)
java time for ArrayList: 299 ms (serialized size: 8182)

== Reusing ObjectBuffer ==
kryo time for ArrayList: 815 ms (serialized size: 5503)
kryo time for ArrayList: 499 ms (serialized size: 5503)
kryo time for ArrayList: 528 ms (serialized size: 5503)
kryo time for ArrayList: 374 ms (serialized size: 5503)
kryo time for ArrayList: 444 ms (serialized size: 5503)
java time for ArrayList: 593 ms (serialized size: 8182)
java time for ArrayList: 364 ms (serialized size: 8182)
java time for ArrayList: 291 ms (serialized size: 8182)
java time for ArrayList: 284 ms (serialized size: 8182)
java time for ArrayList: 283 ms (serialized size: 8182)
--
Martin Grotzke
http://www.javakaffee.de/blog/

Nate

unread,
May 5, 2010, 3:34:01 AM5/5/10
to kryo-...@googlegroups.com
On Wed, May 5, 2010 at 12:12 AM, Martin Grotzke <martin....@googlemail.com> wrote:
Reusing the ObjectBuffer is valid in this simple test case, but does
not seem to be a valid optimization as it's not thread safe. If
creating object buffers would be that expensive I could introduce
pooling, but a quick comparison showed that it doesn't make a
significant difference (see numbers at the end of the email).

You need one ObjectBuffer per thread, but this doesn't mean you need to allocate a new one each time, though the overhead for doing so isn't huge. You only need an ObjectBuffer because you have a byte array and not a ByteBuffer. Including ObjectBuffer creation in the benchmark adds the memory allocation at best. At worse it adds allocation thrashing from ObjectBuffer's grow-and-retry approach.

Do you have an idea btw, why kryo performance is dropping for larger
object graphs? To me fast serialization of large object graphs is
especially important, as session tend to grow large as users often
don't even know what stuff makes it into their session  :-)

Well, running the code from my last email with only the Kryo Person list test of 300 instances and using -agentlib:hprof=cpu=samples,depth=8 gives:

rank   self  accum   count trace method
   1 73.54% 73.54%     139 300089 sun.reflect.Reflection.getClassAccessFlags
   2  8.99% 82.54%      17 300092 java.lang.Class.getModifiers
   3  5.29% 87.83%      10 300095 java.lang.Runtime.gc
   4  2.65% 90.48%       5 300091 java.lang.ref.Reference$ReferenceHandler.run
   5  1.06% 91.53%       2 300096 java.io.Writer.write
   6  0.53% 92.06%       1 300098 sun.nio.cs.StreamEncoder.write
   7  0.53% 92.59%       1 300101 java.nio.Buffer.hasRemaining
   8  0.53% 93.12%       1 300099 sun.nio.cs.StreamEncoder.write
   9  0.53% 93.65%       1 300100 java.lang.ref.Reference$ReferenceHandler.run
  10  0.53% 94.18%       1 300102 java.nio.charset.CharsetEncoder.encode
  11  0.53% 94.71%       1 300094 com.esotericsoftware.kryo.serialize.StringSerializer.get
  12  0.53% 95.24%       1 300093 java.nio.Buffer.<init>
  13  0.53% 95.77%       1 300103 sun.nio.cs.StreamEncoder.write
  14  0.53% 96.30%       1 300097 java.nio.Buffer.hasRemaining
  15  0.53% 96.83%       1 300105 java.io.BufferedWriter.write
  16  0.53% 97.35%       1 300081 sun.reflect.MethodAccessorGenerator.generate
  17  0.53% 97.88%       1 300070 sun.misc.Resource.getBytes
  18  0.53% 98.41%       1 300065 sun.util.calendar.CalendarSystem.<clinit>
  19  0.53% 98.94%       1 300061 java.lang.Throwable.fillInStackTrace
  20  0.53% 99.47%       1 300104 java.io.BufferedWriter.write
  21  0.53% 100.00%       1 300047 sun.nio.cs.UTF_8.newDecoder

The first two entries take up most of the time. The four after that are related to the test (writing to stdout). Here are the first two:

TRACE 300089:
    sun.reflect.Reflection.getClassAccessFlags(Reflection.java:Unknown line)
    sun.reflect.Reflection.quickCheckMemberAccess(Reflection.java:51)
    java.lang.Class.newInstance0(Class.java:346)
    java.lang.Class.newInstance(Class.java:308)
    com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:574)
    com.esotericsoftware.kryo.Serializer.newInstance(Serializer.java:72)
    com.esotericsoftware.kryo.serialize.FieldSerializer.readObjectData(FieldSerializer.java:178)
    com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:474)
TRACE 300092:
    java.lang.Class.getModifiers(Class.java:Unknown line)
    java.lang.reflect.Constructor.newInstance(Constructor.java:510)
    java.lang.Class.newInstance0(Class.java:355)
    java.lang.Class.newInstance(Class.java:308)
    com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:574)
    com.esotericsoftware.kryo.Serializer.newInstance(Serializer.java:72)
    com.esotericsoftware.kryo.serialize.FieldSerializer.readObjectData(FieldSerializer.java:178)
    com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:474)

So, all the time is being taken by Class#newInstance. Maybe Java's built-in serialization cheats at object creation? How do times change using ReflectionFactory/Objenesis? What does the HPROF output look like then?

I wonder if bytecode generation to avoid Class#newInstance would be worthwhile.

-Nate

Martin Grotzke

unread,
May 5, 2010, 9:15:34 AM5/5/10
to kryo-...@googlegroups.com
Hi Nate,

I just noticed that the bad numbers with Class.newInstance actually
were caused by the Person class being private. This caused exceptions
(with expensive fillInStackTrace) that were caught and handled via a
getDeclaredConstructor().newInstance()...

Changing the Person class to public (still keeping fields private)
changes it (with 1000 persons) for me to:

kryo time for ArrayList: 2247 ms (serialized size: 11003)
kryo time for ArrayList: 958 ms (serialized size: 11003)
kryo time for ArrayList: 518 ms (serialized size: 11003)
kryo time for ArrayList: 581 ms (serialized size: 11003)
kryo time for ArrayList: 595 ms (serialized size: 11003)
java time for ArrayList: 932 ms (serialized size: 16182)
java time for ArrayList: 748 ms (serialized size: 16182)
java time for ArrayList: 650 ms (serialized size: 16182)
java time for ArrayList: 559 ms (serialized size: 16182)
java time for ArrayList: 532 ms (serialized size: 16182)

The profiling output for public person class and 1000 persons looks like this:
1 31.19% 31.19% 68 300102
com.esotericsoftware.kryo.serialize.StringSerializer.get
2 23.39% 54.59% 51 300097 java.lang.String.<init>
3 12.39% 66.97% 27 300104 com.esotericsoftware.kryo.Kryo.newInstance
4 5.50% 72.48% 12 300105 java.lang.reflect.Field.set
5 4.59% 77.06% 10 300107
com.esotericsoftware.kryo.serialize.FieldSerializer.readObjectData
6 2.29% 79.36% 5 300086 java.util.Arrays.copyOfRange

According to this it seems there's no issue with public classes. The
question is if creating private classes could be supported better.
And I still wonder, why my memcached-session-manager serializer
benchmark shows such bad results for kryo, as there are no private
classes used (at least not by me directly AFAICS). Need to dig deeper
into this...

Cheers,
Martin

Nate

unread,
May 5, 2010, 5:23:13 PM5/5/10
to kryo-...@googlegroups.com
On Wed, May 5, 2010 at 6:15 AM, Martin Grotzke <martin....@googlemail.com> wrote:
I just noticed that the bad numbers with Class.newInstance actually
were caused by the Person class being private. This caused exceptions
(with expensive fillInStackTrace) that were caught and handled via a
getDeclaredConstructor().newInstance()...

The HPROF I posted was done with a public Person class with public fields. I see very similar HPROF results to what I posted when I use 1000 Persons.
 
Wait, duh... the default heap size is too small for your larger tests! :) Pass this to the VM:
 -Xms1024M -Xmx1024M
Now I see for 300:
kryo time for ArrayList: 118 ms (serialized size: 3302)
java time for ArrayList: 202 ms (serialized size: 4961)
For 1000:
kryo time for ArrayList: 516 ms (serialized size: 11002)
java time for ArrayList: 612 ms (serialized size: 16161)
For 10,000:
kryo time for ArrayList: 3018 ms (serialized size: 55002)
java time for ArrayList: 3142 ms (serialized size: 80161)
The bigger the object graph, a larger percentage of time is spent just creating the stuff and less time is spent on the serialization. This is why the differences are smaller with the monster graphs.

I still see Reflection.getClassAccessFlags as the bottleneck in Kryo serialization of this particular object graph.

-Nate



Nate

unread,
May 5, 2010, 6:21:46 PM5/5/10
to kryo-...@googlegroups.com
I just checked in better support for writing class name Strings when optional registration is used. It only writes the String on the first occurrence of a type in a graph. For each additional occurrence only an integer ordinal is written. Here is what I see for a 300 Person list with optional registration:

kryo time for ArrayList: 140 ms (serialized size: 3944)
java time for ArrayList: 203 ms (serialized size: 4961)

Here it is when the classes are registered, so you can see the both the size and speed overhead of not registering:

kryo time for ArrayList: 123 ms (serialized size: 3302)
java time for ArrayList: 204 ms (serialized size: 4961)

Of course, the size overhead will be a larger percentage of the total bytes with smaller object graphs.

-Nate

Martin Grotzke

unread,
May 6, 2010, 7:09:54 PM5/6/10
to kryo-...@googlegroups.com
I found out what the bottleneck was in the memcached-serializer
benchmark: ReflectionFactory.newConstructorForSerialization took all
the time (used in kryo-serializers KryoReflectionFactorySupport).

Caching constructors (created with newConstructorForSerialization)
changed deserialization times from
Min: 2848
Avg: 4539
Max: 6042
to
Min: 246
Avg: 315.6
Max: 515

-> Reduction to less than 10 percent!

Regarding your optimization for unregistered classes: this is really cool!

My numbers for 1000 persons:

registered before
kryo time for ArrayList: 595 ms (serialized size: 11003)
kryo time for ArrayList: 338 ms (serialized size: 11003)
kryo time for ArrayList: 327 ms (serialized size: 11003)

unregistered before
kryo time for ArrayList: 4498 ms (serialized size: 75003)
kryo time for ArrayList: 4079 ms (serialized size: 75003)
kryo time for ArrayList: 4096 ms (serialized size: 75003)

unregistered after
kryo time for ArrayList: 703 ms (serialized size: 13066)
kryo time for ArrayList: 388 ms (serialized size: 13066)
kryo time for ArrayList: 381 ms (serialized size: 13066)

-- this is a huge improvement!! Thanx!

I just ran the memcached-session-manager serializer benchmark again,
and the number are telling a great story now - guess who the winner
is! :-)
http://code.google.com/p/memcached-session-manager/wiki/SerializationStrategyBenchmark

Nate, thanx a lot for your great work!

Cheers,
Martin

Nate

unread,
May 6, 2010, 9:09:48 PM5/6/10
to kryo-...@googlegroups.com
Woohoo! Thanks for using the library! Nice charts! :D

-Nate

Vivek Kothari

unread,
Mar 18, 2014, 11:54:52 PM3/18/14
to kryo-...@googlegroups.com
Or simple you can use ThreadLocal, like ThreadLocal<Kryo> kryo = new ThreadLocal(){
protected T initialValue() {
        return new Kryo();
    }
}

and use the instance like kryo.get();
Reply all
Reply to author
Forward
0 new messages