Shortcuts removed, wiki updated


David Yu

Apr 19, 2012, 6:49:11 AM
to java-serializat...@googlegroups.com
Hi all,

Recent updates were pushed to remove shortcuts from the kryo benchmark. The shortcuts removed:
1. Explicitly disabling utf8, based on advance knowledge of the content of the dataset.
2. Using the maximum possible buffer size (based on the biggest dataset) to avoid using an OutputStream for flushing (the shortcut is using the internal buffer directly, since it knows everything fits). Other streaming libs (smile/json/etc) are not doing this.
   Previously, with kryo v1, this could not be avoided because it was based on ByteBuffer (which crashes on buffer overflow).
   However, another lib, Wobly, is also based on ByteBuffer (it looks like kryo v1 from the outside, but it is mostly similar to protobuf, with size computation and schema evolution support) and is capable of handling any dataset no matter how large.

Now that every serializer can handle overflows, it's possible to use a common buffer size.
I've updated some libs to use it as a starting point.
512 is the default (it was previously 500 on ByteArrayOutputStream), and it can be changed via system properties.
Binary (except java-built-in/scala built-in) and json data are less than 512 bytes on media.1.cks, so there's no flushing/etc in the benchmark. Eventually we'll be able to measure the behavior of serializers when data does not fit in the buffer.
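
A minimal sketch of how a benchmark-wide default like this could be read from a system property. The property name `benchmark.bufferSize` and the class `BufferConfig` are assumptions made for illustration; the post does not name the actual property.

```java
final class BufferConfig {
    // Hypothetical property name; the real benchmark may use a different one.
    static final String PROPERTY = "benchmark.bufferSize";
    static final int DEFAULT_SIZE = 512;

    // Returns the configured buffer size, falling back to 512 when the
    // property is unset, unparsable, or not a positive number.
    static int bufferSize() {
        int size = Integer.getInteger(PROPERTY, DEFAULT_SIZE);
        return size > 0 ? size : DEFAULT_SIZE;
    }
}
```

Under these assumptions the size would be overridden at launch with something like `java -Dbenchmark.bufferSize=4096 ...`, and every serializer would call `BufferConfig.bufferSize()` instead of hard-coding its own value.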

The results are updated to the wiki with the latest changes.  
I've tried to include a single utf8 character in the dataset but couldn't figure out how to make cks accept it.  
Maybe on the next run we can include it in the default dataset (with the help of kannan).

The results have not changed much; kryo-manual is still the fastest.  I'm not even sure why those shortcuts were necessary.
One thing I noticed is that wobly actually performs better when run on Windows 7 (based on previous results).

Interestingly, with the shortcuts removed, smile/jackson/manual now seems to be on par with kryo.

--
When the cat is away, the mouse is alone.
- David Yu

Tatu Saloranta

Apr 19, 2012, 11:37:25 AM
to java-serializat...@googlegroups.com
On Thu, Apr 19, 2012 at 3:49 AM, David Yu <david....@gmail.com> wrote:
> Hi all,
>
> Recents updates were pushed to remove shortcuts from kryo.
> 1. explicitly disables utf8 based from advanced knowledge of the content of
> the dataset.
> 2. uses the maximum possible buffer size (based from the biggest dataset) to
> avoid using an outputstream for flushing (shortcut is directly using its
> internal buffer since it knows everything fits). Other streaming libs
> (smile/json/etc) are not doing this.
>    Previously with kryo v1, this could not be alleviated as it was based on
> ByteBuffer (crashes on buffer overflow).
>    Although another lib, Wobly, also based on ByteBuffer (looks like kryo v1
> from the outside, but it is mostly similar to protobuf with the size
> computation and schema evolution suppport) is capable of handling any
> dataset no matter how large.

Ok. I did notice that Wobly is very fast as well, when added.

> Now that every serializer can handle overflows, its now possible to use a
> common buffer size.
> I've updated some libs to use it as a starting point.
> 512 is the default (was previously 500 on ByteArrayOutputStream), which can
> be changed via
> system properties.

Good idea.

> Binary (except java-built-in/scala built-in) and json data are less than 512
> bytes on media.1.cks, so there's no flushing/etc in the benchmark.
> Eventually we'll be able to measure the behavior of serializers when data
> does not fit in the buffer.
>
> The results are updated to the wiki with the latest changes.
> I've tried to include a single utf8 character in the dataset but couldn't
> figure out how to make cks accept it.
> Maybe on the next run we can include it in the default dataset (with the
> help of kannan).

Yes, this seems like a good idea as well.

> The results have not changed much, kryo-manual is still fastest.  I'm not
> even sure why those shortcuts were necessary.
> One thing I noticed is that wobly actually performs better when run on
> windows 7 (based from previous results).
>
> Interestingly, with the shortcuts removed, smile/jackson/manual now seems to
> be on par with kryo.

Another related thing is that a while ago I added alternate sequence
testing. However, it is only supported by a subset of serializers,
based on codecs that were easiest to change; some might need external
framing.
But it should be relatively easy to expand coverage.
I was hoping to find that Avro was more efficient with longer
sequences, although that did not seem to be the case for some reason.
But I think this might also show some differences between other
codecs; binary formats should benefit more due to size differences,
for example.

One thing that would benefit sequence tests most however would be some
way to generate variations of items; if this was possible, it would be
possible to run tests with multiple sequence lengths without having to
hand create different input sets.
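
One possible way to get such variations without hand-creating input sets is to derive them from a base item with a seeded random generator, so every run of a given length sees the same sequence. This is only a sketch: `MediaItem` below is a hypothetical stand-in, not the benchmark's actual data model, and the fields and ranges are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical value class used only for this illustration.
final class MediaItem {
    final String title;
    final int durationMillis;
    final long sizeBytes;

    MediaItem(String title, int durationMillis, long sizeBytes) {
        this.title = title;
        this.durationMillis = durationMillis;
        this.sizeBytes = sizeBytes;
    }
}

final class VariationGenerator {
    // Produces `count` items derived from `base`. The same seed always
    // yields the same sequence, so results stay reproducible.
    static List<MediaItem> variations(MediaItem base, int count, long seed) {
        Random random = new Random(seed);
        List<MediaItem> items = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            String title = base.title + " #" + random.nextInt(1_000_000);
            int duration = base.durationMillis + random.nextInt(60_000);
            long size = base.sizeBytes + random.nextInt(1 << 20);
            items.add(new MediaItem(title, duration, size));
        }
        return items;
    }
}
```

Calling `variations(base, n, seed)` with different `n` would give sequences of different lengths from a single base item.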

-+ Tatu +-

Nate

Apr 19, 2012, 2:36:20 PM
to java-serializat...@googlegroups.com
On Thu, Apr 19, 2012 at 3:49 AM, David Yu <david....@gmail.com> wrote:
> Hi all,
>
> Recents updates were pushed to remove shortcuts from kryo.
> 1. explicitly disables utf8 based from advanced knowledge of the content of the dataset.
> 2. uses the maximum possible buffer size (based from the biggest dataset) to avoid using an outputstream for flushing (shortcut is directly using its internal buffer since it knows everything fits). Other streaming libs (smile/json/etc) are not doing this.
>    Previously with kryo v1, this could not be alleviated as it was based on ByteBuffer (crashes on buffer overflow).

The benchmark serialize method returns a byte[]. The most efficient way to do this with Kryo is to use Output by itself. Your changes cause the bytes to be written to a byte[] in Output, then unnecessarily copied to a ByteArrayOutputStream. The benchmark should not force a ByteArrayOutputStream to be used.
https://github.com/eishay/jvm-serializers/commit/fb52f09d24503808024b2a47d149ea6f0ec17769#L0R58
I request you revert the serialize() method to how it was before:
     public byte[] serialize (T content) {
       output.clear();
       kryo.writeObject(output, content);
       return output.toBytes();
     }
This works in exactly the same way as the ByteArrayOutputStream used by Serializer#outputStream(); the only difference is that it avoids copying the bytes around. Output even extends OutputStream.

 
> Now that every serializer can handle overflows, its now possible to use a common buffer size.
> I've updated some libs to use it as a starting point.
> 512 is the default (was previously 500 on ByteArrayOutputStream), which can be changed via
> system properties.
> Binary (except java-built-in/scala built-in) and json data are less than 512 bytes on media.1.cks, so there's no flushing/etc in the benchmark. Eventually we'll be able to measure the behavior of serializers when data does not fit in the buffer.

Your changes to set a buffer size would have no effect with a size smaller than the data, since Serializer reuses the ByteArrayOutputStream by calling reset(). IMO, this is how it should be, as we are measuring the serializers, not how long it takes to allocate a buffer to hold the serialized bytes.
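
The reset() behavior described here can be observed with a small probe. This is a sketch using only the JDK: `CapacityProbe` and `ResetDemo` are names invented for illustration, and the probe relies on `buf` being a protected field of `ByteArrayOutputStream`.

```java
import java.io.ByteArrayOutputStream;

// Exposes the current capacity of the stream's internal array.
final class CapacityProbe extends ByteArrayOutputStream {
    CapacityProbe(int initialSize) {
        super(initialSize);
    }

    int capacity() {
        return buf.length;
    }
}

final class ResetDemo {
    // Returns the capacity before writing, after writing, and after reset().
    static int[] capacities(int initialSize, int payloadSize) {
        CapacityProbe out = new CapacityProbe(initialSize);
        int before = out.capacity();
        out.write(new byte[payloadSize], 0, payloadSize); // grows if payload exceeds capacity
        int afterWrite = out.capacity();
        out.reset(); // rewinds the write position but keeps the grown array
        int afterReset = out.capacity();
        return new int[] { before, afterWrite, afterReset };
    }
}
```

With an initial size of 512 and a payload larger than that, the capacity after reset() stays at the grown value, so only the first iteration of a reused stream pays for the growth.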


> The results are updated to the wiki with the latest changes.

I think we should use the latest Java to run the benchmark for the wiki.

-Nate

David Yu

Apr 19, 2012, 9:35:25 PM
to java-serializat...@googlegroups.com
On Fri, Apr 20, 2012 at 2:36 AM, Nate <nathan...@gmail.com> wrote:
> The benchmark serialize method returns a byte[]. The most efficient way to do this with Kryo is to use Output by itself. Your changes cause the bytes to be written to a byte[] in Output, then unnecessarily copied to a ByteArrayOutputStream.
> The benchmark should not force a ByteArrayOutputStream to be used.
> https://github.com/eishay/jvm-serializers/commit/fb52f09d24503808024b2a47d149ea6f0ec17769#L0R58
> I request you revert the serialize() method to how it was before:
Nope.  Why don't you read #2 again.
Notice that before the change, all the libs in the benchmark were able to handle data even 100x the size of the buffer, except kryo, which crashes (apparently kryo v2 has the same problem as v1).

The point is that there should be no bias/shortcuts based on direct knowledge of the content and size of the dataset.
That is why the buffer size will be provided by the benchmark, not the author.
Future runs will have the option to use a dataset whose size exceeds the buffer size provided.




--
You received this message because you are subscribed to the Google Groups "java-serialization-benchmarking" group.
To post to this group, send email to java-serializat...@googlegroups.com.
To unsubscribe from this group, send email to java-serialization-be...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/java-serialization-benchmarking?hl=en.

Nate

Apr 19, 2012, 11:21:23 PM
to java-serializat...@googlegroups.com
On Thu, Apr 19, 2012 at 6:35 PM, David Yu <david....@gmail.com> wrote:
> Nope.  Why don't you read #2 again.
> Notice that before the change, all the libs in the benchmark are able to handle even if the data is 100x the buffer, except kryo, which crashes (apparently kryo v2 has same problems with v1).
>
> The point is that there should be no bias/shortcuts based from direct knowledge of the content and size dataset.
> That is why the buffer size will be provided by the benchmark, not the author.
> Future runs will have the option to use a dataset whose size exceeds the buffer size provided.

Use this...
this.output = new Output(BUFFER_SIZE, -1);
...so Output grows its buffer unbounded. Next time if you are going to change my code, read the javadocs. Now, revert the changes you made to the serialize() method in the Kryo benchmark. Thanks.

-Nate

David Yu

Apr 20, 2012, 1:41:37 AM
to java-serializat...@googlegroups.com
Now there is another issue with re-use.
There is a difference between stateless and stateful re-use.
For example, MsgPack re-uses its Packer/Unpacker, but on every iteration it completely resets all of the changes made, so that the second run starts from exactly the same state as the first run.

The benchmark results are basically based on a single run.  The only reason we run iterations is to compute the average and use that as the figure for a single run.

The problem with kryo is that the second run is completely different from the first run (stateful).  The buffers have been resized (you could call this sampling: after the first run, it already knows the message can fit).  From the second run onwards it no longer deals with overflows.

Compare this with the rest of the libs: on every iteration, they all need to flush/resize/handle overflows, because either the state is new or it's completely reset (MsgPack).
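
The distinction can be made concrete by counting buffer resizes across iterations. The sketch below uses a hypothetical `GrowableBuffer` class written for this illustration; it is not kryo's Output or MsgPack's Packer, and it only models the two reset policies being discussed.

```java
import java.util.Arrays;

// Hypothetical growable buffer, used only to contrast the two reuse policies.
final class GrowableBuffer {
    private final int initialSize;
    private byte[] data;
    private int position;
    int resizeCount;

    GrowableBuffer(int initialSize) {
        this.initialSize = initialSize;
        this.data = new byte[initialSize];
    }

    void write(byte[] src) {
        if (position + src.length > data.length) {
            // Grow to at least double, or exactly enough for this write.
            data = Arrays.copyOf(data, Math.max(data.length * 2, position + src.length));
            resizeCount++;
        }
        System.arraycopy(src, 0, data, position, src.length);
        position += src.length;
    }

    // Stateful reuse: rewind, but keep whatever capacity earlier runs grew to.
    void clear() {
        position = 0;
    }

    // Stateless reuse: rewind and drop back to the initial capacity.
    void resetToInitial() {
        position = 0;
        data = new byte[initialSize];
    }
}

final class ReuseDemo {
    // Counts how many resizes happen over `runs` iterations of the same message.
    static int resizes(boolean stateless, int runs, int initialSize, int messageSize) {
        GrowableBuffer buffer = new GrowableBuffer(initialSize);
        byte[] message = new byte[messageSize];
        for (int i = 0; i < runs; i++) {
            if (stateless) {
                buffer.resetToInitial();
            } else {
                buffer.clear();
            }
            buffer.write(message);
        }
        return buffer.resizeCount;
    }
}
```

With a 512-byte buffer and a 1573-byte message, the stateful policy resizes once over ten runs, while the stateless policy resizes on every run. Note that `resetToInitial()` also allocates a fresh array each time, which is part of the cost being argued about.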

Therefore, kryo's re-use is not valid at all. 
So thank you for reminding me to refactor the code even further to remove your re-use.
Next time, try not to blatantly take shortcuts.
 


Nate

Apr 20, 2012, 11:36:56 AM
to java-serializat...@googlegroups.com
We've already been over this. So far, no one else has agreed with you. Tatu accepted the patch for reuse of the ByteArrayOutputStream. We are measuring the serializers, not the growing of the ByteArrayOutputStream.

-Nate

David Yu

Apr 20, 2012, 1:05:52 PM
to java-serializat...@googlegroups.com
This is unrelated to what we discussed before (plus your shortcuts went unnoticed at that time).
This is about stateless/stateful re-use.
What's been recently stated in this thread are facts.
You don't even have a logical explanation to counter the previous statement.

If you're talking about buffer re-use, it is allowed (whether the buffer is used directly or comes from an external buffer cache).
Just look at your code, nobody changed that.

> We are measuring the serializers, not the growing of the ByteArrayOutputStream.
That's not even the issue.  The issue is that kryo has a different state after the first run.  If you can completely reset the state of your Input/Output (like MsgPack's Packer/Unpacker), there isn't a problem.

@Tatu, if you can chime in and share your thoughts on the previous statement, that would be great.
 
 



Nate

Apr 20, 2012, 1:07:40 PM
to java-serializat...@googlegroups.com

I have no idea what you are talking about. Tatu has asked you many times to show the code you are talking about. Quit wasting our time.

 

David Yu

Apr 20, 2012, 1:32:38 PM
to java-serializat...@googlegroups.com
Put it this way.  As I said in the first post, we'll be able to measure the behavior/performance of serializers when data does not fit in the buffer.  With kryo's re-used output, which internally resizes on the first run (the test for correctness), kryo is basically exempt from that.
The two solutions are:
1.  Continue with re-use but use an outputstream, so it doesn't resize but flushes (the current solution).
2.  Skip the outputstream and allow it to resize, but use a new instance on every iteration.
Without that, kryo is exempt.
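
A rough sketch of what option 1 means in practice: a fixed-size buffer that is re-used but never grows, and instead flushes to an underlying stream whenever it fills up. `FlushingBuffer` is a hypothetical class written for this illustration and is not how kryo's Output is implemented.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical fixed-size buffer that flushes to a sink instead of resizing.
final class FlushingBuffer {
    private final byte[] buffer;
    private final ByteArrayOutputStream sink;
    private int position;
    int flushCount;

    FlushingBuffer(byte[] buffer, ByteArrayOutputStream sink) {
        if (buffer.length == 0) {
            throw new IllegalArgumentException("buffer must not be empty");
        }
        this.buffer = buffer;
        this.sink = sink;
    }

    void write(byte[] src) {
        int offset = 0;
        while (offset < src.length) {
            if (position == buffer.length) {
                flush(); // buffer full: hand the bytes to the sink and start over
            }
            int n = Math.min(buffer.length - position, src.length - offset);
            System.arraycopy(src, offset, buffer, position, n);
            position += n;
            offset += n;
        }
    }

    void flush() {
        if (position > 0) {
            sink.write(buffer, 0, position);
            position = 0;
            flushCount++;
        }
    }
}

final class FlushDemo {
    // Returns { number of flushes, bytes that reached the sink } for one message.
    static int[] run(int bufferSize, int messageSize) {
        byte[] shared = new byte[bufferSize]; // could be re-used across runs; never resized
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        FlushingBuffer out = new FlushingBuffer(shared, sink);
        out.write(new byte[messageSize]);
        out.flush();
        return new int[] { out.flushCount, sink.size() };
    }
}
```

Because the buffer never grows, every run with an oversized message does the same flushing work, which is the property option 1 is after. Option 2 gets the same effect by discarding the grown buffer after each run.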
 
 





--
When the cat is away, the mouse is alone.
- David Yu

David Yu

Apr 20, 2012, 1:39:46 PM
to java-serializat...@googlegroups.com
So you're avoiding the argument now?
> Tatu has asked you many times to show the code you are talking about.
Again, what is mentioned in this thread is unrelated to the previous discussion.
> Quit wasting our time.

 


Nate

Apr 20, 2012, 2:06:43 PM
to java-serializat...@googlegroups.com

Kryo's Output class does EXACTLY the same thing that ByteArrayOutputStream does:
https://github.com/eishay/jvm-serializers/blob/kannan/tpc/src/serializers/Serializer.java#L22
This is CORRECT behavior in both cases.

For the last time, when data doesn't fit in the buffer, the buffer grows. Measuring this is WORTHLESS.
 
> The two solutions are:
> 1.  Continue with re-use but use an outputstream, so it doesn't resize but flush (the current solution).

Serializer#serialize() returns a byte[]. It doesn't force usage of a ByteArrayOutputStream. Kryo can avoid using ByteArrayOutputStream by using its own Output class, which works in the same way.

> 2.  Skip the outputstream and allow it to resize, but use a new instance on every iteration.
> Without that, kryo is exempted.

This is a community effort, so thankfully you alone do not get to decide what makes a library exempt.

Nate

Apr 20, 2012, 2:23:32 PM
to java-serializat...@googlegroups.com

And so you refuse to make it clear what you are talking about? When asked for clarification, you just continue your same drivel. I am all for discussing issues in an adult manner, but you continuously cast blame and make claims that I am trying to maliciously deceive. When you don't like the results, you make changes to Kryo's benchmarks that are suboptimal and you update the wiki before the group can discuss your changes. Your continued argumentativeness is petty and childish.

It seems this project can no longer be sanely managed as a group. I vote that we have a single developer that manages the main source repository. Contributions would occur as git forks. The developer managing the project would discuss changes with the group as needed. I nominate Kannan to manage the project.

-Nate

David Yu

Apr 20, 2012, 10:36:53 PM
to java-serializat...@googlegroups.com
Except that kryo doesn't, because the growth is permanent once the first run is done.
So ultimately, the buffer size assigned by the benchmark is useless for kryo.
> Measuring this is WORTHLESS.
Really?  Here's some sample data with the message size larger than the buffer size (media.3.cks with 512 as the provided buffer size).

====== when kryo internally resizes buffers after the first run, which also effectively ignores the buffer size provided by benchmark

./run -trials=500 -include=java-manual,kryo,wobly data/media.3.cks 
Checking correctness...
[done]
                 create     ser   +same   deser   +shal   +deep   total   size  +dfl
java-manual         136    7843    7684    3975    4039    4170   12013   1596   255
kryo                134    3855    3708    4468    4480    4541    8396   1573   254
wobly                86   11227   11012    3560    3563    3617   14844   1604   275

// Here's the code:
protected final Input input = new Input(BUFFER_SIZE);
// as author suggested
protected final Output output = new Output(BUFFER_SIZE, Integer.MAX_VALUE); // same as: new Output(BUFFER_SIZE, -1)

public T deserialize (byte[] array) {
    input.setBuffer(array);
    return kryo.readObject(input, type);
}

public byte[] serialize (T content) {
    output.clear();
    kryo.writeObject(output, content);
    return output.toBytes();
}

====== when kryo actually uses the buffer assigned by the benchmark, and with the same state on every run

./run -trials=500 -include=java-manual,kryo,wobly data/media.3.cks 
Checking correctness...
[done]
                 create     ser   +same   deser   +shal   +deep   total   size  +dfl
java-manual         137    7765    7619    3881    4101    4091   11856   1596   255
kryo                134    5953    5934    4704    4775    4847   10800   1573   254
wobly                89   11116   10902    3530    3572    3585   14701   1604   275

// Here's the code:
protected final byte[] buffer = new byte[BUFFER_SIZE]; // buffer assigned by the benchmark, re-used

public T deserialize (byte[] array) {
    return kryo.readObject(new Input(array), type);
}

public byte[] serialize (T content) {
    Output output = new Output(buffer, Integer.MAX_VALUE);
    kryo.writeObject(output, content);
    return output.toBytes();
}


Finally, for a non-stream-based serializer like wobly, one could improvise (retain state like kryo) and save the computed size of the message on the first run.  From the second run onwards it already knows the size, so it doesn't have to compute it again and can serialize directly to a byte array (which could mean something like a 50-100% speed increase for wobly).
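
To illustrate the kind of retained state meant here, the sketch below shows a two-pass, size-first serializer that can optionally remember the size it computed on an earlier run. `SizeFirstSerializer` and its length-prefixed format are invented for this example and are not Wobly's actual implementation; the cached size is only valid because the benchmark feeds the same message on every iteration.

```java
// Hypothetical size-first serializer: compute the exact size, then write
// into an exactly sized array. Each field is a 2-byte length plus payload,
// so fields must be shorter than 65536 bytes.
final class SizeFirstSerializer {
    int sizeComputations;
    private int cachedSize = -1;

    private int computeSize(byte[][] fields) {
        sizeComputations++;
        int size = 0;
        for (byte[] field : fields) {
            size += 2 + field.length;
        }
        return size;
    }

    // With reuseCachedSize = true, the size pass runs only on the first call.
    // That is only correct if every call receives a message of the same size.
    byte[] serialize(byte[][] fields, boolean reuseCachedSize) {
        int size = (reuseCachedSize && cachedSize >= 0) ? cachedSize : computeSize(fields);
        cachedSize = size;
        byte[] out = new byte[size];
        int pos = 0;
        for (byte[] field : fields) {
            out[pos++] = (byte) (field.length >>> 8);
            out[pos++] = (byte) field.length;
            System.arraycopy(field, 0, out, pos, field.length);
            pos += field.length;
        }
        return out;
    }
}
```

Over repeated runs on identical input, the caching variant skips the size pass after the first call, which is the same kind of advantage as a buffer that stays grown after the first run.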

Now don't tell me it's worthless, because the results clearly show that it isn't.
 
 

Kannan Goundan

Apr 21, 2012, 3:22:41 AM
to java-serializat...@googlegroups.com
Unfortunately, I don't have the time to be a responsive project maintainer.  I think, though, that we can get things back on track with our usual mailing-list-consensus-based ways once we get a little more process set up.

I think one issue is that the results page is very high stakes.  For most people, the bar charts we publish are taken to be the final word on JVM serializer performance.  Internet People cite them all the time.

<irrelevant-aside>I think this is unfortunate.  As someone who spent weeks reworking the codebase, I got to see in great detail just how flawed these benchmarks are.  Just the fact that we test a single, limited data value, yet provide results with four significant figures is absurd!  This isn't anyone's fault; I'm just trying to point out the discrepancy with the accuracy of our results and (my perception of) how much The Internet bases decisions off our graphs.  I think we're better than the other comparisons out there (like the lolgraph on MsgPack's website), but benchmarking is so hard to get right.</irrelevant-aside>

Anyway, the stakes are high enough that maybe we should push new results to a "staging" URL and give everyone a week to scrutinize them before publishing to the main URL.

Second, I agree that we should probably write up some "rules".  When I first made a pass through the project I fixed the discrepancies that were obvious, but we're now dealing with more subtle stuff.  Things that may seem right to you might not be to someone else, to the point that it seems like the other person is being malicious.

For example, this particular thread included (among other things) whether we should count buffer resizing in the serialization time.  I can see Nate's high-level point about not counting things that may not happen in real usage.  For a long-running process, maybe the buffer gets resized on the first few sends and then doesn't hit the limit ever again (though I haven't quite digested what's going on in David's last message...).  But even if counting resize time is a bad idea, it's how we measure the other tools, so it's still not fair to publish results without fixing up the others.

So let's try writing everything down.  When some code looks shady, figure out what written rule it breaks and point it out.  If it's doing something questionable but there's no written rule to prevent it, let's then discuss and make a new rule.  A side benefit is that we'll have detailed documentation of our testing methodology that we can put on the wiki (so people don't have to rummage through the source code for this information).

To start, does anyone want to take a stab at formalizing the buffering rules?

David Yu

Apr 21, 2012, 3:50:44 AM
to java-serializat...@googlegroups.com
On Sat, Apr 21, 2012 at 3:22 PM, Kannan Goundan <kan...@cakoose.com> wrote:
> Unfortunately, I don't have the time to be a responsive project maintainer.  I think, though, that we can get things back on track with our usual mailing-list-consensus-based ways once we get a little more process set up.
>
> I think one issue is that the results page is very high stakes.  For most people, the bar charts we publish are taken to be the final word on JVM serializer performance.  Internet People cite them all the time.
>
> <irrelevant-aside>I think this is unfortunate.  As someone who spent weeks reworking the codebase, I got to see in great detail just how flawed these benchmarks are.  Just the fact that we test a single, limited data value, yet provide results with four significant figures is absurd!  This isn't anyone's fault; I'm just trying to point out the discrepancy with the accuracy of our results and (my perception of) how much The Internet bases decisions off our graphs.  I think we're better than the other comparisons out there (like the lolgraph on MsgPack's website),
Yep.  Their dataset was crafted to make msgpack look good.
> but benchmarking is so hard to get right.</irrelevant-aside>
>
> Anyway, the stakes are high enough that maybe we should push new results to a "staging" URL and give everyone a week to scrutinize them before publishing to the main URL.
>
> Second, I agree that we should probably write up some "rules".  When I first made a pass through the project I fixed the discrepancies that were obvious, but we're now dealing with more subtle stuff.  Things that may seem right to you might not be to someone else, to the point that it seems like the other person is being malicious.
>
> For example, this particular thread included (among other things) whether we should count buffer resizing in the serialization time.  I can see Nate's high-level point about not counting things that may not happen in real usage.  For a long-running process, maybe the buffer gets resized on the first few sends and then doesn't hit the limit ever again (though I haven't quite digested what's going on in David's last message...).  But even if counting resize time is a bad idea, it's how we measure the other tools, so it's still not fair to publish results without fixing up the others.
100% agree.  I've been trying to point that out.

> So lets try writing everything down.  When some code looks shady, figure out what written rule it breaks and point it out.  If it's doing something questionable but there's no written rule to prevent it, lets then discuss and make a new rule.  A side benefit is that we'll have a detailed documentation of our testing methodology that we can put on the wiki (so people don't have to rummage through the source code for this information).
Great idea.

Tatu Saloranta

Apr 21, 2012, 1:31:43 PM
to java-serializat...@googlegroups.com
I don't feel I have the full picture, but for what it is worth here
are related thoughts.

- Sounds like handling of ByteArrayOutputStream (external to codecs)
was not problematic, and we can ignore it (originally I assumed you
meant that was problematic)
- I agree in that further runs should not be based on assuming/knowing
that further iterations provide same data.

I think our best chance is to focus on two things:

- creating "fully automatic" subset, which should avoid many of
disputed techniques
- trying to find a way to create permutations for different runs, so
that tests would exhibit some level of variation.

One more comment on "manual" tests (where I think all of us
occasionally get overzealous with optimizations, and/or disagree
most): these were, I think, originally created mostly because there
were no good fully-automated providers.
With XML, for example, such solutions tended to have
disproportionately high overhead -- but even there, adding JAXB at
this point would solve the issue, as it can use fastest parsers, and
does not have more than maybe 50% overhead (compared to XStream and
others that have steeper).

Put another way, I think manual variants are less necessary due to
coverage. In fact, we could move tree-based and manual variants to a completely
separate suite (but with comparable throughputs, so one can compare if
need be).

I know this does not help resolve the specific issue, but I feel that
we would do better if we stepped back and considered bigger picture.
And once we are done with "bigger" changes, we can solve detail
issues.

I don't mean to belittle the question of fairness -- which is
fundamental with comparison -- but sometimes best way to solve a
specific problem is not full frontal assault, but by outmaneuvering
the thing.

-+ Tatu +-

David Yu

Apr 21, 2012, 2:05:12 PM
to java-serializat...@googlegroups.com
Yea, we basically had a misunderstanding.
> - I agree in that further runs should not be based on assuming/knowing
> that further iterations provide same data.
Agreed.

> I think our best chance is to focus on two things:
>
> - creating "fully automatic" subset
I'm ok with that.
> , which should avoid many of
> disputed techniques
> - trying to find a way to create permutations for different runs, so
> that tests would exhibit some level of variation.
I'm ok with that as well.
The real problem is someone actually implementing it hehe.

> One more comment on "manual" tests (where I think all of us
> occasionally get overzealous with optimizations, and/or disagree
> most): these were, I think, originally created mostly because there
> were no good fully-automated providers.
I actually don't have problems with the optimizations, since you told me fastjson was doing the same internally.
So, it's all good.
--
You received this message because you are subscribed to the Google Groups "java-serialization-benchmarking" group.
To post to this group, send email to java-serializat...@googlegroups.com.
To unsubscribe from this group, send email to java-serialization-be...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/java-serialization-benchmarking?hl=en.

Tatu Saloranta

Apr 21, 2012, 2:08:37 PM
to java-serializat...@googlegroups.com
On Fri, Apr 20, 2012 at 7:36 PM, David Yu <david....@gmail.com> wrote:
>
>
> On Sat, Apr 21, 2012 at 2:06 AM, Nate <nathan...@gmail.com> wrote:
...
This does seem odd. For deserialization, difference is significant but
not earth shattering. But serialization difference is huge.
What would explain this level of difference? Is there any possibility
it could be related to JVM warmup oddities -- I have observed that the
first entries tend to get preferentially treated (I guess it is due to
class unloading when speculative inlining was done due to assumption
of no sub-classing etc).

-+ Tatu +-

David Yu

Apr 21, 2012, 3:03:57 PM
to java-serializat...@googlegroups.com
In the first code, the buffer doesn't resize (well, it actually resized only on the first non-measured run).
That is the performance you expect from kryo when everything fits in the buffer.

The second code shows the performance you get from kryo when message size exceeds buffer size.

Why the difference is that huge, I'm not really sure.  The first thing that comes to mind is how efficiently the library handles buffer overflow.

> Is there any possibility
> it could be related to JVM warmup oddities --
I've just pushed the code above (clean state).  Try it yourself.
The wiki has been updated.
It now reflects the change of kryo not using an outputstream to produce a byte array (as per nate).

Nate

Apr 26, 2012, 5:30:22 PM
to java-serializat...@googlegroups.com
The discussion about buffer sizes has been going in circles. Let's step back a bit and clearly define the issues and potential fixes. We should be able to discuss without being impolite. No one is trying to be malicious or misrepresent results. We seem to have a misunderstanding and we need to focus on discussing it productively.


On Fri, Apr 20, 2012 at 7:36 PM, David Yu <david....@gmail.com> wrote:
>> Kryo's Output class does EXACTLY the same thing that ByteArrayOutputStream does:
>> https://github.com/eishay/jvm-serializers/blob/kannan/tpc/src/serializers/Serializer.java#L22
>> This is CORRECT behavior in both cases.
>
> For the last time, when data doesn't fit in the buffer, the buffer grows.
> Except that kryo doesn't because the growth is permanent once the first run is done.
> So ultimately, the buffer size being assigned by the benchmark is useless for kryo.

My point is that the ByteArrayOutputStream provided by Serializer...
https://github.com/eishay/jvm-serializers/blob/kannan/tpc/src/serializers/Serializer.java#L19
...is reused in Serializer#outputStream and Serializer#outputStreamForList. These methods are used by many serializers that need an OutputStream, eg java-manual. Because the same ByteArrayOutputStream is reset() and reused, the backing buffer will only grow on the first serialization, and will never grow afterward. This is the same behavior as Kryo reusing an Output instance. Note I am not yet judging whether this is right or wrong, just pointing it out.
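
The reuse behavior described above can be checked with a minimal sketch. It assumes nothing about the benchmark's own classes: `ReuseDemo` and `ProbeStream` are hypothetical names, and the 1596-byte payload simply mirrors the media.3.cks size quoted in this thread.

```java
import java.io.ByteArrayOutputStream;

public class ReuseDemo {
    /** Hypothetical helper: exposes the protected backing array length for inspection. */
    static class ProbeStream extends ByteArrayOutputStream {
        ProbeStream(int size) { super(size); }
        int capacity() { return buf.length; }
    }

    /** Returns the backing capacity before any write, after the first write, and after a reset plus second write. */
    static int[] capacities(int initialSize, int payloadSize) {
        ProbeStream out = new ProbeStream(initialSize);
        byte[] payload = new byte[payloadSize];
        int before = out.capacity();
        out.write(payload, 0, payload.length); // first run: the backing array has to grow
        int afterFirst = out.capacity();
        out.reset();                           // rewinds the write position, keeps the array
        out.write(payload, 0, payload.length); // second run: fits without growing
        int afterSecond = out.capacity();
        return new int[] {before, afterFirst, afterSecond};
    }

    public static void main(String[] args) {
        int[] caps = capacities(512, 1596);
        System.out.println(caps[0] + " -> " + caps[1] + " -> " + caps[2]);
    }
}
```

The second and third values should match, which is the "grows only on the first serialization" effect; whether a benchmark should allow that is the question under discussion.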

Hopefully now you understand why I disagreed with your changes, as you have made Kryo allocate a new buffer each time, while java-manual and others reuse the same buffer. You made these changes which put Kryo at a disadvantage and updated the wiki before anyone could review them.

Here are the numbers for java-manual with and without buffer reuse, running on Sun's Java 1.7.0_03:
run -trials=500 -include=java-manual data/media.3.cks

                                   create     ser   +same   deser   +shal   +deep   total   size  +dfl
java-manual WITHOUT buffer reuse:      80    3711    3725    1728    1797    1859    5570   1596   255
java-manual WITH buffer reuse:         82    3487    3409    1755    1854    1878    5364   1596   255

And here is the same for Kryo:
run -trials=500 -include=kryo data/media.3.cks

                                   create     ser   +same   deser   +shal   +deep   total   size  +dfl
Kryo WITHOUT buffer reuse:             80    3212    3120    2620    2681    2718    5930   1573   254
Kryo WITH buffer reuse:                81    2074    1985    2756    2725    3493    5566   1573   254

Interesting that the difference I see for Kryo is less pronounced than on your machine, but still a pretty big difference.

 
>> Measuring this is WORTHLESS.
> Really?  Here's some sample data with the message size larger than buffer size (media.3.cks with 512 as the provided buffer size).

I understand that reusing or allocating a new buffer for each serialization will have an effect on the results. What I am saying is that I would like results for ALL serializers to avoid growing the buffer. In my mind, we should be measuring the serializers' code and we should exclude as much noise as possible. If we include growing of the buffer in our timings, it makes the relative difference between timings smaller. For an extreme example, if we added 10 seconds to all results, it would appear as if there was very little difference between all the serializers.

Does everyone understand the buffering reuse issue?

Here are some options I think are worth discussing, numbered for convenience:

1) Require serializers to start each serialization with a buffer of size Serializer.BUFFER_SIZE. Each serialization will include any growing of the buffer.

1a) What would be a reasonable size? FWIW, BufferedInputStream uses 8192.

2) Allow serializers to reuse the same buffer. This means that the first serialization may grow the buffer, and subsequent serializations can reuse this buffer which is known to be big enough.

2a) After serialization, most buffers allocate a new byte[] and copy out the bytes, like ByteArrayOutputStream#toByteArray() and Kryo's Output#toBytes(). A serializer could get a speed boost by avoiding this allocation and copy by returning the backing byte[], since it knows EXACTLY how big it should be beforehand. This seems somewhat sleazy, as it doesn't parallel real world usage, where objects are extremely unlikely to all be the same size.

3) Force serializers to serialize into a byte[] provided by the framework. The buffer size would have to be large enough, but this isn't much of an issue as it can be specified. The method would write bytes to the array starting at zero and would return the number of bytes written, so it would change from...
public abstract byte[] serialize(S content) throws Exception;
...to...
public abstract int serialize(S content, byte[] buffer) throws Exception;

3a) #3 is basically the same as #1, but with a buffer size known to be large enough. I think I prefer #3, as #1 could silently grow the buffer and negatively affect the results without anyone noticing.
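
As a rough sketch of option 3 only: the framework owns one array and the serializer reports how many bytes it wrote. `BufferSerializer` and `StringSerializer` are hypothetical names invented for this illustration (the real `Serializer` class in the repository may look different), and the length-prefixed string format is a toy, not any library's wire format.

```java
import java.nio.charset.StandardCharsets;

public class ProvidedBufferDemo {
    /** Hypothetical option-3 contract: write into a caller-owned array, return bytes written. */
    abstract static class BufferSerializer<S> {
        public abstract int serialize(S content, byte[] buffer) throws Exception;
    }

    /** Toy serializer for illustration: a 2-byte length prefix followed by UTF-8 bytes. */
    static class StringSerializer extends BufferSerializer<String> {
        @Override
        public int serialize(String content, byte[] buffer) {
            // getBytes allocates; a real serializer would encode straight into buffer.
            byte[] utf8 = content.getBytes(StandardCharsets.UTF_8);
            if (utf8.length > 0xFFFF || utf8.length + 2 > buffer.length) {
                throw new IllegalArgumentException("buffer too small for " + utf8.length + " bytes");
            }
            buffer[0] = (byte) (utf8.length >>> 8); // high byte of the length
            buffer[1] = (byte) utf8.length;         // low byte of the length
            System.arraycopy(utf8, 0, buffer, 2, utf8.length);
            return utf8.length + 2;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] shared = new byte[8192]; // allocated once by the framework, never resized
        BufferSerializer<String> serializer = new StringSerializer();
        int written = serializer.serialize("media", shared);
        System.out.println(written + " bytes written");
    }
}
```

With this shape a buffer that is too small fails loudly instead of growing silently, which is the property 3a) is after.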

Nate

Apr 26, 2012, 5:38:58 PM
to java-serializat...@googlegroups.com
On Sat, Apr 21, 2012 at 12:22 AM, Kannan Goundan <kan...@cakoose.com> wrote:
> I think one issue is that the results page is very high stakes.  For most people, the bar charts we publish are taken to be the final word on JVM serializer performance.  Internet People cite them all the time.
> [snip]
>
> Anyway, the stakes are high enough that maybe we should push new results to a "staging" URL and give everyone a week to scrutinize them before publishing to the main URL.

I agree. We can use a new wiki page "Tentative Results" (or whatever) for discussion and review before updating the main page.

> Second, I agree that we should probably write up some "rules".  When I first made a pass through the project I fixed the discrepancies that were obvious, but we're now dealing with more subtle stuff.  Things that may seem right to you might not be to someone else, to the point that it seems like the other person is being malicious.
>
> For example, this particular thread included (among other things) whether we should count buffer resizing in the serialization time.  I can see Nate's high-level point about not counting things that may not happen in real usage.

Close, but not my exact point. I'd rather exclude everything we can that is outside of the serializer's code, since the serializer's code is really what we are trying to measure.

> For a long-running process, maybe the buffer gets resized on the first few sends and then doesn't hit the limit ever again (though I haven't quite digested what's going on in David's last message...).  But even if counting resize time is a bad idea, it's how we measure the other tools, so it's still not fair to publish results without fixing up the others.
>
> So let's try writing everything down.  When some code looks shady, figure out what written rule it breaks and point it out.  If it's doing something questionable but there's no written rule to prevent it, let's then discuss and make a new rule.  A side benefit is that we'll have a detailed documentation of our testing methodology that we can put on the wiki (so people don't have to rummage through the source code for this information).
>
> To start, does anyone want to take a stab at formalizing the buffering rules?

I just posted some options for discussion on the buffer reuse issue.

-Nate

Nate

Apr 26, 2012, 11:00:14 PM
to java-serializat...@googlegroups.com
On Thu, Apr 26, 2012 at 2:30 PM, Nate <nathan...@gmail.com> wrote:
>>> Measuring this is WORTHLESS.
>> Really?  Here's some sample data with the message size larger than buffer size (media.3.cks with 512 as the provided buffer size).
>
> I understand that reusing or allocating a new buffer for each serialization will have an effect on the results. What I am saying is that I would like results for ALL serializers to avoid growing the buffer. In my mind, we should be measuring the serializers' code and we should exclude as much noise as possible. If we include growing of the buffer in our timings, it makes the relative difference between timings smaller. For an extreme example, if we added 10 seconds to all results, it would appear as if there was very little difference between all the serializers.

Here it is put differently. Let's say that we know A is exactly 10x as fast as B (A is 100ms, B is 1000ms).

Now we're going to add an overhead of 250ms:
benchmark of A = 100ms + 250ms = 350ms
benchmark of B = 1000ms + 250ms = 1250ms
-> conclusion: "A is 3.57x as fast as B"

Let's instead add an overhead of 100ms:
benchmark of A = 100ms + 100ms = 200ms
benchmark of B = 1000ms + 100ms = 1100ms
-> conclusion: "A is 5.50x as fast as B"

Finally, add an overhead of 10000ms:
benchmark of A = 100ms + 10000ms = 10100ms
benchmark of B = 1000ms + 10000ms = 11000ms
-> conclusion: "A and B are about as fast"

As you can see, minimizing the overhead is important. We want to compare the serializers' code. Any overhead that we can avoid should be avoided, else it can dramatically affect the results.
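
The arithmetic above can be reproduced with a few lines of Java. This is only an illustrative sketch: `OverheadDemo` and `apparentSpeedup` are made-up names, and the inputs are the 100ms/1000ms figures from the example, not measurements.

```java
public class OverheadDemo {
    /** Ratio a benchmark would report when the same fixed overhead is added to both timings. */
    static double apparentSpeedup(double fastMs, double slowMs, double overheadMs) {
        return (slowMs + overheadMs) / (fastMs + overheadMs);
    }

    public static void main(String[] args) {
        double[] overheads = {0, 100, 250, 10000};
        for (double overhead : overheads) {
            // A = 100ms and B = 1000ms are the hypothetical "true" costs from the example.
            System.out.println("overhead " + overhead + "ms -> apparent speedup "
                    + apparentSpeedup(100, 1000, overhead));
        }
    }
}
```

With zero overhead the ratio is the true 10x; every added millisecond of shared overhead pulls it toward 1.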

-Nate

David Yu

Apr 26, 2012, 11:45:22 PM
to java-serializat...@googlegroups.com
On Fri, Apr 27, 2012 at 5:30 AM, Nate <nathan...@gmail.com> wrote:
> The discussion about buffer sizes has been going in circles. Let's step back a bit and clearly define the issues and potential fixes. We should be able to discuss without being impolite.
You started it.
> No one is trying to be malicious or misrepresent results.
Hopefully that is the truth.  Whether it was malicious/intentional or not, it was still a shortcut.
> We seem to have a misunderstanding and we need to focus on discussing it productively.
>
> On Fri, Apr 20, 2012 at 7:36 PM, David Yu <david....@gmail.com> wrote:
>>> Kryo's Output class does EXACTLY the same thing that ByteArrayOutputStream does:
>>> https://github.com/eishay/jvm-serializers/blob/kannan/tpc/src/serializers/Serializer.java#L22
>>> This is CORRECT behavior in both cases.
>>
>> For the last time, when data doesn't fit in the buffer, the buffer grows.
>> Except that kryo doesn't because the growth is permanent once the first run is done.
>> So ultimately, the buffer size being assigned by the benchmark is useless for kryo.
>
> My point is that the ByteArrayOutputStream provided by Serializer...
> https://github.com/eishay/jvm-serializers/blob/kannan/tpc/src/serializers/Serializer.java#L19
> ...is reused in Serializer#outputStream and Serializer#outputStreamForList. These methods are used by many serializers that need an OutputStream, eg java-manual. Because the same ByteArrayOutputStream is reset() and reused, the backing buffer will only grow on the first serialization, and will never grow afterward. This is the same behavior as Kryo reusing an Output instance.
> Note I am not yet judging whether this is right or wrong, just pointing it out.
>
> Hopefully now you understand why I disagreed with your changes, as you have made Kryo allocate a new buffer each time
Wrong.
What part of this don't you understand:
 protected final byte[] buffer = new byte[BUFFER_SIZE];

public byte[] serialize (T content) {
   Output output = new Output(buffer, Integer.MAX_VALUE);
   kryo.writeObject(output, content);
   return output.toBytes();
}
Is that "new buffer each time"?  Please check the code first next time.
> , while java-manual and others reuse the same buffer.
There's a difference between the buffer from OutputStream and the internal buffer used by the library (which tatu acknowledged).
Some of the libraries here don't even have the luxury of re-using the internal buffer for each run.
Ultimately, they still use (or re-use) the internal buffer to flush to the outputstream.
If the data is 5x bigger, they flush 5x.
If not using a stream, you resize/expand it x times (depends on the algorithm) or simply pre-compute the data size.
Now kryo wants to take a shortcut and avoid all that by persisting the resized buffers from the first run.
The stream-based and non-stream based serializers aren't gonna be happy with that.

> You made these changes which put Kryo at a disadvantage and updated the wiki before anyone could review them.
The wiki results are based on media.1.cks.  No one is at a disadvantage because everything fits in the buffer (well, except for java-built-in/scala-built-in).  Stop BSing.

> Here are the numbers for java-manual with and without buffer reuse, running on Sun's Java 1.7.0_03:
> run -trials=500 -include=java-manual data/media.3.cks
>
>                                    create     ser   +same   deser   +shal   +deep   total   size  +dfl
> java-manual WITHOUT buffer reuse:      80    3711    3725    1728    1797    1859    5570   1596   255
> java-manual WITH buffer reuse:         82    3487    3409    1755    1854    1878    5364   1596   255
>
> And here is the same for Kryo:
> run -trials=500 -include=kryo data/media.3.cks
>
>                                    create     ser   +same   deser   +shal   +deep   total   size  +dfl
> Kryo WITHOUT buffer reuse:             80    3212    3120    2620    2681    2718    5930   1573   254
> Kryo WITH buffer reuse:                81    2074    1985    2756    2725    3493    5566   1573   254
>
> Interesting that the difference I see for Kryo is less pronounced than on your machine, but still a pretty big difference.
I ran it on my 2.66 GHz Core2Quad ubuntu dev machine.  I imagine it will only be much worse when run on slower machines (especially virtualized ones).

>>> Measuring this is WORTHLESS.
>> Really?  Here's some sample data with the message size larger than buffer size (media.3.cks with 512 as the provided buffer size).
>
> I understand that reusing or allocating a new buffer for each serialization will have an effect on the results. What I am saying is that I would like results for ALL serializers to avoid growing the buffer.
Are we not publishing results from media.1.cks? (which fits inside 512)
Just like media.2.cks, media.3.cks is also there to keep the libraries honest.
It also answers the question:
What if the data cannot fit in the buffer, how will the library behave?

We're not publishing results other than media.1.cks.
I don't see a problem here.

Nate

Apr 27, 2012, 12:15:46 AM
to java-serializat...@googlegroups.com
On Thu, Apr 26, 2012 at 8:45 PM, David Yu <david....@gmail.com> wrote:
> On Fri, Apr 27, 2012 at 5:30 AM, Nate <nathan...@gmail.com> wrote:
>> The discussion about buffer sizes has been going in circles. Let's step back a bit and clearly define the issues and potential fixes. We should be able to discuss without being impolite.
> You started it.

Wow. We are trying to have an adult discussion here.

>> Hopefully now you understand why I disagreed with your changes, as you have made Kryo allocate a new buffer each time
> Wrong.
> What part of this don't you understand:
>  protected final byte[] buffer = new byte[BUFFER_SIZE];
>
> public byte[] serialize (T content) {
>    Output output = new Output(buffer, Integer.MAX_VALUE);
>    kryo.writeObject(output, content);
>    return output.toBytes();
> }
> Is that "new buffer each time"?  Please check the code first next time.

Yes I did see you had changed the code again after your initial change.

>> , while java-manual and others reuse the same buffer.
> There's a difference between buffer from OutputStream and the internal buffer used by the library. (which tatu acknowledged)

No, there is no difference. See the Serializer#serialize method declaration:

public abstract byte[] serialize(S content) throws Exception;

Internally, the code can do whatever it likes. If it wants to use OutputStream from Serializer#outputStream(), it can do that. If it wants to use its own buffer equivalent to a ByteArrayOutputStream, it can do that. If Serializer#outputStream() is going to reuse the ByteArrayOutputStream, then whatever internal buffer a serializer uses should also be allowed to reuse its buffer.

> Some of the libraries here don't even have the luxury of re-using the internal buffer for each run.
> Ultimately, they still use (or re-use) the internal buffer to flush to the outputstream.
> If the data is 5x bigger, they flush 5x.

This gives me the impression you misunderstand how the OutputStream from Serializer#outputStream is used. When writing to the stream, the serializer doesn't care if the data is 5x bigger than the buffer size. It does not flush 5x, it just writes the data. Internally, the ByteArrayOutputStream grows its byte[] during the first run and then it never grows again.

> If not using a stream, you resize/expand it x times (depends on the algorithm) or simply pre-compute the data size.
> Now kryo wants to take a shortcut and avoid all that by persisting the resized buffers from the first run.

There is no shortcut, Kryo's Output class works exactly the same as the OutputStream from Serializer#outputStream.

>> You made these changes which put Kryo at a disadvantage and updated the wiki before anyone could review them.
> The wiki results are based on media.1.cks.  No one is at a disadvantage because everything fits in the buffer (well, except for java-built-in/scala-built-in).  Stop BSing.

Regarding your "Stop BSing" comment, if you continue to be impolite, I will cease discussion with you and I expect others will do the same.

In your initial update to the wiki, you changed Kryo to allocate a new buffer each time and you updated the wiki with the results. This is a non-issue now anyway, as we will be using a "staging" wiki page so that updates can be reviewed and discussed.

>> I understand that reusing or allocating a new buffer for each serialization will have an effect on the results. What I am saying is that I would like results for ALL serializers to avoid growing the buffer.
> Are we not publishing results from media.1.cks? (which fits inside 512)
> Just like media.2.cks, media.3.cks is also there to keep the libraries honest.
> It also answers the question:
> What if the data cannot fit in the buffer, how will the library behave?
>
> We're not publishing results other than media.1.cks.
> I don't see a problem here.

The current code does not answer that question, because most of the serializers use Serializer#outputStream, which reuses the buffer. Your changes have made Kryo not reuse the buffer, so it will grow each time, and therefore it is not fair for media.3.cks or any other test where the output is > 512. We don't publish those results, but it is still unfair.

Besides that, I don't believe this is an important question to ask. It only adds overhead that skews the results.

-Nate

David Yu

Apr 27, 2012, 6:34:20 AM
to java-serializat...@googlegroups.com
Here are the libraries that I'm talking about.
They do not write to the stream directly; instead they write everything to an internal buffer and flush on overflow:
 - hessian, jackson (json/smile/etc), woodstox, aalto, kryo, etc
Last time I checked, kryo is a stream-based serializer (unless you're protobuf-based like Wobly with the size computation).  Am I wrong?
The only difference is that these libraries I mentioned actually do flush when size is bigger than buffer.
Since you want to avoid using an OutputStream, then you resize/expand like the rest of the other libs.

Even the writer-based serializers that use an OutputStreamWriter, effectively use an internal buffer via sun's StreamEncoder and a char buffer for strings.
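
To make the "internal buffer that flushes on overflow" model concrete, here is a minimal sketch. `FixedBufferWriter` is a hypothetical class written for this illustration and is not the internals of hessian, jackson, kryo, or any other library named above; it only shows that a fixed 512-byte buffer has to be flushed several times when the message is larger than the buffer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class FlushOnOverflowDemo {
    /** Hypothetical writer with a fixed internal buffer that is flushed to a sink when full. */
    static class FixedBufferWriter {
        private final byte[] buffer;
        private final OutputStream sink;
        private int position;
        int flushCount;

        FixedBufferWriter(int size, OutputStream sink) {
            this.buffer = new byte[size];
            this.sink = sink;
        }

        void write(byte[] src) throws IOException {
            int offset = 0;
            while (offset < src.length) {
                if (position == buffer.length) {
                    flush(); // buffer is full: hand it to the sink and start over
                }
                int n = Math.min(buffer.length - position, src.length - offset);
                System.arraycopy(src, offset, buffer, position, n);
                position += n;
                offset += n;
            }
        }

        void flush() throws IOException {
            sink.write(buffer, 0, position);
            position = 0;
            flushCount++;
        }
    }

    /** Number of flushes (including the final one) needed for one message of the given size. */
    static int flushesFor(int bufferSize, int messageSize) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        FixedBufferWriter writer = new FixedBufferWriter(bufferSize, sink);
        writer.write(new byte[messageSize]);
        writer.flush(); // final flush pushes out whatever is left
        return writer.flushCount;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(flushesFor(512, 255) + " flush(es) for a 255-byte message");
        System.out.println(flushesFor(512, 2560) + " flush(es) for a 2560-byte message");
    }
}
```

A message that fits is flushed once at the end, while one five times the buffer size is flushed five times, which is the cost a grow-once, reuse-forever buffer never pays after the first run.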
> Internally, the ByteArrayOutputStream grows its byte[] during the first run and then it never grows again.
>
>> If not using a stream, you resize/expand it x times (depends on the algorithm) or simply pre-compute the data size.
>> Now kryo wants to take a shortcut and avoid all that by persisting the resized buffers from the first run.
>
> There is no shortcut, Kryo's Output class works exactly the same as the OutputStream from Serializer#outputStream.
It was you who suggested and did the change to re-use the OutputStream. (I did not agree with that since not everyone is using OutputStream, and they aren't ever re-used in the real use case.)  Other libs not using a stream still go through the resize/expand/re-compute hoopla, so no one is exempted.

It's better to revert that change you made to re-use OutputStream, so you'll have no arguments.
You created that simply to support your own agenda of taking shortcuts.
I've also noticed that java-manual looks faster again when we allocate new ByteArrayOutputStream(500) like the last time (not sure the reason why).

------ new ByteArrayOutputStream(500)
./run -trials=500 -include=java-manual data/media.1.cks 
Checking correctness...
[done]
                                 create     ser   +same   deser   +shal   +deep   total   size  +dfl
java-manual                         136    1735    1616    1331    1375    1401    3136    255   147

------ re-used/reset
./run -trials=500 -include=java-manual data/media.1.cks 
Checking correctness...
[done]
                                 create     ser   +same   deser   +shal   +deep   total   size  +dfl
java-manual                         134    3783    3669    1301    1384    1442    5225    255   147

With java-built-in:
------ new ByteArrayOutputStream(500)
./run -trials=500 -include=java-built-in,java-manual data/media.1.cks 
Checking correctness...
[done]
                                 create     ser   +same   deser   +shal   +deep   total   size  +dfl
java-built-in                       135   12668   11448   66965   67592   67731   80399    889   514
java-manual                         135    2059    1915    1304    1333    1419    3478    255   147

------ re-used/reset
./run -trials=500 -include=java-built-in,java-manual data/media.1.cks 
Checking correctness...
[done]
                                 create     ser   +same   deser   +shal   +deep   total   size  +dfl
java-built-in                       133   13303   12046   67256   67751   68112   81415    889   514
java-manual                         135    3764    3591    1311    1361    1404    5168    255   147

 


Tatu Saloranta

Apr 27, 2012, 1:34:20 PM
to java-serializat...@googlegroups.com
Quick comment -- wish I had time for full comments but....

On Thu, Apr 26, 2012 at 9:15 PM, Nate <nathan...@gmail.com> wrote:
> On Thu, Apr 26, 2012 at 8:45 PM, David Yu <david....@gmail.com> wrote:
>>
>> On Fri, Apr 27, 2012 at 5:30 AM, Nate <nathan...@gmail.com> wrote:
>>>
>>> The discussion about buffer sizes has been going in circles. Let's step
>>> back a bit and clearly define the issues and potential fixes. We should be
>>> able to discuss without being impolite.
>>
>> You started it.
>
>
> Wow. We are trying to have an adult discussion here.
>
>>
>>
>>> Hopefully now you understand why I disagreed with your changes, as you
>>> have made Kryo allocate a new buffer each time
>>
>> Wrong.
>> What part of this don't you understand:
>>  protected final byte[] buffer = new byte[BUFFER_SIZE];
>>
>> public byte[] serialize (T content) {
>>    Output output = new Output(buffer, Integer.MAX_VALUE);
>>    kryo.writeObject(output, content);
>>    return output.toBytes();
>> }

I think I actually understand more now; and I can see two points here.

First: use of ByteArrayOutputStream. There are (IMO) two ways it is
(or has been?) used:

1. By shared test code passing it out directly, something like:
   (I forget exact names, but this should suffice)

   ByteArrayOutputStream reused = ...
   serializer.writeValue(value, reused);

2. By serializer sub-classes that explicitly ask for an instance, to
   produce byte[]:

   class MySerializer ... {
      public byte[] write(T value) {
         ByteArrayOutputStream out = super.getOutputStream();
         ....
      }
   }

Of these, I think (1) is correct, fair, and non-problematic. Case (2),
however, is somewhat problematic, because it can potentially differ
between cases, and requires special handling by the implementation.

I think Nate is pointing to (2), and saying that this is very similar to
what the Kryo serializer will do, just using a different mechanism. I agree
with this to some point, although I think the way it is done is a bit
problematic just because of the way byte[] and ByteArrayOutputStream
differ (one cannot change the state of a byte[]).
Ideally I think Kryo's Output class would handle reuse automatically,
and if so, I would have absolutely no problem with it.
I am not 100% sure what to think of holding a reference to the passed byte
array the way it is done, but I do think it is not all THAT different
from case (2).
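
To make the two patterns concrete, here is a minimal sketch using only java.io. All names (`BenchSerializer`, `writeValue`, `outputStream()`, `Utf8StringSerializer`) are hypothetical stand-ins, since the exact benchmark names are not given above; this is not the benchmark's actual code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Demo entry point; every name in this file is hypothetical.
class ReusePatterns {
    // Pattern (1): shared test code owns one stream and resets it between values.
    static int[] frameworkOwned(BenchSerializer<String> serializer, String... values) {
        ByteArrayOutputStream reused = new ByteArrayOutputStream(512);
        int[] sizes = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            reused.reset(); // rewinds the write position, keeps the backing array
            try {
                serializer.writeValue(values[i], reused);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            sizes[i] = reused.size();
        }
        return sizes;
    }

    public static void main(String[] args) {
        Utf8StringSerializer serializer = new Utf8StringSerializer();
        int[] sizes = frameworkOwned(serializer, "first", "second");
        System.out.println("pattern (1): " + sizes[0] + " and " + sizes[1] + " bytes");
        System.out.println("pattern (2): " + serializer.write("third").length + " bytes");
    }
}

// Stand-in for the benchmark's serializer base class.
abstract class BenchSerializer<T> {
    private final ByteArrayOutputStream shared = new ByteArrayOutputStream(512);

    // Convenience accessor used by pattern (2); the reuse lives in serializer code.
    protected ByteArrayOutputStream outputStream() {
        shared.reset();
        return shared;
    }

    abstract void writeValue(T value, OutputStream out) throws IOException;

    // Pattern (2): the serializer asks for the stream itself and returns a byte[].
    byte[] write(T value) {
        ByteArrayOutputStream out = outputStream();
        try {
            writeValue(value, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}

class Utf8StringSerializer extends BenchSerializer<String> {
    @Override
    void writeValue(String value, OutputStream out) throws IOException {
        out.write(value.getBytes(StandardCharsets.UTF_8));
    }
}
```

In pattern (1) the reuse decision is made once, in shared code, for every serializer; in pattern (2) each serializer subclass decides for itself, which is where implementations can drift apart.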

I also think that this goes back to one of my points, that we have
division between two styles:

(a) Streaming, where we only use InputStream, OutputStream for
operation -- this is (IMO) easier to keep fair. However, some libs are
fundamentally non-streaming and for them there is then additional cost
of reading from InputStream into byte[] (and reverse for writing).
(b) Blocks; where input is given as byte[], output expected as byte[]

The disputed case is (b), is it not?
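
As an illustration of the extra cost mentioned under (a), here is a rough sketch of what a block-based decoder has to do when it is only handed an InputStream. The class and method names are made up for the example, and a plain UTF-8 string stands in for a real payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of the benchmark code.
class StreamVersusBlock {
    // Style (b): a block-based decoder needs the whole payload as a byte[].
    static String decodeBlock(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    // Style (a) forced onto a block-based decoder: drain the stream first.
    static String decodeFromStream(InputStream in) {
        try {
            ByteArrayOutputStream collected = new ByteArrayOutputStream(512);
            byte[] chunk = new byte[256];
            int n;
            while ((n = in.read(chunk)) != -1) {
                collected.write(chunk, 0, n); // copy: stream -> collector
            }
            // toByteArray() copies once more into a right-sized array.
            return decodeBlock(collected.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = "media content".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeBlock(payload));
        System.out.println(decodeFromStream(new ByteArrayInputStream(payload)));
    }
}
```

A natively streaming decoder would read from `in` directly and skip both copies, which is the asymmetry between the two styles.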

One thing I do not recall is whether and when we went back & forth
between requiring (or not) the result to be returned as byte[]. I thought
originally streams were used.

And from this, should we not just force use of InputStream and
OutputStream as the test? Doing this, we should be able to eliminate
reuse by per-implementation serializer glue code -- and leave the
non-problematic cases of either the test framework managing reuse, or the
underlying serializer automatically handling reuse of its own internal
buffers.

A side issue is whether a serializer should work with arbitrary-length
input: I assume we all agree that this should be the case (within
reasonable sizes, of course).

-+ Tatu +-

David Yu

Apr 27, 2012, 2:01:24 PM
to java-serializat...@googlegroups.com
Updated the wiki and reverted to allocating a new ByteArrayOutputStream each time instead of reusing/resetting one.
Now java-manual is back to being as fast as it was (1700ms).  The reset/reuse of ByteArrayOutputStream made it look slower (2400ms).

With smile/jackson/manual being based on OutputStream, there wasn't any change in the results.
It is still a little bit faster than kryo even with a new ByteArrayOutputStream.

Also, fastjson seems to be improving.

The results are basically fair.  

Nate

Apr 27, 2012, 2:08:01 PM
to java-serializat...@googlegroups.com
On Fri, Apr 27, 2012 at 11:01 AM, David Yu <david....@gmail.com> wrote:
> Updated the wiki and reverted to allocating a new ByteArrayOutputStream each time instead of reusing/resetting one.
> Now java-manual is back to being as fast as it was (1700ms).  The reset/reuse of ByteArrayOutputStream made it look slower (2400ms).
>
> With smile/jackson/manual being based on OutputStream, there wasn't any change in the results.
> It is still a little bit faster than kryo even with a new ByteArrayOutputStream.
>
> Also, fastjson seems to be improving.
>
> The results are basically fair.

This is a community project and the results are highly visible. We need to start using a staging wiki page so that the results can be reviewed and not just updated at will. Please refrain from updating the wiki until the staging results have been discussed. I have updated the wiki, removing all results until the issues we have been discussing are resolved.
https://github.com/eishay/jvm-serializers/wiki

-Nate

Nate

Apr 27, 2012, 2:19:06 PM
to java-serializat...@googlegroups.com
On Fri, Apr 27, 2012 at 10:34 AM, Tatu Saloranta <tsalo...@gmail.com> wrote:
> I think I actually understand more now; and I can see two points here.
>
> First: use of ByteArrayOutputStream. There are (IMO) two ways it is
> (or has been?) used:
>
> 1. By shared test code passing it out directly, something like:
>    (I forget exact names, but this should suffice)
>
>    ByteArrayOutputStream reused = ...
>    serializer.writeValue(value, reused);
>
> 2. By serializer sub-classes that explicitly ask for an instance, to
>    produce byte[]:
>
>    class MySerializer ... {
>       public byte[] write(T value) {
>          ByteArrayOutputStream out = super.getOutputStream();
>          ....
>       }
>    }
>
> Of these, I think (1) is correct, fair, and non-problematic. Case (2),
> however, is somewhat problematic, because it can potentially differ
> between cases, and requires special handling by the implementation.

Can you expand on why #2 is problematic? The method signature for Serializer#serialize is:

   public abstract byte[] serialize(S content) throws Exception;

IMO, it is up to the serializer to do whatever it needs and then return a byte[]. I assume Serializer#outputStream is only for convenience. If we want to force serialization to a stream, we should pass it in; however, I don't think that is fair to non-streaming libraries.

 
> I think Nate is pointing to (2), and saying that this is very similar to
> what the Kryo serializer will do, just using a different mechanism. I agree
> with this to some point, although I think the way it is done is a bit
> problematic just because of the way byte[] and ByteArrayOutputStream
> differ (one cannot change the state of a byte[]).

Can you expand on how the way Kryo does it is problematic? AFAIK, it is exactly identical to ByteArrayOutputStream.

 
> Ideally I think Kryo's Output class would handle reuse automatically,
> and if so, I would have absolutely no problem with it.
> I am not 100% sure what to think of holding a reference to the passed byte
> array the way it is done, but I do think it is not all THAT different
> from case (2).

The Kryo serializer that has a byte[] field is a result of the changes that David Yu has made. I have reverted it to how I believe Kryo should be used. Can you please review?
https://github.com/eishay/jvm-serializers/blob/f370d51f415fc29c872bc2c870feb52a16b705f2/tpc/src/serializers/Kryo.java#L39
The source for the Input and Output classes is here:
https://code.google.com/p/kryo/source/browse/#svn%2Ftrunk%2Fsrc%2Fcom%2Fesotericsoftware%2Fkryo%2Fio

 

> I also think that this goes back to one of my points, that we have
> division between two styles:
>
> (a) Streaming, where we only use InputStream, OutputStream for
> operation -- this is (IMO) easier to keep fair. However, some libs are
> fundamentally non-streaming and for them there is then additional cost
> of reading from InputStream into byte[] (and reverse for writing).
> (b) Blocks; where input is given as byte[], output expected as byte[]
>
> The disputed case is (b), is it not?
>
> One thing I do not recall is whether and when we went back & forth
> between requiring (or not) the result to be returned as byte[]. I thought
> originally streams were used.
>
> And from this, should we not just force use of InputStream and
> OutputStream as the test? Doing this, we should be able to eliminate
> reuse by per-implementation serializer glue code -- and leave the
> non-problematic cases of either the test framework managing reuse, or the
> underlying serializer automatically handling reuse of its own internal
> buffers.

We could force use of a provided stream, but why? It isn't fair for non-stream based serializers. It also isn't fair for stream based serializers like Kryo that can write to a byte[] without an extra copy.

Serializers should be free to use whatever "glue code" they want, as this has the least overhead. If some libraries are unable to do this efficiently, that fact will be represented in the results, but I don't see this as being unfair.

-Nate

David Yu

Apr 27, 2012, 2:22:34 PM
to java-serializat...@googlegroups.com
On Sat, Apr 28, 2012 at 2:08 AM, Nate <nathan...@gmail.com> wrote:
> On Fri, Apr 27, 2012 at 11:01 AM, David Yu <david....@gmail.com> wrote:
>> Updated the wiki and reverted to allocating a new ByteArrayOutputStream each time instead of reusing/resetting one.
>> Now java-manual is back to being as fast as it was (1700ms).  The reset/reuse of ByteArrayOutputStream made it look slower (2400ms).
>>
>> With smile/jackson/manual being based on OutputStream, there wasn't any change in the results.
>> It is still a little bit faster than kryo even with a new ByteArrayOutputStream.
>>
>> Also, fastjson seems to be improving.
>>
>> The results are basically fair.
>
> This is a community project and the results are highly visible.

Yet you freely updated the wiki with your shortcuts.
My previous changes did *remove* your shortcuts and corrected the results.

> We need to start using a staging wiki page so that the results can be reviewed and not just updated at will. Please refrain from updating the wiki until the staging results have been discussed. I have updated the wiki, removing all results until the issues we have been discussing are resolved.
> https://github.com/eishay/jvm-serializers/wiki
>
> -Nate


Nate

Apr 27, 2012, 2:27:16 PM
to java-serializat...@googlegroups.com


On Fri, Apr 27, 2012 at 11:22 AM, David Yu <david....@gmail.com> wrote:
> On Sat, Apr 28, 2012 at 2:08 AM, Nate <nathan...@gmail.com> wrote:
>> On Fri, Apr 27, 2012 at 11:01 AM, David Yu <david....@gmail.com> wrote:
>>> Updated the wiki and reverted to allocating a new ByteArrayOutputStream each time instead of reusing/resetting one.
>>> Now java-manual is back to being as fast as it was (1700ms).  The reset/reuse of ByteArrayOutputStream made it look slower (2400ms).
>>>
>>> With smile/jackson/manual being based on OutputStream, there wasn't any change in the results.
>>> It is still a little bit faster than kryo even with a new ByteArrayOutputStream.
>>>
>>> Also, fastjson seems to be improving.
>>>
>>> The results are basically fair.
>>
>> This is a community project and the results are highly visible.
>
> Yet you freely updated the wiki with your shortcuts.
> My previous changes did *remove* your shortcuts and corrected the results.

I made changes, requested someone to update the wiki, waited 2 days, then updated the wiki myself. Anyway, we now have a better process in place.

-Nate

David Yu

Apr 27, 2012, 3:00:05 PM
to java-serializat...@googlegroups.com
It basically allows kryo (as an optimization) to re-use the buffer instead of allocating a new one every time.

E.g.
new Output(BUFFER_SIZE, -1) // a new buffer is allocated every time.
new Output(buffer, -1) // the buffer is re-used.


> I also think that this goes back to one of my points, that we have
> division between two styles:
>
> (a) Streaming, where we only use InputStream, OutputStream for
> operation -- this is (IMO) easier to keep fair. However, some libs are
> fundamentally non-streaming and for them there is then additional cost
> of reading from InputStream into byte[] (and reverse for writing).
> (b) Blocks; where input is given as byte[], output expected as byte[]
>
> The disputed case is (b), is it not?

Yes.  For buffer-based serializers, buffer is resized/expanded.
For stream-based serializers, buffer is flushed.
What kryo is doing is neither of the above.  It was intentionally coded to take data from the first run and persist the size, so it doesn't have any overhead on the runs that are actually measured.
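
The difference between "grow once and keep the size" and "grow on every run" can be sketched without Kryo. `GrowableOutput` below is a hypothetical stand-in, not Kryo's Output class, and the mapping to the two serializer versions discussed in this thread is only a reading of the descriptions given here.

```java
import java.util.Arrays;

// Hypothetical demo; counts how often the buffer has to be expanded.
class GrowthDemo {
    static final int BUFFER_SIZE = 512;

    // Variant A: one long-lived output; the grown array survives between runs,
    // so only the first oversized run pays for the expansion.
    static int growthsWithPersistentOutput(byte[] payload, int runs) {
        GrowableOutput output = new GrowableOutput(new byte[BUFFER_SIZE]);
        for (int i = 0; i < runs; i++) {
            output.clear();
            output.write(payload);
        }
        return output.growCount();
    }

    // Variant B: a fixed array is re-wrapped on every run, so every oversized
    // run expands again and the cost shows up in each measured iteration.
    static int growthsWithFixedArray(byte[] payload, int runs) {
        byte[] fixed = new byte[BUFFER_SIZE];
        int growths = 0;
        for (int i = 0; i < runs; i++) {
            GrowableOutput output = new GrowableOutput(fixed);
            output.write(payload);
            growths += output.growCount();
        }
        return growths;
    }

    public static void main(String[] args) {
        byte[] large = new byte[2000]; // larger than BUFFER_SIZE
        System.out.println("persistent output: " + growthsWithPersistentOutput(large, 5));
        System.out.println("fixed array:       " + growthsWithFixedArray(large, 5));
    }
}

// Minimal growable buffer, for illustration only.
class GrowableOutput {
    private byte[] buffer;
    private int position;
    private int growCount;

    GrowableOutput(byte[] initial) {
        this.buffer = initial;
    }

    void write(byte[] data) {
        if (position + data.length > buffer.length) {
            // Expand: allocate a larger array and copy what was written so far.
            int newSize = Math.max(buffer.length * 2, position + data.length);
            buffer = Arrays.copyOf(buffer, newSize);
            growCount++;
        }
        System.arraycopy(data, 0, buffer, position, data.length);
        position += data.length;
    }

    // Rewinds the write position but keeps the (possibly grown) array.
    void clear() {
        position = 0;
    }

    int growCount() {
        return growCount;
    }
}
```

For payloads that fit in BUFFER_SIZE both variants never expand, which is why the choice does not show up on media.1.cks.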

> One thing I do not recall is whether and when we went back & forth
> between requiring (or not) the result to be returned as byte[]. I thought
> originally streams were used.
>
> And from this, should we not just force use of InputStream and
> OutputStream as the test? Doing this, we should be able to eliminate
> reuse by per-implementation serializer glue code -- and leave the
> non-problematic cases of either the test framework managing reuse, or the
> underlying serializer automatically handling reuse of its own internal
> buffers.
>
> A side issue is whether a serializer should work with arbitrary-length
> input: I assume we all agree that this should be the case (within
> reasonable sizes, of course).
>
> -+ Tatu +-

Tatu Saloranta

Apr 27, 2012, 4:05:30 PM
to java-serializat...@googlegroups.com
I merged the latest fastjson contribution someone submitted, which
probably improved performance -- it looks like it explicitly disables
cycle detection and uses a newer version. It's a crazy fast codec from
what I remember (I am not related to the project in any way) -- the first
library I can think of where the word "fast" is included in the name that
IS ACTUALLY fast.
Imagine that! 8-)

-+ Tatu +-

Tatu Saloranta

Apr 27, 2012, 4:10:03 PM
to java-serializat...@googlegroups.com
Because I do not think the test serializer (that is -- not the thing the
library provides, but something the benchmark includes) should have any
optimizations. Rather, ideally it should be shared benchmark code that
does this for all; or, the codec itself would automatically do this.
And from a practical perspective I think this is the place where there
is most disagreement.

Then again, if the rules are clearly defined, I would not mind: we would
just need more work to ensure all implementations followed the same
rules, criteria, and optimizations.

So this is just pragmatic thinking: if we could eliminate such code,
it'd simplify everyone's life.
This is also why I would rather see streams being used so that
aggregation as a byte array would not be needed.

Also, while I say problematic, I don't necessarily mean a huge
problem. Just a concern, possibly a problem.

-+ Tatu +-

Tatu Saloranta

Apr 27, 2012, 4:17:43 PM
to java-serializat...@googlegroups.com
Whoops. Forgot to address all points.

On Fri, Apr 27, 2012 at 11:19 AM, Nate <nathan...@gmail.com> wrote:
>
>
> On Fri, Apr 27, 2012 at 10:34 AM, Tatu Saloranta <tsalo...@gmail.com>
> wrote:
...
>> I think Nate is pointing to 2, and saying that this is very similar to
>> what Kryo serializer will do, just using different mechanism. I agree
>> with this to some point; although I think the way it is done is bit
>> problematic just because of the way byte[] and ByteArrayOutputStream
>> differ (one can not change state of byte[]).
>
>
> Can you expand on how the way Kryo does it is problematic? AFAIK, it is
> exactly identical to ByteArrayOutputStream.

Not exactly, in that it's all in the serializer implementations. But one
could obviously move this functionality into the base class as well, to make
them equal.

>>
>> Ideally I think Kryo's Output class would handle reuse automatically,
>> and if so, I would have absolutely no problem with it.
>> I am not 100% sure what to think of holding a reference to passed byte
>> array the way it is done, but I do think it is not all THAT different
>> from case (2)
>
> The Kryo serializer that has a byte[] field is a result of the changes that
> David Yu has made. I have reverted it to how I believe Kryo should be used.
> Can you please review?
> https://github.com/eishay/jvm-serializers/blob/f370d51f415fc29c872bc2c870feb52a16b705f2/tpc/src/serializers/Kryo.java#L39
> The source for the Input and Output classes is here:
> https://code.google.com/p/kryo/source/browse/#svn%2Ftrunk%2Fsrc%2Fcom%2Fesotericsoftware%2Fkryo%2Fio

I'll try to have a look.

>> I also think that this goes back to one of my points, that we have
>> division between two styles:
>>
>> (a) Streaming, where we only use InputStream, OutputStream for
>> operation -- this is (IMO) easier to keep fair. However, some libs are
>> fundamentally non-streaming and for them there is then additional cost
>> of reading from InputStream into byte[] (and reverse for writing).
>> (b) Blocks; where input is given as byte[], output expected as byte[]
>>
>> the disputed case is for (b) is it not?
>>
>> One thing I do not recall is whether and when did we go back & forth
>> between requiring (or not) result being returned as byte[]? I thought
>> originally streams were used.
>>
>> And from this, should we not just force use of InputStream,
>> OutputStream as the test. Doing this, we should be able to eliminate
>> reuse by per-implementation serializer glue code -- and leave
>> non-problematic cases of either test framework managing reuse, or
>> underlying serializer automatically handling reuse of its own internal
>> buffers.
>
>
> We could force use of a provided stream, but why? It isn't fair for
> non-stream based serializers. It also isn't fair for stream based
> serializers like Kryo that can write to a byte[] without an extra copy.

This is exactly what I said earlier -- and not so much about codecs, but
the actual use case.

For most of my use, I use streams: this is how web services work; GIGO
over streams.
I rarely need to aggregate output; almost the only case is for debugging or testing.
I don't need byte[] results.

But there are valid use cases for needing byte[]: when storing in a DB
that does not support streaming.

So: this goes back to what use case is being modelled.

Depending on which way we choose, we will likely favor one group over another.

> Serializers should be free to use whatever "glue code" they want, as this
> has the least overhead. If some libraries are unable to do this efficiently,
> that fact will be represented in the results, but I don't see this as being unfair.

I hoped to use wording to indicate "problematic", but not necessarily
"unfair": meaning that there may be an issue, but one that is not due
to bad intentions.

Btw, I do NOT believe there is anything preventing the Kryo codec (for
example) from efficiently aggregating results this way. It's just a
question of details of how to do it (I don't speak for David, who may
have other concerns). And I would be surprised if in the end there was
all that much performance difference between the disputed cases (i.e. if we
have different bona fide proposals, they'd measure quite similarly).

-+ Tatu +-

Tatu Saloranta

Apr 27, 2012, 4:36:58 PM
to java-serializat...@googlegroups.com
On code:

On Fri, Apr 27, 2012 at 1:17 PM, Tatu Saloranta <tsalo...@gmail.com> wrote:
...
>> The Kryo serializer that has a byte[] field is a result of the changes that
>> David Yu has made. I have reverted it to how I believe Kryo should be used.
>> Can you please review?
>> https://github.com/eishay/jvm-serializers/blob/f370d51f415fc29c872bc2c870feb52a16b705f2/tpc/src/serializers/Kryo.java#L39
>> The source for the Input and Output classes is here:
>> https://code.google.com/p/kryo/source/browse/#svn%2Ftrunk%2Fsrc%2Fcom%2Fesotericsoftware%2Fkryo%2Fio

I think that since maximum size is NOT limited (good), it does not
matter what the initial size is; and whatever defaults we use for
ByteArrayOutputStream should apply there too (which I think is true as
well).

I would be fine with allowing Output to be reused along with Kryo this way.
(btw, looks like reuse of Input is not all that necessary? no problem
with it, just seems almost irrelevant).
I would definitely rather let that stand than continue arguments.... :-)
Especially if we decided that byte[] in, byte[] out is the test case we want.

One suggestion that is not related to this issue -- would it make
sense to separate out the "standard" case and the optimized ones? The reason
is that it'd be much easier to compare relative code sizes -- I read
through it, and understand that the majority of the code is for the
optimized case.
For casual users (and our own stats if we are to publish some) it
would be even simpler if one could directly see that "standard Kryo
serializer is 30 lines" (or whatever), and "optimized 200 lines".
Plus it could also serve as a piece of sample code for newbie Kryo users,
I think: I suspect benchmark code will (right or wrong) be used as
templates too.

-+ Tatu +-

Nate

Apr 27, 2012, 6:51:54 PM
to java-serializat...@googlegroups.com
On Fri, Apr 27, 2012 at 1:17 PM, Tatu Saloranta <tsalo...@gmail.com> wrote:

Currently for serialize we return byte[]. If we changed to passing in an OutputStream, what would that mean? It would be identical for stream-based serializers, which would write to a ByteArrayOutputStream as they do now. For byte[]-based serializers, it would mean an additional memory copy to get the data into the stream. I would vote to leave it as-is since it seems to have the fewest drawbacks for all serializers.
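
The trade-off can be sketched in both directions. The two toy serializers below are placeholders (a UTF-8 string encoder written in each style), not any library from the benchmark, and the adapter names are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of adapting each style to the other API shape.
class OutputAdapters {
    // A byte[]-based serializer: hands back a finished array.
    static byte[] serializeToBytes(String content) {
        return content.getBytes(StandardCharsets.UTF_8);
    }

    // A stream-based serializer: writes incrementally to whatever it is given.
    static void serializeToStream(String content, OutputStream out) throws IOException {
        for (byte b : content.getBytes(StandardCharsets.UTF_8)) {
            out.write(b);
        }
    }

    // Passing in an OutputStream: the byte[]-based serializer pays one
    // additional memory copy to move its result into the stream.
    static void bytesBasedIntoStream(String content, OutputStream out) {
        try {
            out.write(serializeToBytes(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returning byte[]: the stream-based serializer writes to a
    // ByteArrayOutputStream, and toByteArray() allocates and copies the result.
    static byte[] streamBasedToBytes(String content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(512);
        try {
            serializeToStream(content, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        bytesBasedIntoStream("abc", sink);
        System.out.println(sink.size() + " bytes via stream API");
        System.out.println(streamBasedToBytes("abc").length + " bytes via byte[] API");
    }
}
```

Either API shape adds one copy for the style it does not match; the choice decides which group carries it.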

-Nate


Nate

Apr 27, 2012, 7:01:24 PM
to java-serializat...@googlegroups.com
On Fri, Apr 27, 2012 at 1:36 PM, Tatu Saloranta <tsalo...@gmail.com> wrote:
> On code:
>
> On Fri, Apr 27, 2012 at 1:17 PM, Tatu Saloranta <tsalo...@gmail.com> wrote:
> ...
>>> The Kryo serializer that has a byte[] field is a result of the changes that
>>> David Yu has made. I have reverted it to how I believe Kryo should be used.
>>> Can you please review?
>>> https://github.com/eishay/jvm-serializers/blob/f370d51f415fc29c872bc2c870feb52a16b705f2/tpc/src/serializers/Kryo.java#L39
>>> The source for the Input and Output classes is here:
>>> https://code.google.com/p/kryo/source/browse/#svn%2Ftrunk%2Fsrc%2Fcom%2Fesotericsoftware%2Fkryo%2Fio
>
> I think that since maximum size is NOT limited (good), it does not
> matter what the initial size is; and whatever defaults we use for
> ByteArrayOutputStream should apply there too (which I think is true as
> well).

The defaults we use for ByteArrayOutputStream are now to create a new one each time, as the "shortcut" of reusing it apparently is worse for java-manual, for whatever crazy reason.
 
> I would be fine with allowing Output to be reused along with Kryo this way.
> (btw, looks like reuse of Input is not all that necessary? no problem
> with it, just seems almost irrelevant).
> I would definitely rather let that stand than continue arguments.... :-)
> Especially if we decided that byte[] in, byte[] out is the test case we want.

I will commit the Kryo serializer version that uses a byte[], which means object graphs that exceed the BUFFER_SIZE will skew the results by including growing the buffer in the timings. I think it is a bad idea to run a benchmark with a BUFFER_SIZE that isn't large enough, but this change doesn't affect the published media.1.cks and will hopefully pacify David.
 

> One suggestion that is not related to this issue -- would it make
> sense to separate out the "standard" case and the optimized ones? The reason
> is that it'd be much easier to compare relative code sizes -- I read
> through it, and understand that the majority of the code is for the
> optimized case.

I think this would be good. We already break the serializers into two groups because the chart URLs are too long. Let's just break them into "standard" and "optimized". Who wants to make the first draft of these two lists? :)
 
> For casual users (and our own stats if we are to publish some) it
> would be even simpler if one could directly see that "standard Kryo
> serializer is 30 lines" (or whatever), and "optimized 200 lines".
> Plus it could also serve as a piece of sample code for newbie Kryo users,
> I think: I suspect benchmark code will (right or wrong) be used as
> templates too.

The Kryo serializer code is pretty hairy, with too many layers to be used as example code. I'd prefer rewriting it to be much more straightforward, but don't feel like fixing what ain't broke.

-Nate

Tatu Saloranta

Apr 30, 2012, 2:05:49 PM
to java-serializat...@googlegroups.com
I don't think this is entirely true, at least for serialization, if we
would still make the test framework reuse ByteArrayOutputStream. It would
add one single copy operation, which is extra overhead, but quite
modest.

At the same time, I don't care enough to drive such a change: the
reverse, where stream-based codecs must produce byte[], is also a tiny
addition of one more allocation.
And one that block-based codecs sort of also have to do anyway.

Ok, never mind. I think either way works out pretty well -- although I
think it means that I would insist on reusing ByteArrayOutputStream
(by the framework) to keep the compromise balanced, considering the basic
difference between styles.

Sorry for one more distraction,

-+ Tatu +-

Tatu Saloranta

Apr 30, 2012, 2:08:44 PM
to java-serializat...@googlegroups.com
On Fri, Apr 27, 2012 at 4:01 PM, Nate <nathan...@gmail.com> wrote:
> On Fri, Apr 27, 2012 at 1:36 PM, Tatu Saloranta <tsalo...@gmail.com>
> wrote:
>>
>> On code:
>>
>> On Fri, Apr 27, 2012 at 1:17 PM, Tatu Saloranta <tsalo...@gmail.com>
>> wrote:
>> ...
>> >> The Kryo serializer that has a byte[] field is a result of the changes
>> >> that
>> >> David Yu has made. I have reverted it to how I believe Kryo should be
>> >> used.
>> >> Can you please review?
>> >>
>> >> https://github.com/eishay/jvm-serializers/blob/f370d51f415fc29c872bc2c870feb52a16b705f2/tpc/src/serializers/Kryo.java#L39
>> >> The source for the Input and Output classes is here:
>> >>
>> >> https://code.google.com/p/kryo/source/browse/#svn%2Ftrunk%2Fsrc%2Fcom%2Fesotericsoftware%2Fkryo%2Fio
>>
>> I think that since maximum size is NOT limited (good), it does not
>> matter what the initial size is; and whatever defaults we use for
>> ByteArrayOutputStream should apply there too (which I think is true as
>> well).
>
> The defaults we use for ByteArrayOutputStream are now to create a new one
> each time, as the "shortcut" of reusing it apparently is worse for
> java-manual, for whatever crazy reason.

Hmmmh. This must be some weird artifact -- possibly related to it
being run first?

Strange. This seems wrong, but I am getting tired of the whole thread.
We could make this command-line switchable, to at least prevent having
to go back and forth?
If I have time, how about I do this as something constructive...

-+ Tatu +-

Nate

Apr 30, 2012, 2:12:55 PM
to java-serializat...@googlegroups.com
"pre-warmup" is now on by default, so there shouldn't be any advantage to going first. I have no idea why reusing the ByteArrayOutputStream would be slower than allocating a new one, but both David and I see that it is. We have very different hardware. Almost makes me wonder if our numbers are useful at all...

-Nate

Tatu Saloranta

Apr 30, 2012, 2:21:20 PM
to java-serializat...@googlegroups.com
Ah. I see, so since all codecs are loaded, briefly run, JVM class
loader shouldn't be able to use some wrong speculative inlining
strategies.
I have not tried out allocation/reallocation, but while it seems
counter-intuitive I am not doubting this effect exists. Just puzzled.

I can check this again -- and if possible, it'd be great if others
could too -- but I was able to consistently see differences when
changing which codec was run first.

But maybe default 'run' command does not default to using pre-warmup?
This would explain it...

-+ Tatu +-

David Yu

Apr 30, 2012, 11:14:56 PM
to java-serializat...@googlegroups.com
I've already tried running it behind java-built-in; java-manual (2nd codec) still takes a hit either way (on 2 different machines).

> But maybe default 'run' command does not default to using pre-warmup?
> This would explain it...
>
> -+ Tatu +-


Tatu Saloranta

May 1, 2012, 1:35:03 PM
to java-serializat...@googlegroups.com
I also noticed that we indeed do the full pre-warmup, so difference
should not be there. Now tests take quite a bit longer, so it would be
great to do the 3-way split that was discussed.

-+ Tatu +-

David Yu

May 1, 2012, 9:37:56 PM
to java-serializat...@googlegroups.com
You'll be surprised once you test it.

Also, I'm not sure I understand this full pre-warmup.
The trials=500 is basically the real warm-up (try running without it and one can immediately see the difference).
Imo, all it is doing is prolonging the benchmark run.  Warm-ups are supposed to be done immediately prior to the actual run (which is what the trials actually do).

> Now tests take quite a bit longer, so it would be
> great to do the 3-way split that was discussed.
>
> -+ Tatu +-


Kannan Goundan

May 1, 2012, 9:43:45 PM
to java-serializat...@googlegroups.com
I think I added pre-warm because when I ran the tests, the first set of serializers were getting faster "create object" times than the later serializers.  The pre-warm thing fixed the symptoms for me.

My theory was that the JVM was more prone to specialize code early (when it seemed like there were only a few hot code paths) but then stopped inlining as much later on, since it saw that there were many more hot code paths.

I didn't verify this, though.

Tatu Saloranta

May 1, 2012, 11:20:17 PM
to java-serializat...@googlegroups.com
I think the important thing -- if we run things on a single JVM, single
class loader -- is to ensure all codecs get warmed up before measuring
any of them. I was assuming this was the needed change. If we isolated
things with either a classloader per codec, or separate JVMs, this would
not be needed.
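
The ordering described here (warm up every codec, then measure any of them) could look roughly like the sketch below. `WarmupOrder` and its parameters are hypothetical, codecs are modelled as plain functions, and the timing is far too crude to be a real benchmark harness; the point is only the phase ordering.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical driver; returns the phase order so it can be inspected.
class WarmupOrder {
    static List<String> run(Map<String, Function<String, byte[]>> codecs,
                            String value, int warmupRuns, int measuredRuns) {
        List<String> phases = new ArrayList<>();
        long checksum = 0; // consume results so the calls are not trivially unused

        // Phase 1: exercise every codec before timing any of them, so the JVM
        // has seen all implementations behind the shared call site.
        for (Map.Entry<String, Function<String, byte[]>> codec : codecs.entrySet()) {
            for (int i = 0; i < warmupRuns; i++) {
                checksum += codec.getValue().apply(value).length;
            }
            phases.add("warmup:" + codec.getKey());
        }

        // Phase 2: only now take measurements, one codec after another.
        for (Map.Entry<String, Function<String, byte[]>> codec : codecs.entrySet()) {
            long start = System.nanoTime();
            for (int i = 0; i < measuredRuns; i++) {
                checksum += codec.getValue().apply(value).length;
            }
            long nanosPerOp = (System.nanoTime() - start) / Math.max(1, measuredRuns);
            phases.add("measure:" + codec.getKey());
            System.out.println(codec.getKey() + ": " + nanosPerOp + " ns/op");
        }
        System.out.println("checksum: " + checksum);
        return phases;
    }

    public static void main(String[] args) {
        Map<String, Function<String, byte[]>> codecs = new LinkedHashMap<>();
        codecs.put("utf8", s -> s.getBytes(StandardCharsets.UTF_8));
        codecs.put("utf16", s -> s.getBytes(StandardCharsets.UTF_16));
        System.out.println(run(codecs, "media content", 1000, 1000));
    }
}
```

Running each codec in its own JVM would make phase 1 unnecessary, at the cost of a longer and more complicated run.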

-+ Tatu +-

Tatu Saloranta

May 1, 2012, 11:24:01 PM
to java-serializat...@googlegroups.com
On Tue, May 1, 2012 at 6:43 PM, Kannan Goundan <kan...@cakoose.com> wrote:
> I think I added pre-warm because when I ran the tests, the first set of
> serializers were getting faster "create object" times than the later
> serializers.  The pre-warm thing fixed the symptoms for me.
>
> My theory was that the JVM was more prone to specialize code early (when it
> seemed like there were only a few hot code paths) but then stopped inlining
> as much later on, since it saw that there were many more hot code paths.
>
> I didn't verify this, though.

I definitely observed first-codec priority as well, and played a bit
with it to verify that it affected multiple codecs, although not
necessarily all (I saw it with the Jackson ones: moving them to the initial
positions improved things by 10% or so).

And my guess also was that HotSpot is able to do some too-aggressive
inlining, because there is only a single implementation class (for,
say, Serializer). There is lots of speculative inlining going on with
current JVMs (Charlie Hunt / Binu John's book has a good explanation if
anyone is interested); and there is also mandatory "unoptimizing" when
further class loading renders optimizations invalid.

-+ Tatu +-