Ok. I did notice that Wobly is very fast as well, when added.
> Now that every serializer can handle overflows, it's now possible to use a
> common buffer size.
> I've updated some libs to use it as a starting point.
> 512 is the default (was previously 500 on ByteArrayOutputStream), which can
> be changed via
> system properties.
Good idea.
> Binary (except java-built-in/scala built-in) and json data are less than 512
> bytes on media.1.cks, so there's no flushing/etc in the benchmark.
> Eventually we'll be able to measure the behavior of serializers when data
> does not fit in the buffer.
>
> The results are updated to the wiki with the latest changes.
> I've tried to include a single utf8 character in the dataset but couldn't
> figure out how to make cks accept it.
> Maybe on the next run we can include it in the default dataset (with the
> help of kannan).
Yes, this seems like a good idea as well.
> The results have not changed much, kryo-manual is still fastest. I'm not
> even sure why those shortcuts were necessary.
> One thing I noticed is that wobly actually performs better when run on
> windows 7 (based from previous results).
>
> Interestingly, with the shortcuts removed, smile/jackson/manual now seems to
> be on par with kryo.
Another related thing is that a while ago I added alternate sequence
testing. However, it is only supported by a subset of serializers,
based on codecs that were easiest to change; some might need external
framing.
But it should be relatively easy to expand coverage.
I was hoping to find that Avro was more efficient with longer
sequences, although that did not seem to be the case for some reason.
But I think this might also show some differences between other
codecs; binary formats should benefit more due to size differences,
for example.
One thing that would benefit sequence tests most however would be some
way to generate variations of items; if this was possible, it would be
possible to run tests with multiple sequence lengths without having to
hand create different input sets.
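One possible shape for such a generator, as a rough sketch only: a seeded random source derives a sequence of slightly different items from one template, so the same sequence can be regenerated for every serializer. The `Item` class and its fields are placeholders invented for this illustration, not the benchmark's media schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ItemVariations {
    /** Hypothetical stand-in for a benchmark item; not the real media schema. */
    public static final class Item {
        public final String title;
        public final int width;
        public final int height;

        public Item(String title, int width, int height) {
            this.title = title;
            this.width = width;
            this.height = height;
        }
    }

    /** Builds `length` variations of the template; the same seed yields the same sequence. */
    public static List<Item> sequence(Item template, int length, long seed) {
        Random random = new Random(seed);
        List<Item> items = new ArrayList<>(length);
        for (int i = 0; i < length; i++) {
            // Vary the string length and the numeric fields a little for each item.
            String title = template.title + "-" + Integer.toString(random.nextInt(1_000_000), 36);
            items.add(new Item(title,
                    template.width + random.nextInt(64),
                    template.height + random.nextInt(64)));
        }
        return items;
    }

    public static void main(String[] args) {
        Item template = new Item("clip", 640, 480);
        for (int length : new int[] {1, 10, 100}) {
            System.out.println(length + " items generated: " + sequence(template, length, 42L).size());
        }
    }
}
```

Fixing the seed keeps runs comparable across serializers while still giving each item in a sequence different content.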
-+ Tatu +-
Hi all,
Recent updates were pushed to remove shortcuts from kryo.
1. explicitly disables utf8 based on advance knowledge of the content of the dataset.
2. uses the maximum possible buffer size (based on the biggest dataset) to avoid using an outputstream for flushing (shortcut is directly using its internal buffer since it knows everything fits). Other streaming libs (smile/json/etc) are not doing this.
Previously with kryo v1, this could not be alleviated as it was based on ByteBuffer (crashes on buffer overflow).
Now that every serializer can handle overflows, it's now possible to use a common buffer size.
I've updated some libs to use it as a starting point.
512 is the default (was previously 500 on ByteArrayOutputStream), which can be changed via system properties.
Binary (except java-built-in/scala built-in) and json data are less than 512 bytes on media.1.cks, so there's no flushing/etc in the benchmark. Eventually we'll be able to measure the behavior of serializers when data does not fit in the buffer.
The results are updated to the wiki with the latest changes.
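A minimal sketch of how a benchmark-wide buffer size with a default of 512 could be read from a system property. The property name `serializers.bufferSize` is made up for this illustration; the message above does not say which property the benchmark actually reads.

```java
import java.io.ByteArrayOutputStream;

public class BufferSizeConfig {
    // Hypothetical property name; the real benchmark may use a different one.
    public static final String PROPERTY = "serializers.bufferSize";
    public static final int DEFAULT_BUFFER_SIZE = 512;

    /** Returns the configured size, or 512 if the property is unset or not a positive integer. */
    public static int bufferSize() {
        Integer configured = Integer.getInteger(PROPERTY);
        return (configured != null && configured > 0) ? configured : DEFAULT_BUFFER_SIZE;
    }

    public static void main(String[] args) {
        // Every stream-based serializer would start from the same initial capacity.
        ByteArrayOutputStream out = new ByteArrayOutputStream(bufferSize());
        out.write(42);
        System.out.println("buffer size = " + bufferSize() + ", bytes written = " + out.size());
    }
}
```

Under this assumption a run with a larger buffer would look like `java -Dserializers.bufferSize=1024 BufferSizeConfig`.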
On Thu, Apr 19, 2012 at 3:49 AM, David Yu <david....@gmail.com> wrote:
> Hi all,
> Recent updates were pushed to remove shortcuts from kryo.
> 1. explicitly disables utf8 based on advance knowledge of the content of the dataset.
> 2. uses the maximum possible buffer size (based on the biggest dataset) to avoid using an outputstream for flushing (shortcut is directly using its internal buffer since it knows everything fits). Other streaming libs (smile/json/etc) are not doing this.
> Previously with kryo v1, this could not be alleviated as it was based on ByteBuffer (crashes on buffer overflow).
The benchmark serialize method returns a byte[]. The most efficient way to do this with Kryo is to use Output by itself. Your changes cause the bytes to be written to a byte[] in Output, then unnecessarily copied to a ByteArrayOutputStream.
The benchmark should not force a ByteArrayOutputStream to be used.
https://github.com/eishay/jvm-serializers/commit/fb52f09d24503808024b2a47d149ea6f0ec17769#L0R58
I request you revert the serialize() method to how it was before:
    public byte[] serialize (T content) {
        output.clear();
        kryo.writeObject(output, content);
        return output.toBytes();
    }
This works in exactly the same way as the ByteArrayOutputStream used by Serializer#outputStream(), the only difference is it avoids copying around the bytes. Output even extends OutputStream.
> Now that every serializer can handle overflows, it's now possible to use a common buffer size.
> I've updated some libs to use it as a starting point.
> 512 is the default (was previously 500 on ByteArrayOutputStream), which can be changed via system properties.
> Binary (except java-built-in/scala built-in) and json data are less than 512 bytes on media.1.cks, so there's no flushing/etc in the benchmark. Eventually we'll be able to measure the behavior of serializers when data does not fit in the buffer.
Your changes to set a buffer size would have no effect with a size smaller than the data, since Serializer reuses the ByteArrayOutputStream by calling reset(). IMO, this is how it should be, as we are measuring the serializers, not how long it takes to allocate a buffer to hold the serialized bytes.
> The results are updated to the wiki with the latest changes.
I think we should use the latest Java to run the benchmark for the wiki.
-Nate
--
You received this message because you are subscribed to the Google Groups "java-serialization-benchmarking" group.
To post to this group, send email to java-serializat...@googlegroups.com.
To unsubscribe from this group, send email to java-serialization-be...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/java-serialization-benchmarking?hl=en.
On Fri, Apr 20, 2012 at 2:36 AM, Nate <nathan...@gmail.com> wrote:
> On Thu, Apr 19, 2012 at 3:49 AM, David Yu <david....@gmail.com> wrote:
>> Hi all,
>> Recent updates were pushed to remove shortcuts from kryo.
>> 1. explicitly disables utf8 based on advance knowledge of the content of the dataset.
>> 2. uses the maximum possible buffer size (based on the biggest dataset) to avoid using an outputstream for flushing (shortcut is directly using its internal buffer since it knows everything fits). Other streaming libs (smile/json/etc) are not doing this.
>> Previously with kryo v1, this could not be alleviated as it was based on ByteBuffer (crashes on buffer overflow).
> The benchmark serialize method returns a byte[]. The most efficient way to do this with Kryo is to use Output by itself. Your changes cause the bytes to be written to a byte[] in Output, then unnecessarily copied to a ByteArrayOutputStream.
> The benchmark should not force a ByteArrayOutputStream to be used.
> https://github.com/eishay/jvm-serializers/commit/fb52f09d24503808024b2a47d149ea6f0ec17769#L0R58
> I request you revert the serialize() method to how it was before:
Nope. Why don't you read #2 again.
Notice that before the change, all the libs in the benchmark are able to handle the data even if it is 100x the buffer, except kryo, which crashes (apparently kryo v2 has the same problems as v1).
The point is that there should be no bias/shortcuts based on direct knowledge of the content and size of the dataset.
That is why the buffer size will be provided by the benchmark, not the author.
Future runs will have the option to use a dataset whose size exceeds the buffer size provided.
We are measuring the serializers, not the growing of the ByteArrayOutputStream.
-Nate
--
When the cat is away, the mouse is alone.
- David Yu
Tatu has asked you many times to show the code you are talking about.
Quit wasting our time.
@Tatu, if you can chime in and share your thoughts on the previous statement, that would be great.
The two solutions are:
1. Continue with re-use but use an outputstream, so it doesn't resize but flush (the current solution).
2. Skip the outputstream and allow it to resize, but use a new instance on every iteration.
Without that, kryo is exempted.
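To make the two options concrete, here is a rough sketch using only JDK classes (no Kryo, so it does not reproduce any library's actual behaviour). Option 1 is modelled with a fixed-size `BufferedOutputStream` that flushes to a sink whenever it fills; option 2 with a fresh growable `ByteArrayOutputStream` per call. The 512-byte buffer comes from the thread; the 1596-byte payload is just an illustrative size larger than the buffer.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class FlushVersusGrow {
    /** Sink that counts how many chunks the buffering layer hands to it. */
    static class CountingSink extends OutputStream {
        int chunks;
        int bytes;
        @Override public void write(int b) { chunks++; bytes++; }
        @Override public void write(byte[] b, int off, int len) { chunks++; bytes += len; }
    }

    /** Option 1: a fixed-size buffer that flushes to the stream when full. Returns {chunks, bytes}. */
    public static int[] flushing(int bufferSize, int payload) {
        CountingSink sink = new CountingSink();
        try (OutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            for (int i = 0; i < payload; i++) {
                out.write(i); // one byte at a time, as a field-by-field serializer might
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] { sink.chunks, sink.bytes };
    }

    /** Option 2: a new growable buffer on every call; it resizes instead of flushing. */
    public static int growing(int bufferSize, int payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(bufferSize);
        for (int i = 0; i < payload; i++) {
            out.write(i);
        }
        return out.toByteArray().length;
    }

    public static void main(String[] args) {
        int[] flushed = flushing(512, 1596);
        System.out.println("flushing: " + flushed[0] + " chunks, " + flushed[1] + " bytes");
        System.out.println("growing:  " + growing(512, 1596) + " bytes");
    }
}
```

Either way the per-call cost depends on how far the payload exceeds the starting buffer, which is the behaviour the 512-byte setting is meant to expose.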
Measuring this is WORTHLESS.
This is a community effort, so thankfully you alone do not get to decide what makes a library exempt.
> The two solutions are:
> 1. Continue with re-use but use an outputstream, so it doesn't resize but flush (the current solution).
Serializer#serialize() returns a byte[]. It doesn't force usage of a ByteArrayOutputStream. Kryo can avoid using ByteArrayOutputStream by using its own Output class, which works in the same way.
> 2. Skip the outputstream and allow it to resize, but use a new instance on every iteration.
> Without that, kryo is exempted.
Unfortunately, I don't have the time to be a responsive project maintainer. I think, though, that we can get things back on track with our usual mailing-list-consensus-based ways once we get a little more process set up.
I think one issue is that the results page is very high stakes. For most people, the bar charts we publish are taken to be the final word on JVM serializer performance. Internet People cite them all the time.
<irrelevant-aside>I think this is unfortunate. As someone who spent weeks reworking the codebase, I got to see in great detail just how flawed these benchmarks are. Just the fact that we test a single, limited data value, yet provide results with four significant figures is absurd! This isn't anyone's fault; I'm just trying to point out the discrepancy with the accuracy of our results and (my perception of) how much The Internet bases decisions off our graphs. I think we're better than the other comparisons out there (like the lolgraph on MsgPack's website),
but benchmarking is so hard to get right.</irrelevant-aside>
Anyway, the stakes are high enough that maybe we should push new results to a "staging" URL and give everyone a week to scrutinize them before publishing to the main URL.
Second, I agree that we should probably write up some "rules". When I first made a pass through the project I fixed the discrepancies that were obvious, but we're now dealing with more subtle stuff. Things that may seem right to you might not be to someone else, to the point that it seems like the other person is being malicious.
For example, this particular thread included (among other things) whether we should count buffer resizing in the serialization time. I can see Nate's high-level point about not counting things that may not happen in real usage. For a long-running process, maybe the buffer gets resized on the first few sends and then doesn't hit the limit ever again (though I haven't quite digested what's going on in David's last message...). But even if counting resize time is a bad idea, it's how we measure the other tools, so it's still not fair to publish results without fixing up the others.
So let's try writing everything down. When some code looks shady, figure out what written rule it breaks and point it out. If it's doing something questionable but there's no written rule to prevent it, let's then discuss and make a new rule. A side benefit is that we'll have a detailed documentation of our testing methodology that we can put on the wiki (so people don't have to rummage through the source code for this information).
- I agree in that further runs should not be based on assuming/knowing
that further iterations provide same data.
I think our best chance is to focus on two things:
- creating "fully automatic" subset, which should avoid many of
disputed techniques
- trying to find a way to create permutations for different runs, so
that tests would exhibit some level of variation.
One more comment on "manual" tests (where I think all of us
occasionally get overzealous with optimizations, and/or disagree
most): these were, I think, originally created mostly because there
were no good fully-automated providers.
With XML, for example, such solutions tended to have
disproportionately high overhead -- but even there, adding JAXB at
this point would solve the issue, as it can use fastest parsers, and
does not have more than maybe 50% overhead (compared to XStream and
others that have steeper).
Put another way, I think manual variants are less necessary due to
coverage. In fact, moving tree-based and manual variants to a completely
separate suite (but with comparable throughputs, so one can compare if
need be) might be worth considering.
I know this does not help resolve the specific issue, but I feel that
we would do better if we stepped back and considered bigger picture.
And once we are done with "bigger" changes, we can solve detail
issues.
I don't mean to belittle the question of fairness -- which is
fundamental with comparison -- but sometimes best way to solve a
specific problem is not full frontal assault, but by outmaneuvering
the thing.
-+ Tatu +-
Is there any possibility it could be related to JVM warmup oddities --
I have observed that the first entries tend to get preferentially
treated (I guess it is due to class unloading when speculative inlining
was done due to assumption of no sub-classing etc).
-+ Tatu +-
> Kryo's Output class does EXACTLY the same thing that ByteArrayOutputStream does:
> https://github.com/eishay/jvm-serializers/blob/kannan/tpc/src/serializers/Serializer.java#L22
> This is CORRECT behavior in both cases.
> For the last time, when data doesn't fit in the buffer, the buffer grows.
Except that kryo doesn't because the growth is permanent once the first run is done.
So ultimately, the buffer size being assigned by the benchmark is useless for kryo.
> Measuring this is WORTHLESS.
Really? Here's some sample data with the message size larger than buffer size (media.3.cks with 512 as the provided buffer size).
I think one issue is that the results page is very high stakes. For most people, the bar charts we publish are taken to be the final word on JVM serializer performance. Internet People cite them all the time.
[snip]
Anyway, the stakes are high enough that maybe we should push new results to a "staging" URL and give everyone a week to scrutinize them before publishing to the main URL.
Second, I agree that we should probably write up some "rules". When I first made a pass through the project I fixed the discrepancies that were obvious, but we're now dealing with more subtle stuff. Things that may seem right to you might not be to someone else, to the point that it seems like the other person is being malicious.
For example, this particular thread included (among other things) whether we should count buffer resizing in the serialization time. I can see Nate's high-level point about not counting things that may not happen in real usage.
For a long-running process, maybe the buffer gets resized on the first few sends and then doesn't hit the limit ever again (though I haven't quite digested what's going on in David's last message...). But even if counting resize time is a bad idea, it's how we measure the other tools, so it's still not fair to publish results without fixing up the others.
So let's try writing everything down. When some code looks shady, figure out what written rule it breaks and point it out. If it's doing something questionable but there's no written rule to prevent it, let's then discuss and make a new rule. A side benefit is that we'll have a detailed documentation of our testing methodology that we can put on the wiki (so people don't have to rummage through the source code for this information).
To start, does anyone want to take a stab at formalizing the buffering rules?
>> Measuring this is WORTHLESS.
> Really? Here's some sample data with the message size larger than buffer size (media.3.cks with 512 as the provided buffer size).
I understand that reusing or allocating a new buffer for each serialization will have an effect on the results. What I am saying is that I would like results for ALL serializers to avoid growing the buffer. In my mind, we should be measuring the serializers' code and we should exclude as much noise as possible. If we include growing of the buffer in our timings, it makes the relative difference between timings smaller. For an extreme example, if we added 10 seconds to all results, it would appear as if there was very little difference between all the serializers.
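The arithmetic behind that last point can be sketched in a few lines. The two timings below (2000 and 3000 ns) are invented for illustration and are not benchmark results; the only claim is that adding the same fixed cost to both pulls their ratio towards 1.

```java
public class ConstantOverhead {
    /** Ratio slower/faster after the same fixed cost is added to both timings. */
    public static double ratioWithOverhead(double faster, double slower, double overhead) {
        return (slower + overhead) / (faster + overhead);
    }

    public static void main(String[] args) {
        double faster = 2000;   // hypothetical serializer A, in ns
        double slower = 3000;   // hypothetical serializer B, in ns
        double tenSeconds = 10_000_000_000.0; // 10 s expressed in ns

        System.out.println("no overhead:     " + ratioWithOverhead(faster, slower, 0));
        System.out.println("+1000 ns each:   " + ratioWithOverhead(faster, slower, 1000));
        System.out.println("+10 s each:      " + ratioWithOverhead(faster, slower, tenSeconds));
    }
}
```

With no overhead B takes 1.5x as long as A; with 1000 ns added to each the ratio drops to 4000/3000, and with 10 seconds added the two are practically indistinguishable.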
The discussion about buffer sizes has been going in circles. Let's step back a bit and clearly define the issues and potential fixes. We should be able to discuss without being impolite.
No one is trying to be malicious or misrepresent results.
We seem to have a misunderstanding and we need to focus on discussing it productively.
On Fri, Apr 20, 2012 at 7:36 PM, David Yu <david....@gmail.com> wrote:
>> Kryo's Output class does EXACTLY the same thing that ByteArrayOutputStream does:
>> https://github.com/eishay/jvm-serializers/blob/kannan/tpc/src/serializers/Serializer.java#L22
>> This is CORRECT behavior in both cases.
>> For the last time, when data doesn't fit in the buffer, the buffer grows.
> Except that kryo doesn't because the growth is permanent once the first run is done.
> So ultimately, the buffer size being assigned by the benchmark is useless for kryo.
My point is that the ByteArrayOutputStream provided by Serializer...
https://github.com/eishay/jvm-serializers/blob/kannan/tpc/src/serializers/Serializer.java#L19
...is reused in Serializer#outputStream and Serializer#outputStreamForList. These methods are used by many serializers that need an OutputStream, eg java-manual. Because the same ByteArrayOutputStream is reset() and reused, the backing buffer will only grow on the first serialization, and will never grow afterward. This is the same behavior as Kryo reusing an Output instance.
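The reset-and-reuse behaviour described here can be observed directly with the JDK class. The sketch below subclasses `ByteArrayOutputStream` only to expose the length of its protected `buf` array; `InspectableStream` is a helper invented for this illustration and is not part of the benchmark code.

```java
import java.io.ByteArrayOutputStream;

public class ResetKeepsCapacity {
    /** Exposes the length of ByteArrayOutputStream's protected backing array. */
    static class InspectableStream extends ByteArrayOutputStream {
        InspectableStream(int size) { super(size); }
        int capacity() { return buf.length; }
    }

    /** Returns the capacity {initially, after 1st write, after reset(), after 2nd write}. */
    public static int[] capacities(int initialSize, int payload) {
        InspectableStream out = new InspectableStream(initialSize);
        int initial = out.capacity();

        out.write(new byte[payload], 0, payload); // first serialization: may grow the array
        int afterFirst = out.capacity();

        out.reset();                              // sets the count to 0, keeps the array
        int afterReset = out.capacity();

        out.write(new byte[payload], 0, payload); // second serialization: already fits
        int afterSecond = out.capacity();

        return new int[] { initial, afterFirst, afterReset, afterSecond };
    }

    public static void main(String[] args) {
        int[] c = capacities(512, 1596);
        System.out.println("initial=" + c[0] + " afterFirst=" + c[1]
                + " afterReset=" + c[2] + " afterSecond=" + c[3]);
    }
}
```

Only the first write pays for growth; every later iteration starts with an array that is already large enough, which is why a configured size smaller than the data has no lasting effect on a reused stream.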
Note I am not yet judging whether this is right or wrong, just pointing it out.
Hopefully now you understand why I disagreed with your changes, as you have made Kryo allocate a new buffer each time, while java-manual and others reuse the same buffer.
You made these changes which put Kryo at a disadvantage and updated the wiki before anyone could review them.
Here are the numbers for java-manual with and without buffer reuse, running on Sun's Java 1.7.0_03:
run -trials=500 -include=java-manual data/media.3.cks
                                   create    ser  +same  deser  +shal  +deep  total   size   +dfl
java-manual WITHOUT buffer reuse:      80   3711   3725   1728   1797   1859   5570   1596    255
java-manual WITH buffer reuse:         82   3487   3409   1755   1854   1878   5364   1596    255
And here is the same for Kryo:
run -trials=500 -include=kryo data/media.3.cks
                                   create    ser  +same  deser  +shal  +deep  total   size   +dfl
Kryo WITHOUT buffer reuse:             80   3212   3120   2620   2681   2718   5930   1573    254
Kryo WITH buffer reuse:                81   2074   1985   2756   2725   3493   5566   1573    254
Interesting that the difference I see for Kryo is less pronounced than on your machine, but still a pretty big difference.
Does everyone understand the buffering reuse issue?
Here are some options I think are worth discussing, numbered for convenience:
1) Require serializers to start each serialization with a buffer of size Serializer.BUFFER_SIZE. Each serialization will include any growing of the buffer.
1a) What would be a reasonable size? FWIW, BufferedInputStream uses 8192.
2) Allow serializers to reuse the same buffer. This means that the first serialization may grow the buffer, and subsequent serializations can reuse this buffer which is known to be big enough.
2a) After serialization, most buffers allocate a new byte[] and copy out the bytes, like ByteArrayOutputStream#toByteArray() and Kryo's Output#toBytes(). A serializer could get a speed boost by avoiding this allocation and copy by returning the backing byte[], since it knows EXACTLY how big it should be beforehand. This seems somewhat sleazy, as it doesn't parallel real world usage, where objects are extremely unlikely to all be the same size.
3) Force serializers to serialize into a byte[] provided by the framework. The buffer size would have to be large enough, but this isn't much of an issue as it can be specified. The method would write bytes to the array starting at zero and would return the number of bytes written, so it would change from...
public abstract byte[] serialize(S content) throws Exception;
...to...
public abstract int serialize(S content, byte[] buffer) throws Exception;
3a) #3 is basically the same as #1, but with a buffer size known to be large enough. I think I prefer #3, as #1 could silently grow the buffer and negatively affect the results without anyone noticing.
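As a rough illustration of option 3, the sketch below uses the proposed `int serialize(S content, byte[] buffer)` shape with a toy string serializer. `BufferSerializer`, `STRING_SERIALIZER` and the length-prefixed format are all invented for this example and stand in for a real library; the 8192-byte buffer is just the size mentioned in 1a.

```java
import java.nio.charset.StandardCharsets;

public class FrameworkBufferSketch {
    /** Option 3 shape: the framework owns the byte[]; the serializer reports bytes written. */
    public interface BufferSerializer<S> {
        int serialize(S content, byte[] buffer) throws Exception;
    }

    /** Toy serializer (hypothetical format): 2-byte big-endian length, then UTF-8 bytes. */
    public static final BufferSerializer<String> STRING_SERIALIZER = (content, buffer) -> {
        byte[] utf8 = content.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 0xFFFF || utf8.length + 2 > buffer.length) {
            // Fail loudly instead of silently growing, so an undersized buffer is noticed.
            throw new IllegalArgumentException("buffer too small or string too long");
        }
        buffer[0] = (byte) (utf8.length >>> 8);
        buffer[1] = (byte) utf8.length;
        System.arraycopy(utf8, 0, buffer, 2, utf8.length);
        return utf8.length + 2;
    };

    /** Framework side: allocate the buffer and ask the serializer to fill it from index 0. */
    public static int run(String content, int bufferSize) {
        byte[] buffer = new byte[bufferSize];
        try {
            return STRING_SERIALIZER.serialize(content, buffer);
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes written: " + run("media", 8192));
    }
}
```

A real harness would allocate the buffer once outside the timed loop. The toy serializer still allocates a temporary array via `getBytes`, which a real implementation would presumably avoid.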
On Fri, Apr 27, 2012 at 5:30 AM, Nate <nathan...@gmail.com> wrote:
> The discussion about buffer sizes has been going in circles. Let's step back a bit and clearly define the issues and potential fixes. We should be able to discuss without being impolite.
You started it.
> Hopefully now you understand why I disagreed with your changes, as you have made Kryo allocate a new buffer each time
Wrong.
What part of this don't you understand:
    protected final byte[] buffer = new byte[BUFFER_SIZE];
    public byte[] serialize (T content) {
        Output output = new Output(buffer, Integer.MAX_VALUE);
        kryo.writeObject(output, content);
        return output.toBytes();
    }
Is that "new buffer each time"? Please check the code first next time.
> , while java-manual and others reuse the same buffer.
There's a difference between buffer from OutputStream and the internal buffer used by the library. (which tatu acknowledged)
Some of the libraries here don't even have the luxury of re-using the internal buffer for each run.
Ultimately, they still use (or re-use) the internal buffer to flush to the outputstream.
If the data is 5x bigger, they flush 5x.
If not using a stream, you resize/expand it x times (depends on the algorithm) or simply pre-compute the data size.
Now kryo wants to take a shortcut and avoid all that by persisting the resized buffers from the first run.
> You made these changes which put Kryo at a disadvantage and updated the wiki before anyone could review them.
The wiki results are based on media.1.cks. No one is at a disadvantage because everything fits in the buffer (well, except for java-built-in/scala-built-in). Stop BSing.
> I understand that reusing or allocating a new buffer for each serialization will have an affect on the results. What I am saying is that I would like results for ALL serializers to avoid growing the buffer.
Are we not publishing results from media.1.cks? (which fits inside 512)
Just like media.2.cks, media.3.cks is also there to keep the libraries honest.
It also answers the question:
What if the data cannot fit in the buffer, how will the library behave?
We're not publishing results other than media.1.cks.
I don't see a problem here.
Internally, the ByteArrayOutputStream grows its byte[] during the first run and then it never grows again.
> If not using a stream, you resize/expand it x times (depends on the algorithm) or simply pre-compute the data size.
> Now kryo wants to take a shortcut and avoid all that by persisting the resized buffers from the first run.
There is no shortcut, Kryo's Output class works exactly the same as the OutputStream from Serializer#outputStream.
>> You made these changes which put Kryo at a disadvantage and updated the wiki before anyone could review them.
> The wiki results are based on media.1.cks. No one is at a disadvantage because everything fits in the buffer (well, except for java-built-in/scala-built-in). Stop BSing.
Regarding your "Stop BSing" comment, if you continue to be impolite, I will cease discussion with you and I expect others will do the same.
In your initial update to the wiki, you changed Kryo to allocate a new buffer each time and you updated the wiki with the results.
This is a non-issue now anyway, as we will be using a "staging" wiki page so that updates can be reviewed and discussed.
>> I understand that reusing or allocating a new buffer for each serialization will have an affect on the results. What I am saying is that I would like results for ALL serializers to avoid growing the buffer.
> Are we not publishing results from media.1.cks? (which fits inside 512)
> Just like media.2.cks, media.3.cks is also there to keep the libraries honest.
> It also answers the question:
> What if the data cannot fit in the buffer, how will the library behave?
> We're not publishing results other than media.1.cks.
> I don't see a problem here.
The current code does not answer that question, because most of the serializers use Serializer#outputStream, which reuses the buffer. Your changes have made Kryo not reuse the buffer, so it will grow each time, and therefore it is not fair for media.3.cks or any other test where the output is > 512. We don't publish those results, but it is still unfair.
Besides that, I don't believe this is an important question to ask.
It only adds overhead that skews the results.
-Nate
Updated the wiki and reverted the change, going back to allocating a ByteArrayOutputStream instead of re-using/resetting it.
Now java-manual is back to being as fast as it was (1700ms). The reset/re-use of ByteArrayOutputStream made it look slower (2400ms).
With smile/jackson/manual being based on outputstream, there wasn't any change in the results.
It is still a little bit faster than kryo even with a new ByteArrayOutputStream.
Also, fastjson seems to be improving.
The results are basically fair.
I think I actually understand more now; and I can see two points here.
First: use of ByteArrayOutputStream. There are (IMO) two ways it is
(or has been?) used:
1. By shared test code passing it out directly, something like:
(I forget exact names, but this should suffice)
    ByteArrayOutputStream reused = ...
    serializer.writeValue(value, reused);
2. By serializer sub-classes that explicitly ask for instance, to
produce byte[]:
    class MySerializer ... {
        public byte[] write(T value) {
            ByteArrayOutputStream out = super.getOutputStream();
            ....
        }
    }
Of these, I think (1) is correct, fair, and non-problematic. Case (2),
however, is somewhat problematic, because it can potentially differ
between cases, and requires special handling by implementation.
I think Nate is pointing to 2, and saying that this is very similar to
what Kryo serializer will do, just using different mechanism. I agree
with this to some point; although I think the way it is done is bit
problematic just because of the way byte[] and ByteArrayOutputStream
differ (one can not change state of byte[]).
Ideally I think Kryo's Output class would handle reuse automatically,
and if so, I would have absolutely no problem with it.
I am not 100% sure what to think of holding a reference to passed byte
array the way it is done, but I do think it is not all THAT different
from case (2)
I also think that this goes back to one of my points, that we have
division between two styles:
(a) Streaming, where we only use InputStream, OutputStream for
operation -- this is (IMO) easier to keep fair. However, some libs are
fundamentally non-streaming and for them there is then additional cost
of reading from InputStream into byte[] (and reverse for writing).
(b) Blocks, where input is given as byte[], output expected as byte[].
The disputed case is for (b), is it not?
One thing I do not recall is whether and when did we go back & forth
between requiring (or not) result being returned as byte[]? I thought
originally streams were used.
And from this, should we not just force use of InputStream,
OutputStream as the test? Doing this, we should be able to eliminate
reuse by per-implementation serializer glue code -- and leave
non-problematic cases of either test framework managing reuse, or
underlying serializer automatically handling reuse of its own internal
buffers.
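A rough sketch of what a stream-only contract with framework-managed reuse might look like. `StreamSerializer` and `measureOnce` are names invented for this illustration, not the benchmark's actual API, and the UTF-8 lambda stands in for a real serializer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamOnlyHarness {
    /** Hypothetical stream-only contract: serializers never own or keep the output buffer. */
    public interface StreamSerializer<T> {
        void serialize(T value, OutputStream out) throws IOException;
    }

    // The framework, not the per-library glue code, owns and reuses the stream.
    private final ByteArrayOutputStream reused = new ByteArrayOutputStream(512);

    /** One iteration: reset the shared stream, let the serializer write, copy the bytes out. */
    public <T> byte[] measureOnce(StreamSerializer<T> serializer, T value) throws IOException {
        reused.reset();
        serializer.serialize(value, reused);
        return reused.toByteArray();
    }

    /** Runs a toy UTF-8 "serializer" for several rounds and returns the last output size. */
    public static int roundTripSize(String value, int rounds) {
        StreamOnlyHarness harness = new StreamOnlyHarness();
        StreamSerializer<String> utf8 = (v, out) -> out.write(v.getBytes(StandardCharsets.UTF_8));
        int size = 0;
        try {
            for (int i = 0; i < rounds; i++) {
                size = harness.measureOnce(utf8, value).length;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println("serialized size: " + roundTripSize("hello", 3));
    }
}
```

With this shape every library sees the same reuse policy, and anything a library buffers internally is its own business.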
On Fri, Apr 27, 2012 at 11:01 AM, David Yu <david....@gmail.com> wrote:
> Updated the wiki and reverted the change to allocating a ByteArrayOutputStream instead of re-using/reset.
> Now java-manual is back to being as fast as it was (1700ms). The reset/re-use of ByteArrayOutputStream made it look slower (2400ms)
> With smile/jackson/manual being based on outputstream, there wasn't any change in the results.
> It is still a little bit faster than kryo even with a new ByteArrayOutputStream.
> Also, fastjson seems to be improving.
> The results are basically fair.
This is a community project and the results are highly visible.
We need to start using a staging wiki page so that the results can be reviewed and not just updated at will. Please refrain from updating the wiki until the staging results have been discussed. I have updated the wiki, removing all results until the issues we have been discussing are resolved.
https://github.com/eishay/jvm-serializers/wiki
-Nate
--
You received this message because you are subscribed to the Google Groups "java-serialization-benchmarking" group.
To post to this group, send email to java-serializat...@googlegroups.com.
To unsubscribe from this group, send email to java-serialization-be...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/java-serialization-benchmarking?hl=en.
On Sat, Apr 28, 2012 at 2:08 AM, Nate <nathan...@gmail.com> wrote:
> This is a community project and the results are highly visible.
> On Fri, Apr 27, 2012 at 11:01 AM, David Yu <david....@gmail.com> wrote:
>> Updated the wiki and reverted the change, going back to allocating a ByteArrayOutputStream instead of re-using/resetting it.
>> Now java-manual is back to being as fast as it was (1700ms). The reset/re-use of ByteArrayOutputStream made it look slower (2400ms).
>> With smile/jackson/manual being based on OutputStream, there wasn't any change in the results.
>> It is still a little bit faster than kryo even with a new ByteArrayOutputStream.
>> Also, fastjson seems to be improving.
>> The results are basically fair.
Yet you freely updated the wiki with your shortcuts.
My previous changes did *remove* your shortcuts and corrected the results.
A side issue is whether a serializer should work with arbitrary-length
input: I assume we all agree that this should be the case (within
reasonable sizes, of course).
-+ Tatu +-
On code:
>> The Kryo serializer that has a byte[] field is a result of the changes that
>> David Yu has made. I have reverted it to how I believe Kryo should be used.
>> Can you please review?
>> https://github.com/eishay/jvm-serializers/blob/f370d51f415fc29c872bc2c870feb52a16b705f2/tpc/src/serializers/Kryo.java#L39
>> The source for the Input and Output classes is here:
>> https://code.google.com/p/kryo/source/browse/#svn%2Ftrunk%2Fsrc%2Fcom%2Fesotericsoftware%2Fkryo%2Fio
I think that since maximum size is NOT limited (good), it does not
matter what the initial size is; and whatever defaults we use for
ByteArrayOutputStream should apply there too (which I think is true as
well).
I would be fine with allowing Output to be reused along with Kryo this way.
(btw, looks like reuse of Input is not all that necessary? no problem
with it, just seems almost irrelevant).
I would definitely rather let that stand than continue arguments.... :-)
Especially if we decided that byte[] in, byte[] out is the test case we want.
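To spell out why the initial size should not matter when the maximum is unbounded, here is a small sketch of a reusable, growable output buffer. ReusableOutput is a hypothetical stand-in written for this example; it is not Kryo's Output class and makes no claim about how that class is implemented.

```java
import java.util.Arrays;

// Hypothetical stand-in for a reusable, growable output buffer.
// This is NOT Kryo's Output class, only an illustration of the idea.
final class ReusableOutput {
    private byte[] buffer;
    private int position;

    ReusableOutput(int initialSize) {
        buffer = new byte[Math.max(1, initialSize)];
    }

    void write(byte[] data) {
        int needed = position + data.length;
        if (needed > buffer.length) {
            // Grow by doubling until the data fits. The initial size only
            // affects how often this happens, not whether the write succeeds.
            int newSize = buffer.length;
            while (newSize < needed) {
                newSize *= 2;
            }
            buffer = Arrays.copyOf(buffer, newSize);
        }
        System.arraycopy(data, 0, buffer, position, data.length);
        position += data.length;
    }

    // Returns a copy of exactly the bytes written since the last clear().
    byte[] toBytes() {
        return Arrays.copyOf(buffer, position);
    }

    // Resets the position but keeps the (possibly grown) array for the next item.
    void clear() {
        position = 0;
    }

    int capacity() {
        return buffer.length;
    }

    public static void main(String[] args) {
        ReusableOutput out = new ReusableOutput(16);
        out.write(new byte[1000]);
        System.out.println(out.toBytes().length + " bytes, capacity " + out.capacity());
        out.clear();
        out.write(new byte[10]);
        System.out.println(out.toBytes().length + " bytes, capacity " + out.capacity());
    }
}
```

After the first oversized write the buffer stays grown, so later items of similar size no longer trigger any reallocation; that is the part of reuse that seems uncontroversial.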
One suggestion that is not related to this issue -- would it make
sense to separate out the "standard" case from the optimized ones? The
reason is that it would make relative code sizes much easier to see -- I
read through the code, and understand that the majority of it is for the
optimized case.
For casual users (and our own stats, if we are to publish some) it
would be even simpler if one could directly see that "standard Kryo
serializer is 30 lines" (or whatever), and "optimized 200 lines".
Plus I think it could also serve as a piece of sample code for newbie Kryo
users: I suspect benchmark code will (right or wrong) be used as
templates too.
But maybe the 'run' command does not use pre-warmup by default?
This would explain it...
-+ Tatu +-
Now tests take quite a bit longer, so it would be
great to do the 3-way split that was discussed.
-+ Tatu +-