Detecting shortcuts (Explained in as simple terms as possible)
From: David Yu <david.yu....@gmail.com>
Date: Sat, 28 Apr 2012 02:09:35 +0800
Local: Fri, Apr 27 2012 2:09 pm
Subject: Detecting shortcuts (Explained in as simple terms as possible)

Here are the ways a library can take a shortcut:
1.  If you're a compute-based serializer (any protobuf-based serializer
like wobly), you intentionally persist the computed size from the first
run, so you don't need to compute it on the succeeding runs (as the others
do).

2.  If you're a stream/buffer-based serializer, you intentionally persist
the resized buffer from the first run, so you are exempt from
flushing/resizing/expanding on the succeeding runs.

It is as simple as that.
You can't compromise one for the other.
To be fair to all types of serializers mentioned, avoid any of the above.
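
To make the second shortcut concrete, here is a minimal sketch that counts how often a buffer has to grow under each measurement style. GrowableBuffer and ShortcutDemo are hypothetical names invented for this illustration; they are not classes from kryo, wobly, or the benchmark.

```java
import java.util.Arrays;

/** Hypothetical growable buffer that counts how often it has to expand. */
final class GrowableBuffer {
    private byte[] data;
    private int count;
    int expansions;

    GrowableBuffer(int initialCapacity) {
        data = new byte[initialCapacity];
    }

    void write(byte[] src) {
        if (count + src.length > data.length) {
            // Grow to at least the required size; this is the overhead under debate.
            data = Arrays.copyOf(data, Math.max(data.length * 2, count + src.length));
            expansions++;
        }
        System.arraycopy(src, 0, data, count, src.length);
        count += src.length;
    }

    /** Rewinds the write position but keeps the (possibly enlarged) array. */
    void reset() {
        count = 0;
    }
}

final class ShortcutDemo {
    /** Every iteration starts from a fresh small buffer, so every iteration pays for growth. */
    static int expansionsWithFreshBuffer(byte[] payload, int iterations) {
        int total = 0;
        for (int i = 0; i < iterations; i++) {
            GrowableBuffer buf = new GrowableBuffer(16);
            buf.write(payload);
            total += buf.expansions;
        }
        return total;
    }

    /** The buffer resized by the first iteration is kept, so later iterations never grow. */
    static int expansionsWithPersistedBuffer(byte[] payload, int iterations) {
        GrowableBuffer buf = new GrowableBuffer(16);
        for (int i = 0; i < iterations; i++) {
            buf.reset();
            buf.write(payload);
        }
        return buf.expansions;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[100];
        System.out.println(expansionsWithFreshBuffer(payload, 5));
        System.out.println(expansionsWithPersistedBuffer(payload, 5));
    }
}
```

With a 100-byte payload and a 16-byte initial capacity, the fresh-buffer loop grows once per iteration, while the persisted buffer grows only during the first iteration. Whether that first-iteration cost belongs in the measurement is what this thread is arguing about.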

That is easy to digest.  Hopefully that clears things up from here on out.

--
When the cat is away, the mouse is alone.
- David Yu


 
From: Nate <nathan.sw...@gmail.com>
Date: Fri, 27 Apr 2012 11:21:05 -0700
Local: Fri, Apr 27 2012 2:21 pm
Subject: Re: Detecting shortcuts (Explained in as simple terms as possible)

You have missed the point about reducing overhead -- we should not be
timing the growing of the buffer, ever.

-Nate


 
From: David Yu <david.yu....@gmail.com>
Date: Sat, 28 Apr 2012 02:24:28 +0800
Local: Fri, Apr 27 2012 2:24 pm
Subject: Re: Detecting shortcuts (Explained in as simple terms as possible)

In addition to that:

All of the current code is prepared to do proper
flushing/recomputing/resizing/expanding whenever the dataset being tested
is sufficiently large (this happens on every iteration, which is what
allows us to compute the real average time)

*In the wiki, it is actually encouraged (by Kannan, rightfully so) that the
interested parties actually run the dataset that closely resembles what
they have (thanks to cks-text, which makes this easy).  So it is important
that the code is honest for any possible dataset that a serializer might be
tested against.*

For the published results, media.1.cks is chosen intentionally so
that the actual results do not contain any overhead from the extra
flushing/resizing/etc.


--
When the cat is away, the mouse is alone.
- David Yu

 
From: David Yu <david.yu....@gmail.com>
Date: Sat, 28 Apr 2012 02:36:05 +0800
Local: Fri, Apr 27 2012 2:36 pm
Subject: Re: Detecting shortcuts (Explained in as simple terms as possible)

And I agree with you; that is why media.1.cks is used for the public
results.  No growing of the buffer happens there.

The fact remains that if users run their own dataset, kryo will
report misleading results because it takes the shortcut.
*As Kannan said:*
*it's how we measure the other tools, so it's still not fair to publish
results without fixing up the others.*

You can never fix the others, because they were designed that way.
Stream-based serializers will always need to flush.

Being a stream-based serializer, kryo is trying to be smart by avoiding
that overhead only for your benefit.
Other buffer-based serializers will always need to expand/resize and reset
on every iteration, but kryo still tries to be smart and avoid that
overhead.

> -Nate

>  --
> You received this message because you are subscribed to the Google Groups
> "java-serialization-benchmarking" group.
> To post to this group, send email to
> java-serialization-benchmarking@googlegroups.com.
> To unsubscribe from this group, send email to
> java-serialization-benchmarking+unsubscribe@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/java-serialization-benchmarking?hl=en.

--
When the cat is away, the mouse is alone.
- David Yu

 
From: Nate <nathan.sw...@gmail.com>
Date: Fri, 27 Apr 2012 11:37:37 -0700
Local: Fri, Apr 27 2012 2:37 pm
Subject: Re: Detecting shortcuts (Explained in as simple terms as possible)

Kryo's design allows it to be more efficient. This should be reflected in
the results, not nerfed just because other libraries are less efficient.

-Nate


 
From: David Yu <david.yu....@gmail.com>
Date: Sat, 28 Apr 2012 02:49:56 +0800
Local: Fri, Apr 27 2012 2:49 pm
Subject: Re: Detecting shortcuts (Explained in as simple terms as possible)

> This should be reflected in the results, not nerfed just because other
> libraries are less efficient.

It's not about other libraries being less efficient.
*The issue is how you take data from the first run and use it to make kryo
look good.*
What's the point of computing the average for the first single run, when
the second run has completely different behavior/results from the first run?


--
When the cat is away, the mouse is alone.
- David Yu

 
From: Nate <nathan.sw...@gmail.com>
Date: Fri, 27 Apr 2012 11:57:48 -0700
Local: Fri, Apr 27 2012 2:57 pm
Subject: Re: Detecting shortcuts (Explained in as simple terms as possible)

Using a large enough buffer is done solely to reduce overhead. In a micro
benchmark we must isolate the code we are testing as much as possible. We
are testing serializer code, not buffer growing code.

> What's the point of computing the average for the first single run, when
> the second run has completely different behavior/results from the first run?

The first run is not included in any average, the first run is used to
check correctness. In fact, there is no averaging occurring for any runs,
see TestCaseRunner#runTakeMin.
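
As a rough illustration of "take the minimum, not the average", here is a minimal sketch. It is not the benchmark's actual TestCaseRunner code: MinTimer, takeMin, and their parameters are hypothetical names chosen for this example.

```java
import java.util.function.LongSupplier;

final class MinTimer {
    /**
     * Runs an untimed correctness check first, then returns the smallest
     * duration reported by the timed trials. No averaging takes place.
     */
    static long takeMin(int trials, LongSupplier timedTrial, Runnable correctnessCheck) {
        correctnessCheck.run(); // the first run is only a check and is never timed
        long best = Long.MAX_VALUE;
        for (int i = 0; i < trials; i++) {
            best = Math.min(best, timedTrial.getAsLong());
        }
        return best;
    }

    public static void main(String[] args) {
        long best = takeMin(5, () -> {
            long start = System.nanoTime();
            new StringBuilder().append("work").reverse(); // stand-in workload
            return System.nanoTime() - start;
        }, () -> { /* a real check would compare the round-tripped object */ });
        System.out.println("best of 5 trials (ns): " + best);
    }
}
```

Taking the minimum discards slow outliers, so a one-off cost such as a single buffer expansion would not show up in the reported figure even if it fell inside a timed trial.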

-Nate


 
From: David Yu <david.yu....@gmail.com>
Date: Sat, 28 Apr 2012 03:19:39 +0800
Local: Fri, Apr 27 2012 3:19 pm
Subject: Re: Detecting shortcuts (Explained in as simple terms as possible)

This is actually where your code cheats.
You intentionally re-use your components to collect data (persist the size)
from the first run, so that you'll have an advantage over all the other
serializers.

> In fact, there is no averaging occurring for any runs, see
> TestCaseRunner#runTakeMin.


--
When the cat is away, the mouse is alone.
- David Yu

 
From: Nate <nathan.sw...@gmail.com>
Date: Fri, 27 Apr 2012 12:21:54 -0700
Local: Fri, Apr 27 2012 3:21 pm
Subject: Re: Detecting shortcuts (Explained in as simple terms as possible)

On Fri, Apr 27, 2012 at 12:19 PM, David Yu <david.yu....@gmail.com> wrote:
>> The first run is not included in any average, the first run is used to
>> check correctness.

> This is actually where your code cheats.
> You intentionally re-use your components to collect data (persist the
> size) from the first run, so that you'll have an advantage over all the
> other serializers.

Again, it is done to remove overhead. The same overhead can be removed for
other serializers by reusing the ByteArrayOutputStream.
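
For reference, reusing a ByteArrayOutputStream could look like the minimal sketch below. ReusedStreamDemo and its serialize method are hypothetical stand-ins for a serializer and are not code from the benchmark; reset(), write(byte[], int, int), and toByteArray() are the standard java.io.ByteArrayOutputStream methods.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class ReusedStreamDemo {
    // One stream for the lifetime of the object; 512 is an arbitrary starting capacity.
    private final ByteArrayOutputStream out = new ByteArrayOutputStream(512);

    /** Stand-in for a serializer: writes the value's UTF-8 bytes and returns a copy. */
    byte[] serialize(String value) {
        out.reset(); // discards the previous content but keeps the allocated buffer
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        ReusedStreamDemo demo = new ReusedStreamDemo();
        System.out.println(demo.serialize("first call").length);
        System.out.println(demo.serialize("second").length);
    }
}
```

Because reset() only rewinds the stream's internal count, any growth that happened on an earlier call is not repeated on later calls.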

-Nate


 
From: David Yu <david.yu....@gmail.com>
Date: Sat, 28 Apr 2012 03:45:39 +0800
Local: Fri, Apr 27 2012 3:45 pm
Subject: Re: Detecting shortcuts (Explained in as simple terms as possible)

On Sat, Apr 28, 2012 at 3:21 AM, Nate <nathan.sw...@gmail.com> wrote:
> On Fri, Apr 27, 2012 at 12:19 PM, David Yu <david.yu....@gmail.com> wrote:

>>> The first run is not included in any average, the first run is used to
>>> check correctness.

>> This is actually where your code cheats.
>> You intentionally re-use your components to collect data (persist the
>> size) from the first run, so that you'll have an advantage over all the
>> other serializers.

> Again, it is done to remove overhead. The same overhead can be removed for
> other serializers by reusing the ByteArrayOutputStream.

Nope.  What about the others that don't use an OutputStream?  Have you thought
of that?
In fact, when you changed the code to re-use the OutputStream, java-manual
took a hit (from 1700ms to 2400ms).
In a real project, that is how java-manual is used.  OutputStreams are not
reused at all.

Here's the performance of java-manual and kryo both using an outputstream
(not re-used)

./run -trials=500 -include=java-manual,kryo,wobly data/media.3.cks
Checking correctness...
[done]
              create     ser   +same   deser   +shal   +deep   total   size  +dfl
java-manual      135    7489    7294    3920    4045    4119   11608   1596   255
kryo             135    6937    6945    4703    4763    4918   11855   1573   254
wobly             86   11164   10979    3521    3562    3631   14796   1604   275

In this case, kryo without the shortcuts is actually slower than
java-manual in total time (11855 vs. 11608).
When everything is equal (both libraries flushing to an OutputStream),
java-manual performs better.
So when you mention that you're faster than java-manual, tell them "I did
it through shortcuts".


--
When the cat is away, the mouse is alone.
- David Yu

 
From: Nate <nathan.sw...@gmail.com>
Date: Fri, 27 Apr 2012 15:46:42 -0700
Local: Fri, Apr 27 2012 6:46 pm
Subject: Re: Detecting shortcuts (Explained in as simple terms as possible)

Explain how setting an int to zero is slower than creating a new
ByteArrayOutputStream which allocates a new byte[512]. This shows how
brittle the results numbers really are.

> In a real project, that is how java-manual is used.  OutputStreams are not
> reused at all.

In a real project that needs byte[], reusing a ByteArrayOutputStream makes
sense. As Tatu mentioned, serializers are either stream-based or
byte[]-based. Currently they are byte[] based, as that is what
Serializer#serialize returns.
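
To illustrate the two styles, here is a minimal sketch of a stream-based serializer adapted to a byte[]-returning contract. The interfaces ByteArraySerializer and StreamSerializer and the Adapters helper are hypothetical names for this example; they are not the benchmark's Serializer class.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical byte[]-based contract: the caller gets a finished array back. */
interface ByteArraySerializer<T> {
    byte[] serialize(T value);
}

/** Hypothetical stream-based contract: the serializer writes into a sink it is handed. */
interface StreamSerializer<T> {
    void serialize(T value, OutputStream out) throws IOException;
}

final class Adapters {
    /** Wraps a stream-based serializer so it satisfies the byte[]-based contract. */
    static <T> ByteArraySerializer<T> toByteArray(StreamSerializer<T> inner) {
        return value -> {
            // A fresh stream per call; whether to reuse it is the point in dispute.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try {
                inner.serialize(value, out);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return out.toByteArray();
        };
    }

    public static void main(String[] args) {
        StreamSerializer<String> utf8 =
                (value, out) -> out.write(value.getBytes(StandardCharsets.UTF_8));
        byte[] bytes = toByteArray(utf8).serialize("media");
        System.out.println(bytes.length);
    }
}
```

Under a byte[]-returning contract, a stream-based serializer always pays for some intermediate sink, so the choice between a fresh stream and a reused one directly affects the measured time.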

> So when you mention that you're faster than java-manual, tell them "I did
> it through shortcuts".

You are saying the "shortcut" that makes Kryo so fast makes java-manual
slower...

 