Just merged Rüdiger's large framework change.

Kannan Goundan

Mar 12, 2014, 11:50:38 PM
to java-serializat...@googlegroups.com
Please take a look to see if you are ok with the new methodology.

As Rüdiger already mentioned, one thing that will need to be refined is the categorization of various serializers (serializers/SerXXXX.java).  For example, I think we should distinguish between schema-based and POJO serializers.  I'm sure there are other improvements we could make in that area as well.

I'm proposing that we leave a week for people to take a look at the changes.  If there aren't any serious objections after that, we can publish the new results.  Sound ok?

Tatu Saloranta

Mar 13, 2014, 12:15:18 AM
to java-serializat...@googlegroups.com
Sounds good to me.

Also, based on the changes so far, I think Rüdiger could be given committer access, to simplify the workflow. We can always review changes and propose or make further ones, and the changes so far have been very good and useful.

-+ Tatu +-



--
You received this message because you are subscribed to the Google Groups "java-serialization-benchmarking" group.
To unsubscribe from this group and stop receiving emails from it, send an email to java-serialization-be...@googlegroups.com.
To post to this group, send email to java-serializat...@googlegroups.com.
Visit this group at http://groups.google.com/group/java-serialization-benchmarking.
For more options, visit https://groups.google.com/d/optout.

David Yu

Mar 13, 2014, 1:17:04 AM
to java-serializat...@googlegroups.com
On Thu, Mar 13, 2014 at 12:15 PM, Tatu Saloranta <tsalo...@gmail.com> wrote:
Sounds good to me.

Also, based on changes so far, I think Rüdiger could be given committer access, to simplify workflow
+1 to that 



--
When the cat is away, the mouse is alone.
- David Yu

Rüdiger Möller

Mar 13, 2014, 4:25:49 PM
to java-serializat...@googlegroups.com
I promise not to go all over it again any time soon :-). 



Rüdiger Möller

Mar 14, 2014, 9:21:04 PM
to java-serializat...@googlegroups.com
For example, I think we should distinguish between schema-based and POJO serializers.  I'm sure there are other improvements we could make in that area as well.


That's already done:
ZERO_KNOWLEDGE = "Pojo" (no information in advance)
CLASSES_KNOWN = the serializer knows which classes will be serialized and prearranges tables / generates code
MANUAL = manual optimizations in advance
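
As a rough illustration, the three categories could be modelled as a plain enum. This is only a sketch: the enum name `SetupEffort` and the method `describe` are hypothetical, and only the three constant names come from the message above.

```java
// Hypothetical sketch: only the three constant names are taken from the thread.
enum SetupEffort {
    ZERO_KNOWLEDGE("no information in advance (plain POJOs)"),
    CLASSES_KNOWN("classes known in advance; tables prearranged or code generated"),
    MANUAL("manual optimizations in advance");

    private final String description;

    SetupEffort(String description) {
        this.description = description;
    }

    /** Returns a one-line label such as "MANUAL: manual optimizations in advance". */
    String describe() {
        return name() + ": " + description;
    }
}
```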
 

Tatu Saloranta

Mar 15, 2014, 12:37:19 PM
to java-serializat...@googlegroups.com
I think this may still be missing one case: the one where a schema must be provided by the caller.
Although our test can pre-generate these, typically they are hand-written and shared.
I assume CLASSES_KNOWN simply means that the classes need to be listed, which is a little bit less work.

-+ Tatu +-
 
 


Rüdiger Möller

Mar 16, 2014, 4:16:45 AM
to java-serializat...@googlegroups.com
As code generation can be included in the build, I did not distinguish between simply preregistering a class (as with fst, kryo, jboss-ct) and schema pregeneration. I agree there is a difference, but I did not want to add overly fine-grained classifications. If you go into the details, there are many more differences: for example, fst lets you specify incomplete class lists just to boost the performance of frequent classes, whereas kryo in this test requires all classes to be preregistered (setRegistrationRequired(true)).

As soon as a serializer required more than providing a list of classes (e.g. also defining field-specific settings), I put it into the MANUAL category.

I doubt a viewer who is not heavily involved in the details of serialization will understand the difference; however, we can add another category, no problem (the chart queries would then have to be updated as well).

-ruediger



Tatu Saloranta

Mar 16, 2014, 4:05:27 PM
to java-serializat...@googlegroups.com
On Sun, Mar 16, 2014 at 8:16 AM, Rüdiger Möller <moru...@gmail.com> wrote:
As code generation can be included in the build, I did not distinguish between simply preregistering a class (as with fst, kryo, jboss-ct) and schema pregeneration. I agree there is a difference, but I did not want to add overly fine-grained classifications. If you go into the details, there are many more differences: for example, fst lets you specify incomplete class lists just to boost the performance of frequent classes, whereas kryo in this test requires all classes to be preregistered (setRegistrationRequired(true)).


I disagree here: to me, having to write a schema (or obtain it from somewhere else) is a different task than listing the classes of the objects being used. The fact that a schema can be generated for tests, on the other hand, is irrelevant for end users: this is not done by users and is a testing artifact here.

As soon as a serializer required more than providing a list of classes (e.g. also defining field-specific settings), I put it into the MANUAL category.

This makes sense to me.
 
I doubt a viewer who is not heavily involved in the details of serialization will understand the difference; however, we can add another category, no problem (the chart queries would then have to be updated as well).


To me, the difference between a formal schema and a simple listing of types is significant, although I guess I was unaware that a significant number of serializers require a listing of classes to use.
But as with other things, this could be covered by documentation: that the semi-automatic case includes things like an input schema to feed in, or a manually provided list of serialization types.

How do others feel?

-+ Tatu +-

Kannan Goundan

Mar 16, 2014, 7:12:36 PM
to java-serializat...@googlegroups.com
Even though the scalar "effort" is similar, I think users care about the difference between POJOs and a schema.  The advantage of a schema is language-neutrality.  The advantage of POJOs is that your build setup and IDE handle it perfectly.  Handling code generation in a build is still not perfect in any language and especially bad in Java[1].

I also think it's useful to have multiple "manual" categories.

I think maybe the aversion to having too many categories is that it makes the results page more complicated.  I'm hoping that a more capable dynamic results page will make this less of a problem.  I'm currently working on a results page that lets you type in an arbitrary boolean expression and it filters the results based on that expression.  For example "(schema | pojo) & binary".  I'll hopefully have a demo ready by tomorrow and we can see whether or not this actually makes a difference.
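
A minimal sketch of how such a filter expression could be evaluated against a serializer's set of properties. It is written in Java for illustration only and is not the results-page code; the class name `FilterExpr`, the grammar, and the operator precedence (`!` binds tighter than `&`, which binds tighter than `|`) are all assumptions.

```java
import java.util.Set;

// Hypothetical sketch of a filter-expression evaluator (recursive descent).
// Assumed grammar: or    := and ('|' and)*
//                  and   := unary ('&' unary)*
//                  unary := '!' unary | '(' or ')' | NAME
final class FilterExpr {
    private final String src;
    private final Set<String> props;
    private int pos = 0;

    private FilterExpr(String src, Set<String> props) {
        this.src = src;
        this.props = props;
    }

    /** Returns true if the given property set satisfies the expression. */
    static boolean matches(String expr, Set<String> props) {
        FilterExpr parser = new FilterExpr(expr, props);
        boolean result = parser.parseOr();
        parser.skipSpaces();
        if (parser.pos != parser.src.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + parser.pos);
        }
        return result;
    }

    private boolean parseOr() {
        boolean value = parseAnd();
        while (peek() == '|') {
            pos++;
            boolean rhs = parseAnd(); // always parse so the position advances
            value = value || rhs;
        }
        return value;
    }

    private boolean parseAnd() {
        boolean value = parseUnary();
        while (peek() == '&') {
            pos++;
            boolean rhs = parseUnary();
            value = value && rhs;
        }
        return value;
    }

    private boolean parseUnary() {
        char c = peek();
        if (c == '!') {
            pos++;
            return !parseUnary();
        }
        if (c == '(') {
            pos++;
            boolean value = parseOr();
            if (peek() != ')') {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < src.length() && isNameChar(src.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a property name at position " + pos);
        }
        // A name is true exactly when the serializer has that property.
        return props.contains(src.substring(start, pos));
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    private char peek() {
        skipSpaces();
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
            pos++;
        }
    }

    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-' || c == '_';
    }
}
```

With this sketch, `FilterExpr.matches("(schema | pojo) & binary", props)` holds for a serializer whose properties include `pojo` and `binary`, and fails for one that has `pojo` but not `binary`.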

[1] Part of the reason I used a Makefile for this project is that all of the Java-based build tools are a bunch of ad-hoc helpers without a solid core dependency system.  This makes handling code generation very difficult.  However, it looks like with Facebook's new Buck build tool, Java developers finally have something reasonable we can use.  Its main current deficiency is that it doesn't handle Maven dependencies, but we don't use those in this project.

Nate

Mar 16, 2014, 7:24:01 PM
to java-serializat...@googlegroups.com
On Mon, Mar 17, 2014 at 12:12 AM, Kannan Goundan <kan...@cakoose.com> wrote:
I think maybe the aversion to having too many categories is that it makes the results page more complicated.  I'm hoping that a more capable dynamic results page will make this less of a problem.  I'm currently working on a results page that lets you type in an arbitrary boolean expression and it filters the results based on that expression.  For example "(schema | pojo) & binary".  I'll hopefully have a demo ready by tomorrow and we can see whether or not this actually makes a difference.

FWIW, this lib has some impressive examples:
http://d3js.org/
I've heard it's a bit hard to work with though.

-Nate

Kannan Goundan

Mar 16, 2014, 7:51:22 PM
to java-serializat...@googlegroups.com
Haha yeah.  I tried three different graphing libraries based on D3: rickshaw, nvd3, dimple.  The results were pretty but it was hard to make things look the way I wanted (bar labels were getting cut off, etc.).

My current code just inserts DIVs with different widths :-P  I think for our limited needs, this is a better approach.  Hope to have something to show soon.


Tatu Saloranta

Mar 17, 2014, 6:09:09 PM
to java-serializat...@googlegroups.com
One semi-related note on refactored version: first, I like the fact that tests run faster, and overall change looks really nice. It improves usability a lot. Good job!

Second, it might make sense to use slightly higher default iteration counts: I noticed that numbers with defaults are very different from longer runs. I realize that there's a trade-off between convenient quick runs and longer run times, but I think a slight increase (say, from testTime=100 to testTime=1000 at least) would help a lot.

Also: not sure why, but I noticed that certain increased sizes resulted in the test run being skipped altogether, without an error message. The code also crashed if the `results/tmp` directory was missing; it'd be nice to just create it dynamically, should it be missing (for example due to nuking the `results` directory or such).
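
Creating the directory on demand could look roughly like the sketch below. The class and method names are hypothetical; `Files.createDirectories` is the standard JDK call, which creates missing parent directories and does nothing if the directory already exists.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not taken from the benchmark scripts.
final class ResultDirs {
    /** Creates the directory and any missing parents; a no-op if it already exists. */
    static Path ensureDir(String first, String... more) throws IOException {
        return Files.createDirectories(Paths.get(first, more));
    }
}
```

Calling `ResultDirs.ensureDir("results", "tmp")` before writing any output would then cover the case where `results` has been deleted.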

-+ Tatu +-

Kannan Goundan

Mar 17, 2014, 6:16:59 PM
to java-serialization-benchmarking
Sorry, the issues with "results/tmp" might be my fault.  I tweaked some of the scripts a little.

Kannan Goundan

Mar 17, 2014, 6:18:18 PM
to java-serialization-benchmarking
Also, the testTime=100 was DEFINITELY my fault.  I reduced the runtime while testing the patch, but forgot to revert it.

Tatu Saloranta

Mar 17, 2014, 6:18:25 PM
to java-serializat...@googlegroups.com
Not a biggie at all, just noticed it briefly & created one manually.

-+ Tatu +-

Kannan Goundan

Mar 17, 2014, 6:25:49 PM
to java-serialization-benchmarking
Fixed now.

Rüdiger Möller

Mar 17, 2014, 7:23:59 PM
to java-serializat...@googlegroups.com
testTime should be 5000 warm-up + 5000 overall (milliseconds). Anything below that will produce mostly garbage numbers. Before publishing, you should set it to something like 10000 or 20000 (a run then takes more than an hour).
The tmp directory was created in my initial version ...
Please do not go for a mix of time and max iterations. It will produce bias (not enough runtime for the faster ones).
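
A purely time-bounded loop, as opposed to one that also stops at a maximum iteration count, could be sketched as follows. The names `TimedRun`, `runFor` and `averageNanos` are hypothetical and this is not the benchmark's actual harness; a real one would also need to keep the JIT from optimizing the measured work away.

```java
// Hypothetical sketch of a warm-up phase followed by a time-bounded measurement.
final class TimedRun {
    /** Runs the task until the time budget is used up; returns the iterations completed. */
    static long runFor(long millis, Runnable task) {
        long budgetNanos = millis * 1_000_000L;
        long start = System.nanoTime();
        long iterations = 0;
        do {
            task.run();
            iterations++;
        } while (System.nanoTime() - start < budgetNanos);
        return iterations;
    }

    /** Warms up, then measures; returns the average nanoseconds per iteration. */
    static double averageNanos(long warmupMillis, long testMillis, Runnable task) {
        runFor(warmupMillis, task); // warm-up results are discarded
        long start = System.nanoTime();
        long iterations = runFor(testMillis, task);
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / iterations;
    }
}
```

Because every serializer gets the same wall-clock budget, a fast serializer simply completes more iterations instead of being cut off early by an iteration cap.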



Tatu Saloranta

Mar 17, 2014, 8:14:05 PM
to java-serializat...@googlegroups.com
Agreed. I just wanted to make sure that I wouldn't change settings if they had been changed for specific purpose. I think values you suggest make sense based on my experiences (in other cases I have used 5 sec warmup, 20-30 second test run; but 10 sec should be sufficient).

Would it be possible to maybe take optional time arguments, or a switch to indicate a "publish" run? The reason is just so that local changes to config settings are not accidentally pushed to GitHub; I don't think changing the values itself is a problem, but rather accidental overwrites (this has happened to me occasionally).
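
One way such optional arguments could be handled is sketched below. The flag names `-publish`, `-warmup=` and `-testTime=` are invented for illustration and do not exist in the benchmark scripts; the default and publish values are the ones suggested earlier in this thread.

```java
// Hypothetical sketch: flag names are assumptions, not existing options.
final class RunConfig {
    final long warmupMillis;
    final long testMillis;

    RunConfig(long warmupMillis, long testMillis) {
        this.warmupMillis = warmupMillis;
        this.testMillis = testMillis;
    }

    /** Parses optional flags; later flags override earlier ones. */
    static RunConfig fromArgs(String[] args) {
        long warmup = 5000; // default suggested in the thread
        long test = 5000;   // default suggested in the thread
        for (String arg : args) {
            if (arg.equals("-publish")) {
                test = 10000; // longer run intended for published results
            } else if (arg.startsWith("-warmup=")) {
                warmup = Long.parseLong(arg.substring("-warmup=".length()));
            } else if (arg.startsWith("-testTime=")) {
                test = Long.parseLong(arg.substring("-testTime=".length()));
            } else {
                throw new IllegalArgumentException("Unknown option: " + arg);
            }
        }
        return new RunConfig(warmup, test);
    }
}
```

Keeping the values on the command line means a quick local run never requires editing a checked-in file.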

And once again, I think these are really good changes: even just the default result generation is an improvement. I also hope that by experimentation it will be possible to somehow reduce the effect of slow outliers: in some groups we still have very slow cases that make it difficult to see relative differences between the fastest implementations per group.

-+ Tatu +-

Rüdiger Möller

Mar 18, 2014, 4:52:35 AM
to java-serializat...@googlegroups.com
One could generate the runtime settings into the output document. Regarding the outliers: just have a look at statscruncher. You could limit the charting by the number of serializers OR optionally specify a max variance factor, e.g. only include serializers in the chart that are not worse than 4 times the best. However, some charts get pretty short then, and I definitely wanted to have default JDK serialization in the charts.
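
The max variance factor idea can be sketched as a simple filter. This is not the statscruncher code; the class name, method name and the `alwaysKeep` parameter (used here to retain the default JDK serializer regardless of its timing) are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a "not worse than N times the best" filter.
final class OutlierFilter {
    /**
     * Keeps entries whose time is at most maxFactor times the lowest time,
     * plus the entry named alwaysKeep if it is present.
     */
    static Map<String, Double> withinFactor(Map<String, Double> timesByName,
                                            double maxFactor,
                                            String alwaysKeep) {
        double best = Double.POSITIVE_INFINITY;
        for (double time : timesByName.values()) {
            best = Math.min(best, time);
        }
        // LinkedHashMap preserves the input order for charting.
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : timesByName.entrySet()) {
            boolean closeEnough = entry.getValue() <= best * maxFactor;
            if (closeEnough || entry.getKey().equals(alwaysKeep)) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```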

-ruediger

Kannan Goundan

Mar 18, 2014, 4:57:47 AM
to java-serialization-benchmarking
The current results page prototype allows you to select any number of boolean "properties" for a serializer.  These property lists are intended to be output by the benchmark.

If you wanted, you could output the "slow" property for some serializers.  Then it would be easy to filter using the expression "!slow".  If you want to include the "jvm-default" serializer, the filter expression would be "!slow | jvm-default".  We can provide a pre-defined list of filters for common queries.
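
A pre-defined filter of this kind could also be expressed directly in code rather than parsed from a string. The sketch below uses the JDK's `java.util.function.Predicate` combinators; the class name `PredefinedFilters` and its members are hypothetical, and only the property names `slow` and `jvm-default` come from the message above.

```java
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of pre-defined filters over a serializer's property set.
final class PredefinedFilters {
    /** Matches serializers that carry the given property. */
    static Predicate<Set<String>> has(String property) {
        return props -> props.contains(property);
    }

    /** Equivalent of the expression "!slow | jvm-default". */
    static final Predicate<Set<String>> NOT_SLOW_OR_JVM_DEFAULT =
            has("slow").negate().or(has("jvm-default"));
}
```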

Rüdiger Möller

Mar 18, 2014, 5:14:07 AM
to java-serializat...@googlegroups.com
If you manage to output the charts dynamically, this obviously is the best solution. Do you plan to still offer "static charts" for mobiles and other JavaScript-challenged devices?

-rüdiger

Kannan Goundan

Mar 18, 2014, 5:41:41 AM
to java-serialization-benchmarking
I didn't really think of the non-Javascript case.  I wonder how much it matters, since most recent phones will handle Javascript just fine.

If it becomes a problem, the page can be redesigned to initially be a static view.  The interactivity features would then only be present if Javascript was enabled.

Rüdiger Möller

Mar 18, 2014, 8:24:24 AM
to java-serializat...@googlegroups.com
That looks promising :-). You can improve the style with CSS later, no problem.

Maybe still provide a table with ALL results of all serializers statically ...

-ruediger

Tatu Saloranta

Mar 18, 2014, 11:21:45 AM
to java-serializat...@googlegroups.com
On Tue, Mar 18, 2014 at 8:52 AM, Rüdiger Möller <moru...@gmail.com> wrote:
One could generate the runtime settings into the output document. Regarding the outliers: just have a look at statscruncher. You could limit the charting by the number of serializers OR optionally specify a max variance factor, e.g. only include serializers in the chart that are not worse than 4 times the best. However, some charts get pretty short then, and I definitely wanted to have default JDK serialization in the charts.


I like the idea of a variance factor (my specific case is the various slow JSON codecs).
But instead of dropping results, perhaps just truncate the outlier bars; the actual numbers may still be visible by other means. A factor of, say, 5 or even 10 would help. It's really the difference between 10k and 200k that is problematic.
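
Truncating rather than dropping could be sketched like this. The class and method names and the pixel-width parameter are hypothetical; the idea is only that any value above `best * capFactor` gets the full bar width and is flagged so the chart can mark it as cut off.

```java
// Hypothetical sketch of capping bar lengths at a multiple of the best result.
final class BarScale {
    /** Returns the bar width in pixels; values above best * capFactor get maxWidthPx. */
    static int barWidthPx(double value, double best, double capFactor, int maxWidthPx) {
        double cap = best * capFactor;
        double clipped = Math.min(value, cap);
        return (int) Math.round(clipped / cap * maxWidthPx);
    }

    /** True if the bar for this value is cut off and should be marked as such. */
    static boolean isTruncated(double value, double best, double capFactor) {
        return value > best * capFactor;
    }
}
```

With a factor of 5 and a best value of 10k, a 200k result would be drawn at the full width and flagged as truncated, while the 10k bar takes one fifth of the width.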

-+ Tatu +-