Serialized content

diegocip .

Apr 26, 2013, 1:33:38 PM
to java-serializat...@googlegroups.com
Hi,

It would be of great interest to have the serialized content for each "technology" included in the results, with binary output shown in a textual representation such as hexadecimal. It is unacceptable that text protocols, in particular XML-based ones, produce smaller serialized content than binary ones, so it is important to have the serialized data available in order to analyze the pros and cons. Hessian 4 will use 4 bytes for a 32-bit int, as it does for dates with minute precision and for floating-point numbers. Text-based protocols, on the other hand, need one byte (or two, depending on the charset) per character or digit of a number, plus one more for the decimal point in decimal numbers. A date would take something like (mm/dd/yyyy hh:mm), compared to 4 bytes in Hessian or another binary protocol. I would really like to see the serialized result without having to run the test myself.
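
To make the size argument above concrete, here is a minimal Java sketch comparing a fixed-width 4-byte int with its decimal text form. It uses plain `DataOutputStream`, not Hessian's encoder, and the class name `IntSizeSketch` is made up for illustration; real binary formats often use variable-length integer encodings instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class IntSizeSketch {
    // Fixed-width binary: DataOutputStream.writeInt always emits 4 bytes.
    static int binarySize(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream data = new DataOutputStream(out)) {
            data.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // Text: one byte per ASCII digit, plus one for a minus sign.
    static int textSize(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        for (int v : new int[] {7, 1234, 2_000_000_000}) {
            System.out.println(v + ": binary=" + binarySize(v) + " text=" + textSize(v));
        }
    }
}
```

Small values come out shorter as text than as a fixed 4-byte int, so which side wins depends on the data being serialized.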

Thanks all,
Congrats,
Diego C Nascimento.

Tatu Saloranta

Apr 26, 2013, 1:46:56 PM
to java-serializat...@googlegroups.com
The size of the resulting messages is already included in the results, isn't it?
Or are you asking for something different?

-+ Tatu +-



--
You received this message because you are subscribed to the Google Groups "java-serialization-benchmarking" group.
To unsubscribe from this group and stop receiving emails from it, send an email to java-serialization-be...@googlegroups.com.
To post to this group, send email to java-serializat...@googlegroups.com.
Visit this group at http://groups.google.com/group/java-serialization-benchmarking?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

diegocip .

Apr 26, 2013, 4:41:25 PM
to java-serializat...@googlegroups.com
Thanks for replying. Yes, it is; what I am asking for is to include the serialized content itself, in order to analyze the pros and cons. An XML-based protocol giving a smaller size than a binary one, Hessian 2 in particular, is a really weird result, and that is without compression!
So providing the serialization result (text for text protocols and, for example, hexadecimal notation for binary protocols) would be of great help in understanding why some protocols that should generate very small data sizes compared to any text-based one are generating bigger sizes. A test with a medium-sized list of the test object would also be of great help, with random numbers, since with fixed numbers GZIP or another compressor would benefit greatly.
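
The point about fixed versus random numbers can be checked with a small Java sketch. It assumes a stand-in payload of 4-byte big-endian ints rather than the output of any benchmark codec; `GzipSketch` and its helper names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

final class GzipSketch {
    // Encode each int as 4 big-endian bytes, standing in for a binary payload.
    static byte[] encode(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // Size of the payload after GZIP compression, including the GZIP header and trailer.
    static int gzippedSize(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // The same value repeated n times: highly compressible.
    static int[] fixed(int n, int value) {
        int[] a = new int[n];
        Arrays.fill(a, value);
        return a;
    }

    // Seeded pseudo-random values: close to incompressible.
    static int[] random(int n, long seed) {
        Random rnd = new Random(seed);
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = rnd.nextInt();
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println("fixed:  " + gzippedSize(encode(fixed(1000, 42))) + " bytes");
        System.out.println("random: " + gzippedSize(encode(random(1000, 1L))) + " bytes");
    }
}
```

Both inputs are 4000 bytes before compression; only the repeated-value payload shrinks noticeably, which is why a compressed-size comparison on fixed test data can flatter any format.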

Thanks,
Diego



Tatu Saloranta

Apr 26, 2013, 7:54:38 PM
to java-serializat...@googlegroups.com
On Fri, Apr 26, 2013 at 1:41 PM, diegocip . <die...@brturbo.com.br> wrote:
> Thanks for replying. Yes, it is; what I am asking for is to include the serialized content itself, in order to analyze the pros and cons. An XML-based protocol giving a smaller size than a binary one, Hessian 2 in particular, is a really weird result, and that is without compression!


Not really. Hessian's object serialization adds metadata, similar to JDK serialization, regarding the types of objects being serialized. You can see the same effect with "java serialization". Some serializers (textual or binary) do this; not all do. Doing this allows serializing more complex things, as identity is retained, but it comes at a price. And the price is higher for a single-item test, such as the default jvm-serializers one.
But that's the nature of benchmarks: the choice of data sets and scenarios matters a lot, and some implementations fare better with a specific set than others. This one was what Eishay was using and testing.
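
The metadata effect described here can be seen with plain JDK serialization. The sketch below is only an illustration: `Point` is a hypothetical class, not the benchmark's data model, and Hessian's actual byte layout differs. It compares the serialized size of an array holding one object against an array holding many, where the class descriptor is written once and then referenced.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class MetadataOverheadSketch {
    // Hypothetical value class with two int fields (8 bytes of actual data).
    static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // JDK serialization writes a class descriptor (name, UID, fields) before the field data.
    static int jdkSize(Object value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // Size of the same data with no type metadata at all: two ints per point.
    static int rawSize(int count) {
        return count * 2 * Integer.BYTES;
    }

    static Point[] points(int n) {
        Point[] result = new Point[n];
        for (int i = 0; i < n; i++) {
            result[i] = new Point(i, -i);
        }
        return result;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 10, 1000}) {
            int total = jdkSize(points(n));
            System.out.printf("items=%d jdk=%d bytes (%.1f per item), raw=%d bytes%n",
                    n, total, (double) total / n, rawSize(n));
        }
    }
}
```

The per-item cost drops as the item count grows, because the type description is paid for once per stream; a single-item test shows that overhead at its worst.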

I am not sure that a hex dump of the output would be generally useful. I can't see many readers actually being interested in seeing low-level format details -- that's what format documentation is for.

I could see it being useful to add links to more information on specific formats.

 
> So providing the serialization result (text for text protocols and, for example, hexadecimal notation for binary protocols) would be of great help in understanding why some protocols that should generate very small data sizes compared to any text-based one are generating bigger sizes. A test with a medium-sized list of the test object would also be of great help, with random numbers, since with fixed numbers GZIP or another compressor would benefit greatly.


It is an open source project, so all contributions are welcome! Data sets are included in the distribution.

Note that there are tests for item sequences, although not all codecs support them. Nor are the results published.
But it should be easy enough to add support for other codecs, including the Hessian one.

-+ Tatu +-

 

diegocip .

Apr 26, 2013, 10:40:02 PM
to java-serializat...@googlegroups.com
Hi. Yes, that is one of the intentions: adding the serialized content helps to see the differences, since just comparing protocols without accounting for their pros and cons is uneven. The Hessian serialization is really well documented, although my packet sniffing shows some differences, probably due to version changes. Sure, it's not a question of the audience, the readers :)

Thanks,
Diego

Kannan Goundan

Apr 27, 2013, 12:15:38 AM
to java-serializat...@googlegroups.com

When I first encountered the project I also thought it was important to go into a little more detail and created the first version of the feature comparison table.

I don't have strong objections to your idea, but it does seem like a lot of work that nobody here feels motivated to do :-P

If you were to write something up and send it to the list, I and others would probably be happy to provide feedback.  If it looks like a fair comparison, we could put it on the site.

There's always the question of whether something like that will stay maintained, but I think if it ever gets too out of date we can just remove it.

Tatu Saloranta

Apr 27, 2013, 3:56:23 PM
to java-serializat...@googlegroups.com
Let me put it this way -- you are the first person that I remember ever having asked for hex dumps of serialization results to be included. So I would conservatively estimate the audience to have a minimum size of 1. :-)

Like Kannan, I do not mind someone adding this information, if it is done in a way that does not distract from content that most readers find useful. But the output is already information-dense, so this is a bit of a concern. And the main goal of the project is not to explain details of underlying data format encodings, or to compare them.
I think it would be a great article or something; and I don't doubt there are developers who would be interested in in-depth look, but it seems more like a separate project.

But, if you want to contribute something, I am sure we could discuss how it would fit,

-+ Tatu +-



diegocip .

Apr 27, 2013, 5:15:13 PM
to java-serializat...@googlegroups.com
I don't agree with your first argument: experienced developers who are careful about bandwidth will probably be interested in this, and developers using high-level languages will benefit from the comparison. In fact, some developers may be interested and simply have not posted about it.

As I said, the results are not what I expected; I expected something like this: http://census2.jamesward.com/ (which does not even include Hessian 2, which can perform better than AMF). So providing the serialized content would help developers examine why this is happening. I don't have time to examine all the serialization protocols, so this is the help it would provide.

Just take, for example, a string of length 0-31 in Hessian: it would be one byte plus the string. A textual representation would need quotes, which is two bytes, plus one escape character for each special character; this need doesn't exist for binary. XML would need "<" and ">" just to open and close tags, and commonly needs a closing tag as well. So it is expected that Hessian has the smaller size.
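
A rough sketch of the arithmetic in the paragraph above, in Java. The three methods are hand-written size estimates under stated assumptions, not calls into Hessian, a JSON library, or an XML library: the length-prefixed case models only the short-string rule as described above, the JSON case counts escapes for quote and backslash only, and the XML case ignores entity escaping. All names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class StringFramingSketch {
    private static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Length-prefixed framing: one length byte followed by the string bytes.
    static int lengthPrefixedSize(String s) {
        if (s.length() > 31) {
            throw new IllegalArgumentException("sketch covers lengths 0-31 only");
        }
        return 1 + utf8Length(s);
    }

    // JSON-style framing: two quotes, plus one backslash per '"' or '\' in the text.
    static int jsonSize(String s) {
        int escapes = 0;
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                escapes++;
            }
        }
        return 2 + escapes + utf8Length(s);
    }

    // XML element framing: <tag>text</tag>.
    static int xmlSize(String tag, String s) {
        return utf8Length("<" + tag + ">") + utf8Length(s) + utf8Length("</" + tag + ">");
    }

    public static void main(String[] args) {
        String value = "hello";
        System.out.println("length-prefixed: " + lengthPrefixedSize(value));
        System.out.println("json:            " + jsonSize(value));
        System.out.println("xml (<name>):    " + xmlSize("name", value));
    }
}
```

This covers only the value framing for one field; it says nothing about type or field-name metadata, which also counts toward the total size of a message.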

I thank you all for the replies,
Congrats,
Diego

Tatu Saloranta

Apr 28, 2013, 2:00:26 AM
to java-serializat...@googlegroups.com
On Sat, Apr 27, 2013 at 2:15 PM, diegocip . <die...@brturbo.com.br> wrote:
> I don't agree with your first argument: experienced developers who are careful about bandwidth will probably be interested in this, and developers using high-level languages will benefit from the comparison. In fact, some developers may be interested and simply have not posted about it.


Fine. Feel free to disagree. But you are just expressing your opinions without any actual evidence to the contrary.

Instead of trying to convince others to do the work you want, your best (and probably only) bet is to go ahead and do this work that you think is valuable. That is how open source projects work.

 
> As I said, the results are not what I expected; I expected something like this: http://census2.jamesward.com/ (which does not even include Hessian 2, which can perform better than AMF). So providing the serialized content would help developers examine why this is happening. I don't have time to examine all the serialization protocols, so this is the help it would provide.


Huh? That does not make sense -- you don't have time to examine... and yet you would go over hex dumps of output?

Or are you saying that others should do this investigation for you? Because you think it is valuable, and you would be interested. And you are sure that many others somewhere might as well be, even though it has not been brought up?

 
> Just take, for example, a string of length 0-31 in Hessian: it would be one byte plus the string. A textual representation would need quotes, which is two bytes, plus one escape character for each special character; this need doesn't exist for binary. XML would need "<" and ">" just to open and close tags, and commonly needs a closing tag as well. So it is expected that Hessian has the smaller size.


It seems like this issue interests you a great deal, so I would recommend you dig out the underlying reason.
Especially since you did not even consider the explanation I already offered.

Good luck with the investigation,

-+ Tatu +-



 

diegocip .

Apr 28, 2013, 12:30:35 PM
to java-serializat...@googlegroups.com
You are taking it the wrong way. I don't want to force anyone to accept my opinion, or to develop anything; sorry if it came across that way.

Just to explain:


> Huh? That does not make sense -- you don't have time to examine... and yet you would go over hex dumps of output?
> Or are you saying that others should do this investigation for you? Because you think it is valuable, and you would be interested. And you are sure that many others somewhere might as well be, even though it has not been brought up?

Totally wrong. I could examine the hex dumps in minutes, unlike implementing this in the project (yes, I did some searching in the sources and found a byte array that may hold the serialized content, but that was only a quick search). As for the hex dumps of the protocol I am working with, I have examined part of them. In no way do I want others to examine them for me.


> But you are just expressing your opinions without any actual evidence to contrary.

No, I posted some references and facts. Feel free to take it like you want.

I apologize if I didn't express it in the best way, and I wish good work to all.

Thanks again,
Diego

Kannan Goundan

Apr 28, 2013, 5:59:59 PM
to java-serializat...@googlegroups.com
I think I now see what Diego is getting at.  The goal is not to explain the formats to someone who doesn't know.  It is to help legitimize the testing methodology to someone who is confused by the results.

Case in point: Diego saw the results and was confused about how Hessian could produce larger output than some of the XML formats.  This makes him doubt the legitimacy of the results.

If he was wondering why Hessian is bigger than "xml/fastinfo", then he could have looked at the output and realized that "xml/fastinfo" is actually a binary representation of XML.  (Tangent: maybe the word "binary" should be somewhere in the name?)

Or maybe he was wondering why Hessian is bigger than "xml/xstream+c".  And when I just went to look at it, I was also confused about why "xml/xstream+c" XML is smaller than the other XML formats.  The "tool behavior" page explains why, but that page is no longer linked from the front page.  (Tangent: turns out the "+c" means we used abbreviated tag/attribute names, which we stopped doing for JSON and should stop doing for XML.)

So having a single page with a hex+ascii dump of the serialized data makes sense to me.  If you'd like to add an option to the benchmark runner that produces this, I'd be in favor of linking to it.


Tatu Saloranta

Apr 28, 2013, 7:04:57 PM
to java-serializat...@googlegroups.com
Ok. Apologies for misunderstanding the intent of your comments and questions.
It is sometimes easy to read too much into discussions over email.

Good luck with your investigation,

-+ Tatu +-



Tatu Saloranta

Apr 28, 2013, 7:09:14 PM
to java-serializat...@googlegroups.com


Would this be a dump of canonical data, in a sort of BNF-style form? That is, not really a binary dump, but an explanation of the structures the tests use.
I assumed the suggestion was to display hex dumps of all serialized output, with explanation, and I was not sure that was worth the effort (compared to project pages of tools, or format definition pages).
But if this would be an explanation of the logical data model, yes, I could see this being useful. Along with links to more information, if such exists.

I did realize (after responding) that likely starting point was indeed "but how could XML be more compact than Hessian" -- valid question of course -- and perhaps those kinds of FAQ would make sense to address?

-+ Tatu +-

Kannan Goundan

Apr 28, 2013, 7:43:47 PM
to java-serializat...@googlegroups.com

I think that even a raw hex+ascii dump is still useful.  It's for the people who see the size results, say "that doesn't make sense," and want to look at the hex dump to do a quick sanity check to make sure our benchmark isn't flawed.

I agree that annotated hex dumps would be better, but that takes work :-)  The raw hex dump can be generated automatically and is easy to maintain.

A BNF-style description of each format is also useful, but I don't think it's strictly better.  For example, if you want to know why Avro is one byte larger than Kryo, it's probably quicker to line things up in the hex dump than to inspect the BNFs.  Plus, writing out the BNF for each serializer is a lot more work.
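
As an illustration of how little code an automatically generated raw dump needs, here is a minimal hex+ascii formatter in Java. It is a standalone sketch, not part of the benchmark runner; the class name `HexDumpSketch` and the 16-bytes-per-line layout are assumptions.

```java
import java.nio.charset.StandardCharsets;

final class HexDumpSketch {
    // Format bytes as lines of: offset, 16 hex bytes, printable-ASCII column.
    static String dump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int offset = 0; offset < data.length; offset += 16) {
            sb.append(String.format("%08x  ", offset));
            StringBuilder ascii = new StringBuilder();
            for (int i = 0; i < 16; i++) {
                int idx = offset + i;
                if (idx < data.length) {
                    int b = data[idx] & 0xff;
                    sb.append(String.format("%02x ", b));
                    // Non-printable bytes are shown as '.' in the ASCII column.
                    ascii.append(b >= 0x20 && b < 0x7f ? (char) b : '.');
                } else {
                    sb.append("   "); // pad a short final line so columns align
                }
            }
            sb.append(" |").append(ascii).append("|\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] sample = "{\"name\":\"example\",\"size\":42}".getBytes(StandardCharsets.UTF_8);
        System.out.print(dump(sample));
    }
}
```

Feeding each serializer's output through something like `dump` would give dumps that can be lined up side by side, which is the quick sanity check described above.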

diegocip .

Apr 28, 2013, 9:09:51 PM
to java-serializat...@googlegroups.com
Yes. Thanks, that's it, described very well. I was wondering whether it could be binary XML, but when someone said binary and textual, I made the erroneous assumption that all the XML listed is pure textual XML. Having read more about xml/fastinfo, it now makes much more sense, since besides being binary, it has defined datatypes.

And yes, adding the binary contents would be great for developers to get quick statistics about each protocol, for use, for study, and for contributing to these projects.

Thanks again,
Diego