On "manual" codecs, ideal data format perf vs actual implementations

Tatu Saloranta

Apr 21, 2012, 2:18:35 PM

to java-serializat...@googlegroups.com

(note: not directly related to bigger on-going thread)

I started thinking of why I originally felt those "manual" codecs
would make sense. Beyond sub-optimal implementations (for xml; and
nowadays for YAML), the real reason I think was this: I felt (and
still feel) that most developers don't understand there are two very
different aspects:

(a) Potential performance a data format has: for example, how much
faster would an optimal Protobuf implementation be than, say, an optimal
XML-based implementation?
(b) Actual performance difference with real-world toolkits.

Of these, (b) is obviously easier (possible) to measure -- yet many
simply use results of (b) to claim (a): as in "Protobuf is 30 faster
than XML (when I used XStream given a DOM)"
To help counter this, I was hoping that by having hand-written codecs
that use fastest low-level parsers/generators, it would be possible to
see difference between (a) and (b).
A secondary benefit, more important lately (at least for myself), is
that it can help converge (b) towards (a); one can get a better idea of
how much overhead there is to be eliminated, in a perfect world.
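
As an illustration of the gap between (a) and (b), here is a minimal sketch that is not taken from the benchmark. It writes one and the same toy binary layout in two ways: with a hand-written ("manual") codec that knows the fields up front, and with a generic codec that discovers them through reflection, loosely standing in for databinding. The `Item` class, the method names and the layout itself are hypothetical, chosen only for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;

public class ManualVsReflective {
    // Hypothetical value type, standing in for a benchmark data class.
    static final class Item {
        final String title;
        final int size;

        Item(String title, int size) {
            this.title = title;
            this.size = size;
        }
    }

    // "Manual" codec: field order and types are hard-coded by the author.
    static byte[] writeManual(Item item) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(item.size);   // fields written in name order: size, title
            out.writeUTF(item.title);
        }
        return bytes.toByteArray();
    }

    // "Automatic" codec: same layout, but the structure is discovered at
    // runtime via reflection. Fields are sorted by name because the order
    // returned by getDeclaredFields() is not guaranteed.
    static byte[] writeReflective(Object value) throws IOException, IllegalAccessException {
        Field[] fields = value.getClass().getDeclaredFields();
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Field field : fields) {
                if (Modifier.isStatic(field.getModifiers())) {
                    continue; // only instance state is serialized
                }
                field.setAccessible(true);
                if (field.getType() == int.class) {
                    out.writeInt(field.getInt(value));
                } else if (field.getType() == String.class) {
                    out.writeUTF((String) field.get(value));
                } else {
                    throw new IOException("unsupported field type: " + field.getType());
                }
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Item item = new Item("example", 1234);
        byte[] manual = writeManual(item);
        byte[] reflective = writeReflective(item);
        // Same format, same bytes; only the implementation cost differs.
        System.out.println("identical output: " + Arrays.equals(manual, reflective));
    }
}
```

Since both methods emit identical bytes, any timing difference between them would come from the implementation rather than from the format, which is the kind of overhead a "manual" codec is meant to expose.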

But naming can also mislead: while "manual" vs "databind" (or perhaps
better, "automatic") gives some impression, it is still too subtle a
hint.
Now readers can still assume (a) == (b), just get differently skewed
results; as the fact is that most users are more affected by (b) than
(a).
So I don't know if this addition has helped general understanding or
not. Readers who don't have time to understand things can as easily
misunderstand a bigger result set as a smaller one. Or maybe more so.

From my perspective, I would like to see (a) and (b) fully separate --
or, if others so feel, to just remove the set of "manual" ("ideal") codecs
altogether.
And then we could in general suggest that "TL;DR" folks start with
actual fully-automated results; and only proceed to "ideal codecs"
section if they feel they have time to spend on understanding the
bigger picture.

Anyway, this was my view of how and why manual codecs came to be. Any
other views, comments?

-+ Tatu +-

Kannan Goundan

Apr 24, 2012, 6:32:33 AM

to java-serializat...@googlegroups.com

I agree that the "effort to write" axis needs to be more prominent. How about coloring the bars on the charts based on the effort (pojo, annotate/schema, manual, manual+)?

I've also been thinking about moving to a more interactive results page (a little like the Computer Language Shootout). Instead of putting things on the wiki we could generate a bunch of static html pages that let you slice through the results (i.e. show me only cross-platform formats, or human-readable formats, or non-manual setups, etc.)

[Incidentally, I'm using Jackson in fully-manual mode for this one web API library I'm writing. I didn't really measure anything before making this decision, but since parsing JSON is basically all the library does, I figured I might as well make it as fast as I can :-)]
