On binary JSON alternatives: Smile, CBOR coming of age with 2.4(.1)

Tatu Saloranta

Jun 7, 2014, 6:26:53 PM

to jackso...@googlegroups.com

Now that support for CBOR (another "binary JSON" format, see http://tools.ietf.org/html/rfc7049) in Jackson is production ready (https://github.com/FasterXML/jackson-dataformat-cbor), I thought I should answer one obvious question: Which binary JSON format should I choose?

There are 3 such formats supported by Jackson:

* Smile (http://wiki.fasterxml.com/SmileFormatSpec)

* BSON (used by MongoDB, see https://en.wikipedia.org/wiki/BSON)

* CBOR

Of these, I think the only reason to use BSON is for interoperability with Mongo; otherwise it does not have many strong features.

But between Smile and CBOR, what are the trade-offs?

I'll start with an executive summary / recommendation, for readers with a short attention span:

- Feature-wise there is not much difference for most cases (with the exception of framing support for longer data streams)

- CBOR is likely to have wider support for interoperability, especially in the short term

- Performance:
* Longer data streams (log processing, Hadoop, Kafka): Smile is more efficient to read and write, and takes less space as well

* Shorter content like single messages (request/response): the size difference is negligible, and performance is likely to be very similar

So: I would recommend Smile for processing pipelines, especially where all components are in Java or in one of the languages for which a Smile codec exists (there is a C codec and Ruby/PHP bindings). For simple request/response style interactions, both would work; and if there are many platforms to support, CBOR may make more sense.

But I think that for almost any use case, either one works well and is a more efficient alternative to regular JSON (with the usual caveats of textual vs binary formats).

And then a longer breakdown of the differences...

Feature-wise, both support all JSON data types without extending them, with the obvious exception of allowing efficient inclusion of binary data. This is a good thing, as it means that the feature sets available via Jackson are pretty much identical between the two, as well as compared to JSON -- if something works with JSON, it should "just work" with Smile or CBOR as well.
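To put a number on the binary-data point, here is a rough size comparison, sketched under stated assumptions. JSON has to carry binary data as Base64 text inside a quoted string, whereas a CBOR byte string (major type 2 in RFC 7049) is a short length header followed by the raw bytes. The class and method names are made up for illustration and are not part of Jackson; Smile is left out because its binary encoding is not described in this post.

```java
import java.util.Base64;

final class BinaryPayloadSize {
    /** Bytes needed for a JSON string holding Base64 text, including the two quotes. */
    static int jsonBase64Size(byte[] data) {
        return Base64.getEncoder().encodeToString(data).length() + 2;
    }

    /** Bytes needed for a CBOR byte string: RFC 7049 length header plus the raw bytes. */
    static long cborByteStringSize(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("length must not be negative");
        }
        int header;
        if (length < 24) {
            header = 1;          // length fits in the initial byte
        } else if (length <= 0xFF) {
            header = 2;          // initial byte + 1-byte length
        } else if (length <= 0xFFFF) {
            header = 3;          // initial byte + 2-byte length
        } else {
            header = 5;          // initial byte + 4-byte length
        }
        return (long) header + length;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[300]; // placeholder content, only the size matters
        System.out.println("JSON (Base64): " + jsonBase64Size(payload));
        System.out.println("CBOR bytes:    " + cborByteStringSize(payload.length));
    }
}
```

For a 300-byte payload this works out to 402 bytes as a JSON string versus 303 bytes as a CBOR byte string, which is the roughly one-third Base64 overhead that a binary format avoids.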

There is one area where Smile has features not found in CBOR: framing of data streams. The Smile specification reserves specific byte values (0xFC - 0xFF) that are never used for encoding, so it is possible to use inline framing to separate messages, records or groups of records, and to efficiently skip, scan and seek content.

This can be very useful for kinds of processing where large chunks of content need to be split, such as with Hadoop processing. But it is rarely used for simple request/response style message encoding.
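The framing idea can be sketched in a few lines. This is only an illustration under assumptions: it picks 0xFF as the record separator (the post says only that 0xFC - 0xFF are unused), it relies on those bytes never appearing inside the encoded records, and `MarkerFraming` is a hypothetical helper rather than anything Jackson provides. The point is that records can be located by scanning for one byte value, without decoding any of the content.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MarkerFraming {
    // Assumed separator; any of the reserved values would work the same way.
    static final byte MARKER = (byte) 0xFF;

    /** Splits a buffer into records at every marker byte, without decoding them. */
    static List<byte[]> split(byte[] stream) {
        List<byte[]> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < stream.length; i++) {
            if (stream[i] == MARKER) {
                if (i > start) { // ignore empty spans between adjacent markers
                    records.add(Arrays.copyOfRange(stream, start, i));
                }
                start = i + 1;
            }
        }
        if (start < stream.length) { // trailing record without a closing marker
            records.add(Arrays.copyOfRange(stream, start, stream.length));
        }
        return records;
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for encoded records, not real Smile documents.
        byte[] stream = {0x01, 0x02, MARKER, 0x03, MARKER, 0x04, 0x05};
        System.out.println(split(stream).size() + " records found");
    }
}
```

The same scan also lets a worker that starts in the middle of a large file skip forward to the next marker and begin at a record boundary, which is the kind of splitting that Hadoop-style processing needs.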

From an interoperability perspective, CBOR is (and probably will be) more widely supported, even though it is a newer format. So for interoperability with non-Java endpoints, CBOR may be the preferred choice. We hope to improve the situation with Smile, since it is a very stable and efficient data format, so it is possible that the difference here will become trivial.

For efficiency, there are two related aspects: space-efficiency (how compact the encoded data is), and time-efficiency (how fast it can be read and written).

Space-efficiency is something where Smile works really well: it will produce a more compact representation in almost any case. It will be particularly compact for stream-oriented data where there is redundancy in property names (that is, the same name used more than once) as well as in short String values (tokens like enum values). For shorter messages the difference will be less significant.
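The effect of repeated property names can be estimated with a toy model. This is a sketch only: it assumes a name costs one length byte plus its UTF-8 bytes the first time it appears, and a single byte for each later back-reference. The real Smile wire format differs in detail, and `NameSharingEstimate` is a hypothetical helper, not part of Jackson.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class NameSharingEstimate {
    /** Cost of writing a name in full: one length byte plus its UTF-8 bytes (assumed model). */
    private static int fullCost(String name) {
        return 1 + name.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Total size when every property name is written in full each time. */
    static int plainSize(List<String> names) {
        int size = 0;
        for (String name : names) {
            size += fullCost(name);
        }
        return size;
    }

    /** Total size when a repeated name is replaced by a one-byte back-reference. */
    static int sharedSize(List<String> names) {
        Set<String> seen = new HashSet<>();
        int size = 0;
        for (String name : names) {
            // add() returns false if the name was already written earlier
            size += seen.add(name) ? fullCost(name) : 1;
        }
        return size;
    }

    public static void main(String[] args) {
        // Property names of three records that share the same two fields.
        List<String> names = Arrays.asList("id", "name", "id", "name", "id", "name");
        System.out.println("plain:  " + plainSize(names));
        System.out.println("shared: " + sharedSize(names));
    }
}
```

The lookup in `seen` is also a small stand-in for the extra work a writer does to find redundancy: it costs something on every name, and only pays off when names actually repeat.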

Time-efficiency is a bit more complicated to analyse, but the difference comes from the fact that Smile does more work when writing, trying to determine whether there is redundancy to eliminate. If there is, reading becomes faster; if not, there is no read benefit, only a bit of additional write overhead.
So where Smile's size compaction works best, Smile will be faster to read; and where the size difference is of less consequence, reading performance will be similar.
For writing, CBOR is faster where the size difference is negligible; and performance will probably be comparable for longer streams.

So one could summarize this by saying that for longer data streams, Smile will be faster to process (considering both reading and writing); and for smaller payloads like web service requests/responses, performance will be comparable, possibly with a small benefit for CBOR if there is no redundancy to exploit.

At the end of the day, we (FasterXML) will work hard to have the best support for both Smile and CBOR on the Java platform. So feel free to choose whichever you want.

We also hope to get feedback on your practical experiences, along with suggestions and recommendations, to further improve support.

-+ Tatu +-

Tatu Saloranta

Jun 7, 2014, 6:32:00 PM

to jackso...@googlegroups.com

Forgot to add one footnote: regarding CBOR performance, there will be one important fix in 2.4.1, over 2.4.0 of jackson-core. Without the fix, CBOR write performance is slower than Smile's even for small messages.
With the fix, CBOR is typically a bit (20%) faster for small/medium-sized messages. I am also working on possibly improving performance of the CBOR+Afterburner combination on the parsing side (by implementing JsonParser.nextFieldName(...) properly for CBOR, so that it works optimally).

So when evaluating performance differences, 2.4.1 will hopefully be a good baseline where both Smile and CBOR codecs are fully optimized.