Simple Binary Encoding


Naveen Michaud-Agrawal

Apr 23, 2014, 9:18:34 PM
to capn...@googlegroups.com
I've been playing around with the Disruptor pattern (http://lmax-exchange.github.io/disruptor/) and capnproto looks like an ideal serialization format for the ring-buffer messages. The author of Disruptor has come out with a new in-memory serialization format, and I was wondering if anyone has heard of it:


Regards,
Naveen

Andrew Lutomirski

Apr 23, 2014, 9:23:00 PM
to Naveen Michaud-Agrawal, capnproto
As far as I can tell, it's one of several new encodings for FIX
messages. FIX is an abomination. FIX SBE requires an alarmingly
complicated schema specification added on to the normal
barely-existent FIX schema specification, and it results in a
less-awful wire format.

In short: don't use it unless you're talking to someone who uses it.
If you need to speak some form of FIX, it may be the least bad option.

--Andy

Rajiv Kurian

Apr 24, 2014, 3:25:14 AM
to capn...@googlegroups.com
Any specific criticisms against SBE (besides it being a FIX impl)? The examples don't seem that ugly. The performance numbers look stellar and there are native implementations in Java and C++. They seem to use techniques similar to Cap'n Proto.

Kenton Varda

May 6, 2014, 2:42:55 AM
to Naveen Michaud-Agrawal, capnproto
SBE got a little press today and I finally looked at it myself.

In certain narrow use cases, it may work OK and be slightly faster than Cap'n Proto. However, that speed comes with some significant functionality costs:

1) There is no bounds checking at all, at least in the C++ bindings. Reading a message you don't trust can trivially crash you, or perhaps even enable a Heartbleed-like attack by tricking you into reading data past the end of the message. Being able to accept untrusted inputs clearly wasn't a design goal for them, which is fair enough for many use cases.

Cap'n Proto is intended to be secure, although it has not yet received a formal security review.
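
To make the bounds-checking point concrete, here is a minimal Python sketch. It is not SBE's or Cap'n Proto's actual code: the wire layout (a 2-byte little-endian length followed by that many payload bytes) and the names `read_blob_unchecked` and `read_blob_checked` are assumptions made up for illustration. The unchecked reader trusts the declared length. In Python the slice simply comes back short, whereas the same logic written with raw pointers in C++ would read whatever memory follows the buffer.

```python
import struct

def read_blob_unchecked(buf: bytes, offset: int = 0) -> bytes:
    """Trust the declared length (hypothetical layout: u16 length + payload)."""
    (length,) = struct.unpack_from("<H", buf, offset)
    start = offset + 2
    # No check that start + length fits inside buf. Python slicing silently
    # truncates; unchecked C++ pointer arithmetic would read past the end.
    return buf[start:start + length]

def read_blob_checked(buf: bytes, offset: int = 0) -> bytes:
    """Same layout, but reject any length that runs past the buffer."""
    if offset < 0 or offset + 2 > len(buf):
        raise ValueError("truncated length prefix")
    (length,) = struct.unpack_from("<H", buf, offset)
    start = offset + 2
    if start + length > len(buf):
        raise ValueError("declared length exceeds message size")
    return buf[start:start + length]

# A well-formed message and a hostile one that claims 1000 payload bytes.
good = struct.pack("<H", 5) + b"hello"
evil = struct.pack("<H", 1000) + b"hi"
```

Calling `read_blob_checked(evil)` raises `ValueError`, while `read_blob_unchecked(evil)` quietly returns whatever bytes happen to be there.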

2) The schemas are XML. Compare this:


To this:


(Note that these schemas don't represent the same thing. It seems we both coincidentally chose car descriptions as a benchmark subject.)

3) You can _only_ read and write data sequentially. If your message contains lists, there is potentially no way to know where a particular entry in the list is located until you've processed all the data before it, because list entries can have nested lists and thus are variable-width. The (C++) API doesn't even appear to give you a way to seek backwards to items you've seen previously. (Actually, it looks like the API makes it quite easy to shoot yourself in the foot by accessing things in the wrong order. If your message contains two lists "cats" and "dogs", there's nothing stopping you from accidentally calling the dogs() accessor first and getting garbage out.)

They seem to describe this as a feature, pointing out that sequential access is faster. Sure, but sometimes the thing you want to do just isn't sequential. Sometimes you want to pull one data point out of the middle of a large structure. Sometimes you want to reference entries in some list by index. Sometimes you want a hash table. It looks like all these things are pretty inconvenient in SBE.

In fact, requiring sequential access defeats one of the biggest advantages of using this sort of CPU-friendly encoding in the first place: the ability to mmap() a huge data structure wholesale and just use it without doing a pass over the content.
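
As a rough illustration of the mmap() point, the Python sketch below maps a file and decodes a single record from the middle without touching the records before it. The record layout (a u32 id plus an f64 price, 12 bytes) and the names `RECORD`, `write_records`, `read_record` and `demo` are assumptions for this example only; it shows neither SBE's nor Cap'n Proto's real format, just the general idea that fixed or indexed positions let you jump straight to the data.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width record: u32 id, f64 price, no padding (12 bytes).
RECORD = struct.Struct("<Id")

def write_records(path, records):
    """Write (id, price) pairs back to back."""
    with open(path, "wb") as f:
        for rec_id, price in records:
            f.write(RECORD.pack(rec_id, price))

def read_record(path, index):
    """Map the file and decode only the record at `index`."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            offset = index * RECORD.size
            if index < 0 or offset + RECORD.size > len(view):
                raise IndexError("record index out of range")
            return RECORD.unpack_from(view, offset)

def demo():
    """Write 100,000 records to a temp file, then fetch one from the middle."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_records(path, [(i, i * 1.5) for i in range(100_000)])
        return read_record(path, 54_321)
    finally:
        os.remove(path)
```

With a format that can only be read front to back, the equivalent of `read_record` would have to decode every earlier entry first, and the operating system would page in all of that data along the way.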

In contrast, Cap'n Proto uses pointers to enable random access, just like regular in-memory data structures normally do.
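
The difference between the two access patterns can be sketched in a few lines of Python. This is a toy, not Cap'n Proto's real pointer encoding or SBE's real group encoding: the layouts (entries stored as a u16 length plus bytes, optionally preceded by a table of u32 offsets) and all function names are assumptions for illustration. The sequential reader has to walk every earlier entry to find entry n, while the indexed reader follows one stored offset.

```python
import struct

def encode_sequential(items):
    """Entries back to back, each as u16 length + bytes (hypothetical layout)."""
    return b"".join(struct.pack("<H", len(it)) + it for it in items)

def nth_sequential(buf, n):
    """Find entry n by walking every entry before it.

    Returns (entry, number of length headers read). An out-of-range n
    surfaces as struct.error in this toy version.
    """
    pos = 0
    headers_read = 0
    while True:
        (length,) = struct.unpack_from("<H", buf, pos)
        headers_read += 1
        if headers_read == n + 1:
            return buf[pos + 2:pos + 2 + length], headers_read
        pos += 2 + length

def encode_indexed(items):
    """u32 count, one u32 offset per entry, then the same entry bytes."""
    offsets, pos = [], 0
    for it in items:
        offsets.append(pos)
        pos += 2 + len(it)
    header = struct.pack("<I", len(items))
    header += struct.pack(f"<{len(items)}I", *offsets)
    return header + encode_sequential(items)

def nth_indexed(buf, n):
    """Find entry n by following its stored offset; no scan needed."""
    (count,) = struct.unpack_from("<I", buf, 0)
    if not 0 <= n < count:
        raise IndexError("entry index out of range")
    (rel,) = struct.unpack_from("<I", buf, 4 + 4 * n)
    start = 4 + 4 * count + rel
    (length,) = struct.unpack_from("<H", buf, start)
    return buf[start + 2:start + 2 + length]
```

The offset table costs 4 bytes per entry here, which is the kind of overhead a pointer-based format pays in exchange for random access.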

-Kenton


On Wed, Apr 23, 2014 at 6:18 PM, Naveen Michaud-Agrawal <naveen.mic...@gmail.com> wrote:

--
You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+...@googlegroups.com.
Visit this group at http://groups.google.com/group/capnproto.

Rajiv Kurian

May 8, 2014, 1:41:08 PM
to capn...@googlegroups.com, Naveen Michaud-Agrawal
(2) is a minor inconvenience but (1) and (3) are very valid points. Their design seems really focused on trading use cases, where you would sequentially iterate through the entire data.

Tony Arcieri

May 8, 2014, 1:45:49 PM
to Kenton Varda, Naveen Michaud-Agrawal, capnproto
On Mon, May 5, 2014 at 11:42 PM, Kenton Varda <temp...@gmail.com> wrote:
1) There is no bounds checking at all, at least in the C++ bindings. Reading a message you don't trust can trivially crash you, or perhaps even enable a Heartbleed-like attack by tricking you into reading data past the end of the message. Being able to accept untrusted inputs clearly wasn't a design goal for them, which is fair enough for many use cases.

Wow.

--
Tony Arcieri

Kenton Varda

May 8, 2014, 10:01:36 PM
to Tony Arcieri, Naveen Michaud-Agrawal, capnproto
FWIW, they say they're going to add checks. They also claim that the expectation all along was that the caller would do their own bounds checking as they proceed through the message. I guess they expect that most apps have enough additional context to get away with fewer checks than would be necessary if implemented by SBE itself, and they feel this is important for performance. I'm skeptical. There is a thread about it here:


In general, SBE seems to be designed with a "manual shift" philosophy -- they leave it up to the application to implement a lot of details on the assumption that the application can make better decisions based on context. If you use SBE, I'd strongly advise studying the generated code's workings so you know exactly what it does and doesn't do.

Cap'n Proto is designed to be more "automatic", but adds a bit of overhead for it.
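
The "manual" versus "automatic" trade-off might look roughly like the Python sketch below. Everything in it is hypothetical: the 13-byte field block (`price`, `quantity`, `side`) and the classes `AutomaticReader` and `ManualReader` are invented for illustration and do not come from either library. The `checks` counter tracks only the explicit checks written here; Python's `struct` module still validates internally, so this shows the shape of the design rather than its performance.

```python
import struct

# Hypothetical fixed block: name -> (struct format, byte offset).
FIELDS = {"price": ("<q", 0), "quantity": ("<I", 8), "side": ("<B", 12)}
BLOCK_SIZE = 13

class AutomaticReader:
    """The library checks bounds on every field access."""

    def __init__(self, buf):
        self.buf = buf
        self.checks = 0

    def get(self, name):
        fmt, offset = FIELDS[name]
        self.checks += 1
        if offset + struct.calcsize(fmt) > len(self.buf):
            raise ValueError(f"field {name!r} lies outside the message")
        return struct.unpack_from(fmt, self.buf, offset)[0]

class ManualReader:
    """The caller validates the whole block once; accessors add no checks."""

    def __init__(self, buf):
        # One up-front check covers every fixed-offset field below.
        if len(buf) < BLOCK_SIZE:
            raise ValueError("message shorter than the fixed block")
        self.buf = buf
        self.checks = 1

    def get(self, name):
        fmt, offset = FIELDS[name]
        return struct.unpack_from(fmt, self.buf, offset)[0]
```

The single up-front check only works because every field sits at a known offset inside a fixed-size block. Once variable-length data is involved, the caller has to repeat that reasoning for each length it reads, which is where a forgotten check turns into an out-of-bounds read.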

-Kenton

narenb...@gmail.com

Apr 1, 2016, 1:25:11 AM
to Cap'n Proto

Hi,

I have a small question regarding Simple Binary Encoding. How can I allocate the size of a ByteBuffer dynamically?

Thanks 
Naren

Kenton Varda

Apr 1, 2016, 4:59:56 PM
to narenb...@gmail.com, Cap'n Proto
Hi Naren,

This mailing list is for Cap'n Proto, which is a competitor to SBE. We don't know much about how SBE works here -- you'd probably have more success finding an SBE-specific mailing list or forum for your question.

Sorry I can't be more helpful.

-Kenton
