CapnProto vs gRPC/Protobuf differences - semantics and use case

Jonathan Shapiro

Mar 17, 2023, 2:43:49 PM
to Cap'n Proto
So I've been much-belatedly looking at capn-proto lately, and I'd like to see if I understand the key differences between CapnProto and gRPC.

I'm not interested so much in the surface syntax differences. Right now I'm not paying attention to Level 3 either - I'm already familiar with MarkM's work on CapTP and E.

The essential semantic differences seem to be:
  • Interface definitions define a type, and interface references can be carried in messages.
  • More specifically, interfaces define an object type - there's an implicit object identity embedded in each interface instance, which is passed with each method invocation.
  • Method arguments are first class. The gRPC approach on this always seemed like a really bad decision.
  • There's a very reasonable take on a module system. Having fought with this on the GraphQL front for a while, it's really nice to see.
  • Method results are returned as promises. Creatively used, these subsume any need for streams.
  • capn-proto is less obsessively wire-centric; the impedance matching between what the consuming client or server wants to see and what the protocol layer wants to see seems much better handled.
On the use case front, it seems to me that the two are optimized for different situations:
  • The gRPC+protobuf encoding scheme is optimized for use over lower bandwidth links, but embeds the assumption that decoding upon receipt will proceed linearly and to completion (because random access isn't straightforward).
  • The capn-proto encoding scheme is optimized for local area RPC and/or out-of-process plugins, where communication bandwidth isn't much of a limiting factor but efficient transmission (perhaps even by mmap) matters.
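
As a rough illustration of the random-access point in the first bullet, here is a minimal Python sketch. Neither layout is the real Protobuf or Cap'n Proto wire format, and `encode_varint`, `nth_varint`, and `nth_fixed` are hypothetical helpers made up for this example. It only shows that a variable-length stream has to be scanned from the start to reach the n-th value, while a fixed-width layout can be indexed by offset.

```python
import struct

def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def nth_varint(buf: bytes, n: int) -> int:
    """Return the n-th value; every earlier value must be decoded first."""
    pos = 0
    for _ in range(n + 1):
        value, shift = 0, 0
        while True:
            byte = buf[pos]
            pos += 1
            value |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
    return value

def nth_fixed(buf: bytes, n: int) -> int:
    """Return the n-th value by computing its offset; nothing else is read."""
    return struct.unpack_from("<Q", buf, n * 8)[0]

values = [3, 300, 70000, 5, 2**40]
varint_buf = b"".join(encode_varint(v) for v in values)
fixed_buf = b"".join(struct.pack("<Q", v) for v in values)
```

The varint buffer is smaller, but `nth_varint` does work proportional to `n`, whereas `nth_fixed` does a single offset computation.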


What have I missed here that is fundamental?  Having worked a fair bit with both gRPC and GraphQL, I have one or two really minor thoughts for enhancement; mainly things that already seem to be implicitly present should be made explicit.

Aside from some of the individual language mappings, capn-proto looks really good. And all of the language mapping complaints reflect constraints of the target language rather than capn-proto. Which, given the real care that Kenton put into this, doesn't suggest favorable things about one or two of those languages. :-)


Jonathan Shapiro

Jonathan Shapiro

Mar 17, 2023, 3:10:14 PM
to Cap'n Proto
I missed one under semantics:
  • capn-proto structs are defined as reference (pointer) types, while protobuf message types appear to be value types.
Does capn-proto support the case where a single struct is referenced from multiple places? That is: does it support graphs as messages?

Thanks!

Ian Denhardt

Mar 17, 2023, 3:38:16 PM
to Cap'n Proto, Jonathan Shapiro
I think this is mostly a good summary. I'd maybe call out/emphasize that
I think the *most* fundamental difference at the serialization level (as
opposed to RPC) is that Cap'n Proto is designed for efficient in-memory
access, not requiring an up-front parsing step. Ironically, this is most
compelling for use cases that don't have much to do with RPC.

Quoting Jonathan Shapiro (2023-03-17 15:10:14)
> I missed one under semantics:
>
> * capn-proto structs are defined as reference (pointer) types, while
> protobuf message types appear to be value types.

I would amend this to clarify that Protobuf doesn't really have pointer
types at all, since (1) the encoding is inherently a tree (as opposed to
only by fiat, see below), and (2) it is oriented towards up-front
parsing instead of in-place access.

> Does capn-proto support the case where a single struct is referenced
> from multiple places? That is: does it support graphs as messages?

It does not, at least officially, though it's obvious how these would be
encoded and (1) basic reading of such messages will generally work as
expected, (2) not all implementations will do anything to stop you from
building these. Note though that things like canonicalization will
remove the sharing.


Kenton Varda

Mar 27, 2023, 8:18:23 PM
to Jonathan Shapiro, Cap'n Proto
Hi Jonathan,

On Fri, Mar 17, 2023 at 2:10 PM Jonathan Shapiro <sh...@buttonsmith.com> wrote:
> I missed one under semantics:
>   • capn-proto structs are defined as reference (pointer) types, while protobuf message types appear to be value types.
> Does capn-proto support the case where a single struct is referenced from multiple places? That is: does it support graphs as messages?

I think you might be confusing semantics vs. encoding details here. Structs are encoded using a pointer that points to the content located elsewhere in the message buffer. However, they nevertheless behave like value types. The semantics are just about exactly the same as in Protobuf.

Cap'n Proto does not support graphs. Given the use of pointers, it may be obvious how graphs would be encoded, if we supported them. The problem with graphs is that they make so much else in the implementation vastly more complicated. For example, say I do `message1.setFoo(message2.getFoo())`, where `foo` has a struct type. We have to copy `foo` from one message buffer into another. With trees, this is a trivial recursive operation. But if graphs are allowed, now we must maintain a lookup table to remember which pointers we've already followed. Moreover, if on the next line I do `message1.setBar(message2.getBar())`, and it turns out `foo` and `bar` both pointed to a common third object, how do we make sure we don't make a redundant copy of that? It seems we now have to maintain a mapping table long-term for any pair of messages for which copies have occurred.
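
To make the copying problem concrete, here is a minimal Python sketch. It does not use Cap'n Proto at all: `Node`, `copy_tree`, and `copy_graph` are hypothetical stand-ins for structs and for the copy performed by a setter. It shows that a plain recursive copy is enough for trees, and that preserving sharing needs a lookup table that outlives the individual copy calls.

```python
class Node:
    """A stand-in for a struct; `children` maps field names to sub-structs."""
    def __init__(self, name, **children):
        self.name = name
        self.children = children

def copy_tree(node):
    """Plain recursive copy: correct for trees, duplicates shared nodes."""
    return Node(node.name, **{k: copy_tree(c) for k, c in node.children.items()})

def copy_graph(node, seen):
    """Sharing-preserving copy: `seen` maps id(source node) -> its copy."""
    if id(node) in seen:
        return seen[id(node)]
    clone = Node(node.name)
    seen[id(node)] = clone  # register before recursing so cycles terminate
    clone.children = {k: copy_graph(c, seen) for k, c in node.children.items()}
    return clone

# `foo` and `bar` both point at a common third object.
shared = Node("shared")
message2 = Node("root", foo=Node("foo", x=shared), bar=Node("bar", x=shared))

# Two independent copies, analogous to setFoo(...) followed by setBar(...).
foo_a = copy_tree(message2.children["foo"])
bar_a = copy_tree(message2.children["bar"])

# Preserving the sharing requires one table kept alive across both calls.
table = {}
foo_b = copy_graph(message2.children["foo"], table)
bar_b = copy_graph(message2.children["bar"], table)
```

With `copy_tree`, the shared object ends up copied twice. With `copy_graph`, it is copied once, but only because `table` is retained between the two calls, which is the long-term mapping the paragraph above describes.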

> On the use case front, it seems to me that the two are optimized for different situations:
>   • The gRPC+protobuf encoding scheme is optimized for use over lower bandwidth links, but embeds the assumption that decoding upon receipt will proceed linearly and to completion (because random access isn't straightforward).
>   • The capn-proto encoding scheme is optimized for local area RPC and/or out-of-process plugins, where communication bandwidth isn't much of a limiting factor but efficient transmission (perhaps even by mmap) matters.
Note that Cap'n Proto's serialization is not primarily designed for RPC at all, and indeed most users use the serialization but not the RPC. The serialization's biggest wins come when used as a format for large files that are read using mmap().
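
The mmap() use case can be sketched in Python with the standard library. This is an assumption-laden illustration, not Cap'n Proto's file format: the 16-byte `RECORD` layout, `write_records`, and `read_record` are invented for the example. The point is only that a fixed layout lets a reader map a large file and decode one record without parsing anything before it.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<QQ")  # hypothetical record: (id, value), 16 bytes

def write_records(path, count):
    """Write `count` fixed-size records; record i holds (i, i * i)."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(RECORD.pack(i, i * i))

def read_record(path, index):
    """Map the file read-only and decode only the requested record."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return RECORD.unpack_from(view, index * RECORD.size)

path = os.path.join(tempfile.mkdtemp(), "records.bin")
write_records(path, 100_000)
record = read_record(path, 54_321)
```

Only the pages that hold the requested record need to be touched; the operating system decides what is actually read from disk.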

When it comes to RPC, the serialization might lend itself nicely to communications via shared memory, but I'm not sure if anyone has actually tried that (yet).

I would not necessarily say that Protobuf is "optimized" for low bandwidth. When using a low-bandwidth link, I would highly recommend applying compression to either format, which will have greater impact than Protobuf's encoding techniques (and will narrow the gap created by those techniques).
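
A small experiment along these lines can be sketched in Python. The two encodings are simplified stand-ins (a bare varint stream and bare 8-byte little-endian words), not the actual Protobuf or Cap'n Proto formats, and the input is synthetic, so the exact numbers will differ for real messages. `encode_varint` is a hypothetical helper written for this example.

```python
import random
import struct
import zlib

def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

# Synthetic data: 10,000 small integers from a seeded generator.
rng = random.Random(0)
values = [rng.randrange(300) for _ in range(10_000)]

varint_raw = b"".join(encode_varint(v) for v in values)
fixed_raw = b"".join(struct.pack("<Q", v) for v in values)

varint_z = zlib.compress(varint_raw)
fixed_z = zlib.compress(fixed_raw)

# Size of the fixed-width form relative to the varint form.
raw_ratio = len(fixed_raw) / len(varint_raw)
compressed_ratio = len(fixed_z) / len(varint_z)

print(f"raw: {len(varint_raw)} vs {len(fixed_raw)} bytes (x{raw_ratio:.2f})")
print(f"zlib: {len(varint_z)} vs {len(fixed_z)} bytes (x{compressed_ratio:.2f})")
```

On this input the fixed-width form is several times larger before compression, and the ratio between the two shrinks once both are compressed, since the padding zeros are cheap for a general-purpose compressor to remove.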

-Kenton