Fast cross-language binary serialisation format?


jonathan....@gmail.com

Jun 26, 2014, 3:59:51 AM
to golan...@googlegroups.com
I was wondering what the best option would be for sending binary data (structs containing ints and slices of struct{int, int} pairs, in this case) over TCP to clients written in other languages. It would be great, for instance, if there was a C library for decoding structs encoded in Go's gob encoding format.

On a related note, does encoding/gob use reflection? If so, that would probably be too slow for my use case, and unnecessary considering that the structure of the data is constant.

egon

Jun 26, 2014, 4:08:08 AM
to golan...@googlegroups.com, jonathan....@gmail.com
For what purpose? Why are you sending those pairs?

Anyway, I recently found this: http://kentonv.github.io/capnproto/

+ egon

jonathan....@gmail.com

Jun 26, 2014, 5:10:36 AM
to golan...@googlegroups.com, jonathan....@gmail.com
I'm sending those pairs as part of a data interchange format between a server written in Go and clients written in whatever language the users want to write in. The ints represent resource and property identifiers and values.

Cap'n Proto looks really impressive, thanks. It might be overkill for something like this, but then again it seems like the best option I've come across so far.

Donovan Hide

Jun 26, 2014, 5:25:41 AM
to Jonathan Barnard, golang-nuts
If your nested slices are fixed length, you can use an array instead and binary.Write/Read will work:
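A minimal sketch of the fixed-length case. The snippet originally linked here is not preserved in the thread, so the `Pair` and `Update` types and the `roundTrip` helper below are hypothetical, not Donovan's code. Note that encoding/binary only accepts fixed-size types, so the fields are `int32` rather than plain `int`.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Pair and Update are hypothetical types. Every field has a fixed size,
// which is what encoding/binary requires.
type Pair struct {
	ID    int32
	Value int32
}

type Update struct {
	Resource int32
	Pairs    [4]Pair // an array, not a slice: the size is known up front
}

// roundTrip writes u to a buffer, reads it back, and reports the encoded size.
func roundTrip(u Update) (Update, int, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, u); err != nil {
		return Update{}, 0, err
	}
	n := buf.Len()
	var out Update
	err := binary.Read(&buf, binary.LittleEndian, &out)
	return out, n, err
}

func main() {
	in := Update{Resource: 7, Pairs: [4]Pair{{1, 10}, {2, 20}, {3, 30}, {4, 40}}}
	out, n, err := roundTrip(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(n, out == in) // 36 bytes: 4 for Resource plus 4 pairs of 8 bytes
}
```

The byte order has to be agreed with the clients; little-endian is an arbitrary choice here.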


If they are variable length you could come up with your own simple encoding scheme. Just replace bytes.Buffer with net.Conn and you have a DIY protocol :-)
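One way such a scheme might look for the variable-length case is a count prefix followed by the pairs. This is only a sketch: `Pair`, `writePairs`, `readPairs`, the `maxPairs` cap and the little-endian byte order are all assumptions made for illustration. Because the functions take an `io.Writer` and `io.Reader`, a `net.Conn` can be passed in place of the `bytes.Buffer`.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// Pair is a hypothetical fixed-size record.
type Pair struct{ ID, Value int32 }

// writePairs sends a uint32 count followed by the pairs themselves.
func writePairs(w io.Writer, pairs []Pair) error {
	if err := binary.Write(w, binary.LittleEndian, uint32(len(pairs))); err != nil {
		return err
	}
	return binary.Write(w, binary.LittleEndian, pairs)
}

// readPairs reads the count, then exactly that many pairs.
func readPairs(r io.Reader) ([]Pair, error) {
	var n uint32
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return nil, err
	}
	const maxPairs = 1 << 16 // arbitrary cap so a bad peer cannot force a huge allocation
	if n > maxPairs {
		return nil, fmt.Errorf("pair count %d exceeds limit", n)
	}
	pairs := make([]Pair, n)
	if err := binary.Read(r, binary.LittleEndian, pairs); err != nil {
		return nil, err
	}
	return pairs, nil
}

// roundTrip encodes into an in-memory buffer and decodes again.
func roundTrip(pairs []Pair) ([]Pair, error) {
	var buf bytes.Buffer
	if err := writePairs(&buf, pairs); err != nil {
		return nil, err
	}
	return readPairs(&buf)
}

func main() {
	out, err := roundTrip([]Pair{{1, 10}, {2, 20}, {3, 30}})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Every client language then has to reimplement the same count-then-records layout by hand.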





egon

Jun 26, 2014, 5:28:05 AM
to golan...@googlegroups.com, jonathan....@gmail.com
On Thursday, 26 June 2014 12:10:36 UTC+3, jonathan....@gmail.com wrote:
I'm sending those pairs as part of a data interchange format between a server written in Go and clients written in whatever language the users want to write in. The ints represent resource and property identifiers and values.

What I meant is: what does the data look like? Is it highly repetitive? Is it in short bursts (i.e. polling connections)? Is it performance sensitive (e.g. a game)? Is a browser one of the intended clients? How many clients are expected? etc.
Depending on the characteristics, some formats work better than others.

The easiest setup is JSON, and most languages have libraries for it.
Protobuf could probably work as well.
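For reference, a minimal sketch of the JSON route using Go's standard encoding/json package. The `Pair` and `Update` types and their field names are hypothetical stand-ins for the data described earlier in the thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Pair and Update are hypothetical; the struct tags set the JSON field names.
type Pair struct {
	ID    int `json:"id"`
	Value int `json:"value"`
}

type Update struct {
	Resource int    `json:"resource"`
	Pairs    []Pair `json:"pairs"`
}

// encode marshals u to a JSON string.
func encode(u Update) (string, error) {
	b, err := json.Marshal(u)
	return string(b), err
}

// decode parses a JSON string back into an Update.
func decode(s string) (Update, error) {
	var u Update
	err := json.Unmarshal([]byte(s), &u)
	return u, err
}

func main() {
	s, err := encode(Update{Resource: 7, Pairs: []Pair{{1, 10}}})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // {"resource":7,"pairs":[{"id":1,"value":10}]}

	u, err := decode(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(u.Pairs[0].Value)
}
```

The text form is larger than a packed binary layout, which may or may not matter depending on the answers to the questions above.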

+ egon

Donovan Hide

Jun 26, 2014, 6:13:28 AM
to egon, golang-nuts, Jonathan Barnard
Got bored:


This is why people use serialization frameworks :-)

Donovan Hide

Jun 26, 2014, 6:29:52 AM
to egon, golang-nuts, Jonathan Barnard
Simplified: http://play.golang.org/p/Yn8Xe4-XIf

Going to stop now :-)

jonathan....@gmail.com

Jun 26, 2014, 6:47:28 AM
to golan...@googlegroups.com, egon...@gmail.com, jonathan....@gmail.com
On Thursday, 26 June 2014 20:29:52 UTC+10, Donovan wrote:

Nice, seems impressively concise as such things go.


On Thursday, 26 June 2014 19:28:05 UTC+10, egon wrote:
> What I meant is: what does the data look like? Is it highly repetitive? Is it in short bursts (i.e. polling connections)? Is it performance sensitive (e.g. a game)? Is a browser one of the intended clients? How many clients are expected? etc.
> Depending on the characteristics, some formats work better than others.

It's for servers as part of a distributed simulation. Basically think of something like an MMO server, with all actions by the clients passing through the server for validation and streaming to other clients, and the server also generating actions for passing to the clients. Clients are polled for data at regular intervals (1-10 seconds), and must receive input from the server as soon as it has finished processing data from all clients polled.
 

Naoki INADA

Jun 26, 2014, 7:18:40 AM
to golan...@googlegroups.com, jonathan....@gmail.com
msgpack is a JSON-like schemaless binary format.

egon

Jun 26, 2014, 8:02:27 AM
to golan...@googlegroups.com, egon...@gmail.com, jonathan....@gmail.com
If the protocol is simple, you are not going to extend it, and there are many languages to support, then Donovan's approach will suffice.

Of course, if you need to write the same code for multiple languages, it will be error prone.

Cap'n Proto is probably the safest bet in terms of performance and extensibility (it supports "the important" languages).
JSON and Protobuf are safer in terms of compatibility (most languages have libraries).
FlatBuffers could also be a good option, but I haven't seen Go libraries for it yet.

+ egon

Jonathan Barnard

Jun 26, 2014, 8:13:43 AM
to egon, golang-nuts
Thanks everyone! I think I'll probably go with either Cap'n Proto or msgpack (which seems to support quite a variety of languages); when I get closer to implementing it I'll maybe benchmark them to see if either's significantly faster.

andrewc...@gmail.com

Jun 26, 2014, 8:27:45 AM
to golan...@googlegroups.com, egon...@gmail.com, jonathan....@gmail.com
Google Protocol Buffers is fast and high quality, but you need to learn how to use it.

Ken Allen

Jun 26, 2014, 10:44:26 AM
to golan...@googlegroups.com, jonathan....@gmail.com
http://cbor.io/ is basically a better-designed msgpack. I don't think there's a general-purpose C library for decoding it, but the format is so trivial you can easily do it yourself.
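To give a feel for how small the format is, here is a sketch of a CBOR encoder that handles only arrays of unsigned integers. It is written in Go for consistency with the rest of the thread; `appendHead` and `encodeUints` are hypothetical names, and a real client would also need a decoder plus the other major types (negative integers, byte strings, maps and so on).

```go
package main

import "fmt"

// appendHead appends a CBOR head: a 3-bit major type in the top bits of the
// first byte, plus an unsigned argument in the shortest of the five
// encodings (inline, or 1, 2, 4 or 8 following big-endian bytes).
func appendHead(b []byte, major byte, n uint64) []byte {
	m := major << 5
	switch {
	case n < 24:
		return append(b, m|byte(n))
	case n <= 0xff:
		return append(b, m|24, byte(n))
	case n <= 0xffff:
		return append(b, m|25, byte(n>>8), byte(n))
	case n <= 0xffffffff:
		return append(b, m|26, byte(n>>24), byte(n>>16), byte(n>>8), byte(n))
	default:
		return append(b, m|27,
			byte(n>>56), byte(n>>48), byte(n>>40), byte(n>>32),
			byte(n>>24), byte(n>>16), byte(n>>8), byte(n))
	}
}

// encodeUints encodes vals as a CBOR array (major type 4) whose items are
// unsigned integers (major type 0).
func encodeUints(vals []uint64) []byte {
	b := appendHead(nil, 4, uint64(len(vals)))
	for _, v := range vals {
		b = appendHead(b, 0, v)
	}
	return b
}

func main() {
	fmt.Printf("%x\n", encodeUints([]uint64{1, 2, 500})) // 8301021901f4
}
```

The same head encoding is reused for array lengths and integer values, which is most of what keeps a hand-written implementation short.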

Damian Gryski

Jun 27, 2014, 7:57:59 AM
to golan...@googlegroups.com, jonathan....@gmail.com


On Thursday, June 26, 2014 9:59:51 AM UTC+2, jonathan....@gmail.com wrote:
I was wondering what the best option would be for sending binary data (structs containing ints and slices of struct{int, int} pairs, in this case) over TCP to clients written in other languages. It would be great, for instance, if there was a C library for decoding structs encoded in Go's gob encoding format.



Damian

jonathan....@gmail.com

Jun 27, 2014, 10:16:55 AM
to golan...@googlegroups.com, jonathan....@gmail.com
Thanks, that's fantastically useful! I'm quite surprised at the differences between the various Go msgpack implementations in the serialization benchmarks. It seems to suggest that marshalling/unmarshalling speed is determined as much by the implementation as by the encoding type.

The difference between goprotobuf and gogoprotobuf is also surprising.

Jonathan Barnard

Jun 27, 2014, 10:32:21 AM
to golang-nuts, Jonathan Barnard
Also, wow, the amount of runtime.mallocgc in the pprof results from the goser benchmarks suggests that the process could be sped up by using a different allocation strategy, something like arena allocation or an object pool such as sync.Pool. Assuming I'm not misreading the results.
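A rough sketch of the object-pool idea, reusing encode buffers through `sync.Pool`. The `bufPool` variable and the `encode` function are hypothetical and are not taken from the goser benchmarks. The encoder may still allocate internally, so whether this helps a given workload is something only a profile can confirm.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers so each message does not allocate a new one.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// encode serialises vals into a pooled buffer and passes the bytes to send.
// send must not keep the slice after it returns, because the buffer is reused.
func encode(vals []int32, send func([]byte) error) error {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)

	if err := binary.Write(buf, binary.LittleEndian, vals); err != nil {
		return err
	}
	return send(buf.Bytes())
}

func main() {
	err := encode([]int32{1, 2, 3}, func(b []byte) error {
		fmt.Println(len(b), "bytes")
		return nil
	})
	if err != nil {
		panic(err)
	}
}
```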

Jason E. Aten

Jun 29, 2014, 6:57:27 PM
to golan...@googlegroups.com, jonathan....@gmail.com
On Thursday, June 26, 2014 5:13:43 AM UTC-7, Jonathan Barnard wrote:
Thanks everyone! I think I'll probably go with either Cap'n Proto or msgpack (which seems to support quite a variety of languages); when I get closer to implementing it I'll maybe benchmark them to see if either's significantly faster.

Hi Jonathan,

I like capnproto a lot, and even maintain the Go bindings for capnproto; let me know if you have any questions.

Msgpack doesn't provide a schema. If you appreciate (or insist upon) compile-time error detection, or want version-evolution support (i.e. forward and backward compatibility as you add or rename fields in your structs over time), then the schema-with-compilation style serialization formats, like capnproto, protobuf, and thrift, are better choices. For the rest of the discussion, I'll assume that version evolution and strong type discipline are required, and neglect the more dynamic and loose formats.

For me, the decisive features in using capnproto were (1) performance; (2) graceful version evolution; and (3) that I can easily convert a human-readable, Lisp-like text description of a struct into binary-encoded data. This third feature is novel and under-appreciated, but it lets me construct type-checked DSLs for configuration and computation specification with ease. See the support for constant expressions in particular (http://kentonv.github.io/capnproto/language.html#constants).

Comparing capnproto, protobuf, and thrift (all three are fairly similar): capnproto's Java bindings were initiated last month, and are unfinished. If you need Java support today, then you are looking at protobuf or thrift. Also note: capnproto on Windows works with MinGW, but there isn't Visual Studio support at the moment, given Microsoft's lagging C++11 support. So if you need Visual Studio support on Windows today, capnproto is not for you.

Re Thrift: I don't know how good the Go bindings are for Thrift. My sense (e.g. https://groups.google.com/forum/#!topic/golang-nuts/MgELd_iOaI8 ) is that Thrift sees little usage from Go, although Thrift's integrated RPC and great multi-platform support (e.g. very good on Windows) may be compelling reasons to evaluate the various Go bindings out there.

In conclusion, protobuf is probably the safest choice for cross-language (and cross-platform) serialization and schema evolution; but capnproto has given me better performance, smaller binary sizes, and declarative, type-safe DSL support. This last was the decisive feature for me.

Best,
Jason

Justin Israel

Jun 29, 2014, 8:36:18 PM
to Jason E. Aten, golang-nuts, jonathan....@gmail.com
On Mon, Jun 30, 2014 at 10:57 AM, Jason E. Aten <j.e....@gmail.com> wrote:
Re Thrift: I don't know how good the go-bindings are for Thrift.  My sense (e.g. https://groups.google.com/forum/#!topic/golang-nuts/MgELd_iOaI8 ) is that Thrift sees little usage from Go, although the integrated RPC and great multi-platform support for Thrift (e.g. very good on Windows) may be compelling reasons to evaluate the various Go bindings out there.


Just to throw my two cents into the mix: I have been using the official Go Thrift bindings for a little while now and they have been working just fine. I have had to make sure to generate my Thrift bindings using a 1.x generator. But other than that, it's been good:

 

Jason E. Aten

Jun 30, 2014, 12:16:46 AM
to Justin Israel, golang-nuts, jonathan....@gmail.com
On Sun, Jun 29, 2014 at 5:35 PM, Justin Israel <justin...@gmail.com> wrote:

Just to throw my two cents into the mix: I have been using the official Go Thrift bindings for a little while now and they have been working just fine. I have had to make sure to generate my Thrift bindings using a 1.x generator. But other than that, it's been good:


Awesome! That's great to hear. Thanks Justin.

jonathan....@gmail.com

Jun 30, 2014, 1:32:29 AM
to golan...@googlegroups.com, jonathan....@gmail.com

On Monday, 30 June 2014 08:57:27 UTC+10, Jason E. Aten wrote:
Hi Jonathan,

I like capnproto a lot, and even maintain the Go bindings for capnproto; let me know if you have any questions.

Hi Jason,

Thanks for the detailed response. Would you consider the advantages of using capnproto for relatively simple data (around 4-5 struct types total, each with fewer than six fields, and containing only bytes, ints and arrays thereof) to outweigh the disadvantages of requiring people compiling the clients (in this case, the users) to have functioning C++11 compilers? I suppose I'm concerned that it could increase the difficulty users face in getting up and running. Say for instance the user wants to get started hacking on a Clojure or Haskell client; would the inconvenience of needing a C++11 compiler outweigh the conveniences provided by the use of capnproto over a schemaless format such as msgpack? Note that the user has no control over the data interchange format, so wouldn't be able to benefit from capnproto's ease of use regarding DSLs.

Cheers,
Jonathan

Jason E. Aten

Jun 30, 2014, 2:57:29 AM
to jonathan.t.barnard, golang-nuts
To be clear, Microsoft's C++11 slowness only impacts your clients if they want to access your data in the following three-pronged scenario: from (a) C++ code that is (b) on Windows and (c) insists on building with Visual Studio. Any other combination is unaffected. If you can use MinGW on Windows for your C++ clients, you are fine.

There aren't any Clojure or Haskell (or, as mentioned, Java) bindings for capnproto at the moment. Were you wondering about the scenario where one of your users decides to first make capnproto bindings for these languages? That would, after all, be a prerequisite to using your data if it was encoded in capnproto and you wanted (say) Haskell code to read it.

There is very good C++, Python, Lua (courtesy of CloudFlare), and Go support for serialization. There are also capnproto bindings for Ruby, OCaml, Erlang, and straight C listed (http://kentonv.github.io/capnproto/otherlang.html) that I have no experience with, and so cannot comment on.

Since you are asking about msgpack, it may be worth reviewing the tradeoff, which lies in using versioned structs versus using a dynamic map. That is the core difference. The reason, we are told, that Google uses a serialization format with forward- and backward-compatible schema evolution is that they need to upgrade large clusters in a piecemeal fashion. It is impractical to upgrade an entire worldwide cluster all at once without incredibly long downtimes. You may have a similar situation with your users: do you plan to add features to a central server, and have that new server code still work with old clients that may be online and talking to it? If so, then msgpack would put you in the situation (as with JSON) where you have to use maps for everything. This imposes a performance penalty on retrieving elements of the map (when compared to looking up a field in a struct), and requires that you and your clients write lots of field-checking code to deal with absent and/or unexpected values. Again, the question comes back to: are you read-latency sensitive, and wishing to skip the whole parsing-and-deserialization step? Since capnproto lays out data on disk exactly as it will be laid out in memory, there is no parsing step, and your reads can go screaming fast.
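To illustrate the field-checking burden described above, here is a small sketch contrasting a dynamic map with a fixed struct. It uses Go's standard encoding/json purely as a stand-in for a schemaless format (an assumption made so the example runs without third-party packages); the `Update` type and both helper functions are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// resourceFromMap decodes into a dynamic map, so presence and type have to
// be checked by hand for every field that is read.
func resourceFromMap(data []byte) (int, error) {
	var m map[string]interface{}
	if err := json.Unmarshal(data, &m); err != nil {
		return 0, err
	}
	v, ok := m["resource"]
	if !ok {
		return 0, errors.New("missing field: resource")
	}
	f, ok := v.(float64) // encoding/json stores numbers in an interface{} as float64
	if !ok {
		return 0, errors.New("resource is not a number")
	}
	return int(f), nil
}

// Update is a fixed struct: the decoder matches fields and rejects wrong types.
type Update struct {
	Resource int `json:"resource"`
}

// resourceFromStruct decodes into Update. A missing field is left at its
// zero value without an error.
func resourceFromStruct(data []byte) (int, error) {
	var u Update
	err := json.Unmarshal(data, &u)
	return u.Resource, err
}

func main() {
	fmt.Println(resourceFromMap([]byte(`{"resource":7}`)))
	fmt.Println(resourceFromMap([]byte(`{}`)))
	fmt.Println(resourceFromStruct([]byte(`{}`)))
}
```

Neither variant shows Cap'n Proto's zero-copy reads; the sketch only covers the map-versus-struct handling code.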

Best,
Jason

jonathan....@gmail.com

Jun 30, 2014, 4:20:23 AM
to golan...@googlegroups.com, jonathan....@gmail.com

On Monday, 30 June 2014 16:57:29 UTC+10, Jason E. Aten wrote:
To be clear, Microsoft's C++11 slowness only impacts your clients if they want to access your data in the following three-pronged scenario: from (a) C++ code that is (b) on Windows and (c) insists on building with Visual Studio. Any other combination is unaffected. If you can use MinGW on Windows for your C++ clients, you are fine.

I suppose what I should be asking is whether there is (or is likely to be in the near future) a binary distribution of capnproto, as that would greatly simplify installation on Windows clients. The installation page doesn't seem to provide one.

There aren't any Clojure or Haskell (or, as mentioned, Java) bindings for capnproto at the moment. Were you wondering about the scenario where one of your users decides to first make capnproto bindings for these languages? That would, after all, be a prerequisite to using your data if it was encoded in capnproto and you wanted (say) Haskell code to read it.

As capnproto seems to be growing in popularity, I was assuming that within a year or so there'll probably be bindings for languages like Clojure and Haskell.
 
Since you are asking about msgpack, it may be worth reviewing the tradeoff, which lies in using versioned structs versus using a dynamic map. That is the core difference. The reason, we are told, that Google uses a serialization format with forward- and backward-compatible schema evolution is that they need to upgrade large clusters in a piecemeal fashion. It is impractical to upgrade an entire worldwide cluster all at once without incredibly long downtimes. You may have a similar situation with your users: do you plan to add features to a central server, and have that new server code still work with old clients that may be online and talking to it? If so, then msgpack would put you in the situation (as with JSON) where you have to use maps for everything. This imposes a performance penalty on retrieving elements of the map (when compared to looking up a field in a struct), and requires that you and your clients write lots of field-checking code to deal with absent and/or unexpected values. Again, the question comes back to: are you read-latency sensitive, and wishing to skip the whole parsing-and-deserialization step? Since capnproto lays out data on disk exactly as it will be laid out in memory, there is no parsing step, and your reads can go screaming fast.

I think I wouldn't be able to benefit much from versioned structs in this case. As I described, it's a distributed simulation somewhat similar in structure to an MMO server, with the clients controlling agents in the simulation. The world structure is entirely unknown at compile time (it's loaded from JSON resources at startup), and can be arbitrarily modified at runtime, so it's already closer to "using maps for everything", or at least a form of component-entity system. It's possible to reduce the use of maps by essentially sticking all the loaded entities into a big array, and then, for each loaded action that references a specific entity name, replacing that name with the index of that entity in the array (kind of like compiling the loaded resources). This can also be used across a network: if the client and the server both have the same set of resources and load them in the same order, it's possible to refer to a resource purely by its index in the array of loaded resources. This is why most of the communication between clients and the server is just passing ints around (agent number a does action number b to object number c).

From what you describe, I suspect the dynamic nature of the simulation would prevent it from reaping many of the benefits of a schema-based protocol like capnproto or protobuf.
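The name-to-index scheme described above could be sketched roughly as follows. `Registry`, `Action` and the resource names are hypothetical; the only assumption carried over from the description is that both sides load the same names in the same order.

```go
package main

import "fmt"

// Registry assigns each resource name an index in load order, so both ends
// of a connection agree on the numbering as long as they load the same
// names in the same order.
type Registry struct {
	names []string
	index map[string]int32
}

// NewRegistry builds the table; duplicate names keep their first index.
func NewRegistry(names []string) *Registry {
	r := &Registry{index: make(map[string]int32, len(names))}
	for _, n := range names {
		if _, dup := r.index[n]; dup {
			continue
		}
		r.index[n] = int32(len(r.names))
		r.names = append(r.names, n)
	}
	return r
}

// ID resolves a name to its index (done once, when resources are loaded).
func (r *Registry) ID(name string) (int32, bool) {
	id, ok := r.index[name]
	return id, ok
}

// Name resolves an index received from the wire back to a name.
func (r *Registry) Name(id int32) (string, bool) {
	if id < 0 || int(id) >= len(r.names) {
		return "", false
	}
	return r.names[id], true
}

// Action is what goes over the wire: three indices and no strings.
type Action struct{ Agent, Verb, Object int32 }

func main() {
	r := NewRegistry([]string{"player", "open", "door"})
	agent, _ := r.ID("player")
	verb, _ := r.ID("open")
	object, _ := r.ID("door")
	a := Action{agent, verb, object}
	fmt.Println(a)

	name, _ := r.Name(a.Object)
	fmt.Println(name)
}
```

The bounds check in `Name` matters because the index arrives from an untrusted peer.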

Cheers,
Jonathan

Jason E. Aten

Jun 30, 2014, 5:30:16 AM
to golan...@googlegroups.com, jonathan....@gmail.com
On Monday, June 30, 2014 1:20:23 AM UTC-7, jonathan....@gmail.com wrote:

I suppose what I should be asking is whether there is (or is likely to be in the near future) a binary distribution of capnproto, as that would greatly simplify installation on Windows clients. The installation page doesn't seem to provide one.

It's impossible at the moment for Visual Studio to compile capnpc and the runtime libraries due to the use of C++11 in the capnpc source code, the runtime libs, and the generated C++ source itself. People were hopeful that Visual Studio 2013 Update 2 would be sufficient, but apparently it will not be (relevant discussion: https://groups.google.com/forum/#!searchin/capnproto/c$2B$2B11/capnproto/IK4zj_aVvOM/gaIFEp5kdRQJ ). The missing features are constexpr member functions and unrestricted unions. I'm sure someone will post a binary as soon as Microsoft makes it possible.


jonathan....@gmail.com

Jun 30, 2014, 8:06:15 AM
to golan...@googlegroups.com, jonathan....@gmail.com

On Monday, 30 June 2014 19:30:16 UTC+10, Jason E. Aten wrote:
It's impossible at the moment for Visual Studio to compile capnpc and the runtime libraries due to the use of C++11 in the capnpc source code, the runtime libs, and the generated C++ source itself. People were hopeful that Visual Studio 2013 Update 2 would be sufficient, but apparently it will not be (relevant discussion: https://groups.google.com/forum/#!searchin/capnproto/c$2B$2B11/capnproto/IK4zj_aVvOM/gaIFEp5kdRQJ ). The missing features are constexpr member functions and unrestricted unions. I'm sure someone will post a binary as soon as Microsoft makes it possible.

Would it be possible to compile a binary with mingw and distribute that to Windows users?

rw

Sep 17, 2014, 6:14:26 AM
to golan...@googlegroups.com, jonathan....@gmail.com
FWIW my Go port of FlatBuffers was merged a few months back: https://github.com/google/flatbuffers/pull/36