Stable Gob encoding

1,026 views
Skip to first unread message

ju...@benet.ai

unread,
Sep 26, 2016, 10:09:32 AM9/26/16
to golang-dev
Hey Everyone,

encoding/gob is awesome. I find the self-describing stream of Go types so much nicer to use than protobuf.

I noticed though that the gob encoder uses reflect.MapKeys() to encode, which as we all know is not stable (simple proof: https://play.golang.org/p/V3sRGLo2VI ). We need the encoded values to be stable for cryptographic applications (i.e. we need hash(gobEncode(v)) to be stable). We can do this in a separate package (gob-stable), but we figure many people share our concern so we may want this in the standard lib.

Questions:

- Are maps the only non-deterministic values? Or are there others that come to mind? Any other considerations to keep in mind?
- Would you be willing to merge a changeset that makes gob encoding streams stable (sorted)? If so, would you prefer this be a mode, or the default?
- Or, does this already exists and I just didn't see it?

Why I care:

I'm considering using Go and gob for a major part of the [IPFS project](https://ipfs.io). The part in question is a distributed authenticated data structure type system -- think data structures + hash links + protobuf. We are eager to use Go and gob because Go types are simple and powerful, and gob gives us a nice self-describing way to encode and distribute both types and values. If we do this, we'll push Go types and Gob into a bunch of other languages. (And yes, other languages -- like Haskell or Ocaml may fit better, but we think Go is tremendously approachable in a way other languages are not.

Separately, it may push us to use a restricted subset of Go (pure and functional-ish) to write data structure operations and distribute those too. If we do _that_, we may do Go -> WebAssembly (either through llvm IR or gopherjs...) with some embedded js vm...

Jan Mercl

unread,
Sep 26, 2016, 10:15:51 AM9/26/16
to ju...@benet.ai, golang-dev
On Mon, Sep 26, 2016 at 4:09 PM <ju...@benet.ai> wrote:

> - Would you be willing to merge a changeset that makes gob encoding streams stable (sorted)? 

Map keys are comparable but not necessarily ordered. I'm afraid sorting is not [directly] feasible.

--

-j

Brad Fitzpatrick

unread,
Sep 26, 2016, 10:31:08 AM9/26/16
to Jan Mercl, Juan Benet, golang-dev
Bytes can be ordered, though, and gob is all about turning things into bytes.


Jan Mercl

unread,
Sep 26, 2016, 10:40:25 AM9/26/16
to Brad Fitzpatrick, Juan Benet, golang-dev
On Mon, Sep 26, 2016 at 4:31 PM Brad Fitzpatrick <brad...@golang.org> wrote:

> Bytes can be ordered, though, and gob is all about turning things into bytes.

Then the problem turns into having to possibly, in the worst case, buffer everything except the last byte of the produced encoding.

--

-j

Bruno Albuquerque

unread,
Sep 26, 2016, 10:42:52 AM9/26/16
to Jan Mercl, Brad Fitzpatrick, Juan Benet, golang-dev
It might not be very efficient but, for example, if a specific map type implements the GobEncoder interface, it could simply get all keys, sort them and, then, make map lookups with the ordered keys to encode the data in a stable way.

--
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Brad Fitzpatrick

unread,
Sep 26, 2016, 10:47:26 AM9/26/16
to Jan Mercl, Juan Benet, golang-dev
Indeed. Buffering in general isn't particularly pleasant.

But on the other hand, in this specific case: if the thing being encoded is already in memory, the buffer can't be worse than a 100% increase in memory, and we're already a GC'd language and GOGC=100 is the default anyway, so we're basically already cool with heap usage being double what's strictly required at any point.

To be clear, I have no opinion on changing gob. Rob can decide.

Russ Cox

unread,
Sep 26, 2016, 11:33:22 AM9/26/16
to ju...@benet.ai, golang-dev
Hi,

The way forward here would be for you to fork gob and make your changes and learn from the experience what is important, what works and what doesn't, and then come back with a proposal for how to change gob. Making changes without that experience is a much tougher sell. You may find that there are other more problematic points about gob that you haven't considered and that mean you need an entirely different encoding anyway. For example, I think you may be underestimating the effort involved in making other languages deal with gobs in a reasonable way, as well as the possible need for generic handling of gob-encoded streams when you don't have corresponding structs.

By all means, use gob or a forked gob to get off the ground quickly and learn more. Gob is a great choice for doing that. But I suspect you will learn both that gob is not the right long-term answer for this kind of use and what the important criteria are for what you replace it with.

Russ

Juan Benet

unread,
Sep 28, 2016, 2:26:07 PM9/28/16
to Russ Cox, golang-dev
Thanks everyone. some responses below.

On Mon, Sep 26, 2016 at 10:31 AM, Brad Fitzpatrick <brad...@golang.org> wrote:
Bytes can be ordered, though, and gob is all about turning things into bytes.

Yep, what i was thinking.


On Mon, Sep 26, 2016 at 10:42 AM, Bruno Albuquerque <b...@bug-br.org.br> wrote:
It might not be very efficient but, for example, if a specific map type implements the GobEncoder interface, it could simply get all keys, sort them and, then, make map lookups with the ordered keys to encode the data in a stable way.

Yeah, an interface like this would be useful for large objects users want to stream-encode without duplicating memory.


On Mon, Sep 26, 2016 at 10:47 AM, Brad Fitzpatrick <brad...@golang.org> wrote:
Indeed. Buffering in general isn't particularly pleasant.

But on the other hand, in this specific case: if the thing being encoded is already in memory, the buffer can't be worse than a 100% increase in memory, and we're already a GC'd language and GOGC=100 is the default anyway, so we're basically already cool with heap usage being double what's strictly required at any point.

Yeah, i think this as well. The buffering option is likely ok for most normal use cases-- 

Stable encoding can always be an optional thing for those that need it, through a different encoder types. Will start that way at least and see how it goes.


On Mon, Sep 26, 2016 at 11:33 AM, Russ Cox <r...@golang.org> wrote:
Hi,

The way forward here would be for you to fork gob and make your changes and learn from the experience what is important, what works and what doesn't, and then come back with a proposal for how to change gob. Making changes without that experience is a much tougher sell.

Yes, sounds good.

Does anything else come to mind -- beyond maps -- that is indeterministic? (I can read the code of course, and make some diverse test objects, but checking in case something non obvious comes to mind.)
 
You may find that there are other more problematic points about gob that you haven't considered and that mean you need an entirely different encoding anyway. For example, I think you may be underestimating the effort involved in making other languages deal with gobs in a reasonable way, as well as the possible need for generic handling of gob-encoded streams when you don't have corresponding structs.

Yep-- thanks for the fair warning.
 
By all means, use gob or a forked gob to get off the ground quickly and learn more. Gob is a great choice for doing that. But I suspect you will learn both that gob is not the right long-term answer for this kind of use and what the important criteria are for what you replace it with.

Yep-- plan to test this thoroughly, but already it is promising: much simpler and user friendly for data struct definition and reading than protobuf, even without all the existing protobuf libs. (across go and js -- though gopherjs is cheating)

Curious what other things came to mind? I suspect something very good for our precise use case exists, and is buried in people's memory. (if the problem interests you, feel free to contact me to discuss more off list.)

Thanks,
Juan

Russ Cox

unread,
Sep 28, 2016, 4:31:45 PM9/28/16
to Juan Benet, golang-dev
I believe map values are the only ones that will encode differently each time you run a gob encoding. But there is also a lot of other flexibility in gob encodings that is not taken advantage of but still there. (For example, the fields in the type definition could be listed in any order, or extra fields might be listed that are never used, and so on.) If you care about a single canonical encoding for a given value, then that flexibility is a problem too.

Russ

Jan Mercl

unread,
Sep 28, 2016, 7:04:20 PM9/28/16
to Russ Cox, Juan Benet, golang-dev
On Wed, Sep 28, 2016 at 10:31 PM Russ Cox <r...@golang.org> wrote:

> I believe map values are the only ones that will encode differently each time you run a gob encoding.

Not only in the order of them. IINM, the gob encoding stream is stateful, ie. it may depend on encoded-before things. IOW, not even the encoding of a single thing/value is stable w/o knowing the current context.

PS: Custom marshallers can ruin any stable encoding*, so once again, the task is not feasible in the general case. Admitted, I assume the OP is not looking to handle the general case.

(*): Imagine for example serialized-time stamps etc.

--

-j

Ugorji

unread,
Sep 29, 2016, 7:08:41 AM9/29/16
to golang-dev, ju...@benet.ai
My recommendation is that you consider other encoding formats which are already supported on multiple languages, and use libraries that support a "canonical" or "stable" encoding. For example, messagepack and cbor are language-agnostic and have encoders for multiple languages. The primary go codec for it is https://godoc.org/github.com/ugorji/go/codec which I wrote, I implemented a "Canonical" flag following multiple user requests.

Hope this helps.

David Brophy

unread,
Jun 7, 2018, 9:48:25 AM6/7/18
to golang-dev
If you'd like a stable encoding/gob package, I've forked it and added some code to make it stable: https://github.com/dave/stablegob

My particular application doesn't need high performance, so I haven't worried about that, and as Jan pointed out above, this can be broken by custom marshalers, so only use if you understand the pitfalls.
Reply all
Reply to author
Forward
0 new messages