GobEncode/GobDecode efficiency issues


cev...@gmail.com

Dec 27, 2013, 10:00:14 AM
to golan...@googlegroups.com
Hi All,

I've been trying to use gobs for a project I am writing and have run into a space-efficiency issue. The problem is that gobs are stateful, self-describing encodings that put types into the stream, but the GobEncode and GobDecode interfaces do not let you reuse the existing state of the stream, so the type descriptions have to be re-encoded inside those methods. This is a big problem for small structs. If I have:

type Collection struct {
  data []*SmallItem
}

type SmallItem struct {
  value  int
  value2 int
}

type smallItemSerializer struct {
  Value  int
  Value2 int
}

and I implement GobEncode on SmallItem so that it creates a smallItemSerializer and gob-encodes that, then the type info for smallItemSerializer will be included with each SmallItem. Wouldn't a better GobEncode interface look like:

GobEncode(e *Encoder) error   instead of   GobEncode() ([]byte, error)
and then
GobDecode(d *Decoder) error   instead of   GobDecode([]byte) error

That way each SmallItem could add itself to the existing stream and avoid re-specifying the type info for smallItemSerializer for every item.

Is there a reason it's not done this way?
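[Editorial note: a minimal runnable sketch of the overhead described above. The helper name encodedSize and the concrete field values are illustrative, not from the original post; only the SmallItem/smallItemSerializer pattern comes from it.]

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// SmallItem hides its fields, so it needs custom gob encoding.
type SmallItem struct {
	value, value2 int
}

// smallItemSerializer is an exported-field shadow of SmallItem.
type smallItemSerializer struct {
	Value, Value2 int
}

// GobEncode must spin up a fresh gob.Encoder per item, so every item's
// payload carries its own copy of the smallItemSerializer type
// description -- the overhead the post complains about.
func (s SmallItem) GobEncode() ([]byte, error) {
	var buf bytes.Buffer
	err := gob.NewEncoder(&buf).Encode(smallItemSerializer{s.value, s.value2})
	return buf.Bytes(), err
}

func (s *SmallItem) GobDecode(b []byte) error {
	var ser smallItemSerializer
	if err := gob.NewDecoder(bytes.NewReader(b)).Decode(&ser); err != nil {
		return err
	}
	s.value, s.value2 = ser.Value, ser.Value2
	return nil
}

// encodedSize gob-encodes n SmallItems through one outer Encoder
// and reports the total stream length in bytes.
func encodedSize(n int) int {
	items := make([]SmallItem, n)
	for i := range items {
		items[i] = SmallItem{i + 1, i + 2}
	}
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(items); err != nil {
		panic(err)
	}
	return buf.Len()
}

func main() {
	one, ten := encodedSize(1), encodedSize(10)
	// The per-item cost is dominated by the repeated type description,
	// not by the two small ints being transmitted.
	fmt.Printf("1 item: %d bytes, 10 items: %d bytes, ~%d bytes/item\n",
		one, ten, (ten-one)/9)
}
```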

Thanks


Rob Pike

Dec 27, 2013, 8:39:34 PM
to cev...@gmail.com, golan...@googlegroups.com
The idea is to create a single Encoder for a stream of values. If you
create a new Encoder, you will need to transport the type information
again because it is a new stream, but if you send multiple values with
a single Encoder instance you should see the type information only
once.
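[Editorial note: the point above can be verified with a short sketch; the type Point and helper streamSize are illustrative names, not from the thread.]

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

type Point struct{ X, Y int }

// streamSize encodes n Points into one buffer and returns the total
// byte count, either reusing a single shared Encoder (type info sent
// once) or creating a fresh Encoder per value (type info resent).
func streamSize(n int, shared bool) int {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	for i := 0; i < n; i++ {
		if !shared {
			enc = gob.NewEncoder(&buf) // new stream: type info repeated
		}
		if err := enc.Encode(Point{i + 1, i + 2}); err != nil {
			panic(err)
		}
	}
	return buf.Len()
}

func main() {
	fmt.Println("one shared Encoder:", streamSize(10, true))
	fmt.Println("fresh Encoder each:", streamSize(10, false))
}
```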

If this is not the behavior you are seeing, please file an issue that
reproduces the problem.

-rob

Matvey Arye

Jan 2, 2014, 3:24:52 PM
to Rob Pike, golan...@googlegroups.com
Yes, I understand that creating a new Encoder requires passing the type information again. This is precisely the problem I am pointing to with the GobEncoder interface (which lets you write custom logic for encoding an object's values). The GobEncoder interface does not receive an encoder instance (the signature is GobEncode() ([]byte, error)), requiring you to create a new Encoder and thus lose the type info. Am I missing a way to get the instance of the encoder while using custom serialization logic (i.e. serializing private fields)? This is especially important for structs of structs, where the child structs might want to use custom encoding logic.

Thanks,
Mat

Rob Pike

Jan 2, 2014, 3:35:31 PM
to Matvey Arye, golan...@googlegroups.com
If you need a gob.Encoder to implement the GobEncoder interface,
you're not using the interface as it is intended to be used.

-rob

Kyle Lemons

Jan 2, 2014, 3:45:16 PM
to Rob Pike, Matvey Arye, golan...@googlegroups.com
On Thu, Jan 2, 2014 at 12:35 PM, Rob Pike <r...@golang.org> wrote:
If you need a gob.Encoder to implement the GobEncoder interface,
you're not using the interface as it is intended to be used.

I have used it on occasion when I had fields in my type that could not be marshaled by gob, so I needed to make a shadow type without them (this has since been fixed for func and chan, but I still need it in one place where I have a doubly-linked tree that I marshal as a singly-linked tree). I've also used it to marshal a shadow type with exported fields when the regular type has unexported fields.
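[Editorial note: a hedged sketch of the shadow-type trick described above, using a doubly-linked list rather than a tree for brevity; the names Node, nodeShadow, and roundTrip are invented for illustration.]

```go
package main

import (
	"bytes"
	"encoding/gob"
)

// Node is doubly linked. Letting gob follow Prev naively would loop
// forever, so we marshal a singly-linked shadow and rebuild Prev on
// decode. (Each GobEncode call nests a fresh gob stream, which also
// illustrates the repeated-type-info cost discussed in this thread.)
type Node struct {
	Value      int
	Next, Prev *Node
}

type nodeShadow struct {
	Value int
	Next  *Node // Prev omitted: the backward links are redundant
}

func (n *Node) GobEncode() ([]byte, error) {
	var buf bytes.Buffer
	err := gob.NewEncoder(&buf).Encode(nodeShadow{n.Value, n.Next})
	return buf.Bytes(), err
}

func (n *Node) GobDecode(b []byte) error {
	var s nodeShadow
	if err := gob.NewDecoder(bytes.NewReader(b)).Decode(&s); err != nil {
		return err
	}
	n.Value, n.Next = s.Value, s.Next
	if n.Next != nil {
		n.Next.Prev = n // restore the backward link
	}
	return nil
}

// roundTrip encodes a list and decodes it back through gob.
func roundTrip(head *Node) *Node {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(head); err != nil {
		panic(err)
	}
	var out *Node
	if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
		panic(err)
	}
	return out
}

func main() {
	head := &Node{Value: 1}
	head.Next = &Node{Value: 2, Prev: head}
	roundTrip(head) // Prev links are rebuilt inside GobDecode
}
```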

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Matvey Arye

Jan 2, 2014, 3:53:18 PM
to Rob Pike, golan...@googlegroups.com
I think I am not being clear, sorry. I do not want gob.Encoder to implement GobEncoder. In the example from my original post, I want to add the GobEncoder interface to SmallItem. That method will simply create a smallItemSerializer and gob-encode it (so that the private fields SmallItem.value and SmallItem.value2 are saved). But because I cannot reuse the Collection's encoder inside SmallItem.GobEncode(), I have to create a new encoder, serialize the smallItemSerializer, and return the []byte. This means I re-encode the type info for smallItemSerializer for every SmallItem. Wouldn't GobEncode(e *Encoder) error be a better interface, so I could reuse an encoder instance passed from the Collection to each SmallItem?

Dan Kortschak

Jan 2, 2014, 4:30:17 PM
to Rob Pike, Matvey Arye, golan...@googlegroups.com
I think Matvey is asking for an approach like the one used by Russ's xml.Marshaler interface, where the call to the interface method passes the encoder value as well.

Rob Pike

Jan 2, 2014, 6:33:53 PM
to Dan Kortschak, Matvey Arye, golan...@googlegroups.com
To answer your original question, yes, there's a reason it's done this
way. The output is a stateful stream that only the Encoder can write
to directly. It wouldn't work at all if clients could write to it. You
can't just wake up and write more stuff down a nested stream whenever
the mood strikes. (Plus the type info is per-stream.)

Don't work against the package, work with it. If you want to encode a
struct with gobs, recursively, let the system do it for you. Make the
struct fields exportable, or copy the data before encoding. Or just
encode your data! If you've really got something analogous to a couple
of ints, do something lightweight in your implementation of the
GobEncoder interface.
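[Editorial note: a sketch of the "lightweight" option suggested above, encoding two ints as varints by hand with no nested gob stream; the type point and helper roundTrip are illustrative names, not from the thread.]

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/gob"
	"errors"
	"fmt"
)

// point keeps its fields unexported but implements GobEncoder and
// GobDecoder with a tiny hand-rolled payload: two varints, no inner
// gob.Encoder, and therefore no repeated type descriptions.
type point struct {
	x, y int
}

func (p point) GobEncode() ([]byte, error) {
	buf := make([]byte, 2*binary.MaxVarintLen64)
	n := binary.PutVarint(buf, int64(p.x))
	n += binary.PutVarint(buf[n:], int64(p.y))
	return buf[:n], nil
}

func (p *point) GobDecode(b []byte) error {
	x, n := binary.Varint(b)
	if n <= 0 {
		return errors.New("point: bad x")
	}
	y, m := binary.Varint(b[n:])
	if m <= 0 {
		return errors.New("point: bad y")
	}
	p.x, p.y = int(x), int(y)
	return nil
}

// roundTrip pushes a point through a gob encode/decode cycle.
func roundTrip(p point) point {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(p); err != nil {
		panic(err)
	}
	var out point
	if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
		panic(err)
	}
	return out
}

func main() {
	fmt.Println(roundTrip(point{7, -3}))
}
```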

-rob