gob non-optimal serialization in case of interface


dev.nul...@gmail.com

Apr 4, 2021, 10:30:28 AM
to golang-nuts
Hi,
Gob is in general pretty efficient; however, I am seeing what seems to be an anomaly in its behavior when it comes to serializing interfaces.
To elaborate, I have a program defining a simple interface called Dimension and a struct Point. 
type Dimension interface{}
type Point struct {
X, Y, Z int64
}
In the first case I encode a slice of 10K Points stored in a []Dimension, and in the second case I do the same with 10K Points stored in a []Point. Serialized to disk, the former is 526KB whereas the latter is only 146KB. The type Point is registered using gob.Register.
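The comparison described above can be reproduced with a minimal sketch along these lines. It is not the linked sample code: the helper names `encodedSize` and `makeSlices` and the field values are assumptions made for illustration, and it encodes into memory rather than to disk, so the exact byte counts will differ from the figures quoted.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

type Dimension interface{}

type Point struct {
	X, Y, Z int64
}

func init() {
	// Concrete types stored in interface values must be registered with gob.
	gob.Register(Point{})
}

// encodedSize gob-encodes v into memory and returns the number of bytes written.
// (Hypothetical helper, not part of the original program.)
func encodedSize(v interface{}) int {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		panic(err)
	}
	return buf.Len()
}

// makeSlices builds the same n Points twice: once behind the interface type
// and once as a concrete slice. The field values are arbitrary.
func makeSlices(n int) ([]Dimension, []Point) {
	dims := make([]Dimension, 0, n)
	pts := make([]Point, 0, n)
	for i := 1; i <= n; i++ {
		p := Point{X: int64(i), Y: int64(i) * 2, Z: int64(i) * 3}
		dims = append(dims, p)
		pts = append(pts, p)
	}
	return dims, pts
}

func main() {
	dims, pts := makeSlices(10000)
	fmt.Println("[]Dimension bytes:", encodedSize(dims))
	fmt.Println("[]Point bytes:    ", encodedSize(pts))
}
```

Running it should show the []Dimension encoding coming out noticeably larger than the []Point encoding for the same data.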

My naive interpretation is that the type is getting encoded for every entry in the slice which seems non-optimal. I would hope that once the type description is encoded it would be referenced and it shouldn't result in this big difference.
In fact, serializing to JSON also yields only 300KB.
 
The full sample code can be found here

Am I missing something here or is this the expected behavior?
Regards
Monmohan

Axel Wagner

Apr 4, 2021, 11:19:37 AM
to dev.nul...@gmail.com, golang-nuts
I believe it is expected behavior (well… given that the gob format is pretty much defined as "whatever the gob package does", it surely is, at this point).
gob is able to decode types completely, without having a "framework" to work on - you need to pass a pointer to `json.Unmarshal`, so that the decoder knows what you expect, but that isn't necessary with gob. It does that by encoding interfaces using their type-name, followed by their value, and then looking up the type-name when decoding (which is why you have to `gob.Register`). Note that a `[]Dimension` *can* be heterogeneous, that is, it doesn't have to contain only one type of value. If you look at a hexdump of your data, you'll see that's what's happening - every point gets a `main.Point` in the output.

You are correct that gob is not super efficient in this case. There would probably be a denser packing (where, instead of a type-name, you use an index into an implicit table, for example). But at this point, it's just how gob works. I said it before and I say it again: I don't believe gob is a very good encoding format. It is convenient to use in Go, but IMO it also comes with significant drawbacks. You might want to look into, say, protobuf.
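If the slice is known to hold only one concrete type, one workaround that stays within gob is to convert it to a concrete slice before encoding. This is not the index-table packing mentioned above, just a simpler substitute, and it gives up the ability to mix types. A rough sketch, where `toPoints` is a hypothetical helper:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

type Dimension interface{}

type Point struct {
	X, Y, Z int64
}

// toPoints converts dims to a []Point. ok is false if any element is not a
// Point, in which case the caller has to fall back to encoding the []Dimension.
func toPoints(dims []Dimension) (pts []Point, ok bool) {
	pts = make([]Point, 0, len(dims))
	for _, d := range dims {
		p, isPoint := d.(Point)
		if !isPoint {
			return nil, false
		}
		pts = append(pts, p)
	}
	return pts, true
}

func main() {
	dims := []Dimension{Point{1, 2, 3}, Point{4, 5, 6}}

	pts, ok := toPoints(dims)
	if !ok {
		fmt.Println("slice is heterogeneous; encode it as []Dimension instead")
		return
	}

	// Encoding the concrete slice sends the type description once,
	// with no per-element type-name.
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(pts); err != nil {
		panic(err)
	}
	fmt.Println("encoded bytes:", buf.Len())
}
```

The decoding side then has to know it is reading a `[]Point`, which is exactly the information the per-element type-name would otherwise carry.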

Axel Wagner

Apr 4, 2021, 11:22:12 AM
to dev.nul...@gmail.com, golang-nuts
Oh, I should also note, to be fair: You *can* decode the gob into an `[]Dimension`, but you *can't* do the same with json. So the comparison with json is unfair insofar as the encoded json is actually incomplete - it doesn't hold the type-information of *which* `Dimension` implementation to use.
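The difference can be seen with a round trip through both encodings. A minimal sketch follows; `viaGob` and `viaJSON` are hypothetical helper names. One caveat: because `Dimension` in the original post is the empty interface, `json.Unmarshal` does succeed here, but the elements come back as generic maps rather than `Point` values. With a non-empty interface type the JSON decode would fail outright.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"encoding/json"
	"fmt"
)

type Dimension interface{}

type Point struct {
	X, Y, Z int64
}

func init() {
	gob.Register(Point{})
}

// viaGob encodes in with gob and decodes it back into a fresh []Dimension.
func viaGob(in []Dimension) ([]Dimension, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return nil, err
	}
	var out []Dimension
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

// viaJSON does the same round trip with encoding/json.
func viaJSON(in []Dimension) ([]Dimension, error) {
	data, err := json.Marshal(in)
	if err != nil {
		return nil, err
	}
	var out []Dimension
	err = json.Unmarshal(data, &out)
	return out, err
}

func main() {
	in := []Dimension{Point{X: 1, Y: 2, Z: 3}}

	g, err := viaGob(in)
	if err != nil {
		panic(err)
	}
	fmt.Printf("gob element type:  %T\n", g[0])

	j, err := viaJSON(in)
	if err != nil {
		panic(err)
	}
	fmt.Printf("json element type: %T\n", j[0])
}
```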

Robert Engels

Apr 4, 2021, 6:43:31 PM
to Axel Wagner, dev.nul...@gmail.com, golang-nuts
Since Dimension can hold a value of any type, gob needs to encode the type for each element of the slice. I would file an enhancement request to use run-length encoding (RLE) when encoding interface slices. It won't be backwards compatible, though.


Amnon

Apr 5, 2021, 1:47:20 AM
to golang-nuts
I would agree with Axel that while gob is convenient, it never claims to be optimal.
And if you care about size and speed of encoding, then it may be worth looking at
other serialisation formats.

See https://ugorji.net/blog/benchmarking-serialization-in-go for some alternatives.

dev.nul...@gmail.com

Apr 5, 2021, 9:27:20 AM
to golang-nuts
Thanks everyone for the insights and suggestions.