Possible gob encoder problem


rif

Feb 29, 2012, 4:30:32 AM
to golan...@googlegroups.com
I am encoding a slice of custom structs using gob.

I wrote a function (see below) to test whether the encoding is correct, and it fails in every case. However, if I change just the encoder from gob to json, it passes every time.

Are there any use cases where json can encode while gob fails by design?

func testGob(key string, aps []*timespans.ActivationPeriod) {
    var buf bytes.Buffer
    enc := gob.NewEncoder(&buf)
    dec := gob.NewDecoder(&buf)

    enc.Encode(aps)
    result := buf.String()

    aps1 := make([]*timespans.ActivationPeriod, 0)
    buf.Reset()
    buf.WriteString(result)
    err := dec.Decode(&aps1)
    log.Print("Err: ", err)

    buf.Reset()
    enc.Encode(aps1)
    result1 := buf.String()

    log.Print("Equal? ", result == result1, len(result), len(result1))
}

Kyle Lemons

Feb 29, 2012, 12:56:41 PM
to rif, golan...@googlegroups.com
You're comparing the encoded values, nothing more.  Gob transmits type information as well as the data.  Create a new encoder for the second encode (so it no longer assumes that the "receiver" has the type info and will re-send it) and I think you'll find that it works.

If you compare the decoded value and the encoded value, you should find them the same regardless of the wire encoding, which shouldn't be of much concern as long as it works.
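A minimal sketch of that suggestion (the period type here is a made-up stand-in for timespans.ActivationPeriod):

package main

import (
    "bytes"
    "encoding/gob"
    "log"
)

// period stands in for timespans.ActivationPeriod.
type period struct {
    Start, End int64
}

func main() {
    in := []*period{{1, 2}, {3, 4}}

    // First encoding: this encoder sends the type descriptor and the value.
    var first bytes.Buffer
    if err := gob.NewEncoder(&first).Encode(in); err != nil {
        log.Fatal(err)
    }

    // Decode from a copy of the first stream.
    var out []*period
    if err := gob.NewDecoder(bytes.NewReader(first.Bytes())).Decode(&out); err != nil {
        log.Fatal(err)
    }

    // Second encoding with a *fresh* encoder, so the type descriptor is sent again.
    var second bytes.Buffer
    if err := gob.NewEncoder(&second).Encode(out); err != nil {
        log.Fatal(err)
    }

    log.Print("Equal? ", bytes.Equal(first.Bytes(), second.Bytes()))
}

(Even with a fresh encoder, later replies point out that gob makes no promise of byte-for-byte identical output, so comparing the decoded values is the safer check.)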

rif

Feb 29, 2012, 1:31:29 PM
to golan...@googlegroups.com
Unfortunately, the decoded value and the encoded value are not the same, and that's why I created this test. So the gob encoding is not working for some of the real use cases in my application.

I presumed that the best way to verify the encoding process is to encode the data once, decode it, and re-encode the decoded value, so the first and second encodings should be equal.

Even if I create another encoder, I still don't get equal encoded strings. So I presume that, since nobody else is having problems with the gob encoder, it must be something on my side.

If I get some time I will create a single file application to reproduce the problem.

Thank you.


On Wednesday, 29 February 2012, 19:56:41 UTC+2, Kyle Lemons wrote:
You're comparing the encoded values, nothing more.  Gob transmits type information as well as the data.  Create a new encoder for the second encode (so it no longer assumes that the "receiver" has the type info and will re-send it) and I think you'll find that it works.

If you compare the decoded value and the encoded value, you should find them the same regardless of the wire encoding, which shouldn't be of much concern as long as it works.

Jan Mercl

Feb 29, 2012, 1:58:48 PM
to golan...@googlegroups.com
On Wednesday, February 29, 2012 7:31:29 PM UTC+1, rif wrote:
Even if I create another encoder, I still don't get equal encoded strings.

I think there is a false assumption involved here. AFAIK there is no guarantee anywhere that encoding the same data must always produce the same binary string. Moreover, as soon as any map appears in the data, such a guarantee cannot be given easily, if at all. Map iteration is, per the spec, (pseudo)random even for an unaltered map between different iterations, IIRC.
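A tiny illustration of that last point, independent of gob and using nothing beyond the standard library:

package main

import "fmt"

func main() {
    m := map[string]int{"a": 1, "b": 2, "c": 3}
    // The spec leaves map iteration order unspecified, so the two passes
    // below are free to print the keys in different orders, and any encoder
    // that walks a map may therefore emit different bytes for equal values.
    for i := 0; i < 2; i++ {
        for k, v := range m {
            fmt.Print(k, "=", v, " ")
        }
        fmt.Println()
    }
}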

Damian Gryski

Feb 29, 2012, 2:02:31 PM
to golan...@googlegroups.com


On Wednesday, 29 February 2012 19:31:29 UTC+1, rif wrote:
I presumed that the best way to verify the encoding process is to encode the data once, decode it, and re-encode the decoded value, so the first and second encodings should be equal.

Even if I create another encoder, I still don't get equal encoded strings. So I presume that, since nobody else is having problems with the gob encoder, it must be something on my side.


This has come up before.  Multiple gob encodings for the same struct are not guaranteed to be equal.

      Gobs contain type information and unique ids. A stream is guaranteed to decode to the values put into it, 
      but otherwise no guarantee is made about the contents of the stream itself.

There should probably be a note on the gob package documentation that mentions this.

Damian

rif

Feb 29, 2012, 2:03:00 PM
to golan...@googlegroups.com
That's very good to know. So for validation I will only compare the decoded value.

I was avoiding this because the values are slices of structs that contain slices of elements that have slices inside, so the comparison will involve some for loops :)

Damian Gryski

Feb 29, 2012, 2:10:42 PM
to golan...@googlegroups.com


On Wednesday, 29 February 2012 20:03:00 UTC+1, rif wrote:
That's very good to know. So for validation I will only compare the decoded value.

I was avoiding this because the values are slices of structs that contain slices of elements that have slices inside, so the comparison will involve some for loops :)


reflect.DeepEqual should take care of the comparison for you.

   Damian

rif

Feb 29, 2012, 2:13:14 PM
to golan...@googlegroups.com
Love and happiness :)

I didn't know about DeepEqual.
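A small sketch of that round-trip check; the interval and activation types here are invented stand-ins for the real ActivationPeriod structures:

package main

import (
    "bytes"
    "encoding/gob"
    "log"
    "reflect"
)

type interval struct {
    Start, End int64
}

type activation struct {
    Tag       string
    Intervals []interval
}

func main() {
    in := []*activation{
        {Tag: "peak", Intervals: []interval{{1, 2}, {3, 4}}},
    }

    var buf bytes.Buffer
    if err := gob.NewEncoder(&buf).Encode(in); err != nil {
        log.Fatal(err)
    }

    var out []*activation
    if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
        log.Fatal(err)
    }

    // DeepEqual follows the pointers and walks the nested slices for us.
    log.Print("round trip ok? ", reflect.DeepEqual(in, out))
}

(One caveat to watch for: gob does not transmit empty slices, so a non-nil empty slice comes back nil and DeepEqual will flag that difference even though no data was lost.)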

rif

Mar 1, 2012, 5:01:15 AM
to golan...@googlegroups.com
I am using gob as a serialization mechanism (not for transport). It seems that gob was created for data transport purposes, so for serialization I had to do some initial preparation.

As a conclusion to this post here is what I learned about gob encoding:

- the first time the encoder is used with a certain type it will store the information about that type. After that it will store only the values.
- a single encoder instance should not be used to encode multiple types
- if it is used for storage, the first encoded and decoded value has extra data, so it will not be in the same format as the rest of the stored objects. A workaround for this was to train the encoder and decoder before starting to use them for serialization of real objects (just encode and decode an instance of the data to be stored using that encoder).
- even if they have somewhat similar interfaces, the gob and json encoders work in very different ways and, of course, produce very different data formats.
- gob encoding is much faster than json encoding
- gob creation of an encoder/decoder plus the encoding/decoding process is slower than the same process using json. So if you create an encoder/decoder every time you encode, gob is slower.
- there is some extra information about gob encoding here: http://blog.golang.org/2011/03/gobs-of-data.html

Hope my conclusions are correct.

All the best,
rif

Rob 'Commander' Pike

Mar 1, 2012, 6:54:39 AM
to rif, golan...@googlegroups.com
On Thu, Mar 1, 2012 at 9:01 PM, rif <feri...@gmail.com> wrote:
> I am using gob as a serialization mechanism (not for transport). It seems that
> gob was created for data transport purposes, so for serialization I had to
> do some initial preparation.
>
> As a conclusion to this post here is what I learned about gob encoding:
>
> - the first time the encoder is used with a certain type it will store the
> information about that type. After that it will store only the values.

s/store/encode/

> - a single encoder instance should not be used to encode multiple types

That's incorrect. Multiple types work just fine.

> - if it is used for storage, the first encoded and decoded value has extra
> data, so it will not be in the same format as the rest of the stored objects.

Yes, the type numbers used in the encoding are essentially
arbitrary, so you shouldn't depend on the precise contents of the
representation anyway.

> A workaround for this was to train the encoder and decoder before starting to
> use them for serialization of real objects (just encode and decode an instance
> of the data to be stored using that encoder).

That's misusing the package. If you need this property, and you almost
certainly don't, gobs are not for you.

> - even if they have somewhat similar interfaces, the gob and json encoders
> work in very different ways and, of course, produce very different data
> formats.

True.

> - gob encoding is much faster than json encoding

True.

> - gob creation of an encoder/decoder plus the encoding/decoding process is
> slower than the same process using json. So if you create an encoder/decoder
> every time you encode, gob is slower.

Perhaps, but the cost of creation amortizes to near zero if you encode
a reasonable amount of data.

> - there is some extra information about gob encoding
> here: http://blog.golang.org/2011/03/gobs-of-data.html
>
> Hope my conclusions are correct.

They are not. Did you read this?
http://blog.golang.org/2011/03/gobs-of-data.html

-rob

rif

Mar 1, 2012, 7:21:32 AM
to golan...@googlegroups.com, rif
Thank you for taking the time to correct my conclusions.

I am using gob to store objects in a key-value store (Redis or Kyoto Cabinet). The writing (encoding) speed is not very important, but the reading must be very fast. Because json is slow and gob did not work (my mistake), I used my own encoding/decoding of objects.

I studied the gob encoding some more to get rid of my serialization/de-serialization code, and before doing that preparation (training) of the gob encoder/decoder I kept getting "extra data in buffer" from the gob decoder. I am now using a separate (single-instance) encoder/decoder for every type that I serialize, and it seems to dump consistent content, so the decoding is always correct.

Now my code finally works using the gob encoder, but reading your answers I understand that I am wrong again. I will read the http://blog.golang.org/2011/03/gobs-of-data.html document again to see what I am missing.

rif

Rob 'Commander' Pike

Mar 1, 2012, 3:56:17 PM
to rif, golan...@googlegroups.com

On Mar 1, 2012, at 11:21 PM, rif wrote:

Thank you for taking the time to correct my conclusions.

I am using gob to store objects in a key-value store (Redis or Kyoto Cabinet). The writing (encoding) speed is not very important, but the reading must be very fast. Because json is slow and gob did not work (my mistake), I used my own encoding/decoding of objects.

I studied the gob encoding some more to get rid of my serialization/de-serialization code, and before doing that preparation (training) of the gob encoder/decoder I kept getting "extra data in buffer" from the gob decoder. I am now using a separate (single-instance) encoder/decoder for every type that I serialize, and it seems to dump consistent content, so the decoding is always correct.

Now my code finally works using the gob encoder, but reading your answers I understand that I am wrong again. I will read the http://blog.golang.org/2011/03/gobs-of-data.html document again to see what I am missing.

Two key points.

1) The gob stream is a stream. You can't pick and choose bytes from it; you need to parse it from the beginning.

2) The type numbers in the stream may vary from run to run. If you delete the type descriptor, the stream can become nonsense even to a "trained" receiver.
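A small sketch of both points (the rec type is made up, and Encode errors are ignored for brevity):

package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
)

type rec struct {
    N int
}

func main() {
    var buf bytes.Buffer
    enc := gob.NewEncoder(&buf)

    enc.Encode(rec{1}) // first message: type descriptor + value
    mark := buf.Len()
    enc.Encode(rec{2}) // later message: value only, referring to the descriptor by id

    // Decoding the whole stream from the beginning works.
    dec := gob.NewDecoder(bytes.NewReader(buf.Bytes()))
    var a, b rec
    fmt.Println(dec.Decode(&a), dec.Decode(&b)) // <nil> <nil>

    // Handing a fresh decoder only the second message fails, because the
    // descriptor it refers to is not part of those bytes.
    var c rec
    err := gob.NewDecoder(bytes.NewReader(buf.Bytes()[mark:])).Decode(&c)
    fmt.Println(err) // non-nil: the type id means nothing to this decoder
}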

-rob


Kyle Lemons

Mar 1, 2012, 9:17:30 PM
to rif, golan...@googlegroups.com
I'll also mention that in AppEngine it is perfectly possible to store objects using a gob encoding in the datastore (a key-value store).  I believe it uses gob.Marshal so that all of the necessary information is stored along with the value.

rif

Mar 2, 2012, 3:45:31 AM
to golan...@googlegroups.com, rif
These two key points explain the process very well. It is clear now (I also read the document three times :) that my tests were passing based on luck.

rif

Mar 2, 2012, 3:51:56 AM
to golan...@googlegroups.com, rif
This is very interesting. I would like to see their implementation.

Does gob have a Marshal method? Is the AppEngine Go code available?

rif

Kyle Lemons

Mar 2, 2012, 4:25:55 AM
to rif, golan...@googlegroups.com
Apparently I'm crazy on the first point... but yes on the second: http://code.google.com/p/appengine-go/

I guess it's just repeated invocations of gob.NewEncoder(w).Encode(v)
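In other words, something along these lines; Marshal, Unmarshal and the Item type are made-up names for this sketch, not part of encoding/gob. A fresh encoder per value means every stored blob carries its own type information and can be decoded on its own:

package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
)

type Item struct {
    Key    string
    Weight int
}

// Marshal produces a self-contained gob blob suitable for a key-value store.
func Marshal(v interface{}) ([]byte, error) {
    var buf bytes.Buffer
    if err := gob.NewEncoder(&buf).Encode(v); err != nil {
        return nil, err
    }
    return buf.Bytes(), nil
}

// Unmarshal decodes one such blob with a fresh decoder.
func Unmarshal(data []byte, v interface{}) error {
    return gob.NewDecoder(bytes.NewReader(data)).Decode(v)
}

func main() {
    blob, err := Marshal(&Item{Key: "a", Weight: 42})
    if err != nil {
        panic(err)
    }
    var it Item
    if err := Unmarshal(blob, &it); err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", it) // {Key:a Weight:42}
}

The price is that the type descriptor is repeated in every blob, which costs some space and CPU compared to one long-lived encoder writing a single stream.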

rif

Mar 2, 2012, 4:34:36 AM
to golan...@googlegroups.com, rif
For all the benchmarks I have done, gob.NewEncoder(w).Encode(v) is slower than json.NewEncoder(w).Encode(v).

JSON is also human-readable, which gives it a plus; it only loses on size. But sadly both forms are much slower than a custom serialization, which leaves me with only one option.

Thank you for the link.
rif


On Friday, 2 March 2012, 11:25:55 UTC+2, Kyle Lemons wrote:
Apparently I'm crazy on the first point... but yes on the second: http://code.google.com/p/appengine-go/

I guess it's just repeated invocations of gob.NewEncoder(w).Encode(v)

rif

Mar 2, 2012, 6:06:10 AM
to golan...@googlegroups.com, rif
These are my benchmarks https://gist.github.com/1957757.

The benchmark methods ending in New use a new encoder instance every time. I am running on Go tip.

From my point of view, AppEngine's use of gob instead of json is not optimal from a CPU perspective.
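For reference, a rough sketch of that kind of comparison (not the linked gist itself; the payload type and benchmark names are invented). The ...New variants build a fresh encoder every iteration, the other reuses one:

package bench

import (
    "encoding/gob"
    "encoding/json"
    "io/ioutil"
    "testing"
)

type payload struct {
    Name   string
    Values []int
}

var sample = payload{Name: "x", Values: []int{1, 2, 3, 4, 5}}

// One encoder reused for the whole run: the type descriptor is sent once.
func BenchmarkGobReused(b *testing.B) {
    enc := gob.NewEncoder(ioutil.Discard)
    for i := 0; i < b.N; i++ {
        if err := enc.Encode(&sample); err != nil {
            b.Fatal(err)
        }
    }
}

// A fresh encoder per iteration: descriptor and setup cost paid every time.
func BenchmarkGobNew(b *testing.B) {
    for i := 0; i < b.N; i++ {
        if err := gob.NewEncoder(ioutil.Discard).Encode(&sample); err != nil {
            b.Fatal(err)
        }
    }
}

func BenchmarkJSONNew(b *testing.B) {
    for i := 0; i < b.N; i++ {
        if err := json.NewEncoder(ioutil.Discard).Encode(&sample); err != nil {
            b.Fatal(err)
        }
    }
}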

rif