gob.Decoder: "invalid message length"


Frank Schröder

Apr 30, 2014, 2:44:53 PM
to golan...@googlegroups.com
I have a single struct which I'm writing to disk via encoding/gob. The resulting file is about 1.5GB uncompressed.

When I read the file back in on OSX it seems to work fine. When I cross-compile the same code for Linux and run it there I get "invalid message length" during load. This is with Go 1.2.1 on linux/amd64.

This error only happens when the struct has a certain amount of data in it. I've found the code which throws the error (http://golang.org/src/pkg/encoding/gob/decoder.go?s=2697:2889#L66) but I'm not sure I get the condition. Does that mean that a single struct/message must not be bigger than 1GB?

Any insight is greatly appreciated.

Frank

Rob Pike

Apr 30, 2014, 2:53:04 PM
to Frank Schröder, golan...@googlegroups.com
Code from encoding/gob/decoder.go:

// Upper limit of 1GB, allowing room to grow a little without overflow.
// TODO: We might want more control over this limit.
if nbytes >= 1<<30 {
	dec.err = errBadCount
	return false
}

Notice the TODO.....

-rob

Frank Schröder

Apr 30, 2014, 3:52:13 PM
to golan...@googlegroups.com, Frank Schröder
So I guess that means yes. :)

Thanks

fr...@poptip.com

Aug 20, 2014, 12:52:41 PM
to golan...@googlegroups.com, frank.s...@gmail.com
So is there any way to load gobs over 1GB back into memory? Or do we have to wait until this TODO is completed?

Frank Schröder

Aug 20, 2014, 2:02:48 PM
to fr...@poptip.com, golan...@googlegroups.com
Or split up the data. My guess is that if you want to load a blob that big you might want to rethink your approach. That's at least what we did. 

Frank Schröder

fr...@poptip.com

Aug 20, 2014, 2:27:58 PM
to golan...@googlegroups.com, fr...@poptip.com
The annoying thing is that you can write gob files which are over a GB. I have been scraping data for several days to amass this file (which is a single, huge map), and I just assumed that it would work fine and got rid of the intermediate, smaller saved files. In fact the only reason I stopped this early was because my scraping program crashed and I was trying to get it to restart and continue the job. Otherwise I might have kept it running for weeks and wasted all that time. I guess I can split each bucket of the map into its own file or something, but it just adds complexity when trying to load it into a single map in memory.

Frank Schröder

Aug 20, 2014, 11:10:51 PM
to fr...@poptip.com, golan...@googlegroups.com
If you just have a single map why not use redis or memcache to store the data? 

Frank Schröder

Taru Karttunen

Aug 21, 2014, 3:30:20 AM
to Frank Schröder, fr...@poptip.com, golan...@googlegroups.com
On 21.08 05:10, Frank Schröder wrote:
> If you just have a single map why not use redis or memcache to store the data?

Convenience.

e.g. we had one internal app that analyzed large amounts of data
(typically 5-500 GB) and produced an output file of 500-2000 MB
containing essentially a huge map.

Managing databases for this with many datasets and ad hoc use would be
tedious, as typically we just want "dump this for later loading".

- Taru Karttunen

Peter Vessenes

Feb 4, 2015, 5:34:01 PM
to golan...@googlegroups.com, frank.s...@gmail.com, fr...@poptip.com
Just to chime in late here, this is a highly annoying behaviour. Either the gob Encoder should return an error when encoding something larger than 1GB, or the Decoder should proceed. I would prefer to raise the limit; is there a practical reason for the 1GB size?

I'm also wrestling with pre-processing large swaths of data, and I want multiple cores to talk to that data, which makes redis a bad plan. Also, redis copies data in RAM and saves to disk as part of its persistence strategy.

It can take a long time for a single thread to write large amounts of data while it takes up double the RAM it should. Redis is great, but not that suitable for storing a very large map. Add in how slow string processing can be in golang, and you don't have a great solution.
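Since the Encoder stays silent at write time, one workaround is to measure the encoded size before committing anything to disk. The sketch below is illustrative only (no such check exists in encoding/gob itself); `countingWriter`, `encodedSize`, and `encodeChecked` are hypothetical names. It pays the cost of encoding twice, but it surfaces an over-limit value when it is produced rather than months later at decode time:

```go
package main

import (
	"encoding/gob"
	"errors"
	"fmt"
	"io"
	"log"
)

const gobMessageLimit = 1 << 30 // the decoder's hard-coded 1GB cap

// countingWriter tallies bytes written without storing them.
type countingWriter struct{ n int64 }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.n += int64(len(p))
	return len(p), nil
}

// encodedSize reports how many bytes gob writes for v. A fresh
// Encoder also emits type-descriptor messages, so this total is an
// upper bound on the size of any single message in the stream.
func encodedSize(v interface{}) (int64, error) {
	var cw countingWriter
	if err := gob.NewEncoder(&cw).Encode(v); err != nil {
		return 0, err
	}
	return cw.n, nil
}

// encodeChecked refuses to write a value whose gob encoding would
// exceed the decoder's limit, instead of failing silently at read time.
func encodeChecked(w io.Writer, v interface{}) error {
	n, err := encodedSize(v)
	if err != nil {
		return err
	}
	if n >= gobMessageLimit {
		return errors.New("gob: value would exceed decoder's 1GB message limit")
	}
	return gob.NewEncoder(w).Encode(v)
}

func main() {
	var cw countingWriter
	if err := encodeChecked(&cw, map[string]int{"a": 1}); err != nil {
		log.Fatal(err)
	}
	fmt.Println(cw.n > 0)
}
```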

Peter Vessenes

Feb 4, 2015, 5:35:34 PM
to golan...@googlegroups.com, frank.s...@gmail.com, fr...@poptip.com
By slow string processing, I mean to refer to pulling data in and out of Redis, which stores stuff mostly as strings.

Peter Vessenes

Feb 4, 2015, 6:26:26 PM
to golan...@googlegroups.com, frank.s...@gmail.com, fr...@poptip.com
I just rebuilt encoding/gob with

const tooBig = 1 << 33

and it seems to work pretty well.

For future readers, originally I was using 1<<34. One of my routines was loading up 30k or so Gobs in 30k goroutines, all the gobs were small enough to be under the 1GB limit, most much smaller. This load pattern caused some egregious memory usage, ballooning up over 128GB pretty quickly and causing a crash.

I reworked the gobs to be larger and now load maybe 300 simultaneously (same total data); this doesn't cause so much mallocing, and it loads fine.

Peter