trying to understand gob error: "extra data in buffer"


kortschak

Mar 1, 2012, 10:05:31 PM
to golan...@googlegroups.com
I'm (ab)using gob as a way to sort very large collections[1] via a collection of files. This approach has worked well for me with our data. I've just started cleaning up an API that makes use of this and thought that reusing the morass would be sensible - the application takes two passes over a genome (one for each strand).

When I do this, for at least one dataset the second use (after a call to Clear()) gives an "extra data in buffer" error when I try to gob-decode at line 263 of morass.go. This doesn't happen if a new morass is used.

I don't understand why gob returns an error when the Decoder is new for this use of the morass:

Pull removes consumed elements of the files field that contain the encoder/decoder pairs, so these are recreated in the second use and anything left over from the previous use should not persist.

Can someone point to where I'm obviously being dim?

thanks

[1] https://github.com/kortschak/BioGo/blob/development/morass/morass.go

Kyle Lemons

Mar 1, 2012, 10:39:11 PM
to kortschak, golan...@googlegroups.com
That's a lot of code to decipher, but it sounds like you're trying to decode a single gob output stream with two different input streams.  If I've interpreted you correctly, this is not intended to work.  You'd need to either create two output streams (so that type information is encoded into both) or continue using the same input stream.
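The mismatch Kyle describes can be shown in a few lines: gob sends a type's definition once, at the point the Encoder first sees it, so a fresh Decoder started partway through a stream never receives that definition. (This sketch is generic; the type and function names are illustrative, not from morass.go.)

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

type item struct{ N int }

// decodeSplit encodes two values on one stream, then decodes the
// first with one Decoder and the second with a brand-new Decoder.
// The type definition is written only once, at the head of the
// stream, so the second Decoder cannot interpret what it reads.
func decodeSplit() (first, second error) {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	enc.Encode(item{1}) // sends item's type definition, then the value
	enc.Encode(item{2}) // sends only the value

	var v item
	first = gob.NewDecoder(&buf).Decode(&v)  // succeeds: sees the type definition
	second = gob.NewDecoder(&buf).Decode(&v) // fails: the definition is gone
	return first, second
}

func main() {
	first, second := decodeSplit()
	fmt.Println("first:", first)
	fmt.Println("second:", second)
}
```

The converse, a fresh Encoder writing to a Decoder that has already seen the type, is the case rif demonstrates later in this thread.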

kortschak

Mar 1, 2012, 10:54:27 PM
to golan...@googlegroups.com, kortschak
Yeah, sorry - there is no way I can make the situation simpler, since I'd essentially have to know what the problem is to do that. I've been poring over it for the past couple of hours trying to get a grip on it. No such luck yet.

I can't have explained it properly, though. A Morass takes a series of values, part-sorts them, and gob-encodes the sorted chunks to a collection of files. When the series is complete, the Morass is Finalised, which writes out the last unwritten chunk and grabs the first element of each of the files; pulling then returns the least value, consuming the contents of the file collection and so yielding a sorted stream of values. (The baroque approach is necessary due to the size of the data collection.)

There is a distinct pair of gob Encoder/Decoders for each file in the collection.

In the case I'm having problems with, I'm using a Morass to sort values obtained from one job, clearing that Morass (which deletes the files and truncates the chunks to length 0) and then reusing it. If I use two different Morass objects, there is no problem.
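The scheme described here, part-sorted runs with a distinct gob Encoder/Decoder pair per file and the least head value pulled repeatedly, is a k-way external merge. A minimal sketch, with in-memory buffers standing in for the temporary files and plain ints standing in for the real records (all names are illustrative, not from morass.go):

```go
package main

import (
	"bytes"
	"container/heap"
	"encoding/gob"
	"fmt"
)

// chunk is one sorted, gob-encoded run. The real morass uses
// temporary files; a bytes.Buffer stands in here for simplicity.
type chunk struct {
	head int
	dec  *gob.Decoder
}

type chunkHeap []*chunk

func (h chunkHeap) Len() int           { return len(h) }
func (h chunkHeap) Less(i, j int) bool { return h[i].head < h[j].head }
func (h chunkHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *chunkHeap) Push(x any)        { *h = append(*h, x.(*chunk)) }
func (h *chunkHeap) Pop() any {
	old := *h
	c := old[len(old)-1]
	*h = old[:len(old)-1]
	return c
}

// merge encodes each sorted run to its own gob stream, then
// repeatedly takes the least head value across all runs until
// every run is consumed, yielding a fully sorted stream.
func merge(runs ...[]int) []int {
	h := &chunkHeap{}
	for _, run := range runs {
		var buf bytes.Buffer
		enc := gob.NewEncoder(&buf)
		for _, v := range run {
			enc.Encode(v)
		}
		c := &chunk{dec: gob.NewDecoder(&buf)}
		if err := c.dec.Decode(&c.head); err == nil {
			heap.Push(h, c)
		}
	}
	var out []int
	for h.Len() > 0 {
		c := (*h)[0] // chunk with the least head value
		out = append(out, c.head)
		if err := c.dec.Decode(&c.head); err != nil {
			heap.Pop(h) // io.EOF: this run is exhausted
		} else {
			heap.Fix(h, 0) // new head value; restore heap order
		}
	}
	return out
}

func main() {
	fmt.Println(merge([]int{1, 4, 9}, []int{2, 3, 8}, []int{5, 6, 7}))
}
```

container/heap keeps the chunk with the smallest head at the root, so each pull costs O(log k) in the number of live files.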

thanks.


On Friday, March 2, 2012 2:09:11 PM UTC+10:30, Kyle Lemons wrote:

Andrew Gerrand

Mar 1, 2012, 11:22:44 PM
to kortschak, golan...@googlegroups.com
Are you re-using file objects? The Clear method sets f to nil, but it
doesn't set self.files[i] to nil, so line 287 has no effect.

Clear should really reset the files field entirely, no?
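The distinction Andrew is drawing can be shown in isolation (this sketch is generic, not taken from morass.go): assigning nil to a range variable changes only a copy, while assigning through an index changes the slice itself.

```go
package main

import "fmt"

func main() {
	files := []*int{new(int), new(int)}

	// Assigning to the range variable changes only a copy of the
	// element; the slice still holds the original pointers.
	for _, f := range files {
		f = nil
		_ = f
	}
	fmt.Println(files[0] == nil) // false: the slice was not touched

	// Indexing assigns through to the slice's backing array.
	for i := range files {
		files[i] = nil
	}
	fmt.Println(files[0] == nil) // true
}
```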

Andrew

kortschak

Mar 1, 2012, 11:48:27 PM
to golan...@googlegroups.com, kortschak
I had tried looking at that, and it makes no difference. I can resize to 0 or make a new files slice; either still results in the same error.

kortschak

Mar 2, 2012, 1:58:13 AM
to golan...@googlegroups.com, kortschak
The morass.file elements are not reused, and elements of self.files are not pushed back onto self.files once they have been completely consumed. So any morass.file elements still in self.files are due to incomplete retrieval; I know this doesn't happen in this situation.

The os.File objects are Close()'d at this time and the reference should be lost (this happens in Morass.Pull(), lines 311-341).

The next time the morass is used, a new morass.file struct is created in Morass.writing(), so there should be no carry-over from the previous use. This is at lines 206-208.

This is why I'm struggling to figure out what is going on.

I can confirm that self.files is empty at the end of a loop that pulls all the sorted elements out of the morass:

First use:
self.files at beginning of Finalise: morass.files{(*morass.file)(0xf840093150), (*morass.file)(0xf840093210)}
low at EOF: &morass.file{head:filter.FilterHit{QFrom:69103020, QTo:69103070, DiagIndex:-26299694}, file:(*os.File)(0xf840061250), encoder:(*gob.Encoder)(0xf8470c6000), decoder:(*gob.Decoder)(0xf8470c7000)}
low at EOF: &morass.file{head:filter.FilterHit{QFrom:71204786, QTo:71204815, DiagIndex:-9040590}, file:(*os.File)(0xf8400613c8), encoder:(*gob.Encoder)(0xf8470c6140), decoder:(*gob.Decoder)(0xf8470c7340)}
self.files at EOM: morass.files{} // <- empty

Second use:
self.files at beginning of Finalise: morass.files{(*morass.file)(0xf9712e8660), (*morass.file)(0xf9712e8c60)} // before line 258

...then error at line 263.

This has me stumped.


On Friday, March 2, 2012 2:52:44 PM UTC+10:30, Andrew Gerrand wrote:

Rob 'Commander' Pike

Mar 2, 2012, 2:10:05 AM
to kortschak, golan...@googlegroups.com
Can you turn this into something I can run that demonstrates the
problem? This is all too abstract.

-rob

Dan Kortschak

Mar 2, 2012, 2:31:53 AM
to Rob 'Commander' Pike, golan...@googlegroups.com
I'll try to replicate the error in the test. Thanks.

Dan Kortschak

Mar 2, 2012, 7:23:39 AM
to Rob 'Commander' Pike, golan...@googlegroups.com
I have been unable to replicate the problem I see with real data in the
test, though I do get other unexpected behaviour which may be related.
I'll keep looking to see if I can get a perfect repeat.

Here is a version of morass and an expanded test suite that has no
dependencies outside the core library except for gocheck.

https://gist.github.com/1957898

It usually passes (this is dependent on which machine it runs on -
failure is more common on my netbook GOARCH=386 than on my workstation
GOARCH=amd64). Failure is usually due to a nil value which somehow
manages to get past the error check after gob decoding, though sometimes
it's an unexpected EOF and sometimes an encoded unsigned integer out of
range.

It feels racy, but it fails independently of whether I'm doing
concurrent writing or not, so I can't explain the sporadic failure. The
test uses rand.Int(), so maybe this is the cause of sporadic failure,
but I would expect gob to be robust to that.

I'm abusing the registration as well, but I can't see how that would
give this outcome.

I've never seen this failure before.

thanks
Dan

On Fri, 2012-03-02 at 18:10 +1100, Rob 'Commander' Pike wrote:

rif

Mar 2, 2012, 7:47:10 AM
to golan...@googlegroups.com, Rob 'Commander' Pike
I have encountered the "extra data in buffer" error when trying to decode a stream using a Decoder instance that had already seen the type information.

Below is a small example that can be run in the sandbox and illustrates the problem.

Maybe it helps,
-rif

package main

import (
	"bytes"
	"encoding/gob"
	"log"
)

type Test struct {
	A string
	B int
}

func main() {
	var stream bytes.Buffer

	// First pass: the encoder writes Test's type definition and a value.
	gob.NewEncoder(&stream).Encode(Test{"one", 1})
	var result Test
	dec := gob.NewDecoder(&stream)
	err := dec.Decode(&result)
	log.Print(result, err)

	// Second pass: a fresh encoder writes the type definition again,
	// but dec has already seen it, and the decode fails with
	// "extra data in buffer".
	stream.Reset()
	var result2 Test
	gob.NewEncoder(&stream).Encode(Test{"two", 2})
	err = dec.Decode(&result2)
	log.Print(result2, err)
}
Dan Kortschak

Mar 2, 2012, 4:27:51 PM
to golan...@googlegroups.com
I have tried removing the concurrent sections of code and performing
them sequentially (even with the nominally non-concurrent approach
there was previously some concurrent execution, which I thought I had
made safe), and the sporadic failures go away. Unfortunate for my ego,
but probably the best outcome.

When I'm back at work, I'll check whether this fixes the other problem.

thanks for people's suggestions.
Dan



Dan Kortschak

Mar 4, 2012, 5:51:49 PM
to golan...@googlegroups.com
After revising the chunk queuing, the error was clearly in my code (I
still don't understand what that error was, and particularly how it
affected gob), since more careful management of the chunk pool and
task switching has resolved it.

thanks
Dan

