Parsing gob is significantly slower than parsing JSON

Dustin

Aug 28, 2012, 8:56:53 PM8/28/12
to golan...@googlegroups.com
My project:

My Sunday night TV lineup left me with a new document-oriented time-series database written in Go.  I intend to use it to capture large amounts of unstructured data that rolls off of performance tests and other such things, and then build tools to draw pictures.

I know there are 100 of these things out there, so this is time-series DB 101:  https://github.com/dustin/seriesly/wiki


My observation:

As my queries need to rip through tons of JSON, I figured I'd consider whether storing in gob and decoding that would give me better performance on the way out since gob is more "native."  It'd obviously cost me more on the way in because I do zero parsing on the data at all at this point, but the queries are the more expensive part of the system.

I threw together a quick test where I took a moderately complicated JSON object, parsed it and encoded it in GOB (as my server would have to do) and then raced decoding of the two in a benchmark.  The bottom line on my mac looks like this:

BenchmarkJSONParser    1000   2454682 ns/op   9.24 MB/s
BenchmarkGOBParser     500   5335379 ns/op   4.58 MB/s

So, in my case, gob is about 2x slower than json.  I found this surprising.  Also slightly surprising was that I had to twist gob's arm slightly to get it even willing to encode a map[string]interface{} and []interface{} as found in the json input.

If anyone else finds this confusing, doesn't believe me, or wants to get gob winning the race, my test is available here:  https://gist.github.com/3505647

Rob Pike

Aug 28, 2012, 10:26:34 PM8/28/12
to Dustin, golan...@googlegroups.com
Interfaces are quite slow in gob because they must carry extra
information and break up the stream. If you can avoid interfaces, it
should do better. Also, did you have buffering in the encoding stream?
That can make a big difference.

-rob

Dustin

Aug 28, 2012, 10:33:17 PM8/28/12
to golan...@googlegroups.com, Dustin
  What I'm receiving is JSON blobs; I don't have any control over the input.  Whatever they turn into when I parse them into map[string]interface{} is what I added to the gob.  If there's a more efficient way to get them in there, I can try that, but fundamentally, I'm just building something that deals with other people's data, and I'm looking to store it in a way that's most efficient to retrieve and work with later.

Rob Pike

Aug 28, 2012, 10:38:41 PM8/28/12
to Dustin, golan...@googlegroups.com
I see. I think the issue is the interface one then, without digging in
more. One detail though: you're allocating much more in the gob
benchmark by re-allocating the gob buffer every iteration. This costs a lot.
Reuse the buffer, just as you do in the JSON case.

-rob

Dave Cheney

Aug 29, 2012, 12:03:54 AM8/29/12
to Rob Pike, Dustin, golan...@googlegroups.com
The improvement is significant.

benchmark              old ns/op    new ns/op    delta
BenchmarkJSONParser      1472817      1492443   +1.33%
BenchmarkGOBParser       4879574      3155806  -35.33%

benchmark               old MB/s     new MB/s  speedup
BenchmarkJSONParser        15.39        15.19    0.99x
BenchmarkGOBParser          5.00         7.74    1.55x

http://play.golang.org/p/INdZnh3amU

Dustin

Aug 29, 2012, 2:19:41 AM8/29/12
to golan...@googlegroups.com, Rob Pike, Dustin

On Tuesday, August 28, 2012 9:04:08 PM UTC-7, Dave Cheney wrote:
The improvement is significant.

benchmark              old ns/op    new ns/op    delta
BenchmarkJSONParser      1472817      1492443   +1.33%
BenchmarkGOBParser       4879574      3155806  -35.33%

benchmark               old MB/s     new MB/s  speedup
BenchmarkJSONParser        15.39        15.19    0.99x
BenchmarkGOBParser          5.00         7.74    1.55x

http://play.golang.org/p/INdZnh3amU

  Oh great, so this closes in quite a bit.  Oddly, I tried initializing in the bench (before resetting the timer) and it had less of an effect than doing it in init().  I get this:

BenchmarkJSONParser    1000   2419239 ns/op   9.37 MB/s
BenchmarkGOBOrig        500   5265846 ns/op   4.64 MB/s
BenchmarkGOBDave        500   3067748 ns/op   7.96 MB/s
BenchmarkGOBDaveRdr     500   3073553 ns/op   7.94 MB/s

  Updated the gist:  https://gist.github.com/3505647

At this point, I think we can downgrade from "significantly" slower, but unless gob is actually faster, it's not a helpful path.  Thanks for the pointers.

Rémy Oudompheng

Aug 29, 2012, 2:44:20 AM8/29/12
to Dustin, golan...@googlegroups.com, Rob Pike
On 2012/8/29 Dustin <dsal...@gmail.com> wrote:
> Oh great, so this closes in quite a bit. Oddly, I tried initializing in
> the bench (before resetting the timer) and it had less of an effect than
> doing it in init(). I get this:
>
> BenchmarkJSONParser 1000 2419239 ns/op 9.37 MB/s
> BenchmarkGOBOrig 500 5265846 ns/op 4.64 MB/s
> BenchmarkGOBDave 500 3067748 ns/op 7.96 MB/s
> BenchmarkGOBDaveRdr 500 3073553 ns/op 7.94 MB/s
>
> Updated the gist: https://gist.github.com/3505647
>
> At this point, I think we can downgrade from "significantly", but unless
> it's faster, it's not a helpful path. Thanks for the pointers.

Calling gob.NewDecoder is a very costly operation; gob is best used
for streaming a lot of data. Maybe it could be made faster. Last time
I profiled it, it spent a lot of time initializing type information
for basic types that seemed unrelated to the actual data I was
decoding, but that time was more like 100-200µs, not milliseconds.

It also works much faster on structs than on maps. Also, using
interface{} as the value type is unfortunate, since neither strings
nor slices fit in an interface; they need an extra memory allocation.
I think you are mostly benchmarking the hashmap implementation and
the conversion to interface{}.

Rémy.

Dustin

Aug 29, 2012, 3:07:11 AM8/29/12
to golan...@googlegroups.com, Dustin, Rob Pike

On Tuesday, August 28, 2012 11:44:27 PM UTC-7, Rémy Oudompheng wrote:
 
Calling gob.NewDecoder is a very costly operation; gob is best used
for streaming a lot of data. Maybe it could be made faster. Last time
I profiled it, it spent a lot of time initializing type information
for basic types that seemed unrelated to the actual data I was
decoding, but that time was more like 100-200µs, not milliseconds.

It also works much faster on structs than on maps. Also, using
interface{} as the value type is unfortunate, since neither strings
nor slices fit in an interface; they need an extra memory allocation.
I think you are mostly benchmarking the hashmap implementation and
the conversion to interface{}.

  Well, it all makes sense.  For *my* problem, I just need the fastest way back to a map[string]interface{} -- which is, a bit surprisingly, JSON.
 

Rob Pike

Aug 29, 2012, 9:06:05 AM8/29/12
to Dustin, golan...@googlegroups.com
Rémy is correct. I missed that there was also a new decoder every
round. Indeed, it works best when used to stream many items of data to
the same decoder.

-rob