Fast JSON parsing


Michael Hoisie

Apr 14, 2011, 11:48:54 AM
to golang-nuts
I'd like to write an interface to YAJL: http://lloyd.github.com/yajl/.
It's event-driven, so it requires passing a set of static callbacks to
the library. Has anyone written an interface to some kind of event-
driven library (like SAX) in Go? I'd definitely be interested in
seeing the project code.

- Mike

Brad Fitzpatrick

Apr 14, 2011, 3:27:42 PM
to Michael Hoisie, golang-nuts
Is your goal faster JSON parsing (do you find Go's too slow?), or do you want parse callbacks instead of Decode/Unmarshal?

Go's json package internally could do something like this,


... but it's not exported, probably because it doesn't seem that useful.

The beauty of writing in Go, imo, is that you don't have to deal with event-based programming anymore. Want to parse JSON from a slow network client? Just do it: the goroutine is cheap. Do a blocking read from the network right into the JSON parser.

Or are you just interested in the details of how to wrap a library with callbacks for technical curiosity?
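
Editor's sketch of the blocking-read approach described above. It assumes a hypothetical `message` type and simulates the slow client with `io.Pipe` instead of a real network connection; the helper names (`decodeBlocking`, `slowClient`, `decodeFromSlowClient`) are invented for illustration. With a real `net.Conn` the decoder call would look the same.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"time"
)

// message is a hypothetical payload type used only for this illustration.
type message struct {
	Request string `json:"request"`
}

// decodeBlocking reads one JSON value from r. The call simply blocks
// until enough bytes have arrived to complete the value.
func decodeBlocking(r io.Reader) (message, error) {
	var m message
	err := json.NewDecoder(r).Decode(&m)
	return m, err
}

// slowClient simulates a slow peer by writing the payload a few bytes
// at a time with a short pause between writes.
func slowClient(w *io.PipeWriter, payload string) {
	defer w.Close()
	for i := 0; i < len(payload); i += 4 {
		end := i + 4
		if end > len(payload) {
			end = len(payload)
		}
		if _, err := w.Write([]byte(payload[i:end])); err != nil {
			return // the reader went away
		}
		time.Sleep(time.Millisecond)
	}
}

// decodeFromSlowClient runs the blocking decode in its own goroutine,
// one per client, and waits for the result.
func decodeFromSlowClient(payload string) (message, error) {
	pr, pw := io.Pipe()
	go slowClient(pw, payload)

	type result struct {
		m   message
		err error
	}
	done := make(chan result, 1)
	go func() {
		m, err := decodeBlocking(pr)
		pr.Close() // unblock the writer if it still has bytes to send
		done <- result{m, err}
	}()
	res := <-done
	return res.m, res.err
}

func main() {
	m, err := decodeFromSlowClient(`{"request": "blah"}`)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(m.Request)
}
```

No callbacks or state machines appear in the calling code; the goroutine parks while it waits for bytes.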

Gustavo Niemeyer

Apr 14, 2011, 3:36:31 PM
to Michael Hoisie, golang-nuts

I've done something very similar before, but I also don't understand
the reasoning in this case, as Brad mentioned. We have a good json
package in the standard library.

--
Gustavo Niemeyer
http://niemeyer.net
http://niemeyer.net/blog
http://niemeyer.net/twitter

Greg Lowe

Apr 14, 2011, 5:10:06 PM
to golang-nuts
I've wanted to do this occasionally too. Is there a way to do it using
the existing json package?

Example:

{"request": "blah", "data": [
{"data": "fdsgfdgdfgsdfgdfgdfg......"},
{"data": "fdsgfdgdfgsdfgdfgdfg......"},
{"data": "fdsgfdgdfgsdfgdfgdfg......"},
.........
]}

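// Fields must be exported for encoding/json to fill them in;
// the tags map them to the lowercase keys in the input.
type req struct {
	Request string `json:"request"`
	Data    []data `json:"data"`
}

type data struct {
	Data string `json:"data"`
}

// Basic approach: decode the whole document, then walk the slice.
var r req
if err := json.Unmarshal(input, &r); err != nil {
	log.Fatal(err)
}

for _, d := range r.Data {
	process(d.Data)
}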

Now let's say each data sub-object is 100 KB, and that there are 100 of
them. That means about 10 MB of memory is required for the basic approach.

I'd rather just process each data struct one at a time, as this
requires only about 100 KB of memory to be allocated at once.

Can you do this with the existing json package? What's the best way to
achieve this in Go?

Michael Hoisie

Apr 14, 2011, 7:46:09 PM
to golang-nuts
Yeah, exactly. I'd like to reduce the memory churn in my Facebook app.
Looking at the profiling information, most of the memory allocation is
in the reflect package due to json marshal/unmarshal. It would be nice
to have a parser that's optimized for memory usage, and I figure it
would be easier to adapt something written in C than to rewrite one in
Go.

- Mike

Rob 'Commander' Pike

Apr 14, 2011, 7:50:20 PM
to Michael Hoisie, golang-nuts
There will soon be much less allocation involved in reflection.

-rob

Russ Cox

Apr 14, 2011, 11:03:27 PM
to Michael Hoisie, golang-nuts
The core part of the Go json scanner is very simple:
you just feed bytes into it one at a time and let it
tell you about significant events. It would be easy
to build a thing that lets you pass in an interface to
get called for each event. I suggest reading the json
code. Grep for step.

Russ
