[ANN] i-json: a fast (C++) incremental JSON parser

Bruno Jouhier

May 30, 2014, 7:12:13 PM
to nod...@googlegroups.com
I just released a first version of i-json, a fast incremental JSON parser: https://github.com/bjouhier/i-json

Main features:
  • It is FAST :-), but still slower than JSON.parse :-(. On my bench (parsing a realistic 8 MB JSON file), it comes out as only 60% slower than JSON.parse. In comparison, the pure JS parsers that I have tried are more than 6 times slower than JSON.parse.
  • It is incremental and it works directly on buffers (on strings too :-). So you can call it directly with buffers that you receive through 'data' events.
  • It can work either as a DOM parser or an evented one (fewer events than SAX but probably enough).
  • API is small. The intent is not to provide a sophisticated API but a fast engine around which you can build fancier APIs.
  • It is implemented in C++ but there is a JS fallback implementation in case it does not build. The JS implementation is 2.5 to 3 times slower. I wrote it in JS initially but was not happy with the speed. Both implementations pass the unit test suite.
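
To make "incremental" concrete, here is a minimal sketch of a scanner that keeps its state between chunks, so it can be fed Buffers straight from 'data' events. It is not i-json and does not use i-json's API: the DepthScanner class and its members are hypothetical names, and it only tracks nesting depth rather than building values.

```javascript
// Hypothetical helper, not part of i-json: tracks JSON nesting across chunks.
class DepthScanner {
  constructor() {
    this.depth = 0;        // current nesting level of {} and []
    this.inString = false; // true while inside a "..." literal
    this.escaped = false;  // true right after a backslash inside a string
    this.started = false;  // true once the first { or [ has been seen
  }

  // Feed one chunk (Buffer or string). State carries over between calls.
  update(chunk) {
    const bytes = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk, 'utf8');
    for (const b of bytes) {
      if (this.inString) {
        if (this.escaped) this.escaped = false;
        else if (b === 0x5c) this.escaped = true;   // backslash
        else if (b === 0x22) this.inString = false; // closing quote
      } else if (b === 0x22) {
        this.inString = true;                       // opening quote
      } else if (b === 0x7b || b === 0x5b) {        // { or [
        this.depth++;
        this.started = true;
      } else if (b === 0x7d || b === 0x5d) {        // } or ]
        this.depth--;
      }
    }
    return this;
  }

  // True once a top-level object or array has been fully received.
  get complete() {
    return this.started && this.depth === 0 && !this.inString;
  }
}

const scanner = new DepthScanner();
scanner.update(Buffer.from('{"a":[1,"x]'));       // chunk ends inside a string
console.log(scanner.complete);                      // false
scanner.update(Buffer.from('y"],"b":"\u00e9"}'));  // closes string, array and object
console.log(scanner.complete);                      // true
```

Scanning raw bytes is safe here because JSON's structural characters are single ASCII bytes, while every byte of a multi-byte UTF-8 character is 0x80 or above, so a character split across two chunks cannot be mistaken for a quote or bracket.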

Bruno

Alex Yaroshevich

May 31, 2014, 11:36:01 PM
to nod...@googlegroups.com
Hi Bruno!

Did you try writing it in C or Go?

Bruno Jouhier

Jun 1, 2014, 7:27:50 AM
to nod...@googlegroups.com
Hi Alex,

No, I haven't tried C or Go. The Node and V8 APIs are exposed in C++. Developing with other languages is trickier (http://stackoverflow.com/questions/12088937/is-it-possible-to-write-node-js-addons-in-go). Why C or Go?

If your concern is speed, I think that the next step is not to change language but to hook directly into v8's internals instead of going through v8's public APIs. This is how JSON.parse is implemented. I measured the total time spent in the public v8 materialization calls (creating strings, objects and arrays) and this time alone is equivalent to the total time used by JSON.parse.

Bruno

Nuno Job

Jun 1, 2014, 10:05:58 PM
to nod...@googlegroups.com
why is it different from clarinet or jsonparse[1]?

for what class of application is the difference meaningful?

would be happy to refer to this in the clarinet readme,

nuno


Bruno Jouhier

Jun 2, 2014, 2:35:31 AM
to nod...@googlegroups.com
Hi Nuno,

The difference is speed. Just run the test/bench program:

JSON.parse: 584 ms
I-JSON single chunk: 956 ms
I-JSON multiple chunks: 1235 ms
jsonparse single chunk: 8238 ms
DIFFERENT RESULTS!
62: a1=        "latitude": -77.373901,
62: a2=        "latitude": -77.37390099999999,
clarinet single chunk: 5244 ms
clarinet does not materialize result, time is for parsing only

Our application manipulates a lot of JSON data. I want to eliminate big JSON.parse calls that block the event loop and go with a streaming API, but I'm not ready to pay a high price for it.
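
A rough way to see the cost described above: JSON.parse is synchronous, so for as long as it runs nothing else on the event loop can. The following is a minimal timing sketch, not the test/bench program from the repository; makeJson, timeMs, the record shape and the row count are all made up for illustration.

```javascript
// Builds a synthetic JSON text; the record shape is invented for illustration.
function makeJson(count) {
  const rows = [];
  for (let i = 0; i < count; i++) {
    rows.push({ id: i, name: 'item' + i, latitude: -77.373901, tags: ['a', 'b'] });
  }
  return JSON.stringify({ rows });
}

// Runs fn synchronously and returns the elapsed wall-clock time in milliseconds.
function timeMs(fn) {
  const start = process.hrtime.bigint();
  fn();
  return Number(process.hrtime.bigint() - start) / 1e6;
}

const text = makeJson(50000);
const elapsed = timeMs(() => JSON.parse(text));
console.log(
  `JSON.parse of ${text.length} chars blocked the event loop for ${elapsed.toFixed(1)} ms`
);
```

An incremental parser fed one chunk per 'data' event spreads the same work over many short turns of the event loop, at the price of the per-chunk overhead visible in the "multiple chunks" figure above.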

Bruno

dhruvbird

Jun 2, 2014, 12:39:03 PM
to nod...@googlegroups.com


On Sunday, June 1, 2014 7:05:58 PM UTC-7, Nuno Job wrote:
> why is it different from clarinet or jsonparse[1]?
>
> for what class of application is the difference meaningful?
>
> would be happy to refer to this in the clarinet readme,

It would seem that the C++ implementation would be faster than the two JavaScript-based implementations above.

Apart from performance, the APIs of the three libraries are very different.
* jsonparse requires an unbound callback (since it passes in data using 'this'), which sounds like a bad idea.
* clarinet requires every caller/client to keep track of the current value's object nesting by providing a callback for almost every event function. Reconstructing the original JSON can be problematic or time-consuming for the client, not to mention the performance overhead of invoking each callback.
* On the other hand, i-json passes in the complete path of the value (assuming it passes in integer indexes to denote array indexes in the 2nd parameter of the value callback).
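
To illustrate the last point, here is a minimal sketch of what a "value plus complete path" callback looks like from the client's side, with integer indexes for array elements. It walks an already parsed value rather than parsing text, and it is not the API of i-json, clarinet or jsonparse; emitWithPath is a hypothetical name.

```javascript
// Hypothetical illustration: calls callback(value, path) for every scalar leaf.
// Array elements contribute integer indexes to the path, object members their keys.
// Empty arrays and objects produce no calls in this sketch.
function emitWithPath(value, callback, path = []) {
  if (Array.isArray(value)) {
    value.forEach((item, index) => emitWithPath(item, callback, path.concat(index)));
  } else if (value !== null && typeof value === 'object') {
    for (const key of Object.keys(value)) {
      emitWithPath(value[key], callback, path.concat(key));
    }
  } else {
    callback(value, path);
  }
}

const seen = [];
emitWithPath({ rows: [{ id: 1 }, { id: 2 }] }, (value, path) => {
  seen.push(path.join('.') + '=' + value);
});
console.log(seen); // [ 'rows.0.id=1', 'rows.1.id=2' ]
```

With this style the client needs no stack of its own: each call carries enough context to decide what to do with the value, whereas with SAX-like events the client has to rebuild that context from open/close notifications.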

Aside: There's a typo in the clarinet readme: "string on number" should be "string or number".

Will Hoover

Jun 2, 2014, 10:17:09 PM
to nod...@googlegroups.com

How does it stack up against http://oboejs.com/why ?

Bruno Jouhier

Jun 3, 2014, 5:06:39 AM
to nod...@googlegroups.com
Looks like oboe is built around clarinet, so performance should be similar.

Oboe can parse JSON on the server and in the browser. C++ gives i-json an edge on the server side. On the browser side, you can use i-json's JS fallback implementation. Not as fast but still faster than clarinet in my tests.

But oboe is a higher-level solution. It provides a complete end-to-end solution to handle JSON between browser and server. i-json is just a parser.

Bruno

Floby

Jun 3, 2014, 7:48:25 AM
to nod...@googlegroups.com
Interesting implementation.

I'm not able to tell if it would be complicated to reproduce the JSONPath-like behaviour of JSONStream.

I use it to parse couchDB result sets and it works. But it is SLLOOOWWW
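
If a parser reports each value together with its path, as discussed earlier in the thread, JSONStream-style selection reduces to comparing paths against a pattern. A minimal sketch under that assumption: matchesPath and the events array are hypothetical, and using true as a wildcard is borrowed from JSONStream's array pattern form as I understand it.

```javascript
// Hypothetical helper, not part of JSONStream or i-json.
// A pattern entry of true matches any key or index; anything else must match exactly.
function matchesPath(pattern, path) {
  if (pattern.length !== path.length) return false;
  return pattern.every((expected, i) => expected === true || expected === path[i]);
}

// Simulated (value, path) events, shaped like a CouchDB view result with a "rows" array.
const events = [
  { path: ['total_rows'], value: 2 },
  { path: ['rows', 0], value: { id: 'a' } },
  { path: ['rows', 1], value: { id: 'b' } },
];

const rows = events
  .filter((e) => matchesPath(['rows', true], e.path))
  .map((e) => e.value.id);
console.log(rows); // [ 'a', 'b' ]
```

A real selector would also need the parser to deliver whole sub-objects at the matched depth rather than only scalar leaves; this sketch leaves that part out.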