Does any BSON library support selective parsing?


David Glavas

Sep 11, 2019, 5:10:54 PM
to BSON
A common strategy to accelerate JSON parsing is to parse selectively, for example by jumping directly to a queried field without parsing the intermediate content (e.g., Mison: http://www.vldb.org/pvldb/vol10/p1118-li.pdf).

Is there any BSON library that supports fast lookup of a single field, such that it is significantly faster than deserializing the whole document and parsing the resulting JSON?

So internally MongoDB uses BSON to store its documents. Say that the user stores a document A, and MongoDB stores it internally as BSON. If now the user wants to access only one field in document A, what does MongoDB do? How does MongoDB get this one field out of the BSON document?

Best
David

Mathias Stearn

Sep 11, 2019, 7:06:58 PM
to BSON
The BSON parser that MongoDB uses internally *only* does selective parsing. In fact, it would be more accurate to call it an iterator over BSON fields than a parser of BSON objects. One of the important parts of BSON's design is that, because every field is either fixed-size or prefixed with a size, it is easy to skip large swaths of BSON (e.g., strings and whole subobjects) when looking for a field. This is unlike JSON, where you always need to examine every byte[1] until you find your desired field. If you are curious, you can see the logic for computing field size at https://github.com/mongodb/mongo/blob/f27f82560f129f6ccd9b16fba887949ab197e678/src/mongo/bson/bsonelement.cpp#L709-L776, which is the meat of iterating over fields.
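
To illustrate the skipping idea, here is a minimal Python sketch of such a field iterator. It is not MongoDB's code (that is the C++ linked above). The names `value_size`, `iter_fields`, and `find_field`, and the hand-built sample document, are made up for illustration; the size rules are taken from the public BSON spec, and error handling for malformed input is left out.

```python
import struct

# Fixed-size value types from the BSON spec: type byte -> value size in bytes.
FIXED = {0x01: 8, 0x06: 0, 0x07: 12, 0x08: 1, 0x09: 8, 0x0A: 0,
         0x10: 4, 0x11: 8, 0x12: 8, 0x13: 16, 0x7F: 0, 0xFF: 0}


def value_size(buf, typ, pos):
    """Return the byte length of the value of type `typ` that starts at buf[pos]."""
    if typ in FIXED:
        return FIXED[typ]
    if typ in (0x02, 0x0D, 0x0E):      # string-like: int32 length, then that many bytes
        return 4 + struct.unpack_from("<i", buf, pos)[0]
    if typ in (0x03, 0x04, 0x0F):      # document, array, code with scope: int32 total size
        return struct.unpack_from("<i", buf, pos)[0]
    if typ == 0x05:                    # binary: int32 length, subtype byte, payload
        return 4 + 1 + struct.unpack_from("<i", buf, pos)[0]
    if typ == 0x0C:                    # DBPointer: string followed by a 12-byte id
        return 4 + struct.unpack_from("<i", buf, pos)[0] + 12
    if typ == 0x0B:                    # regex: two cstrings, the only case that needs a scan
        end = buf.index(0, buf.index(0, pos) + 1)
        return end + 1 - pos
    raise ValueError(f"unknown BSON type 0x{typ:02x}")


def iter_fields(buf, start=0):
    """Yield (name, type, value_offset, value_size) for each field of one document."""
    total = struct.unpack_from("<i", buf, start)[0]
    pos, end = start + 4, start + total - 1    # the last byte is the 0x00 terminator
    while pos < end:
        typ = buf[pos]
        name_end = buf.index(0, pos + 1)       # field names are NUL-terminated
        name = buf[pos + 1:name_end].decode("utf-8")
        val = name_end + 1
        size = value_size(buf, typ, val)
        yield name, typ, val, size
        pos = val + size                       # jump over the value without decoding it


def find_field(buf, wanted):
    """Return (type, raw value bytes) of the first field named `wanted`, or None."""
    for name, typ, val, size in iter_fields(buf):
        if name == wanted:
            return typ, buf[val:val + size]
    return None


# Hand-built sample: {"big": "x" * 1000, "sub": {"a": 1}, "n": 42}
def cstr(s):
    return s.encode("utf-8") + b"\x00"


def doc(body):
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


big = b"\x02" + cstr("big") + struct.pack("<i", 1001) + b"x" * 1000 + b"\x00"
sub = b"\x03" + cstr("sub") + doc(b"\x10" + cstr("a") + struct.pack("<i", 1))
n = b"\x10" + cstr("n") + struct.pack("<i", 42)
sample = doc(big + sub + n)

typ, raw = find_field(sample, "n")
print(typ, struct.unpack("<i", raw)[0])
```

Looking up `n` here reads only the type byte, the field name, and the size prefix of `big` and `sub`; the 1000-byte string and the subdocument body are never touched. Field names still have to be scanned for their terminating NUL, so the cost grows with the number of fields before the one you want, not with the size of their values.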

I think many drivers offer some form of lazy bson parsing, but I'm not an expert in them, so I can't answer that definitively.

[1] Although it is possible to use SIMD to examine 16 or 32 bytes at a time, as is done in the paper you linked to. I'm surprised that it didn't compare itself to https://github.com/lemire/simdjson, which appears to be approximately as fast even though it *doesn't* do selective parsing.

David Glavas

Sep 12, 2019, 4:27:52 PM
to BSON
Do you know how I can set up a local BSON implementation (such that I can step through the code line by line)? I want to benchmark 3 operations: serialize, deserialize, query the BSON data for individual fields. How can I set up either MongoDB's BSON implementation or some third party implementation that works similarly?

I believe that the Mison paper doesn’t mention simdjson because Mison was published first.