Hello,
I am trying to understand the structure of a BSON document generated by MongoDB by comparing the hexdump with the formal specification.
I inserted the following into an empty MongoDB collection (without compression):
{"name": "John", "age": NumberInt(10)}
{"name": "Paul", "age": NumberInt(25)}
And here is the hexdump:
$ hexdump -C collection-0-1541750112074217368.wt
00000000 41 d8 01 00 01 00 00 00 d8 08 23 b7 00 00 00 00 |A.........#.....|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001000 00 00 00 00 00 00 00 00 26 04 00 00 00 00 00 00 |........&.......|
00001010 8a 00 00 00 04 00 00 00 07 04 00 00 00 10 00 00 |................|
00001020 f0 22 09 6b 01 00 00 00 05 81 bb 2e 00 00 00 07 |.".k............|
00001030 5f 69 64 00 5d fd 43 d4 53 6c 1e 93 e1 fb fb 1d |_id.].C.Sl......|
00001040 02 6e 61 6d 65 00 05 00 00 00 6a 6f 68 6e 00 10 |.name.....john..|
00001050 61 67 65 00 0a 00 00 00 00 05 82 bb 2e 00 00 00 |age.............|
00001060 07 5f 69 64 00 5d fd 43 dc 53 6c 1e 93 e1 fb fb |._id.].C.Sl.....|
00001070 1e 02 6e 61 6d 65 00 05 00 00 00 70 61 75 6c 00 |..name.....paul.|
00001080 10 61 67 65 00 19 00 00 00 00 00 00 00 00 00 00 |.age............|
[...]
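I understand that the underlined bytes are the element types (07, 02 and 10), the bytes in blue are the ObjectIds, the ones in yellow are the names and the ones in green are the ages.
However, I do not understand what the bytes between the documents represent (in red), for example "00 05 82 bb 2e 00 00 00" starting at offset 0x1058.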
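To make that reading of the dump concrete, here is a minimal sketch that decodes the first document by hand. It assumes the document starts at offset 0x102b (the "2e 00 00 00" length) and handles only the three element types that appear here (0x07 ObjectId, 0x02 string, 0x10 int32); `DOC_HEX` and `parse_bson` are names made up for this illustration, not part of libbson.

```python
import struct

# First document copied from the dump above (file offsets 0x102b..0x1058).
DOC_HEX = (
    "2e000000"                                      # int32 total size = 46
    "07" "5f696400" "5dfd43d4536c1e93e1fbfb1d"      # ObjectId "_id"
    "02" "6e616d6500" "05000000" "6a6f686e00"       # string "name" = "john"
    "10" "61676500" "0a000000"                      # int32 "age" = 10
    "00"                                            # document terminator
)


def parse_bson(buf):
    """Decode a flat BSON document containing only types 0x07, 0x02, 0x10."""
    (size,) = struct.unpack_from("<i", buf, 0)
    if size != len(buf) or buf[-1] != 0:
        raise ValueError("bad document length or missing terminator")
    doc = {}
    pos = 4
    while buf[pos] != 0:
        etype = buf[pos]
        key_end = buf.index(b"\x00", pos + 1)       # keys are C strings
        key = buf[pos + 1:key_end].decode("utf-8")
        pos = key_end + 1
        if etype == 0x07:                           # 12-byte ObjectId
            doc[key] = buf[pos:pos + 12].hex()
            pos += 12
        elif etype == 0x02:                         # int32 length, bytes, NUL
            (n,) = struct.unpack_from("<i", buf, pos)
            doc[key] = buf[pos + 4:pos + 4 + n - 1].decode("utf-8")
            pos += 4 + n
        elif etype == 0x10:                         # little-endian int32
            (value,) = struct.unpack_from("<i", buf, pos)
            doc[key] = value
            pos += 4
        else:
            raise ValueError("unsupported element type 0x%02x" % etype)
    return doc


print(parse_bson(bytes.fromhex(DOC_HEX)))
```

The element sizes add up to 4 + 17 + 15 + 9 + 1 = 46 = 0x2e, so the document itself matches the specification; only the bytes around it are unexplained.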
So I tried to read the same collection using libbson (via example-client.c), which connects to MongoDB and retrieves that collection. Stepping through the parsing with gdb, I realized that those bytes were actually different:
00 03 31 00 2e 00 00 00
Now, that makes a lot more sense: the first 00 terminates the previous document, "03 31 00" starts the next one (element type 0x03, an embedded document, followed by the key "1" and its null terminator), and "2e 00 00 00" is that document's size (46 bytes, little-endian).
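As a quick check of that interpretation, the eight bytes seen in gdb can be split field by field. This is only a sketch of my reading of the BSON specification; the variable names are made up for the illustration.

```python
import struct

# The eight bytes observed in gdb, copied from above.
GDB_BYTES = bytes.fromhex("00033100" "2e000000")

terminator = GDB_BYTES[0]                        # 0x00 closes the previous document
elem_type = GDB_BYTES[1]                         # 0x03 = embedded document
key_end = GDB_BYTES.index(b"\x00", 2)            # element keys are NUL-terminated
key = GDB_BYTES[2:key_end].decode("ascii")       # the element's key
(size,) = struct.unpack_from("<i", GDB_BYTES, key_end + 1)  # little-endian int32

print(hex(terminator), hex(elem_type), repr(key), size)
```

The decoded size, 46, is the same 0x2e length that appears in the hexdump, so the two views agree on the document itself and differ only in the three bytes before it ("05 82 bb" in the file versus "03 31 00" in gdb).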
What is going on here? Why is the hexdump different from what I see with libbson?
Thanks,
Martin