Hello everyone,
I'm curious about how document nodes are stored. Every document within a collection can theoretically have a totally different schema. The straightforward way to store a document node is to convert it into BSON, treat the resulting byte array as an atomic value, and insert it into the B-Tree with the document ID as the key. However, this seems rather wasteful, because in realistic use cases we end up repeating the same property names over and over for every document. In SQL storage engines, this problem does not exist because column names are stored once in the table schema rather than in every row, and cells in fixed-size columns can be addressed via simple index offsets.
How (if at all) does WiredTiger address this issue? Does it rely on block-level compression to get rid of the property name duplicates? Or is the overhead so small that it doesn't affect storage footprint and performance in practice?
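To make the concern concrete, here is a rough sketch of the kind of measurement I have in mind. It is only an illustration under stated assumptions, not anything taken from WiredTiger: it uses compact JSON as a stand-in for BSON and zlib as a stand-in for block-level compression, and the helper `name_overhead` and the property names are made up for the example. It only handles flat documents.

```python
import json
import zlib


def encode(doc):
    # Assumption: compact JSON stands in for BSON here; real BSON sizes differ.
    return json.dumps(doc, separators=(",", ":")).encode("utf-8")


def name_overhead(docs):
    """Return (total_bytes, name_bytes) for flat documents serialized one by one."""
    total = 0
    names = 0
    for doc in docs:
        total += len(encode(doc))
        # Count only the bytes of the top-level property names themselves.
        names += sum(len(key.encode("utf-8")) for key in doc)
    return total, names


# Hypothetical collection: 1000 documents that all share the same property names.
docs = [
    {"customerFirstName": "A", "customerLastName": "B", "loyaltyPoints": i}
    for i in range(1000)
]

total, names = name_overhead(docs)

# Assumption: zlib over the concatenated documents stands in for block compression.
block = b"".join(encode(doc) for doc in docs)
compressed = len(zlib.compress(block))

print(f"raw bytes: {total}")
print(f"bytes spent on property names: {names} ({names / total:.0%})")
print(f"bytes after compressing all documents as one block: {compressed}")
```

With short values and long property names like these, the names make up a large share of the raw bytes, which is exactly the overhead I am asking about. Whether compression recovers it in practice is the open question.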
Thank you!
Martin