Hi,
I’m doing some benchmarking of MongoDB and trying to understand its internals. I cannot find much documentation on them, so I have some questions. Please point me to any relevant documentation, if it exists.
I use MongoDB 3.4.10 with the default configuration on a single machine.
1. The storage folder of MongoDB contains multiple files like «collection-*.wt» and «index-*.wt». Are all these files separate instances of WiredTiger?
2. When I create a new collection and insert a document into it, two new files get created («collection-*.wt» and «index-*.wt»). What are these files? I guess that my action causes the creation of the primary index, but why are there two files?
3. The documentation (docs.mongodb.com) states that the data structure of the secondary index is a B-tree. Are secondary indexes also maintained by the selected storage engine?
4. What kind of pointer is used in the secondary index to refer to the original document: a physical pointer or a logical pointer (like the document id in the case of MongoDB)?
5. When WiredTiger is chosen as storage engine, which index structure does MongoDB use (LSM-tree or B-tree)? Why?
I have seen some earlier comments stating that a B-tree is used, but they are unofficial and old.
You can find a lot of information about the internals here: http://source.wiredtiger.com/
Hi
I would also like to add some details in addition to Rhys’ answer:
The storage folder of MongoDB contains multiple files like «collection-*.wt» and «index-*.wt». Are all these files separate instances of WiredTiger?
No, they are separate WiredTiger tables, but they’re all part of a single WiredTiger instance. Also see Rhys’ answer.
When I create a new collection and insert a document to it, two new files get created («collection-*.wt» and «index-*.wt»). What are these files? I guess that my action causes the creation of the primary index, but why is it two files?
Every time you create a new collection in MongoDB, WiredTiger creates two files: a data file for the documents and an index file for the default _id index. Each additional index you create on the collection will create a new index file in WiredTiger.
However, note that the answers to questions 1 and 2 are implementation details, which are subject to change without notice. Having said that, those facts are true as of MongoDB 3.6.0.
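If you want to check the file layout described above on your own machine, here is a minimal sketch that counts the .wt files in a storage folder by name prefix. The function name classify_wt_files is made up for this example, and the sketch assumes the «collection-*.wt» / «index-*.wt» naming from the question, which is an implementation detail and may change.

```python
from collections import Counter
from pathlib import Path


def classify_wt_files(dbpath):
    """Count the .wt files in dbpath by name prefix.

    Hypothetical helper for illustration. It relies only on the
    collection-*.wt / index-*.wt naming mentioned in this thread.
    """
    counts = Counter()
    for f in Path(dbpath).glob("*.wt"):
        if f.name.startswith("collection-"):
            counts["collection"] += 1
        elif f.name.startswith("index-"):
            counts["index"] += 1
        else:
            # Any other .wt file in the folder (internal metadata tables etc.).
            counts["other"] += 1
    return counts
```

Calling classify_wt_files on your storage folder before and after creating a collection should show one more "collection" file and one more "index" file, if the behaviour described above holds for your version.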
The documentation (docs.mongodb.com) states that the data structure of the secondary index is a B-tree, are secondary indexes also maintained by the selected storage engine?
Yes.
What kind of pointer is used in the secondary index to refer the original document, physical pointer or a logical pointer (like document id in the case of MongoDB)?
A logical pointer.
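To illustrate what a logical pointer means here, below is a toy model in which plain Python dicts stand in for the two tables. It is not MongoDB's or WiredTiger's actual implementation: the integer record_id, the sample documents and both function names are assumptions made up for this sketch. The point is only that the index stores an identifier that is looked up in the collection table, rather than a file offset.

```python
# Toy model only: dicts stand in for the collection table and the index table.
# record_id is a hypothetical logical identifier, not a physical file offset.
collection_table = {
    1: {"_id": "a", "city": "Oslo"},
    2: {"_id": "b", "city": "Bergen"},
    3: {"_id": "c", "city": "Oslo"},
}


def build_secondary_index(table, field):
    """Map each value of `field` to the record ids of documents holding it."""
    index = {}
    for record_id, doc in table.items():
        index.setdefault(doc[field], []).append(record_id)
    return index


def find_by(index, table, value):
    """Resolve index entries to documents with a second lookup by record id."""
    return [table[record_id] for record_id in index.get(value, [])]


city_index = build_secondary_index(collection_table, "city")
oslo_docs = find_by(city_index, collection_table, "Oslo")
```

Because the index holds only identifiers, the documents can move around inside the data file without the index entries having to be rewritten; the cost is the extra lookup in the collection table.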
When WiredTiger is chosen as storage engine, which index structure does MongoDB use (LSM-tree or B-tree)? Why?
See Rhys’ answer, in particular this comment on the ticket.
Best regards
Kevin