Index Internals


sims...@gmail.com

Dec 11, 2017, 3:30:28 PM
to mongodb-user

Hi,


I’m benchmarking MongoDB and trying to understand its internals. I cannot find much documentation on them, so I have some questions. Please point me to relevant documentation if it exists.


I am using MongoDB 3.4.10 with the default configuration on a single machine.




1. The storage folder of MongoDB contains multiple files like «collection-*.wt» and «index-*.wt». Are all these files separate instances of WiredTiger?


2. When I create a new collection and insert a document into it, two new files get created («collection-*.wt» and «index-*.wt»). What are these files? I guess that my action causes the creation of the primary index, but why are there two files?


3. The documentation (docs.mongodb.com) states that the data structure of the secondary index is a B-tree. Are secondary indexes also maintained by the selected storage engine?


4. What kind of pointer does the secondary index use to refer to the original document: a physical pointer or a logical pointer (like the document id in the case of MongoDB)?


5. When WiredTiger is chosen as storage engine, which index structure does MongoDB use (LSM-tree or B-tree)? Why? 

I have seen some earlier comments that state the B-tree is used, but they are unofficial and old.

Rhys Campbell

Dec 12, 2017, 6:30:53 AM
to mongodb-user
You can find a lot of internals info from here http://source.wiredtiger.com/

sims...@gmail.com

Dec 13, 2017, 9:49:38 AM
to mongodb-user


On Tuesday, December 12, 2017 at 12:30:53 PM UTC+1, Rhys Campbell wrote:
You can find a lot of internals info from here http://source.wiredtiger.com/

Hi, Rhys

From what I can see, the WiredTiger documentation does not say much about its role in MongoDB. My questions are more about how MongoDB uses WiredTiger than about the internals of WiredTiger itself.

Rhys Campbell

Dec 13, 2017, 5:16:20 PM
to mongodb-user
1. This is your data and indexes for each collection. See db.collection.stats({ "indexDetails": true }); for more info.
2. Partially answered in #1. I guess this is an architectural choice for performance reasons in a big data system. Smaller files are easier to deal with than massive monolithic ones.
3. Yes.
4. I would think it is a pointer to the PK, but https://groups.google.com/forum/#!forum/wiredtiger-users may be a better place to ask. Delving into the WT code might also be an option.
5. Still B-tree. LSM support is still pending AFAIK - https://jira.mongodb.org/browse/SERVER-18396
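
To make point 1 a bit more concrete, here is a minimal Python sketch of how the stats output can be mapped back to the files on disk. It assumes the stats document carries a "wiredTiger.uri" field for the collection and one "uri" per entry under "indexDetails", each of the form "statistics:table:<name>". The helper name wt_files_from_stats and the sample document are made up for illustration and are not real output from a server.

```python
def wt_files_from_stats(stats):
    """Map a collStats-style document to the .wt file names it refers to.

    Assumption: each URI looks like "statistics:table:<name>" and the
    table <name> is stored on disk as "<name>.wt".
    """
    prefix = "statistics:table:"

    def to_file(uri):
        # Strip the statistics prefix if present, then add the file suffix.
        name = uri[len(prefix):] if uri.startswith(prefix) else uri
        return name + ".wt"

    files = {"collection": to_file(stats["wiredTiger"]["uri"]), "indexes": {}}
    for index_name, detail in stats.get("indexDetails", {}).items():
        files["indexes"][index_name] = to_file(detail["uri"])
    return files


# Hypothetical, heavily trimmed stats document (not real server output).
# With a driver such as PyMongo, something along the lines of
# db.command("collStats", "people", indexDetails=True) might produce the
# real thing; treat that call as an assumption and check your driver docs.
sample_stats = {
    "ns": "test.people",
    "wiredTiger": {"uri": "statistics:table:collection-0-1234567890"},
    "indexDetails": {
        "_id_": {"uri": "statistics:table:index-1-1234567890"},
    },
}

print(wt_files_from_stats(sample_stats))
```

The point is only that one collection shows up as one collection table plus one table per index.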

Kevin Adistambha

Dec 13, 2017, 10:49:10 PM
to mongodb-user

Hi

I would also like to add some details in addition to Rhys’ answer:

The storage folder of MongoDB contains multiple files like «collection-*.wt» and «index-*.wt». Are all these files separate instances of WiredTiger?

No, they are separate WiredTiger tables, but they are all part of a single WiredTiger instance. Also see Rhys’ answer.

When I create a new collection and insert a document into it, two new files get created («collection-*.wt» and «index-*.wt»). What are these files? I guess that my action causes the creation of the primary index, but why are there two files?

Every time you create a new collection in MongoDB, WiredTiger creates two files: the data file, which holds the documents, and an index file for the default _id index. Each additional index you create on the collection will create a new index file in WiredTiger.

However, note that the answers to questions 1 and 2 are implementation details, which are subject to change without notice. Having said that, those facts are true as of MongoDB 3.6.0.
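
If you want to watch this happen during your benchmarks, a small sketch like the following can list the tables in a data directory. It only relies on the «collection-*.wt» / «index-*.wt» naming mentioned above, which is an implementation detail. The function name list_wt_tables is made up, and the demo uses a temporary directory with empty placeholder files rather than a real dbpath.

```python
import tempfile
from pathlib import Path


def list_wt_tables(dbpath):
    """Group the .wt files in a data directory by their file name prefix."""
    root = Path(dbpath)
    return {
        "collections": sorted(p.name for p in root.glob("collection-*.wt")),
        "indexes": sorted(p.name for p in root.glob("index-*.wt")),
    }


# Demo on a throwaway directory with hypothetical file names; point
# list_wt_tables at your own dbpath to inspect a real deployment.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("collection-0-1.wt", "index-1-1.wt", "index-2-1.wt", "WiredTiger.wt"):
        (Path(tmp) / name).touch()
    demo = list_wt_tables(tmp)

print(demo)
```

Running it before and after creating a collection or an index should show one new file per collection and one per index.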

The documentation (docs.mongodb.com) states that the data structure of the secondary index is a B-tree. Are secondary indexes also maintained by the selected storage engine?

Yes.

What kind of pointer does the secondary index use to refer to the original document: a physical pointer or a logical pointer (like the document id in the case of MongoDB)?

A logical pointer.
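
As a rough illustration of what "logical" means here, the toy model below keeps the documents in one table keyed by a record identifier and stores only that identifier in the secondary index, never a file offset. This is a conceptual sketch in plain Python, not MongoDB or WiredTiger code. The class name ToyCollection, the field "name", and the integer identifiers are all assumptions made for the example.

```python
class ToyCollection:
    """Toy model: a data table plus one secondary index on the "name" field."""

    def __init__(self):
        self.records = {}   # data table: record id -> document
        self.by_name = {}   # secondary index: name -> set of record ids
        self.next_id = 1

    def insert(self, doc):
        rid = self.next_id
        self.next_id += 1
        self.records[rid] = doc
        # The index entry stores the logical record id, not a disk location.
        self.by_name.setdefault(doc["name"], set()).add(rid)
        return rid

    def find_by_name(self, name):
        # Step 1: the index lookup yields record ids.
        rids = sorted(self.by_name.get(name, ()))
        # Step 2: each id is resolved through the data table.
        return [self.records[rid] for rid in rids]


coll = ToyCollection()
coll.insert({"name": "ada", "age": 36})
coll.insert({"name": "bob", "age": 41})
coll.insert({"name": "ada", "age": 52})
print(coll.find_by_name("ada"))
```

A read through the secondary index is therefore two lookups: index to identifier, then identifier to document. The trade-off is that the index does not need to change when the document's physical location changes.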

When WiredTiger is chosen as storage engine, which index structure does MongoDB use (LSM-tree or B-tree)? Why?

See Rhys’ answer, in particular this comment on the ticket.

Best regards
Kevin
