Cache policy of MongoDB


Moditha Hewasinghage

unread,
Jun 4, 2018, 6:15:49 PM6/4/18
to mongodb-user
Hello everyone.

I am interested in learning about the cache policy of MongoDB. Unfortunately I haven't found much information in my search, so I decided to run some experiments to get an idea.

I have 4 different-sized databases with the same schema. I run a simple retrieve by _id (100,000 iterations) on each of them simultaneously and check collection.stats() to see the cache usage of each collection. From what I see, the largest collection (1.5M documents) gets the majority of the cache, and the rest of the collections get an equal share regardless of their size.

I also ran experiments reading each collection one after the other, and it seems MongoDB discards the smaller collections first even though they are fresh in the cache (image 2, where I read the collections in descending size order, and image 3, where I read them in ascending order). I thought the cache policy could be LRU. Can someone tell me the reason behind this behaviour, and preferably which cache policy is being used? (I am using MongoDB 3.6.1 with WiredTiger, and I have reduced the cache to 256MB for the experiments.) I have asked this question on the dev forum as well.

Moreover, my intuition was that more of the cache would be used to maintain the indexes rather than the data. But in my experiments (image 4), even though the index fits in memory, the index-to-cache ratio seems to be about 1:9 in almost all the cases I have tried.
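For reference, the per-collection cache usage comes from the `wiredTiger.cache` section of the `collection.stats()` output. A minimal Python sketch of the extraction is below; the nested field names match a typical collStats document, but the sample numbers are invented for illustration (a live run would get the document from the server instead).

```python
# Sketch: pull WiredTiger cache usage out of a collStats document.
# The nested field path matches what db.collection.stats() returns;
# the sample values here are made up for illustration only.
def cache_bytes_in_use(stats):
    cache = stats["wiredTiger"]["cache"]
    return cache["bytes currently in the cache"]

# Stand-in for the real server output of collection.stats()
sample_stats = {
    "wiredTiger": {
        "cache": {
            "bytes currently in the cache": 48234112,
        }
    }
}

print(cache_bytes_in_use(sample_stats))  # 48234112
```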

Image 1: Reading together
Image 2: Larger first
Image 3: Smaller first
Image 4: Index and data in cache

Thank you,
Moditha

Kevin Adistambha

unread,
Jun 18, 2018, 2:55:01 AM6/18/18
to mongodb-user

Hi Moditha

The WiredTiger cache contains data that is loaded into memory and ready for immediate use by WiredTiger. If WiredTiger is configured to use compression (Snappy by default), the content of the WiredTiger cache is the uncompressed version of the data. In contrast, the content of the filesystem cache is still compressed.

Contents of the WiredTiger cache are evicted to make room for new data as required by your workload. In general, eviction follows an approximately LRU process. However, it may not follow a strictly LRU timeline due to factors such as whether data is unmodified or modified (dirty), whether a cursor is referencing pages in the cache, etc.
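To make the "approximately LRU" idea concrete, here is a toy LRU cache in Python. This is purely illustrative of the general policy, not WiredTiger's actual eviction code (which, as noted below, also weighs dirtiness, cursor references, and other factors):

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache: on overflow, evict the least recently used key.
    Illustrative only -- not how WiredTiger actually implements eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def access(self, key):
        if key in self.data:
            self.data.move_to_end(key)  # freshen on every access
        else:
            if len(self.data) >= self.capacity:
                self.data.popitem(last=False)  # evict least recently used
            self.data[key] = True

cache = LRUCache(3)
for key in ["a", "b", "c", "a", "d"]:
    cache.access(key)

# "b" was the least recently used key when "d" arrived, so it was evicted.
print(list(cache.data))  # ['c', 'a', 'd']
```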

Having said that, the WiredTiger cache eviction process is an implementation detail and may change from version to version. Are you seeing any performance issues due to the current implementation?

Best regards
Kevin
