The servers have 8GB of RAM, of which the mongod process uses around 5GB. So the majority of the data, and probably more importantly the indexes, is not kept in memory. Could this be our root cause?
Hi Leonid,
Note that MongoDB's MMAPv1 storage engine does not buffer data within its own process; instead, data is cached in the filesystem cache. Due to the way MongoDB journalling works, data files are often in memory under the filesystem cache rather than counted as resident memory. See SERVER-9415 for further information on this.
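As a rough way to see this split, you can compare mongod's own memory counters in the mongo shell; a minimal sketch using the MMAPv1 serverStatus mem section (values are in MB):

    // Compare mongod's view of memory: resident pages vs. memory-mapped data files.
    var mem = db.serverStatus().mem;
    print("resident: " + mem.resident + " MB");                    // pages counted against the mongod process
    print("mapped:   " + mem.mapped + " MB");                      // total size of memory-mapped data files
    print("mappedWithJournal: " + mem.mappedWithJournal + " MB");  // roughly double 'mapped' when journalling is on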
The general recommendation is to ensure that indexes fit in RAM to reduce index page faults. Since you have 50GB of indexes, you should probably also look at optimising them:
Check all 14 indexes. Drop any that are not being used or that overlap with a prefix of a compound index (a quick way to inspect index sizes in the shell is sketched after this list). Also see How do you determine what fields to index.
Check your working set. Your working set is the portion of your data that is accessed frequently, and it should fit in memory to achieve good performance. See How can I estimate working set size (v2.6).
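For example, in the mongo shell you can list the index definitions and their on-disk sizes per collection (the chatlogging.SESSIONS namespace here is taken from your log line; adjust as needed):

    // Switch to the database from the log line and inspect its indexes.
    var d = db.getSiblingDB("chatlogging");
    d.SESSIONS.getIndexes().forEach(function (idx) { printjson(idx.key); });
    // Per-index on-disk sizes (bytes) and the database-wide total:
    printjson(d.SESSIONS.stats().indexSizes);
    print("total index size: " + d.stats().indexSize + " bytes");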
One point to add is that the new WiredTiger storage engine in MongoDB v3.0 supports compression for all collections and indexes. Compression of the indexes may help reduce your RAM requirements.
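If you do move to WiredTiger, block compression can also be chosen per collection when it is created. A hedged sketch (zlib shown here, snappy is the default, and the collection name is only an example; this applies only when mongod is running WiredTiger):

    // Create a collection whose data blocks are compressed with zlib
    // instead of the default snappy compressor.
    db.createCollection("SESSIONS", {
        storageEngine: { wiredTiger: { configString: "block_compressor=zlib" } }
    });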
2015-10-19T09:44:09.220+0000 [conn210844] update chatlogging.SESSIONS query: { _id: "838_19101509420840010420fe43620c0a81eb9_IM" } update: { $set: { EndTime: new Date(1445247849092) } } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1 keyUpdates:0 numYields:0 locks(micros) w:214 126ms
I noticed that your _id is quite interesting: 838_19101509420840010420fe43620c0a81eb9_IM.
One tip for managing the size of your working set is to look at index access patterns. If you are inserting into indexes at random locations (as happens with ids generated by hashes), you will continually be touching the whole index. If instead you can create your ids in approximately ascending order (e.g. a day prefix concatenated with a random id), all inserts will occur at the right-hand side of the B-tree, and the working set for the _id index may be smaller if your reads favour the newest documents.
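As an illustration only (the helper name and exact format are hypothetical, reusing the tenant prefix and "_IM" suffix from your log line), an approximately ascending _id could be built like this in the mongo shell:

    // Hypothetical sketch: lead with a coarse date so new _ids sort roughly in
    // insertion order and inserts land on the right edge of the _id index.
    function makeSessionId(tenantId) {
        var day = new Date().toISOString().slice(0, 10).replace(/-/g, ""); // e.g. "20151019"
        var rand = new ObjectId().str;                                     // random-ish suffix
        return tenantId + "_" + day + rand + "_IM";
    }
    db.SESSIONS.insert({ _id: makeSessionId("838"), EndTime: new Date() });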
All inserts and updates are made with w:1, j:1.
By setting a write concern of j:1, mongod will acknowledge the write only after it has written the operation to the journal. Although this means the write can survive a shutdown and increases write durability, it comes with a performance cost (especially on spinning disks). See j-option. In a replica set deployment, w:majority is normally used to improve durability.
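For reference, your current pattern presumably looks something like the following in the shell (query and field taken from the log line above):

    // Acknowledge only after the primary has journalled the write (current pattern).
    db.SESSIONS.update(
        { _id: "838_19101509420840010420fe43620c0a81eb9_IM" },
        { $set: { EndTime: new Date(1445247849092) } },
        { writeConcern: { w: 1, j: true } }
    );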
For general recommendations for your deployment, please see the MongoDB v2.6 production notes. I would specifically point out the following:
Make sure that read-ahead settings for the block devices that store the database files are appropriate. For random access patterns, set low readahead values. A readahead of 32 (32 × 512-byte sectors = 16KB) often works well.
Disable Transparent Huge Pages, as MongoDB performs better with normal (4096-byte) virtual memory pages.
Use SSD if available and economical.
Make sure that there are no other processes running on the instances that may create resource contention issues on the machine.
Regards,
Wan.
Can you estimate the effect this will have on the working set size, as well as the effect on CPU usage? Any other pointers regarding using MMAPv1 vs WiredTiger?
Hi Leonid,
A working set estimation is quite complex, as there are many factors to consider. The most effective way to measure this is to test in your environment with a representative data set and workload.
In regards to MMAPv1 vs WiredTiger, this also depends on your use case, but in general WiredTiger has better performance due to compression, document-level concurrency control, and its ability to utilise multi-core processors more effectively than MMAPv1.
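If you do test a WiredTiger node, you can confirm which engine is active and how much of its cache is in use from serverStatus (a sketch; the wiredTiger section is only present when that engine is running):

    // Which storage engine is this mongod running?
    printjson(db.serverStatus().storageEngine);
    // On a WiredTiger node, inspect cache usage (values are in bytes).
    var cache = db.serverStatus().wiredTiger.cache;
    print("configured cache: " + cache["maximum bytes configured"]);
    print("currently in use: " + cache["bytes currently in the cache"]);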
For WiredTiger you may find these FAQs useful:
My understanding is that if the values were monotonically increasing, then new documents would always be inserted into a single shard, and the shards would be rebalanced later to keep them in balance. So the write scalability of the whole system would be capped by the write scalability of a single shard. The extra rebalancing would also cause bigger parts of the indexes to become part of the working set when the corresponding chunks are moved between shards. Is my understanding not entirely correct?
It's correct that utilising Hash Based Partitioning ensures an even distribution of writes between the shards, however at the expense of efficient range queries. Also see Performance Distinctions between Range and Hash Based Partitioning.
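For context, the two options differ only in how the collection is sharded; a minimal sketch using the namespace from your log line (a collection can only be sharded once, so you would pick one or the other):

    // Range-based sharding on _id: efficient range queries, but monotonically
    // increasing keys concentrate inserts on one shard.
    sh.shardCollection("chatlogging.SESSIONS", { _id: 1 });

    // Hash-based sharding on _id: writes spread evenly, range queries scatter-gather.
    sh.shardCollection("chatlogging.SESSIONS", { _id: "hashed" });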
So in my case it looks like using random or monotonically increasing values each has its positive and negative effects. Is there a recommended way to investigate which approach is more suitable, besides trying them both under load to see how each behaves?
There are a number of approaches for this. For more detailed discussions of these approaches and their pros and cons, I would suggest checking out Socialite: Social Data Reference Architecture.
For even more on Socialite, there were three sessions recorded at MongoDB World 2014:
Are you suggesting w:majority because the penalty of waiting for the data to be journaled is usually bigger than that of replicating it to a secondary?
Journaling provides single-instance write durability. For a replica set deployment, we normally recommend w:majority, so that writes are acknowledged by a majority of the replica set members. If writes are only acknowledged by the primary, they can be rolled back in the event of a primary failover. See Rollbacks During Replica Set Failover.
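A hedged sketch of the same update with a majority write concern (the wtimeout value here is arbitrary; without it the operation can block indefinitely if a majority is unavailable):

    // Wait for acknowledgement from a majority of replica set members,
    // giving up (but not undoing the write) after 5 seconds.
    db.SESSIONS.update(
        { _id: "838_19101509420840010420fe43620c0a81eb9_IM" },
        { $set: { EndTime: new Date() } },
        { writeConcern: { w: "majority", wtimeout: 5000 } }
    );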
Regards,
Wan.
PS: You can also download the white paper for MongoDB Performance Best Practices by answering a few questions.
Hi Leonid,
The workingSet section is an estimate, intended to give you an idea of what proportion of your data you are actually using.
The pagesInMemory value is the total number of pages accessed by mongod over the measurement period. It is not what is currently in physical memory, but rather the pages that were "touched". MongoDB is estimating that your operations touched 89539 pages (or ~350MB of data), so this is a measure of how much data was "hot" over that measurement interval (~14 minutes).
Right now it looks like you are accessing a relatively low amount of data within your mongod process. However, this can be misleading, as your working set can change over the course of the day as your workload increases and decreases. Ideally you should measure at the peak of your day's workload and see what the values look like then. You can also try MongoDB Cloud Manager Monitoring for metric trending.
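For reference, the estimator can be requested explicitly and converted to an approximate size (MMAPv1 / v2.6 only; the 4KB page size is an assumption that matches typical Linux pages):

    // Request the working set estimator explicitly (MMAPv1, MongoDB 2.6).
    var ws = db.serverStatus({ workingSet: 1 }).workingSet;
    printjson(ws);  // pagesInMemory, computationTimeMicros, overSeconds
    // Approximate size of the pages touched during the measurement window:
    print((ws.pagesInMemory * 4096 / 1024 / 1024).toFixed(1) + " MB over " +
          ws.overSeconds + " seconds");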
To get a better idea of how MongoDB is utilising RAM, you should also consider the following values:
If the numbers above are all low, it is more likely that your working set is contained in memory.
Regards,
Wan.
As this is a test environment, the load doesn't change much during the day. The numbers displayed in the workingSet output are not changing much over time.
Hi Leonid,
What kind of operations are you running on the test environment? E.g. a loop of read queries? Random inserts?
Since none of these occur in my environment, I must be missing something regarding the workingSet output of db.serverStatus. Could you provide more details on this?
To look into this further, can you please post the readahead values?
You can run sudo blockdev --report to get the readahead settings.
Also, are your journal and data files on the same block device? The default journal commit interval is 100ms if a single block device (e.g. a physical volume, RAID device, or LVM volume) contains both the journal and the data files. (See commitIntervalMs.)
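If you want to confirm or experiment with the interval, it is exposed as a server parameter on MMAPv1 (a sketch; lowering it increases durability at the cost of more frequent journal flushes):

    // Read the current journal commit interval (milliseconds).
    printjson(db.adminCommand({ getParameter: 1, journalCommitInterval: 1 }));
    // Change it at runtime, e.g. back to the 100ms default:
    db.adminCommand({ setParameter: 1, journalCommitInterval: 100 });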
You mentioned previously that you are using mtools; can you plot the log file using mplotqueries --group operation and post the resulting graph?
If you are setting up a new environment, I would suggest using the latest stable release of MongoDB (currently v3.0.7).
Kind Regards,
Wan.