MongoDB 3.4, macOS Sierra: volatile performance on large databases


Bill Chute

Feb 7, 2017, 3:41:52 PM
to mongodb-user
Our standard development environment is macOS Sierra (currently 10.12.3) on iMacs with 32GB RAM.
With a large-ish database -- 28 million records, with several indexes -- query performance is quite volatile.
We have tested with both MMAPv1 and WiredTiger, with each giving very similar results.
A typical query may return 700,000+ records.
We therefore fetch in tranches, using find() with limit() and selecting on a unique record ID $gt the last ID returned by the previous tranche (a sketch of the pattern follows below).
Most queries return in about 100 msec, which would be quite satisfactory, and the query plan correctly uses indexes.
But every five or ten tranches, we get a pause of around 20 sec (20,000 msec).
We suspect this might be macOS memory-pressure management interfering with the query.
We have not tested on a server with enough RAM to pull the entire DB into memory.
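
For reference, a minimal sketch of the tranche pattern (collection and field names here are illustrative, not our actual schema):

    // Page through results with limit() and $gt on a unique, indexed ID.
    // Assumes a "transactions" collection with a unique, indexed "recordId".
    var lastId = MinKey;   // MinKey sorts before all other BSON values
    var batch;
    do {
        batch = db.transactions.find({ recordId: { $gt: lastId } })
                               .sort({ recordId: 1 })  // keeps $gt paging stable
                               .limit(10000)
                               .toArray();
        if (batch.length > 0) {
            lastId = batch[batch.length - 1].recordId;
            // ... process this tranche ...
        }
    } while (batch.length > 0);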

Eventually we do get the complete, correct response to the query. The wide variation in response time is troubling until we better understand it.

Thank you for any advice or information.

Bill Chute
Acadiant Limited

Kevin Adistambha

Feb 19, 2017, 7:37:16 PM
to mongodb-user

Hi Bill

It’s been some time since you posted this question. Have you found out what’s causing the intermittent performance issue?

> We have tested with both MMAPv1 and WiredTiger, with each giving very similar results.

> But every five or ten tranches, we get a pause of around 20 sec (20,000 msec).

The similar results from both engines may point to the memory pressure you mentioned. That is, it's possible that the size of your result set (700,000+ documents) requires MongoDB to fetch those documents from disk. Unless your hardware is configured with SSDs, fetching that many documents from a spinning disk will have a major performance impact.

The regular cadence you mentioned, a pause every 5-10 batches, seems to indicate that your working set is much larger than your RAM.

If you don't need to display 700,000+ documents every time, I would recommend constructing a more selective query instead. For more information, please see the indexing strategies section of the MongoDB manual, in particular the page on creating queries that ensure selectivity.

> We have not tested on a server with enough RAM to pull the entire DB into memory.

Please note that for best performance, both your indexes and your working set should fit in RAM. Perhaps you could test with a subset of your data that you're certain will fit in the current hardware's RAM and check whether the pauses disappear?
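
As a rough first check, here is a sketch that compares data and index sizes with the configured WiredTiger cache, using db.stats() and the cache statistics in serverStatus() (run in the mongo shell against your database):

    // Compare data/index sizes with the WiredTiger cache.
    var s = db.stats();
    print("dataSize:  " + (s.dataSize / 1e9).toFixed(2) + " GB");
    print("indexSize: " + (s.indexSize / 1e9).toFixed(2) + " GB");

    var c = db.serverStatus().wiredTiger.cache;
    print("cache used:       " + (c["bytes currently in the cache"] / 1e9).toFixed(2) + " GB");
    print("cache configured: " + (c["maximum bytes configured"] / 1e9).toFixed(2) + " GB");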

Also, if you find that your working set regularly exceeds a single machine's capacity, you may want to explore sharding, which was designed for this purpose.

Best regards,
Kevin

Bill Chute

Feb 24, 2017, 1:02:45 PM
to mongodb-user
Thank you Kevin.

This query uses the aggregation pipeline, and it does need to span the entire DB (bilateral payments summed by payer and by payee, etc.). It's not always 700,000 documents, but some responses out of a database of 28 million transactions do exceed that number.

Your comments are actually quite helpful as we think about the profile of this behavior. 

I do suspect the indexes fit into RAM -- the machines have 32GB each. But -- and this is the interesting part -- in macOS (Sierra) we see mongod using a bit over 15GB (as expected!), yet more than 13GB of that is "compressed" by the operating system. The ENTIRE database is 11GB on disk, though of course WiredTiger compression is at work there too. And when we check currentOp, we see the aggregation pipeline is using indexes properly.
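
For example, we check the plan roughly like this, using a simplified stand-in for our real pipeline (field names are illustrative):

    // Ask the server for the aggregation plan instead of running the pipeline.
    db.transactions.aggregate(
        [
            { $match: { date: { $gte: ISODate("2016-01-01") } } },
            { $group: { _id: { payer: "$payer", payee: "$payee" },
                        total: { $sum: "$amount" } } }
        ],
        { explain: true }
    );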

So we suspect we're running into some "harmonic" behavior between (a) WiredTiger compression and (b) macOS memory compression. Our suspicion is that the macOS response may be the culprit. If our theory holds, it's somewhat analogous to a garbage-collection problem.

Before moving this into production, we'll try it on Amazon with WiredTiger under Linux and see if the profile is different.

Thanks for your attention!

Bill.

Kevin Adistambha

Feb 27, 2017, 1:32:23 AM
to mongodb-user

Hi Bill

> This query uses the aggregation pipeline, and it does need to span the entire DB (bilateral payments summed by payer and by payee, etc.). It's not always 700,000 documents, but some responses out of a database of 28 million transactions do exceed that number.

If you find that you require a sizable number of documents to aggregate all the time, you may want to take a look at pre-aggregated reports, which could allow you to perform this operation more efficiently.

To summarize: instead of performing the full aggregation query every time, create a pre-aggregated document that stores a "running total", updated every time you insert new data.
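
A minimal sketch of the idea (collection and field names here are hypothetical):

    // On each new payment, atomically bump a per payer/payee summary document.
    function recordPayment(payment) {
        db.payments.insert(payment);
        db.paymentTotals.update(
            { payer: payment.payer, payee: payment.payee },
            { $inc: { total: payment.amount, count: 1 } },
            { upsert: true }   // create the summary document on first sight
        );
    }

    recordPayment({ payer: "A", payee: "B", amount: 125.00 });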

Please note that although the link provides implementation details that are MMAPv1-specific, the concept is universal and also applies to the WiredTiger storage engine.

Best regards,
Kevin
