Hi Bill
It’s been some time since you posted this question. Have you found out what’s causing the intermittent performance issue?
We have tested with both MMAPv1 and WiredTiger, with each giving very similar results.
But every five or ten tranches, we get a pause of around 20 seconds (20,000 ms).
The similar results with both engines may point to the memory pressure you mentioned. That is, the size of your result set (700,000+ documents) may require MongoDB to fetch those documents from disk. Unless your hardware uses SSDs, fetching many documents from a spinning disk will have a major performance impact.
The regular cadence of a pause every 5-10 batches that you mentioned seems to indicate that your working set is much larger than your RAM.
If you don’t need to display 700,000+ documents every time, I would recommend constructing a more selective query instead. For more information, please see:
We have not tested on a server with enough RAM to pull the entire DB into memory.
Please note that for best performance, both your indexes and your working set should fit in RAM. Perhaps you could test with a subset of your data that you’re certain will fit in the current hardware’s RAM, and check whether the pauses disappear?
Also, if you find that your working set regularly exceeds a single machine’s capacity, you may want to explore sharding, which was designed for this purpose.
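As a rough way to check whether memory pressure is plausible, the sketch below compares an estimated working set (number of documents × average document size) plus total index size against available RAM. It is a simplification under stated assumptions: the document size, index size, RAM, and the 0.8 headroom factor are hypothetical placeholders, and the real working set also depends on access patterns and the storage engine’s cache. In practice, the `avgObjSize` and `totalIndexSize` figures reported by collection statistics could be plugged in.

```python
# Rough check: do the working set and indexes fit in RAM?
# All sizes are in bytes. The inputs used below are hypothetical placeholders;
# real values could come from collection statistics (avgObjSize, totalIndexSize).

GIB = 1024 ** 3


def working_set_bytes(num_docs: int, avg_obj_size: int) -> int:
    """Approximate bytes needed to hold num_docs documents in memory."""
    return num_docs * avg_obj_size


def fits_in_ram(num_docs: int, avg_obj_size: int,
                total_index_size: int, ram_bytes: int,
                headroom: float = 0.8) -> bool:
    """Return True if documents plus indexes fit within a fraction of RAM.

    headroom is a hypothetical safety factor that leaves room for the OS,
    connections, and other processes.
    """
    needed = working_set_bytes(num_docs, avg_obj_size) + total_index_size
    return needed <= ram_bytes * headroom


if __name__ == "__main__":
    # Hypothetical figures: ~1 KiB documents, 2 GiB of indexes, 4 GiB of RAM.
    print(working_set_bytes(700_000, 1024))                   # 716800000
    print(fits_in_ram(700_000, 1024, 2 * GIB, 4 * GIB))       # True
    # The same hardware with a 28-million-document working set:
    print(fits_in_ram(28_000_000, 1024, 2 * GIB, 4 * GIB))    # False
```

If a check like this says the data should fit but the pauses persist, that would suggest looking beyond raw memory size.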
Best regards,
Kevin
Hi Bill
This query uses the aggregation pipeline, and it does need to span the entire db (bilateral payments summed by payer by payee, etc.) and it’s not always 700,000 documents, but there are some responses out of a database of 28 million transactions that do exceed that number.
If you find that you regularly need to aggregate a sizable number of documents, you may want to take a look at pre-aggregated reports, which could allow you to perform this operation more efficiently.
To summarize: instead of running aggregation queries all the time, create a pre-aggregated document that stores a “running total” and update it every time you insert new data.
Please note that although the link provides implementation details specific to MMAPv1, the concept is universal and also applies to the WiredTiger storage engine.
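The running-total idea can be sketched in plain Python, with in-memory structures standing in for a transactions collection and a pre-aggregated totals collection (both hypothetical, as is the `(payer, payee, amount)` shape, with amounts in integer cents). Each insert updates the matching total in the same step, so reading a payer-to-payee total becomes a lookup instead of a scan over every transaction. In MongoDB, the increment step would typically be an update using the `$inc` operator with `upsert` enabled; this sketch only illustrates the concept and is not a drop-in implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical transaction shape: (payer, payee, amount in cents).
Payment = Tuple[str, str, int]
Pair = Tuple[str, str]


class PaymentStore:
    """In-memory stand-in for raw transactions plus pre-aggregated totals."""

    def __init__(self) -> None:
        self.transactions: List[Payment] = []
        # One running total per (payer, payee) pair.
        self.totals: Dict[Pair, int] = defaultdict(int)

    def insert(self, payer: str, payee: str, amount: int) -> None:
        # Store the raw transaction...
        self.transactions.append((payer, payee, amount))
        # ...and update the pre-aggregated running total at the same time
        # (analogous to an upsert with $inc on a totals document).
        self.totals[(payer, payee)] += amount

    def total(self, payer: str, payee: str) -> int:
        """Read a pre-aggregated total: a lookup, no scan required."""
        return self.totals.get((payer, payee), 0)

    def recompute(self) -> Dict[Pair, int]:
        """Full scan over every transaction, like rerunning the aggregation."""
        result: Dict[Pair, int] = defaultdict(int)
        for payer, payee, amount in self.transactions:
            result[(payer, payee)] += amount
        return dict(result)


if __name__ == "__main__":
    store = PaymentStore()
    store.insert("A", "B", 100)
    store.insert("A", "B", 250)
    store.insert("B", "A", 40)
    print(store.total("A", "B"))                  # 350
    print(dict(store.totals) == store.recompute())  # True
```

The trade-off is a little extra work on every write in exchange for cheap reads, which suits a workload where the same bilateral totals are queried far more often than they change.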
Best regards,
Kevin