Need help understanding something

Robert Steckroth

Sep 16, 2015, 7:33:08 PM
to mongodb-user
How does MongoDB utilize memory and I/O? When I call find() on a collection which contains, let's say, 1,000,000 records, does the cursor iterate over the results while reading from the hard drive?
If so, is the iteration synced to the I/O bottleneck? Also, is the mongodb module for Node.js going to have the same behaviors as the shell regarding memory and I/O?
Thanks, Rob.

Stephen Steneker

Sep 17, 2015, 12:59:04 AM
to mongodb-user
On Thursday, 17 September 2015 09:33:08 UTC+10, Robert Steckroth wrote:

How does MongoDB utilize memory and I/O? When I call find() on a collection which contains, let's say, 1,000,000 records, does the cursor iterate over the results while reading from the hard drive?

Hi Rob,

There are several different aspects to your question including:

  • memory/resource utilization

  • the behaviour of how drivers interact with the server to fetch results

Memory and resource utilization will depend on the storage engine being used, but the general outcome is that requested indexes & data will be fetched into memory and cached. The ideal performance scenario is usually when your commonly requested indexes & data (aka “working set”) are mostly in memory, so overall performance is not constrained by disk I/O, which is many times slower than RAM.

MMAP (the default storage engine through MongoDB 3.0) automatically uses available free memory as cache and defers to the operating system for memory management:

MMAP uses memory-mapped file access and the documents on disk have the same representation in memory. When data is requested that isn’t in active memory, mongod notes a page fault for monitoring purposes.

WiredTiger (alternative storage engine in MongoDB 3.0+) has a configurable cache size which (in MongoDB 3.0) defaults to the larger of half of physical RAM or 1 GB:

WiredTiger supports index prefix compression and data compression (both enabled by default), so the size of data on disk is typically much less than the representation in memory. WiredTiger notes cache evictions and other metrics in serverStatus.wiredTiger.

If so, is the iteration synced to the I/O bottleneck?

A find() query returns a cursor which drivers iterate, fetching results from the server in batches. If a million documents match a find() query, they will be loaded into memory (if not already present) incrementally as your driver requests additional batches, rather than all at once.

Also, is the mongodb module for Node.js going to have the same behaviors as the shell regarding memory and I/O?

The driver used does not affect the server-side memory or I/O.

If you’re interested in learning more about MongoDB internals or any other topics, there are a lot of archived presentations available: https://www.mongodb.com/presentations/all. There are also regular free online courses through MongoDB University: https://university.mongodb.com.

Regards,
Stephen
