UNIX Resource Utilization When Using WiredTiger


Jared Cottrell

Jun 15, 2016, 9:55:32 PM
to mongodb-user
Hello,

I'm trying to better understand the mongod resource utilization guidelines given here:

https://docs.mongodb.com/manual/reference/ulimit/#resource-utilization

1. How do we estimate for the purposes of setting resource limits the number of data files created when using WiredTiger?
1a. WiredTiger creates one data file per namespace (collection or index), correct? Does it ever split a namespace into multiple files e.g. if the data gets too large?
1b. Should we still budget one file handle per data file when using WiredTiger?

2. How do we estimate for the purposes of setting resource limits the number of internal threads created when using WiredTiger?
2a. Does the following statement (from the doc linked above) cover WiredTiger as well?
mongod uses background threads for a number of internal processes, including TTL collections, replication, and replica set health checks, which may require a small number of additional resources.
2b. Are there any thread pools in WiredTiger that grow (meaningfully) with load, data, cores, RAM, or some other variable resource?

Thanks in advance!


Jared

Kevin Adistambha

Jun 27, 2016, 9:58:18 PM
to mongodb-user

Hi Jared,

  1. How do we estimate for the purposes of setting resource limits the number of data files created when using WiredTiger?
    1a. WiredTiger creates one data file per namespace (collection or index), correct? Does it ever split a namespace into multiple files e.g. if the data gets too large?

Unlike MMAPv1, WiredTiger does not split a file when a collection grows large. WiredTiger creates one file per collection or index. Since the _id field is always indexed, each collection consists of at least two WiredTiger files.

1b. Should we still budget one file handle per data file when using WiredTiger?

Yes, WiredTiger uses one file handle per collection, and one file handle per index.
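As a rough illustration, the one-file-per-namespace rule above can be turned into a file-handle estimate (a sketch; the function name and the example counts are hypothetical):

```python
def wiredtiger_file_estimate(n_collections, extra_indexes_per_collection=0):
    # One file for the collection itself, one for its mandatory _id index,
    # plus one file per additional secondary index.
    return n_collections * (2 + extra_indexes_per_collection)

# Example: 500 collections, each with 3 secondary indexes.
print(wiredtiger_file_estimate(500, extra_indexes_per_collection=3))  # 2500
```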

  2. How do we estimate for the purposes of setting resource limits the number of internal threads created when using WiredTiger?
    2a. Does the following statement (from the doc linked above) cover WiredTiger as well?
    mongod uses background threads for a number of internal processes, including TTL collections, replication, and replica set health checks, which may require a small number of additional resources.

Yes. However, the threads mentioned above are not part of the storage engine; they are fundamental to the operation of the mongod server process.

2b. Are there any thread pools in WiredTiger that grow (meaningfully) with load, data, cores, RAM, or some other variable resource?

No. Under normal operation WiredTiger will spawn a fixed number of internal threads.

Please note that for production purposes, I would recommend setting the ulimit values to those on the Recommended ulimit Settings page, both for best performance and to ensure that the mongod process is not artificially restricted by sub-optimal ulimit settings.

Best regards,
Kevin

Jared Cottrell

Jul 1, 2016, 4:53:08 PM
to mongodb-user
Thanks Kevin,


Please note that for production purposes, I would recommend setting the ulimit values to the values in the Recommended ulimit Settings page for best performance and to ensure that the mongod process is not artificially restricted by sub-optimal ulimit settings.

Unfortunately, we have environments where the recommended settings don't cover the estimated resource usage based on the above. Specifically, ulimit -u (processes) and ulimit -n (file handles), which are both recommended to be set to 64k.

The defaults are most commonly a problem in our development environments, where connection count requirements are quite modest but the number of file descriptors needed is huge. We heavily multi-tenant these environments, with potentially over 100 developers on a single deployment. Between all their environments we need to plan for hundreds of thousands of namespaces--even though the storage needed is small--but only a few thousand connections. So we end up with estimates of ~10k processes and ~200k files (obviously it varies from deployment to deployment based on exact usage patterns).

So we have been going with our estimated numbers rather than the recommended settings (good to confirm we're calculating those estimates correctly), and the deployments have been operating smoothly. But we've created confusion, because we commonly run afoul of this warning, which scares those who encounter it:

WARNING: soft rlimits too low. rlimits set to 12500 processes, 200000 files. Number of processes should be at least 100000 : 0.5 times number of files.
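
For reference, the check behind that warning amounts to the following (a hypothetical reimplementation for illustration, not the actual server code):

```python
def rlimit_warning(n_processes, n_files):
    # mongod warns when the process limit is below half the file limit.
    required = 0.5 * n_files
    if n_processes < required:
        return ("WARNING: soft rlimits too low. rlimits set to %d processes, "
                "%d files. Number of processes should be at least %d : "
                "0.5 times number of files." % (n_processes, n_files, required))
    return None

print(rlimit_warning(12500, 200000))  # reproduces the warning above
```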


Is it fair to say the test is no longer relevant and the warning can be safely ignored? I see no correlation between processes and file handles in any of the guidelines on resource capacity planning.

Regards,


Jared

Kevin Adistambha

Jul 7, 2016, 2:28:32 AM
to mongodb-user

Hi Jared,

Is it fair to say the test is no longer relevant and the warning can be safely ignored? I see no correlation between processes and file handles in any of the guidelines on resource capacity planning.

The rlimits warning you saw concerns the general case, where we recommend that the limit of number of processes should be at least 0.5 times the number of files.

However, it seems that you have very specific requirements for your deployment, which are not covered by the “general” case. Therefore, if your calculations and testing determined that the rlimit warning does not apply to your use case, then you may be able to ignore the warning.

Best regards,
Kevin

Jared Cottrell

Jul 7, 2016, 12:38:11 PM
to mongodb-user
Hi Kevin,

The rlimits warning you saw concerns the general case, where we recommend that the limit of number of processes should be at least 0.5 times the number of files.

Can you help me understand how this is generally applicable?

Here's how I've been thinking about it. If we simplify the formulas for estimating required resources to eliminate all but their dominant factors:

nProcesses ~ nConnections
nFiles ~ nConnections + nNamespaces

So another way of saying nProcesses > 0.5 nFiles is:

nNamespaces < nConnections

I don't see how this is a safe general assumption. nNamespaces is driven by your schema (obviously) whereas nConnections is driven by your system architecture (e.g. how many app servers you need and how big their connection pools need to be to handle the planned load). It doesn't seem safe to make general claims about how your schema should vary with your system architecture.
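
The reduction above can be sanity-checked numerically (a sketch under the simplified dominant-factor model; the counts are made up):

```python
def satisfies_warning(n_connections, n_namespaces):
    # Simplified model from above: nProcesses ~ nConnections,
    # nFiles ~ nConnections + nNamespaces.
    n_processes = n_connections
    n_files = n_connections + n_namespaces
    return n_processes > 0.5 * n_files

# The condition holds exactly when nNamespaces < nConnections.
for conn, ns in [(5000, 200000), (200000, 5000), (1000, 999)]:
    assert satisfies_warning(conn, ns) == (ns < conn)

print(satisfies_warning(5000, 200000))  # False: the multi-tenant case trips the warning
```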

Kevin Adistambha

Jul 12, 2016, 2:04:12 AM
to mongodb-user

Hi Jared,

I don’t see how this is a safe general assumption. nNamespaces is driven by your schema (obviously) whereas nConnections is driven by your system architecture (e.g. how many app servers you need and how big their connection pools need to be to handle the planned load). It doesn’t seem safe to make general claims about how your schema should vary with your system architecture.

The recommendation isn’t supposed to be a guideline about architecture, but rather a warning about the relationship between number of processes (threads) the mongod/mongos is expected to run vs. the number of sockets (file descriptors) the process is allowed to have open.

So the point of the warning is really to say “If you set your file descriptor limit to X in order to accommodate Y open connections, then also set your process limit to Y” with the general estimation that half of the file descriptors are expected to be used for sockets instead of disk files.
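
Put as arithmetic, the general-case assumption works out like this (a sketch; the 64k figure is the recommended ulimit mentioned earlier in the thread, and the function name is made up):

```python
def recommended_process_limit(fd_limit, socket_fraction=0.5):
    # Assume socket_fraction of the open file descriptors are client
    # connections (sockets), each of which is served by one thread.
    return int(fd_limit * socket_fraction)

print(recommended_process_limit(64000))  # 32000
```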

Best regards,
Kevin
