huge number of updates on system.sessions collection


Daniel Puiu

Apr 2, 2019, 5:05:09 PM
to mongodb-user
Hi,

We just completed the upgrade to 3.6.10 for a cluster made up of 30 shards / 20 mongos instances, and we noticed that right after we switched to FCV 3.6, the 1st shard of the cluster shows activity spikes of >5k updates/second every 5 minutes on the config.system.sessions collection. The number of updates is directly proportional to the load on the cluster (see attachment).
The average execution time for client queries executed against shard01 (where the system.sessions collection resides) is also greatly affected during peak hours.

In order to prevent the flooding of shard01, we had to revert to FCV 3.4. 

I've seen that it would be possible to change the "refresh" rate: https://docs.mongodb.com/v3.6/reference/parameters/#logical-session-parameters or to disable the session cache refresh:
setParameter:
  disableLogicalSessionCacheRefresh: true

Both of these options would break the session-dependent features introduced in 3.6 (retryable writes, causal consistency, etc.).
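For completeness, a rough sketch of what either tweak would look like in mongod.conf (the interval value is only an example; as far as I can tell both parameters can only be set at startup):

# mongod.conf (the same setParameter block applies to mongos)
setParameter:
  # stretch the refresh interval beyond the 5-minute default (value in milliseconds)
  logicalSessionRefreshMillis: 900000
  # or disable the session cache refresh entirely (this breaks the 3.6 session features)
  # disableLogicalSessionCacheRefresh: true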

Since this is strictly related to session management in 3.6, is there a way to control/tune/throttle this process? I'm afraid this is a big problem for any very large MongoDB deployment.


Thanks,
Dan

shard01_updates.png

Kevin Adistambha

Apr 3, 2019, 11:40:08 PM
to mongodb-user

Hi,

To be able to support advanced features like causal consistency and retryable writes, the config.system.sessions collection must be used, as you have found. However, there is a known issue where certain drivers “leak” sessions, as described in DRIVERS-453. If you’re using an affected driver, could you upgrade your driver version and check whether the updates to config.system.sessions are still causing a measurable impact on your workload?

Best regards,
Kevin

Daniel Puiu

Apr 4, 2019, 12:06:29 PM
to mongodb-user
Hi Kevin,

We're using mongo-java-driver version 3.7.1
Looking over the DRIVERS issue, this version should be safe. In any case, we didn't experience the kind of symptoms described in that ticket.

Thanks,
Dan

Daniel Puiu

Apr 5, 2019, 8:56:14 AM
to mongodb-user
Hi Kevin,

After a deeper analysis of this behaviour, I think we can rule out the client driver (mongo-java-driver) as the cause.
Since this is happening on a regular basis, i.e. every 5 minutes, it must be related to the server sessions and not the client sessions. There were no changes on the application side to support/enable 3.6-specific features.
That 5-minute interval matches a server option that controls the refresh rate for sessions:
https://docs.mongodb.com/manual/reference/parameters/#param.logicalSessionRefreshMillis
The docs also specifically state that:
When a user creates a session on a mongod or mongos instance, the record of the session initially exists only in-memory on the instance; i.e. the record is local to the instance. Periodically, the instance will sync its cached sessions to the system.sessions collection in the config database, at which time, they are visible to $listSessions and all members of the deployment. Until the session record exists in the system.sessions collection, you can only list the session via the $listLocalSessions operation.
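For reference, the two views can be compared from the mongo shell against the config database using the stages named in the docs above, e.g.:

use config
// sessions currently cached in memory on this mongod/mongos only
db.aggregate([ { $listLocalSessions: {} } ])
// sessions already synced to config.system.sessions, visible cluster-wide
db.system.sessions.aggregate([ { $listSessions: {} } ])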

As seen in the screenshot, it looks like all the mongod/mongos nodes are updating this collection at pretty much the same time. Is this timed by a global cluster time? I would expect these updates to be somewhat staggered, rather than all the cluster members issuing them at the same time.
Another question: does the extensive use of cursors negatively impact this behaviour? My assumption is that server sessions are directly related to client sessions, so if there are many such outstanding sessions, it could lead to what we're seeing.

If I'm reading the documentation correctly, `logicalSessionRefreshMillis` controls the session refresh rate for each node (i.e. reading the sessions from the system.sessions collection). Is there a server option that controls the rate at which the sessions cached on each node are synced to this collection?

Thanks,
Dan

Kevin Adistambha

Apr 10, 2019, 9:12:55 PM
to mongodb-user

Hi Daniel,

As seen in the screenshot, it looks like all the mongod/mongos nodes are updating this collection at pretty much the same time. Is this timed by a global cluster time? I would expect these updates to be somewhat staggered, rather than all the cluster members issuing them at the same time.

The updates to the config.system.sessions collection are timed by each mongod process; it is basically a scheduled task that runs at a predetermined interval. Thus, in a sharded cluster, the updates should be staggered as you expect, since each individual mongod has its own schedule.

If you’re finding that the updates always occur at the same time, you might want to check that your config.system.sessions collection is properly sharded. The output of sh.status() should show you.
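If you prefer to check the sharding metadata directly instead of scanning the sh.status() output, something along these lines (run via a mongos against the config database, using the 3.6 metadata collections) should also work:

use config
// is the sessions collection registered as sharded, and with what shard key?
db.collections.find({ _id: "config.system.sessions" }).pretty()
// how many chunks exist, and on which shards do they live?
db.chunks.aggregate([
  { $match: { ns: "config.system.sessions" } },
  { $group: { _id: "$shard", chunks: { $sum: 1 } } }
])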

Another question: does the extensive use of cursors negatively impact this behaviour?

Technically yes, but in practice we haven’t seen updates to this collection be detrimental to the cluster’s performance in general. Multiple clients performing operations on the cluster are typically far more demanding, resource-wise.

Is there a server option that controls the rate at which the cached sessions on each node are synced to this collection?

I don’t believe that rate is separately controllable. In general, this functionality is governed by the logicalSessionRefreshMillis parameter.
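You should be able to read the current value back on any mongod or mongos with getParameter, e.g.:

// the value is in milliseconds; the default is 300000 (5 minutes)
db.adminCommand({ getParameter: 1, logicalSessionRefreshMillis: 1 })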

Best regards,
Kevin

Daniel Puiu

Apr 22, 2019, 9:01:36 AM
to mongodb-user
Hi Kevin,

Thanks for your reply.
If you’re finding that the updates always occur at the same time, you might want to check that your config.system.sessions collection is properly sharded. The output of sh.status() should show you.

The collection is properly sharded, but only 1 (one) chunk has been created. Here's what sh.status() shows:
{ "_id" : "config", "primary" : "config", "partitioned" : true } 
config.system.sessions 
shard key: { "_id" : 1 } 
unique: false 
balancing: true 
chunks: 
scstage-eastus2-ReplSet1
{ "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : sctage-eu-west-ReplSet1 Timestamp(1, 0)

The collection is quite small and will normally never reach the chunk split threshold:
db.system.sessions.stats()
{
    "ns" : "config.system.sessions",
    "size" : 987075,
    "count" : 5777,
    "avgObjSize" : 170,
    "storageSize" : 1765376,
    ...

Technically yes, but in practice we haven’t seen updates to this collection be detrimental to the cluster’s performance in general. Multiple clients performing operations on the cluster are typically far more demanding, resource-wise.

I agree with you here - that should be the case, but as you can see, this creates a hot spot, and the updates on this collection are indeed affecting the activity on that particular shard.

Do you think that manually splitting that chunk, so that the load is distributed across multiple shards, would help? E.g. creating multiple chunks that would then get migrated as part of the balancing process.
The records have this form:
mongos> db.system.sessions.find()
{ "_id" : { "id" : UUID("8d4e1767-3389-4cf1-b965-acfb55123a7e"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "lastUse" : ISODate("2019-04-11T09:58:47.473Z") }
{ "_id" : { "id" : UUID("9f6de236-0dea-45cd-823a-e1dda73d22ad"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "lastUse" : ISODate("2019-04-11T09:53:49.530Z") }
{ "_id" : { "id" : UUID("f2da4b55-8f69-4f82-868e-063ae8efc78e"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "lastUse" : ISODate("2019-04-11T09:43:41.638Z") }
{ "_id" : { "id" : UUID("c9f578ab-3c14-487d-b144-7ed5d2ad3fab"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "lastUse" : ISODate("2019-04-11T09:58:36.380Z") }
{ "_id" : { "id" : UUID("dd77e664-c32f-4388-93d3-318e0b043d5c"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "lastUse" : ISODate("2019-04-11T09:43:41.638Z") }
{ "_id" : { "id" : UUID("b3e03d60-6e03-4557-b19b-8aa8b1936a88"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "lastUse" : ISODate("2019-04-11T09:48:47.468Z") }
{ "_id" : { "id" : UUID("df5e26c7-17fc-46f0-833d-34d47e054c6f"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "lastUse" : ISODate("2019-04-11T09:53:47.467Z") }
....

So manually splitting on the first hex digit of "_id.id" (0..f) would give us ~16 chunks that would get distributed across the cluster's shards, removing the hot-spot scenario (a rough sketch is below).
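Roughly what I have in mind (completely untested; the boundary UUIDs are placeholders and I'm ignoring the uid part of the _id in the split points):

// run via a mongos: pre-split config.system.sessions on the first hex digit of _id.id
for (var i = 1; i < 16; i++) {
    var d = i.toString(16);
    sh.splitAt("config.system.sessions",
               { _id: { id: UUID(d + "0000000-0000-0000-0000-000000000000") } });
}
// the balancer should then migrate the resulting chunks across the shards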

This is just an idea, so I would like to know your thoughts on this.

Thanks,
Dan 

Kevin Adistambha

May 9, 2019, 2:45:05 AM
to mongodb-user

Hi Daniel,

There is an internal process that maintains the config.system.sessions collection, and it is supposed to maintain the collection and shard it as necessary. I don’t know why this seems to fail in your case. In my opinion, it’s best not to perform manual maintenance on the collection. If you have a lot of sessions, the collection should eventually be split and distributed among the shards automatically. You could try to induce the split by using e.g. only one mongos process during a load test to reduce variability.

Best regards,
Kevin
