MongoDB poor performance a few hours after data deletion


tik...@smartproject.ua

Mar 26, 2018, 6:01:57 AM
to mongodb-user

Hello, sorry if this is a trivial question, but I'm really stuck.


I've got 3 MongoDB (v 3.4.10) servers (256 GB RAM, 1 TB HDD, 12 CPUs each) in a replica set. The servers are under decent load and disk space is eaten up quite rapidly. I'm considering sharding the big collections, but I'm not there yet.


In the meantime, here is the typical scenario I face:

  1. In the morning I see an alert that the database HDD is 92% used.
  2. Around midday I delete a bunch of redundant data from big collections (1M-4M entries) on the primary. I either update the collection like this: "update({}, {'$unset' : {'key_1' : true, 'key_2' : true, 'key_3' : true}}, {'multi' : 1})", or create a new collection, insert only the needed data there, and drop the old one.
  3. In the evening (about 4-5 hours after the deletion, usually at peak load) Mongo response time increases dramatically, from 3-4 ms to 500 ms. This period lasts for a while, during which my application is almost down. It only returns to normal performance after I stop the application completely for 10-20 minutes and start it again.
The days I do not delete data, the database performs normally.
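For the bulk $unset in step 2, one common way to soften the write burst is to issue it in bounded batches with pauses, rather than as a single multi-update over millions of documents at once. A minimal sketch in the mongo shell, assuming a placeholder collection name bigcoll and the field names from the command above:

```javascript
// Sketch only: unset three fields in batches of 1000 documents,
// pausing between batches so the write load is spread out over time.
var batchSize = 1000;
while (true) {
  // Grab a batch of _ids for documents that still carry one of the fields.
  var ids = db.bigcoll.find({ key_1: { $exists: true } }, { _id: 1 })
                      .limit(batchSize)
                      .toArray()
                      .map(function (doc) { return doc._id; });
  if (ids.length === 0) break;  // nothing left to clean up
  db.bigcoll.update(
    { _id: { $in: ids } },
    { $unset: { key_1: true, key_2: true, key_3: true } },
    { multi: true }
  );
  sleep(500);  // mongo shell built-in; throttles the write pressure
}
```

This trades a longer total elapsed time for a smoother load profile on the primary and on replication.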


I've read a bit about the oplog and the nuances of deleting data on replicated servers. However, in my case the lag between the deletion and the performance drop is several hours.


Is there any internal Mongo process that happens hours after a massive update/insert? How should I do bulk updates/inserts to avoid this?


Thanks!

Kevin Adistambha

Apr 4, 2018, 2:42:55 AM
to mongodb-user

Hi

In the evening (about 4-5 hours after the deletion, usually at peak load) Mongo response time increases dramatically, from 3-4 ms to 500 ms. This period lasts for a while, during which my application is almost down. It only returns to normal performance after I stop the application completely for 10-20 minutes and start it again.

Is the increase in response time uniform, e.g. do all queries suddenly start returning in 500 ms instead of 3-4 ms? Do you have any theories as to why stopping the application for 10-20 minutes allows it to be performant again?

As I understand it, you have 3 machines, each running a single mongod process. Is that correct? Are these three physical servers, or three virtualized servers running on a single machine? Do you have a dedicated host for the application, or does the application run on one of the database servers?

Do you have any external monitoring tools, or any process that interacts directly with the mongod process?

What do the mongod logs show during these periods of elevated response time? Do you see any errors, warnings, or any queries containing COLLSCAN (which implies that the query is not using an index, and would put more pressure on the machine during busy hours)?
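Checking for this can be as simple as grepping the mongod log. The log line below is a fabricated example in the style of a 3.4-era mongod slow-query entry (the real log path depends on your install, e.g. /var/log/mongodb/mongod.log):

```shell
# Fabricated sample log entry, for illustration only
cat > /tmp/mongod_sample.log <<'EOF'
2018-03-26T18:02:11.123+0000 I COMMAND [conn42] command mydb.bigcoll command: find { filter: {} } planSummary: COLLSCAN docsExamined:2500000 protocol:op_query 510ms
EOF

# Count unindexed queries: "planSummary: COLLSCAN" means no index was used
grep -c "COLLSCAN" /tmp/mongod_sample.log
```

Seeing COLLSCAN against a multi-million-document collection during the slow window would be a strong hint that an index is missing or no longer being chosen.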

The days I do not delete data, the database performs normally.

I believe you’re deleting data because of the disk alert (>90% used). If you didn’t delete the data, wouldn’t you have disk space issues instead?

Is there any internal Mongo process that happens hours after a massive update/insert?

No, there is no scheduled internal process, other than the regular WiredTiger checkpoint that occurs every 60 seconds. Checkpointing is the mechanism WiredTiger uses to persist your data to disk.
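As a rough way to see how long checkpoints are taking on your servers, the WiredTiger section of serverStatus exposes checkpoint counters. A sketch in the mongo shell; the stat names below are as reported by 3.4-era servers and may vary between versions:

```javascript
// Sketch: print WiredTiger checkpoint timing stats from serverStatus().
var txn = db.serverStatus().wiredTiger.transaction;
print("checkpoints completed:       " + txn["transaction checkpoints"]);
print("most recent checkpoint (ms): " + txn["transaction checkpoint most recent time (msecs)"]);
print("longest checkpoint (ms):     " + txn["transaction checkpoint max time (msecs)"]);
```

If the longest checkpoint time is large, heavy write bursts (like a mass delete) may be making individual checkpoints expensive.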

I’ve got 3 MongoDB (v 3.4.10) servers

MongoDB 3.4.10 was released in October 2017, and the current release in the 3.4 series is 3.4.14. Have you tried upgrading to 3.4.14 to see whether the issue persists?

Best regards
Kevin
