Full resync fails due to out of memory

Kenn Ejima

Mar 14, 2012, 5:38:51 PM
to mongodb-user
I've reported this issue here: https://jira.mongodb.org/browse/SERVER-5312

But I would like to hear from anyone who has thoughts on this.

In summary, I'm adding a new secondary node to a replica set, but it
fails in the middle of initial resync.

To be precise, (1) the initial cloning finishes just fine, but (2)
somewhere in the secondary index build, it gets stuck like this:

external sort used : 10 files in 54 secs
570700/9432323 6%
575500/9432323 6%
579800/9432323 6%

It gets really slow, and eventually mongod is killed due to out of
memory.

What I find remarkable is that the memory usage pattern changes
dramatically in step (2) above.

That is, while everything goes smoothly in step (1), memory usage is
roughly VIRT=95g, RES=4g and SHR=3.8g. Plenty of file cache is
available (most of the memory is used as file cache / shared mmap'd
pages).

But in step (2), it becomes VIRT=95g, RES=4g and SHR=0. Notice SHR=0:
only 1-2MB of memory (!) is allocated to the file cache, so dirtied
mmap pages are moved in and out so frequently that it causes
thrashing. In fact, iowait is predominant while it's this slow.
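
For anyone who wants to log these numbers over time rather than watch
them by eye, here is a minimal sketch in Python. It assumes Linux, where
/proc/<pid>/statm reports total, resident and shared sizes in pages; the
function names are made up for illustration, and the result should only
be taken as an approximation of the VIRT, RES and SHR figures above.

```python
import os


def parse_statm(statm_line, page_size=4096):
    """Convert one /proc/<pid>/statm line into byte counts.

    The first three fields are total program size, resident set size
    and shared pages, all measured in pages.
    """
    fields = statm_line.split()
    size_pages, resident_pages, shared_pages = (int(f) for f in fields[:3])
    return {
        "VIRT": size_pages * page_size,
        "RES": resident_pages * page_size,
        "SHR": shared_pages * page_size,
    }


def read_memory(pid):
    """Read the current figures for a live process (Linux only)."""
    with open(f"/proc/{pid}/statm") as f:
        return parse_statm(f.read(), os.sysconf("SC_PAGE_SIZE"))
```

Calling read_memory() with the mongod pid every few seconds during the
index build would show when SHR collapses relative to RES.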

With memory this starved, I had to set oom_adj to -17 to exempt mongod
from the oom-killer, and run "swapoff -a" to make sure the thrashing
isn't caused by swap, but it always crashes catastrophically
nonetheless.
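
The oom_adj step can be sketched as below, assuming a Linux kernel that
still exposes the legacy /proc/<pid>/oom_adj file (newer kernels prefer
oom_score_adj) and a process running with root privileges. The helper
name and the proc_root parameter are hypothetical, added only so the
sketch can be tried against a scratch directory.

```python
import os

# -17 is the value mentioned above; on the legacy oom_adj scale it
# exempts the process from the OOM killer.
OOM_DISABLE = -17


def set_oom_adj(pid, value=OOM_DISABLE, proc_root="/proc"):
    """Write an oom_adj value for the given pid and return the path used."""
    path = os.path.join(proc_root, str(pid), "oom_adj")
    with open(path, "w") as f:
        f.write(str(value))
    return path
```

Note that this only stops the kernel from picking mongod as a victim; it
does nothing about the memory pressure itself.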

It looks to me like something is very inefficient in the secondary
index building process. Is there a hard minimum on how much memory I
need to get the initial resync done?

To me, "RES=8g / SHR=0" looks really problematic, as it suggests that
the memory required grows in proportion to the data (or index?) size.
I'd like to learn how to estimate the minimum memory a secondary needs
to be able to resync.

Experiences, thoughts, ideas?

Rakesh Sankar

Mar 29, 2012, 12:46:29 AM
to mongod...@googlegroups.com
I am facing the same problem. Were you able to find a solution, Kenn?

Thanks,
Rakesh Sankar.