Replica Set Full Re-sync Out of Memory During Index Building

jnunemaker

Jun 20, 2012, 10:30:10 AM

to mongod...@googlegroups.com

I've tried three times with one machine and once with another to add a new replica to a set. Each time it gets through 45-47 of the 52 data files and then starts rapidly using memory until it eventually gets sniped by the OOM killer.

For now I've added an arbiter, so I have 2 full copies and an arbiter. I need to get the new machines synced, though, because one of the 2 full copies has half the hardware of the other; we are in the middle of transitioning to new hardware.

According to our host, we can't snapshot only the data directory; it would have to be the whole server, which would be a mess for config. I'm pretty sure this means we have to do a full re-sync, and the new machines keep running out of memory.

It seems to always happen in the index building phase. Let me know if any more information would help (servers, logs, etc.).

Sid

Jun 20, 2012, 11:55:58 AM

to mongodb-user

Do you have any swap space? If so, how much swap are you
running with?

John Nunemaker

Jun 20, 2012, 12:10:05 PM

to mongod...@googlegroups.com

Yes. 2gb.

> --
> You received this message because you are subscribed to the Google
> Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com
> To unsubscribe from this group, send email to
> mongodb-user...@googlegroups.com
> See also the IRC channel -- freenode.net#mongodb

Sid

Jun 20, 2012, 12:41:05 PM

to mongodb-user

OK. Can you please post the following:

i) Details about the machine (RAM, platform, OS, etc.).
ii) The size of the data and the size of the indexes.
iii) Output from free -m while indexing is going on. Also, is the disk
saturated when it happens?
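
Since the process may be killed before free -m can be run by hand, one option is to log memory on an interval so the last reading before the OOM kill survives. The following is a minimal sketch, assuming a Linux host that exposes /proc/meminfo. The function names, the 10-second interval, and the choice of fields are illustrative assumptions, not part of any MongoDB tooling.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (such as HugePages counts)
    are plain numbers and are stored unchanged.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def summarize_mb(info):
    """Return free RAM and free swap in MiB, similar to what free -m shows."""
    return {
        "mem_free_mb": info.get("MemFree", 0) // 1024,
        "swap_free_mb": info.get("SwapFree", 0) // 1024,
    }


def log_readings(count, interval_seconds=10, path="/proc/meminfo"):
    """Print `count` timestamped readings, one every `interval_seconds`."""
    for _ in range(count):
        with open(path) as handle:
            reading = summarize_mb(parse_meminfo(handle.read()))
        # flush=True so the line reaches the log even if the host dies soon after
        print(time.strftime("%Y-%m-%d %H:%M:%S"), reading, flush=True)
        time.sleep(interval_seconds)
```

Calling log_readings(360) with output redirected to a file would cover roughly an hour of the index build at the default interval.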

On Jun 20, 12:10 pm, John Nunemaker <nunema...@gmail.com> wrote:
> Yes. 2gb.
>

John Nunemaker

Jun 20, 2012, 3:54:26 PM

to mongod...@googlegroups.com

1) 6GB of RAM. Just upped it to 8GB and it still failed. Latest stable Ubuntu.

2) Data size is about 54GB. Index size is ~34GB.

3) Disk is not saturated. All the swap gets used too. It just died again, so I couldn't run free -m, but the kernel log shows no free swap, so I'm assuming it is burning through that as well.

It seems to fail at a similar point, right around data file 47, and always during index building.

The service is analytics, so our active set is relatively small compared to all the data/index size.

Most data is partitioned into a collection per month as well, so only the latest collections actually receive writes. The sync does not appear to be getting to those yet. It seems to be dying when it gets to a collection from a few months back, or at least that is what is in the log.

Sid

Jun 21, 2012, 11:24:09 AM

to mongod...@googlegroups.com

Can you please try with a larger swap file? Also, can you please try reproducing the issue with log level 2 on the node that you are trying to resync, and post the logs? I am interested in seeing the logs specifically from the time when it's building the index. To run a mongod instance at a higher verbosity level, just pass the extra argument -vv on the command line when you start it.

John Nunemaker

Jun 21, 2012, 10:56:42 PM

to mongod...@googlegroups.com

Bumped up swap to 6GB and changed log verbosity to 2. I'll check on it in the morning (EST) and post the results.

John Nunemaker

Jun 22, 2012, 10:26:19 AM

to mongod...@googlegroups.com

Failed again last night with 8GB of RAM and 6GB of swap. Got to data file 49 of 52.

I'd prefer not to post the logs publicly. Should I send them directly to you or drop them in jira community private or something?

Sid

Jun 22, 2012, 1:36:48 PM

to mongod...@googlegroups.com

So adding extra swap space did help it move further along. As for the logs, yes, can you please create a ticket in community private and attach the logs there?

Thanks.

John Nunemaker

Jun 22, 2012, 2:03:59 PM

to mongod...@googlegroups.com

Nah, swap space didn't really help. It made it to the last data file once without swap.

Sid

Jun 25, 2012, 10:13:02 AM

to mongod...@googlegroups.com

Thanks for filing the ticket along with the logs. I will look into it and update the ticket accordingly. Many thanks for reporting this to us.

John Nunemaker

Jun 25, 2012, 7:36:07 PM

to mongod...@googlegroups.com

The good news is I managed to get one of the two new machines that I need to sync up to date last night (on about the fifth try). Going to try syncing the other one tonight.

I've had this sync problem before as well. Would love to get it sorted out so I'm not so nervous about losing machines. I can't currently do file system snapshots so I kind of have to do full syncs. Let me know if you need anything else from me. Happy to help.

David K Storrs

Sep 28, 2012, 4:54:30 PM

to mongod...@googlegroups.com

Did you ever find an answer? We are having the same issue.

Dave

Gianfranco

Oct 5, 2012, 9:19:32 AM

to mongod...@googlegroups.com

What version are you using, Dave?

There's been a fix since 2.0.7, https://jira.mongodb.org/browse/SERVER-6414, which introduces a change described as 'much better for memory consumption and performance'.

It is related to this previous issue.

David K Storrs

Oct 10, 2012, 4:26:44 PM

to mongodb-user

On Oct 5, 6:19 am, Gianfranco <gianfra...@10gen.com> wrote:
> What version are you using Dave?

Gianfranco,

Pardon the long lag time. We have resolved this now after much
beating of heads. It finally turned out that we had several problems
going on:

- There may have been a piece of faulty hardware on the secondary we
were using; it would reboot randomly when Mongo had issues. After a
hardware swap, this issue stopped; all data synced but indexes were
not built.

- We took the machine out of the RS and built the indexes manually,
then added it back. Probably solved.

- Additionally, the week before my post we upgraded from 2.0 to 2.2,
and one of the config servers was missed; it was still running 2.0
which was keeping the cluster metadata read-only and preventing the
balancer from running. Not related to the replica set issue, but
annoying and it confused the issue with the RS for a time.

Thanks for the pointer about the 6414 bug.

Dave

David K Storrs

Oct 10, 2012, 4:28:49 PM

to mongodb-user

Oh, I forgot to add -- we were also using a stock CentOS install,
which had a 1024 file handle limit and had ext3. After reading
http://www.mongodb.org/display/DOCS/Production+Notes we got it upped
to 8k, used ext4, and did some of the other tweaks as specified
therein. That was probably the biggest piece.
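
For anyone wanting to check the same file handle limit, here is a minimal sketch using Python's standard resource module (Unix only). The 8192 threshold simply mirrors the "8k" figure above and is an assumption, not an official minimum; the function names are illustrative. Note that it reports the limit for the process that runs it, so it should be run as the same user and from the same environment that starts mongod.

```python
import resource

# Assumption: 8192 mirrors the "8k" figure mentioned in the post above.
ASSUMED_MIN_OPEN_FILES = 8192


def limit_is_sufficient(soft_limit, minimum=ASSUMED_MIN_OPEN_FILES):
    """Return True if the soft open-file limit is unlimited or at least `minimum`."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= minimum


def check_open_file_limit():
    """Return (soft, hard, ok) for the current process's open-file limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard, limit_is_sufficient(soft)
```

A stock limit of 1024 would come back with ok set to False under this threshold.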

Dave

Gianfranco

Oct 30, 2012, 6:41:20 AM

to mongod...@googlegroups.com

Glad your servers are running smoothly now.
