Mongos high memory usage


Tecbot

Mar 23, 2013, 4:40:13 AM
to mongod...@googlegroups.com
Hi guys,

yesterday we upgraded our replica set to a sharded cluster.
Everything works fine, but we have noticed that our mongos processes are using about 20 GB of memory each (virtual memory is the same).

Can anyone explain why it's so high?

The RAM usage keeps increasing, and we expect the server will eventually start swapping, which we want to prevent.

We use the new 2.4 version.
We have 2 shards with 3 servers each.
We have 8 mongos instances (all with about 20 GB of memory usage).
We have 3 config servers.

Thanks & Regards
Thomas

Tecbot

Mar 23, 2013, 5:28:07 AM
to mongod...@googlegroups.com
Now our first application server has started swapping and we need to reboot it.

tst...@sacbee.com

Mar 23, 2013, 11:32:55 AM
to mongod...@googlegroups.com
I discovered the same thing this morning (see "Random transport endpoint failures since upgrade to 2.4 yesterday (4)")

You're not alone.

tst...@sacbee.com

Mar 23, 2013, 12:22:08 PM
to mongod...@googlegroups.com
Until this issue gets identified and resolved, I've set up a cron task to issue a hard restart to mongos every half hour. Not optimal, but it'll keep it from consuming all available memory.
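
For anyone copying this stopgap, the cron entry can look like this (a sketch assuming a sysvinit-style `service mongos` wrapper; adjust the restart command to your init system):

```shell
# /etc/cron.d/mongos-restart -- stopgap until the leak is fixed:
# hard-restart mongos on the hour and half hour.
0,30 * * * * root service mongos restart >/dev/null 2>&1
```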

Tecbot

Mar 23, 2013, 4:44:30 PM
to mongod...@googlegroups.com
Good to know I'm not alone ;-)...

We did the same thing after our first server started swapping: we restarted it.

We also use php-fpm and the latest stable mongo driver in our application.

I suspect that some cursors are not being closed on the mongos, because its memory grows with operations. But a mongos has no cursor stats the way a mongod instance does.

tst...@sacbee.com

Mar 24, 2013, 3:39:02 AM
to mongod...@googlegroups.com
Bump to top of list.

This is a serious issue and it's not going away with 2.4.1... can someone from 10gen please chime in? I've been wrestling with over 20 production MongoDB nodes ever since the upgrade. These forums feel pretty dead for an issue this significant. Not how I wanted to spend my weekend.


Eliot Horowitz

Mar 24, 2013, 12:27:07 PM
to mongod...@googlegroups.com
Can you send a mongostat output or an MMS link for the mongos in question?

Tecbot

Mar 24, 2013, 12:43:23 PM
to mongod...@googlegroups.com
https://mms.10gen.com/host/detail/e01f5c47ba66ffc132a771e9d4052786

But this is only one of 11 mongos instances with the problem. If you look at all of our mongos instances, you will see that web1 through web8 and the 3 workers all have this problem, and we restart them after some time.

tst...@sacbee.com

Mar 24, 2013, 12:58:56 PM
to mongod...@googlegroups.com, el...@10gen.com
I've disabled the half-hour restart cron, and in a couple of hours I'll have mongostat output showing extremely high memory consumption. In the meantime, here's a /proc/[pid]/status from yesterday, before I started automatically issuing service restarts...

Name:   mongos
State:  S (sleeping)
Tgid:   17603
Pid:    17603
PPid:   1
TracerPid:      0
Uid:    997     997     997     997
Gid:    996     996     996     996
FDSize: 1024
Groups: 996
VmPeak:   724400 kB
VmSize:   724396 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    165184 kB
VmRSS:    157332 kB
VmData:   584144 kB
VmStk:       136 kB
VmExe:     13084 kB
VmLib:      4112 kB
VmPTE:      1056 kB
VmSwap:   213500 kB

RAM (512 MB total) + swap (256 MB total) was almost entirely consumed by mongos. Given more time, Xen would have OOM-killed the process (as it was doing with other mongos instances). I believe that was just before I upgraded it from 2.4.0 to 2.4.1.

In the time that it's taken to write this (and wait for it to grow a bit)...
insert  query update delete getmore command  vsize    res faults  netIn netOut  conn repl       time
     6     50     26      0       0       5   400m    13m      0    12k     8k    40  RTR   16:34:28
[snip]
     0     55    553      0       0       9   422m    30m      0   175k     8k    45  RTR   16:55:07

... with res/vsize slowly increasing in between. Unless it's fixed itself since yesterday morning (fingers crossed), this trend will continue (it looks like about a megabyte every one to three minutes). As Tecbot mentioned, it probably has to do with use (leaking cursors or something), since weekends are slow on our servers and it seems to be growing more slowly than it was on Friday. All mongos instances interacting with the web apps are doing this. I haven't noticed the "utility" ones doing it, but those are hardly used.
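
To chart that growth without watching mongostat by hand, a tiny sampler over /proc can log the kernel's view of the process on each run (a sketch assuming Linux procfs; the `memsnap.sh` name and the single-mongos-per-host assumption are mine):

```shell
#!/bin/sh
# memsnap.sh -- append one timestamped memory snapshot for a pid.
# Run from cron or a loop, e.g.: ./memsnap.sh "$(pgrep -o mongos)"
pid=${1:-$$}   # default to this shell's own pid, for a quick self-test
# Pull the virtual size, resident set, and swapped-out size in one line
snap=$(grep -E '^Vm(Size|RSS|Swap)' "/proc/$pid/status" | tr -s '\n\t ' ' ')
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) pid=$pid $snap"
```

Diffing successive VmRSS values then gives the leak rate directly.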

Eliot Horowitz

Mar 24, 2013, 1:06:40 PM
to mongod...@googlegroups.com
Can you also take a netstat snapshot now and later?

tst...@sacbee.com

Mar 24, 2013, 1:12:35 PM
to mongod...@googlegroups.com
With just the relevant mongo ports? Otherwise it'll be thousands of TIME_WAIT port-80 entries.

Eliot Horowitz

Mar 24, 2013, 1:54:00 PM
to mongod...@googlegroups.com
Just the mongo ports
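
For the record, the filtering can be done with a pattern like this (a sketch; 27017, 27018, and 27019 are assumed to be the default mongos/mongod, shard-member, and config-server ports -- adjust to your deployment):

```shell
# Ports to keep: 27017 (mongos/mongod), 27018 (shard members),
# 27019 (config servers).
mongo_ports=':270(17|18|19)'

# On the affected host you would run:
#   netstat -tan | grep -E "$mongo_ports" > netstat-mongo.txt
# Demonstrated here on two sample netstat lines:
printf '%s\n' \
  'tcp 0 0 10.0.0.1:27017 10.0.0.2:51234 ESTABLISHED' \
  'tcp 0 0 10.0.0.1:80    10.0.0.2:51235 TIME_WAIT' |
  grep -E "$mongo_ports"
```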

tst...@sacbee.com

Mar 24, 2013, 2:30:35 PM
to mongod...@googlegroups.com
Latest mongostat...
insert  query update delete getmore command  vsize    res faults  netIn netOut  conn repl       time
     0     58    423      0       0      71   497m    92m      0   135k    15k    62  RTR   18:25:19
     0     32     10      0       0      11   497m    92m      0     5k     6k    63  RTR   18:25:20
     1     28     13      0       0       6   497m    92m      0     5k     5k    63  RTR   18:25:21
     0     15    496      0       0       4   497m    92m      0   150k     3k    63  RTR   18:25:22
     0     33     10      0       0       7   497m    92m      0     5k     6k    63  RTR   18:25:23

res has not gone down at any point. It's simply increasing.

/proc/[pid]/status...
Name:   mongos
State:  S (sleeping)
Tgid:   26931
Pid:    26931
PPid:   1
TracerPid:      0
Uid:    997     997     997     997
Gid:    996     996     996     996
FDSize: 1024
Groups: 996
VmPeak:   631740 kB
VmSize:   497352 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:     96596 kB
VmRSS:     96560 kB
VmData:   357100 kB
VmStk:       136 kB
VmExe:     13084 kB
VmLib:      4112 kB
VmPTE:       624 kB
VmSwap:        0 kB
Threads:        63

Netstat dumps....

At 17:25 UTC (an hour ago):
http://pastebin.com/raw.php?i=DHgDa8cC

At 18:17 UTC (a few minutes ago):

Nothing really appears different between those dumps and the connection counts make sense given our topology.

I'd like to restart mongos now as it's turning into a resource hog. Need anything else before I do?

Eliot Horowitz

Mar 24, 2013, 6:35:48 PM
to mongod...@googlegroups.com
Can you open a ticket @ jira.mongodb.org with a full mongos log?

tst...@sacbee.com

Mar 24, 2013, 7:50:31 PM
to mongod...@googlegroups.com, el...@10gen.com
Created as SERVER-9108.

... forgot to attach mongos logs... just a minute.

tst...@sacbee.com

Mar 24, 2013, 8:05:10 PM
to mongod...@googlegroups.com, el...@10gen.com
Logs uploaded. I wasn't 100% sure which one it was (I've been doing automatic restarts of mongos, so my log dir is filled with rotated files), so I uploaded the two most likely candidates.

tst...@sacbee.com

Mar 24, 2013, 9:04:44 PM
to mongod...@googlegroups.com, el...@10gen.com
So while we're waiting for this issue to be identified and resolved, what's the recommendation here? Is there any downside to doing periodic mongos restarts (aside from the obvious collection of requests that get the 500 error during the restart)?

Also, I'm starting to think it's in my best interest to downgrade to 2.2. Are the data.n files compatible? If not, would I have to do a dump/restore?

Dan Pasette

Mar 25, 2013, 12:00:43 AM
to mongod...@googlegroups.com
If you haven't created any of the new index types introduced in 2.4,
the data files are compatible. If you choose to downgrade, read the
documentation here:
http://docs.mongodb.org/manual/release-notes/2.4-upgrade/#downgrade-mongodb-from-2-4-to-previous-versions

Do you have MMS set up for this cluster?
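
For reference, one quick way to check whether any of the new index types (2dsphere, text, hashed) exist is to scan system.indexes on every database. A sketch for the mongo shell follows; `yourhost` is a placeholder, so point it at a mongos or at each shard primary as appropriate:

```shell
# Print any index whose type was introduced in 2.4; no output should
# mean the data files are 2.2-compatible per the downgrade docs.
mongo --quiet yourhost:27017 --eval '
  db.getMongo().getDBNames().forEach(function (name) {
    db.getSiblingDB(name).system.indexes.find().forEach(function (idx) {
      for (var k in idx.key) {
        var t = idx.key[k];
        if (t === "2dsphere" || t === "text" || t === "hashed") printjson(idx);
      }
    });
  });'
```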

Spencer T Brody

Mar 25, 2013, 1:54:48 PM
to mongod...@googlegroups.com
Hi Thomas,
Could you also please file a "Community Private" ticket at jira.mongodb.org and attach logs from a mongos exhibiting this problem?  I'd like to try and compare that to the logs tst...@sacbee.com uploaded and see if there is anything in common between the two.  Thanks!

--


{ name     : "Spencer T Brody",
  title    : "Software Engineer",
  location : "New York, NY" }

Tecbot

Mar 25, 2013, 3:25:09 PM
to mongod...@googlegroups.com
Hi,

Our mongos instances log only the distribution messages and the pings to the config servers, nothing more, so I don't think the logs would be very helpful.
Also, we don't have problems with connection failures like tst...@sacbee.com does.
We have only the memory problem, which grows only when we run operations: we have 2 mongos instances that receive no operations, and their memory usage stays the same the whole time.

Tecbot

Mar 27, 2013, 5:10:05 PM
to mongod...@googlegroups.com
any news on this?

gregor

Mar 28, 2013, 7:29:49 AM
to mongod...@googlegroups.com
Our engineers are currently studying this issue. As soon as there is a resolution, I will update you on this thread.

On Wednesday, March 27, 2013 9:10:05 PM UTC, Tecbot wrote:
any news on this?

Barrie

Mar 29, 2013, 11:56:12 AM
to mongod...@googlegroups.com
Hi everyone,

Thanks for your input and your patience while we worked on this. We suspect that what's going on here can be attributed to SERVER-8720, which will be fixed in 2.4.2, the next minor release in the 2.4.x series. You can test it when we release 2.4.2-rc0, but note that as a release candidate it isn't fully tested and shouldn't be used in a production environment. And of course, please let us know of any feedback you have when the new version comes out.

Thanks again,

Barrie 

Taylor Fort

Mar 29, 2013, 3:48:53 PM
to mongod...@googlegroups.com
What is the schedule for the 2.4.2 release? We are contemplating downgrading if this issue is not resolved soon.

Dan Pasette

Mar 29, 2013, 5:49:34 PM
to mongod...@googlegroups.com
Hi Taylor,

We are planning on a release candidate for 2.4.2 by the middle or end of next week.  Release is tracked here in JIRA: https://jira.mongodb.org/browse/SERVER/fixforversion/12405

Dan

Taylor Fort

Mar 29, 2013, 7:17:48 PM
to mongod...@googlegroups.com
If we are not using any new 2.4 features, can a 2.2 mongos run successfully against a 2.4 sharded cluster, or do we have to roll back everything?

gregor

Apr 3, 2013, 11:51:11 AM
to mongod...@googlegroups.com
Hi,

Mixing 2.2 and 2.4 in a cluster is only recommended for short periods during an upgrade or downgrade. Unknown issues could emerge from running a mixed cluster for a long period, so unfortunately I would recommend downgrading the entire cluster if that is what you decide to do.
Thanks,
Gregor