Mongos high memory usage


Tecbot

Mar 23, 2013, 4:40:13 AM
to mongod...@googlegroups.com
Hi guys,

yesterday we upgraded our replica set to a sharded cluster.
Everything works fine, but we have noticed that our mongos processes are using about 20 GB of memory each (virtual memory is the same).

Can anyone explain why it's so high?

The RAM usage keeps increasing, and we expect the server will eventually start swapping, which we want to prevent.

We use the new 2.4 version.
We have 2 shards with 3 servers each.
We have 8 mongos instances (all with about 20 GB of memory usage).
We have 3 config servers.

Thanks & Regards
Thomas

Tecbot

Mar 23, 2013, 5:28:07 AM
to mongod...@googlegroups.com
Now our first application server has started swapping and we need to reboot it.

tst...@sacbee.com

Mar 23, 2013, 11:32:55 AM
to mongod...@googlegroups.com
I discovered the same thing this morning (see "Random transport endpoint failures since upgrade to 2.4 yesterday (4)")

You're not alone.

tst...@sacbee.com

Mar 23, 2013, 12:22:08 PM
to mongod...@googlegroups.com
Until this issue gets identified and resolved, I've set up a cron task to issue a hard restart to mongos every half hour. Not optimal, but it'll keep it from consuming all available memory.
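
For anyone copying this stopgap, the cron entry can look like this (a sketch assuming a sysvinit-style `service mongos` wrapper; adjust the restart command to your init system):

```shell
# /etc/cron.d/mongos-restart -- stopgap until the leak is fixed:
# hard-restart mongos on the hour and half hour.
0,30 * * * * root service mongos restart >/dev/null 2>&1
```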

Tecbot

Mar 23, 2013, 4:44:30 PM
to mongod...@googlegroups.com
Good to know I'm not alone ;-)...

We did the same thing after our first server started swapping: we restarted it.

We also use php-fpm and the latest stable mongo driver in our application.

I suspect that some cursors are not being closed on the mongos, because its memory grows with operations. But a mongos has no cursor stats the way a mongod instance does.

tst...@sacbee.com

Mar 24, 2013, 3:39:02 AM
to mongod...@googlegroups.com
Bump to top of list.

This is a serious issue and it's not going away with 2.4.1... can someone from 10gen please chime in? I've been wrestling with over 20 production MongoDB nodes ever since the upgrade. These forums feel pretty dead for an issue this significant. Not how I wanted to spend my weekend.


Eliot Horowitz

Mar 24, 2013, 12:27:07 PM
to mongod...@googlegroups.com
Can you send a mongostat output or an MMS link for the mongos in question?

Tecbot

Mar 24, 2013, 12:43:23 PM
to mongod...@googlegroups.com
https://mms.10gen.com/host/detail/e01f5c47ba66ffc132a771e9d4052786

But this is only one of 11 mongos instances with the problem. If you look at all of our mongos instances, you will see that web1 through web8 and the 3 workers all have this problem, and we restart them after some time.

tst...@sacbee.com

Mar 24, 2013, 12:58:56 PM
to mongod...@googlegroups.com, el...@10gen.com
I've disabled the half-hour restart cron, and in a couple of hours I'll have mongostat output showing extremely high memory consumption. In the meantime, here's a /proc/[pid]/status from yesterday, before I started automatically issuing service restarts...

Name:   mongos
State:  S (sleeping)
Tgid:   17603
Pid:    17603
PPid:   1
TracerPid:      0
Uid:    997     997     997     997
Gid:    996     996     996     996
FDSize: 1024
Groups: 996
VmPeak:   724400 kB
VmSize:   724396 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    165184 kB
VmRSS:    157332 kB
VmData:   584144 kB
VmStk:       136 kB
VmExe:     13084 kB
VmLib:      4112 kB
VmPTE:      1056 kB
VmSwap:   213500 kB

RAM (512 MB total) + swap (256 MB total) was almost entirely consumed by mongos. Given more time, Xen would have OOM-killed the process (as it was doing with other mongos instances). I believe that was just before I upgraded it from 2.4.0 to 2.4.1.

In the time that it's taken to write this (and wait for it to grow a bit)...
insert  query update delete getmore command  vsize    res faults  netIn netOut  conn repl       time
     6     50     26      0       0       5   400m    13m      0    12k     8k    40  RTR   16:34:28
[snip]
     0     55    553      0       0       9   422m    30m      0   175k     8k    45  RTR   16:55:07

... with res/vsize slowly increasing in between. Unless it's fixed itself since yesterday morning (fingers crossed), this trend will continue (it looks like about a megabyte every one to three minutes). As Tecbot mentioned, it probably has to do with use (leaking cursors or something), since weekends are slow on our servers and it seems to be growing more slowly than it was on Friday. All mongos instances interacting with the web apps are doing this. I haven't noticed the "utility" ones doing it, but those are hardly used.
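
To chart that growth without watching mongostat by hand, a tiny sampler over /proc can log the kernel's view of the process on each run (a sketch assuming Linux procfs; the `memsnap.sh` name and the single-mongos-per-host assumption are mine):

```shell
#!/bin/sh
# memsnap.sh -- append one timestamped memory snapshot for a pid.
# Run from cron or a loop, e.g.: ./memsnap.sh "$(pgrep -o mongos)"
pid=${1:-$$}   # default to this shell's own pid, for a quick self-test
# Pull the virtual size, resident set, and swapped-out size in one line
snap=$(grep -E '^Vm(Size|RSS|Swap)' "/proc/$pid/status" | tr -s '\n\t ' ' ')
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) pid=$pid $snap"
```

Diffing successive VmRSS values then gives the leak rate directly.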

Eliot Horowitz

Mar 24, 2013, 1:06:40 PM
to mongod...@googlegroups.com
Can you also take a netstat snapshot now and later?

tst...@sacbee.com

Mar 24, 2013, 1:12:35 PM
to mongod...@googlegroups.com
With just the relevant mongo ports? Otherwise it'll be thousands of TIME_WAIT port-80 entries.

Eliot Horowitz

Mar 24, 2013, 1:54:00 PM
to mongod...@googlegroups.com
Just the mongo ports
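
For the record, the filtering can be done with a pattern like this (a sketch; 27017, 27018, and 27019 are assumed to be the default mongos/mongod, shard-member, and config-server ports -- adjust to your deployment):

```shell
# Ports to keep: 27017 (mongos/mongod), 27018 (shard members),
# 27019 (config servers).
mongo_ports=':270(17|18|19)'

# On the affected host you would run:
#   netstat -tan | grep -E "$mongo_ports" > netstat-mongo.txt
# Demonstrated here on two sample netstat lines:
printf '%s\n' \
  'tcp 0 0 10.0.0.1:27017 10.0.0.2:51234 ESTABLISHED' \
  'tcp 0 0 10.0.0.1:80    10.0.0.2:51235 TIME_WAIT' |
  grep -E "$mongo_ports"
```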

tst...@sacbee.com

Mar 24, 2013, 2:30:35 PM
to mongod...@googlegroups.com
Latest mongostat...
insert  query update delete getmore command  vsize    res faults  netIn netOut  conn repl       time
     0     58    423      0       0      71   497m    92m      0   135k    15k    62  RTR   18:25:19
     0     32     10      0       0      11   497m    92m      0     5k     6k    63  RTR   18:25:20
     1     28     13      0       0       6   497m    92m      0     5k     5k    63  RTR   18:25:21
     0     15    496      0       0       4   497m    92m      0   150k     3k    63  RTR   18:25:22
     0     33     10      0       0       7   497m    92m      0     5k     6k    63  RTR   18:25:23

res has not gone down at any point. It's simply increasing.

/proc/[pid]/status...
Name:   mongos
State:  S (sleeping)
Tgid:   26931
Pid:    26931
PPid:   1
TracerPid:      0
Uid:    997     997     997     997
Gid:    996     996     996     996
FDSize: 1024
Groups: 996
VmPeak:   631740 kB
VmSize:   497352 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:     96596 kB
VmRSS:     96560 kB
VmData:   357100 kB
VmStk:       136 kB
VmExe:     13084 kB
VmLib:      4112 kB
VmPTE:       624 kB
VmSwap:        0 kB
Threads:        63

Netstat dumps....

At 17:25 UTC (an hour ago):
http://pastebin.com/raw.php?i=DHgDa8cC

At 18:17 UTC (a few minutes ago):

Nothing really appears different between those dumps and the connection counts make sense given our topology.

I'd like to restart mongos now as it's turning into a resource hog. Need anything else before I do?

Eliot Horowitz

Mar 24, 2013, 6:35:48 PM
to mongod...@googlegroups.com
Can you open a ticket @ jira.mongodb.org with a full mongos log?

tst...@sacbee.com

Mar 24, 2013, 7:50:31 PM
to mongod...@googlegroups.com, el...@10gen.com
Created as SERVER-9108.

... forgot to attach mongos logs... just a minute.

tst...@sacbee.com

Mar 24, 2013, 8:05:10 PM
to mongod...@googlegroups.com, el...@10gen.com
Logs uploaded. I wasn't 100% sure which one it was (I've been doing automatic restarts of mongos, so my log dir is filled with rotated files), so I uploaded the two most likely candidates.

tst...@sacbee.com

Mar 24, 2013, 9:04:44 PM
to mongod...@googlegroups.com, el...@10gen.com
So while we're waiting for this issue to be identified and resolved, what's the recommendation here? Is there any downside to doing periodic mongos restarts (aside from the obvious collection of requests that get the 500 error during the restart)?

Also, I'm starting to think it's in my best interest to downgrade to 2.2. Are the data.n files compatible? If not, would I have to do a dump/restore?

Dan Pasette

Mar 25, 2013, 12:00:43 AM
to mongod...@googlegroups.com
If you haven't created any of the new index types introduced in 2.4,
the data files are compatible. If you choose to downgrade, read the
documentation here:
http://docs.mongodb.org/manual/release-notes/2.4-upgrade/#downgrade-mongodb-from-2-4-to-previous-versions

Do you have MMS set up for this cluster?
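
For reference, one quick way to check whether any of the new index types (2dsphere, text, hashed) exist is to scan system.indexes on every database. A sketch for the mongo shell follows; `yourhost` is a placeholder, so point it at a mongos or at each shard primary as appropriate:

```shell
# Print any index whose type was introduced in 2.4; no output should
# mean the data files are 2.2-compatible per the downgrade docs.
mongo --quiet yourhost:27017 --eval '
  db.getMongo().getDBNames().forEach(function (name) {
    db.getSiblingDB(name).system.indexes.find().forEach(function (idx) {
      for (var k in idx.key) {
        var t = idx.key[k];
        if (t === "2dsphere" || t === "text" || t === "hashed") printjson(idx);
      }
    });
  });'
```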

Spencer T Brody

Mar 25, 2013, 1:54:48 PM
to mongod...@googlegroups.com
Hi Thomas,
Could you also please file a "Community Private" ticket at jira.mongodb.org and attach logs from a mongos exhibiting this problem?  I'd like to try and compare that to the logs tst...@sacbee.com uploaded and see if there is anything in common between the two.  Thanks!

--


{ name     : "Spencer T Brody",
  title    : "Software Engineer",
  location : "New York, NY" }

Tecbot

Mar 25, 2013, 3:25:09 PM
to mongod...@googlegroups.com
Hi,

Our mongos instances log only the distribution messages and the pings to the config servers, nothing more, so I don't think the logs would be very helpful.
Also, we don't have problems with connection failures like tst...@sacbee.com does.
We have only the memory problem, which grows only when we run operations: we have 2 mongos instances that receive no operations, and their memory usage stays the same the whole time.

Tecbot

Mar 27, 2013, 5:10:05 PM
to mongod...@googlegroups.com
any news on this?

gregor

Mar 28, 2013, 7:29:49 AM
to mongod...@googlegroups.com
Our engineers are currently studying this issue. As soon as there is a resolution, I will update you on this thread.

On Wednesday, March 27, 2013 9:10:05 PM UTC, Tecbot wrote:
any news on this?

Barrie

Mar 29, 2013, 11:56:12 AM
to mongod...@googlegroups.com
Hi everyone,

Thanks for your input and your patience while we worked on this. We suspect that what's going on here can be attributed to SERVER-8720, which will be fixed in 2.4.2, the next minor release in the 2.4.x series. You can test it when we release 2.4.2-rc0, but note that as a release candidate it isn't fully tested and shouldn't be used in a production environment. And of course, please let us know of any feedback you have when the new version comes out.

Thanks again,

Barrie 

Taylor Fort

Mar 29, 2013, 3:48:53 PM
to mongod...@googlegroups.com
What is the schedule for the 2.4.2 release? We are contemplating downgrading if this issue is not resolved soon.

Dan Pasette

Mar 29, 2013, 5:49:34 PM
to mongod...@googlegroups.com
Hi Taylor,

We are planning on a release candidate for 2.4.2 by the middle or end of next week.  Release is tracked here in JIRA: https://jira.mongodb.org/browse/SERVER/fixforversion/12405

Dan

Taylor Fort

Mar 29, 2013, 7:17:48 PM
to mongod...@googlegroups.com
If we are not using any new 2.4 features, can a 2.2 mongos run successfully against a 2.4 sharded cluster, or do we have to roll back everything?

gregor

Apr 3, 2013, 11:51:11 AM
to mongod...@googlegroups.com
Hi,

Mixing 2.2 and 2.4 in a cluster is only recommended for short periods during an upgrade or downgrade. Unknown issues could emerge from running a mixed cluster for a long period, so unfortunately I would recommend downgrading the entire cluster if that is what you decide to do.
Thanks,
Gregor