significant RSS increase, and performance decrease, moving from node 0.8.x to 0.10.x


spollack

Sep 17, 2013, 9:02:55 PM
to nod...@googlegroups.com
I'm trying to track down a significant RSS increase, and performance decrease, moving from node 0.8.x to 0.10.x.
 
With the backend worker portion of our application doing its standard workload, under node 0.8.23 our RSS stabilizes under 400MB. The workload completes in roughly 12-13 hours.

With the same application code and workload, under node 0.10.17 (no other changes whatsoever), our RSS rapidly grows to ~1GB (within the first 15 minutes or so). This is approximately the amount of memory available on this machine, so we thrash and performance degrades. The workload takes roughly 20-27 hours to finish.

Now my first suspect is the change in garbage collection behavior between the major versions of node. 

Our application is run with --expose_gc and calls system.gc() 5 times in a row, every 60 seconds. This strategy worked well under node 0.8.x. 
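
For reference, a rough sketch of that pattern, written against the bare gc() function that --expose_gc exposes on the global object:

// Force a few major GCs every 60 seconds (requires --expose_gc).
if (typeof global.gc !== 'function') {
  throw new Error('start node with --expose_gc');
}
setInterval(function () {
  for (var i = 0; i < 5; i++) {
    global.gc();
  }
}, 60 * 1000);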

After seeing the poor results under node 0.10.x, I've tried a handful of GC-related experiments, such as increasing the frequency of system.gc() calls to 5 times in a row every 5 seconds, which slowed the RSS increase but didn't help overall performance. I also tried the --always_compact flag, with similar results (somewhat lower RSS usage, but overall performance still not close to before).

Under both 0.8.x and 0.10.x, heap memory usage is low, in the 100MB range. So it's non-heap memory usage that is getting us. 

Could some other change in node 0.10.x be using far more non-heap memory?

Ideas on what's going on here? 

Thanks,
Seth

Ben Noordhuis

Sep 18, 2013, 5:12:35 AM
to nod...@googlegroups.com
You're correct that there have been a number of changes to V8's
garbage collection heuristics. The short version is that it now tries
to be as lazy as possible, which shows up in the way it will usually
prefer to grow the heap instead of performing a major garbage
collection cycle.

If you're running in a memory-constrained environment, try starting
node with --max_old_space_size=<n> (where <n> is in megabytes). The
default is 1500 or 1900.
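
For example (the value and script name are just placeholders):

node --max_old_space_size=256 app.js

You can list the V8 flags node accepts with node --v8-options.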

spollack

Sep 19, 2013, 12:07:23 PM
to nod...@googlegroups.com
Thanks Ben. I've been experimenting with --max_old_space_size, but so far this hasn't resolved my issue. 

In the test I ran last night, at the approximate sweet spot between too little memory (crash) and too much memory (thrash) for our conditions, performance averaged around 12.5 minutes per bundle of work, still far from node 0.8.x, where we average around 9 minutes per bundle. This was with --max_old_space_size=208. I'm trying another run now with --max_old_space_size=192, since at 208 we still eventually exceeded the 1GB of memory available on the virtual machine.

Just to confirm my understanding: max_old_space_size helps with this excessive RSS usage by forcing more frequent full GCs, which in turn clean up non-heap allocations associated with heap objects (such as Buffers), correct?

How much awareness does V8 have of non-heap memory usage? I see there is a --max_executable_size flag available. Would this help here? 

Is it worth looking at what non-heap allocations are building up, and if so do you have any tool recommendations for this? 

Anything else you recommend trying? 

Thanks!

Ben Noordhuis

Sep 19, 2013, 1:14:22 PM
to nod...@googlegroups.com
On Thu, Sep 19, 2013 at 6:07 PM, spollack <se...@pollackphoto.com> wrote:
>
> Thanks Ben. I've been experimenting with --max_old_space_size, but so far this hasn't resolved my issue.
>
> In the test I ran last night, at the approximate sweet spot between too little memory (crash) and too much memory (thrash) for our conditions, performance averaged around 12.5 minutes per bundle of work, still far from node 0.8.x, where we average around 9 minutes per bundle. This was with --max_old_space_size=208. I'm trying another run now with --max_old_space_size=192, since at 208 we still eventually exceeded the 1GB of memory available on the virtual machine.
>
> Just to confirm my understanding: max_old_space_size helps with this excessive RSS usage by forcing more frequent full GCs, which in turn clean up non-heap allocations associated with heap objects (such as Buffers), correct?

Yes, that's correct. 1 GB is a bit on the low side these days,
especially when running on a virtualized machine where the physical
memory may be overcommitted. The easiest way to fix your issues is
probably to add more memory.

> How much awareness does V8 have of non-heap memory usage?

Node.js provides hints about off-heap memory usage to V8, which gives
V8 a better picture of the total amount of memory that's being used.

> I see there is a --max_executable_size flag available. Would this help here?

Probably not. The generated code usually doesn't amount to more than
a few megabytes unless your application is really big (as in: millions
of lines of code) and it's allocated on demand.

> Is it worth looking at what non-heap allocations are building up, and if so do you have any tool recommendations for this?
>
> Anything else you recommend trying?
>
> Thanks!

Maybe have a look at node-heapdump or node-webkit-agent. They're both
tools for creating and diffing heap snapshots. Finding out what is
using memory is a good first step.
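
With node-heapdump that's roughly (the path here is arbitrary):

// Write a V8 heap snapshot; load it into Chrome DevTools and diff it
// against a later snapshot to see what grew.
var heapdump = require('heapdump');
heapdump.writeSnapshot('/tmp/app-' + Date.now() + '.heapsnapshot');

If I remember right, it will also write a snapshot when the process
receives SIGUSR2 (kill -USR2 <pid>).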

spollack

Sep 19, 2013, 2:32:34 PM
to nod...@googlegroups.com


On Thursday, September 19, 2013 10:14:22 AM UTC-7, Ben Noordhuis wrote:

Yes, that's correct.  1 GB is a bit on the low side these days,
especially when running on a virtualized machine where the physical
memory may be overcommitted.  The easiest way to fix your issues is
probably to add more memory.

Our app runs on Heroku, and the only dyno sizes they offer right now are 512MB and 1024MB. We could look at switching hosting providers, but that is a bigger undertaking, so I'm definitely going to try to get node 0.10.x to perform similarly to 0.8.x for us if at all possible.
 

Maybe have a look at node-heapdump or node-webkit-agent.  They're both
tools for creating and diffing heap snapshots.  Finding out what is
using memory is a good first step.

Thanks. I'll do some more experimentation with different --max_old_space_size settings, and if that isn't working, I'll try looking at the heapdumps to see what I can see. Does either of these tools give direct insight into non-heap allocations?

Thanks,
Seth

Ben Noordhuis

Sep 19, 2013, 2:52:22 PM
to nod...@googlegroups.com
Yes and no. They're strictly for snapshotting the V8 heap, but
embedders can provide information about external objects, and node.js
does.

For example, a Buffer consumes only a few bytes of on-heap memory but
potentially many megabytes of off-heap memory. Node.js provides some
metadata to V8 that ends up in the heap snapshot and lets e.g.
Chrome's inspector tool show the object's true size.
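
A quick way to see that in action (illustrative sketch; the exact numbers will vary):

// A 100MB Buffer barely moves heapUsed but shows up fully in RSS,
// because its backing store is allocated outside the V8 heap.
var before = process.memoryUsage();
var buf = new Buffer(100 * 1024 * 1024);
buf.fill(0); // touch the pages so they actually count toward RSS
var after = process.memoryUsage();
console.log('heapUsed delta:', after.heapUsed - before.heapUsed);
console.log('rss delta:     ', after.rss - before.rss);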

Bruno Jouhier

Sep 19, 2013, 3:46:02 PM
to nod...@googlegroups.com
Hi Seth,

Are you using streamline in fibers mode for this test? If so, it may be worth upgrading to 0.11.x and trying generators mode.

We had high memory consumption in one of our processes. In fibers mode we saw very large oscillations in memory usage, with high peaks, and it was a bit difficult to tell whether or not there was a memory leak. It looked more like a fragmentation issue delaying the release of memory, because we were getting sudden drops from 1,200MB to 300MB every 10 minutes. In generators mode we got lower memory usage, a very flat memory curve and no drift at all (no memory leak :-). Memory usage kept oscillating between 210MB and 240MB throughout the test.

Bruno

Seth Pollack

Sep 19, 2013, 3:52:40 PM
to nod...@googlegroups.com
Hi Bruno, thanks for the suggestions. Yes, we are using streamline in fibers mode.

Several questions: 
1) Regarding the fibers-mode memory oscillations that you saw: was this under node 0.10.x? And do you know if you had the same behavior under node 0.8.x? Also, the memory numbers you are quoting here are RSS, right?
2) Do you know if streamline's callbacks mode also avoids the memory oscillations?

Thanks,
Seth



Bruno Jouhier

Sep 19, 2013, 6:17:14 PM
to nod...@googlegroups.com
Our tests were done with 0.10 and 0.11. We did not test with 0.8 because we upgraded to 0.10 a while ago.

I looked at the figures again. The large oscillations and big drop in fibers mode that I gave were actually for "heap used". The RSS figures oscillate too, but the drop is smaller: the first RSS drop is from 1,500MB to 1,000MB, and it does not seem to recover fully, because the second drop is from 1,600MB to 1,270MB. I did not run it longer because it was slowing down my machine, but it looks like it was recovering heap OK but not RSS.

With generators, there are small oscillations both on RSS and heap used and no drift.

In callbacks mode RSS oscillates between 320 and 400 MB and "heap used" between 230 and 300 MB. No drift.

I did not let the test run very long in any of the modes, because my main question was whether or not there was a memory leak in our code, and the generators test convinced me that our code was fine but that fibers was probably causing some fragmentation.

Note that I ran these tests before fixing the streamline issue that you submitted, so the fibers anomaly was probably exacerbated by the fact that too many fibers were created (though normally this should not cause a memory drift, because the extra fibers should get GCed - but maybe not fast enough).

I just ran the test again (with a different dataset and different code). Here are the results:

FIBERS (node 0.10.12):

{"rss":314937344,"heapTotal":283053568,"heapUsed":254641840,"pass":0,"time":29372},
{"rss":534265856,"heapTotal":499651840,"heapUsed":432355680,"pass":1,"time":29911},
{"rss":620900352,"heapTotal":581138688,"heapUsed":422644152,"pass":2,"time":29221},
{"rss":618565632,"heapTotal":578115072,"heapUsed":416696632,"pass":3,"time":28907},
{"rss":639131648,"heapTotal":596605696,"heapUsed":410939696,"pass":4,"time":29213},
{"rss":639836160,"heapTotal":596605696,"heapUsed":398062944,"pass":5,"time":29281},
{"rss":642822144,"heapTotal":598669568,"heapUsed":459811952,"pass":6,"time":28656},
{"rss":606179328,"heapTotal":562587904,"heapUsed":471224040,"pass":7,"time":29660},
{"rss":574820352,"heapTotal":532709888,"heapUsed":417162456,"pass":8,"time":29323},
{"rss":637779968,"heapTotal":592935128,"heapUsed":406728656,"pass":9,"time":29772},
{"rss":639041536,"heapTotal":593979096,"heapUsed":470931984,"pass":10,"time":29700},
{"rss":601161728,"heapTotal":556865496,"heapUsed":478325752,"pass":11,"time":29406},
{"rss":576159744,"heapTotal":532087000,"heapUsed":408100176,"pass":12,"time":29582},
{"rss":643297280,"heapTotal":598118872,"heapUsed":401238528,"pass":13,"time":29282},
{"rss":645103616,"heapTotal":599252256,"heapUsed":465132296,"pass":14,"time":29024},
{"rss":603955200,"heapTotal":558645832,"heapUsed":465093840,"pass":15,"time":29542},
{"rss":579842048,"heapTotal":534875208,"heapUsed":402461880,"pass":16,"time":29927},
{"rss":645910528,"heapTotal":599839048,"heapUsed":463863312,"pass":17,"time":30046},
{"rss":603533312,"heapTotal":559653704,"heapUsed":474944280,"pass":18,"time":30995},
{"rss":579911680,"heapTotal":534899272,"heapUsed":409166288,"pass":19,"time":30505},
{"rss":643850240,"heapTotal":597811272,"heapUsed":474930392,"pass":20,"time":29379},
{"rss":603017216,"heapTotal":557601864,"heapUsed":404188824,"pass":21,"time":29771},
{"rss":646098944,"heapTotal":599917680,"heapUsed":464544432,"pass":22,"time":29596},

GENERATORS (node 0.11.5):

{"rss":393945088,"heapTotal":360213504,"heapUsed":278081672,"pass":0,"time":48933},
{"rss":369987584,"heapTotal":335567360,"heapUsed":308752168,"pass":1,"time":46810},
{"rss":402051072,"heapTotal":366405120,"heapUsed":333544944,"pass":2,"time":48393},
{"rss":409866240,"heapTotal":373604608,"heapUsed":350195848,"pass":3,"time":46794},
{"rss":403386368,"heapTotal":366393088,"heapUsed":342079296,"pass":4,"time":48155},
{"rss":396079104,"heapTotal":357189888,"heapUsed":264438320,"pass":5,"time":47397},
{"rss":368582656,"heapTotal":330419712,"heapUsed":292616840,"pass":6,"time":47927},
{"rss":396505088,"heapTotal":359205632,"heapUsed":330376792,"pass":7,"time":47052},
{"rss":390201344,"heapTotal":352018176,"heapUsed":323006624,"pass":8,"time":47252},
{"rss":376860672,"heapTotal":338663168,"heapUsed":309587504,"pass":9,"time":46469},
{"rss":378093568,"heapTotal":339695104,"heapUsed":315238776,"pass":10,"time":47025},
{"rss":386592768,"heapTotal":348910336,"heapUsed":329006168,"pass":11,"time":46781},
{"rss":384413696,"heapTotal":344818688,"heapUsed":314528152,"pass":12,"time":48344},
{"rss":375791616,"heapTotal":336635392,"heapUsed":292589752,"pass":13,"time":46985},
{"rss":390516736,"heapTotal":352030208,"heapUsed":262808992,"pass":14,"time":47844},

Generators are slower but use less memory, and memory usage seems more stable.
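
(For anyone wanting to reproduce this kind of logging: the rss/heapTotal/heapUsed fields are what process.memoryUsage() returns, so a harness along these lines would produce similar output; doPass() is a hypothetical stand-in for one pass of the workload.)

function timedPass(pass, doPass) {
  var start = Date.now();
  doPass(); // run one pass of the workload (stand-in)
  var mem = process.memoryUsage();
  console.log(JSON.stringify({
    rss: mem.rss,
    heapTotal: mem.heapTotal,
    heapUsed: mem.heapUsed,
    pass: pass,
    time: Date.now() - start
  }) + ',');
}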

Bruno

spollack

Sep 19, 2013, 7:15:23 PM
to nod...@googlegroups.com
Thanks Bruno, that's helpful. I may run some experiments under different streamline modes to compare.

The good news is that I am seeing better results now on my workload under node 0.10.x with max_old_space_size tightened to 160MB. RSS is still quite a bit higher than under node 0.8.x, but it is staying under 1GB so far, and performance looks comparable to node 0.8.x. Fingers crossed.

spollack

Sep 26, 2013, 1:34:08 PM
to nod...@googlegroups.com
I got a chance to do some further testing on this. 

For our scenario, setting max_old_space_size certainly helped with both memory usage and performance under node 0.10.x, but over our full workload, no value of max_old_space_size brought performance back to 0.8.x levels, and sooner or later during the run RSS always exceeded available memory.

(Bruno, I also tried some experiments with different streamline runtimes. Using callbacks definitely used less memory than fibers, but, at least with only that change and no max_old_space_size override, we still went into swap. I also tried running under node 0.11.x, but a number of our server's other package dependencies would not run there, so no luck trying generators, at least not without more work.)

My new (or additional) suspect is worse non-heap memory fragmentation under node 0.10.x vs node 0.8.x. 

Running a small subset of our workload, and then, once it finishes, calling system.gc() ten or more times to reclaim memory, head to head between the two node versions:
under 0.10.x: we end with RSS ~400MB, heap total ~80MB, heap used ~35MB
under 0.8.x: we end with RSS ~145MB, heap total ~55MB, heap used ~45MB

So, even though node 0.10.x ended up after cleanup with slightly less heap used, RSS was much larger. 
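
(For concreteness, a minimal sketch of that kind of check; runSubsetWorkload() is a hypothetical stand-in for our code, and the process is started with --expose_gc.)

runSubsetWorkload(function () {
  // Force several major GCs, then report what is left resident.
  for (var i = 0; i < 10; i++) global.gc();
  var mem = process.memoryUsage();
  console.log('rss:      ', Math.round(mem.rss / 1048576) + 'MB');
  console.log('heapTotal:', Math.round(mem.heapTotal / 1048576) + 'MB');
  console.log('heapUsed: ', Math.round(mem.heapUsed / 1048576) + 'MB');
});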

This is also supported by a heapdump (via node-heapdump) from the end of a 0.10.x run of the subset workload test above, after all the system.gc() cleanup, where the sum of all "retained size" as reported by Chrome was <150MB, whereas RSS was close to 400MB.

So, some questions to validate my assumptions above:
1) Ben, does "retained size" reported in the heapdump include non-heap allocations as hinted from node to V8, as you described above? Is the difference between the sum of retained size, vs. RSS total, due to fragmentation only, or could it be something else? 
2) does system.gc() do full garbage collection? is running that ten times a fair way to reclaim most/all recoverable memory? 
3) did the memory management of non-heap memory (say, for Buffers) change from node 0.8.x to 0.10.x? 

Btw, the subset workload does external HTTPS data fetches, buffer/zlib handling, database writes, and HTTPS data posts, over many items, running 25 in parallel. I will try to create a simplified test case, but does anything here jump out as suspicious in terms of increased memory usage in node 0.10.x vs. 0.8.x?
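
A rough starting point for the fetch/inflate part of that test case might look like this (the URL, item count and concurrency are placeholders; the database and upload steps are left out):

var https = require('https');
var zlib = require('zlib');

// node 0.10 allows only 5 sockets per host by default; raise it so
// 25 requests can actually run in parallel.
https.globalAgent.maxSockets = 25;

function fetchAndInflate(url, cb) {
  https.get(url, function (res) {
    var chunks = [];
    res.on('data', function (c) { chunks.push(c); });
    res.on('end', function () {
      zlib.gunzip(Buffer.concat(chunks), cb);
    });
  }).on('error', cb);
}

var total = 1000, inflight = 0, done = 0;
function pump() {
  while (inflight < 25 && done + inflight < total) {
    inflight++;
    fetchAndInflate('https://example.com/item.gz', function () {
      inflight--;
      done++;
      if (done % 25 === 0) console.log(done, process.memoryUsage());
      if (done === total) console.log('finished');
      else pump();
    });
  }
}
pump();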

Thanks,
Seth


Ben Noordhuis

Sep 27, 2013, 3:04:05 AM
to nod...@googlegroups.com
> 1) Ben, does "retained size" reported in the heapdump include non-heap
> allocations as hinted from node to V8, as you described above?

For buffers, yes.

> Is the difference between the sum of retained sizes and the total RSS
> due to fragmentation only, or could it be something else?

Fragmentation or traditionally malloc'd memory. I'm not intimately
familiar with node-fibers but I assume it uses a fair bit of memory.
Same goes for any other native add-ons you may be using.

> 2) Does system.gc() do a full garbage collection? Is running it ten times
> a fair way to reclaim most/all recoverable memory?

It does a major garbage collection. Running it once or ten times in a
row should make no observable difference.

> 3) Did the memory management of non-heap memory (say, for Buffers) change
> from node 0.8.x to 0.10.x?

Not really. The V8 GC, however, did change a great deal.