Node.js memory, GC and performance

Alexey Petrushin

Jul 12, 2012, 9:10:50 PM
to nod...@googlegroups.com

There are rumors that current Node.js (or, more exactly, the V8 GC) performs badly when there are lots of JS objects and a lot of memory in use.

Can you please explain what exactly the problem is: lots of objects, or lots of properties on one object (or array)?

Maybe there are some benchmarks? It would be interesting to see actual code and numbers.

As far as I know, the main problem is lots of properties on one object, not lots of objects as such (although I'm not sure). If so, would an in-memory graph database (a couple of hundred properties on each node at most) be a good use case?

Also, I heard that the latest versions of V8 have an improved GC that solves some of these problems. Is this true, and when will it be available in Node.js?

Ben Noordhuis

Jul 12, 2012, 9:32:10 PM
to nod...@googlegroups.com
On Fri, Jul 13, 2012 at 3:10 AM, Alexey Petrushin
<alexey.p...@gmail.com> wrote:
> There are rumors that current Node.js (or, more exactly, the V8 GC) performs
> badly when there are lots of JS objects and a lot of memory in use.
>
> Can you please explain what exactly the problem is: lots of objects, or lots
> of properties on one object (or array)?

This is one of those questions to which there is no single good answer.

The V8 GC is fairly unsophisticated as garbage collectors go (sorry,
Vyacheslav) and some usage patterns don't play well with it. That's
not a knock on V8, by the way; most garbage collectors have similar
issues.

A good example of a usage pattern that taxes the V8 GC is the http
module in node. It creates lots of objects with different, often
overlapping lifecycles, and it breaks a number of (otherwise very
reasonable) assumptions that most GCs make, e.g. that the tenured
generation has few pointers to the young generation.

> Maybe there are some benchmarks? It would be interesting to see actual code
> and numbers.
>
> As far as I know, the main problem is lots of properties on one object, not
> lots of objects as such (although I'm not sure). If so, would an in-memory
> graph database (a couple of hundred properties on each node at most) be a
> good use case?
>
> Also, I heard that the latest versions of V8 have an improved GC that solves
> some of these problems. Is this true, and when will it be available in
> Node.js?

0.8 releases ship with V8 3.11.10.*. I believe the improvements you
refer to are part of V8 3.12.0 and newer.

Joran Greef

Jul 13, 2012, 3:02:08 AM
to nod...@googlegroups.com
To get a very rough idea of how much time your program spends in gc, you can expose the gc by running node with the --nouse_idle_notification --expose_gc flags and then call gc manually with "global.gc()" and time how long that takes.
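A minimal sketch of that timing approach, assuming the script is started with the flags mentioned above. The helper name `timeForcedGc` is made up for illustration, and the number it reports is only the cost of one forced full collection, not of the collections V8 schedules on its own.

```javascript
// Run with the flags from the post, e.g.:
//   node --nouse_idle_notification --expose_gc time-gc.js

// Hypothetical helper: time one forced collection, in milliseconds.
// Returns null when global.gc has not been exposed.
function timeForcedGc() {
  if (typeof global.gc !== 'function') {
    return null;
  }
  const start = process.hrtime();
  global.gc();
  const diff = process.hrtime(start); // [seconds, nanoseconds]
  return diff[0] * 1e3 + diff[1] / 1e6;
}

const gcMs = timeForcedGc();
if (gcMs === null) {
  console.log('global.gc is not available; start node with --expose_gc');
} else {
  console.log('forced GC took ' + gcMs.toFixed(1) + ' ms');
}
```

Calling the helper at a point where the heap is at its typical size gives the "very rough idea" described above.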

I have a program where a single JS object was being used as a hash table with a few million entries. When the GC ran, as far as I am aware, it would iterate over every one of those entries, and it did this every few seconds, pausing for about 500 ms each time. I switched to a Buffer-backed, open-addressed hash table, which is also more efficient in other respects.

Yi Tan

Jul 13, 2012, 3:37:46 AM
to nod...@googlegroups.com
Hi Joran,

We are facing a performance problem similar to your million-entry object.

May I ask you to explain in a bit more detail how the "Buffer-backed, open-addressed hash table" is implemented?

Many thanks,

ty



--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com
To unsubscribe from this group, send email to
nodejs+un...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en?hl=en

Joran Greef

Jul 13, 2012, 4:28:10 AM
to nod...@googlegroups.com
ty, it's just an implementation of the dense hash map described at http://sparsehash.googlecode.com/svn/trunk/doc/implementation.html

It stores the keys and values side by side in a single buffer, using open addressing with triangular probing (the number of buckets must be a power of two for the triangular probing to reach every bucket) and a tabulation hash (http://en.wikipedia.org/wiki/Tabulation_hashing) with pre-generated random tables (the key length is fixed, so this kind of hash is ideal: very fast, with good distribution). The buffer is doubled in size whenever (number of entries + number of deleted entries) / (number of buckets) exceeds 0.6, to avoid too much clustering.
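The design described above might be sketched roughly as follows. This is not the poster's code: the 8-byte key size, the uint32 values, the per-slot state byte (sparsehash itself uses reserved empty/deleted keys) and the class name `BufferHashTable` are all assumptions made for illustration.

```javascript
const crypto = require('crypto');

const KEY_SIZE = 8;    // fixed key length in bytes (assumed for this sketch)
const VALUE_SIZE = 4;  // each value is one uint32 (assumed)
const SLOT_SIZE = 1 + KEY_SIZE + VALUE_SIZE; // state byte, then key, then value
const EMPTY = 0, USED = 1, DELETED = 2;      // slot states (assumed layout)

// Tabulation hash: one pre-generated table of 256 random words per key byte.
const TABLES = [];
for (let i = 0; i < KEY_SIZE; i++) {
  TABLES.push(crypto.randomFillSync(new Uint32Array(256)));
}

function hash(key) {
  let h = 0;
  for (let i = 0; i < KEY_SIZE; i++) h ^= TABLES[i][key[i]];
  return h >>> 0;
}

class BufferHashTable {
  constructor(buckets = 16) { // bucket count must be a power of two
    this.buckets = buckets;
    this.buffer = Buffer.alloc(buckets * SLOT_SIZE); // zero-filled, so all EMPTY
    this.used = 0;
    this.deleted = 0;
  }

  // Offset of the slot holding `key`, or of the slot where it can be inserted.
  _find(key) {
    const mask = this.buckets - 1;
    let index = hash(key) & mask;
    let firstDeleted = -1;
    for (let step = 1; ; step++) {
      const offset = index * SLOT_SIZE;
      const state = this.buffer[offset];
      if (state === EMPTY) return firstDeleted >= 0 ? firstDeleted : offset;
      if (state === DELETED) {
        if (firstDeleted < 0) firstDeleted = offset;
      } else if (this.buffer.compare(key, 0, KEY_SIZE, offset + 1, offset + 1 + KEY_SIZE) === 0) {
        return offset;
      }
      index = (index + step) & mask; // triangular probing: +1, +2, +3, ...
    }
  }

  set(key, value) { // key: Buffer of KEY_SIZE bytes, value: uint32
    const offset = this._find(key);
    const state = this.buffer[offset];
    if (state !== USED) {
      if (state === DELETED) this.deleted--;
      this.used++;
      this.buffer[offset] = USED;
      key.copy(this.buffer, offset + 1, 0, KEY_SIZE);
    }
    this.buffer.writeUInt32LE(value, offset + 1 + KEY_SIZE);
    if ((this.used + this.deleted) / this.buckets > 0.6) this._grow();
  }

  get(key) {
    const offset = this._find(key);
    return this.buffer[offset] === USED
      ? this.buffer.readUInt32LE(offset + 1 + KEY_SIZE)
      : undefined;
  }

  delete(key) {
    const offset = this._find(key);
    if (this.buffer[offset] !== USED) return false;
    this.buffer[offset] = DELETED; // tombstone keeps probe chains intact
    this.used--;
    this.deleted++;
    return true;
  }

  // Double the bucket count and re-insert live entries (tombstones are dropped).
  _grow() {
    const old = this.buffer;
    const oldBuckets = this.buckets;
    this.buckets = oldBuckets * 2;
    this.buffer = Buffer.alloc(this.buckets * SLOT_SIZE);
    this.used = 0;
    this.deleted = 0;
    for (let i = 0; i < oldBuckets; i++) {
      const offset = i * SLOT_SIZE;
      if (old[offset] !== USED) continue;
      this.set(old.subarray(offset + 1, offset + 1 + KEY_SIZE),
               old.readUInt32LE(offset + 1 + KEY_SIZE));
    }
  }
}

// Hypothetical helper: build an 8-byte key from a small integer.
function keyOf(n) {
  const key = Buffer.alloc(KEY_SIZE);
  key.writeUInt32LE(n, 0);
  return key;
}

const table = new BufferHashTable();
for (let n = 0; n < 1000; n++) table.set(keyOf(n), n * 2);
console.log(table.get(keyOf(21))); // 42
table.delete(keyOf(21));
console.log(table.get(keyOf(21))); // undefined
```

Because all entries live in one Buffer, the garbage collector sees a single opaque allocation instead of millions of properties to scan.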


Here's a discussion on the hash function in v8-users: https://groups.google.com/forum/#!msg/v8-users/zGCS_wEMawU/6mConTiBUyMJ

Vyacheslav Egorov

Jul 13, 2012, 7:37:17 AM
to nod...@googlegroups.com
> To get a very rough idea of how much time your program spends in gc, you can expose the gc by running node with the --nouse_idle_notification --expose_gc flags and then call gc manually with "global.gc()" and time how long that takes.

This will give you something that is _far_ from a realistic estimate.
First of all, contrary to what you might think,
--nouse-idle-notification does not disable automatic GC in V8. What it
does is tell V8 not to perform GC actions (be it advancing the sweeper
or the incremental marker, or doing a full GC) in response to the
IdleNotifications that the embedder (node.js in this case) sends to V8.
If V8 sees fit (e.g. on allocation failure) it _will_ perform a GC, and
you can't disable that. Calling gc() forces a full non-incremental
collection, which is much more expensive than (and quite different
from) the incremental collections that V8 tries to use during the
normal run of your program.

Now about the large object with a few million entries. The biggest
problem here is that you stash all your entries into a single object,
so a lot of time is spent scanning that object during GC (V8 does
split marking into steps, but it does not split the scanning of a
single object into steps). If you allocate many objects instead, it
should not be a problem for V8's incremental GC:

Here is an excerpt from the --trace-gc output of a small stupid-stupid
test I wrote (https://gist.github.com/3104414):

100000 keys created (total keys: 20800000)
111188 ms: Mark-sweep 4126.8 (4174.9) -> 3978.6 (4042.0) MB, 89 ms
(+ 5605 ms in 2232 steps since start of marking, biggest step
22.033936 ms) [StackGuard GC request] [GC in old space requested].

As you can see there is no one huge pause. The work is done in increments.
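A rough sketch of this kind of test, for anyone who wants to watch the incremental steps themselves. It is not the gist linked above: the object size, batch count and names are assumptions, and only the log line format is taken from the excerpt.

```javascript
// Run with: node --trace-gc many-objects.js

const KEYS_PER_OBJECT = 100;   // assumed; keeps every individual object small
const KEYS_PER_BATCH = 100000; // matches the "100000 keys created" log line
const BATCHES = 5;             // assumed; raise it to grow the heap further

const objects = [];            // keeps everything reachable so the heap grows
let totalKeys = 0;

// Create one batch of keys, spread over many small objects.
function createBatch() {
  for (let created = 0; created < KEYS_PER_BATCH; created += KEYS_PER_OBJECT) {
    const obj = {};
    for (let k = 0; k < KEYS_PER_OBJECT; k++) {
      obj['key' + k] = { value: totalKeys + created + k };
    }
    objects.push(obj);
  }
  totalKeys += KEYS_PER_BATCH;
  console.log(KEYS_PER_BATCH + ' keys created (total keys: ' + totalKeys + ')');
}

for (let batch = 0; batch < BATCHES; batch++) createBatch();
```

Putting all keys on one object instead (a single `obj` shared across batches) is the pattern the paragraph above warns about, since that one object cannot be scanned in steps.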

Of course, if you can avoid pauses altogether by allocating your huge
chunks of memory outside of the GC-managed heap, it's even better. My
personal position here is that anybody who allocates 4 GB of
GC-managed objects is doing something wrong.

[of course I am not claiming that there is nothing to improve here on
V8 side: marking, sweeping and evacuation steps can be optimized
further and even parallelized internally; some portions can be even
trivially made concurrent].

[@Ben: it is not as sophisticated as a concurrent soft real-time
collector would be, hard to dispute that. It is however much more
sophisticated than a straightforward mark-sweep would be :-) I still
think that the best fit for node.js might be a combination of GC with
region-based memory management]

--
Vyacheslav Egorov

Joran Greef

Jul 13, 2012, 7:48:13 AM
to nod...@googlegroups.com
Vyacheslav, to clarify, the suggestion of timing gc() was made in the context of a program with one large object containing a few million entries, in which case that would give a "very rough idea". In that case, there would be little difference between full collection and incremental collection, since the majority of work would be indivisible.

Yi Tan

Jul 13, 2012, 10:12:23 AM
to nod...@googlegroups.com
Hi Joran,

Thank you so much for the information!

Regards,

ty



Alexey Petrushin

Jul 14, 2012, 3:19:02 PM
to nod...@googlegroups.com
Thanks for the help.

>  If you allocate many objects it should not be a problem for V8's incremental GC:  
That's very good news for me.

Jimb Esser

Jul 15, 2012, 5:13:19 PM
to nod...@googlegroups.com
Just to add some anecdotal experience to the contrary... admittedly we're still on node 0.6, and it sounds like the GC has had a little love between then and the latest version, so this may not be totally relevant.

We do no giant allocations, just lots of JS objects, some of which are constantly being modified (using a native physics module to modify the JS objects and synchronize them to other servers). I'm looking at one specific node process that's been running 20 minutes or so (we don't let them run much longer than that, because the GC starts going crazy, and it takes less time to serialize the entire state of all objects into a new process than a single GC takes at that point). Its JS heap size is about 400 MB (used and total). To run a manual garbage collect, since getting command line arguments onto our launched processes is a pain, I just use the one exposed by the mtrace native module. It just calls "while (!V8::IdleNotification());", but I'm assuming that's effectively the same thing.
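For reference, one way to read "used and total" heap figures from inside a process is `process.memoryUsage()`. The post does not say how its numbers were obtained, so this is only a sketch, and the helper name `heapInMb` is made up.

```javascript
// Hypothetical helper: report the V8 heap figures in whole megabytes.
function heapInMb() {
  const usage = process.memoryUsage(); // heapUsed and heapTotal are in bytes
  const toMb = (bytes) => Math.round(bytes / (1024 * 1024));
  return { used: toMb(usage.heapUsed), total: toMb(usage.heapTotal) };
}

const heap = heapInMb();
console.log('JS heap: ' + heap.used + ' MB used of ' + heap.total + ' MB total');
```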

Anyway, running a manual garbage collect on this server took 1470ms.  Also, looking at the logs, about once every 5-10 seconds, the server stalls for around 1.4s, which the profiler shows as time spent in a garbage collect.  This is definitely contrary to the comments stated above (we allocate many small objects, no giant objects, and it regularly stalls for a full GC).  We usually see times of around 500ms for GCs, but, as I said, this particular process has been running longer than most of ours.
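Stalls like these can be spotted without a profiler by checking how late a repeating timer fires. The post does not describe how its logs were produced, so the sketch below is an assumption, with made-up names (`lateness`, `watchForStalls`) and thresholds; it also cannot tell a GC pause from any other blocking work.

```javascript
// How much later than scheduled a tick ran, in milliseconds.
function lateness(lastTick, now, intervalMs) {
  return now - lastTick - intervalMs;
}

// Hypothetical helper: call onStall(ms) whenever a tick is late by more
// than thresholdMs, which means the event loop was blocked for that long.
function watchForStalls(intervalMs, thresholdMs, onStall) {
  let last = Date.now();
  const timer = setInterval(function () {
    const now = Date.now();
    const late = lateness(last, now, intervalMs);
    if (late > thresholdMs) onStall(late);
    last = now;
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for the watcher
  return timer;
}

const watcher = watchForStalls(100, 200, function (lateMs) {
  console.log('event loop stalled for about ' + lateMs + ' ms');
});
```

In a long-running server the watcher keeps ticking alongside the real work; in a standalone script it exits immediately because the timer is unref'd.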

Side note:  this has been said before, but it's worth repeating - don't use node for hard-real-time apps.  We are, and it's kind of working, but it's rather insane, and the GC is really not happy with us.  If it wasn't so easy to develop on, we would have switched to an all native server for this part of our server stack long ago ;).

  Jimb Esser
  Cloud Party, Inc

Vyacheslav Egorov

Jul 16, 2012, 7:17:05 AM
to nod...@googlegroups.com
There were some issues with IdleNotification preferring to
force a full non-incremental GC [which makes perfect sense if the
embedder calls IdleNotification when it is _really_ idle for a long
time, and not so much sense when it is not].

We decreased aggressiveness of that, so it might help you and remove stalls.

Node is pretty aggressive at calling IdleNotifications when it is not
actually idle, so you might be even better off if you disable that
altogether and just let allocations drive GC.

Another thing here is that you might be allocating objects with a high
scavenger survival rate, which does not allow the incremental marker to
keep up and finish marking before the heap becomes too big.

Anyway, any GC problem requires deep investigation and tweaking of GC
parameters. There is no GC that fits every allocation pattern.

--
Vyacheslav Egorov

Matt Ranney

Jul 19, 2012, 12:08:28 PM
to nod...@googlegroups.com
On Mon, Jul 16, 2012 at 4:17 AM, Vyacheslav Egorov <veg...@chromium.org> wrote:
> Node is pretty aggressive at calling IdleNotifications when it is not
> actually idle, so you might be even better off if you disable that
> altogether and just let allocations drive GC.

In our experience, disabling idle notification is all good and no bad.  This seems like something that would make a reasonable default change.

sahal

Jul 20, 2012, 12:41:15 AM
to nod...@googlegroups.com
I use delete varname; anywhere I expect varname to be big, on big iterations, and after doing response.end(); I delete both request and response.

What can I say, I'm a very tidy person.

Nuno Job

Jul 20, 2012, 4:09:22 AM
to nod...@googlegroups.com
You are hilarious :) You should do some nodejs standup comedy!

Nuno

On Fri, Jul 20, 2012 at 5:41 AM, sahal <keta...@gmail.com> wrote:
> I use delete varname; anywhere I expect varname to be big, on big iterations, and after doing response.end(); I delete both request and response.

wavded

Jul 26, 2012, 10:20:52 AM
to nod...@googlegroups.com
To add to this thread, we just noticed an issue where having idle notifications enabled was destroying objects prematurely, causing a rather confusing issue with the zmq module:

https://github.com/JustinTulloss/zeromq.node/issues/124#issuecomment-7270322

Not sure how that all works but maybe Matt has a point in that it should be disabled by default.

Bert Belder

Jul 26, 2012, 10:26:54 AM
to nod...@googlegroups.com

On Thursday, July 26, 2012 4:20:52 PM UTC+2, wavded wrote:
> To add to this thread, we just noticed an issue where having idle notifications enabled was destroying objects prematurely, causing a rather confusing issue with the zmq module:
>
> https://github.com/JustinTulloss/zeromq.node/issues/124#issuecomment-7270322

That's impossible. If this happens you have a bug in your code.

wavded

Jul 26, 2012, 10:36:46 AM
to nod...@googlegroups.com
I'm definitely open to the idea that there is a bug in this binding, but I'm confused why the object never seems to be collected when idle notifications are turned off, yet is collected within 5 seconds with them turned on. The binding is relatively small, and we hadn't been able to pinpoint anything until we tried that flag.

The binding:
https://github.com/JustinTulloss/zeromq.node/blob/master/binding.cc

The issue discussion:
https://github.com/JustinTulloss/zeromq.node/issues/124#issuecomment-7270322

Any help/clarification is appreciated as I'm learning this stuff.

Marc

wavded

Jul 26, 2012, 6:21:03 PM
to nod...@googlegroups.com
Yeah, --expose-gc and running gc() manually does it too. It sounds like some reference is being lost or not set properly; I just have no clue what that is at this point. Any help is welcome.

Bala sudheer Bheemarasetty

Sep 14, 2013, 9:01:55 AM
to nod...@googlegroups.com
Hello friend, I want to monitor Node.js GC, so I need to redirect the output of --trace_gc_verbose and --trace_gc to a file. Is there a VM argument to redirect that output to a log file, like -Xloggc: in Java?