Finding memory leaks? - node_g and valgrind

1,073 views
Skip to first unread message

Marco Rogers

unread,
Apr 10, 2010, 11:18:42 PM4/10/10
to nodejs
Some on this list know we have been trying to track down and eradicate
memory leaks in the libxmljs lib. We've made some progress but we
really need better technique for locating potential issues.

I've been studying valgrind and I've got it running against node.

http://www.cprogramming.com/debugging/valgrind.html

I think it can be more helpful, but it doesn't seem to be able to
track the names and line numbers of code run through node addons. I'm
running my tests through valgrind like this.

$> valgrind -v --tool=memcheck --leak-check=full --show-reachable=yes
node_g xmltest.js

Here's a gist of xmltest.js - http://gist.github.com/361906
And more info about it at -
http://groups.google.com/group/libxmljs/browse_thread/thread/f99d106896545f31#msg_fd0b72264d1c22c7

I've also tried different combinations of flags like "--track-
children=yes". Anyway, I've had limited success and I'm hoping
someone on this list can shed some light. Here are my issues/
questions.

- It makes sense to run node_g instead of node proper correct?
Because this allows an external program to instrument the runtime?
- The problem with using node_g is that my tests actually fail and
segfault when using it. They run to completion when run with node.
What's the deal with node_g? Are there known issues caused by
differences here?
- When valgrind does run through the test it does report invalid reads
and bad blocks. However, it can't seem to give me any actual useful
info. Instead I get lots of ?s:

==1037== Invalid read of size 4
==1037== at 0x2DD7887: ???
==1037== by 0x2E19430: ???
==1037== by 0x2E19264: ???
==1037== by 0x2D54252: ???
==1037== by 0x2E15133: ???
==1037== by 0x2D4B69E: ???
==1037== by 0x2E1AEDE: ???
==1037== by 0x2E1BC74: ???
==1037== by 0x2D55E34: ???
==1037== by 0x2DD0DED: ???
==1037== by 0x2DD13FE: ???
==1037== by 0x2DE0F13: ???
==1037== Address 0xbaddeac is not stack'd, malloc'd or (recently)
free'd
==1037==
==1037==
==1037== Process terminating with default action of signal 11
(SIGSEGV)
==1037== Access not within mapped region at address 0xBADDEAC
==1037== at 0x2DD7887: ???
==1037== by 0x2E19430: ???
==1037== by 0x2E19264: ???
==1037== by 0x2D54252: ???
==1037== by 0x2E15133: ???
==1037== by 0x2D4B69E: ???
==1037== by 0x2E1AEDE: ???
==1037== by 0x2E1BC74: ???
==1037== by 0x2D55E34: ???
==1037== by 0x2DD0DED: ???
==1037== by 0x2DD13FE: ???
==1037== by 0x2DE0F13: ???

My thought was that this is because libxmljs.node is a dynamic library
loaded after valgrind and so is not instrumented. Is this reasonable?
Is there anything I can do about this?

Any help is appreciated. FYI, I've tried node_debug and running
node_g through gdb. I get no useful backtraces out of either of
these. If I'm on the wrong road or someone can recommend other useful
tools, I'm all ears.

Thanks
:Marco

r...@tinyclouds.org

unread,
Apr 11, 2010, 2:38:05 AM4/11/10
to nod...@googlegroups.com, Orlando Vazquez
> - It makes sense to run node_g instead of node proper correct?
> Because this allows an external program to instrument the runtime?

Yes

> - The problem with using node_g is that my tests actually fail and
> segfault when using it.  They run to completion when run with node.
> What's the deal with node_g?  Are there known issues caused by
> differences here?

node compiles out all of the sanity assertions made in V8 and Node in
order to run faster. Node_g keeps them. A segfault in node_g is
indication of a problem. Explore those segfaults in gdb and with
valgrind. I'd be happy to take a look a the code if you pointed out to
me the situation and where to look.

I'm not sure. I'll get back to you. I think Orlando was having a
similar problem with node-sqlite.

billywhizz

unread,
Apr 11, 2010, 9:32:08 AM4/11/10
to nodejs
is it possible to call the GC collect directly in node.js? I am also
testing libxmljs and trying to figure out the odd memory behaviour. it
seems to me to be an issue with the GC not collecting aggressively
enough...

Marco Rogers

unread,
Apr 11, 2010, 9:57:43 AM4/11/10
to nodejs
I remember that ry and others made the decision not to surface the
garbage collector directly. I the collector does get called when you
allow the event loop to cycle though. And it's pretty efficient. Our
problem is almost certainly memory that can be culled by the GC. The
evidence is that it doesn't matter how
long you wait after running the test. The memory never gets
cleaned up.

billywhizz

unread,
Apr 11, 2010, 10:23:37 AM4/11/10
to nodejs
same symptoms i am seeing here. the destructor does not seem to be
called by the GC on the C++ objects when they go out of scope.

here's a quick script you can use to test it out:

var sys = require('sys'),
fs = require('fs'),
libxml = require('./libxmljs');

fs.readFile("test.xml", function (err, data) {
if (err) throw err;
var count = 0;
(function() {
var doc = libxml.parseXmlString(data);
count++;
if((count % 100) == 0) {
sys.puts("Count:" + count);
setTimeout(arguments.callee, 100);
}
else {
setTimeout(arguments.callee, 1);
}
})();
});

no matter what delays i put in there it never seems to release any
memory and eventually gets killed when it runs out of memory. could it
be that the libxmljs objects are not being marked as out of scope (ie
their ref count is not getting reduced to 0 or however it works)?

billywhizz

unread,
Apr 11, 2010, 10:39:29 AM4/11/10
to nodejs
right. i found this comment in the node.cc source:

// We need to notify V8 when we're idle so that it can run the garbage
// collector. The interface to this is V8::IdleNotification(). It
returns
// true if the heap hasn't be fully compacted, and needs to be run
again.
// Returning false means that it doesn't have anymore work to do.
//
// We try to wait for a period of GC_INTERVAL (2 seconds) of idleness,
where
// idleness means that no libev watchers have been executed. Since
// everything in node uses libev watchers, this is a pretty good
measure of
// idleness. This is done with gc_check, which records the timestamp
// last_active on every tick of the event loop, and with gc_timer
which
// executes every few seconds to measure if
// last_active + GC_INTERVAL < ev_now()
// If we do find a period of idleness, then we start the gc_idle timer
which
// will very repaidly call IdleNotification until the heap is fully
// compacted.

So it seems that node will wait for 2 seconds of idleness before it
invokes the GC. I've changed the sleep interval on the example above
to 4000 milliseconds and everything works great.

I'd like to suggest a change to node.js so we can tweak this timeout
as it's way too long for an application that might be constantly busy
with very little idle time. Either that or expose full control over
when the GC runs. I presume it could be implemented as an addon if
there are concerns about having it in the default libraries...

billywhizz

unread,
Apr 11, 2010, 11:30:49 AM4/11/10
to nodejs
i've created a module to expose the v8 GC collection which seems to
solve any issues with libxmljs when if you call it regularly. wouldn't
be too hard to keep track of RSS usage and then call this method if
you reach a predefined threshold...

http://github.com/billywhizz/node-gc

let me know if i'm doing anything stupid or need to put more
protection around the call to V8::IdleNotification.

Connor Dunn

unread,
Apr 11, 2010, 11:51:55 AM4/11/10
to nod...@googlegroups.com
> So it seems that node will wait for 2 seconds of idleness before it
> invokes the GC. I've changed the sleep interval on the example above
> to 4000 milliseconds and everything works great.
>
> I'd like to suggest a change to node.js so we can tweak this timeout
> as it's way too long for an application that might be constantly busy
> with very little idle time. Either that or expose full control over
> when the GC runs. I presume it could be implemented as an addon if
> there are concerns about having it in the default libraries...
>

Is there a reason node doesn't immediately start calling
IdleNotification when the event loop is empty? Then keep calling it
until either it returns false or something is added to the event loop?
I assume it runs quickly which is why it needs to be called multiple
times, I'm not sure why its a good assumption that if node has been
idle for 2 seconds it will be idle longer, and with busy scripts that
will obviously never happen.

Connor

billywhizz

unread,
Apr 11, 2010, 2:27:11 PM4/11/10
to nodejs
there's a related discussion here:

http://groups.google.com/group/nodejs/browse_thread/thread/d6d1721b79a86339/1a2ca6388f1383f0?lnk=gst&q=gc+collect#1a2ca6388f1383f0

but i don't know why it was decided to implement the 2 second rule.
this is the code from node_cc:

static void CheckIdleness(EV_P_ ev_timer *watcher, int revents) {
assert(watcher == &gc_timer);
assert(revents == EV_TIMER);

//fprintf(stderr, "check idle\n");

ev_tstamp idle_time = ev_now(EV_DEFAULT_UC) - last_active;

if (idle_time > GC_INTERVAL) {
if (needs_gc) {
needs_gc = false;
if (!V8::IdleNotification()) {
ev_idle_start(EV_DEFAULT_UC_ &gc_idle);
}
}
// reset the timer
gc_timer.repeat = GC_INTERVAL;
ev_timer_again(EV_DEFAULT_UC_ watcher);
}
}


that looks to me like the GC will never get a chance to run unless
there have been 2 seconds without any events being handled. can that
be correct??

Marco Rogers

unread,
Apr 11, 2010, 8:19:23 PM4/11/10
to nodejs

> node compiles out all of the sanity assertions made in V8 and Node in
> order to run faster. Node_g keeps them. A segfault in node_g is
> indication of a problem. Explore those segfaults in gdb and with
> valgrind. I'd be happy to take a look a the code if you pointed out to
> me the situation and where to look.
>

Here's my libxmljs repo and a gist with the simplest code that works
in node but throws a segfault in node_g. If you have time to take a
look that would be great.

http://gist.github.com/363158
http://github.com/polotek/libxmljs

r...@tinyclouds.org

unread,
Apr 11, 2010, 8:26:47 PM4/11/10
to nod...@googlegroups.com

Marco Rogers

unread,
Apr 12, 2010, 11:25:46 AM4/12/10
to nod...@googlegroups.com
Thanks for the info.  I was just learning about closing scopes this weekend actually.  You're right that this is a pretty pervasive issue in the codebase right now.  I tried to apply a fix at least to the toString method.  But I still get the segfault.  Is it safe to say that anytime v8 tries to collect an object with local handles that haven't been closed, it'll run into problems?  So we will need to find ALL of the problems here before we can get a simple working test case?



--
You received this message because you are subscribed to the Google Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com.
To unsubscribe from this group, send email to nodejs+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/nodejs?hl=en.




--
Marco Rogers
marco....@gmail.com

Life is ten percent what happens to you and ninety percent how you respond to it.
- Lou Holtz

r...@tinyclouds.org

unread,
Apr 12, 2010, 11:36:41 AM4/12/10
to nod...@googlegroups.com
On Mon, Apr 12, 2010 at 8:25 AM, Marco Rogers <marco....@gmail.com> wrote:
> Thanks for the info.  I was just learning about closing scopes this weekend
> actually.  You're right that this is a pretty pervasive issue in the
> codebase right now.  I tried to apply a fix at least to the toString method.
>  But I still get the segfault.  Is it safe to say that anytime v8 tries to
> collect an object with local handles that haven't been closed, it'll run
> into problems?

Yes.

> So we will need to find ALL of the problems here before we
> can get a simple working test case?

Probably - but it shouldn't be hard to go through and wrap all those
return statements in scope.Close();

Reply all
Reply to author
Forward
0 new messages