Memory leak and eventual crash (with jsdom)

Kim

May 16, 2011, 12:19:35 PM
to nodejs
My application seems to leak memory (the process hogs more and more
memory over time) and eventually crashes with the error message:

FATAL ERROR: CALL_AND_RETRY_2 Allocation failed - process out of memory

I narrowed it down to a simple example:

https://gist.github.com/974752

I just load google.com and jQuery with jsdom and print out some
selected elements 1000 times. I can't see any place where I still hold
a reference to data from an old call of doWork(). What am I doing wrong
here? Thanks!

- KIM

Jann Horn

May 16, 2011, 3:23:55 PM
to nod...@googlegroups.com
I found one thing in jsdom that I think is (at least potentially) a
memory leak:
https://github.com/tmpvar/jsdom/blob/master/lib/jsdom.js#L182

var
  window = exports.html(html, null, {
    features : {
      'FetchExternalResources' : false,
      'ProcessExternalResources' : false
    }
  }).createWindow(),
  features = window.document.implementation._features,
  docsLoaded = 0,
  totalDocs = config.scripts.length;
  readyState = null,
  errors = null;

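The semicolon in "totalDocs = config.scripts.length;" has to be a
comma, right? As it stands, the var statement ends there, so readyState
and errors become implicit globals instead of local variables.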


Kim

May 17, 2011, 12:48:39 PM
to nodejs
So the leak could come from jsdom itself and not from the way I use
it. Is any jsdom developer reading this, or should I post it there again?

@Jann: did you already report that? That is clearly a bug.

- KIM

Elijah Insua

May 17, 2011, 1:33:48 PM
to nod...@googlegroups.com

The issue is with setTimeout/setInterval: the entire script context leaks until those timers are done.

Something like this: https://gist.github.com/976918 is a possible workaround.

-- Elijah 
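The gist's contents are not reproduced in this thread, so the following is only a hypothetical sketch of the general idea, not Elijah's workaround: record every timer an environment creates, and cancel all of them when the environment is thrown away, so no pending callback keeps that environment's closures alive. createTimerScope, pendingCount and dispose are made-up names.

```javascript
// Hypothetical helper: timer functions scoped to one environment, plus
// dispose() to cancel whatever that environment still has pending.
const realSetTimeout = setTimeout;
const realClearTimeout = clearTimeout;
const realSetInterval = setInterval;
const realClearInterval = clearInterval;

function createTimerScope() {
  const timeouts = new Set();
  const intervals = new Set();

  return {
    setTimeout(fn, ms, ...args) {
      const handle = realSetTimeout(() => {
        timeouts.delete(handle); // fired, so no longer pending
        fn(...args);
      }, ms);
      timeouts.add(handle);
      return handle;
    },
    clearTimeout(handle) {
      timeouts.delete(handle);
      realClearTimeout(handle);
    },
    setInterval(fn, ms, ...args) {
      const handle = realSetInterval(fn, ms, ...args);
      intervals.add(handle);
      return handle;
    },
    clearInterval(handle) {
      intervals.delete(handle);
      realClearInterval(handle);
    },
    pendingCount() {
      return timeouts.size + intervals.size;
    },
    // Cancel every outstanding timer so its callback, and everything the
    // callback closes over, becomes unreachable and can be collected.
    dispose() {
      for (const h of timeouts) realClearTimeout(h);
      for (const h of intervals) realClearInterval(h);
      timeouts.clear();
      intervals.clear();
    },
  };
}
```

In a jsdom setting you would presumably install these functions on the window before any page script runs and call dispose() once the window is no longer needed; how to hook that into a 2011-era jsdom window is an open assumption here.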

--
You received this message because you are subscribed to the Google Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com.
To unsubscribe from this group, send email to nodejs+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/nodejs?hl=en.


Kim

May 18, 2011, 2:08:46 PM
to nodejs
@Elijah:

Ah, thank you for the hint. So the issue is known and will be fixed
sometime soon. Too bad I can't use jsdom until then; my app leaks a lot
and needs to restart quite often. I'll have to wait, then.

Ryan Gahl

May 18, 2011, 2:22:17 PM
to nod...@googlegroups.com
Does your app really require the serial creation of 1000+ jsdom environments? Couldn't you potentially create a much smaller number of them ahead of time, and pool them for use by your 'doWork'ers? 
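Ryan's pooling idea can be sketched generically: build a small, fixed number of environments up front and hand them out to workers, queuing callers while all of them are busy. EnvPool and createEnv are hypothetical names; in the thread's setting createEnv would build a jsdom window and load jQuery once, but a plain object stands in here so the sketch runs on its own.

```javascript
// A minimal fixed-size pool. Workers acquire an environment, use it,
// and release it; callers queue up when every environment is busy.
class EnvPool {
  constructor(size, createEnv) {
    this.idle = [];
    this.waiting = [];
    for (let i = 0; i < size; i++) this.idle.push(createEnv(i));
  }

  // Calls cb(env) as soon as an environment is free.
  acquire(cb) {
    const env = this.idle.pop();
    if (env !== undefined) cb(env);
    else this.waiting.push(cb);
  }

  // Hand the environment to the next waiter, or put it back on the idle list.
  release(env) {
    const next = this.waiting.shift();
    if (next) next(env);
    else this.idle.push(env);
  }
}

// Hypothetical stand-in for "build a jsdom window and load jQuery once".
let created = 0;
function createEnv(id) {
  created++;
  return { id, runs: 0 };
}

const pool = new EnvPool(2, createEnv);

function doWork(task, done) {
  pool.acquire((env) => {
    env.runs++; // the same environment is reused for many tasks
    // ... reset env's document and run the scraping logic for `task` here ...
    setImmediate(() => {
      pool.release(env);
      done(env.id);
    });
  });
}
```

With this shape, memory is bounded by the pool size rather than by the number of tasks, at the cost of resetting each environment's state between uses.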




Marak Squires

May 18, 2011, 2:34:12 PM
to nod...@googlegroups.com
I'd definitely try to structure my app so it created as few JSDOM envs as possible. 

George Stagas

May 18, 2011, 3:10:36 PM
to nod...@googlegroups.com
@Kim:

If you want to free your application from jsdom's heaviness, the best way
is to use worker processes. If you're scraping a lot, you can use my
lib https://github.com/stagas/jjw, which does exactly this for you.
Each action runs a worker process, executes some predefined jQuery
tasks, calls back with the results and dies. So no matter how much it
blocks or leaks, your app will never know.

-stagas
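The worker-process idea (also suggested later in the thread) can be sketched with Node's built-in child_process module: run each task in a short-lived node process, pass the input over stdin, read a JSON result from stdout, and let the process exit so the operating system reclaims whatever it leaked. This is not jjw's API; runInWorker and the inline worker script are hypothetical, and the worker only upper-cases a string where a real one would load jsdom and jQuery.

```javascript
const { execFile } = require('child_process');

// Source of a throwaway worker. A real one would build a jsdom window and
// run jQuery here; this stand-in just upper-cases its input.
const WORKER_SOURCE = `
  let input = '';
  process.stdin.setEncoding('utf8');
  process.stdin.on('data', (chunk) => { input += chunk; });
  process.stdin.on('end', () => {
    const task = JSON.parse(input);
    const output = { text: task.text.toUpperCase(), pid: process.pid };
    process.stdout.write(JSON.stringify(output));
  });
`;

// Run one task in a fresh node process. The child exits when it is done,
// so whatever it leaked is freed with it; the parent only sees the JSON result.
function runInWorker(task, callback) {
  const child = execFile(process.execPath, ['-e', WORKER_SOURCE], (err, stdout) => {
    if (err) return callback(err);
    let result;
    try {
      result = JSON.parse(stdout);
    } catch (parseErr) {
      return callback(parseErr);
    }
    callback(null, result);
  });
  child.stdin.end(JSON.stringify(task));
}
```

Spawning a process per task is slow compared to reusing an environment, so this trades throughput for isolation; batching several tasks per worker is a natural middle ground.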


Kim

May 20, 2011, 12:28:08 PM
to nodejs

@Ryan: Of course you are right, I could reuse just a few environments
to work around the issue. I just like to do things the naive way
first and optimize later; I was just very disappointed that I
couldn't do thousands at once.

@Marak: Well, I was trying to scrape 10,000s of pages. I wanted maximum
throughput and pulled as much as I could, since I thought bandwidth
and RAM would be the limiting factors. In the end it took me 2 hours and
my idea didn't work out. My next project does not involve screen
scraping.

@Stagas: I'll have a look, thanks.

dhruvbird

May 21, 2011, 1:18:48 PM
to nodejs
Surprisingly, I ran into the same issue today.

I'm basically doing this:

var jsdom = require('jsdom');
var document = jsdom.jsdom('<html><body>');
var window = document.createWindow();

var JQUERY_URL = 'http://localhost/js/jquery-1.4.4.min.js';

function foo(text) {
  document.innerHTML = '<html><body>' + text;
  jsdom.jQueryify(window, JQUERY_URL, function() {
    var $ = window.$;
    console.log(document.innerHTML);
  });
}

setInterval(function() {
  foo("Text");
}, 1000);


And the process's memory keeps growing until it eventually terminates.
How would you suggest I reuse the DOM?

I am trying this for now:

function foo(text) {
  document.innerHTML = '<html><body>' + text;
  if (!window.$) {
    jsdom.jQueryify(window, JQUERY_URL, function() {
      var $ = window.$;
      console.log(document.innerHTML);
    });
  } else {
    var $ = window.$;
    console.log(document.innerHTML);
  }
}

with a longer interval (to give jsdom a chance to do its stuff).

Actually, I have now moved to a recursion-based approach, which calls
the next "queued" task once the previous one is done.

Do you have a better solution?


Regards,
-Dhruv.
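The "recursion-based" approach described here can be sketched as a tiny queue runner: the next task starts only from the previous task's completion callback, so at most one task (and one use of the shared DOM) is active at a time, unlike a fixed setInterval, which keeps firing even when earlier work has not finished. runQueue and the worker(item, done) signature are assumptions for illustration.

```javascript
// Runs worker(item, done) for each item strictly one after another.
// The next item starts only when the previous one calls done(), so
// work never piles up the way it can with a fixed setInterval.
function runQueue(items, worker, onFinished) {
  const results = [];
  let index = 0;

  function next() {
    if (index >= items.length) return onFinished(results);
    const item = items[index++];
    worker(item, (result) => {
      results.push(result);
      // Defer so a synchronous worker can't grow the call stack without bound.
      setImmediate(next);
    });
  }

  next();
}
```

In the thread's setting, the worker would reset document.innerHTML, run the jQuery logic, and call done() from the jQueryify callback.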


Marak Squires

May 21, 2011, 1:24:42 PM
to nod...@googlegroups.com
If you really can't avoid creating a lot of envs, you could probably hack something up where you spawn new JSDOM envs in child processes, then kill those processes after you are done using each env. I think that would work? It's not a pretty solution.

dhruvbird

May 21, 2011, 3:30:21 PM
to nodejs
The recursion trick above, coupled with reusing the DOM, is working well
for me right now (since performance isn't really a requirement for
this task), so I'll investigate the child process idea once that limit
is hit.

On the other hand, I am having some trouble loading jQuery. Does jsdom
require me to load it only via a web server? I couldn't find any API to
load jQuery from disk. Is there any way to do so? This is making my
script sort of hard to give out to people: they need to be able to
access the server which serves jQuery! (I guess I could run a local
server in node itself to serve that file, but the file system route
would probably be cleaner.)

Regards,
-Dhruv.
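For the "local server in node itself" fallback mentioned above, a minimal sketch with Node's built-in http and fs modules: serve one file at one URL path and return 404 for everything else. serveFile, the file name, the URL path and the port are assumptions; passing port 0 lets the OS pick a free port.

```javascript
const http = require('http');
const fs = require('fs');

// Serve a single local file (e.g. a downloaded copy of jQuery) at urlPath.
function serveFile(filePath, urlPath, port, onListening) {
  const server = http.createServer((req, res) => {
    if (req.url !== urlPath) {
      res.writeHead(404);
      return res.end();
    }
    fs.readFile(filePath, (err, data) => {
      if (err) {
        res.writeHead(500);
        return res.end(String(err));
      }
      res.writeHead(200, { 'Content-Type': 'application/javascript' });
      res.end(data);
    });
  });
  server.listen(port, '127.0.0.1', () => onListening(server));
  return server;
}
```

For example, serveFile('./jquery.js', '/js/jquery.js', 8080, ...) would make the script available at http://127.0.0.1:8080/js/jquery.js for use as JQUERY_URL.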

dhruvbird

May 22, 2011, 2:29:26 PM
to nodejs
The documentation on this page: https://github.com/tmpvar/jsdom

indicates that [script] (when using jsdom.env) can contain either a
URL or files (does that mean file paths or file contents?). I am
assuming file paths, and tried passing the following:
* ./jquery.js
* path.join(process.cwd(), 'jquery.js')
* 'file://' + path.join(process.cwd(), 'jquery.js')

But nothing works :-( It only works when I load it from localhost.
What am I missing here??

Regards,
-Dhruv.
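One thing worth ruling out when a local path silently does nothing: in Node, relative paths resolve against process.cwd() (the directory node was started from), not against the directory of the script, so './jquery.js' may point somewhere unexpected. A hypothetical helper, resolveScriptPath, that produces an absolute path and fails loudly if the file is missing, before the path is handed to jsdom.env. Whether that jsdom version accepted file paths at all is the open question in this thread; the helper only rules out a wrong path.

```javascript
const fs = require('fs');
const path = require('path');

// Resolve `name` against `baseDir` (e.g. __dirname, the directory of the
// current script) and throw if no file exists there, instead of letting a
// bad path fail silently later.
function resolveScriptPath(name, baseDir) {
  const absolute = path.resolve(baseDir, name);
  if (!fs.existsSync(absolute)) {
    throw new Error('Script not found: ' + absolute);
  }
  return absolute;
}

// './jquery.js' on its own is relative to process.cwd();
// resolveScriptPath('./jquery.js', __dirname) is relative to this file.
```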


Justin Russell

May 23, 2011, 8:08:29 AM
to nodejs
I haven't had any issues loading the file. What is the exact error that
you're getting?

Also, have you tried hard-coding an absolute path to the jQuery
file?

If you wanted, you could always use a version of jQuery hosted by
Google in their Libraries API:

http://code.google.com/apis/libraries/

i.e.

https://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js

Also (I assume you know this, but) if you do not specify a jQuery path,
it will use a default version of jQuery for you.

dhruvbird

May 23, 2011, 11:32:43 PM
to nodejs
Hello Justin,
I don't see any errors (nothing is printed), but window.jQuery and
window.$ are not defined in the callback (it just fails silently).

I've tried:
* absolute path
* relative path
* absolute path with file://

but none seem to work :-(

The reason I'm not loading from an external URL is that it takes time
to load (which is why I'm hosting jquery.js on localhost), since
there are a few thousand jsdom tasks that I am running. However, with
the caching trick that Ryan Gahl mentioned, I am loading it just once
(and resetting document.innerHTML every time), so I think it shouldn't
matter; I didn't realize that this was a non-issue. Either way, I
would like to have jQuery load locally rather than make a web call
every time. That helps in testing as well.

> Also (I assume you know but) if you do not specify a jquery path it
> will use a default version of jquery for you.

I didn't know this!! Thanks!!


Regards,
-Dhruv.

Justin Russell

May 25, 2011, 7:57:05 AM
to nodejs
> > Also (I assume you know but) if you do not specify a jquery path it
> > will use a default version of jquery for you.
>
> I didn't know this!! Thanks!

Nice, hope that did the trick for you then.

If you're still having no luck, another option is to just load jQuery
as you would a regular script on the page:

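// ----------------------------------------------------------------------

var jsdom = require('jsdom');

function doSomethingWithWindow(window) { /* ... */ }

function buildTheWindow(func) {
  var window = jsdom.jsdom().createWindow();

  var jqScriptEl = window.document.createElement("script");
  jsdom.jQueryify(window, "", function() {
    jqScriptEl.src = " *** path to jquery *** ";

    // Assign a function: "jqScriptEl.onload = func(window)" would call
    // func immediately and assign its return value instead.
    jqScriptEl.onload = function() {
      func(window);
    };

    // Insert the element into the document; in a browser, a script
    // element that is never inserted is never fetched.
    window.document.body.appendChild(jqScriptEl);
  });
}

buildTheWindow(doSomethingWithWindow);

// ----------------------------------------------------------------------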

I'm not sure if jQueryify is the right method to use there; there may
be a similar method that isn't implicitly tied to jQuery already. I
don't know jsdom well enough to say.

Matthew Richmond

May 25, 2011, 11:01:04 AM
to nod...@googlegroups.com
Hi Dhruv,

Taken from one of the tests for jsdom:

var jQueryFile = __dirname + "/../../example/jquery/jquery.js";

<snip/>

jsdom.jQueryify(tmpWindow(), jQueryFile, testFunction);

So absolute path _should_ work.

Have you tried the code with a URL? jsdom appends jQuery to the
document body, so it may be that your DOM is not valid (e.g. the
<HEAD><BODY> in your pasted snippets is not valid). Try using
var doc = jsdom.jsdom(); (with no parameters) instead, which should
automatically create the document for you.

HTH,

Matt

dhruvbird

May 26, 2011, 3:23:04 AM
to nodejs
Hello Matthew,

This is the code that I am trying to run:

// jquery.js is present in CWD

var jsdom = require('jsdom');
var fs = require('fs');
var path = require('path');

// var JQUERY_URL = 'file://' + path.join(process.cwd(), 'jquery.js');
// var JQUERY_URL = path.join(process.cwd(), 'jquery.js');
// var JQUERY_URL = './jquery.js';

// Only the line below works. Setting JQUERY_URL to any of the above
// throws this exception:
// TypeError: undefined is not a function
//     at CALL_NON_FUNCTION (native)
//     at /cygdrive/c/Documents and Settings/dhruv.m/My Documents/projects/log_extractor/test02.js:14:5
//     at /opt/node-v0.4.7/lib/node_modules/jsdom/lib/jsdom.js:207:13
//     ...
// Line 14 is $('script')....
//
var JQUERY_URL = 'https://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js';
console.log(JQUERY_URL);

jsdom.env('./test.html', [ JQUERY_URL ], [], function(err, window) {
  var $ = window.$;
  $('script').add('img').add('noscript').add('iframe').remove();
  console.log(new Buffer(window.document.innerHTML).toString('ascii'));
});

and test.html is this file:
<html>
<head></head>
<body><img/></body>


Any idea why it isn't working??

Regards,
-Dhruv.


Matthew Richmond

May 26, 2011, 5:11:22 AM
to nod...@googlegroups.com
Hi Dhruv,

The following works on my machine:

var jsdom = require('jsdom');

jsdom.env('test.html', [ 'jquery-1.6.1.min.js' ], function(err, window) {
  if (err) {
    console.log(err);
  }
  var $ = window.$;
  $('body').append('<div class="testing">Hello World, It works</div>');
  console.log($('.testing').text());
  console.log(new Buffer(window.document.innerHTML).toString('ascii'));
});

[admin@mjr ~/jsdomTest]$ node test.js
Hello World, It works
<html>
<head></head>
<body><img><div class="testing">Hello World, It works</div></body>
<script src="jquery-1.6.1.min.js"></script></html>


Both test.html and jquery-1.6.1.min.js are in the cwd.

I used the same HTML as you gave; jQuery is from http://code.jquery.com/jquery-1.6.1.min.js

See how you get on,

Matt

