Writing large file with WriteStream - process runs out of memory


Max

Jun 15, 2011, 7:48:48 AM6/15/11
to nod...@googlegroups.com
I am parsing a large XML file and generating database import commands from it. These commands should be written to a file, which is then passed to the db.

Parsing the large file is fine, but as soon as I write data to the WriteStream, the process dies after a few seconds with


FATAL ERROR: JS Allocation failed - process out of memory


Here is my code:


var fs = require('fs'),
    parser = require('sax-js').parser(true);

var writeStream = fs.createWriteStream('import-commands-posts', {
    flags: 'a',
    encoding: 'utf-8'
});

parser.onopentag = function(node) {
    var post = {
        id: node.attributes.Id,
       ...
    };

    var cmds = '';

    for (var field in post) {
        var value = post[field];
        cmds += ....
    }

    writeStream.write(cmds);
};

parser.onend = function() {
    writeStream.end(); // fs.WriteStream has no flush(); end() flushes and closes the stream
};


If I leave out the writeStream.write() call, the script runs fine. It takes a while, but memory usage stays constant at around 0.4%.

As soon as I write to the stream, memory usage grows rapidly to around 17% and the process dies.

I am using Node.js v0.4.8 on Ubuntu with default settings.



mscdex

Jun 15, 2011, 8:59:14 AM6/15/11
to nodejs
On Jun 15, 7:48 am, Max <nas...@gmail.com> wrote:
> I am parsing a large XML file and generating import commands for a database
> out of it. These commands should be written to a file which then is passed
> to the db.
>
> Parsing the large file is fine, but as soon as I write data to the
> WriteStream, the process dies after a few seconds with
>
> FATAL ERROR: JS Allocation failed - process out of memory

You need to pay attention to when writeStream.write() returns false.
When it does, the kernel buffer is full and the stream starts
buffering writes in userspace memory.

So when write() returns false, listen for the 'drain' event on the
stream before writing again, to keep the process from using up all of
its memory.

Max

Jun 15, 2011, 9:42:57 AM6/15/11
to nod...@googlegroups.com
Thanks for your reply!

But when I do this:

    var success = writeStream.write(cmds);
    console.log(success);

It immediately and always returns false, although some content does get written to the output file.

Linus G Thiel

Jun 15, 2011, 10:17:28 AM6/15/11
to nod...@googlegroups.com
I'm not an expert on streams at all, but it seems a bit backwards that you do

var cmds = '';

for (var field in post) {
    var value = post[field];
    cmds += ....
}

writeStream.write(cmds);

Why do you first concatenate and only then write? Why not do

for (var field in post) {
    var value = post[field];

    writeStream.write(...);
}


--
Linus G Thiel
Hansson & Larsson
http://hanssonlarsson.se/
+46 709 89 03 85

Max

Jun 15, 2011, 10:20:48 AM6/15/11
to nod...@googlegroups.com
Thank you, but I figured it out; the hint about the drain event got me on the right track.

What I do: create a ReadStream and pass each chunk to the XML parser; when the parser emits a node, I pause the ReadStream and write to the WriteStream, and once the WriteStream has drained I resume the ReadStream. Phew… but it works now, and memory usage stays below 1%.

Many thanks to both of you!