How to flush a writestream before the program is done executing?

Stephen Weiss

May 13, 2012, 6:50:39 PM
to nod...@googlegroups.com
I'm new to node.js, so forgive what's probably a very newbie question...

I've been trying to use writestreams in a few different use cases - one as an HTTP request stream and one just writing to a plain file - and no matter what, I observe the same behavior: nothing actually gets written to the stream until program execution is over.

So, for example, if I write this code:
var fs = require('fs');

var ws = fs.createWriteStream("/tmp/out", {
  flags: 'w+'
});

var i = 0;

while (i < 100000) {
  console.log("writing");
  ws.write("random text\n");
  i++;
}

ws.end();


I will see all the "writing" lines printed, and only after every one of them has been printed to my terminal do any of the "random text" lines show up in my file. Whether I set it to write 10 lines or a billion, I see the same behavior.

My problem is, I'm trying to write a routine that generates JSON for several million objects and writes them to elasticsearch via the elasticsearchclient module, which sends the data to elasticsearch over an HTTP request (which is also a writestream). My routine always fails, because node.js runs out of memory before any data actually gets written to the stream. It works great if I only try to index 10 documents - once program execution ends, it sends all 10 documents over at once and they are indexed - but when I try to index the entire database it fails, even though I send the writes along 1000 at a time and it has ample time to start sending at least some of the documents. It would all work if the data just started going out as soon as it's buffered, but nothing I've found makes that happen. What I really need is a "flush" command, but there isn't one in the documentation. The documentation would seem to indicate that this should happen automatically, but it just doesn't.

Nothing in the documentation indicates that writestreams should work this way, so I find this very baffling and frustrating. Is there any way to flush the writestream - to force it to start writing before the process runs out of memory? It seems like a pretty obvious thing, and I've never had this problem in other languages: usually you write to a stream and it outputs the data as quickly as it can, rather than buffering everything until your program is done executing. I tried listening for the "drain" event, but it never fires. The stream is always writable = false, right from the start - the kernel buffer seems to be full right away. Nothing really seems to work the way it's documented...

I'm running node 0.6.17.  I'm pretty sure I'm missing something very obvious here, but I've scoured the documentation and the forums for hours and I can't find anything that helps me solve my problem.  If anyone can please help, I'd really appreciate it.  Thanks.

Ben Noordhuis

May 13, 2012, 11:56:02 PM
to nod...@googlegroups.com
In your example, you're doing all the work in a single "tick" of the
event loop, effectively queuing up 100K write requests.

If you slice up the requests like below, you give node.js the
opportunity to process them while the loop is still running:

var i = 0;
function work() {
  while (i < 100000) {
    console.log("writing");
    ws.write("random text\n");
    // yield to the event loop every 1000 writes
    if (++i % 1000 == 0) return process.nextTick(work);
  }
  ws.end();  // close the stream once the loop finishes
}
work();
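
(The key is that returning from work() hands control back to the
event loop, so the file open and the writes queued so far can
actually proceed before the next batch is scheduled.)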

Jimb Esser

May 14, 2012, 12:03:19 AM
to nodejs
Because (almost) all I/O in node is asynchronous, the event loop
needs a chance to run. I've done very little with streams, but
looking at the docs, write() returns false if the buffer is full,
and that buffer won't be flushed (or perhaps even start to be
written, if it's waiting on an asynchronous file open or something)
until the event loop has had a chance to run (after control returns
to node from your main .js file, in this case). Restructuring your
loop to wait for the 'drain' event whenever a write indicates the
buffer is full should fix your issue:

var fs = require('fs');
var ws = fs.createWriteStream("/tmp/out", {
  flags: 'w+'
});

var i = 0;
function writesome() {
  while (i < 1000000) {
    i++;
    console.log("writing");
    if (!ws.write("random text\n")) {
      // buffer is full, don't write any more until we're notified
      ws.once('drain', writesome);
      return;
    }
  }
  ws.end();
}
writesome();
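
Note the once() rather than on() there: a 'drain' listener is only
attached when a write is actually rejected, and it removes itself
after firing, so handlers don't pile up across iterations.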

Generally, with node, even if you're doing something fairly
straightforward, you need to think in an async/event-driven manner.
This is a bit annoying when writing simple things, but is wonderful
once you embrace it and you're doing anything more complex. If you
wanted to, for example, open 4 streams and write to whichever isn't
full/busy, that becomes trivial with code like the above that
operates on events - for instance:
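
Just a sketch (untested - the file names and the "first stream that
isn't waiting on drain" policy are made up for illustration):

var fs = require('fs');

var streams = [];
var busy = [];  // true while a stream is waiting for 'drain'
for (var n = 0; n < 4; n++) {
  streams.push(fs.createWriteStream("/tmp/out" + n));
  busy.push(false);
}

var i = 0;
var ended = false;

function writesome() {
  while (i < 1000000) {
    // pick the first stream that isn't waiting on 'drain'
    var k = -1;
    for (var j = 0; j < streams.length; j++) {
      if (!busy[j]) { k = j; break; }
    }
    if (k < 0) return;  // all full; a pending 'drain' calls us back
    i++;
    if (!streams[k].write("random text\n")) {
      // this stream's buffer is full; mark it busy until it drains
      busy[k] = true;
      streams[k].once('drain', (function (idx) {
        return function () {
          busy[idx] = false;
          writesome();
        };
      })(k));
    }
  }
  if (!ended) {
    ended = true;
    streams.forEach(function (ws) { ws.end(); });
  }
}

writesome();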

- jimb