Poor WriteStream Performance


Jeff Jolma

May 20, 2014, 1:29:58 PM
to nod...@googlegroups.com
Hello.

I am working on a node app and found that its performance bottleneck is logging. And digging deeper, it looks like the root bottleneck is on the WriteStream.

I ran some quick benchmarks of the WriteStream performance, and it seems really slow.  Can someone help me understand why it is taking so long and if I can improve the performance?  Or, maybe I am doing something wrong with my benchmarks.

I get the following output running on my MacBook:


$ node --harmony benchmarks.js 
Running benchmarks with 100000 writes
fs.writeSync took 0.302 seconds
fs.write took 0.235 seconds
writeStream.write took 4.565 seconds

$ cat benchmarks.js
'use strict';
const fs = require('fs');

const NUM_WRITES = 100 * 1000;
const DUMMY_MESSAGE = 'Dummy log message that is about 100 characters.....................................................\n';
const DUMMY_MESSAGE_BUFFER = new Buffer(DUMMY_MESSAGE);

function benchmarkFileSync() {
  var fd = fs.openSync('./dummy.log', 'w');

  const startTime = new Date();
  for (let i = 0; i < NUM_WRITES; i++) {
    fs.writeSync(fd, DUMMY_MESSAGE);
  }
  const diff = (new Date() - startTime) / 1000;
  console.log('fs.writeSync took', diff, 'seconds');
  fs.closeSync(fd);
}

function benchmarkFileAsync() {
  var fd = fs.openSync('./dummyAsync.log', 'w');

  const startTime = new Date();
  for (let i = 0; i < NUM_WRITES; i++) {
    // yes, this is BAD since it isn't waiting for the callback. It
    // drops some of these writes on the floor
    fs.write(fd, DUMMY_MESSAGE_BUFFER, 0, DUMMY_MESSAGE_BUFFER.length, null, function cb() {
      // ignore the callback
    });
  }
  const diff = (new Date() - startTime) / 1000;
  console.log('fs.write took', diff, 'seconds');
  fs.closeSync(fd);
}

function benchmarkWriteStream() {
  const stream = fs.createWriteStream(
    './dummyWriteStream.log', {
      encoding: "utf8",
      mode: parseInt('0644', 8)
    }
  );

  const startTime = new Date();
  for(let i = 0; i < NUM_WRITES; i++) {
    let ret = stream.write(DUMMY_MESSAGE);
  }

  const interval = setInterval(function checkIfDone() {
    if (stream.bytesWritten === DUMMY_MESSAGE.length * NUM_WRITES) {
      const diff = (new Date() - startTime) / 1000;
      console.log('writeStream.write took', diff, 'seconds');
      clearInterval(interval);
    }
  }, 100);
}

console.log('Running benchmarks with', NUM_WRITES, 'writes');
benchmarkFileSync();
benchmarkFileAsync();
benchmarkWriteStream();

The performance becomes extremely pathological when I increase the number of writes beyond this. For example, with 200,000:

$ node --harmony benchmarks.js 
Running benchmarks with 200000 writes
fs.writeSync took 0.646 seconds
fs.write took 0.479 seconds
writeStream.write took 290.4 seconds



Thanks,
Jeff

Denys Khanzhyiev

May 20, 2014, 2:10:09 PM
to nodejs
1. Your async benchmark is incorrect: you close the file before all the write operations complete.
2. WriteStream buffers written data until the other end is ready to consume it. So you actually fill memory with the data first, and only then write it to the file.


--
Job board: http://jobs.nodejs.org/
New group rules: https://gist.github.com/othiym23/9886289#file-moderation-policy-md
Old group rules: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
---
You received this message because you are subscribed to the Google Groups "nodejs" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nodejs+un...@googlegroups.com.
To post to this group, send email to nod...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nodejs/2a3f6115-e1f5-40cd-a821-9a326590720e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Aria Stewart

May 20, 2014, 1:53:40 PM
to nod...@googlegroups.com, Jeff Jolma

On May 20, 02014, at 13:29, Jeff Jolma <jjo...@gmail.com> wrote:


Your benchmarks look a little suspect: you use setInterval rather than stream.on('finish'), and in the async write case you close the fd before the writes finish.


Jeff Jolma

May 20, 2014, 2:31:06 PM
to nod...@googlegroups.com
Hi Denys.

Thanks for taking a look.

1. Yes, that is a bug in the async benchmark.  Fixing that doesn't change the timings I see, though.
2. Yup, but I am surprised at the performance hit this buffering causes.  Should it really take 4.5 seconds to write 100,000 lines and 290 seconds to write 200,000 lines?

Jeff Jolma

May 20, 2014, 2:46:53 PM
to nod...@googlegroups.com, Jeff Jolma
Hi Aria.

I see similar results even after using stream.on('finish').  (I removed the async benchmark altogether.)

$ node --harmony benchmarks.js 
Running benchmarks with 100000 writes
fs.writeSync took 0.32 seconds
writeStream.write took 4.073 seconds

$ node --harmony benchmarks.js 
Running benchmarks with 200000 writes
fs.writeSync took 0.66 seconds
writeStream.write took 323.897 seconds

$ cat benchmarks.js 
'use strict';
const fs = require('fs');

const NUM_WRITES = 200 * 1000;
const DUMMY_MESSAGE = 'Dummy log message that is about 100 characters.....................................................\n';
const DUMMY_MESSAGE_BUFFER = new Buffer(DUMMY_MESSAGE);

function benchmarkFileSync() {
  var fd = fs.openSync('./dummy.log', 'w');

  const startTime = new Date();
  for (let i = 0; i < NUM_WRITES; i++) {
    fs.writeSync(fd, DUMMY_MESSAGE);
  }
  const diff = (new Date() - startTime) / 1000;
  console.log('fs.writeSync took', diff, 'seconds');
  fs.closeSync(fd);
}

function benchmarkWriteStream() {
  const stream = fs.createWriteStream(
    './dummyWriteStream.log', {
      encoding: "utf8",
      mode: parseInt('0644', 8)
    }
  );

  const startTime = new Date();
  for(let i = 0; i < NUM_WRITES; i++) {
    let ret = stream.write(DUMMY_MESSAGE);
  }
  stream.end('');

  stream.on('finish', function onFinish() {
    const diff = (new Date() - startTime) / 1000;
    console.log('writeStream.write took', diff, 'seconds');
  });
}

console.log('Running benchmarks with', NUM_WRITES, 'writes');
benchmarkFileSync();
benchmarkWriteStream();

Forrest Norvell

May 20, 2014, 2:52:16 PM
to nod...@googlegroups.com

I am surprised at the performance hit this buffering causes. Should it really take 4.5 seconds to write 100,000 lines and 290 seconds to write 200,000 lines?

Node 0.12 will have better support for writev / corking and uncorking, which will let you coalesce writes and regain some of the efficiency lost by the individual writes ignoring fs buffering (although it doesn't look like fs.createWriteStream() returns a stream that coalesces by default, so you'll have to write your own). That, I suspect, is the source of the lag you're seeing: each write to the stream has to succeed and be acknowledged before the next one can be performed, whereas fs.write() / fs.writeSync() write to the fs's buffer. The difference is going to be especially pronounced with an append-only file like a log.

F

Denys Khanzhyiev

May 20, 2014, 3:59:47 PM
to nodejs
Or even simpler:

  (function doWrite() {
    if (i--) stream.write(DUMMY_MESSAGE, doWrite);
    else stream.end('');
  })();



Denys Khanzhyiev

May 20, 2014, 3:45:30 PM
to nodejs
This variant is more efficient:

function benchmarkWriteStream() {
  const stream = fs.createWriteStream(
    './dummyWriteStream.log', {
      encoding: "utf8",
      mode: parseInt('0644', 8)
    }
  );

  const startTime = new Date();

  var i = NUM_WRITES;
  (function doWrite() {
    if (i--) {
      var ret = stream.write(DUMMY_MESSAGE);
      if (!ret) stream.once('drain', doWrite);
      else doWrite();
    } else {
      stream.end('');
    }
  })();

  stream.on('finish', function onFinish() {
    const diff = (new Date() - startTime) / 1000;
    console.log('writeStream.write took', diff, 'seconds');
  });
}

Though it is still 2 times slower than the sync variant.

Denys Khanzhyiev

May 20, 2014, 4:20:30 PM
to nodejs
I will agree that WriteStream shows some strange performance loss. It may be explained by WriteStream's overhead for buffering and eventing, but that overhead should be almost the same for an equal number of write calls.
And yet it is not: I just tested with a much bigger DUMMY_MESSAGE, and it is still 2 times slower.

Floby

May 21, 2014, 4:47:12 AM
to nod...@googlegroups.com
Part of the poor performance you're seeing is because all your benchmarks are synchronous. You're measuring the time for a for loop to run to completion. This loop runs on a single tick, and that has different side effects in each case.

  1. sync: Every time you execute the code in the for loop, the dummy message is passed to writeSync's C counterpart (which is, down the line, write(2)). So your dummy message makes a very brief appearance in user memory before being flushed to the system buffer and then to disk. Every tick is slower, but the overall performance should be better because you never hit the memory limits of the V8 VM or the UV event loop.
  2. async: This one does not measure the same thing as the first, since you're not waiting for the writes to be confirmed. This might be how you use your logger, but it does not measure the same thing. Here you measure how fast you can put write requests into the UV queue, effectively gathering a huge number of messages into UV-managed memory. The memory used is roughly 101*n bytes, not counting whatever structure UV puts around it.
  3. stream: This one does measure roughly the same thing, given that you changed it to listen for the 'finish' event, which means all buffered data has been flushed to the system queue. However, since you're adding your messages to the queue on the very same tick, you're once again gathering all your pending messages in memory. This time it is V8-managed memory, which means it is limited, it is garbage collected, it stores strings encoded as UTF-8, and it has significant structure overhead consuming even more memory. My guess is that you're hitting the limits of V8's memory management, most probably garbage collection.
You should change your async version to stop the timer when all queued messages are effectively flushed. You should change both your stream AND your async version to not issue 100,000 writes on the very same tick, as this is very unlikely to happen in common usage for a logger. You should probably batch those writes to a reasonable maximum, like 100, and distribute them across successive ticks.


Notes:
This may have changed since I last read the node.js source code, but it is my understanding that writes are NEVER synchronous when using fs.write or fs.WriteStream.