POSIX aio vs libeio?


Ben Leslie

May 26, 2010, 1:22:22 AM
to nodejs
Hi all,

First off, I'm new to node.js and have been very happy with it so far.
It is a nice, small code base, and I like looking under the hood to see
how things work.

In reading about node.js, I was convinced of how bad threads were for
multiplexing I/O. (Well, I was pretty biased towards this p.o.v. to
begin with.)

So, I figured that node.js would be using POSIX aio_* when accessing
the file system, but instead, it appears to be using threads and
synchronous I/O. This is a surprising design decision to me.

Was it decided that aio_ was too platform dependent, or were/are there
other problems with it?

Also, is there a reason '4' is chosen as the maximum number of threads
to use in libeio?

Obviously this can cause problems when small reads are blocked behind
large reads. A (completely synthetic) example demonstrating the
problem:

"""
var fs = require('fs');
var sys = require('sys');
var Buffer = require('buffer').Buffer;

function show_result(id) {
return function (err, read) {
if (err) {
sys.puts("Error(" + id + "): " + err);
} else {
sys.puts("Success(" + id + "): " + read);
}
}
}

fs.open("largefile", "r", 0666, function (err, fd) {
if (err) {
sys.puts("Error opening largefile");
return;
}
var buf1 = new Buffer(512 * 1024 * 1024);
var buf2 = new Buffer(512);
var count = 0;
fs.read(fd, buf1, 0, buf1.length, 0, show_result(1));
fs.read(fd, buf2, 0, buf2.length, 0, show_result(2));
fs.read(fd, buf1, 0, buf1.length, 0, show_result(3));
fs.read(fd, buf2, 0, buf2.length, 0, show_result(4));
fs.read(fd, buf1, 0, buf1.length, 0, show_result(5));
fs.read(fd, buf2, 0, buf2.length, 0, show_result(6));
fs.read(fd, buf1, 0, buf1.length, 0, show_result(7));
fs.read(fd, buf2, 0, buf2.length, 0, show_result(8));
fs.read(fd, buf1, 0, buf1.length, 0, show_result(9));
fs.read(fd, buf2, 0, buf2.length, 0, show_result(10));
});
"""

With the current implementation, some of the small reads get stuck
behind the large reads; however, if I manually change libeio to use a
thread pool of 6 rather than 4, none of the small reads get stuck.
(Granted, this is completely synthetic and may not be a problem in the
real world.)

Sorry if this is rehashing old ground.

Cheers,

Benno

r...@tinyclouds.org

May 26, 2010, 1:44:40 PM
to nod...@googlegroups.com
On Tue, May 25, 2010 at 10:22 PM, Ben Leslie <be...@benno.id.au> wrote:
> (...)
>
> So, I figured that node.js would be using POSIX aio_* when accessing
> the file system, but instead, it appears to be using threads and
> synchronous I/O. This is a surprising design decision to me.
>
> Was it decided that aio_ was too platform dependent, or were/are there
> other problems with it?

Yes. It seems a thread-based solution would be needed anyway for
portability to Windows. From what I've heard, there are slight
differences in the completion-reporting mechanisms of the different
AIO implementations. Additionally, having a thread pool around is
useful for adding other things (we're thinking about doing a gzip
implementation which uses a separate thread for each chunk).

But you should read the libeio author's own explanation of why he made
it:
http://lists.schmorp.de/pipermail/libev/2008q2/000277.html
http://lists.schmorp.de/pipermail/libev/2008q2/000293.html
http://lists.schmorp.de/pipermail/libev/2008q2/000335.html

> Also, is there a reason '4' is chosen as the maximum number of threads
> to use in libeio?

No, it's just the default. I'll add a binding to JS at some point to
allow that to be changed. (Patches welcome!)

There is also functionality in libeio for managing the queue more
effectively (eio_grp). There are no bindings for this yet either.

Marco Rogers

May 26, 2010, 2:21:36 PM
to nodejs

>
> No, it's just the default. I'll add a binding to JS at some point to
> allow that to be changed. (Patches welcome!)
>

Ry, I've been looking for a nice place to get into helping with node
core. What would you like the API for this to look like? Node is very
minimal on configuration options right now, so there's not a lot of
prior art.

r...@tinyclouds.org

May 26, 2010, 2:36:24 PM
to nod...@googlegroups.com

This is a run-time configuration option:

http://github.com/ry/node/blob/895f89d62a63e02bd936deebafb494664bf4e248/deps/libeio/eio.h#L221-223

I'm not sure what the API should look like. Maybe:

process.maxThreads = 5

?
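
With Ben's test case above, the idea would be that something like this
(hypothetical, since no such binding exists yet) keeps the small reads
from queueing behind the 512MB ones:

"""
// hypothetical setting, not yet implemented: grow the libeio
// thread pool before queueing the reads
process.maxThreads = 6;

// then issue the same mix of reads as in Ben's example; with 6 pool
// threads, a small read never has to wait behind a 512MB read
fs.read(fd, buf1, 0, buf1.length, 0, show_result(1));
fs.read(fd, buf2, 0, buf2.length, 0, show_result(2));
"""

Underneath, the binding would presumably just call the
eio_set_min_parallel()/eio_set_max_parallel() functions at the eio.h
link above.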

Marco Rogers

May 26, 2010, 2:51:36 PM
to nod...@googlegroups.com
Maybe process.maxIOThreads? Don't want people confused about the fact
that there's only one thread running JavaScript in node.

:Marco



Jorge

May 26, 2010, 3:14:37 PM
to nod...@googlegroups.com
On 26/05/2010, at 19:44, r...@tinyclouds.org wrote:

> (...) (we're thinking about doing a gzip
> implementation which uses a separate thread for each chunk) (...)

Just a side note: gzipping many small chunks yields worse results than gzipping a single big chunk. To get the best results, the output .gz file should have a single member (coming from a single input chunk). IOW:

This (with 2 members):

$ gzip -c file1 > foo.gz
$ gzip -c file2 >> foo.gz

Or this:

$ gzip -c file1 file2 > foo.gz

compresses worse than:

$ cat file1 file2 | gzip > foo.gz

It worsens as the # of chunks increases.
--
Jorge.

Jorge

May 26, 2010, 4:05:23 PM
to nodejs

For example: a test.txt file of ~128KB, compressed whole:

$ gzip test.txt
$ ls -l test.txt.gz
-rw-r--r--@ 1 jorge staff 10243 26 may 21:52 test.txt.gz

Broken into 4KB chunks:

$ split -b 4096 test.txt

And compressed in chunks:

$ gzip -c xaa ... xbe > test2.gz
$ ls -l test2.gz
-rw-r--r-- 1 jorge staff 49020 26 may 21:53 test2.gz

test2.gz is almost 5x bigger: each gzip member resets the compressor's
dictionary and carries its own header and trailer, so short chunks
never build up useful history.
--
Jorge.

r...@tinyclouds.org

May 26, 2010, 4:11:43 PM
to nod...@googlegroups.com
On Wed, May 26, 2010 at 12:14 PM, Jorge <jo...@jorgechamorro.com> wrote:
> On 26/05/2010, at 19:44, r...@tinyclouds.org wrote:
>
>> (...) (we're thinking about doing a gzip
>> implementation which uses a separate thread for each chunk) (...)
>
> Just a side note: gzipping many small chunks yields worse results than gzipping a single big chunk. To get the best results, the output .gz file should have a single member (coming from a single input chunk). IOW:

I was thinking of something like pigz (http://www.zlib.net/pigz/) for
high throughput.
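
As I understand it, pigz compresses fixed-size chunks on separate
threads, but it primes each chunk with the tail of the previous one
and concatenates the output into a single gzip member, so it largely
avoids the size penalty Jorge describes. E.g.:

$ pigz -p 4 -c bigfile > bigfile.gz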

Ben Leslie

May 26, 2010, 10:09:19 PM
to nod...@googlegroups.com

OK, cool, thanks for the links; it all makes sense. I didn't realise
that the aio_* calls were usually implemented with threads anyway on
those OSes. (It seems so silly: at the hardware level it is all
asynchronous, yet we go from async DMA to synchronous threads and back
to async node. Such a waste switching abstractions back and forth, but
so be it.)

Cheers,

Benno
