Streams writev API


Isaac Schlueter

Apr 22, 2013, 8:01:50 PM
to nodejs
There's a syscall called `writev` that lets you write an array (ie,
"Vector") of buffers of data rather than a single buffer.

I'd like to support something like this for Streams in Node, mostly
because it will allow us to save a lot of TCP write() calls, without
having to copy data around, especially for chunked encoding writes.
(We write a lot of tiny buffers for HTTP, it's kind of a nightmare,
actually.)

Fedor Indutny has already done basically all of the legwork to
implement this. Where we're stuck is the API surface, and here are
some options. Node is not a democracy, but your vote counts anyway,
especially if it's a really good vote with some really good argument
behind it :)

Goals:
1. Make http more good.
2. Don't break existing streams.
3. Don't make things hard.
4. Don't be un-node-ish

For all of these, batched writes will only be available if the
Writable stream implements a `_writev()` method. No _writev, no
batched writes. Any bulk writes will just be passed to _write(chunk,
encoding, callback) one at a time in the order received.

In all cases, any queued writes will be passed to _writev if that
function is implemented, even if they're just backed up from a slow
connection.
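
(For illustration only, here's a minimal sketch of what an opt-in implementation might look like, assuming the {chunk, encoding} array shape described in option D below; the class name and the socket target are made up:)

    var Writable = require('stream').Writable;
    var util = require('util');

    function BatchedWriter(socket) {       // hypothetical example class
      Writable.call(this);
      this.socket = socket;
    }
    util.inherits(BatchedWriter, Writable);

    // Called per chunk, or repeatedly when _writev isn't implemented.
    BatchedWriter.prototype._write = function(chunk, encoding, callback) {
      this.socket.write(chunk, callback);
    };

    // Called once with everything that's queued up.
    BatchedWriter.prototype._writev = function(chunks, callback) {
      // chunks: [{ chunk: Buffer|string, encoding: String }, ...]
      var bufs = chunks.map(function(c) {
        return Buffer.isBuffer(c.chunk) ? c.chunk : new Buffer(c.chunk, c.encoding);
      });
      // A real implementation would hand bufs to a vectored write;
      // Buffer.concat is only used here to keep the sketch short.
      this.socket.write(Buffer.concat(bufs), callback);
    };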


Ideas:


A) stream.bulk(function() { stream.write('hello');
stream.write('world'); stream.end('!\n') })

Any writes done in the function passed to `stream.bulk()` will be
batched into a single writev.

Upside:
- Easier to not fuck up and stay frozen forever. There is basically
zero chance that you'll leave the stream in a corked state. (Same
reason why domain.run() is better than enter()/exit().)

Downsides:
- easier to fuck up and not actually batch things. eg,
s.bulk(function(){setTimeout(...)})
- bulk is a weird name. "batch" maybe? Nothing else really seems
appropriate either.
- somewhat inflexible, since all writes have to be done in the same
function call


B) stream.cork(); stream.write('hello'); stream.write('world');
stream.end('!\n'); stream.uncork();

Any writes done while corked will be flushed to _writev() when uncorked.

Upside:
- Easy to implement
- Strictly more flexible than stream.bulk(writer). (Can trivially
implement a bulk function using cork/uncork)
- Useful for cases outside of writev (like corking a http request
until the connection is established)

Downsides:
- Easy to fuck up and stay corked forever.
- Two functions instead of just one (double the surface area increase)


C) stream.writev([chunks,...], [encodings,...], callback)

That is, implement a first-class top-level function called writev()
which you can call with an array of chunks and an array of encodings.

Upside:
- No unnecessary surface area increase
- NOW IT'S YOUR PROBLEM, NOT MINE, HAHA! (Seriously, though, it's
less magical, simpler stream.Writable implementation, etc.)

Downside:
- A little bit tricky when you don't already have a list of chunks to
send. (For example, with cork, you could write a bunch of stuff into
it, and then uncork all at the end, and do one writev, even if it took
a few ms to get it all.)
- parallel arrays, ew.


D) stream.writev([ {chunk:buf, encoding: blerg}, ...], callback)

That is, same as C, but with an array of {chunk,encoding} objects
instead of the parallel arrays.

Same +/- as C, except the parallel array bit. This is probably how
we'd call the implementation's stream._writev() anyway, so it'd be a
bit simpler.
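
For illustration, a call in that shape might look something like this (the buffer variable is just a placeholder):

    stream.writev([
      { chunk: headerBuf },                               // a Buffer; no encoding needed
      { chunk: 'some utf8 body text', encoding: 'utf8' }
    ], function(er) {
      if (er) console.error('batched write failed', er);
    });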



Which of these seems like it makes the most sense to you?

Is there another approach that you'd like to see here? (Note: "save
all writes until end of tick always" and "copy into one big buffer"
approaches are not feasible for obvious performance reasons.)

Nathan Rajlich

Apr 22, 2013, 9:21:24 PM
to nodejs
I'd vote D, because then .writev() could accept an Array of these "chunk objects" and/or Buffer instances (which don't need "encoding" specified). That way, if you're just writing Buffer instances, it would be the logical:

  stream.writev([ buf1, buf2, buf3 ], callback);



Dean Landolt

Apr 22, 2013, 9:40:27 PM
to nod...@googlegroups.com
On Mon, Apr 22, 2013 at 9:21 PM, Nathan Rajlich <nat...@tootallnate.net> wrote:
I'd vote D, because then .writev() could accept an Array of these "chunk objects" and/or Buffer instances (which don't need "encoding" specified). That way, if you're just writing Buffer instances, it would be the logical:

  stream.writev([ buf1, buf2, buf3 ], callback);

Do you mean C?

I rather like the C/D family of options, though given the choice I'd go with option E, a small tweak that more closely mirrors the signature of stream.write...

    stream.writev([ [ buf1 ], [ buf2, "ignored" ], [ str1, 'ascii' ] ], callback);

It'd just be an array of tuples where the second arg is ignored if the first arg is a buffer.

Kevin Swiber

Apr 22, 2013, 9:42:49 PM
to nod...@googlegroups.com
+1

Keep it simple in core. Variations can be built on top of this API to every module author's heart's content, but Nathan's suggestion is how I'd prefer to use it in the core API.

Sent from my iPhone

Mikeal Rogers

Apr 22, 2013, 10:15:38 PM
to nod...@googlegroups.com
Is there a reason not to just have the underlying libuv *always* writev when it has more than one pending buffer to write?

I'm wondering why we can't just optimize this behind the scenes. Is there a reason we need to map each stream write to a write syscall?

-Mikeal

Tim Smart

Apr 22, 2013, 11:20:45 PM
to nodejs
Ryan did this a while back, and couldn't get it fast enough for small
writes (might need some reference here)

Simply put - the overhead of abstraction wasn't worth it. A lot of
people using template engines are practically doing
response.writeHead(200, ...); response.end(template.compile()) which
doesn't need the writev fluff.

Tim

Mikeal Rogers

Apr 22, 2013, 11:22:46 PM
to nod...@googlegroups.com
I remember when Ryan did this, and it was back when there was just Buffer (now SlowBuffer), so I'm not sure how much we should trust those findings since things have changed pretty drastically since then.

Isaac Schlueter

Apr 23, 2013, 12:28:05 AM
to nodejs
Mikeal, you are correct, Ry's original foray into this area is not
valid evidence. All of the relevant code has been changed since
then.

> Is there a reason not to just have the underlying libuv *always*
> writev when it has more than one pending buffer to write?

No, and of course if _writev is a function, it would call that with
however many pending buffers it has. But if you write 4 times in one
tick, and all 4 can be immediately flushed to the TCP socket without
blocking, then that's what'll happen. It would happen faster if we
could grab all 4 and send them at once, but either (a) you need to
signal that you're doing that, or (b) the underlying system has to
introduce lag (qv nagle).


Tim, you bring up a good point, ironically:

> response.writeHead(200, ...); response.end(template.compile()) which
> doesn't need the writev fluff.

On the contrary! This is one sort of operation where writev can
really shine! Even though we're not doing chunked encoding, in the
"header, end(body)" case, we manually copy the header and body into a
single chunk so that we can send it in one write(). That copy starts
to hurt when you have medium to large bodies, but with a writev, we'd
get the benefits of a single syscall without having to copy.


Dean, imo, "array of tuples" is about as bad as "parallel arrays".
It's a bit more manageable, but still slow and unwieldy.

Ben Noordhuis

Apr 23, 2013, 3:44:54 AM
to nod...@googlegroups.com
On Tue, Apr 23, 2013 at 4:15 AM, Mikeal Rogers <mikeal...@gmail.com> wrote:
> Is there a reason not to just have the underlying libuv *always* writev when it has more than one pending buffer to write?

It could but it doesn't (at least, not always.) On most operating
systems, it's often faster to call write() two or three times in a row
than it is to call writev() once (for small payloads.)

There is an open issue[1] and some in-progress work[2] but it only has
a significant impact on big writes, on the order of 100 kB or more.

[1] https://github.com/joyent/libuv/issues/742
[2] https://github.com/bnoordhuis/libuv/compare/issue742

> I'm wondering why we can't just optimize this behind the scenes, is there a reason we need to map each stream write to a write syscall?

node.js and libuv optimize for latency first, throughput second
(because you can always buffer higher up in the application.) IOW, it
tries to send out pending data as quickly as possible.

(Caveat emptor: I'm not 100% sure if that statement is always correct
when streams2 are involved.)

Because node.js doesn't know when new data will be sent, it doesn't
know when it's sound to buffer. That's why the choice to buffer has
to be an explicit one, made by the programmer.

greelgorke

Apr 23, 2013, 5:54:33 AM
to nod...@googlegroups.com
First I thought D in Nathan's version. Or maybe this approach (it's cork modified, in fact):

s.write('first') // writes 'first' immediately
s.buffer('second') // nothing written
s.buffer('third') // nothing written
s.write('fourth') // writes 'secondthirdfourth', so it flushes previously buffered data
s.buffer('fifth') // nothing written
s.buffer('sixth') // nothing written
s.end() // writes 'fifthsixth' and emits 'end' etc.

s.buffer has the same signature as s.write and handles watermarks etc.; it just holds an array of buffers internally. I think it inherits the upsides of version B but not its downsides (only one function, and write/end flushes/uncorks, so you uncork at the latest on .end()). But I don't know if it's easy to implement or not.

tjholowaychuk

Apr 24, 2013, 12:27:02 AM
to nodejs
+1 C / D, D would be less awkward as far as building up the things
you're passing to .writev() goes, but the arrays are alright. Less
fancy stuff in core++

Jake Verbaten

Apr 24, 2013, 2:16:15 AM
to nod...@googlegroups.com
I like D) stream.writev([ {chunk:buf, encoding: blerg}, ...], callback)

The leveldb driver has a very similar batch api ( https://github.com/rvagg/node-levelup#batch )

the leveldb driver also has a large thread about possible better APIs ( https://github.com/rvagg/node-levelup/issues/45 ) from which some inspiration may be drawn.


Isaac Schlueter

Apr 24, 2013, 4:18:40 AM
to nodejs
> +1 C / D, D would be less awkward as far as building up the things
> you're passing to .writev() goes, but the arrays are alright. Less
> fancy stuff in core++

Part of my complaint about parallel arrays is that we'd probably end
up having to re-match them in many cases anyway. An array of
{chunk,encoding} is how the write queue is already implemented.

Matt

Apr 24, 2013, 9:18:08 AM
to nod...@googlegroups.com
I'm gonna go against the crowd and ask for B:cork/uncork here. It's far easier for when you don't have a bunch of pre-composed buffer objects, and would fit into something streaming lines to the output a lot better. Yes you can fuck up, but there's a million ways the programmer can fuck up with node anyway.
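
For instance (purely illustrative, assuming option B lands; `lines` is just a made-up array of strings):

    res.cork();
    lines.forEach(function(line) {
      res.write(line + '\n');   // queued while corked
    });
    res.uncork();               // flushed to _writev as one batch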

greelgorke

Apr 24, 2013, 11:09:20 AM
to nod...@googlegroups.com
stream.end should uncork and +1 (or see my prev post)

Matt

Apr 24, 2013, 11:30:38 AM
to nod...@googlegroups.com
On Wed, Apr 24, 2013 at 11:09 AM, greelgorke <greel...@gmail.com> wrote:
stream.end should uncork and +1 (or see my prev post)

On Wednesday, April 24, 2013 15:18:08 UTC+2, Matt Sergeant wrote:
I'm gonna go against the crowd and ask for B:cork/uncork here. It's far easier for when you don't have a bunch of pre-composed buffer objects, and would fit into something streaming lines to the output a lot better. Yes you can fuck up, but there's a million ways the programmer can fuck up with node anyway.

Also worth noting that experienced network programmers are used to doing cork/uncork already, so this will be familiar to them.

Isaac Schlueter

Apr 24, 2013, 12:29:16 PM
to nodejs
Matt,

Yeah, I'm kind of in agreement here. TCP_CORK is a venerable old TCP-ism.

Functionally, though a top-level writev([{chunk,encoding}],cb) or
writev([chunks],[encodings],cb) *seems* a bit simpler, it's actually
not saving much complexity, compared with what it adds to the Writable
user. (Node *uses* a lot of streams, as well as implementing them, so
this is a relevant consideration from the "keep core simple" point of
view.)

Anyone unfamiliar with TCP_CORK should read this: http://baus.net/on-tcp_cork

On Wed, Apr 24, 2013 at 11:09 AM, greelgorke <greel...@gmail.com> wrote:
> stream.end should uncork and +1 (or see my prev post)

Yes, that is a good idea.

Matt

Apr 24, 2013, 1:02:49 PM
to nod...@googlegroups.com

On Wed, Apr 24, 2013 at 12:29 PM, Isaac Schlueter <i...@izs.me> wrote:
Anyone unfamiliar with TCP_CORK should read this: http://baus.net/on-tcp_cork

Holy comment spam!

Marco Rogers

Apr 24, 2013, 2:50:54 PM
to nod...@googlegroups.com, i...@izs.me
I've been trying to follow this thread, but there's a lot here. I apologize in advance if I'm retreading anything that's already been said.

I have a simple question first. Are you considering having both the flexible cork/uncork api and also putting a nice api on top of it with C/D option? That sounds nice in theory, but complex in execution and maintenance. If we're considering only one option to address this use case, then D is a huge pain in the ass. Building up an object for each chunk is onerous. C is a little nicer but as you mentioned, the buf/enc matchings wouldn't be explicit and might be error prone.

I'm not sure if I see the best API solution here. But I don't have a better idea yet. Have we considered B cork/uncork as the api, but then wrapping the complexity in pipe like we've done with most other things? This sounds like a use case that not everyone will need. And the people that do will take the time to do it right. But most people won't need it and shouldn't even have to care about it.

:Marco

Mike Pilsbury

Apr 24, 2013, 5:05:39 PM
to nod...@googlegroups.com, i...@izs.me
Is this signature not worth considering?
    stream.writev([chunks,...], encoding, callback) 

It's an easier API to use. No need to create an object for each chunk.

Of course it'd be no use where there's a need for different encodings for some of the chunks, but how common a requirement is that? Maybe I'm naive in thinking that a single encoding for all chunks is the more common scenario.

Perhaps being able to provide either a single string or an array of strings would help.
    stream.writev([chunks,...], encoding | [encodings,...], callback) 
The common use case of a single encoding for all chunks is nice and easy to use, but the other use case is still catered for.

Fedor Indutny

Apr 24, 2013, 5:07:59 PM
to nod...@googlegroups.com, Isaac Schlueter
+1 for cork/uncork, that's it.

Cheers,
Fedor.



Isaac Schlueter

Apr 24, 2013, 7:57:21 PM
to Mike Pilsbury, nodejs
> Of course it'd be no use where there's a need to different encodings
> for some of the chunks, but how common a requirement is that?

It's as common a requirement as `res.write('some string that is
probably utf8')`. Requiring all chunks to be the same encoding is not
reasonable for the use case we care most about (http).


Marco,
We could certainly do s.bulk(function() { write() write() write() })
on top of cork/uncork. But at that point, it's probably unnecessary,
and could be something that userland streams do if they want to.
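
For instance, a userland bulk() could be as small as this (rough sketch, assuming the option-B cork/uncork API):

    function bulk(stream, fn) {
      stream.cork();
      try {
        fn();
      } finally {
        stream.uncork();   // never stays corked, even if fn throws
      }
    }

    bulk(res, function() {
      res.write('hello');
      res.write('world');
    });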

In the r.pipe(w) case, it won't matter much. The reader will be
calling write() and the writer will usually be writing one chunk at a
time. If, for some reason, it backs up with multiple pending writes, and
the writer supports _writev, then yes, it'll writev them all at once.

Marco Rogers

Apr 24, 2013, 8:12:32 PM
to nod...@googlegroups.com, Mike Pilsbury
I think you're answering my primary concern. Are you saying that whatever we choose, users will get the benefits of this for free in the 80% case? If that's true, then I think the unusual case should have the most flexible interface. That feels like cork/uncork to me. But take my opinion with a grain of salt, because I don't often work at that level.

I'd be interested in hearing about some actual use cases, because I've got another stupid question. Is this type of operation always synchronous? s.bulk(function() {}) suggests that you're ready to write everything within the execution of that function, unless the function parameter needs to have a callback, in which case I really don't like that option. s.writev assumes you've got all of the chunks ready when you call it, but it also allows you to build them up asynchronously if you need to. Cork/uncork also allows this and is much more explicit in that affordance. But of course, as you said, it's easier to get wrong. If there's an error in the middle, you're still corked. Or you could just forget to uncork, which is even worse. So getting some perspective on the use cases might be helpful here.

:Marco



--
Marco Rogers
marco....@gmail.com | https://twitter.com/polotek

Life is ten percent what happens to you and ninety percent how you respond to it.
- Lou Holtz

Isaac Schlueter

Apr 24, 2013, 8:48:59 PM
to nodejs, Mike Pilsbury
> Are you saying that whatever we choose, users will get the benefits of this for free in the 80% case?

Yes, that is what I'm saying.

> Is this type of operation always synchronous? s.bulk(function() {}) suggests that you're ready to write everything within the execution of that function.

Yeah, if you setTimeout in there, you're not in bulk() any more, so it
fails. It doesn't feel very node-ish to me. It looks like
domain.run(fn) but it's got wildly different semantics.

> s.writev assumes you've got all of the chunks ready when you call it

Yes, that is the case. Cork/uncork doesn't require you to already
know what you're going to write.

> If there's an error in the middle, you're still corked.

Errors generally indicate that the stream is hosed (hah) anyway, so
whatever. I don't care too much about that, really. If your stream
has an error, it's broken, and should be considered poisonous.

> Or you could just forget to uncork which is even worse.

Sure, but I think that the idea of uncorking automatically when you
call .end() solves most of that hazard.

> So getting some perspective on the use cases might be helpful here.

The primary use case is the http-tcp interaction, and saving syscalls
in web sites.

Marco Rogers

Apr 25, 2013, 1:19:05 AM
to nod...@googlegroups.com, Mike Pilsbury, i...@izs.me
So what's your remaining issue with cork/uncork exactly? Just that it's ugly?

:Marco

Micheil Smith

Apr 25, 2013, 5:37:05 AM
to nod...@googlegroups.com, i...@izs.me
What about an API in which a new pseudo stream was created, so, `writev` 
becomes something of a mode on streams, such that you could do 
something like: 

    stream.bulk(function(bulkStream) {
      bulkStream.write(buf, enc);
      bulkStream.write(buf, enc);
      bulkStream.write(buf, enc);
      bulkStream.end(buf, enc); // or bulkStream.flush(cb)
    });

That `bulkStream` would always use writev, and could be passed around to 
other functions and worked with asynchronously. You'd just have to remember 
to `end()` or `flush()` it.

(Note: the individual writes could in theory also have a `callback` argument, 
but I've omitted that for brevity) 

– Micheil

Tim Smart

Apr 25, 2013, 5:34:03 PM
to nod...@googlegroups.com
My vote is with B), including the amendment of uncorking automatically in the
end() call.

res.cork()
res.writeHead(200, ...)
res.end(template.render())

If corked, instead of using the hot path (squashing everything into one string to
write), it would use writev to combine the headers with the template blob/buffer.

Tim

Isaac Schlueter

Apr 25, 2013, 6:40:22 PM
to nodejs
The winner is option B, cork/uncork, with the behavior that end()
automatically uncorks. Author-implemented
_writev([{chunk,encoding},...], cb) will get called with the buffered
writes, or _write(chunk, encoding, cb) will be called repeatedly if
_writev is not implemented.
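
For the curious, usage would look roughly like this (a sketch of the decided behavior, variable names made up, not final docs):

    res.cork();
    res.write(header);        // queued
    res.write(firstChunk);    // queued
    res.end(lastChunk);       // end() uncorks; all three hit _writev() in one batch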

A first-class writev([{chunk,encoding},...],cb) or
writev([chunks...],[encodings...],cb) was a very strong contender as
well, since it's a bit less stuff to grok, but it's also trickier to
use and doesn't really pay enough rent to justify the added
trickiness.

Thanks for the discussion, everyone, this has been incredibly helpful.

Marco Rogers

Apr 25, 2013, 7:53:41 PM
to nod...@googlegroups.com
This was easily the smoothest discussion about core node changes that I've ever been a part of.

:Marco



greelgorke

Apr 26, 2013, 2:38:12 AM
to nod...@googlegroups.com
yay \o/

Jorge

Apr 28, 2013, 5:12:19 PM
to nod...@googlegroups.com
On 23/04/2013, at 02:01, Isaac Schlueter wrote:

> There's a syscall called `writev` that lets you write an array (ie,
> "Vector") of buffers of data rather than a single buffer.
> <snip>


About 3 years ago, when ry was working on this for streams, I implemented a writev() for fs.write()s and even sent a patch, and found that -IIRC- writev()ing was more than 20% faster than simply write()ing. But there seems to be a limit on writev()'s maximum allowable iovec count (iovcnt); if you hit it you have to break it down into several smaller writev()s with fewer iovecs and queue them properly, which is a bit of a mess.

--
( Jorge )();

Fedor Indutny

Apr 28, 2013, 5:15:44 PM
to nod...@googlegroups.com
libuv has started doing it automatically recently.

Cheers,
Fedor.


Jorge

Apr 28, 2013, 5:55:43 PM
to nod...@googlegroups.com
On 28/04/2013, at 23:15, Fedor Indutny wrote:

> libuv has started doing it automatically recently.

Awesome :-)

One more thing: OSX would benefit *a* *lot* more than anybody else from an fs.writev(), due to the mutex around write()s caused by the concurrent-write()s bug (has anybody checked if it's been fixed already? Last time I checked was ~3 years ago, in Snow Leopard!).

That 20..25% speedup figure is very likely only good for OSX, and that mutex is very likely the culprit.

Are you going to implement an fs.writev(), or just a writev() for sockets?

WRT the API: perhaps it shouldn't accept strings/encodings, just buffers:

writev(buffer [, buffer...] [, cb]);

The user can convert his strings to buffers quite easily already.

Cheers,
--
( Jorge )();

Fedor Indutny

Apr 28, 2013, 6:00:04 PM
to nod...@googlegroups.com
It already landed in master and it works for every TCP socket.

Cheers,
Fedor.



Jorge

Apr 28, 2013, 6:13:41 PM
to nod...@googlegroups.com
On 29/04/2013, at 00:00, Fedor Indutny wrote:

> It already landed in master and it works for every TCP socket.

Awesome :-)

What about fs?

Fedor Indutny

Apr 28, 2013, 6:14:36 PM
to nod...@googlegroups.com
No, only TCP so far, but it could be implemented for FS too, pull requests are welcome! :)

Cheers,
Fedor.


Jorge

Apr 28, 2013, 6:28:24 PM
to nod...@googlegroups.com
On 29/04/2013, at 00:14, Fedor Indutny wrote:

> No, only TCP so far, but it could be implemented for FS too, pull requests are welcome! :)

Oh, what a pity... :-)

And have any of you checked if Apple has fixed that bug already?
--
( Jorge )();

Ben Noordhuis

Apr 29, 2013, 5:08:59 AM
to nod...@googlegroups.com
It's still there in 10.8.2.