Efficient implementation of web sockets protocol with Binary/B and socket streams

344 views
Skip to first unread message

Wes Garland

unread,
Dec 18, 2012, 4:03:32 PM12/18/12
to comm...@googlegroups.com
Has anybody done any work in this regard?

I am doing some code review, and could not help but notice that there is a glaring inefficiency that should be solvable easily, but doesn't seem to be.

Basically, in web sockets we have to do something like

sock.write(new ByteString([0]));
sock.write(new ByteString(payload, "us-ascii"));
sock.write(new ByteString([255]));

in C, I would write this either a writev() with three arrays of char in an iovec, or maybe something like

int writeIt(fd, payload, payloadLen)
{
  static char *buf = NULL;
  static size_t bufLen = 0;

  if (bufLen - 2 > payloadLen)
  {
    buf = realloc(buf, PAGESIZE + ((payloadLen + 2) * (payloadLen + 2) / PAGESIZE));
    buf[0] = (char)0;
  }

  buf[payloadLen + 1] = (char)0xff;
  memcpy(buf + 1, payload, payloadLen);
  return write(fd, buf, payloadLen + 2);
}

(obviously this is untested code that relies on a blocking write, but it's the memory management that's interesting, not the I/O for now)

So -- any ideas for a "good" way to fix this?

The first one that pops up is to implement a sockets API that essentially implements writev(), but that still means that I will be churning a ByteArray per write, in order to knock the string down to 8-bit rep'n.   That also breaks compatibility with Node.js's net module, which I am trying to keep.

The next one that pops up is to essentially implement the C solution above, only using ByteArray.copy() instead of memcpy.  I'm still creating a temporary ByteArray to this for each write, though.

Should we perhaps be looking at a new copy + conversion form for ByteArray instances?  Code like this might be more ideal from a memory/GC POV:

function writeIt(payload, payloadLen)
{
  if (!writeIt.buf || writeIt.buf.length < payloadLen + 2)
  {
    writeIt.buf = new (require("binary").ByteArray)(PAGESIZE + ((payloadLen + 2) * (payloadLen + 2) / PAGESIZE));
    writeIt.buf[0] = 0;
  }

  writeIt.buf[payloadLen - 1] = 255;
  writeIt.buf.copy(1, payloadLen, payload, "us-ascii");
}

Thoughts from the list?  Does anybody know what Node.js, python, perl, etc do under the hood?

Wes

--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102

Ondřej Žára

unread,
Dec 19, 2012, 3:26:02 AM12/19/12
to comm...@googlegroups.com
Hi Wes,

TeaJS (previously v8cgi) implements WebSocket server as a pure JS module: http://code.google.com/p/teajs/source/browse/lib/websocket.js

Note that I use Binary/F instead of Binary/B.

Implementation-wise, I simply create a larger Buffer, set some control bytes and then copy the databuffer into it. Nothing fancy / super-optimized.


Sincerely,
Ondrej Zara


2012/12/18 Wes Garland <w...@page.ca>
--
You received this message because you are subscribed to the Google Groups "CommonJS" group.
To post to this group, send email to comm...@googlegroups.com.
To unsubscribe from this group, send email to commonjs+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/commonjs?hl=en.



--
+------------------+
|tn=b.             |
|p==n              |
|+j1               |
|=+                |
|1          #  ##  |
|           # #    |
|           #  ##  |
|        #  #    # |
|         ##   ##  |
+------------------+
 jsstyle.github.com

Donny Viszneki

unread,
Dec 18, 2012, 8:05:38 PM12/18/12
to comm...@googlegroups.com
socket.io allocates one buffer for each message sent over the
websocket transport, copies into the buffer:

socket.io 0.9.10 / 2012-08-10
filename: socket.io/lib/transports/websocket/default.js
file md5: 3b98fa9c89d5ee14e495138d73ad019f

188 var length = Buffer.byteLength(data)
189 , buffer = new Buffer(2 + length);
190
191 buffer.write('\x00', 'binary');
192 buffer.write(data, 1, 'utf8');
193 buffer.write('\xff', 1 + length, 'binary');
> --
> You received this message because you are subscribed to the Google Groups
> "CommonJS" group.
> To post to this group, send email to comm...@googlegroups.com.
> To unsubscribe from this group, send email to
> commonjs+u...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/commonjs?hl=en.



--
http://codebad.com/

Wes Garland

unread,
Dec 21, 2012, 10:18:21 PM12/21/12
to comm...@googlegroups.com
Thanks, Donny & Ondřej!

I remember now that Ryan (Node.js) complained that buffer-building in Binary/B was pretty much a non-starter, which I think is why Binary/F was born.  I should maybe take a look at implementing that.

I found that with GPSEE's implementation of Binary/B that I can implement this, which is fairly performant albeit, non-ideal.  Not much less ideal than the TeaJS approach, though... They both suffer from the same minor flaw, which is that they create a lot of short-lived GC things under certain workloads.

/** Write an array of Strings or ByteThings to the socket in a single system call.
 *  @param      dataArray       The array of things to write
 *  @returns    true            if all data was written to the kernel's buffer, false if some data was queued for later writing.
 *  @see        Socket.prototype.write()
 */
Socket.prototype.writev = function socket$writev(dataArray)
{
  var writevBuf = new binary.ByteArray(0);
  var i, tmp;

  if (dataArray.length === 1)
    return this.write(dataArray[0]);

  for (i = 0; i < dataArray.length; i++)
  {
    if (!require("gpsee").isByteThing(dataArray[i]))
    {
      tmp = new ffi.Memory(dataArray[i].length);
      tmp.copyDataString(dataArray[i]);
      dataArray[i] = tmp;
    }
    writevBuf.concat(dataArray[i]);
  }

  return this.write(writevBuf);
}

( http://code.google.com/p/gpsee/source/browse/modules/net/net.js?r=cf6a42847983ba1c649ef50bffd540bf7a04e436#425 )

What winds up happening here is that ByteArray.prototype.concat appends the new ByteArray (or Memory object) onto the end of the existing ByteArray, calling realloc() as necessary.  I'm not sure if this matches the original Binary/B spec or not... the current spec basically says, "See Array.concat" and clearly that's not right.   If GPSEE's implementation is non-conformant, we should think about adopting it as an extension named "append" or something.

I think CommonJS should actually have a Binary/??? spec that lets us concatenate ByteArrays, ByteStrings, and Strings (along with conversion details) all into one buffer, reallocating the buffer as necessary.  It make implementing code like writev() above simple AND efficient (fewer copies at the C level).

var wsData = new ByteArray([0], {string: "Hello", charset: "us-ascii"}, [255]);

Any other ideas out there?  I was thinking of prototyping in GPSEE but the code above actually performs well enough that I might not need to.

Wes

Ondřej Žára

unread,
Dec 28, 2012, 4:22:51 AM12/28/12
to comm...@googlegroups.com
What winds up happening here is that ByteArray.prototype.concat appends the new ByteArray (or Memory object) onto the end of the existing ByteArray, calling realloc() as necessary.  I'm not sure if this matches the original Binary/B spec or not... the current spec basically says, "See Array.concat" and clearly that's not right.   If GPSEE's implementation is non-conformant, we should think about adopting it as an extension named "append" or something.


So basically the goal here is to have a mutating version of concat, right? That makes sense to me, even if the final name (after months of bikeshedding) ends up as "concat2" or similar :-)

(no personal preference here, "append" would be okay with me as well)

O.

 



--

Wes Garland

unread,
Dec 28, 2012, 3:25:34 PM12/28/12
to comm...@googlegroups.com
Yeah -- if we had .append() which returned "this", and a settable length, we could write fairly efficient code, like this:

var tmpBA = new BINARY.ByteArray(0);
var xFF = new BINARY.ByteArray([0xff]);
var x00 = new BINARY.ByteArray([0x00]);
function makePacket(payloadObject)
{
  tmpBA.length = 0;
  return tmpBA.concat(x00).concat(new BINARY.ByteString(JSON.stringify(payloadObject), "us-ascii")).concat(xFF);
}

This get us closer -- tmpBA is continuously resized but not churned (instanciated/finalized), leaving the implementation room to be efficient in how heap is allocated.

But we are still churning a dummy ByteString, just to get the String converted so it can be put into the packet.

I'm re-reading the Binary/F spec, and it may be that implementing that spec -- along with a resizing .append() -- might be the better solution.  The resizing .append() winds up being a good way to build a buffer aggregating writes and so on...not as good as exposing libc's writev() to script (one extra copy) but not bad, either.

http://code.google.com/p/gpsee/source/browse/modules/net/net.js?r=9e9fe4e3ce6571bf0abbda46c77f144423330822#425

Hilariously, I was Googling for Binary/F and found Binary/F/Wes.  I'm glad somebody snapshotted that, it has fast Unicode conversion stuff in it which make doing JSON->ByteStrings faster.

I wonder how hard it would be to implement JSON.ByteStringify()?

Wes

Wes Garland

unread,
Dec 28, 2012, 3:27:07 PM12/28/12
to comm...@googlegroups.com
I should mention, the reason why Binary/F is better is that it allows Buffer filling from String, starting at an offset other than zero.  That is the only way to avoid the intermediary ByteString->ByteArray copy that happens with Binary/B, regardless of the prescence of .append()

Wes

Michael Schwartz

unread,
Dec 29, 2012, 12:38:30 PM12/29/12
to comm...@googlegroups.com
Hey Wes,

That looks rather inefficient.

How about just a plain old buffer that's growable, like a C++ string.

var payload = payload.toString(), len = payload.length;

buf.byteAt(0, 0x00);
buf.byteCopy(1, payload, payload. len);
buf.byteAt(payload.len + 1, 0xff);

Or even better:

buf.assemble(0x00, payload, 0xff);

And let it figure out the # of arguments, type of arguments, and how to build the final buffer.

Donny Viszneki

unread,
Dec 29, 2012, 6:22:18 PM12/29/12
to comm...@googlegroups.com
On Sat, Dec 29, 2012 at 9:38 AM, Michael Schwartz <myk...@gmail.com> wrote:
> buf.assemble(0x00, payload, 0xff);

This looks very similar to one of the examples Wes gave:

On Fri, Dec 21, 2012 at 7:18 PM, Wes Garland <w...@page.ca> wrote:
> var wsData = new ByteArray([0], {string: "Hello", charset: "us-ascii"},
> [255]);

Michael suggests interpreting number values as bytes. Wes suggests
being able to or needing to specify a character encoding for string
values. I think all of these are good API ideas.

As for efficiency,

I think one ought to also consider that for any kind of growing
buffer, it would be best to allow to specify a maximum size for the
buffer and/or individual character strings that will end up in the
buffer. This has the advantage of not having to transcode character
strings twice, in the case that your application needs to limit the
size of the buffer or character strings that end up there. (Also, in
case it isn't obvious, returning how many bytes were copied into the
buffer is a must-have.)

--
http://codebad.com/

Michael Schwartz

unread,
Dec 29, 2012, 9:42:35 PM12/29/12
to comm...@googlegroups.com
Not exactly what I meant.

buf.assemble() takes arguments and builds a single buffer. Wes' example looks like it's creating intermediate buffers and/or vectors of byte arrays vs. a single one.

Regards
Reply all
Reply to author
Forward
0 new messages