Binary/E


Kris Kowal

Jan 17, 2010, 3:18:39 AM
to comm...@googlegroups.com
New proposal. Can't say I'm proud of writing it; it's designed to be
closer to what most people have requested. Largely, it is a subset of
Binary/D. My intent is that future revisions would add bit types,
improve bidirectional conversion from existing primordial types, and
add radix encoding conveniences. But, this is certainly enough for
now. It is nearly a strict subset of the previous, but
ByteString.[[Get]] returns a Number (I regret this already, but it's
the mistake you guys want, so be it), ByteString and ByteArray are
both objects and support both calling and construction to create
instances of their type, and I've reintroduced the Binary base type.
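A minimal sketch of the shape described above, written in plain ES5: a shared `Binary` base plus `ByteString`/`ByteArray` constructors that work with or without `new` and store each byte as a Number index property, so `bs[i]` yields a Number. The helper `fillBytes` is hypothetical, and immutability, encodings, and the rest of the Binary/E API are left out of this sketch.

```javascript
// Shared base type, so `x instanceof Binary` holds for both concrete types.
function Binary() {}

// Hypothetical helper: copy byte values (masked to 0-255) onto numeric indexes.
function fillBytes(target, bytes) {
  for (var i = 0; i < bytes.length; i++) {
    target[i] = bytes[i] & 0xff;
  }
  target.length = bytes.length;
  return target;
}

function ByteString(bytes) {
  // Calling without `new` delegates to construction.
  if (!(this instanceof ByteString)) return new ByteString(bytes);
  fillBytes(this, bytes || []);
}
ByteString.prototype = Object.create(Binary.prototype);
ByteString.prototype.constructor = ByteString;

function ByteArray(bytes) {
  if (!(this instanceof ByteArray)) return new ByteArray(bytes);
  fillBytes(this, bytes || []);
}
ByteArray.prototype = Object.create(Binary.prototype);
ByteArray.prototype.constructor = ByteArray;

var called = ByteString([104, 105]);
var constructed = new ByteString([104, 105]);
console.log(typeof called[0], called[0], constructed instanceof Binary); // number 104 true
```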

http://wiki.commonjs.org/wiki/Binary/E

Kris Kowal

Daniel Friesen

Jan 17, 2010, 4:59:34 AM
to comm...@googlegroups.com
I believe I said it before, but I guess I'll say it again.
Ditching Buffer and its Text/Binary interface pair for ByteArray voids
the majority of the use cases for the abstract interface I thought of
when I came up with it.
Things like typeof bs[idx] === "number" also screw that up, and
bs.Content === Number also voids half of the things that can be done
with it.

I still don't get the general push towards an api that breaks symmetry
with existing API.


IMHO the binary proposals are a mess. Does anyone else notice unsettling
parts of them, or does it take a chart to visualize that?

Oh well, I'll probably just support two APIs and make the underlying
data interchangeable.

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

Ash Berlin

Jan 17, 2010, 7:27:12 AM
to comm...@googlegroups.com

On 17 Jan 2010, at 09:59, Daniel Friesen wrote:

>
> Things like typeof bs[idx] === "number" also screw that up, and bs.Content === Number also voids half of the things that can be done with it.

Name me a concrete use for bs[idx] returning a ByteString again. Every use I can think of for using ByteStrings makes [[Get]] useless unless it returns a primitive.

I for one am very happy with this change, and I like the feel of this proposal.

But then I've not really seen the use of .Content in the first place.

>
> I still don't get the general push towards an api that breaks symmetry with existing API.

What existing binary API?

Daniel Friesen

Jan 17, 2010, 1:14:39 PM
to comm...@googlegroups.com
Ash Berlin wrote:
> On 17 Jan 2010, at 09:59, Daniel Friesen wrote:
>
>> Things like typeof bs[idx] === "number" also screw that up, and bs.Content === Number also voids half of the things that can be done with it.
>>
> Name me a concrete use for bs[idx] returning a ByteString again. Every use I can think of for using ByteStrings makes [[Get]] useless unless it returns a primitive.
>
> I for one am very happy with this change, and I like the feel of this proposal.
>
> But then I've not really seen the use of .Content in the first place.
>
The primary use case for .Content/.contentConstructor was originally
part of Buffer `new Buffer(seq.contentConstructor);`.
The secondary use case was the use of .contentConstructor to abstractly
cast, create empty data, and use the abstract parts of the functions on
the constructor like using .contentConstructor.fromCode to allow the
inverse of .codeAt(idx).
.Content === Number breaks the matching API of having it be either
String or ByteString.
Likewise, typeof bs[idx] === "number" means there is no s[idx].fn /
bs[idx].fn symmetry.

>> I still don't get the general push towards an api that breaks symmetry with existing API.
>>
> What existing binary API?
I'm not talking about an existing binary API; I'm talking about symmetry
with the existing standard API and conventions.

Kris Kowal

Jan 17, 2010, 1:48:16 PM
to comm...@googlegroups.com
On Sun, Jan 17, 2010 at 1:59 AM, Daniel Friesen
<nadir.s...@gmail.com> wrote:
> I believe I said it before, but I guess I'll say it again.
> Ditching Buffer and its Text/Binary interface pair for ByteArray voids the
> majority of the use cases for the abstract interface I thought of when I
> came up with it.

For the benefit of those tuning in, you made proposal C. I am not
against your Buffer idea. Can you explain how it could not be
implemented in pure JavaScript on top of D or E, or in a future
version thereof? I've taken things I want out too, in order to target
the minimal feature set that we're willing to converge on.

> Things like typeof bs[idx] === "number" also screw that up, and bs.Content
> === Number also voids half of the things that can be done with it.

I don't like ByteString.Content === Number any more than you do, but
if enough people support it, I'm willing to compromise. So far I know
Ash supports it. I'd like to hear from others.

> I still don't get the general push towards an api that breaks symmetry with
> existing API.

There is no ratified binary API. There are trial implementations of B
and C, but none of them have received support from this group. As
long as this group doesn't converge, I'm going to keep generating
permutations of the API, until we hit something we can agree upon.
Compatibility with a previous design that people did not want is a
non-goal.

> IMHO the binary proposals are a mess. Does anyone else notice unsettling
> parts of them, or does it take a chart to visualize that?

I would prefer to evaluate them on their individual merits and
failures, since they bear no relationship to one another for purpose
of ratification. Please be specific.

Kris Kowal

Daniel Friesen

Jan 18, 2010, 2:11:11 PM
to comm...@googlegroups.com
Kris Kowal wrote:
> On Sun, Jan 17, 2010 at 1:59 AM, Daniel Friesen
> <nadir.s...@gmail.com> wrote:
>
>> I believe I said it before, but I guess I'll say it again.
>> Ditching Buffer and its Text/Binary interface pair for ByteArray voids the
>> majority of the use cases for the abstract interface I thought of when I
>> came up with it.
>>
>
> For the benefit of those tuning in, you made proposal C. I am not
> against your Buffer idea. Can you explain how it could not be
> implemented in pure JavaScript on top of D or E, or in a future
> version thereof? I've taken things I want out too, in order to target
> the minimal feature set that we're willing to converge on.
>
Sure, it's possible someone could HACK together a StringBuffer
implementation using ByteArray and Encodings; however, doing so would
likely make a very low-level part of the system inefficient.
StringBuffer, like String, uses character-based indexes, not byte-based
indexes: .length, codeAt, splice, slice, indexOf, valueAt, clear, fill,
insert, replace, remove, split, etc. Every index-based operation
(which can potentially be a fair number of calls) on a StringBuffer uses a
character index, not a byte index. So in order to use a ByteArray
implementation under Buffer, you would likely need to use encodings in
some way to convert large portions of the ByteArray into a String at
various points in order to make calculations, or implement understanding
of UCS-2/UTF-16 in JavaScript.
However, this is really something that is likely already implemented
natively in the host running JavaScript. Engines already know how to
handle UCS-2/UTF-16 for String, and in Java, StringBuffer and BlobBuffer
would be an implementation that passed around either a char[] or a byte[]
and performed almost exactly the same algorithms and code on whichever of
them is relevant (in fact, I was implementing *Buffer using one class and
a helper that would abstractly do just that).
So essentially, to implement StringBuffer using a ByteArray you would
have to re-implement low-level native code inside JavaScript, inherently
wrapped by a number of underlying calls.
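A rough sketch of the overhead being described, assuming a plain array of byte values stands in for a ByteArray and UTF-16LE is the storage encoding; `Utf16ByteBuffer` and `byteOffset` are hypothetical names. Each character-indexed operation has to translate a character index into a byte offset and decode two bytes in script, work that a native buffer over `char[]` does not do in JavaScript. Character indexes here are UTF-16 code units, matching String.

```javascript
// Hypothetical buffer keeping UTF-16LE code units in a plain array of
// byte values (standing in for a ByteArray): two bytes per code unit.
function Utf16ByteBuffer(str) {
  this.bytes = [];
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    this.bytes.push(unit & 0xff, unit >> 8);
  }
}

// Every character-indexed operation must first map char index -> byte offset.
function byteOffset(charIndex) {
  return 2 * charIndex;
}

Object.defineProperty(Utf16ByteBuffer.prototype, "length", {
  get: function () { return this.bytes.length / 2; }
});

// Decode one code unit from two bytes, in script, on every call.
Utf16ByteBuffer.prototype.codeAt = function (charIndex) {
  var b = byteOffset(charIndex);
  return this.bytes[b] | (this.bytes[b + 1] << 8);
};

Utf16ByteBuffer.prototype.indexOf = function (ch) {
  var target = ch.charCodeAt(0);
  for (var i = 0; i < this.length; i++) {
    if (this.codeAt(i) === target) return i;
  }
  return -1;
};

Utf16ByteBuffer.prototype.insert = function (charIndex, str) {
  var encoded = new Utf16ByteBuffer(str).bytes;
  // Splice the encoded bytes in at the translated byte offset.
  Array.prototype.splice.apply(this.bytes, [byteOffset(charIndex), 0].concat(encoded));
  return this;
};

Utf16ByteBuffer.prototype.toString = function () {
  var s = "";
  for (var i = 0; i < this.length; i++) s += String.fromCharCode(this.codeAt(i));
  return s;
};

var buf = new Utf16ByteBuffer("h\u00e9llo");
buf.insert(1, "X");
console.log(buf.toString(), buf.length, buf.indexOf("l")); // hXéllo 6 3
```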

As for making Buffer a future part of the Binary spec, that feels a
little awkward to me. ByteArray and BlobBuffer do the same thing, but
with different, incompatible APIs. So in the future you end up with two
completely different APIs that do the same thing, both part of a
standardized spec, so even though they do the same thing you can't
really drop either, even in a major version change that drops old
backwards-compatibility layers.

By the way, does anyone have any examples, or in-use code, using ByteArray
in practical uses? I want to compare how nice and robust the ByteArray
and BlobBuffer APIs are in comparison to each other in terms of
binary-only use.
> ...


>> I still don't get the general push towards an api that breaks symmetry with
>> existing API.
>>
>
> There is no ratified binary API. There are trial implementations of B
> and C, but none of them have received support from this group. As
> long as this group doesn't converge, I'm going to keep generating
> permutations of the API, until we hit something we can agree upon.
> Compatibility with a previous design that people did not want is a
> non-goal.
>

I didn't mean compatibility with a previous design; I meant symmetry.
"Symmetry with an existing API" as in symmetry with the standard API,
i.e.:
var test = function(Constructor) { return Constructor().Content === Constructor; };
test(String); // true
test(ByteString); // false
String().valueOf().Content === String().valueAt(0).Content; // false, "value" is inconsistent in type


Btw, Binary/E's valueAt is basically my codeAt(), which is
symmetrical and consistent with the standard API:
  - String.fromCharCode/string.charCodeAt inspired
    (String|Blob).fromCode/(string|blob).codeAt
  - string.charAt alone was matched by blob.byteAt
  - string.charAt+string.valueOf inspired string.valueAt and thus
    blob.valueAt/blob.valueOf
When reading code you can consistently expect "code" to refer to a char
code or a 0-255 byte number, both of Number type (which can be put back
into .contentConstructor.fromCode()), and "value" to refer to a sequence
of the same type as the content (String or Blob).
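A small sketch of that naming convention, assuming two hypothetical stand-in types, `MiniBlob` (bytes in a plain array) and `MiniText` (a wrapper around a string), since neither Blob nor a String `contentConstructor` exists in standard JavaScript. It shows how generic code written once against `codeAt` and `contentConstructor.fromCode` rebuilds the same thing `valueAt` returns, whatever the sequence type.

```javascript
// Hypothetical MiniBlob: byte values in a plain array.
function MiniBlob(bytes) {
  if (!(this instanceof MiniBlob)) return new MiniBlob(bytes);
  this.bytes = (bytes || []).slice();
  this.length = this.bytes.length;
}
MiniBlob.fromCode = function (code) { return new MiniBlob([code & 0xff]); };
MiniBlob.prototype.contentConstructor = MiniBlob;
MiniBlob.prototype.codeAt = function (i) { return this.bytes[i]; };                  // "code": a Number
MiniBlob.prototype.valueAt = function (i) { return new MiniBlob([this.bytes[i]]); }; // "value": a MiniBlob

// Hypothetical MiniText wrapper giving strings the same abstract shape.
function MiniText(str) {
  if (!(this instanceof MiniText)) return new MiniText(str);
  this.str = String(str);
  this.length = this.str.length;
}
MiniText.fromCode = function (code) { return new MiniText(String.fromCharCode(code)); };
MiniText.prototype.contentConstructor = MiniText;
MiniText.prototype.codeAt = function (i) { return this.str.charCodeAt(i); };
MiniText.prototype.valueAt = function (i) { return new MiniText(this.str.charAt(i)); };

// Generic helper written once against the abstract interface: for any such
// sequence, fromCode(codeAt(i)) rebuilds what valueAt(i) returns.
function valueViaCode(seq, i) {
  return seq.contentConstructor.fromCode(seq.codeAt(i));
}

var t = valueViaCode(MiniText("hi"), 1);
var b = valueViaCode(MiniBlob([104, 105]), 1);
console.log(t instanceof MiniText, t.str, b instanceof MiniBlob, b.codeAt(0)); // true i true 105
```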
> ...
> Kris Kowal

Wes Garland

Jan 19, 2010, 12:56:46 PM
to comm...@googlegroups.com
I have been thinking hard about []... So, let's talk use cases for a bit.

Who here has used binary/* to write real-world programs?

I have used binary/b quite a bit, but most of my uses are pretty mundane:
  - generator which yields ByteStrings or ByteArrays  ("stream")
  - converting from memory (C functions) to String  (decodeToString)
  - reading/writing base64 and quoted printable MIME encodings
  - removing trailing newlines from ByteArrays

I have found that the ByteArray / ByteString distinction, for me at least, is more of a mutable/immutable distinction -- I don't seem to (personally) get a lot of use out of the methods on the prototype.  (And, in fact, in GPSEE you can apply methods from ByteString to ByteArray anyhow).

The [] operator is interesting, because it looks really useful -- yet it's actually quite limited. What do we want to do with it? Well, assigning to it and comparing it with something else both seem like good ideas.  Both of these operations require literals on the right-hand-side of the expression to be of real use in day-to-day programming.

So, what literal to use?  It would be nice to use ByteString / ByteArray literals, but there is simply no way to do that with today's technology.  Pragmatists among us, then, must choose between
 - Nothing
 - One-character strings  (with charCodeAt() values <= 0xff)
 - Numbers with values <= 0xff

Let me throw up a bunch of code examples, and you guys can give feedback on the reasonableness of the syntax from a CommonJS-using programmer's POV.

if ("\n" == ba[ba.length - 1])    /*1*/
  ba.length--;

if (ba[ba.length - 1] == "\n")   /*2*/
  ba.length--;

if (ba[ba.length - 1] == 10)   /*3*/
  ba.length--;

if (10 == ba[ba.length - 1])  /*4*/
  ba.length--;

if (ba[ba.length - 1].charCodeAt(0) == 10)   /*5*/
  ba.length--;

if (10 == ba[ba.length - 1].charCodeAt(0))   /*6*/
  ba.length--;

  • if [] returns a number, 3 and 4 work.
  • if [] returns a one-char string, 1 and 2 work
  • if [] returns a careful object, 1, 4, 5, 6 work, but 2 and 3 violate the principle of least surprise
  • if [] returns a one-byte ByteArray, 5 and 6 work
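A quick check of why forms 1/2 and 3/4 split the way the bullets say, using plain arrays as hypothetical stand-ins for a ByteArray: one yielding Numbers, one yielding one-character strings. The cross cases fail because `==` converts the string `"\n"` to a Number, and a whitespace-only string converts to 0, not 10.

```javascript
// Stand-ins for a ByteArray holding "hi\n" under two [] models.
var asNumbers = [104, 105, 10];   // [] returns a Number
var asChars = ["h", "i", "\n"];   // [] returns a one-character string

var lastNumber = asNumbers[asNumbers.length - 1];
var lastChar = asChars[asChars.length - 1];

console.log(lastNumber == 10);    // true  (forms 3/4)
console.log(lastNumber == "\n");  // false (forms 1/2): Number("\n") is 0
console.log(lastChar == "\n");    // true  (forms 1/2)
console.log(lastChar == 10);      // false (forms 3/4): "\n" converts to 0

// Removing the trailing newline under the Number model (form 3).
if (asNumbers[asNumbers.length - 1] == 10)
  asNumbers.length--;
console.log(asNumbers);           // [ 104, 105 ]
```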
More areas for thinking:
  • Same exercise as above for assignment
  • What should the default iterator do
  • If both "descend" from a common mutable BLOB, many similar types can be written for special purposes. GPSEE does this for C types, and treats binary/b stuff like C types, it's very handy. I think that would solve Dan Friesen's issues as well.
Wes

--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102

Daniel Friesen

Jan 19, 2010, 1:44:30 PM
to comm...@googlegroups.com
> * if [] returns a number, 3 and 4 work.
> * if [] returns a one-char string, 1 and 2 work
> * if [] returns a careful object, 1, 4, 5, 6 work, but 2 and 3 violate
> the principle of least surprise
> * if [] returns a one-byte ByteArray, 5 and 6 work
>
> More areas for thinking:
>
> * Same exercise as above for assignment
> * What should the default iterator do
> * If both "descend" from a common mutable BLOB, many similar types
> can be written for special purposes. GPSEE does this for C
> types, and treats binary/b stuff like C types, it's very handy.
> I think that would solve Dan Friesen's issues as well.
>
> Wes
In point 4, I believe you mean ByteString.
One more idea to throw into that 4-point list.

We're dealing with single bytes here, since [idx] can never return more
than one byte. This means there will never be more than 256
possible return values, and each of these is easily mapped without complex
hashing algorithms. Why not have [] return a single shared instance for each
byte value? I.e., if bsa[0] and bsb[0] are both the byte 1, the same
ByteString instance is returned for both instead of two separate ones, so
bsa[0] === bsb[0].

The only unexpected thing I can think of that would come from that is if
someone did something really weird like `bs[0].foo = ...; bs[1].foo()`.
That really shouldn't be done; assigning properties doesn't work with
primitives like string, and I don't see any reason it should work with
ByteStrings. For all I care, implementations could decide that
ByteString/Blob instances are immediately sealed on creation (heck, I
might do that myself when I'm writing Blobs for MonkeyScript).

Techniques like that have worked well for me so far. In the mongo-rhino
lib for the pure Rhino version of Kommonwealth v3 (a company project), I use a
Java WeakHashMap to ensure that within a worker thread there is never
more than one instance of an ObjectId (i.e., if it's already in use,
calling ObjectId returns the saved ObjectId instance; if the ObjectId
loses all its references, the instance gets GCed). It works great for
making === work. I added the code at a point where I wanted to be
able to check an array of ObjectIds for an ObjectId, and I didn't want to
get fancy with a specialized function; I just wanted to use
indexOf/has (Wrench.js) to check.

Wes Garland

Jan 19, 2010, 2:08:22 PM
to comm...@googlegroups.com
On Tue, Jan 19, 2010 at 1:44 PM, Daniel Friesen <nadir.s...@gmail.com> wrote:
> Point 4 I believe you mean ByteString.

No. ByteString is an immutable type, which means you can't remove a trailing newline from it.  So for streams where I think I might want to do this, I yield ByteArrays instead.  It is possible to easily and cheaply make ByteStrings from ByteArrays if I happen to need ByteString methods; the key is a ground-up back-end COW design.  (I am working on one but have gotten sidetracked). 
 
> We're dealing with single bytes here since [idx] can never return more than one byte. This means that there will never be more than 256 possible returns, and each of these is easily mapped without complex hashing algorithms. Why not have [] return a single instance of any one byte. ie: if you bsa[0] === bsb[0] and both bytes are 1 the same ByteString instance will be returned instead of 2 separate ones.

You know, that's a really good idea.  I like it.  It is also implementable on ES3. It works around the lack of operator overloading or modifiable coercion by simply enumerating every possible case, and using exact equality.  Essentially, you're talking about interned bytes!

Let's expand here:

For it to work effectively, though, we need more support.  I think it should be possible to write:

if (ba[ba.length - 1] == Byte("\n"))
  ba.length--;

This example needs
  - your suggestion
  - ByteArray[] which returns an instance of Byte (only 256 instances possible)
  - A special Byte function which knows what to do with a one-character string

I think the Byte function should also know what to do with a number.
 
Using Byte like this is also interesting, because you can go the other way:

if (ba[ba.length - 1] != Byte("\n"))
  ba.push(Byte("\n"));


This is great food for thought, Dan. Thanks!

Wes

Daniel Friesen

Jan 19, 2010, 2:34:53 PM
to comm...@googlegroups.com
Wes Garland wrote:
>
> On Tue, Jan 19, 2010 at 1:44 PM, Daniel Friesen
> <nadir.s...@gmail.com <mailto:nadir.s...@gmail.com>> wrote:
>
> Point 4 I believe you mean ByteString.
>
>
> No. ByteString is an immutable type, which means you can't remove a
> trailing newline from it. So for streams where I think I might want
> to do this, I yield ByteArrays instead. It is possible to easily and
> cheaply make ByteStrings from ByteArrays if I happen to need
> ByteString methods; the key is a ground-up back-end COW design. (I am
> working on one but have gotten sidetracked).
You want to be able to remove something from the one byte
ba[ba.length-1] and have it affect the byte array? Wouldn't that be
ba.pop()?

>
> We're dealing with single bytes here since [idx] can never return
> more than one byte. This means that there will never be more than
> 256 possible returns, and each of these is easily mapped without
> complex hashing algorithms. Why not have [] return a single
> instance of any one byte. ie: if you bsa[0] === bsb[0] and both
> bytes are 1 the same ByteString instance will be returned instead
> of 2 separate ones.
>
>
> You know, that's a really good idea. I like it. It is also
> implementable on ES3. It works around the lack of operator overloading
> or modifiable coercion by simply enumerating every possible case, and
> using exact equality. Essentially, you're talking about interned bytes!
;) Precisely.
It's actually somewhat like string primitives and string objects. A
string primitive is roughly similar to an interned string in Java.

> Let's expand here:
>
> For it to work effectively, though, we need more support. I
> think it should be possible to write:
>
> if (ba[ba.length - 1] == Byte("\n"))
> ba.length--;
>
> This example needs
> - your suggestion
> - ByteArray[] which returns an instance of Byte (only 256 instances
> possible)
> - A special Byte function which knows what to do with a
> one-character string
>
> I think the Byte function should also know what to do with a number.
>
> Using Byte like this is also interesting, because you can go the other
> way:
>
> if (ba[ba.length - 1] != Byte("\n"))
> ba.push(Byte("\n"));
Don't see why it couldn't work in ByteString though.

function ByteString(...) {
  ...
  if ( length === 1 ) {
    // single bytes are interned: at most 256 cached instances
    if ( _byteCache.hasOwnProperty(byteCode) )
      return _byteCache[byteCode];
    return _byteCache[byteCode] = this;
  }
  ...
}

Roughly, in JS, as an example. A real implementation would be different;
this is just an explanation.


> This is great food for thought, Dan. Thanks!
>
> Wes

Kris Kowal

Jan 19, 2010, 5:26:48 PM
to comm...@googlegroups.com
A: x.[[Get]] returns a Number
B: x.[[Get]] returns something of type X

Please note here if this is a false dilemma in your opinion.

X: a.[[Get]](b) === a.[[Get]](b)
Y: !X

0: a.slice(...b) === a.slice(...b)
1: !0

Kris Kowal

Dean Landolt

Jan 19, 2010, 6:09:38 PM
to comm...@googlegroups.com
On Tue, Jan 19, 2010 at 5:26 PM, Kris Kowal <cowber...@gmail.com> wrote:
> A: x.[[Get]] returns a Number
> B: x.[[Get]] returns something of type X
>
> Please note here if this is a false dilemma in your opinion.

A+1

What is type X? If type X is a ByteString, B-1. If type X is the new Byte type Dan and Wes conjured up where Byte(n) === Byte(n), B+1 -- I'm indifferent between [[Get]] returning Number or [[Get]] returning Byte
 

> X: a.[[Get]](b) === a.[[Get]](b)
> Y: !X

I don't get X -- what would the callable [[Get]] return?


> 0: a.slice(...b) === a.slice(...b)
> 1: !0

Is 0 even possible? If so, +1

Kris Kowal

Jan 19, 2010, 7:38:31 PM
to comm...@googlegroups.com
To clarify without pseudo-internal-method-call-notation:

A: x[y] returns Number
B: x[y] returns something that is the same type as x, be it
ByteString or ByteArray
C: x[y] returns a Byte object (I do not think this is what Wes and
Daniel are converging upon, but what Dean appears to believe they are
converging upon. If so, what distinguishes it from A and B will need
to be qualified.)

X: a[b] === a[b]

Daniel Friesen

Jan 19, 2010, 7:51:53 PM
to comm...@googlegroups.com
Kris Kowal wrote:
> To clarify without pseudo-internal-method-call-notation:
>
> A: x[y] returns Number
> B: x[y] returns something that is the same type as x, be it
> ByteString or ByteArray
> C: x[y] returns a Byte object (I do not think this is what Wes and
> Daniel are converging upon, but what Dean appears to believe they are
> converging upon. If so, what distinguishes it from A and B will need
> to be qualified.)
>
I was proposing that x[y] returns an interned ByteString; everyone else
started to converge on a Byte object until Number got thrown back in.

> X: a[b] === a[b]
> Y: !X
>
This was the goal of the proposal. There are only 256 possible bytes so
even without any sort of WeakHashMap or operator overloading it's
possible to make this work with either an interned Byte type or interned
ByteStrings for single bytes.

> 0: a.slice(...b) === a.slice(...b)
> 1: !0
>
This was outside the goal of the proposal, though if you had something
like WeakHashMap or operator overloading it would be possible to make work.
> Kris Kowal

Kris Kowal

Jan 19, 2010, 9:06:07 PM
to comm...@googlegroups.com
If we support the idiom:

x[n] === x[n]

We should also preserve the idiom:

x[n] === x.slice(n, n + 1)

To do less would be uncanny. I'm in support of either supporting
object strict equality and all idioms you might expect with object
strict equality for equivalent immutable byte objects, or no support
for strict equality. Object strict equality and equivalence comes
with a lot of idioms. For example it is reasonable to expect
currently that if:

typeof x === "object"

Then it would not be reasonable to expect:

new X() === new X()

I'm suggesting we play the game carefully. Every proposal but
Binary/E presumes that equivalent byte strings are strictly equal. In
response to Ash and Wes's criticisms of that approach, Binary/E is
much simpler; fewer methods (possibly fewer in a future draft) and no
fanciness attempting to emulate built-in types. I am looking for a
show of hands for which general approach to pursue.

Kris Kowal

Daniel Friesen

Jan 19, 2010, 9:55:30 PM
to comm...@googlegroups.com
Kris Kowal wrote:
> If we support the idiom:
>
> x[n] === x[n]
>
> We should also preserve the idiom:
>
> x[n] === x.slice(n, n + 1)
>
> To do less would be uncanny. I'm in support of either supporting
> object strict equality and all idioms you might expect with object
> strict equality for equivalent immutable byte objects, or no support
> for strict equality. Object strict equality and equivalence comes
> with a lot of idioms.
That's pretty much why I put my example inside the ByteString
constructor instead of [[Get]], I expected someone would probably want
x[n] === ByteString(0) to work.

Now that I think about it, `x[n] === x.slice(n, n + 1)` is actually an
argument in favor of a ByteString return type for []: you can't preserve
that equivalence using a separate Byte type without making slice return
ByteStrings in some cases and Byte objects in others, which would be
unexpected.


> For example it is reasonable to expect
> currently that if:
>
> typeof x === "object"
>
> Then it would not be reasonable to expect:
>
> new X() === new X()
>

Didn't include it in my earlier example, but I'd go for omitting any
interning when new is called.
In essence X as a constructor is new object creation, X as a function is
like how (new String(...)).intern() behaves (though you can omit the
overhead of creating an extra object that may be discarded).
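A sketch of that behavior, under stated assumptions: `singleByteCache` and `defineIndex` are hypothetical names, the index getters need ES5 `Object.defineProperty` rather than ES3, and instances are frozen on creation. Calling `ByteString` as a function interns single bytes (at most 256 cached instances), while `new` always allocates. Because `[]` and `slice()` both go through the interning path, `x[n] === x.slice(n, n + 1)` holds as well.

```javascript
// Hypothetical: up to 256 interned one-byte ByteStrings, filled lazily.
var singleByteCache = new Array(256);

function ByteString(bytes) {
  bytes = bytes || [];
  if (!(this instanceof ByteString)) {
    // Called as a function: intern single bytes, otherwise build normally.
    if (bytes.length === 1) {
      var code = bytes[0] & 0xff;
      if (!singleByteCache[code]) singleByteCache[code] = new ByteString([code]);
      return singleByteCache[code];
    }
    return new ByteString(bytes);
  }
  // Called with `new`: always a fresh object.
  this._bytes = bytes.map(function (b) { return b & 0xff; });
  this.length = this._bytes.length;
  for (var i = 0; i < this.length; i++) defineIndex(this, i);
  Object.freeze(this);  // no properties can be added to instances afterwards
}

// [] goes through the same interning path as slice().
function defineIndex(bs, i) {
  Object.defineProperty(bs, i, {
    enumerable: true,
    get: function () { return ByteString(bs._bytes.slice(i, i + 1)); }
  });
}

ByteString.prototype.slice = function (start, end) {
  return ByteString(this._bytes.slice(start, end));
};
ByteString.prototype.codeAt = function (i) { return this._bytes[i]; };

var x = ByteString([104, 105]);
console.log(x[1] === x[1]);                   // true
console.log(x[1] === x.slice(1, 2));          // true
console.log(x[0] === ByteString([104]));      // true
console.log(new ByteString([104]) === x[0]);  // false
```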


> I'm suggesting we play the game carefully. Every proposal but
> Binary/E presumes that equivalent byte strings are strictly equal. In
> response to Ash and Wes's criticisms of that approach, Binary/E is
> much simpler; fewer methods (possibly fewer in a future draft) and no
> fanciness attempting to emulate built-in types. I am looking for a
> show of hands for which general approach to pursue.
>
> Kris Kowal
>

Kris Kowal

Jan 19, 2010, 10:03:11 PM
to comm...@googlegroups.com
On Tue, Jan 19, 2010 at 6:55 PM, Daniel Friesen
<nadir.s...@gmail.com> wrote:
> Kris Kowal wrote:

>> For example it is reasonable to expect
>> currently that if:
>>
>> typeof x === "object"
>>
>> Then it would not be reasonable to expect:
>>
>> new X() === new X()
>>
>
> Didn't include it in my earlier example, but I'd go for omitting any
> interning when new is called.
> In essence X as a constructor is new object creation, X as a function is
> like how (new String(...)).intern() behaves (though you can omit the
> overhead of creating an extra object that may be discarded).

It is not impossible to induce "new X()" to return something other
than "this". Try:

javascript:(function () {function X() {return new Number(10)};
alert(new X().valueOf())})()

My argument stands with X() === X() as well.
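To make the point concrete, a small sketch (the names `X`, `Y`, and `instances` are illustrative only): under standard ECMAScript semantics, when a constructor returns an object, `new` yields that object instead of `this`, so `typeof x === "object"` and `new X() === new X()` can both hold. A primitive return value is ignored by `new`.

```javascript
// Hypothetical constructor that hands back one shared instance per key.
var instances = {};

function X(key) {
  if (Object.prototype.hasOwnProperty.call(instances, key)) {
    return instances[key];  // an object returned from a constructor replaces `this`
  }
  this.key = key;
  instances[key] = this;    // no return statement, so `new` yields `this`
}

// A primitive return value is ignored by `new`.
function Y() { return 10; }

console.log(new X("a") === new X("a"));  // true
console.log(typeof new X("a"));          // object
console.log(typeof new Y());             // object
```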

Kris Kowal

Daniel Friesen

Jan 19, 2010, 10:48:51 PM
to comm...@googlegroups.com
Double negative, what did you mean?

Donny Viszneki

Jan 20, 2010, 1:48:20 AM
to comm...@googlegroups.com
On Tue, Jan 19, 2010 at 2:08 PM, Wes Garland <w...@page.ca> wrote:
> ByteString is an immutable type, which means you can't remove a trailing
> newline from it.  So for streams where I think I might want to do this, I
> yield ByteArrays instead.  It is possible to easily and cheaply make
> ByteStrings from ByteArrays if I happen to need ByteString methods; the key
> is a ground-up back-end COW design.  (I am working on one but have gotten
> sidetracked).

Also, on GPSEE we support the application of ByteString methods onto
ByteArray instances, so we don't even need to make a copy if we just
wanted a ByteString method that wasn't found on the ByteArray
prototype.
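A sketch of that technique, assuming the borrowed method is written only against `length` and numeric indexes; `ByteStringMethods` is a hypothetical name, not GPSEE's actual code, and a plain array stands in for a ByteArray. The method is applied with `call()`, so no copy of the bytes is made. ECMAScript's own `Array.prototype` methods are generic in the same way.

```javascript
// Hypothetical ByteString-style method that only touches `length` and
// numeric indexes, so it does not care whether the receiver is mutable.
var ByteStringMethods = {
  lastIndexOf: function (code) {
    for (var i = this.length - 1; i >= 0; i--) {
      if (this[i] === code) return i;
    }
    return -1;
  }
};

// A mutable byte container; a plain array stands in for a ByteArray here.
var byteArray = [104, 105, 10, 10];

// Borrow the method via call(): no copy of the bytes is made.
console.log(ByteStringMethods.lastIndexOf.call(byteArray, 10));  // 3

// ECMAScript's own Array.prototype methods are generic in the same way.
console.log(Array.prototype.indexOf.call({ 0: 104, 1: 10, length: 2 }, 10));  // 1
```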

--
http://codebad.com/

Wes Garland

Jan 20, 2010, 1:13:27 PM
to comm...@googlegroups.com
Hi, Kris!

On Tue, Jan 19, 2010 at 7:38 PM, Kris Kowal <cowber...@gmail.com> wrote:
> C: x[y] returns a Byte object (I do not think this is what Wes and
> Daniel are converging upon, but what Dean appears to believe they are
> converging upon.  If so, what distinguishes it from A and B will need
> to be qualified.)


You missed the IRC discussion yesterday, which is why some of Dean's comments appear slightly out of context.
 
Dean, inimino, Dan, Ash and I had a chit-chat. No conclusions were reached, but some observations were made that I found interesting:

1 - It is possible to have a Byte() pseudo-type which behaves similarly to an atom.  This is accomplished by having no "new Byte" constructor, just a function that returns one of 256 possible objects, presumably out of an array or something.

2 - This Byte type is useful for [[Set]]

3 - We can identify which Byte we are interested in many ways: by single-char string, by number, by collections of bits...

4 - With said Byte type, we can meaningfully compare [[Get]] results for equality, and against something which resembles/works like a Byte literal

5 - If we give each instance of the Byte type a valueOf() function, we can meaningfully compare [[Get]] results for size (greater than, less than)

6 - If we decided that valueOf() returns one-character strings, with said strings matching the value of the Byte, we could compare against one-character strings

7 - If we chose Number for valueOf() instead, we could compare against Numbers instead
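A sketch of the alternative in observation 7, with the 256 instances built up front rather than lazily; `Byte` and `ByteValue` are hypothetical names here. Because `valueOf()` returns a Number, relational operators and `==` against Numbers work, while `==` against `"\n"` does not, since that string converts to 0.

```javascript
var Byte = (function () {
  // Hypothetical atom type: exactly one frozen instance per byte value.
  function ByteValue(code) { this.code = code; }
  ByteValue.prototype.valueOf = function () { return this.code; };  // a Number
  ByteValue.prototype.toString = function () { return "Byte(" + this.code + ")"; };

  var table = [];
  for (var i = 0; i < 256; i++) table.push(Object.freeze(new ByteValue(i)));

  // Selector: a one-character string or a Number in 0-255.
  return function Byte(which) {
    if (typeof which === "string") {
      if (which.length !== 1) throw new RangeError("Byte selector must be one character");
      which = which.charCodeAt(0);
    }
    if (typeof which !== "number" || table[which] === undefined) {
      throw new RangeError("Byte selector out of range: " + which);
    }
    return table[which];
  };
})();

console.log(Byte(10) === Byte("\n"));  // true: same instance
console.log(Byte(10) == 10);           // true: valueOf() gives 10
console.log(Byte(10) > Byte(9));       // true
console.log(Byte(10) == "\n");         // false: "\n" converts to 0
```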

Then I thought about all these, and said, oh hell, we get almost all those benefits, minus the overhead of marshalling objects, by just using Numbers everywhere.  That would be really advantageous if our ByteArrays were used to represent something like canvas pixels.  Is the sugar worth the price?

Today, I also noticed that functions like Binary/B indexOf() could really benefit from Byte() if they really mean to work on single bytes.

Just now, I realized that significant portions of Binary/B's ByteArray could be implemented in ES3 browsers via Arrays of Byte. Not sure how useful an observation that is.

Wes Garland

Jan 20, 2010, 1:44:59 PM
to comm...@googlegroups.com
Sample Byte() implementation, with lazy creation and one-char-String valueOf:

#! /usr/bin/commonjs

/* Byte is a plain function, not a constructor. The real constructor and
 * the 256-slot cache live in this closure, so every call shares them and
 * instanceof matches previously cached bytes. */
var Byte = (function () {
  var bytes = new Array(256);   /* lazily filled: one instance per byte value */

  function ByteInstance(which)
  {
    this.which = which;
  }

  ByteInstance.prototype.valueOf = function Byte_valueOf()
  {
    return String.fromCharCode(this.which);
  };

  ByteInstance.prototype.toSource = function Byte_toSource()
  {
    return "Byte(" + this.which + ")";
  };

  return function Byte(which)
  {
    if (this instanceof Byte)
      throw new TypeError("This is not a constructor");

    if (which instanceof ByteInstance)
      return which;

    if (which !== null && typeof which == "object")
      which = which.valueOf();    /* unwrap String and Number objects */

    if (typeof which == "string")
    {
      if (which.length != 1)
        throw new RangeError("invalid length String for Byte selector");
      which = which.charCodeAt(0);
    }

    /* accept only integers 0-255 */
    if (typeof which != "number" || which !== (which & 0xff))
      throw new TypeError("invalid type or range for Byte selector");

    if (typeof bytes[which] == "undefined")
      bytes[which] = new ByteInstance(which);

    return bytes[which];
  };
})();

print(Byte(10) ==  Byte(10));                   /* true */
print(Byte(10) === Byte(10));                   /* true */
print(Byte(10) >   Byte(9));                    /* true */
print(Byte(10) <   Byte(9));                    /* false */
print(Byte(10) ==  "\n");                       /* true */
print(Byte(10) === "\n");                       /* false */
print(Byte(10) ==  10);                         /* false */
print(Byte(10) === 10);                         /* false */
print(Byte(10) ===  Byte(Byte(10)));            /* true */
print(Byte(10) ===  Byte("\n"));                /* true */

Dean Landolt

Jan 20, 2010, 2:13:56 PM
to comm...@googlegroups.com

Very descriptive, Wes.

Sorry, Kris, I should have given more context. The Byte function is certainly descriptive, but as inimino pointed out on IRC, it could be just as useful as a helper type should [[Get]] return a number. This kind of convenience would be lost if [[Get]] returns a ByteString, though.

Wes Garland

Jan 20, 2010, 2:21:49 PM
to comm...@googlegroups.com
One other thing which wasn't mentioned is that if ByteString's [[Get]] returns a Byte (or a Number) instead of a ByteString, it becomes significantly easier to perform a deep traversal of an object hierarchy containing a ByteString, without having to special-case ByteString.

If x[y] produces the same type as x, typical deep traversal algorithms will invariably fail with an endless loop, yielding OOM.
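A sketch of that failure mode, using a hypothetical `SelfSimilarBytes` type whose enumerable index getters return new objects of the same type; real implementations may not expose enumerable indexes, so treat this as an illustration only. `countLeaves` is a typical naive deep walk, and its `maxDepth` parameter exists only so the demo stops with an error instead of recursing forever.

```javascript
// Hypothetical type: each index access returns a new one-byte object of
// the same type, as a ByteString [[Get]] returning ByteString would.
function SelfSimilarBytes(bytes) {
  var self = this;
  this.length = bytes.length;
  bytes.forEach(function (b, i) {
    Object.defineProperty(self, i, {
      enumerable: true,
      get: function () { return new SelfSimilarBytes([b]); }
    });
  });
}

// Naive deep walk: descend into every object, count primitive leaves.
function countLeaves(value, depth, maxDepth) {
  if (depth > maxDepth) throw new Error("no termination after " + maxDepth + " levels");
  if (value === null || typeof value !== "object") return 1;
  var n = 0;
  for (var key in value) {
    if (Object.prototype.hasOwnProperty.call(value, key)) {
      n += countLeaves(value[key], depth + 1, maxDepth);
    }
  }
  return n;
}

// Number model: the walk bottoms out at the byte values.
console.log(countLeaves({ payload: [104, 105, 10] }, 0, 50));  // 3

// Same-type model: payload[0][0][0]... never reaches a primitive.
try {
  countLeaves({ payload: new SelfSimilarBytes([104, 105]) }, 0, 50);
} catch (e) {
  console.log(e.message);  // no termination after 50 levels
}
```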

Kris Kowal

Jan 20, 2010, 3:55:05 PM
to comm...@googlegroups.com
On Wed, Jan 20, 2010 at 11:21 AM, Wes Garland <w...@page.ca> wrote:
> x[y] produces the same type as x, typical deep traversal algorithms will
> invariably fail with an endless loop yielding OOM.

I'm favoring typeof x[y] == "number" at this point. What are the
advantages of Byte over Number? That the Byte could host the
ByteString API?

I think we're looking at having ByteString and ByteArray both be
Objects that behave as Arrays of Numbers. The difference between the
two could be as little as having one be immutable and the other
mutable, where the immutable one would host a stateless subset of the
mutable API. Binary/E gets close to that.
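The "Objects that behave as Arrays of Numbers" reading can be sketched without engine changes. This is a hypothetical illustration assuming modern JavaScript: indexed properties hold Numbers 0..255, a shared base class hosts the stateless methods, `ByteString` freezes itself, and `ByteArray` adds a validated `set`. The method names are illustrative, not the Binary/E API.

```javascript
// Hypothetical sketch: both types are plain objects whose indexed
// properties are Numbers in 0..255. Nothing here needs engine changes.
function checkByte(b) {
  if (!Number.isInteger(b) || b < 0 || b > 255) {
    throw new RangeError("not a byte: " + b);
  }
  return b;
}

// Shared, stateless methods that both types host.
class BinaryBase {
  constructor(bytes) {
    const copy = Array.from(bytes, checkByte);
    copy.forEach((b, i) => { this[i] = b; });
    this.length = copy.length;
  }
  indexOf(byte, from = 0) {
    for (let i = from; i < this.length; i++) {
      if (this[i] === byte) return i;
    }
    return -1;
  }
  toArray() {
    return Array.from({ length: this.length }, (_, i) => this[i]);
  }
}

// Immutable: frozen after construction, so indexed writes have no effect
// (and throw in strict-mode code).
class ByteString extends BinaryBase {
  constructor(bytes) {
    super(bytes);
    Object.freeze(this);
  }
  slice(start, end) { return new ByteString(this.toArray().slice(start, end)); }
  toByteArray() { return new ByteArray(this.toArray()); }
}

// Mutable: same read API plus a range-checked setter. Plain indexed
// writes (a[i] = x) are allowed but are not validated in this sketch.
class ByteArray extends BinaryBase {
  set(i, b) {
    if (!Number.isInteger(i) || i < 0 || i >= this.length) {
      throw new RangeError("index out of range: " + i);
    }
    this[i] = checkByte(b);
  }
  slice(start, end) { return new ByteArray(this.toArray().slice(start, end)); }
  toByteString() { return new ByteString(this.toArray()); }
}
```

Because these are ES2015 classes, calling them without `new` throws a TypeError, whereas Binary/E as announced allows both calling and construction.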

If we make ByteString a non-special object, it probably makes the most
sense to go with the Array of Number interpretation. If we make
ByteString special, in the same sense that String is special, I think
the previous proposals make more sense. You don't run into the deep
traversal problem with String exactly because it is special, right
down to all of its operators, internal methods, and being a primitive
type. I imagine that this is a pattern: the more you try to be like
String, the more you have to be exactly like String. So far, I've
proposed two approaches:

1. Creating future-compatible types that resemble and would in future
revisions be exactly like String and Array. Binary/B and Binary/D.
2. Creating types that can be implemented without revising the
underlying JavaScript engine to support ==, typeof,
Object.prototype.toString, or enumerability, but allow patterns that
would not be compatible with a future revision that does do those
things. Binary/E.

There's a third approach, which ECMA would probably pursue but we
obviously can not because of our short term goals.

3. Create types that are exactly like String and Array, today.

If we are to pursue the second approach, I think we have to throw all
notions of emulating String behaviors like a[x] === a[x], a.slice(x,
x+1) === a[x], typeof a === "bytestring", typeof a[x] == typeof a,
Object.prototype.toString.call(a) === "[object ByteString]". We don't
have to throw away length assignment or [[Get]] since it seems
everyone agrees that those can be implemented by embeddings without
revising the core engine. What do you think? I'm starting to favor a
clean break.

Kris Kowal

Ryan Dahl

Jan 20, 2010, 4:30:08 PM
to comm...@googlegroups.com

I'm trying to speed up Node by providing better buffering - the
current implementation copies data around very often. This is my third
iteration of doing binary, and I'm pretty happy with how it is
turning out. It looks like this:

var s = "hello world";
var b = new Buffer(1024);
b.asciiWrite(s, 0, s.length);
b.utf8Write(s, s.length, s.length);
b.asciiSlice(0, 5) // => "hello"
b.utf8Slice(0, 5) // => "hello"
b.slice(0, 5) //=> Buffer object

You can see the source code here
http://github.com/ry/node/blob/df59f067345e3a45be7637122ba8566009aab830/src/node_buffer.cc
and a usage example here
http://github.com/ry/node/blob/df59f067345e3a45be7637122ba8566009aab830/lib/net.js

Daniel Friesen

Jan 20, 2010, 4:43:05 PM
to comm...@googlegroups.com
Have you taken a look at IO/B/Buffer? That part of my proposal is
actually fairly close to what you list there.
Except it's BlobBuffer, since I have both Text and Binary buffers. It's
modeled after Java's StringBuffer class.
http://wiki.commonjs.org/wiki/IO/B/Buffer

.slice is a little different though... To create a new buffer with the
contents of a section of another buffer you would do `new
Buffer(buf.range(0, 5));`.
Also, rather than asciiWrite to a BlobBuffer you would
bbuf.append(str.toBlob("US-ASCII")); you could prototype on helpers if
you want.
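A StringBuffer-style growable byte buffer is easy to sketch in script. The class and its `append`/`range` methods are hypothetical (they are not the IO/B/Buffer API); a `Uint8Array` provides the storage, and capacity doubles whenever an append would overflow it.

```javascript
// Hypothetical growable, append-only byte buffer in the spirit of
// Java's StringBuffer. Storage grows geometrically to amortize copies.
class GrowableBytes {
  constructor(initialCapacity = 16) {
    this._store = new Uint8Array(Math.max(1, initialCapacity));
    this.length = 0;
  }
  // Append a Uint8Array or an array of byte values.
  append(bytes) {
    const needed = this.length + bytes.length;
    if (needed > this._store.length) {
      let cap = this._store.length;
      while (cap < needed) cap *= 2;
      const bigger = new Uint8Array(cap);
      bigger.set(this._store.subarray(0, this.length));
      this._store = bigger;
    }
    this._store.set(bytes, this.length);
    this.length = needed;
    return this; // allow chaining, StringBuffer-style
  }
  // Copy of bytes [start, end), like taking buf.range(0, 5) into a new buffer.
  range(start, end = this.length) {
    return this._store.slice(start, Math.min(end, this.length));
  }
}
```

`range()` returns a copy, so the result is independent of later appends. `Uint8Array.prototype.set` silently wraps values outside 0..255, so a real API would validate input. For pure-ASCII strings, `new TextEncoder().encode(str)` yields the same bytes a US-ASCII encoding would, since TextEncoder always produces UTF-8.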

Kris Kowal

Jan 20, 2010, 5:57:19 PM
to comm...@googlegroups.com
On Wed, Jan 20, 2010 at 1:30 PM, Ryan Dahl <coldre...@gmail.com> wrote:
>  b.asciiWrite(s, 0, s.length);
>  b.utf8Write(s, s.length, s.length);
>  b.asciiSlice(0, 5) // => "hello"
>  b.utf8Slice(0, 5) // => "hello"
>  b.slice(0, 5) //=> Buffer object

Do I presume correctly that these encoding-specific functions are not
provided merely for convenience, but to provide an interface that can
be optimized at a lower layer involving less buffer copying? Do these
methods throw or implicitly reallocate if they are not long enough for
the encoded version of the source? In what state is the target buffer
(b) left if the operation fails? Could this be generalized to be
agnostic to the target encoding, permitting encodings to be optimized
at the discretion of the implementation?

Kris Kowal

Ryan Dahl

Jan 20, 2010, 6:08:16 PM
to comm...@googlegroups.com
On Wed, Jan 20, 2010 at 2:57 PM, Kris Kowal <cowber...@gmail.com> wrote:
> On Wed, Jan 20, 2010 at 1:30 PM, Ryan Dahl <coldre...@gmail.com> wrote:
>>  b.asciiWrite(s, 0, s.length);
>>  b.utf8Write(s, s.length, s.length);
>>  b.asciiSlice(0, 5) // => "hello"
>>  b.utf8Slice(0, 5) // => "hello"
>>  b.slice(0, 5) //=> Buffer object
>
> Do I presume correctly that these encoding-specific functions are not
> provided merely for convenience, but to provide an interface that can
> be optimized at a lower layer involving less buffer copying?

Correct

> Do these
> methods throw or implicitly reallocate if they are not long enough for
> the encoded version of the source?

They fail. There is no reallocation.

> In what state is the target buffer
> (b) left if the operation fails?

Lengths are checked before the actual write.

> Could this be generalized to be
> agnostic to the target encoding, permitting encodings to be optimized
> at the discretion of the implementation?

I guess. I wouldn't mind wrapping this with some JS to provide some
more thoughtful API. But I think at the binding level, I'd like it to be
this (or at least similar to this). I'm still hacking around with it.
I'll probably be merging this into the master branch in the next week
or so.
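The failure semantics described above (no reallocation, lengths checked before the write) can be sketched in plain JavaScript. These are hypothetical free functions over a `Uint8Array`, not Node's binding; rejecting non-ASCII characters in `asciiWrite` is an assumption of this sketch.

```javascript
// Hypothetical helpers over a Uint8Array `target`; not Node's actual binding.
// Each one validates everything first and only then copies, so a failed
// call never leaves a partial write and never reallocates.
function utf8Write(target, str, offset) {
  const encoded = new TextEncoder().encode(str); // UTF-8 bytes of str
  if (offset < 0 || offset + encoded.length > target.length) {
    throw new RangeError("utf8Write: " + encoded.length + " bytes do not fit at offset " + offset);
  }
  target.set(encoded, offset);
  return encoded.length; // number of bytes written
}

function asciiWrite(target, str, offset) {
  if (offset < 0 || offset + str.length > target.length) {
    throw new RangeError("asciiWrite: " + str.length + " bytes do not fit at offset " + offset);
  }
  for (let i = 0; i < str.length; i++) {
    // Assumption of this sketch: non-ASCII input is an error, not truncated.
    if (str.charCodeAt(i) > 0x7f) throw new RangeError("asciiWrite: non-ASCII character at index " + i);
  }
  for (let i = 0; i < str.length; i++) target[offset + i] = str.charCodeAt(i);
  return str.length;
}

function utf8Slice(target, start, end) {
  return new TextDecoder("utf-8").decode(target.subarray(start, end));
}
```

Because every check happens before the first byte is copied, a failed call leaves the target exactly as it was.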

Wes Garland

Jan 21, 2010, 1:44:32 PM
to comm...@googlegroups.com
Hi, Kris!

On Wed, Jan 20, 2010 at 3:55 PM, Kris Kowal <cowber...@gmail.com> wrote:
> On Wed, Jan 20, 2010 at 11:21 AM, Wes Garland <w...@page.ca> wrote:
>> x[y] produces the same type as x, typical deep traversal algorithms will
>> invariably fail with an endless loop yielding OOM.
>
> I'm favoring typeof x[y] == "number" at this point.

I think I am, too.

Going through the Byte() exercise was useful, in that it showed that for a limited-size set, we could get intrinsic-like behaviour out of script.  But I am not sure it is worth the performance penalty, or that the sugar is in fact not vinegar.
 
> What are the
> advantages of Byte over Number?  That the Byte could host the
> ByteString API?

In my mind, the biggest immediate advantage of Byte over Number is that the intrinsic representation of Byte could be comparable with JS ==, etc. operators against 1-character strings. This is convenient in some cases:


if (ba[ba.length - 1] == "\n")
  ba.length--;

vs.


if (ba[ba.length - 1] == 10)
  ba.length--;

One says "we are looking for newlines", the other says "We are looking for whatever ASCII value 10 is; hope you know what that is".  OTOH, a programmer looking for clarity could still implement Byte like this and have it work:

function Byte(a)
{
  if (a.length != 1)
    throw "stop that";
  return a.charCodeAt(0);  // the only character is at index 0
}
 
There's a subtler advantage to Byte, though: if we poison Byte so that it is NOT directly comparable to Number or String (but is comparable to other Bytes), we can future-proof code written today. Then we only have to hope that ES-next includes a facility making ByteString[0] and ByteArray[0] comparable, adjusting the Byte() wrapper accordingly.
 
> I think we're looking at having ByteString and ByteArray both be
> Objects that behave as Arrays of Numbers.  The difference between the
> two could be as little as having one be immutable and the other
> mutable, where the immutable one would host a stateless subset of the
> mutable API.  Binary/E gets close to that.

That's pretty much how I've been using Binary/B in practice.
 
> If we make ByteString a non-special object, it probably makes the most
> sense to go with the Array of Number interpretation.  If we make
> ByteString special, in the same sense that String is special, I think
> the previous proposals make more sense.

My feeling is that ByteString must be immutable, and that it ought to have a toString() method like the one in Binary/E, along with [[Get]], [[Set]], slice, indexOf(X), and lastIndexOf(X). X should probably be a variant (which Binary/B leaves unspecified).

Outside of those requirements, ByteString() just isn't very "stringy", because all the operations I want to do on text would push me into String territory anyhow. Well, regexp matching against binary data might be useful, but I'm not sure we want to go there.

> You don't run into the deep
> traversal problem with String exactly because it is special, right
> down to all of its operators, internal methods, and being a primitive
> type.  I imagine that this is a pattern: the more you try to be like
> String, the more you have to be exactly like String.

Yes, that is the catch-22.  I think the closest we can get to String today for ByteString is
 - no ByteString literal representation
 - [[Get]] as an intrinsic (possibly Byte(), if you want to differentiate them from Number)
 - all ByteStrings are "boxed", like new String("hey").
 - ByteString without new throws "not yet implemented"

> So far, I've
> proposed two approaches:
>
> 1. Creating future-compatible types that resemble and would in future
> revisions be exactly like String and Array. Binary/B and Binary/D.
> 2. Creating types that can be implemented without revising the
> underlying JavaScript engine to support ==, typeof,
> Object.prototype.toString, or enumerability, but allow patterns that
> would not be compatible with a future revision that does do those
> things. Binary/E.
>
> There's a third approach, which ECMA would probably pursue but we
> obviously can not because of our short term goals.
>
> 3. Create types that are exactly like String and Array, today.
>
> If we are to pursue the second approach, I think we have to throw all
> notions of emulating String behaviors like a[x] === a[x], a.slice(x,
> x+1) === a[x], typeof a === "bytestring", typeof a[x] == typeof a,
> Object.prototype.toString.call(a) === "[object ByteString]".

a[x] === a[x] is doable today, and future-proof, provided that we don't define typeof a[x]. Furthermore, we COULD use the Byte() function to make it useful, declaring *only* that Byte("x") has the same type as a[x].

 
if (ba[ba.length - 1] == Byte("\n"))
  ba.length--;

and

if (ba[ba.length - 1] == Byte(10))
  ba.length--;

both represent reasonably elegant, future-proof code.   We would have ==, ===, >, <, etc.
 
> We don't
> have to throw away length assignment or [[Get]] since it seems
> everyone agrees that those can be implemented by embeddings without
> revising the core engine.  What do you think?  I'm starting to favor a
> clean break.

I'm kind of torn, frankly. I like Byte() for future-proofing reasons, but my gut tells me that anybody using ByteArray/ByteString for non-trivial purposes might not like the performance impact. I also like the idea of coming up with a ByteString/ByteArray that resembles what ES-next might include, but I am focused more strongly on implementing what is useful today. Something to keep in mind, though: the fewer methods we attach to ByteString, the more likely it is that we implement a subset of ES-next.

Wes

Dean Landolt

Jan 21, 2010, 2:15:48 PM
to comm...@googlegroups.com

Wouldn't those worried about performance be able to drop down into charCodeAt? Yes, it adds a function call, but is it that much more expensive than directly [[Get]]'ing a number?

In any event, great writeup Wes!

Wes Garland

Jan 21, 2010, 2:59:25 PM
to comm...@googlegroups.com
On Thu, Jan 21, 2010 at 2:15 PM, Dean Landolt <de...@deanlandolt.com> wrote:
> Wouldn't those worried about performance be able to drop down into charCodeAt? Yes, it adds a function call, but is it that much more expensive than directly [[Get]]'ing a number?

You have to consider both the case of reading *and* writing... Hmm, let's see. Let's pretend we're editing a 24-bit-depth bitmap picture and making it a bit redder.

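function redden(buffer, factor)
{
  // Scale the first byte of each 3-byte (24-bit) pixel. Reads use
  // charCodeAt(); the write uses indexed assignment, since the result
  // of a call such as charAt(i) cannot be assigned to.
  for (var i=0; i < buffer.length; i+=3)
  {
    buffer[i] = Math.min(255, buffer.charCodeAt(i) * factor); // clamp to a byte
  }
}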

Okay, Dean, I buy your argument completely.  Performance impact is not worth considering at this point, and I will not consider it again in this context without measurement.

I am now in the Byte() camp.

Kris Kowal

Jan 21, 2010, 3:19:59 PM
to comm...@googlegroups.com
On Thu, Jan 21, 2010 at 11:59 AM, Wes Garland <w...@page.ca> wrote:
> Okay, Dean, I buy your argument completely.  Performance impact is not worth
> considering at this point, and I will not consider it again in this context
> without measurement.
>
> I am now in the Byte() camp.

I'm still reticent about Byte. With Number, byte to byte comparison
among Numbers works. I think that it would be a mistake to compare
bytes to Strings since the String exists in the character code space.
I think that I would prefer to pay the penalty in this manner:

new ByteString("\n", "us-ascii")[0] === "\n".charCodeAt()

And I buy the idea that "newless" construction should throw an error
since that gives us future-compatibility with unboxed versions.
Presumably unboxed versions would also come with byte-literals, which
is the real way to solve the problem with literal comparison.

Of course, I am not adamant. I would just like an approach that is
consistent from top to bottom.

Also, I'm going to remove some methods from Binary/E, and might make
some provisions to converge with Ryan Dahl's embedding ideas.

Kris Kowal

Wes Garland

Jan 21, 2010, 3:56:36 PM
to comm...@googlegroups.com
On Thu, Jan 21, 2010 at 3:19 PM, Kris Kowal <cowber...@gmail.com> wrote:
> I'm still reticent about Byte.  With Number, byte to byte comparison
> among Numbers works.  I think that it would be a mistake to compare
> bytes to Strings since the String exists in the character code space.

Let me clarify my position: I think it makes sense to allow comparing Byte only with Byte. I'm actually tempted to poison the specification so that neither direct String nor direct Number comparisons work properly. This would keep everybody honest, which in turn helps future-proof the code.
 
> I think that I would prefer to pay the penalty in this manner:
>
> new ByteString("\n", "us-ascii")[0] === "\n".charCodeAt()

How about

new ByteString("\n", "us-ascii")[0] === Byte("\n") ?

While these seem semantically similar, the primary difference is that the example containing Byte makes no assumptions about typeof ByteString's [[Get]]; we only specify that it is the same as typeof Byte(10). This in turn protects us whether or not ES-next specifies a byte literal. Almost any reasonable action by ES-next could see Byte() rewritten so that current code keeps working. When ES-next comes out, we loosen the specification so that, instead of poisoning Byte.valueOf, we specify that Byte values are the same as byte literals, and bless direct byte-literal comparisons in future code.
 
> Also, I'm going to remove some methods from Binary/E, and might make
> some provisions to converge with Ryan Dahl's embedding ideas.

FWIW, I've been in a similar head space as Ryan, but I believe I will get more overall mileage by grafting those types of operations onto types other than ByteString or ByteArray, i.e. stub objects taken by something like our encodings module.

The key to good-performing BLOBs, IMO, is the ability of the underlying implementation to exchange BLOB pointers between native types, steal memory from immutables, copy-on-write, and so on. Any intermediate solution where some BLOB types cannot exchange memory with other BLOB types feels a little patchy to me.

FWIW, what I found that I need to keep track of to implement my stuff is
 - how to get information about arbitrary binary types passed from JS
 - who "owns" the memory
 - weak reference to true owner
 - proxy method to drill through to memory owner
 - how do I finalize the memory
 - proxy type to assist immutable to COW mutable transition
 - can a particular type actually COW  (types with mmap backing might not be able to)
 - can any individual method be apply()ied to an arbitrary buffer  (most immutable types' methods are totally apply()able)
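Two items from the list above, memory ownership and the immutable-to-COW-mutable transition, can be illustrated with a toy model. This is a hypothetical sketch: a `Uint8Array` stands in for native memory, "ownership" is just a shared reference (no weak references, finalizers, or mmap-backed types), and the class names are made up.

```javascript
// Immutable wrapper: owns its memory and never writes through it.
class FrozenBytes {
  constructor(u8) { this._mem = u8; }
  get length() { return this._mem.length; }
  at(i) { return this._mem[i]; }
  // Lend the same memory to a mutable wrapper without copying.
  toMutable() { return new CowBytes(this._mem, true); }
}

// Mutable wrapper: borrows memory until its first write, then copies.
class CowBytes {
  constructor(u8, shared) {
    this._mem = u8;
    this._shared = shared; // true while memory is borrowed from a FrozenBytes
  }
  get length() { return this._mem.length; }
  at(i) { return this._mem[i]; }
  put(i, b) {
    if (this._shared) {
      this._mem = this._mem.slice(); // first write: take a private copy
      this._shared = false;
    }
    this._mem[i] = b;
  }
  sharesMemoryWith(other) { return this._mem === other._mem; }
}
```

Reads are free until the first `put`, which is the point of the transition: converting an immutable value to a mutable one costs nothing unless the mutable side is actually modified.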

Ryan's methods are sure-as-heck useful for what he's doing, but I wouldn't want to drop them onto a generic binary module just for performance's sake.

{
  var r = require("reader");
  var u = require("utf8_util");
  var str;
  var utf8_str;

  str = r.read_into_bytestring("myfile.txt");

  utf8_str = u.UTF8_String(str);

  u.write(utf8_str.slice(0,5));
}

^^^^ this code can be implemented elegantly without copies and without modifying the base binary module types.

I guess what I'm trying to say is: don't worry too much about those cases. Not covering them in Binary/X imposes no significant performance penalty if the underlying embedding is designed right.

Wes

Kris Kowal

Jan 21, 2010, 5:19:09 PM
to comm...@googlegroups.com
On Thu, Jan 21, 2010 at 12:56 PM, Wes Garland <w...@page.ca> wrote:
> How about
>
> new ByteString("\n", "us-ascii")[0] === Byte("\n") ?

I could almost buy this, but it only postpones the issue that we're
promoting "\n" to a Byte using "us-ascii" as the charset implicitly.
We'd have to specify that "\n" has to throw an error if the charCode
is out of the ASCII range. I think I still prefer the charCodeAt()
spelling since it defers these concerns to the user and doesn't
complicate the spec.

It seems more likely to me that ES would either go for [[Get]]
returning Number or a unary ByteString on [[Get]], in which case Byte
would become legacy cruft for converting unary Strings in the ASCII
range to unary bytestring literals. There certainly wouldn't be a
Byte type. A Byte type would be a range-restricted Number. I bet
that anything in the Byte, Word, Quad, Integer, Whole, Natural, Real,
Complex realm is a non-goal for ECMA.

I'm still on the fence. I'm not fully convinced by the ideas for
future-compatibility because the adapters that make sense today would
be cruft tomorrow. I think that if we went with Number, ECMA would
follow. If we went with ByteString, we have a difficult time making
it work today in a way that continues to make sense in the future. If
we go with Byte, I think we have an API that doesn't make complete
sense today or tomorrow.

With regard to the memory management issues you mention, I think that
most of the proposals, especially the later ones, are all adequate
from that perspective. Thanks for clarifying.

Kris Kowal

Wes Garland

Jan 21, 2010, 7:06:17 PM
to comm...@googlegroups.com
On Thu, Jan 21, 2010 at 5:19 PM, Kris Kowal <cowber...@gmail.com> wrote:
> On Thu, Jan 21, 2010 at 12:56 PM, Wes Garland <w...@page.ca> wrote:
>> How about
>>
>> new ByteString("\n", "us-ascii")[0] === Byte("\n") ?
>
> I could almost buy this, but it only postpones the issue that we're
> promoting "\n" to a Byte using "us-ascii" as the charset implicitly.
> We'd have to specify that "\n" has to throw an error if the charCode
> is out of the ASCII range.

More precisely, I think the language would be something like

The argument to Byte may be an integer Number or a String. An exception is thrown for any invalid argument. Valid Numbers are in the range 0..255. Valid Strings are only those Strings which can be instantiated by String.fromCharCode(0..255).

The following warning might be a good idea

If a programmer chooses to use a String literal to describe a Byte, it is recommended that only Strings composed of a character in the range 0..127 are chosen; otherwise, the character set encoding of the source code may cause non-portable surprises.

(note that that last paragraph would go away if we could all decide to use UTF-8 encoded source code)
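The rules quoted above translate fairly directly into code. This is a hypothetical sketch, not a proposed implementation: Byte values are interned (one object per value, as in the Byte() example earlier in the thread) so `===` compares values, and `valueOf()` returns the Number so `<` and `>` work. As a consequence it does not implement the poisoned comparisons discussed here: `Byte(10) == 10` is true.

```javascript
// One frozen object per byte value (hypothetical representation).
class ByteValue {
  constructor(code) { this.code = code; Object.freeze(this); }
  valueOf() { return this.code; }          // enables <, >, <=, >=
  toString() { return "Byte(" + this.code + ")"; }
}

const interned = [];

function Byte(arg) {
  let code;
  if (typeof arg === "number") {
    // Valid Numbers: integers in 0..255.
    if (!Number.isInteger(arg) || arg < 0 || arg > 255)
      throw new RangeError("Byte: Number must be an integer in 0..255");
    code = arg;
  } else if (typeof arg === "string") {
    // Valid Strings: exactly what String.fromCharCode(0..255) can produce.
    if (arg.length !== 1 || arg.charCodeAt(0) > 255)
      throw new RangeError("Byte: String must be one character in U+0000..U+00FF");
    code = arg.charCodeAt(0);
  } else {
    throw new TypeError("Byte: argument must be a Number or a String");
  }
  if (!interned[code]) interned[code] = new ByteValue(code);
  return interned[code]; // same object for the same value, so === works
}
```

Interning keeps identity comparison meaningful without engine support; the trade-off is that whether `Byte(10) == 10` should be true is decided by `valueOf`, which is exactly the knob the poisoning idea would turn.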

> I think I still prefer the charCodeAt()
> spelling since it defers these concerns to the user and doesn't
> complicate the spec.

That is definitely an advantage. I'm not super-sold on Byte, but I think it *does* make the code more readable in the usual cases (without removing the option to go by number).
 
> It seems more likely to me that ES would either go for [[Get]]
> returning Number or a unary ByteString on [[Get]], in which case Byte
> would become legacy cruft for converting unary Strings in the ASCII
> range to unary bytestring literals.  There certainly wouldn't be a
> Byte type.

I think you're right.   I think ES would pick either Number or ByteString. Question is -- which? :)    With Byte, we can have our cake, eat it, and throw out the candles later.

> I'm still on the fence.  I'm not fully convinced by the ideas for
> future-compatibility because the adapters that make sense today would
> be cruft tomorrow.  I think that if we went with Number, ECMA would
> follow.  If we went with ByteString, we have a difficult time making
> it work today in a way that continues to make sense in the future.  If
> we go with Byte, I think we have an API that doesn't make complete
> sense today or tomorrow.

I actually think Byte makes sense -- try writing a bit of sample code -- but I agree that it is likely to become cruft in the future.  

This is a really tough decision, in some ways. In other ways, not so much -- leaving it out and declaring "Number!" lets anybody who cares to write future-proof code still do that.  What might not be so wise would be picking String or ByteString as the canonical representation for ByteString's [[Get]] -- too many problems.

Wes

Ryan Dahl

Jan 22, 2010, 8:08:27 PM
to comm...@googlegroups.com
Has anyone thought about how to do ntohs, ntohl, htons, htonl?

Ash Berlin

Jan 22, 2010, 8:21:19 PM
to comm...@googlegroups.com

On 23 Jan 2010, at 01:08, Ryan Dahl wrote:

> Has anyone thought about how to do ntohs, ntohl, htons, htonl?

This is almost a side issue for the current binary APIs, as they have no pack or unpack functions on them. I'd say that ntohs etc. belong with pack("int4", 42) and the like.

Quite where those belong I don't know.

-ash

Ryan Dahl

Jan 22, 2010, 8:28:24 PM
to comm...@googlegroups.com
On Fri, Jan 22, 2010 at 5:08 PM, Ryan Dahl <coldre...@gmail.com> wrote:
> Has anyone thought about how to do ntohs, ntohl, htons, htonl?

The most obvious thing to me would be

buffer.getNetShort(index) // get 16-bit int, convert to host
buffer.getHostShort(index) // get 16-bit int, use host byte-order

buffer.getNetLong(index) // get 32-bit int, convert to host
buffer.getHostLong(index) // get 32-bit int, use host byte-order

buffer.putNetShort(index, 42) // takes up index and index+1
buffer.putHostShort(index, 42)

buffer.putNetLong(index, 42) // takes up index, index+1, index+2, index+3
buffer.putHostLong(index, 42)

throwing errors if anything was out of bounds.
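These accessors map closely onto `DataView`, which reads and writes 16- and 32-bit integers at any byte offset with an explicit endianness flag and throws a RangeError when the access runs past the end. The sketch below is hypothetical: it uses free functions over a `Uint8Array` instead of buffer methods, and assumes unsigned values with Short = 16 bits and Long = 32 bits.

```javascript
// Detect host byte order once: on a little-endian host the low byte of
// the 16-bit value 1 is stored first.
const HOST_IS_LITTLE_ENDIAN = new Uint8Array(new Uint16Array([1]).buffer)[0] === 1;

// View over exactly the bytes of `bytes`, respecting subarray offsets.
function viewOf(bytes) {
  return new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
}

// Network order is big-endian, i.e. littleEndian === false.
const getNetShort  = (b, i) => viewOf(b).getUint16(i, false);
const getHostShort = (b, i) => viewOf(b).getUint16(i, HOST_IS_LITTLE_ENDIAN);
const getNetLong   = (b, i) => viewOf(b).getUint32(i, false);
const getHostLong  = (b, i) => viewOf(b).getUint32(i, HOST_IS_LITTLE_ENDIAN);

// Writers occupy index..index+1 (Short) or index..index+3 (Long).
const putNetShort  = (b, i, v) => viewOf(b).setUint16(i, v, false);
const putHostShort = (b, i, v) => viewOf(b).setUint16(i, v, HOST_IS_LITTLE_ENDIAN);
const putNetLong   = (b, i, v) => viewOf(b).setUint32(i, v, false);
const putHostLong  = (b, i, v) => viewOf(b).setUint32(i, v, HOST_IS_LITTLE_ENDIAN);
```

DataView handles unaligned offsets, which matters for protocol parsing. Values that do not fit the field are wrapped modulo 2^16 or 2^32 rather than rejected.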

Kris Kowal

Jan 22, 2010, 9:03:14 PM
to comm...@googlegroups.com

These are very much in the spirit of some of the older prior art
(links from the wiki):

http://help.adobe.com/en_US/AIR/1.1/jslr/flash/utils/ByteArray.html
https://developer.mozilla.org/En/NsIBinaryInputStream
https://developer.mozilla.org/En/NsIBinaryOutputStream
http://www.ejscript.org/products/ejs/doc/api/ejscript/intrinsic-ByteArray.html

It remains my opinion that these are all implementable in
pure-JavaScript and could be done more carefully and with a DSL like
the pack notation familiar to Pythonistas, Rubyists, PHPnauts, and
Perlers.

http://docs.python.org/library/struct.html
http://perldoc.perl.org/functions/pack.html
http://www.codeweblog.com/ruby-string-pack-unpack-detailed-usage/

My reasoning is that an exhaustive API would need [network, machine,
big, little] x ([8, 16, 32, 64, ... bit int] x [signed, unsigned] +
[float, double float]) x [single, array, struct] methods, all of them
taking positional arguments. That could get hairy.

I would like to get low level binary (focusing on storage and
retrieval of bytes) out of the way as quickly as possible so we can
build these kinds of thing on top of it. Even if we don't make a
standard for each of our varying interface ideas, we could share
pure-js code for everyone's favorite approach.
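To illustrate how a pack-style DSL collapses that matrix into a single pair of functions, here is a deliberately tiny pure-JavaScript sketch loosely modeled on Python's struct module. It covers only the b/B, h/H and i/I codes (8, 16 and 32 bits, signed/unsigned) with `<`, `>` or `!` byte-order prefixes; unlike Python, a format without a prefix is treated as big-endian here, and there is no alignment padding. All names are hypothetical.

```javascript
// Format code -> [size in bytes, DataView accessor suffix].
const CODES = {
  b: [1, "Int8"],  B: [1, "Uint8"],
  h: [2, "Int16"], H: [2, "Uint16"],
  i: [4, "Int32"], I: [4, "Uint32"],
};

function parse(fmt) {
  let littleEndian = false; // sketch default: big-endian (network order)
  let start = 0;
  if (fmt[0] === "<" || fmt[0] === ">" || fmt[0] === "!") {
    littleEndian = fmt[0] === "<";
    start = 1;
  }
  const fields = [];
  for (const c of fmt.slice(start)) {
    if (!Object.prototype.hasOwnProperty.call(CODES, c)) {
      throw new Error("unsupported format code: " + c);
    }
    fields.push(CODES[c]);
  }
  return { littleEndian, fields };
}

function calcsize(fmt) {
  return parse(fmt).fields.reduce((n, [size]) => n + size, 0);
}

function pack(fmt, ...values) {
  const { littleEndian, fields } = parse(fmt);
  if (values.length !== fields.length) {
    throw new Error("pack: expected " + fields.length + " values");
  }
  const out = new Uint8Array(calcsize(fmt));
  const view = new DataView(out.buffer);
  let offset = 0;
  fields.forEach(([size, type], k) => {
    view["set" + type](offset, values[k], littleEndian); // 8-bit setters ignore the flag
    offset += size;
  });
  return out;
}

function unpack(fmt, bytes, offset = 0) {
  const { littleEndian, fields } = parse(fmt);
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return fields.map(([size, type]) => {
    const value = view["get" + type](offset, littleEndian); // RangeError past the end
    offset += size;
    return value;
  });
}
```

For example, `unpack("!BHI", frame)` reads a byte, then a 16-bit and a 32-bit big-endian field; adding floats or 64-bit integers means adding table entries rather than new methods.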

Kris Kowal

Wes Garland

Jan 22, 2010, 9:50:13 PM
to comm...@googlegroups.com
On Fri, Jan 22, 2010 at 8:08 PM, Ryan Dahl <coldre...@gmail.com> wrote:
> Has anyone thought about how to do ntohs, ntohl, htons, htonl?


Heh. I've been doing some of those with ffi versions of ntohl (etc) and some by directly manipulating the bytes from script.

It would be interesting, to me at least, if we could come up with a way to do those that allows a short-circuit avoiding a function call. Interesting to me because, unlike most of you, I live on a big-endian machine most of the time. That said, I have no idea how to accomplish that goal; I'm just throwing it out there in case somebody does. In C it's handled by the preprocessor.

Ryan, what's your use-case? Typically, in network code, there are a few places where I want these:
 1 - socket creation (like port numbers and stuff)
 2 - converting ints which are sent raw over-the-wire
 3 - often those ints are in structs
 4 - libresolv functions

1 and 4 shouldn't hit the ByteArray layer; endianness would be handled at the function entry barrier, e.g. the JS versions of gethostent, socket, and so forth.

3 is kind of interesting; I have struct reflection from C<=>JS which automagically does type coercion, but it knows nothing of endianness.  *IT* could benefit from something like

socket.port = require("binary").networkOrderNumber(0xc0ffee);

2 is probably your use-case; given that, wouldn't you want to convert to your ByteArray in-situ to avoid copies?

myByteArray.toNetworkOrder(4,8);

could change a 32-bit int from host to network order which is found on the fourth byte in the ByteArray.

Or, maybe it's more useful to pull a host order out of the array?

var i = myByteArray.fromNetworkOrder32(4);  /* get 2nd 32-bit int out of byte array and store in i in host-order */

I'm guessing you'd definitely want in-situ for stores --

myByteArray.toNetworkOrder32(4,0xc0ffee);
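The in-situ conversion can be sketched as a byte swap inside the existing buffer. This is a hypothetical illustration: `toNetworkOrder32InPlace` and `storeNetworkOrder32` are made-up names, and host byte order is detected at runtime. On a big-endian host the conversion is a no-op; on a little-endian host it reverses four bytes, which is its own inverse, so the same function converts back.

```javascript
const HOST_IS_LITTLE_ENDIAN = new Uint8Array(new Uint16Array([1]).buffer)[0] === 1;

// Rewrite the host-order 32-bit integer at `offset` in network
// (big-endian) order, without allocating a new buffer.
function toNetworkOrder32InPlace(bytes, offset) {
  if (!Number.isInteger(offset) || offset < 0 || offset + 4 > bytes.length) {
    throw new RangeError("toNetworkOrder32InPlace: offset out of range");
  }
  if (HOST_IS_LITTLE_ENDIAN) {
    // Swap b0 <-> b3 and b1 <-> b2.
    let t = bytes[offset];
    bytes[offset] = bytes[offset + 3];
    bytes[offset + 3] = t;
    t = bytes[offset + 1];
    bytes[offset + 1] = bytes[offset + 2];
    bytes[offset + 2] = t;
  }
}

// Store variant: write `value` directly in network order at `offset`.
function storeNetworkOrder32(bytes, offset, value) {
  new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
    .setUint32(offset, value, false); // false = big-endian
}
```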
 
What are your feelings on alignment?  Allowing unaligned access costs cycles (especially on RISC), but makes for more flexible code.

Any way you look at it, I am all for these functions, but suggest they make up a separate addendum to the core binary spec.  That way, we can hash out both separately, and merge as they mature.

The first question the addendum needs to address: is this generic endian code, or is it network-oriented?

Wes

Ryan Dahl

Jan 23, 2010, 11:29:34 PM
to comm...@googlegroups.com
On Fri, Jan 22, 2010 at 6:50 PM, Wes Garland <w...@page.ca> wrote:
> Ryan, what's your use-case?

Parsing binary protocols. (AMQP *groan*)

> 2 is probably your use-case; given that, wouldn't you want to convert to
> your ByteArray in-situ to avoid copies?

No, most often I need to get an int into the VM or out of the VM into a buffer.

> What are your feelings on alignment?  Allowing unaligned access costs cycles
> (especially on RISC), but makes for more flexible code.

For binary protocols, handling unaligned ints is important.

> Any way you look at it, I am all for these functions, but suggest they make
> up a separate addendum to the core binary spec.  That way, we can hash out
> both separately, and merge as they mature.

Yeah. I guess some pack/unpack-like function would be good.

Kris Kowal

Jan 23, 2010, 11:55:38 PM
to comm...@googlegroups.com
On Sat, Jan 23, 2010 at 8:29 PM, Ryan Dahl <coldre...@gmail.com> wrote:
> Yeah. I guess some pack/unpack-like function would be good.

I've written some code in this arena, but it was very slow. I think I
got the API right, but the implementation is sorely lacking.

The struct module from pre-commonjs Chiron has pack/unpack:

http://code.google.com/p/chironjs/source/browse/trunk/src/struct.js

And Narwhal's struct module has some routines that might be useful,
that were consolidated from various encoding and hashing API's by Paul
Johnson, I believe.

http://github.com/280north/narwhal/blob/master/lib/struct.js

I prefer the former API, and the latter implementation bits. These
operate on strings and arrays, but when we have byte storage, we could
swap those in. Or, we could port someone else's pack/unpack. I'm not
particular about the pack notation, but I think Python did a good job,
with broad support for network-byte-order, big, little, and native
endianness and alignment tucked under the hood. PHP's seemed sloppy
to me. I presume that Perl, Python, and Ruby have similar notation.
Of course, there's no reason we couldn't have different modules to
support various notations if we end up porting code from multiple
sources.

Kris Kowal

Shanti Rao

Jan 23, 2010, 11:41:21 PM
to CommonJS
On Jan 22, 5:08 pm, Ryan Dahl <coldredle...@gmail.com> wrote:
> Has anyone thought about how to do ntohs, ntohl, htons, htonl?

I ran into this problem a while ago with JSDB. The best solution I
found was a function which returns a number either in host or network
order, depending on a parameter.

Stream.readInt32(network)
Stream.readInt16(network)
Stream.readInt8(network)
Stream.readUInt32(network)
Stream.readUInt16(network)
Stream.readUInt8(network)

Shanti
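The flag-style readers are easy to sketch over a `Uint8Array` with a moving cursor. This is hypothetical and not JSDB code: `ByteReader` is a made-up name, `network === true` selects big-endian, and anything else selects host order. A read past the end throws a RangeError and leaves the position unchanged.

```javascript
const HOST_IS_LITTLE_ENDIAN = new Uint8Array(new Uint16Array([1]).buffer)[0] === 1;

class ByteReader {
  constructor(bytes) {
    this.view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    this.position = 0;
  }
  _read(method, size, network) {
    const littleEndian = network ? false : HOST_IS_LITTLE_ENDIAN;
    // DataView throws before we advance, so a failed read does not move the cursor.
    const value = this.view[method](this.position, littleEndian);
    this.position += size;
    return value;
  }
  readInt8(network)   { return this._read("getInt8", 1, network); }
  readUInt8(network)  { return this._read("getUint8", 1, network); }
  readInt16(network)  { return this._read("getInt16", 2, network); }
  readUInt16(network) { return this._read("getUint16", 2, network); }
  readInt32(network)  { return this._read("getInt32", 4, network); }
  readUInt32(network) { return this._read("getUint32", 4, network); }
}
```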
