http://wiki.commonjs.org/wiki/Binary/E
Kris Kowal
I still don't get the general push towards an api that breaks symmetry
with existing API.
IMHO the binary proposals are a mess. Does anyone else notice unsettling
parts of them, or does it take a chart to visualize that?
Oh well, I'll probably just support two APIs and make the underlying
data interchangeable.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
>
> Things like typeof bs[idx] === "number" also screw that up, and bs.Content === Number also voids half of the things that can be done with it.
Name me a concrete use for bs[idx] returning a ByteString again. Every use I can think of for using ByteStrings makes [[Get]] useless unless it returns a primitive.
I for one am very happy with this change, and I like the feel of this proposal.
But then I've not really seen the use of .Content in the first place.
>
> I still don't get the general push towards an api that breaks symmetry with existing API.
What existing binary API?
For the benefit of those tuning in, you made proposal C. I am not
against your Buffer idea. Can you explain how it could not be
implemented in pure JavaScript on top of D or E, or in a future
version thereof? I've taken things I want out too, in order to target
the minimal feature set that we're willing to converge on.
> Things like typeof bs[idx] === "number" also screw that up, and bs.Content
> === Number also voids half of the things that can be done with it.
I don't like ByteString.Content === Number any more than you do, but
if enough people support it, I'm willing to compromise. So far I know
Ash supports it. I'd like to hear from others.
> I still don't get the general push towards an api that breaks symmetry with
> existing API.
There is no ratified binary API. There are trial implementations of B
and C, but none of them have received support from this group. As
long as this group doesn't converge, I'm going to keep generating
permutations of the API, until we hit something we can agree upon.
Compatibility with a previous design that people did not want is a
non-goal.
> IMHO the binary proposals are a mess. Does anyone else notice unsettling
> parts of them, or does it take a chart to visualize that?
I would prefer to evaluate them on their individual merits and
failures, since they bear no relationship to one another for purpose
of ratification. Please be specific.
Kris Kowal
As for making Buffer a future part of the Binary spec, that feels a
little awkward to me. ByteArray and BlobBuffer do the same thing, but
with different, incompatible APIs. So in the future you end up with two
completely different APIs that do the same thing. Both are part of a
standardized spec, so even though they do the same thing you can't
really drop either, even in a major version change that drops support
for old backwards compatibility layers.
By the way, does anyone have any examples or in-use code using ByteArray
for practical purposes? I want to compare how nice and robust the
ByteArray and BlobBuffer APIs are against each other in terms of
binary-only use.
> ...
>> I still don't get the general push towards an api that breaks symmetry with
>> existing API.
>>
>
> There is no ratified binary API. There are trial implementations of B
> and C, but none of them have received support from this group. As
> long as this group doesn't converge, I'm going to keep generating
> permutations of the API, until we hit something we can agree upon.
> Compatibility with a previous design that people did not want is a
> non-goal.
>
Not (compatibility) symmetry with a previous design. "Symmetry with an
existing API" as in symmetry with the standard API.
ie:
var test = function(Constructor) { return Constructor().Content === Constructor; };
test(String); // true
test(ByteString); // false
String().valueOf().Content === String().valueAt(0).Content; // false, "value" is inconsistent in type
Btw, Binary/E's valueAt is actually basically my codeAt() which is
symmetrical and consistent with the Standard API
String.fromCharCode/string.charCodeAt inspired
(String|Blob).fromCode/(string|blob).codeAt
string.charAt alone was matched by blob.byteAt
string.charAt+string.valueOf inspired string.valueAt and thus
blob.valueAt/blob.valueOf
When reading code you can consistently expect "code" refers to a char
code or 0-255 byte number both of a Number type (which can be put back
into .contentConstructor.fromCode()), and "value" refers to a sequence
of the same type as the content (String or Blob).
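To make the naming convention concrete, here is a minimal sketch on plain Strings. The free functions codeAt, valueAt and fromCode are hypothetical stand-ins for illustration and are not part of any ratified proposal; only charCodeAt, charAt and String.fromCharCode are standard.

```javascript
// Hypothetical helpers illustrating the "code" vs "value" naming on Strings.
function codeAt(str, index) {
    return str.charCodeAt(index);      // "code": always a Number
}

function valueAt(str, index) {
    return str.charAt(index);          // "value": same type as the content
}

function fromCode(code) {
    return String.fromCharCode(code);  // turns a code back into a value
}

var s = "abc";
var roundTrip = fromCode(codeAt(s, 1)) === valueAt(s, 1);
```

The same pairing would carry over to a Blob type, with codes in the 0-255 range.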
> ...
> Kris Kowal
We're dealing with single bytes here since [idx] can never return more
than one byte. This means that there will never be more than 256
possible returns, and each of these is easily mapped without complex
hashing algorithms. Why not have [] return a single instance of any one
byte. ie: if you bsa[0] === bsb[0] and both bytes are 1 the same
ByteString instance will be returned instead of 2 separate ones.
The only unexpected thing I can think of that would come from that is if
someone did something really weird like `bs[0].foo = ...; bs[1].foo()`.
Which really shouldn't be done, assigning properties doesn't work with
primitives like string and I don't see any reason it should work with
ByteStrings. For all I care implementations could decide that
ByteString/Blob instances are immediately sealed on-creation (heck, I
might do that myself when I'm writing Blobs for MonkeyScript).
Techniques like that have worked well for me so far. In the mongo-rhino
lib the pure Rhino version of Kommonwealth v3 (company project) I use a
Java WeakHashMap to ensure that within a worker thread there is never
more than one instance of an ObjectId (ie: If it's in use already
calling ObjectId returns the saved ObjectId instance, if the ObjectId
loses all its references the instance gets GCed). It works great for
making === work. I added the code around a point where I wanted to be
able to check an array of ObjectIds for an ObjectId and didn't want to
have to fancy with a specialized function and just wanted to use
indexOf/has(Wrench.js) to check.
Point 4 I believe you mean ByteString.
> We're dealing with single bytes here since [idx] can never return more
> than one byte. This means that there will never be more than 256
> possible returns, and each of these is easily mapped without complex
> hashing algorithms. Why not have [] return a single instance of any one
> byte. ie: if you bsa[0] === bsb[0] and both bytes are 1 the same
> ByteString instance will be returned instead of 2 separate ones.
function ByteString(...) {
    ...
    // Single-byte instances are interned so that bsa[0] === bsb[0].
    if ( length === 1 ) {
        if ( _byteCache.hasOwnProperty(byteCode) )
            return _byteCache[byteCode];
        return _byteCache[byteCode] = this;
    }
    ...
}
Roughly in JS as an example. Real implementation would be different, but
just an explanation.
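A runnable variant of the same idea, as a sketch only. Index access is modeled with a hypothetical get(i) method because plain JavaScript cannot intercept bs[i], and the ByteString here is a simplified stand-in, not any proposal's actual type.

```javascript
// Cache of interned one-byte ByteStrings, indexed by byte value 0..255.
var byteCache = [];

// Simplified stand-in: wraps a copy of an array of byte values.
function ByteString(bytes) {
    this._bytes = bytes.slice();
    this.length = bytes.length;
    Object.freeze(this);   // no new properties, as suggested above
}

// get(i) stands in for bs[i] and returns the shared one-byte instance.
ByteString.prototype.get = function (i) {
    var code = this._bytes[i];
    if (code === undefined) return undefined;
    if (!byteCache[code]) {
        byteCache[code] = new ByteString([code]);
    }
    return byteCache[code];
};

var bsa = new ByteString([1, 2, 3]);
var bsb = new ByteString([9, 1]);
var sameInstance = bsa.get(0) === bsb.get(1);   // both are the byte 1
```

At most 256 cached instances can ever exist, so a plain array is enough and no hashing is needed.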
> This is great food for thought, Dan. Thanks!
>
> Wes
>
> --
> Wesley W. Garland
> Director, Product Development
> PageMail, Inc.
> +1 613 542 2787 x 102
Please note here if this is a false dilemma in your opinion.
X: a.[[Get]](b) === a.[[Get]](b)
Y: !X
0: a.slice(...b) === a.slice(...b)
1: !0
Kris Kowal
A: x.[[Get]] returns a Number
B: x.[[Get]] returns something of type X
Please note here if this is a false dilemma in your opinion.
X: a.[[Get]](b) === a.[[Get]](b)
Y: !X
0: a.slice(...b) === a.slice(...b)
1: !0
A: x[y] returns Number
B: x[y] returns something that is the same type as x, be it
ByteString or ByteArray
C: x[y] returns a Byte object (I do not think this is what Wes and
Daniel are converging upon, but what Dean appears to believe they are
converging upon. If so, what distinguishes it from A and B will need
to be qualified.)
X: a[b] === a[b]
x[n] === x[n]
We should also preserve the idiom:
x[n] === x.slice(n, n + 1)
To do less would be uncanny. I'm in support of either supporting
object strict equality and all idioms you might expect with object
strict equality for equivalent immutable byte objects, or no support
for strict equality. Object strict equality and equivalence comes
with a lot of idioms. For example it is reasonable to expect
currently that if:
typeof x === "object"
Then it would not be reasonable to expect:
new X() === new X()
I'm suggesting we play the game carefully. Every proposal but
Binary/E presumes that equivalent byte strings are strictly equal. In
response to Ash and Wes's criticisms of that approach, Binary/E is
much simpler; fewer methods (possibly fewer in a future draft) and no
fanciness attempting to emulate built-in types. I am looking for a
show of hands for which general approach to pursue.
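For reference, the idioms in question can be checked against the built-in String, where they hold because strings are primitives, and against a plain object type, where they do not. X below is an arbitrary placeholder constructor.

```javascript
var s = "hello";
var primitiveIndexEqual = s[1] === s[1];            // true
var primitiveSliceEqual = s[1] === s.slice(1, 2);   // true

function X() {}
var typeOfInstance = typeof new X();                // "object"
var objectsEqual = new X() === new X();             // false
```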
Kris Kowal
Now that I think about it, `x[n] === x.slice(n, n + 1)` is actually an
argument in favor of a ByteString return type for []. You can't preserve
equivalence using a separate Byte type without making slice return
ByteStrings in some cases and Byte objects in others, which would be
unexpected.
> For example it is reasonable to expect
> currently that if:
>
> typeof x === "object"
>
> Then it would not be reasonable to expect:
>
> new X() === new X()
>
Didn't include it in my earlier example, but I'd go for omitting any
interning when new is called.
In essence X as a constructor is new object creation, X as a function is
like how (new String(...)).intern() behaves (though you can omit the
overhead of creating an extra object that may be discarded).
> I'm suggesting we play the game carefully. Every proposal but
> Binary/E presumes that equivalent byte strings are strictly equal. In
> response to Ash and Wes's criticisms of that approach, Binary/E is
> much simpler; fewer methods (possibly fewer in a future draft) and no
> fanciness attempting to emulate built-in types. I am looking for a
> show of hands for which general approach to pursue.
>
> Kris Kowal
>
>> For example it is reasonable to expect
>> currently that if:
>>
>> typeof x === "object"
>>
>> Then it would not be reasonable to expect:
>>
>> new X() === new X()
>>
>
> Didn't include it in my earlier example, but I'd go for omitting any
> interning when new is called.
> In essence X as a constructor is new object creation, X as a function is
> like how (new String(...)).intern() behaves (though you can omit the
> overhead of creating an extra object that may be discarded).
It is not impossible to induce "new X()" to return something other
than "this". Try:
javascript:(function () {function X() {return new Number(10)};
alert(new X().valueOf())})()
My argument stands with X() === X() as well.
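The same mechanism lets a constructor hand back a shared instance, so even "new" gives no guarantee of a fresh object. A small sketch, where X and the cache are made up for illustration:

```javascript
var cache = {};

function X(key) {
    if (!cache.hasOwnProperty(key)) {
        cache[key] = { key: key };
    }
    return cache[key];   // returning an object overrides `this` under `new`
}

var sameViaNew = new X("a") === new X("a");    // true
var sameViaCall = X("a") === X("a");           // true
var different = new X("a") === new X("b");     // false
```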
Kris Kowal
Also, on GPSEE we support the application of ByteString methods onto
ByteArray instances, so we don't even need to make a copy if we just
wanted a ByteString method that wasn't found on the ByteArray
prototype.
> C: x[y] returns a Byte object (I do not think this is what Wes and
> Daniel are converging upon, but what Dean appears to believe they are
> converging upon. If so, what distinguishes it from A and B will need
> to be qualified.)
I'm favoring typeof x[y] == "number" at this point. What are the
advantages of Byte over Number? That the Byte could host the
ByteString API?
I think we're looking at having ByteString and ByteArray both be
Objects that behave as Arrays of Numbers. The difference between the
two could be as little as having one be immutable and the other
mutable, where the immutable would host the stateless subset of the
mutable API. Binary/E gets close to that.
If we make ByteString a non-special object, it probably makes the most
sense to go with the Array of Number interpretation. If we make
ByteString special, in the same sense that String is special, I think
the previous proposals make more sense. You don't run into the deep
traversal problem with String exactly because it is special, right
down to all of its operators, internal methods, and being a primitive
type. I imagine that this is a pattern: the more you try to be like
String, the more you have to be exactly like String. So far, I've
proposed two approaches:
1. Creating future-compatible types that resemble and would in future
revisions be exactly like String and Array. Binary/B and Binary/D.
2. Creating types that can be implemented without revising the
underlying JavaScript engine to support ==, typeof,
Object.prototype.toString, or enumerability, but allow patterns that
would not be compatible with a future revision that does do those
things. Binary/E.
There's a third approach, which ECMA would probably pursue but we
obviously can not because of our short term goals.
3. Create types that are exactly like String and Array, today.
If we are to pursue the second approach, I think we have to throw away
all notions of emulating String behaviors like a[x] === a[x], a.slice(x,
x+1) === a[x], typeof a === "bytestring", typeof a[x] == typeof a,
Object.prototype.toString.call(a) === "[object ByteString]". We don't
have to throw away length assignment or [[Get]] since it seems
everyone agrees that those can be implemented by embeddings without
revising the core engine. What do you think? I'm starting to favor a
clean break.
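A rough sketch of the second approach under the Array of Number interpretation, using nothing beyond ES5. The constructors below are simplified stand-ins rather than the Binary/E definitions: they take an array of numbers and copy it onto indexed properties, and the immutable variant is frozen.

```javascript
// Mutable variant: bytes live on indexed properties, like an Array.
function ByteArray(bytes) {
    for (var i = 0; i < bytes.length; i++) {
        this[i] = bytes[i] & 0xff;     // keep each element in 0..255
    }
    this.length = bytes.length;
}

// Immutable variant: same layout, frozen after construction.
function ByteString(bytes) {
    ByteArray.call(this, bytes);
    Object.freeze(this);
}

var bs = new ByteString([104, 105]);
var ba = new ByteArray([104, 105]);
ba[0] = 72;                                              // allowed: mutable

var indexIsNumber = typeof bs[0] === "number";           // true
var sameByte = bs[1] === ba[1];                          // true: Numbers compare by value
var sameObject = bs === new ByteString([104, 105]);      // false: plain object identity
```

Nothing here requires ==, typeof or Object.prototype.toString to treat the types specially, which is the point of the second approach.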
Kris Kowal
I'm trying to speed up Node by providing better buffering - the
current implementation copies data around very often. This is my third
iteration of doing binary, and I'm pretty happy with how it is
turning out. It looks like this:
var s = "hello world";
var b = new Buffer(1024);
b.asciiWrite(s, 0, s.length);
b.utf8Write(s, s.length, s.length);
b.asciiSlice(0, 5) // => "hello"
b.utf8Slice(0, 5) // => "hello"
b.slice(0, 5) //=> Buffer object
You can see the source code here
http://github.com/ry/node/blob/df59f067345e3a45be7637122ba8566009aab830/src/node_buffer.cc
and a usage example here
http://github.com/ry/node/blob/df59f067345e3a45be7637122ba8566009aab830/lib/net.js
.slice is a little different though... To create a new buffer with the
contents of a section of another buffer you would do `new
Buffer(buf.range(0, 5));`.
Also, rather than asciiWrite to a BlobBuffer you would
bbuf.append(str.toBlob("US-ASCII")); you could prototype on helpers if
you want.
Do I presume correctly that these encoding-specific functions are not
provided merely for convenience, but to provide an interface that can
be optimized at a lower layer involving less buffer copying? Do these
methods throw or implicitly reallocate if they are not long enough for
the encoded version of the source? In what state is the target buffer
(b) left if the operation fails? Could this be generalized to be
agnostic to the target encoding, permitting encodings to be optimized
at the discretion of the implementation?
Kris Kowal
Correct
> Do these
> methods throw or implicitly reallocate if they are not long enough for
> the encoded version of the source?
They fail. There is no reallocation.
> In what state is the target buffer
> (b) left if the operation fails?
Lengths are checked before the actual write.
> Could this be generalized to be
> agnostic to the target encoding, permitting encodings to be optimized
> at the discretion of the implementation?
I guess. I wouldn't mind wrapping this with some JS to provide some
more thoughtful API. But I think at the binding level, I'd like to be
this (or at least similar to this). I'm still hacking around with it.
I'll probably be merging this into the master branch in the next week
or so.
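The "lengths are checked before the actual write" behavior can be pictured like this. It is a pure-JavaScript sketch over a Uint8Array, not the C++ binding in node_buffer.cc; the function name and signature are assumptions made for the example.

```javascript
// Hypothetical helper: writes str as ASCII into target starting at offset.
// The bounds check happens before any byte is written, so a failed call
// leaves the target untouched and never reallocates.
function asciiWriteChecked(target, str, offset) {
    if (offset < 0 || offset + str.length > target.length) {
        throw new RangeError("source does not fit in target buffer");
    }
    for (var i = 0; i < str.length; i++) {
        target[offset + i] = str.charCodeAt(i) & 0x7f;
    }
    return str.length;   // number of bytes written
}

var b = new Uint8Array(8);
var written = asciiWriteChecked(b, "hello", 0);
```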
On Wed, Jan 20, 2010 at 11:21 AM, Wes Garland <w...@page.ca> wrote:
> x[y] produces the same type as x, typical deep traversal algorithms will
> invariably fail with an endless loop yielding OOM.
I'm favoring typeof x[y] == "number" at this point. What are the
advantages of Byte over Number? That the Byte could host the
ByteString API?
I think we're looking at having ByteString and ByteArray both be
Objects that behave as Arrays of Numbers. The difference between the
two could be as little as having one be immutable and the other
mutable, where the immutable would host the stateless subset of the
mutable API. Binary/E gets close to that.
If we make ByteString a non-special object, it probably makes the most
sense to go with the Array of Number interpretation. If we make
ByteString special, in the same sense that String is special, I think
the previous proposals make more sense.
You don't run into the deep
traversal problem with String exactly because it is special, right
down to all of its operators, internal methods, and being a primitive
type. I imagine that this is a pattern: the more you try to be like
String, the more you have to be exactly like String.
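The deep traversal problem can be reproduced with a toy container. Sequence is a hypothetical type whose get(i) returns either a Number or another Sequence; the walker stands in for a typical recursive visitor and carries a depth limit only so that the example terminates.

```javascript
// Hypothetical container. With selfTyped set, get(i) returns the same
// type as the container, the way a ByteString-returning [[Get]] would.
function Sequence(values, selfTyped) {
    this.values = values;
    this.selfTyped = selfTyped;
}

Sequence.prototype.get = function (i) {
    var value = this.values[i];
    return this.selfTyped ? new Sequence([value], true) : value;
};

// Descends into the first element of anything object-like.
function walk(node, limit) {
    if (limit === 0) return "no leaf reached";
    if (typeof node !== "object") return "leaf";
    return walk(node.get(0), limit - 1);
}

var withNumbers = walk(new Sequence([1, 2], false), 1000);    // "leaf"
var withSelfType = walk(new Sequence([1, 2], true), 1000);    // "no leaf reached"
```

Without the limit, the self-typed case would recurse until the stack or memory ran out.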
So far, I've
proposed two approaches:
1. Creating future-compatible types that resemble and would in future
revisions be exactly like String and Array. Binary/B and Binary/D.
2. Creating types that can be implemented without revising the
underlying JavaScript engine to support ==, typeof,
Object.prototype.toString, or enumerability, but allow patterns that
would not be compatible with a future revision that does do those
things. Binary/E.
There's a third approach, which ECMA would probably pursue but we
obviously can not because of our short term goals.
3. Create types that are exactly like String and Array, today.
If we are to pursue the second approach, I think we have to throw away
all notions of emulating String behaviors like a[x] === a[x], a.slice(x,
x+1) === a[x], typeof a === "bytestring", typeof a[x] == typeof a,
Object.prototype.toString.call(a) === "[object ByteString]".
We don't
have to throw away length assignment or [[Get]] since it seems
everyone agrees that those can be implemented by embeddings without
revising the core engine. What do you think? I'm starting to favor a
clean break.
Wouldn't those worried about performance be able to drop down into charCodeAt? Yes, it adds a function call, but is it that much more expensive than directly [[Get]]'ing a number?
I'm still reticent about Byte. With Number, byte to byte comparison
among Numbers works. I think that it would be a mistake to compare
bytes to Strings since the String exists in the character code space.
I think that I would prefer to pay the penalty in this manner:
new ByteString("\n", "us-ascii")[0] === "\n".charCodeAt()
And I buy the idea that "newless" construction should throw an error
since that gives us future-compatibility with unboxed versions.
Presumably unboxed versions would also come with byte-literals, which
is the real way to solve the problem with literal comparison.
Of course, I am not adamant. I would just like an approach that is
consistent from top to bottom.
Also, I'm going to remove some methods from Binary/E, and might make
some provisions to converge with Ryan Dahl's embedding ideas.
Kris Kowal
I could almost buy this, but it only postpones the issue that we're
promoting "\n" to a Byte using "us-ascii" as the charset implicitly.
We'd have to specify that "\n" has to throw an error if the charCode
is out of the ASCII range. I think I still prefer the charCodeAt()
spelling since it defers these concerns to the user and doesn't
complicate the spec.
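To see what the spec would have to pin down, here is one possible reading of Byte as a function. This is a guess at the semantics under discussion, not a proposed API: it assumes Byte accepts a one-character String, returns a Number, and rejects anything outside the ASCII range.

```javascript
// Hypothetical Byte(): implicit us-ascii promotion of a one-character String.
function Byte(ch) {
    if (typeof ch !== "string" || ch.length !== 1) {
        throw new TypeError("Byte expects a one-character string");
    }
    var code = ch.charCodeAt(0);
    if (code > 0x7f) {
        throw new RangeError("character is outside the ASCII range");
    }
    return code;
}

var newline = Byte("\n");
var sameAsCharCode = newline === "\n".charCodeAt(0);   // true
```

Every branch above is a rule the spec would have to state, whereas the charCodeAt() spelling leaves those decisions with the caller.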
It seems more likely to me that ES would either go for [[Get]]
returning Number or a unary ByteString on [[Get]], in which case Byte
would become legacy cruft for converting unary Strings in the ASCII
range to unary bytestring literals. There certainly wouldn't be a
Byte type. A Byte type would be a range-restricted Number. I bet
that anything in the Byte, Word, Quad, Integer, Whole, Natural, Real,
Complex realm is a non-goal for ECMA.
I'm still on the fence. I'm not fully convinced by the ideas for
future-compatibility because the adapters that make sense today would
be cruft tomorrow. I think that if we went with Number, ECMA would
follow. If we went with ByteString, we have a difficult time making
it work today in a way that continues to make sense in the future. If
we go with Byte, I think we have an API that doesn't make complete
sense today or tomorrow.
With regard to the memory management issues you mention, I think that
most of the proposals, especially the later ones, are all adequate
from that perspective. Thanks for clarifying.
Kris Kowal
On Thu, Jan 21, 2010 at 12:56 PM, Wes Garland <w...@page.ca> wrote:
> How about
>
> new ByteString("\n", "us-ascii")[0] === Byte("\n") ?
I could almost buy this, but it only postpones the issue that we're
promoting "\n" to a Byte using "us-ascii" as the charset implicitly.
We'd have to specify that "\n" has to throw an error if the charCode
is out of the ASCII range.
I think I still prefer the charCodeAt()
spelling since it defers these concerns to the user and doesn't
complicate the spec.
It seems more likely to me that ES would either go for [[Get]]
returning Number or a unary ByteString on [[Get]], in which case Byte
would become legacy cruft for converting unary Strings in the ASCII
range to unary bytestring literals. There certainly wouldn't be a
Byte type.
I'm still on the fence. I'm not fully convinced by the ideas for
future-compatibility because the adapters that make sense today would
be cruft tomorrow. I think that if we went with Number, ECMA would
follow. If we went with ByteString, we have a difficult time making
it work today in a way that continues to make sense in the future. If
we go with Byte, I think we have an API that doesn't make complete
sense today or tomorrow.
> Has anyone thought about how to do ntohs, ntohl, htons, htonl?
This is almost a side issue to the current form of binary APIs, as they have no other pack or unpack functions on them. I'd say that ntohs etc. belong with pack("int4", 42) etc.
Quite where those belong I don't know.
-ash
The most obvious thing to me would be
buffer.getNetShort(index) // get 16-bit int, convert to host
buffer.getHostShort(index) // get 16-bit int, use host byte-order
buffer.getNetLong(index) // get 32-bit int
buffer.getHostLong(index) // get 32-bit int
buffer.putNetShort(index, 42) // takes up index and index+1
buffer.putHostShort(index, 42)
buffer.putNetLong(index, 42) // takes up index, index+1, index+2, index+3
buffer.putHostLong(index, 42)
throwing errors if anything was out of bounds.
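As a sketch of how these could sit on top of plain byte storage in pure JavaScript, here are the network-order variants written as free functions over an array of byte values. The function shapes are assumptions for the example; host-order variants would additionally need to know the platform's byte order.

```javascript
// Throws if [index, index + size) does not fit inside bytes.
function checkRange(bytes, index, size) {
    if (index < 0 || index + size > bytes.length) {
        throw new RangeError("index " + index + " out of bounds");
    }
}

// Network order is big-endian: most significant byte first.
function getNetShort(bytes, index) {
    checkRange(bytes, index, 2);
    return (bytes[index] << 8) | bytes[index + 1];
}

function putNetShort(bytes, index, value) {
    checkRange(bytes, index, 2);
    bytes[index] = (value >>> 8) & 0xff;
    bytes[index + 1] = value & 0xff;
}

function getNetLong(bytes, index) {
    checkRange(bytes, index, 4);
    // >>> 0 turns the signed 32-bit result into an unsigned value.
    return ((bytes[index] << 24) | (bytes[index + 1] << 16) |
            (bytes[index + 2] << 8) | bytes[index + 3]) >>> 0;
}

function putNetLong(bytes, index, value) {
    checkRange(bytes, index, 4);
    for (var i = 0; i < 4; i++) {
        bytes[index + i] = (value >>> (24 - 8 * i)) & 0xff;
    }
}

var buffer = [0, 0, 0, 0, 0, 0];
putNetShort(buffer, 0, 42);            // takes up index 0 and 1
putNetLong(buffer, 2, 0xdeadbeef);     // takes up index 2 through 5
```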
These are very much in the spirit of some of the older prior art
(links from the wiki):
http://help.adobe.com/en_US/AIR/1.1/jslr/flash/utils/ByteArray.html
https://developer.mozilla.org/En/NsIBinaryInputStream
https://developer.mozilla.org/En/NsIBinaryOutputStream
http://www.ejscript.org/products/ejs/doc/api/ejscript/intrinsic-ByteArray.html
It remains my opinion that these are all implementable in
pure JavaScript and could be done more carefully and with a DSL like
the pack notation familiar to Pythonistas, Rubyists, PHPnauts, and
Perlers.
http://docs.python.org/library/struct.html
http://perldoc.perl.org/functions/pack.html
http://www.codeweblog.com/ruby-string-pack-unpack-detailed-usage/
My reasoning is that an exhaustive API would need [network, machine,
big, little] x ([8, 16, 32, 64, ... bit int] x [signed, unsigned] +
[float, double float]) x [single, array, struct] methods, all of them
taking positional arguments. That could get hairy.
I would like to get low level binary (focusing on storage and
retrieval of bytes) out of the way as quickly as possible so we can
build these kinds of things on top of it. Even if we don't make a
standard for each of our varying interface ideas, we could share
pure-js code for everyone's favorite approach.
Kris Kowal
Has anyone thought about how to do ntohs, ntohl, htons, htonl?
Parsing binary protocols. (AMQP *groan*)
> 2 is probably your use-case; given that, wouldn't you want to convert to
> your ByteArray in-situ to avoid copies?
No, most often I need to get an int into the VM or out of the VM into a buffer.
> What are your feelings on alignment? Allowing unaligned access costs cycles
> (especially on RISC), but makes for more flexible code.
For binary protocols, handling unaligned ints is important.
> Any way you look at it, I am all for these functions, but suggest they make
> up a separate addendum to the core binary spec. That way, we can hash out
> both separately, and merge as they mature.
Yeah. I guess some pack/unpack-like function would be good.
I've written some code in this arena, but it was very slow. I think I
got the API right, but the implementation is sorely lacking.
The struct module from pre-commonjs Chiron has pack/unpack:
http://code.google.com/p/chironjs/source/browse/trunk/src/struct.js
And Narwhal's struct module has some routines that might be useful,
that were consolidated from various encoding and hashing APIs by Paul
Johnson, I believe.
http://github.com/280north/narwhal/blob/master/lib/struct.js
I prefer the former API, and the latter implementation bits. These
operate on strings and arrays, but when we have byte storage, we could
swap those in. Or, we could port someone else's pack/unpack. I'm not
particular about the pack notation, but I think Python did a good job,
with broad support for network-byte-order, big, little, and native
endianness and alignment tucked under the hood. PHP's seemed sloppy
to me. I presume that Perl, Python, and Ruby have similar notation.
Of course, there's no reason we couldn't have different modules to
support various notations if we end up porting code from multiple
sources.
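To give a feel for the notation, here is a deliberately tiny pack() in the style of Python's struct format strings. It is neither the Chiron nor the Narwhal code. It supports only the unsigned codes B, H and I (1, 2 and 4 bytes) with an optional ">", "!" or "<" byte-order prefix, and it assumes big-endian when no prefix is given, which differs from Python's native default.

```javascript
var SIZES = { B: 1, H: 2, I: 4 };   // byte widths of the supported codes

// Returns an array of byte values for the given format and values.
function pack(format, values) {
    var little = format.charAt(0) === "<";
    var codes = format.replace(/^[<>!]/, "");
    if (codes.length !== values.length) {
        throw new Error("expected " + codes.length + " values");
    }
    var out = [];
    for (var i = 0; i < codes.length; i++) {
        var size = SIZES[codes.charAt(i)];
        if (!size) {
            throw new Error("unsupported format code: " + codes.charAt(i));
        }
        var bytes = [];
        for (var shift = (size - 1) * 8; shift >= 0; shift -= 8) {
            bytes.push((values[i] >>> shift) & 0xff);   // big-endian order
        }
        if (little) bytes.reverse();
        out = out.concat(bytes);
    }
    return out;
}

var big = pack(">H", [42]);                    // [0, 42]
var small = pack("<H", [42]);                  // [42, 0]
var mixed = pack("!BI", [1, 0x01020304]);      // [1, 1, 2, 3, 4]
```

One format string replaces what would otherwise be a separate positional method per width and byte order.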
Kris Kowal
I ran into this problem a while ago with JSDB. The best solution I
found was a function which returns a number either in host or network
order, depending on a parameter.
Stream.readInt32(network)
Stream.readInt16(network)
Stream.readInt8(network)
Stream.readUInt32(network)
Stream.readUInt16(network)
Stream.readUInt8(network)
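A sketch of that shape using the standard DataView. ByteReader is a made-up name for the example and is not JSDB's Stream; only two of the readers are shown. When the network flag is false, the value is read in the host's byte order, which is detected once up front.

```javascript
// True on little-endian hosts: the low byte of 1 is stored first.
var HOST_LITTLE_ENDIAN = new Uint8Array(new Uint16Array([1]).buffer)[0] === 1;

// Hypothetical reader over a copy of the given byte values.
function ByteReader(bytes) {
    this.view = new DataView(new Uint8Array(bytes).buffer);
    this.offset = 0;
}

// DataView takes a littleEndian flag; network order is big-endian.
ByteReader.prototype.readUInt16 = function (network) {
    var value = this.view.getUint16(this.offset, network ? false : HOST_LITTLE_ENDIAN);
    this.offset += 2;
    return value;
};

ByteReader.prototype.readInt32 = function (network) {
    var value = this.view.getInt32(this.offset, network ? false : HOST_LITTLE_ENDIAN);
    this.offset += 4;
    return value;
};

var reader = new ByteReader([0x12, 0x34, 0xff, 0xff, 0xff, 0xfe]);
var first = reader.readUInt16(true);    // 0x1234
var second = reader.readInt32(true);    // -2
```

Reading past the end throws a RangeError from DataView itself.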
Shanti