Binary/D Draft 2


Kris Kowal

Dec 1, 2009, 4:23:43 AM
to comm...@googlegroups.com
I've posted Binary/D Draft 2.

http://wiki.commonjs.org/wiki/Binary/D

Some of the things that have changed:

# Used a spreadsheet and some automation to assure consistency and
completeness. Link included.
# Number cross-conversion and endianness sorted out.
# Fixed *Array constructors to be consistent with Array.
# Rationale on augmenting existing primordials.
# Clarification on enumerability of prototype methods.
# splice respecified on *String.
# Made Number to byte and bit coercion consistent.
# Argument order on copy and copyFrom revised for better defaults.
# Some notes on performance optimizations.
# Added more alphabet constants and fixed encoding/decoding normalization.
# Fixed formatting of [[Get]], [[Put]] and other ECMAScript internals.
# Consistently applied _opt prefix to optional arguments.
# Added require clauses to .toSource() methods.
# Added UTF-16 as a required encoding.
# Added a method for checking whether a charset is supported.
# Expanded on the rationale on genericity.

The new draft uses a lot of the same content, but is essentially a
rewrite. Please give it a thorough review if you have some time.
There are certainly lurking issues, so I've tried to increase the
specificity to reveal any potential disagreements.

Kris Kowal

Hannes Wallnoefer

Dec 1, 2009, 11:09:27 AM
to comm...@googlegroups.com
2009/12/1 Kris Kowal <kris....@cixar.com>:
> I've posted Binary/D Draft 2.

Thanks for your relentless work, Kris! I'm a bit late with my
feedback, but at least I'm in time for the second draft.

I share some of the gripes Ash expressed in the original post, namely:

1) The "new" operator throwing an error: I think this is wrong since,
as specified, the binary types are object constructors. The two
patterns in JS for constructors are either have the same or different
behaviour with or without "new" operator, and the underlying intention
is always to provide non-constrcutor functionality with omission of
"new".

Since at least for now binaries will always be full-blown objects
(adding new primitive types would require major changes at least in
Rhino, and is both hypothetical and irrelevant for the spec at hand
IMO), outlawing "new" seems the wrong approach to me. I think we
should follow the example of Object/Array and make the "new" operator
optional.

These arguments may also apply for (or better: against) the omission
of the Binary base type, although I don't know yet whether that's an
issue for me, or whether the introduction of the Bit* types wouldn't
have invalidated it anyway.

2) The Content property and different return types of indexed access
(aka [[Get]]): I think that coherence between ByteArray and ByteString
(and BitArray and BitString, respectively) is more important than
coherence between ByteString and String, or ByteArray and Array. And
yes, I also think that indexed access to strings returning a string of
length 1 in JS is a problem - it's the prime reason that building
well-performing parsers in JS is incredibly hard (unless you're using V8).

Hence, I propose to uniformly return numbers from the [[Get]] methods.

One minor doubt I had is how the copy() method behaves, especially
with an Array as argument. I assume existing array elements are
replaced rather than shifted, right? That should probably be made
explicit in the spec.
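
To illustrate in plain JS what I mean by the two possible behaviours
(a sketch of the semantics only, not of the actual copy() signature):

// Replace-in-place semantics (what I assume is intended):
function copyReplacing(source, target, at) {
  for (var i = 0; i < source.length; i++)
    target[at + i] = source[i];  // existing elements are overwritten
}

// Shifting semantics (what I'd like the spec to rule out explicitly):
function copyShifting(source, target, at) {
  // splice inserts the source elements and pushes existing ones to the right
  Array.prototype.splice.apply(target, [at, 0].concat(source));
}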

I'll spend more time on Binary/D in the coming hours/days, so I'll
send more feedback as it emerges.

Hannes

Wes Garland

Dec 1, 2009, 4:24:15 PM
to comm...@googlegroups.com
Hi, Kris!

Thanks for the updated document.

I'm still with some of the others on "new" throwing, as I outlined before. To wit, new Array and new String don't throw, and neither do those constructors in Binary/B.

Some of the fixes in this version, like require() on toString are observed and approved of. This document is also quite a bit less confusing (actually, I don't think it's confusing at all).

Oh!  require("binary") -- this makes it impossible for a user to install, say, both Binary/B and Binary/D on the same system at once.  I think require(X) where X is the literal version of module.id would be wiser wording.

Incidentally, I am working on a full COW system for BLOB-like things in GPSEE (we call them ByteThings) that provides a mechanism for making immutable-from-mutable without copies (unless you write to the mutable after creating the immutable). This will allow me, for example, to read a line from a file into a ByteArray, strip off the trailing newline, and yield a ByteString without copying. We already allow limited-scope use of mutable-as-immutable via call()/apply(), although that is not thread-safe. Conclusion: your observations re. backing-store memory in the specification are totally on-point.
(Maciej or others, if you want more info, let me know in another thread).

I still see huge problems with BitArray and BitString, aside from the fact that I do not believe they belong in *this* module:
 - Bit endianness is meaningless
 - no mechanism for the key bit-wise operations: shift, roll, or, and, not
 - no mechanism to break up into words
 - no mechanism to populate from words
 - note that the concept of "words" automatically carries concepts of endianness
 - No need IMO to throw type errors on e.g. toByteArray when the length is not a multiple of 8. Bit-oriented code is virtually always assumed to pad with zeros toward the MSb as needed
 - The inverse of valueOf should have a consistent bit size (52 IMHO FWIW)

And, why do I think "not in Binary/D"? Frankly, efficient implementation of some of those methods is non-trivial (lastIndexOf of an arbitrary bit sequence comes to mind), and yet we NEED Binary for things like I/O. Complex pie-in-the-sky for a VERY select audience should not tie up implementation time for basic language building blocks.

Number.toBitString and Number.toBitArray should not be defined with big-bit-endian order.  Should be little-bit-endian, such that 0x2 >> 1 == 0x1.
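
For illustration (not the spec'd API), little-bit-endian here means bit
i is the 2^i place; a hypothetical helper:

// bits[0] is the least significant bit
function numberToBits(n, width) {
  var bits = [];
  for (var i = 0; i < width; i++)
    bits.push((n >> i) & 1);
  return bits;
}
// numberToBits(2, 4) -> [0, 1, 0, 0], consistent with 0x2 >> 1 == 0x1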

"ByteString instances are comparable with the == and === operators based on equal order and respective values of their content."

Sorry, but that has to change. That is not implementable in ES3, I don't think it's implementable in ES5, and it cannot be implemented with spidermonkey-latest without modifying the guts of the interpreter itself (=== is the hard one). I would be surprised if == or === were overloadable in v8. I also think (but am not sure) that this violates ES3. Same comments for BitString.

BTW, can you get some of the people who claim Bit(String|Array) would be useful in here?  I would like to see pseudo-code (or code in other languages) before thinking about giving the nod to these objects.

I think you should also be careful with sentences like "coerced internally with the Number constructor". I believe this rule is too strong for a specification; that is an implementation guideline. "Coerced per the ToNumber rule defined in ES3 15.1.2.3.4" or some such is probably more reasonable.

Encoding: If we are doing base-64, why not quoted-printable?  Why not uuencode?  What's special about base-N that warrants including here?

Character Sets: Did I miss seeing this, or did you miss putting it in?  Something like

Converting to String causes the bytes in the ByteArray|ByteString to be interpreted as though they were bytes in the given character set, converted to UTF-16xE* with each UTF-16xE entity stored as an element in the String. This means that a String composed of a single Unicode character which is represented in UTF-16 with a surrogate pair will have a .length property equal to 2.
*UTF-16xE means UTF-16BE on big-endian machines and UTF-16LE on little-endian machines. This is another way to express "native-order UTF-16 with the BOM (Byte-Order Mark) eliminated". UTF-16BE and UTF-16LE are valid IANA character set names, defined in RFC 2781.
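
For example, plain JS strings already behave this way today:

js> print("\uD834\uDD1E".length)  // U+1D11E, one character, stored as a surrogate pair
2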

I'll also go on record again here saying that I'm not comfortable with how the modifications to the standard classes are defined here. I would be happier if they were defined somewhere else, with pointers pointing here for details. Modifications to a group of classes should be documented together, not all over the place (although I personally favour no modifications to the standard classes which the programmer did not actually request).

 FWIW, GPSEE will probably do something like this during standard-class initialization:

String.prototype.toByteString = function(charset) {
  // First call: replace this thunk with the real implementation, then delegate.
  String.prototype.toByteString = require("binary").String_toByteString;
  return this.toByteString(charset);
}

I guess you could call this a Monkey-Thunk.   (You saw it here first, folks!)

I'm not happy about having to execute extra code during engine start-up, but at least this prevents me from having to load and link another library when my script doesn't use the features.    It also allows me to unload the library (if those features aren't used)  if there are no references to it and memory pressure is high.

Wes

--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102

Kris Kowal

Dec 1, 2009, 5:07:14 PM
to comm...@googlegroups.com
It seems the major dimensions of design for a binary API include:

1. Whether to support both mutable and immutable binary types.
2. Whether to support both byte and bit binary types.
3. Whether to use Objects for all types or mirror the internal
properties and behaviors of String and Array.
4. Whether to require engine-level support or pure-JavaScript.
5. Whether to support convenience functions for bi-directional
conversion of objects in all dimensions, or to separate them into
architectural strata (bytes low, bits high, bits convertible to bytes,
but no byte methods for converting bytes to bits).
6. Whether to augment existing types.
7. Whether to support radix encoding and decoding with member
functions and constructors.
8. Whether to support charset encoding and decoding with member
functions and constructors.

3 and 4 make a big difference. We would not be able to use [[Get]],
[[Put]], or operator overrides without native engine support. We will
certainly need native engine support to achieve the performance
benefits of having a binary API. If we mirror String and Array, we
can half-bake support for binary types in browsers by exporting String
as ByteString and Array as ByteArray.
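
For instance, the half-baked browser fallback could be as small as
this (a sketch only; it gets none of the byte semantics right, just
enough surface API to limp along):

// Crude shim for engines without native binary types:
exports.ByteString = String;
exports.ByteArray = Array;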

6 has ECMA implications. It would be good to hear which way they want
to go with that. How native should binary data types be? Binary/D is
built on the assumption that they will eventually be, but are not
quite yet, entirely native types. Should the next draft eschew
nativeness entirely? Should it embrace nativeness fully although it
will take longer to adopt?

If we go with non-native types, we will need to add methods for things
that would otherwise be taken care of with operators like ==. We
would have to establish precedent for methods like "equals".
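
A minimal sketch of what such a method might look like on a non-native
ByteString, assuming only a length property and numeric indexed access
(hypothetical, for illustration only):

// Element-wise comparison standing in for ===:
ByteString.prototype.equals = function (other) {
  if (!(other instanceof ByteString) || this.length !== other.length)
    return false;
  for (var i = 0; i < this.length; i++) {
    if (this[i] !== other[i])
      return false;
  }
  return true;
};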

Kris Kowal

Wes Garland

Dec 2, 2009, 12:57:59 PM
to comm...@googlegroups.com
Hi, Kris!

Thanks for this excellent breakdown of the issues.
 
1. Whether to support both mutable and immutable binary types.

I sure think we should.  There are well-known good reasons for implementing immutable types.  For starters, ByteString.slice() on large strings, but also formal reasoning in traditional functional programming.
 
2. Whether to support both byte and bit binary types.

There is a compelling argument for bit types, but I do not believe they belong at the same implementation "level" as byte-oriented types.  The motivations for the two are very different, even if they appear, on the surface, to have great similarities.
 
3. Whether to use Objects for all types or mirror the internal
properties and behaviors of String and Array.

I'm with the Object camp.  However, I feel much more strongly that new should still work regardless of which way we go. String and Array work just fine with new, as do existing Binary/B implementations.  In fact, Array itself constructs an object.

js> print(typeof(Array()))
object
 
4. Whether to require engine-level support or pure-JavaScript.

There is a fine balance here. Pure-JavaScript for byte-oriented types might not be a bad goal; after all, we might want to use them in the browser, say, for a canvas pixel array.

If we require engine-level support, we need to be careful about *what* support we require. We should be looking at using only features that are available through the engine APIs of, say, Rhino, v8, SquirrelFish, SpiderMonkey, JSCore and maybe WSH. We should *not* require that CommonJS platform embedders modify the actual semantic behaviour of their JavaScript implementation.

A good litmus test for what-features-are-available-in-most-engines probably revolves around "what sorts of operations normally happen in script?"

Script will resolve properties, enumerate them, etc. But it won't add new keywords to the language, or add new type coercions at operator-overload-time.
 
5. Whether to support convenience functions for bi-directional
conversion of objects in all dimensions, or to separate them into
architectural strata (bytes low, bits high, bits convertible to bytes,
but no byte methods for converting bytes to bits).

You know, bit-array to byte-array and back is probably very useful for people dealing with bits directly. Bit-array to String and back is probably almost as useful (in this case, I am considering a String to be an immutable array of 16-bit words with machine-native endianness).
 
6. Whether to augment existing types.

I'm very much in the EIBTI (explicit-is-better-than-implicit) camp here, and as such favour a "patch" method on exports which takes the standard types as arguments.  There are also sandbox implications in this neighbourhood.

require("binary").patch(String, Array);

7. Whether to support radix encoding and decoding with member
functions and constructors.

I'm of the opinion that while this is not particularly bad, it is orthogonal to the base issues this module addresses. As such, I would prefer to see them in a separate module, which possibly supports a very wide range of content transfer encodings, and maybe even compression.
 
8. Whether to support charset encoding and decoding with member
functions and constructors.

I am of the opinion that you cannot reasonably have ByteStrings without character sets. I am also of the opinion that you CAN reasonably have ByteArrays without them.... ByteArrays with character sets are handy but not an absolute necessity.

3 and 4 make a big difference.  We would not be able to use [[Get]],
[[Put]], or operator overrides without native engine support.

This is precisely why I see native-engine support as a grey area requiring a feature-by-feature test.  I'm pretty sure all the engines can implement [[Get]] and  [[Put]] -- these are basically property-not-found handlers.

Adding operator overloading, though, is a feature which specifically defies ES3 and the ES5 draft.  Making ByteString === comparable, before ByteString is an intrinsic type, requires semantics that are not supported by JavaScript. In JS, {} != {} and {a:1} !== {a:1}.  This actually constitutes a fundamental modification to the core language.

Consider the following program:

var a = new ByteString("abcdef");
var b = new ByteString("abcdef");

print(a == b, a === b);

The current Binary/D proposal suggests that the output of this program ought to be "true true".

However, no two distinct objects are ever == or === in JS.  So, you have a cart-before-the-horse problem, where you're trying to push formal "it's a part of the language" semantics on CommonJS modules even though TC39 has yet to ratify the behaviour.

So, ignoring the SpiderMonkey hook for ==, the == and === proposals in Binary/D cannot be implemented when ByteString and ByteArray are JavaScript objects.

Now, SpiderMonkey has a little-known hook for == (which, BTW, may harm JITability in the long run), even though using that hook means that you're not technically writing JavaScript anymore. This is also a native hook - you can't access it from script, so that is another nail in the coffin of the browser version of this type.

SpiderMonkey does *not* have hooks for overloading +, -, ===, etc, and you can't fake it with valueOf or toString, because those are called based on the type of the left-hand operand (and in this case, typeof is "object").  This means that to implement ===, you actually have to modify the engine itself. I don't think maintaining a SpiderMonkey fork is a reasonable requirement for a CommonJS implementor.

6 has ECMA implications.  It would be good to hear which way they want
to go with that.  How native should binary data types be?  Binary/D is
built on the assumption that they will eventually be, but are not
quite yet, entirely native types.  Should the next draft eschew
nativeness entirely?  Should it embrace nativeness fully although it
will take longer to adopt?

I think we need CommonJS now, not in 2016 when ES-next comes out.
 
If we go with non-native types, we will need to add methods for things
that would otherwise be taken care of with operators like ==.  We
would have to establish precedent for methods like "equals".

It might not be a bad idea to do this, although [[Get]] on immutables is a non-issue even in ES3 if we ignore performance...

exports.ByteString = function ByteString_Constructor(data) {
  // Copy each element onto the new object, so indexed reads are plain
  // property lookups and need no [[Get]] hook.
  for (var i = 0; i < data.length; i++)
    this[i] = data[i];
}
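
Used like so (hypothetical, matching the sketch above):

var bs = new exports.ByteString([104, 105]);
print(bs[0]); // 104 -- an ordinary property read, no [[Get]] hook required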


The property-not-found hook from ES5 (whatever it's called) would fix performance and mutables, provided it ever finds its way into a browser. (Actually, it's already there in Firefox, isn't it? Anybody know about Chrome?)

As well, your point that String and Array can substitute for ByteString and ByteArray under many circumstances also points to supporting [[Get]] and [[Put]].

Operators == and === are a little hairier, as Strings constructed with new are not == or ===, but without the new they are, because they are interned and unboxed. That, however, is not the case with Array -- arrays themselves are objects.

Strings are the special case here and the only JS type which behaves like that.
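
To illustrate that special case:

js> print(new String("abc") == new String("abc"), "abc" == "abc")
false true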

Maksim Lin

Dec 2, 2009, 5:28:25 PM
to comm...@googlegroups.com
Hi Kris,

Thanks for all your hard work on this.

I'm sorry to be quite late to the discussion, but I was wondering if
you are open to looking at what is being done by WebGL in terms of
binary types?

Vladimir Vukićević recently wrote a blog post nicely describing the types:
http://blog.vlad1.com/2009/11/06/canvasarraybuffer-and-canvasarray/

and he explicitly says that he thinks they have "...application
outside of WebGL".
So despite the name and the quite low-level nature of the API, perhaps
it would be useful to look at, since I think this is something that is
going to arrive very quickly in the browsers (it's already in the
WebKit and Firefox nightlies) and it would be nice for CommonJS to
stay consistent with browser JS in areas like this.

Of course, hopefully they will change the naming to something more
generic than WebGL*...

Maks.

Kris Kowal

Dec 2, 2009, 7:43:40 PM
to comm...@googlegroups.com
On Wed, Dec 2, 2009 at 2:28 PM, Maksim Lin <maksi...@gmail.com> wrote:
> and he explicitly says that he thinks they have "...application
> outside of WebGL".
> So despite the name and the quite low-level nature of the API,
> perhaps it would be useful to look at, since I think this is
> something that is going to arrive very quickly in the browsers (it's
> already in the WebKit and Firefox nightlies) and it would be nice for
> CommonJS to stay consistent with browser JS in areas like this.

Is there a link to API docs for WebGLArrayBuffer?

Kris Kowal

Maksim Lin

Dec 2, 2009, 8:06:24 PM
to comm...@googlegroups.com
Unfortunately none that I could quickly find :-(

The blog post seems about the best explanation I've seen so far.

I assume the specs are a bit light on detail at the moment because
they are essentially working on a JS binding for the existing OpenGL
ES 2.0 spec.

My primary reason for pointing to it was that, after reading the blog
post, I got the feeling it is a nice, simple, minimal API that could
easily be built upon and would be available in any browser supporting
WebGL.

Maks.

Maksim Lin

Feb 20, 2010, 5:03:37 AM
to comm...@googlegroups.com
I find it hard to keep up with all the discussion that happens here,
but if anyone is still interested, the WebGL draft spec was published
in December last year, and the relevant part of the spec in relation
to arrays and binary data is here:

https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/doc/spec/WebGL-spec.html#5.13

Maks.
