Hi, Kris!
Thanks for the updated document.
I'm still with some of the others on "new" throwing, as I outlined before. To wit, new Array and new String don't throw, and neither do those constructors in Binary/B.
Some of the fixes in this version, like require() on toString are observed and approved of. This document is also quite a bit less confusing (actually, I don't think it's confusing at all).
Oh! require("binary") -- this makes it impossible for a user to install, say, both Binary/B and Binary/D on the same system at once. I think require(X) where X is the literal version of module.id would be wiser wording.
Incidentally, I am working on a full COW system for BLOB-like things in GPSEE (we call them ByteThings) that provides a mechanism for making immutable-from-mutable without copies (unless you write to the mutable after creating the immutable). This will allow me, for example, to read a line from a file into a ByteArray, strip off the trailing newline, and yield a ByteString without copying. We already allow limited-scope use of mutable-as-immutable via call()/apply(), although that is not thread-safe. Conclusion: your observations re. backing-store memory in the specification are totally on-point.
(Maciej or others, if you want more info, let me know in another thread).
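To make the copy-on-write idea concrete, here is a minimal sketch in plain JavaScript. Everything in it (CowByteArray, its methods, the frozen view object) is a hypothetical name invented for illustration; it is not GPSEE's ByteThing implementation, and it uses Uint8Array purely as a convenient backing store.

```javascript
// Hypothetical sketch of copy-on-write; not the GPSEE ByteThing API.
class CowByteArray {
  constructor(size) {
    this.store = new Uint8Array(size); // backing store
    this.shared = false;               // true once an immutable view exists
  }

  get(index) {
    return this.store[index];
  }

  set(index, value) {
    if (this.shared) {
      // First write after sharing: copy, so existing views are unaffected.
      this.store = Uint8Array.from(this.store);
      this.shared = false;
    }
    this.store[index] = value;
  }

  // Yield an immutable view over the first `length` bytes, without copying.
  toByteString(length = this.store.length) {
    this.shared = true;
    const view = this.store.subarray(0, length); // shares the same buffer
    return Object.freeze({
      length: view.length,
      get: (index) => view[index],
    });
  }
}

// Read "hi\n" into a mutable array, then yield "hi" without the newline.
const line = new CowByteArray(3);
line.set(0, 0x68);
line.set(1, 0x69);
line.set(2, 0x0a);
const immutable = line.toByteString(2);

// Writing to the mutable afterwards triggers the one and only copy.
line.set(0, 0x48);
```

The copy happens only if the mutable side is written to after the immutable view was handed out, which matches the "unless you write to the mutable" caveat above.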
I still see huge problems with BitArray and BitString, aside from the fact that I do not believe they belong in *this* module:
- Bit endianness is meaningless
- no mechanism for shift, roll, or, and, not; these are key bit-wise operations
- no mechanism to break up into words
- no mechanism to populate from words
- note that concepts of "words" automatically carry concepts of endianness
- No need to throw type errors on e.g. toByteArray when not multiples of 8 IMO. Bit-oriented code is virtually always assumed to pad zero to MSb as needed
- Inverse of valueOf should have consistent bit-size (52 IMHO FWIW)
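As an illustration of the zero-padding point, here is a small sketch of converting a bit sequence whose length is not a multiple of 8 into bytes without throwing. The function name bitsToBytes is hypothetical, and the layout (bits[0] is the least significant bit, byte 0 holds bits 0 to 7) is an assumption made for the example, not something the draft specifies.

```javascript
// Hypothetical helper: pack an array of 0/1 values into bytes.
// Assumes bits[0] is the least significant bit. Bits missing from the
// final byte are left as zero, i.e. the value is zero-padded at the MSb.
function bitsToBytes(bits) {
  const bytes = new Uint8Array(Math.ceil(bits.length / 8));
  for (let i = 0; i < bits.length; i++) {
    if (bits[i]) {
      bytes[i >> 3] |= 1 << (i & 7);
    }
  }
  return bytes;
}

// Three bits (binary 101) fit in one byte; the upper five bits are zero.
const threeBits = bitsToBytes([1, 0, 1]);

// Ten set bits need two bytes; the second byte carries only two of them.
const tenBits = bitsToBytes([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]);
```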
And, why do I think "not in Binary/D"? Frankly, efficient implementation of some of those methods is non-trivial (lastIndexOf arbitrary bit sequence comes to mind), and yet, we NEED Binary for things like I/O. Complex pie-in-the-sky for a VERY select audience should not tie up implementation time for basic language building blocks.
Number.toBitString and Number.toBitArray should not be defined with big-bit-endian order. Should be little-bit-endian, such that 0x2 >> 1 == 0x1.
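One way to read the little-bit-endian request, sketched with plain arrays: index 0 holds the least significant bit, so dropping element 0 is the same as shifting right by one. The helpers numberToBits and bitsToNumber are hypothetical stand-ins for illustration, not the proposed Number.toBitArray.

```javascript
// Hypothetical helpers; index 0 is the least significant bit.
function numberToBits(n, width) {
  const bits = [];
  for (let i = 0; i < width; i++) {
    bits.push(Math.floor(n / 2 ** i) % 2);
  }
  return bits;
}

function bitsToNumber(bits) {
  return bits.reduce((acc, bit, i) => acc + bit * 2 ** i, 0);
}

const two = numberToBits(0x2, 8); // [0, 1, 0, 0, 0, 0, 0, 0]

// Dropping the bit at index 0 corresponds to 0x2 >> 1.
const shifted = bitsToNumber(two.slice(1));
```

With this ordering, bit index i always has weight 2^i, regardless of how wide the array is.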
"ByteString instances are comparable with the == and === operators based on equal order and respective values of their content."
Sorry, but that has to change. That is not implementable in ES3, I don't think it's implementable in ES5, and cannot be implemented with spidermonkey-latest without actually modifying the actual guts of the interpreter itself (=== is the hard one). I would be surprised if == or === were overloadable in v8. I also think (but am not sure) that this violates ES3. Same comments for BitString.
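For anyone who wants to see the problem directly: when both operands are objects, == and === compare identity, and script code has no hook to change that. FakeByteString below is a hypothetical stand-in, not any implementation's ByteString; the explicit equals() method is just one possible alternative wording a spec could adopt.

```javascript
// Hypothetical stand-in for a host-provided ByteString.
class FakeByteString {
  constructor(bytes) {
    this.bytes = Uint8Array.from(bytes);
  }

  // Content comparison has to be an ordinary method call.
  equals(other) {
    return (
      other instanceof FakeByteString &&
      this.bytes.length === other.bytes.length &&
      this.bytes.every((byte, i) => byte === other.bytes[i])
    );
  }
}

const a = new FakeByteString([1, 2, 3]);
const b = new FakeByteString([1, 2, 3]);

const looseEqual = a == b;       // false: two distinct objects
const strictEqual = a === b;     // false: two distinct objects
const contentEqual = a.equals(b); // true: same bytes in the same order
```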
BTW, can you get some of the people who claim Bit(String|Array) would be useful in here? I would like to see pseudo-code (or code in other languages) before thinking about giving the nod to these objects.
I think you should also be careful with sentences like "coerced internally with the Number constructor". I believe this rule is too strong for a specification; that is an implementation guideline. "coerced per the ToNumber rule defined in ES3 15.1.2.3.4" or some such is probably more reasonable.
Encoding: If we are doing base-64, why not quoted-printable? Why not uuencode? What's special about base-N that warrants including here?
Character Sets: Did I miss seeing this, or did you miss putting it in? Something like
Converting to String causes the bytes in the ByteArray|ByteString to be interpreted as though they were bytes in character charset, converted to UTF-16xE* with each UTF-16xE entity stored as an element in the String. This means that a String, composed of a single Unicode character which is represented in UTF-16 with a surrogate pair will have a .length property equal to 2.
*UTF-16xE means UTF-16BE on big-endian machines and UTF-16LE on little-endian machines. This is another way to express "Native-order UTF-16 with the BOM (Byte-Order Mark) eliminated". UTF-16BE and UTF-16LE are valid IANA character set names, defined in RFC 2781.
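A quick sketch of the surrogate-pair consequence. The decoder below is a hypothetical helper that assumes its input is already UTF-16BE (so no charset conversion is shown); it simply stores each 16-bit entity as one String element, which is the behaviour described above.

```javascript
// Hypothetical helper: treat the input as UTF-16BE and store each
// 16-bit code unit as one element of the resulting String.
function decodeUtf16BE(bytes) {
  let result = "";
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    result += String.fromCharCode((bytes[i] << 8) | bytes[i + 1]);
  }
  return result;
}

// U+1D11E (MUSICAL SYMBOL G CLEF) is D834 DD1E in UTF-16.
const clef = decodeUtf16BE([0xd8, 0x34, 0xdd, 0x1e]);

const unitCount = clef.length;          // 2: one character, two code units
const codePoint = clef.codePointAt(0);  // 0x1D11E
```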
I'll also go on record again here saying that I'm not comfortable with how the modifications to the standard classes are defined here. I would be happier if they were defined somewhere else, with pointers pointing here for details. Modifications to a group of classes should be documented together, not all over the place (although I personally favour no modifications to the standard classes which the programmer did not actually request).
FWIW, GPSEE will probably do something like this during standard-class initialization:
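String.prototype.toByteString = function(charset) {
  // First call: load the module and replace this thunk with the real method.
  String.prototype.toByteString = require("binary").String_toByteString;
  // Re-dispatch on the original string so that "this" is preserved.
  return this.toByteString(charset);
};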
I guess you could call this a Monkey-Thunk. (You saw it here first, folks!)
I'm not happy about having to execute extra code during engine start-up, but at least this prevents me from having to load and link another library when my script doesn't use the features. It also allows me to unload the library (if those features aren't used) if there are no references to it and memory pressure is high.
Wes
--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102