Binary D ready for review


Kris Kowal

Nov 25, 2009, 4:38:21 AM
to comm...@googlegroups.com
http://wiki.commonjs.org/wiki/Binary/D

This new Binary proposal is a significant redraft of Binary/B. I've
left Binary/B alone since there are implementations tracking it.

Some of the highlights:

* Reduction in ByteArray methods.
* Major simplification of [[Get]], valueAt, and byteStringAt.
* Addition of Daniel Friesen's Range object
* Bits as requested by many folks at jsconf
* Radix encoding for 2, 16, 32, 64.
* A lot more rationale for little choices, many of which catalog other
major changes

Kris Kowal

Ash Berlin

Nov 25, 2009, 7:43:57 AM
to comm...@googlegroups.com
Few points that came up from a quick scan:

* new ByteString(): throwing a type error is at odds with the behaviour of new String and new Array. Right now all we can do is return the 'boxed' type.
* [[Get]]: on ByteString returns a unary bytestring of length one. This still feels wrong to me - what the operation is doing is requesting a byte. Right now this behaviour makes the get operation nigh on useless: when would you ever want this over valueAt()?
* Content: should be not Writable, not Configurable, not Enumerable? Also why Caps?
* rfc4648: Guessing this is the default alphabet for toString radix operations, but I was thoroughly confused at first. (Just needs a "see also" link?)
* toSource: should it return the full content for a 2 MB blob?
* indexOf/lastIndexOf: what's the behaviour when the `byte` is longer than one byte? Is there no way of searching for a terminator sequence? (Think jpeg header for instance.)
* join: wording doesn't quite parse right - I'm not sure exactly what this is now doing.
* copy methods etc. should take a Range too?
* Should probably make UTF-16 a required encoding too.
* How would a Range from a BitString interact with a ByteArray#copy etc.?

-ash

Wes Garland

Nov 25, 2009, 11:17:58 AM
to comm...@googlegroups.com
Kris;

Thanks for putting this together!

First-Blush comments:

ByteString, ByteArray

 - If we're supporting base-32, why not base-64 and quoted-printable?  I've actually had to use the latter pair in real-world code with ByteStrings

 - Why are we ditching "new"?  
1. creating ByteStrings without the new operator this way conflicts with GPSEE's FFI-oriented cast syntax, where ByteString(thing)  makes a ByteString out of thing  (assuming thing is of a reasonable type for that).

2. I like "new", all the other classes are using it for instantiation.

3. This breaks any kind of reasonable compatibility between Binary/B and Binary/D

- I like where you're going w.r.t. enumerability on instance props.  Can we go whole hog and explicitly say "only bytes are enumerable"?  Then we can safely write code like

for (i in a)
  print(a[i][0]);


(own property sugar could be added with ES5 code to make this even safer, i.e. from script modifications to prototype)
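
The own-property guard mentioned above could look roughly like the following. This is only a sketch: a plain object stands in for a byte type (no Binary/D implementation is assumed), and `eachOwnIndex` and `FakeBytes` are hypothetical names used for illustration.

```javascript
// Sketch only: a plain object stands in for a ByteArray-like value.
// eachOwnIndex and FakeBytes are hypothetical, not part of any proposal.
var hasOwn = Object.prototype.hasOwnProperty;

function eachOwnIndex(collection, callback) {
    for (var key in collection) {
        // Skip anything inherited from a (possibly patched) prototype.
        if (hasOwn.call(collection, key)) {
            callback(key, collection[key]);
        }
    }
}

// Simulate script code that adds an enumerable property to a prototype.
function FakeBytes() {}
FakeBytes.prototype.leaked = "oops";

var a = new FakeBytes();
a[0] = 104;
a[1] = 105;

var seen = [];
eachOwnIndex(a, function (key, value) {
    seen.push(value);
});
// seen holds only the own values, 104 and 105
```

Without the guard, the loop would also visit "leaked", which is the hazard the "only bytes are enumerable" rule is meant to remove.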

- What's with Get and Put?  Why are methods starting with capital letters?  Do you mean reading and writing to []?

- What exactly is Content?  A reference to the constructor? What is the rationale for this? Are we trying to graft a type system onto CommonJS?

 - returning ByteString[x] for toSource is wrong.  toSource() should generate legal source code. Thus, something like require("moduleID").ByteString([x]) is more apropos (although I think it should include 'new')

Bit Types

 - Bit-endianness is almost certainly both superfluous and an anti-feature. AFAIK there is not a single processor on the market today which does not use little-endian bit order.  Even portable, low-level C code (like drivers and stuff) will assume little bit endian arch.

 - You have not touched at all on word endianness, which IS important.  This also means you need to work on word-size, and potentially think about mixed/middle endian syntax (ARM, PDP-11, ???) .

 - I do not see a justification for bits vs. bytes in the current proposal, unless you want to implicitly say "please use a smaller backing store" to the implementor. The fact that you made several thinkos copy-pasting from Byte to Bit types reinforces this to me. What are Bit types to be used for?  C gets by without a bit type, and it certainly does plenty of low-level work.

 - If we are really serious about supporting Bit types, bitwise operations must be supported. Shift, Roll, and, not, or, xor at a minimum I think.

Standard Class Monkey Patching

- I am still strongly opposed to implicit modifications of the standard classes happening when a CommonJS module is loaded!  If we want to patch standard classes, I believe explicit patching from the calling/using programmer should be used:

const binaryD = require("binaryD")
binaryD.patch(String, Array);

General Requirements

"None of the specified prototypes or augmentations to existing prototypes are enumerable."

I believe that statement makes it impossible to implement the String and Array monkey-patches in a pure ES3 environment. Although, [] might be ... tricky or impossible as well.
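
To illustrate the constraint: ES5 can mark a prototype addition as non-enumerable through Object.defineProperty, which ES3 lacks. The sketch below combines that with the explicit, opt-in patching style suggested above. The `patch` function, the method name, and the method body are placeholders chosen for illustration, not the behaviour Binary/D specifies.

```javascript
// Sketch only: an explicit, opt-in patch using ES5 Object.defineProperty.
// The method name and body are placeholders, not the Binary/D definitions.
function patch(StringConstructor) {
    Object.defineProperty(StringConstructor.prototype, "toByteString", {
        value: function () {
            // Placeholder: keep the low byte of each UTF-16 code unit.
            var bytes = [];
            for (var i = 0; i < this.length; i++) {
                bytes.push(this.charCodeAt(i) & 0xFF);
            }
            return bytes;
        },
        enumerable: false,   // keeps the method out of for-in loops
        writable: true,
        configurable: true
    });
}

patch(String);

var keys = [];
for (var key in new String("hi")) {
    keys.push(key);
}
// keys contains only the index properties "0" and "1"
```

In a pure ES3 engine the same assignment would have to be a plain `prototype.toByteString = ...`, which is enumerable and would show up in every for-in loop over a String object.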

Rationale

- The paragraph about Pack and Unpack is confusing to me. Are those Java methods or something?  Do we need to specify what this is not?

- Encoding methods -- on the one hand, we say "we're not doing codecs", on the other hand, we have base-32, custom alphabets, blah blah.

- Future proofing, instanceof -- I'm convinced that making instanceof work where possible is a good idea, although I'm not sure that working across the sandbox boundary is required. It definitely complicates the implementation, because it means that the only way to protect the prototype is to freeze it.

- Memory Optimization -- agree whole heartedly.  Many GPSEE operations are aware of "under the hood" memory allocation details of immutable ByteString-like types.  We will share underlying storage between related types (like a ByteString and a slice of it) and maintain weak references as appropriate, blah blah.  No COW yet, but it's also on the list.

Genericity

Why do Get, Put, Content start with capital letters?

Idempotence

Agreement on decodeToString.  Frankly, I just assign toString to decodeToString in the prototype in my current code anyhow.

Wes

--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102

Kris Kowal

Nov 25, 2009, 3:38:42 PM
to comm...@googlegroups.com
On Wed, Nov 25, 2009 at 4:43 AM, Ash Berlin
<ash_flu...@firemirror.com> wrote:

First off, there's some confusion about Get and Put and why they have
that case convention. I'd like to get this out of the way. These
correspond to [[Get]], [[Put]], and further [[Writable]],
[[Enumerable]], &c from the ES specification, for internal properties.
The wiki treats double square brackets as markup, so I resorted to
making these italic to distinguish them. I will probably use single
brackets in the future, unless I can find a cheap way to make them
non-linky with double brackets.

> Few points that came up from a quick scan:
> * new ByteString(): throwing a type error is at odds with the behaviour of
> new String and new Array. Right now all we can do is return the 'boxed' type

It is not at odds with String and new String; I opted to avoid
burdening people with implementations of boxed byte string objects,
which are a hazard and useless anyway, but throwing a type error
leaves room for it to eventually be specified without promoting
non-future-compatible code in the interim.

However, using the same behavior for ByteArray is at odds with new
Array and Array's behavior. I will fix this.

> * [[Get]]: on ByteString returns a unary bytestring of length one.

> This
> still feels wrong to me - what the operation is doing is requesting a byte.
> Right now this behaviour makes the get operation nigh on useless: when would
> you ever want this over valueAt()

It's not useless, any more than string[index] is useless. If it's not
useful to you, I think it's fine not to use it, but in my opinion it's
not fine to implement it any other way. valueAt gives you an out.

ByteString([0, 0, 0, 1])[0].copy(buffer, at)

> * Content: should be not Writable, not Configurable, not Enumerable? Also
> why Caps?

By the JavaScript case convention, "content" would imply the instance,
"Content" implies the type. "Content" is the type of what is returned
by [[Get]].

> * rfc4648: Guessing this is the default alphabet for toString radix
> operations, but I was thoroughly confused at first. (Just needs a "see also" link?)

I'll add a reference. This is for base32 radix encoding and it's not
the default. It is probably a point of discussion, but there are two
alphabet specifications for base32. In my opinion, Doug Crockford's
is more useful, so I made it the default and put this alternative on
the constructor object (which is more interoperable with other
languages like Python).

> * toSource: should it return the full content for a 2mb blob?

I think this use case would be a bug. I would prefer not to specify
an arbitrary boundary where the source representation of binary
collection switches from being evaluable to non-evaluable. I also
(personally) don't think it's a real problem if implementations choose
to deviate from the specification at some arbitrarily high size.
We're certainly not going to test that case in a
compliance suite.

> * indexOf/lastIndexOf: what's the behaviour when the `byte` is longer than
> one byte? Is there no way of searching for a terminator sequence? (Think
> jpeg header for instance.)

The specification is pretty clear on this, I think. It behaves as you
would expect it, finding the first or last occurrence of byte string.
I'll rename the variable to "bytes" to be less misleading (although it
tolerates a Number for a single-byte delimiter).
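
For illustration, the search behaviour described here can be sketched over plain arrays of numbers. `indexOfBytes` is a hypothetical helper, not the Binary/D method itself; it only assumes what is stated above: the needle may be a multi-byte run or a single Number.

```javascript
// Sketch only: plain arrays of numbers stand in for byte strings.
// indexOfBytes is a hypothetical helper, not the Binary/D method itself.
function indexOfBytes(haystack, needle, start) {
    // Tolerate a single Number as a one-byte needle.
    if (typeof needle === "number") {
        needle = [needle];
    }
    var from = start || 0;
    for (var i = from; i + needle.length <= haystack.length; i++) {
        var matched = true;
        for (var j = 0; j < needle.length; j++) {
            if (haystack[i + j] !== needle[j]) {
                matched = false;
                break;
            }
        }
        if (matched) {
            return i;
        }
    }
    return -1;
}

var data = [0x00, 0xFF, 0xD8, 0xFF, 0xE0, 0x10];
var markerAt = indexOfBytes(data, [0xFF, 0xD8]);  // two-byte JPEG start marker
var singleAt = indexOfBytes(data, 0xE0);          // single byte given as a Number
```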

> * join: wording doesn't quite parse right - I'm not sure exactly what this
> is now doing.

It's different. Might take another read, or read the rationale at the bottom.

> * copy methods etc should take a Range too?

Yeah, probably a good idea. I'll write it out and see if it's still
palatable when I'm done. Probably.

> * Should probably require UTF-16 as a required encoding too.

Sure.

> * How would a Range from a BitString interact with a ByteArray#copy etc?

I've opted to leave copy specified to only work with like types. I'll
have to specify a TypeError for those cases, unless someone has some
ideas for *clear* implicit conversions between those types.

I need to fix the Bit types and endianness. That is definitely a
relevant point.

Kris Kowal

Kris Kowal

Nov 25, 2009, 4:54:21 PM
to comm...@googlegroups.com
On Wed, Nov 25, 2009 at 8:17 AM, Wes Garland <w...@page.ca> wrote:

> ByteString, ByteArray
>
>  - If we're supporting base-32, why not base-64 and quoted-printable?  I've
> actually had to use the latter pair in real-world code with ByteStrings

We've got them. Look closely at toString(radix). The only reason
that base32 has its own alphabet is because there are two standard
alphabets for base32.

>  - Why are we ditching "new"?
> 1. creating ByteStrings without the new operator this way conflicts with
> GPSEE's FFI-oriented cast syntax, where ByteString(thing)  makes a
> ByteString out of thing  (assuming thing is of a reasonable type for that).

It's in the rationale. I'll fix ByteArray, but ByteString matches
String minus support for boxed types, which I presume you'll be
thankful not to have to implement since they wouldn't be useful in the
short term.

> 2. I like "new", all the other classes are using it for instantiation.

The intent is that ByteArray and ByteString need to be
future-compatible with native equivalents.

> - I like where you're going w.r.t. enumerability on instance props.  Can we
> go whole hog and explicitly say "only bytes are enumerable"?  Then we can
> safely write code like
>
> for (i in a)
>   print(a[i][0]);

I'm worried that specifying that kind of thing in great detail, and
other things that need to be true for an ECMA specification like
typeof ByteString() == "bytestring", will make it impossible to write
fully compliant prototypes in the short term. This might not be a big
deal. I'll see what I can do.

> - What's with Get and Put?  Why are methods starting with capital letters?
> Do you mean reading and writing to []?

They're italic to distinguish internal properties. I'll see what I
can do to make that more clear too.

> - What exactly is Content?  A reference to the constructor? What is the
> rationale for this? Are we trying to graft a type system onto CommonJS?

This is an idea from Daniel Friesen's proposal. He would probably be
better at explaining the use case, and whether this specification
fulfills his requirement.

>  - returning ByteString[x] for toSource is wrong.  toSource() should
> generate legal source code. Thus, something like
> require("moduleID").ByteString([x]) is more apropos (although I think it
> should include 'new')

Okay.

> Bit Types
>
>  - Bit-endianness is almost certainly both superfluous and an anti-feature.
> AFAIK there is not a single processor on the market today which does not use
> little-endian bit order.  Even portable, low-level C code (like drivers and
> stuff) will assume little bit endian arch.

I think it might have been an error to make endianness an issue for
Byte-Bit conversion. I'm going to review this.

>  - You have not touched at all on word endianness, which IS important.  This
> also means you need to work on word-size, and potentially think about
> mixed/middle endian syntax (ARM, PDP-11, ???) .

Endianness is covered implicitly for the IANA stuff. If someone could
write up an explanation of how to construct different endianness
charset identifiers, I would like to include that in the
specification. If you could explain what mixed endianness is, maybe I
can fit it in somewhere. I do not think, apart from that, endianness
comes into play with anything written so far, since there are no
word-level conversions yet. It's my intention to augment the Number
type to create byte and bit strings of various endiannesses.
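
As a rough illustration of what word-level conversion with a byte order could mean, the sketch below turns a 32-bit unsigned value into bytes in three orders. Plain arrays stand in for byte strings, and `uint32ToBytes` and its `order` argument are hypothetical. The "pdp" case shows the usual description of PDP-11 middle-endian order: two 16-bit words with the most significant word first, each word stored little-endian.

```javascript
// Sketch only: plain arrays stand in for byte strings; uint32ToBytes
// and its order argument are hypothetical names.
function uint32ToBytes(value, order) {
    var b3 = (value >>> 24) & 0xFF;  // most significant byte
    var b2 = (value >>> 16) & 0xFF;
    var b1 = (value >>> 8) & 0xFF;
    var b0 = value & 0xFF;           // least significant byte

    if (order === "little") {
        return [b0, b1, b2, b3];     // least significant byte first
    }
    if (order === "pdp") {
        return [b2, b3, b0, b1];     // high word first, each word little-endian
    }
    return [b3, b2, b1, b0];         // big-endian (network order) by default
}

var big = uint32ToBytes(0x0A0B0C0D, "big");        // 0A 0B 0C 0D
var little = uint32ToBytes(0x0A0B0C0D, "little");  // 0D 0C 0B 0A
var middle = uint32ToBytes(0x0A0B0C0D, "pdp");     // 0B 0A 0D 0C
```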

>  - I do not see a justification for bits vs. bytes in the current proposal,
> unless you want to implicitly say "please use a smaller backing store" to
> the implementor. The fact that you made several thinkos copy-pasting from
> Byte to Bit types reinforces this to me. What are Bit types to be used for?
> C gets by without a bit type, and it certainly does plenty of low-level
> work.

I'm hoping Brian Mitchell can chime in with some requirements for bit
data types. They come from the Erlang side of the world and come
highly recommended. I, having no experience with such things, can
only assume that the purpose is to make things like non-byte quantized
encodings, like base32, easier to implement using slice. I'm sure
you've noticed that non-power-of-2 radix are hard to implement.

>  - If we are really serious about supporting Bit types, bitwise operations
> must be supported. Shift, Roll, and, not, or, xor at a minimum I think.

I'm not sure. It might be adequate to do these operations with forEach and map.
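
A minimal sketch of what "doing these operations with map" might look like, assuming arrays of 0 and 1 as stand-ins for bit strings. The helper names are hypothetical, and the binary operation assumes both operands have the same length.

```javascript
// Sketch only: arrays of 0/1 stand in for bit strings.
// bitNot, bitXor and rotateLeft are hypothetical helper names.
function bitNot(bits) {
    return bits.map(function (bit) {
        return bit ? 0 : 1;
    });
}

function bitXor(a, b) {
    // Assumes a and b have the same length.
    return a.map(function (bit, i) {
        return bit ^ b[i];
    });
}

function rotateLeft(bits, count) {
    // Move the first `count` bits to the end.
    var n = bits.length ? count % bits.length : 0;
    return bits.slice(n).concat(bits.slice(0, n));
}

var inverted = bitNot([1, 0, 1]);
var mixed = bitXor([1, 1, 0, 0], [1, 0, 1, 0]);
var rolled = rotateLeft([1, 0, 0, 1], 1);
```

Whether this is "adequate" is the open question in the thread: it works, but each operation visits one bit at a time, whereas built-in operations could work on whole words of the backing store.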

> Standard Class Monkey Patching
>
> - I am still strongly opposed to implicit modifications of the standard
> classes happening when a CommonJS module is loaded!  If we want to patch
> standard classes, I believe explicit patching from the calling/using
> programmer should be used:
>
> const binaryD = require("binaryD")
> binaryD.patch(String, Array);

It's my intent that these would not be instantiated when the module is
loaded, just carried by the exports of the module. In this case the
module is just being used as a name space, not for the mechanics. The
specification does not indicate that these changes to base classes
should occur as a result of loading the module. Really, these ought
to be primordials and a rigorous implementation would implement them
fully in the engine, with the same diligence as the existing
primordials.

> General Requirements
>
> "None of the specified prototypes or augmentations to existing prototypes
> are enumerable."
>
> I believe that statement makes it impossible to implement the String and
> Array monkey-patches in a pure ES3 environment. Although, [] might be ...
> tricky or impossible as well.

It specifies that properties of the prototype are not enumerable.
That does not preclude enumerable instance properties.

> Rationale
>
> - The paragraph about Pack and Unpack is confusing to me. Are those Java
> methods or something?  Do we need to specify what this is not?

This is in the errata. I do not think it's important to define these
things yet. Pack, unpack, and calcsize are a byte-quantized unpacking
API supported by Perl, Ruby, PHP, and Python, at the very least.

> - Encoding methods -- on the one hand, we say "we're not doing codecs", on
> the other hand, we have base-32, custom alphabets, blah blah.

It says we aren't doing encoding-named methods, which distinguishes
this proposal from the first proposal. I'll try to make that more
clear.

> - Future proofing, instanceof -- I'm convinced that making instanceof work
> where possible is a good idea, although I'm not sure that working across the
> sandbox boundary is required. It definitely complicates the implementation,
> because it means that the only way to protect the prototype is to freeze it.

Yeah, agreed. I should clarify that the scope of this proposal is
intended to be implementable today with a certain degree of ease (so
we can build momentum and make claims like "I made a compliant
implementation") and also be ready for a future when there are more
rigorous implementations. I think that if you want to make an
implementation that presages a native specification, that would be
fine, but I don't want to make it a requirement for compliance this
year. Eventually, these should be true, to be in keeping with
precedent:

typeof ByteString() == "byteString"
typeof new ByteString() == "object"
ByteString() instanceof ByteString == false
new ByteString() instanceof ByteString == true
typeof ByteArray() == "object"
typeof new ByteArray() == "object"
ByteArray() instanceof ByteArray == true
new ByteArray() instanceof ByteArray == true
Object.prototype.toString.call(ByteString()) == "[object ByteString]"
Object.prototype.toString.call(ByteArray()) == "[object ByteArray]"
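
The precedent being followed is how String and Array already behave with and without new. The checks below use only standard built-ins, so they can be run in any engine to confirm the pattern the list above mirrors; the `checks` and `allHold` names are just for this illustration.

```javascript
// Existing behaviour of the built-in String and Array constructors.
var checks = [
    typeof String("a") === "string",
    typeof new String("a") === "object",
    (String("a") instanceof String) === false,
    (new String("a") instanceof String) === true,
    typeof Array() === "object",
    typeof new Array() === "object",
    (Array() instanceof Array) === true,
    (new Array() instanceof Array) === true,
    Object.prototype.toString.call(String("a")) === "[object String]",
    Object.prototype.toString.call(Array()) === "[object Array]"
];

var allHold = checks.every(function (ok) {
    return ok;
});
```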

> - Memory Optimization -- agree whole heartedly.  Many GPSEE operations are
> aware of "under the hood" memory allocation details of immutable
> ByteString-like types.  We will share underlying storage between related
> types (like a ByteString and a slice of it) and maintain weak references as
> appropriate, blah blah.  No COW yet, but it's also on the list.

Awesome. I hope this is satisfactory to Maciej from Apple.

> Agreement on decodeToString.  Frankly, I just assign toString to
> decodeToString in the prototype in my current code anyhow..

Oh, good. I expected this to be contentious.

Kris Kowal

Kris Kowal

Nov 26, 2009, 1:37:17 AM
to comm...@googlegroups.com
In the course of struggling to keep everything consistent while I edit
Binary/D, I've created a spreadsheet that shows a bird's-eye-view. It
shows methods, their signatures, which types implement them, and what
they return.

http://spreadsheets.google.com/ccc?key=0An5phhxDkYDPdEFoTnlsRWgyc0x0WkpidURWT3pSYnc&hl=en

Kris Kowal

Alexandre Morgaut

Nov 26, 2009, 4:34:57 PM
to CommonJS
Just to say I added the W3C File API Working Draft in Prior Arts

I think it should be looked at.
If a File API is made available on client side, most JavaScript
developers will first be aware of this one (as they are mostly
frontend developers or at least mostly use JavaScript on client side).
So it might be good if the server-side File API preserved the same
logic.

Wes Garland

Nov 26, 2009, 5:32:49 PM
to comm...@googlegroups.com
Alexandre;

On Thu, Nov 26, 2009 at 4:34 PM, Alexandre Morgaut <Alexandr...@4d.fr> wrote:
> Just to say I added the W3C File API Working Draft in Prior Arts
>
> I think it should be looked at.

I agree.

From a quick glance, I think it is easily implementable with fs-base, provided we implement a reasonable stream type (as was implied in the last major file proposal).

Wes

Daniel Friesen

Nov 30, 2009, 12:49:19 AM
to comm...@googlegroups.com
Kris Kowal wrote:
> On Wed, Nov 25, 2009 at 8:17 AM, Wes Garland <w...@page.ca> wrote:
>
>> - What exactly is Content? A reference to the constructor? What is the
>> rationale for this? Are we trying to graft a type system onto CommonJS?
>>
>
> This is an idea from Daniel Friesen's proposal. He would probably be
> better at explaining the use case, and whether this specification
> fulfills his requirement.
>
I've been using .contentConstructor rather than .Content;

It provides an abstract way of working with characters or bytes in
algorithms where you don't care which you are working with, as long as
the units are right. (Like streaming algorithms or whatnot).

Explaining in terms of Binary/B

// Where seq is a string or a blob
seq.length; // Gets the number of 1-unit parts (bytes/characters) inside
            // the sequence
seq.valueAt(index); // Gets you a single character/byte at an index
seq.codeAt(index); // Gets the numeric code at an index (character code, or
                   // 0-255 byte number)
seq.slice(begin, end); // Gets a subsequence from the sequence
seq.concat(seq2); // Concatenates two sequences together
seq.indexOf(seq2, [offset]); // Finds the location of a subsequence
                             // inside of a sequence
seq.split(seq2, [limit]); // Splits a sequence up into multiple sequences
                          // by a subsequence
seq.valueOf(); // In Binary/C+IO/B/Buffer this ties in with Buffer to
               // get the non-mutable version (a StringBuffer will return
               // a String, a BlobBuffer will return a Blob)
seq.contentConstructor.fromCode(code); // Returns a byte/character based
                                       // on its byte number or character code
seq.contentConstructor(...); // Conversion, cast toString or cast toBlob

There are a few sub patterns to that:
seq.contentConstructor(); // Returns an empty string or empty blob of the
                          // same type as the sequence
seq.contentConstructor.fromCode(0); // Returns the null character or byte
                                    // ("\0" or Blob([0])), useful for filling

Combined together with an abstract Buffer like IO/B/Buffer this makes
for a very flexible way of writing char/byte independent algorithms.
Which end up being relied on heavily when prototyping new things for
type independent streams from files, sockets and whatnot.
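
To make the idea concrete, here is a small sketch of a char/byte independent routine. Everything in it is a stand-in: a string and a plain array play the roles of String and Blob, and `TextContent`, `ByteContent` and `padTo` are hypothetical names, not part of Binary/B, Binary/D or IO/B. The routine only relies on `length`, `concat`, and the content constructor it is handed.

```javascript
// Sketch only: a string and a plain array stand in for String and Blob.
// TextContent, ByteContent and padTo are hypothetical names.
function TextContent(value) {
    return value === undefined ? "" : String(value);
}
TextContent.fromCode = function (code) {
    return String.fromCharCode(code);
};

function ByteContent(value) {
    return value === undefined ? [] : Array.prototype.slice.call(value);
}
ByteContent.fromCode = function (code) {
    return [code & 0xFF];
};

// Works on either kind of value: it never asks whether it is handling
// characters or bytes.
function padTo(value, Content, length) {
    var out = Content(value);
    var filler = Content.fromCode(0);
    while (out.length < length) {
        out = out.concat(filler);
    }
    return out;
}

var paddedText = padTo("ab", TextContent, 4);     // "ab" followed by two "\0"
var paddedBytes = padTo([1, 2], ByteContent, 4);  // [1, 2, 0, 0]
```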


Binary/D does include most of those abstraction methods (or variants
thereof; I'm not quite happy with naming and some return types, then
again I still can't comprehend Byte{String,Array}), however exclusion of
Buffer (a mutable type with abstract interface that can work on either
characters or bytes) frankly voids the purpose in the majority of use cases.

For a simple example, abstractly reading a fixed length of data from a
stream.
http://github.com/dantman/monkeyscript.lite/blob/master/src/bananas/os/io/Stream.js#L111-130

> ...
>> - I do not see a justification for bits vs. bytes in the current proposal,
>> unless you want to implicitly say "please use a smaller backing store" to
>> the implementor. The fact that you made several thinkos copy-pasting from
>> Byte to Bit types reinforces this to me. What are Bit types to be used for?
>> C gets by without a bit type, and it certainly does plenty of low-level
>> work.
>>
>
> I'm hoping Brian Mitchell can chime in with some requirements for bit
> data types. They come from the Erlang side of the world and come
> highly recommended. I, having no experience with such things, can
> only assume that the purpose is to make things like non-byte quantized
> encodings, like base32, easier to implement using slice. I'm sure
> you've noticed that non-power-of-2 radix are hard to implement.
>
>
I don't like the bit type either.
Java, Ruby, Python, C, Lisp, etc... all get by without any sort of bit
type.
Why do we need a low level bit type in such a high level language, when
the low level languages don't even have one?

And 90% of things people are going to write aren't going to have a use
for the bit type.
At the least the bit type should be a separate module, not required for
basic binary compliance.
>
> Kris Kowal
>

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

Kris Kowal

Nov 30, 2009, 1:47:52 AM
to comm...@googlegroups.com
On Sun, Nov 29, 2009 at 9:49 PM, Daniel Friesen
<nadir.s...@gmail.com> wrote:
> I've been using .contentConstructor rather than .Content;
> It provides an abstract way of working with characters or bytes in
> algorithms where you don't care which you are working with, as long as
> the units are right. (Like streaming algorithms or whatnot).

Let's see hands for .Content, .contentConstructor, or no such concept.

> // Where seq is a string or a blob
Aside: sequence and series have very specific mathematical
definitions. Let's avoid using them; they are not synonyms for string.

> Combined together with an abstract Buffer like IO/B/Buffer this makes
> for a very flexible way of writing char/byte independent algorithms.
> Which end up being relied on heavily when prototyping new things for
> type independent streams from files, sockets and whatnot.

> however exclusion of
> Buffer (a mutable type with abstract interface that can work on either
> characters or bytes) frankly voids the purpose in the majority of use cases.

My intent is that buffer types would be defined in terms of ByteArray,
BitArray, and Array at the IO layer.

> I don't like the bit type either.
> Java, Ruby, Python, C, Lisp, etc... all get by without any sort of bit
> type.

This is specious. We could all get by without cell phones. My
impression is that people-in-the-know from using Erlang have found
that they greatly simplify certain algorithms, and I believe them.
For example, encoding and decoding Hamming codes at the line level,
and base32 encoding would be greatly simplified if you did not have to
do complicated shifting to grab relevant slices of a byte string.

Furthermore, it would not make sense to tack bit types on at a later
layer, since byte and bit types should be inter-convertible with
toBit* and toByte* methods cross implemented.

I'll have a new draft up shortly.

Kris Kowal

Ash Berlin

Nov 30, 2009, 7:14:37 AM
to comm...@googlegroups.com

On 30 Nov 2009, at 06:47, Kris Kowal wrote:

> On Sun, Nov 29, 2009 at 9:49 PM, Daniel Friesen
> <nadir.s...@gmail.com> wrote:
>> I've been using .contentConstructor rather than .Content;
>> It provides an abstract way of working with characters or bytes in
>> algorithms where you don't care which you are working with, as long as
>> the units are right. (Like streaming algorithms or whatnot).
>
> Let's see hands for .Content, .contentConstructor, or no such concept.

i) .Content
ii) .contentConstructor
iii) no such concept

I'm still not entirely convinced by this, so iii) then ii) (Caps as an instance property which isn't a whole new class doesn't feel right to me)


>> I don't like the bit type either.
>> Java, Ruby, Python, C, Lisp, etc... all get by without any sort of bit
>> type.
>
> This is specious. We could all get by without cell phones. My
> impression is that people-in-the-know from using Erlang have found
> that they greatly simplify certain algorithms, and I believe them.
> For example, encoding and decoding Hamming codes at the line level,
> and base32 encoding would be greatly simplified if you did not have to
> do complicated shifting to grab relevant slices of a byte string.
>
> Furthermore, it would not make sense to tack bit types on at a later
> layer, since byte and bit types should be inter-convertible with
> toBit* and toByte* methods cross implemented.
>
> I'll have a new draft up shortly.
>
> Kris Kowal

One thing that I mentioned in IRC and I'm not sure if you picked up on or not: Bit*'s content should be Boolean, not a Number.

-ash

Kris Kowal

Nov 30, 2009, 2:10:55 PM
to comm...@googlegroups.com
On Mon, Nov 30, 2009 at 4:14 AM, Ash Berlin
<ash_flu...@firemirror.com> wrote:
> One thing that I mentioned in IRC and I'm not sure if you picked up on or not: Bit*'s content should be Boolean, not a Number.

That was my first idea; however, I think it would drive
inconsistency. Consider toSource():

require("binary").BitString([true, true, true, true, false, false, true, false])
require("binary").BitString([1, 1, 1, 1, 0, 0, 1, 0])

Using Number instead of Boolean wouldn't affect logical operations.

I'm assuming that the latter is more consistent with people's
expectations. In any case, let's put it up for hands.

A.) Boolean
B.) Number

Kris Kowal

Nathan Stott

Nov 30, 2009, 2:14:49 PM
to comm...@googlegroups.com
A) Boolean



Ash Berlin

Nov 30, 2009, 2:22:42 PM
to comm...@googlegroups.com
Hmm you might be right. But I guess it depends on what the point of the Content/contentConstructor property is.

Some of Daniel's use cases don't work on Number anyway: seq.contentConstructor.fromCode(0); (at least the /D spec doesn't have any fromCode specified, and we don't add anything to Number anyway).

So let's all pretend I'm stupid and I still don't get it: What's the point of this (Content) property?

-ash

Donny Viszneki

Nov 30, 2009, 2:28:10 PM
to comm...@googlegroups.com
On Mon, Nov 30, 2009 at 12:49 AM, Daniel Friesen
<nadir.s...@gmail.com> wrote:
> Kris Kowal wrote:
>> On Wed, Nov 25, 2009 at 8:17 AM, Wes Garland <w...@page.ca> wrote:
>>
>>> - What exactly is Content?  A reference to the constructor? What is the
>>> rationale for this? Are we trying to graft a type system onto CommonJS?
>>>
>>
>> This is an idea from Daniel Friesen's proposal.  He would probably be
>> better at explaining the use case, and whether this specification
>> fulfills his requirement.
>>
> I've been using .contentConstructor rather than .Content;
>
> It provides an abstract way of working with characters or bytes in
> algorithms where you don't care which you are working with, as long as
> the units are right. (Like streaming algorithms or whatnot).

<snip>

> seq.contentConstructor.fromCode(code); // Return a byte/character based
> on it's byte number or character code
> seq.contentConstructor(...); // Conversion, cast toString or cast toBlob;
>
> There are a few sub patterns to that:
> seq.contentConstructor(); // Return an empty string or empty blob of the
> same type of the sequence
> seq.contentConstructor.fromCode(0); // Return the null character or byte
> ("\0" or Blob([0])), useful for filling

This "contentConstructor" property sounds a lot like the "constructor" property.

--
http://codebad.com/

Daniel Friesen

Nov 30, 2009, 3:09:06 PM
to comm...@googlegroups.com
.contentConstructor is irrelevant to BitString. There is no
complementary unit smaller than a character for text.

As for BitString; If it is speced, imho it should accept both in input,
toSource should optimize with 0/1; the output of a .toArray or whatnot
is up for discussion.
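
A sketch of the "accept both on input, emit 0/1 in toSource" suggestion follows. `normalizeBits` and `bitsToSource` are hypothetical helpers, and the `require("binary").BitString([...])` text is taken from the example earlier in the thread; nothing here is specified by Binary/D.

```javascript
// Sketch only: normalizeBits and bitsToSource are hypothetical helpers.
function normalizeBits(values) {
    return values.map(function (value) {
        if (value === true || value === 1) {
            return 1;
        }
        if (value === false || value === 0) {
            return 0;
        }
        throw new TypeError("not a bit: " + value);
    });
}

function bitsToSource(values) {
    // Always emit the shorter 0/1 form, whatever the input used.
    return 'require("binary").BitString([' +
        normalizeBits(values).join(", ") + "])";
}

var source = bitsToSource([true, true, 1, 1, false, 0, true, 0]);
// source is 'require("binary").BitString([1, 1, 1, 1, 0, 0, 1, 0])'
```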

Ash Berlin

unread,
Nov 30, 2009, 3:12:47 PM11/30/09
to comm...@googlegroups.com

On 30 Nov 2009, at 20:09, Daniel Friesen wrote:

> Ash Berlin wrote:
>>
>> Hmm you might be right. But I guess it depends on what the point of the Content/contentConstructor property is.
>>
>> Some of Daniel's use cases don't work on Number anyway: seq.contentConstructor.fromCode(0); (at least the /D spec doesn't have any fromCode specified, and we don't add anything to Number anyway).
>>
>> So let's all pretend I'm stupid and I still don't get it: What's the point of this (Content) property?
>>
>> -ash
>>
> .contentConstructor is irrelevant to BitString. There is no
> complementary unit smaller than a character for text.

Everywhere or nowhere is my view. So what's the general-purpose point? And bear in mind your previous example didn't really make much sense to me.

-ash

Daniel Friesen

Nov 30, 2009, 3:17:13 PM
to comm...@googlegroups.com
Sure, on String and Blob instances directly. But .contentConstructor is
present on Buffers, Streams, and so on...

try {
    var buf = new Buffer(stream.contentConstructor);
    while (buf.length < len) {
        var chunk = stream.read(len - buf.length);
        if (!chunk.length) // nothing more to read
            break;
        buf.append(chunk);
    }
    buf.append(stream.contentConstructor.fromCode(0));
    return buf.valueOf();
} catch (e) {
    return stream.contentConstructor();
}

Wes Garland

Nov 30, 2009, 4:22:22 PM
to comm...@googlegroups.com
> For example, encoding and decoding Hamming codes at the line level,
> and base32 encoding would be greatly simplified if you did not have to
> do complicated shifting to grab relevant slices of a byte string.

For this application vis-a-vis the proposed spec, I don't see the difference between ByteArray and BitArray, with the exception that you are limiting BitArrays to only storing the numbers 0 and 1.

True?

Kris Kowal

Nov 30, 2009, 4:47:57 PM
to comm...@googlegroups.com
On Mon, Nov 30, 2009 at 1:22 PM, Wes Garland <w...@page.ca> wrote:
>> For example, encoding and decoding Hamming codes at the line level,
>> and base32 encoding would be greatly simplified if you did not have to
>> do complicated shifting to grab relevant slices of a byte string.
>
> For this application vis-a-vis the proposed spec, I don't see the difference
> between ByteArray and BitArray, with the exception that you are limiting
> BitArrays to only storing the numbers 0 and 1.

The interfaces are similar. The restriction of only using 0 and 1
presents opportunities for the backing to be more space efficient than
merely using the byte array backing. Also, because the bit array is
not guaranteed to be a multiple of eight in length, I did not opt to
support byte quantized encodings in the interface. Essentially, the
bit quantized data could make a nice abstraction for any algorithm
that would normally have to do complicated shifting and masking. For
example:

for (var i = 0, j = 0, ii = bits.length; i < ii; i += width, j++) {
    chars[j] = String.fromCharCode(alphabet[bits.slice(i, i + width).valueOf()]);
}

Instead of http://github.com/280north/narwhal/blob/master/lib/base64.js
and its ilk. That, I presume, is what is intended with bit quantized
data. Again, waiting for someone who knows Erlang to chime in about
what the real advantages are.
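
For readers who want to try the idea without a bit type, the loop above can be approximated with plain arrays of 0 and 1. This is only a sketch: `bytesToBits`, `bitsToNumber` and `encode` are hypothetical helpers, the alphabet is the RFC 4648 base32 one mentioned earlier in the thread, and the output omits the trailing "=" padding that RFC 4648 adds.

```javascript
// Sketch only: arrays of 0/1 stand in for a bit string; helper names
// are hypothetical. Output has no "=" padding.
var RFC4648_BASE32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

function bytesToBits(bytes) {
    var bits = [];
    for (var i = 0; i < bytes.length; i++) {
        for (var shift = 7; shift >= 0; shift--) {
            bits.push((bytes[i] >> shift) & 1);  // most significant bit first
        }
    }
    return bits;
}

function bitsToNumber(bits, width) {
    var value = 0;
    for (var i = 0; i < width; i++) {
        // A short final group is padded with zero bits on the right.
        value = value * 2 + (bits[i] || 0);
    }
    return value;
}

function encode(bytes, alphabet, width) {
    var bits = bytesToBits(bytes);
    var chars = [];
    for (var i = 0; i < bits.length; i += width) {
        var group = bits.slice(i, i + width);
        chars.push(alphabet.charAt(bitsToNumber(group, width)));
    }
    return chars.join("");
}

var encoded = encode([0x66, 0x6F, 0x6F], RFC4648_BASE32, 5);  // bytes of "foo"
```

The encoder contains no shifting or masking beyond the initial byte-to-bit expansion, which is the simplification being claimed for bit-quantized data.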

Kris Kowal