ByteArray: byteAt method

Aristid Breitkreuz

May 16, 2009, 5:32:45 PM
to serv...@googlegroups.com
Hi,

Right now, ByteArray only supports [] for element access. I think byteAt
(like in ByteString) would be a reasonable addition, because it gives a
reasonable subset of operations that work identically on both ByteString
and ByteArray (effectively creating an interface of Binary).

Can I add that method to the Wiki?
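For illustration only, here is a minimal JavaScript sketch of the idea. It uses toy ByteString and ByteArray classes backed by Uint8Array; these classes and the helper sumBytes are hypothetical stand-ins, not the Binary/B implementation. It assumes byteAt returns the byte's numeric value (or NaN when the offset is out of range), which this message does not settle.

```javascript
// Hypothetical stand-ins for the Binary/B types, backed by Uint8Array.
// Assumption: byteAt returns the numeric byte value (0-255), or NaN out of range.
class Binary {
  constructor(bytes = []) {
    this._bytes = Uint8Array.from(bytes);
  }
  get length() {
    return this._bytes.length;
  }
  // Shared accessor: behaves identically on every Binary subtype.
  byteAt(offset) {
    return offset >= 0 && offset < this._bytes.length ? this._bytes[offset] : NaN;
  }
}

// Mutability differences between the two types are not modelled here.
class ByteString extends Binary {}
class ByteArray extends Binary {}

// Generic code written only against the shared "Binary interface".
function sumBytes(binary) {
  let total = 0;
  for (let i = 0; i < binary.length; i++) total += binary.byteAt(i);
  return total;
}
```

Because both types share byteAt, sumBytes works on either one without any type checks.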

On a related note, specifying some methods to be on Binary directly
might not hurt either.

Kind regards

Aristid Breitkreuz

Kris Kowal

May 16, 2009, 5:49:46 PM
to serv...@googlegroups.com
On Sat, May 16, 2009 at 2:32 PM, Aristid Breitkreuz
<aristid.b...@gmx.de> wrote:
> Right now, ByteArray only supports [] for element access. I think byteAt
> (like in ByteString) would be a reasonable addition, because it gives a
> reasonable subset of operations that work identically on both ByteString
> and ByteArray (effectively creating an interface of Binary).
>
> Can I add that method to the Wiki?

Yeah, byteAt would be a fine addition, I think.

For the sake of genericity, I think that "charAt" was a mistake in the
earliest versions of JavaScript. A general purpose "get" function,
implemented wherever indexing [] is distinct from attribute selection,
would have been far more reusable and repurposable than a bunch of
functions named after the type that they return, like "charAt",
"byteAt", "numberAt", or so on. Since precedent is established, we
may as well run with "byteAt", but perhaps we can add the usual CRUD,
"get", "set", "has", and "del", to container types later. I would go
so far as to recommend more three-letter names for this purpose like
"cut" (get and del), and "put" (which would bump successive numeric
indices for types where position is important, but otherwise be
identical to "set"). I know that we're not in the business of
invention here, but I think it would be good to keep these names and
purposes in the back of our minds.
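A rough sketch of how those CRUD names might look on a positional, array-backed container. The List class is hypothetical, and the exact semantics (del shifting later items down, put inserting rather than overwriting) are assumptions drawn from the description above.

```javascript
// Hypothetical container: the method names come from the proposal above,
// but these semantics for a positional (array-backed) type are assumptions.
class List {
  constructor(items = []) {
    this.items = [...items];
  }
  get(i) {
    return this.items[i];
  }
  // set overwrites the item at i; nothing else moves.
  set(i, value) {
    this.items[i] = value;
  }
  has(i) {
    return Number.isInteger(i) && i >= 0 && i < this.items.length;
  }
  // Positional delete: later items shift down by one.
  del(i) {
    this.items.splice(i, 1);
  }
  // cut = get + del
  cut(i) {
    const value = this.get(i);
    this.del(i);
    return value;
  }
  // put inserts at i and bumps every later index up by one.
  put(i, value) {
    this.items.splice(i, 0, value);
  }
}
```

For a non-positional container (a map, say), put would simply behave like set.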

> On a related note, specifying some methods to be on Binary directly
> might not hurt either.

Perhaps you could mention some ideas here. I'll start off with the
suggestion that "fromCharCodes" would be useful where applicable.

Kris Kowal

Aristid Breitkreuz

May 17, 2009, 6:20:04 PM
to serv...@googlegroups.com
Hi,

I will have many more suggestions, but these I shall ask now.

First, I have the following suggestion: moving ByteArray.prototype.join
to ByteString.join (as a constructor method). This would allow joining
Arrays of ByteStrings together to a single ByteString (interleaved with
the delimiter), arguably a more significant use case than joining the
bytes of a ByteArray to a ByteString, interleaved with delimiters.

I even think that having no join at all would be better than
ByteArray.prototype.join, because... just take into account that it
would join _single bytes_ with a delimiter. So
ByteArray([1,2,3]).join(ByteString([5])) becomes
ByteString([1,5,2,5,3])... ByteArray is NOT an Array of ByteStrings...
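A minimal sketch of the proposed constructor-level join, using plain Uint8Array values in place of ByteStrings. The names joinBinaries and joinSingleBytes are hypothetical; the second function reproduces the criticised behaviour of joining the single bytes of a ByteArray.

```javascript
// Hypothetical sketch: Uint8Array stands in for ByteString.
// Concatenates whole byte sequences, inserting `delimiter` between neighbours.
function joinBinaries(parts, delimiter = new Uint8Array(0)) {
  const total =
    parts.reduce((n, p) => n + p.length, 0) +
    delimiter.length * Math.max(parts.length - 1, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  parts.forEach((part, i) => {
    if (i > 0) {
      out.set(delimiter, offset);
      offset += delimiter.length;
    }
    out.set(part, offset);
    offset += part.length;
  });
  return out;
}

// The behaviour being criticised: every single byte becomes its own "part".
function joinSingleBytes(bytes, delimiter) {
  return joinBinaries(Array.from(bytes, (b) => Uint8Array.of(b)), delimiter);
}
```

joinSingleBytes(Uint8Array.of(1, 2, 3), Uint8Array.of(5)) yields the bytes 1, 5, 2, 5, 3, matching the example above, while joinBinaries joins whole byte sequences.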


OK, second thing,

Kris Kowal wrote:

What would that do? And do you mean Binary.fromCharCodes, or
Binary.prototype.fromCharCodes or do you even actually mean ByteString?

Kris Kowal

May 17, 2009, 6:57:04 PM
to serv...@googlegroups.com
On Sun, May 17, 2009 at 3:20 PM, Aristid Breitkreuz
<aristid.b...@gmx.de> wrote:

> First, I have the following suggestion: moving ByteArray.prototype.join
> to ByteString.join (as a constructor method). This would allow joining
> Arrays of ByteStrings together to a single ByteString (interleaved with
> the delimiter), arguably a more significant use case than joining the
> bytes of a ByteArray to a ByteString, interleaved with delimiters.
>
> I even think that having no join at all would be better than
> ByteArray.prototype.join, because... just take into account that it
> would join _single bytes_ with a delimiter. So
> ByteArray([1,2,3]).join(ByteString([5])) becomes
> ByteString([1,5,2,5,3])... ByteArray is NOT an Array of ByteStrings...

Sure. ByteArray is effectively a specialized array of numbers. I
wonder whether Array.prototype.join could be augmented to return a
ByteString if it is composed of ByteStrings and accepts a ByteString
as its delimiter argument, or return a ByteArray if the delimiter and
components are ByteArrays. Very good catch. It would be a stretch to
make ByteArray.prototype.join behave analogously to
Array.prototype.join (coercing its items and delimiter to ByteString;
but how do you coerce a Number to a ByteString?).

>> [snip]


> What would that do? And do you mean Binary.fromCharCodes, or
> Binary.prototype.fromCharCodes or do you even actually mean ByteString?

I think String.fromCharCodes(arrayOfNumbers:Array) would be a good
complement to String.fromCharCode(charCode:Number). Thus:
String.fromCharCodes(string.charCodes()) and
String.fromCharCodes(array).charCodes() would complete a loop.
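A sketch of the proposed pair, written as standalone functions (hypothetical names fromCharCodes and charCodes) instead of additions to String, so nothing built in is modified. Both work on UTF-16 code units, the same units String.fromCharCode and charCodeAt use.

```javascript
// Hypothetical helpers; String.fromCharCodes and String.prototype.charCodes
// are not standard JavaScript.

// Array of UTF-16 code units -> string.
function fromCharCodes(codes) {
  // String.fromCharCode takes any number of arguments; apply it in chunks
  // to stay well under engine limits on argument count for long arrays.
  const CHUNK = 8192;
  let out = "";
  for (let i = 0; i < codes.length; i += CHUNK) {
    out += String.fromCharCode.apply(null, codes.slice(i, i + CHUNK));
  }
  return out;
}

// String -> array of UTF-16 code units, one per charCodeAt index.
function charCodes(string) {
  const codes = new Array(string.length);
  for (let i = 0; i < string.length; i++) codes[i] = string.charCodeAt(i);
  return codes;
}
```

fromCharCodes(charCodes(s)) returns s for any string s, which is the loop described above.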

I also noticed when integrating Jack with ByteString that it's
sometimes handy to call object.toByteString(charset) without regard
for whether the object is a String or ByteString. Just as toString()
is idempotent, you would want toByteString(charset) to be idempotent.
To that end, I think we should consider adding
ByteString.prototype.toByteString(charset) as a third form of that
method that, like the no-arg form, returns itself, ignoring the
charset argument. Thus,
string.toByteString('utf-8').toByteString('utf-8') ==
string.toByteString('utf-8').
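A sketch of an idempotent toByteString as a standalone function. Assumptions: Uint8Array stands in for ByteString, the function name is hypothetical, and only UTF-8 is supported because TextEncoder always encodes to UTF-8. Binary input is returned unchanged and the charset is ignored, as proposed.

```javascript
// Hypothetical sketch: Uint8Array stands in for ByteString, and only
// UTF-8 encoding is supported (TextEncoder always produces UTF-8).
function toByteString(value, charset = "utf-8") {
  // Already binary: return the same object and ignore charset.
  if (value instanceof Uint8Array) return value;
  if (typeof value === "string") {
    if (charset.toLowerCase() !== "utf-8") {
      throw new RangeError("this sketch only encodes UTF-8, not " + charset);
    }
    return new TextEncoder().encode(value);
  }
  throw new TypeError("expected a string or a byte sequence");
}
```

Returning the identical object means the idempotence can be checked with === instead of relying on == for ByteStrings.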

I am also in favor of a ByteString.prototype.toString(charset) form,
as an alias of ByteString.prototype.decodeToString(charset).
Number.prototype.toString([radix]) establishes a precedent in
JavaScript for that form. I also like that the specified
ByteString.prototype.toString() form returns a "[ByteString {length}]"
representation.

That brings up the issue, should ByteString.prototype.decodeToString()
without a charset argument use the system default charset? If no-one
objects, I would recommend that any proponent update the wiki.

https://wiki.mozilla.org/ServerJS/Binary/B

Another concern is that ByteString may not be comparable with == as
String is. If this is not implementable in all platforms, we should
add an equality comparison method. I recommend .eq(), .ne(), .lt(),
.le(), .gt(), and .ge(), for those purposes, in general, though I see
that the Java .equals() has some support in other APIs.

Kris Kowal

Ash Berlin

May 18, 2009, 4:20:27 AM
to serv...@googlegroups.com

On 17 May 2009, at 23:57, Kris Kowal wrote:
> That brings up the issue, should ByteString.prototype.decodeToString()
> without a charset argument use the system default charset? If no-one
> objects, I would recommend that any proponent update the wiki.

I think this would encourage sloppy programming - if you've got binary
data, you (the programmer) should get in the habit of knowing what
charset it is in if you want to turn it into a string of characters.
It's hard enough dealing with character sets without this behaviour in
my view.

>
> https://wiki.mozilla.org/ServerJS/Binary/B
>
> Another concern is that ByteString may not be comparable with == as
> String is. If this is not implementable in all platforms, we should
> add an equality comparison method. I recommend .eq(), .ne(), .lt(),
> .le(), .gt(), and .ge(), for those purposes, in general, though I see
> that the Java .equals() has some support in other APIs.

Hmmm you make a good point. I think these might be needed, and I'd
also suggest a compare that returns <= -1, 0, >= 1 for sorting purposes.


Aristid Breitkreuz

May 18, 2009, 8:53:01 AM
to serv...@googlegroups.com
Kris Kowal wrote:

> On Sun, May 17, 2009 at 3:20 PM, Aristid Breitkreuz
> <aristid.b...@gmx.de> wrote:
>
>
>> First, I have the following suggestion: moving ByteArray.prototype.join
>> to ByteString.join (as a constructor method). This would allow joining
>> Arrays of ByteStrings together to a single ByteString (interleaved with
>> the delimiter), arguably a more significant use case than joining the
>> bytes of a ByteArray to a ByteString, interleaved with delimiters.
>>
>> I even think that having no join at all would be better than
>> ByteArray.prototype.join, because... just take into account that it
>> would join _single bytes_ with a delimiter. So
>> ByteArray([1,2,3]).join(ByteString([5])) becomes
>> ByteString([1,5,2,5,3])... ByteArray is NOT an Array of ByteStrings...
>>
>
> Sure. ByteArray is effectively a specialized array of numbers. I
> wonder whether Array.prototype.join could be augmented to return a
> ByteString if it is composed of ByteStrings and accepts a ByteString
> as its delimiter argument, or return a ByteArray if the delimiter and
> components are ByteArrays.

No! Not on Array! Changing the behaviour of Array.prototype.join seems
like a very very very stupid idea to me. It could break expectations in
all sorts of ways, and it wouldn't really help clarity.

My proposal: join on the constructor of ByteString, taking an Array of
Binaries.

> [snip]


>
>
>>> [snip]
>>>
>> What would that do? And do you mean Binary.fromCharCodes, or
>> Binary.prototype.fromCharCodes or do you even actually mean ByteString?
>>
>
> I think String.fromCharCodes(arrayOfNumbers:Array) would be a good
> complement to String.fromCharCode(charCode:Number). Thus:
> String.fromCharCodes(string.charCodes()) and
> String.fromCharCodes(array).charCodes() would complete a loop.
>

I'm not sure how this is related to a module about bytes and binaries.

> I also noticed when integrating Jack with ByteString that it's
> sometimes handy to call object.toByteString(charset) without regard
> for whether the object is a String or ByteString. Just as toString()
> is idempotent, you would want toByteString(charset) to be idempotent.
> To that end, I think we should consider adding
> ByteString.prototype.toByteString(charset) as a third form of that
> method that, like the no-arg form, returns itself, ignoring the
> charset argument. Thus,
> string.toByteString('utf-8').toByteString('utf-8') ==
> string.toByteString('utf-8').
>

*blinks*

I'm not sure I follow, but it seems like it wouldn't create many problems.

> I am also in favor of a ByteString.prototype.toString(charset) form,
> as an alias of ByteString.prototype.decodeToString(charset).
>

And I'm against that! I'm really really really really against
overloading methods in semantically unrelated ways. If toString is
specified to return a debug representation, then toString(charset)
should not do something _completely different_!

> Number.prototype.toString([radix]) establishes a precedent in
> JavaScript for that form. I also like that the specified
> ByteString.prototype.toString() form returns a "[ByteString {length}]"
> representation.
>

But Number.prototype.toString() without radix does a semantically
_closely_ related thing!

> That brings up the issue, should ByteString.prototype.decodeToString()
> without a charset argument use the system default charset? If no-one
> objects, I would recommend that any proponent update the wiki.
>

If there should be a default, it should be specified by the spec.

> https://wiki.mozilla.org/ServerJS/Binary/B
>
> Another concern is that ByteString may not be comparable with == as
> String is. If this is not implementable in all platforms, we should
> add an equality comparison method. I recommend .eq(), .ne(), .lt(),
> .le(), .gt(), and .ge(), for those purposes, in general, though I see
> that the Java .equals() has some support in other APIs.
>

For symmetry reasons, I'd recommend Binary.eq/ne/lt/le/gt/ge and
Binary.compare(x1, x2), returning -1/0/1, all on the constructor of
Binary. x.eq(y) looks very weird IMHO.
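A sketch of those constructor-level functions, with Uint8Array standing in for any Binary and a plain object standing in for the Binary constructor. The lexicographic ordering (byte by byte, with a proper prefix sorting first) is an assumption; the thread does not specify an ordering.

```javascript
// Hypothetical sketch of constructor-level comparison functions;
// Uint8Array stands in for any Binary (ByteString or ByteArray).
const Binary = {
  // Lexicographic byte comparison returning -1, 0 or 1.
  // When one sequence is a proper prefix of the other, the shorter sorts first.
  compare(a, b) {
    const n = Math.min(a.length, b.length);
    for (let i = 0; i < n; i++) {
      if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
    }
    if (a.length === b.length) return 0;
    return a.length < b.length ? -1 : 1;
  },
  eq: (a, b) => Binary.compare(a, b) === 0,
  ne: (a, b) => Binary.compare(a, b) !== 0,
  lt: (a, b) => Binary.compare(a, b) < 0,
  le: (a, b) => Binary.compare(a, b) <= 0,
  gt: (a, b) => Binary.compare(a, b) > 0,
  ge: (a, b) => Binary.compare(a, b) >= 0,
};
```

Since compare returns -1/0/1 and does not use `this`, it can be passed directly to Array.prototype.sort; the six predicates are thin wrappers around it.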


We should drop the + operator from the spec, as it just can't be
implemented everywhere (I don't know if it actually can be implemented
anywhere, but it's certainly not possible with SpiderMonkey).

And what does valueOf do?

Wes Garland

May 18, 2009, 9:40:37 AM
to serv...@googlegroups.com
On Mon, May 18, 2009 at 8:53 AM, Aristid Breitkreuz <aristid.b...@gmx.de> wrote:

> Kris Kowal wrote:
>
>> Sure. ByteArray is effectively a specialized array of numbers. I
>> wonder whether Array.prototype.join could be augmented to return a
>> ByteString if it is composed of ByteStrings and accepts a ByteString
>> as its delimiter argument, or return a ByteArray if the delimiter and
>> components are ByteArrays.

Can you clarify, Kris? Are you suggesting that it parallel this: [1,2,3].join("hello") becomes "1hello2hello3" (except with ByteString and ByteArray)? That seems to be the sensible behaviour to me.

>> I think String.fromCharCodes(arrayOfNumbers:Array) would be a good
>> complement to String.fromCharCode(charCode:Number). Thus:
>> String.fromCharCodes(string.charCodes()) and
>> String.fromCharCodes(array).charCodes() would complete a loop.
>>
>
> I'm not sure how this is related to a module about bytes and binaries.

I have to agree with MisterN here. Kris, I thought we had agreed that the ServerJS standard library would not monkey-patch the standard classes? If that is what we want, we should come up with a better way for the programmer to do this. Can we export a method that we apply to Strings?

>> I also noticed when integrating Jack with ByteString that it's
>> sometimes handy to call object.toByteString(charset) without regard
>> for whether the object is a String or ByteString.

This is the same pattern I have wanted to solve by making ByteString:toString() automatically transcode when the encoding of the ByteString is known. Unfortunately, the encoding of the ByteString is not always known; for example, for those ByteStrings returned from [], charAt, slice, etc. BUT the encoding IS known when we're iterating over a text file, and it is also known in several flavours of the ByteString constructor.

>> Just as toString() is idempotent, you would want toByteString(charset) to be idempotent.

How can toByteString(charset) fail to be idempotent?

>> I am also in favor of a ByteString.prototype.toString(charset) form,
>> as an alias of ByteString.prototype.decodeToString(charset).
>>

I'm really strongly opposed to any toString() method that takes an argument. That is just a recipe for errors IMHO. You're right about the radix precedent, but I think that one's wrong too. ;)


>> Number.prototype.toString([radix]) establishes a precedent in
>> JavaScript for that form. I also like that the specified
>> ByteString.prototype.toString() form returns a "[ByteString {length}]"
>> representation.
>>

Enforcing "[ByteString {length}]" is something else I have a problem with. Spelling it that way actually makes it different from any other object in GPSEE. The default toString representation for an object without a meaningful toString() method in SpiderMonkey is "[object Classname]"; in GPSEE, Classname is augmented to show which module the class originated in. This is good debugging info.

I'm not sure what is gained by the current toString(), and under GPSEE it actually costs the developer. I have currently implemented it so that it returns "[object gpsee.module.binary.ByteString 10]". I think the spec should be loosened up to allow varying representations of debugging information.

> But Number.prototype.toString() without radix does a semantically
> _closely_ related thing!

>> That brings up the issue, should ByteString.prototype.decodeToString()
>> without a charset argument use the system default charset? If no-one
>> objects, I would recommend that any proponent update the wiki.
>>

> If there should be a default, it should be specified by the spec.

Not only that, but "system default charset" needs to be defined. Does that mean the current locale, or does it mean the JavaScript interpreter's default way of handling C strings? I would favour the former, as it's most likely to be the default on Rhino.

>> https://wiki.mozilla.org/ServerJS/Binary/B
>>
>> Another concern is that ByteString may not be comparable with == as
>> String is. If this is not implementable in all platforms, we should
>> add an equality comparison method. I recommend .eq(), .ne(), .lt(),
>> .le(), .gt(), and .ge(), for those purposes, in general, though I see
>> that the Java .equals() has some support in other APIs.
>>

These are good suggestions.

> For symmetry reasons, I'd recommend Binary.eq/ne/lt/le/gt/ge and
> Binary.compare(x1, x2), returning -1/0/1, all on the constructor of
> Binary. x.eq(y) looks very weird IMHO.

I'm not sure how meaningful comparing a ByteString to a ByteArray is, but the comparison operator is bang-on the money.

> We should drop the + operator from the spec, as it just can't be
> implemented everywhere (I don't know if it actually can be implemented
> anywhere, but it's certainly not possible with SpiderMonkey).

It's certainly possible with SpiderMonkey; you just have to modify the parser and implement operator overloading.

That said, you would no longer be programming in JavaScript.  I didn't see a "+" operator in the spec, but I agree that it does not belong there.

Wes


--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102