Re: [serverjs] Binary API Brouhaha (was File API Brouhaha)

Kris Kowal

Mar 12, 2009, 5:59:07 PM
to serv...@googlegroups.com
On Thu, Mar 12, 2009 at 2:23 PM, Tom Robinson <tlrob...@gmail.com> wrote:
> 1) Immutable content and size, like Strings. This would make things
> like slice very efficient, but creating binary objects would be
> difficult.
> 2) Immutable size, mutable content. Seems like a good compromise.
> 3) Mutable content and size, like Arrays. Hard to implement efficiently.
>
> I'd advocate 2.

I think Binary should mimic String as closely as possible, with a
minimum set of native methods to make it convenient to construct them,
from Strings with String.toBinary(encoding) and maybe even Array(of
integers).toBinary(width, endian) or Binary.fromArray(array, width,
endian). The encoding should be extensible. I recommend using
"require" under the hood to grab a module with an encode/decode
export.
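
A minimal sketch of what those constructors might look like. Everything here is an assumption for illustration: a Uint8Array stands in for the not-yet-named binary type, fromArray and toBinary are written as free functions, and the `encodings` table stands in for modules that would be loaded with "require".

```javascript
// Hypothetical helper: pack an array of unsigned integers into bytes.
// `width` is the number of bytes per integer; `endian` is "big" or "little".
function fromArray(array, width, endian) {
  const bytes = new Uint8Array(array.length * width);
  array.forEach((value, i) => {
    for (let j = 0; j < width; j++) {
      // Byte j, counted from the least significant end of the integer.
      const byte = Math.floor(value / Math.pow(256, j)) % 256;
      const offset = endian === "little" ? j : width - 1 - j;
      bytes[i * width + offset] = byte;
    }
  });
  return bytes;
}

// Hypothetical registry standing in for require()-loaded encoding modules.
// Each entry exports encode(string) -> bytes and decode(bytes) -> string.
const encodings = {
  ascii: {
    encode(string) {
      return Uint8Array.from(string, (ch) => {
        const code = ch.charCodeAt(0);
        if (code > 127) throw new RangeError("not ASCII: " + ch);
        return code;
      });
    },
    decode(bytes) {
      return String.fromCharCode(...bytes);
    },
  },
};

// Hypothetical free-function form of String.toBinary(encoding).
function toBinary(string, encoding) {
  const codec = encodings[encoding];
  if (!codec) throw new Error("unknown encoding: " + encoding);
  return codec.encode(string);
}
```

Adding an encoding then only means adding another object with the same encode/decode pair, which is what makes the scheme extensible.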

While we're at it, let's pick the paint colo[u]r:

A) ByteArray
B) ByteString
C) Bytes
D) Binary

Kris Kowal: 1 D

Ryan Dahl

Mar 12, 2009, 6:11:38 PM
to serv...@googlegroups.com

Daniel Friesen

Mar 12, 2009, 6:14:33 PM
to serv...@googlegroups.com
I think you're missing one. Good ole Blob.
IMHO Blob is the closest name to the immutable string.

Strings already have enough functionality to handle binary data once
you throw in a Pack interface. In fact, I wrote a FastCGI responder in
pure JS using nothing special but jslibs' Socket and Pack + Buffers
(currently Pack only handles buffers) to manipulate the Strings, which
were really binary data.
The only real argument against using native Strings to handle binary
data is that each /character/ is 16 bits, which wastes memory when the
data is 8-bit bytes.
So personally, I see nothing wrong with simply creating a Blob object
that behaves exactly like a native string but holds 8-bit bytes
instead of 16-bit /chars/, and throwing in a pack interface (which can
handle String and Blob input) for the really extreme binary stuff.

If at any time you really need to mutate a blob of binary data, then
just convert it into an array of Blobs and join them together later.
That is after all what we do with strings.
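
A rough sketch of that split-and-join pattern, assuming Uint8Array values as stand-ins for immutable Blobs. The joinChunks helper is hypothetical, playing the role that join("") plays for an array of strings.

```javascript
// Hypothetical join for byte chunks: concatenates them into one new value.
function joinChunks(chunks) {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset); // copy this chunk at the running offset
    offset += chunk.length;
  }
  return out;
}

// "Mutating" the middle byte without touching the original:
// split around it, substitute a new chunk, and join.
const original = Uint8Array.of(1, 2, 3, 4, 5);
const parts = [original.slice(0, 2), Uint8Array.of(99), original.slice(3)];
const edited = joinChunks(parts);
```

The cost is one copy per edit, the same trade-off string-building code already accepts.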

~Daniel Friesen (Dantman, Nadir-Seen-Fire)

Daniel Friesen

Mar 12, 2009, 6:18:42 PM
to serv...@googlegroups.com
jslibs too: http://code.google.com/p/jslibs/wiki/jslang#jslang::Blob_class

~Daniel Friesen (Dantman, Nadir-Seen-Fire)

Kris Kowal

Mar 12, 2009, 6:21:27 PM
to serv...@googlegroups.com
Correction:

A) ByteArray
B) ByteString
C) Bytes
D) Binary

E) Blob

I've also already ported Chiron's struct module, which includes pack,
unpack, calcSize, ord, and chr, and operates on Strings with the
<128 invariant. With the inclusion of one of the above, it could be
trivial to support byte arrays. Looks like there are two extant
implementations of Blob. That could be either a good reason to call
our implementation Blob or a good reason not to trample the existing
name.

Please show hands :-)

Kris Kowal

Kris Kowal

Mar 13, 2009, 2:21:19 AM
to serv...@googlegroups.com
On Thu, Mar 12, 2009 at 2:59 PM, Kris Kowal <cowber...@gmail.com> wrote:
> On Thu, Mar 12, 2009 at 2:23 PM, Tom Robinson <tlrob...@gmail.com> wrote:
>> 1) Immutable content and size, like Strings. This would make things
>> like slice very efficient, but creating binary objects would be
>> difficult.
>> 2) Immutable size, mutable content. Seems like a good compromise.
>> 3) Mutable content and size, like Arrays. Hard to implement efficiently.
>>
>> I'd advocate 2.

> While we're at it, lets pick the paint colo[u]r:


>
> A) ByteArray
> B) ByteString
> C) Bytes
> D) Binary
>
> Kris Kowal: 1 D
>

I'm switching my vote to 2 (fixed size, mutable content) D (Binary).
Fixed-size, mutable-content byte arrays would fit well with Python's
readinto(b) in PEP 3116.
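
To illustrate why a fixed-size, mutable buffer suits a readinto-style call, here is a minimal sketch. The makeSource function and its readInto method are hypothetical, not part of any proposed API, and a Uint8Array stands in for the byte array type.

```javascript
// Hypothetical in-memory byte source with a readinto-style method.
function makeSource(bytes) {
  let position = 0;
  return {
    // Fill `buffer` with as many bytes as are available.
    // Returns the number of bytes written, or 0 at end of data.
    readInto(buffer) {
      const count = Math.min(buffer.length, bytes.length - position);
      buffer.set(bytes.subarray(position, position + count));
      position += count;
      return count;
    },
  };
}

const source = makeSource(Uint8Array.of(10, 20, 30, 40, 50));
const scratch = new Uint8Array(2); // fixed size, mutable content
const chunks = [];
let n;
while ((n = source.readInto(scratch)) > 0) {
  // Only the first n bytes of the scratch buffer are valid after each call.
  chunks.push(Array.from(scratch.subarray(0, n)));
}
// chunks is now [[10, 20], [30, 40], [50]]
```

The same two-byte buffer is reused for every read, so the loop allocates nothing per call beyond the copies it chooses to keep.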

Kris Kowal 2 D

George Moschovitis

Mar 13, 2009, 3:10:36 PM
to serverjs
I like D (Binary) though Blob is acceptable too...

I cannot decide between 1 and 2; the pros and cons are not clear
(especially for 2).

-g.



Kris Kowal

Mar 14, 2009, 1:43:10 PM
to serv...@googlegroups.com
On Sat, Mar 14, 2009 at 9:23 AM, Jason Orendorff
<jason.o...@gmail.com> wrote:
> What difficulty, exactly, are you referring to? Offhand, I would
> implement a ByteArray object as a JSObject that has a length, a
> capacity, and a pointer to a malloc'd buffer. Given that
> implementation, I don't see what we would gain by banning
> length-changing operations. (I claim losing the capacity field is not
> a significant win.)

Perhaps we've set up a false dilemma. For each level of mutability
there are clear advantages and disadvantages. Here's a thought,
perhaps we should start with a Binary type and then proceed to specify
a Buffer and Blob later on:

Binary -> An immutable, fixed-length byte array with the String interface
  advantages:
  - can be coalesced for performance and, like a String, == would have
    the same behavior as ===
  - can be used in place of a String since it has all of the same
    invariants
Buffer -> A mutable, fixed-length byte array with the String interface
  and [index] assignment
  advantages:
  - can be used with a File.readInto method; could also support the
    byte IO API
  - handy for algorithms that fiddle with bits without a reallocation
Blob -> A mutable, resizable byte array with the String interface, plus
  splice and analogs to some Array operations
  advantages:
  - very flexible
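
The three levels of mutability could be sketched roughly as follows. The factory names makeBinary, makeBuffer, and makeBlob and their get/set methods are invented for illustration, and the String-like methods are left out; only the difference in what each type allows is shown.

```javascript
// Binary: immutable. Only reads are exposed, over a private copy.
function makeBinary(bytes) {
  const data = Uint8Array.from(bytes);
  return Object.freeze({
    length: data.length,
    get: (i) => data[i],
    slice: (start, end) => makeBinary(data.slice(start, end)),
  });
}

// Buffer: fixed length, mutable content.
function makeBuffer(length) {
  const data = new Uint8Array(length);
  return Object.freeze({
    length,
    get: (i) => data[i],
    set: (i, byte) => { data[i] = byte; },
  });
}

// Blob: mutable and resizable. Backed by a plain Array to borrow splice.
function makeBlob(bytes) {
  const data = Array.from(bytes);
  return {
    get length() { return data.length; },
    get: (i) => data[i],
    set: (i, byte) => { data[i] = byte & 0xff; },
    splice: (start, deleteCount, ...items) =>
      data.splice(start, deleteCount, ...items.map((b) => b & 0xff)),
  };
}
```

Each step down the list adds operations and gives up a guarantee: Buffer gives up stable content, and Blob gives up stable length as well.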

Kris Kowal

George Moschovitis

Mar 14, 2009, 2:04:10 PM
to serverjs
> there are clear advantages and disadvantages.  Here's a thought,
> perhaps we should start with a Binary type and then proceed to specify
> a Buffer and Blob later on:

interesting idea ;-)

-g.

Jason Orendorff

Mar 15, 2009, 12:32:13 AM
to serv...@googlegroups.com
On Thu, Mar 12, 2009 at 4:59 PM, Kris Kowal <cowber...@gmail.com> wrote:
> On Thu, Mar 12, 2009 at 2:23 PM, Tom Robinson <tlrob...@gmail.com> wrote:
>> 1) Immutable content and size, like Strings. This would make things
>> like slice very efficient, but creating binary objects would be
>> difficult.
>> 2) Immutable size, mutable content. Seems like a good compromise.
>> 3) Mutable content and size, like Arrays. Hard to implement efficiently.
>>
>> I'd advocate 2.
>
> I think Binary should mimic String as closely as possible, [...]

You forgot to say why you think this.

We need a mutable byte-array type: stream I/O requires buffers. You
could, alternatively, keep all the buffers hidden behind the scenes in
C. Given the JIT, I think it is better to expose the primitives
(mutable byte-arrays and readinto) and implement the rest in
JavaScript on top of those.

I have not seen anyone explain why we need immutable byte-strings with
String-like methods. I think there's a good reason *not* to do it
unless necessary: confusing bytes and text is something we should
discourage.

> While we're at it, lets pick the paint colo[u]r:
>
> A) ByteArray
> B) ByteString
> C) Bytes
> D) Binary

ByteString and ByteArray stand out here. ByteString strongly suggests
immutability. ByteArray strongly suggests "an Array of bytes":
growability, mutability, and Array-like methods. Names that obvious
are a wonderful thing.

Bytes is less clear, and plural type names are slightly odd. Binary
is even less clear to me, verging on vacuous--all computer data is
binary. Blob is right out (stay away from my I/O primitives, you
database freaks!). "Buffer" has been suggested too, but ByteArray
still wins big on clarity.

-j

Kris Kowal

Mar 16, 2009, 6:09:22 PM
to serv...@googlegroups.com
On Sat, Mar 14, 2009 at 9:32 PM, Jason Orendorff
<jason.o...@gmail.com> wrote:
>> I think Binary should mimic String as closely as possible, [...]
>
> You forgot to say why you think this.
>

Right. There are a few reasons. First, if it's a choice between
mimicking the Array API or the String API as a basis, the crux is the
type behavior of indexing. If a binary data type were like an Array,
indexing would return a "Byte", but if it were like a String, it would
return an object of the same type with a length of 1. I think that the
latter is appropriate, and establishes a basis for user expectations
of the type.

From that point, all the good reasons for the String being the way it
is apply to the binary type. For example, Strings are immutable and
have a fixed length. This means that a String is implementable as a
single pointer; if it were resizable, you would need a second layer of
indirection so you could swap the underlying allocation. It also means
that, as with other primitives like Number, equality and identity
behave the same way: 1 == 1, 1 === 1, "a" == "a" and "a" === "a". All
primitives in JavaScript are immutable, stateless, and support
identity equivalence. Objects do not: {} != {} and {} !== {}. This
also sets user expectation. Mutability would imply that identity
equivalence would not be supported.

Also, to avoid strange differences of behavior, the new binary type
could be implemented by factoring common code with the existing String
type. It's arguably the simplest solution. And if the binary type is
immutable, as String is, the implementation can optimize allocation by
coalescing allocations for identical values, like Python does with
module byte code.
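
The indexing and equality behavior described above can be checked directly in plain JavaScript. In this small sketch a Uint8Array stands in for a mutable byte array; that choice is an assumption for illustration, not a proposal.

```javascript
const text = "abc";
const list = [97, 98, 99];

// Indexing a String yields another String, of length 1.
const stringItem = text[0]; // "a"
// Indexing an Array yields the element itself, here a Number.
const arrayItem = list[0]; // 97

// Primitives with the same value are equal under both == and ===.
const primitivesEqual = "a" == "a" && "a" === "a" && 1 == 1 && 1 === 1; // true

// Objects compare by identity, so two byte arrays holding the same
// bytes are still not equal.
const a = Uint8Array.of(1, 2);
const b = Uint8Array.of(1, 2);
const objectsEqual = a === b; // false

// Comparing content takes an explicit element-by-element check.
const sameContent =
  a.length === b.length && a.every((byte, i) => byte === b[i]); // true
```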

This being said, the advantages of making mutable and flexible binary
types still apply.

Kris Kowal
