Alternate binary proposal

Daniel Friesen

Jul 30, 2009, 12:46:26 PM
to serv...@googlegroups.com
Anyone mind if I swipe the Binary/C page to make an alternate,
well-explained binary API proposal?

--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

Kris Kowal

Jul 30, 2009, 1:02:19 PM
to serv...@googlegroups.com
On Thu, Jul 30, 2009 at 9:46 AM, Daniel
Friesen<nadir.s...@gmail.com> wrote:
> Anyone mind if I swipe the Binary/C page to make an alternate well
> explained binary API proposal?

I've been hoping someone would be willing to do that. Go for it!

Kris

Daniel Friesen

Jul 30, 2009, 3:59:05 PM
to serv...@googlegroups.com
Ok, Binary/C proposal submitted.
https://wiki.mozilla.org/ServerJS/Binary/C

I actually wanted to have Blob and Buffer implemented in MonkeyScript
Lite before writing the proposal, but I guess I'd just be waiting too
long. I ran into some rough spots while working on them. I've gotten past
them, but there's still more implementation work to do, and I don't even
have the File API yet to do anything that makes sense with them anyway.
For what it's worth, these are the current implementations of Blob and
Buffer I am working on:
http://github.com/dantman/monkeyscript.lite/blob/master/src/common/org/monkeyscript/lite/NativeBlob.java
http://github.com/dantman/monkeyscript.lite/blob/buffer/src/common/org/monkeyscript/lite/NativeBuffer.java

Note that Buffer is in its own separate branch. When I rebase it into
master, that branch is going to disappear and that link will be invalid.
When that happens, this will be the correct link:
http://github.com/dantman/monkeyscript.lite/blob/master/src/common/org/monkeyscript/lite/NativeBuffer.java

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

Daniel Friesen

Jul 30, 2009, 6:32:59 PM
to serv...@googlegroups.com
Aristid Breitkreuz made a few notes to Binary/C (T_T Apparently I need
to rewrite the proposal to be more "standalone"), and there are two
points which could use a show of hands.

blob.byteAt(index);
A) Returns a blob of length=1
B) Returns a Number (byte) representing the byte.
Note that either way blob.valueAt(index); will still return a blob of
length=1. B) deviates from the pattern I had in mind, where
str.valueAt(idx) == str.charAt(idx); and thus
blob.valueAt(idx) == blob.byteAt(idx); but it doesn't break abstract
code, since abstract code uses .valueAt. Note that B) basically means
that blob.byteAt(idx) == blob.integerAt(idx);
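
To make the two options concrete, here is a minimal sketch. `SketchBlob`, `byteAtA` and `byteAtB` are hypothetical names invented for this illustration; they are not part of the Binary/C proposal, and the array-backed storage is only an assumption.

```javascript
// Hypothetical immutable byte sequence, backed by a plain array of numbers.
// `new` is optional, mirroring how String(...) can be called.
function SketchBlob(bytes) {
  if (!(this instanceof SketchBlob)) return new SketchBlob(bytes);
  this._bytes = bytes.slice();
  this.length = this._bytes.length;
}

// valueAt always returns a blob of length 1 (the generic counterpart of charAt).
SketchBlob.prototype.valueAt = function (idx) {
  return new SketchBlob([this._bytes[idx]]);
};

// Option A: byteAt behaves exactly like valueAt.
SketchBlob.prototype.byteAtA = function (idx) {
  return this.valueAt(idx);
};

// Option B: byteAt returns the byte as a Number.
SketchBlob.prototype.byteAtB = function (idx) {
  return this._bytes[idx];
};

var blob = SketchBlob([72, 105]);
var optionA = blob.byteAtA(1); // a SketchBlob of length 1
var optionB = blob.byteAtB(1); // the number 105
```

Under either option, generic code that sticks to `.valueAt` sees the same behavior.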

Buffer in what module:
A) Buffer inside require('binary');
B) Buffer inside require('io');
Ashb noted Buffer might fit better in an io module than a binary module.


Also note that I favored flexibility a bit over all-out strict
portability. I have two or three points inside that proposal that
accept that implementations may implement something extra, or
differently than other implementations, as long as the spec also
specifies a method of doing the same thing portably.
Namely:
* Implementations may make Blob and Buffer globals, as long as
require('binary') always works for portability.
* Implementations may optionally support blob[idx]; ie: Ideally if your
js interpreter supports the non-ecma str[idx] you should support
blob[idx] as well.
Should I pull that kind of stuff out?

Also, ashb says buf.clear(off, len); doesn't completely make sense.
Should I replace that with something else, such as buf.fill(off, len, seq);?

Kris Kowal

Jul 30, 2009, 10:33:56 PM
to serv...@googlegroups.com
The parts of this proposal that I like are:

* the commitment to genericity, the notion that Blob and String can be
interchangeable for some algorithms.
* to that end, I like the idea of making .valueAt the generic of
.charAt and .byteAt.
* generally, cutting back on features. a lot of primitive operations
can be written in terms of just memcopy.
* that Binary/C and Binary/B are very similar in spirit. I don't
think it escapes its stated purpose of divorcing Buffer from the
ideology of ByteArray, or Blob from ByteString, and I think this is a
good thing™.
* that this proposal makes explicit what methods are meant to be
generic between types. We could do more with this trend.

Adjustments I would make:
* integerAt and floatAt beg the inclusion of a lot of unpacking
functionality, like blobAt, pascalBlobAt, nullTerminatedBlobAt,
nullTerminatedStringAt(offset, charset), pascalStringAt and such, that
I feel ought to be deferred to a higher architectural layer, like a
"struct" module for unpacking opaque data from Blobs and Buffers.
* add a memcopy routine to Buffer, like buffer.copy(source, begin,
end, [sourceBegin]).
* favor (begin, end) for ranges, like slice notation, over (begin, length).
* we could probably get rid of reverse without consequence, while
we're cutting back to basics.
* BUT, if we require [] operator and .length, almost every function in
Array.prototype would work in Buffer.prototype. The only exceptions
are .concat and one other that escapes me at the moment.
* make ByteBuffer and StringBuffer separate types, or omit String
buffering, to simplify both usage and implementation.

In response to .clear, I think we should support:
* buffer.clear([begin=0, [end=length]])
* buffer.fill(begin, end, [value=0])
* buffer.copy(source, [begin, [end, [sourceBegin]]])
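
The three signatures above could be sketched as follows. This is only an illustration under stated assumptions: `SketchBuffer` is a hypothetical array-backed type, and the exact semantics of `copy` (copying from `source` starting at `sourceBegin` into the half-open range `[begin, end)` of the target) are a guess at the intent, not something the thread specifies.

```javascript
// Hypothetical mutable byte buffer backed by a plain array, zero-filled.
function SketchBuffer(length) {
  this._bytes = [];
  for (var i = 0; i < length; i++) this._bytes.push(0);
  this.length = length;
}

// fill(begin, end, [value=0]): set every byte in [begin, end) to value.
SketchBuffer.prototype.fill = function (begin, end, value) {
  if (value === undefined) value = 0;
  for (var i = begin; i < end; i++) this._bytes[i] = value;
  return this;
};

// clear([begin=0, [end=length]]): zero out [begin, end).
SketchBuffer.prototype.clear = function (begin, end) {
  if (begin === undefined) begin = 0;
  if (end === undefined) end = this.length;
  return this.fill(begin, end, 0);
};

// copy(source, [begin, [end, [sourceBegin]]]): memcopy-style routine.
// Assumed semantics: write into [begin, end) of this buffer, reading
// from source starting at sourceBegin, stopping when either side runs out.
SketchBuffer.prototype.copy = function (source, begin, end, sourceBegin) {
  if (begin === undefined) begin = 0;
  if (end === undefined) end = this.length;
  if (sourceBegin === undefined) sourceBegin = 0;
  for (var i = begin, j = sourceBegin; i < end && j < source.length; i++, j++) {
    this._bytes[i] = source._bytes[j];
  }
  return this;
};

var src = new SketchBuffer(4).fill(0, 4, 7); // 7,7,7,7
var dst = new SketchBuffer(6).fill(0, 6, 1); // 1,1,1,1,1,1
dst.copy(src, 1, 3); // overwrite indices 1 and 2
dst.clear(4);        // zero indices 4 and 5
```

All ranges here are (begin, end) pairs in the style of slice, as suggested in the adjustments list.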

My hands: blob.byteAt(offset)
A) return a Blob of length 1

Add something like:
blob.codeAt(offset) = blob.byteCodeAt(offset)
string.codeAt(offset) = string.charCodeAt(offset)
To parallel:
blob.valueAt(offset) = blob.byteAt(offset)
string.valueAt(offset) = string.charAt(offset)
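
A rough sketch of how these parallel names would let one routine handle both types. `SketchBlob` and `codes` are hypothetical, and patching `String.prototype` is done here only to illustrate the proposed aliases; it is not a recommendation for production code.

```javascript
// Hypothetical byte sequence; option A semantics for byteAt.
function SketchBlob(bytes) {
  if (!(this instanceof SketchBlob)) return new SketchBlob(bytes);
  this._bytes = bytes.slice();
  this.length = this._bytes.length;
}
SketchBlob.prototype.byteAt = function (offset) {
  return new SketchBlob([this._bytes[offset]]);
};
SketchBlob.prototype.byteCodeAt = function (offset) {
  return this._bytes[offset];
};

// The generic aliases suggested above.
SketchBlob.prototype.valueAt = SketchBlob.prototype.byteAt;
SketchBlob.prototype.codeAt = SketchBlob.prototype.byteCodeAt;
String.prototype.valueAt = String.prototype.charAt;     // illustration only
String.prototype.codeAt = String.prototype.charCodeAt;  // illustration only

// A generic routine: works on anything with .length and .codeAt.
function codes(seq) {
  var out = [];
  for (var i = 0; i < seq.length; i++) out.push(seq.codeAt(i));
  return out;
}

var fromString = codes("Hi");
var fromBlob = codes(SketchBlob([72, 105]));
```

Both calls walk the sequence through the same generic interface, without checking which type they were given.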

My hands: Buffer in what module:
A) Buffer inside require('binary') or
C) Buffer and Blob added to global

Buffer isn't actually a Stream. We have ByteIO and StringIO in
Narwhal's "io" module; they support the same API as file streams,
which is very different from the Buffer API. Buffer *is* actually a
byte array with a subset of the Array interface. I think this is
healthy.

I do not think we should leave unspecified whether Buffer and Blob are
available in the "binary" module or as free variables, or whether []
may or must be supported. Strict portability is a good thing. It's
hard to see that here when creating a narrow specification results in
so much argument, but our users will thank us. I'm indifferent about
whether we do these objects in modules, globals, or both.

I'm in favor of requiring [] to be supported. This will make our
lives difficult in Narwhal on Rhino, but in the long term, I'm hoping
that these types would be supported by Rhino out of the box. Until
then we would probably be partially non-compliant, but that's okay in
the short term.

I'd also like to hear what people think about Blob vs ByteString and
Buffer vs ByteArray. As far as I'm concerned, Binary/B and Binary/C
could use either pair of names without loss of accuracy.

I also think that we should pay attention to making generic routines
that make the Blob and Buffer interfaces interchangeable for some
algorithms too.

Kris

Daniel Friesen

Jul 30, 2009, 11:53:30 PM
to serv...@googlegroups.com
Kris Kowal wrote:
> The parts of this proposal that I like are:
>
> * the commitment to genericity, the notion that Blob and String can be
> interchangeable for some algorithms.
> * to that end, I like the idea of making .valueAt the generic of
> .charAt and .byteAt.
> * generally, cutting back on features. a lot of primitive operations
> can be written in terms of just memcopy.
> * that Binary/C and Binary/B are very similar in spirit. I don't
> think it escapes its stated purpose of divorcing Buffer from the
> ideology of ByteArray, or Blob from ByteString, and I think this is a
> good thing™.
> * that this proposal makes explicit what methods are meant to be
> generic between types. We could do more with this trend.
>
> Adjustments I would make:
> * integerAt and floatAt beg the inclusion of a lot of unpacking
> functionality, like blobAt, pascalBlobAt, nullTerminatedBlobAt,
> nullTerminatedStringAt(offset, charset), pascalStringAt and such, that
> I feel ought to be deferred to a higher architectural layer, like a
> "struct" module for unpacking opaque data from Blobs and Buffers.
>
I suppose that could be all right on one condition. If we do define
it as another module it should operate directly on the Blob itself.
jslibs had a Pack class inside of it that had readInt, readReal,
readString, writeInt, writeReal, writeString in it.
Pack only worked on a Buffer, not generically. Thus to read binary data
from a stream you had to:
var p = new Pack(new Buffer(stream));

This was fine up until I hit a big gotcha. p.write* would write to the
Buffer, not to the Stream. To write to a stream you would have to create
a new Buffer/Pack pair, write to it, read that into a String (even
though jslibs had a Blob, a lot of its binary methods extracted data
from a generic String instead), then write that to the stream.

That's made me a little paranoid about middlemen when it comes to
dealing with the source of binary data, and the interface for reading
and writing binary data.
Some time after that is around when I thought "Why Pack, why not just
put the binary reading methods on the instance we already have?"


> * add a memcopy routine to Buffer, like buffer.copy(source, begin,
> end, [sourceBegin]).
>

Hmmm... random thought...
buffer.append(buffer2);
An implied memcopy when you pass something like a Buffer to one of the
buffer methods, instead of calling valueOf first and then the method?


> * favor (begin, end) for ranges, like slice notation, over (begin, length).
>

Ah great... I forgot slice used begin,end. I was thinking every method
except str.substring used begin,length ranges.


> * we could probably get rid of reverse without consequence, while
> we're cutting back to basics.
>

Sure. I added it because A) java.lang.StringBuffer had it, and B) it's
probably fastest when done by the implementation rather than in usercode.
Other than that, if we pull it out it'll probably be just another one of
my stdlib extensions, like how Wrench.js adds .reverse() to String.


> * BUT, if we require [] operator and .length, almost every function in
> Array.prototype would work in Buffer.prototype. The only exceptions
> are .concat and one other that escapes me at the moment.
>

.join?


> * make ByteBuffer and StringBuffer separate types, or omit String
> buffering, to simplify both usage and implementation.
>

From what I've been through in Java, it looks like writing something
that works abstractly on char[]/byte[] could probably be done using an
interface or an abstract class, and using that from within NativeBuffer
to abstract the data manipulation actions.

I did leave splitting Buffer into two types or not up for discussion. I
suppose I'm happy as long as there is a simple idiom that'll let someone
create one of either type abstractly.
I had two idioms for that:

(Note: My Stream class has a .text property on it, which is a boolean
indicating if the stream is text or binary, thus indicating whether
.read() will return String or Blob)
function(stream) {
    var buf = new Buffer;
    buf.text = stream.text;
    ...
    return buf.valueOf();
}

And another:
function (absDataA, absDataB, absDataC) {
    var buf = new Buffer(absDataA.constructor);
    buf.append(absDataB);
    buf.append(absDataA);
    buf.append(absDataC);
    return buf.valueOf();
}

I actually have an example of something written in MonkeyScript meant to
use that (some of MonkeyScript Lite is theoretical implementation where
the stuff it depends on has not been written yet)

http://github.com/dantman/monkeyscript.lite/blob/master/src/bananas/io/Stream.js
Scroll down to Stream.prototype.yank

There were two main reasons I left the ambiguity with Blob being global
or in a module. I suppose one doesn't matter (I had thoughts about
mini-embedded languages which might want a way to handle binary data,
but which had nothing to do with the rest of ServerJS, like require()).

I consider Blob to be a counterpart to String: just like String, it is
another native type (it's basically just a String for binary data). Other
than an extreme desire to avoid initializing absolutely anything extra, I
see no reason for Blob not to be a global sitting alongside String.

In fact, I believe there was a topic about ECMA looking at what we come
up with for binary and perhaps standardizing it sometime in the future?
Do I recall that right?
If that ever did happen, Blob would become a native global. In that
case, even if we had decided to reject my little ambiguity and keep Blob
only inside of a binary module via require('binary').Blob;, the
ambiguity would come into existence anyway: the javascript interpreters
would now be the ones implementing Blob, and to be compatible with old
code we'd have to write a binary module that would do what I listed in
that spec: `exports.Blob = Blob;`.

Leaving that ambiguity and making portable code use require('binary')
was my compromise to those who still thought binary should be secluded
inside of a module.
"Whether they are made global or not [...] require('binary'); must
return an object containing Blob and Buffer as keys..."
ie: If Blob was in the binary module, then require('binary') would
require that module. If Blob was a global, then require('binary') would
return an object with Blob inside of it. Thus code using
require('binary') would be completely portable.
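
A small sketch of that compromise, under loud assumptions: `makeBinaryExports`, `SketchBlob`, `SketchBuffer`, `NativeBlob` and `NativeBuffer` are all hypothetical names, and `hostGlobal` is a plain object standing in for an interpreter's global object. It only illustrates the idea that the module's exports look the same whether or not the host exposes globals.

```javascript
// Module-private fallbacks, used when the host has no global Blob/Buffer.
function SketchBlob() {}
function SketchBuffer() {}

// Builds the object that require('binary') would return for a given host.
function makeBinaryExports(hostGlobal) {
  return {
    // Re-export the host's globals if present (the `exports.Blob = Blob;`
    // case), otherwise fall back to the module's own implementation.
    Blob: hostGlobal.Blob || SketchBlob,
    Buffer: hostGlobal.Buffer || SketchBuffer
  };
}

// Two pretend hosts: one with native globals, one without.
function NativeBlob() {}
function NativeBuffer() {}
var hostWithGlobals = { Blob: NativeBlob, Buffer: NativeBuffer };
var hostWithoutGlobals = {};

var binaryA = makeBinaryExports(hostWithGlobals);
var binaryB = makeBinaryExports(hostWithoutGlobals);
```

Code that only ever reads `Blob` and `Buffer` from the returned object runs unchanged on both hosts.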


My intention is to make Blob a global inside of MonkeyScript; I see no
reason to do anything other than that. If that's not an option in the
spec, then the global Blob will be a non-standard feature to ServerJS,
just like str[i] is a non-standard feature to ECMA.

A little side note while I'm on the topic.
For those that don't like the idea of loading Blob all the time.
Wouldn't it work to just define a global getter for Blob which when used
would initialize Blob, delete the getter, set the Blob global, and
return it?
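
The self-replacing getter described above might look roughly like this. It is a sketch, not MonkeyScript's implementation: `defineLazyGlobal` is a hypothetical helper, `env` is a plain object standing in for the global object, and `Object.defineProperty` is used where an older engine would need something like `__defineGetter__`.

```javascript
// Defines target[name] as a getter that initializes the value on first
// access, then replaces itself with a plain data property.
function defineLazyGlobal(target, name, init) {
  Object.defineProperty(target, name, {
    configurable: true, // must be configurable so the getter can be replaced
    enumerable: false,
    get: function () {
      var value = init();
      Object.defineProperty(target, name, {
        value: value,
        writable: true,
        configurable: true,
        enumerable: false
      });
      return value;
    }
  });
}

var env = {};       // stand-in for the global object
var initCount = 0;  // counts how often the initializer runs

defineLazyGlobal(env, "Blob", function () {
  initCount++;
  return function SketchBlob() {}; // hypothetical constructor
});

var first = env.Blob;  // triggers initialization
var second = env.Blob; // plain property read, no initialization
```

The cost of setting up Blob is paid only by code that actually touches it.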

> I'm in favor of requiring [] to be supported. This will make our
> lives difficult in Narwhal on Rhino, but in the long term, I'm hoping
> that these types would be supported by Rhino out of the box. Until
> then we would probably be partially non-compliant, but that's okay in
> the short term.
>

MonkeyScript Lite is MIT. You could swipe NativeBlob and NativeBuffer
out of it if you want.
git submodules, a common repo, and a jar can work nicely for
collaboration too.


> I'd also like to hear what people think about Blob vs ByteString and
> Buffer vs ByteArray. As far as I'm concerned, Binary/B and Binary/C
> could use either pair of names without loss of accuracy.
>
> I also think that we should pay attention to making generic routines
> that make the Blob and Buffer interfaces interchangeable for some
> algorithms too.
>
> Kris
>

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

Daniel Friesen

Aug 3, 2009, 6:22:11 PM
to serv...@googlegroups.com
Thanks for the cleanup of the proposal Kris Kowal.

I've started a show of hands page:
https://wiki.mozilla.org/ServerJS/Binary/C/Show_of_hands

I've fixed up the proposal a bit:
- Added the .codeAt, ... proposal
- Fixed slice
- Moved the unpacking related stuff into a subsection that notes it's
not part of the proposal (reference material for when we write a spec
for unpacking)
- Added buf.fill... I'll add detail later.

I'll fix Buffer in a bit. I'll probably set it up as Buffer,
StringBuffer, and BlobBuffer. StringBuffer and BlobBuffer will be the
individual implementations. Instances of both should work with `buf
instanceof Buffer` and provide a .text boolean. new Buffer will accept
String or Blob (the constructors), a string or blob, or an object with a
.text property. (Thus if streams support .text, then
var buf = new Buffer(stream); will return a buffer of the appropriate
type.)

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

Daniel Friesen

Aug 4, 2009, 1:39:04 PM
to serv...@googlegroups.com
- Two more sections on the show of hands page.
- I've cleaned up Buffer to use separate classes now.

I'm thinking of dropping the .text boolean idiom. I have a better one.
I'm assuming .type for now; the discussion I want is about what the best
name for the property is.
"foo".type === String
Blob([0,0,0]).type === Blob
stringbuffer.type === String
blobbuffer.type === Blob
textstream.type === String
binarystream.type === Blob

You can see where I'm going with this?

The idea for this idiom is to have a property on instances of things
like strings, blobs, streams, buffers, etc., which all fall into the
"Text, or Binary?" category.
It will return either the basic "String" function, or the "Blob" function.
This is an extension of that `new Buffer("foo".constructor);` idiom.
This way `new Buffer(stream.type);` will create an instance of either
StringBuffer or BlobBuffer, depending on whether the stream returns
strings or blobs from .read;
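
A minimal sketch of the idiom, with everything named here being an assumption: `SketchBlob` stands in for Blob, `makeBuffer(type)` stands in for `new Buffer(type)`, and `fakeStream` is an in-memory stand-in for a stream whose `.read()` returns `null` at EOF. The property is called `.type` only because that is the working name in this message.

```javascript
// Hypothetical Blob stand-in; `new` is optional.
function SketchBlob(bytes) {
  if (!(this instanceof SketchBlob)) return new SketchBlob(bytes);
  this.bytes = (bytes || []).slice();
  this.length = this.bytes.length;
}

// Text buffer: collects strings.
function StringBuffer() { this.parts = []; }
StringBuffer.prototype.type = String;
StringBuffer.prototype.append = function (text) {
  this.parts.push(String(text));
  return this;
};
StringBuffer.prototype.valueOf = function () { return this.parts.join(""); };

// Binary buffer: collects bytes from blobs.
function BlobBuffer() { this.bytes = []; }
BlobBuffer.prototype.type = SketchBlob;
BlobBuffer.prototype.append = function (blob) {
  this.bytes = this.bytes.concat(blob.bytes);
  return this;
};
BlobBuffer.prototype.valueOf = function () { return SketchBlob(this.bytes); };

// Stand-in for `new Buffer(type)`: picks the implementation from the type.
function makeBuffer(type) {
  if (type === String) return new StringBuffer();
  if (type === SketchBlob) return new BlobBuffer();
  throw new TypeError("type must be String or SketchBlob");
}

// In-memory stream stub; read() returns null once the chunks run out.
function fakeStream(type, chunks) {
  var i = 0;
  return {
    type: type,
    read: function () { return i < chunks.length ? chunks[i++] : null; }
  };
}

// Generic code: never checks whether it is handling text or binary.
function readAll(stream) {
  var buf = makeBuffer(stream.type);
  var chunk;
  while ((chunk = stream.read()) !== null) buf.append(chunk);
  return buf.valueOf();
}

var text = readAll(fakeStream(String, ["ab", "cd"]));
var binary = readAll(fakeStream(SketchBlob, [SketchBlob([1]), SketchBlob([2, 3])]));
```

`readAll` returns a primitive string for the text stream and a `SketchBlob` for the binary one, with the choice driven entirely by `stream.type`.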

My only question, is what should the property be named? type,
primitive... primitiveType, .typeOf, ...?

Daniel Friesen

Aug 4, 2009, 1:41:09 PM
to serv...@googlegroups.com
Ack... sorry, I've been mixing two names up. All my previous references
to Aristid Breitkreuz were actually to Ash Berlin. I mixed them up
because Ash goes by "ashb" in IRC and Aristid had the closest name I
could find in recent messages.

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

Daniel Friesen wrote:
> Aristid Breitkreuz made a few notes (T_T Apparently I need to rewrite
> the proposal to be more "standalone"), and there are two points which
> could use a show of hands.
>
> blob.byteAt(index);
> A) Returns a blob of length=1
> B) Returns a Number (byte) representing the byte.
> Note that either way blob.valueAt(index); will still return a blob of
> length=1, B) deviates from the str.valueAt(idx) == str.charAt(idx);
> thus blob.byteAt(idx) == blob.valueAt(idx); pattern I had in mind but
> doesn't break abstract code since abstract code uses .valueAt; Note
> that B) basically means that blob.byteAt(idx) == blob.integerAt(idx);
>
> Buffer in what module:
> A) Buffer inside require('binary');
> B) Buffer inside require('io');
> Ashb noted Buffer might fit better in an io module than a buffer module.
>
>
> Also note that I thought a bit more about flexibility than all out
> strict portability. I have two or three points inside that proposal
> that accept implementations may implement something extra or
> differently than other implementations, as long as the spec also
> specifies a method of doing the same thing portably.
> Namely:
> * Implementations may make Blob and Buffer globals, as long as
> require('binary') always works for portability.
> * Implementations may optionally support blob[idx]; ie: Ideally if
> your js interpreter suports the non-ecma str[idx] you should support
> blob[idx] as well.
> Should I pull that kind of stuff out?
>
> Also ashb says buf.clear(off, len); doesn't completely make sense,
> should I replace that with something else... buf.fill(off, len, seq); ?
>
> ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
>

Kris Kowal

Aug 4, 2009, 2:17:07 PM
to serv...@googlegroups.com
On Tue, Aug 4, 2009 at 10:39 AM, Daniel
Friesen<nadir.s...@gmail.com> wrote:
>
> - Two more sections on the show of hands page.
> - I've cleaned up Buffer to use separate classes now.
>
> I'm thinking of dropping the .text boolean idiom. I have a better one.
> Assuming .type for now the discussion I want is on what the best name
> for the property is.
> "foo".type === String
> Blob([0,0,0]).type === Blob
> stringbuffer.type === String
> blobbuffer.type === Blob
> textstream.type === String
> binarystream.type === Blob
>
> You can see where I'm going with this?

Now I do. This is similar to the C++ idiom with collection::iterator
generics, which are necessary since declaring an iterator requires
knowledge of the contained type…in C++. I don't need to see an
example to believe they would be useful in JavaScript for other
generic algorithms that operate on different types of buffers and
streams. To that end, I would not resist introducing a member like
this to both buffers and streams.

I think "type" is misleading: it implies that it's the collection's own
type, not the type of its content. Generally, I like to use TitleCase
for constructors, whether they're members or globals, but
".constructor" is not consistent with my idiom, so take it or leave
it. I would pick one of "Content" or "Value", falling back to
"contentConstructor" or "contentPrototype" &c if none of those are
palatable, or "content", "value" if none of those were even palatable.
I considered "Unit" and "Element", but those would convey the idea of
"Character" and "Byte" instead of "String" and "Blob" respectively,
which would not be applicable.

Summarily, I think we should support these idioms:

var content = anyBufferOrStream.Content(); [1]
anyBufferOrStream.Content().anyGenericMethod(…);
anyBufferOrStream.Content.anyGenericClassProperty…;
anyBufferOrStream.Content.prototype.anyGenericMethod.apply(anyBufferOrStream,
…);

By requiring Streams and Buffers to include a "Content" attribute that
is exactly either "String" or "Blob".

Kris Kowal

[1] a nuance here: we can't support a generic "new
anyBufferOrStream.Content()" because in some cases "Content" will be
"String", in which case "new String()" would have boxing semantics. I
don't believe we've discussed whether "new" will be necessary for
"Blob" construction, but in this case, "new" must not be necessary to
work generically for the "String" case. To this end, we would either
have to make the "new" optional in "new Blob()" or we would need to
make anyBufferOrStream.Content() internally construct a "Blob()", and
set its prototype attribute to that of Blob.prototype. I think that
the former solution is more elegant.
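
The boxing nuance can be seen directly in standard JavaScript; the Blob half of this sketch is an assumption. `SketchBlob` is a hypothetical constructor written so that `new` is optional, and `emptyContent` is a hypothetical generic helper that calls `Content()` without `new`.

```javascript
// Standard behavior: `new String` boxes, plain `String(...)` does not.
var boxed = new String("abc"); // String wrapper object
var plain = String("abc");     // primitive string

// Hypothetical Blob whose constructor works with or without `new`.
function SketchBlob(bytes) {
  if (!(this instanceof SketchBlob)) return new SketchBlob(bytes);
  this.bytes = (bytes || []).slice();
  this.length = this.bytes.length;
}

// Generic construction: no `new`, so it is safe for String and SketchBlob.
function emptyContent(Content) {
  return Content();
}

var emptyText = emptyContent(String);     // primitive empty string
var emptyBlob = emptyContent(SketchBlob); // SketchBlob instance of length 0
```

Calling `Content()` without `new` yields a primitive string in the String case and a proper instance in the Blob case, which is why making `new` optional on Blob keeps the idiom generic.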

Wes Garland

Aug 4, 2009, 4:14:10 PM
to serv...@googlegroups.com
> textstream.type === String
> binarystream.type === Blob

> You can see where I'm going with this?

If you're going somewhere that involves instanceof checks, you need to be aware that there can be different classes with the same name which behave exactly the same, such as when String is used inside a secure sandbox.
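
One way to observe this, assuming a Node.js host (which postdates this thread) and its `vm` module as a stand-in for a sandbox: each context gets its own `String`, so identity and `instanceof` comparisons across contexts fail even though the objects behave alike.

```javascript
var vm = require("vm");

// The String constructor from a separate context (a different "realm").
var SandboxString = vm.runInNewContext("String");

// A boxed string created inside that other context.
var sandboxBoxed = vm.runInNewContext('new String("x")');

// Same name, same behavior, but not the same function object.
var sameConstructor = SandboxString === String;

// instanceof compares against this context's String.prototype, so it fails.
var passesInstanceof = sandboxBoxed instanceof String;

// A brand check that does not depend on constructor identity.
var brand = Object.prototype.toString.call(sandboxBoxed);
```

Here `sameConstructor` and `passesInstanceof` are both false, while `brand` is `"[object String]"`, so checks that rely on constructor identity need care when values cross sandbox boundaries.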

Wes
--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102

Daniel Friesen

Aug 4, 2009, 6:00:15 PM
to serv...@googlegroups.com
Just a little pseudocode. I was referring to what the property would
contain; there isn't too much need to do that kind of type checking. The
property is intended for use by code that works abstractly with either
strings or bytes.
Like my .yank(len) function, which works like .read(len) but buffers
until len or EOF is hit.
http://github.com/dantman/monkeyscript.lite/blob/master/src/bananas/io/Stream.js

Except this part of the code:

var buf = new Buffer();
buf.text = this.text;

Would actually be under this new idiom:

var buf = new Buffer(this.type);

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

Daniel Friesen

Aug 4, 2009, 6:22:13 PM
to serv...@googlegroups.com
I was thinking of the type not so much as a constructor, but rather just
as an indicator. With that in mind I'm leaning more toward one of those
fallbacks.

> Summarily, I think we should support these idioms:
>
> var content = anyBufferOrStream.Content(); [1]
> anyBufferOrStream.Content().anyGenericMethod(…);
> anyBufferOrStream.Content.anyGenericClassProperty…;
> anyBufferOrStream.Content.prototype.anyGenericMethod.apply(anyBufferOrStream,
> …);
>
> By requiring Streams and Buffers to include a "Content" attribute that
> is exactly either "String" or "Blob".
>
> Kris Kowal
>
> [1] a nuance here: we can't support a generic "new
> anyBufferOrStream.Content()" because in some cases "Content" will be
> "String", in which case "new String()" would have boxing semantics. I
> don't believe we've discussed whether "new" will be necessary for
> "Blob" construction, but in this case, "new" must not be necessary to
> work generically for the "String" case. To this end, we would either
> have to make the "new" optional in "new Blob()" or we would need to
> make anyBufferOrStream.Content() internally construct a "Blob()", and
> set its prototype attribute to that of Blob.prototype. I think that
> the former solution is more elegant.
>
Blob's constructor is defined as "[new] Blob(...);" ie: new is already
noted as optional.
The fact that `new Blob` behaves the same as `Blob()` is just a side
effect of Blob not being a primitive data type; from the start I intended
it to be called the same way as `String(...)`, i.e. as `Blob(...)`.

Heh, I didn't consider the idioms of using generics from the
constructor, I was just using it since it was a real nice representation
of String and Blob types.
