I sure hope you guys like this one. This is what I perceive to be close to truly minimal, without being crippled. It's based on Binary/E (which is based on Binary/D (which is based on Binary/B)), but makes some accommodations for what I saw in NodeJS's net2 branch (buffers that share allocation), renames ByteArray to Buffer, and makes the new Buffer type have an immutable length. Unlike Binary/Lite, it retains the charset encoding and decoding behaviors; I think these are important, and I think we should retain the overloaded constructors.
toSource returns something not accepted by the constructor - specifically an array of numbers
One other thing is what about negative values for range, slice etc. Some of the Array methods accept negative values to count backwards from the end. For example:
> [4,5,6,7].slice(-3)
[5, 6, 7]
We just need to decide and explicitly state whether these are supported or not. From what I recall, most Array methods support them, at least on SpiderMonkey.
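A minimal sketch of how such negative arguments could be normalized, mirroring what Array.prototype.slice does. The helper name normalizeIndex is hypothetical and is not part of the proposal; it only illustrates one behavior the spec could choose to state.

```javascript
// Hypothetical helper (not from the proposal): map a possibly negative
// index onto [0, length], the way Array.prototype.slice treats its arguments.
function normalizeIndex(index, length) {
    if (index < 0) {
        return Math.max(length + index, 0); // count backwards from the end
    }
    return Math.min(index, length); // clamp past-the-end values
}

var values = [4, 5, 6, 7];
var start = normalizeIndex(-3, values.length); // 1
var tail = values.slice(start); // same elements as values.slice(-3)
```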
[[Put]] should probably throw a RangeError instead of ValueError?
From what I remember from many months ago, this is fairly similar to the first Blob class we had in flusspferd. Using this proposal, if you do need to grow a blob or concatenate two blobs together, I guess you create a new blob and use copy/copyFrom to do it. I might be happy with this -- will ponder.
<ash_flusspf...@firemirror.com> wrote: > Couple of small niggles: > toSource returns something not accepted by the constructor - specifically an array of numbers
This poses a fascinating philosophical question: whether to put the array constructor form back or to change the source representation to:
require("binary").Buffer(3).copyFrom([1, 2, 3])
I think I'll put the Array constructor back.
> One other thing is what about negative values for range, slice etc. Some of the Array methods accept negative values to count backwards from the end. For example:
>> [4,5,6,7].slice(-3) > [5, 6, 7]
I'll add verbiage for this.
> [[Put]] should probably throw a RangeError instead of ValueError?
Sure.
> From what i remember many months ago, this is fairly similar to the first Blob class we had in flusspferd. Using this proposal if you do need to grow or concatenate two blobs together i guess you create a new blob and use copy/copyFrom to do it. I might be happy with this -- will ponder.
One nice thing about this proposal is that we can build much nicer UI on top of it mostly in pure JavaScript, while having this relatively easily implemented subset at the embedding layer.
On Sun, Feb 21, 2010 at 12:05 PM, Kris Kowal <cowbertvon...@gmail.com> wrote: > On Sun, Feb 21, 2010 at 6:46 AM, Ash Berlin >> Couple of small niggles: >> toSource returns something not accepted by the constructor - specifically an array of numbers > I think I'll put the Array constructor back.
I have put the Array constructor back.
>> One other thing is what about negative values for range, slice etc. Some of the Array methods accept negative values to count backwards from the end. For example: >>> [4,5,6,7].slice(-3) >> [5, 6, 7]
> I'll add verbiage for this.
Also, this is now done.
>> [[Put]] should probably throw a RangeError instead of ValueError? > Sure.
Also done.
I also did a pass on copy editing, formatting, and made a bunch of things more explicit.
good job on this proposal! It is my current personal favorite.
Three remarks:
1) what is the deal with "views" (buffers that share allocation with other buffer)? Is there some truly frequent usage scenario that I am just missing?
2) the copy constructor form is a bit problematic if you want a 1:1 view on another buffer:
var clone = new Buffer(source, 0, source.length, false); // second and third arguments are redundant...
Maybe it would be a good idea to completely *remove* the 4th argument ( => always copy) and use only the "source.range()" instead?
3) the .Content property... what is its purpose? Or, more specifically, can you please show some usage scenario?
> -- > You received this message because you are subscribed to the Google Groups "CommonJS" group. > To post to this group, send email to commonjs@googlegroups.com. > To unsubscribe from this group, send email to commonjs+unsubscribe@googlegroups.com. > For more options, visit this group at http://groups.google.com/group/commonjs?hl=en.
On Mon, Feb 22, 2010 at 1:04 AM, Ondřej Žára <ondrej.z...@gmail.com> wrote: > 1) what is the deal with "views" (buffers that share allocation with other > buffer)? Is there some truly frequent usage scenario that I am just missing?
This is an interesting one. One usage pattern I've definitely already seen with Buffer is:
var buffer = Buffer(1024);
var actual = write(buffer, 0, 1024);
return buffer.range(0, actual);
This returns a buffer trimmed to the size of its actual content without having to do an expensive reallocation and copy.
I'm not sure this is even the best example of using "range". I think the general idea is that "range" avoids reallocation. This is probably more important with this proposal than others since it does not abstract copy-on-write semantics. This is very much a low-level binary API that puts a lot of responsibility in pure JavaScript.
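To make the sharing concrete, here is a sketch that uses the standard Uint8Array and its subarray method as a stand-in for the proposed Buffer and "range". The write function is a hypothetical placeholder for whatever fills the buffer; none of these names come from the proposal.

```javascript
// Hypothetical stand-in for a platform write/read call: fills the buffer
// between start and stop and returns the number of bytes actually used.
function write(buffer, start, stop) {
    var message = [104, 101, 108, 108, 111]; // "hello" in ASCII
    var count = Math.min(message.length, stop - start);
    for (var i = 0; i < count; i++) {
        buffer[start + i] = message[i];
    }
    return count;
}

function readTrimmed() {
    var buffer = new Uint8Array(1024);
    var actual = write(buffer, 0, 1024);
    // subarray returns a view on the same allocation; nothing is copied.
    return buffer.subarray(0, actual);
}

var trimmed = readTrimmed();
// trimmed.length is 5, while trimmed.buffer.byteLength is still 1024.
```

The trade-off is the one mentioned above: the view keeps the whole 1024-byte allocation alive for as long as the view itself is reachable.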
> 2) the copy constructor form is a bit problematic if you want a 1:1 view on > another buffer:
> var clone = new Buffer(source, 0, source.length, false); // second and third > arguments are redundant...
> Maybe it would be a good idea to completely *remove* the 4th argument ( => > always copy) and use only the "source.range()" instead?
I don't think this is a problem. It's pretty clear that you can throw "undefined" in for "start" and "stop" in which case they're inferred. I think most people will use "range" and "slice", but the inspiring implementation (Ryan Dahl's Buffer) uses the Buffer constructor as the basis of these operations. We could leave this as an implementation detail, but I want to encourage consistency in practice.
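As a sketch of how the inferred arguments could behave, using Uint8Array as a stand-in for the proposed Buffer: the function name bufferFrom and the choice to copy by default are assumptions made for illustration, not something the proposal text in this thread confirms.

```javascript
// Hypothetical constructor-like helper. "start" and "stop" are inferred
// when undefined; "copy" decides between a fresh allocation and a view.
function bufferFrom(source, start, stop, copy) {
    if (start === undefined) start = 0;
    if (stop === undefined) stop = source.length;
    if (copy === undefined) copy = true; // assumed default
    return copy
        ? source.slice(start, stop)     // new allocation
        : source.subarray(start, stop); // shares allocation with source
}

var source = new Uint8Array([1, 2, 3]);
var view = bufferFrom(source, undefined, undefined, false);
var clone = bufferFrom(source);

source[0] = 9;
// view[0] is now 9 (shared), clone[0] is still 1 (copied).
```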
> 3) the .Content property... what is its purpose? Or, more specifically, can > you please show some usage scenario?
I'll leave this one to Daniel Friesen to defend. In his proposal it was called .contentConstructor and it had something to do with duck-typing collections. Its usage with a single type is not obvious, but with a variety of types that have different content types, it could be useful. I'm in favor of the idea in any case, just in principle. This feature opens the possibility of content-type agnostic algorithms that would not be possible to discern from an empty collection. That is, (typeof someCollection[0]) won't work with an empty buffer. Since Buffer would be the first typed collection in JavaScript, it's difficult to cite precedent. However, with WebGL, there promise to be many different types of collection. Some metadata can't hurt.
I have one more question/feature request that is closely related to the Binary stuff. I recently wrote a JS EXIF parser (feel free to comment, <http://github.com/seznam/JAK/blob/master/util/exif.js>) for a non-CommonJS environment. This is a good usage scenario for a Binary/Buffer data type; I realized that there was a very frequent need to read Short, Long, etc. values from the buffer.
What is your opinion on adding a very generic reader method:
On Mon, Feb 22, 2010 at 10:47 PM, Ondřej Žára <ondrej.z...@gmail.com> wrote: > Okay, thanks for explanation.
I think this is a "cascading virtualization of unpack" feature. I think we should postpone this; we can implement libraries for these things and come back and talk about how to augment the Buffer type. There are so many ways we could do this, I would rather see us hugging and buying each other beer than protract this spec.
But, having said that, I've got some ideas. The problem we'll have with this is that there are a lot of kinds of things you might want to grab out of a buffer, lots of places to get them, lots of endianness, lots of native/network byte order, lots of native alignment variation and non-aligned, lots of widths, signed/unsigned, lots of formats, lots of efficient array indexing and opaque struct dereferencing, lots of everything. It would be super clumsy to have a whole bunch of methods for this. One option is to have an unpacking DSL like "unpack" from Perl, PHP, Ruby, Python and so on.
unpack(buffer, "Hb*")
buffer.unpack("Hb*")
We could have a "record" DSL for unpacking built on top of that.
In Python pack notation, "@" means native endianness, ">" and "<" are big and little, and "!" is network, just in case you forget that it's the same as ">". I could also buy constants, or "BE" and "LE", to match the IANA charset suffixes.
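For a sense of what the big/little distinction means in practice, here is a sketch using DataView, which present-day JavaScript engines provide. It is not part of this proposal, and the helper names readUint16 and readUint32 are hypothetical.

```javascript
// Hypothetical helpers built on the standard DataView API.
// littleEndian = false corresponds to ">" (and "!") in pack notation,
// littleEndian = true corresponds to "<".
function readUint16(bytes, offset, littleEndian) {
    var view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    return view.getUint16(offset, littleEndian);
}

function readUint32(bytes, offset, littleEndian) {
    var view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    return view.getUint32(offset, littleEndian);
}

var bytes = new Uint8Array([0x12, 0x34, 0x56, 0x78]);
var big16 = readUint16(bytes, 0, false);   // 0x1234
var little16 = readUint16(bytes, 0, true); // 0x3412
var big32 = readUint32(bytes, 0, false);   // 0x12345678
```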
Note that "valueOf" is the idiomatic variant of "toNumber"; the Number() constructor defers to "valueOf" internally. This is symmetric to:
100..toString(2)
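A small sketch of that deferral: the object below is hypothetical and only shows that Number() consults "valueOf", with toString(radix) as the conversion in the other direction.

```javascript
// Hypothetical two-byte, big-endian value with an idiomatic valueOf.
var word = {
    bytes: [0x01, 0x00],
    valueOf: function () {
        return (this.bytes[0] << 8) | this.bytes[1];
    }
};

var asNumber = Number(word);         // 256, obtained through valueOf
var asBinary = (100).toString(2);    // "1100100", same as 100..toString(2)
```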
BUT…I think we should talk about this again for Binary/1.1 if we can agree on something smaller first.
Kris Kowal wrote: > On Mon, Feb 22, 2010 at 1:04 AM, Ondřej Žára <ondrej.z...@gmail.com> wrote:
>> 1) what is the deal with "views" (buffers that share allocation with other >> buffer)? Is there some truly frequent usage scenario that I am just missing?
> This is an interesting one. One usage pattern I've definitely already > seen with Buffer is:
> var buffer = Buffer(1024); > var actual = write(buffer, 0, 1024); > return buffer.range(0, actual);
> This returns a buffer trimmed to the size of its actual content > without having to do an expensive reallocation and copy.
> I'm not sure this is even the best example of using "range". I think > the general idea is that "range" avoids reallocation. This is > probably more important with this proposal than others since it does > not abstract copy-on-write semantics. This is very much a low-level > binary API that puts a lot of responsibility in pure JavaScript.
For the record, when I came up with .range in Binary/C and IO/B/Buffer (I'll just use Binary/C to refer to them together here), the problem I was trying to solve was actually different from the problem .range is solving in Binary/F (though it does work in the same use case, even though that use case is almost gone).
In Binary/F it basically creates a buffer that operates on a specific part of another already allocated buffer.
In Binary/C the issue being solved wasn't buffer allocation (the return from .range wasn't even a Buffer; it was an OpaqueRange, which had implementation-specific semantics to let implementations choose what technique worked best for them). The issue was the API of memcopy. Both .splice and a .memcopy/.copy function have one issue: the argument list. It's an unsightly list of mostly numbers that makes code tough to understand as you read it. (I for one don't bother memorizing argument lists consisting of a bunch of numbers and get tripped up when scanning API that uses them, and I expect other target programmers are the same; this is JavaScript, after all.) .splice is alright with only two confusing numbers, but memcopy is .copy(data, offset, length, [dataOffset]); in other words:

bufB.copy(bufA, 5, 10, 15);

The .splice API was already solved. You didn't ever have to touch .splice to modify a *Buffer in an intuitive way; .append, .insert, .replace, .remove, .fill, and .clear basically let you modify a buffer in any way you need without doing things like:

.splice(7, 0, [0,255]); /* .insert([0,255], 7); */
b.splice(b.length, 0, [0,255]); /* .append([0,255]); */

The problem was that copying a section of one buffer to another buffer with said API gave you two choices:

A) Use the nice and readable API, at the cost of allocating a new Blob each time and discarding that allocated memory right after;
B) Use the memcopy API and its long list of args, which aren't easy to read.

So I ended up thinking of .range. It returns an OpaqueRange which refers to a part of another buffer temporarily and is intended to be discarded right away (it doesn't really share the data in any way; it just knows where the data is, and what portion of the data it points to).

So now you get the benefits of both A and B without the problems of either:

bufB.insert(bufA.range(5, 10), 7);
/* Take the range of data from index 5-10 (or that might be 5-15; we didn't get enough responses to the show of hands to pick the API for .range) of buffer A, and insert it into buffer B at index 7 using memcopy instead of allocating any extra blobs. */

bufB.replace(bufA.range(5, 10), 7); would roughly be:
bufB.copy(bufA, 7, 5, 5);

bufB.append(bufA.range(5, 10)); would roughly be:
var l = bufB.length; bufB.length += 5; bufB.copy(bufA, l, 5, 5);

bufB.insert(bufA.range(5, 10), 7); would roughly be:
var l = bufB.length; bufB.length += 5; bufB.copy(bufB, 7, l-7, 7+5); bufB.copy(bufA, 7, 5, 5);
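A rough sketch of the OpaqueRange idea, with Uint8Array standing in for the buffers. The functions range and replace here are hypothetical free functions, and the sketch assumes range(5, 10) means start 5, stop 10 (exclusive), which is one of the two readings left open above.

```javascript
// Hypothetical OpaqueRange: it holds no data of its own, it only
// remembers which buffer it points at and which portion.
function range(source, start, stop) {
    return { source: source, start: start, stop: stop };
}

// Hypothetical replace: copy the bytes the range describes directly
// into the target at "at", without allocating an intermediate blob.
function replace(target, opaqueRange, at) {
    var length = opaqueRange.stop - opaqueRange.start;
    for (var i = 0; i < length; i++) {
        target[at + i] = opaqueRange.source[opaqueRange.start + i];
    }
    return target;
}

var bufA = new Uint8Array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
var bufB = new Uint8Array(16);
replace(bufB, range(bufA, 5, 10), 7);
// bufB[7] through bufB[11] now hold 5, 6, 7, 8, 9.
```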
>> 2) the copy constructor form is a bit problematic if you want a 1:1 view on >> another buffer:
>> var clone = new Buffer(source, 0, source.length, false); // second and third >> arguments are redundant...
>> Maybe it would be a good idea to completely *remove* the 4th argument ( => >> always copy) and use only the "source.range()" instead?
> I don't think this is a problem. It's pretty clear that you can throw > "undefined" in for "start" and "stop" in which case they're inferred. > I think most people will use "range" and "slice", but the inspiring > implementation (Ryan Dahl's Buffer) uses the Buffer constructor as the > basis of these operations. We could leave this as an implementation > detail, but I want to encourage consistency in practice.
>> 3) the .Content property... what is its purpose? Or, more specifically, can >> you please show some usage scenario?
> I'll leave this one to Daniel Friesen to defend. In his proposal it > was called .contentConstructor and it had something to do with > duck-typing collections. Its usage with a single type is not obvious, > but with a variety of types that have different content types, it > could be useful. I'm in favor of the idea in any case, just in > principle. This feature opens the possibility of content-type > agnostic algorithms that would not be possible to discern from an > empty collection. That is, (typeof someCollection[0]) won't work with > an empty buffer. Since Buffer would be the first typed collection in > JavaScript, it's difficult to cite precedent. However, with WebGL, > there promise to be many different types of collection. Some metadata > can't hurt.
I won't be defending it in this instance. I've noted it before: .contentConstructor's use cases almost completely disappear (at least every single use case I can come up with, as well as any faint idea of how it could be useful) when you remove it from Binary/C's abstract text/binary symmetric API. And the use of .contentConstructor === Number makes it even less useful.
And trying to mix .contentConstructor into all the different binary APIs will likely not help. Extra binary APIs with .contentConstructor require much more thought on how they would interact and what unexpected things might happen. I put the overactive part of my brain to work when it came to figuring out how .contentConstructor and other abstract parts of the API would react to certain situations and how they would likely be expected to react. I haven't put that into play trying to figure out the theory of how various binary APIs would interact with .contentConstructor.
> Kris Kowal
Considering the various binary API efforts that are going on, the existing binary objects on various platforms, how we bikeshed and come up with varying permutations of a binary API which only varying portions of the group take to, and the varying use cases we have, I'm taking more to the idea of accepting that we will likely end up with more than one binary API to deal with. Instead, we could promote the use of patterns that will be resistant to various binary systems being used through an app and its libraries (i.e., being sure to use constructors to cast any data passed to your library from outside the library to your binary system), collect our use cases or goals into separate targets, and standardize multiple (not that many) binary APIs that fit our separate target cases. I don't mind a universal lite API that'll work interoperably, independent of the capabilities of the platform... But I also don't mind the extra work of implementing a near-primitive (not as hard in Rhino as in other engines) from a forward-thinking spec written with the hope that it would become a future part of ES and be implemented natively in future versions of JavaScript engines.
On Sun, Feb 21, 2010 at 4:56 AM, Kris Kowal <kris.ko...@cixar.com> wrote: > I sure hope you guys like this one. This is what I perceive to be > close to truly minimal, without being crippled. It's based on > Binary/E (which is based on Binary/D (which is based on Binary/B)), > but makes some accommodations for what I saw in NodeJS's net2 branch > (buffers that share allocation), renames ByteArray to Buffer, and > makes the new Buffer type have an immutable length. Unlike > Binary/Lite, it retains the charset encoding and decoding behaviors; I > think these are important, and I think we should retain the overloaded > constructors).
What about writing a string into the buffer? I definitely need the ability to take a string and write it, with a chosen encoding into the buffer, at a given location - with a maximum length that it could occupy. It should return the number of bytes written. It would be nice if it didn't split characters...
You could copy a new/temp Buffer into a view, although this would not allow the efficiency that, say, letting iconv operate directly on the output buffer would offer.
-- Wesley W. Garland Director, Product Development PageMail, Inc. +1 613 542 2787 x 102
> What about writing a string into the buffer? I definitely need the > ability to take a string and write it, with a chosen encoding into the > buffer, at a given location - with a maximum length that it could > occupy. It should return the number of bytes written. It would be nice > if it didn't split characters...
Just wondering: what's the use case for this when you don't know how much of the string (in characters, not bytes) was written to the buffer?
On Wed, Feb 24, 2010 at 11:24 AM, Ryan Dahl <coldredle...@gmail.com> wrote: > On Wed, Feb 24, 2010 at 11:13 AM, Hannes Wallnoefer <hann...@gmail.com> wrote: >> 2010/2/24 Ryan Dahl <coldredle...@gmail.com>: >>> What about writing a string into the buffer? I definitely need the >>> ability to take a string and write it, with a chosen encoding into the >>> buffer, at a given location - with a maximum length that it could >>> occupy. It should return the number of bytes written. It would be nice >>> if it didn't split characters...
>> Just wondering: what's the use case for this when you don't know how >> much of the string (in characters, not bytes) was written to the >> buffer?
> Yeah, I guess it should provide that too.
Alright, here's some potential verbiage:
; copyString(source String, String(charset), Number(start_opt), Number(stop_opt), Number(sourceStart_opt), Number(sourceStop_opt)) [sourceStop Number, targetStop Number]
# Encodes as much as possible of a String in a given charset into this buffer from "start" to "stop", using the source string from "sourceStart" to "sourceStop", and returns the actual stop index of the source string and this, the target buffer.
# "start" is 0 if undefined or omitted.
# "stop" is this buffer's length if undefined or omitted.
# "sourceStart" is 0 if undefined or omitted.
# "sourceStop" is the source string's length if undefined or omitted.
# "charset" must be an IANA charset name.
## "copyString" must throw a ValueError if the given "charset" is not supported.
## The charsets "ascii", "utf-8", and "utf-16" must be supported.
## ''Note: the charset is not optional, there is no default.''
# Returns a duple Array with the actual "sourceStop" and "targetStop".
## The actual "sourceStop" is an index one past the last character actually read.
## The actual "targetStop" is an index one past the last byte actually written.
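A sketch of how this could behave, under loud assumptions: it is a free function rather than a Buffer method, Uint8Array stands in for the buffer, and it handles UTF-8 only by leaning on the standard TextEncoder.encodeInto, which stops before any character that would not fit. The charset argument from the verbiage is therefore left out here.

```javascript
// Hypothetical UTF-8-only copyString. Returns [sourceStop, targetStop]:
// one past the last character read and one past the last byte written.
function copyString(target, source, start, stop, sourceStart, sourceStop) {
    if (start === undefined) start = 0;                    // bytes
    if (stop === undefined) stop = target.length;          // bytes
    if (sourceStart === undefined) sourceStart = 0;        // characters
    if (sourceStop === undefined) sourceStop = source.length;

    var window = target.subarray(start, stop); // view, no copy
    var result = new TextEncoder().encodeInto(
        source.slice(sourceStart, sourceStop),
        window
    );
    return [sourceStart + result.read, start + result.written];
}

var target = new Uint8Array(4);
var stops = copyString(target, "h\u00e9llo"); // "é" needs two bytes in UTF-8
// stops is [3, 4]: "hél" was read and four bytes were written.
```

Characters are counted in UTF-16 code units here, since that is what JavaScript string indexes and encodeInto's "read" count use.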
On Wed, Feb 24, 2010 at 11:51 AM, Kris Kowal <cowbertvon...@gmail.com> wrote: > On Wed, Feb 24, 2010 at 11:24 AM, Ryan Dahl <coldredle...@gmail.com> wrote: >> On Wed, Feb 24, 2010 at 11:13 AM, Hannes Wallnoefer <hann...@gmail.com> wrote: >>> 2010/2/24 Ryan Dahl <coldredle...@gmail.com>: >>>> What about writing a string into the buffer? I definitely need the >>>> ability to take a string and write it, with a chosen encoding into the >>>> buffer, at a given location - with a maximum length that it could >>>> occupy. It should return the number of bytes written. It would be nice >>>> if it didn't split characters...
Yeah, that seems reasonable. So, for clarity: `start` and `stop` are in octet units, and `sourceStart` and `sourceStop` are in character units?
If it just returned the total number of octets written, that would be sufficient?
"if the output is to be truncated by the buffer's size, truncation will happen on a character boundary, as defined by the target encoding"
This will avoid, for example, writing part of a 3-byte UTF8 sequence into a 2-byte buffer.
Now, the return value won't be able to tell you how to get the rest of the string. In order to do that, you need a lot of nastiness, which is well expressed in Aristid's Encodings specification, IIRC.
-- Wesley W. Garland Director, Product Development PageMail, Inc. +1 613 542 2787 x 102
On Sun, Feb 21, 2010 at 4:56 AM, Kris Kowal <kris.ko...@cixar.com> wrote: > I sure hope you guys like this one. This is what I perceive to be > close to truly minimal, without being crippled. It's based on > Binary/E (which is based on Binary/D (which is based on Binary/B)), > but makes some accommodations for what I saw in NodeJS's net2 branch > (buffers that share allocation), renames ByteArray to Buffer, and > makes the new Buffer type have an immutable length. Unlike > Binary/Lite, it retains the charset encoding and decoding behaviors; I > think these are important, and I think we should retain the overloaded > constructors).
I'm not particularly thrilled that I would need to create a range to slice out a string. If I've got a buffer containing a HTTP request, I want to rip out a bunch of ascii encoded strings really quickly, creating a range object for each field and value before .toString('ascii') them is a bit of overhead.
On Wed, Feb 24, 2010 at 12:02 PM, Ryan Dahl <coldredle...@gmail.com> wrote: > Yeah that seems reasonable. So, for clarity - `start`, `stop` are of > the octet unit. `sourceStart` and `sourceStop` are of the > character unit?
Yeah, I'll make that more clear:
# "start" is 0 if undefined or omitted and counts bytes.
# "stop" is this buffer's length if undefined or omitted, counting in bytes.
# "sourceStart" is 0 if undefined or omitted, counting in characters.
# "sourceStop" is the source string's length if undefined or omitted, counting in characters.
> If it just returned the total number of octets written, that would be > sufficient?
In order to resume writing on another buffer, you would need the string offset because with mixed-width encodings like UTF-*, the width is not computable in terms of the bytes written. It would be possible to compute in terms of UCS-*, but we're targeting the general case.
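The need for the string offset shows up in a sketch that spreads one string across several fixed-size buffers. This again assumes UTF-8 and uses the standard TextEncoder.encodeInto with Uint8Array as the buffer stand-in; encodeInChunks is a hypothetical name.

```javascript
// Hypothetical helper: encode "text" into buffers of "chunkSize" bytes,
// resuming each time from the character offset the encoder reports.
function encodeInChunks(text, chunkSize) {
    var encoder = new TextEncoder();
    var chunks = [];
    var offset = 0; // position in the source string, in characters
    while (offset < text.length) {
        var chunk = new Uint8Array(chunkSize);
        var result = encoder.encodeInto(text.slice(offset), chunk);
        if (result.read === 0) {
            throw new RangeError("chunk too small for the next character");
        }
        chunks.push(chunk.subarray(0, result.written));
        offset += result.read; // bytes written alone would not give us this
    }
    return chunks;
}

var chunks = encodeInChunks("a\u20acb", 3); // euro sign needs three bytes
// Chunk lengths are 1, 3, 1: the euro sign is never split across chunks.
```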
> If I've got a buffer containing a HTTP request, I > want to rip out a bunch of ascii encoded strings really quickly,
FWIW, if *I* had a buffer containing HTTP requests, I might seriously want a type that could return an array of Strings or Buffers based on a separator -- similar to the BSD strsep() call, or something like this pseudo code:
var header = [];
header.backingStore = buf; // provide GC root
for (s = strtok(buf, "\r\n"); s; s = strtok(NULL, "\r\n"))
    header.push(Buffer(s, strlen(s)));
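A runnable take on the same idea, with assumptions stated up front: Uint8Array and subarray stand in for Buffer and its views, splitLines is a hypothetical name, and unlike strtok it treats "\r\n" as a two-byte sequence and keeps empty lines.

```javascript
// Hypothetical helper: split a byte buffer on CRLF into views that
// share the original allocation, so no line data is copied.
function splitLines(bytes) {
    var lines = [];
    var start = 0;
    for (var i = 0; i + 1 < bytes.length; i++) {
        if (bytes[i] === 13 && bytes[i + 1] === 10) { // "\r\n"
            lines.push(bytes.subarray(start, i));
            start = i + 2;
            i++; // skip the "\n" we just consumed
        }
    }
    if (start < bytes.length) {
        lines.push(bytes.subarray(start)); // trailing data without CRLF
    }
    return lines;
}

var buf = new TextEncoder().encode("Host: a\r\nAccept: b\r\n");
var header = splitLines(buf);
// header holds two views, of 7 and 9 bytes, both backed by buf.buffer.
```

Because the views reference the backing allocation themselves, nothing like the backingStore property is needed to keep it alive in this stand-in.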
Wes
-- Wesley W. Garland Director, Product Development PageMail, Inc. +1 613 542 2787 x 102
On Wed, Feb 24, 2010 at 12:05 PM, Ryan Dahl <coldredle...@gmail.com> wrote: > On Sun, Feb 21, 2010 at 4:56 AM, Kris Kowal <kris.ko...@cixar.com> wrote: >> I sure hope you guys like this one. This is what I perceive to be >> close to truly minimal, without being crippled. It's based on >> Binary/E (which is based on Binary/D (which is based on Binary/B)), >> but makes some accommodations for what I saw in NodeJS's net2 branch >> (buffers that share allocation), renames ByteArray to Buffer, and >> makes the new Buffer type have an immutable length. Unlike >> Binary/Lite, it retains the charset encoding and decoding behaviors; I >> think these are important, and I think we should retain the overloaded >> constructors).
> I'm not particularly thrilled that I would need to create a range to > slice out a string. If I've got a buffer containing a HTTP request, I > want to rip out a bunch of ascii encoded strings really quickly, > creating a range object for each field and value before > .toString('ascii') them is a bit of overhead.
There are others here who would not be thrilled to have to provide a bunch of positional arguments, but I think we can entertain both the range().toString() case and yours by providing optional start,stop range args to toString. Would that suffice?
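A rough sketch of what optional start and stop arguments could look like. It is written as a hypothetical free function named asciiToString over a Uint8Array, handles only ASCII, and assumes the bytes are already 7-bit clean.

```javascript
// Hypothetical helper: decode bytes[start, stop) as ASCII without
// creating an intermediate range object.
function asciiToString(bytes, start, stop) {
    if (start === undefined) start = 0;
    if (stop === undefined) stop = bytes.length;
    var chars = [];
    for (var i = start; i < stop; i++) {
        chars.push(String.fromCharCode(bytes[i]));
    }
    return chars.join("");
}

var request = new TextEncoder().encode("GET /index.html HTTP/1.1");
var method = asciiToString(request, 0, 3);  // "GET"
var path = asciiToString(request, 4, 15);   // "/index.html"
```

With both arguments omitted it decodes the whole buffer, so the range().toString() style and the positional style can coexist.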
On Wed, Feb 24, 2010 at 12:30 PM, Kris Kowal <cowbertvon...@gmail.com> wrote: > On Wed, Feb 24, 2010 at 12:05 PM, Ryan Dahl <coldredle...@gmail.com> wrote: >> On Sun, Feb 21, 2010 at 4:56 AM, Kris Kowal <kris.ko...@cixar.com> wrote: >>> I sure hope you guys like this one. This is what I perceive to be >>> close to truly minimal, without being crippled. It's based on >>> Binary/E (which is based on Binary/D (which is based on Binary/B)), >>> but makes some accommodations for what I saw in NodeJS's net2 branch >>> (buffers that share allocation), renames ByteArray to Buffer, and >>> makes the new Buffer type have an immutable length. Unlike >>> Binary/Lite, it retains the charset encoding and decoding behaviors; I >>> think these are important, and I think we should retain the overloaded >>> constructors).
>> I'm not particularly thrilled that I would need to create a range to >> slice out a string. If I've got a buffer containing a HTTP request, I >> want to rip out a bunch of ascii encoded strings really quickly, >> creating a range object for each field and value before >> .toString('ascii') them is a bit of overhead.
> There are others here who would not be thrilled to have to provide a > bunch of positional arguments, but I think we can entertain both the > range().toString() case and yours by providing optional start,stop > range args to toString. Would that suffice?
On Wed, Feb 24, 2010 at 1:34 PM, Ondřej Žára <ondrej.z...@gmail.com> wrote: > 2010/2/24 Kris Kowal <cowbertvon...@gmail.com> > By the way, there is something like ValueError in standard javascript?
Oh, apparently there isn't (I've been assuming there was, from my experience elsewhere). Should we change the three references to ValueError to RangeError or just Error?
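For reference, a sketch of what the RangeError option could look like for a [[Put]]-style write. The put function is hypothetical, Uint8Array stands in for the buffer, and the exact conditions (index out of bounds, value outside 0..255) are assumptions about what the spec would check.

```javascript
// RangeError is a standard built-in; ValueError is not defined at all.
var hasRangeError = typeof RangeError === "function";   // true
var hasValueError = typeof ValueError !== "undefined";  // false

// Hypothetical bounds-checked write in the spirit of [[Put]].
function put(bytes, index, value) {
    if (index < 0 || index >= bytes.length) {
        throw new RangeError("index out of range: " + index);
    }
    if (value < 0 || value > 255) {
        throw new RangeError("byte value out of range: " + value);
    }
    bytes[index] = value;
}

var bytes = new Uint8Array(2);
put(bytes, 0, 255); // fine
var caught = null;
try {
    put(bytes, 0, 256); // not a byte
} catch (error) {
    caught = error;
}
// caught is a RangeError, which is also an instance of Error.
```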