Proposal: Encodings API (independent from Streams)

Aristid Breitkreuz

Apr 9, 2009, 8:24:20 AM
to serv...@googlegroups.com
Hi,

I think it is clear that serverjs Streams (as outlined in the file
proposal) should have support for various encodings. I wish to go one
step further and propose low-level access to the encodings supported by
these streams.

I propose this API:

converter = new Encodings.Converter(from, to)
converter.write(byteStringEncodedInFrom)
byteStringEncodedInTo = converter.read(maximumSize)
and/or
readSize = converter.read(byteArray, maximumSize)

and for convenience

outputByteString = Encodings.convert(inputByteString, from, to)

Error handling should be done somehow, but I'm not yet sure how.
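
A minimal sketch of how the proposed API could behave. It is written against Node.js's Buffer and StringDecoder as stand-ins for ByteString/ByteArray, so encoding names are limited to what Buffer accepts (for example 'utf8', 'latin1', 'utf16le'). The internals, the RangeError on unknown names, and the "read() with no argument returns everything" behaviour are assumptions of this sketch, not part of the proposal.

```javascript
const { StringDecoder } = require('string_decoder');

// Hypothetical stand-in for Encodings.Converter(from, to).
// Encoding names are limited to those Node's Buffer understands.
class Converter {
  constructor(from, to) {
    if (!Buffer.isEncoding(from) || !Buffer.isEncoding(to)) {
      throw new RangeError('unsupported encoding: ' + from + ' -> ' + to);
    }
    this.decoder = new StringDecoder(from); // holds back incomplete sequences
    this.to = to;
    this.pending = Buffer.alloc(0);         // converted bytes not yet read
  }

  // Accept bytes encoded in `from`; convert them and buffer the result.
  write(bytes) {
    const text = this.decoder.write(bytes);
    this.pending = Buffer.concat([this.pending, Buffer.from(text, this.to)]);
  }

  // Return up to maximumSize converted bytes (everything if undefined).
  read(maximumSize) {
    const n = maximumSize === undefined ? this.pending.length : maximumSize;
    const out = this.pending.subarray(0, n);
    this.pending = this.pending.subarray(n);
    return out;
  }
}

// One-shot convenience, mirroring Encodings.convert(input, from, to).
function convert(input, from, to) {
  const converter = new Converter(from, to);
  converter.write(input);
  return converter.read();
}
```

The sketch does not flush trailing incomplete input and reports only unknown encoding names; how invalid byte sequences should be signalled is left open, as in the proposal.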

Maybe the ICONV encoding names would be good, but otherwise I'd
recommend sticking to some standard.

Kind regards,

Aristid Breitkreuz

Wes Garland

Apr 9, 2009, 9:31:11 AM
to serv...@googlegroups.com
> Maybe the ICONV encoding names would be good, but otherwise I'd
> recommend sticking to some standard.

iconv encoding names have the interesting advantage that if you use them, and implement with iconv, any conversion your OS can do is available to the encoding engine.
Wes

--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102

Aristid Breitkreuz

Apr 9, 2009, 9:58:53 AM
to serv...@googlegroups.com
Hi,

Wes Garland wrote:


>
> Maybe the ICONV encoding names would be good, but otherwise I'd
> recommend sticking to some standard.
>
>
> iconv encoding names have the interesting advantage that if you use
> them, and implement with iconv, any converting your OS can do is now
> doable by the encoding engine.
> Wes

It also would simplify implementation. That's why I proposed it. But if
there is some international standard that diverges from ICONV, then
maybe we should require that instead.

----

I also have a further idea for the API: if maximumSize as passed to
Converter.prototype.read is undefined, it would read as much as possible.

I also should note - there is some confusion there - that the proposed
API is _unidirectional_ - the converter class buffers. So converting
basically works like this:

converter.write(Chunk1)
converter.write(Chunk2)
var result;
while ((result = converter.read(4096))) {
    doSomething(result);
    //converter.write(Chunk3)
}

or more simply:

converter.write(Input)
output = converter.read()

(Everything is a ByteArray / ByteString here.)
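
To make the buffering concrete, here is a small sketch of the write/drain pattern. BufferingConverter is a hypothetical pass-through: it performs no transcoding and only models "write accumulates, read drains". It uses Node.js Buffers in place of ByteString. Returning null once drained is an assumption made so that the loop terminates, since an empty Buffer is truthy in JavaScript.

```javascript
// Hypothetical pass-through "converter": no transcoding, buffering only.
class BufferingConverter {
  constructor() {
    this.chunks = []; // bytes written but not yet read
    this.size = 0;    // total number of pending bytes
  }

  write(bytes) {
    if (bytes.length > 0) {
      this.chunks.push(Buffer.from(bytes));
      this.size += bytes.length;
    }
  }

  // Returns null once drained so that `while ((r = read(n)))` terminates.
  // With maximumSize undefined, everything pending is returned.
  read(maximumSize) {
    if (this.size === 0) return null;
    const all = Buffer.concat(this.chunks, this.size);
    const n = maximumSize === undefined
      ? all.length
      : Math.min(maximumSize, all.length);
    const rest = all.subarray(n);
    this.chunks = rest.length > 0 ? [rest] : [];
    this.size = rest.length;
    return all.subarray(0, n);
  }
}

const converter = new BufferingConverter();
converter.write(Buffer.from('chunk one, '));
converter.write(Buffer.from('chunk two'));

// Drain in pieces of at most 8 bytes.
const pieces = [];
let result;
while ((result = converter.read(8))) {
  pieces.push(result.toString());
}
```

Whatever read returns on an empty buffer, the loop condition depends on it being falsy, so a real specification would need to pin that down.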

Cheers

Aristid Breitkreuz

Ash Berlin

Apr 9, 2009, 10:06:04 AM
to serv...@googlegroups.com

On Thu, 09 Apr 2009 15:58:53 +0200, Aristid Breitkreuz
<aristid.b...@gmx.de> wrote:
>
> Hi,
>
> Wes Garland wrote:
>>
>> Maybe the ICONV encoding names would be good, but otherwise I'd
>> recommend sticking to some standard.
>>
>>
>> iconv encoding names have the interesting advantage that if you use
>> them, and implement with iconv, any converting your OS can do is now
>> doable by the encoding engine.
>> Wes
>
> It also would simplify implementation. That's why I proposed it. But if
> there is some international standard that diverges from ICONV, then
> maybe we should require that instead.
>

What sort of names does ICONV use? My preference would be for the IANA
encoding names as I mentioned and linked to in my comments on the Binary
proposals: http://www.iana.org/assignments/character-sets -ash


> ----
>
> I also have a further idea for the API: if maximumSize as passed to
> Converter.prototype.read is undefined, it would read as much as possible.
>
> I also should note - there is some confusion there - that the proposed
> API is _unidirectional_ - the converter class buffers. So converting
> basically works like this:
>
> converter.write(Chunk1)
> converter.write(Chunk2)
> var result;
> while ((result = converter.read(4096))) {
>     doSomething(result);
>     //converter.write(Chunk3)
> }
>
> or more simply:
>
> converter.write(Input)
> output = converter.read()
>
> (Everything is a ByteArray / ByteString here.)
>
> Cheers
>
> Aristid Breitkreuz
>

Hmmmm, for some reason this seems the wrong way round to me (read/write).
It sort of depends on your PoV (is the converter a data sink or a data
source?). Perhaps we can come up with better names that don't suffer from
this confusion? Nothing springs to mind right now, but then I don't have
many spare cycles to think right now.

Aristid Breitkreuz

Apr 9, 2009, 10:49:56 AM
to serv...@googlegroups.com
Ash Berlin wrote:

> What sort of names does ICONV use? My preference would be for the IANA
> encoding names as I mentioned and linked to in my comments on the Binary
> proposals: http://www.iana.org/assignments/character-sets -ash
>
You can type iconv --list to get a comprehensive list of names. UTF-8 is
among them. I don't know whether the IANA-assigned character set names are
compatible with iconv's; if they are, that would simplify things.
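
If the two naming schemes turn out to differ, one possible bridge is a small alias table that maps every accepted spelling to a single canonical name. The sketch below assumes case-insensitive matching (which is how the IANA registry treats names). The table and the helper name canonicalEncodingName are purely illustrative; the real registry and the output of iconv --list are far longer.

```javascript
// Illustrative alias table only; consult the IANA registry and
// `iconv --list` for an actual implementation.
const ALIASES = {
  'UTF-8': ['UTF8'],
  'ISO-8859-1': ['ISO_8859-1', 'LATIN1', 'L1'],
  'US-ASCII': ['ASCII'],
};

// Build a lookup from every upper-cased spelling to its canonical name.
const CANONICAL = new Map();
for (const [name, aliases] of Object.entries(ALIASES)) {
  CANONICAL.set(name, name);
  for (const alias of aliases) CANONICAL.set(alias, name);
}

// Hypothetical helper: map a user-supplied name to one canonical spelling.
function canonicalEncodingName(name) {
  const canonical = CANONICAL.get(String(name).trim().toUpperCase());
  if (canonical === undefined) {
    throw new RangeError('unknown encoding name: ' + name);
  }
  return canonical;
}
```

With a table like this, the API could accept either family of names while an iconv-based implementation still receives a spelling it recognises.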

> Hmmmm, for some reason this seems the wrong way round to me (read/write).
> It sort of depends on your PoV (is the converter a data sink or a data
> source?). Perhaps we can come up with better names that don't suffer from
> this confusion? Nothing springs to mind right now, but then I don't have
> many spare cycles to think right now.
>

How about push / read? Not symmetric, yeah, but that would maybe avoid
this confusion.

converter.push(Input)
output = converter.read()

Or even push / get:

converter.push(Input)
output = converter.get()

Aristid Breitkreuz

Apr 9, 2009, 11:26:06 AM
to serv...@googlegroups.com
Hi,

Aristid Breitkreuz wrote:
> I propose this API:
>

Along with a few modifications, I've written an article on the wiki:
https://wiki.mozilla.org/ServerJS/Encodings

The modifications are:

* It is a module now (name: encodings).
* Push/get instead of write/read.
* Added require('encodings').convertFromString and
require('encodings').convertToString methods.
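
For illustration only, the revised module might be shaped roughly like this. The wiki page is not reproduced here, so the argument order of convertToString and convertFromString, the "get() returns everything pending" behaviour, and the use of Node.js's Buffer and StringDecoder in place of ByteString are all assumptions of this sketch. In ServerJS the object would be obtained via require('encodings').

```javascript
const { StringDecoder } = require('string_decoder');

// Hypothetical shape of the `encodings` module after the changes above.
// Argument order and the use of Node's Buffer are assumptions.
const encodings = {
  Converter: class {
    constructor(from, to) {
      this.decoder = new StringDecoder(from); // buffers partial sequences
      this.to = to;
      this.pending = [];                      // converted, not yet fetched
    }

    // Feed bytes encoded in `from` into the converter.
    push(bytes) {
      const text = this.decoder.write(bytes);
      if (text.length > 0) this.pending.push(Buffer.from(text, this.to));
    }

    // Fetch everything converted so far, encoded in `to`.
    get() {
      const out = Buffer.concat(this.pending);
      this.pending = [];
      return out;
    }
  },

  // Bytes in `encoding` -> JavaScript string.
  convertToString(bytes, encoding) {
    return Buffer.from(bytes).toString(encoding);
  },

  // JavaScript string -> bytes in `encoding`.
  convertFromString(string, encoding) {
    return Buffer.from(string, encoding);
  },
};
```

Here push and get name the two ends of the converter without implying whether it is a sink or a source, which was the objection to write/read.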
