How to make the Encodings API extensible?

Aristid Breitkreuz

Apr 10, 2009, 2:50:51 PM
to serv...@googlegroups.com, cowber...@gmail.com
Hi,

How can we make the encodings API extensible? Should it be extensible at
all?

One proposal is having one module per encoding, but I'm not sure if I
like that...

Aristid Breitkreuz

Ash Berlin

Apr 10, 2009, 3:14:06 PM
to serv...@googlegroups.com

Yes, I can't think of a sane argument for not making it extensible
right now.

I'm not quite sure how the API would work in practice, but my initial
thoughts are something like

(code)
var Enc = require('encoding');

Enc.registerEncoding('rot13', someMysticalObjectOrFunction);
(endcode)

And if every encoding can go from/to utf8, then every encoding can go
to every other encoding by converting to utf8 (internally) as needed.

However this approach would require the user to load the module
manually, but right now I can't think of a better way. Oh and I'm not
quite sure what `someMysticalObjectOrFunction` should actually do :)
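
One possible shape for this, as a rough sketch only: assume the registered object carries a `decode` function (bytes to string) and an `encode` function (string to bytes), with a plain JavaScript string standing in for the internal utf8 pivot. The names `registerEncoding`, `lookup`, `transcode`, and `singleByte` are hypothetical and do not come from any existing module.

```javascript
// Hypothetical registry; nothing here is an existing API.
var codecs = {};

// Assumption: a codec is an object with decode(bytes) -> string
// and encode(string) -> array of byte values.
function registerEncoding(name, codec) {
    codecs[name.toLowerCase()] = codec;
}

function lookup(name) {
    var codec = codecs[name.toLowerCase()];
    if (!codec) throw new Error('unknown encoding: ' + name);
    return codec;
}

// Convert between any two registered encodings via the shared pivot.
function transcode(bytes, from, to) {
    return lookup(to).encode(lookup(from).decode(bytes));
}

// Factory for simple single-byte codecs: code points 0..max map to one byte.
function singleByte(max) {
    return {
        decode: function (bytes) {
            return bytes.map(function (b) {
                if (b > max) throw new Error('invalid byte: ' + b);
                return String.fromCharCode(b);
            }).join('');
        },
        encode: function (text) {
            return text.split('').map(function (ch) {
                var code = ch.charCodeAt(0);
                if (code > max) throw new Error('cannot encode: ' + ch);
                return code;
            });
        }
    };
}

registerEncoding('us-ascii', singleByte(0x7f));
registerEncoding('iso-8859-1', singleByte(0xff));

// "Hi" in us-ascii, re-encoded as iso-8859-1 through the pivot.
var out = transcode([72, 105], 'us-ascii', 'iso-8859-1');
```

With N registered codecs this gives N*N conversions from only 2N functions, at the cost of always going through the intermediate form.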

-ash

Aristid Breitkreuz

Apr 10, 2009, 6:51:26 PM
to serv...@googlegroups.com
Ash Berlin wrote:

> Yes, I can't think of a sane argument for not making it extensible
> right now.
>

Well, it's not trivial, so if we could avoid it... but maybe it is
really better to have, dunno.

> I'm not quite sure how the API would work in practice, but my initial
> thoughts are something like
>
> (code)
> var Enc = require('encoding');
>
> Enc.registerEncoding('rot13', someMysticalObjectOrFunction);
> (endcode)
>

ROT13? That's not a character encoding!

I note that what you propose requires manual loading.

> And if every encoding can go from/to utf8, then every encoding can go
> to every other encoding by converting to utf8 (internally) as needed.
>
> However this approach would require the user to load the module
> manually, but right now I can't think of a better way. Oh and I'm not
> quite sure what `someMysticalObjectOrFunction` should actually do :)
>

Kris Kowal wrote something about this in the ByteArray and ByteString
thread. If I understand correctly, he proposes one low-level converter
module, with extensibility via one module per encoding. For example,
if you convert 'my-ext-encoding' to 'iso-8859-1',
require('codec/my-ext-encoding') is used to convert
'my-ext-encoding' to some intermediate format.

But I'm not sure if that's optimal.

Kris Kowal

Apr 11, 2009, 2:37:11 AM
to serv...@googlegroups.com
On Fri, Apr 10, 2009 at 3:51 PM, Aristid Breitkreuz
<aristid.b...@gmx.de> wrote:
> Kris Kowal wrote something about this in the ByteArray and ByteString
> thread. If I understand correctly, he proposes one low-level converter
> module, with extensibility via one module per encoding. For example,
> if you convert 'my-ext-encoding' to 'iso-8859-1',
> require('codec/my-ext-encoding') is used to convert
> 'my-ext-encoding' to some intermediate format.
>
> But I'm not sure if that's optimal.

The compromise I propose is that the "encodings" or "codec" module
would be responsible for finding the appropriate transcoder for any
given (source, target) codec pair. The types defined in the "binary"
module would defer to the "codec" module for decode, encode, and
transcode functions. If a high-performance native transcoder is
available, that would be used. If not, a pure-JavaScript adapter
would be constructed to "triangulate" between the decoder for the
source encoding and the encoder for the target encoding. The idea is
that an encoding extension would be a module under codec/* that
implements two functions: one to translate bytes in that encoding to
Unicode code points, and another to translate an array of Unicode code
points (or a String coerced to such an array) to a byte string or byte
array in the target encoding.
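
To make the lookup order concrete, here is a minimal sketch under several assumptions: the in-memory table `codecModules` stands in for modules that would really live under codec/* and be loaded with require(), `nativeTranscoders` is a hypothetical table of platform-provided fast paths, and the `decode`/`encode` names for the two per-codec functions are invented for illustration.

```javascript
// Stand-in for modules under codec/*; a real system would require() them.
// Assumption: each codec exposes decode(bytes) -> code points
// and encode(code points) -> bytes.
var codecModules = {
    'iso-8859-1': {
        decode: function (bytes) { return bytes.slice(); },
        encode: function (points) {
            return points.map(function (p) {
                if (p > 0xff) throw new Error('unencodable code point: ' + p);
                return p;
            });
        }
    },
    'utf-32be': {
        decode: function (bytes) {
            if (bytes.length % 4 !== 0) throw new Error('truncated utf-32 input');
            var points = [];
            for (var i = 0; i < bytes.length; i += 4) {
                points.push(bytes[i] * 0x1000000 + (bytes[i + 1] << 16) +
                            (bytes[i + 2] << 8) + bytes[i + 3]);
            }
            return points;
        },
        encode: function (points) {
            var bytes = [];
            points.forEach(function (p) {
                bytes.push((p >>> 24) & 0xff, (p >>> 16) & 0xff,
                           (p >>> 8) & 0xff, p & 0xff);
            });
            return bytes;
        }
    }
};

// Hypothetical table of native fast paths, keyed by "source>target".
var nativeTranscoders = {};

// Prefer a native transcoder; otherwise triangulate through code points.
function findTranscoder(source, target) {
    var native = nativeTranscoders[source + '>' + target];
    if (native) return native;
    var from = codecModules[source];
    var to = codecModules[target];
    if (!from || !to) {
        throw new Error('no transcoder for ' + source + ' -> ' + target);
    }
    return function (bytes) { return to.encode(from.decode(bytes)); };
}

var latin1ToUtf32 = findTranscoder('iso-8859-1', 'utf-32be');
var result = latin1ToUtf32([0x41, 0xe9]);
```

Callers only ever see `findTranscoder`, so a platform could add native fast paths later without any change to the codec modules or to calling code.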

I believe this both addresses performance for the majority of cases
and provides a scalable architecture for orthogonally designed
encoding modules. This design would also provide a transparent
alternative for platforms that do not have access to native
transcoders, or have not yet implemented an interface to native
transcoders, albeit at the cost of suboptimal performance.

Kris Kowal
