BERT 2.0 Proposal: New complex type - {bert, string, Encoding, Binary}

Tom Preston-Werner

Oct 27, 2009, 2:21:56 PM

to BERT-RPC

The BERT 1.0 spec does not include a mechanism for specifying
character encoding for strings. I propose the addition of the
following complex type to BERT 2.0 to address this problem:

{bert, string, Encoding, Binary}

Where Encoding is an atom that specifies the character encoding and
Binary is a simple binary type containing the encoded string data.

Valid encoding atoms correspond to the libiconv encoding names such as
'ASCII', 'UTF-8', 'EUC-JP', etc. See http://www.gnu.org/software/libiconv/.

For example, to specify a UTF-8 encoded string, the complex type would
be:

{bert, string, 'UTF-8', <<"Jos\303\251">>}
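As a rough illustration of how a client library might consume this type, here is a minimal Python sketch. It assumes the Erlang tuple has already been deserialized into a Python tuple with atoms as plain strings and the binary as `bytes`. The function names `decode_bert_string` and `encode_bert_string` are hypothetical, and Python's codec registry is used only as a stand-in for libiconv, so the set of accepted encoding names will not match libiconv's exactly.

```python
import codecs


def decode_bert_string(term):
    """Turn a ('bert', 'string', encoding, binary) tuple into a Python str.

    Hypothetical helper: assumes atoms arrive as str and binaries as bytes.
    """
    if len(term) != 4 or term[0] != "bert" or term[1] != "string":
        raise ValueError("not a {bert, string, Encoding, Binary} term")
    _, _, encoding, data = term
    try:
        # Python's codec lookup stands in for libiconv name resolution here.
        codec = codecs.lookup(encoding)
    except LookupError:
        raise ValueError(f"unsupported encoding atom: {encoding!r}")
    return data.decode(codec.name)


def encode_bert_string(text, encoding="UTF-8"):
    """Build the proposed complex type from a Python str."""
    return ("bert", "string", encoding, text.encode(encoding))


# The example from the proposal: \303\251 are the two UTF-8 bytes of "é".
term = ("bert", "string", "UTF-8", b"Jos\303\251")
decoded = decode_bert_string(term)
```

Because the encoding travels with the data, the receiver never has to guess how to interpret the bytes; the cost is that every implementation needs a mapping from encoding atoms to decoders.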

Please let me know if you have any feedback on this proposal.

Tom

stephen judkins

Nov 3, 2009, 3:38:22 AM

to BERT-RPC

I would argue it would be simpler to standardize on a single
encoding. Since UTF-8 is becoming the lingua franca of the internet,
that would probably be the best choice. Though arbitrary encodings
mapped from libiconv encoding names wouldn't be super hard for
implementors, in practice everyone would probably want to use UTF-8
anyway. Why not make it the default and only choice? If someone
feels the need to use another encoding they can always serialize it as
binary and decode it however they want.
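To make the single-encoding alternative concrete, here is a small Python sketch of what it would imply for senders and receivers. The helper `normalize_to_utf8` is hypothetical, and the sketch says nothing about the wire format; it only shows the transcoding step and the opaque-binary fallback described above.

```python
def normalize_to_utf8(data, source_encoding):
    """Transcode bytes from source_encoding to UTF-8 before sending.

    Hypothetical helper: under a UTF-8-only rule, this work moves to the sender.
    """
    return data.decode(source_encoding).encode("utf-8")


# Sender side: text held in Latin-1 is converted once, up front.
latin1_bytes = b"Jos\xe9"
utf8_bytes = normalize_to_utf8(latin1_bytes, "latin-1")

# Receiver side: no encoding atom is needed, strings are always UTF-8.
text = utf8_bytes.decode("utf-8")

# Fallback: data in another encoding is sent as a plain binary, and both
# ends agree out of band on how to decode it (EUC-JP in this example).
euc_jp_bytes = "日本語".encode("euc_jp")
recovered = euc_jp_bytes.decode("euc_jp")
```

The trade-off is that the receiver gets simpler, while anyone who needs a non-UTF-8 encoding has to carry that knowledge outside the protocol.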

If people have strong feelings the other way I'd like to hear from
them.

On Oct 27, 10:21 am, Tom Preston-Werner <mojo...@gmail.com> wrote:
> The BERT 1.0 spec does not include a mechanism for specifying
> character encoding for strings. I propose the addition of the
> following complex type to BERT 2.0 to address this problem:
>
> {bert, string, Encoding, Binary}
>
> Where Encoding is an atom that specifies the character encoding and
> Binary is a simple binary type containing the encoded string data.
>
> Valid encoding atoms correspond to the libiconv encoding names such as