I would argue it would be simpler to standardize on a single
encoding. Since UTF-8 is becoming the lingua franca of the internet,
that would probably be the best choice. Though supporting arbitrary
encodings mapped from libiconv encoding names wouldn't be especially
hard for implementors, in practice everyone would probably want to use
UTF-8 anyway. Why not make it the default and only choice? If someone
feels the need to use another encoding, they can always serialize the
string as a plain binary and decode it however they want.
If people have strong feelings the other way, I'd like to hear from
them.
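To illustrate the escape hatch I mean, here's a minimal Python sketch
(the helper names are mine, not part of any BERT library): strings are
always UTF-8 on the wire, and anyone who insists on another encoding
ships raw bytes as an opaque binary and decodes them on their end.

```python
# Sketch of a UTF-8-only string rule. encode_string/decode_string
# are hypothetical helpers, not a real BERT API.

def encode_string(s: str) -> bytes:
    # Under a UTF-8-only spec, every string serializes the same way.
    return s.encode("utf-8")

def decode_string(b: bytes) -> str:
    return b.decode("utf-8")

# Someone who really needs EUC-JP sends it as a plain BERT binary
# and applies their own encoding on the receiving side:
euc_payload = "日本語".encode("euc-jp")   # opaque binary on the wire
text = euc_payload.decode("euc-jp")       # receiver decodes it itself

print(decode_string(encode_string("héllo")))  # héllo
print(text)                                   # 日本語
```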
On Oct 27, 10:21 am, Tom Preston-Werner <
mojo...@gmail.com> wrote:
> The BERT 1.0 spec does not include a mechanism for specifying
> character encoding for strings. I propose the addition of the
> following complex type to BERT 2.0 to address this problem:
>
> {bert, string, Encoding, Binary}
>
> Where Encoding is an atom that specifies the character encoding and
> Binary is a simple binary type containing the encoded string data.
>
> Valid encoding atoms correspond to the libiconv encoding names such as
> 'ASCII', 'UTF-8', 'EUC-JP', etc. See
> http://www.gnu.org/software/libiconv/.
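For comparison, a rough Python sketch of how the proposed {bert,
string, Encoding, Binary} complex type might be built and read, using
a plain tuple to stand in for the BERT term (the helper names and
tuple representation are mine, not from the spec):

```python
# Hypothetical modeling of the proposed {bert, string, Encoding, Binary}
# complex type as a Python tuple. Atoms become strings here.

def make_bert_string(text: str, encoding: str = "UTF-8"):
    # Encoding names like 'UTF-8' or 'EUC-JP' map onto Python codec
    # names, which are looked up case-insensitively.
    return ("bert", "string", encoding, text.encode(encoding))

def read_bert_string(term):
    tag, kind, encoding, binary = term
    if (tag, kind) != ("bert", "string"):
        raise ValueError("not a bert string term")
    return binary.decode(encoding)

term = make_bert_string("日本語", "EUC-JP")
print(read_bert_string(term))  # 日本語
```

The cost this proposal pays is that every decoder now needs a codec
table; the UTF-8-only alternative above collapses the Encoding field
away entirely.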