Preferable charsets/languages

Maciej Piechotka

May 19, 2008, 6:18:13 AM

to merb_global

From 0.0.2 it would be nice to support RFC 2616 (see #5). However,
there are a few ways to do it:
- Choose the best language, choose the best charset. Convert the string
into that charset, changing the characters it cannot represent into
HTML entities.
Pros: Very simple
Cons: A Chinese character sent in a Japanese charset will take 7*2=14 bytes
(assuming a 2-byte Japanese charset and 7 characters per HTML entity)

- Choose the best language. Choose the best charset but only if it
'supports' the language.
Pros: The most 'proper' behaviour
Cons: We would need to store a list of suitable charsets for each
language somewhere and maintain it

- Choose the best language. Send it in UTF-8
Pros: Very simple
Cons: UTF-8 may not be supported by older clients

Any other ideas?
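
A minimal sketch of the first option, for illustration only. It assumes a
Ruby that has String#encode with the :fallback option (1.9.3 or later);
encode_with_entities is a hypothetical helper name, not part of
merb_global.

```ruby
# Transcode `text` into `charset`. Every character the target charset
# cannot represent is replaced by a decimal numeric character reference
# such as "&#20013;" (assumption: the output is HTML, so a browser will
# decode the reference).
def encode_with_entities(text, charset)
  text.encode(charset, fallback: ->(char) { format("&#%d;", char.ord) })
end

polish  = encode_with_entities("ż", "ISO-8859-2")   # representable
chinese = encode_with_entities("中", "ISO-8859-2")  # not representable

puts polish.bytesize   # size in bytes of the directly encoded character
puts chinese.bytesize  # size in bytes of the numeric character reference
```

The actual cost of a replaced character depends on the number of digits
in its code point and on how the target charset encodes ASCII, so it is
worth measuring for the charsets you care about.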

Maciej Piechotka

May 19, 2008, 2:20:12 PM

to merb_global

Additionally, how should '*' be supported?
- Fetch all languages, subtract the unsupported ones and choose the
first one or one at random.
Pros: Simple
Cons: Why should the Afar language be preferred? Randomizing the
choice doesn't help much...

- Have some priorities.
Pros: Choose possibly the best supported language
Cons: Where to store it?

- Simply ignore it (this may appear soon in the http-headers branch).
Pros: Simple
Cons: It's not really 'support'.

- Other possible methods?

What do you think?
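
One way the 'priorities' variant could look, as a sketch under stated
assumptions: the header in question is Accept-Language, the priorities
are simply the order of an application-supplied list, and both method
names are hypothetical. Prefix matching (en matching en-gb) is left out
to keep it short.

```ruby
# Parse an Accept-Language value into [tag, quality] pairs, best first.
# Simplified: ties keep header order, malformed items are not validated.
def parse_accept_language(header)
  entries = header.to_s.split(",").each_with_index.map do |item, index|
    tag, *params = item.split(";").map(&:strip)
    q_param = params.find { |param| param.start_with?("q=") }
    quality = q_param ? q_param[2..-1].to_f : 1.0
    [tag.to_s.downcase, quality, index]
  end
  entries.reject { |tag, _, _| tag.empty? }
         .sort_by { |_, quality, index| [-quality, index] }
         .map { |tag, quality, _| [tag, quality] }
end

# `supported` holds lowercase language tags in the application's own
# priority order. '*' stands for any language not listed explicitly,
# so it picks the highest-priority supported language that the client
# did not mention (including ones it refused with q=0).
def choose_language(header, supported)
  ranges = parse_accept_language(header)
  listed = ranges.map(&:first)
  ranges.each do |tag, quality|
    next if quality <= 0 # q=0 means "not acceptable"
    candidate =
      if tag == "*"
        supported.find { |lang| !listed.include?(lang) }
      else
        supported.find { |lang| lang == tag }
      end
    return candidate if candidate
  end
  nil
end

puts choose_language("fr;q=0.9, *;q=0.5, pl;q=0.1", %w[pl en])
```

With the priorities kept as the order of the supported list, nothing
extra has to be stored; whether that is good enough is a separate
question.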

Maciej Piechotka

May 24, 2008, 6:53:41 PM

to merb_global

On May 19, 8:20 pm, Maciej Piechotka <uzytkown...@gmail.com> wrote:
> On May 19, 12:18 pm, Maciej Piechotka <uzytkown...@gmail.com> wrote:
>
>
> Additionally, how should '*' be supported?
> - Fetch all languages, subtract the unsupported ones and choose the
> first one or one at random.
> Pros: Simple
> Cons: Why should the Afar language be preferred? Randomizing the
> choice doesn't help much...
>

The first supported one will probably be chosen. If you have another
solution to propose, please post it here anyway.

Regards

Maciej Piechotka

May 24, 2008, 7:23:43 PM

to merb_global

On May 19, 12:18 pm, Maciej Piechotka <uzytkown...@gmail.com> wrote:

> From 0.0.2 it would be nice to support RFC 2616 (see #5). However,
> there are a few ways to do it:
> - Choose the best language, choose the best charset. Convert the string
> into that charset, changing the characters it cannot represent into
> HTML entities.
> Pros: Very simple
> Cons: A Chinese character sent in a Japanese charset will take 7*2=14 bytes
> (assuming a 2-byte Japanese charset and 7 characters per HTML entity)
>
> - Choose the best language. Choose the best charset but only if it
> 'supports' the language.
> Pros: The most 'proper' behaviour
> Cons: We would need to store a list of suitable charsets for each
> language somewhere and maintain it
>

From an XMPP conversation with Alex Coles, it seems that he is in favour
of this option. However, what should we do with user-supplied content?
We would need to convert it on each request:
- How will that affect performance?
- Is it possible in merb?
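
To make the maintenance cost of this option concrete, here is a rough
sketch of the kind of table it would need. The table contents, the
constant name and choose_charset are all hypothetical, and the fallback
to UTF-8 when nothing matches is an assumption. Parsing of the client's
charset preferences is omitted; `accepted_charsets` is taken to be
already ordered by preference.

```ruby
# Hypothetical, hand-maintained table: charsets considered able to
# represent each language. Real data would need far more entries.
CHARSETS_FOR_LANGUAGE = {
  "pl" => %w[utf-8 iso-8859-2 windows-1250],
  "ja" => %w[utf-8 shift_jis euc-jp],
  "zh" => %w[utf-8 gb2312 big5],
}.freeze

# Return the client's most preferred charset that 'supports' the
# language, or "utf-8" when none of the accepted charsets does.
def choose_charset(language, accepted_charsets)
  suitable = CHARSETS_FOR_LANGUAGE.fetch(language, ["utf-8"])
  accepted = accepted_charsets.map(&:downcase)
  accepted.find { |charset| suitable.include?(charset) } || "utf-8"
end

puts choose_charset("pl", %w[Shift_JIS ISO-8859-2 utf-8])
```

Every language added to the application means another row to research
and keep up to date, which is the cost noted above.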

> - Choose the best language. Send it in UTF-8
> Pros: Very simple
> Cons: UTF-8 may not be supported by older clients
>

I am in favour of this option. It imposes a certain restriction
on the user, but:
- UTF-8 should be supported by virtually anything (including IE 4)
- text such as 'π ≈ 3.14159' cannot necessarily be represented in any
other encoding (and certainly not in all of them)

> Any other ideas?

Alex Coles

May 29, 2008, 9:55:53 AM

to merb_...@googlegroups.com

2008/5/25 Maciej Piechotka <uzytk...@gmail.com>:

I think the way to proceed is this last option: send as UTF-8.
Although I'd prefer something that could match charsets against
supported languages, you are correct in saying this is a lot of work.
Perhaps we should note it as something to be done further down the
road (and only if there is demand).

Alex

Maciej Piechotka

May 29, 2008, 10:33:30 AM

to merb_global

On May 29, 3:55 pm, "Alex Coles" <alex.co...@gmail.com> wrote:
> 2008/5/25 Maciej Piechotka <uzytkown...@gmail.com>:

Well, maybe in the future. I'll mark the bugs as solved.

Regards
