Proposal: require UTF-8 encoding in PO files (?)


Malcolm Tredinnick

Jul 14, 2007, 10:17:59 AM
to Djang...@googlegroups.com
Hi all,

Short version: anybody unhappy if we *require* PO files to be encoded as
UTF-8? For almost everybody, this means no change at all. For "es" and
"es_AR", I will have to re-encode the current files (they are
ISO-8859-1) to UTF-8 and, in future, you will need to edit them as
UTF-8. I would like to make sure I'm not going to make things impossible
for anybody by asking for this.

Longer version (with explanation):

I'm working my way through a bunch of i18n tickets and problems. One
popular request is to allow non-ASCII msgid strings (the original,
untranslated strings).

I'd always believed the xgettext man page that said this wasn't possible
for Python. Turns out that the docs are wrong and it is possible. So we
have a nice patch in #4734 that works well and I'm happy with it.

The only catch is that we now have to agree on an encoding for PO files
because xgettext is going to be potentially putting non-ASCII in there,
which means an encoding is required. Typically (on most Open Source
projects I've worked on in the past, including big ones like GNOME),
UTF-8 has been the right choice. This isn't a big deal for core code,
although it might be useful in localflavors at some stage, but it will
help make the documentation easier. Note that if we can't agree, you
guys don't get to use non-ASCII msgid strings.

Aside from "es" and "es_AR", the only other non-UTF-8 file is
djangojs.po in "fr", but that hasn't been updated for a while and the
main (django.po) "fr" file is in UTF-8, so I figure it's not an issue
there.
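
For anyone who wants to check their own files, a minimal sketch of one way to find PO files that are not valid UTF-8 might look like the following. The function names and the idea of scanning a locale directory are assumptions made for illustration, not part of any Django tooling. Note that a pure-ASCII file also counts as valid UTF-8 here.

```python
from pathlib import Path


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_po_files(locale_root):
    """List .po files under locale_root (hypothetical path) that are not valid UTF-8."""
    return sorted(
        path
        for path in Path(locale_root).rglob("*.po")
        if not is_utf8(path.read_bytes())
    )
```

Calling `non_utf8_po_files("django/conf/locale")` (the path is only an example) would return the files that still need re-encoding.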

Regards,
Malcolm

--
A conclusion is the place where you got tired of thinking.
http://www.pointy-stick.com/blog/

Ramiro Morales

Jul 14, 2007, 11:02:01 AM
to Djang...@googlegroups.com
On 7/14/07, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:
>
> Hi all,
>
> Short version: anybody unhappy if we *require* PO files to be encoded as
> UTF-8? For almost everybody, this means no change at all. For "es" and
> "es_AR", I will have to re-encode the current files (they are
> ISO-8859-1) to UTF-8 and, in future, you will need to edit them as
> UTF-8. I would like to make sure I'm not going to make things impossible
> for anybody by asking for this.
>
> Longer version (with explanation):
> [...]

I was going to post to the list asking about that same topic last week[1].

I'm the es_AR translator and I think there is no problem with what
you propose.

Now that two huge merges to trunk have happened and you've
committed the [5695] fix, I can post to Trac an updated translation
with the regular updates plus some fixes. I can post it already encoded
in UTF-8 if you want.

Regards,

--
Ramiro Morales

1. I decided to wait for the then-current discussion about the
project-id-version header field value. By the way, we haven't given you much
feedback on that topic; I'm OK with using "Django" and "Django JavaScript".
I will also point language-team to this list and will keep filling in
my email address on last-translator.

Marc Fargas

Jul 14, 2007, 12:13:56 PM
to Djang...@googlegroups.com
"ca" is already UTF-8 so there's no problem from this side ;)

Jorge Gajon

Jul 14, 2007, 1:53:37 PM
to Djang...@googlegroups.com
On 7/14/07, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:
> Short version: anybody unhappy if we *require* PO files to be encoded as
> UTF-8? For almost everybody, this means no change at all. For "es" and
> "es_AR", I will have to re-encode the current files (they are

Hi, I'm not the only contributor to the 'es' translation, but it is
perfectly fine with me if we re-encode it to UTF-8.

I'll re-encode 'es' and submit a patch.
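
A rough sketch of what the re-encoding step could look like in Python, assuming the file is currently ISO-8859-1 and declares its charset in the usual `Content-Type` line of the PO header. The function name `reencode_po` is hypothetical, and this is only one possible approach (gettext's `msgconv` tool is another option).

```python
import re

# Matches the charset declaration in the PO header entry, e.g.
# "Content-Type: text/plain; charset=ISO-8859-1\n"
CHARSET_RE = re.compile(
    r"(Content-Type: text/plain; charset=)[-\w]+", re.IGNORECASE
)


def reencode_po(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode PO file bytes from source_encoding and return them as UTF-8.

    The header's charset declaration is rewritten to UTF-8 so that it
    matches the new byte encoding.
    """
    text = data.decode(source_encoding)
    text = CHARSET_RE.sub(r"\g<1>UTF-8", text, count=1)
    return text.encode("utf-8")
```

The result can be written back with something like `Path(po_path).write_bytes(reencode_po(Path(po_path).read_bytes()))`, where `po_path` is whichever file is being converted. Rewriting the header matters because gettext tools use the declared charset to interpret the file.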

Regards

Malcolm Tredinnick

Jul 14, 2007, 9:57:53 PM
to Djang...@googlegroups.com

Cool. Thanks for the rapid feedback (Ramiro, too). The only other person
who might be affected is Mario Gonzalez (who contributes to the 'es'
translation). However, I think he might be outvoted in any case.

Not that I don't love the rest of the translators like my own family,
but all the other files are already in UTF-8, so it's a bit of a
non-issue there.

I'll leave it up to translators to do the re-encoding, because I suspect
that if I commit a huge change like that you might get a bunch of
subversion conflicts on the next update.

Regards,
Malcolm

--
Honk if you love peace and quiet.
http://www.pointy-stick.com/blog/

Mario Gonzalez

Jul 16, 2007, 2:46:44 PM
to Djang...@googlegroups.com
On 14/07/07, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:
> > I'll re-encode 'es' and submit a patch.
>
> Cool. Thanks for the rapid feedback (Ramiro, too). The only other person
> who might be affected is Mario Gonzalez (who contributes to the 'es'
> translation). However, I think he might be outvoted in any case.
>

Hello, sorry for the delay. I agree that strings must be in UTF-8
encoding. Thanks for the patch, Jorge.


--
http://www.advogato.org/person/mgonzalez/
