Enforce the use of a unicode string in settings.LANGUAGES


Henrique Romano

Jan 21, 2014, 4:13:42 PM
to django-d...@googlegroups.com
Hi,

The documentation[1] does not make it clear that you _must_ use a unicode string for the language name. If you don't use a unicode string, the following can happen:

>>> from django.utils.translation import ugettext
>>> ugettext("Português")
Traceback (most recent call last):
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)

As opposed to:

>>> ugettext(u"Português")
u'Portugu\xeas'

I just ran into this: the list of available languages was being rendered in a template, but since TEMPLATE_DEBUG was enabled, no error was raised in the development environment. Switching TEMPLATE_DEBUG off in production resulted in an exception with almost no clue about what had happened.

What do you guys think about making it clear that the user should always use a unicode string for the LANGUAGES setting?
 
--
Henrique Romano

gilberto dos santos alves

Jan 21, 2014, 9:37:28 PM
to django-d...@googlegroups.com
Please note that this is a Python directive, not a Django one. For all of us who use pt-br, it is good practice to declare UTF-8 explicitly on the second line of every Python source file:
# -*- coding: utf-8 -*- 

regards!



2014/1/21 Henrique Romano <chro...@gmail.com>

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CA%2BEHudKfPXxry6T2tW_6ZNFzJgAJUHs3nO_mLkdp3Xk2wS09xg%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
gilberto dos santos alves
+55.11.98646-5049
sao paulo - sp - brasil




Henrique Romano

Jan 22, 2014, 8:21:56 AM
to django-d...@googlegroups.com
Hi,

On Wed, Jan 22, 2014 at 12:37 AM, gilberto dos santos alves <gsa...@gmail.com> wrote:
> Please note that this is a Python directive, not a Django one. For all of us who use pt-br, it is good practice to declare UTF-8 explicitly on the second line of every Python source file:
> # -*- coding: utf-8 -*-

Specifying the encoding of the file doesn't automatically make strings unicode, so doing it won't solve the problem. I think the problem is still relevant.

Thanks

gilberto dos santos alves

Jan 22, 2014, 8:38:20 AM
to django-d...@googlegroups.com
Please see the details in [1]. If you put
# -*- coding: utf-8 -*-
in your sources and Django config files, your string "português" will be handled automatically.
Regards.

[1] http://docs.python.org/2/howto/unicode.html


2014/1/22 Henrique Romano <chro...@gmail.com>


Henrique Romano

Jan 22, 2014, 8:59:23 AM
to django-d...@googlegroups.com
On Wed, Jan 22, 2014 at 11:38 AM, gilberto dos santos alves <gsa...@gmail.com> wrote:
> Please see the details in [1]. If you put
> # -*- coding: utf-8 -*-
> in your sources and Django config files, your string "português" will be handled automatically.

Can you just try what I reported?  For example:

$ cat ~/foofoo.py
# -*- coding: utf-8 -*-
from django.utils.translation import ugettext

print ugettext("Português")
$ python ~/foofoo.py
Traceback (most recent call last):
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)

So, I'm not sure what you are talking about.
--
Henrique Romano

Shai Berger

Jan 22, 2014, 9:39:43 AM
to django-d...@googlegroups.com
This has nothing to do with the LANGUAGES setting, or the string being a
language name. It just so happens that ugettext tries to return unicode, and
so for an untranslated string s it returns unicode(s). You can get the same
error by writing

unicode("Português")

You should make sure that every string you pass to unicode(), directly or
indirectly, is either a unicode object or an ASCII-only string (except in
cases where you also pass the encoding); but that is general Python, not
Django-specific.
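
A minimal sketch of the failure described above, written for Python 3 so that it runs on a current interpreter. It assumes the source file is saved as UTF-8 and emulates the Python 2 behaviour by encoding the literal to bytes and then decoding it as ASCII, which is roughly what unicode() does with a byte string when no encoding is given. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# Python 3 sketch: emulate Python 2's implicit ASCII decode of a byte string.

# What a plain "..." literal holds on Python 2 when the file is UTF-8 encoded.
raw = "Português".encode("utf-8")

try:
    raw.decode("ascii")        # stands in for unicode(raw) on Python 2
    error = None
except UnicodeDecodeError as exc:
    error = exc                # non-ASCII byte found, decoding fails

print(error)

# Passing the encoding explicitly succeeds.
decoded = raw.decode("utf-8")
```

The exception points at the first byte of the encoded "ê", the same position reported in the traceback earlier in the thread.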

As far as file headers are concerned, you may want to use

from __future__ import unicode_literals

which really does make all your strings unicode.

Please take further discussion of this to django-users or other forums.

HTH,
Shai.

Ramiro Morales

Jan 22, 2014, 9:43:34 AM
to django-d...@googlegroups.com
You aren't telling us the whole story. There are many missing parts in your description of the issue, so I don't think it's right to jump straight to the "As per the documentation, it is not clear that you _must_ use a unicode string for the language name. ... What do you guys think about making it clear that the user should always use a unicode string for the LANGUAGES setting?" conclusion.

Some questions:

- What are you trying to do? Reducing the choices in the LANGUAGES setting? Do you intend to translate them from Portuguese to other languages?

- Why are you using gettext() at the module level? You should use ugettext_lazy() or ugettext_noop()

- Why did you show us a Python interactive session instead of a Python source code file? How do you think the interpreter can deduce the encoding of a bare string in that case?

- Why in both cases are you using Django without care for setting up the settings infrastructure first?

- Is the foofoo.py file you (or rather your text editor) are saving actually using the utf-8 encoding?

You link to the development documentation but the tests I performed below are against the latest 1.6.x stable branch code with Python 2.7.3 (another piece of information you don't give us):

$ django-admin.py startproject lang_i18n
$ cd lang_i18n/
$ cat foofoo.py
#  -*- coding: utf-8 -*-

from django.utils.translation import ugettext, ugettext_lazy, ugettext_noop

print(ugettext_lazy("Portugués"))

$ file foofoo.py
foofoo.py: Python script, UTF-8 Unicode text executable

ramiro@mang:~/dtest/lang_i18n$ DJANGO_SETTINGS_MODULE=lang_i18n.settings python foofoo.py
<django.utils.functional.__proxy__ object at 0x20bfcd0>

So the (lazy)  translation machinery is working.

gilberto dos santos alves

Jan 22, 2014, 10:45:13 AM
to django-d...@googlegroups.com
Yes, I will try with Django 1.6.


2014/1/22 Henrique Romano <chro...@gmail.com>


Henrique Romano

Jan 22, 2014, 11:01:14 AM
to django-d...@googlegroups.com
On Wed, Jan 22, 2014 at 12:39 PM, Shai Berger <sh...@platonix.com> wrote:

> This has nothing to do with the LANGUAGES setting, or the string being a
> language name. It just so happens that ugettext tries to return unicode, and
> so for an untranslated string s it returns unicode(s). You can get the same
> error by writing
>
>         unicode("Português")

Correct.

> You should make sure that every string you pass to unicode(), directly or
> indirectly, is either a unicode object or an ASCII-only string (except in
> cases where you also pass the encoding); but that is general Python, not
> Django-specific.

The point is that I didn't know I had to pass a unicode string. I didn't even know about the internals (of django-cms or django, I don't remember) where the language name is being translated, which is why a unicode string was necessary. That's what I'm suggesting here: to make it explicit that the language name string must be a unicode string.

--
Henrique Romano

In the face of ambiguity, refuse the temptation to guess.
    -- Tim Peters

Henrique Romano

Jan 22, 2014, 11:12:02 AM
to django-d...@googlegroups.com
On Wed, Jan 22, 2014 at 12:43 PM, Ramiro Morales <cra...@gmail.com> wrote:
> You aren't telling us the whole story. There are many missing parts in your description of the issue, so I don't think it's right to jump straight to the "As per the documentation, it is not clear that you _must_ use a unicode string for the language name. ... What do you guys think about making it clear that the user should always use a unicode string for the LANGUAGES setting?" conclusion.
>
> Some questions:
>
> - What are you trying to do? Reducing the choices in the LANGUAGES setting? Do you intend to translate them from Portuguese to other languages?

I'm using django-cms, and I want to support only two languages in my system: English and Portuguese.

> - Why are you using gettext() at the module level? You should use ugettext_lazy() or ugettext_noop()

That was just an example. ugettext_* returns a functional.proxy, which doesn't try to render the string when you print it on the screen, but if you force it to render you will see the same result.

> - Why did you show us a Python interactive session instead of a Python source code file? How do you think the interpreter can deduce the encoding of a bare string in that case?

I didn't care about the correct encoding; I used the interactive session just to show that a _unicode_ string is necessary for ugettext.

> - Why in both cases are you using Django without care for setting up the settings infrastructure first?
>
> - Is the foofoo.py file you (or rather your text editor) are saving actually using the utf-8 encoding?

Yes.

> You link to the development documentation but the tests I performed below are against the latest 1.6.x stable branch code with Python 2.7.3 (another piece of information you don't give us):
>
> $ django-admin.py startproject lang_i18n
> $ cd lang_i18n/
> $ cat foofoo.py
> #  -*- coding: utf-8 -*-
>
> from django.utils.translation import ugettext, ugettext_lazy, ugettext_noop
>
> print(ugettext_lazy("Portugués"))
>
> $ file foofoo.py
> foofoo.py: Python script, UTF-8 Unicode text executable
>
> ramiro@mang:~/dtest/lang_i18n$ DJANGO_SETTINGS_MODULE=lang_i18n.settings python foofoo.py
> <django.utils.functional.__proxy__ object at 0x20bfcd0>
>
> So the (lazy)  translation machinery is working.

As I wrote above, it "works" because it is returning a functional.proxy object; if you try to render this object as a string, you will see the same error.
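
The deferred failure can be sketched without Django. The class below is a hypothetical stand-in for a lazy proxy, not Django's implementation: it stores the raw bytes and only attempts the ASCII decode when the object is rendered as text. It is written for Python 3 and assumes a UTF-8 source file.

```python
# -*- coding: utf-8 -*-


class LazyText(object):
    """Hypothetical stand-in for a lazy translation proxy (not Django's class)."""

    def __init__(self, raw_bytes):
        self._raw = raw_bytes  # nothing is decoded at construction time

    def __str__(self):
        # Decoding happens only when the value is rendered as text.
        return self._raw.decode("ascii")


# Creating the proxy succeeds even though the bytes are not ASCII.
lazy_name = LazyText("Português".encode("utf-8"))

try:
    rendered = str(lazy_name)
except UnicodeDecodeError:
    rendered = None  # the error only appears at render time
```

Printing the proxy's default representation, as in the quoted test, never reaches the decode step, which is why that test showed no error.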

Shai Berger

Jan 22, 2014, 11:16:58 AM
to django-d...@googlegroups.com
Wait -- so the real context (which, as Ramiro noted, you left out) is

# settings.py

LANGUAGES = (('pt_BR', _("Português")),)

Is it? Or is it

LANGUAGES = (('pt_BR', "Português"),)

If it is the former, then this is a generic issue of translatable strings --
nothing to do with settings.LANGUAGES. It is usually assumed that, if you are
making a string translatable, you write it in English -- then it's ASCII and
all's well.

If it is the latter, please provide more details about the specific error you
encountered (stack traces etc).

Shai.

Henrique Romano

Jan 22, 2014, 11:29:18 AM
to django-d...@googlegroups.com
On Wed, Jan 22, 2014 at 2:16 PM, Shai Berger <sh...@platonix.com> wrote:
> Wait -- so the real context (which, as Ramiro noted, you left out) is
>
> # settings.py
>
> LANGUAGES = (('pt_BR', _("Português")),)
>
> Is it? Or is it
>
> LANGUAGES = (('pt_BR', "Português"),)
>
> If it is the former, then this is a generic issue of translatable strings --
> nothing to do with settings.LANGUAGES. It is usually assumed that, if you are
> making a string translatable, you write it in English -- then it's ASCII and
> all's well.
>
> If it is the latter, please provide more details about the specific error you
> encountered (stack traces etc).

In my test, LANGUAGES is defined as follows:

LANGUAGES = [
    ('en-us', 'English'),
    ('pt-br', 'Português'),
]

As the documentation didn't make it clear that the language names should be unicode, I just used the translated name ("Português") in a plain string. I thought it would be OK since settings.py has the encoding declared at the top of the file.

As for the error, it happened inside django-cms. There is a function that builds a list of the enabled languages and their names:

# cms/utils/i18n.py
def get_languages(site_id=None):
    site_id = get_site(site_id)
    result = get_cms_setting('LANGUAGES').get(site_id)
    if not result:
        result = []
        defaults = get_cms_setting('LANGUAGES').get('default', {})
        for code, name in settings.LANGUAGES:
            lang = {'code': code, 'name': _(name)}
            lang.update(defaults)
            result.append(lang)
        get_cms_setting('LANGUAGES')[site_id] = result
    return result

In the code above, the language name is passed through a translation call, which fails because the name isn't a unicode string. I understand that this is happening in django-cms, but I still think it is worth documenting that the name should be either a unicode string or at least the English name of the language.
 

Shai Berger

Jan 22, 2014, 11:38:24 AM
to django-d...@googlegroups.com
I don't think Django should take responsibility for a 3rd-party package which
decides that some part of a setting should be translatable whether the user
said so or not.

You might want to take this up with django-cms.

Shai.

Henrique Romano

Jan 22, 2014, 11:58:26 AM
to django-d...@googlegroups.com
On Wed, Jan 22, 2014 at 2:38 PM, Shai Berger <sh...@platonix.com> wrote:
> I don't think Django should take responsibility for a 3rd-party package which
> decides that some part of a setting should be translatable whether the user
> said so or not.
>
> You might want to take this up with django-cms.

Not exactly true. Even though my problem was with django-cms, if you grep the django source code for "LANGUAGES" you will find this templatetag:

# templatetags/i18n.py
class GetAvailableLanguagesNode(Node):
    def __init__(self, variable):
        self.variable = variable

    def render(self, context):
        from django.conf import settings
        context[self.variable] = [(k, translation.ugettext(v)) for k, v in settings.LANGUAGES]
        return ''

It is also translating the given language name. My suggestion is to update the documentation to say that the developer should use ugettext_lazy for the language name if they want it translated *or* at least use a unicode string.
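
As a sketch of the second option, the setting from my test could be written with unicode literals. This is only an illustration, not wording proposed for the docs; the helper names text_type and all_text are made up for the check at the end. The u'' prefix is accepted by Python 2 and by Python 3.3 and later.

```python
# -*- coding: utf-8 -*-
# settings.py fragment (sketch): language names as unicode literals.
LANGUAGES = [
    ('en-us', u'English'),
    ('pt-br', u'Português'),
]

# Every name is already a text object, so no implicit ASCII decode
# is needed when the name is passed to a translation function later.
text_type = type(u'')
all_text = all(isinstance(name, text_type) for _code, name in LANGUAGES)
```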

Shai Berger

Jan 22, 2014, 12:15:44 PM
to django-d...@googlegroups.com
On Wednesday 22 January 2014 18:58:26 Henrique Romano wrote:
> On Wed, Jan 22, 2014 at 2:38 PM, Shai Berger <sh...@platonix.com> wrote:
> > I don't think Django should take responsibility for a 3rd-party package
> > which decides that some part of a setting should be translatable whether
> > the user said so or not.
> >
> > You might want to take this up with django-cms.
>
> Not exactly true. Even though my problem was with django-cms, if you grep
> django source code for "LANGUAGES" you will find this templatetag:
>
> # templatetags/i18n.py
> class GetAvailableLanguagesNode(Node):
>     def __init__(self, variable):
>         self.variable = variable
>
>     def render(self, context):
>         from django.conf import settings
>         context[self.variable] = [(k, translation.ugettext(v)) for k, v in settings.LANGUAGES]
>         return ''
>
> It is also translating the given language name.

You are right. Further, the documentation[1] specifically says that language
names get translated, whether or not you use this tag.

> My suggestion is to
> update the documentation saying that the developer should use ugettext_lazy
> for the language name if they want it translated *or* at least use a
> unicode string.

That's not accurate -- the documentation should say that language names get
translated whether or not you use ugettext_lazy, if you use RequestContext or
the i18n tags, and *recommend* that anything you put there be either
translated with ugettext_lazy or be a unicode object. It is still valid to use
a non-unicode string if:

1) The value is all ASCII
2) The value has translations in all used languages.

(you saw the error because your language name does not have a translation, and
thus was used as given instead).
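
The "used as given" fallback can be sketched with the standard library's gettext module rather than Django's own machinery. It assumes Python 3 and a UTF-8 source file. NullTranslations is the stdlib catalog that holds no translations, so every lookup misses.

```python
# -*- coding: utf-8 -*-
import gettext

# A catalog with no translations and no fallback catalog.
catalog = gettext.NullTranslations()

# With nothing to look up, the message itself comes back unchanged.
untranslated = catalog.gettext("Português")
```

A name that does have a translation never takes this path, which is why ASCII names with shipped translations do not trigger the error.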

Bugs and patches welcome,

Shai.

[1] https://docs.djangoproject.com/en/dev/topics/i18n/translation/#other-tags

Ramiro Morales

Jan 22, 2014, 12:34:28 PM
to django-d...@googlegroups.com
On Wed, Jan 22, 2014 at 1:12 PM, Henrique Romano <chro...@gmail.com> wrote:
>
> That was just an example, ugettext_* returns a functional.proxy, which
> doesn't try to render the string when you print it on the screen, but if you
> try to do it you will see the same result.

You are right.

On Wed, Jan 22, 2014 at 1:58 PM, Henrique Romano <chro...@gmail.com> wrote:
>
> On Wed, Jan 22, 2014 at 2:38 PM, Shai Berger <sh...@platonix.com> wrote:
>> You might want to take this up with django-cms.
>
> Not exactly true. Even though my problem was with django-cms, if you grep django source code for "LANGUAGES" you will find this templatetag:
>
> # templatetags/i18n.py
> class GetAvailableLanguagesNode(Node):
>     def __init__(self, variable):
>         self.variable = variable
>
>     def render(self, context):
>         from django.conf import settings
>         context[self.variable] = [(k, translation.ugettext(v)) for k, v in settings.LANGUAGES]
>         return ''
>
> It is also translating the given language name.

Yes, I think that what both pieces of code are doing there is assume
the LANGUAGES setting in effect is the one shipped with Django where:

- All the original language names are in English (covered by ASCII)
- Most of them are translated by the Django translators and shipped
with Django (https://github.com/django/django/blob/master/django/conf/locale/en/LC_MESSAGES/django.po#L16)
so translations are readily available.

So it's relying on the fact that there is no need to go outside ASCII
when working with original language names.

I think you are onto something re: the fact that we don't make clear
that our ugettext*() functions fail to accept encoded literals with
characters outside ASCII under Python 2.x, even when the encoding
metadata is correct.

My theory is that two things have kept this from getting much
visibility until now: the documentation is a bit unclear about the
requirements these source string arguments should comply with (or
rather, which data types these functions support), and using a
language other than English as the translate-from language doesn't
seem to be a common setup.

In Python 3 (3.3) this problem doesn't seem to exist. See tests below.

> My suggestion is to update the documentation saying that the developer should use ugettext_lazy for the language name if they want it translated

We have something along these lines. See (last two bullets)
https://docs.djangoproject.com/en/1.6/topics/i18n/translation/#how-django-discovers-language-preference

> *or* at least use a unicode string.

I will try to get some feedback from some more experienced devs to see
if there is anything to be done about this in code and/or docs.


$ cat a.py
# -*- coding: utf-8 -*-

import sys
from django.utils.translation import ugettext, ugettext_lazy, ugettext_noop

print("Python version: %s" % sys.version)

a = ugettext_noop("Portugués")

print(a)
print(type(a))
b = u'%s' % a # try to get the lazy placeholder to evaluate itself.
print(b)
print(type(b))

$ file a.py
a.py: Python script, UTF-8 Unicode text executable

$ PYTHONPATH=.:~/django/upstream DJANGO_SETTINGS_MODULE=lang_i18n.settings python a.py
Python version: 2.7.3 (default, Sep 26 2013, 20:03:06)
[GCC 4.6.3]
Portugués
<type 'str'>
Traceback (most recent call last):
  File "a.py", line 12, in <module>
    b = u'%s' % a # try to get the lazy placeholder to evaluate itself.
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)


$ PYTHONPATH=.:~/django/upstream DJANGO_SETTINGS_MODULE=lang_i18n.settings python3.3 a.py
Python version: 3.3.3 (default, Dec 27 2013, 19:27:19)
[GCC 4.6.3]
Portugués
<class 'str'>
Portugués
<class 'str'>


Thanks!

--
Ramiro Morales
@ramiromorales

Łukasz Rekucki

Jan 22, 2014, 3:28:00 PM
to django-developers
Hi everyone,

First I'd like to say I got bitten by this in the past. What worries
me the most in the original report is the TEMPLATE_DEBUG part. IMHO,
this should fail loudly regardless of any debug settings.

As for other stuff:

On 22 January 2014 18:34, Ramiro Morales <cra...@gmail.com> wrote:
> I think you are onto something re: the fact that we don't make clear
> that our ugettext*() functions fail to accept encoded literals with
> characters outside ASCII under Python 2.x. even when the encoding
> metadata is correct.

I think everyone is forgetting that those are *u*gettext() functions
and they work fine with any literals as long as the argument type is
unicode, because that is the only type they know how to handle. The
implicit bytes->text conversion in Python 2 makes this a little less
obvious, but the expected argument type is in the name.

IMHO, it would be better for everyone if ugettext_lazy() and friends
failed immediately when given anything other than text (unicode on
Python 2, str on Python 3), but it's probably too late for that now,
at least until Python 2 support is dropped completely.
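
A rough sketch of what such an eager check could look like, written for Python 3 and assuming a UTF-8 source file. The function name require_text is hypothetical and nothing like it is claimed to exist in Django. On Python 2 the isinstance test would be against unicode instead of str.

```python
# -*- coding: utf-8 -*-


def require_text(message):
    """Hypothetical guard: reject anything that is not a text string."""
    if not isinstance(message, str):  # would be `unicode` on Python 2
        raise TypeError("expected text, got %s" % type(message).__name__)
    return message


# A text string passes through untouched.
accepted = require_text("Português")

# A byte string is rejected immediately, regardless of its content.
try:
    require_text("Português".encode("utf-8"))
    rejected = False
except TypeError:
    rejected = True
```

With a check like this the failure would surface where the string is defined instead of deep inside template rendering.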


--
Łukasz Rekucki