Django detects HTTP Accept-Language header in case-sensitive manner


Wayne Ye

Sep 26, 2014, 3:58:34 AM
to django-d...@googlegroups.com
I noticed one small defect while testing my Django website's I18N/L10N. Below are example Accept-Language headers sent by Chrome and Firefox (both latest versions):

Chrome:  Accept-Language: zh-TW,zh;q=0.8,en-US;q=0.6,en;q=0.4
Firefox: Accept-Language: zh-tw,zh;q=0.8,en-us;q=0.5,en;q=0.3

Notice the difference between "zh-TW" and "zh-tw". According to the standards (W3C and IETF RFC 2616), language tags are lower-cased.

But I've verified that Django only recognizes Chrome's request correctly (i.e. it treats the end user as preferring zh-tw); for the Firefox example above, Django actually falls back to en-us/en.

Could some fellow developers in this forum kindly triage this issue? Once it is confirmed, I will create a ticket in Trac and submit a pull request accordingly. Any feedback is welcome.


Thank you and best wishes!


--
Everything is worth doing is worth doing well!
Wayne's Geek Life 
http://WayneYe.com
Infinite Passion On Programming!

Carl Meyer

Sep 26, 2014, 11:32:58 AM
to django-d...@googlegroups.com
Hello Wayne,

On 09/26/2014 01:58 AM, Wayne Ye wrote:
> I noticed on small defect here while I was testing my django website's
> I18N/L10n, basically below are example Accept-Language headers sent by
> Chrome and Firefox (all latest version):
>
> Chrome: Accept-Language: *zh-TW*,zh;q=0.8,en-US;q=0.6,en;q=0.4
> Firefox: Accept-Language: *zh-tw*,zh;q=0.8,en-us;q=0.5,en;q=0.3
>
> You noticed that "*zh-TW*" and "*zh-tw*", according to the standard: w3c
> <http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4>and ietf
> rfc2616 <http://tools.ietf.org/html/rfc2616#page-104>, the language
> tag(s) are lower-cased.
>
> But I've verified that Django will only correctly recognize Chrome's
> request (i.e. treat end user prefers zh-tw), for Firefox example above,
> Django will actually fallback to en-us/en.
>
> Could some fellow developers in this forum kindly triage this issue?
> Once confirmed I will create a ticket in Trac and submit a pull request
> accordingly, expecting any feedback on this.

Here's the section of RFC 2616 that makes it explicit that language tags
should be treated as case insensitive:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.10
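
To make "case insensitive" concrete, here is a minimal sketch of matching requested tags against a site's supported languages. The `SUPPORTED` set and the `pick_language` helper are hypothetical names invented for this illustration, not Django API, and the tag lists are assumed to be already ordered by preference.

```python
# Hypothetical set of languages a site supports, stored lower-cased.
SUPPORTED = {"zh-tw", "en"}


def pick_language(tags, supported=SUPPORTED, default="en"):
    """Return the first requested tag that matches a supported language.

    Tags are compared case-insensitively, as RFC 2616 section 3.10 requires.
    """
    for tag in tags:
        normalized = tag.lower()
        if normalized in supported:
            return normalized
        # Fall back to the primary subtag, e.g. "en-us" -> "en".
        primary = normalized.split("-")[0]
        if primary in supported:
            return primary
    return default


chrome = ["zh-TW", "zh", "en-US", "en"]
firefox = ["zh-tw", "zh", "en-us", "en"]
print(pick_language(chrome), pick_language(firefox))
```

With case normalized before the comparison, the Chrome and Firefox tag lists from the original report resolve to the same language.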

I haven't tested, but based on code inspection of
translation.get_language_from_request() and the functions it calls, it
appears to me that you are correct; Django is incorrectly handling
language tags as case-sensitive. If you'd be willing to file a bug and
submit a pull request, that would be great.

Ideally (ISTM) the pull request would be comprehensive in testing and
implementing case-insensitive locale-name handling throughout: i.e. not
just in handling the HTTP header, but also checking against
settings.LANGUAGES, etc.

I wonder if this also has implications for checking whether a given
locale is present on the system. It looks to me like we also do this
case-sensitively right now, but we should do it case-insensitively. This
might be a bit of work, since it would mean that rather than just
checking for the existence of a particular locale directory as we do
now, we would need to read in all the available locale names into memory
(presuming we can't/don't want to enforce that all locales on disk are
lower-cased or whatever).

(I'm not the most i18n-experienced member of the core team by far, so
hopefully if I'm totally off-track here someone will weigh in to correct
me.)

Carl

Ramiro Morales

Sep 27, 2014, 8:14:18 AM
to django-d...@googlegroups.com
On Fri, Sep 26, 2014 at 4:58 AM, Wayne Ye <xiaot...@gmail.com> wrote:
> I noticed on small defect here while I was testing my django website's
> I18N/L10n, basically below are example Accept-Language headers sent by
> Chrome and Firefox (all latest version):
>
> Chrome: Accept-Language: zh-TW,zh;q=0.8,en-US;q=0.6,en;q=0.4
> Firefox: Accept-Language: zh-tw,zh;q=0.8,en-us;q=0.5,en;q=0.3
>
> You noticed that "zh-TW" and "zh-tw", according to the standard: w3c and
> ietf rfc2616, the language tag(s) are lower-cased.
>
> But I've verified that Django will only correctly recognize Chrome's request
> (i.e. treat end user prefers zh-tw), for Firefox example above, Django will
> actually fallback to en-us/en.
>
> Could some fellow developers in this forum kindly triage this issue? Once
> confirmed I will create a ticket in Trac and submit a pull request
> accordingly, expecting any feedback on this.

I'd be very interested in learning about the scenario necessary to
reproduce this. My reading of the code and tests makes me think Django is
correctly handling these language names in a case-insensitive manner
when it interprets the language preferences sent by users' user agents.

Also, I've expanded the tests a bit to try to demonstrate this. See
https://github.com/ramiro/django/compare/lang-tag-ordering?expand=1

On Fri, Sep 26, 2014 at 12:32 PM, Carl Meyer <ca...@oddbird.net> wrote:
> Here's the section of RFC 2616 that makes it explicit that language tags
> should be treated as case insensitive:
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.10
>
> I haven't tested, but based on code inspection of
> translation.get_language_from_request() and the functions it calls, it
> appears to me that you are correct; Django is incorrectly handling
> language tags as case-sensitive.

See above.

>
> Ideally (ISTM) the pull request would be comprehensive in testing and
> implementing case-insensitive locale-name handling throughout: i.e. not
> just in handling the HTTP header, but also checking against
> settings.LANGUAGES, etc.
>
> I wonder if this also has implications for checking whether a given
> locale is present on the system. It looks to me like we also do this
> case-sensitively right now, but we should do it case-insensitively. This
> might be a bit of work, since it would mean that rather than just
> checking for the existence of a particular locale directory as we do
> now, we would need to read in all the available locale names into memory
> (presuming we can't/don't want to enforce that all locales on disk are
> lower-cased or whatever).

The names of directories with translations on disk are actually GNU
gettext locale names[1] as opposed to language names[1] (the ones in
the Accept-Language HTTP header and discussed above.)

The gettext manual does specify that the part after the underscore
separator must be an ISO 3166 country code. See [2] and [3].

So, for me, this indicates Django's current behavior with these file
system dir names is correct. But perhaps I'm missing something?
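
The difference between the two naming schemes can be sketched as a pair of conversions. Django ships its own helpers for this (`to_locale` and `to_language` in `django.utils.translation`); the functions below are a simplified stand-in with hypothetical names, handling only the plain `ll` and `ll-cc` forms, and are not Django's code.

```python
def language_to_locale(language):
    """Convert a language name such as 'zh-tw' to a gettext-style locale name.

    Simplified: handles only 'll' and 'll-cc' forms (no scripts or variants).
    """
    lang, _, country = language.partition("-")
    if not country:
        return lang.lower()
    # gettext locale names use an underscore and an upper-case country code.
    return lang.lower() + "_" + country.upper()


def locale_to_language(locale):
    """Convert a locale name such as 'zh_TW' to a lower-cased language name."""
    return locale.replace("_", "-").lower()


print(language_to_locale("zh-tw"))
print(locale_to_language("pt_BR"))
```

Under this scheme the header value can arrive in any case, while the directory name on disk keeps one fixed spelling.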

Regards,


1. https://docs.djangoproject.com/en/dev/topics/i18n/#definitions
2. https://www.gnu.org/software/gettext/manual/html_node/Locale-Names.html
3. https://www.gnu.org/software/gettext/manual/html_node/Country-Codes.html#Country-Codes

Wayne Ye

Sep 28, 2014, 10:27:31 PM
to django-d...@googlegroups.com
Hi Ramiro,
Thank you very much for replying to the thread. I read your test code and it makes total sense; however, I suspect there may still be some issues. I am no expert here, but let me explain in more detail what I found:


I was using the translation.get_language() method to retrieve the client's preferred language:

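from django.shortcuts import render
from django.template import RequestContext
from django.utils import translation
from django.views import generic

class IndexView(generic.TemplateView):
    def get(self, request):
        # get_language() returns the language currently active for this thread.
        cur_language = translation.get_language()
        print('='*50)
        print('Current locale: ' + cur_language)
        print('='*50)
        return render(request, 'myapp/index.html', {}, context_instance=RequestContext(request))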

By hitting the view above, I noticed that Django does not correctly recognise all lower-cased language tags (such as zh-cn). I changed your test code a little, as shown here, and it fails:

        r.META = {'HTTP_ACCEPT_LANGUAGE': 'zh-TW,zh;q=0.8,en-US;q=0.6,en;q=0.4'}
        self.assertEqual('zh-tw', g(r))
        self.assertEqual('zh-tw', get_language())
        r.META = {'HTTP_ACCEPT_LANGUAGE': 'zh-tw,zh;q=0.8,en-us;q=0.5,en;q=0.3'}
        self.assertEqual('zh-tw', g(r))
        self.assertEqual('zh-tw', get_language())

So I am wondering whether it would be better to use get_language_from_request() instead of get_language(), i.e.:
cur_language = translation.get_language_from_request(request)

By doing this I don't see any issues. BTW, I was using both Django 1.4.15 LTS and edge 1.7.

Sincerely yours,
Wayne

Apostolos Bessas

Sep 29, 2014, 1:59:15 AM
to django-d...@googlegroups.com
Hi Wayne,

On Mon, Sep 29, 2014 at 5:27 AM, Wayne Ye <xiaot...@gmail.com> wrote:

> I was using translation.get_language() method to retrieve the client preferred language:
>
> class IndexView(generic.TemplateView):
>     def get(self, request):
>         cur_language = translation.get_language()
>         print('='*50)
>         print('Current locale: ' + cur_language)
>         print('='*50)
>         return render(request, 'myapp/index.html', {}, context_instance=RequestContext(request))
>
> By hitting view above I noticed that django actually does not correctly recognise all lower-cased language tag (such as zh-cn), I did change your testing code a little bit to below and it will fail:
>
>         r.META = {'HTTP_ACCEPT_LANGUAGE': 'zh-TW,zh;q=0.8,en-US;q=0.6,en;q=0.4'}
>         self.assertEqual('zh-tw', g(r))
>         self.assertEqual('zh-tw', get_language())
>         r.META = {'HTTP_ACCEPT_LANGUAGE': 'zh-tw,zh;q=0.8,en-us;q=0.5,en;q=0.3'}
>         self.assertEqual('zh-tw', g(r))
>         self.assertEqual('zh-tw', get_language())
>
> So I am thinking will it be better to use  get_language_from_request() instead of get_language(), i.e.:
> cur_language = translation.get_language_from_request(request)



As Ramiro said, Django uses a different naming scheme for "language names" (GNU way) than what the Accept-Language header uses (BCP-47). 

The function get_language() can only be used to get the *already activated* language (based on Django's names); it does not read the Accept-Language header. The steps to do this manually would be something like:

from django.utils import translation

lang = translation.get_language_from_request(request)  # guess the language from the request
translation.activate(lang)                             # set the active language for Django
print(translation.get_language())                      # prints that language

In the tests above, g(r) has not activated any "language" in django for get_language() to work.
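
The distinction can be illustrated with a toy model that does not import Django at all. Every name below is a stand-in written for this sketch: `DEFAULT_LANGUAGE` plays the role of settings.LANGUAGE_CODE, and the header handling is deliberately crude (it only looks at the first tag). It is not Django's implementation.

```python
import threading

DEFAULT_LANGUAGE = "en-us"  # stand-in for settings.LANGUAGE_CODE
_active = threading.local()  # per-thread storage for the active language


def activate(language):
    """Record `language` as the active language for the current thread."""
    _active.value = language


def get_language():
    """Return the active language, or the default if none was activated."""
    return getattr(_active, "value", DEFAULT_LANGUAGE)


def get_language_from_request(meta):
    """Very rough stand-in: take the first tag of the header, lower-cased."""
    header = meta.get("HTTP_ACCEPT_LANGUAGE", "")
    first = header.split(",")[0].split(";")[0].strip().lower()
    return first or DEFAULT_LANGUAGE


meta = {"HTTP_ACCEPT_LANGUAGE": "zh-tw,zh;q=0.8,en-us;q=0.5,en;q=0.3"}
before = get_language()                    # nothing activated yet
guessed = get_language_from_request(meta)  # reads the header only
activate(guessed)
after = get_language()                     # now reflects the activation
print(before, guessed, after)
```

Reading the header and activating a language are separate steps here, which is why a test that only calls the first one sees no change in get_language(). In a real project, LocaleMiddleware performs both steps for each request.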

Can you double-check you have followed all steps at [1]?

I hope this helps,
Apostolis


 


Apostolos Bessas

Sep 29, 2014, 2:19:15 AM
to django-d...@googlegroups.com
On Sat, Sep 27, 2014 at 3:13 PM, Ramiro Morales <cra...@gmail.com> wrote:
>> I wonder if this also has implications for checking whether a given
>> locale is present on the system. It looks to me like we also do this
>> case-sensitively right now, but we should do it case-insensitively. This
>> might be a bit of work, since it would mean that rather than just
>> checking for the existence of a particular locale directory as we do
>> now, we would need to read in all the available locale names into memory
>> (presuming we can't/don't want to enforce that all locales on disk are
>> lower-cased or whatever).
>
> The names of directories with translations on disk are actually GNU
> gettext locale names[1] as opposed to language names[1] (the ones in
> the Accept-Language HTTP header and discussed above.)
>
> It does specify that the part after the underscore separator must be a
> ISO 3166 country code. See [2] and [3].
>
> So, for me, this indicates Django current behavior with these file
> system dir names is correct. But perhaps I'm missing something?

I guess it is "correct", meaning that all software/frameworks I know do use case-sensitive filenames. BCP47 suggests consistency as well [1]: 

"Although case distinctions do not carry meaning in language tags, consistent formatting and presentation of language tags will aid users."

Regarding the names themselves, BCP47 has been gaining ground and IMHO it handles language names a bit better. But that is not a reason to change anything.

Regards,
Apostolis

Carl Meyer

Sep 29, 2014, 1:45:20 PM
to django-d...@googlegroups.com
Hi Ramiro,

On 09/27/2014 06:13 AM, Ramiro Morales wrote:
> The names of directories with translations on disk are actually GNU
> gettext locale names[1] as opposed to language names[1] (the ones in
> the Accept-Language HTTP header and discussed above.)
>
> It does specify that the part after the underscore separator must be a
> ISO 3166 country code. See [2] and [3].
>
> So, for me, this indicates Django current behavior with these file
> system dir names is correct. But perhaps I'm missing something?

Nope, I was the one missing not just one, but several things. Not only
the distinction between the two types of names, but also the obvious call
to `.lower()` in `parse_accept_lang_header()`. My bad; I clearly should
have explored more deeply before posting. Thanks for the corrections!
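
For readers following along, the effect of lower-casing during header parsing can be sketched as below. This is a simplified illustration written for this thread, with a hypothetical function name and regular expression; it is not the code of Django's `parse_accept_lang_header()`.

```python
import re

# One comma-separated entry: a language tag (or "*") and an optional q-value.
_ENTRY = re.compile(r"^\s*([A-Za-z0-9-]+|\*)\s*(?:;\s*q=([0-9.]+))?\s*$")


def parse_accept_language(header):
    """Return [(tag, q), ...] sorted by descending q, with tags lower-cased."""
    result = []
    for piece in header.split(","):
        match = _ENTRY.match(piece)
        if not match:
            continue  # skip malformed entries in this sketch
        tag, q = match.groups()
        try:
            quality = float(q) if q else 1.0  # a missing q-value means 1.0
        except ValueError:
            continue
        result.append((tag.lower(), quality))
    # list.sort is stable, so entries with equal q keep their header order.
    result.sort(key=lambda item: item[1], reverse=True)
    return result


chrome = parse_accept_language("zh-TW,zh;q=0.8,en-US;q=0.6,en;q=0.4")
firefox = parse_accept_language("zh-tw,zh;q=0.8,en-us;q=0.5,en;q=0.3")
print([tag for tag, _ in chrome])
print([tag for tag, _ in firefox])
```

Because every tag is lower-cased as it is parsed, the two browsers' headers yield the same ordered list of tags and differ only in their q-values.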

> 1. https://docs.djangoproject.com/en/dev/topics/i18n/#definitions

I do notice that this bit of documentation asserts that a language code
"Represents the name of a language. Browsers send the names of the
languages they accept in the Accept-Language HTTP header using this
format. Examples: it, de-at, es, pt-br. Both the language and the
country parts are in lower case."

It seems misleading to say that browsers send Accept-Language in "this
format" and then say that both parts in this format "are in lower case".
Do you think this paragraph should be updated to clarify that the
Accept-Language header is actually handled case-insensitively?

Carl

Ramiro Morales

Oct 15, 2014, 9:47:58 PM
to django-d...@googlegroups.com
Hi Carl,

On Mon, Sep 29, 2014 at 2:45 PM, Carl Meyer <ca...@oddbird.net> wrote:
>> 1. https://docs.djangoproject.com/en/dev/topics/i18n/#definitions
>
> I do notice that this bit of documentation asserts that a language code
> "Represents the name of a language. Browsers send the names of the
> languages they accept in the Accept-Language HTTP header using this
> format. Examples: it, de-at, es, pt-br. Both the language and the
> country parts are in lower case."
>
> It seems misleading to say that browsers send Accept-Language in "this
> format" and then say that both parts in this format "are in lower case".
> Do you think this paragraph should be updated to clarify that the
> Accept-Language header is actually handled case-insensitively?

Today while reading about this library[1] I realized I never went back
to this thread. Please accept my apologies.

Regarding your comments about that documentation paragraph I must say
I agree with you. I can correct and enhance it myself, hopefully soon
if nobody beats me to it.


--
Ramiro Morales
@ramiromorales

1. https://github.com/LuminosoInsight/langcodes

Wayne Ye

Oct 16, 2014, 11:41:26 PM
to django-d...@googlegroups.com
Hi Carl and Ramiro,
Thank you both very much for tracking this issue!

I am actually a junior Django developer (though I have nearly 10 years of experience with ASP.NET and RoR), and I am eager to contribute to the Django project! If this issue has not yet been logged in Trac and nobody is working on it yet, may I humbly ask for the opportunity to fix it? I would feel honored and excited to work on it.

Looking forward to your reply. If you agree, I will try to fix this within a few days!

Carl Meyer

Oct 17, 2014, 11:48:34 AM
to django-d...@googlegroups.com
On 10/16/2014 09:41 PM, Wayne Ye wrote:
> HI Carl and Ramiro,
> Thank you both very much for tracking this issue!
>
> Actually I am a junior Django developer (but has nearly 10 years exp on
> ASP.NET and RoR), I am eager to contribute to the Django project! If
> this issue has not yet been logged onto Trac and not yet working in
> progress, can I humbly request give me an opportunity to fix it, I will
> feel so honored and excited to be working on it.
>
> Looking forward to your reply, if you agree I will try to fix this
> within few days!

Sure, feel free to submit a pull request!

Carl

Wayne Ye

Oct 20, 2014, 6:59:22 AM
to django-d...@googlegroups.com
Hi Carl,
Thanks for the encouragement!
I've created a new ticket in Trac: https://code.djangoproject.com/ticket/23689#ticket. Could you please take a look? Feel free to revise it if I missed anything and to change the triage status. Thanks a lot!

I will try to submit a patch/pull request within a few days.

Sincerely yours,
Wayne

Carl Meyer

Oct 20, 2014, 2:04:05 PM
to django-d...@googlegroups.com
Hi Wayne,

On 10/20/2014 04:59 AM, Wayne Ye wrote:
> Hi Carl,
> Thanks for the encouragement!
> I've created a new ticket in
> Trac: https://code.djangoproject.com/ticket/23689#ticket, could you
> please kindly take a look, feel free to revise it if I missed anything,
> and change the triage status, thanks a lot!
>
> I will try to submit a patch/pull request within few days.

Sorry for the confusion, I didn't communicate clearly.

Although I initially thought (based on inadequate code inspection) that
your report was accurate, Ramiro already corrected me in this thread
(and also Claude on the ticket you opened). Django does handle the HTTP
Accept-Language header case-insensitively (ever since [1]). If that's
not working for you, then something else is going on with your project.
Perhaps you are using an older Django version which pre-dates that commit?

The only remaining action item from this thread was to update the
documentation at [2] to clarify that Accept-Language is
case-insensitive. This is the piece I thought you were offering to write
a pull request for. I've now made this update myself (in [3]), so I
don't think there's anything left to be done here.

Thanks for your interest in contributing to Django!

Carl

[1]
https://code.djangoproject.com/changeset/2bab9d6d9ea095c4bcaeede2df576708afd46291/
[2] https://docs.djangoproject.com/en/dev/topics/i18n/#definitions
[3]
https://github.com/django/django/commit/2118aa8aeafe0a215eae7188c40484d791921c67