The question is: which charset should be used in step 2, settings.DEFAULT_CHARSET or a hardcoded utf-8?
Of course, since the default value of DEFAULT_CHARSET is utf-8, this only makes a difference for the websites where it's been changed. They're probably a minority.
Currently, Django uses utf-8. As far as I can tell, that's more a side effect of (ab)using force_str than anything else. It also has the drawback of making it impossible to serve perfectly legitimate HTTP URLs such as /caf%E9/ (%E9 is "é" in ISO-8859-1, but the lone byte 0xE9 isn't valid UTF-8). Try it:
https://www.djangoproject.com/caf%E9/ returns a 400 with no content. I think I once saw a ticket about this, but I can't locate it right now.
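To make the failure concrete, here is a minimal sketch using only the Python standard library. It isn't Django's actual request-handling code; it just shows that the percent-decoded bytes of /caf%E9/ decode cleanly as ISO-8859-1 (picked here as an example of a non-default charset) and are rejected as UTF-8.

```python
from urllib.parse import unquote_to_bytes

# Percent-decode the path to raw bytes, without assuming any charset yet.
raw = unquote_to_bytes("/caf%E9/")

# Interpreting the bytes as ISO-8859-1 works: 0xE9 maps to "é".
latin1_path = raw.decode("iso-8859-1")

# Interpreting the same bytes as UTF-8 fails: 0xE9 starts a multi-byte
# sequence, but the next byte ("/") is not a continuation byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(raw), repr(latin1_path), utf8_ok)
```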
If we switch to DEFAULT_CHARSET, we'll also have to change the reverse() function and the {% url %} tag to honor DEFAULT_CHARSET when they encode URLs, so that URLs round-trip properly.
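Here's a rough sketch of what "round-trip properly" means, again with the standard library only. encode_path and decode_path are hypothetical helpers made up for illustration (they are not Django APIs and not how reverse() is implemented), and "iso-8859-1" stands in for a site that changed DEFAULT_CHARSET.

```python
from urllib.parse import quote, unquote

def encode_path(path, charset):
    # Hypothetical stand-in for the encoding side (what reverse() would do).
    return quote(path, safe="/", encoding=charset)

def decode_path(quoted, charset):
    # Hypothetical stand-in for the decoding side (the request handler).
    return unquote(quoted, encoding=charset, errors="strict")

charset = "iso-8859-1"  # assumed non-default DEFAULT_CHARSET
path = "/caf\u00e9/"

# Same charset on both sides: the path survives the round trip.
same = decode_path(encode_path(path, charset), charset)

# Encoding with utf-8 but decoding with the site charset: the path is mangled.
mixed = decode_path(encode_path(path, "utf-8"), charset)

print(encode_path(path, charset), same == path, mixed == path)
```

The point is only that both sides have to agree on the charset; which one they agree on is the open question.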
Arguments for DEFAULT_CHARSET / against UTF-8:
- The query string is already decoded with DEFAULT_CHARSET; it's weird to decode different parts of the URL with different charsets (principle of least astonishment).
- It should be possible to serve any valid HTTP URL with Django (see example above).
Arguments for UTF-8 / against DEFAULT_CHARSET:
- Browsers default to UTF-8 when they open non-ASCII URLs.
- Everyone should use UTF-8 everywhere anyway; HTTP only allows non-ASCII URLs for legacy reasons.
Do you have experience with this topic? What do you think?