--
Ticket URL: <https://code.djangoproject.com/ticket/16501>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* needs_docs: => 0
* easy: 1 => 0
* needs_tests: => 0
* needs_better_patch: => 0
Comment:
Congrats, you have found the very best way to piss off the maintainers of
an open-source project in your first sentence :)
You may want to read the FAQ:
https://docs.djangoproject.com/en/1.3/faq/contributing/#faq-contributing-code
A bug won't be fixed if you don't submit a patch:
https://docs.djangoproject.com/en/dev/internals/contributing/writing-code/submitting-patches/
----
Now, regarding your report, out of the 4 files called `validators.py` in
Django, I suppose you're referring to the one in `django/core`. You're
saying that the built-in validator `django.core.validators.validate_slug`
should accept unicode characters.
This function is currently documented as:
> A `RegexValidator` instance that ensures a value consists of only letters, numbers, underscores or hyphens.
We could debate if "letters" means "ASCII letters" or "Unicode characters
that have some property that says they're a letter". Currently, it means
"ASCII letters".
`validate_slug` is used in only one place in Django, as the default
validator for `SlugField`.
So, the real problem here is that `SlugField` will only accept ASCII
letters.
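To make the "ASCII letters" vs "Unicode letters" distinction concrete, here is a plain-`re` sketch with no Django involved. Both patterns are assumptions modelled on the behavior described above, not copied from Django's source: the first restricts letters to ASCII, the second relies on `\w`, which in Python 3 matches any Unicode word character for `str` patterns.

```python
import re

# ASCII-only slug: a-z, A-Z, 0-9, underscore and hyphen (assumed pattern).
ASCII_SLUG_RE = re.compile(r'^[-a-zA-Z0-9_]+\Z')
# Unicode-aware slug: \w on a str pattern matches Unicode letters and digits too.
UNICODE_SLUG_RE = re.compile(r'^[-\w]+\Z')


def is_slug(value, pattern):
    """Return True if the whole value matches the given slug pattern."""
    return pattern.match(value) is not None


for candidate in ["how-to-brew-cafe", "how-to-brew-café"]:
    print(candidate,
          is_slug(candidate, ASCII_SLUG_RE),
          is_slug(candidate, UNICODE_SLUG_RE))
```

Only the accented slug changes outcome between the two patterns; the hyphenated ASCII slug is valid under both.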
----
In my opinion, the whole point of a slug is to contain only ASCII
characters, to make sure URLs built with slugs have no charset issues, no
matter where they're copy-pasted (URL bars, email, IRC, IM, etc.)
If the `SlugField` provided by Django doesn't meet your expectations, you
can easily build your own:
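{{{
from django.db.models import CharField

class MyCustomSlugField(CharField):
    # my_custom_validate_slug is your own validator (e.g. a RegexValidator)
    default_validators = [my_custom_validate_slug]
}}}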
I'm leaning towards WONTFIX, but I'd like someone else to confirm my
analysis before closing the ticket.
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:1>
Comment (by anonymous):
The world is not English-speaking only.
Why do we have to do some magic instead of using out-of-the-box
functionality?
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:2>
Comment (by justinlilly):
Non-ASCII letters are completely valid in URL schemes. The whole point of
a slug isn't to contain ASCII letters. It is merely to provide a human-
readable URL entry. There are plenty of humans who read outside of ASCII.
Assuming the fix can be made backwards compatible, I see no reason not to
have it.
Also, aaugustin, your tone came off a little combative/hostile. I'm sure
you didn't mean it that way, but that's how it came out. Just a heads up.
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:3>
* stage: Unreviewed => Design decision needed
Comment:
Replying to [comment:2 anonymous]:
> The world is not English-speaking only.
Sure — as a native speaker of a language that uses accents heavily, I
agree completely :)
[[BR]]
> Why do we have to do some magic instead of using out-of-the-box functionality?
Well, if you don't like the out-of-the-box functionality that everyone
else uses, it sounds reasonable to write your own variant. There's hardly
any magic in my suggestion.
----
Replying to [comment:3 justinlilly]:
> Non-ASCII letters are completely valid in URL schemes.
Yes, since RFC 3986, it's possible to use non-ASCII characters in URLs
without ambiguities: the charset must be UTF-8.
That RFC was published in 2005. Browser vendors may not have adopted it
immediately (historically, most browsers defaulted to latin1 in western
languages), and some people still use browsers from a few years ago. I
don't know exactly what the current status is. We need to check how
mainstream browsers react to a URL like "http://localhost/how-to-brew-café/"
before proceeding: do they properly UTF-8-encode and percent-encode it?
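For reference, this is what "UTF-8-encode and percent-encode" means for the example path, sketched with Python's standard `urllib.parse.quote`. The latin1 line is only there to show why the charset matters: the same character produces a different escape, which is the ambiguity RFC 3986's UTF-8 rule removes.

```python
from urllib.parse import quote

path = "/how-to-brew-café/"

# RFC 3986 style: encode the text as UTF-8, then percent-encode each
# byte outside the unreserved set ("/" is kept because it is in `safe`).
utf8_url = quote(path, safe="/", encoding="utf-8")
print(utf8_url)  # /how-to-brew-caf%C3%A9/

# The historical latin1 default gives a different escape for the same "é".
latin1_url = quote(path, safe="/", encoding="latin-1")
print(latin1_url)  # /how-to-brew-caf%E9/
```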
[[BR]]
> The whole point of a slug isn't to contain ASCII letters. It is merely to provide a human-readable URL entry. There are plenty of humans who read outside of ASCII.
I'm a bit confused by this. Django provides a `SlugField` that is designed
to contain a URL-friendly version of a `CharField`. For instance, the
title of a blog post could be "How to brew café" (in the `CharField`) and
the slug would be "how-to-brew-cafe" (in the `SlugField`). If you assume
that a URL can contain any text, you don't need `SlugField` at all; you
can just use the original title.
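The title-to-slug reduction described above can be sketched with the standard library. This is a minimal illustration similar in spirit to Django's `slugify` helper, not its actual code; `ascii_slugify` is a hypothetical name.

```python
import re
import unicodedata


def ascii_slugify(text):
    """Reduce free text to an ASCII-only slug (illustrative sketch)."""
    # NFKD splits "é" into "e" + a combining accent; encoding to ASCII with
    # errors="ignore" then drops the accent and any other non-ASCII character.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Drop anything that is not a word character, whitespace or hyphen.
    text = re.sub(r"[^\w\s-]", "", text).strip().lower()
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r"[-\s]+", "-", text)


print(ascii_slugify("How to brew café"))  # how-to-brew-cafe
```

The trade-off raised elsewhere in this ticket shows up immediately: a title with no ASCII decomposition, such as `ascii_slugify("привет мир")`, reduces to an empty string.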
A drawback is that the actual, correct URL is:
http://localhost/how-to-brew-caf%C3%A9/. Some browsers may percent-decode
and utf-8-decode this to display it as: http://localhost/how-to-brew-café/;
I think at least Firefox does. I don't know about IE. If it turns out we
have to choose between http://localhost/how-to-brew-caf%C3%A9/ and
http://localhost/how-to-brew-cafe/, I think the latter is more readable.
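The display question can be illustrated with `urllib.parse.unquote`, assuming a browser that decodes the path as UTF-8 for display as described above. The ASCII slug, by contrast, has nothing to escape, so it looks identical in encoded and decoded form.

```python
from urllib.parse import quote, unquote

encoded = "/how-to-brew-caf%C3%A9/"
# Percent-decode, then interpret the bytes as UTF-8 (unquote's default).
decoded = unquote(encoded)
print(decoded)  # /how-to-brew-café/

ascii_path = "/how-to-brew-cafe/"
# quote() leaves an all-ASCII slug untouched.
print(quote(ascii_path) == ascii_path)  # True
```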
[[BR]]
> Also, aaugustin, your tone came off a little combative/hostile. I'm sure you didn't mean it that way, but that's how it came out. Just a heads up.
Thanks for the hint, and sorry -- the tone of the original report got on
my nerves a bit. Comment 2 is great too... :)
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:4>
Comment (by norn):
Replying to [comment:4 aaugustin]:
> Replying to [comment:3 justinlilly]:
> > Non-ASCII letters are completely valid in URL schemes.
>
> Yes, since RFC 3986, it's possible to use non-ASCII characters in URLs without ambiguities: the charset must be UTF-8.
>
> That RFC was published in 2005. Browser vendors may not have adopted it immediately (historically, most browsers defaulted to latin1 in western languages), and some people still use browsers from a few years ago. I don't know exactly what the current status is. We need to check how mainstream browsers react to a URL like "http://localhost/how-to-brew-café/" before proceeding: do they properly UTF-8-encode and percent-encode it?
[[BR]]
Well, why should we fear browser incompatibility with pre-2005 standards
in a Django release due in 2011/2012?
Django 1.0 to 1.3 do not allow UTF-8 in slugs, but the world is changing.
If RFC 3986 was approved in 2005, then Django has to comply with it.
I am sorry if my tone seems unfriendly. English is not my native
language, but I definitely do not want to offend anybody. I am just
trying to solve the problem and communicate in the most efficient way,
without formalities. Peace!
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:5>
Comment (by aaugustin):
Regarding browser support, the current consensus (per a recent discussion
on the mailing list) is to drop IE6 and start at IE7 — released in 2006.
We have to live with the fact that many developers use Django to create
websites for end users that aren't on the cutting edge of technology :)
In my opinion, the real issue here isn't the date of the RFC. It's the
support in mainstream browsers (IE, Firefox, Chrome, and to a lesser
extent Safari, Opera). I don't know if the RFC was actually implemented in
these browsers, and when: many RFCs die unimplemented... That's why I
suggested that, if someone is interested in this, the first step is to
prove that it works correctly in IE >= 7, Firefox >= 3, and the latest
Chrome, Safari and Opera.
Then, there's still the question of whether the percent-encoded version is
more readable than the ASCII version, but that's a matter of taste, and
I'm not the one who makes these decisions :)
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:6>
* stage: Design decision needed => Accepted
Comment:
Marking accepted: `SlugField` should be useful in most of the world, not
just the English-speaking parts.
(`SlugField` probably should take a `regex` argument so people could
easily change the concept of what a "slug" is. But that's another ticket
:)
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:7>
* owner: nobody => pbnan
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:8>
* status: assigned => closed
* resolution: => fixed
* has_patch: 0 => 1
* type: Bug => New feature
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:9>
* status: closed => reopened
* resolution: fixed =>
Comment:
Closed accidentally, sorry.
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:10>
* cc: kwadrat (added)
* stage: Accepted => Ready for checkin
Comment:
Reviewed - ready for checkin.
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:11>
* stage: Ready for checkin => Accepted
Comment:
I'm bumping this back out of RFC. While I agree wholeheartedly that we
need to allow users to choose unicode slugs if they so desire, we can't
just arbitrarily change the way this works. Many developers depend on the
promise that the slug is ASCII. Arbitrarily switching to allow unicode
characters will produce bugs in previously stable code, and may introduce
security vulnerabilities.
Django's backwards compatibility policy requires that features like this
continue to work as documented. Before this patch is ready to commit, the
patch must be re-worked so that the default slug behavior of reducing to
ASCII remains. Whether or not the default behavior should change at some
point in the future (1.5 or 1.6) is a matter that should be discussed on
the django-dev list.
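The compatibility requirement above (ASCII stays the default, Unicode only on request) can be sketched as an opt-in flag. `check_slug` and its `allow_unicode` keyword are hypothetical names for this illustration, and it raises a plain `ValueError` rather than Django's `ValidationError` so it runs without Django.

```python
import re

# Hypothetical helper: ASCII remains the default, Unicode is opt-in.
_ASCII_SLUG_RE = re.compile(r'^[-a-zA-Z0-9_]+\Z')
_UNICODE_SLUG_RE = re.compile(r'^[-\w]+\Z')


def check_slug(value, allow_unicode=False):
    """Return value if it is a valid slug, else raise ValueError.

    Callers that never pass allow_unicode keep the documented ASCII-only
    behavior; Unicode slugs must be requested explicitly.
    """
    pattern = _UNICODE_SLUG_RE if allow_unicode else _ASCII_SLUG_RE
    if pattern.match(value) is None:
        raise ValueError("Enter a valid slug: %r" % (value,))
    return value


print(check_slug("how-to-brew-café", allow_unicode=True))
```

Because the flag defaults to `False`, code written against the old ASCII promise sees no change unless it explicitly opts in.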
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:12>
* needs_better_patch: 0 => 1
* needs_docs: 0 => 1
Comment:
I would propose adding a `UnicodeSlugField` instead of introducing an
incompatible change; that's why I'm flagging the patch as needing an update.
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:14>
Comment (by slikts):
One of the requirements in my app is that I need to have non-English slugs
for use in URLs. This should pose no problem, since modern browsers
support Unicode in URLs, and Django is a framework that supports modern
practices as well, right? So I try to do it, but find out that `SlugField`
only works with ASCII by default. This is mildly annoying, but not really
a problem, since you can just extend `SlugField`, yes? So I look up the
source for `SlugField` and modify the validator regexp like this:
{{{
import re

from django.core.validators import RegexValidator
from django.db import models

slug_re = re.compile(r'^[-\w_]+$', re.UNICODE)
validate_slug = RegexValidator(slug_re, "Enter a valid name consisting of "
                               "letters, numbers, underscores or hyphens.", 'invalid')

class SlugField(models.SlugField):
    default_validators = [validate_slug]
}}}
As it turns out, it just doesn't work: Unicode alphanumeric characters
are still rejected as invalid, and I have not the slightest idea why. A
workaround I found was extending directly from `CharField`, but having to
completely reimplement Django functionality just because I want to use
more than ASCII is not very nice at all.
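One way to narrow this down is to test the regex from the snippet on its own, without Django. If it accepts Unicode here, the rejection is coming from somewhere other than this pattern, for instance another validator that still applies (such as one attached to the form field). This is a debugging sketch under that assumption, not a diagnosis.

```python
import re

# The pattern from the snippet above, exercised in isolation.
slug_re = re.compile(r'^[-\w_]+$', re.UNICODE)

results = {}
for value in ["how-to-brew-café", "привет-мир", "not a slug"]:
    results[value] = bool(slug_re.match(value))
    print(value, results[value])
```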
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:15>
Comment (by fcurella):
I've written a patch adding a ``unicode`` option for SlugField. Pull Request
is at https://github.com/django/django/pull/1979
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:16>
Comment (by fcurella):
I've created an alternative pull request that adds
`db.models.UnicodeSlugField` and `forms.UnicodeSlugField`
https://github.com/django/django/pull/1987
On one hand, I like the new field approach because it gives us an easier
upgrade path. On the other hand, it feels like I'm polluting
`django.db.models` and `django.forms` with slightly different versions of
something that's already there.
Unfortunately, just passing a regex argument (as suggested by Jacob) is
not going to be enough, because we'll also have to pass an error message
string for the validator.
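The point about the error message can be sketched with a small stand-in for a regex validator that takes the pattern and its message together, loosely following the `RegexValidator(regex, message, code)` call used earlier in this ticket. `SlugRegexValidator` and `dotted_slug` are hypothetical names; the sketch raises `ValueError` so it runs without Django.

```python
import re


class SlugRegexValidator:
    """Hypothetical validator that pairs a pattern with its own message.

    A custom regex alone is not enough: the stock message ("letters,
    numbers, underscores or hyphens") would misdescribe a different
    pattern, so the caller must supply a matching message as well.
    """

    def __init__(self, regex, message, code="invalid"):
        self.regex = re.compile(regex)
        self.message = message
        self.code = code

    def __call__(self, value):
        if self.regex.match(value) is None:
            raise ValueError(self.message)


# Example: a slug that also allows dots, with a message that says so.
dotted_slug = SlugRegexValidator(
    r'^[-.\w]+\Z',
    "Enter letters, numbers, underscores, hyphens or dots.",
)
dotted_slug("v1.2-release")  # valid: returns None without raising
```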
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:17>
Comment (by timgraham):
An alternate approach suggested by Claude on the mailing list was to
"create a new `validate_uslug` validator (better name anyone?) and allow a
custom validator to be passed to the SlugField constructor." We need
someone to compare these three approaches (at least) and find a consensus
on the DevelopersMailingList in order to move this issue forward.
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:18>
Comment (by timgraham):
Well, I realized that the third approach is more or less the same as the PR
linked in comment 16.
I created a [https://groups.google.com/d/topic/django-developers/Ic2hH3AWdUg/discussion django-developers thread]
to get feedback on the two approaches.
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:19>
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:20>
Comment (by berkerpeksag):
I've opened [https://github.com/django/django/pull/3729 PR #3729] to
revise [https://github.com/django/django/pull/1979 PR #1979].
Changes:
* Fixed merge conflicts
* Added release notes
* Added more documentation
* Fixed commit message format
* Addressed review comments
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:21>
* needs_better_patch: 1 => 0
* version: 1.3 => master
* needs_docs: 1 => 0
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:22>
* needs_better_patch: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:23>
Comment (by kutenai):
Submitted a new [https://github.com/django/django/pull/4518/commits PR #4518]
to replace [https://github.com/django/django/pull/3729 PR #3729].
The new pull request is rebased onto the latest master, and fixes some
items discussed in the previous pull request.
There is one item that needs to be reviewed.
django/contrib/admin/options.py:1744
{{{
@property
def media(self):
    extra = '' if settings.DEBUG else '.min'
    js = ['jquery%s.js' % extra, 'jquery.init.js', 'inlines%s.js' % extra]
    # TODO: Was removed in master, modified in patch.
    if self.prepopulated_fields:
        js.extend(['xregexp.min.js', 'urlify.js', 'prepopulate%s.js' % extra])
    if self.filter_vertical or self.filter_horizontal:
        js.extend(['SelectBox.js', 'SelectFilter2.js'])
    return forms.Media(js=[static('admin/js/%s' % url) for url in js])
}}}
I am not sure what to do with the updated code.
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:24>
Comment (by Tim Graham <timograham@…>):
In [changeset:"054e74420b7a31bac67d4993b462eea7b9b7a5ba" 054e7442]:
{{{
#!CommitTicketReference repository="" revision="054e74420b7a31bac67d4993b462eea7b9b7a5ba"
Refs #16501, #26474 -- Added xregexp.js source file.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:26>
Comment (by Tim Graham <timograham@…>):
In [changeset:"ef93af919b557563e678ec0f9fb507bd2c6768d9" ef93af91]:
{{{
#!CommitTicketReference repository="" revision="ef93af919b557563e678ec0f9fb507bd2c6768d9"
[1.10.x] Refs #16501, #26474 -- Added xregexp.js source file.
Backport of 054e74420b7a31bac67d4993b462eea7b9b7a5ba from master
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:27>
Comment (by Tim Graham <timograham@…>):
In [changeset:"307de7d9e80e1148685b981eede0bd2fdbe91d46" 307de7d9]:
{{{
#!CommitTicketReference repository="" revision="307de7d9e80e1148685b981eede0bd2fdbe91d46"
[1.9.x] Refs #16501, #26474 -- Added xregexp.js source file.
Backport of 054e74420b7a31bac67d4993b462eea7b9b7a5ba from master
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/16501#comment:28>