I want to provide listings based on state / city / zone. Some cities in
Colombia are "Medellín", "Santa Marta" and so on...
So the URLs are transformed to Medell%C3%ADn and Santa%20Marta.
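For reference, the transformation can be reproduced in a Python shell. This is only a sketch, assuming Python 3 and its standard urllib.parse module (nothing Django-specific), to show which bytes sit behind the escapes:

```python
from urllib.parse import quote, unquote

cities = ["Medellín", "Santa Marta"]

# Percent-encode each name the way it appears in a URL
# (quote() encodes to UTF-8 by default).
encoded = [quote(city) for city in cities]
print(encoded)   # ['Medell%C3%ADn', 'Santa%20Marta']

# Decoding gives back the original names.
decoded = [unquote(item) for item in encoded]
print(decoded)   # ['Medellín', 'Santa Marta']
```

So "í" becomes the two UTF-8 bytes C3 AD, and the space becomes %20.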
Easy, I thought... in the URL patterns:
(r'^(?P<depto>[a-zA-Z0-9%\-]+)/(?P<city>[a-zA-Z0-9%\-]+)/$',
'restaurant.views.byCity' ),
to match depto and city.
I tested this in interactive mode:
re.match(r'^(?P<depto>[a-zA-Z0-9%\-]+)/(?P<city>[a-zA-Z0-9%\-]+)/$',r'a/Bogot%C3%A1/').groups()
>>('a', 'Bogot%C3%A1')
However, the Django site can't find this...
Page not found (404)
I can't find another regular expression that works...
Any ideas?
So, unless I'm a moron (in the Pilgrim sense[1]), in general, you can expect to get UTF-8 URLs.
I modified the regex to:
^(?P<depto>[a-zA-Z0-9%\\\-]+)/(?P<city>[a-zA-Z0-9%\\\-]+)/$
With this test string:
re.match(r'^(?P<depto>[a-zA-Z0-9\%\\\-]+)/(?P<city>[a-zA-Z0-9\%\\\-]+)/$',r'Medell%C3%ADn/Medell\xEDn/').groups()
This works outside Django but not inside it...
Where do I need to look to see exactly what is evaluated?
django.core.urlresolvers.RegexURLPattern.resolve
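To get a feel for what that method ends up evaluating, here is a rough stand-in you can run outside Django. It assumes (my understanding, not checked against every version) that the resolver sees the path after percent-decoding and with the leading slash removed. debug_match is a made-up helper for illustration, not a Django function, and the sketch uses Python 3's urllib.parse:

```python
import re
from urllib.parse import unquote

# The pattern from the original post.
PATTERN = re.compile(
    r'^(?P<depto>[a-zA-Z0-9%\-]+)/(?P<city>[a-zA-Z0-9%\-]+)/$'
)

def debug_match(raw_path):
    """Hypothetical helper: decode the path, drop the leading slash,
    print what the pattern is actually searched against, then search."""
    path = unquote(raw_path).lstrip('/')
    print(repr(path))
    return PATTERN.search(path)

# Still percent-encoded: matches, as in the interactive test.
print(PATTERN.search('a/Bogot%C3%A1/') is not None)   # True

# Decoded first (the assumption above): "á" is not in the class, so no match.
print(debug_match('/a/Bogot%C3%A1/'))                  # 'a/Bogotá/' then None
```

If that assumption holds, it would explain a 404 even though the same regex matches the escaped string in the shell.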
...and from the look of that test, no, you don't understand the encoding issue.
It'd be more like:
re.match(r'[a-zA-Z0-9\%\\\xED-\xEF]', 'Medell\xEDn')
...In other words, your character class should include every character
you'll accept.
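As a concrete sketch of that idea, assuming Python 3 (where patterns and strings are Unicode) and assuming the names only need Spanish accented letters, spaces and hyphens, the class could simply list them. NAME_RE is a name I made up for the example:

```python
import re

# Every character we are willing to accept in one path segment:
# ASCII letters and digits, Spanish accented letters, space, hyphen.
NAME_RE = re.compile(r'^[a-zA-Z0-9áéíóúüñÁÉÍÓÚÜÑ \-]+$')

for name in ['Medellín', 'Santa Marta', 'Bogotá', 'a/b']:
    print(name, bool(NAME_RE.match(name)))
# Medellín True
# Santa Marta True
# Bogotá True
# a/b False
```

The drawback is obvious: any character you forget to list is rejected.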
Here's an excellent tutorial:
http://www.regular-expressions.info/unicode.html
Unfortunately, googling for "unicode regex url" turned up nothing
useful. I think a Django-provided character class for "any char other
than URL-specials like ?#&/" would be good.
On that tack, perhaps [^?#&=/] (or similar) is what you want. ;-)
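Plugged into the two-segment pattern from the original post, that negated class might look like the following. It is a sketch in plain Python 3 with the re module only; SEGMENT and URL_RE are names I picked for the example, and the department names are just sample data:

```python
import re

# One path segment: anything except the URL-special characters.
SEGMENT = r'[^?#&=/]+'

URL_RE = re.compile(r'^(?P<depto>%s)/(?P<city>%s)/$' % (SEGMENT, SEGMENT))

m = URL_RE.match('Antioquia/Medellín/')
print(m.groupdict())   # {'depto': 'Antioquia', 'city': 'Medellín'}

m = URL_RE.match('Magdalena/Santa Marta/')
print(m.groupdict())   # {'depto': 'Magdalena', 'city': 'Santa Marta'}

# A third segment is rejected, because "/" cannot be part of a segment.
print(URL_RE.match('a/b/c/'))   # None
```

This accepts accents and spaces without listing them, at the cost of also accepting anything else that isn't one of the excluded characters.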