As you know, the browser percent-encodes non-ASCII characters in a URL,
for example:
http://localhost:8080/tag/%E9%A3%8E%E9%99%A9%E7%AE%A1%E7%90%86
The non-ASCII characters are first encoded as UTF-8, and the resulting
bytes are then converted to a percent-encoded string. RFC 3986
describes the process:
    When a new URI scheme defines a component that represents textual
    data consisting of characters from the Universal Character Set
    [UCS], the data should first be encoded as octets according to the
    UTF-8 character encoding [STD63]; then only those octets that do
    not correspond to characters in the unreserved set should be
    percent-encoded. For example, the character A would be represented
    as "A", the character LATIN CAPITAL LETTER A WITH GRAVE would be
    represented as "%C3%80", and the character KATAKANA LETTER A would
    be represented as "%E3%82%A2".
http://tools.ietf.org/html/rfc3986
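
The two steps in the RFC text (UTF-8 first, then percent-encoding) can
be sketched as follows. This is only an illustration: it uses Python 3's
urllib.parse rather than the Python 2 urllib discussed in this report,
and percent_encode is a hypothetical helper name, not part of Tornado.

```python
from urllib.parse import quote


def percent_encode(text):
    """Encode text as UTF-8 octets, then percent-encode them.

    Hypothetical helper for illustration only.
    """
    utf8_bytes = text.encode("utf-8")   # step 1: characters -> UTF-8 octets
    return quote(utf8_bytes, safe="")   # step 2: octets -> %XX escapes


# The three characters from the RFC example, plus the tag from the URL above.
encoded_a = percent_encode("A")
encoded_a_grave = percent_encode("\u00c0")   # LATIN CAPITAL LETTER A WITH GRAVE
encoded_katakana_a = percent_encode("\u30a2")  # KATAKANA LETTER A
encoded_tag = percent_encode("\u98ce\u9669\u7ba1\u7406")
```

Unreserved characters such as "A" pass through unchanged; everything
else becomes one %XX escape per UTF-8 octet.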
So Tornado should do the reverse: unquote the percent-encoded octets,
decode them as UTF-8, and pass the decoded string to the request handler.
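
A minimal sketch of that reverse direction, again assuming Python 3's
urllib.parse (decode_path_segment is a hypothetical name, not a Tornado
function):

```python
from urllib.parse import unquote_to_bytes


def decode_path_segment(quoted):
    """Undo percent-encoding, then interpret the octets as UTF-8.

    Hypothetical helper for illustration only.
    """
    raw_octets = unquote_to_bytes(quoted)  # "%E9%A3%8E" -> b"\xe9\xa3\x8e"
    return raw_octets.decode("utf-8")      # octets -> Unicode text


tag_octets = unquote_to_bytes("%E9%A3%8E%E9%99%A9%E7%AE%A1%E7%90%86")
tag_text = decode_path_segment("%E9%A3%8E%E9%99%A9%E7%AE%A1%E7%90%86")
```

The intermediate value is a byte string of 12 octets; only after the
UTF-8 decode does it become the four-character tag.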
But if we define a handler with a URL parameter, like this:
URLSpec(r'/tag/(?P<name>.+)', TagHandler, name='tag'),
Tornado handles the whole URL and its parameters as Unicode strings, so
the percent-encoded non-ASCII octets are unquoted into an invalid string:
# web.py:1198
for spec in handlers:
    match = spec.regex.match(request.path)  # path is a unicode string
    if match:
        # None-safe wrapper around urllib.unquote to handle
        # unmatched optional groups correctly
        def unquote(s):
            if s is None: return s
            # it should be urllib.unquote(str(s)).decode('utf-8')
            return urllib.unquote(s)
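
The difference between the two behaviours can be reproduced outside
Tornado. The sketch below is an approximation under stated assumptions:
it runs on Python 3, where unquote(s, encoding="latin-1") is used to
imitate what Python 2's urllib.unquote appears to do with a unicode
argument (each %XX becomes the code point U+00XX). The names
TAG_PATTERN, latin1_unquote and utf8_unquote are hypothetical.

```python
import re
from urllib.parse import unquote, unquote_to_bytes

# Same pattern as in the URLSpec above.
TAG_PATTERN = re.compile(r"/tag/(?P<name>.+)")


def latin1_unquote(s):
    """Imitates the reported behaviour: one character per escaped octet."""
    if s is None:
        return s
    return unquote(s, encoding="latin-1")


def utf8_unquote(s):
    """None-safe variant of the proposed fix: unquote to octets, decode as UTF-8."""
    if s is None:
        return s
    return unquote_to_bytes(s).decode("utf-8")


match = TAG_PATTERN.match("/tag/%E9%A3%8E%E9%99%A9%E7%AE%A1%E7%90%86")
garbled = latin1_unquote(match.group("name"))  # 12 Latin-1 characters
decoded = utf8_unquote(match.group("name"))    # the intended 4-character tag
```

Both wrappers keep the None check, so unmatched optional groups are
still passed through untouched.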