e.g.
http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem
should become
http://en.wikipedia.org/wiki/Sylvester–Gallai_theorem
that's an EN DASH (Unicode 8211, #o20023, #x2013).
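For reference, the three notations in that line name the same code point. This is a quick sketch you can evaluate in *scratch* or with emacs --batch; it assumes Emacs 24.4 or later, where `=` accepts more than two arguments:

```elisp
;; 8211 decimal, #o20023 octal and #x2013 hex are one and the same
;; number: the code point of EN DASH (U+2013).
(= 8211 #o20023 #x2013 ?\u2013)   ; => t

;; Turn the code point into a one-character string holding the dash.
(char-to-string #x2013)
```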
I know there's a
(require 'gnus-util)
gnus-url-unhex-string
but that just unhexes, and generates gibberish if the URL contains Unicode
chars.
thanks.
Xah
∑ http://xahlee.org/
☄
Seems that RFC 3986 has not been implemented correctly in Emacs. IMHO
that is an important hole you have found there. The standard says that
textual data should first be encoded as UTF-8 octets, and that only the
octets that don't correspond to unreserved characters get percent-encoded;
decoding has to undo both steps. Even though the encoding part looks OK
(in url-util.el), the decoding does not go that last mile to interpret
the decoded bytes as UTF-8.
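To make the asymmetry concrete, here is a minimal sketch, assuming a recent Emacs with url-util.el; the my- variable names are hypothetical, chosen only for illustration:

```elisp
(require 'url-util)

;; Encoding side: url-hexify-string turns the EN DASH (U+2013) into its
;; three UTF-8 bytes, each written as a %XX escape.
(defvar my-encoded (url-hexify-string "Sylvester\u2013Gallai_theorem"))
;; In recent Emacs versions: "Sylvester%E2%80%93Gallai_theorem"

;; Decoding side: url-unhex-string only undoes the %XX escapes, so the
;; dash comes back as three separate bytes instead of one character.
(defvar my-unhexed (url-unhex-string "%E2%80%93"))
;; (length my-unhexed) is 3, not 1
```

The missing step is turning those three bytes back into one character by decoding them as UTF-8.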
Until a proper implementation is done, I guess you could work around
the problem with something like this:
(require 'url-util)   ; url-unhex-string lives in url-util.el
(decode-coding-string
 (apply 'unibyte-string
        (string-to-list
         (url-unhex-string
          "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem")))
 'utf-8)
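The same workaround can be wrapped in a reusable helper. This is a sketch, not part of Emacs: my-url-decode-utf8 is a hypothetical name, and it assumes the input is an already percent-encoded, ASCII-only URL (string-to-unibyte signals an error otherwise).

```elisp
(require 'url-util)

(defun my-url-decode-utf8 (url)
  "Decode the %XX escapes in URL and read the resulting bytes as UTF-8.
Hypothetical helper; URL must contain only ASCII characters."
  (decode-coding-string
   ;; Pack the unhexed characters (byte values 0-255) into a unibyte
   ;; string of raw bytes, then interpret those bytes as UTF-8.
   (apply #'unibyte-string
          (string-to-list
           ;; Force unibyte input so url-unhex-string never mixes raw
           ;; bytes into a multibyte string.
           (url-unhex-string (string-to-unibyte url))))
   'utf-8))
```

Usage: (my-url-decode-utf8 "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem") should return the URL with a literal EN DASH in place of %E2%80%93. The string-to-unibyte call is there because a multibyte input, such as text taken from a buffer, may otherwise make url-unhex-string return raw-byte characters, which unibyte-string rejects.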
(yes, it's ugly as hell but hey, it's free ;])
I've just sent this very message as a bug report to the Emacs team.
Cheers,
--
José A. Romero L.
escherdragon at gmail
"We who cut mere stones must always be envisioning cathedrals."
(Quarry worker's creed)