
bug#6252: Emacs does not implement URL (aka "percent") decoding correctly.


José A. Romero L.

May 22, 2010, 7:46:54 PM
to 62...@debbugs.gnu.org
On May 18, 20:14, Xah Lee <xah...@gmail.com> wrote:

> is there emacs lisp function that decode the url percent encoding?
> e.g. http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem
> should become
> http://en.wikipedia.org/wiki/Sylvester–Gallai_theorem
> that's a EN DASH (unicode 8211, #o20023, #x2013).
> I know there's a
> (require 'gnus-util)
> gnus-url-unhex-string
> but that just unhex, and generate gibberish if the url contain unicode
> chars.
(...)

It seems that RFC 3986 has not been implemented correctly in Emacs. IMHO
that is an important hole you have found there. The standard recommends
that characters outside the unreserved set be converted to UTF-8 bytes
before being percent-encoded, so decoding has to reverse both steps. Even
though the encoding part looks OK (in url-util.el), the decoding does
not go that last mile to interpret the decoded bytes as UTF-8.

Until a proper implementation is done, I guess you could work around
the problem with something like this:

(decode-coding-string
 (apply 'unibyte-string
        (string-to-list
         (url-unhex-string
          "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem")))
 'utf-8)

(yes, it's ugly as hell but hey, it's free ;])
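
The same idea (collect the raw octets first, then decode them as UTF-8) can also be sketched as a standalone function that does not depend on how url-unhex-string represents the decoded bytes. The name my-url-decode-utf8 is hypothetical and not part of Emacs; this is a minimal sketch under that assumption, not a proper fix for url-util.el.

```elisp
;; Hypothetical helper, not part of Emacs: percent-decode STR into raw
;; octets, then interpret those octets as UTF-8.
(defun my-url-decode-utf8 (str)
  "Percent-decode STR and decode the resulting bytes as UTF-8."
  (let ((bytes '())
        (i 0)
        (len (length str)))
    (while (< i len)
      (let ((ch (aref str i)))
        (if (and (eq ch ?%)
                 (<= (+ i 3) len)
                 (string-match-p "\\`[0-9A-Fa-f]\\{2\\}\\'"
                                 (substring str (1+ i) (+ i 3))))
            ;; A valid %XX escape: keep the octet it names.
            (progn
              (push (string-to-number (substring str (1+ i) (+ i 3)) 16)
                    bytes)
              (setq i (+ i 3)))
          ;; Any other character: keep its own UTF-8 octets unchanged.
          (dolist (b (append (encode-coding-string (string ch) 'utf-8) nil))
            (push b bytes))
          (setq i (1+ i)))))
    ;; Reassemble the octets into a unibyte string and decode it.
    (decode-coding-string (apply #'unibyte-string (nreverse bytes))
                          'utf-8)))
```

Evaluating (my-url-decode-utf8 "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem") should then give the URL with a real EN DASH in place of %E2%80%93. A stray "%" that is not followed by two hex digits is left in place as a literal character.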

I've just sent this very message as a bug report to the Emacs team.

Cheers,
--
José A. Romero L.
escher...@gmail.com
"We who cut mere stones must always be envisioning cathedrals."
(Quarry worker's creed)
