bug#6252: Emacs does not implement URL (aka "percent") decoding correctly.

José A. Romero L.

May 23, 2010, 2:46:54 AM
to 62...@debbugs.gnu.org
On May 18, 20:14, Xah Lee <xah...@gmail.com> wrote:

> is there emacs lisp function that decode the url percent encoding?
> e.g. http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem
> should become
> http://en.wikipedia.org/wiki/Sylvester–Gallai_theorem
> that's a EN DASH (unicode 8211, #o20023, #x2013).
> I know there's a
> (require 'gnus-util)
> gnus-url-unhex-string
> but that just unhex, and generate gibberish if the url contain unicode
> chars.
(...)

It seems that RFC 3986 has not been implemented correctly in Emacs. IMHO
that is an important hole you have found there. The standard recommends
that characters outside the unreserved set be converted to UTF-8 bytes
before being percent-encoded, so decoding has to reverse both steps. Even
though the encoding part looks OK (in url-util.el), the decoding does
not go that last mile to interpret the decoded bytes as UTF-8.

Until a proper implementation is done, I guess you could work around
the problem with something like this:

(require 'url-util)
(decode-coding-string
 ;; Turn the unhexed characters back into raw bytes, then decode as UTF-8.
 (apply 'unibyte-string
        (string-to-list
         (url-unhex-string
          "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem")))
 'utf-8)

(yes, it's ugly as hell but hey, it's free ;])
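
For repeated use, the workaround above could be wrapped in a small helper. This is only a sketch: the name `my-url-decode-utf8` is made up for illustration and is not part of Emacs, and it assumes the input contains only ASCII characters and percent escapes (a string that already holds characters above 255 would make `unibyte-string` signal an error).

```elisp
;; Hypothetical helper; `my-url-decode-utf8' is not an Emacs function.
(require 'url-util)

(defun my-url-decode-utf8 (url)
  "Percent-decode URL, then interpret the resulting bytes as UTF-8.
Assumes URL holds only ASCII characters and percent escapes."
  (decode-coding-string
   ;; Force the unhexed result into a unibyte string so that every
   ;; element is treated as a raw byte rather than as a character.
   (apply #'unibyte-string (string-to-list (url-unhex-string url)))
   'utf-8))

(my-url-decode-utf8
 "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem")
;; => "http://en.wikipedia.org/wiki/Sylvester–Gallai_theorem"
```

The helper keeps the same two steps as the one-off expression: first unhex to bytes, then decode those bytes as UTF-8.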

I've just sent this very message as a bug report to the Emacs team.

Cheers,
--
José A. Romero L.
escher...@gmail.com
"We who cut mere stones must always be envisioning cathedrals."
(Quarry worker's creed)