
emacs lisp: function to decode url percentage encoding


Xah Lee
May 18, 2010, 2:14:49 PM
Is there an Emacs Lisp function that decodes URL percent-encoding?

e.g.
http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem

should become

http://en.wikipedia.org/wiki/Sylvester–Gallai_theorem

That's an EN DASH (Unicode 8211, #o20023, #x2013).
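
A quick way to check those numbers from the UTF-8 side, as a rough sketch (`my-en-dash` is a made-up variable name, not something that ships with Emacs): the three escaped bytes E2 80 93, read as UTF-8, should give that one character.

```elisp
;; Sketch only: `my-en-dash' is a hypothetical name used for illustration.
;; Build a unibyte string from the three percent-escaped bytes and
;; decode it as UTF-8.
(defvar my-en-dash
  (decode-coding-string (unibyte-string #xE2 #x80 #x93) 'utf-8))

;; The decoded string holds a single character, code point #x2013.
(string-to-char my-en-dash)
```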

I know there's a

(require 'gnus-util)
gnus-url-unhex-string

but that only unhexes the string, and it generates gibberish if the URL
contains Unicode chars.

thanks.

Xah
http://xahlee.org/


José A. Romero L.
May 22, 2010, 7:24:18 PM
On May 18, 20:14, Xah Lee <xah...@gmail.com> wrote:
> is there emacs lisp function that decode the url percent encoding?
>
> e.g. http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem
>
> should become
>
> http://en.wikipedia.org/wiki/Sylvester–Gallai_theorem
>
> that's a EN DASH (unicode 8211, #o20023, #x2013).
>
> I know there's a
>
>   (require 'gnus-util)
>  gnus-url-unhex-string
>
> but that just unhex, and generate gibberish if the url contain unicode
> chars.
(...)

It seems that RFC 3986 has not been implemented fully in Emacs. IMHO
that is an important hole you have found there. The standard recommends
that text be encoded as UTF-8 bytes first, and that the bytes outside
the unreserved set then be percent-encoded. Even though the encoding
part looks OK (in url-util.el), the decoding does not go that last mile
and interpret the decoded bytes as UTF-8.

Until a proper implementation is done, I guess you could work around
the problem with something like this:

(require 'url-util)

(decode-coding-string
 ;; Repack the unhexed characters as raw bytes, then read them as UTF-8.
 (apply 'unibyte-string
        (string-to-list
         (url-unhex-string
          "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem")))
 'utf-8)

(yes, it's ugly as hell but hey, it's free ;])
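
If you need this in more than one place, the same workaround could be wrapped in a small helper. This is only a sketch: `my-url-decode-utf8` is a hypothetical name, not an Emacs function, and it assumes the input is a plain ASCII string whose escapes encode UTF-8 bytes.

```elisp
(require 'url-util)

;; Hypothetical helper, not part of Emacs. Assumes URL is an ASCII-only
;; string whose percent-escapes encode UTF-8 bytes.
(defun my-url-decode-utf8 (url)
  "Return URL with percent-escapes unhexed and the bytes read as UTF-8."
  (decode-coding-string
   ;; Repack the unhexed characters as a unibyte string of raw bytes.
   (apply #'unibyte-string
          (string-to-list (url-unhex-string url)))
   'utf-8))
```

Calling it on the Wikipedia URL from the original question should give back the URL with the en dash in place of %E2%80%93.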

I've just sent this very message as a bug report to the Emacs team.

Cheers,
--
José A. Romero L.
escherdragon at gmail
"We who cut mere stones must always be envisioning cathedrals."
(Quarry worker's creed)
