emacs quiz: replacing html entities with unicode char


Xah Lee

Sep 23, 2011, 2:56:56 AM
emacs quiz: replacing html entities with unicode char

Here's an emacs quiz for you advanced emacs users out there.

HTML version at
http://xahlee.blogspot.com/2011/09/emacs-quiz-replacing-html-entities-with.html

plain text version follows:
───────────────────────────────────────

I have a file with content like this:


<tr><td>pound</td><td>&#163;</td><td>pound sign, U+00A3</td></tr>
<tr><td>curren</td><td>&#164;</td><td>currency sign, U+00A4</td></tr>
<tr><td>yen</td><td>&#165;</td><td>yen sign = yuan sign, U+00A5</td></tr>
<tr><td>brvbar</td><td>&#166;</td><td>broken bar = broken vertical bar, U+00A6</td></tr>
<tr><td>sect</td><td>&#167;</td><td>section sign, U+00A7</td></tr>
<tr><td>uml</td><td>&#168;</td><td>diaeresis = spacing diaeresis, U+00A8</td></tr>
<tr><td>copy</td><td>&#169;</td><td>copyright sign, U+00A9</td></tr>
<tr><td>ordf</td><td>&#170;</td><td>feminine ordinal indicator, U+00AA</td></tr>
<tr><td>laquo</td><td>&#171;</td><td>left-pointing double angle quotation mark = left pointing guillemet, U+00AB</td></tr>

I need it to be like this:


<tr><td>pound</td>£<td>pound sign, U+00A3</td></tr>
<tr><td>curren</td>¤<td>currency sign, U+00A4</td></tr>
<tr><td>yen</td>¥<td>yen sign = yuan sign, U+00A5</td></tr>
<tr><td>brvbar</td>¦<td>broken bar = broken vertical bar, U+00A6</td></tr>
<tr><td>sect</td>§<td>section sign, U+00A7</td></tr>
<tr><td>uml</td>¨<td>diaeresis = spacing diaeresis, U+00A8</td></tr>
<tr><td>copy</td>©<td>copyright sign, U+00A9</td></tr>
<tr><td>ordf</td>ª<td>feminine ordinal indicator, U+00AA</td></tr>
<tr><td>laquo</td>«<td>left-pointing double angle quotation mark = left pointing guillemet, U+00AB</td></tr>

How would you do it using emacs's power?

I'll post a solution in 2 days.

PS you can use emacs lisp too, whichever solution you find easier.

Xah

Carlos

Sep 23, 2011, 5:00:49 PM
[Xah Lee <xah...@gmail.com>, 2011-09-22 23:56]
[...]
> I have a file with content like this:
> <tr><td>pound</td><td>&#163;</td><td>pound sign, U+00A3</td></tr>
[...]
> I need it to be like this:
>
> …
> <tr><td>pound</td>£<td>pound sign, U+00A3</td></tr>
[...]

Type this:

C-M-% &#\([[:digit:]]+\); RET \,(char-to-string \#1) RET

(C-M-% runs the command query-replace-regexp.)

Greetings.
--

Xah Lee

Sep 24, 2011, 5:50:25 PM
Nice one.

here's my solution, basically the same but written out fully with a
function.

http://xahlee.org/emacs/elisp_replace_html_entities.html

good for emacs newbies.
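
For readers who want to see the shape of such a function without following the link, here is a minimal sketch. It is not copied from the linked page: the function name my-replace-decimal-entities is made up for illustration, and it assumes only decimal references of the form &#NNN; need handling (no hexadecimal &#xNN; and no named entities like &pound;).

```emacs-lisp
;; Sketch only. The function name is hypothetical and this is not the
;; code from the linked page. Handles decimal references such as &#163;.
(defun my-replace-decimal-entities (begin end)
  "Replace decimal character references between BEGIN and END.
Each reference like &#163; is replaced by the character with that
code point."
  (interactive "r")
  (save-excursion
    (save-restriction
      ;; Limit the search to the selected region.
      (narrow-to-region begin end)
      (goto-char (point-min))
      (while (re-search-forward "&#\\([0-9]+\\);" nil t)
        ;; Group 1 holds the digits; convert them to a number, then to
        ;; a one-character string.  The two t arguments keep the case
        ;; as-is and insert the replacement text literally.
        (replace-match
         (char-to-string (string-to-number (match-string 1)))
         t t)))))
```

To try it, evaluate the definition, select the table rows, and run M-x my-replace-decimal-entities. Like the query-replace-regexp approach, it touches only the &#NNN; text and leaves the surrounding tags alone.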

PS Carlos, if you have a website, blog, or some online profile page,
I'd be happy to link your name to it. Thanks.

Xah