[erlang-questions] Unescape HTML text

Zabrane Mickael

Sep 25, 2012, 2:02:06 PM
to Erlang-Questions Questions
Hi guys,

I want to convert HTML-escaped text (http://www.w3schools.com/tags/ref_entities.asp) like this:

bourg&eacute; Cop

to:

bourgé Cop

Is there any Erlang library for this?

Regards,
Zabrane

Zabrane Mickael

Sep 25, 2012, 2:48:15 PM
to Erlang-Questions Questions
Answering my own question:

unescape(<<>>) -> 
    <<>>;
unescape([]) ->
    <<>>;
unescape(L) when is_list(L) ->
    unescape(list_to_binary(L));
unescape(B) when is_binary(B) ->
    unescape(B, <<>>).

unescape(<<>>, Acc) -> 
    Acc;
unescape(<<"&nbsp;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, " ">>);
unescape(<<"&amp;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "&">>);
unescape(<<"&quot;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "\"">>);
unescape(<<"&apos;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "\'">>);
unescape(<<"&#39;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "'">>);
unescape(<<"&lt;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "<">>);
unescape(<<"&gt;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, ">">>);
%% The "/utf8" flag emits each non-ASCII literal as a UTF-8 encoded code
%% point (assuming the source file is read as UTF-8). Without it, a code
%% point above 255 such as the euro sign is truncated to a single byte.
unescape(<<"&euro;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "€"/utf8>>);
unescape(<<"&ccedil;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "ç"/utf8>>);
unescape(<<"&agrave;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "à"/utf8>>);
unescape(<<"&acirc;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "â"/utf8>>);
unescape(<<"&auml;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "ä"/utf8>>);
unescape(<<"&aelig;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "æ"/utf8>>);
unescape(<<"&egrave;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "è"/utf8>>);
unescape(<<"&eacute;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "é"/utf8>>);
unescape(<<"&ecirc;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "ê"/utf8>>);
unescape(<<"&euml;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "ë"/utf8>>);
unescape(<<"&icirc;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "î"/utf8>>);
unescape(<<"&iuml;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "ï"/utf8>>);
unescape(<<"&ouml;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "ö"/utf8>>);
unescape(<<"&ugrave;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "ù"/utf8>>);
unescape(<<"&uacute;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "ú"/utf8>>);
unescape(<<"&ucirc;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "û"/utf8>>);
unescape(<<"&uuml;", T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, "ü"/utf8>>);

unescape(<<C, T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, C>>).


Regards,
Zabrane

Bob Ippolito

Sep 25, 2012, 3:31:37 PM
to Zabrane Mickael, Erlang-Questions Questions
That only covers a subset of the possible inputs (numeric references such as &#233; and most named entities are missed), so you're better off using a library that's a bit more complete. This may be useful: https://github.com/mochi/mochiweb/blob/master/src/mochiweb_charref.erl
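
To illustrate what "more complete" can mean, here is a minimal sketch that decodes numeric references generically (decimal and hexadecimal) and looks named entities up in a table. The module name charref_sketch, the helper names, and the tiny entity table are assumptions made up for this example; it is not the mochiweb code, and a real table would be much larger. It also assumes an Erlang/OTP release that provides binary_to_integer/2.

```erlang
-module(charref_sketch).
-export([unescape/1]).

%% Hypothetical sketch: names and the (incomplete) entity table are
%% illustrative only. Input and output are UTF-8 binaries.
unescape(B) when is_binary(B) ->
    unescape(B, <<>>).

unescape(<<>>, Acc) ->
    Acc;
unescape(<<"&", T/binary>>, Acc) ->
    %% Split at the first ";" to isolate a candidate reference name.
    case binary:split(T, <<";">>) of
        [Name, Rest] when byte_size(Name) =< 32 ->
            case charref(Name) of
                undefined ->
                    %% Not a known reference: keep the "&" literally.
                    unescape(T, <<Acc/binary, "&">>);
                CodePoint ->
                    unescape(Rest, <<Acc/binary, CodePoint/utf8>>)
            end;
        _ ->
            %% No ";" found, or the candidate name is implausibly long.
            unescape(T, <<Acc/binary, "&">>)
    end;
unescape(<<C, T/binary>>, Acc) ->
    unescape(T, <<Acc/binary, C>>).

%% Map a reference name (without "&" and ";") to a code point.
charref(<<"#x", Hex/binary>>) -> numeric(Hex, 16);
charref(<<"#X", Hex/binary>>) -> numeric(Hex, 16);
charref(<<"#", Dec/binary>>)  -> numeric(Dec, 10);
charref(<<"amp">>)    -> $&;
charref(<<"lt">>)     -> $<;
charref(<<"gt">>)     -> $>;
charref(<<"quot">>)   -> $\";
charref(<<"apos">>)   -> $\';
charref(<<"nbsp">>)   -> 16#A0;
charref(<<"eacute">>) -> 16#E9;
charref(<<"euro">>)   -> 16#20AC;
charref(_)            -> undefined.

%% Accept only code points that can be encoded as UTF-8
%% (non-zero, no surrogates, at most 16#10FFFF).
numeric(Digits, Base) ->
    try binary_to_integer(Digits, Base) of
        N when N > 0, N < 16#D800; N > 16#DFFF, N =< 16#10FFFF -> N;
        _ -> undefined
    catch
        error:badarg -> undefined
    end.
```

With this sketch, charref_sketch:unescape(<<"bourg&eacute; Cop">>) and charref_sketch:unescape(<<"bourg&#233; Cop">>) both return the UTF-8 binary for "bourgé Cop", while unknown references such as "&bogus;" pass through unchanged. Note that, unlike the code posted above, &nbsp; is mapped to U+00A0 here rather than to a plain space.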

_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions


Zabrane Mickael

Sep 25, 2012, 3:33:09 PM
to Bob Ippolito, Erlang-Questions Questions
Thanks Bob.
Exactly what I needed.

Regards,
Zabrane