Why does html.UnescapeString not look for the trailing semicolon when finding HTML entities?

Akash

Jul 2, 2020, 1:18:42 PM
to golang-nuts
html.UnescapeString("Should this word, &currency, be unescaped?") 


Aren't HTML entities supposed to end with a semicolon? See https://developer.mozilla.org/en-US/docs/Glossary/Entity

I couldn't see any edge cases mentioned in the source.

Thanks.


David Finkel

Jul 5, 2020, 6:01:43 PM
to Akash, golang-nuts
On Thu, Jul 2, 2020 at 1:18 PM Akash <akashk...@gmail.com> wrote:
> html.UnescapeString("Should this word, &currency, be unescaped?")
>
> Aren't HTML entities supposed to end with a semicolon? See https://developer.mozilla.org/en-US/docs/Glossary/Entity

I think the answer is "yes" when sending, but "don't count on it" when parsing (or unescaping, in this case).

This comes under the heading of "Be conservative in what you do, be liberal in what you accept from others" (also, backwards compatibility).

The HTML standard's tokenizer rules for named character references include this clause:

> If the character reference was consumed as part of an attribute, and the last character matched is not a U+003B SEMICOLON character (;), and the next input character is either a U+003D EQUALS SIGN character (=) or an ASCII alphanumeric, then, for historical reasons, flush code points consumed as a character reference and switch to the return state.
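The clause above boils down to a small predicate. The following is only a sketch of that one rule, not code from the Go standard library; the helper names treatAsLiteral and isASCIIAlnum are made up for illustration.

```go
package main

import "fmt"

// isASCIIAlnum reports whether c is an ASCII letter or digit.
func isASCIIAlnum(c byte) bool {
	return 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z' || '0' <= c && c <= '9'
}

// treatAsLiteral is a hypothetical helper mirroring the quoted clause.
// matched is the matched reference text without the leading '&'
// (for example "curren" or "curren;"), next is the input byte that
// follows the match, and hasNext is false at end of input.
// It returns true when the text should be kept as-is instead of decoded.
func treatAsLiteral(inAttribute bool, matched string, next byte, hasNext bool) bool {
	if !inAttribute || matched == "" {
		return false // the exception only applies inside attribute values
	}
	if matched[len(matched)-1] == ';' {
		return false // properly terminated references are always decoded
	}
	if !hasNext {
		return false
	}
	return next == '=' || isASCIIAlnum(next)
}

func main() {
	// Inside an attribute such as href="?a=1&currency=2": "curren" matched, next is 'c'.
	fmt.Println(treatAsLiteral(true, "curren", 'c', true)) // true
	// The same text outside an attribute: the exception does not apply.
	fmt.Println(treatAsLiteral(false, "curren", 'c', true)) // false
	// Terminated with a semicolon inside an attribute: decoded as usual.
	fmt.Println(treatAsLiteral(true, "curren;", 'c', true)) // false
}
```

Note that the exception is limited to attribute values, so it would not change the result for the plain-text string in the original question.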

It looks like it wasn't required in HTML 4.01 as the section on entity references includes this note:
> Note. In SGML, it is possible to eliminate the final ";" after a character reference in some cases (e.g., at a line break or immediately before a tag). In other circumstances it may not be eliminated (e.g., in the middle of a word). We strongly suggest using the ";" in all cases to avoid problems with user agents that require this character to be present.

(which, fortunately, aligns with my feeling from the late-90s that the trailing semicolon was suggested but optional in HTML 4)
