FYI: Named entities handling by innerHTML

VK

Oct 10, 2009, 6:28:01 AM
Continuing the discussion at
http://groups.google.ru/group/comp.lang.javascript/browse_frm/thread/f9904e15ef8618d0

To keep this post readable when viewed as HTML, all named entities
(NE) and numeric character references (NCR) are written with a leading
* instead of & so that they are not parsed.

When working with innerHTML results, keep in mind that the *lt;, *gt;,
*amp; and *nbsp; characters are serialized back to their respective
NE, regardless of whether the source encoded them as NE or NCR. This
behavior is consistent across all major browsers, with one exception:
Opera returns *nbsp; as a regular (literal) character.
To handle an element's content safely, use innerText/textContent
instead; there the results are consistent and as expected for both NE
and NCR.
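
A minimal sketch of reading the text that way, assuming textContent is
tried first and IE's innerText is the fallback. getElementText is a
hypothetical helper name, and the two objects are plain stand-ins for
real elements so the snippet also runs outside a browser (the code
uses the real ampersand, since it has to run):

```javascript
// Hypothetical helper: read an element's text without entity re-encoding.
// Prefers the DOM textContent property and falls back to IE's innerText.
function getElementText(el) {
  if (typeof el.textContent === 'string') {
    return el.textContent;
  }
  if (typeof el.innerText === 'string') {
    return el.innerText;
  }
  return '';
}

// Stand-ins for real elements (assumption: only these properties matter).
var w3cLike = { textContent: 'a < b', innerHTML: 'a &lt; b' };
var ieLike = { innerText: 'a < b', innerHTML: 'a &lt; b' };

var fromW3c = getElementText(w3cLike); // the plain character, not the NE
var fromIe = getElementText(ieLike);   // same text via the fallback
```

In a page one would pass a real element, for example the result of
document.getElementById, instead of the stand-in objects.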

It is also worth noting that the NE *apos; did not make it into
HTML 4.x (although it is part of HTML 5), so IE has never supported it
and still does not. IE leaves it unparsed, so innerHTML returns it as
*amp;apos;. This exception is unique to IE.
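
One way to sidestep the IE exception when generating markup is to
write the apostrophe as an NCR rather than as the NE. A minimal
sketch, assuming only the characters discussed in this post need
escaping; escapeForHtml is a hypothetical name, and the code uses the
real ampersand since it has to run:

```javascript
// Hypothetical helper: escape text for insertion into HTML markup.
// The apostrophe is written as the NCR &#39; instead of the NE &apos;,
// because IE does not recognise &apos; in HTML.
function escapeForHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;') // must run first so later entities survive
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/'/g, '&#39;');
}

var escaped = escapeForHtml("it's <b> & more");
```

The ampersand replacement runs first; otherwise the ampersands
introduced by the later replacements would be escaped a second time.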

Test page at http://jsnet.sourceforge.net/entities.html

1st block of results for innerText/textContent
---------------
2nd block of results for innerHTML


Results for innerHTML:

IE 8.0.6001, IE 6.0.2900
NE: *lt; NCR: *lt;
NE: *gt; NCR: *gt;
NE: *amp; NCR: *amp;
NE: *nbsp; NCR: *nbsp;
NE: *amp;apos; NCR: '

FF 3.5.3
NE: *lt; NCR: *lt;
NE: *gt; NCR: *gt;
NE: *amp; NCR: *amp;
NE: *nbsp; NCR: *nbsp;
NE: ' NCR: '

Safari 4.0.3, iPhone Safari
NE: *lt; NCR: *lt;
NE: *gt; NCR: *gt;
NE: *amp; NCR: *amp;
NE: *nbsp; NCR: *nbsp;
NE: ' NCR: '

Google Chrome 3.0.195
NE: *lt; NCR: *lt;
NE: *gt; NCR: *gt;
NE: *amp; NCR: *amp;
NE: *nbsp; NCR: *nbsp;
NE: ' NCR: '

Opera 10.0
NE: *lt; NCR: *lt;
NE: *gt; NCR: *gt;
NE: *amp; NCR: *amp;
NE: _ NCR: _
NE: ' NCR: '
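
The Opera row above suggests that code comparing innerHTML strings
across browsers may want to normalise the no-break space first. A
minimal sketch, assuming the only difference is a literal U+00A0
character where the other browsers return the NE; normalizeNbsp is a
hypothetical name, and the code uses the real ampersand since it has
to run:

```javascript
// The no-break space character that Opera returns literally.
var NBSP = '\u00A0';

// Hypothetical helper: rewrite literal no-break spaces as the named
// entity so that serialized strings compare equal across browsers.
function normalizeNbsp(serialized) {
  return String(serialized).split(NBSP).join('&nbsp;');
}

var operaLike = normalizeNbsp('a' + NBSP + 'b'); // literal character in
var othersLike = normalizeNbsp('a&nbsp;b');      // already the NE
```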
