
Re: The less sign "<" in script elements

Martin Honnen

Mar 18, 2018, 4:08:05 PM
On 18.03.2018 17:08, Stefan Ram wrote:
> I remember that there was a time when I had to escape a
> less-sign "<" that was contained within a script element.
>
> I assume that this is not necessary anymore in HTML5,
> while »CDATA« must be used in XHTML, including XHTML5.
>
> So, is my memory correct that in HTML 2, HTML 3 and/or
> HTML 4, one had to escape a less sign in a script element,
> using, e.g., »&#60;«? E.g., for a comparison like »x<y«?

Certainly not with HTML 4, as it explicitly says in
https://www.w3.org/TR/html401/appendix/notes.html#notes-specifying-data:

"The DTD defines script and style data to be CDATA for both element
content and attribute values. SGML rules do not allow character
references in CDATA element content […]"
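
A minimal sketch of the practical consequence, assuming a JavaScript
engine such as Node.js: since character references are not decoded in
CDATA element content, the script engine would receive the text
"&#60;" literally. The helper name `parses` is hypothetical and only
serves this illustration.

```javascript
// Hypothetical helper: reports whether a string is accepted as the body
// of a JavaScript function, i.e. whether the engine can parse it.
function parses(body) {
  try {
    new Function('x', 'y', body);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// What the script engine sees if "<" is written as a character reference
// inside an HTML 4 SCRIPT element: the reference is not decoded.
const escaped = 'return x &#60; y;';
// What it sees if "<" is written literally.
const literal = 'return x < y;';

console.log(parses(escaped)); // false
console.log(parses(literal)); // true
```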

Jon Ribbens

Mar 18, 2018, 6:14:18 PM
On 2018-03-18, Stefan Ram <r...@zedat.fu-berlin.de> wrote:
> I remember that there was a time when I had to escape a
> less-sign "<" that was contained within a script element.

As far as I am aware there was never such a time, unless perhaps
for compatibility with broken browsers. It may well have been
(and still is) something people did to be cautious, but the contents
of <script> tags should be taken literally until at least a '</'
followed by an alphabetic character is reached.

Some browsers which didn't understand the '<script>' tag
probably had lax comment parsing, so the technique of
prefixing the script with '<!--' wouldn't work unless you
also avoided the '<' character. The days are long gone that
you need to worry about that though.

> I assume that this is not necessary anymore in HTML5,
> while »CDATA« must be used in XHTML, including XHTML5.

In HTML5 you need to avoid '<!--', '<script' and '</script'.

> So, is my memory correct that in HTML 2, HTML 3 and/or
> HTML 4, one had to escape a less sign in a script element,
> using, e.g., »&#60;«? E.g., for a comparison like »x<y«?

No - HTML 2 didn't have <script>, and in HTML 3 and 4 its
contents are CDATA so you can't use entities. You'd need to
escape '<' using a JavaScript method instead, like '\x3c',
and avoid less-than comparisons ;-)
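
A minimal sketch of the '\x3c' approach for string data, assuming the
text is to be embedded as a JavaScript string literal in an inline
script. The helper name `toScriptSafeLiteral` is hypothetical. Note that
the result is a JavaScript literal, not valid JSON, because JSON has no
\x escape.

```javascript
// Hypothetical helper: turns a string into a JavaScript string literal
// that contains no "<" character at all, so it cannot form "<!--",
// "<script" or "</script" when placed inside a script element.
function toScriptSafeLiteral(text) {
  // JSON.stringify yields a double-quoted literal with quotes and
  // backslashes escaped; the remaining "<" characters are then written
  // as the JavaScript escape sequence \x3c.
  return JSON.stringify(text).replace(/</g, '\\x3c');
}

const literal = toScriptSafeLiteral('<p>foo</p><!-- </script>');
console.log(literal); // "\x3cp>foo\x3c/p>\x3c!-- \x3c/script>"

// Evaluating the literal gives back the original text.
const roundTrip = new Function('return ' + literal)();
console.log(roundTrip === '<p>foo</p><!-- </script>'); // true
```

This only helps for "<" inside string literals; a "<" used as an
operator cannot be escaped this way.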

Thomas 'PointedEars' Lahn

Mar 18, 2018, 7:41:52 PM
Stefan Ram wrote:

> So, is my memory correct that in HTML 2, HTML 3 and/or
> HTML 4, one had to escape a less sign in a script element,
> using, e.g., »&#60;«? E.g., for a comparison like »x<y«?

No. Escaping using *other* means was only necessary if “<” was followed by
“/”.

The SCRIPT element was formally introduced in HTML 3.2 (there is no HTML
3.0). However, in HTML 3.2 to 4.01, because of limitations of the SGML-
based DTD, the CDATA content of a SCRIPT element was defined to end at the
first occurrence of “</” (End-Tag Open delimiter, ETAGO). And this
substring occurred frequently in script code that generated HTML (especially
before the W3C DOM, hence the example below).

The easiest workaround, thus recommended and used by me as well, was to
write "<\/" in a string literal, or to move the source code for generated
markup to an external script resource. (Later, you could also get rid of
“</” by strictly using features introduced with the W3C DOM.)
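
A minimal sketch of the "<\/" workaround, assuming it is run outside a
browser (for example in Node.js). The stub object `stubDocument` is
hypothetical and only stands in for the browser's `document`.

```javascript
// The script source as it would appear inside the script element.
// String.raw keeps the backslash, so this is the literal source text.
const source = String.raw`document.write("<p>foo<\/p>");`;

// The source text itself contains no "</" sequence ...
console.log(source.includes("</")); // false

// ... yet the string it writes is unchanged, because "\/" in a string
// literal is just "/". A stub object stands in for the browser's
// document here (an assumption, so the sketch runs outside a browser).
const written = [];
const stubDocument = { write: (markup) => written.push(markup) };
new Function("document", source)(stubDocument);
console.log(written[0]); // <p>foo</p>
```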

[The misconception that it would be necessary to hide the offending
content from *seemingly* bad validators and editors (which were
actually *working*) caused the misconception in many people that in
XHTML (and in HTML for *actually* bad validators and editors) it would
be a good idea to comment out the entire script contents:

<script …><!-- … --></script>

(a common example of this was code generated by Macromedia Dreamweaver)

But that merely masks the markup errors which is a problem when serving
XHTML with the proper Content-Type “application/xhtml+xml” (those
people then resorted to serving XHTML with Content-Type “text/html”,
which I think contributed to the demise of XHTML – most people simply
did not bother to understand the powerful tool that they were given).
The correct way in XHTML (as “application/xhtml+xml”) is, of course,
as you indicated,

<script …><![CDATA[ … ]]></script>

because the CDATA section declaration makes clear where the CDATA
begins and ends. However, “]]>” must then be escaped instead, or an
external script resource must be used.]
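
A minimal sketch of one way to deal with “]]>”, assuming the script
text is wrapped by a build step rather than by hand. It works at the XML
level by splitting the sequence across two adjacent CDATA sections; the
post above leaves the escaping method open. The helper names
`wrapInCdata` and `readCdata` are hypothetical, and `readCdata` is a
simplified reader for checking the result, not an XML parser.

```javascript
// Hypothetical helper: wraps script source text in a CDATA section for
// use in XHTML served as application/xhtml+xml. Any "]]>" inside the
// source would end the section early, so each one is split across two
// adjacent CDATA sections ("]]" ends the first, ">" starts the second).
function wrapInCdata(source) {
  const safe = source.split("]]>").join("]]]]><![CDATA[>");
  return "<![CDATA[" + safe + "]]>";
}

// Simplified reader: concatenates the contents of all CDATA sections.
function readCdata(markup) {
  let text = "";
  for (const match of markup.matchAll(/<!\[CDATA\[([\s\S]*?)\]\]>/g)) {
    text += match[1];
  }
  return text;
}

const source = "if (a[b[0]]>c) f();";
const wrapped = wrapInCdata(source);
console.log(wrapped); // <![CDATA[if (a[b[0]]]]><![CDATA[>c) f();]]>
console.log(readCdata(wrapped) === source); // true
```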

I have written about this *countless* times in several
newsgroups/forums, including this one; so have other people [the first
Google hit for “script etago” is an article by Mathias Bynens citing a test
case by IIRC then-cljs regular Juriy ‘kangax’ Zaytsev (Юрий Зайцев)].

You can still verify it if you use the W3C Markup Validator at
<http://validator.w3.org/>:

1. <https://validator.w3.org/#validate_by_input+with_options>
2. More Options → (·) Validate HTML fragment → Use Doctype: HTML 4.01
3. Markup to validate (without indentation):

<script type="text/javascript">document.write("<p>foo</p>");</script>

4. Check →
5. “Error: end tag for element "P" which is not open
[…]
If this error occurred in a script section of your document, you
should probably read this FAQ entry
<https://validator.w3.org/docs/help.html#faq-javascript>.”

--
PointedEars
FAQ: <http://PointedEars.de/faq> | <http://PointedEars.de/es-matrix>
<https://github.com/PointedEars> | <http://PointedEars.de/wsvn/>
Twitter: @PointedEars2 | Please do not cc me./Bitte keine Kopien per E-Mail.

Thomas 'PointedEars' Lahn

Mar 18, 2018, 10:47:24 PM
Jon Ribbens wrote:

> On 2018-03-18, Stefan Ram <r...@zedat.fu-berlin.de> wrote:
>> I remember that there was a time when I had to escape a
>> less-sign "<" that was contained within a script element.
>
> As far as I am aware there was never such a time, unless perhaps
> for compatibility with broken browsers. It may well have been
> (and still is) something people did to be cautious, but the contents
> of <script> tags should be taken literally until at least a '</'
> followed by an alphabetic character is reached.

Nonsense.

> Some browsers which didn't understand the '<script>' tag
> probably had lax comment parsing, so the technique of
> prefixing the script with '<!--' wouldn't work unless you
> also avoided the '<' character. The days are long gone that
> you need to worry about that though.

Nonsense.

>> I assume that this is not necessary anymore in HTML5,
>> while »CDATA« must be used in XHTML, including XHTML5.
>
> In

… the HTML syntax of …

> HTML5 you need to avoid '<!--', '<script' and '</script'.

… within “script” elements.

>> So, is my memory correct that in HTML 2, HTML 3 and/or
>> HTML 4, one had to escape a less sign in a script element,
>> using, e.g., »&#60;«? E.g., for a comparison like »x<y«?
>
> No - HTML 2 didn't have <script>, and in HTML 3 and 4 its
> contents are CDATA so you can't use entities. You'd need to
> escape '<' using a JavaScript method instead, like '\x3c',
> and avoid less-than comparisons ;-)

Nonsense.

Jon Ribbens

Mar 19, 2018, 9:47:15 AM
Maybe you should try reading some HTML specifications,
instead of just spouting nonsense.

Thomas 'PointedEars' Lahn

Mar 19, 2018, 4:38:05 PM
Jon Ribbens wrote:

> Maybe you should try reading some HTML specifications,

Maybe I have not only tried to do that.

> instead of just spouting nonsense.

I leave that to you.

Jon Ribbens

Mar 19, 2018, 7:23:52 PM
On 2018-03-19, Thomas 'PointedEars' Lahn <Point...@web.de> wrote:
> Jon Ribbens wrote:
>> Maybe you should try reading some HTML specifications,
>
> Maybe I have not only tried to do that.

The evidence suggests you haven't.

>> instead of just spouting nonsense.
>
> I leave that to you.

Perhaps you could progress beyond just spouting nonsense then,
and make some specific criticisms or corrections.

Thomas 'PointedEars' Lahn

Mar 19, 2018, 7:41:28 PM
Jon Ribbens wrote:

> On 2018-03-19, Thomas 'PointedEars' Lahn <Point...@web.de> wrote:
>> Jon Ribbens wrote:
>>> Maybe you should try reading some HTML specifications,
>> Maybe I have not only tried to do that.
>
> The evidence suggests you haven't.

You either do not know or do not consider all the evidence.

>>> instead of just spouting nonsense.
>> I leave that to you.
>
> Perhaps you could progress beyond just spouting nonsense then,
> and make some specific criticisms or corrections.

I did. Had you not full-quoted what I have replied to you, and had you read
what I have posted in the other subthread, you would have realized that.

Jon Ribbens

Mar 20, 2018, 7:37:05 AM
On 2018-03-19, Thomas 'PointedEars' Lahn <Point...@web.de> wrote:
> Jon Ribbens wrote:
>> Perhaps you could progress beyond just spouting nonsense then,
>> and make some specific criticisms or corrections.
>
> I did. Had you not full-quoted what I have replied to you, and had
> you read what I have posted in the other subthread, you would have
> realized that.

I already read that post and there is nothing in it that is a specific
criticism of or correction to my post. There's barely anything in it
that disagrees with what I said, except that you clearly have not read
either the HTML 3.2 or HTML 4.01 specifications.

Thomas 'PointedEars' Lahn

Mar 20, 2018, 8:56:38 AM
So much for “specific creticism”.

Jon Ribbens

Mar 20, 2018, 9:21:25 AM
On 2018-03-20, Thomas 'PointedEars' Lahn <Point...@web.de> wrote:
> Jon Ribbens wrote:
>> On 2018-03-19, Thomas 'PointedEars' Lahn <Point...@web.de> wrote:
>>> Jon Ribbens wrote:
>>>> Perhaps you could progress beyond just spouting nonsense then,
>>>> and make some specific criticisms or corrections.
>>> I did. Had you not full-quoted what I have replied to you, and had
>>> you read what I have posted in the other subthread, you would have
>>> realized that.
>>
>> I already read that post and there is nothing in it that is a specific
>> criticism of or correction to my post. There's barely anything in it
>> that disagrees with what I said, except that you clearly have not read
>> either the HTML 3.2 or HTML 4.01 specifications.
>
> So much for “specific creticism”.

You said:

> However, in HTML 3.2 to 4.01, because of limitations of the SGML-
> based DTD, the CDATA content of a SCRIPT element was defined to end
> at the first occurrence of “</” (End-Tag Open delimiter, ETAGO).

In fact the HTML 3.2 specification says 'ETAGO ("</") delimiters
followed immediately by a name character [a-zA-Z]', and the HTML 4.01
specification says 'ETAGO ("</") delimiter followed by a name start
character ([a-zA-Z])'.
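
A minimal sketch comparing the candidate rules for where SCRIPT content
ends: a bare "</", the "</" plus letter rule quoted from the
specifications above, and the "</script" sequence mentioned elsewhere in
this thread (simplified here to a case-insensitive prefix match, which
is an assumption). The names `rules` and `endOffsets` are hypothetical.

```javascript
// Candidate rules for the first position at which script content ends.
const rules = {
  // Any "</" (bare ETAGO delimiter).
  bareEtago: /<\//,
  // "</" followed by a name start character [a-zA-Z].
  etagoPlusLetter: /<\/[a-zA-Z]/,
  // "</script" in any letter case (simplified).
  closingScriptTag: /<\/script/i,
};

// Returns, per rule, the offset of the first match in the script text,
// or -1 if the rule never matches.
function endOffsets(scriptText) {
  const result = {};
  for (const [name, pattern] of Object.entries(rules)) {
    result[name] = scriptText.search(pattern);
  }
  return result;
}

// "</p>" ends the content under the first two rules, but not the third.
console.log(endOffsets('document.write("<p>foo</p>");'));
// "</ " matches only the bare-ETAGO rule.
console.log(endOffsets('var s = "1 </ 2";'));
```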

You also said:

> The misconception that it would be necessary to hide the offending
> content from *seemingly* bad validators and editors (which were
> actually *working*) caused the misconception in many people that in
> XHTML (and in HTML for *actually* bad validators and editors) it
> would be a good idea to comment out the entire script contents:
>
> <script …><!-- … --></script>

This was nothing to do with "seemingly bad validators and editors",
it was to do with backwards-compatibility with browsers and tools
that pre-dated the <script> tag and therefore could not know that
its contents should be parsed as CDATA.

It is now your turn to provide specific criticism of my earlier post.

Thomas 'PointedEars' Lahn

Mar 20, 2018, 11:45:38 AM
Jon Ribbens wrote:

> On 2018-03-20, Thomas 'PointedEars' Lahn <Point...@web.de> wrote:
>> Jon Ribbens wrote:
>>> On 2018-03-19, Thomas 'PointedEars' Lahn <Point...@web.de> wrote:
>>>> Jon Ribbens wrote:
>>>>> Perhaps you could progress beyond just spouting nonsense then,
>>>>> and make some specific criticisms or corrections.
>>>> I did. Had you not full-quoted what I have replied to you, and had
>>>> you read what I have posted in the other subthread, you would have
>>>> realized that.
>>>
>>> I already read that post and there is nothing in it that is a specific
>>> criticism of or correction to my post. There's barely anything in it
>>> that disagrees with what I said, except that you clearly have not read
>>> either the HTML 3.2 or HTML 4.01 specifications.
>>
>> So much for “specific creticism”.

I probably wanted to say “cretinism” instead.

> You said:

In Usenet, one posts a follow-up to the posting with the content that one is
referring to; one does _not_ quote statements out of context like you do.

>> However, in HTML 3.2 to 4.01, because of limitations of the SGML-
>> based DTD, the CDATA content of a SCRIPT element was defined to end
>> at the first occurrence of “</” (End-Tag Open delimiter, ETAGO).
>
> In fact the HTML 3.2 specification says 'ETAGO ("</") delimiters
> followed immediately by a name character [a-zA-Z]', and the HTML 4.01
> specification says 'ETAGO ("</") delimiter followed by a name start
> character ([a-zA-Z])'.

You ought to have *cited* evidence.

It is correct that those Specifications said/say that. HTML 3.2 said it in
a section that was normative¹ (but back in the day the distinction between
normative and informative was not made). HTML 4.01, however, says it in
Appendix B which is *informative* only.

AFAIK, SGML, with which HTML up to version 4.01 is defined –

<http://www.w3.org/TR/1999/REC-html401-19991224/sgml/sgmldecl.html>

– makes no such statements; instead one finds in the official summary:

,-<http://xml.coverpages.org/wlw14.html>
|
| […]
| As an example, a typical end tag in an SGML document "</elemname>"
| contains two delimiters and a name. "</;" is the delimiter which
| indicates the start of an end tag; its role is called "etago".

(Given that the first sentence does not contain a semicolon after “</”, the
one in the second sentence must be considered a typo.)

I do not know how the W3C Markup Validator handled it when it (the
Validator’s behavior) had been *cited by people to me repeatedly* as the
reason for wanting to comment out the “script” element contents; currently,
it does not consider a standalone “</” within a HTML 4.01 “SCRIPT” element
an ETAGO.

How HTML user agents actually handled it back then I do not know; but
kangax’s test case should provide some insight; in fact, he claims that “It
seems like de-facto standard is [the] ‘</script’ sequence, not just ‘</"’.”,
which would refute *both* our statements.

_______
¹ HTML 3.2 is a *Superseded* Recommendation since 2018-03-15, due to
HTML 5.2 at least. It is now inappropriate to cite the former as
normative:

<https://www.w3.org/TR/2018/SPSD-html32-20180315/> p.

> You also said:
>
>> The misconception that it would be necessary to hide the offending
>> content from *seemingly* bad validators and editors (which were
>> actually *working*) caused the misconception in many people that in
>> XHTML (and in HTML for *actually* bad validators and editors) it
>> would be a good idea to comment out the entire script contents:
>>
>> <script …><!-- … --></script>
>
> This was nothing to do with "seemingly bad validators and editors",
> it was to do with backwards-compatibility with browsers and tools
> that pre-dated the <script> tag and therefore could not know that
> its contents should be parsed as CDATA.

Incorrect. First of all, apparently you have not noticed that I was
primarily talking about *XHTML* in that paragraph.

In XHTML, the content model of the “script” element is _not_ CDATA, but
PCDATA (parsed character data). This causes those *XML* comment
delimiters to actually comment out the content:

<https://www.w3.org/TR/2008/REC-xml-20081126/#sec-comments>

So, as I said before, people who used this in XHTML simply did not know
what they were doing.

As regards my referring to HTML, maybe you are referring to HTML 4.01,
§ 18.3.2, where this is claimed, or to Mathias Bynens’ article that I
referred to where the latter is quoted. (Given that you did not know that
HTML 4.01 Appendix B is informative only, it is doubtful whether you have
actually read the HTML 4.01 Specification.)

However, the cited section is questionable in several regards.

First of all, the W3C does not have a mandate to specify, contrary to the
statement in that section, how script engines work. ECMAScript, for
example, is specified by Ecma International instead. In particular (with
the exception of the period from 1995-12 to 1996-08; see the ECMAScript
Support Matrix) there has not been and there is no single “the JavaScript
engine” (as there is no single JavaScript, “Javascript” or “javascript”
programming language; ibid.)

Second, HTML 4.01 became a W3C Recommendation on 1999-12-24. But versions
of HTML before 3.2, which did not specify the SCRIPT element, have been made
*obsolete* by RFC 2854 already; that became an RFC in 2000-06, so must have
been submitted to the IETF about 6 months prior to that, i.e. at the time of
promoting HTML 4.01 to REC; in fact, the first draft, which already contained
“Obsoletes: RFC 1866”, was submitted 1999-09-21 (see “draft-connolly-…”).
Given that Connolly drafted and published this as a representative of the
W3C, the content of that section must be considered an oversight by the HTML
Working Group and the W3C Membership. It should never have made it to REC
in its current form. At best, this section should be considered
informative, if not entirely *historical* now.

But more importantly, this was published *long* before people *kept using*
this. That is, there was no software that should have been considered
*working* at the time that still parsed the content of the SCRIPT element
(CDATA provisions aside), including, but not limited to, in HTML 3.2, ISO
HTML, and HTML 4.01. There certainly is no such software now – not based on
wishful thinking, but on *normative requirements*.

Therefore, my statement is *correct* that only “*actually* bad validators
and editors” would require (or would have required, and actually did
require) this.

> It is now your turn to provide specific criticism of my earlier post.

I see no logical reason to repeat myself.

My replying simply with “Nonsense.” to most (_not_ *all*) of your statements
is the summary of what I wrote in my other post that you have evidently not
read carefully enough, and to which you did not post a follow-up.

Where you have offered unfounded assumption based on wishful thinking, I
offered historical fact because (apparently by contrast to you) not only did
I do research on this, *I* *have* *been* *there*. Do the comparison.

Jon Ribbens

Mar 20, 2018, 12:05:38 PM
You would have to do something once to be able to repeat it.
No matter - I didn't expect you to actually be able to back up
your pathetic bleating, so it is no great surprise to find that
you cannot.

Thomas 'PointedEars' Lahn

Mar 20, 2018, 1:09:49 PM
JFTR: That is not everything that I replied. Not at all.

Once again I have fallen into the trap of an ignoramus’ claiming nonsense,
and accepting their shifting the burden of proof, wasting my precious time
in the process.

F’up2 poster