| Lynx isn't an older browser. :-)
Of course not. The phrase is a bogosity indicator. It's just that
people in love with their primitive Tag Slurpers need to reaffirm
constantly their belief that they're using something "advanced".
| the most common way to hide Javascripts is by using
| <SCRIPT>
| <!-- // Hide script
| script goes here
| // -->
| </SCRIPT>
| but this causes some problems if you want to validate your
| documents.
The apparent decision to go with CDATA declared content for SCRIPT
hasn't helped any. The "fix" is worse than the original problem.
With a (#PCDATA) content model, the problem is two-fold:
(i) Javascript syntax is not safe with respect to comment declaration
syntax, and
(ii) Netscape's broken parsing has habituated its faithful "designers"
to broken syntax for comment declarations in general.
Grandfathering this broken legacy meant finding a way to *suppress*
SGML parsing of comment declarations. Hence CDATA declared content.
But this has its own parsing gotcha: CDATA content ends with the first
occurrence of ETAGO ('</' followed immediately by any character in the
class [A-Za-z>] usually) and this triggers an error if the endtag
isn't the expected one. Which basically means you can't embed a
"syntactically correct" endtag in CDATA declared content.
Not even document.write('</tag>'). With all the tag salad being
javascripted these days, it's a tossup which content type for SCRIPT,
(#PCDATA) content model or CDATA declared content, would cause more
documents to fail validation.
I'm predicting the workaround, document.write('</'+'tag>'), will
become a FAQ soon.
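
For concreteness, a minimal sketch of that gotcha in Javascript. The
helper name firstEtago is hypothetical, and the pattern assumes the
simplified ETAGO rule stated above ('</' followed by a letter or '>');
a real SGML parser has more to it.

```javascript
// Hypothetical helper: locate where CDATA declared content would end.
// Assumption: ETAGO is '</' immediately followed by a character in
// the class [A-Za-z>], as in the simplified rule described in the text.
function firstEtago(content) {
  const match = /<\/[A-Za-z>]/.exec(content);
  return match ? match.index : -1;
}

const naive = "document.write('</tag>')";
const split = "document.write('</' + 'tag>')";

// The naive form contains a recognizable ETAGO inside the string literal.
console.log(firstEtago(naive));
// The split form does not: the character after '</' is a quote.
console.log(firstEtago(split));
```

The split form gets through only because the quote character after
'</' is outside the class, so no delimiter is recognized.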
Meanwhile, there is some chance that the CDATA decision may be
rescinded, because the (#PCDATA) content model offers the much safer
option of a CDATA marked section. Best would be if the programmers of
the Mosaic spawn would RTFM just once and implement the syntax. But even
without such an unlikely occurrence, the syntax of marked sections
might still be faked past these browsers, thanks to their essential
stupidity.
Will Netscape treat this the same way as the example above:
<SCRIPT><![CDATA[>
<!-- // Hide script
script goes here
// --><!]]>
</SCRIPT>
If yes, that would go a long way towards minimizing the noxious
effects of the stupid kludge that started all this.
:ar
[much snipped]
> Will Netscape treat this the same way as the example above:
>
> <SCRIPT><![CDATA[>
> <!-- // Hide script
> script goes here
> // --><!]]>
> </SCRIPT>
>
> If yes, that would go a long way towards minimizing the noxious
> effects of the stupid kludge that started all this.
>
Arjun,
I'm not going to pull the argument from c.i.w.a.html over to here; that
would be useless. Just to answer your question, however: Unless
Javascripting is disabled, Netscape will interpret *everything* between
the <script> and </script> tags *as Javascript.* That <![CDATA[> string
you list will produce a Javascript error message if it cannot be
interpreted as legitimate Javascript.
The trailing <!]]> part is part of a one-line comment, and so will be
ignored by the script interpreter.
Right, wrong, or indifferent, that is the way the SCRIPT container
works.
Quite frankly, I'd be interested in seeing some way of validating both
HTML and Javascript in a single document (as long as the solution isn't
even worse than the current implementation), keeping in mind that
Javascript != HTML, but is allowed to send HTML and text to the
document.
There may not be a simple answer.
--
Ken
Are you interested in |
byte-sized education | http://www.play-hookey.com
over the Internet? |
++ I'm predicting the workaround, document.write('</'+'tag>'), will
++ become a FAQ soon.
I seriously doubt that. Netscape will have no problem scanning for
</script>, and the rest of the browsers will just follow Netscape.
It will not at all be standard, but I seriously doubt any browser with
a reasonable market share will barf on encountering
document.write('</tag>'). The hack to make this work is just too tempting to leave it
out.
The people who realise there are more than just N$-foo and M$-bar
browsers don't need to ask about the problem, and the rest
don't care.
Abigail -- Pessimistic? Sure...
--
Anyone who slaps a "this page is best viewed with Browser X" label
on a Web page appears to be yearning for the bad old days, before the
Web, when you had very little chance of reading a document written on
another computer, another word processor, or another network.
[Tim Berners-Lee in Technology Review, July 1996]
No, this is how this week's version of *Netscape* works.
But how does Netscape deal with:
<SCRIPT>
<!-- // --><![CDATA[
script goes here
//]]>
</SCRIPT>
Some quick testing seems to indicate this week's version of Netscape
accepts it. (Leaving the // inside the comment "works" too, but
that depends even more on idiotic behaviour.)
Abigail
|> Will Netscape treat this the same way as the example above:
|>
|> <SCRIPT><![CDATA[>
|> <!-- // Hide script
|> script goes here
|> // --><!]]>
|> </SCRIPT>
|>
|> If yes, that would go a long way towards minimizing the noxious
|> effects of the stupid kludge that started all this.
| Unless Javascripting is disabled, Netscape will interpret
| *everything* between the <script> and </script> tags *as
| Javascript.*
The first question, then, is: *which* </script> tag, where? What
happens with the following?
<SCRIPT>
<!-- // Hide script
script here
// --></SCRIPT>
Suppose the pseudo-comment syntax were removed?
<SCRIPT>
script here
//</SCRIPT>
| That <![CDATA[> string you list will produce a Javascript error
| message if it cannot be interpreted as legitimate Javascript.
In other words
<SCRIPT>script can start on the same line
and continue here
</SCRIPT>
and, more importantly, the string '<!--' appears significant for
*Javascript*, i.e.
<SCRIPT>
some script
more script <!-- is this a Javascript comment?
even more script
</SCRIPT>
| The trailing <!]]> part is part of a one-line comment, and so will
| be ignored by the script interpreter.
| Right, wrong, or indifferent, that is the way the SCRIPT container
| works.
No, as Abigail points out, that's how some version of Netscape works.
(Rather, maybe: I haven't verified any of these behaviors myself.) The
data content of a SCRIPT element is subject to SGML parsing rules in a
SGML application such as Wilbur: the relevant question is *which* SGML
parsing rules apply.
| Quite frankly, I'd be interested in seeing some way of validating
| both HTML and Javascript in a single document (as long as the
| solution isn't even worse than the current implementation), keeping
| in mind that Javascript != HTML, but is allowed to send HTML and
| text to the document.
Who cares about the *output* of Javascript? The issue of validation
pertains to the characters in the original document. For instance, the
fragment
<H1>..<TABLE>..<LI>..</H1>..<UL>..</TABLE>..</UL>
clearly wouldn't validate in an alleged HTML document. Stick the
fragment into a document.write(), though, and the issue is entirely
different, as long as the *string of characters*
document.write('<H1>..<TABLE>..<LI>..</H1>..<UL>..</TABLE>..</UL>')
from the 'd' of 'document' to the final ')' were parsed as *verbatim
data* according to SGML rules.
And that's *exactly* what CDATA content accomplishes.
<SCRIPT><![CDATA[
Everything here is scanned as verbatim data.
No SGML parsing for markup occurs.
]]>
</SCRIPT>
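
As a rough sketch of the scanning rule (a hypothetical helper, not any
parser's actual code): inside a CDATA marked section the only thing
recognized is the closing ']]>', so everything before it, '</tag>'
included, is plain data.

```javascript
// Hypothetical helper: return the verbatim body of the first CDATA
// marked section in `text`, or null if there is no complete section.
// Assumption: only the literal ']]>' ends the section.
function cdataSectionBody(text) {
  const open = "<![CDATA[";
  const start = text.indexOf(open);
  if (start === -1) return null;
  const bodyStart = start + open.length;
  const end = text.indexOf("]]>", bodyStart);
  return end === -1 ? null : text.slice(bodyStart, end);
}

const doc = "<SCRIPT><![CDATA[\ndocument.write('</tag>');\n]]>\n</SCRIPT>";
// The end tag inside the string literal is returned untouched as data.
console.log(JSON.stringify(cdataSectionBody(doc)));
```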
| There may not be a simple answer.
I'll only say that the CDATA marked section is by no means new. And
Netscape cannot claim ignorance, since they participated in the
"Livescript and HTML" thread on the HTML Working Group mailing list[1]
*after* Liam Quin posted the following [2]:
= To Any Netscape People Who Are Listening:
=
= If html-wg comes up with a practical alternative that's
= syntactically valid HTML, will you adopt it?
with some suggestions, to which Dan Connolly responded with a
description of how CDATA marked sections work [3].
The simpler answer, Ken, has been known since October 1995. What have
your Undying Heroes done about it?
:ar
[1]http://www.acl.lanl.gov/HTML_WG/html-wg-95q4.messages/subject.html
[2]http://www.acl.lanl.gov/HTML_WG/html-wg-95q4.messages/0180.html
[3]http://www.acl.lanl.gov/HTML_WG/html-wg-95q4.messages/0191.html
Uh, Abigail -- that's what Arjun asked. Therefore my answer was
specifically limited to the behavior of Netscape. I'm just getting over
a bout of *very* unpleasant fever, so I'm going to hold off responding
to the going battle at c.i.w.a.h until this weekend; I just don't have
the energy right now. In the meantime, do we really have to fight at
*every* opportunity??
>
> But how does Netscape deal with:
>
> <SCRIPT>
> <!-- // --><![CDATA[
> script goes here
> //]]>
> </SCRIPT>
>
> Some quick testing seems to indicate this week's version of Netscape
> accepts it. (Leaving the // inside the comment "works" too, but
> that depends even more on idiotic behaviour.)
>
By all Netscape behavior with Javascript (even when they started with
"LiveScript" before combining effort with Sun), the <!-- which HTML sees
as an indefinite comment start, is treated as a one-line comment within
Javascript. The first '//' you list above is superfluous; you have
merely specified a one-line comment starting with a one-line comment
marker. Harmless but useless.
Quite frankly, I don't expect Netscape to change this behavior in future
editions.
Please note that since you ended the HTML comment on the same line where
you started it, a non-scripting browser will take the first occurrence
of '>' as the end of the CDATA, whether it is a "greater than"
comparison, a tag in a "document.write()" method, or the 'real' end of
the CDATA. I tend to suspect that that is specifically why Netscape
recommends the use of the full comment declaration to hide the script --
standards or no, it is far less prone to accidental termination.
I covered that some time back, when I pointed out just this type of
kludge containing a '/*' multi-line comment marker, followed by another
one containing a '*/' marker. The content in the middle was then
readable only by *non-scripting* browsers. It's a kludgy but workable
way of creating a NOSCRIPT container.
The specific answer to your question above is that Javascript
interpretation will continue until the *Javascript* interpreter
identifies a (non-commented) </script> tag.
>
> Suppose the pseudo-comment syntax were removed?
>
> <SCRIPT>
> script here
> //</SCRIPT>
Irrelevant to Javascript. Visit Netscape's framed site:
http://home.netscape.com/eng/mozilla/2.0/handbook/javascript/index.html
and select "Navigator Scripting" from the left side menu. The first
example does not use the HTML comment container. The second one
explains:
Code Hiding
Scripts can be placed inside comment fields to ensure that your
JavaScript code is
not displayed by old browsers that do not recognize JavaScript. The
entire script is
encased by HTML comment tags:
<!-- Begin to hide script contents from old browsers.
// End the hiding here. -->
>
> | That <![CDATA[> string you list will produce a Javascript error
> | message if it cannot be interpreted as legitimate Javascript.
>
> In other words
>
> <SCRIPT>script can start on the same line
> and continue here
> </SCRIPT>
>
> and, more importantly, the string '<!--' appears significant for
> *Javascript*, i.e.
>
> <SCRIPT>
> some script
> more script <!-- is this a Javascript comment?
> even more script
> </SCRIPT>
I explained that as well, in one of our earlier arguments. Evidently you
didn't trouble yourself to pay any attention or learn what it was about.
Netscape's Javascript interpreter reads '<!--' as being logically
equivalent to '//' (comment to end of line).
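
A minimal sketch of that equivalence, assuming the behavior is as
described here. The function name is made up, and the sketch ignores
string literals and /* */ comments, which a real tokenizer would have
to handle.

```javascript
// Hypothetical sketch: drop comment-to-end-of-line text, treating
// '<!--' exactly like '//'. Not taken from any interpreter's source.
function stripLineComments(source) {
  return source
    .split("\n")
    .map((line) => {
      // Find whichever one-line comment marker appears first.
      const marker = /\/\/|<!--/.exec(line);
      return marker ? line.slice(0, marker.index) : line;
    })
    .join("\n");
}

const script = "<!-- // Hide script\nvar x = 1;\n// -->";
// Only the middle line survives as executable text.
console.log(JSON.stringify(stripLineComments(script)));
```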
>
> | The trailing <!]]> part is part of a one-line comment, and so will
> | be ignored by the script interpreter.
>
> | Right, wrong, or indifferent, that is the way the SCRIPT container
Until the parser sees any of:
if (a > b)
for (i=15; i>=0; i--)
document.write("<b>");
in many possible variations, at which point it will assume the end of
the CDATA and begin interpreting again, and displaying a hodgepodge of
stuff never intended for display. It will also respond to the tags
placed in there, in all likelihood never getting fully straightened out.
> ]]>
> </SCRIPT>
>
> | There may not be a simple answer.
>
> I'll only say that the CDATA marked section is by no means new. And
> Netscape cannot claim ignorance, since they participated in the
> "Livescript and HTML" thread on the HTML Working Group mailing list[1]
> *after* Liam Quin posted the following [2]:
>
> = To Any Netscape People Who Are Listening:
> =
> = If html-wg comes up with a practical alternative that's
> = syntactically valid HTML, will you adopt it?
>
> with some suggestions, to which Dan Connolly responded with a
> description of how CDATA marked sections work [3].
>
> The simpler answer, Ken, has been known since October 1995. What have
> your Undying Heroes done about it?
You have made an unwarranted conclusion stemming from invalid
assumptions based upon insufficient data. I will answer you more fully
in ciwah when I'm better recovered from a nasty throat infection and
fever. In the meantime, please note: Netscape aren't my "heroes,"
undying or otherwise. They have simply found a way to make scripting
work in a way that is more reliable than depending on just a single
character to specify resumption of HTML interpretation for those
browsers that don't support scripting.
Quite frankly, if this required them to step away from the standard, so
be it. As I recall from a previous battle in ciwah, the "standard" is
nothing more than a snapshot of "current practice" as of a selected
date. This implies that various browser companies and users routinely go
beyond the current standard (using what are often called extensions) to
accomplish more than was possible at the time the standard was written.
If everybody always wrote new browsers *only* according to the existing
standard, there would never be any need for a new standard. And no
experimentation, no new knowledge, nothing learned, nothing gained. If
that's the world you'd prefer to have, I feel sorry for you, but I don't
share that vision, and neither do many others. You have every right to
regulate your pages, your browsing activities, and your whole life
however you see fit. You have no such rights over my behavior, or my
preferences.
Have a good day.
| Abigail wrote:
| >
| > <SCRIPT>
| > <!-- // --><![CDATA[
| > script goes here
| > //]]>
| > </SCRIPT>
| >
| > Some quick testing seems to indicate this week's version of Netscape
| > accepts it. (Leaving the // inside the comment "works" too, but
| > that depends even more on idiotic behaviour.)
Developing the theme, how about this (relevant "tokens", real or
bogus, separated with white space for clarity):
<SCRIPT>
<!-- --> <![CDATA[ <!--
script goes here
// --> <! ]]>
</SCRIPT>
| Please note that since you ended the HTML comment on the same line
| where you started it, a non-scripting browser will take the first
| occurrence of '>' as the end of the CDATA, whether it is a "greater
| than" comparison, a tag in a "document.write()" method, or the
| 'real' end of the CDATA. I tend to suspect that that is specifically
| why Netscape recommends the use of the full comment declaration to
| hide the script -- standards or no, it is far less prone to
| accidental termination.
Your suspicion concedes altogether too much to Netscape's phony
solicitousness for "old browsers". You really have a soft spot for
their guff.
Among the class of non-scripting old browsers, "less prone" applies
only to Netscape 1.2, and possibly 1.1. Specifically, when they, er,
implemented the "comment tag" (a moronic solecism that now appears to
have been coined in Mountain View, for such is the usage in their
Javascript documentation), they added an extra twist to the much more
common "declaration heuristic" of suppressing material between '<!'
and '>'. Under certain circumstances, there's a check for a leading
'--'.
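
To pin down the two heuristics being contrasted, here is a toy model.
It is an assumption-laden sketch, not a reconstruction of any
browser's source: visibleText and its honorDashDash flag are invented
for illustration, and ordinary tags are not handled at all.

```javascript
// Hypothetical model of two ad hoc "declaration heuristics".
// honorDashDash = false: suppress everything from '<!' to the next '>'.
// honorDashDash = true:  if the declaration starts with '<!--',
//                        suppress up to the next '-->' instead.
function visibleText(html, honorDashDash) {
  let out = "";
  let i = 0;
  while (i < html.length) {
    if (html.startsWith("<!", i)) {
      const isComment = honorDashDash && html.startsWith("<!--", i);
      const terminator = isComment ? "-->" : ">";
      const end = html.indexOf(terminator, i + 2);
      if (end === -1) break; // unterminated: the rest stays suppressed
      i = end + terminator.length;
    } else {
      out += html[i];
      i += 1;
    }
  }
  return out;
}

const hidden = "<!-- if (a > b) x(); -->done";
console.log(JSON.stringify(visibleText(hidden, false))); // script leaks
console.log(JSON.stringify(visibleText(hidden, true)));  // script hidden
```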
The solicitous protection and sagely considered recommendation is for
old *Netscape* browsers. My, my. We must all have been born yesterday.
:ar
| The specific answer to your question above is that Javascript
| interpretation will continue until the *Javascript* interpreter
| identifies a (non-commented) </script> tag.
Since by now it's clear you're incapable of answering a question in
context, forget that I asked. Sorry, my mistake.
|http://home.netscape.com/eng/mozilla/2.0/handbook/javascript/index.html
[Netscape online documentation]
= Code Hiding
=
= Scripts can be placed inside comment fields to ensure that your
= JavaScript code is not displayed by old browsers that do not
= recognize JavaScript.
Translated: "Some older versions of Netscape (that do not recognize
Javascript) will treat the strings '<!--' and '-->' as markers for
commented material. To ensure that these versions of Netscape will
suppress scripts, use these markers around your Javascript."
= The entire script is encased by HTML comment tags:
Some idiot in Mountain View needs to RTFM on "HTML comment tags". A
propensity to garble facts is integral to the Netscape Mystique.
|>| Quite frankly, I'd be interested in seeing some way of validating
|>| both HTML and Javascript in a single document (as long as the
|>| solution isn't even worse than the current implementation),
|>| keeping in mind that Javascript != HTML, but is allowed to send
|>| HTML and text to the document.
|>
|> The issue of validation pertains to the characters in the original
|> document.
|>
|> <SCRIPT><![CDATA[
|> Everything here is scanned as verbatim data.
|> No SGML parsing for markup occurs.
|> ]]>
|> </SCRIPT>
| Until the parser sees any of:
| if (a > b)
| for (i=15; i>=0; i--)
| document.write("<b>");
| in many possible variations, at which point it will assume the end
| of the CDATA
Which parser? There's an elementary distinction between SGML parsing
rules and ad hoc heuristics. How an ad hoc parser could make a hash of
valid documents is neither surprising nor relevant.
You asked about *validation*. Either you're dense, or you're incapable
of focusing on issue and context. It doesn't matter, because expecting
you to learn something you don't know is clearly hopeless.
| [Netscape] have simply found a way to make scripting
| work in a way that is more reliable than depending on just a single
| character to specify resumption of HTML interpretation for those
| browsers that don't support scripting.
Netscape didn't "find" anything, unless it was a way to protect their
own software. Talk October 1995. The "reliability" shibboleth applied
only to some Netscape 1.x versions, which had gone from simply broken
to hideously broken "parsing" of comment declarations.
Perhaps the only other thing they found was hordes of idiots to
swallow their guff and regurgitate it in public.
| Quite frankly, if this required them to step away from the standard,
| so be it.
It only required them to *implement* the standard. But such a simple
concept would be beyond you, of course. (Not to mention, them too.)
| If everybody always wrote new browsers *only* according to the
| existing standard, there would never be any need for a new standard.
| And no experimentation, no new knowledge, nothing learned, nothing
| gained.
Duh. Implement a spec before you criticise it. All you have, Ken, is
your babbling on and on about things for which you can cite nothing
better than anecdotal evidence; it becomes necessary to assert your
personal authoritativeness to make up for a lack of specs, source
code, or any form of objective verification. You're free to prefer
such an environment. Just spare us the gush and evasion.
| If that's the world you'd prefer to have, I feel sorry for you, but
| I don't share that vision, and neither do many others.
-sigh-. You have no idea. RTFM. A penny might drop, somewhere,
sometime.
:ar
| Developing the theme, how about this (relevant "tokens", real or
| bogus, separated with white space for clarity):
| <SCRIPT>
| <!-- --> <![CDATA[ <!--
| script goes here
| // --> <! ]]>
| </SCRIPT>
Oops, missed a bogus token:
<SCRIPT>
<!-- --> <![CDATA[ > <!--
script goes here
// --> <! ]]>
</SCRIPT>
The relevant considerations appear to be
1. The body of the script requires "comment tags" for the benefit of
older Netscape versions.
2. With (or even without) such a wrapper, the entirety needs to be
CDATA for an SGML parser. Hence a CDATA marked section.
3. '<![CDATA[' and ']]>' need to be "balanced" with offsetting '>' and
'<!' respectively for the benefit of any heuristic browser.
4. '<![CDATA[' may trigger a Javascript parsing error: hence protect
it with a leading comment declaration on the same line.
5. This leading comment declaration needs to be syntactically correct
for SCRIPT with (#PCDATA) content model.
6. '</SCRIPT>' needs to be on a separate line in relation to Javascript
tokenization of line-oriented comments.
Any other gotchas?
:ar
In article <57j72i$4...@client3.news.psi.net>,
ar...@nmds.com (Arjun Ray) wrote:
> <SCRIPT>
> <!-- --> <![CDATA[ > <!--
> script goes here
> // --> <! ]]>
> </SCRIPT>
>
> Any other gotchas?
Why all this messing with comments? Shouldn't just enclosing the
script in <![CDATA[ and ]]> be sufficient? Any 'heuristic' browser
will think that it's a very big unknown tag with lots of attributes
and hide the text.
Galactus
--
E-mail: gala...@htmlhelp.com .................... PGP Key: 512/63B0E665
Maintainer of WDG's HTML reference: <http://www.htmlhelp.com/reference/>
| In article <57j72i$4...@client3.news.psi.net>,
| ar...@nmds.com (Arjun Ray) wrote:
| > <SCRIPT>
| > <!-- --> <![CDATA[ > <!--
| > script goes here
| > // --> <! ]]>
| > </SCRIPT>
| >
| > Any other gotchas?
| Why all this messing with comments? Shouldn't just enclosing the
| script in <![CDATA[ and ]]> be sufficient? Any 'heuristic' browser
| will think that it's a very big unknown tag with lots of attributes
| and hide the text.
The first (legitimate) comment declaration, for the contention that
otherwise the string '<![CDATA[' would be visible to the Javascript
interpreter *in Netscape*. (Other implementations are free to exercise
minimal intelligence w.r.t. SGML syntax.)
The second (bogus) comment declaration, for older Netscape versions
that look for '-->' after '<!--'. Again, the criterion is that these
contortions for valid documents have to "work" *in Netscape*.
But you're right. Were it not for the troubles in finessing Netscape's
garbled concepts of document processing, an implementation of CDATA
marked sections would have been enough (for that matter, even in
heuristic browsers!)
:ar
If you're just going to run on at each other, do it in private please...
Flame each other until you've agreed, then come back and tell us...
Rather than wasting everyone's bandwidth...
H.
Uhm, no, unless you assume the JavaScript part doesn't contain a >.
You have accomplished nothing new here, and you have gone to great
lengths to do so. A single comment declaration will serve the purpose of
preventing non-scripting browsers of all flavors from trying to
interpret or display the script; here you have *two* comment
declarations in an effort to support your fetish of declaring CDATA. A
total waste of time and effort.
>
> | Please note that since you ended the HTML comment on the same line
> | where you started it, a non-scripting browser will take the first
> | occurrence of '>' as the end of the CDATA, whether it is a "greater
> | than" comparison, a tag in a "document.write()" method, or the
> | 'real' end of the CDATA. I tend to suspect that that is specifically
> | why Netscape recommends the use of the full comment declaration to
> | hide the script -- standards or no, it is far less prone to
> | accidental termination.
>
> Your suspicion concedes altogether too much to Netscape's phony
> solicitousness for "old browsers". You really have a soft spot for
> their guff.
By your efforts above, you have conceded the need to hide the script
from browsers that cannot interpret it correctly, simply to avoid
hashing the displayed page. It has nothing to do with my opinion of
Netscape (or yours, for that matter). Rather, it has to do with allowing
the page to degrade gracefully (something you, Abigail, and others have
screamed in favor of in ciwah) on browsers that can't handle the script.
>
> Among the class of non-scripting old browsers, "less prone" applies
> only to Netscape 1.2, and possibly 1.1. Specifically, when they, er,
> implemented the "comment tag" (a moronic solecism that now appears to
> have been coined in Mountain View, for such is the usage in their
> Javascript documentation), they added an extra twist to the much more
> common "declaration heuristic" of suppressing material between '<!'
> and '>'. Under certain circumstances, there's a check for a leading
> '--'.
>
> The solicitous protection and sagely considered recommendation is for
> old *Netscape* browsers. My, my. We must all have been born yesterday.
Still wrong. *All* browsers written to HTML 2.0 or earlier will have no
notion of how to handle SCRIPT tags, and will ignore them, but not the
contents. This includes Netscape, early MSIE, older Lynx browsers, and a
whole flock of lesser-known flavors. You may have a personal distaste
for Netscape, but there is no call for you to blame them for all the
ills of the Web world.
As I demonstrated in my previous post, Javascript (or Jscript or
VBscript) may quite legitimately include a ">" character almost anywhere
inside the script. This will, in accordance with RFC-1866, terminate the
assumed CDATA for non-scripting browsers. Therefore, the use of the
basic CDATA declaration is *not* adequate to prevent non-scripting
browsers from displaying a mish-mosh created by trying to interpret the
rest of the Javascript as if it were intended as HTML. Only a full
comment declaration can be sure of accomplishing that goal.
Face it, Arjun: Javascript is an *extension* to HTML 2.0, not part of
the standard itself. W3C is currently in the process of figuring out how
to incorporate Javascript (and style sheets) into the new standard. They
aren't done yet. Even when they are done, the HTML-3.2 standard will,
like the HTML-2.0 standard now in place, be nothing more than a
specification of current practice as of a selected date. There was
nothing in RFC-1866 that said no further development was permitted, and
the same will be true of HTML-3.2.
In the meantime, *you* may worship at the pedestal of RFC-1866; *I*
don't. Standards like this are designed to serve both HTML authors and
browser programmers. The moment you start insisting that we serve the
standard instead, you have lost touch with reality.
Wrong. Netscape *suggests* the use of "comment tags" for the purpose of
preventing *all* script-blind browsers from trying to handle the script
as if it were HTML.
>
> 2. With (or even without) such a wrapper, the entirety needs to be
> CDATA for an SGML parser. Hence a CDATA marked section.
Also wrong. From:
http://www.w3.org/pub/WWW/MarkUp/SGML/sgml-lex/sgml-lex
I find:
The string <! followed by a name begins a markup declaration. The name
is followed by parameters and a >. A [ in the parameters opens a
declaration subset, which is a construct prohibited by this report.
The string <!-- begins a comment declaration. The -- begins a comment,
which continues until the next occurrence of --. A comment declaration
can contain zero or more comments. The string <!> is an empty comment
declaration.
The string <![ begins a marked section declaration, which is prohibited
by this report.
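
The comment declaration grammar quoted above can be sketched as a
regular expression. This is only an approximation (the names
COMMENT_DECL and isCommentDeclaration are hypothetical, and \s is a
little broader than SGML's separator characters), but it is enough to
test sample strings:

```javascript
// Hypothetical approximation of the quoted grammar:
//   '<!' then zero or more comments, each '--' ... '--' with no '--'
//   inside, optionally followed by whitespace, then '>'.
const COMMENT_DECL = /^<!(?:--(?:[^-]|-[^-])*--\s*)*>$/;

function isCommentDeclaration(text) {
  return COMMENT_DECL.test(text);
}

console.log(isCommentDeclaration("<!-- Hide script -->")); // true
console.log(isCommentDeclaration("<!>"));                  // true
console.log(isCommentDeclaration("<!-- i-- -->"));         // false
```

Under this grammar a '--' anywhere in the script body, such as a
decrement, closes the comment early.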
Both SGML and its subset, HTML, permit the use of a comment declaration.
Neither one defines anything that either must or may not be present
within that declaration, except that "--" followed by ">" (with or
without intervening whitespace) needs to be handled carefully if it is
not to be accepted as the end of the comment declaration. Any
intervening printing character of course destroys the sequence.
Whether you like it or not, and regardless of your opinion as to what is
"appropriate" content for the comment declaration, it is both legal and
sufficient in HTML and SGML to surround Javascript program code with
these comment markers, and thereby to tell an SGML or HTML parser or
validator to ignore the contents.
>
> 3. '<![CDATA[' and ']]>' need to be "balanced" with offsetting '>' and
> '<!' respectively for the benefit of any heuristic browser.
>
> 4. '<![CDATA[' may trigger a Javascript parsing error: hence protect
> it with a leading comment declaration on the same line.
Both of these points are irrelevant to the use of Javascript. You do not
need any kind of CDATA marker as such; a comment declaration covers the
entire need without adding complexity. You are ignoring the KISS
principle in your effort to have things the way you want them instead of
the way they are.
>
> 5. This leading comment declaration needs to be syntactically correct
> for SCRIPT with (#PCDATA) content model.
Javascript doesn't give a hoot about the presence or absence of the
comment declaration; that is there only to tell non-scripting browsers
to refrain from interpreting the script. Therefore, both the beginning
and ending comment markers do indeed need to be syntactically correct in
HTML/SGML.
>
> 6. '</SCRIPT>' needs to be on a separate line in relation to Javascript
> tokenization of line-oriented comments.
Not quite. The </script> tag must not be commented out, if it is to be
recognized by the Javascript interpreter as being executable (and
therefore as the end of the script). As long as it is not placed in a
comment line or field, it can go anywhere and still be recognized in its
proper context.
>
> Any other gotchas?
Any more efforts at making a simple concept more complicated?
Thank you, Abigail. This is a point I made awhile back in this thread. I
know you have your disagreements with me on this topic, but I
nevertheless thank you for exercising a bit of intellectual self-honesty
in this regard.
If you are incapable of figuring out which </script> tag in your own
example is uncommented, then you are truly hopeless. My response was
within the context of *your* example, which you have carefully snipped
out to make it less obvious that you are making blind accusations.
Your real mistake is in attempting to force everybody to behave
according to *your* preferences.
>
> |http://home.netscape.com/eng/mozilla/2.0/handbook/javascript/index.html
>
> [Netscape online documentation]
> = Code Hiding
> =
> = Scripts can be placed inside comment fields to ensure that your
> = JavaScript code is not displayed by old browsers that do not
> = recognize JavaScript.
>
> Translated: "Some older versions of Netscape (that do not recognize
> Javascript) will treat the strings '<!--' and '-->' as markers for
> commented material. To ensure that these versions of Netscape will
> suppress scripts, use these markers around your Javascript."
And still you persist in looking only at Netscape. Are you truly so
completely blind to reality that you believe that *no other browser*
recognizes the comment declaration? Or do you truly believe that *only
early versions of Netscape* will fail to recognize SCRIPT tags?
Either way, you are hopelessly out of touch.
>
> = The entire script is encased by HTML comment tags:
>
> Some idiot in Mountain View needs to RTFM on "HTML comment tags". A
> propensity to garble facts is integral to the Netscape Mystique.
It seems Arjun Ray has no trouble garbling facts to any desired extent,
in his effort to bolster his chosen position. I couldn't care less whether
you call it a comment declaration, a container, or tags. The meaning is
clear to anyone who isn't deliberately closing their mind.
>
> |>| Quite frankly, I'd be interested in seeing some way of validating
> |>| both HTML and Javascript in a single document (as long as the
> |>| solution isn't even worse than the current implementation),
> |>| keeping in mind that Javascript != HTML, but is allowed to send
> |>| HTML and text to the document.
> |>
> |> The issue of validation pertains to the characters in the original
> |> document.
> |>
> |> <SCRIPT><![CDATA[
> |> Everything here is scanned as verbatim data.
> |> No SGML parsing for markup occurs.
> |> ]]>
> |> </SCRIPT>
>
> | Until the parser sees any of:
>
> | if (a > b)
> | for (i=15; i>=0; i--)
> | document.write("<b>");
>
> | in many possible variations, at which point it will assume the end
> | of the CDATA
>
> Which parser? There's an elementary distinction between SGML parsing
> rules and ad hoc heuristics. How an ad hoc parser could make a hash of
> valid documents is neither surprising nor relevant.
Oh, brother! Talk about taking things out of context! You even carefully
rearranged the placement of my comment above, which I had placed after
your "No SGML parsing..." line. You speak of SGML/HTML parsing, I point
out a flaw in your thinking, and you immediately rearrange the
discussion in an effort to turn it around. Good God, man, you're
changing your story faster and more often than Bill Clinton.
>
> You asked about *validation*. Either you're dense, or you're incapable
> of focusing on issue and context. It doesn't matter, because expecting
> you to learn something you don't know is clearly hopeless.
Great! Another example of misplaced context on your part. No, I didn't
"ask" about validation. I merely stated that I'd like to see a way of
validating both the HTML and the Javascript in a single Web page. I also
didn't state this in responding to your "SGML parser" example; it comes
from a prior post. But then you've always been good at mixing up
timelines from various posts and pretending combinations that never
happened.
>
> | [Netscape] have simply found a way to make scripting
> | work in a way that is more reliable than depending on just a single
> | character to specify resumption of HTML interpretation for those
> | browsers that don't support scripting.
>
> Netscape didn't "find" anything, unless it was a way to protect their
> own software. Talk October 1995. The "reliability" shibboleth applied
> only to some Netscape 1.x versions, which had gone from simply broken
> to hideously broken "parsing" of comment declarations.
Still wrong. Netscape is not the only browser to recognize comment
declarations, nor are its early versions the only browsers to be unable
to recognize or implement SCRIPT tags. I covered this particular
thoughtlessness of yours above.
>
> Perhaps the only other thing they found was hordes of idiots to
> swallow their guff and regurgitate it in public.
Who brainwashed you? Never mind, you will believe what you want, without
regard to such minor things as "facts."
>
> | Quite frankly, if this required them to step away from the standard,
> | so be it.
>
> It only required them to *implement* the standard. But such a simple
> concept would be beyond you, of course. (Not to mention, them too.)
I have already proven you wrong in your chosen implementation. However,
the use of a comment declaration *is* entirely within the HTML/SGML
standards, as I have also demonstrated. Thus, it is still true to say
that Netscape found (or invented, or implemented) a method of hiding the
script from script-blind browsers, which nevertheless falls within the
specifications of the existing standards. The only thing that isn't
standard HTML is the script itself, which a script-aware browser can
nevertheless deal with. Your opinion of the method is irrelevant.
>
> | If everybody always wrote new browsers *only* according to the
> | existing standard, there would never be any need for a new standard.
> | And no experimentation, no new knowledge, nothing learned, nothing
> | gained.
>
> Duh. Implement a spec before you criticise it. All you have, Ken, is
> your babbling on and on about things for which you can cite nothing
> better than anecdotal evidence; it becomes necessary to assert your
> personal authoritativeness to make up for a lack of specs, source
> code, or any form of objective verification. You're free to prefer
> such an environment. Just spare us the gush and evasion.
>
> | If that's the world you'd prefer to have, I feel sorry for you, but
> | I don't share that vision, and neither do many others.
>
> -sigh-. You have no idea. RTFM. A penny might drop, somewhere,
> sometime.
Do the world a favor, Arjun: before you reply, indulge in a little
private meditation on the subject of self-honesty. You have presented a
theoretical implementation of SCRIPT the way you think it ought to be, I
have refuted you based on practical, real-world examples from legitimate
scripting code, and you speak here of "anecdotal evidence."
At this point, you need first to be honest with yourself, and then to be
honest in your dealings with others (such as right here). Either do
that, or else go back to whatever it is that you have been smoking, and
leave the harsh, horrible, all too practical world outside alone.
In article <329F09...@www.play-hookey.com>,
Ken Bigelow <kbig...@www.play-hookey.com> wrote:
> Whether you like it or not, and regardless of your opinion as to what is
> "appropriate" content for the comment declaration, it is both legal and
> sufficient in HTML and SGML to surround Javascript program code with
> these comment markers, and thereby to tell an SGML or HTML parser or
> validator to ignore the contents.
It may be legal, but it's certainly not sufficient:
<SCRIPT>
<!-- Hide script
stuff...
document.write('Click here --> for neat effect.');
more stuff...
// -->
</SCRIPT>
Besides, any script containing '--' doesn't even pass validation.
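A minimal sketch in Python (not part of the original post) of why the wrapper above is not sufficient. It assumes the ISO 8879 comment declaration syntax: '<!', then zero or more comments each delimited by a pair of '--' and optionally followed by whitespace, then '>'. The function name is hypothetical, and everything else an SGML parser does is ignored.

```python
WHITESPACE = " \t\r\n"


def comment_declaration_end(text, start=0):
    """Return the index just past the '>' that closes the comment
    declaration beginning at `start`; raise ValueError if the text
    is not a well-formed comment declaration."""
    if not text.startswith("<!", start):
        raise ValueError("no markup declaration open at offset %d" % start)
    pos = start + 2
    if text.startswith(">", pos):
        return pos + 1  # '<!>' is the empty comment declaration
    if not text.startswith("--", pos):
        raise ValueError("expected '--' or '>' at offset %d" % pos)
    while True:
        # A comment runs from one '--' to the very next '--'.
        close = text.find("--", pos + 2)
        if close == -1:
            raise ValueError("unterminated comment")
        pos = close + 2
        # Only whitespace, another comment, or '>' may follow.
        while pos < len(text) and text[pos] in WHITESPACE:
            pos += 1
        if text.startswith(">", pos):
            return pos + 1
        if not text.startswith("--", pos):
            raise ValueError("stray text after comment at offset %d" % pos)


wrapped = ("<!-- Hide script\n"
           "document.write('Click here --> for neat effect.');\n"
           "// -->")
end = comment_declaration_end(wrapped)
print(repr(wrapped[:end]))  # the declaration ends inside the string literal
```

Under these assumptions the declaration closes at the '-->' inside the string literal, so the rest of the script is exposed to the parser. A body such as `for (i=15; i>=0; i--)` raises ValueError instead, because the '--' ends the comment and ')' is not allowed to follow it.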
| On Thu, 28 Nov 1996 22:33:39 GMT, Arjun Ray wrote in
comp.infosystems.www.authoring.html,comp.lang.javascript:
|++ In <3Vfny4uY...@htmlhelp.com>,
|++ gala...@htmlhelp.com (Arnoud "Galactus" Engelfriet) writes:
|++
|++| Why all this messing with comments? Shouldn't just enclosing the
|++| script in <![CDATA[ and ]]> be sufficient? Any 'heuristic'
|++| browser will think that it's a very big unknown tag with lots of
|++| attributes and hide the text.
|++ But you're right. Were it not for the troubles in finessing
|++ Netscape's garbled concepts of document processing, an
|++ implementation of CDATA marked sections would have been enough
|++ (for that matter, even in heuristic browsers!)
| Uhm, no, unless you assume the JavaScript part doesn't contain a >.
Um, no. I said *implementation* of CDATA marked sections, not use of
CDATA marked sections. The heuristic you're talking about, suppressing
material between '<!' and '>', would get any kind of marked section
wrong for the same reason. However, the same heuristic could be
*extended* (as an implementation enhancement) to recognize some part
of the syntax of marked sections. e.g. at a heuristic minimum, on
'<![' to scan forward for ']]>' rather than '>'. Or some variant
thereon. (More than a year ago [1], I suggested that HTML adopt an
application convention that the absence of a status keyword, i.e.
'<![[', default to "IGNORE" rather than "INCLUDE". The idea was to
encourage implementation of some *easy* subset of the syntax. But this
went against ISO 8879, and anyway nothing came of the suggestion.)
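The extended heuristic can be sketched in a few lines of Python. This is an illustration only, not code from any actual browser: the function name and the flag are hypothetical, and the sketch suppresses declarations only (it does nothing about ordinary tags).

```python
def strip_declarations(text, marked_section_aware):
    """Drop everything a declaration-skipping heuristic would suppress.

    With marked_section_aware=False, anything from '<!' to the next '>'
    is suppressed. With marked_section_aware=True, a '<![' opener is
    instead suppressed up to the next ']]>'.
    """
    out = []
    pos = 0
    while pos < len(text):
        if marked_section_aware and text.startswith("<![", pos):
            end = text.find("]]>", pos)
            skip = 3
        elif text.startswith("<!", pos):
            end = text.find(">", pos)
            skip = 1
        else:
            out.append(text[pos])
            pos += 1
            continue
        if end == -1:
            break  # unterminated declaration: suppress the remainder
        pos = end + skip
    return "".join(out)


doc = "A<![CDATA[ if (a > b) x(); ]]>B"
print(strip_declarations(doc, marked_section_aware=False))
print(strip_declarations(doc, marked_section_aware=True))
```

The unextended heuristic stops at the '>' in `a > b` and lets the rest of the script through as text; the extended one suppresses the whole marked section.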
Galactus' point, however, was directed at the validation issue, where
indeed it suffices for the purpose of SGML parsing to place the script
contents within '<![CDATA[' and ']]>', assuming SCRIPT has a (#PCDATA)
content model. The "messing with comments" had to do with Netscape (as
far as observed behavior apparently indicates) not mishandling a
*valid* document.
All of this, however, is independent of the unadorned "declaration
heuristic" in other implementations. Specifically, that a stupid
browser breaks on CDATA marked sections with '>' is irrelevant to the
issue of document validation. Stupid browsers break on any number of
valid constructs. IMHO, it's better to fix problems than kludge
workarounds, much less compound the problem with even more aberrant
behavior [2].
[1]http://www.acl.lanl.gov/HTML_WG/html-wg-95q3.messages/1234.html
(BTW, the quoted material in this message refers to Dan Connolly's work
on what is now the Sgml-lex analyser. AFAICT, his exclusion of marked
sections from the lexical analyser was mainly due to the decision to
exclude support for internal declaration subsets, without which marked
sections lose a lot of their power. Without that power, it was best
for the purposes of that report to restrict the analyser to as small a
subset of SGML syntax as would cover the scopes of existing heuristic
parsers. Nevertheless, as you yourself have shown ("Parsing Isn't
Rocket Science"), marked section syntax isn't inherently difficult.)
[2]http://www.acl.lanl.gov/HTML_WG/html-wg-95q4.messages/0307.html
should dispel any doubts about my opinions on stupid or inadequate
heuristics.
:ar
First things first.
To date, Ken, you haven't managed a single correct statement on SGML.
It doesn't matter how categorically you say such things; the fact
remains that your pronouncements on SGML are bogus and worthless.
Either learn the subject matter, or stay out of discussions where even
the elementary concepts are beyond your wit.
This would be enough, had you not resorted to lies to cover arguing
out of ignorance. The "References:" header in your post cites the
following, in order:
1. <5711jq$t...@ns2.southeast.net> This was adjung's originating post
for the thread.
2. <BTKly4uY...@htmlhelp.com> Galactus' response, where he
offered an answer to adjung's question,
= <SCRIPT>
= <!-- // Hide script
= script goes here
= // -->
= </SCRIPT>
and mentioned a caveat. The caveat is the theme of this thread:
= but this causes some problems if you want to validate your
= documents.
The context, Ken, is validation: the form(s) that *valid* documents
have to take for SGML parsers to pass them. Validation is a matter of
conformance to SGML rules.
3. <575qp6$m...@client3.news.psi.net> My followup, where I elaborated
on the problems for validation.
= The apparent decision to go with CDATA declared content for SCRIPT
= hasn't helped any. The "fix" is worse than the original problem.
= [discussion of (#PCDATA) content model vs CDATA declared content]
= Meanwhile, there is some chance that the CDATA decision may be
= rescinded, because the (#PCDATA) content model offers the much safer
= option of a CDATA marked section.
The context, Ken, is validation. I offered an example based on what a
(#PCDATA) content model would require:
= Will Netscape treat this the same way as the example above:
=
= <SCRIPT><![CDATA[>
= <!-- // Hide script
= script goes here
= // --><!]]>
= </SCRIPT>
The context, Ken, is validation: first construct a document that will
validate, only then consider what Netscape might do with it.
4. <329712...@www.play-hookey.com> Your entry into the discussion
= Unless Javascripting is disabled, Netscape will interpret
= *everything* between the <script> and </script> tags *as
= Javascript.* That <![CDATA[> string you list will produce a
= Javascript error message if it cannot be interpreted as legitimate
= Javascript.
In short, this version of a *valid* document would cause a problem in
Netscape. There's no spec for this; the best available evidence is
observed behavior (e.g. Netscape reporting a Javascript error), which
is fine on an issue of fact regarding *this* form of a valid document.
The context, Ken, is validation.
You went on to say:
= Quite frankly, I'd be interested in seeing some way of validating
= both HTML and Javascript in a single document (as long as the
= solution isn't even worse than the current implementation), keeping
= in mind that Javascript != HTML, but is allowed to send HTML and
= text to the document.
The context, Ken, is validation: what form(s) can a HTML document with
embedded Javascript take such that a SGML parser will validate it.
5. <57bfqm$g...@client3.news.psi.net> has my response to this paragraph
= The issue of validation pertains to the characters in the original
= document.
= [example illustrating requirements of SGML parsing as verbatim data]
= And that's *exactly* what CDATA content accomplishes.
=
= <SCRIPT><![CDATA[
= Everything here is scanned as verbatim data.
= No SGML parsing for markup occurs.
= ]]>
= </SCRIPT>
The context, Ken, is validation: what a SGML parser will do with the
contents of a document, in particular, a CDATA marked section.
6. <329B55...@www.play-hookey.com> had your *irrelevant* objection
based on a failure to grasp the meaning of "No SGML parsing of markup
occurs":
= Until the parser sees any of:
=
= if (a > b)
= for (i=15; i>=0; i--)
= document.write("<b>");
=
= in many possible variations, at which point it will assume the end
= of the CDATA and begin interpreting again, and displaying a
= hodgepodge of stuff never intended for display.
You described the behavior of a common kind of heuristic parser, that
everybody and even my grandmother knows about. This had nothing to do
with a SGML parser's treatment of CDATA, which was what I was talking
about.
The context, Ken, is validation. Are you confused about how a SGML
parser identifies the end of CDATA?
7. <57ipog$s...@client3.news.psi.net> I pointed out the irrelevance:
= Which parser? There's an elementary distinction between SGML parsing
= rules and ad hoc heuristics. How an ad hoc parser could make a hash
= of valid documents is neither surprising nor relevant.
The context, Ken, is validation. Seven posts deep into the thread, I
emphasized this, because you had missed the point:
= You asked about *validation*. Either you're dense, or you're
= incapable of focusing on issue and context. It doesn't matter,
= because expecting you to learn something you don't know is clearly
= hopeless.
The context, Ken, is validation. Yet, you produced this:
In <329F18...@www.play-hookey.com>, Ken Bigelow
<kbig...@www.play-hookey.com> writes:
| Great! Another example of misplaced context on your part.
The context, Ken, is validation.
| No, I didn't "ask" about validation. I merely stated that I'd like
| to see a way of validating both the HTML and the Javascript in a
| single Web page.
If validation weren't the context, I wouldn't have bothered to explain
CDATA marked sections for SGML parsers.
| I also didn't state this in responding to your "SGML parser"
| example; it comes from a prior post.
Study the quoting. The quoting depths of #4, #5, and #6 are indicated
correctly.
| But then you've always been good at mixing up timelines from various
| posts and pretending combinations that never happened.
Why don't you study the timelines indicated by the headers on your own
posts? Does Dejanews give you nightmares?
Quite simply, Ken, you're a liar. A pathetic, foolish, ignorant liar.
| Do the world a favor, Arjun: before you reply, indulge in a little
| private meditation on the subject of self-honesty.
Tell me about it in e-mail please. You have no compunctions parading
your ignorance and inability to focus. At least spare the world your
lectures.
| You have presented a theoretical implementation of SCRIPT the way
| you think it ought to be, I have refuted you based on practical,
| real-world examples from legitimate scripting code, and you speak
| here of "anecdotal evidence."
Duh. Here I am talking about validation, and you're babbling about ad
hoc parsers. If you want to talk about ad hoc parsers, please say so
and please indicate explicitly your desire to change the subject.
| At this point, you need first to be honest with yourself, and then
| to be honest in your dealings with others (such as right here).
Now why should I take any advice from a proven liar?
:ar
| Arjun Ray wrote:
|
|> <SCRIPT>
|> <!-- --> <![CDATA[ > <!--
|> script goes here
|> // --> <! ]]>
|> </SCRIPT>
|>
|> The relevant considerations appear to be
Allow me to draw your attention to the word "relevant". The issue is a
*valid* document that *also* works with all relevant versions of
Netscape (1.[12] and later.)
|> 1. The body of the script requires "comment tags" for the benefit
|> of older Netscape versions.
| Wrong. Netscape *suggests* the use of "comment tags" for the purpose
| of preventing *all* script-blind browsers from trying to handle the
| script as if it were HTML.
What Netscape suggests, right or wrong, is irrelevant. The suggestion
is wrong in its putative scope, but that too is neither here nor
there. The relevant consideration is the behavior of older Netscape
versions in the 1.x series. Remove that consideration and the
workaround simplifies to
<SCRIPT>
<!-- --> <![CDATA[
script goes here
// ]]>
</SCRIPT>
Moreover, your emphasis on "*all*" is inconsistent with the facts.
Acquire the Mosaic 2.4 source code from NCSA and study the HTMLparse.c
file in the "HTML widget", libhtmlw. Acquire the WWW Library source
code from W3C and study the SGML.c file. Acquire the Lynx source code
and study the *two* modes (one substantially correct, the other
deliberately broken for "compatibility") in its SGML.c file. You have
no clue about the variety of ad hoc treatments of declaration syntax
among browsers.
Personally, I prefer the second version above; the first version was a
practical concession to Netscape-oriented sensibilities. Sorry to
offend.
|> 2. With (or even without) such a wrapper, the entirety needs to be
|> CDATA for an SGML parser. Hence a CDATA marked section.
| Also wrong.
What makes you think you have any idea of what I'm talking about? Can
you even define CDATA?
| From: http://www.w3.org/pub/WWW/MarkUp/SGML/sgml-lex/sgml-lex
| I find:
[quoted material from the report]
Next time, read Dan Connolly's report *carefully*. Study the source.
| Both SGML and its subset, HTML,
You have just demonstrated your ignorance conclusively. Is ANSI C a
subset of BNF? RTFM. There are pointers to SGML primers in Robin
Cover's page of SGML resources:
<URL:http://www.sil.org/sgml/>
[paraphrase of comment declaration syntax where better versions are
available from authoritative sources]
| Whether you like it or not, and regardless of your opinion as to
| what is "appropriate" content for the comment declaration, it is
| both legal and sufficient in HTML and SGML to surround Javascript
| program code with these comment markers, and thereby to tell an SGML
| or HTML parser or validator to ignore the contents.
Comment markers? Do you know about ISO 8879, Clause 10.3? Try
productions [91] and [92] from
<URL:ftp://ftp.ifi.uio.no/pub/SGML/productions>
|> 3. '<![CDATA[' and ']]>' need to be "balanced" with offsetting '>'
|> and '<!' respectively for the benefit of any heuristic browser.
|>
|> 4. '<![CDATA[' may trigger a Javascript parsing error: hence
|> protect it with a leading comment declaration on the same line.
| Both of these points are irrelevant to the use of Javascript. You do
| not need any kind of CDATA marker as such; a comment declaration
| covers the entire need without adding complexity.
No it doesn't. If it did, the newer DTDs (Wilbur and Cougar) wouldn't
be moving towards CDATA declared content for SCRIPT. Do you have any
idea *why*? (Hint: so that comment declarations *aren't*.)
| You are ignoring the KISS principle in your effort to have things
| the way you want them instead of the way they are.
Um no. I'm trying to find a way to compensate for gratuitous moronic
divergences ("Keep It Stupid, Simple"?) from a standard syntax. It
could be that the idiocies have put paid to validation as a practical
matter (i.e. it's "best" not to give a damn), but the possibility of a
solution -- no matter the convolutions -- at least offers people who
might care a *choice* in the matter. Your sympathy is not necessary.
|> 5. This leading comment declaration needs to be syntactically
|> correct for SCRIPT with (#PCDATA) content model.
| Javascript doesn't give a hoot about the presence or absence of the
| comment declaration;
Um, Ken, wasn't it *you* who said that something like this would
trigger a Javascript error?
<SCRIPT>
<![CDATA[
script goes here
// ]]>
</SCRIPT>
A syntactically correct comment declaration in front is a workaround.
|> 6. '</SCRIPT>' needs to be on a separate line in relation to
|> Javascript tokenization of line-oriented comments.
| Not quite. The </script> tag must not be commented out, if it is to
| be recognized by the Javascript interpreter as being executable (and
| therefore as the end of the script). As long as it is not placed in
| a comment line or field, it can go anywhere and still be recognized
| in its proper context.
Sounds like you're talking about multi-line comments (/* ... */.) That
case gets covered as long as such "enclosed" '</script>'s are within the
scope of the CDATA marked section (because tags aren't recognized.) If
necessary, multiple CDATA marked sections can be used.
|> Any other gotchas?
| Any more efforts at making a simple concept more complicated?
Any more clueless argumentativeness?
:ar
| By your efforts above, you have conceded the need to hide the script
| from browsers that cannot interpret it correctly, simply to avoid
| hashing the displayed page.
Actually no. I have *accommodated* the Netscape-inspired desire to use
a comment declaration. It appears to be of some benefit for older
Netscape versions, so no real harm is done. By all means take it out.
I'm all for it.
| It has nothing to do with my opinion of Netscape (or yours, for that
| matter). Rather, it has to do with allowing the page to degrade
| gracefully (something you, Abigail, and others have screamed in
| favor of in ciwah) on browsers that can't handle the script.
Me? Produce a quote, and it had better be in context. Either you're a
liar, or you have a serious reading and comprehension problem. I've
made my opinions on the subject quite clear, in fact. Here's a
reminder
<URL:http://www.acl.lanl.gov/HTML_WG/html-wg-95q4.messages/0307.html>
|> The solicitous protection and sagely considered recommendation is
|> for old *Netscape* browsers. My, my. We must all have been born
|> yesterday.
| Still wrong. *All* browsers written to HTML 2.0 or earlier will have
| no notion of how to handle SCRIPT tags, and will ignore them, but
| not the contents. This includes Netscape, early MSIE, older Lynx
| browsers, and a whole flock of lesser-known flavors. You may have a
| personal distaste for Netscape, but there is no call for you to
| blame them for all the ills of the Web world.
In this case, I'm not blaming them for anything. The wording in their
documentation is calculated to convey a "justification" that simply
isn't true -- *except* for some of their older versions. It's called
being disingenuous.
| As I demonstrated in my previous post, Javascript (or Jscript or
| VBscript) may quite legitimately include a ">" character almost
| anywhere inside the script. This will, in accordance with RFC-1866,
| terminate the assumed CDATA for non-scripting browsers.
Says who? Produce the relevant section of RFC 1866 that talks about
the treatment of CDATA and in particular mandates that '>' terminates
CDATA.
| Therefore, the use of the basic CDATA declaration is *not* adequate
| to prevent non-scripting browsers from displaying a mish-mosh
| created by trying to interpret the rest of the Javascript as if it
| were intended as HTML.
Is this the conclusion you're trying to establish that you have to
invent non-existent premises?
The fact of the matter is that ad hoc parsers are ad hoc parsers. No
one in his right mind would really *expect* ad hoc parsers to get
anything right except by accident. But all of this is supremely
irrelevant to the structure of *valid* documents. If you're arguing
that ad hoc behavior makes validation not worthwhile, you could be
right, but that's a separate issue. If you're arguing that ad hoc
behavior *refutes* validation, you're simply wrong. They have nothing
to do with each other.
| Only a full comment declaration can be sure of accomplishing that
| goal.
Something tells me you still haven't grasped either the syntax of
comment declarations or their treatment in SGML parsing and
validation. We've had this discussion before, and you still haven't
learned anything. You're going by what you've observed of Netscape's
behavior, and extrapolating that to what you think you need the specs
such as RFC 1866 to say such that your theories would be true.
RTFM.
| Face it, Arjun: Javascript is an *extension* to HTML 2.0, not part
| of the standard itself.
Actually, Javascript isn't part of HTML at all. All that matters for
any relevant version of HTML is how the contents of SCRIPT are encoded
in a valid document. Invalid documents are simply not amenable to
rational discussion.
| In the meantime, *you* may worship at the pedestal of RFC-1866; *I*
| don't.
Hello, pay attention for a change, please. The day a spec appears that
categorically states that HTML is *not* a SGML application, you may
rest assured I will refrain from mentioning SGML.
| Standards like this are designed to serve both HTML authors and
| browser programmers. The moment you start insisting that we serve
| the standard instead, you have lost touch with reality.
There's a reality out there, Ken, that you're not even aware of. The
key concept is interoperability. Ask yourself sometime *why* the HTML
Working Group insisted on SGML, even when HTML practice appears to be
steadily diverging. What were they trying to safeguard? Follow some of
the pointers on Robin Cover's page, and discover for yourself why an
investment in SGML is the best bet for the long haul.
Dismiss that if you will, but only *after* you've looked into it.
Educate yourself, first.
:ar
[among other nonsense]
> Me? Produce a quote, and it had better be in context. Either you're a
> liar, or you have a serious reading and comprehension problem.
[more garbage snipped]
Let's see, now. Some time back I quoted a paragraph directly from one of
your posts, and you insisted I had paraphrased it.
More recently, you took two paragraphs that I had written in a single
post, stuck something of your own in between them, and then treated my
second paragraph as if it were a response to your insertion. Talk about
setting up straw men!
Just a few days ago, you took some lines from my post, rearranged them,
and then posted the result as if it was a direct quote. This isn't
merely disingenuous, it's a deliberate falsification. It matters not how
many or how few lines you switched around; your own behavior proves you
to be a liar of the first order, by deliberate intent. And here you
have the sheer balls to demand that I keep *my* quotes in context!
I'll admit I've made my share of mistakes; everybody does. But one
thing's for sure: I have never gone in and deliberately modified someone
else's statement (even yours) and then pretended it was a direct quote.
If I make a mistake, at least it's an honest one.
You, however, in your unbelievable arrogance, are trying to pretend that
you're somehow perfect. Apparently you can't even admit the truth to
yourself.
One of your most recent posts claims that I haven't yet been right about
SGML -- even though I quoted part of the W3C documentation on SGML in a
post to you. It would seem that you're so eager to claim that you're
right that you can't even recognize a quote when I tell you where it's
from. Your hypocrisy knows no bounds, and the truth isn't in you.
One thing you have convinced me of: it's not worth my time or effort to
continue this argument for any reason. I do not debate with the wind.
| Let's see, now. Some time back I quoted a paragraph directly from
| one of your posts, and you insisted I had paraphrased it.
Try Dejanews. Start with the Query Filter at
<URL:http://www.dejanews.com/forms/dnsetfilter.html>
Newsgroups : comp.infosystems.www.authoring.html
Freeform date(s): 1996/11/*
Author(s) : Ken Arjun
Subject(s) : possible & force & ignore
Among the 33 hits, you'll find (cross-checking against the References
headers) that the sequence of articles in question is
1 <55un7s$d...@client3.news.psi.net> on 11/08
2 <32859A...@www.play-Hookey.com> on 11/10
3 <565pv1$5...@client2.news.psi.net> on 11/10
4 <328F2F...@www.play-Hookey.com> on 11/17
5 <56rce8$9...@client2.news.psi.net> on 11/19
In #1, I had written (on Opera's apparent error recovery heuristics
and suppression of SCRIPT contents as per Wilbur)
= So, in this case at least, the EMBED gets ignored too, even though
= this construction is strictly illegal: SCRIPT contents are (#PCDATA)
= which does not allow other elements.
In #2, after a comprehensive quote of my post, at some point you wrote
= You are quite correct that the contents of any script are supposed
= to be treated as #PCDATA, which means that all such contents should
= be regarded as being other than HTML, so that any inadvertent HTML
= tags should nevertheless be ignored.
This "explanation" of what #PCDATA "means" -- as a paraphrase of my
statement "which does not allow other elements" -- was utterly bogus.
In #3, quoting this specific passage, I wrote
= If I said that, I was definitely *not* correct. #PCDATA is a keyword
= meaning "parsed character data". It describes a parsing context
= where all forms of markup (declarations, instructions, references,
= etc.) are *recognized*, but elements are not *allowed*. There are
= other parsing contexts (RCDATA and CDATA) with more stringent
= restrictions, where what looks like markup nevertheless is *not*
= recognized to be such but is parsed as verbatim text.
=
= #PCDATA content *is* HTML (mainly, text and entity references only),
= and inadvertent tags are *errors*. Here is what I wrote:
= [ the excerpt from #1 above ]
In #4, you responded with an evasion
= It still looks to me, from your own quote above, that you are saying
= that SCRIPT contents are #PCDATA. I see no other way to read your
= statement, which does indeed match my earlier quote of your prior
= post.
But the issue was not whether SCRIPT content was #PCDATA. It was the
*meaning* of #PCDATA. In #5, I wrote
= There's no match and it's not your quote: it's your paraphrase:
= [ the excerpt from #2 above ]
= And your paraphrase in its plain meaning is *wrong*. (Whether that
= was my fault or yours doesn't matter any more.)
#PCDATA does *not* mean "all such contents should be regarded as being
other than HTML, so that any inadvertent HTML tags should nevertheless
be ignored". That's *your* paraphrase of "which does not allow other
elements", and it is hopelessly, cluelessly wrong.
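The distinction can be illustrated with a small Python sketch (not from the original thread). It is a deliberate simplification: the function name and the tag pattern are hypothetical, only the exact opener '<![CDATA[' is handled, and comment declarations and entity references are ignored. It reports the tags that would be recognized as markup in #PCDATA content, which is where they count as errors, while treating CDATA marked sections as verbatim text.

```python
import re

# Rough stand-in for a start-tag or end-tag; not the full SGML rule.
TAG = re.compile(r"</?[A-Za-z][^>]*>")


def tags_recognized(content):
    """List the tags recognized as markup in #PCDATA content.

    Text inside '<![CDATA[' ... ']]>' is skipped, because markup is
    not recognized there.
    """
    found = []
    pos = 0
    while pos < len(content):
        if content.startswith("<![CDATA[", pos):
            end = content.find("]]>", pos)
            if end == -1:
                break  # unterminated marked section: stop scanning
            pos = end + 3
            continue
        match = TAG.match(content, pos)
        if match:
            found.append(match.group())
            pos = match.end()
        else:
            pos += 1
    return found


print(tags_recognized("document.write('<b>');"))
print(tags_recognized("<![CDATA[ document.write('<b>'); ]]>"))
```

In the first call the '<b>' is recognized (and, with a (#PCDATA) content model, would be reported as an error rather than ignored); in the second nothing is recognized, because the marked section is verbatim data.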
To recap:
| Let's see, now. Some time back I quoted a paragraph directly from
| one of your posts, and you insisted I had paraphrased it.
That's right, you did, in an effort to suggest that I was "correct" in
implying an absurdity. Read the posts again, especially your own. I've
even given you the means to locate them at Dejanews.
| More recently, you took two paragraphs that I had written in a
| single post, stuck something of your own in between them, and then
| treated my second paragraph as if it were a response to your
| insertion. Talk about setting up straw men!
I suspect this silly accusation is one I've already dealt with, in
<57tgm8$7...@client3.news.psi.net> on 12/02. Do your own research this
time, please. And think about quoting *levels* for context.
| Just a few days ago, you took some lines from my post, rearranged
| them, and then posted the result as if it was a direct quote.
Excuse me. I edited lines that were your (voluminous) quotes of *my*
material. I took out stuff (on an example) that *I* wrote, and brought
back two final lines, that you had pointlessly left quoted elsewhere,
in order to retain just enough comprehensible context of what *I* had
written.
| This isn't merely disingenuous, it's a deliberate falsification.
The substance of your response (or rather, objection, to CDATA marked
sections) was unaffected: its point on parsing was quite clear, even
though cluelessly irrelevant. This too I've dealt with elsewhere.
| It matters not how many or how few lines you switched around; your
| own behavior proves you to be a liar of the first order, by
| deliberate intent. And here you have the sheer balls to demand that
| I keep *my* quotes in context!
That's right. You're habitually long on claims and short on evidence.
| One of your most recent posts claims that I haven't yet been right
| about SGML -- even though I quoted part of the W3C documentation on
| SGML in a post to you.
Dan Connolly's Sgml-lex? <57tlda$9...@client3.news.psi.net> has my
response to yet another clueless objection to CDATA marked sections.
The irrelevance of the three paragraphs you quoted can be gauged from
the fact that Sgml-lex in its current incarnation doesn't handle CDATA
*at all*. In fact, had you bothered to read all of the *same* section
from which you quoted, you would have found this:
= The following are valid SGML constructs that are prohibited by this
= report:
= [...]
= <![ CDATA [ lskdjf lskdjf lksjdf ]]>
Your normal style is categorical claims instead of references. Here,
you went out of your way to produce a quote from an *explicitly*
irrelevant source. IOW, you got it wrong, yet again.
| It would seem that you're so eager to claim that you're right that
| you can't even recognize a quote when I tell you where it's from.
| Your hypocrisy knows no bounds, and the truth isn't in you.
You seem to have the peculiar notion that authoritativeness (or even
correctness) on Usenet can or should pertain to persons. Clue: stick
to the references. Objectively verifiable facts. RTFM.
:ar
It is of course true that anything like this can be broken if you try.
That is especially true when we are unable to adjust the rules by which
the older browsers play the game. However, it is a lot easier to find a
way to *avoid* breaking the full comment notation, than it is to avoid
the ">" character everywhere in a script, to avoid breaking the simpler
'<! ... >' notation.
Oh, yes! I'm sure you know this one, Arnoud, but for those thinking that
the notation '<![CDATA[ ... ]]>' would be harder to break because it
involves more characters: I'm sorry, but it doesn't work that way. The
'[...]' notation used here is widely used in many programming languages,
operating systems, etc., and means nothing more than 'May optionally
contain...' . The brackets are *not* part of the actual container
notation.
>
> Besides, any script containing '--' doesn't even pass validation.
Hmmm. I understood that multiple comments were legal? I can understand
that a script containing, say, a 'count--' post-decrement notation might
be assumed to have ended the first comment at that point. But could we
not add an extra '--' if needed, so that the end-of-comment marker might
be:
// -- -->
just to keep an even number of '--' strings in the overall sequence? Or
will the validator still barf at that?
But the string ']]>' *is* the actual terminator. Not the ]] followed by a >.
$ cat > test.html
<!doctype HTML system "http://www.ny.fnx.com/abigail/abigail.dtd">
<html>
<head>
<title>Test</title>
</head>
<body>
<p>
<![ INCLUDE CDATA [
Barf ]] <foo> <bla> ]]>
<p>Foo
</body>
</html>
^D
$ sgmls -s -m catalog html.decl test.html && echo "Valid"
Valid
$ cat > test.html
<!doctype HTML system "http://www.ny.fnx.com/abigail/abigail.dtd">
<html>
<head>
<title>Test</title>
</head>
<body>
<p>
<![ INCLUDE CDATA [
Barf ]]> <foo> <bla> ]]>
<p>Foo
</body>
</html>
^D
$ sgmls -s -m catalog html.decl test.html && echo "Valid"
sgmls: SGML error at /tmp/test.html, line 11 at ">":
Undefined FOO start-tag GI ignored; not used in DTD
sgmls: SGML error at /tmp/test.html, line 11 at ">":
Undefined BLA start-tag GI ignored; not used in DTD
sgmls: SGML error at /tmp/test.html, line 11 at ">":
Marked section end ignored; not in a marked section
$
I guess you, once again, made a wrong assumption.
But you are right in one respect: you can "break" the marked
section as well, by putting a document.write ("]]>") in it.
++ > Besides, any script containing '--' doesn't even pass validation.
++
++
++ Hmmm. I understood that multiple comments were legal? I can understand
++ that a script containing, say, a 'count--' post-decrement notation might
++ be assumed to have ended the first comment at that point. But could we
++ not add an extra '--' if needed, so that the end-of-comment marker might
++ be:
++
++ // -- -->
++
++ just to keep an even number of '--' strings in the overall sequence? Or
++ will the validator still barf at that?
If you make your code: count-- --
it will validate.
And surely, JavaScript could use that as the start of a new comment?
|> Besides, any script containing '--' doesn't even pass validation.
| Hmmm. I understood that multiple comments were legal? I can
| understand that a script containing, say, a 'count--' post-decrement
| notation might be assumed to have ended the first comment at that
| point. But could we not add an extra '--' if needed, so that the
| end-of-comment marker might be:
| // -- -->
| just to keep an even number of '--' strings in the overall sequence?
| Or will the validator still barf at that?
Yes, it will. From ISO 8879 Clause 10.3 "Comment Declaration":
= [91] comment declaration =
= MDO
= ( comment
= ( s |
= comment )* )?,
= MDC
=
= [92] comment =
= COM,
= SGML character*,
= COM
=
= No markup is recognized in a comment, other than the COM delimiter
= that terminates it.
(Note: in the Reference Concrete Syntax, the abstract delimiters MDO,
MDC and COM are bound to the strings '<!', '>' and '--' respectively.)
The best plain-English version of this that I know of is due to James
Clark (author of sgmls and SP), and was incorporated into RFC 1866.
From Section 3.2.5 "Comments":
= A comment declaration consists of `<!' followed by zero or more
= comments followed by `>'. Each comment starts with `--' and includes
= all text up to and including the next occurrence of `--'. In a
= comment declaration, white space is allowed after each comment, but
= not before the first comment.
The basic point is that between the '--' ending one comment and the
'--' starting the next comment, there can only be whitespace.
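
The grammar quoted above can be turned into a small checker. What
follows is only a minimal sketch in Python, not what sgmls or SP
actually do: the name is_valid_comment_declaration is made up for
illustration, the delimiters are assumed to be those of the Reference
Concrete Syntax ('<!', '--', '>'), and only space, tab, CR and LF are
treated as whitespace.

```python
WHITESPACE = " \t\r\n"  # assumption: simplified stand-in for SGML "s"


def is_valid_comment_declaration(text):
    """Return True if text is exactly one well-formed comment declaration."""
    if not text.startswith("<!"):
        return False
    pos = 2
    seen_comment = False
    while True:
        if text.startswith("--", pos):
            # A comment runs up to and including the next '--'.
            close = text.find("--", pos + 2)
            if close == -1:
                return False  # the comment is never closed
            pos = close + 2
            seen_comment = True
        elif text.startswith(">", pos):
            # MDC ends the declaration; nothing may follow it here.
            return pos + 1 == len(text)
        elif seen_comment and pos < len(text) and text[pos] in WHITESPACE:
            pos += 1  # whitespace is allowed only after a comment
        else:
            return False  # stray data between comments, or early end
```

Under these assumptions '<!-- count-- -- -->' is accepted, while
'<!-- count-- // -- -->' is rejected because of the '//' sitting
between one comment and the next.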
I'd be the first to agree that this syntax is kinda screwy. But it's
important to note that there is a rationale (briefly, that _comments_
occur only in _declarations_, of which there are many kinds), and the
nature of the rationale strongly suggests that comment declarations,
while allowed, aren't really supposed to be used in document text, as
it's so easy to get them wrong inadvertently. The better syntax in
SGML usage is the IGNORE marked section '<![IGNORE[ ... ]]>', where
all the tricky business of toggling between odd and even occurrences
of '--' and dealing with intervening whitespace etc. is replaced by a
straight scan for the specific sequence ']]>'. Moreover, the fragility
of this syntax underscores why it's so wrong to mislead people with
talk of "comment tags" or "comment markers": the truth comes as a
nasty surprise to those who *do* act in good faith to be compliant.
From a validation perspective, the problem is intractable: not only is
the comment declaration syntax likely to be violated in practice, but
also syntactically *correct* instances won't preserve their semantics
in SGML terms when comment declarations are used to "encapsulate" data
for the application: the data get suppressed rather than delivered
intact. This is the fundamental reason for the move in Cougar to have
CDATA declared content for the SCRIPT and STYLE elements, insofar as
the popular browser vendors are still pushing a bogus/broken syntax:
in CDATA there's no such thing as a broken comment declaration,
because in parsing CDATA there's no such *thing* as a comment
declaration! See the thread "STYLE and comments in HTML" on ciwah back
in July, where Toby Speight reported some tests with a SGML parser to
prove this to the skeptical.
:ar
|[...] for those thinking that the notation '<![CDATA[ ... ]]>' would
| be harder to break because it involves more characters: I'm sorry,
| but it doesn't work that way. The '[...]' notation used here is
| widely used in many programming languages, operating systems, etc.,
| and means nothing more than 'May optionally contain...' . The
| brackets are *not* part of the actual container notation.
Actually, they are, and appearances notwithstanding, the ']]' towards
the end is a single token. From ISO 8879 Clause 10.4 "Marked Section
Declaration":
= [93] marked section declaration =
= marked section start,
= status keyword specification,
= DSO,
= marked section,
= marked section end
=
= [94] marked section start =
= MDO,
= DSO
=
= [95] marked section end =
= MSC,
= MDC
=
= [96] marked section =
= SGML character*
=
= The marked section must comply with the syntactic and semantic
= requirements that govern the context in which the marked section
= declaration occurs
=
= A marked section end that occurs outside of a marked section
= declaration is an error.
(Note: the Reference Concrete Syntax binds the abstract delimiters
MDO, DSO, MSC and MDC to '<!', '[', ']]' and '>' respectively.)
The status keyword specification when present can be any combination
of the words (in order of priority) "IGNORE", "CDATA", "RCDATA", and
"INCLUDE". Absence of the s.k.s means "INCLUDE".
Marked sections come in basically two flavors, the INCLUDE/IGNORE
variety (where they function very much like the '#if ... #endif' of C)
and the CDATA/RCDATA variety. The first kind are used mainly in DTDs
or other non-text portions of a SGML "document entity" (typically to
control the effective set of declarations), and the second kind are
used only in document text specifically to modify scanning behavior in
the SGML parser. In a CDATA marked section, no markup is recognized
except the terminating MSC-MDC sequence (']]>'): everything in between
is scanned as verbatim data. RCDATA is the same, except that entity
references are recognized and resolved. The basic purpose of CDATA or
RCDATA marked sections -- i.e. the *designed* function -- is to allow
the inclusion of text without wholesale "escaping" of stuff that looks
like markup.
<![ CDATA [
<foo> is just <foo>, not a tag.
&bar; is just &bar;, not an entity reference.
Just what you need, e.g., to embed examples of markup!
]]>
As Abigail has shown recently, implementing marked section syntax
isn't inherently difficult. A subset would be a trivial extension even
in heuristic parsers, e.g. scan for ']]>' on encountering '<![' (bonus
points for handling the '%keyword [' part, but that can wait.) The
only real gotcha is that IGNORE/INCLUDE marked sections are nestable,
but we're not likely to see examples of that for quite a while yet.
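
To make the "trivial extension" concrete, here is a rough sketch in
Python of such a scan: on seeing '<![', read the optional keywords up
to the second '[', then look for the first ']]>'. It is an
illustration under assumptions, not any browser's or parser's real
code: the names MS_START and split_marked_sections are invented,
'%keyword;' parameter entity references are not handled, and nesting
of IGNORE/INCLUDE sections is ignored entirely.

```python
import re

# '<![', optional status keywords, then the second '['.
# Assumption: keywords are plain words; '%entity;' references are skipped.
MS_START = re.compile(r"<!\[\s*([A-Za-z]+(?:\s+[A-Za-z]+)*)?\s*\[")


def split_marked_sections(text):
    """Split text into (keywords, body) pairs; keywords is () for plain text."""
    parts = []
    pos = 0
    while True:
        match = MS_START.search(text, pos)
        if match is None:
            if text[pos:]:
                parts.append(((), text[pos:]))
            return parts
        # The section ends at the first ']]>' and nowhere else.
        end = text.find("]]>", match.end())
        if end == -1:
            raise ValueError("marked section is never closed by ']]>'")
        if text[pos:match.start()]:
            parts.append(((), text[pos:match.start()]))
        # No status keyword at all means INCLUDE.
        keywords = tuple((match.group(1) or "INCLUDE").upper().split())
        parts.append((keywords, text[match.end():end]))
        pos = end + 3
```

Fed the body of Abigail's first test, the section body keeps the
'<foo>' and the lone ']]' as data; fed the second, the section stops
at the first ']]>' and the rest is ordinary text again.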
:ar
In article <32A866...@www.play-hookey.com>,
Ken Bigelow <kbig...@www.play-hookey.com> wrote:
> the notation '<![CDATA[ ... ]]>' would be harder to break because it
> involves more characters: I'm sorry, but it doesn't work that way. The
The only way to prematurely terminate the CDATA section is to use
']]>' inside the script, which is a lot less likely than using '--'
or '>' there.
> '[...]' notation used here is widely used in many programming languages,
> operating systems, etc., and means nothing more than 'May optionally
> contain...' . The brackets are *not* part of the actual container
> notation.
The only way I can think of to get ']]>' in a script would be
a comparison with a nested array or something.
if (mytext[mylist[15]]>15) then go_wild();
Since the '>' also breaks the old browsers, you should rewrite this
anyway, so the problem should not exist.
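
As a rough aid, the three troublesome strings mentioned in this thread
can be checked for mechanically before a script is embedded. This is
only a sketch in Python with invented names (HAZARDS,
embedding_hazards); it does a plain substring search and knows nothing
about Javascript syntax, so it will also flag harmless occurrences.

```python
# (token, why it is a problem), in the order discussed in this thread.
HAZARDS = [
    ("]]>", "ends a CDATA marked section early"),
    ("--", "toggles comment state inside '<!-- ... -->'"),
    (">", "ends the comment in browsers that stop at the first '>'"),
]


def embedding_hazards(script):
    """Return the hazardous tokens that occur in script, in HAZARDS order."""
    return [token for token, _why in HAZARDS if token in script]
```

For the nested-array comparison above it reports both ']]>' and '>';
a rewrite along the lines of 'var v = mytext[mylist[15]]; if (15 < v)
go_wild();' reports nothing.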
> > Besides, any script containing '--' doesn't even pass validation.
>
> Hmmm. I understood that multiple comments were legal? I can understand
Yes, but
<!-- Comment 1 -- fuuzle -- Comment 2 -->
doesn't pass validation, as it contains data 'fuuzle' between the
multiple comments. Think of '--' as a toggle. First one turns 'comment'
on, second turns it off, third turns it on again, ...
Not quite. I *tried it out.* I don't have a whole passel (sp?) of
browsers at home, but I went ahead and set up a page for testing
purposes. I originally did it as a test for a kludgy version of a
NOSCRIPT container (which sort of works, but not 100%). It *is* useful
for seeing how different browsers behave with events within a SCRIPT
container.
In any case, I added a second section to test this <![CDATA[ .. ]]>
thing, and found that with every browser/version I have, the single '>'
character is enough to tell the browser to resume interpretation and
display.
You are welcome to check it with your own browser(s) -- I'd be
interested in what you find. Take a look at:
http://www.play-hookey.com/scripttest/
As I said, I don't keep a lot of browsers here, but I do go back and
check if I have any concerns. I get the same result on this question
with every version of Netscape I still have, for both Windows 3.1 and
for BSDi Unix. I get the same result with Lynx on my Unix platform. In
all cases, the '>' character ends the CDATA declaration. Lynx also ends
the full comment declaration with just that character, although later
versions of Lynx *may* have fixed that item.
When I get back to work, I'll try it with IE2 and IE3, and whatever else
I can find.
>
> But you are right in one respect: you can "break" the marked
> section as well, by putting a document.write ("]]>") in it.
>
> ++ > Besides, any script containing '--' doesn't even pass validation.
> ++
> ++
> ++ Hmmm. I understood that multiple comments were legal? I can understand
> ++ that a script containing, say, a 'count--' post-decrement notation might
> ++ be assumed to have ended the first comment at that point. But could we
> ++ not add an extra '--' if needed, so that the end-of-comment marker might
> ++ be:
> ++
> ++ // -- -->
> ++
> ++ just to keep an even number of '--' strings in the overall sequence? Or
> ++ will the validator still barf at that?
>
> If you make your code: count-- --
> it will validate.
>
> And surely, JavaScript could use that as the start of a new comment?
I'm not sure about that one. Javascript notation may (probably will)
require intervening characters. For example, a legitimate JS
construction might be:
for (i=max; i >= 0; i--) { ~~~ }
That closing parenthesis has to be there, and cannot be commented out. I
suppose it would accept a comment notation with the ')' coming on the
next line, but that would still mean some intervening characters
between the two instances of '--', which is what I gather will break a
validator.
Let me make something as clear as I can. I am trying to find techniques
and methods that work *as desired* on real-world browsers, running on
real-world platforms. I am far less interested in The Official Rules,
especially where TOR are not followed in real-life situations. If I can
accomplish my purpose by following TOR, fine! If TOR don't cover the
situation, or if the real world is ignoring TOR, then I must still deal
with the real world. In such a case, the abstract TOR will take a back
seat.
At work, I have *finally* been assigned to actually write some material
for delivery over the Web. As with my own pages, but even more so, I
will have to produce something that accomplishes the desired purpose.
And as I said before, I'll happily stay within the rules as much as I
can. But when TOR says "you shouldn't do that" and my boss says "do
that," I give you one guess as to whose word is more important to me.
In all cases, I will be working out practical solutions to real
problems, including some situations that aren't even mentioned in TOR.
So, please -- If I say that such-and-such technique works or doesn't
work, don't just tell me to RTFM. Sometimes TFM doesn't accurately cover
the real-world situation. I don't guarantee to always be right (and if
you make such a claim for yourself, I'll wonder very much about your own
integrity). But if I've tried something and report the result, telling
me to RTFM because TFM says otherwise is a waste of time. TFM isn't
perfect either.
Your own quote above from TFM is a clear example of exactly this sort of
thing. You are talking about what the specifications say *ought* to
happen. You clearly didn't actually try it to see if real browsers
actually do it that way. I don't know if there are any browsers at all
that follow TFM as per your quote, but it is clear that at least some
versions of two classes of browsers (text/Lynx and GUI/Netscape) *don't*
obey TFM. As I said above, I'll check a few more just to see. But the
mere fact that the most widely used browser has been observed to behave
in a particular way means that I will necessarily be writing my pages to
work on that browser (and as many more as possible), and *not* writing
pages according to TFM as interpreted by you.
Please check my reply to Abigail -- by observation, this is not true of
actual browser behavior, at least for those browsers I could try from
home. I'm far less interested in theoretical discussions than in dealing
with the realities of existing browsers.
>
> > '[...]' notation used here is widely used in many programming languages,
> > operating systems, etc., and means nothing more than 'May optionally
> > contain...' . The brackets are *not* part of the actual container
> > notation.
>
> The only way I can think of to get ']]>' in a script would be
> a comparison with a nested array or something.
>
> if (mytext[mylist[15]]>15) then go_wild();
>
> Since the '>' also breaks the old browsers, you should rewrite this
> anyway, so the problem should not exist.
No argument there; having seen Lynx accept '>' as the end of '<!--' I
can see I have some adjustments to make. But in all honesty, I'll make
the adjustments so that they will *work* on as many browsers as
possible. If this means ignoring the spec because the browsers did so,
then so be it. I'd much rather write something worthwhile that doesn't
follow the spec, than something useless that does. Do you really think
this is an unreasonable approach?
>
> > > Besides, any script containing '--' doesn't even pass validation.
> >
> > Hmmm. I understood that multiple comments were legal? I can understand
>
> Yes, but
>
> <!-- Comment 1 -- fuuzle -- Comment 2 -->
>
> doesn't pass validation, as it contains data 'fuuzle' between the
> multiple comments. Think of '--' as a toggle. First one turns 'comment'
> on, second turns it off, third turns it on again, ...
Accepted. But that may mean that such scripts will just have to suffer
the disgrace of validation failure. Better that than to have the script
not work at all.
| Let me make something as clear as I can. I am trying to find
| techniques and methods that work *as desired* on real-world
| browsers, running on real-world platforms. I am far less interested
| in The Official Rules, especially where TOR are not followed in
| real-life situations. If I can accomplish my purpose by following
| TOR, fine! If TOR don't cover the situation, or if the real world is
| ignoring TOR, then I must still deal with the real world. In such a
| case, the abstract TOR will take a back seat.
Presumably it helps to misstate, misconstrue and misunderstand TOR.
By all means compose broken documents to "work" on broken browsers. By
all means take pride in adding to a legacy. But don't delude yourself
that you've accomplished something constructive. The WWW has a scope
far wider than what you've chosen to let your browsers show you.
:ar
--
"I'm an American. I have the RIGHT to be stupid" - Tommy Smothers
++ No argument there; having seen Lynx accept '>' as the end of '<!--' I
++ can see I have some adjustments to make.
That's a stupid attitude. Lynx can be configured to be bug compatible,
to downgrade to Netscape. Either Netscape 3.0 (close comments on -->),
or further (close comments on >).
However, Lynx can as easily be configured to handle comments the right
way. Now, you cannot do that with Netscape.
In article <32AB0B...@www.play-hookey.com>,
Ken Bigelow <kbig...@www.play-hookey.com> wrote:
> Arnoud Galactus Engelfriet wrote:
> > The only way to prematurely terminate the CDATA section is to use
> > ']]>' inside the script, which is a lot less likely than using '--'
> > or '>' there.
>
> Please check my reply to Abigail -- by observation, this is not true of
> actual browser behavior, at least for those browsers I could try from
> home.
The reason to use CDATA sections is to ensure that you can *validate*
documents containing Javascript. I know that current browsers don't
understand it, which is why the comments are still necessary.
Unfortunately, that gets ugly very quickly:
<SCRIPT>
<!CDATA[ > <!--
script here
// < -->
]]>
</SCRIPT>
is, as far as I can see, a legal way to do it. Current browsers that
do not support SCRIPT at all will see <!-- followed by stuff followed
by -->, and ignore everything in there. Validators will ignore the
script as it is all in a CDATA marked section, so there is no way
to terminate the comment accidentally.
The extra < and > are added to "balance" the opening and closing
of the various 'tags'.
> then so be it. I'd much rather write something worthwhile that doesn't
> follow the spec, than something useless that does. Do you really think
> this is an unreasonable approach?
My advice would be to write things that do not violate the spec, and
if you have to, try to limit yourself as much as possible. Meanwhile,
complain to the authors of the broken browser.
Fascinating. Abigail, in spite of the clear attitude implied by her own
signature, tells me it's stupid for me to adjust my pages so that Lynx
can be sure to display them correctly. Or perhaps she thinks I should
slap a "This page best viewed with any browser *except* Lynx" in there?
What the H***, Abby? Do you now have some problem with the idea of
making pages as usable and user-friendly as possible?
>
> However, Lynx can as easily be configured to handle comments the right
> way. Now, you cannot do that with Netscape.
You don't *have* to do that with Netscape; it correctly recognizes the
full comment notation, and requires '-- >' to end '<!--' as it should.
Or are you going to sit there and tell me that the '<![' notation begins
a comment? If so, it is *your* turn to RTFM, which is quite clear:
"The string <![ begins a marked section declaration"
As I said before, I'm *trying* to set up my pages for maximum user
functionality with minimum chance for faulty displays. I find it very
difficult to understand why you would consider this attitude to be
stupid.
>
> Abigail
> --
> Anyone who slaps a "this page is best viewed with Browser X" label
> on a Web page appears to be yearning for the bad old days, before the
> Web, when you had very little chance of reading a document written on
> another computer, another word processor, or another network.
> [Tim Berners-Lee in Technology Review, July 1996]
--
A practical problem with this construct is that the string '<!CDATA[ >'
(since it comes after the <script> tag) must be interpreted as
Javascript, by scripting browsers. The ending ']]>' string will also be
interpreted as Javascript. Both will result in errors, since they are
not legal Javascript (or VBscript, or whatever).
(I note you left out the first '[' above; I quoted it as stated.)
>
> The extra < and > are added to "balance" the opening and closing
> of the various 'tags'.
Understood clearly. I understand the "legal" theory behind the construct
you have made above, and which has been suggested before. My problem
with it is that it doesn't produce correct results in the browsers most
people are using. Whatever the reason, and whatever your objections, the
observed results are what they are; squawking about broken browsers
won't fix them, and won't change the world.
>
> > then so be it. I'd much rather write something worthwhile that doesn't
> > follow the spec, than something useless that does. Do you really think
> > this is an unreasonable approach?
>
> My advice would be to write things that do not violate the spec, and
> if you have to, try to limit yourself as much as possible. Meanwhile,
> complain to the authors of the broken browser.
That would be fine for *one* broken browser. What if they're *all*
broken in this regard? In trying my test page, I have found that both
MSIE and Netscape are "broken" in this regard. Skipping the argument
about which browser has what market share, these two account for the
bulk of accesses to my site, according to my agent log. Since I posted
my earlier reply to Abigail, my access log shows quite a number of
accesses to my little scripttest directory, although I have not yet seen
any responses.
Have you checked that little page? If so, with what browser, and with
what results? I really am interested, and want my pages to be correctly
displayed by as many browsers as possible.
If you have not looked at it, why not? Do you choose to argue only on
the basis of theory? If so, you won't get anywhere with me. I'm far more
interested in making my pages work correctly and with the desired
functionality, than in dotting every 'I' and crossing every 'T.' If I
were to limit myself to just those capabilities that both meet the spec
and also work correctly on all browsers, I would have to consider the
result to be dull, lifeless, ineffective, and not worth doing.
| In any case, I added a second section to test this <![CDATA[ .. ]]>
| thing, and found that with every browser/version I have, the single
| '>' character is enough to tell the browser to resume interpretation
| and display.
Yes, because all those browsers use heuristic parsers. None of them
*recognize* CDATA marked sections. While regrettable, this is by no
means surprising.
| You are welcome to check it with your own browser(s) -- I'd be
| interested in what you find. Take a look at:
| http://www.play-hookey.com/scripttest/
Your additional test indicates that you've misunderstood the meaning
(and thus misconstrued the purpose) of CDATA marked sections.
(Quoted text is from URL above)
> <h1>Additional Test, of '<![CDATA[ ... ]]>' nomenclature</h1><p>
> <![CDATA[According to the 'rules,' you should not be able to see this on
> your display.<p>
The *rules* of CDATA marked sections have nothing to do with hiding
anything. The *rules* have to do with inhibiting markup recognition: a
*compliant* parser is required to treat character sequences, that
would normally be parsed as markup, as verbatim data instead. In
general, non-compliant parsers will get this wrong. Not only that, but
heuristic parsers of the kind found in popular HTML browsers will
compound the error by applying their default "declaration heuristic"
of suppressing material between '<!' and '>'.
> If you see this sentence on your display but nothing between it and the
> header, the notation started OK, but did not correctly require three
> ending characters.<p ]]>
Exactly as you observed.
But, according to the *rules*, you should see all of the regular text,
*and* the strings '<p>' after the first sentence and '<p ' after the
second, because in a CDATA marked section, tags are *not* recognized.
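
The difference between what was observed and what the rules require
can be mimicked in a few lines. The sketch below is Python and purely
illustrative: heuristic_render and compliant_render are invented
names, the "tag" pattern is deliberately crude, and neither function
is a model of any particular browser or of a full SGML parser.

```python
import re

TAG = re.compile(r"<[^>]*>")            # crude tag pattern, for illustration
DECLARATION = re.compile(r"<![^>]*>")   # '<!' up to the first '>'


def heuristic_render(html):
    """Drop '<! ... >' first, then tags: the declaration heuristic."""
    return TAG.sub("", DECLARATION.sub("", html))


def compliant_render(html):
    """Keep CDATA marked section bodies verbatim; drop tags elsewhere."""
    out = []
    pos = 0
    opener = "<![CDATA["
    while True:
        start = html.find(opener, pos)
        if start == -1:
            out.append(TAG.sub("", html[pos:]))
            return "".join(out)
        end = html.find("]]>", start + len(opener))
        if end == -1:
            raise ValueError("CDATA marked section is never closed")
        out.append(TAG.sub("", html[pos:start]))
        out.append(html[start + len(opener):end])  # verbatim data
        pos = end + 3


# A cut-down stand-in for the test page, not its actual text.
sample = "<h1>Test</h1><![CDATA[One.<p>Two.<p ]]>Three.<p>"
```

With this sample the heuristic version loses "One." (swallowed up to
the first '>') and shows "Two.", while the compliant version shows
both sentences together with the literal '<p>' and '<p '.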
> According to the rules, you should see nothing between this sentence and
> the header for the additional test.<p>
No. No such rules exist. CDATA means verbatim data. The following is a
*valid* HTML document:
<html><head>
<title>Believe It or Not</title>
</head><body>
<H1><![CDATA[</TABLE>foo<LI>bar<UL>baz</H2>]]></H1>
</body></html>
Why? Because the *text* of the H1 element is the *literal* string
'</TABLE>foo<LI>bar<UL>baz</H2>'
According to the rules. :-)
:ar
| Abigail wrote:
|> However, Lynx can as easily be configured to handle comments the
|> right way. Now, you cannot do that with Netscape.
| You don't *have* to do that with Netscape; it correctly recognizes
| the full comment notation,
No, it doesn't.
| and requires '-- >' to end '<!--' as it should.
False. Proof by demonstration of Netscape's ***incorrect*** parsing:
<!-- -- --> Is this in a comment, and why? <!-- -- -->
Yes, this is a trick question. There's an answer according to the
rules, and there's an answer according to Netscape. The two answers
differ.
:ar
What point are you trying to make, Ken? That because Lynx can be
configured to be bug compatible with older versions of Netscape,
I should add bugs to my page?
FYI:
# If HISTORICAL_COMMENTS is TRUE, Lynx will revert to the "Historical"
# behavior of treating any '>' as a terminator for comments, instead of
# seeking a valid '-->' terminator (note that white space can be present
# between the '--' and '>' in valid terminators). The compilation default
# is FALSE.
#
# The compilation default, or default defined here, can be toggled via a
# "-historical" command line switch, and via the LYK_HISTORICAL command key.
++ > However, Lynx can as easily be configured to handle comments the right
++ > way. Now, you cannot do that with Netscape.
++
++ You don't *have* to do that with Netscape; it correctly recognizes the
++ full comment notation, and requires '-- >' to end '<!--' as it should.
Oh, please Ken, try to act like a being that can learn.
-- TOGGLES comment on/off. > OUTSIDE of a comment ends the comment
declaration.
$ cat > test.html
<!doctype HTML system "http://www.ny.fnx.com/abigail/abigail.dtd">
<html>
<head>
<title>Test</title>
</head>
<body>
<p>
<!-- -- --> If this is shown, your browser BUGS <!-- -- -->
</body>
</html>
^D
$ lynx -dump http://localhost/abigail/test.html
If this is shown, your browser BUGS
$ lynx -minimal -dump http://localhost/abigail/test.html
$ netscape -remote 'openURL(http://localhost/abigail/test.html)'
$ netscape -remote 'saveAs(test.txt,Text)'
$ cat test.txt
If this is shown, your browser BUGS
$
Getting the difference Ken?
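
For anyone without both browsers at hand, the two readings of that
test line can be reproduced with a short Python sketch. The function
names are invented, the first function only approximates the
"close comments on -->" behavior described earlier in this thread,
and the second assumes the Reference Concrete Syntax delimiters.

```python
import re


def strip_until_dash_dash_gt(text):
    """Treat everything from '<!--' to the next '-->' as the comment."""
    return re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)


def strip_sgml_comment_declarations(text):
    """Remove comment declarations using the '--' toggle rule."""
    out = []
    pos = 0
    while True:
        start = text.find("<!--", pos)
        if start == -1:
            out.append(text[pos:])
            return "".join(out)
        out.append(text[pos:start])
        i = start + 2
        # Each '--' opens a comment that runs to the next '--'.
        while text.startswith("--", i):
            close = text.find("--", i + 2)
            if close == -1:
                raise ValueError("comment is never closed")
            i = close + 2
            while i < len(text) and text[i] in " \t\r\n":
                i += 1
        # Only '>' outside a comment ends the declaration.
        if not text.startswith(">", i):
            raise ValueError("data between comments in a declaration")
        pos = i + 1


line = "<!-- -- --> If this is shown, your browser BUGS <!-- -- -->"
```

The first function leaves the sentence in place; the second returns
an empty string, because the first '>' on the line falls inside the
second comment and the whole line is one comment declaration.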
++ Or are you going to sit there and tell me that the '<![' notation begins
++ a comment? If so, it is *your* turn to RTFM, which is quite clear:
++
++ "The string <![ begins a marked section declaration"
I know what the manual says. And I am not claiming '<![' starts
a comment.
++ As I said before, I'm *trying* to set up my pages for maximum user
++ functionality with minimum chance for faulty displays. I find it very
++ difficult to understand why you would consider this attitude to be
++ stupid.
Because you are assuming that if one version of Lynx is configured
to be bug-compatible with older versions of Netscape, they all are,
and cannot do the right thing.
But they can Ken. And Lynx users can configure their browser to
adapt to author mistakes. You cannot do that with Netscape.
I do wish you'd stop trying to put words in my mouth, Abigail. You're
doing an exceedingly poor job of it.
I have never told you how to write your pages. I don't have that kind of
authority over you, nor do you have such authority over me. I expect you
to put in your pages whatever suits you and your purpose, with only the
limitation that you are responsible for what you publish on the Web, just
as much as you would be when publishing in any other medium.
By the same token, I will write my pages in the manner and with the
content that suits me and my purpose. So that you won't have to try to
guess again at the point I'm trying to make, I'll tell you straight out:
My first priority is for my pages to function in the manner I want them
to, on *as many browsers as possible.* I will make whatever adjustments
may be required to accomplish that goal. Compliance with a specification
not in common use and not recognized by commonly-used browsers takes a
distant second place to working functionality.
[info on Lynx snipped]
>
> ++ > However, Lynx can as easily be configured to handle comments the right
> ++ > way. Now, you cannot do that with Netscape.
> ++
> ++ You don't *have* to do that with Netscape; it correctly recognizes the
> ++ full comment notation, and requires '-- >' to end '<!--' as it should.
>
> Oh, please Ken, try to act like a being that can learn.
>
> -- TOGGLES comment on/off. > OUTSIDE of a comment ends the comment
> declaration.
Oh, please, Abigail, try to stop making such an effort to find fault
with everything I do or say. I carefully left a space between '--' and
'>' above, specifically to signify that I am well aware that such
whitespace is permitted here. I also did *not* put such a space in the
opening marker, to signify that I am equally well aware that those four
characters *must* appear together without whitespace, to be a legitimate
start of the comment declaration.
> ++ As I said before, I'm *trying* to set up my pages for maximum user
> ++ functionality with minimum chance for faulty displays. I find it very
> ++ difficult to understand why you would consider this attitude to be
> ++ stupid.
>
> Because you are assuming that if one version of Lynx is configured
> to be bug-compatible with older versions of Netscape, you assume
> they all are, and cannot do the right thing.
>
> But they can Ken. And Lynx users can configure their browser to
> adapt to author mistakes. You cannot do that with Netscape.
I assumed nothing of the kind. In fact, I chose to assume that *I don't
know* whether or not the browser is configured correctly. For that
self-same reason, I have begun escaping out the '<' and '>' in quoted
tags within document.write strings. I can't do anything about
comparisons and such; they have to have angle brackets explicitly
stated. So any old browser that is broken to the extent that it will
allow '>' to end '<!--' is necessarily SOL in this regard.
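A minimal sketch of the escaping described above, in present-day JavaScript. The helper name toBracketFreeLiteral is hypothetical, and the choice of \x3C and \x3E escapes is an assumption; the post does not say which escape notation was used, nor whether every browser of the period accepted it. The idea is only that the script *source* ends up with no literal '<' or '>' inside quoted markup, while the string value is unchanged.

```javascript
// Hypothetical helper: turn markup into JavaScript string-literal source text
// that contains no literal '<' or '>' characters.
function toBracketFreeLiteral(markup) {
  const body = markup
    .replace(/\\/g, "\\\\")   // escape existing backslashes first
    .replace(/"/g, '\\"')     // then double quotes
    .replace(/</g, "\\x3C")   // '<' becomes the four characters \x3C
    .replace(/>/g, "\\x3E");  // '>' becomes the four characters \x3E
  return '"' + body + '"';
}

// The resulting text can be pasted into a document.write(...) call.
const literal = toBracketFreeLiteral('<a href="x.html">next</a>');
console.log("document.write(" + literal + ");");
```

As the post says, this does nothing for comparison operators such as `a > b`, which have to appear literally in the script. The sketch also does not handle newlines in the input.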
In any case, I choose to write my pages so that, if at all possible,
they will display correctly *whether or not* the browser is properly
configured. I consider this to be more reasonable than expecting a lot
of non-technical people to configure their browsers to suit my
preferences.
Look again at your own .sig -- it takes a lot of arrogance to demand
that all users use a specific browser. It takes just as much arrogance
to demand a particular configuration or capability. I prefer to
accommodate as many variations as I can. How arrogant do you choose to
be???
>Have you checked that little page? If so, with what browser, and with
>what results? I really am interested, and want my pages to be correctly
>displayed by as many browsers as possible.
Just now. Lynx2-6rpu (2.6+Hiram Lester's patches+my style patches),
parsed the page correctly. Screen dump follows...
Test of a Kludgy NoScript Workaround
This page is a fast test of one rather kludgy method of obtaining the
effect of a NOSCRIPT container for browsers that don't recognize
scripting or else don't recognize NOSCRIPT tags.
If the script following this sentence works correctly, it should
accurately tell you whether or not Javascript is possible/enabled on
your browser.
Additional Test, of '<![CDATA[ ... ]]>' behavior
According to the rules, you should see nothing between this sentence
and the header for the additional test.
What point are you trying to make? That browsers don't understand marked
sections? Well, nice try. I'm fairly sure that if a loose, ramshackle
bunch[1] like the Lynx developers can get a browser that understands
marked sections, multi-billion dollar corporations can afford to add the
same functionality to their browsers.
Rob Partington
Netlink Administrator
[1] I hope they don't mind me calling us this...
| On Mon, 09 Dec 1996 20:34:06 +0000,
| Ken Bigelow <kbig...@www.play-hookey.com> wrote:
|> Have you checked that little page? If so, with what browser, and
|> with what results? I really am interested, and want my pages to be
|> correctly displayed by as many browsers as possible.
| Just now. Lynx2-6rpu (2.6+Hiram Lester's patches+my style patches),
| parsed the page correctly. Screen dump follows...
| Test of a Kludgy NoScript Workaround
[...]
| Additional Test, of '<![CDATA[ ... ]]>' behavior
|
| According to the rules, you should see nothing between this sentence
| and the header for the additional test.
| What point are you trying to make?
That no law bars the public display of intransigent ignorance.
:ar
| [info on Lynx snipped]
Also snipped was info on Netscape. Very revealing info.
|>++> However, Lynx can as easily be configured to handle comments the
|>++> right way. Now, you cannot do that with Netscape.
|>++
|>++ You don't *have* to do that with Netscape; it correctly
|>++ recognizes the full comment notation, and requires '-- >' to end
|>++ '<!--' as it should.
Please note the use of the words "correctly", "full", and "should".
And consider what a browser, if it were recognizing the full comment
notation correctly, should do with this:
<!-- -- --> Is your browser broken? <!-- -- -->
|> Oh, please Ken, try to act like being that can learn.
|>
|> -- TOGGLES comment on/off. > OUTSIDE of a comment ends the comment
|> declaration.
|
| Oh, please, Abigail, try to stop making such an effort to find fault
| with everything I do or say. I carefully left a space between '--'
| and '>' above, specifically to signify that I am well aware that
| such whitespace is permitted here. I also did *not* put such a space
| in the opening marker, to signify that I am equally well aware that
| those four characters *must* appear together without whitespace, to
| be a legitimate start of the comment declaration.
An entire paragraph of exquisitely irrelevant technical detail on, of
all things, whitespace -- when Abigail was making a fundamental point
on '--' as a TOGGLE and '>' OUTSIDE a comment to be a terminator. i.e.
what *should* be the *correct* recognition of the *full* notation.
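The toggle rule can be sketched in a few lines of JavaScript. This is an illustration under stated assumptions, not any browser's actual code: stripStrict and stripNaive are hypothetical names, the strict version only handles declarations that open with '<!--', and it silently skips anything found between comments. It contrasts the strict reading ('--' toggles, and only a '>' outside a comment ends the declaration) with the common shortcut of ending at the first '-->'.

```javascript
// Strict reading: after "<!--", each "--" toggles comment state; only a ">"
// seen while outside a comment closes the declaration.
function stripStrict(html) {
  let out = "";
  let i = 0;
  while (i < html.length) {
    if (!html.startsWith("<!--", i)) {
      out += html[i];
      i += 1;
      continue;
    }
    i += 4;                 // consume "<!--"; we are now inside a comment
    let inComment = true;
    while (i < html.length) {
      if (html.startsWith("--", i)) {
        inComment = !inComment;
        i += 2;
      } else if (!inComment && html[i] === ">") {
        i += 1;             // end of the comment declaration
        break;
      } else {
        i += 1;             // comment text, or text between comments
      }
    }
  }
  return out;
}

// Shortcut reading: a comment runs from "<!--" to the first "-->".
function stripNaive(html) {
  return html.replace(/<!--[\s\S]*?-->/g, "");
}

const testLine = "<!-- -- --> Is your browser broken? <!-- -- -->";
const strictResult = stripStrict(testLine);
const naiveResult = stripNaive(testLine);
console.log(JSON.stringify(strictResult), JSON.stringify(naiveResult));
```

Under the strict reading the test line contains six '--' delimiters and a final '>', so the whole line is one comment declaration and nothing is left to display. The shortcut reading leaves the question visible.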
Alan, you were right. This is hopeless.
:ar
--
"Against stupidity the Gods themselves contend in vain." - Schiller
Hello Arnoud,
As you were saying......
>Besides, any script containing '--' doesn't even pass validation.
Ok. We now know it won't pass validation. But that's not the question here.
Who really cares if it'll pass if it works? Just about any work around for
any problem won't pass validation in one form or another. What matters to me
is that it shows what I want it to. Nothing more, nothing less. As long as
this is TRUE then everything else plays a backseat tune.
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
= TagIt/2 by Robert Spangler Author & Support =
- E-Mail: tagit2....@bms.franken.de -
= Web : http://www.franken.de/users/bms/tagit2.html =
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
E-Mail : bos...@bms.franken.de TeamOS/2 Germany
In article <58v2ja$b...@hub-r.franken.de>,
bos...@bms.franken.de (Robert Spangler) wrote:
> In message <hMJoy4uY...@htmlhelp.com> - gala...@htmlhelp.com (Arnoud
> "Galactus" Engelfriet) Sat, 30 Nov 1996 21:03:45 +0100 writes:
> >Besides, any script containing '--' doesn't even pass validation.
>
> Ok. We now know it won't pass validation. But that's not the question here.
Actually, it was. This whole thread (in particular the threads
between Arjun Ray and Ken Bigelow) was about finding a way to
hide Javascripts and keeping the HTML document containing it
syntactically legal.
> Who really cares if it'll pass if it works? Just about any work around for
> any problem won't pass validation in one form or another.
There are *very* few situations in which a non-validating solution
is the only one, and hiding scripts isn't one of them.
> What matters to me
> is that it shows what I want it to. Nothing more, nothing less. As long as
> this is TRUE then everything else plays a backseat tune.
A Javascript containing '--' is displayed as plain text (after the '--')
on every browser that understands comments.
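One way to catch the problem before publishing is to scan the script body for '--'. The following is a minimal sketch; findDashPairs is a hypothetical helper, and it makes no attempt to tell a decrement operator from '--' inside a string or a JavaScript comment, since a comment-aware parser would toggle on either.

```javascript
// Hypothetical pre-publish check: report every "--" in a script body that is
// going to be wrapped in "<!-- ... // -->" for hiding.
function findDashPairs(scriptBody) {
  const hits = [];
  scriptBody.split("\n").forEach((text, index) => {
    let col = text.indexOf("--");
    while (col !== -1) {
      // Lines and columns are reported 1-based.
      hits.push({ line: index + 1, column: col + 1, text: text.trim() });
      col = text.indexOf("--", col + 2);
    }
  });
  return hits;
}

const body = [
  "var n = 3;",
  "while (n > 0) {",
  "  n--;",
  "}",
].join("\n");

const hits = findDashPairs(body);
console.log(hits);
```

Here the only hit is the decrement on line 3. Writing it as `n -= 1` or `n = n - 1` keeps the same behavior without the '--' sequence.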
Actually, Rob, I was trying to *find out* something. This may be a
concept beyond your personal experience or imagination, but I really did
want to learn the results obtained with a wide range of browsers,
exactly as I said in my earlier post. Is that really so amazing to you?
If you absolutely *have* to have a point, try this one: it is far better
IMHO to write pages that display as desired on *as many browsers as
possible,* than to sit and bewail the observed fact that not all
browsers behave exactly the same way with any given markup.
What the Hell? Several weeks ago this NG sported a thread involving the
use of a <base target="_top"> tag. The purpose was to prevent the pages
on one site from being loaded into frames defined from another site.
Seems a reasonable goal to me. Even the HTML "purist" group agreed that
this was a reasonable and acceptable thing to do, *in spite of the fact
that it will not validate* because it is not a standard usage. The
advice from all sides was to simply ignore the error message from the
validator.
I figure the same logic can and should apply to Javascript embedded in
an HTML file. Actually, an HTML 3.2 validator following Wilbur should
simply ignore the entire SCRIPT container no matter what it contains. A
level 2.0 validator *should* see the comment markers and ignore
everything inside of that. The use of the '--' decrement operator is
actually infrequent. But even if a validator objects to the Javascript
code as being invalid HTML, SO WHAT??? I prefer to make my pages work
correctly on the browsers. If a validator can't handle the Javascript,
that's a shortcoming of the validator.
It makes no difference to me whether the browser is or is not
technically breaking some little-used specification. If my pages operate
correctly on the browsers people use, that's good enough for me. Strict
compliance with every rule is far less important to me than correct
operation in the real world.
Besides, if some rule is being widely ignored because it serves no
useful or necessary purpose, it's time to change the rule -- not to
require everyone to comply with a useless and unnecessary rule. The Web
is still an infant, still growing and evolving. As we learn more, we
leave some faulty ideas behind. Including outmoded rules and
specifications.
You're almost right in the way I was thinking, Arnoud. I would have
preferred to find a solution that satisfied all the angles. However, all
suggestions and arguments to avoid the use of the full comment
declaration have foundered on the rocks of hard truth: some browsers
simply won't handle other methods in the approved fashion.
It does no good to squawk about broken browsers -- when the vast
majority of browsers in the hands of the general public behave in
thus-and-such a way, it's time to write pages that will work on those
browsers. In this regard, the strict rules have to take a back seat to
the reality.
My first priority has always been to write pages that work correctly.
While an active site can probably never suit all possible browsers, it
can be written to suit the wide majority in some way or another.
If I could have done this strictly according to the HTML rules (even for
the Javascript), fine. But when newer functionality starts to clash with
the older rules, it's time to update the rules to fit the changed
reality.
>
> > Who really cares if it'll pass if it works? Just about any work around for
> > any problem won't pass validation in one form or another.
>
> There are *very* few situations in which a non-validating solution
> is the only one, and hiding scripts isn't one of them.
>
> > What matters to me
> > is that it shows what I want it to. Nothing more, nothing less. As long as
> > this is TRUE then everything else plays a backseat tune.
>
> A Javascript containing '--' is displayed as plain text (after the '--')
> on every browser that understands comments.
Uh, verify, please. Checking the W3C site, I note that multiple comments
are permitted within a single comment declaration, and that *only* the
sequence '--' (followed by optional whitespace and then) '>' can end the
comment declaration. I would have expected that a browser that fully
understands comments would not revert to displaying the text following
an intermediate '--' string, since that is not the legal end of a
comment declaration. A following character other than '>' is still not
the legal end.
If this is so, I would expect that a page where *all* of the angle
brackets are escaped (which I am in the process of doing on my pages)
would necessarily wait until the '-->' string is found, since that would
be the very first instance of '>' following the '<!--' string. A browser
might conceivably ignore that if it had already seen an even number of
'--' strings, but that would only mean that HTML beyond the </script>
tag would also be ignored.
I realize and agree that the rules object to intermediate text between
marked comments, but those same rules are also very clear on what
constitutes the end of the comment declaration, and hence the end of the
CDATA that was inside that declaration.
Also, one question for my own edification: what browser(s) actually
behave as you describe above, and start displaying text immediately
following the '--' string? I have not yet encountered any.
In article <32B70E...@www.play-hookey.com>,
Ken Bigelow <kbig...@www.play-hookey.com> wrote:
> If I could have done this strictly according to the HTML rules (even for
> the Javascript), fine. But when newer functionality starts to clash with
> the older rules, it's time to update the rules to fit the changed
> reality.
The point here is, the "new" functionality is implemented in a
*broken* way. There IS a perfectly acceptable solution, but it's not
implemented. Surely you agree that it's backwards to compromise the
specification to accommodate broken implementations?
> > A Javascript containing '--' is displayed as plain text (after the '--')
> > on every browser that understands comments.
>
> Uh, verify, please.
<!-- This is a comment -- This isn't >
> I would have expected that a browser that fully
> understands comments would not revert to displaying the text following
> an intermediate '--' string, since that is not the legal end of a
> comment declaration.
You may have a point there. The specification does not say what
should be done with the text "This isn't" in my above example. Of
course, if the "This isn't" bit contains a ">" then a compliant
browser will show the text (if it treats the "This isn't" as invalid
markup, and the ">" as end of the SGML declaration).
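To make the example concrete, here is a small JavaScript sketch that splits a single comment declaration into the parts inside comments and the parts between them. splitDeclaration is a hypothetical name, and the sketch only labels the stray text; it takes no position on what a browser ought to do with it, which is the open question above.

```javascript
// Split one comment declaration into "comment" parts (between a pair of "--")
// and "between" parts (outside any comment but before the closing ">").
function splitDeclaration(decl) {
  if (!decl.startsWith("<!--")) {
    throw new Error("not a comment declaration");
  }
  const parts = [];
  let inComment = true;   // "<!--" opens the declaration and the first comment
  let start = 4;
  let i = 4;
  while (i < decl.length) {
    if (decl.startsWith("--", i)) {
      parts.push({ kind: inComment ? "comment" : "between", text: decl.slice(start, i) });
      inComment = !inComment;
      i += 2;
      start = i;
    } else if (!inComment && decl[i] === ">") {
      parts.push({ kind: "between", text: decl.slice(start, i) });
      return { parts, closed: true };
    } else {
      i += 1;
    }
  }
  return { parts, closed: false };   // ran out of input before a closing ">"
}

const result = splitDeclaration("<!-- This is a comment -- This isn't >");
// Anything non-blank in a "between" part is text outside a comment.
const stray = result.parts.filter(
  (p) => p.kind === "between" && p.text.trim() !== ""
);
console.log(result.closed, stray);
```

For the example declaration, the first part is the comment text and the second is the stray "This isn't", sitting outside any comment but still before the closing '>'.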
> Also, one question for my own edification: what browser(s) actually
> behave as you describe above, and start displaying text immediately
> following the '--' string? I have not yet encountered any.
I believe Lynx 2.6 does this, with the right combination of
"heuristic" and "historical" comment parsing. I did some fiddling
with the settings some time ago to solve a problem similar to this.