HTML and CDATA produced by Rails

Peter Michaux

Dec 9, 2006, 11:27:42 PM

Hi,

I am experimenting with some of the Ruby on Rails JavaScript generators
and have seen something I haven't seen before. Maybe it is worthwhile?

In the page below the script is enclosed in

//<![CDATA[
//]]>

Is this trick grounded in any real information about HTML vs XHTML? I
think what they are trying to achieve is a way to generate a script
block that can be used in either HTML or XHTML. Have they managed that
goal? It looks fine for HTML but will the double slashes be ok in
XHTML?

Thanks,
Peter

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">

<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Fork JavaScript</title>
</head>

<body>

</body>
</html>

David Golightly

Dec 9, 2006, 11:50:46 PM

Peter Michaux wrote:
> //<![CDATA[
> //]]>
>
> Is this trick grounded in any real information about HTML vs XHTML? I
> think what they are trying to achieve is a way to generate a script
> block that can be used in either HTML or XHTML. Have they managed that
> goal? It looks fine for HTML but will the double slashes be ok in
> XHTML?

Yes, this is not anything unique to Rails. This is SGML for "ignore
whatever comes between these marks". The leading slashes are OK
because they have no significance in SGML syntax. This is supposed to
prevent the parser from misreading any characters that would otherwise
be read as markup, such as angle brackets (> and <) which have a
different meaning in JavaScript:

http://www.w3.org/TR/REC-xml/#sec-cdata-sect

David

Richard Cornford

Dec 10, 2006, 2:09:52 AM

Peter Michaux wrote:
> Hi,
>
> I am experimenting with some of the Ruby on Rails JavaScript
> generators and see something I haven't before. Maybe it is worthwhile?
>
> In the page below the script is enclosed in
>
> //<![CDATA[
> //]]>
>
> Is this trick grounded in any real information about HTML vs XHTML?

Not really. It is a cargo cult/mystical incantation thing. In HTML the
contents of SCRIPT elements are CDATA anyway and the two lines posted
are just end of line comments. In XHTML the contents of SCRIPT elements
are PCDATA, but there is no need to attempt to comment out the CDATA
block mark-up. And as it is largely impractical to write non-trivial
scripts that will function with both HTML DOMs and XHTML DOMs there is
no need for a construct to 'normalise' SCRIPT element contents to CDATA
for use with both types of DOM.
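Here is a minimal sketch of the HTML/XHTML difference described above. It assumes Python's standard-library html.parser and xml.etree.ElementTree as stand-ins for a browser's HTML and XHTML parsers. They are not browser code, and the script body is hypothetical. The same SCRIPT markup goes to both parsers. The HTML parser passes the <![CDATA[ and ]]> delimiters through as ordinary characters, while the XML parser consumes them and leaves only the // behind:

```python
# Sketch: run one SCRIPT element through an HTML parser and an XML parser.
# html.parser and xml.etree.ElementTree are assumed stand-ins for a
# browser's HTML and XHTML parsers; the script body is hypothetical.
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

MARKUP = ('<script type="text/javascript">//<![CDATA[\n'
          'if (a < b) { f(); }\n'
          '//]]></script>')


class ScriptText(HTMLParser):
    """Collect the character data the HTML parser reports inside SCRIPT."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data


html_parser = ScriptText()
html_parser.feed(MARKUP)
html_parser.close()
html_source = html_parser.text           # HTML: delimiters are plain text

xml_source = ET.fromstring(MARKUP).text  # XHTML: delimiters are mark-up

print(repr(html_source))  # '//<![CDATA[\nif (a < b) { f(); }\n//]]>'
print(repr(xml_source))   # '//\nif (a < b) { f(); }\n//'
```

In both cases the text handed on as script source only has the delimiter lines commented out by the leading //. The construct is harmless either way, but in the HTML case the <![CDATA[ and ]]> never acted as mark-up at all.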

> I think what they are trying to achieve is a way to
> generate a script block that can be used in either
> HTML or XHTML.

They may think that they are trying to do that, but probably without any
understanding of the differences between an XHTML DOM and an HTML DOM,
and so no appreciation of the worthlessness of the exercise.

> Have they managed that goal?

Yes, but the scripts being inserted into that construct are extremely
unlikely to work if the mark-up that contains them is ever interpreted
as XHTML and an XHTML DOM exposed to the script. Which renders the
effort moot.

> It looks fine for HTML but will the double slashes be
> ok in XHTML?

They will be fine, it is the scripts they contain that will likely go
belly-up if the documents are ever interpreted as XHTML by a web
browser.

Richard.

Richard Cornford

Dec 10, 2006, 2:09:57 AM

David Golightly wrote:
> Peter Michaux wrote:
>> //<![CDATA[
>> //]]>
>>
>> Is this trick grounded in any real information about HTML vs
>> XHTML? I think what they are trying to achieve is a way to
>> generate a script block that can be used in either HTML or
>> XHTML. Have they managed that goal? It looks fine for HTML
>> but will the double slashes be ok in XHTML?
>
> Yes, this is not anything unique to Rails.

That is certainly true. Rails is not the only system that includes
spurious constructs in mark-up for no particularly good reason.

> This is SGML for "ignore
> whatever comes between these marks".

Nonsense. The <![CDATA[ ... ]]> mark up in XML is adopted directly from
SGML and has exactly the same meaning in both. In the context of the
contents of an HTML SCRIPT element, which is already CDATA, the
construct has no significance as such contents are not parsed for
mark-up, beyond identifying where they end.

> The leading slashes are OK
> because they have no significance in SGML syntax.

No, they are OK because the contents of an HTML SCRIPT element are CDATA
and so not mark-up at all, and the contents of an XHTML script element
are PCDATA and so the slashes outside the <![CDATA[ and ]]> delimiters
will be parsed and the (unchanged) results used as part of the script
source code, with the unparsed contents of the CDATA block inserted in
place of the CDATA delimiters.

> This is supposed to prevent the parser from misreading any
> characters that would otherwise be read as markup, such as
> angle brackets (> and <) which have a different meaning in
> JavaScript:

And is worthless in an HTML document where no such misinterpretation is
possible, while in an XHTML document there is no need for the end of
line comment syntax.
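The XHTML half of that claim can be checked with a small sketch. It assumes Python's xml.etree.ElementTree as a stand-in for an XHTML parser and uses a hypothetical script body. A raw "<" in the script is not well-formed, while a bare CDATA section with no // around it hands the script its source unchanged:

```python
# Sketch: what an XML (XHTML) parser hands to the script engine.
# xml.etree.ElementTree is an assumed stand-in for a browser's XHTML parser.
import xml.etree.ElementTree as ET

BODY = "if (a < b) { f(); }"  # hypothetical script containing '<'


def script_source(markup):
    """Return the SCRIPT text an XML parser produces, or None if the
    markup is not well-formed."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None


bare = script_source("<script>" + BODY + "</script>")
cdata_only = script_source("<script><![CDATA[" + BODY + "]]></script>")

print(bare)        # None: the raw '<' is not well-formed XML
print(cdata_only)  # if (a < b) { f(); }
```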

Richard.

Peter Michaux

Dec 10, 2006, 2:45:27 AM

Richard Cornford wrote:
> Peter Michaux wrote:
> > Hi,
> >
> > I am experimenting with some of the Ruby on Rails JavaScript
> > generators and see something I haven't before. Maybe it is worthwhile?
> >
> > In the page below the script is enclosed in
> >
> > //<![CDATA[
> > //]]>
> >
> > Is this trick grounded in any real information about HTML vs XHTML?
>
> Not really, It is a cargo cult/mystical incantation thing. In HTML the
> contents of SCRIPT elements are CDATA anyway and the two lines posted
> are just end of line comments. In XHTML the contents of SCRIPT elements
> are PCDATA, but there is no need to attempt to comment out the CDATA
> block mark-up. And as it is largely impractical to write non-trivial
> scripts that will function with both HTML DOMs and XHTML DOMs there is
> no need for a construct to 'normalise' SCRIPT element contents to CDATA
> for use with both types of DOM.
>
> > I think what they are trying to achieve is a way to
> > generate a script block that can be used in either
> > HTML or XHTML.
>
> They may think that they are trying to do that, but probably without any
> understanding of the differences between an XHTML DOM and an HTML DOM,
> and so no appreciation of the worthlessness of the exercise.

I haven't read anything about how a script would have to be written
differently depending on whether it is in an HTML document or an XHTML
document (being correctly interpreted as XHTML).

Do you have a tiny example or pointer to some information about this?

Thanks,
Peter

David Golightly

Dec 10, 2006, 4:20:05 AM

Richard Cornford wrote:

> David Golightly wrote:
> > This is SGML for "ignore
> > whatever comes between these marks".
>
> Nonsense. The <![CDATA[ ... ]]> mark up in XML is adopted directly from
> SGML and has exactly the same meaning in both. In the context of the
> contents of an HTML SCRIPT element, which is already CDATA, the
> construct has no significance as such contents are not parsed for
> mark-up, beyond identifying where they end.

Richard - Read once more what I wrote, then what you wrote, then cool
down for once, then tell me how they disagree with each other.

David Golightly

Dec 10, 2006, 4:35:26 AM

Richard Cornford wrote:
> David Golightly wrote:

> > Yes, this is not anything unique to Rails.

> > The leading slashes are OK
> > because they have no significance in SGML syntax.
>
> No, they are OK because the contents of an HTML SCRIPT element are CDATA
> and so not mark-up at all, and the contents of an XHTML script element
> are PCDATA and so the slashes outside the <![CDATA[ and ]]> delimiters
> will be parsed and the (unchanged) results used as part of the script
> source code, with the unparsed contents of the CDATA block inserted in
> place of the CDATA delimiters.

That's exactly what I said. Thanks for reiterating.

>
> > This is supposed to prevent the parser from misreading any
> > characters that would otherwise be read as markup, such as
> > angle brackets (> and <) which have a different meaning in
> > JavaScript:
>
> And is worthless in an HTML document where no such misinterpretation is
> possible, while in an XHTML document there is no need for the end of
> line comment syntax.
>

Sorry Richard, you're off base on this one. Case in point: closing
tags in script strings. Consider the following code:

As we all know, older HTML parsers will barf at the literal closing
</script> tag in the string. That's why you see strings broken up
like:

'<'+'/script' etc.

so browsers don't mistake this for an HTML element and parse it
mistakenly. (See Flanagan, JavaScript: TDG, 5th Edition, page 247).
With a CDATA section explicitly defined, this problem doesn't occur and
coders don't have to worry about this mistake. This alone, and the
fact that the CDATA tag doesn't add any other drawbacks over the
conventional HTML comment, make this a preferred approach for
commenting out script sections.

RobG

Dec 10, 2006, 6:48:11 AM

Peter Michaux wrote:
> Hi,
>
> I am experimenting with some of the Ruby on Rails JavaScript generators
> and see something I haven't before. Maybe it is worthwhile?
>
> In the page below the script is enclosed in
>
> //<![CDATA[
> //]]>
>
> Is this trick grounded in any real information about HTML vs XHTML?

There is a possibility that if a parser encounters "</" or "</script>"
within a script it may be interpreted as an end of script element tag,
and anything after may be thought of as markup.

The simple solution is to use external scripts. If the script is
included in the page, wherever "</" might be encountered either:

1. separate the "<" and "/" with a space: "< /"
2. use a backslash to quote it: "<\/"

as appropriate. I can't think of a case to use the former as I don't
think it's legal ECMAScript syntax (but it might occur in some other
script language), the latter can be used where HTML is included in the
script, say as innerHTML or with document.write.
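The second option can be sketched as follows. Python's html.parser is assumed as a stand-in for a browser's HTML parser, and the document.write call is a hypothetical example. In JavaScript the escape \/ simply yields /, so the string value is unchanged. Only the HTML parser's view differs, and the escaped sequences no longer end the SCRIPT element:

```python
# Sketch of writing "<\/" instead of "</" inside a script string.
# Python's html.parser is an assumed stand-in for a browser's HTML parser.
from html.parser import HTMLParser

# The JavaScript between the tags is: document.write("<p>Hi<\/p><\/script>");
JS = 'document.write("<p>Hi<\\/p><\\/script>");'
MARKUP = '<script type="text/javascript">' + JS + '</script>'


class Split(HTMLParser):
    """Separate text reported inside SCRIPT from text that leaks out of it."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.script = ""
        self.leaked = ""

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.script += data
        else:
            self.leaked += data


p = Split()
p.feed(MARKUP)
p.close()
print(p.script == JS)  # True: the escaped sequences did not end the element
print(repr(p.leaked))  # '': nothing spilled out as page text
```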

ISTM much better to deal with those cases as they occur, rather than
with the blanket inclusion of comment delimiters which are otherwise
pointless.

--
Rob

Richard Cornford

Dec 10, 2006, 11:48:33 AM

Peter Michaux wrote:
> Richard Cornford wrote:
>> Peter Michaux wrote:

<snip>

>>> In the page below the script is enclosed in
>>>
>>> //<![CDATA[
>>> //]]>

<snip>

>>> I think what they are trying to achieve is a way to
>>> generate a script block that can be used in either
>>> HTML or XHTML.
>>
>> They may think that they are trying to do that, but probably
>> without any understanding of the differences between an XHTML
>> DOM and an HTML DOM, and so no appreciation of the
>> worthlessness of the exercise.
>
> I haven't read anything about how a script would have to be
> written differently depending if it is in an HTML document
> or an XHTML document (being correctly interpreted as XHTML).

Why would you expect to? There is no commercial application for
client-side scripting of XHTML for as long as IE is not capable of
interpreting XHTML and creating an XHTML DOM to be scripted.

> Do you have a tiny example or pointer to some information
> about this?

Examples are posted to the group, maybe not frequently but regularly
enough to make it obvious that most who wander into XHTML DOM scripting
are not expecting what they find. An archive search (probably restricted
to the last 5 years or so) with keywords corresponding to the namespace
qualified DOM methods (such as - createElementNS - or -
setAttributeNS -) should turn up a number of real-life examples (as
namespace qualified methods need to be used with XHTML DOMs).
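The following is offered as a rough analogy only. Python's xml.dom.minidom is assumed as a stand-in for an XML DOM. It mirrors the DOM Level 2 distinction between createElement and createElementNS. It does not show how any particular browser's createElement behaves in an XHTML document. An element created without a namespace does not belong to the XHTML namespace of the document it is created in:

```python
# Sketch: plain vs namespace-qualified element creation in an XML DOM.
# xml.dom.minidom is an assumed stand-in for a browser's XHTML DOM.
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"
doc = minidom.parseString('<html xmlns="%s"><body/></html>' % XHTML_NS)

plain = doc.createElement("p")                  # no namespace supplied
qualified = doc.createElementNS(XHTML_NS, "p")  # namespace-qualified

print(doc.documentElement.namespaceURI)  # http://www.w3.org/1999/xhtml
print(plain.namespaceURI)                # None
print(qualified.namespaceURI)            # http://www.w3.org/1999/xhtml
```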

Richard.

Richard Cornford

Dec 10, 2006, 11:48:38 AM

David Golightly wrote:
> Richard Cornford wrote:
>> David Golightly wrote:
>>> This is SGML for "ignore
>>> whatever comes between these marks".
>>
>> Nonsense. The <![CDATA[ ... ]]> mark up in XML is adopted
>> directly from SGML and has exactly the same meaning in both.
>> In the context of the contents of an HTML SCRIPT element,
>> which is already CDATA, the construct has no significance as
>> such contents are not parsed for mark-up, beyond identifying
>> where they end.
>
> Richard - Read once more what I wrote,

I did read what you wrote. It was "ignore whatever comes between these
marks", and that is very much not true. SGML includes the marked section
keyword IGNORE so that if it does need to "ignore whatever comes between
these marks" that facility exists. CDATA is not ignored; it is just not
parsed (beyond scanning for its closing delimiter).

> then what you wrote, then cool
> down for once, then tell me how they disagree with
> each other.

The CDATA marked section is not ignored in XHTML as its contents must
contribute to the source characters that will be interpreted as script,
and in HTML there is no CDATA marked section within the SCRIPT element
as the SCRIPT element's contents are already CDATA and so the
<![CDATA[ and ]]> delimiters are just sequences of text characters. That
is, in XHTML they are not ignored, and in HTML they never existed as
mark-up and so could not mark their contents as to be ignored.

Richard.

Richard Cornford

Dec 10, 2006, 11:48:42 AM

David Golightly wrote:
> Richard Cornford wrote:
>> David Golightly wrote:
>> > Yes, this is not anything unique to Rails.
>> > The leading slashes are OK
>> > because they have no significance in SGML syntax.
>>
>> No, they are OK because the contents of an HTML SCRIPT element
>> are CDATA and so not mark-up at all, and the contents of an
>> XHTML script element are PCDATA and so the slashes outside the
>> <![CDATA[ and ]]> delimiters will be parsed and the (unchanged)
>> results used as part of the script source code, with the
>> unparsed contents of the CDATA block inserted in place of the
>> CDATA delimiters.
>
> That's exactly what I said.

That is not what you said. You said that "The leading slashes are OK
because they have no significance in SGML syntax", and while the leading
slashes have no significance in SGML (which is not necessarily true as
SGML is _extremely_ flexible in which characters and character sequences
may be used as delimiters and/or be significant) it is not this absence
of significance that makes them OK.

There is no shortage of other things that could be included in those
locations that have "no significance in SGML" (HTML or XHTML) and that
would be anything but OK. Thus the OK-ness here cannot follow from the
insignificance in SGML.

What makes them OK is not that they are not significant to SGML but that
they are in a location where their only significance is to a javascript
parser and they are OK to the javascript parser.

> Thanks for reiterating.

>>> This is supposed to prevent the parser from misreading
>>> any characters that would otherwise be read as markup,
>>> such as angle brackets (> and <) which have a different
>>> meaning in JavaScript:
>>
>> And is worthless in an HTML document where no such
>> misinterpretation is possible, while in an XHTML
>> document there is no need for the end of line comment
>> syntax.
>
> Sorry Richard, you're off base on this one.

Don't be so sure. You have not had much practical experience of web
browsers and seem to be inclined to swallow the stories you read on the
Internet without understanding or questioning them.

Remember what I have said here; the <![CDATA[ ... ]]> mark-up is
worthless in an HTML document and the javascript comments are worthless
in an XHTML document.

> Case in point: closing
> tags in script strings. Consider the following code:
>
> <script type="text/javascript">
> //
> </script>
>
> As we all know, older HTML parsers will barf at the
> literal closing </script> tag in the string.

All HTML parsers should stop treating the character data as character
data as soon as they encounter the character sequence - </script> -
within the data. Those are the rules for parsing HTML documents.

> That's why you see strings broken up
> like:
>
> '<'+'/script' etc.

What I see are character sequences that resemble closing SCRIPT tags
modified into "<\/script>". When I see '<'+'/script>' I am just reminded
that people who don't really understand what they are doing, or why they
are doing it, will tend to do things badly: they introduce two
unnecessary extra source characters, the runtime overhead of the
concatenation and two intermediate strings, where one expedient
backslash removes the HTML parsing issue in a way that is invisible
beyond the tokenising of the source code.

> so browsers don't mistake this for an HTML element and
> parse it mistakenly. (See Flanagan, JavaScript: TDG,
> 5th Edition, page 247).

Very funny.

> With a CDATA section explicitly defined, this problem
> doesn't occur

Nonsense. In HTML it makes no difference. The contents of the HTML
SCRIPT element are already CDATA so the "<![CDATA[" character sequence
will be seen as nothing but a sequence of characters. Thus it will not
modify the interpretation of the character data that follows it and
precedes the ]]> and any occurrences of "</script>" in that data will
still be interpreted as terminating the CDATA content of the HTML SCRIPT
element.

And so, as I said, the <![CDATA[ ... ]]> mark-up is worthless in an HTML
document because in a context that is already CDATA it cannot influence
the interpretation of anything, and certainly not "</script>" character
sequences inside javascript strings.
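This can be checked with a small sketch. Python's html.parser is assumed as a stand-in for a browser's HTML parser, and the document.write call is hypothetical. Even with the CDATA wrapper, the first "</script>" inside the string ends the element, and the rest of the script spills out as ordinary page text:

```python
# Sketch: a CDATA wrapper does not protect "</script>" inside an HTML SCRIPT.
# Python's html.parser is an assumed stand-in for a browser's HTML parser.
from html.parser import HTMLParser

MARKUP = ('<script type="text/javascript">//<![CDATA[\n'
          'document.write("</script>");\n'
          '//]]></script>')


class Split(HTMLParser):
    """Separate text reported inside SCRIPT from text that leaks out of it."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.script = ""
        self.leaked = ""

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.script += data
        else:
            self.leaked += data


p = Split()
p.feed(MARKUP)
p.close()
print(repr(p.script))  # '//<![CDATA[\ndocument.write("'
print(repr(p.leaked))  # '");\n//]]>'
```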

> and coders don't have to worry about this mistake.

If the coder gets fooled into not worrying, that would be a problem for
him, as nothing has changed as a result of wrapping <![CDATA[ ... ]]>
around a sub-section of data that was already CDATA.

> This alone, and the fact that the CDATA tag doesn't
> add any other drawbacks over the conventional HTML
> comment, make this a preferred approach for commenting
> out script sections.

You seem to like declaring things as "preferred". You don't seem too
keen on saying who it is doing this preferring. Obviously they prefer
making up their own excuses for their preferences, rather than having
any genuine technical justification for them.

No, round here we 'prefer' that the whole "commenting out script
sections" thing die the death that is now at least half a decade overdue.

Richard.

John G Harris

Dec 10, 2006, 10:51:26 AM

In article <1165751291.2...@l12g2000cwl.googlegroups.com>, RobG
<rg...@iinet.net.au> writes

>
>Peter Michaux wrote:
>> Hi,
>>
>> I am experimenting with some of the Ruby on Rails JavaScript generators
>> and see something I haven't before. Maybe it is worthwhile?
>>
>> In the page below the script is enclosed in
>>
>> //<![CDATA[
>> //]]>
>>
>> Is this trick grounded in any real information about HTML vs XHTML?
>
>There is a possibility that if a parser encounters "</" or "</script>"
>within a script it may be interpreted as an end of script element tag,
>and anything after may be thought of as markup.

<snip>

It's stronger than that. The HTML parser is *required* to treat </
followed by any letter as the end of the script.
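The rule can be sketched as follows. The helper is hypothetical and is not a real parser. It models only the HTML 4 rule stated above: inside SCRIPT, the first "</" followed by a letter ends the element's content. Browsers in practice generally stop only at a closing SCRIPT tag, so this is the stricter case:

```python
# Sketch of the HTML 4 rule: inside SCRIPT, "</" followed by a letter
# ends the element's content. Hypothetical helper, not a real parser.
import re

ETAGO_LETTER = re.compile(r"</[A-Za-z]")


def html4_script_content(text_after_start_tag):
    """Return the prefix an HTML 4 parser would treat as SCRIPT content."""
    match = ETAGO_LETTER.search(text_after_start_tag)
    if match is None:
        return text_after_start_tag
    return text_after_start_tag[:match.start()]


plain = html4_script_content('document.write("<b>x</b>");</script>')
escaped = html4_script_content('document.write("<b>x<\\/b>");</script>')

print(plain)    # document.write("<b>x
print(escaped)  # document.write("<b>x<\/b>");
```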

John
--
John Harris

VK

Dec 10, 2006, 1:50:57 PM

Peter Michaux wrote:
> Hi,
>
> I am experimenting with some of the Ruby on Rails JavaScript generators
> and see something I haven't before. Maybe it is worthwhile?
>
> In the page below the script is enclosed in
>
> //<![CDATA[
> //]]>

That is one of the variants of the combo-comment. Combo-comments are *not*
needed for regular scripts run exclusively on HTML pages (nor are
regular comments of any kind, as has been pointed out many times in this
newsgroup). At the same time they are vital in higher-level
development where parsing follows XML rules, or where it may follow
either HTML or XML rules in different environments. I'm not working
with Ruby, but I had to solve the same problems with behaviors, so I
can comment on it. Let's take a primitive fragment like:

and save it as script.xml (from here on it is presumed that you edit
this file and open it in a browser with XML support). Drag and drop it
onto your browser: it will be parsed OK because it's a well-formed XML
fragment.

Now change this to:

Oops... the less-than sign breaks well-formedness; this segment will not
get through an XML parser. Comments are our friends:

Well-formedness restored. But:

Well-formedness is broken again. This is the point where various
amateurish manuals start advising not to use unary minus, which is
of course 100% pure b.s., because under no circumstances should the
program *logic* be dictated by an external textual parser.
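Here is a minimal sketch of that problem. The script bodies are hypothetical, and Python's xml.etree.ElementTree is assumed as the XML parser. XML forbids "--" inside a comment, so a decrement (i--) inside a comment-wrapped script breaks well-formedness. With this parser, a comment-wrapped script that does parse yields no script text at all:

```python
# Sketch: "--" is not allowed inside an XML comment, so a decrement breaks
# well-formedness. xml.etree.ElementTree is an assumed stand-in for an XML
# parser; the script bodies are hypothetical.
import xml.etree.ElementTree as ET


def parse_or_none(markup):
    """Return the parsed element, or None if the markup is not well-formed."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return None


commented = parse_or_none("<script><!--\nif (a < b) { f(); }\n//--></script>")
decrement = parse_or_none("<script><!--\nif (a < b) { i--; }\n//--></script>")

print(commented is not None)  # True: well-formed
print(commented.text)         # None: the comment (and the script) is dropped
print(decrement is None)      # True: "--" inside the comment is an error
```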

But no problem: just make it a CDATA section and the worries are gone!

So are we done now? Not really. Script parsers are instructed to
ignore an opening HTML comment, but not an opening CDATA tag ("- Parsers
are not instructed to ignore...", "- There can be parsers..." - anyone
willing to say this b.s. one more time is insistently advised to shut
up before I come to smack you. There are limits to anyone's patience.
That is not addressed to the OP.)

So now we have a fragment which is well-formed for XML parsing but
illegal for the script parser. The human mind has no borders :-) Let's use
JavaScript comments atop the CDATA wrapper:

This way the XML parser will see a <script> node with text content //
(two forward slashes), which is OK in any circumstances. At the same time
the script parser will not see the CDATA tags because they are hidden
behind script comments.