
text/html for xml extensions of XHTML


William F. Hammond

Apr 30, 2001, 4:59:59 PM
to www-...@w3.org, mozilla...@mozilla.org
XHTML may be served through http as "text/html" according to the XHTML
specification

http://www.w3.org/TR/2000/REC-xhtml1-20000126

if it conforms to Appendix C on compatibility with older user agents,
as provided in section 5.1.

Section 3.1.2 explains how one may use namespaces to enrich the tagset
in XHTML. I see no conflict between section 3.1 and Appendix C except
possibly in regard to C.11 (DOM handling). Clearly, DOM handling must
be subordinate to the question of whether to call the XML parser, and if
C.11 was intended to be definitive on the content type issue, then it
is in conflict with section 5, and the document stands with an error.

For a user agent, such as Mozilla (http://www.mozilla.org/), that
houses an xml parser, there is the general question of when that parser
should be called for an object served through http, and there is
current controversy needing resolution over the question of whether
the http content type is the correct basis for that decision.

Some assert that any XHTML document with namespace extensions must
be served as "text/xml" and must not be served as "text/html".

This issue was last discussed here in February. See

http://lists.w3.org/Archives/Public/www-talk/2001JanFeb/0079.html
http://lists.w3.org/Archives/Public/www-talk/2001JanFeb/0080.html

A user agent with an xml parser need look no further than the first
instance tag. Thus, a user agent with an xml parser should call that
parser if any of the following is true:

1. The instance is served through http as "text/xml".

2. The instance is served through http as "text/html" and any of
the following is true:

a. The instance begins with the string "<?xml" .

b. The instance has a string matching the case-sensitive pattern
"<!DOCTYPE html PUBLIC .*XHTML" before the first document
instance tag.

c. The first document instance tag is an open tag for the element
"html" (all lower case) with a value specified for the attribute
"xmlns".

See also:

Connolly and Masinter, RFC 2854: "The 'text/html' Media Type"
http://www.ietf.org/rfc/rfc2854.txt

I note that the proposed recommendation

http://www.w3.org/TR/2001/PR-xhtml11-20010406

(review ends on May 7) does not mention content-type.
(And content-type is not really a markup issue.)

Now in the mozilla-mathml discussion we are told that there have been
recent further deliberations on this question at W3C.

Can anyone report definitively?

Thanks.

-- Bill


William F. Hammond Dept. of Mathematics & Statistics
518-442-4625 The University at Albany
ham...@math.albany.edu Albany, NY 12222 (U.S.A.)
http://www.albany.edu/~hammond/ Dept. FAX: 518-442-4731

Never trust an SGML/XML vendor whose web page is not valid HTML.
And always support affirmative action on behalf of the finite places.


Roger B. Sidje

Apr 30, 2001, 5:27:34 PM
to William F. Hammond, www-...@w3.org, mozilla...@mozilla.org
Currently, W3C has a very clear position on this. One of the items in
its "Common User Agent Problems" is that a valid MIME type should take
precedence over anything else (no sniffing of the content, etc), i.e.,
A document served as text/xml -> xml parser
A document served as text/html -> html parser
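
In code, that position amounts to a lookup on the declared type alone
(a sketch in Python; the parser names are placeholders, not Mozilla
internals):

def pick_parser(content_type):
    # Strict behaviour per the CUAP note: the declared MIME type
    # decides, with no sniffing of the content at all.
    parsers = {"text/xml": "xml-parser", "text/html": "html-parser"}
    return parsers.get(content_type)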

This is recorded in Mozilla's bug tracking system:
W3C CUAP: Respect Content-Type HTTP header
http://bugzilla.mozilla.org/show_bug.cgi?id=68421

With such a clear position, there hasn't been much motivation to look at
the bug that I filed, requesting use of the file extension as a clue (it
is also easier to implement).
http://bugzilla.mozilla.org/show_bug.cgi?id=67646
---
RBS

Ian Hickson

Apr 30, 2001, 8:09:53 PM
to William F. Hammond, www-...@w3.org, mozilla...@mozilla.org
On Mon, 30 Apr 2001, William F. Hammond wrote:
>
> XHTML may be served through http as "text/html" according to the XHTML
> specification
>
> http://www.w3.org/TR/2000/REC-xhtml1-20000126
>
> if it conforms to Appendix C on compatibility with older user agents,
> as provided in section 5.1.

The reasoning behind this, as I understand it, is that markup sent as
text/html should be renderable on older user agents. This, in my opinion
of course, implies that anything that is _not_ compatible with older user
agents should _not_ be sent as text/html. This would, I contend, include
markup from other namespaces.


> For a user agent, such as Mozilla (http://www.mozilla.org/), that
> houses an xml parser [...]

Other examples being IE or Opera, of course.


> Some assert that any XHTML document with namespace extensions must
> be served as "text/xml" and must not be served as "text/html".

That viewpoint is consistent with the statement in section 5.1 of XHTML
1.0, which only says that XHTML documents which follow guidelines intended
to be backwards compatible may be sent as text/html. Since the use of
XML namespaces is not compatible with older user agents, a logical
assumption is that documents using such extensions should not be sent as
text/html, although the spec makes no recommendation about MIME labeling
of other XHTML documents.

Furthermore, if the content is not compatible with older user agents,
there is no reason to _want_ to send the objects as text/html. Conforming
XML parsers that handle XHTML with content from other namespaces will
handle it when sent as text/xml.


> A user agent with an xml parser need look no further than the first
> instance tag.

Which of course might come after any number of strings that appear
to be instance tags hidden in comments, processing instructions,
and internal subsets, all of which, in browsers wishing to support
existing text/html tag soup, should be ignored.


> Thus, a user agent with an xml parser should call that parser if any
> of the following is true:
>
> 1. The instance is served through http as "text/xml".

Agreed.


> 2. The instance is served through http as "text/html" and any of
> the following is true:
>
> a. The instance begins with the string "<?xml" .

Nope. Here is a document that is valid text/html, but non-well-formed
text/xml, and which should therefore be sent through the HTML parser:

<?xml this is not?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">
<!-- -- -->
This is a comment. This document is not XHTML.
<html xmlns="http://www.w3.org/1999/xhtml"/>
Ok, I'm done now. -->
<html>
<title> Need a title in HTML! </title>
<p> This is a valid HTML document.
</html>

See:
http://www.damowmow.com/mozilla/html-not-xml.html (the document)
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.damowmow.com%2Fmozilla%2Fhtml-not-xml.html&doctype=Inline

Note that Mozilla renders this document correctly.


> b. The instance has a string matching the case-sensitive pattern
> "<!DOCTYPE html PUBLIC .*XHTML" before the first document
> instance tag.

Hmm, the valid HTML document above also matches that string.


> c. The first document instance tag is an open tag for the element
> "html" (all lower case) with a value specified for the attribute
> "xmlns".

How do you know it is the first instance tag without having a full XML
parser to skip past PIs, comments, internal subsets, and the like?

--
Ian Hickson )\ _. - ._.) fL
Invited Expert, CSS Working Group /. `- ' ( `--'
The views expressed in this message are strictly `- , ) - > ) \
personal and not those of Netscape or Mozilla. ________ (.' \) (.' -' ______


Karl Ove Hufthammer

May 1, 2001, 2:33:16 AM
ham...@csc.albany.edu (William F. Hammond) wrote in message
<200104302059...@pluto.math.albany.edu>:

> XHTML may be served through http as "text/html" according to the
> XHTML specification
>
> http://www.w3.org/TR/2000/REC-xhtml1-20000126

<URL: http://lists.w3.org/Archives/Public/www-html/2001Apr/0118.html >:

Abstract

This document defines the 'application/xhtml+xml' MIME media type
for XHTML based markup languages; it is not intended to obsolete
any previous IETF documents, in particular RFC 2854 which registers
'text/html'.

This document was prepared by members of the W3C HTML working group
based on the structure, and some of the content, of RFC 2854, the
registration of 'text/html'. Please send comments to
www-...@w3.org, a public mailing list (requiring subscription)
with archives at <http://lists.w3.org/Archives/Public/www-html/>.

--
Karl Ove Hufthammer

William F. Hammond

May 1, 2001, 8:13:21 AM
to r...@maths.uq.edu.au, mozilla...@mozilla.org, www-...@w3.org
Roger B. Sidje <r...@maths.uq.edu.au> writes:

> Currently, W3C has a very clear position on this. One of the items in
> its "Common User Agent Problems" is that a valid MIME type should take

You mean http://www.w3.org/TR/2001/NOTE-cuap-20010206, a *note* ??

> precedence over anything else (no sniffing of the content, etc), i.e.,
> A document served as text/xml -> xml parser
> A document served as text/html -> html parser

The name of the XML version of html is "html". The issue here is what
is the definition of "text/html". The CUAP document does not address
that.

In the larger scheme of things the difference between html as xml
and classical html is small. It's largely a technical difference,
with the xml version being easier to render than the classical one.

"text/html" is not just another content-type because, as the web has
evolved, it is the most important of a very small number of content-types
that user agents have always handled internally.

What is at stake is whether the xml version of html is going to gain
acceptance as the lingua franca of the web and, on top of that, whether
or not name space extensions of html are going to be accepted under
that roof.

In my opinion a user agent with an xml parser looks very bad if it rolls
over the top of a document containing the lines

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">

without using its xml parser.

These lines are taken from http://www.w3.org/ , which is served with
content-type "text/html".

-- Bill


William F. Hammond

May 1, 2001, 12:14:59 PM
to mozilla...@mozilla.org, www-...@w3.org
Ian Hickson <i...@hixie.ch> writes:

> > 2. The instance is served through http as "text/html" and any of
> > the following is true:
> >
> > a. The instance begins with the string "<?xml" .
>
> Nope. Here is a document that is valid text/html, but non-well-formed
> text/xml, and which should therefore be sent through the HTML parser:

SGML validation does not pass on the merits of PI's. In today's world
the appearance of "<?xml " at the beginning of a text/html item
clearly indicates XML. Since the xml PI is present, I think that a
sane xml-aware user agent should discard this example since it is not
conforming xml even though it might validate (perhaps, however, not
without warnings) as sgml.

> <?xml this is not?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">
> <!-- -- -->
> This is a comment. This document is not XHTML.
> <html xmlns="http://www.w3.org/1999/xhtml"/>
> Ok, I'm done now. -->
> <html>
> <title> Need a title in HTML! </title>
> <p> This is a valid HTML document.
> </html>

[snip]


> > b. The instance has a string matching the case-sensitive pattern
> > "<!DOCTYPE html PUBLIC .*XHTML" before the first document
> > instance tag.
>
> Hmm, the valid HTML document above also matches that string.

Well, yes, if you look beyond the end of the "<!DOCTYPE ...>". My
intention was that the string "XHTML" should be inside the value of
the FPI, and perhaps the string should be "DTD XHTML".

For the moment I don't know exactly how I would express it. Still I
think that an xml capable user agent will look bad rolling past a
correct document type declaration for XHTML.

> > c. The first document instance tag is an open tag for the element
> > "html" (all lower case) with a value specified for the attribute
> > "xmlns".
>
> How do you know it is the first instance tag without having a full XML
> parser to skip past PIs, comments, internal subsets, and the like?

Surely a user agent in classical mode has a way of knowing what is a
tag and what is not a tag.

Since many user agents appear to ignore PI's and document type
declarations and many extant html offerings do not have document type
declarations, (c) might reasonably be the sole criterion for calling
the xml parser.

Nonetheless a new user agent should be able to handle (a) and (b).

But does Mozilla call its xml parser for http://www.w3.org/ ?

-- Bill


Ian Hickson

May 1, 2001, 7:10:40 PM
to William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
On Tue, 1 May 2001, William F. Hammond wrote:
>>>
>>> 2. The instance is served through http as "text/html" and any of
>>> the following is true:
>>>
>>> a. The instance begins with the string "<?xml" .
>>
>> Nope. Here is a document that is valid text/html, but
>> non-well-formed text/xml, and which should therefore be sent
>> through the HTML parser:
>
> SGML validation does not pass on the merits of PI's. In today's
> world the appearance of "<?xml " at the beginning of a text/html
> item clearly indicates XML.

Remember that the XML declaration is optional, and that giving the XML
declaration is discouraged by the XHTML compatibility guidelines (see
section C.1), which are supposed to be followed in order to send XHTML
as text/html.

If you are willing to use the XML declaration as a signal to use XML,
you might as well use text/xml since it's not going to be compatible
with older browsers anyway.


>> <?xml this is not?>
>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">
>> <!-- -- -->
>> This is a comment. This document is not XHTML.
>> <html xmlns="http://www.w3.org/1999/xhtml"/>
>> Ok, I'm done now. -->
>> <html>
>> <title> Need a title in HTML! </title>
>> <p> This is a valid HTML document.
>> </html>
>>

>>> b. The instance has a string matching the case-sensitive
>>> pattern "<!DOCTYPE html PUBLIC .*XHTML" before the first
>>> document instance tag.
>> Hmm, the valid HTML document above also matches that string.
>
> Well, yes, if you look beyond the end of the "<!DOCTYPE ...>". My
> intention was that the string "XHTML" should be inside the value of
> the FPI, and perhaps the string should be "DTD XHTML".
>
> For the moment I don't know exactly how I would express it. Still I
> think that an xml capable user agent will look bad rolling past a
> correct document type declaration for XHTML.

The moment you get more complicated than "look for a pattern at the
start of the document" you end up having to write a fully fledged
parser. Extreme case in point:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN"
[ <!-- SYSTEM "not XHTML" --> ]>


>>> c. The first document instance tag is an open tag for the element
>>> "html" (all lower case) with a value specified for the attribute
>>> "xmlns".
>>
>> How do you know it is the first instance tag without having a full
>> XML parser to skip past PIs, comments, internal subsets, and the
>> like?
>
> Surely a user agent in classical mode has a way of knowing what is a
> tag and what is not a tag.

By the time the classical parser has been invoked, it is too late to
back off and switch to an XML parser without a significant performance
hit. (This is definitely the case in Mozilla's architecture; I imagine
it is similar in other browsers that use distinct XML and HTML parsers
although of course maybe I am wrong in this.)


> Since many user agents appear to ignore PI's and document type
> declarations and many extant html offerings do not have document
> type declarations, (c) might reasonably be the sole criterion for
> calling the xml parser.

(c) is the most complicated to implement of the three.


> But does Mozilla call its xml parser for http://www.w3.org/ ?

Nope. If it did, it would render the page without any expanded
character entity references, since Mozilla is not a validating parser
and thus skips parsing the DTD and thus doesn't know what &nbsp;,
&middot; and &copy; are. Not to mention that it would end up ignoring
the print-media specific section of the stylesheet, which uses
uppercase element names and thus wouldn't match any of the lower case
elements (line 138 of the first stylesheet), and it would use an
unexpected background colour for the page because the stylesheet sets
the background on <body> and not <html>, which in XHTML will result in
a different rendering to the equivalent in HTML4 (same sheet, line 5).


Remind me, why would you want to send XML as text/html?

Robert Miner

May 1, 2001, 7:39:46 PM
to i...@hixie.ch, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org

Hi Ian,

> Remind me, why would you want to send XML as text/html?

This is the crucial point. No one wants to send XML as
text/html. The problem is that due to circumstances beyond their
control, large numbers of authors will end up sending XML as
text/html for a long time to come.

The main reasons for this are:

1) improperly configured web servers at ISPs
2) old tools
3) ignorance/inconvenience

No matter how strong the technical and theoretical arguments in favor
of the strict use of MIME types, this fact is a major barrier to
adoption. I understand perfectly why W3C has to take that position,
and why you want to adhere strictly to W3C Recs. But I still need a
solution for our customers.

Let's try a completely different tack. What about a completely ad hoc
interim fix, such as permitting an optional declaration of the form

<!-- mozilla-mime-preference: text/xml -->

as the absolute first thing in a file to trigger the XML parser?
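
A sketch of the proposed check (illustrative Python; the magic comment
is only the proposal above, not an agreed mechanism):

MAGIC = "<!-- mozilla-mime-preference: text/xml -->"

def wants_xml_parser(body):
    # Honour the override only as the absolute first thing in the
    # file, so the test is a single string comparison -- no parsing.
    return body.startswith(MAGIC)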

If you don't like that, can you think of some other creative
workaround? I'm not an XML guru, so there are probably better ways to
accomplish the idea.

--Robert

------------------------------------------------------------------
Robert Miner Rob...@dessci.com
MathML 2.0 Specification Co-editor 651-223-2883
Design Science, Inc. "How Science Communicates" www.dessci.com
------------------------------------------------------------------

Ian Hickson

May 1, 2001, 7:58:44 PM
to Robert Miner, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org
On Tue, 1 May 2001, Robert Miner wrote:
>
> Hi Ian,
>
>> Remind me, why would you want to send XML as text/html?
>
> This is the crucial point. No one wants to send XML as
> text/html.

If that is really the case, then I am very glad to hear it. It sure sounds
like people actually want to, though.


> The problem is that due to circumstances beyond their control, large
> numbers of authors will end up sending XML as text/html for a long
> time to come.
>
> The main reasons for this are:
>
> 1) improperly configured web servers at ISPs

Configuring files with an "xml" extension to be sent as "text/xml" takes
one line in a multi-site configuration file.
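
With Apache, for example, that one line would be an AddType directive
(assuming an Apache server; other servers have equivalent settings):

AddType text/xml .xml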

Is it really worth the effort of making the change in user agents rather
than politely asking one's ISP to fix the configuration, or taking one's
business elsewhere?

How many servers still don't come preconfigured with "text/xml" as the
MIME type of ".xml" files? Are there any?


> 2) old tools

Which "old tools"? Tools won't just "work" with XML if it is sent as HTML.


> 3) ignorance

Users don't need to know about this, so long as XML editors output files
with "xml" extensions and web servers are configured to return these files
as "text/xml".

I don't hear anyone screaming for XSL to be sent as text/html, or CSS to
be sent as text/html -- why is XHTML a special case?


> /inconvenience

I cannot believe that renaming a file to ".xml" instead of ".html" when
the file contains XML instead of HTML is any more of an effort than
changing from outputting HTML to outputting XML in the first place.


> No matter how strong the technical and theoretical arguments in favor
> of the strict use of MIME types, this fact is a major barrier to
> adoption.

It doesn't seem to have stopped the adoption of CSS.


> I understand perfectly why W3C has to take that position, and why you
> want to adhere strictly to W3C Recs. But I still need a solution for
> our customers.

"Fix the webserver"?

Why is it easier/cheaper/safer to change the user agent and propagate it
to millions of users than it is to change a single line in a server
configuration file?


> Let's try a completely different tack. What about a completely ad hoc
> interim fix, such as permitting an optional declaration of the form
>
> <!-- mozilla-mime-preference: text/xml -->
>
> as the absolute first thing in a file to trigger the XML parser?

Hmm.

<!-- -moz-content-type-override: text/xml -->

I wouldn't be totally averse to that, although I still don't believe it is
easier to change every XHTML file than it is to fix the servers. Also, I
don't see how it will help the users of other compliant web browsers.

Robert Miner

May 1, 2001, 9:15:59 PM
to i...@hixie.ch, Rob...@dessci.com, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org

Hi Ian,

> <!-- -moz-content-type-override: text/xml -->
>
> I wouldn't be totally averse to that, although I still don't believe it is
> easier to change every XHTML file than it is to fix the servers.

I agree with your point about fixing servers to a certain extent.
However, we have a lot of teachers as customers, and they often tend
not to be very technically empowered. They don't know about things
like MIME types, and even if they did, they don't have time to fool with
it. They like things to work out of the box.

I would be interested in hearing other reactions to this as an interim
work around.

Aaron Swartz

May 1, 2001, 10:59:47 PM
to Ian Hickson, William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
Ian Hickson <i...@hixie.ch> wrote:

>> But does Mozilla call its xml parser for http://www.w3.org/ ?
>
> Nope. If it did, ...


I think the following are indicative of bugs in Mozilla and W3.org. I think
others would agree with my belief that we should be less forgiving of
mistakes in XHTML documents than we have been with HTML documents. It is my
hope that browsers that understand XHTML will do their best to inform users
when they encounter an invalid or broken page.

See also the recent ALA article, "Forgiving Browsers Considered Harmful":
http://www.alistapart.com/stories/forgiving/

> it would render the page without any expanded
> character entity references, since Mozilla is not a validating parser
> and thus skips parsing the DTD and thus doesn't know what &nbsp;,
> &middot; and &copy; are.

Mozilla's XML parser should be smart enough to recognize the HTML DTDs and
thus expand these entities properly, even if it doesn't validate the page
(which I believe it should).

> Not to mention that it would end up ignoring
> the print-media specific section of the stylesheet, which uses
> uppercase element names and thus wouldn't match any of the lower case
> elements (line 138 of the first stylesheet),

This appears to be a mistake in the W3C's stylesheet. I have sent them an
email.

> and it would use an
> unexpected background colour for the page because the stylesheet sets
> the background on <body> and not <html>, which in XHTML will result in
> a different rendering to the equivalent in HTML4 (same sheet, line 5).

I have not heard of this change before. Can you point me to the section of
the XHTML spec that defines this?

--
[ Aaron Swartz | m...@aaronsw.com | http://www.aaronsw.com ]


Aaron Swartz

May 1, 2001, 11:03:40 PM
to Ian Hickson, Robert Miner, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org
Ian Hickson <i...@hixie.ch> wrote:

> I don't hear anyone screaming for XSL to be sent as text/html, or CSS to
> be sent as text/html -- why is XHTML a special case?

Because XHTML is usually (always?) valid HTML. And most XHTML publishers
want their documents to be visible in the vast number of browsers which do
not understand XML. We cannot change these older browsers, but it should be
relatively easy to change browsers that do understand XML.

Ian Hickson

May 2, 2001, 4:47:17 AM
to Aaron Swartz, Robert Miner, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org
On Tue, 1 May 2001, Aaron Swartz wrote:
>
> Ian Hickson <i...@hixie.ch> wrote:
>
> > I don't hear anyone screaming for XSL to be sent as text/html, or CSS to
> > be sent as text/html -- why is XHTML a special case?
>
> Because XHTML is usually (always?) valid HTML.

Nope, XHTML is never valid HTML. One requires an "xmlns" attribute on the
root element, the other forbids it (for just one of the many differences).


> And most XHTML publishers want their documents to be visible in the
> vast number of browsers which do not understand XML. We cannot change
> these older browsers, but it should be relatively easy to change
> browsers that do understand XML.

But you can already do that. That's no problem, and works fine with
sending all text/html through HTML parsers. What people are arguing is
that text/html should be sent through _XML_ parsers on modern UAs, so that
namespaces can be processed... which immediately means that the document
would not work on older browsers, so the argument falls apart.

Ian Hickson

May 2, 2001, 4:51:08 AM
to Aaron Swartz, William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
On Tue, 1 May 2001, Aaron Swartz wrote:
>
>> it would render the page without any expanded
>> character entity references, since Mozilla is not a validating parser
>> and thus skips parsing the DTD and thus doesn't know what &nbsp;,
>> &middot; and &copy; are.
>
> Mozilla's XML parser should be smart enough to recognize the HTML DTDs and
> thus expand these entities properly, even if it doesn't validate the page
> (which I believe it should).

(If it did, you couldn't arbitrarily use namespaces.)


>> and it would use an
>> unexpected background colour for the page because the stylesheet sets
>> the background on <body> and not <html>, which in XHTML will result in
>> a different rendering to the equivalent in HTML4 (same sheet, line 5).
>
> I have not heard of this change before. Can you point me to the section of
> the XHTML spec that defines this?

The HTML WG have asked the CSS WG to not extend CSS2 section 14.2 [1]
paragraph 4 to cover XHTML.

[1] http://www.w3.org/TR/REC-CSS2/colors.html#q2

William F. Hammond

May 2, 2001, 9:29:05 AM
to mozilla...@mozilla.org, www-...@w3.org
Ian Hickson <i...@hixie.ch> writes:

> Remember that the XML declaration is optional, and that giving the XML
> declaration is discouraged by the XHTML compatability guidelines (see
> section C.1), which are supposed to be followed in order to send XHTML
> as text/html.

> If you are willing to use the XML declaration as a signal to use XML,
> you might as well use text/xml since it's not going to be compatible
> with older browsers anyway.

Appendix C is addressed to content providers. It's not something for
xml-capable user agents to hide behind.

A content provider ought to be free to make the call whether he/she/it
wants to use "<?xml ..." at the top of a document.

> >>> b. The instance has a string matching the case-sensitive
> >>> pattern "<!DOCTYPE html PUBLIC .*XHTML" before the first
> >>> document instance tag.
> >> Hmm, the valid HTML document above also matches that string.
> >
> > Well, yes, if you look beyond the end of the "<!DOCTYPE ...>". My
> > intention was that the string "XHTML" should be inside the value of
> > the FPI, and perhaps the string should be "DTD XHTML".
> >
> > For the moment I don't know exactly how I would express it. Still I
> > think that an xml capable user agent will look bad rolling past a
> > correct document type declaration for XHTML.
>
> The moment you get more complicated than "look for a pattern at the
> start of the document" you end up having to write a fully fledged
> parser. Extreme case in point:
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN"
> [ <!-- SYSTEM "not XHTML" --> ]>

Alas, email discussions do not always have the precision required of
code.

Here we have a document type declaration for the name "html" with a
formal public identifier that matches the pattern "DTD HTML" rather
than the pattern "DTD XHTML". So that is the decision point.

Let's try again:

-----
An item served as "text/html" should be handled as an XML version of
"html" if

a. It begins, apart from white space, with the string "<?xml " .
(See comment on this in regard to Appendix C of the XHTML spec
above.)

OR

b. The document begins with zero or more comments and processing
instructions that conform to the XML specification followed by
one of the following:

(i) A document type declaration not containing internal comments
with formal public identifier matching the pattern
"DTD XHTML".

(ii) An open tag for the element "html" with a value specification
for the attribute "xmlns".

-----

Rolling through any initial comments and PI's conforming to the xml
standard should be quick, but, if there is still concern about
performance, perhaps these could be banned when XHTML is served as
"text/html".

> > But does Mozilla call its xml parser for http://www.w3.org/ ?
>
> Nope. If it did, it would render the page without any expanded
> character entity references, since Mozilla is not a validating parser
> and thus skips parsing the DTD and thus doesn't know what &nbsp;,
> &middot; and &copy; are. Not to mention that it would end up ignoring

So perhaps there are other issues.

But let's not allow the other issues to distract us from the facts
that

1. the resource identified by "http://www.w3.org/" is XHTML.

2. XHTML is current html.

-- Bill


David Carlisle

May 2, 2001, 9:57:39 AM
to ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org

> a. It begins, apart from white space, with the string "<?xml " .

white space is not allowed before the XML declaration.
If you were going to do this testing I'd _only_ test for
that and only test the first few characters.

The comment that <?xml upsets some HTML browsers isn't really
applicable if it is being added specifically to flag that the document
needs an xml parser.

It would be nice if Mozilla could do this, although I agree that it
would be nicer still if IE would recognise XHTML files served as XML.

David


Simon St.Laurent

May 2, 2001, 9:58:49 AM
to Aaron Swartz, Ian Hickson, William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
At 09:59 PM 5/1/01 -0500, Aaron Swartz wrote:
>Mozilla's XML parser should be smart enough to recognize the HTML DTDs and
>thus expand these entities properly, even if it doesn't validate the page
>(which I believe it should).

I thought Mozilla carried a local list of entities that it applied to HTML
based on the PUBLIC identifier from the DOCTYPE declaration. I could be
totally wrong, but that's what I understood last summer while working on
the Mozilla/XML articles for XML.com.

On the validation side, I think a lot of people are happy to acknowledge
that validation is grossly overrated, especially when presented as a task
that should be performed repeatedly at every step in a delivery
process. Given Mozilla's use of the non-validating (and totally
legitimate) expat parser as a foundation, demanding validation sounds like
a non-starter.

Next they'll want everyone to validate against W3C XML Schema. Bah.


Simon St.Laurent - Associate Editor, O'Reilly & Associates
XML Elements of Style / XML: A Primer, 2nd Ed.
XHTML: Migrating Toward XML
http://www.simonstl.com - XML essays and books


Aaron Swartz

May 2, 2001, 10:19:09 AM
to Ian Hickson, William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
Ian Hickson <i...@hixie.ch> wrote:

>>> it would render the page without any expanded
>>> character entity references, since Mozilla is not a validating parser
>>> and thus skips parsing the DTD and thus doesn't know what &nbsp;,
>>> &middot; and &copy; are.
>> Mozilla's XML parser should be smart enough to recognize the HTML DTDs and
>> thus expand these entities properly, even if it doesn't validate the page
>> (which I believe it should).
> (If it did, you couldn't arbitrarily use namespaces.)

I don't know if this is possible with Mozilla's current technology, but
ideally it would validate the XHTML only within the HTML namespace (and the
attribute space).

>>> and it would use an
>>> unexpected background colour for the page because the stylesheet sets
>>> the background on <body> and not <html>, which in XHTML will result in
>>> a different rendering to the equivalent in HTML4 (same sheet, line 5).
>> I have not heard of this change before. Can you point me to the section of
>> the XHTML spec that defines this?
> The HTML WG have asked the CSS WG to not extend CSS2 section 14.2 [1]
> paragraph 4 to cover XHTML.
>
> [1] http://www.w3.org/TR/REC-CSS2/colors.html#q2

This seems very strange -- can you elaborate on the reasoning for this or is
it Member-Confidential?

Aaron Swartz

May 2, 2001, 10:19:09 AM
to Ian Hickson, mozilla...@mozilla.org, www-...@w3.org
Ian Hickson <i...@hixie.ch> wrote:

>> And most XHTML publishers want their documents to be visible in the
>> vast number of browsers which do not understand XML. We cannot change
>> these older browsers, but it should be relatively easy to change
>> browsers that do understand XML.
> But you can already do that. That's no problem, and works fine with
> sending all text/html through HTML parsers. What people are arguing is
> that text/html should be sent through _XML_ parsers on modern UAs, so that
> namespaces can be processed... which immediately means that the document
> would not work on older browsers, so the argument falls apart.

Sorry if my wording wasn't clear, but that's what I meant. XHTML is XML, so
it should be parsed as such by browsers that understand that. Such browsers
should complain when a document isn't well-formed, or when the
HTML-namespaced portions are invalid.

What I don't understand is why this wouldn't work on older browsers. Most of
the browsers I have seen are rather lenient and process a document
containing (sorry, don't know MathML):

<p>The n<m:sub>x</m:sub>n<m:power>2</m:power>...</p>

just fine. Sure, the math would come out wrong, but one can imagine a
mediator[1] parsing the MathML and replacing it with images, or someone
reading the majority of the document and ignoring the mathematical formulas.
Also, sense could be made of such a document by using XSL or CSS.

I think that as more namespaces are used in XHTML, this kind of thing should
be encouraged.

[1] http://lfw.org/

Robert Miner

May 2, 2001, 10:31:50 AM
to i...@hixie.ch, asw...@swartzfam.com, Rob...@dessci.com, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org

Hi.

> But you can already do that. That's no problem, and works fine with
> sending all text/html through HTML parsers. What people are arguing is
> that text/html should be sent through _XML_ parsers on modern UAs, so that
> namespaces can be processed... which immediately means that the document
> would not work on older browsers, so the argument falls apart.

I presume you already know I disagree with this, but I feel obliged to
correct this misapprehension at every instance. I don't want to send
HTML to XML parsers -- I only want XML ***incorrectly identified with
the MIME type text/html due to circumstances beyond the author's
control*** to be sent to XML parsers.

Robert Miner

May 2, 2001, 11:53:35 AM
to dav...@nag.co.uk, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org

Hi David,

> white space is not allowed before the XML declaration.
> If you were going to do this testing I'd _only_ test for
> that and only test the first few characters.

Is anything allowed before the XML declaration? I am pursuing the
idea of a "magic comment" or something of that sort that would say "In
a perfect world I would have been shipped as text/xml. Please help
me!"

> The comment that <?xml upsets some HTML browsers isn't really
> applicable if it is being added specifically to flag the document needs
> an xml parser.
>
> It would be nice if Mozilla could do this, although I agree that it
> would be nicer still if IE would recognise XHTML files served as XML.

I argue that both things should happen to pave the way for authors to
transition to using predominantly XHTML. Obviously IE needs to fix
the problem, or its hopeless. But assuming they do, lots of authors
will still have trouble because

1) their documents get shipped as text/html due to circumstances
beyond their control

2) they obstinately send their documents as text/html in order to get
them to at least sort of render in old browsers.

To address this category of problems, I think something an author
could put into the actual document to tell a new user agent to use the
text/xml MIME type would be very nice.

David Carlisle

May 2, 2001, 12:05:16 PM
to Rob...@dessci.com, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org

> Is anything allowed before the XML declaration?

No.

The idea is that the parser can auto-detect the encoding by looking at
the first few bytes and seeing if those equate to <?xml in any known
encoding; there's an appendix in the xml rec with all the gory details.
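
A sketch of that byte-level check (the signature table here is
abbreviated; the appendix lists more cases, including the EBCDIC
ones):

# First bytes that spell out the start of an XML declaration, with or
# without a byte order mark, in a few common encodings.
XML_SIGNATURES = [
    (b"\xef\xbb\xbf<?xml", "utf-8 with a byte order mark"),
    (b"\x00<\x00?\x00x", "utf-16, big-endian"),
    (b"<\x00?\x00x\x00", "utf-16, little-endian"),
    (b"<?xml", "utf-8 or another ascii-compatible encoding"),
]

def sniff_xml_encoding(first_bytes):
    for signature, encoding in XML_SIGNATURES:
        if first_bytes.startswith(signature):
            return encoding
    return None  # no XML declaration we recognise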

> In a perfect world I would have been shipped as text/xml. Please help
> me!"

The comment would have to go after the xmldec (if there was a
declaration). I think, though, that the xml declaration on its own ought
to suffice.

While it _is_ a legal PI for SGML, and so for HTML, there cannot be any
real files in existence that start <?xml version="1.0" and aren't trying
to be XML.

Without knowing anything of the internals of mozilla, it's hard to
believe that there is a really big performance hit to back out of
HTML parsing if the first four characters in a file are <?xml
(this is a lot easier to do than looking for regexps in doctypes,
as others have suggested), although it's not entirely trivial due to
encoding considerations.

Karl Ove Hufthammer

May 2, 2001, 12:43:01 PM
i...@hixie.ch (Ian Hickson) wrote in message
<Pine.WNT.4.31.01050...@HIXIE.netscape.com>:

> Configuring files with an "xml" extension to be sent as "text/xml"
> takes one line in a multi-site configuration file.

But it won't work in any browsers[1]. IE will probably show the XML
source (with fancy indentation and such), and other browsers will prompt
the user to save the file to disk ("unknown content type"). That's why we
have to use 'text/html' for XHTML documents.

[1] With a few exceptions.

--
Karl Ove Hufthammer

Ian Hickson

May 2, 2001, 7:51:07 PM
to Robert Miner, asw...@swartzfam.com, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org
On Wed, 2 May 2001, Robert Miner wrote:
>
> Hi.
>
> > But you can already do that. That's no problem, and works fine with
> > sending all text/html through HTML parsers. What people are arguing is
> > that text/html should be sent through _XML_ parsers on modern UAs, so that
> > namespaces can be processed... which immediately means that the document
> > would not work on older browsers, so the argument falls apart.
>
> I presume you already know I disagree with this, but I feel obliged to
> correct this misapprehension at every instance. I don't want to send
> HTML to XML parsers -- I only want XML ***incorrectly identified with
> the MIME type text/html due to circumstances beyond the author's
> control*** to be sent to XML parsers.

Would you also like PNGs incorrectly identified with the MIME type
text/html due to circumstances beyond the author's control to be sent to
PNG decoders?

Ian Hickson

May 2, 2001, 7:54:49 PM
to Aaron Swartz, William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
On Wed, 2 May 2001, Aaron Swartz wrote:
>
> Ian Hickson <i...@hixie.ch> wrote:
>
>>>> it would render the page without any expanded
>>>> character entity references, since Mozilla is not a validating parser
>>>> and thus skips parsing the DTD and thus doesn't know what &nbsp;,
>>>> &middot; and &copy; are.
>>> Mozilla's XML parser should be smart enough to recognize the HTML DTDs and
>>> thus expand these entities properly, even if it doesn't validate the page
>>> (which I believe it should).
>> (If it did, you couldn't arbitrarily use namespaces.)
>
> I don't know if this is possible with Mozilla's current technology, but
> ideally it would validate the XHTML only within the HTML namespace (and the
> attribute space).

Forget Mozilla's current technology -- that's not even possible within the
W3C's technology. XSchemas are supposed to be the way to do that.


>>>> and it would use an
>>>> unexpected background colour for the page because the stylesheet sets
>>>> the background on <body> and not <html>, which in XHTML will result in
>>>> a different rendering to the equivalent in HTML4 (same sheet, line 5).
>>> I have not heard of this change before. Can you point me to the section of
>>> the XHTML spec that defines this?
>> The HTML WG have asked the CSS WG to not extend CSS2 section 14.2 [1]
>> paragraph 4 to cover XHTML.
>>
>> [1] http://www.w3.org/TR/REC-CSS2/colors.html#q2
>
> This seems very strange -- can you elaborate on the reasoning for this or is
> it Member-Confidential?

I'll defer to Steven Pemberton and Chris Lilley, the chairs of the HTML
and CSS working groups respectively, for the exact reasoning -- as far as I
know, though, it's just that the HTML WG wish to remove any "special
casing" of HTML in other specs. Rightly so, IMHO. The body->html backwards
background propagation rule is extremely hard to get right (as far as I
know, no browsers do it right).

Ian Hickson

May 2, 2001, 8:06:02 PM
to William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
On Wed, 2 May 2001, William F. Hammond wrote:
>
> Let's try again:
>
> -----
> An item served as "text/html" should be handled as an XML version of
> "html" if
>
> a. It begins, apart from white space, with the string "<?xml " .
> (See comment on this in regard to Appendix C of the XHTML spec
> above.)

That's implementable (although I still disagree with doing it).


> OR
>
> b. The document begins with zero or more comments and processing
> instructions that conform to the XML specification [...]

That's *not*. Not without writing a complex parser. Don't forget that the
point in time where you want to know what to do with the data stream is
before you've launched any (heavyweight) parsers -- in the network
library, in the lightweight piece of code implementing the MIME type
distribution. If you can't write the autodetection in 4 lines or so, then
forget it.


> 2. XHTML is current html.

I don't understand what that means. XHTML isn't HTML 4.01, nor is it tag
soup, nor is XHTML + MathML compatible with tag soup.

--
Ian Hickson )\ _. - ._.) fL
Netscape, Standards Compliance QA /. `- ' ( `--'
+1 650 937 6593 `- , ) - > ) \
irc.mozilla.org:Hixie _________________________ (.' \) (.' -' __________


Aaron Swartz

May 2, 2001, 8:28:32 PM
to Ian Hickson, William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
Ian Hickson <i...@hixie.ch> wrote:

>>>> Mozilla's XML parser should be smart enough to recognize the HTML DTDs and
>>>> thus expand these entities properly, even if it doesn't validate the page
>>>> (which I believe it should).
>>> (If it did, you couldn't arbitrarily use namespaces.)
>> I don't know if this is possible with Mozilla's current technology, but
>> ideally it would validate the XHTML only within the HTML namespace (and the
>> attribute space).
> Forget Mozilla's current technology -- that's not even possible within the
> W3C's technology. XSchemas are supposed to be the way to do that.

And XSchemas are W3C technology, so what's the issue? (I use validate in a
broader sense than just DTDs to include anything that tells you when your
document is messed up.)

Ian Hickson

May 2, 2001, 8:47:17 PM
to Aaron Swartz, William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
On Wed, 2 May 2001, Aaron Swartz wrote:
>
> Ian Hickson <i...@hixie.ch> wrote:
>
> >>>> Mozilla's XML parser should be smart enough to recognize the HTML DTDs and
> >>>> thus expand these entities properly, even if it doesn't validate the page
> >>>> (which I believe it should).
> >>> (If it did, you couldn't arbitrarily use namespaces.)
> >> I don't know if this is possible with Mozilla's current technology, but
> >> ideally it would validate the XHTML only within the HTML namespace (and the
> >> attribute space).
> > Forget Mozilla's current technology -- that's not even possible within the
> > W3C's technology. XSchemas are supposed to be the way to do that.
>
> And XSchemas are W3C technology, so what's the issue.

Well the issue _was_ that XSchemas were not yet a REC, but as of today,
that's no longer an issue.

So...

Never mind. :-)

--
Ian Hickson )\ _. - ._.) fL

Robert Miner

May 2, 2001, 11:09:30 PM
to i...@hixie.ch, Rob...@dessci.com, asw...@swartzfam.com, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org

Hi.

> Would you also like PNGs incorrectly identified with the MIME type
> text/html due to circumstances beyond the authors control to be sent to
> PNG decoders?

Is this a trick question, perhaps? I think I would, wouldn't I? At
least it seems like I would be happier just having the image appear
properly, than having it interpreted as horribly garbled HTML. What's
the catch?

Ian Hickson

May 2, 2001, 11:31:39 PM
to Robert Miner, asw...@swartzfam.com, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org
On Wed, 2 May 2001, Robert Miner wrote:
>
> Hi.
>
>> Would you also like PNGs incorrectly identified with the MIME type
>> text/html due to circumstances beyond the authors control to be sent to
>> PNG decoders?
>
> Is this a trick question, perhaps? I think I would, wouldn't I? At
> least it seems like I would be happier just having the image appear
> properly, than having it interpreted as horribly garbled HTML. What's
> the catch?

Should we throw away the whole basis of MIME types and the HTTP
Content-Type header, and just use content sniffing instead?

text/html adj: not precisely limited, determined, or distinguished:
"a text/html file". Said of documents whose exact content type is
not known, for example a dynamically generated image. Historically
used to describe HTML; this usage is deprecated in favour of
requiring user agents to magically guess at the contents of data
streams labelled as text/html. [syn: vague, unknown] [ant: defined]
Source: WordNet 4.2, (c) 2012 Princeton University

Aaron Swartz

May 2, 2001, 11:59:41 PM
to Ian Hickson, Robert Miner, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org
Ian Hickson <i...@hixie.ch> wrote:

> text/html adj: not precisely limited, determined, or distinguished:
> "a text/html file". Said of documents whose exact content type is
> not known, for example a dynamically generated image. Historically
> used to describe HTML; this usage is deprecated in favour of
> requiring user agents to magically guess at the contents of data
> streams labelled as text/html. [syn: vague, unknown] [ant: defined]
> Source: WordNet 4.2, (c) 2012 Princeton University

Quite funny, Ian. However, I don't see what this has to do with the
substance of the argument.

I have an HTML document that is well-formed XML. I want it to be read by my
grandma who runs Netscape 3.0. I must send it as text/html so that she can
read it with Netscape's HTML parser. Netscape 7.0, which understands XML
just fine, realizes that my document is XML and thus parses it with its XML
parser. Everybody wins. Where is the issue, Ian?

Ian Hickson

May 3, 2001, 12:09:07 AM
to Aaron Swartz, Robert Miner, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org
On Wed, 2 May 2001, Aaron Swartz wrote:
>
> I have an HTML document that is well-formed XML. I want it to be read by my
> grandma who runs Netscape 3.0. I must send it as text/html so that she can
> read it with Netscape's HTML parser. Netscape 7.0, which understands XML
> just fine, realizes that my document is XML and thus parses it with its XML
> parser. Everybody wins. Where is the issue, Ian?

In the case you describe, you would not be able to tell the difference
between Netscape 7.0 handling the document as text/html, and Netscape 7.0
handling the document as text/xml.

So clearly that is not the case you care about.

Aaron Swartz

May 3, 2001, 12:22:58 AM
to Ian Hickson, Robert Miner, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org
Ian Hickson <i...@hixie.ch> wrote:

>> I have an HTML document that is well-formed XML. I want it to be read by my
>> grandma who runs Netscape 3.0. I must send it as text/html so that she can
>> read it with Netscape's HTML parser. Netscape 7.0, which understands XML
>> just fine, realizes that my document is XML and thus parses it with its XML
>> parser. Everybody wins. Where is the issue, Ian?
>
> In the case you describe, you would not be able to tell the difference
> between Netscape 7.0 handling the document as text/html, and Netscape 7.0
> handling the document as text/xml.

Then I think that Netscape 7.0 is broken, since it should throw an error if
my page is not well-formed XML.

> So clearly that is not the case you care about.

Actually it is. Others care about MathML, which I will also defend. Assuming
that I included MathML in this HTML document, would you have a problem with
the above scenario?

Ian Hickson

May 3, 2001, 12:40:24 AM
to Aaron Swartz, Robert Miner, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org
On Wed, 2 May 2001, Aaron Swartz wrote:
>>>
>>> I have an HTML document that is well-formed XML. I want it to be read by my
>>> grandma who runs Netscape 3.0. I must send it as text/html so that she can
>>> read it with Netscape's HTML parser. Netscape 7.0, which understands XML
>>> just fine, realizes that my document is XML and thus parses it with its XML
>>> parser. Everybody wins. Where is the issue, Ian?
>>
>> In the case you describe, you would not be able to tell the difference
>> between Netscape 7.0 handling the document as text/html, and Netscape 7.0
>> handling the document as text/xml.
>
> Then I think that Netscape 7.0 is broken, since it should throw an error if
> my page is not well-formed XML.

You *specifically* said that the document in question was well-formed:

>>> I have an HTML document that is well-formed XML.

Is this hypothetical document well-formed, or not?


>> So clearly that is not the case you care about.
>
> Actually it is. Others care about MathML, which I will also defend.
> Assuming that I included MathML in this HTML document, would you have
> a problem with the above scenario?

Yes; in my opinion there is very little point in sending MathML to
Netscape 3.0. If you want to support older UAs, then use <sup>, <sub> and
tables for the equations. Otherwise you will have dataloss -- your grandma
won't be able to tell what your equations are doing.

Dan Connolly

May 3, 2001, 5:00:46 AM
to Ian Hickson, William F. Hammond, mozilla...@mozilla.org, site-c...@w3.org, www-...@w3.org
Thanks for the detective work, Ian.

Susan/site-comments folks: I suggest not depending
on the DTD for our home page...

Ian Hickson wrote:
> > But does Mozilla call its xml parser for http://www.w3.org/ ?
>
> Nope. If it did, it would render the page without any expanded
> character entity references, since Mozilla is not a validating parser
> and thus skips parsing the DTD and thus doesn't know what &nbsp;,
> &middot; and &copy; are.

Right... let's use &#160; instead of &nbsp;. (tidy -n -ascii
will do this for free.)

> Not to mention that it would end up ignoring
> the print-media specific section of the stylesheet, which uses
> uppercase element names and thus wouldn't match any of the lower case
> elements (line 138 of the first stylesheet),

oops! fixed in 2.5 2001/05/03 08:51:29

> and it would use an
> unexpected background colour for the page because the stylesheet sets
> the background on <body> and not <html>, which in XHTML will result in
> a different rendering to the equivalent in HTML4 (same sheet, line 5).

fixed.

$Id: home.css,v 2.6 2001/05/03 08:55:08 connolly Exp $

> Remind me, why would you want to send XML as text/html?

So that we can continue to serve our readership, as long
as most of the requests come from user agents that only
grok text/html, while managing the content as XML,
so that we can use XML tools on it; for example, transforming
it to RSS using XSLT:

Site Summaries from XHTML to RSS using XSLT, Aug 2000
http://www.w3.org/2000/08/w3c-synd/

--
Dan Connolly, W3C http://www.w3.org/People/Connolly/

David Carlisle

May 3, 2001, 6:15:13 AM
to asw...@swartzfam.com, i...@hixie.ch, Rob...@dessci.com, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org

> Then I think that Netscape 7.0 is broken, since it should throw an error if
> my page is not well-formed XML.

Hang on, no one is arguing that it should be an _error_ to serve SGML-based
(as opposed to XML-based) html files as text/html, are they?

I think it would be useful for mozilla/netscape to bail out of HTML
parsing if it sees an xml declaration at the start of the file, but it
certainly isn't broken if it does not do that.

Robert Miner

May 3, 2001, 10:39:08 AM
to i...@hixie.ch, Rob...@dessci.com, asw...@swartzfam.com, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org

Ian,

Our recent dialog has run:

ian> Would you also like PNGs incorrectly identified with the MIME
ian> type text/html due to circumstances beyond the author's control to
ian> be sent to PNG decoders?

robert> Is this a trick question, perhaps? I think I would, wouldn't
robert> I? At least it seems like I would be happier just having the
robert> image appear properly, than having it interpreted as horribly
robert> garbled HTML. What's the catch?

ian> Should we throw away the whole basis of MIME types and the HTTP
ian> Content-Type header, and just use content sniffing instead?

This leaves me wondering if you are debating in good faith. If you
are just marking time defending a decision that is already carved in
stone, just say so and let's quit wasting time. I have a bunch of new
documentation to write explaining to our customers all the extra work
they will have to do if they want to try to accommodate readers using
Mozilla.

If not, you need to give me some sign that you actually understand the
issues at stake. From what you write, you give the clear impression
that you don't think either of the following issues is important:

1) For some time to come, most web authors will be preparing content
that will be read predominantly with older user agents, and
therefore need to send documents as text/html.

2) For some time to come, many web authors will end up sending XHTML
as text/html due to circumstances beyond their control, even if
they are willing to send it as text/xml.

If you don't acknowledge those points, there is nothing to talk about.
Good luck popularizing your software. You've got your work cut out
for you.

If you do acknowledge those points, then you don't need me to point
out why your analogy with PNGs is not very relevant.

William F. Hammond

May 3, 2001, 11:10:55 AM
to dav...@nag.co.uk, mozilla...@mozilla.org, www-...@w3.org
David Carlisle <dav...@nag.co.uk> writes:

> I think it would be useful for mozilla/netscape to bail out of HTML
> parsing if it sees an xml declaration at the start of the file, but it
> certainly isn't broken if it does not do that.

From RFC 2854, 'The media type "text/html"':

-----

This document summarizes the history of HTML development, and
defines the "text/html" MIME type by pointing to the relevant W3C
recommendations;

. . .

Published specification: ... In addition, [XHTML1]
defines a profile of use of XHTML which is compatible with HTML
4.01 and which may also be labeled as text/html.

-----

An XML declaration has a parameter "encoding", and a text/html http
object, if not 7-bit ascii, is served with a charset value as part of
its content type descriptor. (And, of course, there may also be a
content transfer encoding at the http level.)

I wonder if concern over the task of the proposed fast, lightweight
pre-parse analysis of the top of the http body involves charset issues.

Along with that I wonder if it might be useful to have an elaboration
in the definition of text/html of the relationship between
http/content-type/charset and xml/encoding when the XML form of HTML
is served as "text/html".
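
For concreteness, the two declarations in question might look like
this -- the charset value here is purely illustrative:

    Content-Type: text/html; charset=iso-8859-1    [http header]

    <?xml version="1.0" encoding="iso-8859-1"?>    [top of the body]

The elaboration would say what a receiver is to do when only one of the
two is present, or when they disagree.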

-- Bill


David Carlisle

May 3, 2001, 11:24:30 AM
to ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org

David Carlisle <dav...@nag.co.uk> writes:

> I think it would be useful for mozilla/netscape to bail out of HTML
> parsing if it sees an xml declaration at the start of the file, but it
> certainly isn't broken if it does not do that.

From RFC 2854, 'The media type "text/html"':

-----

This document summarizes the history of HTML development, and
defines the "text/html" MIME type by pointing to the relevant W3C
recommendations;

. . .

Published specification: ... In addition, [XHTML1]
defines a profile of use of XHTML which is compatible with HTML
4.01 and which may also be labeled as text/html.

-----


Yes, exactly. That says that you should only use text/html if you
either send HTML, or use XML but restrict yourself to features that make
the document parsable by either system. Nothing in what you quote
invalidates the statement of mine that you quoted, does it?

I agree with you that it would be useful to relax that restriction and
recommend that the browser handle some wider class of XHTML files that
are served as text/html, but arguing that the current Mozilla behaviour
could helpfully be changed isn't the same as arguing it is broken
according to the spec.

Al Gilman

May 3, 2001, 12:50:52 PM
to William F. Hammond, dav...@nag.co.uk, mozilla...@mozilla.org, www-...@w3.org, ietf-...@iana.org, www...@w3.org

AG::

Yes, it would be useful to have a blow-by-blow explanation.

No, the documentation for text/html is not the place to address this.

This is a matter of compatibility between two IETF documents: HTTP and MIME.

It affects all XML documents carried in HTTP.

It is the sort of thing that has historically been discussed on the ietf-types list.

Once the answer is known, the question of where-all to document and point to
the answer should be raised.

That addresses your specific point raised in this post.

On the broader issue, it is still an HTTP/MIME/packaging issue.

If people want not to be at the mercy of their server administrator as to the
[HTTP header] metadata that get ingested by their customers' User Agent, they
need to look at MIME or other packaging techniques to send a manifest up front
that can have all the logical complexity you want, with a search list of
parse-as directives governing the companion [de novo body] bundled payload.

The question is at what point is the world at large ready to get off
RFC 822 as the means of providing metadata wrappings?

Or what QA practice with regard to the servers will begin to get the
metadata served right? The server could afford the cycles to restart
another parser to confirm type metadata on upload, if they would but
do it.

Al

>                                    -- Bill
>

Ian Hickson

May 3, 2001, 5:52:57 PM
to Robert Miner, asw...@swartzfam.com, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org
On Thu, 3 May 2001, Robert Miner wrote:
>
> If not, you need to give me some sign that you actually understand the
> issues at stake. From what you write, you give the clear impression
> that you don't think either of the following issues is important:
>
> 1) For some time to come, most web authors will be preparing content
> that will be read predominantly with older user agents, and
> therefore need to send documents as text/html.
>
> 2) For some time to come, many web authors will end up sending XHTML
> as text/html due to circumstances beyond their control, even if
> they are willing to send it as text/xml.

I acknowledge those points completely. Neither of these points requires any
documents sent as text/html to be handled as text/xml by any browser.


> If you do acknowledge those points, then you don't need me to point
> out why your analogy with PNGs is not very relevant.

My analogy with PNGs is merely to highlight that content type sniffing is
fundamentally flawed.

Robert Miner

May 3, 2001, 6:16:38 PM
to i...@hixie.ch, Rob...@dessci.com, asw...@swartzfam.com, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org

Hi.

> > 1) For some time to come, most web authors will be preparing content
> > that will be read predominantly with older user agents, and
> > therefore need to send documents as text/html.
> >
> > 2) For some time to come, many web authors will end up sending XHTML
> > as text/html due to circumstances beyond their control, even if
> > they are willing to send it as text/xml.
>
> I acknowledge those points completely. Neither of these points requires any
> documents sent as text/html to be handled as text/xml by any browser.

Ah. I think I see where we differ then.

What I would like to be able to do is prepare an XHTML document in
accordance with the HTML compatibility guidelines from Appendix C of
the XHTML spec, *except* for the inclusion of MathML instances.
Obviously, the math is worthless in older user agents, but I would
like the rest of my page to show up, so I could, for example, tell
readers they need to get a spiffy new browser like Mozilla to see the
math properly.

But of course, that won't work. If my hypothetical reader does
install Mozilla and revisits the page, the math will still be trash,
since the document is being served as text/html. Alternatively, if I
were to send the document as text/xml, then my hypothetical reader
would never have seen the message about upgrading in the first place;
instead, he or she would have gotten a "Save File As" dialog box.

So, as you have suggested on a number of previous occasions, the only
solution remaining is for the author to learn about and implement
one of the various methods of detecting the user agent and either
redirecting to a different document or dynamically assigning a MIME
type. That isn't a very big deal for professionals, but it will take
it out of the realm of what is feasible for most school teachers.

Ian Hutchinson

May 3, 2001, 10:52:14 PM
to Ian Hickson, mozilla...@mozilla.org, www-...@w3.org
Let's try to get some facts into this discussion.

Fire up mozilla 0.8.1 and visit the URL(s)
http://hutchinson.belmont.ma.us/tth/htmltab.html and all permutations
replacing "html" with "xml". The document codifies the results. What this
test shows is:

1. Mozilla already routinely DOES snoop on the document header, notably
the DOCTYPE, in a way that changes its rendering when the document is
served as HTML. This fact renders Hickson's many remarks about the
excessive computational cost of snooping irrelevant (to put it
charitably).

2. When Mozilla receives a document served as XML its behaviour does not
seem to depend on the DOCTYPE.

[3. Both Mozilla, when rendering a document it takes to be XML, and Amaya
have broken table renderers.]

I assume that conclusion 1 above shows that it ought to be fairly trivial
for Mozilla to implement the detection of XML documents served as HTML on
the basis of their DOCTYPE, and enable the MathML parser for them.
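
For concreteness, the kind of line such detection would key on looks
like this (the public identifier is the XHTML-plus-MathML one,
reproduced here from memory; system identifier elided):

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN" ...>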

Ian Hutchinson.

William F. Hammond

May 3, 2001, 10:18:21 PM
to hu...@psfc.mit.edu, mozilla...@mozilla.org
Ian --

(I'm not copying this comment to www-talk.)

> Fire up mozilla 0.8.1 and visit the URL(s)
> http://hutchinson.belmont.ma.us/tth/htmltab.html and all permutations
> replacing "html" with "xml". The document codifies the results. What this
> test shows is:
>
> 1. Mozilla already routinely DOES snoop on the document header, notably
> the DOCTYPE, in a way that changes its rendering when the document is
> served as HTML. This fact renders Hickson's many remarks about the
> excessive computational cost of snooping irrelevant (to put it
> charitably).

. . .

Hmmm... I just noticed something along that line myself today.

But I can't get Mozilla 0.8.1 (Build 2001032722 under Windows 98) to
give a text/html serving of
http://www.mozilla.org/projects/mathml/start.xml the same treatment
that the text/xml serving gets.

-- Bill


Ian Hickson

May 3, 2001, 11:33:00 PM
to Ian Hutchinson, mozilla...@mozilla.org, www-...@w3.org
On Fri, 4 May 2001, Ian Hutchinson wrote:
>
> Let's try to get some facts into this discussion.
>
> Fire up mozilla 0.8.1 and visit the URL(s)
> http://hutchinson.belmont.ma.us/tth/htmltab.html and all permutations
> replacing "html" with "xml". The document codifies the results. What this
> test shows is:
>
> 1. Mozilla already routinely DOES snoop on the document header, notably
> the DOCTYPE, in a way that changes its rendering when the document is
> served as HTML. This fact renders Hickson's many remarks about the
> excessive computational cost of snooping irrelevant (to put it
> charitably).

Wrong type of snooping. Snooping to decide *rendering* is indeed easy and
relatively cheap. Snooping to decide *parsing* is a much more sensitive
issue. Secondly, the parsing Mozilla does to sniff for the rendering mode
is remarkably simple [1]. Apart from the idea of a magic comment header
and the idea of using the XML PI, both of which have problems as I have
remarked in the recent past on this list, all other types of sniffing that
have been suggested are extremely involved.


> 2. When Mozilla receives a document served as XML its behaviour does not
> seem to depend on the DOCTYPE.

Indeed. Mozilla always handles XML in "Standard" mode.


> [3. Both Mozilla, when rendering a document it takes to be XML, and Amaya
> have broken table renderers.]

I didn't test Amaya; however, Mozilla actually only does a *correct*
rendering in "Standard" mode (when sent as text/xml, or when sent with
an HTML 4.01 Transitional DOCTYPE with a URI). The result you are
expecting is very likely not what the spec says, because that document
uses "colspan=0", which is not handled correctly in older browsers.


> I assume that conclusion 1 above shows that it ought to be fairly trivial
> for Mozilla to implement the detection of XML documents served as HTML on
> the basis of their DOCTYPE, and enable the MathML parser for them.

DOCTYPEs are optional for well-formed XHTML documents. New DOCTYPEs get
added over time. DOCTYPE parsing is hard. DOCTYPEs may be hidden in
comments. DOCTYPE sniffing has been called harmful by many leading figures
at the W3C and elsewhere.

Overall, DOCTYPE sniffing is a poor solution to a problem which has
already been solved by a new MIME type.


-- Footnotes --

[1] And broken. One of the reasons I am so against using yet more sniffing
is that implementing "quirks mode" vs "standard mode" sniffing based on
the content of documents has proved to be hard, unreliable, and has caused
numerous problems of its own. The only reason Mozilla still has it is to
support the large quantity of legacy content that depends on non-standard
behaviour. With XML, since it is a new technology, there should be no
reason to support non-standard usage.

Ian Hickson

May 3, 2001, 11:40:44 PM
to Robert Miner, asw...@swartzfam.com, ham...@csc.albany.edu, mozilla...@mozilla.org, www-...@w3.org
On Thu, 3 May 2001, Robert Miner wrote:
>>>
>>> 1) For some time to come, most web authors will be preparing content
>>> that will be read predominantly with older user agents, and
>>> therefore need to send documents as text/html.
>>>
>>> 2) For some time to come, many web authors will end up sending XHTML
>>> as text/html due to circumstances beyond their control, even if
>>> they are willing to send it as text/xml.
>>
>> I acknowledge those points completely. Neither of these points requires any
>> documents sent as text/html to be handled as text/xml by any browser.
>
> Ah. I think I see where we differ then.
>
> What I would like to be able to do is prepare an XHTML document in
> accordance with the HTML compatibility guidelines from Appendix C of
> the XHTML spec, *except* for the inclusion of MathML instances.

In my opinion, browsers should not be promoting non-standard usage of W3C
technologies as you have just described.


> Obviously, the math is worthless in older user agents, but I would
> like the rest of my page to show up, so I could, for example, tell
> readers they need to get a spiffy new browser like Mozilla to see the
> math properly.

So make your index pages HTML, and inform your readers that they will need
more recent browsers on those pages, then link to your text/xml pages
with the text and maths in them.

This is just like the many sites that have PDF documents on them but use
text/html index pages to link to them, with the index pages informing them
that they will need a plugin to view the documents.

Ian Hickson

May 4, 2001, 5:03:44 AM
to Gervase Markham, mozilla...@mozilla.org
On Fri, 4 May 2001, Gervase Markham wrote:
>
> "Download this. Stuff it in your website directory. Then create two files,
> index_math.html and index_normal.html, with the appropriate content."

Gerv means "index_math.xml", of course.

--
Ian Hickson )\ _. - ._.) fL

Gervase Markham

May 4, 2001, 4:56:23 AM
to
> What I would like to be able to do is prepare an XHTML document in
> accordance with the HTML compatibility guidelines from Appendix C of
> the XHTML spec, *except* for the inclusion of MathML instances.
> Obviously, the math is worthless in older user agents, but I would
> like the rest of my page to show up, so I could, for example, tell
> readers they need to get a spiffy new browser like Mozilla to see the
> math properly.

Why can you not do what everyone else does in this situation - either have
a gateway page that sniffs the browser, or have an HTML 3.2 index page
which says "Here's my math - you'll need Mozilla or IE 5"?



> So, as you have suggested on a number of previous occasions, the only
> solution remaining is for the author to learn about and implement
> one of the various methods of detecting the user agent and either
> redirecting to a different document or dynamically assigning a MIME
> type. That isn't a very big deal for professionals, but it will take
> it out of the realm of what is feasible for most school teachers.

I think you underestimate the intelligence of most school teachers. But,
if you think it is a problem, produce a canned index page - it
browser-sniffs, and redirects to index_math.html or index_normal.html.
Then, you say to people:

"Download this. Stuff it in your website directory. Then create two files,
index_math.html and index_normal.html, with the appropriate content."

No complexity required on their part.
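
For the canned page itself, something like this would do (a rough,
untested sketch; the user-agent checks are illustrative only):

    <script type="text/javascript">
    // Rough sketch: send browsers likely to have MathML support to the
    // math version, everyone else to the plain version.
    var ua = navigator.userAgent;
    if (ua.indexOf("Gecko") != -1 || ua.indexOf("MSIE 5") != -1) {
        window.location.replace("index_math.html");
    } else {
        window.location.replace("index_normal.html");
    }
    </script>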

Gerv

William F. Hammond

May 4, 2001, 7:35:35 AM
to i...@hixie.ch, mozilla...@mozilla.org, www-...@w3.org
Ian Hickson <i...@hixie.ch> writes:

> is remarkably simple [1]. Apart from the idea of a magic comment header
> and the idea of using the XML PI, both of which have problems as I have
> remarked in the recent past on this list, all other types of sniffing that
> have been suggested are extremely involved.

A day or two ago you said that handling an item beginning with the
xml declaration was manageable, although you said that you were opposed
to doing it. But you did not say why.

Please clarify.

-- Bill


William F. Hammond

May 4, 2001, 7:59:20 AM
to i...@hixie.ch, mozilla...@mozilla.org, www-...@w3.org
Ian Hixie writes again in reply to Robert Miner:

> > What I would like to be able to do is prepare an XHTML document in
> > accordance with the HTML compatibility guidelines from Appendix C of
> > the XHTML spec, *except* for the inclusion of MathML instances.
>

> In my opinion, browsers should not be promoting non-standard usage of W3C
> technologies as you have just described.

Ummm... MathML instances, as namespace extensions of XHTML, are
not in the problem category for classical user agents described by
Appendix C. Although the handling of these things with classical
agents is not likely to be robust and content providers must be fully
aware of that, it is misleading to say that namespace extensions for
XHTML are outside of the territory where the Recommendation says that
text/html may be used by a content provider.

Namespace extensions are NOT on the list of 13 warnings to content
providers in Appendix C.

The example http://www.mozilla.org/projects/mathml/start.xml *ought*
to be servable also as text/html.

A user agent that fails to deal with that resource as XHTML is out
of full compliance with the spec. I would never, however, apply the
word "broken" to a user agent that is still under development and
for whose existence I am grateful.

-- Bill

P.S. Ian, of course you are absolutely right that a user agent
should never infer a content type from what appears to be a file
extension in a URI. User agents that do this ARE broken. Beyond
that there are security risks, especially in the case of things
served as "text/plain" and "application/octet-stream".


Ian Hickson

May 4, 2001, 8:04:45 AM
to William F. Hammond, mozilla...@mozilla.org, www-...@w3.org

As I understand it there are four different reasons to want to do this:

1. Inability to configure servers to send content as text/xml.

Fix the server, or if that is not possible, fix the process or
politics behind the inability to fix the server. Adding text/xml to the
content types supported by a server is a one-line change to the server's
configuration file (an Apache example follows this list).


2. Wanting to send XHTML with namespaced content to old browsers.

These documents will not be properly processed by older browsers
anyway, so just do what people do with PDF, frames, and so on, which is
to provide alternative versions for older browsers, or have an
introductory page explaining the requirements to view the content.


3. Wanting to send pure XHTML to new browsers as XML and old browsers as HTML.

New browsers will also handle the XHTML file if it is sent as HTML, so
there is no need to add sniffing code to the browsers to handle this case.


4. A bug in IE causes it to mishandle XHTML documents with namespaced
content when they are sent as text/xml.

This is a bug in IE and should not be fixed by other browsers which are
handling things per the specs.
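
Regarding point 1, the one-line change in question, for an Apache
server, would be something like this (a sketch; the extension mapping
is of course the site's choice):

    AddType text/xml .xml

placed in httpd.conf, or in a per-directory .htaccess file where
overrides are permitted.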


In addition, there are several reasons why this is a bad idea in
the first place:

A. content sniffing as a whole is a flawed concept, as demonstrated by the
problems found with MacIE, WinIE, and Mozilla having "quirks" vs
"standard" modes based on the DOCTYPE of HTML documents.

B. the specs do not specify how (or whether) to try to detect XML in
text/html data streams.

C. The XML PI is optional and causes problems with some older browsers
when present, so it would be a bad trigger to use. All other proposals
except for a "magic comment" are hard to parse quickly. The "magic
comment" idea, while being the only one that might work, still doesn't
address any of the other points discussed here. (Both candidate
triggers are sketched just after this list.)

D. The Content-Type HTTP header is supposed to be the final word on how to
handle a data stream.

E. The XHTML spec only gives one reason to send XHTML as text/html, and
that's for compatibility with older browsers (see point 3 above).
It is doubtful whether documents that don't follow the spirit of these
guidelines (e.g. by including MathML) should be allowed to be sent as
text/html at all.
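
For concreteness, the two candidate triggers mentioned in point C would
look something like the following; the comment form is purely
hypothetical, as no such magic comment has been standardized.

    <?xml version="1.0" encoding="utf-8"?>    [the XML declaration PI]

    <!-- treat-as: text/xml -->               [a hypothetical "magic comment"]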

HTH,


--
Ian Hickson )\ _. - ._.) fL

Ian Hickson

May 4, 2001, 8:20:24 AM
to William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
On Fri, 4 May 2001, William F. Hammond wrote:
>
> Ian Hixie writes again in reply to Robert Miner:
>
>>> What I would like to be able to do is prepare an XHTML document in
>>> accordance with the HTML compatibility guidelines from Appendix C of
>>> the XHTML spec, *except* for the inclusion of MathML instances.
>>
>> In my opinion, browsers should not be promoting non-standard usage of W3C
>> technologies as you have just described.
>
> Ummm... MathML instances, as namespace extensions of XHTML, are
> not in the problem category for classical user agents described by
> Appendix C. Although the handling of these things with classical
> agents is not likely to be robust and content providers must be fully
> aware of that, it is misleading to say that namespace extensions for
> XHTML are outside of the territory where the Recommendation says that
> text/html may be used by a content provider.

As I said in [1], in my opinion there is very little point in sending
MathML to classical user agents. If you want to support older UAs, then
use <sup>, <sub> and tables for the equations. Otherwise you will have
data loss -- your users won't be able to tell what your equations are doing.

Similarly, as I said in [2], if you wish to use MathML and still have
content available to your users on older user agents, make your index
pages HTML, and inform your readers that they will need more recent
browsers on those pages, then link to your text/xml pages with the text
and maths in them.

This is just like the many sites that have PDF documents on them but use
text/html index pages to link to them, with the index pages informing them
that they will need a plugin to view the documents.


[1] http://lists.w3.org/Archives/Public/www-talk/2001MayJun/0030.html
[2] http://lists.w3.org/Archives/Public/www-talk/2001MayJun/0043.html

> Namespace extensions are NOT on the list of 13 warnings to content
> providers in Appendix C.

Namespace extensions are invalid in XHTML 1.0 documents, so saying that
they shouldn't be included for compatibility reasons would be somewhat
redundant. However, the section is entitled "Compatibility Issues", and
thus the presence of MathML content in _well formed_ XHTML documents sent
as text/html (as opposed to valid XHTML documents sent as text/html) is,
in my opinion, against the spirit of the specification.


BTW, I just noticed C.11.13.5 says:

# [...] HTML rules apply to XHTML documents delivered as HTML and the
# XML rules apply to XHTML documents delivered as XML.

Now, this section is referring specifically to CSS' HTML rules vs CSS' XML
rules, but it gives an interesting insight into the working group's
intent, especially for those of you who don't accept my word that the HTML
working group have explicitly told Mozilla not to sniff for XML in
text/html data streams.

--

Gervase Markham

May 4, 2001, 10:23:45 AM
to
> > "Download this. Stuff it in your website directory. Then create two files,
> > index_math.html and index_normal.html, with the appropriate content."
>
> Gerv means "index_math.xml", of course.

Indeed I do :-)

Gerv

Henri Sivonen

May 4, 2001, 11:48:28 AM
to
In article <3AF26EB7...@univ.ox.ac.uk>, Gervase Markham
<gervase...@univ.ox.ac.uk> wrote:

> I think you underestimate the intelligence of most school teachers.

How about organizational inertia?

> But, if you think it is a problem, produce a canned index page - it
> browser-sniffs, and redirects to index_math.html or index_normal.html .

Most people (whether clients of an ISP or teachers/students with a
school/university server) don't have permissions to set up customized
server-side Perl sniffers. However, they might have the permission to
tweak less dangerous aspects of server configuration on a per directory
basis.

One could use Apache's built-in content negotiation, but alas, Mozilla
does an evil thing: it only accepts */* like IE. This should be fixed
ASAP.
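
For reference, the negotiation I have in mind would use an Apache type
map along these lines (a sketch; the qs values are illustrative, and it
assumes a file such as index.var is mapped to the type-map handler):

    URI: index.xml
    Content-type: text/xml; qs=0.9

    URI: index.html
    Content-type: text/html; qs=0.7

With a meaningful Accept header the browser's stated preferences would
pick the variant; with Accept: */* the server can only fall back on qs.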

--
Henri Sivonen
hen...@clinet.fi
http://www.clinet.fi/~henris/

Henri Sivonen

May 4, 2001, 1:08:58 PM
to
In article <henris-44DA6F....@uutiset.saunalahti.fi>, Henri
Sivonen <hen...@clinet.fi> wrote:

> One could use Apache's built-in content negotiation, but alas, Mozilla
> does an evil thing: it only accepts */* like IE. This should be fixed
> ASAP.

http://bugzilla.mozilla.org/show_bug.cgi?id=58040

Gervase Markham

May 5, 2001, 4:18:39 AM
to
> Most people (whether clients of an ISP or teachers/students with a
> school/university server) don't have permissions to set up customized
> server-side Perl sniffers. However, they might have the permission to
> tweak less dangerous aspects of server configuration on a per directory
> basis.

It would not be a "custom server-side Perl sniffer", it would be a
standard Javascript Ultimate Browser Sniffer, directing Gecko and IE 5 to
.xml pages and everything else to .html. If the user doesn't have a
MathML-enabled build, that's a separate problem.

Gerv

Aaron Swartz

May 6, 2001, 11:36:19 PM
to David Carlisle, mozilla...@mozilla.org, www-...@w3.org
David Carlisle <dav...@nag.co.uk> wrote:

>> Then I think that Netscape 7.0 is broken, since it should throw an error if
>> my page is not well-formed XML.
>
> Hang on, no one is arguing that it should be an _error_ to serve
> SGML-based (as opposed to XML-based) HTML files as text/html, are they?

Sorry if this was unclear -- I meant that it should throw an error if I
sent an XML page and it wasn't well-formed.

Aaron Swartz

May 6, 2001, 11:45:22 PM
to Ian Hickson, William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
Ian Hickson <i...@hixie.ch> wrote:

> 2. Wanting to send XHTML with namespaced content to old browsers.
>
> These documents will not be properly processed by older browsers
> anyway, so just do what people do with PDF, frames, and so on, which is
> to provide alternative versions for older browsers, or have an
> introductory page explaining the requirements to view the content.

If I include a minor bit of namespaced content in my document (say, RDF
metadata -- which I certainly plan to do, or many other possibilities) it
will likely not change the vast majority of the document (RDF metadata will
not be visible to the average browser) and thus the document will be
properly processed by older browsers, or processed 90% correctly.

I think it is absolutely silly to force me to create two versions of every
document for no good reason. Please give us something we can add to our
text/html documents so that we can do this. I'd much rather add a bit extra
to my document than have to make two versions of everything I do.

Ian Hickson

May 7, 2001, 12:03:36 AM
to Aaron Swartz, William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
On Sun, 6 May 2001, Aaron Swartz wrote:
>
> If I include a minor bit of namespaced content in my document (say, RDF
> metadata -- which I certainly plan to do, or many other possibilities) it
> will likely not change the vast majority of the document (RDF metadata will
> not be visible to the average browser) and thus the document will be
> properly processed by older browsers, or processed 90% correctly.

Given that Mozilla (currently) won't do anything with RDF metadata placed
in a text/xml XHTML file, that's not an argument to change Mozilla's
behaviour. (If you wish to make this argument for other browsers, mozilla-
mat...@mozilla.org is not the right forum.)

Aaron Swartz

May 7, 2001, 12:13:22 AM
to Ian Hickson, William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
Ian Hickson <i...@hixie.ch> wrote:

>> If I include a minor bit of namespaced content in my document (say, RDF
>> metadata -- which I certainly plan to do, or many other possibilities) it
>> will likely not change the vast majority of the document (RDF metadata will
>> not be visible to the average browser) and thus the document will be
>> properly processed by older browsers, or processed 90% correctly.
>
> Given that Mozilla (currently) won't do anything with RDF metadata placed
> in a text/xml XHTML file, that's not an argument to change Mozilla's
> behaviour. (If you wish to make this argument for other browsers, mozilla-
> mat...@mozilla.org is not the right forum.)

RDF was merely one example. In the case of MathML, if I were to state that I
like the expression:

e^iπ

(however it's represented in MathML) I'd like the rest of my homepage to
display properly in most browsers. Such XML additions should degrade
gracefully, not cause me to have to make new versions. Perhaps Mozilla could
introduce some sort of comment element for namespaces. Something like:

<nogrok:namespace uri="http://www.w3.org/1998/Math/MathML">Your browser
does not understand MathML -- the following formula will not make
sense.</nogrok:namespace>

Similar to the <noscript> hack for JavaScript.

Ian Hickson

May 7, 2001, 12:37:17 AM
to Aaron Swartz, William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
On Sun, 6 May 2001, Aaron Swartz wrote:
>
> I'd like the rest of my homepage to display properly in most browsers.
> Such XML additions should degrade gracefully, not cause me to have to
> make new versions.

XML doesn't "degrade gracefully". One of the whole points of XML is that
it not "degrade gracefully". If you wish to argue this point, that's a
whole other kettle of fish, and one that I have no opinion on in the
context of changing Mozilla's behaviour, which is currently attempting to
follow existing W3C specifications.


> Perhaps Mozilla could introduce some sort of comment element for
> namespaces. Something [...] Similar to the <noscript> hack for
> JavaScript.

Emphasis on "hack".


I'm not saying these are not good ideas -- merely that they move WAY
beyond the existing specifications and it is therefore not appropriate to
just implement them. Get the W3C to agree on something, and then you have
something to get the implementors to do. Until then...

Aaron Swartz

May 7, 2001, 12:45:50 AM
to Ian Hickson, William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
Ian Hickson <i...@hixie.ch> wrote:

> XML doesn't "degrade gracefully". One of the whole points of XML is that
> it not "degrade gracefully".

I'm not sure what you mean by this, and although I've spent a lot of time in
the XML community, I've never heard it before. Certainly in data-oriented
XML formats the whole idea is for XML to degrade gracefully (ignore new
stuff). HTML has always followed this rule and I'd hope that XHTML would
continue this even further.

>> Perhaps Mozilla could introduce some sort of comment element for
>> namespaces. Something [...] Similar to the <noscript> hack for
>> JavaScript.
>
> Emphasis on "hack".
>
> I'm not saying these are not good ideas -- merely that they move WAY
> beyond the existing specifications and it is therefore not appropriate to
> just implement them. Get the W3C to agree on something, and then you have
> something to get the implementors to do. Until then...

If we only implemented specifications, it would be a very boring world. I
thought that part of the idea of namespaces was that we could implement
things that _weren't_ standardized. Mozilla has certainly done its share of
standard-stretching to make things easier on its developers, so I don't
understand your reluctance to make things easier on your users (web
publishers).

We're certainly not violating any W3C spec by doing what I suggest.

Terje Bless

May 7, 2001, 1:15:03 AM
to Aaron Swartz, Ian Hickson, William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
On 06.05.01 at 23:45, Aaron Swartz <asw...@swartzfam.com> wrote:

>Mozilla has certainly done its share of
>standard-stretching to make things easier on its developers

"When the [W3C] standards catch up with us,
I'm sure we'll be 100% compatible" -- Don Hackler, Netscape, ca. 1998

-link, ducking. :-)

Simon St.Laurent

May 7, 2001, 8:50:46 AM
to Ian Hickson, Aaron Swartz, William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
Just as an FYI, there is a MIME type registration in progress for
application/xhtml+xml:

http://www.ietf.org/internet-drafts/draft-baker-xhtml-media-reg-01.txt

I don't think it answers these questions, but it's another context.

Simon St.Laurent - Associate Editor, O'Reilly & Associates
XML Elements of Style / XML: A Primer, 2nd Ed.
XHTML: Migrating Toward XML
http://www.simonstl.com - XML essays and books


Simon St.Laurent

May 7, 2001, 8:48:54 AM
to Aaron Swartz, Ian Hickson, William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
At 11:45 PM 5/6/01 -0500, Aaron Swartz wrote:
>I'm not sure what you mean by this, and although I've spent a lot of time in
>the XML community, I've never heard it before. Certainly in data-oriented
>XML formats the whole idea is for XML to degrade gracefully (ignore new
>stuff). HTML has always followed this rule and I'd hope that XHTML would
>continue this even further.

Huh? "Ignore new stuff" works in certain contexts, perhaps - with CSS, for
instance, when there isn't explicit formatting specified for a particular
element.

XML program structures, even without validation running, are typically far
too brittle to ignore extra information caused by extra child
elements. You'd get a lot of strange errors where documents that could be
processed in certain contexts would fail in others.

I've argued for a long while that flexibility (not standardization) of
vocabularies is the real lesson of XML, but that's not reflected in current
practice.

Seth Russell

May 7, 2001, 9:15:20 AM
to Aaron Swartz, Ian Hickson, Simon St.Laurent, William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
Please excuse this, perhaps out of context, response ....

From: "Simon St.Laurent" <simo...@simonstl.com>

> XML program structures, even without validation running, are typically far
> too brittle to ignore extra information caused by extra child
> elements. You'd get a lot of strange errors where documents that could be
> processed in certain contexts would fail in others.

But what is actually in error here: the brittle processors, or the XML
documents? I would say it was the brittle processors ... what would you
say? Incidentally I can't form a "valid to the author" response to all
the emails that appear in my mailbox ..... so I don't see why one could
assume that some XML processor should be expected to do any better.

> I've argued for a long while that flexibility (not standardization) of
> vocabularies is the real lesson of XML, but that's not reflected in
> current practice.

Could you sketch for us what "flexibility of vocabularies" means to you?

Seth

Simon St.Laurent

May 7, 2001, 9:42:49 AM
to Seth Russell, Aaron Swartz, Ian Hickson, William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
At 06:15 AM 5/7/01 -0700, Seth Russell wrote:
> > XML program structures, even without validation running, are typically far
> > too brittle to ignore extra information caused by extra child
> > elements. You'd get a lot of strange errors where documents that could be
> > processed in certain contexts would fail in others.
>
>But what is actually in error here: the brittle processors, or the XML
>documents? I would say it was the brittle processors ... what would you
>say? Incidentally I can't form a "valid to the author" response to all
>the emails that appear in my mailbox ..... so I don't see why one could
>assume that some XML processor should be expected to do any better.

I'd say it's a best practice issue, not an error. The tools (DOM, XPath)
commonly used to process XML documents require that developers have some
concept of what structures they'll be using. If there's been an error
made, it's at the level which defines such processing.

SAX does make it easier to ignore extra markup than the tree-based systems,
but I'm not sure whether that was a deliberate design choice or a side-effect
event-based processing.

> > I've argued for a long while that flexibility (not standardization) of
> > vocabularies is the real lesson of XML, but that's not reflected in
> > current practice.
>

>Could you sketch for us what "flexibility of vocabularies" means to you?

Generally speaking, it means that developers and users can create
vocabularies which are meaningful to them, and not necessarily to expert
committees. I've got a very sketchy outline at:
http://www.simonstl.com/articles/selfish2.htm

It's not done yet, probably won't be for a few weeks, but it might give an
outline. Note that I expect transformations to deal with a lot of the
issues described above, not inherently flexible processing. After years of
working with 'flexible HTML processing', I can't say I'm fond of
unpredictable error-correction and the like.

There's also a presentation focusing on the role of transformations in XML at:
http://www.simonstl.com/articles/transform/transform.html

Arnaud Deslandes

May 7, 2001, 10:02:50 AM
to mozilla...@mozilla.org

Arnaud Deslandes ICQ : 51422074
CARDIWEB
The Webbing Management Agency - www.cardiweb.com

Terje Bless

May 7, 2001, 11:52:41 AM
to Simon St.Laurent, Ian Hickson, Aaron Swartz, William F. Hammond, mozilla...@mozilla.org, www-...@w3.org
On 07.05.01 at 08:50, Simon St.Laurent <simo...@simonstl.com> wrote:

>Just as an FYI, there is a MIME type registration in progress for
>application/xhtml+xml:
>
>http://www.ietf.org/internet-drafts/draft-baker-xhtml-media-reg-01.txt
>
>I don't think it answers these questions, but it's another context.

AFAICT it's completely meaningless as it just reaffirms the status quo;
there __is_no__ standard for how to label it, just some vague mumblings in
various Recommendations that don't really deal with the issues (IMO,
obviously).

Karl Ove Hufthammer

May 10, 2001, 3:30:06 PM
to
ham...@csc.albany.edu (William F. Hammond) wrote in message
<200104302059...@pluto.math.albany.edu>:

From the WWW10 conference <URL:
http://www.xml.com/pub/a/2001/05/09/www10/index.html >:

As XHTML is strict, the browser doesn't have to waste time
guessing about what the page should look like: the page is
either correct or it isn't. What's more, it doesn't take much
to add this feature to browsers -- if a page's DOCTYPE is
XHTML, then switch in the new, fast, XHTML parser, if not, use
the existing code you have already. Unfortunately, browser
vendor Microsoft was utterly noncommittal about their plans to
implement XHTML. Dave Massey from Microsoft commented that they
are "investigating" XHTML and may add it to their browser, but
he couldn't say if or when.

--
Karl Ove Hufthammer

Dan Connolly

May 11, 2001, 3:38:21 PM
to William F. Hammond, www-...@w3.org, mozilla...@mozilla.org
30 Apr 2001 "William F. Hammond" wrote:
[...]
> Now in the mozilla-mathml discussion we are told that there have been
> recent further deliberations on this question at W3C.
>
> Can anyone report definitively?

I started to try, but then realized that there are a lot
of subtleties and stuff that the HTML WG has discussed
that I haven't followed.

I expected the HTML contacts from the W3C team to
respond... when they didn't, I asked why; they pointed
out that this thread is in www-talk; they keep
a close eye on www-html, but not www-talk.

They said they'd try to catch up and respond eventually.
But it may take some time.

--
Dan Connolly, W3C http://www.w3.org/People/Connolly/

Przemek Klein

May 16, 2001, 6:24:18 AM
to mozilla...@mozilla.org