
Detecting proper XHTML


Roger B. Sidje

Dec 16, 2001, 12:43:47 PM
to
This is a summary of the options that were suggested recently on this
newsgroup to deal with the ambivalence of XHTML which can be treated
either as HTML or XML. Feel free to insert any point I may have omitted
so that the note can be comprehensive as well. This is not the first time
and/or only place where this issue has been discussed, but the following
summary covers a fair bit of the dilemma as a whole, especially from the
viewpoint of implementors. (Looking more into the question with hindsight,
it seems to me that Options 3 & 4 could have well been valid W3C options
had W3C thought of the dilemma that the ambivalence is causing in the
absence of a specific recommendation to deal with the problem easily).

Option 1: Live with the XML/XHTML mimetype as is
================================================
Pros:
-----
- Supported in default configuration in Apache for .xml and .xhtml (and
.xht?) files.
Cons:
-----
- Discourage users from producing well-formed XHTML pages because it
is not yet supported in other browsers (e.g., it prompts "Save As..."
with no means to intercept it and to customize the prompt in a persuasive
way that could entice users to upgrade on the spot).
- Stop XHTML from seriously taking off. Sites continue to be served
as "text/html" to stay compatible with older browsers. As a result, some
sites claim to be XHTML when in reality they are still ill-formed tag soup.

Option 2: Variable Content-Types (or files) depending on the browser
====================================================================
Pros:
-----
- Workable according to the standard rules of content-based negotiation.
Cons:
-----
- Need latest Apache servers, or can be implemented more laboriously
in other servers with a server-side scripting language such as Perl.
- Need knowledgeable users (e.g., to set symbolic links, and to
set the .var configuration file with the appropriate q=)
- Cumbersome to maintain on large sites.
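As a concrete sketch of Option 2, an Apache type-map (.var) file along the following lines could offer the same document under both mimetypes, with the qs values steering negotiation (filenames and qs values here are illustrative, not taken from this thread):

```apache
URI: document.xhtml
Content-type: application/xhtml+xml; qs=0.8

URI: document.html
Content-type: text/html; qs=0.5
```

A browser that advertises the XHTML mimetype in its Accept header would then receive the XML version, while older browsers fall back to text/html.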

Option 3: XHTML DOCTYPE look-ahead (was "sniffing")
===================================================
Pros:
-----
- Allow authors to produce pages that are backward-compatible as HTML
in older browsers, and yet forward-compatible as XHTML in newer browsers.
- Protect XHTML from degenerating.
- There is a similar feature w.r.t. the charset. A document can set an
internal charset to supersede the charset parameter that may come from
the Content-Type.
Cons:
-----
- Hard to implement in Mozilla.
- Not backed by any W3C spec, and as a result gives the impression of
forcing XHTML via the back-door.
- Users may have to override the default configuration in Apache servers
to set the mimetype of .xhtml (and .xht) files to "text/html".
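To make Option 3 concrete: the look-ahead amounts to peeking at the first bytes of a text/html response before committing to a parser. A minimal sketch (the function name, the 512-byte window, and the heuristic are illustrative assumptions, not Mozilla's actual parser code):

```python
def sniff_is_xhtml(head_bytes: bytes) -> bool:
    """Heuristically decide whether a text/html response should be
    re-routed to the XML parser, based on its DOCTYPE (sketch only)."""
    # Only a bounded look-ahead, so the decision can be made before
    # the whole document has been streamed to a parser.
    head = head_bytes[:512].decode("ascii", errors="replace").lower()
    # An XHTML document type declaration names an XHTML public identifier.
    return "<!doctype html" in head and "xhtml" in head

print(sniff_is_xhtml(
    b'<?xml version="1.0"?>\n'
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'))  # True
```

The "hard to implement" con above comes from having to buffer the stream and possibly re-dispatch it to a different parser once the sniff succeeds.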

Option 4: "Content-Type: text/html; profile=xhtml"
==================================================
Pros:
-----
- Easy to implement in Mozilla.
- The XHTML mimetype already has a similar "profile" parameter
(http://www.ietf.org/internet-drafts/draft-baker-xhtml-media-reg-02.txt).
Cons:
-----
- The "profile" parameter is not part of the spec for the "text/html"
mimetype (there was another parameter called "version" that was proposed,
but it apparently was dropped because it was found of no use at the time).

Possible Resolutions:
=====================
1 - Keep the status quo and do nothing. This is equivalent to being
content with the XML/XHTML mimetype, and letting users do content-based
negotiation, e.g., by serving different files depending on the browser.
2 - Implement an XHTML DOCTYPE look-ahead ("sniffing").
3 - Support profile=xhtml, by trivially substituting
"text/html; profile=xhtml" with the XHTML mimetype in necko.

Your vote?

---
RBS

Chris Hoess

Dec 16, 2001, 9:49:28 PM
to
In article <3C1CDD53...@maths.uq.edu.au>, Roger B. Sidje wrote:
> This is a summary of the options that were suggested recently on this
> newsgroup to deal with the ambivalence of XHTML which can be treated
> either as HTML or XML. Feel free to insert any point I may have omitted
> so that the note can be comprehensive as well. This is not the first time
> and/or only place where this issue has been discussed, but the following
> summary covers a fair bit of the dilemma as a whole, especially from the
> viewpoint of implementors. (Looking more into the question with hindsight,
> it seems to me that Options 3 & 4 could have well been valid W3C options
> had W3C thought of the dilemma that the ambivalence is causing in the
> absence of a specific recommendation to deal with the problem easily).

The fundamental problem here seems to be the intersection of the */xml
(*/*+xml) MIME-types and the old text/html MIME-type, such that documents
can exist which may legitimately be served as either. Both RFC 2854 and
Appendix C have, of course, done a masterful job of obfuscating the exact
nature of the intersection. I comment further below.

> Option 1: Live with the XML/XHTML mimetype as is
>================================================
> Pros:
> -----
> - Supported in default configuration in Apache for .xml and .xhtml (and
> .xht?) files.
> Cons:
> -----
> - Discourage users from producing well-formed XHTML pages because it
> is not yet supported in other browsers (e.g., it prompts "Save As..."
> with no means to intercept it and to customize the prompt in a persuasive
> way that could entice users to upgrade on the spot).
> - Stop XHTML from seriously taking off. Sites continue to be served
> as "text/html" to stay compatible with older browsers. As a result, some
> sites claim to be XHTML when in reality they are still ill-formed tag soup.


I don't quite understand this. text/html tag soup has already made
substantial inroads, and I don't see how this trend can be easily
reversed. This is a retrospectively unsurprising corollary of the W3C's
zeal for "backwards-compatibility".


> Option 2: Variable Content-Types (or files) depending on the browser
>====================================================================
> Pros:
> -----
> - Workable according to the standard rules of content-based negotiation.
> Cons:
> -----
> - Need latest Apache servers, or can be implemented more laboriously
> in other servers with a server-side scripting language such as Perl.
> - Need knowledgeable users (e.g., to set symbolic links, and to
> set the .var configuration file with the appropriate q=)
> - Cumbersome to maintain on large sites.

This seems to me to be the best of the solutions. Admittedly, this does
require a reasonable degree of clue, but in the context of its use
(delivering mixed (XHTML, MathML, SVG) content), these are rather advanced
technologies, in terms of browser support. I think that asking authors
who wish to use something this advanced (in pragmatic terms, anyway,
regardless of the process of W3C recommendations) to have some knowledge
of server configuration is not unreasonable.

> Option 3: XHTML DOCTYPE look-ahead (was "sniffing")
>===================================================
> Pros:
> -----
> - Allow authors to produce pages that are backward-compatible as HTML
> in older browsers, and yet forward-compatible as XHTML in newer browsers.
> - Protect XHTML from degenerating.
> - There is a similar feature w.r.t. the charset. A document can set an
> internal charset to supersede the charset parameter that may come from
> the Content-Type.
> Cons:
> -----
> - Hard to implement in Mozilla.
> - Not backed by any W3C spec, and as a result gives the impression of
> forcing XHTML via the back-door.
> - Users may have to override the default configuration in Apache servers
> to set the mimetype of .xhtml (and .xht) files to "text/html".


IMO, it's too late to protect XHTML from "degenerating", if by "XHTML" you
mean "anything with an XHTML DOCTYPE". As it stands, tag soup with such a
doctype is already multiplying; it's used on MSN, as we found out in the
wake of the MSN-Opera-standards spat; and, most importantly, "IE supports
it." Having fought a number of standards battles, won some, and lost
others, I am firmly convinced that it will not be possible to "reclaim"
XHTML text/html for XML well-formedness in Mozilla.

> Option 4: "Content-Type: text/html; profile=xhtml"
>==================================================
> Pros:
> -----
> - Easy to implement in Mozilla.
> - The XHTML mimetype already has a similar "profile" parameter
> (http://www.ietf.org/internet-drafts/draft-baker-xhtml-media-reg-02.txt).
> Cons:
> -----
> - The "profile" parameter is not part of the spec for the "text/html"
> mimetype (there was another parameter called "version" that was proposed,
> but it apparently was dropped because it was found of no use at the time).

Theoretically, this could be very useful, and I'm glad to see it can be
deployed for XHTML, at least. However, doing this for text/html will
create serious problems in the future. To expand:

The "HTML" in "XHTML" is gradually becoming a historical anomaly. XHTML
1.1 provides for inclusions of other XML applications in XHTML through
namespaces. While certain XML applications may degrade gracefully in
tag-soup UAs, this is more a matter of luck and coincidence than anything
else. With the release of XHTML 2.0, it will be positively *dangerous* to
serve it as text/html; the WG has suggested that they will make
substantial, non-backwards-compatible changes in the language, as well as
taking advantage of XLink, XForms, &etc. Allowing the additional features
of XHTML 1.1 and 2.0 to work when served as text/html in an XML-aware UA
like Mozilla will result in increasingly XML-ized content being served as
text/html (because people will do the minimum amount of fussing with
content-types required for it to work). This, in turn, will result in:
a) Increasing amounts of XML-ized content being delivered to tag-soup UAs
incapable of handling it.
b) Increasing pressure for Mozilla to handle tag-soup delivered as XHTML
1.1, etc. because the author has managed to use whatever small subset of
XHTML elements remain backwards-compatible. (And people *do* tie
themselves into knots doing this sort of thing.)

> Possible Resolutions:
>=====================
> 1 - Keep the status quo and do nothing. This is equivalent to being
> content with the XML/XHTML mimetype, and letting users do content-based
> negotiation, e.g., by serving different files depending on the browser.
> 2 - Implement an XHTML DOCTYPE look-ahead ("sniffing").
> 3 - Support profile=xhtml, by trivially substituting
> "text/html; profile=xhtml" with the XHTML mimetype in necko.
>
> Your vote?
>

Ultimately, I think that the current state of affairs, troublesome as it
may be in some aspects, is the only safe one. While I realize that
Mozilla's current inability to deal with XHTML+MathML (which I assume has
been at the back of this all along) when served as text/html is
frustrating, I think that's part of the price we'll have to pay to set a
firewall between XML and tag-soup. Right now, text/html is sort of the
"special" content-type for "the stuff we send over the WWW". It's also
been polluted quite heavily both by tag-soup documents and tag-soup
permissive browsers. What the W3C is moving towards is making XHTML "just
another XML application" and getting browsers to support XML in general
rather than just (X)HTML. As the XHTML standards evolve, newer versions
of XHTML simply aren't going to fit in the same box with HTML 4.01, 3.2,
2.0, and so forth. At some point, we have to step in and say "Everything
above this line is */xml and doesn't go to a tag-soup parser, because even
if this is sort of backwards-compatible, it's not long before things will
really start breaking." I think this is the point to do it. The
combination of XHTML and other XML applications is not an SGML mechanism;
it's XML, plain and simple, and I don't think that the historical oddity
of one of those applications being similar to what's served under
text/html MIME-types nowadays is enough to justify serving it up under
that magic MIME-type.

The sad corollary to this is that I don't think the time is yet ripe to
deploy MathML on the WWW. If large numbers of visitors are experiencing
problems with an XML document-type, then the message is, alas, that
delivering namespaced XML content is jumping the gun. I realize this is a
bitter pill to swallow for all the people who have put so much time into
the Mozilla MathML implementation (or, for that matter, for hopeful
consumers of MathML like me), but I think that rushing ahead to push
MathML on all UAs before they're ready will backfire on us in the future.

As a practical suggestion, you mention elsewhere that IE can do MathML
through a plugin. Is it possible to use external MathML files and
<object> or some such to achieve rendering in both IE and Mozilla?

--
Chris Hoess

Andreas J. Guelzow

Dec 16, 2001, 10:08:24 PM
to mozilla-mathml
Roger B. Sidje wrote:

>
> Option 2: Variable Content-Types (or files) depending on the browser
> ====================================================================
> Pros:
> -----
> - Workable according to the standard rules of content-based negotiation.
> Cons:
> -----
> - Need latest Apache servers, or can be implemented more laboriously
> in other servers with a server-side scripting language such as Perl.
> - Need knowledgeable users (e.g., to set symbolic links, and to
> set the .var configuration file with the appropriate q=)
> - Cumbersome to maintain on large sites.
>

...

>
> Possible Resolutions:
> =====================
> 1 - Keep the status quo and do nothing. This is equivalent to being
> content with the XML/XHTML mimetype, and letting users do content-based
> negotiation, e.g., by serving different files depending on the browser.
> 2 - Implement an XHTML DOCTYPE look-ahead ("sniffing").
> 3 - Support profile=xhtml, by trivially substituting
> "text/html; profile=xhtml" with the XHTML mimetype in necko.
>
> Your vote?
>

My vote (if you really want to know):

Option 2 -- that is, resolution 1.

As a note to this: resolution 3 of course also requires knowledgeable
users since they have to learn to add profile=... In any case, to switch
to XHTML some learning is necessary, for many people in fact a whole lot
of learning since they may not even know HTML correctly.

Andreas

--
Prof. Dr. Andreas J. Guelzow
http://www.math.concordia.ab.ca/aguelzow

Roger B. Sidje

Dec 17, 2001, 3:45:36 AM
to Chris Hoess
Chris Hoess wrote:
>
> The sad corollary to this is that I don't think the time is yet ripe to
> deploy MathML on the WWW. If large numbers of visitors are experiencing
> problems with an XML document-type, then the message is, alas, that
> delivering namespaced XML content is jumping the gun.

Interesting analysis but dubious corollary. With the same reasoning, CSS2/3
should be ignored because no other browsers support them. Or, in general,
pioneering work to support W3C features should be kept in the closet until
everybody and their other browsers are ready. The bottom line not to forget is
that XHTML (plus MathML/SVG) served as XML has no problems in Mozilla -- which
is the right thing expected of an XML/XHTML-compliant browser. Non-standard
options to be "compatible" with other (dominant) browsers are just helpful
extras to make Mozilla competitive in the real world.

> As a practical suggestion, you mention elsewhere that IE can do MathML
> through a plugin. Is it possible to use external MathML files and
> <object> or some such to achieve rendering in both IE and Mozilla?

Not to my knowledge. The plug-in under consideration was developed by
DSI and it is specifically tuned for IE. Enough internal APIs were extended
by the IE team to allow the plug-in to interact with the IE layout engine
closely. But it doesn't offer those DOM-related functions that are trademark to
Mozilla (e.g., search of math text, selection of individual pieces, JavaScripting,
etc).

Besides, back in the standards Mozilla land, the thought of <math> (clean
XML) in HTML (tag-soup) is resisted to start with.
---
RBS

Bruno....@laposte.net

Dec 17, 2001, 5:21:36 AM
to
I vote for Option 1.
(See the thread "Examples" 28/11/2001 for my arguments.)

I can just add that DocBook (an XML/SGML application) is moving to
strict XHTML and better use of namespaces because Mozilla breaks on it.
This shows that there is an urgent need to draw a clear frontier between
HTML and XML (XHTML, MathML, SVG ...). The earlier you clean up a problem
the easier it is: don't leave HTML and XML mixed everywhere.


Another point: this "Vote".
- 1 : The author of this "poll" is personally involved in option 3
and is "lobbying" for it. He could have mentioned this, since others
would have made a quite different summary. But the author did the work,
so he scores a point (and deserves all my respect for this; I don't want
to upset anyone!)
- 2 : Who will read this poll and who will vote? And what decision will
be taken based on it? (Quite existential, isn't it?)
- 3 : IE has already voted for option 3, and this makes a majority.
- 4 : Apache has voted for option 1 (by setting default mime-types).
- 5 : The reference site for MathML has voted for option 3. (He is the
author of this thread and scores another big point.)
- 6 : (At the risk of looking like an idiot) Isn't it the role of the W3C
to decide?


Thanks

Chris Hoess

Dec 17, 2001, 9:05:00 AM
to
In article <3C1DB0B0...@maths.uq.edu.au>, Roger B. Sidje wrote:
> Chris Hoess wrote:
>>
>> The sad corollary to this is that I don't think the time is yet ripe to
>> deploy MathML on the WWW. If large numbers of visitors are experiencing
>> problems with an XML document-type, then the message is, alas, that
>> delivering namespaced XML content is jumping the gun.
>
> Interesting analysis but dubious corollary. With the same reasoning, CSS2/3
> should be ignored because no other browsers support them. Or, in general,
> pioneering work to support W3C features should be kept in the closet until
> everybody and their other browsers are ready. The bottom line not to forget is
> that XHTML (plus MathML/SVG) served as XML has no problems in Mozilla -- which
> is the right thing expected of an XML/XHTML-compliant browser. Non-standard
> options to be "compatible" with other (dominant) browsers are just helpful
> extras to make Mozilla competitive in the real world.

Well, CSS is sort of a different kettle of fish; the parsing rules of CSS1
ensure that a style system can just drop new rules it doesn't understand
without choking on the entire stylesheet. And the stylesheet is auxiliary
to the content, so it doesn't matter if it completely disappears.
Essentially, CSS has graceful degradation built into its design;
unfortunately, with HTML/XML, we have to resort to the more difficult
content-negotiation.



>> As a practical suggestion, you mention elsewhere that IE can do MathML
>> through a plugin. Is it possible to use external MathML files and
>> <object> or some such to achieve rendering in both IE and Mozilla?
>
> Not to my knowledge. The plug-in under consideration was developed by
> DSI and it is specifically tuned for IE. Enough internal APIs were extended
> by the IE team to allow the plug-in to interact with the IE layout engine
> closely. But it doesn't offer those DOM-related functions that are trademark to
> Mozilla (e.g., search of math text, selection of individual pieces, JavaScripting,
> etc).
>
> Besides, back in the standards Mozilla land, the thought of <math> (clean
> XML) in HTML (tag-soup) is resisted to start with.

Indeed. I'm sorry if I've sounded a bit snippy here, but one of the
problems I perceive is the "quest for text/html", all stemming out of the
W3C's misguided backwards-compatibility dictum in Appendix C (when, as I
pointed out, an SGML-based system and an XML-based system will both
"validate" XHTML documents, but display different content if the document
contains empty elements). Unfortunately, this has, not surprisingly,
encouraged the idea that anything can be jammed under the text/html
umbrella if it shows signs of HTML-ness, hindering the proper deployment
of XML.

AIUI, right now, the only way to get MathML to work in IE is via this
plugin, by inserting <math> in HTML documents, whether they be well-formed
XHTML or tag-soup. It is the last that is the problem. If we start
making well-formed XHTML + MathML work in Mozilla when served as
text/html, we *will* get people complaining because:
a) They've seen MathML work for other people in Mozilla.
b) But the MathML-in-tag-soup they're serving doesn't work.
To keep the tag-soupers off our backs, we need to do *something* different
with the MIME-type, because if people can just take current tag-soup and
shove MathML in without changing anything, it will be very unpleasant for
us.

For this, I favor the content-negotiation approach, for several reasons:
1) The idea that XML can be legitimately served as text/html is a
delusion, albeit a W3C-sponsored one. Encouraging it is not in our best
interests, nor those of the WWW.
2) Delivering embedded MathML to older browser versions that don't know
about "profile" will degrade poorly and cause poor usability.
3) Supporting MathML in text/html more or less in perpetuity would
serve to legitimate the current IE practice, which I do not believe to be
consistent with standards. Since it is easier to do than setting up XML +
content-negotiated fallback, I would anticipate that we would see most
MathML on the WWW served in this manner, delaying the deployment of XML.

Again, I understand what must be great frustration over what seem to be
trivial quibbles by standards people. (For that matter, I'm frustrated
that IE is able to Do The Wrong Thing and get away with it, because it's
easier for people to ask an open project to change something than make an
impact on MS.) But I really think that all these tricks to work around
the HTML, XML, and HTTP standards will, in the long run, hinder XML deployment
and perpetuate the reign of text/html, DOCTYPE sniffing, and other
unpleasantries.

--
Chris Hoess

William F. Hammond

Dec 17, 2001, 11:46:00 AM
to mozilla...@mozilla.org
Bruno....@LaPoste.net writes:

> - 3 : IE has already voted for option 3, and this makes a majority.

Oh? IE voted in a poll about what the Mozilla project should do?

> - 6 : (At the risk of looking like an idiot) Isn't it the role of the
> W3C to decide?

It is the role of W3C to set standards regarding HTML, XHTML, and
MathML. As I read RFC 2854, "The 'text/html' media type", W3C can, going
forward, say what is and what is not covered under "text/html".

It is the role of the Mozilla Project to decide what Mozilla is going
to do.

-- Bill

William F. Hammond

Dec 17, 2001, 12:16:43 PM
to Chris Hoess, mozilla...@mozilla.org
Chris Hoess <cho...@force.stwing.upenn.edu> writes:

> unfortunately, with HTML/XML, we have to resort to the more difficult
> content-negotiation.

Content negotiation is available, though more difficult and not always
reliable because of caches, proxies, and firewalls, not to mention
bureaucratic server-manager lameness and obstinacy.

The issue is whether or not to give content providers more flexibility
than they now have. If that flexibility is not provided, they may
decide to ignore MathML.

Remember that providers now under new site mandates to provide
*accessible* content may no longer use bitmap images or rely on PDF.
If MathML is not a robust option, the fallback is TeX source in
regular HTML pages for math content.

Remember that there are examples of old browsers that handle XHTML
plus MathML well when it is served as text/html.

> . . . (when, as I
> pointed out, an SGML-based system and an XML-based system will both
> "validate" XHTML documents, but display different content if the document
> contains empty elements).

This is possibly an accurate description of particular systems but
otherwise is nonsense since (1) every XML application gives rise in a
canonical way to an SGML application and (2) SGML applications do not
prescribe rendered display.

-- Bill


David Carlisle

Dec 17, 2001, 12:26:20 PM
to ham...@csc.albany.edu, cho...@force.stwing.upenn.edu, mozilla...@mozilla.org
> > pointed out, an SGML-based system and an XML-based system will both
> > "validate" XHTML documents, but display different content if the document
> > contains empty elements).
>
> This is possibly an accurate description of particular systems but
> otherwise is nonsense since (1) every XML application gives rise in a
> canonical way to an SGML application and (2) SGML applications do not
> prescribe rendered display.

I assume he was referring to the fact that with the default SGML
declaration for HTML the syntax /> is legal, but / closes the start tag
and the > is character data which should be printed as part of element
content. This is typically different content than you get if you use the
SGML declaration for XML.
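Concretely, for a fragment like <p>Hello<br/>world</p> (an illustrative example, assuming the null end-tag rules of the default HTML SGML declaration):

```
Parsed as XML:       p("Hello", br, "world")
Parsed as HTML/SGML: the "/" in <br/ closes the start tag (null end-tag),
                     so ">" is character data: p("Hello", br, ">world")
```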

David



Chris Hoess

Dec 17, 2001, 1:39:55 PM
to
In article <i7wuzla...@pluto.math.albany.edu>, William F. Hammond wrote:
> Chris Hoess <cho...@force.stwing.upenn.edu> writes:
>
>> unfortunately, with HTML/XML, we have to resort to the more difficult
>> content-negotiation.
>
> Content negotiation is available though more difficult and not always
> reliable because of caches, proxies, and firewalls, not to mention
> bureaucratic server-manager lameness and obstinance.

Well, sure. It's also difficult to write well-formed XHTML if you're tied
into some WYSIWYG development environment/process that can't accommodate
HTML Tidy. The reward of taking the time to do so is pages that are much
more likely to work consistently in all browsers, and which won't break in
a new browser release because a bug was fixed. The reward of taking the
trouble to set up content-type negotiation is the ability to satisfy both
new and old browsers with XHTML+MathML content.



> The issue is whether or not to give content providers more flexibility
> than they now have. If that flexibility is not provided, they may
> decide to ignore MathML.
>
> Remember that providers now under new site mandates to provide
> *accessible* content may no longer use bitmap images nor rely on PDF.
> If MathML is not a robust option, the fallback is TeX source in
> regular HTML pages for math content.

If MathML is not a robust option, it is because certain browsers have
failed to implement it correctly. If it can't be deployed reliably yet,
then falling back on earlier measures is indeed the best option.

> Remember that there are examples of old browsers that handle XHTML
> plus MathML well when it is served as text/html.

Coincidentally, perhaps; it's certainly not a behavior that can be relied
on in general, and tacitly endorsing it will result in more ammunition for
the Teeming Millions who want permission to shove XHTML+(some XML) into
text/html regardless of whether it breaks (the reason being they don't
want to take the trouble to change it to a more appropriate */xml). The
W3C has already experimented with this sort of lenity; the growing pool of
text/html tag-soup with a jaunty XHTML doctype slapped on top is evidence
of its failure.

>> . . . (when, as I
>> pointed out, an SGML-based system and an XML-based system will both
>> "validate" XHTML documents, but display different content if the document
>> contains empty elements).
>
> This is possibly an accurate description of particular systems but
> otherwise is nonsense since (1) every XML application gives rise in a
> canonical way to an SGML application and (2) SGML applications do not
> prescribe rendered display.
>

As David and I have pointed out elsewhere, the supposed canonical SGML
application generated by an XML application will not, for any XML
application, be identical to the SGML application known as "HTML". The
consequences of this can include the interpretation of data characters in
one as markup characters in the other, or vice versa, a problem more
deeply rooted than "rendered display".

--
Chris Hoess

William F. Hammond

Dec 17, 2001, 5:28:39 PM
to mozilla...@mozilla.org
"Roger B. Sidje" <r...@maths.uq.edu.au> has presented three possible
resolutions:

1. Status quo (which includes the possibility of content negotiation).
2. Looking ahead for an XHTML document type declaration.
3. Mime-level support for "text/html; profile=xhtml".

I suggest that we set aside resolution (2) prior to Mozilla 1.0
because of the performance hit and the late date.

I am advocating (3) since it gives content providers added flexibility
without getting in the way of anything else. Moreover, it does not in
any way support the furtherance of tag soup since switching Mozilla
into application/xhtml+xml mode will make Mozilla intolerant of
errors. Because "profile" is a new content type parameter for
text/html, nobody is going to use it inadvertently.

It would probably not be a good idea for a content-provider to serve a
document without a document type declaration as "text/html; profile=xhtml".
But its formal meaning exists at the mime-level and is only that an
XHTML capable agent recognizing the new parameter will handle the
document as "application/xhtml+xml", while agents not recognizing the
new parameter will ignore it.

-- Bill


Ian Hickson

Dec 19, 2001, 10:45:54 AM
to Roger B. Sidje
Roger B. Sidje wrote:

> This is a summary of the options that were suggested recently on this
> newsgroup to deal with the ambivalence of XHTML which can be treated
> either as HTML or XML.


Good summary.


> Option 1: Live with the XML/XHTML mimetype as is
> ================================================

> - Discourage users from producing well-formed XHTML pages because it


> is not yet supported in other browsers (e.g., it prompts "Save As..."
> with no means to intercept it and to customize the prompt in a persuasive
> way that could entice users to upgrade on the spot).


XHTML isn't supported by the majority of the deployed market yet. This
has nothing to do with Mozilla. If people spent as much time complaining
to Microsoft about this as they do to Mozilla, maybe we wouldn't have a
problem at all.

> - Stop XHTML from seriously taking off. Sites continue to be served
> as "text/html" to stay compatible with older browsers.


This will have to go on for at least five years or so anyway. What we do
won't affect this.


> As a result, some
> sites claim to be XHTML when in reality they are still ill-formed tag soup.


If they are labelled text/html they are not claiming to be XHTML.

> Option 2: Variable Content-Types (or files) depending on the browser
> ====================================================================


This is what happened when IE went ahead of Netscape. People forked
their sites to have legacy content and new content on different pages.

Why should it be different here?


> Option 3: XHTML DOCTYPE look-ahead (was "sniffing")
> ===================================================

>

> - Not backed by any W3C spec


Not only is it not backed, the HTML chair publicly said we shouldn't
do it.

> Option 4: "Content-Type: text/html; profile=xhtml"
> ==================================================


If people can do this they can almost certainly do server side sniffing
anyway and give the correct MIME type.

However if the attribute was renamed "-moz-override" then I would not be
opposed. Something like:

Content-Type: text/html; -moz-override=text/xml

...would be ok.

In fact, we could extend that to be supported for all MIME types as a
simple, consistent way of overriding the Content-Type header for Mozilla
only. (An alternative would be to introduce a specific header in the
same vein, e.g. 'Moz-Content-Type' or some such.)
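To make the proposal concrete, here is a hedged sketch of how a client
that honored such a parameter might compute the effective type, using
Python's standard-library header parsing (note that "-moz-override" is
only proposed in this thread, not an implemented Mozilla feature):

```python
from email.message import Message

def effective_type(content_type_header):
    # Sketch only: if the hypothetical '-moz-override' parameter is
    # present, its value wins; otherwise fall back to the declared
    # MIME type from the Content-Type header.
    msg = Message()
    msg['Content-Type'] = content_type_header
    return msg.get_param('-moz-override') or msg.get_content_type()

print(effective_type('text/html; -moz-override=text/xml'))  # text/xml
print(effective_type('text/html'))                          # text/html
```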


> Possible Resolutions:
> =====================
> 1 - Keep the status quo and do nothing. This is equivalent to be
> content with the XML/XHTML mimetype, and let users do content-based
> negotiation, e.g., by serving different files depending on the browser.


My personal favourite. It should be complemented by customers telling
Microsoft to get their act together.


> 2 - Implement a XHTML DOCTYPE look-ahead ("sniffing").


I am strongly opposed to this.


> 3 - Support profile=xhtml, by trivially substituting
> "text/html; profile=xhtml" with the XHTML mimetype in necko.


If this was changed to be outside the IETF namespace (i.e. using the
-moz- prefix) then I would not be opposed to this hack. However web
authors will have to realise that what they are doing is fundamentally
wrong. MathML is XML, not HTML. Using the text/html MIME type is a lie.

--
Ian Hickson

Chris Hoess

Dec 19, 2001, 11:58:54 AM
to
In article <3C20B632...@hixie.ch>, Ian Hickson wrote:
>
>> Option 4: "Content-Type: text/html; profile=xhtml"
>> ==================================================
>
>
> If people can do this they can almost certainly do server side sniffing
> anyway and give the correct MIME type.
>
> However if the attribute was renamed "-moz-override" then I would not be
> opposed. Something like:
>
> Content-Type: text/html; -moz-override=text/xml
>
> ...would be ok.

After some discussion on IRC, I'd be willing to support this. I'm not
particularly happy with it on principle (the adjective "deprecated" comes
to mind), but if we think of this as a sort of singular ad-hoc MIME-type,
I could live with that.



> In fact, we could extend that to be supported for all MIME types as a
> simple, consistent way of overriding the Content-Type header for Mozilla
> only. (An alternative would be to introduce a specific header in the
> same vein, e.g. 'Moz-Content-Type' or some such.)
>

This I disagree with, and I think Hixie isn't that enthusiastic about it
either (although I can't speak for him). We should not be hacking up a
generalized mechanism to break Content-Type due to the failings of other
UAs.



>> Possible Resolutions:
>> =====================
>> 1 - Keep the status quo and do nothing. This is equivalent to be
>> content with the XML/XHTML mimetype, and let users do content-based
>> negotiation, e.g., by serving different files depending on the browser.
>
>
> My personal favourite. It should be complemented by customers telling
> Microsoft to get their act together.
>

I'd much prefer this solution, but people seem strongly opposed to it.
Unfortunately, customer feedback does not seem to have made much of an
impact on practices at 1 Microsoft Way; I can't imagine that they haven't
had demand to support XHTML-as-XML.



>> 2 - Implement a XHTML DOCTYPE look-ahead ("sniffing").
>
>
> I am strongly opposed to this.
>

Agreed.



>> 3 - Support profile=xhtml, by trivially substituting
>> "text/html; profile=xhtml" with the XHTML mimetype in necko.
>
>
> If this was changed to be outside the IETF namespace (i.e. using the
> -moz- prefix) then I would not be opposed to this hack. However web
> authors will have to realise that what they are doing is fundamentally
> wrong. MathML is XML, not HTML. Using the text/html MIME type is a lie.
>

I think this is what will probably be adopted, despite its suboptimality,
and I think it will eventually come around to bite us: when people really
do need to start rolling over to XML MIME types, the "text/html is all"
situation will have ossified worse than today. But I guess we'll have to
deal with that then.

--
Chris Hoess

William F. Hammond

Dec 19, 2001, 1:00:38 PM
to Ian Hickson, mozilla...@mozilla.org, Roger B. Sidje
Ian Hickson <i...@hixie.ch> writes:

> > Option 4: "Content-Type: text/html; profile=xhtml"
> > ==================================================

> If people can do this they can almost certainly do server side
> sniffing anyway and give the correct MIME type.

The "correct" types (application/xhtml+xml or maybe text/xml) block
display in old user agents. This thread is about providing
flexibility for content providers who wish to override that behavior.
See, for example, lynx 2.8. There are closely related WAI issues.

> However if the attribute was renamed "-moz-override" then I would not
> be opposed. Something like:
>
> Content-Type: text/html; -moz-override=text/xml
>
> ...would be ok.

> In fact, we could extend that to be supported for all MIME types as a
> simple, consistent way of overriding the Content-Type header for

> Mozilla only. . . .

A useful thing for content providers in connection with any HTML/XHTML
user agent that has dual parsers. (Single parser agents exist. Maybe
Mozilla 2?)

My intention with "profile=xhtml" was to make it very simple for
content providers and have it not be Mozilla-specific.

I have no personal objection to "-moz-override". But there is a class
of content-providers who on principle object to, or by mandate are
prevented from, writing anything that smacks of being user-agent
specific.

> > 3 - Support profile=xhtml, by trivially substituting
> > "text/html; profile=xhtml" with the XHTML mimetype in necko.
>
> If this was changed to be outside the IETF namespace (i.e. using the

> -moz- prefix) then I would not be opposed to this hack. . . .

Do you object to my proposal in www-...@w3.org to have
"profile=xhtml" added as an HTTP content type parameter for text/html
as a way of giving content providers additional flexibility?

Please remember that we have two types of customers: users and
content providers.

-- Bill


William F. Hammond

Dec 19, 2001, 2:52:10 PM
to mozilla...@mozilla.org
In mozilla...@mozilla.org I wrote:

> Ian Hickson <i...@hixie.ch> writes:

> > Content-Type: text/html; -moz-override=text/xml
> >
> > ...would be ok.

. . .


> I have no personal objection to "-moz-override". But there is a class
> of content-providers who on principle object to, or by mandate are
> prevented from, writing anything that smacks of being user-agent
> specific.

What about going to W3C to ask for:

Content-Type: text/html; http-equiv=text/xml
and/or
Content-Type: text/html; http-equiv=application/xhtml+xml ?

Would that be OK as a substitute for "-moz-override"?

-- Bill


Chris Hoess

Dec 19, 2001, 2:52:12 PM
to
In article <i7zo4fu...@pluto.math.albany.edu>, William F. Hammond wrote:
> Ian Hickson <i...@hixie.ch> writes:
>
>> > Option 4: "Content-Type: text/html; profile=xhtml"
>> > ==================================================
>
>> If people can do this they can almost certainly do server side
>> sniffing anyway and give the correct MIME type.
>
> The "correct" types (application/xhtml+xml or maybe text/xml) block
> display in old user agents.

As well they should; after all, they're HTML user agents, not magic
XML/XHTML/MathML/SomethingML UAs.

> This thread is about providing
> flexibility for content providers who wish to override that behavior.

Which would be ludicrous if it were true, it being the "flexibility" to
insert into a user's browser content which his browser does not understand
and which it properly rejects. Of course, it isn't about this at all;
it's making polite conversation to explain away the need for 50 pounds of
bananas without mentioning the 600-pound gorilla of UAs sitting in the
corner picking its teeth with "Content-Type:".

> See, for example, lynx 2.8. There are closely related WAI issues.

I'd imagine that trying to force-feed a browser content it doesn't grok
would be considered an accessibility issue, yes.

>> However if the attribute was renamed "-moz-override" then I would not
>> be opposed. Something like:
>>
>> Content-Type: text/html; -moz-override=text/xml
>>
>> ...would be ok.
>
>> In fact, we could extend that to be supported for all MIME types as a
>> simple, consistent way of overriding the Content-Type header for
>> Mozilla only. . . .
>
> A useful thing for content providers in connection with any HTML/XHTML
> user agent that has dual parsers. (Single parser agents exist. Maybe
> Mozilla 2?)

Don't bet on it. "Ol' Tag Soup is here to stay, it will never die..."
"Useful" only because the gorilla hasn't quite figured out how to link up
the "Accept:" header with what it displays.

> My intention with "profile=xhtml" was to make it very simple for
> content providers and have it not be Mozilla-specific.
>
> I have no personal objection to "-moz-override". But there is a class
> of content-providers who on principle object to, or by mandate are
> prevented from, writing anything that smacks of being user-agent
> specific.

An interesting perspective, given that the purpose of this whole exercise
is to compensate for the errors of a specific user-agent.

>> > 3 - Support profile=xhtml, by trivially substituting
>> > "text/html; profile=xhtml" with the XHTML mimetype in necko.
>>
>> If this was changed to be outside the IETF namespace (i.e. using the
>> -moz- prefix) then I would not be opposed to this hack. . . .
>
> Do you object to my proposal in www-...@w3.org to have
> "profile=xhtml" added as an HTTP content type parameter for text/html
> as a way of giving content providers additional flexibility?

I object to it on the grounds that it encourages thinking that:
1) Delivering XML content disguised as text/html is a standardized and
useful behavior.
2) Whenever the gorilla belches, we'll convene a committee and declare it
to be music.

This is within the letter of the standards, but it's a one-time piece of
ad-hockery designed to clean up Microsoft's mess. It should feel, look,
and smell like ad-hockery, not be dressed up in a specious song-and-dance
of standards and praised for "flexibility".

> Please remember that we have two types of customers: users and
> content providers.
>

Neither of whom is terribly interested in crawling out of the primordial
tag-soup and breathing air, apparently.

--
Chris Hoess

William F. Hammond

Dec 19, 2001, 3:30:14 PM
to Chris Hoess, mozilla...@mozilla.org
Chris Hoess <cho...@force.stwing.upenn.edu> writes:


> and which it properly rejects. Of course, it isn't about this at all;
> it's making polite conversation to explain away the need for 50 pounds of
> bananas without mentioning the 600-pound gorilla of UAs sitting in the
> corner picking its teeth with "Content-Type:".

That's only a part of what it's about. Please re-read the related
threads.

-- Bill

Chris Hoess

Dec 19, 2001, 5:40:40 PM
to

Right, the other part is forcing markup into browsers which happen to
degrade it in a vaguely respectable manner (and some of which may not).

Chris Hoess

Dec 19, 2001, 5:42:33 PM
to

Er, sorry, accidentally sent before this was finished--I didn't mean to
make a one-shot jab. The archives I've seen so far don't make me feel
better about standardizing this, but I'll post a more extensive followup
later.

--
Chris Hoess

Chris Hoess

Dec 19, 2001, 6:59:07 PM
to

After a review of what seems to be the principal thread, that of this May
in n.p.m.mathml, it appears that what is being called for by "content
providers" is a text/html of cheveril, that stretches from an inch narrow
to an ell broad. The reasons for this seem to be threefold:

1) We can't feed IE any of the xml Content-Types, or it starts to sulk.

This reason I've dealt with above, and while I highly dislike the idea of
cleaning up after Microsoft's mess, I'm willing to concede some sort of
ad-hockery to work around it if it will get people to deploy XHTML+MathML.
I'm somewhat worried about what will happen when people start tossing
MathML croutons into their pre-existing tag-soup and find that it doesn't
work in Mozilla even after tweaking the Content-Type header (or, for that
matter, what will happen when they start serving documents to IE with
Content-Type: text/html alone, and it doesn't work in Moz), but the fact
that MathML content providers are presumably a notch up on the clue ladder
gives me some hope.

2) Changing MIME-types is hard.

This was brought up in the earlier threads, but is obviously moot in the
context of this discussion.

3) We want to feed XHTML+MathML to older, pre-XML UAs.

This, it seems to me, is where we begin wandering down the primrose path.
(Well, not beginning, really; the W3C kicked things off.) The W3C
included Appendix C on the premises that a certain subset of XHTML was
parseable as HTML. In theory, this would mean that an SGML application
using the HTML 4.01 SGML declaration would use the XHTML 1.0 DTD to parse
and render such XHTML documents. In practice, of course, this isn't true,
but the rational given to accept it was "well, no one really uses SGML
parsers anyway, it's all tag-soup, so XHTML documents and HTML 4.01
documents will appear identical in all browsers". Now, you're suggesting
that serving XHTML+MathML to old text/html (tag-soup) UAs is good, nay,
even desirable, because it "doesn't break too badly in them". Of course,
some (all?) of the formatting will disappear to leave a puddle of goop,
but evidently that's not considered too bad. Well, by that definition,
NOTHING will break badly in a tag-soup parser. After all, they were
"designed" for resilience in the era of proprietary tags; just shrug off
what you don't know and cough the content onto the page. By this logic,
any sort of XML document can be delivered as text/html, as long as there's
a little sprinkling of HTML here and there to hold the goo together.

I realize that, in practice, it may be necessary to serve XHTML+MathML as
text/html;-magic-keyword to work around IE's brokenness, to which I
acquiesce. However, the idea that we should be allowed to shove all kinds
of XHTML+(something) at tag-soup browsers is, I believe, antithetical to
the principles behind these recommendations and standards, more dangerous
to accessibility than it is helpful (RISKS Digest will love a
transformation like 2^4 -> 24 when formatting gets lost), and will, given
enough momentum, cause severe problems when XHTML 2.0 rolls around.
text/html;-magic-keyword is ad-hockery and remains ad-hockery; it should
no more receive a legitimizing gloss than doctype sniffing, residual style
handling, or any of the other hacks we perpetuate to deal with the errors
and fallacies of authors and other user-agents.

--
Chris Hoess

Henri Sivonen

Dec 20, 2001, 7:03:28 AM
to
In article <i7wuzla...@pluto.math.albany.edu>,
ham...@csc.albany.edu (William F. Hammond) wrote:

> Chris Hoess <cho...@force.stwing.upenn.edu> writes:
>
> > unfortunately, with HTML/XML, we have to resort to the more difficult
> > content-negotiation.
>
> Content negotiation is available though more difficult and not always
> reliable because of caches, proxies, and firewalls,

The Apache team has specifically implemented workarounds for those
problems. Do you have evidence that the workarounds don't work?

> The issue is whether or not to give content providers more flexibility
> than they now have.

I think implementing endless DWIM is bad "flexibility". I think it is
reasonable to expect the author to do things right.

I think considering images helps to put the DWIM "flexibility" into
perspective: if an author can't use an image authoring tool (e.g.,
Photoshop) properly, the images just look bad and browsers don't fix
them. If the author sends images using bogus content types or if the
image files are just corrupt, Mozilla doesn't fix them.

> Remember that providers now under new site mandates to provide
> *accessible* content may no longer use bitmap images nor rely on PDF.

Why would text/html soup be more accessible? Dumping MathML to an
old browser that doesn't grok it doesn't make the content more
accessible.

--
Henri Sivonen
hen...@clinet.fi
http://www.hut.fi/u/hsivonen/

Henri Sivonen

Dec 20, 2001, 7:33:29 AM
to
In article <3C1CDD53...@maths.uq.edu.au>, "Roger B. Sidje"
<r...@maths.uq.edu.au> wrote:

Chris Hoess already said a number of insightful things. "Metoo" to those.

> Option 1: Live with the XML/XHTML mimetype as is
> ================================================
> Pros:
> -----
> - Supported in default configuration in Apache for .xml and .xhtml (and
> .xht?) files.

- Straight-forward: promotes the implementation of
interoperable XML systems.
- Doesn't put well-formedness in jeopardy: promotes the
implementation of interoperable XML systems.

> Cons:
> -----
> - Discourage users from producing well-formed XHTML pages because it
> is not yet supported in other browsers (e.g., it prompts "Save As..."
> with no means to intercept it and to customize the prompt in a persuasive
> way that could entice users to upgrade on the spot).

The front page of a document collection could still be mathless
text/html that explains the requirements.

> - Stop XHTML from seriously taking off.

Is XHTML taking off a value in itself? If one is authoring pages with
XHTML 1.0 only (without math), plain HTML 4 would do just as well. Why
should people adopt "new technology" when there are no tangible new
benefits?

(When you start adding the new benefits [MathML, SVG etc. extensions],
the result is no longer compatible with old tag slurpers.)

> Option 2: Variable Content-Types (or files) depending on the browser
> ====================================================================
> Pros:
> -----
> - Workable according to the standard rules of content-based negotiation.
> Cons:
> -----
> - Need latest Apache servers,

No, the functionality has been around quite some time. Anyway, everyone
should be using a sufficiently recent version because of other things
(security).

> - Need knowledgeable users (e.g., to set symbolic links, and to
> set the .var configuration file with the appropriate q=)

Not necessarily. A ready-made helper script can be used as a replacement
for knowledge/skill. :-)
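For reference, such a helper script would only need to emit a small type
map; a .var file along these lines (the filenames and qs= source-quality
values here are illustrative — check the Apache mod_negotiation
documentation for the exact syntax and quality semantics):

```
URI: page

URI: page.html
Content-type: text/html; qs=0.5

URI: page.xml
Content-type: text/xml; qs=0.8
```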

> Option 3: XHTML DOCTYPE look-ahead (was "sniffing")
> ===================================================
> Pros:
> -----
> - Allow authors to produce pages that are backward-compatible as HTML
> in older browsers, and yet forward-compatible as XHTML in newer browsers.

I'm tempted to quote Jef Raskin:
"Any time 'backwards-compatibility' is given as a reason for doing
something, you may ignore the word 'compatibility'."

MathML is useless in old browsers.

> - Protect XHTML from degenerating.

XHTML as text/html is already souped.

> - There is a similar feature w.r.t. the charset. A document can set an
> internal charset to supersede the charset parameter that may come from
> the Content-Type.

The http-equiv stuff isn't elegant at all. Rather, it could be used as a
reason against sniffing.

> Cons:

> - Users may have to override the default configuration in Apache servers
> to set the mimetype of .xhtml (and .xht) files to "text/html".

That's their fault, then. Mozilla just can't start guessing whether a
particular server (mis)configuration was intentional.

> Possible Resolutions:
> =====================
> 1 - Keep the status quo and do nothing. This is equivalent to be
> content with the XML/XHTML mimetype, and let users do content-based
> negotiation, e.g., by serving different files depending on the browser.
> 2 - Implement a XHTML DOCTYPE look-ahead ("sniffing").
> 3 - Support profile=xhtml, by trivially substituting
> "text/html; profile=xhtml" with the XHTML mimetype in necko.
>
> Your vote?

Resolution 1.

Henri Sivonen

Dec 20, 2001, 7:43:00 AM
to
In article <slrna22al8...@force.stwing.upenn.edu>, Chris Hoess
<cho...@force.stwing.upenn.edu> wrote:

> I realize that, in practice, it may be necessary to serve XHTML+MathML as
> text/html;-magic-keyword to work around IE's brokenness,

It isn't necessarily necessary. The market share of Windows IE plus
some plug-in is negligible. MathML content providers will want to
support Mozilla. Surely MathML content providers can run a script that
sets up the .var files accordingly if they are "a notch up on the clue
ladder" and could figure out how to put a different kind of magic
keyword in the Content-Type header.

Ian Hickson

Dec 20, 2001, 11:44:32 AM
to William F. Hammond, mozilla...@mozilla.org, Roger B. Sidje
William F. Hammond wrote:
>

> The "correct" types (application/xhtml+xml or maybe text/xml) block
> display in old user agents.


Every time we have this discussion I have trouble understanding this
argument.

The only other UA that matters is IE6, no? All other UAs will choke on
MathML regardless of what happens. No?

If you are targeting the XHTML+MathML documents at old browsers,
wouldn't embedding the MathML using <object> make much more sense?


> A useful thing for content providers in connection with any HTML/XHTML
> user agent that has dual parsers. (Single parser agents exist. Maybe
> Mozilla 2?)


Eh?


> My intention with "profile=xhtml" was to make it very simple for
> content providers and have it not be Mozilla-specific.


The non-Mozilla-specific answer is "text/xml".


> I have no personal objection to "-moz-override". But there is a class
> of content-providers who on principle object to, or by mandate are
> prevented from, writing anything that smacks of being user-agent
> specific.


Then they should use "text/xml".


> Do you object to my proposal in www-...@w3.org to have
> "profile=xhtml" added as an HTTP content type parameter for text/html
> as a way of giving content providers additional flexibility?


Yes. It is pointless. There are several classes of documents:

Tag soup: Send as text/html.

HTML: Compatible with tag soup. Send as text/html.

XHTML targeted at all browsers: Silly. Use HTML. (The "oh but that
means we can't use XML tools" argument is stupid, as mapping XHTML to
HTML is trivial.)

XHTML + MathML/SVG/XUL targeted at all browsers: Silly. Use HTML.

XHTML + MathML/SVG/XUL targeted at new browsers: Send as text/xml.

Unfortunately a bug in IE6 means that it doesn't correctly support XHTML
sent as text/xml. This is therefore the only problem. The solution to
this problem is to fix IE. If Microsoft won't fix their browser then
content providers should use browser sniffing.

I'm sorry if this post sounds arrogant -- this is a very old argument
and I'm getting tired of it.

As a temporary measure I would not be opposed to letting a blatantly
Mozilla-specific extension be implemented that would force us into XML
parsing mode even if technically we should use the tag soup parser.

Content-Type: text/html;-moz-override=text/xml

Just imagine it as a new MIME type.

I am *STRONGLY* against polluting the standards with features
which are purely there in order to work around long-standing bugs in
specific browsers, incompetent server administrators, and other
political problems. Changing the standards doesn't solve these problems;
it merely pushes them under the carpet.

--
Ian Hickson


Ian Hickson

Dec 20, 2001, 12:01:16 PM
to Henri Sivonen
Henri Sivonen wrote:
>

> Is XHTML taking off a value in itself? If one is authoring pages with
> XHTML 1.0 only (without math), plain HTML 4 would do just as well. Why
> should people adopt "new technology" when there are no tangible new
> benefits?
>
> (When you start adding the new benefits [MathML, SVG etc. extensions],
> the result is no longer compatible with old tag slurpers.)


Hear hear!

--
Ian Hickson

Chris Hoess

Dec 20, 2001, 2:34:57 PM
to
In article <3C221570...@hixie.ch>, Ian Hickson wrote:
>
> Unfortunately a bug in IE6 means that it doesn't correctly support XHTML
> sent as text/xml. This is therefore the only problem. The solution to
> this problem is to fix IE. If Microsoft won't fix their browser then
> content providers should use browser sniffing.

As David points out in the other thread, there's a workaround for this;
create a 1-line XSLT identity transform, reference that as an
XML-stylesheet in the XHTML-MathML document, and IE will apparently turn
it into HTML and work just fine.
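For reference, that workaround presumably amounts to something like the
following (untested sketch; the filename is invented):

```xml
<!-- id.xsl: a one-template identity transform (filename invented) -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:copy-of select="/"/>
  </xsl:template>
</xsl:stylesheet>

<!-- ...referenced from the prolog of the XHTML+MathML document: -->
<!-- <?xml-stylesheet type="text/xsl" href="id.xsl"?> -->
```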



> I'm sorry if this post sounds arrogant -- this is a very old argument
> and I'm getting tired of it.
>
> As a temporary measure I would not be opposed to letting a blatantly
> Mozilla-specific extension be implemented that would force us into XML
> parsing mode even if technically we should use the tag soup parser.
>
> Content-Type: text/html;-moz-override=text/xml
>
> Just imagine it as a new MIME type.
>
> I am *STRONGLY* against polluting the standards with features
> which are purely there in order to work around long-standing bugs in
> specific browsers, incompetent server administrators, and other
> political problems. Changing the standards doesn't solve these problems;
> it merely pushes them under the carpet.

Actually, in light of David's workaround (which I was completely unaware
of up until now), I no longer think we should implement this. If the
workaround allows content providers to serve both IE and Mozilla (oh,
Amaya does MathML, yes?) XHTML+MathML with an XML MIME-type, then that is
what should be done. It will deliver content correctly identified by
MIME-type to those browsers that accept XML, and it will not deliver it to
tag-soup user agents that will dissolve it. I consider it very important
not to encourage the idea that it is permissible to dump any sort of
markup into tag-soup (text/html) user agents as long as there's a little
HTML to produce formatting. After all, XML essentially dumbed down SGML
to a level commensurate with the sort of broken algorithms used by
tag-soup parsers; I'd find it highly unlikely that any given piece of XML
would cause a tag-soup parser to fail spectacularly. That doesn't mean
that the breakdown of XML documents in such parsers is acceptable,
however.

--
Chris Hoess

us...@domain.invalid

Dec 20, 2001, 3:26:08 PM
to

Roger B. Sidje wrote:

[ A bunch of interesting stuff...]


With all due respect (and a lot of it, to be sure), I'm not
sure the "Right" question is being asked. It seems to me
that we will all be better served in the long term if we
base the approach on some `objective definitions'.


Regarding XHTML+MathML+SVG (+ XNewBuzzML)

Is it XML? (hence MIME text/xml or application/xml) Obviously yes.
If a UA recognizes */xml and attempts to display it, it
must either display a generic tree, or "sniff" (at least
with the fore-knowledge that it _is_ xml) the DOCTYPE,
and probably the stylesheet (unless the DOCTYPE says XHTML,
or another known doctype).

The _real_ question (imho): Is it _ALSO_ HTML
--- or close enough? (and hence text/html)

I don't think that anyone maintains that any "Random UA" should
_abort_ displaying an HTML page if it encounters a DOCTYPE that it
doesn't recognize. DOCTYPEs in HTML have been too optional for
too long; it's too late for that.

Otherwise, RandomUA, fed what is claimed to be text/html
(presumably free to ignore any trailing "profile"), can rightfully
expect to understand it (modulo tag soup, random proprietary
tags (from the _other_ proprietor), etc). In that context,
"profile" is just an efficiency hack to warn some UAs about
which DOCTYPE they will eventually find, if they care.

One could perhaps argue that pure XHTML is close enough to
HTML (ignoring empty element problems, etc) for this to work.
One might even claim that carefully crafted[*] XHTML
+(presentation)MathML would, at least be somewhat readable
and hint at the content. Beyond that, with more general
(content) MathML, or SVG, or the next great XNewBuzzML,
you're more and more likely to end up with garbage.
At that point, you would have preferred the "Save As" option.

So, I end up concluding that we eventually will have to serve
as something/xml, and the really real question is "Do we want
to do it now, or later?"

Thanks for considering these thoughts.
bruce miller (bruce.miller @ nist.gov)

[*] "carefully crafted" to degrade gracefully, possibly at the
expense of being good MathML (?)

Chris Hoess

Dec 20, 2001, 3:55:17 PM
to
In article <3C22496...@domain.invalid>, us...@domain.invalid wrote:

[snip a nice discussion on finding the dividing line between */xml and
text/html]

>
> So, I end up concluding that we eventually will have to serve
> as something/xml, and the really real question is "Do we want
> to do it now, or later?"

You have cut to the heart of the issue. I would argue that the loss of
formatting inherent in a text/html processor is sufficient loss to
discourage serving it MathML. Furthermore, since most XML applications
are likely to encapsulate human-readable text in a similar manner (SVG is
an exception here), I would argue that the proposition that it should be
permissible to feed MathML to text/html processors is equivalent to the
proposition that it should be permissible, in general, to feed XML to
text/html processors, which I consider unacceptable. Opinions, of course,
seem to vary on this.

> Thanks for considering these thoughts.
> bruce miller (bruce.miller @ nist.gov)

--
Chris Hoess

William F. Hammond

Dec 20, 2001, 7:36:59 PM
to mozilla...@mozilla.org
Ian Hickson <i...@hixie.ch> writes:

> > The "correct" types (application/xhtml+xml or maybe text/xml) block
> > display in old user agents.

> Every time we have this discussion I have trouble understanding this
> argument.

For example, one can use the XHTML Ruby module with MathML, embedding
a math element in a ruby base and using the ruby text to amplify it in
various ways, i.e., with inline images, anchors to images, or TeX
source.

The ruby text portion should be entirely visible in a legacy text/html
agent (checked with NS 4.5 and lynx). (Amaya 5.2, 29 Oct 2001, does
MathML but not ruby.) I cannot check up on the ruby status of Mozilla
right now.

In a user agent that handles ruby, the ruby text portion (everything
except the MathML) should disappear because "ruby parentheses" are
used.
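A hedged sketch of the pattern being described (the element content is
invented here; it assumes the XHTML 1.1 Ruby module together with
presentation MathML 2.0):

```xml
<ruby>
  <rb><math xmlns="http://www.w3.org/1998/Math/MathML">
        <msup><mi>x</mi><mn>2</mn></msup>
      </math></rb>
  <rp>(</rp><rt>x^2</rt><rp>)</rp>
</ruby>
```

A legacy text/html agent ignores the unknown tags and renders the
fallback text; a ruby-aware agent renders the MathML base and suppresses
the "ruby parentheses" in <rp>.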

Not that anyone here actually needs to look, but I've done up a demo
that validates against the Carlisle/Altheim DTD for XHTML 1.1 plus
MathML 2.0.

It is anchored off the GELLMU Veteran's Page where one can choose
between text/html and text/xml (and also "basic" GELLMU source):
http://www.albany.edu/~hammond/gellmu/veterans.html .

Users with old agents get the content and also realize that their
agents are dated. Some content providers may not want to follow this
route, but those who do should be free to do so.

-- Bill

P.S. Apparently I've not previously mentioned this here, but I did
mention it in www-...@w3.org in August.


Bruno....@laposte.net

Dec 21, 2001, 3:41:42 AM
to
CERT® Advisory CA-2001-36 Microsoft Internet Explorer Does Not Respect
Content-Disposition and Content-Type MIME Headers

"Web pages and HTML email messages usually contain HTML text, but other
files may also be included. The MIME headers Content-Disposition and
Content-Type provide the information needed by the HTML rendering
software to determine the type of these files. In Microsoft Internet
Explorer, these MIME headers are consulted when evaluating whether to
process an embedded file, but they are ignored when the file is actually
processed."

http://www.cert.org/advisories/CA-2001-36.html


OK. This is not directly related to HTML/XHTML, but maybe IE will
consider mime-type more closely next time ...

David Carlisle

Dec 21, 2001, 5:07:14 AM
to mozilla...@mozilla.org

> I don't think that anyone maintains that any "Random UA" should
> _abort_ displaying an HTML page if it encounters a DOCTYPE that it
> doesn't recognize? DOCTYPEs in HTML have been too optional for
> too long; it's too late for that.

I agree that special-casing on a fixed set of doctypes is just wrong.
However, there are constructs that an HTML agent can know for certain
are not HTML, in particular the XML declaration and the xml-stylesheet
PI. (In theory these could be SGML PIs that just happen to end with
"?", but I doubt there is a single real example of that, so I claim
these are reliable indicators of XML.)

I do think it reasonable that an HTML agent, if it detects that it has
been given XML, should bail out and hand its input to an XML agent.
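The sniffing Carlisle describes can be sketched in a few lines. This is an illustrative sketch, not code from any real browser; `looks_like_xml` is a hypothetical name, and the check is deliberately narrow, matching only the two constructs he names (the XML declaration and the xml-stylesheet PI):

```python
# Hypothetical sketch: sniff the start of a document served as text/html
# for constructs that (in practice) only occur in XML.
def looks_like_xml(body: str) -> bool:
    """True if the document opens with an XML declaration or an
    xml-stylesheet processing instruction."""
    head = body.lstrip()[:200]  # lenient: tolerate leading whitespace
    return head.startswith(("<?xml ", "<?xml-stylesheet"))

print(looks_like_xml('<?xml version="1.0"?><html/>'))  # True
print(looks_like_xml("<!DOCTYPE html><html></html>"))  # False
```

An agent that gets `True` here could hand the input to its XML path instead of the tag-soup parser.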

Bruce Miller

unread,
Dec 21, 2001, 8:57:54 AM12/21/01
to
[sorry about the us...@domain.invalid, before...]

David Carlisle wrote:

>>I don't think that anyone maintains that any "Random UA" should
>>_abort_ displaying an HTML page if it encounters a DOCTYPE that it
>>doesn't recognize? DOCTYPEs in HTML have been too optional for
>>too long; it's too late for that.
>>
>
> I agree that special-casing on a fixed set of doctypes is just wrong.
> However, there are constructs that an HTML agent can know for certain
> are not HTML, in particular the XML declaration and the xml-stylesheet
> PI. (In theory these could be SGML PIs that just happen to end with
> "?", but I doubt there is a single real example of that, so I claim
> these are reliable indicators of XML.)


For new agents, possibly; but even <?...> constructs _could_ show up
in broken situations (e.g., unprocessed PHP), so it's also not
unreasonable (in the tag-soup frame of mind) to ignore them; and
existing agents will, anyway.


> I do think it reasonable that an HTML agent, if it detects that it has
> been given XML, should bail out and hand its input to an XML agent.


It's not unreasonable to _try_ to parse as XML. But if the XML parse
fails, should it perhaps fall back _again_ to tag-soup processing? I
guess this depends on how far the HTML processor should go to maintain
the current "any text is acceptable as HTML" approach (which has its
populist benefits).

Another reason to draw the line between html & xml :>
I wouldn't suggest that something _claimed_ to be xml (text/xml) should be
re-re-parsed.
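Miller's proposed policy can be sketched as follows. This is a toy illustration under stated assumptions, not any browser's actual pipeline: `handle_document` is a hypothetical dispatcher, `xml.etree` stands in for a real XML engine, and the "tag-soup" branch just returns the raw text where a real agent would run its lenient parser:

```python
# Sketch: try a strict XML parse first; fall back to tag soup only when
# the content was *labelled* text/html.  Content explicitly claimed to
# be text/xml is never re-parsed leniently (Miller's point above).
import xml.etree.ElementTree as ET

def handle_document(body: str, content_type: str):
    if content_type == "text/xml":
        return ("xml", ET.fromstring(body))  # well-formedness errors are fatal
    try:
        return ("xml", ET.fromstring(body))  # well-formed XHTML: use the XML path
    except ET.ParseError:
        return ("tag-soup", body)            # legacy lenient-HTML path

kind, _ = handle_document("<p>unclosed", "text/html")
print(kind)  # tag-soup
```

The design question in the thread is exactly whether that `except` branch should exist at all.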

William F. Hammond

unread,
Dec 21, 2001, 9:47:34 AM12/21/01
to mozilla...@mozilla.org
Last night I wrote:

> In a user agent that handles ruby, the ruby text portion (everything
> except the MathML) should disappear because "ruby parentheses" are
> used.

Incorrect: the contents of ruby parentheses are what should disappear
in a user agent supporting the ruby module.

> Not that anyone here actually needs to look, but I've done up a demo
> that validates against the Carlisle/Altheim DTD for XHTML 1.1 plus
> MathML 2.0.
>
> It is anchored off the GELLMU Veteran's Page where one can choose
> between text/html and text/xml (and also "basic" GELLMU source):
> http://www.albany.edu/~hammond/gellmu/veterans.html .

I notice that Mozilla 0.9.6, MathML+SVG, public build 2001112310,
appears not to recognize ruby. But one can hide ruby parentheses
(which are allowed to contain only pcdata) using CSS.
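The CSS workaround Bill mentions is small. A minimal sketch, assuming the standard XHTML ruby module's `rp` element for ruby parentheses:

```css
/* Hide the fallback parentheses in agents that understand the markup
   but do not suppress <rp> content by default (e.g., the Mozilla build
   mentioned above).  Legacy agents that ignore unknown elements and
   this rule will still show the parenthesized ruby text. */
rp { display: none; }
```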

The ruby spec says that the manner of display for ruby is not part
of the specification. Regardless, the example is not optimally
ruby-tuned yet.

Questions:

1. Is it correct that one should be able to use CSS to override a
ruby-supporting user agent's default rendering?

2. Is it sensible for an author to pass over ruby display-related
attributes in favor of CSS?

-- Bill

