
Examples


Alberto Manuel Brandao Simoes

Nov 28, 2001, 10:50:43 AM
to mozilla...@mozilla.org

Hello!

I've compiled Mozilla CVS head and it renders the MathML examples
from the Mozilla MathML page very well. On the other hand, if you visit

http://www.w3.org/Math/testsuite/Content/BasicContentElements/lambda/rec-lambda1.xhtml

one of the W3C MathML test suite pages, it doesn't render a thing. Thanks for any
ideas/help.

Alberto
--
f u cn rd ths, u cn gt a gd jb n cmptr prgrmmng.

Henri Sivonen

Nov 29, 2001, 2:04:04 AM
In article <2001112815...@alfarrabio.di.uminho.pt>,
al...@alfarrabio.di.uminho.pt wrote:

> http://www.w3.org/Math/testsuite/Content/BasicContentElements/lambda/rec-lambda1.xhtml
>
> one of the W3C MathML test suite pages, it doesn't render a thing. Thanks for
> any ideas/help.

They are serving it as text/html. That's bad. They should be serving it
with an XML content type.

--
Henri Sivonen
hen...@clinet.fi
http://www.clinet.fi/~henris/

Robert Miner

Nov 29, 2001, 11:03:11 AM
to hen...@clinet.fi, mozilla...@mozilla.org

Hi.

> They are serving it as text/html. That's bad. They should be serving it
> with an XML content type.

Get used to it, since this is going to happen a million times over all
over the Web, and will be an enormous headache for all of us for years
to come. This is why I have strenuously argued in the past that while
using the MIME type exclusively to detect XML may be the theoretically
most appealing position, for the average end user (and tech
support departments everywhere) it is a terrible mistake.

--Robert

------------------------------------------------------------------
Robert Miner Rob...@dessci.com
MathML 2.0 Specification Co-editor 651-223-2883
Design Science, Inc. "How Science Communicates" www.dessci.com
------------------------------------------------------------------

Henri Sivonen

Nov 29, 2001, 2:29:17 PM
In article <2001112916...@wisdom.geomtech.com>,
Rob...@dessci.com (Robert Miner) wrote:

> > They are serving it as text/html. That's bad. They should be serving it
> > with an XML content type.
>
> Get used to it, since this is going to happen a million times over all
> over the Web, and will be an enormous headache for all of us for years
> to come. This is why I have strenuously argued in the past that while
> using the MIME type exclusively to detect XML may be the theoretically
> most appealing position, for the average end user (and tech support
> departments everywhere) it is a terrible mistake.

But the W3C isn't an average user. And even for others, putting
AddType application/xhtml+xml;charset=utf-8 xhtml
in a .htaccess isn't that difficult. After all, one wouldn't serve
images in a PNG test suite as image/gif.
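
A minimal .htaccess sketch along those lines (assuming the suite uses the
.xhtml and .xml extensions):

-----8<-----
AddType application/xhtml+xml;charset=utf-8 xhtml
AddType text/xml;charset=utf-8 xml
-----8<-----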

The MathML test suite isn't following what the W3C is saying elsewhere.
The HTML WG has said that browsers should parse text/html as HTML and
shouldn't attempt to guess whether a particular document sent as
text/html is actually parseable as XML.[1] Also, second guessing the
content type is considered harmful in the CUAP Note.[2]

[1] http://lists.w3.org/Archives/Public/www-html/2000Sep/0024.html
[2] http://www.w3.org/TR/2001/NOTE-cuap-20010206#cp-no-override-ct

William F. Hammond

Nov 29, 2001, 4:38:35 PM
to mozilla...@mozilla.org
Henri Sivonen <hen...@clinet.fi> writes:

> AddType application/xhtml+xml;charset=utf-8 xhtml

Isn't that type a proposal in an internet draft of uncertain status?

> The MathML test suite isn't following what the W3C is saying elsewhere.
> The HTML WG has said that browsers should parse text/html as HTML and
> shouldn't attempt to guess whether a particular document sent as
> text/html is actually parseable as XML.[1] Also, second guessing the
> content type is considered harmful in the CUAP Note.[2]

The CUAP note [2] (which itself is fine) is not relevant in this context
so long as the meaning of the content-type "text/html" is at issue.

The September 2000 Pemberton comment [1] was rebutted by W3C's Dan
Connolly (overseer of the work of the HTML WG), who questions the use
of the term "sniffing", i.e., guessing from http body content, saying [3]:

    I don't know what you mean by "sniffing"; it is a simple
    computation to distinguish conforming XHTML documents
    from conforming HTML 4.01, HTML 4.0, HTML 3.2, and HTML 2.0
    documents: the latter begin with doctype declarations with
    well-known FPIs and URIs, e.g.:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
        "http://www.w3.org/TR/html4/strict.dtd">
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">

Much more recently Pemberton [4] appears to have lined up with Connolly:

    The definition of what text/html means is defined not by XHTML 1.0,
    but by RFC 2854. The reason that Appendix C is informative is because
    if you follow those guidelines, you may find that your documents will
    work with user agents that accept text/html. It does not define
    conformance, nor are the guidelines mandatory: you don't have to
    follow them, but if you don't, don't expect your document to work on
    old user agents.

Dan Connolly is one of the two authors of RFC 2854.

[3] http://lists.w3.org/Archives/Public/www-html/2000Sep/0025.html
[4] http://lists.w3.org/Archives/Public/www-html/2001Oct/0067.html

-- Bill


Roger B. Sidje

Nov 29, 2001, 6:52:36 PM
to Henri Sivonen
Henri Sivonen wrote:
>
> And even for others, putting
> AddType application/xhtml+xml;charset=utf-8 xhtml
> in a .htaccess isn't that difficult.

That brings back the problem that it will prompt "Save As..." in other
browsers. The way things currently stand doesn't allow degrading gracefully
in other browsers (e.g., a la <noframes>...download Mozilla...</noframes>).

Along the lines of overriding from the source (content) side, what if
Mozilla could support overriding via the <meta> element:

<head>
  <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
  <title>Hello XHTML World</title>
</head>
<body>
...
</body>

Of course, not surprisingly, I just tested it and it didn't work.

In a sense, the fact that it didn't work is good news as it means
that it won't break in other browsers.

But the bad news is that it suffers from the general problem that prevents
content sniffing in general: the parser cannot change from one dialect to
another half-way through parsing. To get to the <meta>, the parser
would have already started in HTML.

I could confirm the problem even with content="text/plain; charset=utf-8".
The document wouldn't render as text/plain in Mozilla (nor in IE, BTW).

The reason why I am alluding to the <meta> element is that Mozilla *already* has
charset sniffing code, so it is already parsing the <meta http-equiv="...">,
and does a full-blown reload if the charset is different.

I had a quick look to see what it might take to leverage that to sniff the mimetype.
It looked pretty complex, and besides, I am not even sure whether people would prefer
sniffing the <meta http-equiv="..."> instead of the <!DOCTYPE>?
---
RBS

William F. Hammond

Nov 29, 2001, 8:46:38 PM
to mozilla...@mozilla.org
"Roger B. Sidje" <r...@maths.uq.edu.au> writes:

> The reason why I am alluding to the <meta> element is that Mozilla
> *already* has a charset sniffing code, so that it is already parsing

Aha! But doesn't that incur the "performance hit" of last spring's
discussion?

> I had a quick look to see what it might take to leverage that to sniff
> the mimetype. It looked pretty complex and besides, I am not even
> sure if people would prefer sniffing the <meta http-equiv="."> instead
> of the <!DOCTYPE> ?

As I understand things, the various w3c specs for html preclude
miscellaneous junk such as comments and pi's (that would not be
precluded by sgml) in front of <!DOCTYPE ... for HTML 2.0, 3.2, 4.0,
and 4.01. For each of these it is
<!DOCTYPE HTML PUBLIC {FPI from a very limited list}
where the second field is case-insensitive and the FPI spec begins
"DTD HTML"

In the XHTML series the <!DOCTYPE may be preceded only by an xml
declaration and one has <!DOCTYPE html PUBLIC {FPI} {URI}>
where the second field is case sensitive and so far the FPI spec
begins "DTD XHTML".

All of this is up front and should be much faster to find than
any meta.
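
Concretely, the check could be a small prefix test on the head of the
data -- a sketch (Perl only to fix ideas; a real check would pin down
the exact FPI list, and this one assumes nothing precedes the xml
declaration):

-----8<-----
#!/usr/bin/perl -w
# Sketch: classify the first bytes of a text/html body as XHTML by its
# prologue -- an optional xml declaration, then a DOCTYPE whose FPI
# begins "-//W3C//DTD XHTML". Anything else is handled as classic HTML.
sub looks_like_xhtml {
    my ($head) = @_;    # first few hundred bytes of the document
    return $head =~ m{\A
                      (?: <\?xml [^>]* \?> \s* )?    # optional xml declaration
                      <!DOCTYPE \s+ html \s+ PUBLIC \s+
                      "-//W3C//DTD\s+XHTML}x;
}
-----8<-----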

As I've said before, a w3c spec that was a little bit more explicit
about _very_ _quickly_ being able to distinguish hard ball from soft
ball would be a big help. Make hard ball tough to play. All the
rest may be handled as tag soup.

For example, can we depend on "DTD XHTML" always being there?

The distinction needs to be so brutally and immediately clear that
nobody will be tempted to keep using the term "sniffing".

-- Bill

Henri Sivonen

Nov 30, 2001, 3:45:31 AM
In article <i7667tb...@pluto.math.albany.edu>,
ham...@csc.albany.edu (William F. Hammond) wrote:

> > AddType application/xhtml+xml;charset=utf-8 xhtml
>
> Isn't that type a proposal in an internet draft of uncertain status?

Yes, but the point was that adding a content type to the server
configuration is not hard. In fact, sufficiently recent versions of
Apache associate the extension .xml with the type text/xml by default. So
the point about being unable to serve docs using an XML content type is
moot.

> The CUAP note [2] (which itself is fine) is not relevant in this context
> so long as the meaning of the content-type "text/html" is at issue.
>
> The September 2000 Pemberton comment [1] was rebutted by W3C's Dan
> Connolly (overseer of the work of the HTML WG) who questions the use
> of the term "sniffing", i.e., guessing from http body content, here
> saying [3]:

What he appears to mean is exactly what is known as "doctype sniffing"
among Mozilla developers.

> Much more recently Pemberton [4] appears to have lined up with Connolly:

However, the messages you are referring to are about XHTML-only XHTML.
Still, there is no Recommendation that says that it is OK to serve an
arbitrary XML document, such as one containing MathML or SVG markup, as
text/html. Since documents

--

Henri Sivonen

Nov 30, 2001, 4:14:34 AM
In article <3C06CA44...@maths.uq.edu.au>, "Roger B. Sidje"
<r...@maths.uq.edu.au> wrote:

> Henri Sivonen wrote:
> >
> > And even for others, putting
> > AddType application/xhtml+xml;charset=utf-8 xhtml
> > in a .htaccess isn't that difficult.
>
> That brings back the problem that it will prompt "Save As..." in other
> browsers. The way things currently stand doesn't allow degrading gracefully
> in other browsers (e.g., a la <noframes>...download Mozilla...</noframes>).

There *is* a method that allows graceful degradation: Accept
header-based content negotiation. It even works without scripting, so it
can be used in environments where the admin hasn't permitted the use of
Perl or PHP or when the author doesn't know how to write in Perl or PHP.

Test case:
http://www.hut.fi/~hsivonen/test/multitype/test

The .var file looks like this:
-----8<-----
URI: test

URI: test.xhtml
Content-type: application/xhtml+xml;q=0.95

URI: test.xml
Content-type: text/xml;q=0.95

URI: test.html
Content-type: text/html;q=1
-----8<-----

I'm not the server admin and I didn't need to contact an administrator
in order to set up the test case.
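
One caveat in case it doesn't work out of the box elsewhere: the server
has to hand .var files to mod_negotiation's type-map handler, i.e. the
usual

-----8<-----
AddHandler type-map var
-----8<-----

line somewhere in the configuration. The server I'm on evidently has
this enabled already.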

> Along the lines of overriding from the source (content) side, what if
> Mozilla could support overriding via the <meta> element:

The question is: Why would anyone want to serve XHTML 1.1 plus MathML
2.0 content as text/html?

I know about the following three cases, none of which require changes to
Mozilla.

Case 1:
"Configuring the server is too difficult."

No it isn't. Adding a type association is a one-liner in a .htaccess
file. Sufficiently recent server versions even associate the extension
.xml with the type text/xml by default. (And everyone should be using a
sufficiently recent version, because earlier versions tend to have known
security holes.)

Case 2:
"It shows 'Save As...' in old browsers."

Is "Save As..." really a big problem? Anyway, it can be worked around by
using content negotiation as described above.

Case 3:
"I want to serve text/html to IE, because under no circumstances can I
let IE users know that their browser can't handle XHTML as text/xml."

This case, too, can be dealt with. The same content negotiation approach
works here, too.

Of course, none of these should be a problem for the W3C MathML test
suite.

Roger B. Sidje

Nov 30, 2001, 5:22:05 AM
to William F. Hammond, mozilla...@mozilla.org
"William F. Hammond" wrote:
>
> "Roger B. Sidje" <r...@maths.uq.edu.au> writes:
>
> > The reason why I am alluding to the <meta> element is that Mozilla
> > *already* has a charset sniffing code, so that it is already parsing
>
> Aha! But doesn't that incur the "performance hit" of last spring's
> discussion?

The way it is implemented doesn't, and that's one of the main reasons why its
infrastructure isn't that simple (cf. bug 81253, which brushes on the issues
a little bit).

> As I understand things, the various w3c specs for html preclude
> miscellaneous junk such as comments and pi's (that would not be
> precluded by sgml) in front of <!DOCTYPE ... for HTML 2.0, 3.2, 4.0,
> and 4.01. For each of these it is
> <!DOCTYPE HTML PUBLIC {FPI from a very limited list}
> where the second field is case-insensitive and the FPI spec begins
> "DTD HTML"
>
> In the XHTML series the <!DOCTYPE may be preceded only by an xml
> declaration and one has <!DOCTYPE html PUBLIC {FPI} {URI}>
> where the second field is case sensitive and so far the FPI spec
> begins "DTD XHTML".
>
> All of this is up front and should be much faster to find than
> any meta.

Either way looks convoluted, and unless I am much mistaken in my
reading of the code, sniffing the <!DOCTYPE> might involve as
much gymnastics as that used in the charset.

Here is what I could gather:
1) Once the document starts to be fetched, the networking code
(necko) determines the content-type using a limited (but fast) parse
of the response header. Not all documents are text/[x]html. In fact
at startup, many are chrome stuff coming from the chrome registry
as RDF datasources (text/rdf).

2) Once the content-type is known (right in the response header), a
variety of objects are created in succession (the function from where
this fans out is OnStartRequest()) in anticipation of the data that
will flow in; examples of objects created on the way include the
content sink, the document viewer, etc., and then, when real data
arrives, the network just passes it to whoever registers that it
wants to be notified when data is available (the function from where
this fans out is OnDataAvailable()).

3) To sniff the <!DOCTYPE>:
a) Wait until there is data, and sniff the <!DOCTYPE> before kicking off the
process I described above. This _is_ a "performance hit" because it affects
everything. The pipeline has to be stopped until the <!DOCTYPE> is known.
Since there is no way to know in advance if the <!DOCTYPE> will be there,
the freeze would have been done for nothing if the <!DOCTYPE> isn't there.
A pretty hard sell...

b) Do what the charset sniffing is doing, i.e., assume HTML and let things go,
then sniff the <!DOCTYPE> if it is there, and if it says something else, restart
afresh, because a different content-type is likely going to need a different
chain of objects. If there is no <!DOCTYPE>, there is no penalty. A major
difference is that the charset case can re-use the same objects. HTML is HTML
whether in utf-8 or iso-8859-1... So I suspect there could be a further level
of difficulty in switching the content-type on the fly this way.

4) On balance, since sniffing the <!DOCTYPE> and the <meta> seem to need the
same gymnastics, I wonder what it might take to leverage the existing charset
code to also sniff the earlier string that is skipped in the <meta> when
looking up the charset?!


<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>

For the anti-extension and anti-DOCTYPE-sniffing camps (bugs 67646 and 109837), the
answer might already be clear. Forget about all this.

> As I've said before, a w3c spec that was a little bit more explicit
> about _very_ _quickly_ being able to distinguish hard ball from soft
> ball would be a big help. Make hard ball tough to play. All the
> rest may be handled as tag soup.

Unfortunately, that's the grim picture of the current reality.
---
RBS

Roger B. Sidje

Nov 30, 2001, 5:40:48 AM
to mozilla...@mozilla.org
Henri Sivonen wrote:
>
> Test case:
> http://www.hut.fi/~hsivonen/test/multitype/test
>
> The .var file looks like this:
> -----8<-----
> URI: test
>
> URI: test.xhtml
> Content-type: application/xhtml+xml;q=0.95
>
> URI: test.xml
> Content-type: text/xml;q=0.95
>
> URI: test.html
> Content-type: text/html;q=1
> -----8<-----

Interesting -- apart from the need to maintain what amounts to
multiple websites with the same files (picture these files as
sitting on different servers and served to different browsers).

Any simple trick for serving the same file? (Hixie had a script
in bug 67646 - http://software.hixie.ch/utilities/cgi/xhtml-for-ie/.)
---
RBS

Henri Sivonen

Nov 30, 2001, 10:33:48 AM
In article <3C076230...@maths.uq.edu.au>, r...@maths.uq.edu.au
(Roger B. Sidje) wrote:

> Henri Sivonen wrote:
> >
> > Test case:
> > http://www.hut.fi/~hsivonen/test/multitype/test
> >
> > The .var file looks like this:

> Interesting -- apart from the need to maintain what amounts to
> multiple websites with the same files

The alternative files can be symlinks to the main files if the author
just wants to (for whatever reason) dump the XML content as text/html to
IE.

> Any simple trick for serving the same file?

Here's a script that sets up .html symlinks and .var typemaps for all
.xml files in the directory where the script is run. I think using this
script would make maintenance very easy. It would just be a matter of
running the script whenever a .xml file is added.

----8<-----
#!/usr/bin/perl -w

# For every .xml file in the current directory, create a .html symlink
# and a .var typemap so that the server can negotiate the content type.
@files = `ls -1`;
foreach (@files) {
    chomp;
    if (/^(.*)\.xml$/) {
        $stem = $1;
        symlink("$stem.xml", "$stem.html");
        open(OUT, "> $stem.var")
            or die("Can't open $stem.var for output, $!\n");
        print OUT "URI: $stem\n\n";
        print OUT "URI: $stem.xml\n";
        print OUT "Content-type: text/xml;q=0.99\n\n";
        print OUT "URI: $stem.html\n";
        print OUT "Content-type: text/html;q=1\n";
        close(OUT);
        chmod(0644, "$stem.var");
    }
}
----8<-----

A MultiViews-based solution would be more elegant, but, AFAIK, can't be
configured at the .htaccess level.
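
(Where one does control httpd.conf, it would presumably be just

-----8<-----
Options +MultiViews
-----8<-----

for the directory, with the variants found by their common stem, and no
.var files at all.)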

Robert Miner

Nov 30, 2001, 11:08:21 AM
to hen...@clinet.fi, mozilla...@mozilla.org

Hi.

> The question is: Why would anyone want to serve XHTML 1.1 plus MathML
> 2.0 content as text/html?
>
> I know about the following three cases, none of which require changes to
> Mozilla. [...snip...]

My personal view is that this is backwards. The real question to me is:

Why would anyone want to have to think about this, learn a bunch of
arcane stuff that keeps changing, and do something special to put
their documents online, however simple?

It's like the famous story about Edison charging Ford $5000 for
showing him how to use a chalk mark and a strobe to set the timing of
an engine. The fee consisted of $1 to make the mark, and $4999 for
knowing to make the mark.

Of course, this doesn't apply to the contributors to this thread who
can already quote IETF documents from memory and can tell you what all
the latest Apache config tricks are. But it applies with a vengeance
to the other 99% of web users.

As I've said before, I don't particularly dispute the arguments
folks put forward as to the theoretical supremacy of the MIME-only
scheme. I basically buy them, up to a point. However, that is totally
orthogonal to the fact that it makes for a bad user/author experience,
and that just doesn't get any mind share in the debate. The only
reason I made the original posting is that just once I would like to
see that acknowledged.

It's fine for you to say, "but it's just one simple line in your
Apache .htaccess file!" But for our average customer -- say a
highschool teacher whose knowledge of the Web stops with what buttons
are on the FrontPage toolbar -- you might as well be speaking, well,
Finnish ;-) Apache what? MIME who? ht huh? ...

Of course I can fix the W3C MathML test suite. But the fact that I
have to look into what to do and then take some special action means
it has to go on the "to do" list, and so it won't happen for a while.
In the meantime, maybe I should just suggest people use "Save As" to
make a local copy of MathML pages and rename the documents with an XML
extension. Really, it's no hassle -- it just takes a second or two
and requires almost no special knowledge, etc, etc ... :-)

William F. Hammond

Nov 30, 2001, 12:17:38 PM
to mozilla...@mozilla.org
Under the hypothesis that the DOCTYPE declaration can be preceded
only by an optional xml declaration (and _no_ comments), and that
the FPI spec of _all_ xhtml served as text/html must begin with
"-//W3C//DTD XHTML" (xhtml is W3C's baby, after all),
couldn't a small piece of DOCTYPE-checking code be run in a
flash _before_ any parser is launched?

Or is the point that Mozilla loads the parser before any data is seen?
How serious a delay can there be between the http header for a
text/html object and the top of its data stream? If there is a
follow-up action to see data in a keep-alive connection, would it be
harmful to ask for more before the http header is digested?

N.B. This is not just about math. There are other namespace
extensions.

-- Bill


Roger B. Sidje

Nov 30, 2001, 3:07:54 PM
to William F. Hammond, mozilla...@mozilla.org
"William F. Hammond" wrote:
>
> Or is the point that Mozilla loads the parser before any data is seen?

Yes, that's the point. The point is not really about the _amount_ of
data that may or may not precede the <!DOCTYPE>. Rather, the whole
architecture involved here is event-based. And so there is a bunch
of C++ objects that are instantiated when OnStartRequest() is fired.
These objects (content sink, scanner, parser, html/dom document,
content viewer, etc.) are inter-related and work in unison in a
way that complicates content sniffing at the stage when real data
flows in. The complication stems from the fact that many of the objects
are not generic. They are specialized depending on the content-type (e.g.,
the content sink is an HTML content sink as opposed to an XML content sink),
and they need to be destroyed if the content-type is switched on the fly.

But there is an exception which already does content sniffing: the
file:// protocol... It internally uses the nsUnknownDecoder object
(LXR search: nsUnknownDecoder::OnStartRequest/OnDataAvailable) to see
how the data is buffered. Then, when a sufficient amount of data
is known, it calls DetermineContentType to figure out what to do.
(However, this decoder doesn't come into play in the http case
even when the content-type is bogus; instead the "Save As..." is
prompted from the nsExternalAppHandler -- LXR search again.)

> How serious a delay can there be between the http header for a
> text/html object and the top of its data stream? If there is a

i.e., handle all data like the file protocol is doing?!? I have no
quantitative idea of how much of a performance hit that would be.
---
RBS

Henri Sivonen

Dec 3, 2001, 2:51:01 AM
In article <2001113016...@wisdom.geomtech.com>,
Rob...@dessci.com (Robert Miner) wrote:

> > The question is: Why would anyone want to serve XHTML 1.1 plus MathML
> > 2.0 content as text/html?
> >
> > I know about the following three cases, none of which require changes to
> > Mozilla. [...snip...]
>
> My personal view is that this is backwards. The real question to me is:
>
> Why would anyone want to have to think about this, learn a bunch of
> arcane stuff that keeps changing, and do something special to put
> their documents online, however simple?

I see your point, but I disagree. XHTML 1.1 plus MathML 2.0 does not fit
well under the text/html label. The required parsing approach is
significantly different, and the document instances can contain MathML
markup old browsers have no knowledge about. I think it is backwards to
use a type label that was intended for one thing as a label for
something significantly different.

text/html in practice means HTML-like tag soup. It can't be
realistically parsed with a parser that enforces conformance
requirements. OTOH, text/xml can *and must* be parsed with a parser that
takes a Draconian approach to well-formedness errors. This is a Good
Thing. It saves everyone the trouble of implementing increasingly
convoluted and incompatible DWIM algorithms.

Attempting to support XML sent as text/html carries a great risk. There are
already pages out there that claim to be XHTML 1.0 Transitional
but aren't well-formed. If browser makers started to feel the need to
make XML parsers forgiving, XML would degenerate into tag soup, and all
the benefits of XML would be lost, as well as the objective of bringing "the
rigor of XML to Web pages". It would be a very *Bad Thing*.

If Mozilla supported XML as text/html, soon IE-centric Web authors would
be all over Bugzilla complaining that Mozilla gives XML errors even
though "it works in IE". It would be downhill from then on.

In addition to that, sniffing is bad for browser performance and
requires implementation resources that would be better spent elsewhere.
Besides, home-grown sniffing methods are bad for interoperability, so
the W3C would have to publish a Recommendation about doctype sniffing,
which they probably wouldn't want to do.

> It's fine for you to say, "but it's just one simple line in your
> Apache .htaccess file!" But for our average customer -- say a
> highschool teacher whose knowledge of the Web stops with what buttons
> are on the FrontPage toolbar -- you might as well be speaking, well,
> Finnish ;-) Apache what? MIME who? ht huh? ...

It seems to me that you are arguing that people should be able to do
something new (publishing in MathML) without learning anything new. I
don't think it works that way. In order to produce MathML content, one
has to know how to do it. I think producing the MathML markup is a
bigger challenge than naming the file in such a way that it gets sent
out as text/xml. Servers come preconfigured with .xml associated with
text/xml.

> Of course I can fix the W3C MathML test suite.

Thanks. That would be appreciated.

--
Henri Sivonen
hen...@clinet.fi
http://www.hut.fi/u/hsivonen/

William F. Hammond

Dec 4, 2001, 11:30:15 AM
to mozilla...@mozilla.org
Roger Sidje writes:

> > Or is the point that Mozilla loads the parser before any data is seen?
>
> Yes, that's the point. The point is not really about the _amount_ of
> data that may or may not precede the <DOCTYPE>. Rather, the whole

Could Mozilla think about resurrecting the "version" parameter for the
HTTP content-type header as an optional parameter admitting the token
value "xhtml", i.e.,

Content-Type: text/html; version=xhtml

No sniffing, no performance hit. This parameter was discarded in
RFC 2854, but I believe that its old purpose was never exercised.

Still I think that Mozilla should give serious thought to re-starting
itself if a text/html document begins with an xml declaration or a
document type declaration with fpi matching "DTD XHTML".

-- Bill


Roger B. Sidje

Dec 4, 2001, 3:51:06 PM
to William F. Hammond, mozilla...@mozilla.org
"William F. Hammond" wrote:
>
> Could Mozilla think about resurrecting the "version" parameter for the
> HTTP content-type header as an optional parameter admitting the token
> value "xhtml", i.e.,
>
> Content-Type: text/html; version=xhtml
>
> No sniffing, no performance hit. This parameter was discarded in
> RFC 2854, but I believe that its old purpose was never exercised.

This looks interesting, but on the other hand you are saying that
the parameter is not spec-compliant anymore :-( What are the cons
of supporting this parameter now that it has been
removed?!

In my previous hunt, I had figured out where the content-type is
parsed in necko and it would be relatively straightforward for me now
to substitute this "text/html; version=xhtml" with "application/xhtml+xml"
so that the rest of the rendering chain will work transparently.

The compromise might also be appealing to the "anti-sniffing lobby" because
the decision is coming from the content-type, and authors who take the
trouble of sending this content-type will need to *know* what they are doing,
and so there is little risk of having ill-formed XHTML sent this way by
authors who are unaware of the consequences.

> Still I think that Mozilla should give serious thought to re-starting
> itself if a text/html document begins with an xml declaration or a
> document type declaration with fpi matching "DTD XHTML".

I too think that this option is worthwhile. HTML is going to be around
for quite a while, and how can authors be encouraged to produce XHTML
if there is no simple bridge in browsers for transitioning between HTML
and XHTML? Let's keep our feet on the ground here.
---
RBS

Roger B. Sidje

Dec 4, 2001, 5:13:39 PM
to Henri Sivonen
Henri Sivonen wrote:
>
> Attempting to support XML sent as text/html carries a great risk. There are
> already pages out there that claim to be XHTML 1.0 Transitional
> but aren't well-formed. If browser makers started to feel the need to
> make XML parsers forgiving, XML would degenerate into tag soup, and all
> the benefits of XML would be lost, as well as the objective of bringing "the
> rigor of XML to Web pages". It would be a very *Bad Thing*.

The intention is not to parse XML as tag soup. On the contrary, the fact
that Mozilla will parse XHTML as XML will _protect_ XHTML from degenerating.

Authors retain the final decision as to whether their document is to be
treated as HTML or XHTML. I see little reason to place any blame on Mozilla for
anything if authors deliberately take the trouble of using a header like this:

<?xml [...]?>
<!DOCTYPE [...] "-//W3C//DTD XHTML [...]" [...]>

There are _several_ DOCTYPEs to pick from, and if they pick that one, then it
could be seen as a tacit acceptance that their document can be treated as XML.
If the browser chokes, authors will be led to admit that after all their document
wasn't XHTML. They will take the step to fix the ill-formedness, and... they remain
free to pick another, less boastful header. The beneficiary here is XHTML, since
tag-soup documents that claim to be XHTML won't proliferate.
---
RBS

William F. Hammond

Dec 4, 2001, 10:10:40 PM
to mozilla...@mozilla.org
"Roger B. Sidje" <r...@maths.uq.edu.au> writes:

> > Content-Type: text/html; version=xhtml
> >
> > No sniffing, no performance hit. This parameter was discarded in
> > RFC 2854, but I believe that its old purpose was never exercised.
>
> This looks interesting, but on the other hand you are saying that
> the parameter is not spec-compliant anymore :-( What are the cons
> of supporting this parameter now that it has been
> removed?!

Actually it never was in an adopted spec. "version" was a
proposed optional parameter in the 1995 draft of the HTML 3.0 spec,
the one with math tags in HTML. Remember that? :-)
AFAIK it never had more than draft status at W3C.

Perhaps it would be cleaner to give the optional parameter a new name
rather than use an old name. One could revise the idea to use the
name "profile" instead of "version", i.e.,

Content-Type: text/html; profile=xhtml


-- Bill

Henri Sivonen

Dec 10, 2001, 8:27:54 AM
In article <3C0D4A93...@maths.uq.edu.au>, "Roger B. Sidje"
<r...@maths.uq.edu.au> wrote:

> The intention is not to parse XML as tag soup. On the contrary, the fact
> that Mozilla will parse XHTML as XML will _protect_ XHTML from
> degenerating.

A tag souper might make a sorta-XHTML doc and see that it "works" in old
browsers that parse as tag soup. If the document was in fact ill-formed
and Mozilla rejected it, the tag souper would see Mozilla as the
problem. Any attempt to "fix" the problem on the browser side would put
non-soupness of XML in danger.

> I see little reason to place any blame on Mozilla for anything if
> authors deliberately take the trouble of using a header like this:
>
> <?xml [...]?>
> <!DOCTYPE [...] "-//W3C//DTD XHTML [...]" [...]>

There are already docs out there that pretend to be XHTML but aren't.
For example, what if Mozilla attempted to use the XML parser on
http://www.oreilly.com/ and a Mozilla distributor made it a marketing
requirement that the page must be displayed?

> There are _several_ DOCTYPEs to pick from, and if they pick that one,
> then it could be seen as a tacit acceptance that their document
> can be treated as XML.
> If the browser chokes, authors will be led to admit that after all their
> document wasn't XHTML. They will take the step to fix the
> ill-formedness, and... they remain free to pick another less boastful
> header. The beneficiary here is XHTML since tag-soup documents that
> claim to be XHTML won't proliferate.

You are assuming that the authors would realize and admit that they have
made a mistake even though IE displayed the page. Sadly, I don't think
the "but it works in IE" crowd would look at it that way. :-(

The problem with "but it works in IE" doesn't arise with the XML
doctypes.

Henri Sivonen

Dec 10, 2001, 8:31:27 AM
In article <i73d2ro...@pluto.math.albany.edu>,
ham...@csc.albany.edu (William F. Hammond) wrote:

> Could Mozilla think about resurrecting the "version" parameter for the
> HTTP content-type header as an optional parameter admitting the token
> value "xhtml", i.e.,
>
> Content-Type: text/html; version=xhtml

But why? Once you go there, you could just as well use a real XML
content type.

Roger B. Sidje

Dec 11, 2001, 2:41:03 AM
to Henri Sivonen, mozilla...@mozilla.org
Henri Sivonen wrote:
>
> In article <i73d2ro...@pluto.math.albany.edu>,
> ham...@csc.albany.edu (William F. Hammond) wrote:
>
> > Could Mozilla think about resurrecting the "version" parameter for the
> > HTTP content-type header as an optional parameter admitting the token
> > value "xhtml", i.e.,
> >
> > Content-Type: text/html; version=xhtml
>
> But why? Once you go there, you could just as well use a real XML
> content type.

As already said, the XML content-type prompts the "Save As...".
---
RBS

Roger B. Sidje

Dec 11, 2001, 3:06:10 AM
to Henri Sivonen, mozilla...@mozilla.org
Henri Sivonen wrote:
>
> In article <3C0D4A93...@maths.uq.edu.au>, "Roger B. Sidje"
> <r...@maths.uq.edu.au> wrote:
>
> > The intention is not to parse XML as tag soup. On the contrary, the fact
> > that Mozilla will parse XHTML as XML will _protect_ XHTML from
> > degenerating.
>
> A tag souper might make a sorta-XHTML doc and see that it "works" in old
> browsers that parse as tag soup. If the document was in fact ill-formed
> and Mozilla rejected it, the tag souper would see Mozilla as the
> problem. Any attempt to "fix" the problem on the browser side would put
> non-soupness of XML in danger.

Looks like speculation to me.

Let's look at the other side of the coin too. Mozilla is designed for the
future and is renowned for handling new formats better than old browsers.
An error due to ill-formedness cannot be confused with something else
(there is a big yellow XML parsing error displayed to the "tag souper").
A document that is not well-formed doesn't qualify as XHTML. If the new
format (XHTML) doesn't work in an old browser, the "tag souper" is likely
to conclude that the old browser is simply still treating it as HTML.

It seems that what could shed some definitive light on this kind of
thing is to have a working sniffer, to get real user feedback. The
longer there is no protection from degeneration, the more people
will produce ill-formed pages in the "name" of XHTML.
---
RBS

William F. Hammond

Dec 11, 2001, 10:28:43 AM
to mozilla...@mozilla.org, www-...@w3.org, www-...@w3.org
There is a serious roadblock for the use of MathML in web pages,
arising from the differing expectations of user agents in regard to
the HTTP content-type for a web page with such content.

This situation actually reflects inadequate flexibility for content
providers under the plans represented by the Baker draft for IETF
registration of the content type "application/xhtml+xml".

I would like to propose that added flexibility for content providers
be arranged by using a new HTTP content-type parameter called
"profile".

I am suggesting that it should have two initial values, one for the
MathML issue and the other for the more general need of content
providers wishing to have a smooth migration path from classical HTML
to fully functioning XHTML.

Older agents should not change behavior in the face of an unknown HTTP
content-type parameter for text/html. All of the older agents I've
checked are OK in this regard.

For the MathML situation a content provider may use:

Content-Type: text/html; profile=math

which means:

    The object should be handled as HTML extended by MathML using
    the best means, if any, the user agent provides, while other
    user agents should be sane handling the content classically
    (though some meaning of content in name space extensions may
    be lost).

Without regard to the particular situation in MathML a content
provider who wishes to migrate from HTML to fully functioning XHTML
without providing dual resources and without "throwing a switch" may
use:

Content-Type: text/html; profile=xhtml

which means:

    The object should be handled as fully functional XHTML by those
    user agents prepared to deal with XHTML, while older agents
    should be sane handling it classically.

The "xhtml" value might be imagined to fail in usefulness for the
MathML issue at the point where a browser handling MathML only in
classic HTML mode might begin to recognize the new profile parameter.

Note also that there is an HTTP content-type parameter for the
proposed type "application/xhtml+xml" called "profile" with a somewhat
different, more elaborate, purpose.

Appendix C of XHTML 1.0 is relevant for all XHTML, not just 1.0, if a
content provider wishes new content to be maximally useful in older
user agents. There is specified user agent behavior for unknown
element names. Appendix C is, however, informative. Decisions
about following its guidelines belong to content providers, but
content providers are encouraged to follow those guidelines.

-- Bill


Henri Sivonen

Dec 11, 2001, 1:39:42 PM
In article <3C15B88F...@maths.uq.edu.au>, r...@maths.uq.edu.au
(Roger B. Sidje) wrote:

> As already said, the XML content-type prompts the "Save As...".

Why is "Save As..." bad? At least it tells the user honestly that the
browser can't handle the document on its own. OTOH, if MathML content is
dumped to an old browser as text/html, the user sees a confusing
partially failed attempt to display the page.

Henri Sivonen

Dec 11, 2001, 1:44:16 PM
In article <3C15BE72...@maths.uq.edu.au>, r...@maths.uq.edu.au
(Roger B. Sidje) wrote:

> Henri Sivonen wrote:

> > A tag souper might make a sorta-XHTML doc and see that it "works" in old
> > browsers that parse as tag soup. If the document was in fact ill-formed
> > and Mozilla rejected it, the tag souper would see Mozilla as the
> > problem. Any attempt to "fix" the problem on the browser side would put
> > non-soupness of XML in danger.
>
> Looks like speculations to me.

It is speculation, but considering the numerous "but it works in IE" bug
reports in Bugzilla, I think it is realistic speculation.

> An error due to ill-formedness cannot be confused with something else
> (there is a big yellow XML parsing error displayed to the "tag souper").

What if the tag souper doesn't test with Mozilla? Then the tag souper
wouldn't see the error message but Mozilla users would complain.

Roger B. Sidje

Dec 12, 2001, 11:10:31 AM
to Henri Sivonen, mozilla...@mozilla.org
Henri Sivonen wrote:
>
> In article <3C15B88F...@maths.uq.edu.au>, r...@maths.uq.edu.au
> (Roger B. Sidje) wrote:
>
> > As already said, the XML content-type prompts the "Save As...".
>
> Why is "Save As..." bad? At least it tells the user honestly that the
> browser can't handle the document on its own. OTOH, if MathML content is
> dumped to an old browser as text/html, the user sees a confusing
> partially failed attempt to display the page.

"Save As..." doesn't offer the possibility to programmatically
degrade gracefully. You get the "Save As..." and that is.
(Not so long ago, the google tab wasn't working and remember
how its "Save As..." was an annoyance.)

With the content window, you can at least say,
<a-la-noframes>
  look, here is how it is <img src="screenshot.png" alt="screenshot" /> in a
  <a href="download/mozilla">MathML-capable browser</a>.
</a-la-noframes>

You can do more, of course. But this should give you an idea.
Anyone can gather from this example that, from their old browser,
they are enticed to download the brand new Mozilla (or one of its
derivatives), and they can do this straight from there. There are
other things that can be gathered from that example for anyone
who cares to think about them.
---
RBS

Roger B. Sidje

Dec 12, 2001, 11:20:03 AM
to Henri Sivonen, mozilla...@mozilla.org
Henri Sivonen wrote:
>
> What if the tag souper doesn't test with Mozilla? Then the tag souper
> wouldn't see the error message but Mozilla users would complain.

Yes, Mozilla users will complain to the tag-souper, and this is a good
thing. The tag-souper will fix it, and perhaps the tag-souper will also
download Mozilla (or one of its derivatives) to verify the mis-attribution
(calling tag soup XHTML) for themselves.
---
RBS

Bruno....@laposte.net

Dec 13, 2001, 11:33:08 AM
From a web developer's point of view, I absolutely agree with Henri Sivonen.

It seems quite obvious that .xml, and .xhtml too, should be served as
text/xml! So obvious that it is the default configuration of Apache.

If not, we might as well throw the MIME type in the trash.

The DOCTYPE is useful for http://validator.w3.org/, but we can't expect
many people to use it.

Still, I would like to know how to set the MIME type in PHP or JSP.

Thanks

David Carlisle

Dec 13, 2001, 11:46:42 AM
to Bruno....@laposte.net, mozilla...@mozilla.org

> It seems quite obvious that .xml, and .xhtml too, should be served as
> text/xml

Neither of those is in fact obvious. Many people have claimed that text/xml
should never (or rarely) be used, and that XML files are not really
readable text, so they should be served as application/xml. (I don't really
agree with this, but it is an often-stated view.) And given that XHTML
(1.0 at least) goes to such lengths to "work" in legacy html systems, it
makes sense to serve at least some of those files as text/html. It may
not be a brilliant idea to give those files an extension of .xhtml,
but it is not "obviously" bad.

> The DOCTYPE is useful for http://validator.w3.org/, but we can't expect
> many people to use it.

Why not? Its use is mandatory according to the XHTML specs, and if any
named entities are used, its use is mandatory according to XML.

David


Christian Biesinger

Dec 13, 2001, 4:06:47 PM
Bruno....@LaPoste.net wrote:

> Still, I would like to know how to set the MIME type in PHP or JSP.

I don't know about JSP, but in PHP it should work like this:

header("Content-Type: text/xml");

(Note that header() has to be called before any other output is sent.)

Bruno....@laposte.net

Dec 14, 2001, 4:44:35 AM
You are saying that XHTML is barely readable, but should still be served
as text/html because nobody agrees about the new way to do things?

In a transition period, we must accept that some things won't work
anymore. But it still seems obvious to me that XHTML is very different
from HTML (like gif and png.) If you want your xhtml to be treated as
xml, you have to serve it as xml. Otherwise, it will be html (so without
MathML!)

If not, what is the MIME type for?

If the rules are clear, then the servers (Apache, Tomcat, IIS ...) can
put the right MIME type by default, and the tools used for authoring xhtml
will create files with the right extension (.xhtml, .html), just as an image
tool creates .gif or .png files. So, the average end user won't have to worry
about all those discussions.

In the meantime, we may have to add a line in the .htaccess apache
configuration file and learn how to put a MIME type in php or jsp.

What I *really* don't understand here is: why is the W3C leaving this to
us?? Why are *we* discussing all this??? Isn't it a bit late????

What if we put an option in the browser to let the user choose between
using the MIME type, or trying to sniff xml or html?

Thanks

Roger B. Sidje

Dec 14, 2001, 7:01:33 AM
to Bruno....@laposte.net, mozilla...@mozilla.org
Bruno....@LaPoste.net wrote:
>
> In a transition period, we must accept that some things won't work
> anymore. But it still seems obvious to me that XHTML is very different
> from HTML (like gif and png.)

This kind of comparison is not relevant. XHTML (1.0) is after all a reformulation
of HTML. In fact, for the sake of argument, imagine if the ocean of tag soup on
the web today were miraculously cleaned and all documents converted to well-formed
XHTML: the web wouldn't break down, it would still work with conventional browsers.
On the other hand, gif and png are completely different formats. If all images
on the web were converted to png, countless pages would break down on conventional
browsers that don't support png.

> If you want your xhtml to be treated as
> xml, you have to serve it as xml. Otherwise, it will be html (so without
> MathML!)

This is an easy theory to put forward when one is not really faced with
the task of making such pages available. Re-read the thread to see the
pros and cons of this option.



> If not, what is the MIME type for?

What is the DOCTYPE for? Why can't XHTML be parsed as XML? Is XHTML
already meaningless? Since XHTML is an application of XML, a document
that claims to be XHTML should be well-formed XML, otherwise the
document doesn't deserve to be called XHTML, and there are several
alternative DOCTYPEs to choose from.

> If the rules are clear, then the servers (Apache, Tomcat, IIS ...) can
> put the right MIME type by default, and the tools used for authoring xhtml
> will create files with the right extension (.xhtml, .html), just as an image
> tool creates .gif or .png files. So, the average end user won't have to worry
> about all those discussions.

The pros and cons of this have been detailed already. Summary: it is not
a panacea.

> What I *really* don't understand here is: why is the W3C leaving this to
> us?? Why are *we* discussing all this??? Isn't it a bit late????

That's the spilled milk. OK, after crying over it, what next?

> What if we put an option in the browser to let the user choose between
> using the MIME type, or trying to sniff xml or html?

Many endorse a sniffing option (and its consequences). But some people
are still hesitant to see this happen, although it appeals to reason
that something like that needs to happen in order to gradually phase
out HTML for the benefit of clean and tidy XHTML.
---
RBS

Bruno....@laposte.net

Dec 14, 2001, 9:31:22 AM
Roger B. Sidje wrote:

> Bruno....@LaPoste.net wrote:
>
>>In a transition period, we must accept that some things won't work
>>anymore. But it still seems obvious to me that XHTML is very different
>>from HTML (like gif and png.)
>>
>
> This kind of comparison is not relevant. XHTML (1.0) is after all a reformulation
> of HTML. In fact, for the sake of argument, imagine if the ocean of tag soup on
> the web today were miraculously cleaned and all documents converted to well-formed
> XHTML: the web wouldn't break down, it would still work with conventional browsers.
> On the other hand, gif and png are completely different formats. If all images
> on the web were converted to png, countless pages would break down on conventional
> browsers that don't support png.
>

But if one makes a site with png images, one knows that old browsers
will break! You can choose to use gif if you want to be read by old
browsers. It is a fine and clear situation. Why don't you want the same
with html and xhtml?
By the way, the point is that there *is* a transition between html and
xhtml: your example states that this transition wouldn't exist? The
transition would be clean if html and xhtml had their respective MIME
types, just as gif and png do. (Imagine if we had to sniff the content
of all image files to find their format because they didn't have
different MIME types!)


>
>>If you want your xhtml to be treated as
>>xml, you have to serve it at xml. Otherwise, it will be html (so without
>>MathML !)
> This is an easy theory to put forward when one is not really faced with
> the task of making such pages available. Re-read the thread to see the
> pros and cons of this option.


I have already read the thread, but I have to spend some time testing the
Apache configuration and seeing how to set the MIME type with some scripting
language. I have a job; I can't spend all my time on the newsgroup.

>>If not, what is the mime type for ?
>>
> What is the DOCTYPE for? Why can't XHTML be parsed as XML? Is XHTML
> already meaningless? Since XHTML is an application of XML, a document
> that claims to be XHTML should be well-formed XML, otherwise the
> document doesn't deserve to be called XHTML, and there are several
> alternative DOCTYPEs to choose from.


DOCTYPEs are OK if you want to find out what kind of HTML you will find in
an HTML document (HTML 2, 3, 4) or what kind of XML you will find in an
XML document (XHTML 1, SVG, MathML, XHTML+MathML ...).
They are useful for validating a document too.
So, if this is wrong, again: what would the MIME type be for??


>>If the rules are clear, then the servers (Apache, tomcat, IIS ...) can
>>put the right mime type by default and the tools used to authoring xhtml
>>will create files with the right extention (.xhtml .html). (Like a image
>>tool create .gif or .png.) So, the average end user won't have to worry
>>about all thoses discussions.
>>
>
> The pros and cons of this have been detailed already. Summary: it is not
> a panacea.


You mean like democracy is not the best system?


>>What I *really* don't understand here is : why is the W3 leaving this to
>>us ?? Why are *we* discussing all this ??? Insn't it a bit late ????
>
> That's the spilled milk. OK, after crying over it, what next?


Next is up to you, anyway. You make the rules by publishing
recommendations and making examples available; you show us the right way.


>>What if we put an option in the browser to let the user choose between
>>using the mime type, or trying to sniff xml or html ?
>
> Many endorse a sniffing option (and its consequences). But some people
> are still hesitant to see this happens, although it appeals to reason
> that something like that needs to happen in order to gradually phase
> out HTML for the benefit of clean and tidy XHTML.


Many? Who?
Like for gif and png, some people will gradually see fewer and fewer
pages. They will gradually switch to new browsers, like they always do,
and at an increasing speed (for more futile reasons sometimes.)
If there is no clean frontier between the two, the phase-out will be
much harder and longer.

That's too big for a mail, but you like to have the last word, don't
you? :)

William F. Hammond

Dec 14, 2001, 11:18:37 AM
to mozilla...@mozilla.org
Bruno....@LaPoste.net writes:

> But if one makes a site with png images, one knows that old
> browsers will break! You can choose to use gif if you want to be read
> by old browsers. It is a fine and clear situation. Why don't you want
> the same with html and xhtml?

The relationship between classic HTML and the new XML form of HTML
is more complicated than that between those two graphic formats.

> ... The transition would be clean if html and xhtml had their
> respective MIME types, just as gif and png do. ...

This is what is meant by "throwing a switch". If one has an archive
of N web pages to upgrade, then either one throws a switch, converting
all pages overnight and losing readers with old browsers, or one puts
up 2N pages under what is called "dual service". Is this an
effective way to persuade content providers to use XHTML?

There is another way, which is explained in Appendix C of the W3C
specification for XHTML 1.0. It would be possible to have MIME-type
delineation of this case if the specification were amended to provide
an appropriate content-type parameter for "text/html" (one that would
be ignored as unknown by classic user agents).

There is also the point that both "text/xml" and "application/xml" are
vast classes very much larger than their intersections with
"text/html" and "application/xhtml+xml". But most widely distributed
user agents are not going to be ready for general objects of those
wider types.

> Next is up to you, anyway. You make the rules by publishing
> recommendations and making examples available; you show us the right way.

No, the rules are not made here. The recommendations are formulated
and published by W3C. Discussion about this belongs in the list
www-...@w3.org.

-- Bill


Roger B. Sidje

Dec 14, 2001, 11:38:30 AM
to Bruno....@laposte.net, mozilla...@mozilla.org
Bruno....@LaPoste.net wrote:
>
> But if one makes a site with png images : one knows that old browsers
> will break ! You can choose to use gif if you want to be read by old
> browsers. It is fine and clear situation. Why don't you want the same
> with html and xhtml ?

[At least with png, you won't get the "Save As...". Visitors can still read
your site, see what they miss, and download a newer browser to experience
your site fully.]

If one makes a site with MathML in XHTML served as text/html, IE6 can
render it using a plugin (OK, this is a lousy excuse). Mozilla needs
the same content served as XML. Other browsers wouldn't get the "Save As...".

> By the way, the point is that there *is* a transition between html and
> xhtml : your example states that this transition wouldn't exist ?

The transition is still there, but in a different way -- one in which people
are enticed to download a new browser, for example. <thought>Thinking
of this dilemma, perhaps XHTML shouldn't have been made as a reformulation
of HTML. It seems that the need to have this "compatibility" with HTML
is the reason for the ambivalence. If XHTML were a different syntax
altogether, this discussion would have been a non-issue (comparable
to png vs gif).
</thought>

> > The pros and cons of this have been detailed already. Summary: it is not
> > a panacea.
>
> You mean like democraty is not the best system ?

I mean that there is a resulting "Save As..." with its own effects
in this context.

> Many ? who ?

You are perhaps new to the newsgroup and don't know that this issue
has been recurrent. [If my memory serves me right, since the support
of <math> in tag-soup HTML was a no-no, this was seen as a helpful
option.]

> Like for gif and png, some people will gradually see less and less
> pages. They will gradually switch to new browsers, like they always do
> and at an increasing speed (for more futile reason sometimes.)
> If there are no clean frontiere between the two, it will be much harder
> and longer to phase out.
>
> That a too big for a mail, but you like to have the last word, doesn't
> you ? :)

I am subscribed to the mail gateway and I am deleting previous posts.
I am lobbying for at least experimenting with a kind of sniffing, and have
already had a sniffing patch turned down, and don't wish to see another
one (if any) turned down again just because any sniffing (in this context)
is unhelpfully deemed bad on principle by some people who have not been
practically bitten by the issue at hand.
---
RBS

Bruno....@laposte.net

Dec 14, 2001, 12:10:44 PM
> If one makes a site with MathML in XHTML served as text/html, IE6 can
> render it using a plugin (OK, this is a lousy excuse). Mozilla needs
> the same thing as XML. Other browsers wouldn't get the "Save As...".

Oh sorry, I didn't know that IE6 already features MathML this way. You
might have begun with this point, since I now think that Mozilla should do
exactly the same. Otherwise, it will just be rejected.
Let's sniff the DOCTYPE and do the same as IE as much as possible.
End of the discussion for me.

William F. Hammond
Dec 14, 2001, 1:33:15 PM
to mozilla...@mozilla.org
"Roger B. Sidje" <r...@maths.uq.edu.au> writes:

> I am lobbying for at least experimenting with a kind of sniffing and have
> already had a sniffing patch turned down, and don't wish to see another
> one (if any) turned down again just because any sniffing (in this context)
> is unhelpfully deemed bad on principle by some people who have not been
> practically bitten by the issue at hand.

1. Would you anticipate a similar turn down for a patch to use the
HTTP content-type parameter?

2. It would help to find a name for the restart issue other than
"sniffing".

(I agree about sniffing *if* it is indeed sniffing.)

All W3C specs for classic HTML beginning with version 2.0 (which was
actually RFC 1866, Nov. 1995, by T. Berners-Lee and D. Connolly) have
_required_ that the document instance begin with a DOCTYPE declaration
using an FPI from a very small list. A robust, fully spec-compliant,
text/html user agent that is XHTML capable simply _cannot_ roll past a
DOCTYPE declaration for XHTML.
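For reference, the declaration in question reads, for XHTML 1.0 Strict:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">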

"Sniffing" occurs when either there is no content-type (HTTP/0.9 or
FTP) or when a content-type boundary is crossed.

In the context of XHTML served as text/html (without a parameter) a
restart based on a DOCTYPE declaration for XHTML would not involve
"sniffing" unless content-type definitions clearly put the user agent
in the position of crossing a content-type boundary.

-- Bill


Roger B. Sidje
Dec 14, 2001, 2:11:44 PM
to William F. Hammond, mozilla...@mozilla.org
"William F. Hammond" wrote:
>
> 1. Would you anticipate a similar turn down for a patch to use the
> HTTP content-type parameter?

If the |profile| (or |version|) parameter were still a standard parameter,
I would think that it would appeal to more people. Nevertheless, there are
some complementary features that Mozilla uses when standards are lagging.
So the parameter might still be appealing (as a |-moz-profile| parameter)
since its support requires just a little patch that reviewers often find
easier to deal with. The DOCTYPE route is more involved.


> 2. It would help to find a name for the restart issue other than
> "sniffing".

"detection" / "discovery"? -- but they don't convey the notion of
possible restart.
---
RBS

Chris Hoess
Dec 16, 2001, 10:20:46 PM
In article <3C19EA1D...@maths.uq.edu.au>, Roger B. Sidje wrote:
> Bruno....@LaPoste.net wrote:
>>
>> In a transition period, we must accept that some things won't work
>> anymore. But it still seems obvious to me that XHTML is very different
>> from HTML (like gif and png.)
>
> This kind of comparison is not relevant. XHTML (1.0) is after all a reformulation
> of HTML. In fact, for the sake of argument, imagine if the ocean of tag-soup on
> the web today was miraculously cleaned and all documents converted to well-formed
> XHTML: the web wouldn't break down, it would still work with conventional browsers.

In fact, this is not correct, because "conventional browsers" have grown
themselves around tag-soup. "<br />", the "W3C endorsed" technique for
XHTML compatibility, will produce a line feed followed by ">" in a proper
implementation of HTML 4.01 (i.e., with an SGML parser). Another reason
to take W3C's XHTML->HTML backwards-compatibility mania less than
seriously (including XHTML as text/html).
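(To spell that out: with SHORTTAG YES, the "/" is a null end-tag, so a
conforming SGML parser reads

    <br />

as equivalent to

    <br>&gt;

that is, a br element followed by a literal ">" data character.)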

>
>> If not, what is the mime type for?
>
> What is the DOCTYPE for?

Including a piece of external markup in the document, which is referenced
by a public identifier.

> Why can't XHTML be parsed as XML? Is XHTML
> already meaningless?

Yes, partly. Because of the W3C's insistence on a backwards-compatibility
only extant in broken browsers, text/html XHTML has already been lost to
tag soup. The only parts of XHTML that can still be held to
XML well-formedness are those served as XML.

> Since XHTML is an application of XML, a document
> that claims to be XHTML should be well-formed XML, otherwise the
> document doesn't deserve to be called XHTML, and there are several
> alternative DOCTYPEs to choose from.

Yes indeed. Unfortunately, the meme that "DOCTYPE is just some funny line
you stick at the top of your document to tell the browser what it is" has
pretty well eradicated understanding of DTDs, the difference between SGML
and XML applications, etc. text/html is a lost cause; the MIME-type is
utterly given over to tag-soup. The best thing to do now is tiptoe
quietly away, let the tag-soupers play with their new DOCTYPE toy, and use
DOCTYPE correctly on XML documents, served as such.

>> If the rules are clear, then the servers (Apache, tomcat, IIS ...) can
>> put the right mime type by default and the tools used to author xhtml
>> will create files with the right extension (.xhtml, .html). (Like an image
>> tool creates .gif or .png.) So, the average end user won't have to worry
>> about all those discussions.
>
> The pros and cons of this have been detailed already. Summary: it is not
> a panacea.
>
>> What I *really* don't understand here is: why is the W3 leaving this to
>> us?? Why are *we* discussing all this??? Isn't it a bit late????
>
> That's the spilled milk. OK, after crying over it, what next?
>
>> What if we put an option in the browser to let the user choose between
>> using the mime type, or trying to sniff xml or html?
>
> Many endorse a sniffing option (and its consequences). But some people
> are still hesitant to see this happen, although it appeals to reason
> that something like that needs to happen in order to gradually phase
> out HTML for the benefit of clean and tidy XHTML.

What needs to happen in order to phase out tag-soup and phase in "clean
and tidy" XHTML is that people need to know what they're doing, rather
than blithely ignoring HTTP and using DOCTYPE as a magic invocation. The
rot has indeed spread deep; the assumption of tag-soupism pervades not only
page authoring, but browser development and server use as well. Attempting to
yet again fudge standards and break the web pages of the Teeming Millions
so as to impose order on the Web will simply result in hordes of
cargo-cultist coders breaking down the doors on Bugzilla and demanding
that we moronize the browser; BTDTGTTS. We'll have a clean, well-formed
XHTML-based web when most UAs are competent XML UAs, and not before.

--
Chris Hoess

William F. Hammond
Dec 17, 2001, 12:36:28 PM
to Chris Hoess, mozilla...@mozilla.org
Chris Hoess <cho...@force.stwing.upenn.edu> writes:

> . . . "<br />", the "W3C endorsed" technique for

> XHTML compatibility, will produce a line feed followed by ">" in a proper
> implementation of HTML 4.01 (i.e., with an SGML parser).

No. If that reasoning held, "<br />" would be rendered as you suggest
by the W3C's testbed browser "Amaya".

A user agent for text/html that is strict and fully specification
compliant must read the document type declaration, if any, since it
is required for the classic versions of HTML and cannot be ignored,
if present, in XML versions of HTML.

> > What is the DOCTYPE for?
>
> Including a piece of external markup in the document, which is referenced
> by a public identifier.

Which is mainly the document's prolog.



> Yes indeed. Unfortunately, the meme that "DOCTYPE is just some funny line
> you stick at the top of your document to tell the browser what it is" has

Rubbish. See RFC 1866 by T. Berners-Lee and D. Connolly, Nov. 1995.

-- Bill


Chris Hoess
Dec 17, 2001, 1:20:52 PM
In article <i7sna9a...@pluto.math.albany.edu>, William F. Hammond wrote:
> Chris Hoess <cho...@force.stwing.upenn.edu> writes:
>
>> . . . "<br />", the "W3C endorsed" technique for
>> XHTML compatibility, will produce a line feed followed by ">" in a proper
>> implementation of HTML 4.01 (i.e., with an SGML parser).
>
> No. If that reasoning held, "<br />" would be rendered as you suggest
> by the W3C's testbed browser "Amaya".

Try reading <URL:http://www.cs.tut.fi/~jkorpela/html/empty.html> for a
discussion of how the null end-tag went wrong. For that matter, compare
the NET delimiter in the SGML declaration of HTML 4.01 and the proposed
SGML declaration for XML at <URL:http://www.w3.org/TR/NOTE-sgml-xml>.
Really, I don't just make these things up.



> A user agent for text/html that is strict and fully specification
> compliant must read the document type declaration, if any, since it
> is required for the classic versions of HTML and cannot be ignored,
> if present, in XML versions of HTML.
>
>> > What is the DOCTYPE for?
>>
>> Including a piece of external markup in the document, which is referenced
>> by a public identifier.
>
> Which is mainly the document's prolog.
>
>> Yes indeed. Unfortunately, the meme that "DOCTYPE is just some funny line
>> you stick at the top of your document to tell the browser what it is" has
>
> Rubbish. See RFC 1866 by T. Berners-Lee and D. Connolly, Nov. 1995.
>

Perhaps I was unclear. The DOCTYPE is, essentially, a special kind of
entity reference, allowing one to avoid writing out the standard prolog in
every document. It performs this function in SGML; XML has a construct of
similar syntax that performs the same function. But HTML (as defined by
the SGML declarations of HTML 2.0 and HTML 4.01) is not XML, despite
valiant attempts from the W3C to conceal this fact. Even were one of the
SGML declarations from the Note above normatively adopted for XML,
documents might be parsed differently, and characters be interpreted as
data in one and markup in the other (see above on empty elements).
Approaching the question from a strictly standards-oriented perspective,
DOCTYPE is not (as you seem to regard it) some magic device that has
meaning in the framework of all markup languages; the fact that it is
referenced in the same manner in SGML and XML is coincidental. Handling
of the DOCTYPE is a function of the markup application, be it SGML or XML;
the DOCTYPE does not determine which of these markup languages it belongs
to, because its very interpretation is a function of that language.

There does, of course, exist a means of informing the user-agent of the
nature of the document: MIME-type. What you propose to do seems to be
essentially to bypass the MIME standard because of the sloth of server
administrators and kludge along serving everything as text/html, using
magic DOCTYPEs in place of MIME-type. There are several problems with
this.

1) Despite the coincidental fact that XHTML+MathML degrades relatively
gracefully in HTML systems, this cannot be said of XML extensions to HTML in
general, especially not for the proposed XHTML 2.0. Continued license in
the use of the text/html content type will result in continually worse
results in older browsers, rendering nugatory the whole idea that browsers
will not display that which they cannot handle well. Furthermore,
concessions at this point will make it harder and harder to convince
content providers of the eventual need to use an XML content-type.

2) As elucidated above, text/html will eventually become unsuitable for
XHTML 2.0 etc., at which point it will be critical for authors to use
XML content-types to prevent it from going to old tag-soup UAs. Whether
you like it or not, MIME-types *will* have to be revived. Isn't it better
that we start applying pressure now, while the situation is still
controllable?

3) The primary effect of this will be to say to Microsoft, "It's OK if
your so-called implementation doesn't conform to standards and promotes
ignorance and hacks; we'll work around it so you can call the tune!" MS
has already done enough damage with their "file
extensions-over-content-type" approach, which was probably undertaken for
reasons similar to this--"Hey, MIME-types are hard! Let's go shopping^W^W
recognize file extensions instead! Maybe it's non-standard, but it works
90% of the time, and our lusers^W customers will never notice the
difference!". Just because something non-standard happens to work well on
the systems currently deployed is no guarantee it won't break horribly in
the future.

--
Chris Hoess

William F. Hammond
Dec 17, 2001, 5:20:39 PM
to mozilla...@mozilla.org
Chris Hoess <cho...@force.stwing.upenn.edu> writes:

> Try reading <URL:http://www.cs.tut.fi/~jkorpela/html/empty.html> for a
> discussion of how the null end-tag went wrong. For that matter, compare
> the NET delimiter in the SGML declaration of HTML 4.01 and the proposed
> SGML declaration for XML at <URL:http://www.w3.org/TR/NOTE-sgml-xml>.

Agreed, but this was not the point. A DOCTYPE declaration with one of
the W3C HTML or XHTML formal public identifier strings implies the
SGML/XML status, i.e., the appropriate syntax as an SGML application.

Yes, writing "<br />" in classic HTML is wrong, and, yes, the correct
use of "<br />" in an XHTML document may cause unwanted character
bleed-through in some, not all, rendering user agents, but if a
correct document type declaration for XHTML is present, this is a
parsing error.

-- Bill


Chris Hoess
Dec 19, 2001, 12:18:29 PM
In article <i7u1up4...@pluto.math.albany.edu>, William F. Hammond wrote:
> Chris Hoess <cho...@force.stwing.upenn.edu> writes:
>
>> Try reading <URL:http://www.cs.tut.fi/~jkorpela/html/empty.html> for a
>> discussion of how the null end-tag went wrong. For that matter, compare
>> the NET delimiter in the SGML declaration of HTML 4.01 and the proposed
>> SGML declaration for XML at <URL:http://www.w3.org/TR/NOTE-sgml-xml>.
>
> Agreed, but this was not the point. A DOCTYPE declaration with one of
> the W3C HTML or XHTML formal public identifier strings implies the
> SGML/XML status, i.e., the appropriate syntax as an SGML application.

This is incorrect; public text is public text. It is possible to write an
SGML application under which documents referencing the XHTML public text
could be valid.



> Yes, writing "<br />" in classic HTML is wrong, and, yes, the correct
> use of "<br />" in an XHTML document may cause unwanted character
> bleed-through in some, not all, rendering user agents, but if a
> correct document type declaration for XHTML is present, this is a
> parsing error.

On what grounds? HTML is an SGML application, as defined in the HTML 4.01
recommendation. There's nothing in RFC 2854 to suggest that they should
automagically become XML: after all, Appendix C XHTML is "compatible".
(Except for the W3C's slipup detailed above.) After all, the whole point
of this misbegotten mess of text/html XHTML was that legacy user agents
would accept XHTML *and parse it*. Such agents were pre-XML; expecting
them to parse XHTML as if it were XML is silly.

--
Chris Hoess

William F. Hammond
Dec 19, 2001, 3:56:07 PM
to Chris Hoess, mozilla...@mozilla.org
Chris Hoess <cho...@force.stwing.upenn.edu> writes:

> . . . It is possible to write an
> SGML application under which documents referencing the XHTML public text
> could be valid.

And this is done, for example, in the W3C spec for XHTML 1.1. That is
the role of the *.dcl in the DTD subdirectory of the expanded tar kit.

Moreover, there was an SGML declaration for each of the W3C forms of
classic HTML.

Each of the W3C FPI's determines a unique SGML application.

However, validation of a document under the associated SGML application
is not entirely sufficient for compliance with the corresponding W3C
spec.

For example, classic HTML *must* begin with a DOCTYPE declaration without
preceding comments or PI's. That's a provision of the W3C spec that
is not enforceable in the SGML application.

It is not correct to say that classic HTML *is* an SGML application.
It is only correct to say that there is an associated SGML application.

-- Bill

Chris Hoess
Dec 19, 2001, 7:18:19 PM
In article <i7y9jzr...@pluto.math.albany.edu>, William F. Hammond wrote:
> Chris Hoess <cho...@force.stwing.upenn.edu> writes:
>
>> . . . It is possible to write an
>> SGML application under which documents referencing the XHTML public text
>> could be valid.
>
> And this is done, for example, in the W3C spec for XHTML 1.1. That is
> the role of the *.dcl in the DTD subdirectory of the expanded tar kit.
>
> Moreover, there was an SGML declaration for each of the W3C forms of
> classic HTML.
>
> Each of the W3C FPI's determines a unique SGML application.
>
> However, validation of a document under the associated SGML application
> is not entirely sufficient for compliance with the corresponding W3C
> spec.
>
> For example, classic HTML *must* begin with a DOCTYPE declaration without
> preceding comments or PI's. That's a provision of the W3C spec that
> is not enforceable in the SGML application.

An "application convention"; see more below.

> It is not correct to say that classic HTML *is* an SGML application.
> It is only correct to say that there is an associated SGML application.
>

I think we begin to touch upon an area where the paint is peeling from the
W3C's brave facade of "HTML as an SGML application". SGML applications
are indeed allowed to apply conventions, but this is restricted by ISO
8879:1986 Clause 15.2.1: "A conforming SGML application's conventions can
affect only areas that are left open to specification by applications."
In the general scope of SGML documents, the SGML declaration can affect
how the DOCTYPE declaration can be parsed; i.e., "DOCTYPE" can be replaced
with another keyword, the delimiters can be changed from those of the
reference concrete syntax, etc. Logically, one cannot infer an SGML
declaration from the parsing of a doctype declaration, because the parsing
of the doctype declaration depends on the SGML declaration! Furthermore,
neither the HTML 3.2 nor HTML 4.01 specifications acknowledge the
existence of previous SGML declarations. Practically, of course, this
occurred because no one was bothering to parse SGML as HTML. The HTML 3.0
discussions vaguely suggest that a "version" parameter of the
Content-Type: header was intended to indicate the SGML declaration of the
document, but by the time it was standardized as 3.2, the W3C had its
hands full legitimating the latest k3wl wowser tricks. Regardless, the
fact remains that one cannot infer the application from a piece of markup
whose interpretation varies *depending on the application*.

--
Chris Hoess

Henri Sivonen
Dec 20, 2001, 6:49:12 AM
In article <3C1A2B06...@maths.uq.edu.au>, r...@maths.uq.edu.au
(Roger B. Sidje) wrote:

> If one makes a site with MathML in XHTML served as text/html, IE6 can
> render it using a plugin (OK, this is a lousy excuse). Mozilla needs
> the same thing as XML.

Ah, so this is about IE after all.

If Mozilla "supported" IE's brokenness, we'd be stuck with ugly soup
plus XML-like islands stuff ad infinitum.

However:
What's really the market share of Windows IE + a plug-in capable of
rendering MathML as a data island? I guess the market share is negligible.
If Netscape could be lobbied to ship MathML functionality, Mozilla's
MathML would be in a leading market position compared to any IE based
hack. I don't think it makes sense to break things in order to
accommodate IE's bugs when IE isn't in the leading position.

David Carlisle
Dec 20, 2001, 7:38:15 AM
to hen...@clinet.fi, mozilla...@mozilla.org

> If Mozilla "supported" IE's brokenness, we'd be stuck with ugly soup
> plus XML-like islands stuff ad infinitum.

I don't think anyone is suggesting that Mozilla (or anything:-)
support the strange IE concept of XML data islands, which result in
files not parsable as either XML or SGML (using any standard SGML
declaration), as the html bits use HTML syntax and the XML bits use
XML syntax, so you need to swap parsers (or SGML declarations)
mid-document.

I think that the best policy is to serve the files as text/xml (or
application/xml). This works in mozilla (and amaya) and can be made to
work in IE just by having a stylesheet PI at the top of the file pointing to
a one-template XSLT stylesheet that maps XHTML elements to HTML
(actually, since IE doesn't care, you can just use the identity
transformation).
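A minimal sketch of that (the file name "identity.xsl" and the PI below
are illustrative; "text/xsl" is the type IE expects, standard or not).
In the XHTML file:

    <?xml-stylesheet type="text/xsl" href="identity.xsl"?>

and in identity.xsl:

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- one template: copy the whole input document through unchanged -->
      <xsl:template match="/">
        <xsl:copy-of select="."/>
      </xsl:template>
    </xsl:stylesheet>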

This stylesheet can also under some circumstances add whatever is needed
to render mathml or svg etc. (Including, in mozilla's case, doing a
content mathml to presentation mathml transformation). Making this robust
at the moment is mainly a matter of waiting for XSLT in mozilla to
stabilise; it's getting a lot better each release, so I'm reasonably
optimistic about this approach, although currently running XSLT in
mozilla tends to eat memory rather aggressively:-)


However I still wish that Mozilla, as essentially a user-error
correction, would in some circumstances detect that what it has been
given is in fact an XML file and re-start itself in XML mode rather than
HTML. As I see this as error correction, I'm not too concerned about
any efficiency or reloading hits required to do that.
In particular I don't see any reason why, if in HTML mode, mozilla sees
an xml declaration or an xml-stylesheet PI that it shouldn't bail out
and re-start, parsing the file as XML. Surely it's better to do that
than to carry on and just hope for the best that what's following is
something approximating HTML?
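For instance, the first two lines of a typical XHTML+MathML file already
announce it unmistakably (the stylesheet name is just an example):

    <?xml version="1.0" encoding="utf-8"?>
    <?xml-stylesheet type="text/xsl" href="pmathml.xsl"?>

Classic HTML is required to begin with a DOCTYPE declaration rather than
with an XML declaration or PIs, so meeting either of these first is a
strong hint that the file is XML.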

There are many situations where people have access to simple web sites
via some dialup ISP and have essentially no control over any mime type
serving except via a fixed set of extension->mime type mappings.
They can (probably) always use .xml extension but it would be nice if
mozilla would be a bit more flexible here.

As well as xhtml documents this affects other things as well, in
particular xslt stylesheets. Given that the stylesheet PI specifies the
stylesheet is xslt, one may hope the browser would accept whatever was
returned from the URI and parse it as XML (IE does this) but mozilla
insists that the stylesheet is served as text/xml. The ISP I use from
home for example doesn't serve .xsl files as XML, so I have to use .xml
on XSL stylesheets. This works of course, but looks a bit odd and
confuses beginners.

Chris Hoess
Dec 20, 2001, 9:37:21 AM
In article <2001122012...@penguin.nag.co.uk>, David Carlisle wrote:
>
>
> I think that the best policy is to serve the files as text/xml (or
> application/xml). This works in mozilla (and amaya) and can be made to
> work in IE just by having a stylesheet PI at the top of the file pointing to
> a one-template XSLT stylesheet that maps XHTML elements to HTML
> (actually, since IE doesn't care, you can just use the identity
> transformation).

WOW! Does this trick really work? I'd considered getting a rather crude
representation of HTML working in IE by styling those docs with html.css
(but no links, title, etc.), but I had no idea you could do this. If this
really works, I'd start leaning pretty hard for solution 1 again; the
arguments for delivering MathML to tag-soup browsers seem rather specious
to me, and I'd rather see something like this that ensures it only goes to
XML browsers, as is proper.



> This stylesheet can also under some circumstances add whatever is needed
> to render mathml or svg etc. (Including, in mozilla's case, doing a
> content mathml to presentation mathml transformation). Making this robust
> at the moment is mainly a matter of waiting for XSLT in mozilla to
> stabilise, it's getting a lot better each release so I'm reasonably
> optimistic about this approach, although currently running XSLT in
> mozilla tends to eat memory rather agressively:-)
>

Wait, I'm confused. I thought Mozilla was already able to handle MathML
without an additional XSLT stylesheet, or does this only apply to
presentational MathML? (And how bloaty is doing that identity transform?
Is that usable now?)



> However I still wish that Mozilla, as essentially a user-error
> correction, would in some circumstances detect that what it has been
> given is in fact an XML file and re-start itself in XML mode rather than
> HTML. As I see this as error correction, I'm not too concerned about
> any efficiency or reloading hits required to do that.
> In particular I don't see any reason why, if in HTML mode, mozilla sees
> an xml declaration or an xml-stylesheet PI that it shouldn't bail out
> and re-start, parsing the file as XML. Surely it's better to do that
> than to carry on and just hope for the best that what's following is
> something approximating HTML?

Well, this has already been pretty well hashed out in earlier discussions,
and the W3C seems to be dropping hints that it's a no-no.

> There are many situations where people have access to simple web sites
> via some dialup ISP and have essentially no control over any mime type
> serving except via a fixed set of extension->mime type mappings.
> They can (probably) always use .xml extension but it would be nice if
> mozilla would be a bit more flexible here.

I realize that fiddling with MIME-types is a distinct pain in the rear
right now for many users, but I think that rather than giving workarounds,
we should start putting pressure on people to fix the problem. As
I've said earlier, the text/html horse, already foundering, isn't going to
carry us through XHTML 2.0, and the earlier people start urging ISPs to
get a clue, the less painful it will be when people start deploying it. I
realize that this sounds rather uncaring, but if we really want to see
something better than tag-soup, it can't be done by browser makers alone.
Content providers have to take a hand too, whether by writing and
deploying good markup or by encouraging their ISPs to be more clueful
in supplying them with the tools they need to do their job.

(And the text/xml situation isn't as bad as setting up
content-negotiation; I'd say it's probably comparable to getting people
to map .css to text/css, which we're doing in strict mode as of last week
or so. What does .xhtml map to, again?)

> As well as xhtml documents this affects other things as well, in
> particular xslt stylesheets. Given that the stylesheet PI specifies the
> stylesheet is xslt, one may hope the browser would accept whatever was
> returned from the URI and parse it as XML (IE does this) but mozilla
> insists that the stylesheet is served as text/xml. The ISP I use from
> home for example doesn't serve .xsl files as XML, so I have to use .xml
> on XSL stylesheets. This works of course, but looks a bit odd and
> confuses beginners.
>

I think this has to do with IE's early, non-standard use of text/xsl; I
presume the mapping of .xsl->text/xsl somehow became popular (why couldn't
.css->text/css spread in the same way?), and this misconfiguration is now
enshrined at your ISP. Somewhere in the archives of n.p.m.layout.xslt,
there's probably a thread dealing with the various practical and
theoretical MIME-types for XSL delivery; you might want to drop a note to
your server admins suggesting that the file type is mapped incorrectly,
and that they should change it to application/xml (XSLT really shouldn't
be delivered as text/anything) or application/xslt+xml or whatever
MIME-type actually works in both browsers at the moment.

Thanks for the interesting tip about XSLT for XHTML in IE.

--
Chris Hoess

David Carlisle
Dec 20, 2001, 11:00:32 AM
to cho...@force.stwing.upenn.edu, mozilla...@mozilla.org

> WOW! Does this trick really work?
yes:-)

> Wait, I'm confused. I thought Mozilla was already able to handle MathML
> without an additional XSLT stylesheet, or does this only apply to
> presentational MathML? (And how bloaty is doing that identity transform?
> Is that usable now?)

mozilla is presentation mathml only.
Obviously the stylesheet gets a bit bigger if it's doing a
content-presentation transform, but the delay in transforming is
typically of the same order of magnitude as the delay rendering the
transform (ie no worse than having a largish table in html),
but that's if you do the transform in IE; one problem is that currently
transforms in mozilla are a lot slower (and often basically just lock
up).
However it's not too bad to have two stylesheets: one for use when there
is only presentation mathml in the file, which just does an identity
transform if it detects it is running in mozilla and has no noticeable
effect. The other one does a content-presentation transform for use with
content examples, but currently don't do anything too big if you are
using it client side in mozilla.
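A sketch of that test (the vendor string checked here is an assumption --
Mozilla's XSLT engine is Transformiix, but the exact value of XSLT 1.0's
system-property('xsl:vendor') may differ, and the real
content-to-presentation templates are not shown):

    <xsl:template match="/">
      <xsl:choose>
        <!-- assumed: mozilla's engine identifies itself as Transformiix -->
        <xsl:when test="contains(system-property('xsl:vendor'),
                                 'Transformiix')">
          <!-- mozilla renders presentation mathml natively: copy through -->
          <xsl:copy-of select="."/>
        </xsl:when>
        <xsl:otherwise>
          <!-- other engines (e.g. IE): apply the real templates -->
          <xsl:apply-templates/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>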

(By the way I do have stylesheets doing this that work and will be
freely available, but they need a bit more work and documentation, so
probably after Christmas...)

> I think this has to do with IE's early, non-standard use of text/xsl; I
> presume the mapping of .xsl->text/xsl

actually my ISP serves them as text/plain :-) The point is since the
mime type pseudo-attribute in the PI specifies (one way or another) that
XSLT is at the end of the URI I can't really see why the browser cares
too much about the mime type on the entity that it receives.
Just deciding it is not text/xml and so hanging forever seems
suboptimal.

> you might want to drop a note to
> your server admins suggesting that the file type is mapped incorrectly,

I did, they suggested I might like to take up their "commercial"
offering where you get essentially your own virtual apache server
to do what you want with. Actually I may do that but that isn't really
the point, I want it to be as easy to serve mathml files as html ones.
And that isn't the case if you have to do any server side configuration
at all.

Chris Hoess
Dec 20, 2001, 3:27:23 PM
In article <2001122016...@penguin.nag.co.uk>, David Carlisle wrote:
>
>> WOW! Does this trick really work?
> yes:-)

You have just made my day, if not my entire month.



>> Wait, I'm confused. I thought Mozilla was already able to handle MathML
>> without an additional XSLT stylesheet, or does this only apply to
>> presentational MathML? (And how bloaty is doing that identity transform?
>> Is that usable now?)
>
> mozilla is presentation mathml only.
> Obviously the stylesheet gets a bit bigger if it's doing a
> content-presentation transform, but the delay in transforming is
> typically of the same order of magnitude as the delay rendering the
> transform (ie no worse than having a largish table in html),
> but that's if you do the transform in IE; one problem is that currently
> transforms in mozilla are a lot slower (and often basically just lock
> up).
> However it's not too bad to have two stylesheets: one for use when there
> is only presentation mathml in the file, which just does an identity
> transform if it detects it is running in mozilla and has no noticeable
> effect. The other one does a content-presentation transform for use with
> content examples, but currently don't do anything too big if you are
> using it client side in mozilla.
>
> (By the way I do have stylesheets doing this that work and will be
> freely available, but they need a bit more work and documentation, so
> probably after Christmas...)

Well, that's fine. Right now the most important issue (IMO) is getting
MathML of any sort properly deployed, which only depends on the identity
stylesheet. For the readiness of the rest, we trust in Axel. :-)

>> I think this has to do with IE's early, non-standard use of text/xsl; I
>> presume the mapping of .xsl->text/xsl
>
> actually my ISP serves them as text/plain :-) The point is since the
> mime type pseudo-attribute in the PI specifies (one way or another) that
> XSLT is at the end of the URI I can't really see why the browser cares
> too much about the mime type on the entity that it receives.
> Just deciding it is not text/xml and so hanging forever seems
> suboptimal.

I have to admit, the arguments for this are not tremendously strong
(beyond correctness) at present, but I can think of certain situations
where it could become useful. If someone decides "XSLT is just too awful,
we need a different language" and it catches on (perhaps in some browsers,
but not in others), then one could assign the XSLT stylesheet and the
X-something-else-T stylesheet to the same URL, and let the browser use
content-type negotiation to decide which one it wants to transform with.
I realize this sounds sort of pie-in-the-sky right now, but having the
browser pick nits over things like this is the only way we'll be able to
persuade people to start setting up MIME-types correctly, which is the
first step in being able to do neat stuff like content negotiation. (And,
of course, Accept: headers that can change at runtime. Gerv?)
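(For illustration, such a request might carry a header along these lines,
where the second type is a made-up stand-in for the hypothetical
successor language:

    Accept: application/xslt+xml;q=0.8, application/x-other-transform+xml

and the server would hand back whichever stylesheet variant matches.)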

>> you might want to drop a note to
>> your server admins suggesting that the file type is mapped incorrectly,
>
>> I did, they suggested I might like to take up their "commercial"
> offering where you get essentially your own virtual apache server
> to do what you want with. Actually I may do that but that isn't really
> the point, I want it to be as easy to serve mathml files as html ones.
> And that isn't the case if you have to do any server side configuration
> at all.
>

Hmm. I'd think that just by making everything a .xml file, XHTML+MathML,
XSLT, etc. should all map to text/xml and just work without serverside
meddling (assuming .xml->text/xml, which may or may not be true,
admittedly). Obviously, this is a bit less convenient than uploading as
.xsl, .xhtml, whatever, but it should obviate the need for server-side
configuration on any server supporting XML. (Of course, a clueful server
administrator would periodically update their MIME-types list against
<URL:http://www.isi.edu/in-notes/iana/assignments/media-types/> or use
updates from their software vendor, but Code Red says all that
needs to be said about web server administrators.)

Unfortunately, as I've said before, the joint will really start jumping
when XHTML 2.0 comes out and the usual cloud of avant-garde l33t
web-designers moves to adopt the latest and greatest. At this point, ISPs
are going to have to wake up to the fact that XML is here to stay, and is
no longer a "premium service" or any of that silliness. This will be,
however, a long and painful process. There are two things we can do to
mitigate this:

1) Get MIME-types nailed down. An RFC for an XHTML MIME-type is now in
draft. (A similar draft may or may not exist for XSLT, although it would
use the same general syntax; application/xhtml+xml and
application/xslt+xml). Another draft RFC exists for Javascript, although
timeless and I would like to comment on that. (It's a bit of a
rubber-stamp of existing practice.) MIME-types have, of course, been
standardized for XML, HTML, PNG, and CSS. (Eventually there may need to
be a content-type for MathML, but inserting MathML via XHTML Modularization seems to fall
under the application/xhtml+xml aegis.) There's no draft yet, but plans
to register image/svg+xml (which won't be relevant for a while anyway).
Once MIME-types for all the formats commonly used on the web today have
been hammered out, we can move on to...

2) Evangelism. This can't be just a Mozilla effort; the scope is much too
great. We need to start waking up people at the W3C, and other
influential voices (get Zeldman to make a Proclamation, etc.) Maybe
bclary and the rest of NSEvang can tell us the best fora for making
webhosts aware of this. Developing some evangelism tools to help
sysadmins update their extension->MIME-type mappings would probably not be
too difficult either. (Apache is ridiculously easy, we have NS resources
for NS Enterprise Server, and some poor soul will have to find out what
works on IIS.)

I mostly see this as a campaign undertaken by web developers, not Mozilla
contributors, but we need to start the ball rolling. Besides, all you
MathML adopters can sit back and snicker after going through the
hassle of beating clue into your webhosts, as all the 3l33t webdesigners
try XHTML 2.0 and come to grief.

Thanks again for the tip on IE and XHTML.

--
Chris Hoess

David Carlisle
Dec 21, 2001, 5:42:37 AM
to cho...@force.stwing.upenn.edu, mozilla...@mozilla.org

> I'd think that just by making everything a .xml file, XHTML+MathML,
> XSLT, etc. should all map to text/xml and just work without serverside
> meddling (assuming .xml->text/xml, which may or may not be true,
> admittedly). Obviously, this is a bit less convenient than uploading as
> .xsl, .xhtml, whatever, but it should obviate the need for server-side
> configuration on any server supporting XML.

Yes that works, but when I put up a "private" test release of my
stylesheets from my server, one of the more common queries was why the
stylesheet was called foo.xml :-) As I say, it's more a user confusion
issue than a technical one.


> Once MIME-types for all the formats commonly used on the web today have
> been hammered out, we can move on to...

Personally I believe mime types (other than really general ones like
text/xml) have limited usefulness. If I have a docbook/xml file with
embedded svg and mathml and chemml and... what mime type can it have?
The only thing to do is ship it as text/xml or application/xml and hope the
application triggers the correct processing based on the namespaces.
I can see that for binary formats like images that are processed as a
block, mime types are useful, but for textual multi-format documents, mime
typing just works at the wrong level.
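For example, something along these lines can only sensibly travel as
generic XML -- the docbook namespace below is invented for illustration,
while the mathml and svg ones are the real namespace URIs:

    <?xml version="1.0"?>
    <article xmlns="urn:example:docbook">
      <para>
        An inline formula
        <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>
        and a small picture
        <svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">
          <rect width="10" height="10"/>
        </svg>
        in one document.
      </para>
    </article>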

0 new messages