
We should remove the Strict DTD


David Baron

Aug 15, 2000

Currently the parser has a module called the Strict DTD, which does
strict parsing of HTML according to the rules in the HTML 4.0 strict
DTD. Whether to use the strict DTD is decided by
|nsParser::DetermineParseMode|, which looks at the DOCTYPE declaration
and guesses whether it is appropriate.
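[Editorial note: the mode decision described above can be sketched roughly as follows. The real logic is C++ inside |nsParser::DetermineParseMode|; the string checks below are simplified assumptions for illustration, not the actual heuristics.]

```python
# Rough, hypothetical sketch of DOCTYPE-based parse-mode selection.
# The real implementation is C++ (nsParser::DetermineParseMode); these
# string checks are simplified stand-ins for the actual heuristics.

def determine_parse_mode(doctype):
    """Guess a parser DTD from the DOCTYPE declaration's public identifier."""
    if doctype is None:
        return "NavDTD"  # no DOCTYPE: legacy content, lenient parsing
    d = doctype.upper()
    if "HTML 4.0" in d and "TRANSITIONAL" not in d and "FRAMESET" not in d:
        return "StrictDTD"  # looks like HTML 4.0 Strict
    return "NavDTD"  # anything else: lenient parsing

print(determine_parse_mode(None))                                    # NavDTD
print(determine_parse_mode("-//W3C//DTD HTML 4.0//EN"))              # StrictDTD
print(determine_parse_mode("-//W3C//DTD HTML 4.0 Transitional//EN")) # NavDTD
```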

I think the strict DTD should be removed from the first release of
Mozilla for the following reasons:

1) No improvement to standards compliance.

The strict DTD does not improve our standards compliance. The HTML
4.0 spec does not specify error handling behavior, and traditionally
error-handling in HTML has been very lenient. The strict DTD will
break backwards-compatibility with older browsers without any
improvement to standards compliance.

The web standards community already has an approach to encouraging
better HTML on the Web: XHTML. If we want to improve HTML on the
web, our top priority should be supporting the XHTML effort.

2) User Experience

The strict DTD causes some web sites to be displayed very poorly
(see bug 42388 for examples). This causes bad user experience. It
is of no value to end users.

Furthermore, it's not all that valuable to web authors as an
authoring tool, since it doesn't report any of the errors that it
finds. HTML Validators are considerably better. It's also not
clear that all cases where authors trigger the strict DTD would be
intentional. We certainly shouldn't require web authors to be aware
of our browser - that's not what web standards are about. (For
example, an author could copy the prolog of an existing strict
document and create a modified document, testing in IE.)

3) Existing bugs make it worse than NavDTD

The strict DTD has, right now, existing bugs that make it worse in
terms of standards compliance than the Nav DTD. See, for example,
bugs 16934, 45659, and 46475. There are two options:

+ ship with these bugs, and have less standards-compliance on new
documents and discourage authors from using new standards

+ use additional development time to fix these bugs, rather than
others

Considering the limited time before release, neither of these
options seems very good, leaving the third option:

+ turn off the strict DTD

4) Forward compatibility

The strict DTD attempts to apply the rules of HTML 4.0 strict to
future documents written for currently unknown DTDs. We make no
attempt to read the DTD, but instead blindly apply the rules of HTML
4.0 strict. In future DTDs, some of the nesting requirements of
HTML 4.0 strict could be relaxed. We would then have a
forward-compatibility problem where we break future pages. (This
already happens for current pages, since the strict DTD is used for
XHTML 1.0 transitional pages served as text/html.)

If we decide to apply the strict DTD only to pages with HTML 4.0
strict DOCTYPE declarations, then it won't be used very often
in the long run, and won't be very useful.

5) DOCTYPE sniffing harder

Forward-compatibility problems in the strict DTD make our problem of
determining the parser DTD and the layout quirks mode (see [1]) much
harder. We want layout to be in quirks mode only for old documents,
and want any new documents to be in standard mode, since layout's
quirks mode is a mode with standards-compliance bugs needed to
maintain backwards-compatibility. However, putting new documents
into the parser's strict mode causes forward compatibility problems
and should not be done. This complicates the requirements of the
DOCTYPE sniffing done in |nsParser::DetermineParseMode|, which
currently has serious bugs [2] and needs to be fixed before
release.
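[Editorial note: the crux of point 5 is that one DOCTYPE sniff feeds two independent decisions -- the layout mode and the parser DTD -- and the argument is that only the first should ever go strict. A hypothetical decomposition, with invented names and placeholder checks that do not reflect Mozilla's actual code:]

```python
# Hypothetical illustration of decoupling the two decisions.
# Names and checks are invented for this sketch.

def layout_mode(doctype):
    # Layout: quirks only for old or missing DOCTYPEs; standards otherwise.
    if doctype is None or "HTML 3.2" in doctype.upper():
        return "quirks"
    return "standard"

def parser_dtd(doctype):
    # Parser: always lenient, since strictly parsing documents written
    # for unknown future DTDs risks breaking pages that are valid under
    # a relaxed future grammar.
    return "NavDTD"

doc = "-//W3C//DTD XHTML 1.0 Transitional//EN"
print(layout_mode(doc), parser_dtd(doc))  # standard NavDTD
```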

I think Netscape should stop spending valuable engineering resources on
a project whose value is still under debate and whose quality has not
yet had the chance to benefit from the testing given to other parts of
the parser. I believe we should pull the Strict DTD from Mozilla 1.0
and Seamonkey.

-David

[1] http://www.people.fas.harvard.edu/~dbaron/mozilla/modes
[2] http://bugzilla.mozilla.org/showdependencytree.cgi?id=34662

--
L. David Baron <URL: http://www.people.fas.harvard.edu/~dbaron/ >
Rising Junior, Harvard Summer Intern, Netscape
dba...@fas.harvard.edu dba...@netscape.com

Jonas Sicking

Aug 15, 2000
I totally agree. I see no use in trying to "save" HTML using a strict parser.
HTML4 is already a lost cause and the future is in XHTML. But please put
resources into making Mozilla support XHTML; *that* would really help
standards along.

/ Jonas Sicking

"David Baron" <dba...@is01.fas.harvard.edu> wrote in message
news:slrn8pj5br...@is01.fas.harvard.edu...

Henri Sivonen

Aug 15, 2000
In article <slrn8pj5br...@is01.fas.harvard.edu>,
dba...@fas.harvard.edu wrote:

> We want layout to be in quirks mode only for old documents,
> and want any new documents to be in standard mode, since layout's
> quirks mode is a mode with standards-compliance bugs needed to
> maintain backwards-compatibility.

Can CNavDTD, as is, be used with the standard layout mode? If so, will
valid HTML pass CNavDTD without CNavDTD breaking the markup?

Running the parser in a constant quirk recovery mode and switching the
layout mode based on the doctype is what IE 5 for Mac seems to be doing.

Supposing the StrictDTD and the TransitionalDTD were abandoned in favor
of CNavDTD, would it be possible (with reasonable effort considering the
engineering resources) to make CNavDTD signal whether the document was
OK or whether the parser had to recover from errors? That is, could we
still get an HTML quality indicator or possibly later even error
reporting? The bugs are:
http://bugzilla.mozilla.org/show_bug.cgi?id=47108
http://bugzilla.mozilla.org/show_bug.cgi?id=6211

--
Henri Sivonen
hen...@clinet.fi
http://www.clinet.fi/~henris/

David Baron

Aug 15, 2000
In article <henris-4C84B9....@uutiset.saunalahti.fi>, Henri Sivonen wrote:
> Can CNavDTD, as is, be used with the standard layout mode? If so, will
> valid HTML pass CNavDTD without CNavDTD breaking the markup?

Yes. It's what we did in M16 and earlier.

> Supposing the StrictDTD and the TransitionalDTD were abandoned in favor
> of CNavDTD, would it be possible (with reasonable effort considering the
> engineering resources) to make CNavDTD signal whether the document was
> OK or whether the parser had to recover from errors? That is, could we
> still get a HTML quality indicator or possibly later even error
> reporting? The bugs are:
> http://bugzilla.mozilla.org/show_bug.cgi?id=47108
> http://bugzilla.mozilla.org/show_bug.cgi?id=6211

I believe CNavDTD was the DTD originally doing the quality indicator.
The Strict DTD only started being used between M16 and M17.

-David

Vidur Apparao

Aug 15, 2000
to Henri Sivonen

Henri Sivonen wrote:

> ....


> Supposing the StrictDTD and the TransitionalDTD were abandoned in favor
> of CNavDTD, would it be possible (with reasonable effort considering the
> engineering resources) to make CNavDTD signal whether the document was
> OK or whether the parser had to recover from errors? That is, could we
> still get a HTML quality indicator or possibly later even error
> reporting?

I'm curious as to why error reporting for HTML is something you'd like in a
browser. If the parser/DOM component were separable from the rest of the
system, I could see it used in the authoring process and it would make sense
to have a validating mode. For now, especially given the phase of
development we're in, would you agree that a strict mode for HTML (separate
from our handling of XHTML) is something that can be deferred?

--Vidur

Axel Hecht

Aug 16, 2000
Hi,
I just want to add my personal experience: Cocoon automatically sets the
doctype to strict, no matter what you do in the processing itself. And it
won't read the PI to set the doctype -- a bug on Cocoon's side, but another
source of documents displayed badly in Mozilla right now.

Axel

Henri Sivonen

Aug 16, 2000

> I'm curious as to why error reporting for HTML is something you'd like in
> a browser.

It would be a useful authoring aid. Moreover, it would help put the
blame for broken pages on the page author instead of the browser.
Making it a habit to make the indicator show the green check mark would
also pave the way for an easier migration to XHTML.

> For now, especially given the phase of development we're in,
> would you agree that a strict mode for HTML (separate
> from our handling of XHTML) is something that can be defered?

Given that CNavDTD doesn't break valid HTML, yes, I suppose the strict
parsing mode could be given up. It is not a "for now" choice, however.
It is a "for good" choice. Currently, there aren't many Web authors
using HTML 4.01 Strict. If the use of HTML 4.01 Strict becomes more
popular, but the strictness isn't enforced, it can't be enforced later
without causing a bad user experience.

As for XHTML, I think it is important to handle it strictly according to
the applicable specs so that XHTML would emerge as a non-quirky markup
language. Enforcing the specs is good considering apps that generate
XHTML or apps that read it.

Eric Krock

Aug 16, 2000
All the information I have suggests that Strict DTD support is a "nice
to have" that would require extra work, not a "must have," so we should
drop support for it from the first release of N6 (RTM). Unless someone
provides a convincing argument otherwise within the next 24 hours, I
recommend that we go ahead and remove Strict DTD support.

Here's a summary of the issues I wrote after a conversation with Vidur,
Harish, and Nisheeth:

Issues:
1) We are currently over our optimal target nsbeta3+ bug budget with
more being nominated daily.
2) We are dropping "nice to have" bug fixes, features, and functionality
for N6 RTM in order to deliver the "must have" functionality in the
remaining time.
3) All the reasons for including Strict DTD support I have heard to date
fall into the "nice to have" category. (If there are "must have"
reasons, please reply to this thread immediately.)
4) There are open bugs that must be fixed if we are to support Strict
DTD for RTM. Given that, and the fact that (a) we have too many nsbeta3+
bugs already, and (b) Strict DTD appears to be a "nice to have," I'm
inclined to drop Strict DTD support for the first release.

Options I'm aware of:
a) as-is: support strict parsing and strict layout; requires additional
bug fixes
b) turn off Strict DTD parsing for FCS; support sympathetic parsing and
strict layout when in Strict DTD mode (unless strict layout depends on
strict parsing--in that case turn off both); could reconsider turning on
strict parsing in a future release
c) turn off both Strict DTD parsing and Strict DTD layout

Benefits I'm aware of for Strict DTD support:
i) Useful as an error checking tool for developers who are attempting to
write strict HTML. (However, there are web reporting tools that can do
this checking for you, so this is a "nice to have," not a "must have."
RTM is a content display tool for intermediate consumers, not an error
checking tool for developers.)
ii) Useful in assisting the transition to XHTML. (Again, nice to have,
not must have.)

Open questions:
A) Are there user/developer benefits to Strict DTD support that I'm
unaware of?
B) Are there any "must have" benefits to Strict DTD support?
C) Does strict layout assume strict parsing? (e.g. does strict layout
mode depend on support for strict parsing?)

FYI, here are the bugs on Harish's list that we can Future if we drop
Strict DTD support for RTM:
44415
46475
46958
45832
46107

P.S. For the record, "full support" for XHTML, though more desirable
still than Strict DTD mode, nonetheless falls into the same "nice to
have for RTM" category. (Netscape has never committed any support at all
for XHTML in the first release of N6 or Netscape Gecko, let alone full
support.) So don't think that dropping Strict DTD support will mean that
Netscape can "reassign" resources to XHTML work between now and RTM. We
need to remain ruthlessly focused on delivering the committed
functionality on schedule.


Henri Sivonen

Aug 16, 2000
In article <399AB8F7...@netscape.com>, Eric Krock
<ekr...@netscape.com> wrote:

> I'm inclined to drop Strict DTD support for the first release.

I am not saying you shouldn't drop it, but dropping it is likely to mean
it can't be enabled in a later release, either.

> c) turn off both Strict DTD parsing and Strict DTD layout

That would be a huge disappointment for very many people. Standards
*layout* is a "must have".

> A) Are there user/developer benefits to Strict DTD support that I'm
> unaware of?

One general benefit of enforcing standards is making life easier for
people who write software that reads and writes the format. Of course
one might argue that HTML is just too broken already and XHTML is the
way to go for people who want unambiguous markup handling.

> So don't think that dropping Strict DTD support will mean that
> Netscape can "reassign" resources to XHTML work between now and RTM.

In order to promote XHTML as a quirkless format, it is very important
that no major Mozilla-based release ships with major XHTML bugs. Are
there still some other major XHTML-specific issues to deal with than
making sure that XHTML served as text/html goes to the XML parser and
that it gets the right style sheet in the right namespace?

Rick Gessner

Aug 16, 2000
to dba...@fas.harvard.edu
I'm supposed to still be on vacation, but I thought a reply was in order.

To summarize, the strictDTD was enabled for beta2 to get bug reports, and will be conditionally enabled via a user pref for final ship. Only advanced users will see the strictDTD -- unless they encounter XHTML served as text/html. The rules in that scenario are much different from those for XML, and the strictDTD handles these issues.

Other comments inline below.

David Baron wrote:

> Currently the parser has a module called the Strict DTD, which does
> strict parsing of HTML according to the rules in the HTML 4.0 strict
> DTD.  Whether to use the strict DTD is decided by
> |nsParser::DetermineParseMode|, which looks at the DOCTYPE declaration
> and guesses whether it is appropriate.
>
> I think the strict DTD should be removed from the first release of
> Mozilla for the following reasons:
>
> 1) No improvement to standards compliance.
>
>    The strict DTD does not improve our standards compliance.  The HTML
>    4.0 spec does not specify error handling behavior, and traditionally
>    error-handling in HTML has been very lenient.  The strict DTD will
>    break backwards-compatibility with older browsers without any
>    improvement to standards compliance.
 

SGML does, and the strict DTD tries to follow SGML rules more closely. In fact, the rules are much more predictable, and developers who wish to utilize strict documents will really appreciate the improved predictability.

 
>    The web standards community already has an approach to encouraging
>    better HTML on the Web:  XHTML.  If we want to improve HTML on the
>    web, our top priority should be supporting the XHTML effort.
 

The strictDTD code path is used in cases where we're given XHTML as text/html.

 
> 2) User Experience
>
>    The strict DTD causes some web sites to be displayed very poorly
>    (see bug 42388 for examples).  This causes bad user experience.  It
>    is of no value to end users.
 

True, but so does nav4X. The bugs you cite are trivial (I estimate a day or so), but what's more important is that the casual user will never see these problems. Before my sabbatical, the head of marketing for the project agreed that we could add a user-settable pref to control the strictDTD. We agreed further that this would be done upon my return. So most users will NEVER see the strictDTD, but developers who wish to take advantage of these improvements will benefit. And of course, XHTML will work better too.

>    Furthermore, it's not all that valuable to web authors as an
>    authoring tool, since it doesn't report any of the errors that it
>    finds.  HTML Validators are considerably better.  It's also not
>    clear that all cases where authors trigger the strict DTD would be
>    intentional.  We certainly shouldn't require web authors to be aware
>    of our browser - that's not what web standards are about.  (For
>    example, an author could copy the prolog of an existing strict
>    document and create a modified document, testing in IE.)
 

There is a mode built into the parsing engine to report errors. We've disabled this for 6.0, but I've had many requests for error reporting. (There are even more in this thread.) In a post-6.0 release, we may enable error reporting.

By the way, many of the other validators make dubious suggestions/corrections to bad HTML. (Ask harishd.)

> 3) Existing bugs make it worse than NavDTD
>
>    The strict DTD has, right now, existing bugs that make it worse in
>    terms of standards compliance than the Nav DTD.  See, for example,
>    bugs 16934, 45659, and 46475.  There are two options:
>
>    + ship with these bugs, and have less standards-compliance on new
>    documents and discourage authors from using new standards
>
>    + use additional development time to fix these bugs, rather than
>    others
>
>    Considering the limited time before release, neither of these
>    options seems very good, leaving the third option:
>
>    + turn off the strict DTD

Too subjective. There are 5 or so bugs I know of, and they look like a day of work.

 
> 4) Forward compatibility
>
>    The strict DTD attempts to apply the rules of HTML 4.0 strict to
>    future documents written for currently unknown DTDs.  We make no
>    attempt to read the DTD, but instead blindly apply the rules of HTML
>    4.0 strict.  In future DTDs, some of the nesting requirements of
>    HTML 4.0 strict could be relaxed.  We would then have a
>    forward-compatibility problem where we break future pages.  (This
>    already happens for current pages, since the strict DTD is used for
>    XHTML 1.0 transitional pages served as text/html.)
 

I recall reading that the 4.0 DTD from the W3C would be the last for HTML. I believe that schemas are the future of the web.
The question of future pages is interesting, since I don't think there will be any more HTML specs from the W3C. 6.0 won't handle schemas anyway -- so we'll have to upgrade the browser to handle them. I think this is a non-issue.

 
>    If we decide to apply the strict DTD only to pages with HTML 4.0
>    strict DOCTYPE declarations, then it won't be used very often
>    in the long run, and won't be very useful.

My experience with content developers at Internet World and builder.net suggests otherwise. They want to make a transition -- and need a path. We provide them the first (small) steps.

 
> 5) DOCTYPE sniffing harder
>
>    Forward-compatibility problems in the strict DTD make our problem of
>    determining the parser DTD and the layout quirks mode (see [1]) much
>    harder.  We want layout to be in quirks mode only for old documents,
>    and want any new documents to be in standard mode, since layout's
>    quirks mode is a mode with standards-compliance bugs needed to
>    maintain backwards-compatibility.  However, putting new documents
>    into the parser's strict mode causes forward compatibility problems
>    and should not be done.  This complicates the requirements of the
>    DOCTYPE sniffing done in |nsParser::DetermineParseMode|, which
>    currently has serious bugs [2] and needs to be fixed before
>    release.
 

Please forward me the bug numbers.

> I think Netscape should stop spending valuable engineering resources on
> a project whose value is still under debate and whose quality has not
> yet had the chance to benefit from the testing given to other parts of
> the parser.  I believe we should pull the Strict DTD from Mozilla 1.0
> and Seamonkey.
>
> -David
>
> --
> L. David Baron        <URL: http://www.people.fas.harvard.edu/~dbaron/ >

Rick Gessner

Aug 16, 2000
to vi...@netscape.com
We have a mode where errors are reported, but it's only debug code (not shipping with 6.0). Post 6.0, we plan to make this available to third party tool developers who want to make better debugging tools.  You can come by my cube (upon my return) for a demo.

Rick

Vidur Apparao wrote:

> Henri Sivonen wrote:
>
> > ....
> > Supposing the StrictDTD and the TransitionalDTD were abandoned in favor
> > of CNavDTD, would it be possible (with reasonable effort considering the
> > engineering resources) to make CNavDTD signal whether the document was
> > OK or whether the parser had to recover from errors? That is, could we
> > still get a HTML quality indicator or possibly later even error
> > reporting?
>
> I'm curious as to why error reporting for HTML is something you'd like in a
> browser. If the parser/DOM component were separable from the rest of the
> system, I could see it used in the authoring process and it would make sense
> to have a validating mode. For now, especially given the phase of
> development we're in, would you agree that a strict mode for HTML (separate
> from our handling of XHTML) is something that can be defered?
>
> --Vidur

Eric Krock

Aug 16, 2000
I tracked down Rick on sabbatical and he made the following points,
which I'll attempt to relay faithfully and accurately (any errors are mine):

1) The reason that Strict DTD support is causing any "unexpected page
layout" problems at all is that we had decided to turn on Strict DTD
layout by default for Strict DTD doctypes in PR2, simply to see how much
was out there and to get the code tested, and that the agreement was
that when Rick returned from sabbatical, he would add in a preference
(defaulting to false) of "Use Strict layout for Strict DTD doctypes."
(I forget whether this was agreed to be a prefs.js pref only or whether
there's supposed to be a UI; Rick knows.) Apparently this approach was
discussed and approved as a scheduled exception feature in a meeting
(perhaps PDT?) with both hamerly and laguardia present. Finishing this
work will only take him an hour. Once this is done, strict layout will
only be invoked when the page author has specified Strict DTD *and* the
user has manually changed the preference. So ordinary users will never
be affected by Strict DTD support.

2) Rick will be returning from sabbatical Monday. The bug budget has not
been calculated with his participation in mind, and he'd be happy to
take the noted Strict DTD bugs off harish's plate.

3) I pointed out that we might prefer to have him spend those same
cycles on perf and stability instead. Rick stated that turning off
Strict DTD will cause problems for our XHTML support, and that fixing
those problems would be as much work or more as avoiding them by leaving
Strict DTD in.

Net net: there is an engineer-to-engineer difference of opinion here
about the relationship between Strict DTD support and XHTML support, and
the easiest way to get XHTML well supported without introducing
regressions. So, I'm scheduling a teleconference sometime Monday (which
will be open to mozilla community members who wish to participate -- email
me your phone # if interested). Concerned parties can attend/dial in.
We will review the issues, level of effort, and assignment of who does
what, and reach a final decision Monday.


Jonas Sicking

Aug 16, 2000, 7:31 PM
"Henri Sivonen" <hen...@clinet.fi> wrote in message
news:henris-B39A4E....@uutiset.saunalahti.fi...

> > So don't think that dropping Strict DTD support will mean that
> > Netscape can "reassign" resources to XHTML work between now and RTM.
>
> In order to promote XHTML as a quirkless format, it is very important
> that no major Mozilla-based release ships with major XHTML bugs. Are
> there still some other major XHTML-specific issues to deal with than
> making sure that XHTML served as text/html goes to the XML parser and
> that it gets the right style sheet in the right namespace?

Would accepting Well-Formed but not Valid (in the XML sense) be considered a
major XHTML bug? I.e., would allowing <em>'s within <table>'s be very bad?
IMHO it would, even though I don't think it's required by the XHTML spec.
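[Editorial note: Jonas's example can be made concrete. The fragment below is well-formed XML -- every tag nests and closes -- yet invalid XHTML, because the XHTML 1.0 DTD does not allow em as a direct child of table. A plain, non-validating XML parser accepts it, since well-formedness checking ignores the DTD:]

```python
import xml.etree.ElementTree as ET

# Well-formed but invalid: <em> directly inside <table> is forbidden by
# the XHTML 1.0 DTD, yet it parses fine as XML because well-formedness
# says nothing about which elements may contain which.
fragment = "<table><em>oops</em><tr><td>cell</td></tr></table>"
root = ET.fromstring(fragment)  # succeeds: the markup is well-formed
print(root.tag, [child.tag for child in root])  # table ['em', 'tr']
```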

/ Jonas Sicking


Jonas Sicking

Aug 16, 2000, 10:20 PM
"Eric Krock" <ekr...@netscape.com> wrote in message
news:399B3C3F...@netscape.com...

> Net net: there is a engineer-to-engineer difference of opinion here
> about the relationship between Strict DTD support and XHTML support, and
> the easiest way to get XHTML well supported without introducing
> regressions. So, I'm scheduling a teleconference sometime Monday (which
> will be open to mozilla community members who wish to participate--email
> me your phone # if interested). Concerned parties can attend/dial in.
> We will review the issues, level of effort, and assignment of who does
> what and reach a final decision Monday.

A couple of questions (I'm in Sweden so I won't be able to dial in...)

1. When talking about XHTML support does that include validating the XHTML
tree or does it only include supporting all tags/entities/namespaces?

2. Would the StrictDTD be used for XHTML even if the user hasn't checked the
use-StrictDTD-parser. (I assume yes)

3. Would XHTML always be validated whether it is served as text/html or
text/xml (or text/xhtml -- is it ever supposed to be that?)

4. What kind of validation will XHTML with Transitional/Frameset doctypes
get (served as either text/html or text/xml)?

5. Will XHTML containing other namespaces (MathML) still be considered
valid?

6. What will happen to XHTML pages that don't validate?

/ Jonas Sicking


David Baron

Aug 16, 2000, 10:22 PM
In article <399B3D43...@san.rr.com>, Rick Gessner wrote:
>
> To summarize, the strictDTD was enabled for beta2 to get bug reports, and
> will be conditionally enabled via a user pref for final ship. Only advanced
> users will see the strictDTD -- unless they encounter XHTML served as
> text/html. The rules in that scenario are much different than for XML, and
> the strictDTD handles these issues.

It is wrong to use the strict DTD for parsing XHTML. It could break
future XHTML strict documents, since we don't know what future DTDs or
schemas for XHTML will look like. This is a serious forward
compatibility risk.

And, it's not only forward-compatibility. It's the present. See:
http://www.people.fas.harvard.edu/~dbaron/tmp/xhtml-transitional.html
That's data loss. Even if you exclude DTDs with the string
"Transitional" in them, you have no assurance that you won't break
things in the future.

Furthermore, recent discussion with people in the XHTML community
has led me to believe that XHTML sent as text/html should be handled
by the XML parser. This has been filed as bug 48351.

> David Baron wrote:
>
> > 1) No improvement to standards compliance.
> >
> > The strict DTD does not improve our standards compliance. The HTML
> > 4.0 spec does not specify error handling behavior, and traditionally
> > error-handling in HTML has been very lenient. The strict DTD will
> > break backwards-compatibility with older browsers without any
> > improvement to standards compliance.
> >
>
> SGML does, and the strict dtd tries to follow SGML rules more closely. In
> fact, the rules are much more predictable, and developers who wish utilize
> strict documents will really appreciate the improved predictability.

I'd never actually heard of these rules before. I'd be interested in
a reference. If you're correct, then my point (1) is invalid, but the
Strict DTD's error handling rules need to be tested against the SGML
spec.

> > The web standards community already has an approach to encouraging
> > better HTML on the Web: XHTML. If we want to improve HTML on the
> > web, our top priority should be supporting the XHTML effort.
> >
>
> The stictDTD code path is used in cases where we're given XHTML as
> text/html.

As I said above, that's wrong.

> > 2) User Experience
> >
> > The strict DTD causes some web sites to be displayed very poorly
> > (see bug 42388 for examples). This causes bad user experience. It
> > is of no value to end users.
> >
>
> True, but so does nav4X. The bugs you cite are trivial (I estimate a day or
> so), but what's more important is that the casual user will never see these

See all the duplicates of bug 42388 before you say the casual user
will never see problems.

I have no idea what you mean by "so does nav4X". Nav 4.x does have
bugs, but I don't think it intentionally chokes on pages.

> problems. Before my sabbatical, the head of marketing for the project agreed
> that we could add a user settable pref to control the strictDTD. We agreed
> further that this would be done upon my return. So most users will NEVER see
> the strictDTD, but developers who wish to take advantage of these
> improvements will benefit. And of course, XHTML will work better too.

Is an authoring tool that doesn't report errors worth valuable
engineering time right now? Validators have much better error
reporting.

> > Furthermore, it's not all that valuable to web authors as an
> > authoring tool, since it doesn't report any of the errors that it
> > finds. HTML Validators are considerably better. It's also not
> > clear that all cases where authors trigger the strict DTD would be
> > intentional. We certainly shouldn't require web authors to be aware
> > of our browser - that's not what web standards are about. (For
> > example, an author could copy the prolog of an existing strict
> > document and create a modified document, testing in IE.)
> >
>
> There is a mode built into the parsing engine to report errors. We've
> disabled this for 6.0, but I've had a many requests for error reporting.
> (There are even more in this thread). In post-6.0 release, we may enable
> error reporting.

If you do, it could be useful as an authoring tool. But it still
probably wouldn't be as good as a true SGML validator.

> By the way, many of the other validators may dubious suggestions/corrections
> to bad HTML. (Ask harishd).

It's the nature of error reporting that sometimes the error is reported
somewhere after the place where it actually occurred, since computers
can't always understand what was intended. Other than that, I'd like to
see these dubious suggestions that you're talking about.

> > 3) Existing bugs make it worse than NavDTD
> >
> > The strict DTD has, right now, existing bugs that make it worse in
> > terms of standards compliance than the Nav DTD. See, for example,
> > bugs 16934, 45659, and 46475. There are two options:
> >
> > + ship with these bugs, and have less standards-compliance on new
> > documents and discourage authors from using new standards
> >
> > + use additional development time to fix these bugs, rather than
> > others
> >
> > Considering the limited time before release, neither of these
> > options seems very good, leaving the third option:
> >
> > + turn off the strict DTD
>
> Too subjective. There are 5 or so bugs I know of, and they look like a day
> of work.

Can I mark them nsbeta3+ and assign them to you? 16934 is a very
serious bug for getting web authors to write content that works
in Mozilla.

> > 4) Forward compatibility
> >
> > The strict DTD attempts to apply the rules of HTML 4.0 strict to
> > future documents written for currently unknown DTDs. We make no
> > attempt to read the DTD, but instead blindly apply the rules of HTML
> > 4.0 strict. In future DTDs, some of the nesting requirements of
> > HTML 4.0 strict could be relaxed. We would then have a
> > forward-compatibility problem where we break future pages. (This
> > already happens for current pages, since the strict DTD is used for
> > XHTML 1.0 transitional pages served as text/html.)
> >
>
> I recall reading that the 4.0 DTD from the w3c would be the last for HTML.
> I believe that schemas are the future of the web.
> The question of future pages is interesting, since I don't think there
> will be any more HTML specs from the w3c. 6.0 won't handle schemas
> anyway -- so we'll have to upgrade the browser to handle them. I think
> this is a non-issue.

It's a non-issue only if you don't care about the risk of releasing
a browser that will severely restrict the development of future W3C
specs.

> > If we decide to apply the strict DTD only to pages with HTML 4.0
> > strict DOCTYPE declarations, then it won't be used very often
> > in the long run, and won't be very useful.
>
> My experience with content developers at Internet World and builder.net
> suggests otherwise. They want to make a transition -- and need a path.
> We provide them the first (small) steps.

If they want to begin using stricter HTML, then they should use XHTML.

> > 5) DOCTYPE sniffing harder
> >
> > Forward-compatibility problems in the strict DTD make our problem of
> > determining the parser DTD and the layout quirks mode (see [1]) much
> > harder. We want layout to be in quirks mode only for old documents,
> > and want any new documents to be in standard mode, since layout's
> > quirks mode is a mode with standards-compliance bugs needed to
> > maintain backwards-compatibility. However, putting new documents
> > into the parser's strict mode causes forward compatibility problems
> > and should not be done. This complicates the requirements of the
> > DOCTYPE sniffing done in |nsParser::DetermineParseMode|, which
> > currently has serious bugs [2] and needs to be fixed before
> > release.
> >
>
> Please forward me the bug numbers.

Moved footnote up:

> > [2] http://bugzilla.mozilla.org/showdependencytree.cgi?id=34662

-David

Henri Sivonen

unread,
Aug 17, 2000, 3:00:00 AM8/17/00
to
In article <399B3D43...@san.rr.com>, Rick Gessner
<rg...@san.rr.com> wrote:

> To summarize, the strictDTD was enabled for beta2 to get bug reports, and
> will be conditionally enabled via a user pref for final ship. Only
> advanced users will see the strictDTD

Would the StrictDTD still be enabled by default for HTML 4 Strict?
What's the point in making only advanced users see the StrictDTD?

> unless they encounter XHTML served as text/html. The rules in that
> scenario are much different than for XML, and the strictDTD handles
> these issues.

The StrictDTD doesn't perform well-formedness analysis on XHTML
documents. Given the following document, Mozilla will render it
according to the HTML rules with implicit paragraph closing tags. That's
wrong. IE 5 for Mac, on the other hand, does the Right Thing. It pops up
an error message and refuses to render the document. Enforcing the XML
error handling rules is of paramount importance in keeping applications
of XML non-quirky.

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Broken XHTML</title></head>
<body>
<p>Below this paragraph there are two HTML-style paragraphs that lack
end tags. This is in error as per the XML spec.</p>
<p>Second paragraph.
<p>Third paragraph.
</body>
</html>

Vidur Apparao

unread,
Aug 17, 2000, 3:00:00 AM8/17/00
to mozilla...@mozilla.org
In most cases where a server delivers XHTML as text/html, it is with the intent of allowing the content to be dealt with by legacy browsers. User agent sniffing could allow an author to deliver XHTML as text/xml to Mozilla and IE 5.x, but I suspect that will not happen. I agree with David - XHTML should always go through our existing XML codepath. There are two problems with this today, but I suspect we will have to live with them:
1) There are minor differences in the resulting DOM tree depending on whether the content is dealt with using the HTML or XML codepath. The most obvious of these is the casing of element tagnames. In the former case, they will be returned as all-caps; in the latter, they will be returned using the same casing as the original document. Web authors should take this difference into account when writing scripts that access element tagnames - since legacy browsers may treat XHTML as vanilla HTML, all comparisons of element tagnames should be case insensitive.
2) There is too much logic in the HTMLContentSink (the piece of code that creates HTML content based on parser notifications). Some of this code appears redundantly in the XMLContentSink, some of it is missing. As a result, XHTML elements created through the XML content sink may not always act as they should. This unfortunately means that current XHTML authors might not be able to use all XHTML elements (see bug 36790 as an example of this). Future work should include moving logic out of the content sinks into the elements themselves and improving factoring of code between the various content sinks.
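The tagname-casing difference in point 1 can be handled defensively in script. A minimal, hypothetical sketch (the helper name and the stand-in objects are mine, not from this thread):

```javascript
// Sketch: compare element tag names case-insensitively so the same
// script works whether the document went through the HTML codepath
// (which upper-cases tag names, e.g. "P") or the XML codepath
// (which preserves the source casing, e.g. "p").
function hasTagName(element, name) {
  return element.tagName.toLowerCase() === name.toLowerCase();
}

// Stand-ins for what the two codepaths would hand a script:
var fromHtmlPath = { tagName: "P" };
var fromXmlPath = { tagName: "p" };
```

With this, `hasTagName(fromHtmlPath, "p")` and `hasTagName(fromXmlPath, "p")` both succeed, which is the behavior Vidur recommends.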

If the XHTML goes through the XML codepath, well-formedness will be enforced (i.e. well-formedness errors will be displayed and further processing of the document will stop). Validity will not be enforced, since we do not currently have a validating XML parser. Again, I think we can live with this, even if it entails enforcing validity in a future release. Validity is not part of our XML story for this release and I don't think we need to make an exception for XHTML.

There is an additional problem. In both cases above (HTML codepath vs XML codepath), for XHTML elements we create instances of the same HTML content classes as those used for vanilla HTML. These classes do not correctly handle attributes with namespaces other than the element's namespace (or no namespace). One place where this may be a problem is for the subset of XLink attributes that we do support. This restricts XHTML authors to using the A element for linking. Again, I believe this is something that we can live with for now and fix in a later release.

--Vidur

Eric Krock

unread,
Aug 17, 2000, 3:00:00 AM8/17/00
to
We need to get closure on this question one way or another next week, so
I've scheduled a meeting Monday 8/21 at 2 p.m. Pacific time to resolve
the Strict DTD question realtime. Interested parties are welcome to
attend. Anyone who is not in Mountain View and wishes to listen and/or
participate should email me the phone number at which we should dial
them in. In the meantime, this thread can continue and perhaps make
progress on getting everyone on the same page.


Karl Ove Hufthammer

unread,
Aug 17, 2000, 3:00:00 AM8/17/00
to
"Vidur Apparao" <vi...@netscape.com> wrote in message
news:399C1ED9...@netscape.com...

At least one of these differences is important. See this (partial) document:

<table>
<tr><td>Table cell 1</td></tr>
<tr><td>Table cell 2</td></tr>
</table>

with this style sheet:

tbody { background: red; }

If the document (using XML declaration and XHTML DOCTYPE and namespace) is
parsed as HTML, the background of the table should be red (the document has
an implicit 'tbody' surrounding the 'tr's, since 'tbody' has optional start
and end tags). If the document is parsed as X(HT)ML, it should *not* have a
red background. And with this style sheet:

table > tr { background: green; }

The document should have green rows *only* if it's being treated as X(HT)ML
('tr' is a legal child of 'table' in XHTML, but not in HTML). These style
rules combined should give a green background in an XHTML-aware browser
and a red background in an HTML-aware browser.

| If the XHTML goes through the XML codepath, well-formedness will be
| enforced (i.e. well-formedness errors will be displayed and further
| processing of the document will stop).

This is, IMO, extremely important. I've already seen a couple of XHTML
documents which weren't well-formed. There might be a lot more of them if
well-formedness isn't enforced. One other thing: consider the following
XHTML document:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" lang="nn" xml:lang="nn">

<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"
/>
</head>

<body>

<p>The letters æ, ø and å are commonly used in Norwegian</p>

</body>
</html>

In a normal HTML browser (i.e. Mozilla with XHTML being treated as HTML),
the letters should be displayed as æ, ø and å. In an X(HT)ML browser, they
should be displayed as something completely different (since, absent an
encoding declaration, the document is read as UTF-8).
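A side note not raised in the thread: an author who wants those iso-8859-1 bytes decoded correctly under XML rules would declare the encoding in the XML declaration, since the meta element is only honored on the HTML codepath. A sketch:

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- With this declaration an XML processor decodes the æ, ø and å bytes
     as iso-8859-1; without it, it falls back to UTF-8 and misreads them. -->
```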

| Validity will not be enforced, since we do not currently have a
| validating XML parser. Again, I think we can live with this, even if
| it entails enforcing validity in a future release. Validity is not
| part of our XML story for this release and I don't think we need to
| make an exception for XHTML.

I agree.

--
Karl Ove Hufthammer

Jonas Sicking

unread,
Aug 20, 2000, 3:00:00 AM8/20/00
to
"Vidur Apparao" <vi...@netscape.com> wrote in message
news:399C1ED9...@netscape.com...
>

This is not Mozilla's fault, and the problem would arise in any XHTML
browser. One problem, however, is that this might make people write their
XHTML tags in uppercase, since that would make both Mozilla and an HTML
parser return uppercase element tagnames. In other words, Mozilla would
encourage people to write bad XHTML.

The only solution to this that I can see is to do some sort of validation,
which I know involves a lot of work. Unfortunately, I think it's better to
put in NO validation rather than SOME validation, because otherwise people
might start using Mozilla as a validator on their XHTML documents ("This
document works in Mozilla so it must be ok, since I've heard that Mozilla
verifies XHTML").

> 2) There is too much logic in the HTMLContentSink (the piece of code
> that creates HTML content based on parser notifications). Some of this
> code appears redundantly in the XMLContentSink, some of it is missing.
> As a result, XHTML elements created through the XML content sink may
> not always act as they should. This unfortunately means that current
> XHTML authors might not be able to use all XHTML elements (see bug
> 36790 as an example of this). Future work should include moving logic
> out of the content sinks into the elements themselves and improving
> factoring of code between the various content sinks.

This is IMHO Mozilla's biggest problem with regard to XHTML support. Is
there any chance that this will be worked on?

> There is an additional problem. In both cases above (HTML codepath vs
> XML codepath), for XHTML elements we create instances of the same HTML
> content classes as those used for vanilla HTML. These classes do not
> correctly handle attributes with namespaces other than the element's
> namespace (or no namespace). One place where this may be a problem is
> for the subset of XLink attributes that we do support. This restricts
> XHTML authors to using the A element for linking. Again, I believe this
> is something that we can live with for now and fix in a later release.

Would this affect namespaced elements (such as MathML) or just namespaced
attributes in XHTML elements?

/ Jonas Sicking

Chris Waterson

unread,
Aug 21, 2000, 3:00:00 AM8/21/00
to
Eric had me take some notes during the meeting: here's a summary. I
think I've covered the salient points. Attendees please chime in if I've
missed or botched anything.

Who
---

dbaron, ianh, ekrock, vidur, nisheeth, harishd, rickg, jar, Henri
Sivonen, Peter Anaema, waterson.

Decisions Made
--------------

- Parse XHTML delivered as text/xml "as is", using the XML content sink
with an XML document. (This is what we do currently.)

- Parse XHTML delivered as text/html using the XML content sink with an
HTML document. (Instead of using the Strict DTD, which we do today.)

- Parse "strict" HTML using NavDTD. (Instead of using the Strict DTD,
which we do today.)

- Remove Strict DTD from the build.

Rationale
---------

With respect to the XHTML decisions, there were several factors:

- Strict DTD does not allow inclusion of tags from arbitrary XML namespaces.

- Strict DTD does "fixup" on invalid XHTML, which is wrong.

- Hard-coded doctype validation will potentially cause problems going
forward as the XHTML spec evolves.

- The XML content sink, although it does not enforce validity, can be
extended to enforce validity at a later point in time.


With respect to the strict HTML decision, the rationale was primarily
resource-driven:

- Strict DTD was to be enabled based on a developer pref, and therefore
was only to be visible if a developer asked for it.

- Strict DTD does not directly provide end-user value, but adds code
that requires support and debugging.

- Although it possibly could do error detection and reporting, it does
not do so at this time, so its value as a developer tool is limited.

- Since it is hard-coded, there will be skew with respect to the latest
standard and the validation that the Strict DTD performs.


Issues
------

- There are currently some potential issues with using the XML content
sink to generate an HTML document:

. <html:link> may not work, requiring use of an XML
processing instruction to include stylesheets

. inline style attributes will only work on HTML
elements, not XML elements

. <script src="..."> may have problems.

- There may need to be some rework to ensure that "text/html" creates an
XML content sink when delivered an XHTML document

- There may be issues using an HTML document with the XML content sink;
e.g., namespace support may be botched.

- There may be problems handling comments using NavDTD with strict HTML.

- There are some edge cases where NavDTD will not properly handle
correct strict HTML (e.g., <table> closing <p>). May be possible to deal
with this using a flag.
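For reference, the stylesheet processing instruction alluded to in the first issue would look roughly like this (the href and title are placeholders of mine):

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="style.css"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Styled via PI</title></head>
<body><p>The xml-stylesheet PI above stands in for an html:link element.</p></body>
</html>
```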


Jonas Sicking

unread,
Aug 22, 2000, 3:00:00 AM8/22/00
to
"Chris Waterson" <wate...@netscape.com> wrote in message
news:39A1D088...@netscape.com...

> Decisions Made
> --------------
>
> - Parse XHTML delivered as text/xml "as is", using the XML content sink
> with an XML document. (This is what we do currently.)
>
> - Parse XHTML delivered as text/html using the XML content sink with an
> HTML document. (Instead of using the Strict DTD, which we do today.)

What would be the difference between an XHTML document being an HTML
document and being an XML document? What is the reason for treating XHTML
differently when served as text/xml versus text/html?

/ Jonas Sicking
