HTML parsing quirks required for Web compat


Henri Sivonen

Apr 1, 2009, 8:34:15 AM
What quirks mode parsing differences (if any) are considered 'must have'?

As far as I can tell, Gecko has only three HTML parsing behaviors that
differ between the quirks mode and the standards mode:
1) </br>
2) <!-->
3) <p><table>

Here are the behaviors in the top 4 implementations and HTML5 as drafted:

</br>:
Gecko quirks: break
WebKit quirks: break
Opera quirks: break
IE8 quirks: break
Gecko standards: no break
WebKit standards: no break
Opera standards: break
IE8 standards: break
HTML5: break

Would it be safe to make </br> always cause a line break? My guess is
yes.

<!-->
Gecko quirks: Complete comment.
WebKit quirks: Complete comment.
Opera quirks: Complete comment.
IE8 quirks: Complete comment.
Gecko standards: Start comment.
WebKit standards: Start comment.
Opera standards: Start comment.
IE8 standards: Complete comment.
HTML5: Complete comment.

Would it be safe to make <!--> always be a complete zero-length comment?
My guess is yes.
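
To make the two readings of <!--> concrete, here is a rough Python sketch of a comment scanner. The function name scan_comment and the short_form_is_complete flag are made up for illustration; this is not Gecko code. It models "Start comment" simply as "scan ahead to the next -->", which is a simplification of what the real tokenizers do.

```python
from typing import Optional, Tuple


def scan_comment(src: str, start: int,
                 short_form_is_complete: bool) -> Optional[Tuple[str, int]]:
    """Scan a comment that opens with '<!--' at index `start`.

    Returns (comment_data, index_just_past_the_comment), or None if the
    comment is never terminated. Hypothetical helper, for illustration only.
    """
    assert src.startswith("<!--", start)
    body = start + 4
    # "Complete comment" reading: '<!-->' is an empty comment on its own.
    if short_form_is_complete and src.startswith(">", body):
        return "", body + 1
    # "Start comment" reading (simplified): run to the next '-->'.
    end = src.find("-->", body)
    if end == -1:
        return None
    return src[body:end], end + 3


doc = "<!-->visible text<!-- real comment -->"
complete = scan_comment(doc, 0, short_form_is_complete=True)
started = scan_comment(doc, 0, short_form_is_complete=False)
print(complete)
print(started)
```

With the "complete" reading the scanner stops right after <!--> and "visible text" stays in the page. With the "start" reading the same text is swallowed into the comment, up to the --> of the next comment.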

<p><table>
Gecko quirks: table child of p
WebKit quirks: table child of p
Opera quirks: table child of p
IE8 quirks: table child of p
Gecko standards: Implies </p>
WebKit standards: Implies </p>
Opera standards: Implies </p>
IE8 standards: Implies </p>
HTML5: Implies </p>

Would it be safe to make <table> always close the current <p>? My guess
is no. Is there evidence either way?
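
For anyone who wants the difference spelled out, here is a toy Python model of the two tree-building behaviors. The build function and its quirks flag are hypothetical names for this sketch. It only handles a flat run of start tags, and it checks for any open p rather than doing the scope check a real tree builder would do.

```python
from typing import List, Tuple


def build(tags: List[str], quirks: bool) -> List[Tuple[str, str]]:
    """Return (parent, child) pairs for a flat sequence of start tags.

    Toy model: every start tag opens an element, nothing is ever closed
    explicitly, and the only special case is <table> while a <p> is open.
    """
    stack = ["body"]
    edges = []
    for tag in tags:
        if tag == "table" and not quirks and "p" in stack:
            # Standards behavior: <table> implies </p>, so pop the
            # stack up to and including the open <p>.
            while stack.pop() != "p":
                pass
        edges.append((stack[-1], tag))
        stack.append(tag)
    return edges


print(build(["p", "table"], quirks=True))
print(build(["p", "table"], quirks=False))
```

In the quirks case the table ends up as a child of the p. In the standards case the p is closed first and the table becomes its sibling.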

I'm annoyed that the HTML4 spec-wise correct implied </p> behavior was
pushed to the standards mode of each browser through Acid2. It looks a
lot like that's going to be the single quirks vs. standards difference
in HTML5 parsing. :-(

Did I miss any Gecko parser-level quirks?

--
Henri Sivonen
hsiv...@iki.fi
http://hsivonen.iki.fi/

Boris Zbarsky

Apr 1, 2009, 10:01:02 AM
Henri Sivonen wrote:
> Would it be safe to make </br> always cause a line break? My guess is
> yes.

Probably yes, though this is getting pretty close to my "gah, we have to
support that?" threshold... ;)

> Would it be safe to make <!--> always be a complete zero-length comment?
> My guess is yes.

Probably, but sites might still break depending on what they do with
conditional comments.... Worth testing carefully.

> Would it be safe to make <table> always close the current <p>? My guess
> is no. Is there evidence either way?

My guess is also no. You could spin up a try-server build with this
changed in the current parser and ask people to test, maybe?

> I'm annoyed that the HTML4 spec-wise correct implied </p> behavior was
> pushed to the standards mode of each browser through Acid2. It looks a
> lot like that's going to be the single quirks vs. standards difference
> in HTML5 parsing. :-(
>
> Did I miss any Gecko parser-level quirks?

We seem to have a quirk for </p> at least. See
http://hg.mozilla.org/mozilla-central/annotate/0f030665e3ba/parser/htmlparser/src/CNavDTD.cpp#l1720

There's the comment quirk parsing in general, but I assume you're
removing that.

That seems to be it for parser quirks, yes. I also see none in the
content sink.

-Boris

Blake Kaplan

Apr 1, 2009, 1:26:34 PM
Henri Sivonen <hsiv...@iki.fi> wrote:
> Would it be safe to make </br> always cause a line break? My guess is
> yes.

That'd be my guess as well.

> Would it be safe to make <!--> always be a complete zero-length comment?
> My guess is yes.

Yeah, Gecko's standards mode comment parsing does nothing but confuse authors,
afaict.

> Would it be safe to make <table> always close the current <p>? My guess
> is no. Is there evidence either way?

It'd be good for Hixie to weigh in here.

> Did I miss any Gecko parser-level quirks?

In Gecko, the document
<script><!-- document.write("</script>") // --></script>
in *quirks* mode results in
script
<!-- document.write("</script>") // -->

In *standards* mode results in

script
<!-- document.write("
// -->

I think that's it though.
--
Blake Kaplan

Jonas Sicking

Apr 1, 2009, 5:24:54 PM
Boris Zbarsky wrote:
>> Would it be safe to make <table> always close the current <p>? My
>> guess is no. Is there evidence either way?
>
> My guess is also no. You could spin up a try-server build with this
> changed in the current parser and ask people to test, maybe?

I would rather ask, would it be safe to make <table> always be a child
of the <p>. My guess would be 'yes'.

/ Jonas

Boris Zbarsky

Apr 1, 2009, 5:45:42 PM
Jonas Sicking wrote:
> I would rather ask, would it be safe to make <table> always be a child
> of the <p>. My guess would be 'yes'.

Make us fail ACID2, though. ;)

-Boris

Jonas Sicking

Apr 1, 2009, 6:19:43 PM

If we change HTML5, then indeed we'd need to get ACID2 fixed as well.

/ Jonas

Henri Sivonen

Apr 2, 2009, 5:31:17 AM
In article <RoydndQeMMtXPk7U...@mozilla.org>,
Blake Kaplan <mrb...@gmail.com> wrote:

> In Gecko, the document
> <script><!-- document.write("</script>") // --></script>
> in *quirks* mode results in
> script
> <!-- document.write("</script>") // -->
>
> In *standards* mode results in
>
> script
> <!-- document.write("
> // -->
>
> I think that's it though.

In article <m8SdnXqGedID7k7U...@mozilla.org>,
Boris Zbarsky <bzba...@mit.edu> wrote:

> We seem to have a quirk for </p> at least. See
> http://hg.mozilla.org/mozilla-central/annotate/0f030665e3ba/parser/htmlparser/src/CNavDTD.cpp#l1720

HTML5 specifies the Gecko quirks behavior in both of the above cases. I
assume it is safe to adopt it for the standards mode as well, because
IE8, WebKit and Opera already do in their standards modes what Gecko
only does in the quirks mode.

> There's the comment quirk parsing in general, but I assume you're
> removing that.

Is it more elaborate than <!-->?

In article <DeSdnZGWPtXfQU7U...@mozilla.org>,
Jonas Sicking <jo...@sicking.cc> wrote:

> Boris Zbarsky wrote:
> >> Would it be safe to make <table> always close the current <p>? My
> >> guess is no. Is there evidence either way?
> >
> > My guess is also no. You could spin up a try-server build with this
> > changed in the current parser and ask people to test, maybe?
>
> I would rather ask, would it be safe to make <table> always be a child
> of the <p>. My guess would be 'yes'.

Mine too.

On the other hand, Philip Taylor put together a list of dmoz pages where
closing the paragraph implicitly makes a difference (in terms of the
parser stack). I tried 10 sites on the list, and I couldn't see any
visual differences between the old Gecko parser and the HTML5 parser.
All the pages I tried had designs that were in fashion circa 1995-1999.

The list is at
http://philip.html5.org/data/table-implicitly-closed-p.txt

Boris Zbarsky

Apr 2, 2009, 11:10:59 AM
Henri Sivonen wrote:
>> There's the comment quirk parsing in general, but I assume you're
>> removing that.
>
> Is it more elaborate than <!-->?

Yes. In quirks mode <!-- -- --> is a terminated comment, while in
standards mode it's not.
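
A rough Python sketch of why the two modes can disagree on that input. It assumes the standards-mode behavior follows the SGML rule that "--" delimiters come in pairs inside a <! ... > declaration; I have not checked that against the parser source, and both function names are made up for the sketch.

```python
from typing import Optional


def simple_comment_end(src: str) -> Optional[int]:
    """Quirks-style reading: the comment ends at the first '-->'."""
    assert src.startswith("<!--")
    end = src.find("-->", 4)
    return None if end == -1 else end + 3


def sgml_comment_end(src: str) -> Optional[int]:
    """SGML-style reading (assumed): '--' pairs, then '>' outside a pair.

    Returns the index just past the closing '>', or None if the
    declaration is never terminated under these rules.
    """
    assert src.startswith("<!")
    i = 2
    while i < len(src):
        if src.startswith("--", i):
            # Opening '--': skip to the matching closing '--'.
            close = src.find("--", i + 2)
            if close == -1:
                return None
            i = close + 2
        elif src[i].isspace():
            i += 1
        elif src[i] == ">":
            return i + 1
        else:
            return None  # stray character between comments
    return None


sample = "<!-- -- -->"
print(simple_comment_end(sample))
print(sgml_comment_end(sample))
```

Under the pair rule, the third "--" in the sample opens a second comment that never closes, so the trailing ">" is comment content and the declaration stays open. The simple reading just sees "-->" and stops there.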

-Boris

Boris Zbarsky

Apr 2, 2009, 11:42:18 AM
Henri Sivonen wrote:
>> There's the comment quirk parsing in general, but I assume you're
>> removing that.
>
> Is it more elaborate than <!-->?

Yes. <!-- -- --> is a terminated comment in quirks but not standards
mode at the moment. But the plan is to switch to that behavior in both
modes, right?

-Boris

Jonas Sicking

Apr 2, 2009, 4:16:35 PM
> In article <DeSdnZGWPtXfQU7U...@mozilla.org>,
> Jonas Sicking <jo...@sicking.cc> wrote:
>
>> Boris Zbarsky wrote:
>>>> Would it be safe to make <table> always close the current <p>? My
>>>> guess is no. Is there evidence either way?
>>> My guess is also no. You could spin up a try-server build with this
>>> changed in the current parser and ask people to test, maybe?
>> I would rather ask, would it be safe to make <table> always be a child
>> of the <p>. My guess would be 'yes'.
>
> Mine too.
>
> On the other hand, Philip Taylor put together a list of dmoz pages where
> closing the paragraph implicitly makes a difference (in terms of the
> parser stack). I tried 10 sites on the list, and I couldn't see any
> visual differences between the old Gecko parser and the HTML5 parser.
> All the pages I tried had designs that were in fashion circa 1995-1999.

I'm all for trying to clean up HTML parsing if data indicates that it
seems safe. We're going to have to do some extensive testing no matter what.

/ Jonas

Henri Sivonen

Apr 3, 2009, 5:43:44 AM
In article <aYedncMqoOFHQUnU...@mozilla.org>,
Boris Zbarsky <bzba...@mit.edu> wrote:

Yes, that's the plan.
