Intent to ship: Honoring bogo-XML declaration for character encoding in text/html

Henri Sivonen

Mar 24, 2021, 4:34:04 AM
to dev-platform
This has now landed and is expected to ride the trains in Firefox 89.

For added historical context:

Before HTML parsing was specified, Gecko and Presto implemented this
in addition to WebKit. At the time, the specification process paid
too much attention to IE behavior as a presumed indicator of Web
compatibility instead of looking at engine quorum.

Like WebKit, Presto kept this behavior when implementing
HTML5-compliant tokenization and tree building. That is, I was the
only browser implementor fooled into removing this behavior as part of
re-implementing parsing from the spec--not just the tokenization and
tree building layers but also the input stream layer.

What can we learn? Instead of trusting the spec and trusting other
implementors to loudly object to the parts of the spec they don't
intend to follow, proactively check what the others are doing and
adjust sooner.

On Wed, Mar 10, 2021 at 5:56 PM Henri Sivonen <hsiv...@mozilla.com> wrote:
>
> # Summary
>
> For compatibility with WebKit and Blink, honor the character encoding
> declared using the XML declaration syntax in text/html.
>
> For reasons explained in https://hsivonen.fi/utf-8-detection/ , unlike
> other encodings, UTF-8 isn't detected from content, so with the demise
> of Trident and EdgeHTML (which don't honor the XML declaration syntax
> in text/html), <?xml version="1.0" encoding="UTF-8"?> has become a
> more notable Web compat problem for us. With non-Latin scripts, the
> failure mode is particularly bad for a Web compat problem: The text is
> completely unreadable.
>
> That is, this isn't a feature for Web authors to use. This is to
> address a push factor for users when authors do use this feature.
>
> # Bug
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=673087
>
> # Standard
>
> https://github.com/whatwg/html/pull/1752
>
> # Platform coverage
>
> All
>
> # Preference
>
> To be enabled unconditionally.
>
> # DevTools bug
>
> No integration needed.
>
> # Other browsers
>
> WebKit has had this behavior for a very long time and didn't remove it
> when HTML parsing was standardized.
>
> Blink inherited this from WebKit upon forking.
>
> Trident and EdgeHTML don't have this; their demise changed the balance
> for this feature.
>
> # web-platform-tests
>
> https://hsivonen.com/test/moz/xml-decl/ contains tests which are
> wrapped for WPT as part of the Gecko patch.
>
> --
> Henri Sivonen
> hsiv...@mozilla.com
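
For readers who want a concrete picture of the behavior described in the
summary above, here is a minimal Python sketch. It is not the Gecko
patch and not the algorithm from the linked pull request. The function
names, the 1024-byte limit, the regular expression and the windows-1252
fallback are all assumptions made for illustration. It only shows the
general idea: if the byte stream starts with something that looks like
an XML declaration, take the encoding label from it; otherwise fall
back, which is where undeclared UTF-8 turns into unreadable text.

```python
import re
from typing import Optional

# Illustrative pattern (an assumption, not taken from any spec or engine):
# "<?xml" at the very start, then an encoding="label" or encoding='label'
# pseudo-attribute somewhere before the first ">".
_XML_DECL = re.compile(
    rb"""\A<\?xml[^>]*?\sencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._\-]*)\1[^>]*?>"""
)


def sniff_xml_decl_encoding(head: bytes) -> Optional[str]:
    """Return the lowercased label from a leading XML declaration, or None."""
    # Only the first 1024 bytes are examined; the limit is an assumption.
    m = _XML_DECL.match(head[:1024])
    if m is None:
        return None
    return m.group(2).decode("ascii").lower()


def decode_html(data: bytes, fallback: str = "windows-1252") -> str:
    """Decode bytes using the declared label if present, else the fallback."""
    label = sniff_xml_decl_encoding(data) or fallback
    try:
        return data.decode(label, errors="replace")
    except LookupError:
        # The label is not a codec Python knows; use the fallback instead.
        return data.decode(fallback, errors="replace")


if __name__ == "__main__":
    body = "<p>" + "日本語" + "</p>"
    declared = b'<?xml version="1.0" encoding="UTF-8"?>' + body.encode("utf-8")
    undeclared = body.encode("utf-8")
    print(decode_html(declared))    # declaration honored
    print(decode_html(undeclared))  # fallback applied, text is garbled
```

A real implementation also has to decide how this interacts with BOMs,
HTTP-level charset, and <meta charset>; the sketch ignores all of that.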



--
Henri Sivonen
hsiv...@mozilla.com