Thanks to Arthur Sonzogni for the tests, originally from
https://crrev.com/c/7706938@arthurs...@chromium.org, I'm noticing that without this change I'm only seeing two of the tests fail: `adoption_agency_shift.html` with a `DCHECK failed: Contains(old_element)` and `adoption_agency_uaf.html` with a SEGV. I'm curious if some of the other tests were failing for you in a different configuration?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
The code looks good, and I agree it'll fix the bug. Just some questions.
+securitymb@ for the mXSS question
// Disallow custom elements from being on the stack of active formatting
// elements.You might consider also saying that this isn't per-spec, but is a corner case we're willing to live with? Or something to that effect?
if (token->GetAttributeItem(html_names::kIsAttr) &&Per my offline comment, my concern is that this might (?) introduce mXSS concerns? I.e. the behavior within trees of nested formatting elements will be different, depending on whether the formatting element has `is=something`. So if a sanitizer is stripping out `is` attributes, and then serializing the resulting tree, the resulting HTML will be parsed into a different tree because of the lack of `is`.
@securitymb for comments.
Perhaps just `-crash.html` rather than `_uaf.html`? This test doesn't actually make real use of `testharness`. (Also replace "use after free" with "crash" in the content of the test?)
This comment applies to a few tests above. Perhaps all of them.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Removing myself from reviewers as I don't actually have permission to vote.
if (token->GetAttributeItem(html_names::kIsAttr) &&Per my offline comment, my concern is that this might (?) introduce mXSS concerns? I.e. the behavior within trees of nested formatting elements will be different, depending on whether the formatting element has `is=something`. So if a sanitizer is stripping out `is` attributes, and then serializing the resulting tree, the resulting HTML will be parsed into a different tree because of the lack of `is`.
@securitymb for comments.
I think this should be fine. The list of active formatting elements is meant to handle mis-nested tags. So the sanitizer should create a valid HTML structure (i.e. no mis-nested tags) and on re-parsing this algorithm shouldn't kick in.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
// Disallow custom elements from being on the stack of active formatting
// elements.You might consider also saying that this isn't per-spec, but is a corner case we're willing to live with? Or something to that effect?
Better, I filed https://github.com/whatwg/html/issues/12327 and I'll link to it.
Perhaps just `-crash.html` rather than `_uaf.html`? This test doesn't actually make real use of `testharness`. (Also replace "use after free" with "crash" in the content of the test?)
This comment applies to a few tests above. Perhaps all of them.
Moved to `crashtests` subdirectory, renamed, changed them to use crashtest infrastructure (`class="test-wait"` and `TestRendered` event).
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
// Disallow custom elements from being on the stack of active formatting
// elements.David BaronYou might consider also saying that this isn't per-spec, but is a corner case we're willing to live with? Or something to that effect?
Better, I filed https://github.com/whatwg/html/issues/12327 and I'll link to it.
even better
if (token->GetAttributeItem(html_names::kIsAttr) &&Michał BentkowskiPer my offline comment, my concern is that this might (?) introduce mXSS concerns? I.e. the behavior within trees of nested formatting elements will be different, depending on whether the formatting element has `is=something`. So if a sanitizer is stripping out `is` attributes, and then serializing the resulting tree, the resulting HTML will be parsed into a different tree because of the lack of `is`.
@securitymb for comments.
I think this should be fine. The list of active formatting elements is meant to handle mis-nested tags. So the sanitizer should create a valid HTML structure (i.e. no mis-nested tags) and on re-parsing this algorithm shouldn't kick in.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Thanks to Arthur Sonzogni for the tests, originally from
https://crrev.com/c/7706938@arthurs...@chromium.org, I'm noticing that without this change I'm only seeing two of the tests fail: `adoption_agency_shift.html` with a `DCHECK failed: Contains(old_element)` and `adoption_agency_uaf.html` with a SEGV. I'm curious if some of the other tests were failing for you in a different configuration?
I'm going to go ahead and submit, but I'd still be interested in further feedback from @arthurs...@chromium.org if he has a chance.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Exportable changes to web-platform-tests were detected in this CL and a pull request in the upstream repo has been made: https://github.com/web-platform-tests/wpt/pull/58949.
When this CL lands, the bot will automatically merge the PR on GitHub if the required GitHub checks pass; otherwise, ecosystem-infra@ team will triage the failures and may contact you.
WPT Export docs:
https://chromium.googlesource.com/chromium/src/+/main/docs/testing/web_platform_tests.md#Automatic-export-process
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Prevent custom elements from being considered formatting elements in HTML parser.
The HTML parser (in spec and implementations) has algorithms for
backwards compatibility with pre-tree implementations of HTML. One of
these deals with the concept of "active formatting elements", which are
elements which we attempt to faithfully represent mismatched end tags
for formatting purposes. This gives behavior that is compatible with
HTML implementations that predate in-memory tree representations of
the DOM.
This change forbids elements with an is= attribute from being considered
formatting elements, and thus being reopened as part of this
compatibility behavior.
This avoids needing to fix the algorithm to "reconstruct the active
formatting elements" in both the spec at
https://html.spec.whatwg.org/multipage/parsing.html#reconstruct-the-active-formatting-elements
and in the code, to deal with the possibility that custom element
constructors will run in the middle of the algorithm and mess with the
algorithm's internal state.
Thanks to Arthur Sonzogni for the tests, originally from
https://crrev.com/c/7706938 (with some small tweaks by me).
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
The WPT PR for this CL has been merged upstream! https://github.com/web-platform-tests/wpt/pull/58949
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |