I think it's a good idea to conform to the spec here, so thanks for putting this patch together. Two things:
bookmark =
tree_.ActiveFormattingElements()->BookmarkFor(formatting_element);Why is this `bookmark` reassignment line here? I think it's likely a bug.
// <aside> should contain a new <b> element, because <sdide> is not anit: `aside`
How come this one is being deleted?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
I think it's a good idea to conform to the spec here, so thanks for putting this patch together. Two things:
- Since it's a behavior change, it should definitely be gated behind a runtime enabled feature flag (default "stable" is ok). Just in case.
- I'd like a security person to take a look too, since changes in this area can expose MXSS or sanitizer holes. I added a few folks.
Acknowledged
Thanks for the review.
I addressed some of the suggested changes and replied inline to the questions.
bookmark =
tree_.ActiveFormattingElements()->BookmarkFor(formatting_element);Why is this `bookmark` reassignment line here? I think it's likely a bug.
Given that we have removed an element from the ActiveFormattingElements list, the bookmark contains an index that is no longer valid.
// <aside> should contain a new <b> element, because <sdide> is not aJavier Fernandeznit: `aside`
Done
How come this one is being deleted?
This test has been added in the CL 1389333005 [1] as a regression tests for the crbug.com/40356377. The bug was about a change in how the styles are being added to the DOM elements. However, there was a case where the some elements in the HTML tree where re-parented, precisely as part of the Agency Adoption algorithm.
After this change, the test case defined in the test doesn't trigger such re-parenting process. I tried to find another case which produced similar behavior, but I couldn't find it.
I don't think that test cover an actual use case, but some unexpected behavior due to the Agency Algorithm's behavior implemented back then. However, I admit my thesis isn't solid enough; the fact that I couldn't find a case where a similar reparenting happens doesn't mean it doesn't exists. Hence, I appreciate any feedback on this issue.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
bookmark =
tree_.ActiveFormattingElements()->BookmarkFor(formatting_element);Javier FernandezWhy is this `bookmark` reassignment line here? I think it's likely a bug.
Given that we have removed an element from the ActiveFormattingElements list, the bookmark contains an index that is no longer valid.
During the first iteration of the inner loop, if `last_node == furthest_block`, the algorithm correctly moves the bookmark (Step 4.13.7: `bookmark.MoveToAfter(node_entry);`). If a subsequent iteration triggers the `inner_loop_counter > 3` condition, resetting the bookmark to `BookmarkFor(formatting_element)` overwrites the MoveToAfter adjustment made in the earlier iteration. This will cause the adopted elements to be inserted in the wrong position in the formatting elements list.
Since `Remove()` invalidates the existing bookmark index, we might need to handle that invalidation gracefully somehow, rather than resetting the bookmark to the original formatting_element.
Test case:
`<b><i><s><u><em><div></b><p>X</p>`
Javier FernandezHow come this one is being deleted?
This test has been added in the CL 1389333005 [1] as a regression tests for the crbug.com/40356377. The bug was about a change in how the styles are being added to the DOM elements. However, there was a case where the some elements in the HTML tree where re-parented, precisely as part of the Agency Adoption algorithm.
After this change, the test case defined in the test doesn't trigger such re-parenting process. I tried to find another case which produced similar behavior, but I couldn't find it.
I don't think that test cover an actual use case, but some unexpected behavior due to the Agency Algorithm's behavior implemented back then. However, I admit my thesis isn't solid enough; the fact that I couldn't find a case where a similar reparenting happens doesn't mean it doesn't exists. Hence, I appreciate any feedback on this issue.
Thanks for the archeology. I'm ok deleting it, given what you found.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
Javier FernandezI think it's a good idea to conform to the spec here, so thanks for putting this patch together. Two things:
- Since it's a behavior change, it should definitely be gated behind a runtime enabled feature flag (default "stable" is ok). Just in case.
- I'd like a security person to take a look too, since changes in this area can expose MXSS or sanitizer holes. I added a few folks.
Acknowledged
I'd like a security person to take a look too, since changes in this area can expose MXSS or sanitizer holes. I added a few folks.
Thanks. I don't see an additional risk here.
Sanitizer: Sanitizer deliberately runs _after_ the full parser - and thus after any re-parenting - and thus shouldn't be affected. So I don't think Sanitizer would be affected. (The only Sanitizer construct of relevance here is replace-with-children, which would see whatever final tree the HTML parser produces.)
mXSS: I think the whole re-parenting is indeed mXSS-sensitive, but I don't think this opens any opportunities that weren't there before. At the worst, people might have to adopt their POCs a little... :-D
----
If we want to be fancy, we could include an additional test in third_party/blink/web_tests/external/wpt/sanitizer-api/sethtml-tree-construction.sub.dat that ensures Sanitizer only looks at it after the full parse.
E.g. something like this, adopted from run-adoption01-data.
```
#data
<div><b><em><foo><foob><fooc><aside></b></em></div>
#config
{ "removeElements": ["em"] }
#document
| <div>
| <b>
| <aside>
| <b>
```
lgtm (from Sanitizer perspective; please still wait for other reviewers)
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Hi @masonf
The last submitted patch should have fixed the issue you mentioned in the last review. Could you please take a look ?
Thanks.
bookmark =
tree_.ActiveFormattingElements()->BookmarkFor(formatting_element);Javier FernandezWhy is this `bookmark` reassignment line here? I think it's likely a bug.
Mason FreedGiven that we have removed an element from the ActiveFormattingElements list, the bookmark contains an index that is no longer valid.
During the first iteration of the inner loop, if `last_node == furthest_block`, the algorithm correctly moves the bookmark (Step 4.13.7: `bookmark.MoveToAfter(node_entry);`). If a subsequent iteration triggers the `inner_loop_counter > 3` condition, resetting the bookmark to `BookmarkFor(formatting_element)` overwrites the MoveToAfter adjustment made in the earlier iteration. This will cause the adopted elements to be inserted in the wrong position in the formatting elements list.
Since `Remove()` invalidates the existing bookmark index, we might need to handle that invalidation gracefully somehow, rather than resetting the bookmark to the original formatting_element.
Test case:
`<b><i><s><u><em><div></b><p>X</p>`
Yeah, the way I overrode the bookmark was indeed buggy. I defined a new unit test with that case.
The last patch fixed the bug and I think it regenerates the bookmark correctly after removing the element form the formatting list.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
LGTM, while I agree this code is in a potentially hot mXSS area, I don't think it gives any new capabilities that weren't there before.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
LGTM, while I agree this code is in a potentially hot mXSS area, I don't think it gives any new capabilities that weren't there before.
Also I moved myself to CC, since I'm not actually permitted to vote.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Thanks! I think that looks like the right logic.
Given the potential for bugs in this area of the code, can you please add a feature flag that guards the new behavior? So that if there are issues, we can roll it back easily? Be sure to add a comment about what the flag does, and a note that it will land in M147 and can be removed in M149.
Element* bookmark_element = bookmark.Mark()->GetElement();Can you add this, just above this line?
```
DCHECK(bookmark.Mark());
```
node_in_active_formatting_elements = false;Can you add a comment above this line:
```
// Set to false so the subsequent step (4.13.5) removes the node from
// the stack of open elements and continues the inner loop.
```
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Thanks! I think that looks like the right logic.
Given the potential for bugs in this area of the code, can you please add a feature flag that guards the new behavior? So that if there are issues, we can roll it back easily? Be sure to add a comment about what the flag does, and a note that it will land in M147 and can be removed in M149.
Done
Element* bookmark_element = bookmark.Mark()->GetElement();Can you add this, just above this line?
```
DCHECK(bookmark.Mark());
```
Done
bookmark =
tree_.ActiveFormattingElements()->BookmarkFor(formatting_element);Javier FernandezWhy is this `bookmark` reassignment line here? I think it's likely a bug.
Mason FreedGiven that we have removed an element from the ActiveFormattingElements list, the bookmark contains an index that is no longer valid.
Javier FernandezDuring the first iteration of the inner loop, if `last_node == furthest_block`, the algorithm correctly moves the bookmark (Step 4.13.7: `bookmark.MoveToAfter(node_entry);`). If a subsequent iteration triggers the `inner_loop_counter > 3` condition, resetting the bookmark to `BookmarkFor(formatting_element)` overwrites the MoveToAfter adjustment made in the earlier iteration. This will cause the adopted elements to be inserted in the wrong position in the formatting elements list.
Since `Remove()` invalidates the existing bookmark index, we might need to handle that invalidation gracefully somehow, rather than resetting the bookmark to the original formatting_element.
Test case:
`<b><i><s><u><em><div></b><p>X</p>`
Yeah, the way I overrode the bookmark was indeed buggy. I defined a new unit test with that case.
The last patch fixed the bug and I think it regenerates the bookmark correctly after removing the element form the formatting list.
Done
Can you add a comment above this line:
```
// Set to false so the subsequent step (4.13.5) removes the node from
// the stack of open elements and continues the inner loop.
```
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
if (node == formatting_element_item) {
break;
}Thanks for adding the flag, but this branch still changes position relative to the old code.
Also, and more critically, I think the new logic (with the flag disabled) just doesn't check `inner_loop_counter` vs. `kInnerIterationLimit` at all. I think that's a potential infinite loop.
It might be a good idea to just add a big if/else block for the runtime feature, with exactly the old code in the `else` block. That way it's easy to see from inspection that the flag-disabled behavior is exactly the same.
// formatting list when the internal loop reached the maximum number of iterations.nit: line wrap this please.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Thanks @masonf for the review.
I applied the suggested changes, but left some comments inline as well.
if (node == formatting_element_item) {
break;
}Thanks for adding the flag, but this branch still changes position relative to the old code.
Also, and more critically, I think the new logic (with the flag disabled) just doesn't check `inner_loop_counter` vs. `kInnerIterationLimit` at all. I think that's a potential infinite loop.
It might be a good idea to just add a big if/else block for the runtime feature, with exactly the old code in the `else` block. That way it's easy to see from inspection that the flag-disabled behavior is exactly the same.
I agree a big else with the original code is safer, so I will do that. However, I'd like to reply to some of your comments on the new code, just in case there are still issues than need more careful review.
The inner_loop_counter is checked against the kInnerIterationLimit before the flag-protected code, so it's indeed checked independently of the flag's value. However, the codepath enabled by this check doesn't affect the inner-loop at all; it just decide whether to remove the node form the formatting list.
The inner_loop is only broken when "node" is the "formatting_element_item". I think this condition will always be reached, because node it's initialized to the furthers_block in the Open Elements list and the loop just traverses the list until finding the formatting_element.
However, this infinite-loop issue was the main concerns I had when I changed the "for" loop by a "while(true)" one. I even wondered if one of the the reason this step defined in the spec was not implemented was precisely to force a "for" loop.
// formatting list when the internal loop reached the maximum number of iterations.nit: line wrap this please.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
while (true) {I think just this inner `while()` loop needs to be guarded by the flag, right? Isn't all of the stuff above this shared? And below the loop also?
if (node == formatting_element_item) {
break;
}Javier FernandezThanks for adding the flag, but this branch still changes position relative to the old code.
Also, and more critically, I think the new logic (with the flag disabled) just doesn't check `inner_loop_counter` vs. `kInnerIterationLimit` at all. I think that's a potential infinite loop.
It might be a good idea to just add a big if/else block for the runtime feature, with exactly the old code in the `else` block. That way it's easy to see from inspection that the flag-disabled behavior is exactly the same.
I agree a big else with the original code is safer, so I will do that. However, I'd like to reply to some of your comments on the new code, just in case there are still issues than need more careful review.
The inner_loop_counter is checked against the kInnerIterationLimit before the flag-protected code, so it's indeed checked independently of the flag's value. However, the codepath enabled by this check doesn't affect the inner-loop at all; it just decide whether to remove the node form the formatting list.
The inner_loop is only broken when "node" is the "formatting_element_item". I think this condition will always be reached, because node it's initialized to the furthers_block in the Open Elements list and the loop just traverses the list until finding the formatting_element.
However, this infinite-loop issue was the main concerns I had when I changed the "for" loop by a "while(true)" one. I even wondered if one of the the reason this step defined in the spec was not implemented was precisely to force a "for" loop.
Yeah, my concern is that the "before" code was `for j=0;j< kInnerIterationLimit;++j)` which will always exit. The new code is a `while(true)` that will never break as a result of hitting the iteration limit. I added a request for a comment that should make this more clear.
if (node == formatting_element_item) {Please add this comment here:
```
// This loop is guaranteed to terminate because formatting_element_item
// is always present higher up in the stack of open elements.
```
// BookmarkFor creates a fresh bookmark so we need to restore its
// original state.nit: perhaps
```
// BookmarkFor creates a 'before' bookmark; if the original was moved, we
// must explicitly convert it back to an 'after' bookmark relative to its
// current mark.
```
status: "experimental",This should be "stable" I think. The flag is just in case there are compat problems from this change.
Also, sorry, but M147 just branched, so this should be updated to M148 and M150, respectively.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
I think it's a good idea to conform to the spec here, so thanks for putting this patch together. Two things:
- Since it's a behavior change, it should definitely be gated behind a runtime enabled feature flag (default "stable" is ok). Just in case.
- I'd like a security person to take a look too, since changes in this area can expose MXSS or sanitizer holes. I added a few folks.
Daniel VogelheimAcknowledged
I'd like a security person to take a look too, since changes in this area can expose MXSS or sanitizer holes. I added a few folks.
Thanks. I don't see an additional risk here.
Sanitizer: Sanitizer deliberately runs _after_ the full parser - and thus after any re-parenting - and thus shouldn't be affected. So I don't think Sanitizer would be affected. (The only Sanitizer construct of relevance here is replace-with-children, which would see whatever final tree the HTML parser produces.)
mXSS: I think the whole re-parenting is indeed mXSS-sensitive, but I don't think this opens any opportunities that weren't there before. At the worst, people might have to adopt their POCs a little... :-D
----
If we want to be fancy, we could include an additional test in third_party/blink/web_tests/external/wpt/sanitizer-api/sethtml-tree-construction.sub.dat that ensures Sanitizer only looks at it after the full parse.
E.g. something like this, adopted from run-adoption01-data.
- If Sanitizer runs before adoption agency, then result should be <div><b> only.
- In the current state (before this patch), then result should have an empty <aside>
```
#data
<div><b><em><foo><foob><fooc><aside></b></em></div>
#config
{ "removeElements": ["em"] }
#document
| <div>
| <b>
| <aside>
| <b>
```
I have finally added this new test case as well.
Thanks for the review.
I have updated the comments as suggested.
The last patch contains the refactoring to guard the inner loop only.
Additionally, I have added the suggested test case in the sanitizerAPI folder.
while (true) {I think just this inner `while()` loop needs to be guarded by the flag, right? Isn't all of the stuff above this shared? And below the loop also?
Yes, only the inner loop needs to be guarded, but the change looked more messy and difficult to review (probably because of the change in the comments). I can give it a try again, and decide then which one we want to land.
if (node == formatting_element_item) {
break;
}Javier FernandezThanks for adding the flag, but this branch still changes position relative to the old code.
Also, and more critically, I think the new logic (with the flag disabled) just doesn't check `inner_loop_counter` vs. `kInnerIterationLimit` at all. I think that's a potential infinite loop.
It might be a good idea to just add a big if/else block for the runtime feature, with exactly the old code in the `else` block. That way it's easy to see from inspection that the flag-disabled behavior is exactly the same.
Mason FreedI agree a big else with the original code is safer, so I will do that. However, I'd like to reply to some of your comments on the new code, just in case there are still issues than need more careful review.
The inner_loop_counter is checked against the kInnerIterationLimit before the flag-protected code, so it's indeed checked independently of the flag's value. However, the codepath enabled by this check doesn't affect the inner-loop at all; it just decide whether to remove the node form the formatting list.
The inner_loop is only broken when "node" is the "formatting_element_item". I think this condition will always be reached, because node it's initialized to the furthers_block in the Open Elements list and the loop just traverses the list until finding the formatting_element.
However, this infinite-loop issue was the main concerns I had when I changed the "for" loop by a "while(true)" one. I even wondered if one of the the reason this step defined in the spec was not implemented was precisely to force a "for" loop.
Yeah, my concern is that the "before" code was `for j=0;j< kInnerIterationLimit;++j)` which will always exit. The new code is a `while(true)` that will never break as a result of hitting the iteration limit. I added a request for a comment that should make this more clear.
Added a comment to clarify the while(true) exit is guaranteed.
Please add this comment here:
```
// This loop is guaranteed to terminate because formatting_element_item
// is always present higher up in the stack of open elements.
```
Done
// BookmarkFor creates a fresh bookmark so we need to restore its
// original state.nit: perhaps
```
// BookmarkFor creates a 'before' bookmark; if the original was moved, we
// must explicitly convert it back to an 'after' bookmark relative to its
// current mark.
```
Done
This should be "stable" I think. The flag is just in case there are compat problems from this change.
Also, sorry, but M147 just branched, so this should be updated to M148 and M150, respectively.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |