The detailed HTML spec is still being sorted out. This is an impl based off the current safari's implementation and WPTs.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
has_customelementregistry_attr = true;Interesting that we need to support this in the XML parser. Can you clarify why we need that? Normally, the `AttributeChanged` or `ParseAttribute` paths in the HTML elements would take care of HTML attributes added during XML document parsing. And I'd hope we could avoid touching the parsers at all for this. But let me know what's special about this one, perhaps I missed it.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
has_customelementregistry_attr = true;Interesting that we need to support this in the XML parser. Can you clarify why we need that? Normally, the `AttributeChanged` or `ParseAttribute` paths in the HTML elements would take care of HTML attributes added during XML document parsing. And I'd hope we could avoid touching the parsers at all for this. But let me know what's special about this one, perhaps I missed it.
My understanding is that we'd need to do the parser change here given that later when we're creating the element in xml parser we'd need to know whether we're using global registry or null registry. In the code change below I tried to make sure we're adding minimal amount of work to regular xml parsing path. Let me know if this answered your question.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
has_customelementregistry_attr = true;Jayson ChenInteresting that we need to support this in the XML parser. Can you clarify why we need that? Normally, the `AttributeChanged` or `ParseAttribute` paths in the HTML elements would take care of HTML attributes added during XML document parsing. And I'd hope we could avoid touching the parsers at all for this. But let me know what's special about this one, perhaps I missed it.
My understanding is that we'd need to do the parser change here given that later when we're creating the element in xml parser we'd need to know whether we're using global registry or null registry. In the code change below I tried to make sure we're adding minimal amount of work to regular xml parsing path. Let me know if this answered your question.
So in both AttributeChanged and ParseAttribute, there's a params flag that tells you it's coming from the parser. That's the usual way to handle element/attribute creation when you need to do something different from the parser. E.g.
Do you think that approach might work here? Both the XML and HTML parsers, when creating elements and attributes, should go through that one path. Which would mean you wouldn't need to modify the parser code directly. If it works, it's a better approach.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
has_customelementregistry_attr = true;Jayson ChenInteresting that we need to support this in the XML parser. Can you clarify why we need that? Normally, the `AttributeChanged` or `ParseAttribute` paths in the HTML elements would take care of HTML attributes added during XML document parsing. And I'd hope we could avoid touching the parsers at all for this. But let me know what's special about this one, perhaps I missed it.
Mason FreedMy understanding is that we'd need to do the parser change here given that later when we're creating the element in xml parser we'd need to know whether we're using global registry or null registry. In the code change below I tried to make sure we're adding minimal amount of work to regular xml parsing path. Let me know if this answered your question.
So in both AttributeChanged and ParseAttribute, there's a params flag that tells you it's coming from the parser. That's the usual way to handle element/attribute creation when you need to do something different from the parser. E.g.
Do you think that approach might work here? Both the XML and HTML parsers, when creating elements and attributes, should go through that one path. Which would mean you wouldn't need to modify the parser code directly. If it works, it's a better approach.
I don't think it will work here. In line 1147, we do need to resolve the registry before we can look up the custom element definition. It will be awkward if we assume global registry, create the element and let the element get upgraded then realize in AttributeChanged that it should actually use a null registry and shouldn't be upgraded.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
has_customelementregistry_attr = true;Jayson ChenInteresting that we need to support this in the XML parser. Can you clarify why we need that? Normally, the `AttributeChanged` or `ParseAttribute` paths in the HTML elements would take care of HTML attributes added during XML document parsing. And I'd hope we could avoid touching the parsers at all for this. But let me know what's special about this one, perhaps I missed it.
Mason FreedMy understanding is that we'd need to do the parser change here given that later when we're creating the element in xml parser we'd need to know whether we're using global registry or null registry. In the code change below I tried to make sure we're adding minimal amount of work to regular xml parsing path. Let me know if this answered your question.
Jayson ChenSo in both AttributeChanged and ParseAttribute, there's a params flag that tells you it's coming from the parser. That's the usual way to handle element/attribute creation when you need to do something different from the parser. E.g.
Do you think that approach might work here? Both the XML and HTML parsers, when creating elements and attributes, should go through that one path. Which would mean you wouldn't need to modify the parser code directly. If it works, it's a better approach.
I don't think it will work here. In line 1147, we do need to resolve the registry before we can look up the custom element definition. It will be awkward if we assume global registry, create the element and let the element get upgraded then realize in AttributeChanged that it should actually use a null registry and shouldn't be upgraded.
Ok. Yeah. You're right - thanks for discussing it. That was the key thing - you need to know the definition before you even create the element.
CustomElementRegistry::DefaultRegistry(current_node_->GetDocument());Since this isn't just a trivial getter, and you'll throw it away in several cases below, perhaps just start with `nullptr` and assign it if the feature is disabled?
// that we've seen the attribute so descendants will check their parent.This is kind of a funny optimization. It means that once the attribute has been seen just once anywhere in a document, the branch below gets executed for all of the rest of the document. Not just within that subtree. That kind of feels odd, unless you think this is really a performance bottleneck? (Usually XML document parsing isn't on most benchmarks, or most common web usage.)
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
CustomElementRegistry::DefaultRegistry(current_node_->GetDocument());Since this isn't just a trivial getter, and you'll throw it away in several cases below, perhaps just start with `nullptr` and assign it if the feature is disabled?
My assumption is that the cases that throw the default registry away are fairly rare (using custom element registry attribute in xml), so I thought it'd make sense to just start with `DefaultRegistry`.
// that we've seen the attribute so descendants will check their parent.This is kind of a funny optimization. It means that once the attribute has been seen just once anywhere in a document, the branch below gets executed for all of the rest of the document. Not just within that subtree. That kind of feels odd, unless you think this is really a performance bottleneck? (Usually XML document parsing isn't on most benchmarks, or most common web usage.)
I think a more naive approach is that we always look at the parent element to get the registry, so the global registry will be passed down from the root element. I'm not entirely sure how much I should worry about its performance impact tbh, but if we don't think this is something we should worry, we can keep the code simple and just grab the registry from the parent element to avoid over-complicating things.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
I'll LGTM this, but see the comments.
CustomElementRegistry::DefaultRegistry(current_node_->GetDocument());Jayson ChenSince this isn't just a trivial getter, and you'll throw it away in several cases below, perhaps just start with `nullptr` and assign it if the feature is disabled?
My assumption is that the cases that throw the default registry away are fairly rare (using custom element registry attribute in xml), so I thought it'd make sense to just start with `DefaultRegistry`.
I'm ok as-is, but I guess also there's no downside really. I think the alternative would be just as readable, and wouldn't do as much work in the `has_customelementregistry_attr` case.
// that we've seen the attribute so descendants will check their parent.Jayson ChenThis is kind of a funny optimization. It means that once the attribute has been seen just once anywhere in a document, the branch below gets executed for all of the rest of the document. Not just within that subtree. That kind of feels odd, unless you think this is really a performance bottleneck? (Usually XML document parsing isn't on most benchmarks, or most common web usage.)
I think a more naive approach is that we always look at the parent element to get the registry, so the global registry will be passed down from the root element. I'm not entirely sure how much I should worry about its performance impact tbh, but if we don't think this is something we should worry, we can keep the code simple and just grab the registry from the parent element to avoid over-complicating things.
I'd (always) err on that side. Do the easy-to-understand thing first, and then optimize later (if needed).
At the minimum, if you're going to keep this approach, rename the variable from `saw_customelementregistry_attr_` to something like `document_contains_custom_element_registries_`.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
CustomElementRegistry::DefaultRegistry(current_node_->GetDocument());Jayson ChenSince this isn't just a trivial getter, and you'll throw it away in several cases below, perhaps just start with `nullptr` and assign it if the feature is disabled?
Mason FreedMy assumption is that the cases that throw the default registry away are fairly rare (using custom element registry attribute in xml), so I thought it'd make sense to just start with `DefaultRegistry`.
I'm ok as-is, but I guess also there's no downside really. I think the alternative would be just as readable, and wouldn't do as much work in the `has_customelementregistry_attr` case.
Done
// that we've seen the attribute so descendants will check their parent.Jayson ChenThis is kind of a funny optimization. It means that once the attribute has been seen just once anywhere in a document, the branch below gets executed for all of the rest of the document. Not just within that subtree. That kind of feels odd, unless you think this is really a performance bottleneck? (Usually XML document parsing isn't on most benchmarks, or most common web usage.)
Mason FreedI think a more naive approach is that we always look at the parent element to get the registry, so the global registry will be passed down from the root element. I'm not entirely sure how much I should worry about its performance impact tbh, but if we don't think this is something we should worry, we can keep the code simple and just grab the registry from the parent element to avoid over-complicating things.
I'd (always) err on that side. Do the easy-to-understand thing first, and then optimize later (if needed).
At the minimum, if you're going to keep this approach, rename the variable from `saw_customelementregistry_attr_` to something like `document_contains_custom_element_registries_`.
Yea totally understand. I think I'm just a bit worried about performance given what happened with SCER, but agree that we can start with the easy-to-understand approach first.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |