Dominik Röttsches (Gerrit)

unread,

Oct 28, 2025, 12:57:48 PM (3 days ago) Oct 28

to David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org

Attention needed from David Baron

New activity on the change

Open in Gerrit

Related details

Attention is currently required from:

David Baron

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

AI Code Reviewer (Gerrit)

unread,

Oct 28, 2025, 12:59:37 PM (3 days ago) Oct 28

to Dominik Röttsches, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org

Attention needed from David Baron

AI Code Reviewer added 2 comments

File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.h

Line 127, Patchset 15 (Latest): bool saw_error_ = false;

AI Code Reviewer . unresolved

nit: Blink Style Guide: Naming - Precede boolean values with words like “is” and “did”. Please consider renaming `saw_error_` to `did_see_error_`.

To keep this interaction as brief and non-intrusive as possible, please consider responding with one of following options:
**Done** | **Won't fix**: reason | **b/<bug_id>** | **Invalid:** reason

This comment was generated by [Experimental Blink C++ Code Review Agent](http://go/blink-c++-code-review-agent).AI reviews can sometimes be inaccurate; We appreciate your 🙏 feedback 🙏 to help us improve.[File a bug](http://go/blink-c++-code-review-agent-feedback) | [Provide feedback on chat](https://chat.google.com/room/AAQA0zhQHe0?cls=4) | [Opt-out](https://ganpati2.corp.google.com/group/peep-genai-blink-agent-optout.prod)_

Line 103, Patchset 15 (Latest): void end();

AI Code Reviewer . unresolved

Blink Style Guide: Naming - Use 'CamelCase' for all function names. All function names should use `CamelCase()`-style names, beginning with an uppercase letter. Please consider renaming `end` to `End`.

To keep this interaction as brief and non-intrusive as possible, please consider responding with one of following options:
**Done** | **Won't fix**: reason | **b/<bug_id>** | **Invalid:** reason

This comment was generated by [Experimental Blink C++ Code Review Agent](http://go/blink-c++-code-review-agent).AI reviews can sometimes be inaccurate; We appreciate your 🙏 feedback 🙏 to help us improve.[File a bug](http://go/blink-c++-code-review-agent-feedback) | [Provide feedback on chat](https://chat.google.com/room/AAQA0zhQHe0?cls=4) | [Opt-out](https://ganpati2.corp.google.com/group/peep-genai-blink-agent-optout.prod)_

Open in Gerrit

Related details

Attention is currently required from:

David Baron

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review

No-Unresolved-Comments

Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Dominik Röttsches (Gerrit)

unread,

Oct 28, 2025, 1:03:26 PM (3 days ago) Oct 28

to AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org

Attention needed from David Baron

Dominik Röttsches added 2 comments

Patchset-level comments

File-level comment, Patchset 15 (Latest):

Dominik Röttsches . resolved

reillig@, PTAL at `omnibox_api_interactive_test.cc` & `web_navigation_apitest.cc`
tkent@, PTAL at `audit_non_blink_usage.py`

File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.h

Line 127, Patchset 15 (Latest): bool saw_error_ = false;

AI Code Reviewer . resolved

nit: Blink Style Guide: Naming - Precede boolean values with words like “is” and “did”. Please consider renaming `saw_error_` to `did_see_error_`.
To keep this interaction as brief and non-intrusive as possible, please consider responding with one of following options:
**Done** | **Won't fix**: reason | **b/<bug_id>** | **Invalid:** reason
This comment was generated by [Experimental Blink C++ Code Review Agent](http://go/blink-c++-code-review-agent).AI reviews can sometimes be inaccurate; We appreciate your 🙏 feedback 🙏 to help us improve.[File a bug](http://go/blink-c++-code-review-agent-feedback) | [Provide feedback on chat](https://chat.google.com/room/AAQA0zhQHe0?cls=4) | [Opt-out](https://ganpati2.corp.google.com/group/peep-genai-blink-agent-optout.prod)_

Dominik Röttsches

Won't fix: Keep consistent with `XMLDocumentParser`.

satisfied_requirement

unsatisfied_requirement

open

diffy

David Baron (Gerrit)

unread,

Oct 28, 2025, 3:30:58 PM (3 days ago) Oct 28

to Dominik Röttsches, Reilly Grant, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org

Attention needed from David Baron, Dominik Röttsches, Kent Tamura and Reilly Grant

David Baron added 11 comments

Patchset-level comments

File-level comment, Patchset 15 (Latest):

David Baron . unresolved

Here's an initial set of comments on a few parts of the CL. I haven't really gotten to the core of it yet, but at this point I have at least a rough idea of what's in this CL.

I think this is going to be somewhat slow because reviewing this will involve learning a bunch of new things about Rust. I do have some familiarity with Rust (though somewhat limited), but I haven't worked on any FFI code before. I'm willing to try to get through it, but it's going to be a bit slow because this is pretty large. I also know enough to know that I probably shouldn't be reviewing any `unsafe` rust, so probably someone qualified to do that should be reviewing at least the `unsafe` parts.

I also wouldn't be offended if you prefer to find a different reviewer given that.

Commit Message

Line 9, Patchset 15 (Latest):Introduce an XMLDocumentParserRs, which is instantiated in Document

David Baron . resolved

This feels a little bit like it's undoing https://chromium.googlesource.com/chromium/src/+/c99c39aed8f0f6007a7a5ce8565d91e353ed332b%5E%21/ except I don't think it is because that (and followup refactorings) were removing a permanent mechanism for having different XML libraries, whereas this is intended to be temporary so it makes sense for it to be structured differently (e.g., via flags).

Line 32, Patchset 15 (Latest):Bug: chromium:435623334

David Baron . unresolved

Should this really be going to the XSLT bug, or is there a more appropriate bug? (Also, no need for `chromium:`.)

File chrome/browser/extensions/api/omnibox/omnibox_api_interactive_test.cc

File-level comment, Patchset 15 (Latest):

David Baron . unresolved

Is it worth considering holding off on these two changes in `chrome/`? It seems like you don't *need* to land them at this point given that the feature isn't enabled by default, and it might be easier to track the set of things that are different if you don't land them.

(I'm OK either way, but figured it's worth mentioning. I think maybe it depends on whether you're likely to try to fix the differences along the way or just want to accept them.)

File third_party/blink/renderer/core/BUILD.gn

File-level comment, Patchset 15 (Latest):

David Baron . unresolved

If you consider yourself an expert on rust `BUILD.gn` stuff at this point, then this is fine... but if not I'd appreciate if you could ask someone who you do consider to be an expert in that to review this file.

File third_party/blink/renderer/core/frame/document_loading_rendering_test.cc

File-level comment, Patchset 15 (Latest):

David Baron . unresolved

I guess this is OK if you're reasonably confident we're going to be able to live with this behavior change, but otherwise I might defer making this sort of test change until you are or until you feel close to enabling the flag (or maybe you do now?).

File third_party/blink/renderer/core/xml/parser/entities.rs

File-level comment, Patchset 15 (Latest):

David Baron . unresolved

I didn't review the correctness of the contents of this file -- I'm assuming you got it from a reliable source.

Line 5, Patchset 15 (Latest):// https://www.w3.org/MarkUp/DTD/xhtml-lat1.ent

David Baron . unresolved

I'm not sure whether this URL or https://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent is a better URL to reference. The files are formatted differently but hopefully have identical contents...

(Same for the other 2 parts.)

At least they're all the same other than comments and the order of the first few entries of the special map.

Line 235, Patchset 15 (Latest):// https://www.w3.org/MarkUp/DTD/xhtml-special.ent

David Baron . unresolved

Note that this excludes `&lt` and `&amp` which are in that file, though not the other three entities in https://www.w3.org/TR/REC-xml/#sec-predefined-ent . Might be worth revising the comment but it's a little inconsistent with the comment right now.

File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.h

Line 31, Patchset 15 (Latest):class XMLDocumentParserRs final : public ScriptableDocumentParser,

David Baron . unresolved

Is it worth considering using `Rust` in the name rather than `Rs`? It seems like it might be a little clearer.

File third_party/blink/web_tests/platform/linux/virtual/rust-xml/fast/parser/external-entities-expected.txt

File-level comment, Patchset 15 (Latest):

David Baron . unresolved

I think it makes sense to move the three *text* expectations (but not the image expectation) to not have `platform/linux/` in the path.

Open in Gerrit

Related details

Attention is currently required from:

David Baron
Dominik Röttsches
Kent Tamura
Reilly Grant

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
No-Unresolved-Comments
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Reilly Grant (Gerrit)

unread,

Oct 28, 2025, 4:56:22 PM (3 days ago) Oct 28

to Dominik Röttsches, Reilly Grant, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org

Attention needed from David Baron, Dominik Röttsches and Kent Tamura

Reilly Grant added 1 comment

File chrome/browser/extensions/api/omnibox/omnibox_api_interactive_test.cc

File-level comment, Patchset 15 (Latest):

David Baron . unresolved

Is it worth considering holding off on these two changes in `chrome/`? It seems like you don't *need* to land them at this point given that the feature isn't enabled by default, and it might be easier to track the set of things that are different if you don't land them.
(I'm OK either way, but figured it's worth mentioning. I think maybe it depends on whether you're likely to try to fix the differences along the way or just want to accept them.)

Reilly Grant

I'm okay landing these changes though I'm a little confused why parsing `<tag><match></match>` is throwing an "Unexpected closing tag" error when the problem is a _missing_ closing tag (or unexpected end of file).

Open in Gerrit

Related details

Attention is currently required from:

David Baron
Dominik Röttsches
Kent Tamura

satisfied_requirement

unsatisfied_requirement

open

diffy

Kent Tamura (Gerrit)

unread,

Oct 28, 2025, 10:46:35 PM (3 days ago) Oct 28

to Dominik Röttsches, Kent Tamura, Reilly Grant, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org

Attention needed from David Baron and Dominik Röttsches

Kent Tamura added 10 comments

File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.cc

Line 34, Patchset 15 (Latest):static inline bool HasNoStyleInformation(Document* document) {

Kent Tamura . unresolved

`static` is unnecessary.

Line 55, Patchset 15 (Latest):String rustStrToWtfString(rust::Str str) {

Kent Tamura . unresolved

The name should be `RustStrToWtfString`.

Line 62, Patchset 15 (Latest):void extract_doctype_parts(const std::string& doctype_string,

Kent Tamura . unresolved

The name should be `ExtractDocTypeParts`.

Line 241, Patchset 15 (Latest): GetDocument()->setXMLVersion(String(version), ASSERT_NO_EXCEPTION);

Kent Tamura . unresolved

Don't we use `rustStrToWtfString()`?

Line 255, Patchset 15 (Latest):static inline bool HandleNamespaceAttributes(

Kent Tamura . unresolved

We prefer functions in anonymous namespace to `static` functions.

Line 308, Patchset 15 (Latest):static inline bool CollectElementAttributes(

Kent Tamura . unresolved

Ditto.

Line 335, Patchset 15 (Latest):static inline void SetAttributes(

Kent Tamura . unresolved

Ditto.

File third_party/blink/renderer/core/xml/parser/xml_ffi_callbacks.h

Line 17, Patchset 15 (Latest): kNoXMlDeclaration,

Kent Tamura . unresolved

`kNoXMl` -> `kNoXml`

File third_party/blink/web_tests/NeverFixTests

Line 1858, Patchset 15 (Latest):# XSLT tests disabled with Rust XML parser, no support for XSLT

Kent Tamura . unresolved

Does it mean we can't ship the Rust XML parser until we remove XSLT support completely?

File third_party/blink/web_tests/VirtualTestSuites

Line 5983, Patchset 15 (Latest): "bases": [

Kent Tamura . unresolved

This adds 1200+ tests, and it's too large for a single virtual test.
Can we reduce the number of tests to less than 200, or set up a dedicated build bot?

Open in Gerrit

Related details

Attention is currently required from:

David Baron
Dominik Röttsches

satisfied_requirement

unsatisfied_requirement

open

diffy

Dominik Röttsches (Gerrit)

unread,

Oct 29, 2025, 6:23:33 AM (2 days ago) Oct 29

to Łukasz Anforowicz, Kent Tamura, Reilly Grant, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org

Attention needed from David Baron, Kent Tamura, Reilly Grant and Łukasz Anforowicz

Dominik Röttsches added 22 comments

Patchset-level comments

File-level comment, Patchset 15:

David Baron . resolved

Here's an initial set of comments on a few parts of the CL. I haven't really gotten to the core of it yet, but at this point I have at least a rough idea of what's in this CL.
I think this is going to be somewhat slow because reviewing this will involve learning a bunch of new things about Rust. I do have some familiarity with Rust (though somewhat limited), but I haven't worked on any FFI code before. I'm willing to try to get through it, but it's going to be a bit slow because this is pretty large. I also know enough to know that I probably shouldn't be reviewing any `unsafe` rust, so probably someone qualified to do that should be reviewing at least the `unsafe` parts.
I also wouldn't be offended if you prefer to find a different reviewer given that.

Dominik Röttsches

Thanks - I'll add Łukasz for the FFI parts.

File-level comment, Patchset 17 (Latest):

Dominik Röttsches . resolved

@luk...@chromium.org Could you please take a look at the FFI parts towards the Rust XML library of this change including the `core/BUILD.gn` change?

Others, thanks for the reviews so far. I addressed most comments, some feedback/clarifications below.

Commit Message

Line 32, Patchset 15:Bug: chromium:435623334

David Baron . resolved

Should this really be going to the XSLT bug, or is there a more appropriate bug? (Also, no need for `chromium:`.)

Dominik Röttsches

Yes, this should be 441911594.

File chrome/browser/extensions/api/omnibox/omnibox_api_interactive_test.cc

File-level comment, Patchset 15:

David Baron . resolved

Is it worth considering holding off on these two changes in `chrome/`? It seems like you don't *need* to land them at this point given that the feature isn't enabled by default, and it might be easier to track the set of things that are different if you don't land them.
(I'm OK either way, but figured it's worth mentioning. I think maybe it depends on whether you're likely to try to fix the differences along the way or just want to accept them.)

Reilly Grant

I'm okay landing these changes though I'm a little confused why parsing `<tag><match></match>` is throwing an "Unexpected closing tag" error when the problem is a _missing_ closing tag (or unexpected end of file).

Dominik Röttsches

I'll split these off as the flag remains default off in this test and to meet @dbaron's expectations for keeping this CL a little smaller.

The assertion failure without modifying the expectation is:
```
[232535:232535:1029/115906.485593:INFO:CONSOLE:3] "[FAIL] invalidXml_NoClosingTag: error on line 1 at column 53: Unexpected closing tag: fragment != tag
```
Which means that the `description:` argument

```
chrome.omnibox.setDefaultSuggestion(
                     {description: '<tag> <match>match</match> world'},
```

is not the full XML, but there is a `</fragment>` closing tag in surrounding XML.

Semantically libxml2's and the Rust XML parser's error message align here.

File third_party/blink/renderer/core/BUILD.gn

File-level comment, Patchset 15:

David Baron . resolved

If you consider yourself an expert on rust `BUILD.gn` stuff at this point, then this is fine... but if not I'd appreciate if you could ask someone who you do consider to be an expert in that to review this file.

Dominik Röttsches

Not sure if expert, but I have done a few of these by now - I think this is a pretty standard snippet for adding a Rust library with FFI. But see separate comment, I will add Łukasz for Rust/FFI parts of this change.

File third_party/blink/renderer/core/frame/document_loading_rendering_test.cc

File-level comment, Patchset 15:

David Baron . resolved

I guess this is OK if you're reasonably confident we're going to be able to live with this behavior change, but otherwise I might defer making this sort of test change until you are or until you feel close to enabling the flag (or maybe you do now?).

Dominik Röttsches

I'll split this off. For development purposes, and for running a separate trybot, it's useful to have the unit tests already pass with the Rust XML parser.

File third_party/blink/renderer/core/xml/parser/entities.rs

File-level comment, Patchset 15:

David Baron . resolved

I didn't review the correctness of the contents of this file -- I'm assuming you got it from a reliable source.

Dominik Röttsches

Ack, I generated these locally by transforming spec reference .ent files.

Line 5, Patchset 15:// https://www.w3.org/MarkUp/DTD/xhtml-lat1.ent

David Baron . resolved

I'm not sure whether this URL or https://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent is a better URL to reference. The files are formatted differently but hopefully have identical contents...
(Same for the other 2 parts.)
At least they're all the same other than comments and the order of the first few entries of the special map.

Dominik Röttsches

Changed link to the one you're suggesting - here and below.

Line 235, Patchset 15:// https://www.w3.org/MarkUp/DTD/xhtml-special.ent

David Baron . resolved

Note that this excludes `&lt` and `&amp` which are in that file, though not the other three entities in https://www.w3.org/TR/REC-xml/#sec-predefined-ent . Might be worth revising the comment but it's a little inconsistent with the comment right now.

Dominik Röttsches

Removed gt, apos, quot and clarified comment.

File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.h

Line 103, Patchset 15: void end();

AI Code Reviewer . resolved

Blink Style Guide: Naming - Use 'CamelCase' for all function names. All function names should use `CamelCase()`-style names, beginning with an uppercase letter. Please consider renaming `end` to `End`.

To keep this interaction as brief and non-intrusive as possible, please consider responding with one of following options:
**Done** | **Won't fix**: reason | **b/<bug_id>** | **Invalid:** reason
This comment was generated by [Experimental Blink C++ Code Review Agent](http://go/blink-c++-code-review-agent).AI reviews can sometimes be inaccurate; We appreciate your 🙏 feedback 🙏 to help us improve.[File a bug](http://go/blink-c++-code-review-agent-feedback) | [Provide feedback on chat](https://chat.google.com/room/AAQA0zhQHe0?cls=4) | [Opt-out](https://ganpati2.corp.google.com/group/peep-genai-blink-agent-optout.prod)_

Dominik Röttsches

Invalid: This was a leftover and the function does no longer exist in the impl file.

Line 31, Patchset 15:class XMLDocumentParserRs final : public ScriptableDocumentParser,

David Baron . resolved

Is it worth considering using `Rust` in the name rather than `Rs`? It seems like it might be a little clearer.

Dominik Röttsches

Rs seems to me to be a pretty standard suffix, compare many crate names ending on -rs https://crates.io/search?q=-rs and the Rust source file extension being .rs - but I am not insisting on that if you'd prefer it to be changed.

File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.cc

Line 34, Patchset 15:static inline bool HasNoStyleInformation(Document* document) {

Kent Tamura . resolved

`static` is unnecessary.

Dominik Röttsches

Done

Line 55, Patchset 15:String rustStrToWtfString(rust::Str str) {

Kent Tamura . resolved

The name should be `RustStrToWtfString`.

Dominik Röttsches

Done

Line 62, Patchset 15:void extract_doctype_parts(const std::string& doctype_string,

Kent Tamura . resolved

The name should be `ExtractDocTypeParts`.

Dominik Röttsches

Done

Line 241, Patchset 15: GetDocument()->setXMLVersion(String(version), ASSERT_NO_EXCEPTION);

Kent Tamura . resolved

Don't we use `rustStrToWtfString()`?

Dominik Röttsches

Good catch, thanks. Done.

Line 255, Patchset 15:static inline bool HandleNamespaceAttributes(

Kent Tamura . resolved

We prefer functions in anonymous namespace to `static` functions.

Dominik Röttsches

Done

Line 308, Patchset 15:static inline bool CollectElementAttributes(

Kent Tamura . resolved

Ditto.

Dominik Röttsches

Done

Line 335, Patchset 15:static inline void SetAttributes(

Kent Tamura . resolved

Ditto.

Dominik Röttsches

Done

File third_party/blink/renderer/core/xml/parser/xml_ffi_callbacks.h

Line 17, Patchset 15: kNoXMlDeclaration,

Kent Tamura . resolved

`kNoXMl` -> `kNoXml`

Dominik Röttsches

Done

File third_party/blink/web_tests/NeverFixTests

Line 1858, Patchset 15:# XSLT tests disabled with Rust XML parser, no support for XSLT

Kent Tamura . resolved

Does it mean we can't ship the Rust XML parser until we remove XSLT support completely?

Dominik Röttsches

I am investigating transition options: We could use it right away for SVG files or all non-XSLT cases if we are willing to run a bit of a scan up front. We could also potentially run it as an internal dogfood right away.

File third_party/blink/web_tests/VirtualTestSuites

Line 5983, Patchset 15: "bases": [

Kent Tamura . resolved

This adds 1200+ tests, and it's too large for a single virtual test.
Can we reduce the number of tests to less than 200, or set up a dedicated build bot?

Dominik Röttsches

Changed to a smaller set of 177 files as a basic sanity check. Will look into a separate build bot. Let me know if you have any pointers.

File third_party/blink/web_tests/platform/linux/virtual/rust-xml/fast/parser/external-entities-expected.txt

File-level comment, Patchset 15:

David Baron . resolved

I think it makes sense to move the three *text* expectations (but not the image expectation) to not have `platform/linux/` in the path.

Dominik Röttsches

Removed these as @tkent asked me to reduce set of virtual tests.

Open in Gerrit

Related details

Attention is currently required from:

David Baron
Kent Tamura
Reilly Grant
Łukasz Anforowicz

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review

Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Dominik Röttsches (Gerrit)

unread,

Oct 29, 2025, 6:29:11 AM (2 days ago) Oct 29

to Łukasz Anforowicz, Kent Tamura, Reilly Grant, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org

Attention needed from David Baron, Kent Tamura, Reilly Grant and Łukasz Anforowicz

Dominik Röttsches added 1 comment

Patchset-level comments

File-level comment, Patchset 15:

David Baron . resolved

Here's an initial set of comments on a few parts of the CL. I haven't really gotten to the core of it yet, but at this point I have at least a rough idea of what's in this CL.
I think this is going to be somewhat slow because reviewing this will involve learning a bunch of new things about Rust. I do have some familiarity with Rust (though somewhat limited), but I haven't worked on any FFI code before. I'm willing to try to get through it, but it's going to be a bit slow because this is pretty large. I also know enough to know that I probably shouldn't be reviewing any `unsafe` rust, so probably someone qualified to do that should be reviewing at least the `unsafe` parts.
I also wouldn't be offended if you prefer to find a different reviewer given that.

Dominik Röttsches

Thanks - I'll add Łukasz for the FFI parts.

Dominik Röttsches

PS: Perhaps to reiterate the context, as mentioned in the commit message, most of the behavior of the new `XMLDocumentParserRs` is ported from `XMLDocumentParser` which does a couple of interesting gymnastics to comply with state transition expectations from `Document` and `DocumentParser`.

satisfied_requirement

unsatisfied_requirement

open

diffy

Dominik Röttsches (Gerrit)

unread,

Oct 29, 2025, 9:45:22 AM (2 days ago) Oct 29

to Łukasz Anforowicz, Kent Tamura, Reilly Grant, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org

Attention needed from David Baron, Kent Tamura, Reilly Grant and Łukasz Anforowicz

Dominik Röttsches added 1 comment

File third_party/blink/web_tests/VirtualTestSuites

Line 5983, Patchset 15: "bases": [

Kent Tamura . resolved

This adds 1200+ tests, and it's too large for a single virtual test.
Can we reduce the number of tests to less than 200, or set up a dedicated build bot?

Dominik Röttsches

Changed to a smaller set of 177 files as a basic sanity check. Will look into a separate build bot. Let me know if you have any pointers.

Dominik Röttsches

I found information on how to set up a separate trybot, but I would also like to explore starting to use the Rust XML parser for parts of real world workloads - which would make it part of mainline testing. I'll get back on that.

Open in Gerrit

Related details

Attention is currently required from:

David Baron
Kent Tamura
Reilly Grant
Łukasz Anforowicz

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Łukasz Anforowicz (Gerrit)

unread,

Oct 29, 2025, 1:15:27 PM (2 days ago) Oct 29

to Dominik Röttsches, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org

Attention needed from David Baron, Dominik Röttsches and Kent Tamura

Łukasz Anforowicz added 19 comments

Commit Message

Line 17, Patchset 19 (Latest):This by now fairly complete in the sense of reliably passing a wide set

Łukasz Anforowicz . unresolved

Missing word?: "This is by now fairly complete"?

File third_party/blink/renderer/core/BUILD.gn

Line 255, Patchset 19 (Latest): build_native_rust_unit_tests = true

Łukasz Anforowicz . unresolved

Is `build_native_rust_unit_tests` needed and intentional here? The official [guidance](https://google.github.io/comprehensive-rust/chromium/testing.html#:~:text=Native%20Rust%20tests%20(i.e.%20%23%5Btest%5D).%20Discouraged%20outside%20of%20//third_party/rust.) that the team landed on is that "Native Rust tests [are] discouraged outside of //third_party/rust.".

PS. Maybe we should copy this guidance to `//docs/rust/...` or somewhere else?

Line 256, Patchset 19 (Latest):}

Łukasz Anforowicz . unresolved

I am not a GN expert, but I was a bit surprised that some new `.h` / `.cc` files from this CL do not have corresponding `BUILD.gn` changes. Examples:

`third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.cc`
`third_party/blink/renderer/core/xml/parser/xml_ffi_callbacks.h`

File third_party/blink/renderer/core/xml/parser/xml_ffi.rs

Line 25, Patchset 19 (Latest):#[allow(unused)]

Łukasz Anforowicz . unresolved

Can you add a comment why `#[allow(unused)]` is needed + if this is expected to stay here forever or be fixed in the future?

PS. In case it helps, let me share that some guidance about unused-code-warnings have recently been added to https://chromium.googlesource.com/chromium/src/+/main/docs/rust/dev_experience_tips_and_tricks.md#suppressing-unused-code-warnings

Line 28, Patchset 19 (Latest): saw_error: bool,

Łukasz Anforowicz . unresolved

Is `self.saw_error` the same as `self.error_details.is_some()`?

Line 36, Patchset 19 (Latest): let cursor = Cursor::new(Vec::new());

Łukasz Anforowicz . unresolved

This LGTM, but I wanted to share some notes/thoughts:

`XmlEventReader::new_with_config` ([docs here](https://docs.rs/xml/latest/xml/reader/struct.EventReader.html#method.new_with_config)) only needs `R: Read` (and doesn't need `R: Seek` provided by `Cursor`). So in theory after reading some bytes, the prefix of the vector can be discarded rather than kept in memory. OTOH, in my cursory search I didn't find anything that already works for this.

Hmm... I see that `VecDeque<u8>` implements `Read + Write`. So maybe we can use `VecDeque<u8>` instead of `Cursor<Vec<u8>>`? Fine to attempt this in a separate CL, because it may be a bit annoying to switch to `as_slices` instead of a single `as_slice`.

Line 105, Patchset 19 (Latest):

                    let standalone = standalone
                        .map(|is_standalone| if is_standalone { 1 } else { 0 })
                        .unwrap_or_else(|| {
                            if xml_declaration_view.is_some() {
                                return -2; // <?xml instruction present, but
                                           // standalone not specified
                            }
                            -1 // No XML declaration at all
                        });

Łukasz Anforowicz . unresolved

Would it be worth adding a comment saying where those constants come from? I assume they correspond to some constants on the C++ side?

Line 151, Patchset 19 (Latest):

                        &mut attributes,
                        &mut new_namespaces,

Łukasz Anforowicz . unresolved

It seems that `&mut` here is unneeded, because the values will be discarded at the end of the scope. So I thought that I'd ask if this can be a TODO to change C++ callback declaration to take a const-ref instead?

Line 191, Patchset 19 (Latest):fn get_last_event_position(read_state: &XmlReadState, row: &mut u64, col: &mut u64) -> bool {

Łukasz Anforowicz . unresolved

nit: Maybe name this `try_get_last_event_position` to indicate it might fail?

Line 207, Patchset 19 (Latest):fn get_error_details(

Łukasz Anforowicz . unresolved

nit: `try_get_error_details`?

Line 219, Patchset 19 (Latest): .map_or(error_string.clone(), |pos| error_string[pos + 1..].to_string());

Łukasz Anforowicz . unresolved

nitty nit: IIUC this `clone` will execute unconditionally, even when the other branch/lamda will be taken. Can this be avoided?

Line 221, Patchset 19 (Latest): *col = error_details.position().column as u64;

Łukasz Anforowicz . unresolved

Why is `as u64` needed here? I assume that this is [`TextPosition::column`](https://docs.rs/xml/latest/xml/common/struct.TextPosition.html#method.column) which already returns a `u64`?

Line 294, Patchset 19 (Latest): Ok(StartElement { name: _, attributes, namespace: _ }) => {

Łukasz Anforowicz . unresolved

```suggestion
            Ok(StartElement { attributes, .. }) => {
```

Line 296, Patchset 19 (Latest): let mut ret_attributes: Vec<AttributeNameValue> = Vec::new();

Łukasz Anforowicz . unresolved

```suggestion
                let mut ret_attributes: Vec<AttributeNameValue> = Vec::with_capacity(attributes.len());
```

Line 304, Patchset 19 (Latest): AttributeNameValue { q_name: q_name, value: attribute.value.clone() };

Łukasz Anforowicz . unresolved

```suggestion
                        AttributeNameValue { q_name, value: attribute.value.clone() };
```

Line 319, Patchset 19 (Latest):#[allow(unused_unsafe)]

Łukasz Anforowicz . unresolved

Is this needed? Does that indicate a bug in `cxx`?

Line 371, Patchset 19 (Latest): fn saw_error(read_state: &XmlReadState) -> bool;

Łukasz Anforowicz . unresolved

LGTM, but I wonder if `has_error` would also be a good name? Sorry for bike-shedding :-/.

File third_party/blink/renderer/core/xml/parser/xml_ffi_callbacks.h

Line 15, Patchset 19 (Latest):

enum StandaloneInfo {
  kStandaloneUnspecified = -2,
  kNoXmlDeclaration,
  kStandaloneNo,
  kStandaloneYes
};

Łukasz Anforowicz . unresolved

Can that be defined as [a shared enum](https://cxx.rs/shared.html#shared-structs-and-enums) in `#[cxx::bridge]`? This would avoid having to duplicate the numbers in `.rs` and in `.h` here.

File third_party/blink/web_tests/virtual/rust-xml/README.md

Line 1, Patchset 19 (Latest):Contains test files for the rust-xml virtual test suite which runs XML parsing through the Rust XML crate.

Łukasz Anforowicz . unresolved

Please wrap to 80 columns?

Can you consider mentioning something like?:

```
For more details see go/chrome_x_mitigation and/or https://crbug.com/441911594.
```

Open in Gerrit

Related details

Attention is currently required from:

David Baron
Dominik Röttsches
Kent Tamura

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review

No-Unresolved-Comments

Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

David Baron (Gerrit)

unread,

Oct 29, 2025, 3:49:16 PM (2 days ago) Oct 29

to Dominik Röttsches, Łukasz Anforowicz, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org

Attention needed from David Baron, Dominik Röttsches and Kent Tamura

David Baron added 26 comments

Patchset-level comments

File-level comment, Patchset 19 (Latest):

David Baron . resolved

Here's another round of partial comments. (This is likely it for today, but I'll resume tomorrow.)

File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.h

Line 143, Patchset 19 (Latest): bool is_xhtml_document_ = false;

David Baron . unresolved

currently unused, but maybe should be used?

Line 139, Patchset 19 (Latest): bool parsing_fragment_;

David Baron . unresolved

```suggestion
  const bool parsing_fragment_;
```

Line 118, Patchset 19 (Latest):

  void DoWrite(const String&);
  void DoEnd();

David Baron . unresolved

Both of these are also unimplemented.

Line 106, Patchset 19 (Latest): bool AppendFragmentSource(const String&);

David Baron . unresolved

Also unimplemented.

Line 60, Patchset 19 (Latest): void SetScriptStartPosition(TextPosition);

David Baron . unresolved

Also unimplemented.

Line 58, Patchset 19 (Latest): static bool SupportsXMLVersion(const String&);

David Baron . unresolved

Also unimplemented.

Line 47, Patchset 19 (Latest):

  static bool ParseDocumentFragment(const String&,
                                    DocumentFragment*,
                                    Element* parent,
                                    ParserContentPolicy,
                                    ExceptionState&);

David Baron . unresolved

Also unimplemented.

Line 45, Patchset 19 (Latest): bool IsCurrentlyParsing8BitChunk();

David Baron . unresolved

Also unimplemented.

Line 42, Patchset 19 (Latest):

  void SetIsXHTMLDocument(bool is_xhtml);
  bool IsXHTMLDocument() const;

David Baron . unresolved

There's no implementation of these methods and they appear unused. (Why not inline like the current parser?) But maybe there should be?

Line 1, Patchset 19 (Latest):

// Copyright 2025 The Chromium Authors
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

David Baron . unresolved

If the origin of this file is a copy of `xml_document_parser.h` (I'm not sure whether that's the case), I think it should probably preserve the original license header. Same for the `.cc`.

File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.cc

File-level comment, Patchset 19 (Latest):

David Baron . unresolved

One other thought:

Since you need to test and branch based on the flag at multiple points, it seems like it might be good to:

add a bunch of `CHECK(RuntimeEnabledFeatures::XMLParsingRustEnabled())` in `XMLDocumentParserRs`, either at the public entry points or the places where it calls into the rust-based XML parser
add a bunch of `CHECK(!RuntimeEnabledFeatures::XMLParsingRustEnabled())` in `XMLDocumentParser`, either at the public entry points or the places where it calls into the libxml2 parser

just to make sure that you've gotten all of the entry points (and that we continue to do so until the flag is removed).

(I don't feel strongly about the exact approach (e.g., probably OK with `DCHECK()` instead of `CHECK()` if you feel it's justified) -- but we've certainly had cases in the past where we've found that a flag didn't enable all of what it was supposed to, and then fully enabling it took longer than expected -- so I think it would be good to be a bit more defensive here to ensure we don't do that here.)

Line 56, Patchset 19 (Latest): return String::FromUTF8(static_cast<std::string>(str));

David Baron . unresolved

Per the bullet points in https://google.github.io/styleguide/cppguide.html#Casting I think the preference is
```suggestion
  return String::FromUTF8(std::string(str));
```
assuming it works.  Though I'm wondering if there's a simpler or cheaper way.  At least maybe if you can go through `std::string_view` rather than `std::string` you can save a copy?  So maybe instead:
```suggestion
  return String::FromUTF8(base::RustStrToStringView(str));
```

Line 59, Patchset 19 (Latest):

// TODO(https://crbug.com/441911594): Extracts name, public ID, and system ID
// from an XML/HTML DOCTYPE declaration.

David Baron . unresolved

This doesn't accept any doctype declarations that have an internal subset. That might be a problem. Maybe having a way of getting it out from the rust parser would be the best way to fix that.

Line 73, Patchset 19 (Latest): "^<\\s*!DOCTYPE\\s+(\\w+)\\s+PUBLIC\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"\\s*>"

David Baron . unresolved

Whitespace isn't allowed within `<!DOCTYPE`

Line 71, Patchset 19 (Latest):

  // 1. Regex for PUBLIC-style statement: Captures name, public id, system id.
  const char* kPublicRe =
      "^<\\s*!DOCTYPE\\s+(\\w+)\\s+PUBLIC\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"\\s*>"
      "$";

  // 2. Regex for SYSTEM or simple statement, captures name and system id.
  const char* kSystemRe =
      "^<\\s*!DOCTYPE\\s+(\\w+)(?:\\s+SYSTEM\\s+\"([^\"]*)\")?\\s*(?:\\[[^\\]]*"
      "\\])?\\s*>$";

David Baron . unresolved

These strings would be more readable with C++ raw strings.

Line 121, Patchset 19 (Latest): {g_xmlns_with_colon, AtomicString(RustStrToWtfString(prefix))}));

David Baron . unresolved

Atomizing a string you're about to concatenate seems unnecessary.
```suggestion
          {g_xmlns_with_colon, RustStrToWtfString(prefix)}));
```

(I realize that's a problem in the current parser too.)

Line 130, Patchset 19 (Latest): Attribute(*parsed_name, AtomicString(RustStrToWtfString(uri))));

David Baron . unresolved

```suggestion
        Attribute(std::move(*parsed_name), AtomicString(RustStrToWtfString(uri))));
```
(I realize that's from the existing code, which does it in `CollectElementAttributes` but not `HandleNamespaceAttributes`.)

Line 173, Patchset 19 (Latest):static const unsigned kMaxXMLTreeDepth = 5000;

David Baron . unresolved

Probably preserve the existing `// FIXME: HTMLConstructionSite has a limit of 512, should these match?`

Line 186, Patchset 19 (Latest): parsing_fragment_(false) {}

David Baron . unresolved

preserve the code for the `WebFeature::kXMLDocument` use counter?

Line 193, Patchset 19 (Latest): script_runner_(nullptr),

David Baron . unresolved

Preserve the `// Don't execute scripts for document fragments.` comment?

Line 251, Patchset 19 (Latest): xml_ffi::append_to_source(*read_state_,

David Baron . unresolved

Do you know if it's ok that it does this even when `parser_paused_` is true? (I don't understand all the implications of the current code here...)

Line 302, Patchset 19 (Latest): GetDocument()->setXMLVersion(RustStrToWtfString(version),

David Baron . unresolved

Do you need to check SupportsXMLVersion here? (Though the existing code has a comment that disagrees with the existing code!)

In particular, it seems like if you don't the `ASSERT_NO_EXCEPTION` might assert, unless the rust parser's internals have *already* checked the XML version.

Line 323, Patchset 19 (Latest): DummyExceptionStateForTesting exception_state;

David Baron . unresolved

This should probably use `ASSERT_NO_EXCEPTION`. The only reasons `Document::createProcessingInstruction` can throw are reasons that the rust XML parser *should* have already given errors on, I think. (I realize this is an issue in the existing code. If you don't want to make this change, at least preserve the existing code's not-quite-TODO Comment about it.)

Line 336, Patchset 19 (Latest):

  if (pi->IsCSS()) {
    saw_css_ = true;
  } else {
    return;
  }

  CheckIfBlockingStyleSheetAdded();

David Baron . unresolved

```suggestion
  if (pi->IsCSS()) {
    saw_css_ = true;
    CheckIfBlockingStyleSheetAdded();
  }
```

Line 356, Patchset 19 (Latest):

  // TODO(https://crbug.com/441911594): Handle is attribute.
  // for (const auto& attr : prefixed_attributes) {
  //   if (attr.GetName() == html_names::kIsAttr) {
  //     is = attr.Value();
  //     break;
  //   }
  // }

David Baron . unresolved

Any reason this wouldn't just work if you move it down to after `CollectElementAttributes` like it is in the existing code?

satisfied_requirement

unsatisfied_requirement

open

diffy

Dominik Röttsches (Gerrit)

unread,

Oct 30, 2025, 9:50:03 AM (yesterday) Oct 30

to Łukasz Anforowicz, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org

Attention needed from David Baron, Kent Tamura and Łukasz Anforowicz

Dominik Röttsches voted and added 45 comments

Votes added by Dominik Röttsches

Commit-Queue

+1

45 comments

Patchset-level comments

File-level comment, Patchset 21 (Latest):

Dominik Röttsches . resolved

Thank you both for your thorough reviews and helpful feedback. IMO the CL has improved in multiple ways thanks to that.

Most comments addressed, a couple of items left open for discussion in the below.

Commit Message

Line 17, Patchset 19:This by now fairly complete in the sense of reliably passing a wide set

Łukasz Anforowicz . resolved

Missing word?: "This is by now fairly complete"?

Dominik Röttsches

Done

File third_party/blink/renderer/core/BUILD.gn

Line 255, Patchset 19: build_native_rust_unit_tests = true

Łukasz Anforowicz . resolved

Is `build_native_rust_unit_tests` needed and intentional here? The official [guidance](https://google.github.io/comprehensive-rust/chromium/testing.html#:~:text=Native%20Rust%20tests%20(i.e.%20%23%5Btest%5D).%20Discouraged%20outside%20of%20//third_party/rust.) that the team landed on is that "Native Rust tests [are] discouraged outside of //third_party/rust.".
PS. Maybe we should copy this guidance to `//docs/rust/...` or somewhere else?

Dominik Röttsches

Removed, I think I had had one test there earlier that I later removed.

Line 256, Patchset 19:}

Łukasz Anforowicz . resolved

I am not a GN expert, but I was a bit surprised that some new `.h` / `.cc` files from this CL do not have corresponding `BUILD.gn` changes. Examples:
`third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.cc`
`third_party/blink/renderer/core/xml/parser/xml_ffi_callbacks.h`

Dominik Röttsches

See `third_party/blink/renderer/core/xml/build.gni`.

File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.h

Line 143, Patchset 19: bool is_xhtml_document_ = false;

David Baron . resolved

currently unused, but maybe should be used?

Dominik Röttsches

See above, controls entity support, state kept in Rust parser. Removed.

Line 139, Patchset 19: bool parsing_fragment_;

David Baron . resolved

```suggestion
  const bool parsing_fragment_;
```

Dominik Röttsches

Done

Line 118, Patchset 19:

  void DoWrite(const String&);
  void DoEnd();

David Baron . resolved

Both of these are also unimplemented.

Dominik Röttsches

Removed.

Line 106, Patchset 19: bool AppendFragmentSource(const String&);

David Baron . resolved

Also unimplemented.

Dominik Röttsches

Removed.

Line 60, Patchset 19: void SetScriptStartPosition(TextPosition);

David Baron . resolved

Also unimplemented.

Dominik Röttsches

Removed.

Line 58, Patchset 19: static bool SupportsXMLVersion(const String&);

David Baron . resolved

Also unimplemented.

Dominik Röttsches

Removed.

Line 47, Patchset 19: static bool ParseDocumentFragment(const String&,


                                    DocumentFragment*,
                                    Element* parent,
                                    ParserContentPolicy,
                                    ExceptionState&);

David Baron . resolved

Also unimplemented.

Dominik Röttsches

Curiously the fact that we miss this implementation here seem to not have been hit by any tests so far. Either `DocumentFragment::ParseXML` is not exercised by any tests or might be unused.

Added a comment with a TODO to investigate that.

Line 45, Patchset 19: bool IsCurrentlyParsing8BitChunk();

David Baron . resolved

Also unimplemented.

Dominik Röttsches

Sorry it fell on you to spot these. Removed.

Line 42, Patchset 19:

  void SetIsXHTMLDocument(bool is_xhtml);
  bool IsXHTMLDocument() const;

David Baron . resolved

There's no implementation of these methods and they appear unused. (Why not inline like the current parser?) But maybe there should be?

Dominik Röttsches

Removed, to my knowledge this boolean only controlled whether extended XHTML entities should be supported. We decide this at the DocumentType stage and inform the Rust XML parser to include the extended entities. So the state of whether they should be supported is kept in the Rust side parser state.


// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

David Baron . unresolved

If the origin of this file is a copy of `xml_document_parser.h` (I'm not sure whether that's the case), I think it should probably preserve the original license header. Same for the `.cc`.

Dominik Röttsches

I was considering what to do and did not arrive at a clear conclusion. I guess it's arguable that there are still enough traces of the general skeleton of logic which would mean the original license header should be kept. On the other hand, on the implementation side, not much any original meaningful logic of the parser library integration is left.

Changed to the original header plus adding a 2025 The Chromium Authors line.

File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.cc

File-level comment, Patchset 19:

David Baron . unresolved

One other thought:
Since you need to test and branch based on the flag at multiple points, it seems like it might be good to:
add a bunch of `CHECK(RuntimeEnabledFeatures::XMLParsingRustEnabled())` in `XMLDocumentParserRs`, either at the public entry points or the places where it calls into the rust-based XML parser
add a bunch of `CHECK(!RuntimeEnabledFeatures::XMLParsingRustEnabled())` in `XMLDocumentParser`, either at the public entry points or the places where it calls into the libxml2 parser
just to make sure that you've gotten all of the entry points (and that we continue to do so until the flag is removed).
(I don't feel strongly about the exact approach (e.g., probably OK with `DCHECK()` instead of `CHECK()` if you feel it's justified) -- but we've certainly had cases in the past where we've found that a flag didn't enable all of what it was supposed to, and then fully enabling it took longer than expected -- so I think it would be good to be a bit more defensive here to ensure we don't do that here.)

Dominik Röttsches

This helped to find an issue with fragment parsing entry points, which I suggest to resolve in follow-up changes.

Added to public non-fragment constructor entry points, `parseAttributes(Rust)` and tried to add to `ParseDocumentFragment` and found that this still needs a Rust side impl.

IMO it makes sense to add this, also for further documenting how the usage / switching between the parsers is intended. However, adding the CHECKs itself is not necessarily a guarantee but shifts the problem to identifying the correct entry points.

There's the weird usage of `parseAttributes` for parsing the attributes of a XSLT PI.

Line 56, Patchset 19: return String::FromUTF8(static_cast<std::string>(str));

David Baron . resolved

Per the bullet points in https://google.github.io/styleguide/cppguide.html#Casting I think the preference is
```suggestion
  return String::FromUTF8(std::string(str));
```
assuming it works.  Though I'm wondering if there's a simpler or cheaper way.  At least maybe if you can go through `std::string_view` rather than `std::string` you can save a copy?  So maybe instead:
```suggestion
  return String::FromUTF8(base::RustStrToStringView(str));
```

Dominik Röttsches

Nice, good idea, thanks.

Line 59, Patchset 19:// TODO(https://crbug.com/441911594): Extracts name, public ID, and system ID


// from an XML/HTML DOCTYPE declaration.

David Baron . unresolved

This doesn't accept any doctype declarations that have an internal subset. That might be a problem. Maybe having a way of getting it out from the rust parser would be the best way to fix that.

Dominik Röttsches

Fully agree and that's what I intended to express with the comment in the next line. Rephrased comment to

```
// This is a workaround until we hopefully can add missing DOCTYPE
// parsing functionality in upstream.
```

Line 73, Patchset 19: "^<\\s*!DOCTYPE\\s+(\\w+)\\s+PUBLIC\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"\\s*>"

David Baron . resolved

Whitespace isn't allowed within `<!DOCTYPE`

Dominik Röttsches

Regex adjusted.

Line 71, Patchset 19: // 1. Regex for PUBLIC-style statement: Captures name, public id, system id.


  const char* kPublicRe =
      "^<\\s*!DOCTYPE\\s+(\\w+)\\s+PUBLIC\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"\\s*>"
      "$";

  // 2. Regex for SYSTEM or simple statement, captures name and system id.
  const char* kSystemRe =
      "^<\\s*!DOCTYPE\\s+(\\w+)(?:\\s+SYSTEM\\s+\"([^\"]*)\")?\\s*(?:\\[[^\\]]*"
      "\\])?\\s*>$";

David Baron . resolved

These strings would be more readable with C++ raw strings.

Dominik Röttsches

Good idea, transformed to raw strings with "regex" delimiter sequence.

Line 121, Patchset 19: {g_xmlns_with_colon, AtomicString(RustStrToWtfString(prefix))}));

David Baron . resolved

Atomizing a string you're about to concatenate seems unnecessary.
```suggestion
          {g_xmlns_with_colon, RustStrToWtfString(prefix)}));
```

(I realize that's a problem in the current parser too.)

Dominik Röttsches

Removed inner `AtomicString` construction.

Line 130, Patchset 19: Attribute(*parsed_name, AtomicString(RustStrToWtfString(uri))));

David Baron . resolved

```suggestion
        Attribute(std::move(*parsed_name), AtomicString(RustStrToWtfString(uri))));
```
(I realize that's from the existing code, which does it in `CollectElementAttributes` but not `HandleNamespaceAttributes`.)

Dominik Röttsches

Done and harmonized in analog `CollectElementAttributes` function.

Line 173, Patchset 19:static const unsigned kMaxXMLTreeDepth = 5000;

David Baron . resolved

Probably preserve the existing `// FIXME: HTMLConstructionSite has a limit of 512, should these match?`

Dominik Röttsches

Done

Line 186, Patchset 19: parsing_fragment_(false) {}

David Baron . resolved

preserve the code for the `WebFeature::kXMLDocument` use counter?

Dominik Röttsches

Yes, counter added.

Line 193, Patchset 19: script_runner_(nullptr),

David Baron . resolved

Preserve the `// Don't execute scripts for document fragments.` comment?

Dominik Röttsches

Ok.

Line 251, Patchset 19: xml_ffi::append_to_source(*read_state_,

David Baron . resolved

Do you know if it's ok that it does this even when `parser_paused_` is true? (I don't understand all the implications of the current code here...)

Dominik Röttsches

Yes, from what I know, this is ok. The architecture of the XML Rust parser is different. It is a pull parser where as `libxml2` seems to be a push model parser. Appending to the buffer here is an operation that does not drive parsing. Only `process_next_event` will trigger reads and consume the buffer.

Contrary to that, in the existing code there is a lot of complexity in keeping the parser from progressing when adding to the buffer. All the closure for `PendingCallbacks` need to be added there to complete received events later.

Line 302, Patchset 19: GetDocument()->setXMLVersion(RustStrToWtfString(version),

David Baron . resolved

Do you need to check SupportsXMLVersion here? (Though the existing code has a comment that disagrees with the existing code!)
In particular, it seems like if you don't the `ASSERT_NO_EXCEPTION` might assert, unless the rust parser's internals have *already* checked the XML version.

Dominik Röttsches

Carried over logic from `XMLDocumentParser` to set a version only if it's 1.0 and ignore it otherwise. Copied the odd comment for posterity as well and added link to explaining Rust XML parser behavior.

Line 323, Patchset 19: DummyExceptionStateForTesting exception_state;

David Baron . unresolved

This should probably use `ASSERT_NO_EXCEPTION`. The only reasons `Document::createProcessingInstruction` can throw are reasons that the rust XML parser *should* have already given errors on, I think. (I realize this is an issue in the existing code. If you don't want to make this change, at least preserve the existing code's not-quite-TODO Comment about it.)

Dominik Röttsches

This should probably use ASSERT_NO_EXCEPTION. The only reasons Document::createProcessingInstruction can throw are reasons that the rust XML parser should have already given errors on, I think.

Is that so? In my understanding the Rust XML parser would not validate a processing instruction inner prose, whereas we expect it to be well-formed in the sense of matching an attribute key-value list pattern, which is validated in `createProcessingInstruction` which then calls `parseAttributes` I think.

Do we even need to add the TODO here? The Rust XML parser will not support XSLT, so the only thing where this practically goes wrong in a relevant way is for an `<?xml-stylesheet>` processing instruction with a garbage attribute list. Do we want to throw for that?

Added comment:
```
  // The only way this fails in a relevant way would be a processing instruction
  // inner prose that does not match the style of an attribute list for a
  // xml-stylesheet processing instruction. The result of that would be the
  // style is not applied. We likely do not want to throw an exception for that.
```

What do you think?

Line 336, Patchset 19: if (pi->IsCSS()) {


    saw_css_ = true;
  } else {
    return;
  }

  CheckIfBlockingStyleSheetAdded();

David Baron . resolved

```suggestion
  if (pi->IsCSS()) {
    saw_css_ = true;
    CheckIfBlockingStyleSheetAdded();
  }
```

Dominik Röttsches

Certainly.

Line 356, Patchset 19: // TODO(https://crbug.com/441911594): Handle is attribute.


  // for (const auto& attr : prefixed_attributes) {
  //   if (attr.GetName() == html_names::kIsAttr) {
  //     is = attr.Value();
  //     break;
  //   }
  // }

David Baron . resolved

Any reason this wouldn't just work if you move it down to after `CollectElementAttributes` like it is in the existing code?

Dominik Röttsches

Activated this code, but I did not observe any test coverage so far. Modified the TODO to try and fix coverage.

File third_party/blink/renderer/core/xml/parser/xml_ffi.rs

Line 25, Patchset 19:#[allow(unused)]

Łukasz Anforowicz . resolved

Can you add a comment why `#[allow(unused)]` is needed + if this is expected to stay here forever or be fixed in the future?
PS. In case it helps, let me share that some guidance about unused-code-warnings have recently been added to https://chromium.googlesource.com/chromium/src/+/main/docs/rust/dev_experience_tips_and_tricks.md#suppressing-unused-code-warnings

Dominik Röttsches

Removed, I think this was a leftover from when I was defining the FFI but not yet using it in the C++ side.

Line 28, Patchset 19: saw_error: bool,

Łukasz Anforowicz . resolved

Is `self.saw_error` the same as `self.error_details.is_some()`?

Dominik Röttsches

Good point, using your suggestion.

Line 36, Patchset 19: let cursor = Cursor::new(Vec::new());

Łukasz Anforowicz . resolved

This LGTM, but I wanted to share some notes/thoughts:
`XmlEventReader::new_with_config` ([docs here](https://docs.rs/xml/latest/xml/reader/struct.EventReader.html#method.new_with_config)) only needs `R: Read` (and doesn't need `R: Seek` provided by `Cursor`). So in theory after reading some bytes, the prefix of the vector can be discarded rather than kept in memory. OTOH, in my cursory search I didn't find anything that already works for this.
Hmm... I see that `VecDeque<u8>` implements `Read + Write`. So maybe we can use `VecDeque<u8>` instead of `Cursor<Vec<u8>>`? Fine to attempt this in a separate CL, because it may be a bit annoying to switch to `as_slices` instead of a single `as_slice`.

Dominik Röttsches

I think I originally used Cursor to be able to append to the buffer when reads had already started in an earlier version of the code: https://chromium-review.googlesource.com/c/chromium/src/+/6828846/21/third_party/blink/renderer/core/xml/parser/xml_ffi.rs#44 - but then modified the append logic to get to the underlying storage of `EventReader`, so indeed, Cursor is no longer needed.

I briefly looked into making this change in this change, but I think this would complicate the review. I'll try to incorporate this into a follow-up.

Line 105, Patchset 19: let standalone = standalone


                        .map(|is_standalone| if is_standalone { 1 } else { 0 })
                        .unwrap_or_else(|| {
                            if xml_declaration_view.is_some() {
                                return -2; // <?xml instruction present, but
                                           // standalone not specified
                            }
                            -1 // No XML declaration at all
                        });

Łukasz Anforowicz . resolved

Would it be worth adding a comment saying where those constants come from? I assume they correspond to some constants on the C++ side?

Dominik Röttsches

Fixed with migration to shared enum.

Line 151, Patchset 19:

                        &mut attributes,
                        &mut new_namespaces,

Łukasz Anforowicz . unresolved

It seems that `&mut` here is unneeded, because the values will be discarded at the end of the scope. So I thought that I'd ask if this can be a TODO to change C++ callback declaration to take a const-ref instead?

Dominik Röttsches

I might be missing something here, but doesn't the `AttributesIterator`, which is
```
pub struct AttributesIterator<'a> {
    attributes: std::slice::Iter<'a, OwnedAttribute>,
}
```
have a mutable inner state? And fn `attributes_next<'a>(attributes: &mut AttributesIterator<'a>,` needs the argument to be mutable to be able to start iterating over them?

Line 191, Patchset 19:fn get_last_event_position(read_state: &XmlReadState, row: &mut u64, col: &mut u64) -> bool {

Łukasz Anforowicz . resolved

nit: Maybe name this `try_get_last_event_position` to indicate it might fail?

Dominik Röttsches

Renamed.

Line 207, Patchset 19:fn get_error_details(

Łukasz Anforowicz . resolved

nit: `try_get_error_details`?

Dominik Röttsches

Yes, renamed.

Line 219, Patchset 19: .map_or(error_string.clone(), |pos| error_string[pos + 1..].to_string());

Łukasz Anforowicz . resolved

nitty nit: IIUC this `clone` will execute unconditionally, even when the other branch/lamda will be taken. Can this be avoided?

Dominik Röttsches

Moved to `map_or_else`.

Line 221, Patchset 19: *col = error_details.position().column as u64;

Łukasz Anforowicz . resolved

Why is `as u64` needed here? I assume that this is [`TextPosition::column`](https://docs.rs/xml/latest/xml/common/struct.TextPosition.html#method.column) which already returns a `u64`?

Dominik Röttsches

Removed, not sure why I had left that in there. I might have changed the FFI type at some point.

Line 294, Patchset 19: Ok(StartElement { name: _, attributes, namespace: _ }) => {

Łukasz Anforowicz . resolved

```suggestion
            Ok(StartElement { attributes, .. }) => {
```

Dominik Röttsches

Done

Line 296, Patchset 19: let mut ret_attributes: Vec<AttributeNameValue> = Vec::new();

Łukasz Anforowicz . resolved

```suggestion
                let mut ret_attributes: Vec<AttributeNameValue> = Vec::with_capacity(attributes.len());
```

Dominik Röttsches

Good idea, thanks. Done.

Line 304, Patchset 19: AttributeNameValue { q_name: q_name, value: attribute.value.clone() };

Łukasz Anforowicz . resolved

```suggestion
                        AttributeNameValue { q_name, value: attribute.value.clone() };
```

Dominik Röttsches

Done

Line 319, Patchset 19:#[allow(unused_unsafe)]

Łukasz Anforowicz . resolved

Is this needed? Does that indicate a bug in `cxx`?

Dominik Röttsches

Removed, see above, likely a leftover.

Line 371, Patchset 19: fn saw_error(read_state: &XmlReadState) -> bool;

Łukasz Anforowicz . unresolved

LGTM, but I wonder if `has_error` would also be a good name? Sorry for bike-shedding :-/.

Dominik Röttsches

The name is consistent with the `saw_error_` that was ported from the original `XmlDocumentParser` code. Not that I am strongly attached to that, but IMHO it would look worse if we had both, `saw_error` and `has_error`.

File third_party/blink/renderer/core/xml/parser/xml_ffi_callbacks.h

Line 15, Patchset 19:enum StandaloneInfo {


  kStandaloneUnspecified = -2,
  kNoXmlDeclaration,
  kStandaloneNo,
  kStandaloneYes
};

Łukasz Anforowicz . resolved

Can that be defined as [a shared enum](https://cxx.rs/shared.html#shared-structs-and-enums) in `#[cxx::bridge]`? This would avoid having to duplicate the numbers in `.rs` and in `.h` here.

Dominik Röttsches

Good suggestion, I wasn't aware this existing in CXX. Moved to a shared enum. I had one challenge in the callbacks header file, but looks like the enum can be forward declared there.

File third_party/blink/web_tests/virtual/rust-xml/README.md

Line 1, Patchset 19:Contains test files for the rust-xml virtual test suite which runs XML parsing through the Rust XML crate.

Łukasz Anforowicz . resolved

Please wrap to 80 columns?
Can you consider mentioning something like?:
```
For more details see go/chrome_x_mitigation and/or https://crbug.com/441911594.
```

Dominik Röttsches

Wrapped and references added.

Open in Gerrit

Related details

Attention is currently required from:

David Baron
Kent Tamura
Łukasz Anforowicz

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
No-Unresolved-Comments
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Łukasz Anforowicz (Gerrit)

unread,

Oct 30, 2025, 11:57:59 AM (yesterday) Oct 30

to Dominik Röttsches, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org

Attention needed from David Baron, Dominik Röttsches and Kent Tamura

Łukasz Anforowicz voted and added 9 comments

Votes added by Łukasz Anforowicz

Code-Review

+1

9 comments

Patchset-level comments

File-level comment, Patchset 22 (Latest):

Łukasz Anforowicz . resolved

`third_party/blink/renderer/core/xml/parser/xml_ffi.rs` and `third_party/blink/renderer/core/xml/parser/entities.rs` LGTM (I shared a few additional minor feedback items below - PTAL).

File third_party/blink/renderer/core/xml/parser/xml_ffi.rs

Line 25, Patchset 22 (Latest):pub struct XmlReadState<'a> {

Łukasz Anforowicz . unresolved

I think this `pub` may be removed - a child `ffi` module should be able to see this `struct` without `pub` (same for `AttributesIterator` and `NamespacesIterator`).

Line 83, Patchset 22 (Latest):

                    let version: &str = match version {
                        XmlVersion::Version10 => "1.0",
                        XmlVersion::Version11 => "1.1",
                    };

Łukasz Anforowicz . unresolved

Maybe in the future this can be replaced with `version.as_str()` - see https://github.com/kornelski/xml-rs/pull/54

Not sure if it's worth leaving a TODO for this. Up to you.

Line 151, Patchset 19:

                        &mut attributes,
                        &mut new_namespaces,

Łukasz Anforowicz . resolved

It seems that `&mut` here is unneeded, because the values will be discarded at the end of the scope. So I thought that I'd ask if this can be a TODO to change C++ callback declaration to take a const-ref instead?

Dominik Röttsches

I might be missing something here, but doesn't the `AttributesIterator`, which is
```
pub struct AttributesIterator<'a> {
    attributes: std::slice::Iter<'a, OwnedAttribute>,
}
```
have a mutable inner state? And fn `attributes_next<'a>(attributes: &mut AttributesIterator<'a>,` needs the argument to be mutable to be able to start iterating over them?

Łukasz Anforowicz

You're right - I missed this. Thank you for explaining.

Line 204, Patchset 22 (Latest): let _ = read_state.event_reader.add_entities(entities::HTML5_MAP.iter().copied());

Łukasz Anforowicz . unresolved

I wonder if it would be a bit more readable if we used 1) `unwrap()` + a comment, or 2) `expect("Built-in entities shouldn't cause errors")`.

Line 320, Patchset 22 (Latest):pub mod ffi {

Łukasz Anforowicz . unresolved

Is this `pub` needed?

Line 329, Patchset 22 (Latest): pub struct AttributeNameValue {

Łukasz Anforowicz . unresolved

Is this `pub` needed?

Line 371, Patchset 19: fn saw_error(read_state: &XmlReadState) -> bool;

Łukasz Anforowicz . resolved

LGTM, but I wonder if `has_error` would also be a good name? Sorry for bike-shedding :-/.

Dominik Röttsches

The name is consistent with the `saw_error_` that was ported from the original `XmlDocumentParser` code. Not that I am strongly attached to that, but IMHO it would look worse if we had both, `saw_error` and `has_error`.

Łukasz Anforowicz

Matching existing C++ names SGTM.

File third_party/blink/renderer/core/xml/parser/xml_ffi_callbacks.h

Line 15, Patchset 19:

enum StandaloneInfo {
  kStandaloneUnspecified = -2,
  kNoXmlDeclaration,
  kStandaloneNo,
  kStandaloneYes
};

Łukasz Anforowicz . resolved

Can that be defined as [a shared enum](https://cxx.rs/shared.html#shared-structs-and-enums) in `#[cxx::bridge]`? This would avoid having to duplicate the numbers in `.rs` and in `.h` here.

Dominik Röttsches

Good suggestion, I wasn't aware this existing in CXX. Moved to a shared enum. I had one challenge in the callbacks header file, but looks like the enum can be forward declared there.

Łukasz Anforowicz

Ugh - yeah, the risk of circular dependencies around C++/Rust FFI is quite annoying at times. Although I guess that sometimes it forces a cleaner layering. I am glad you've been able to find a way to make this work.

Open in Gerrit

Related details

Attention is currently required from:

David Baron
Dominik Röttsches
Kent Tamura

Submit Requirements:

Code-Coverage
Code-Owners

Code-Review

No-Unresolved-Comments

Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Kent Tamura (Gerrit)

unread,

Oct 30, 2025, 8:16:49 PM (20 hours ago) Oct 30

to Dominik Röttsches, Łukasz Anforowicz, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org

Attention needed from David Baron and Dominik Röttsches

Kent Tamura added 4 comments

File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.cc

Line 169, Patchset 22 (Latest): prefix.empty() ? AtomicString(RustStrToWtfString(local_name))

Kent Tamura . unresolved

nit: We might want to add `RustStrToAtomicString()`.

File third_party/blink/web_tests/NeverFixTests

Line 1856, Patchset 22 (Latest):

Kent Tamura . unresolved

nit: looks an unnecessary change

Line 1858, Patchset 15:# XSLT tests disabled with Rust XML parser, no support for XSLT

Kent Tamura . resolved

Does it mean we can't ship the Rust XML parser until we remove XSLT support completely?

Dominik Röttsches

I am investigating transition options: We could use it right away for SVG files or all non-XSLT cases if we are willing to run a bit of a scan up front. We could also potentially run it as an internal dogfood right away.

Kent Tamura

Expanding coverage incrementally sounds a very good approach.

File third_party/blink/web_tests/VirtualTestSuites

Line 5983, Patchset 15: "bases": [

Kent Tamura . resolved

This adds 1200+ tests, and it's too large for a single virtual test.
Can we reduce the number of tests to less than 200, or set up a dedicated build bot?

Dominik Röttsches

Changed to a smaller set of 177 files as a basic sanity check. Will look into a separate build bot. Let me know if you have any pointers.

Dominik Röttsches

I found information on how to set up a separate trybot, but I would also like to explore starting to use the Rust XML parser for parts of real world workloads - which would make it part of mainline testing. I'll get back on that.

Kent Tamura

Ack.

Open in Gerrit

Related details

Attention is currently required from:

David Baron
Dominik Röttsches

satisfied_requirement

unsatisfied_requirement

open

diffy

Dominik Röttsches (Gerrit)

unread,

8:22 AM (8 hours ago) 8:22 AM

to Łukasz Anforowicz, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org

Attention needed from David Baron, Kent Tamura and Łukasz Anforowicz

Dominik Röttsches added 8 comments

Patchset-level comments

File-level comment, Patchset 26 (Latest):

Dominik Röttsches . resolved

@luk...@chromium.org Could you renew your +1?

File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.cc

Line 169, Patchset 22: prefix.empty() ? AtomicString(RustStrToWtfString(local_name))

Kent Tamura . resolved

nit: We might want to add `RustStrToAtomicString()`.

Dominik Röttsches

Added.

File third_party/blink/renderer/core/xml/parser/xml_ffi.rs

Line 25, Patchset 22:pub struct XmlReadState<'a> {

Łukasz Anforowicz . resolved

I think this `pub` may be removed - a child `ffi` module should be able to see this `struct` without `pub` (same for `AttributesIterator` and `NamespacesIterator`).

Dominik Röttsches

Done

Line 83, Patchset 22: let version: &str = match version {


                        XmlVersion::Version10 => "1.0",
                        XmlVersion::Version11 => "1.1",
                    };

Łukasz Anforowicz . resolved

Maybe in the future this can be replaced with `version.as_str()` - see https://github.com/kornelski/xml-rs/pull/54
Not sure if it's worth leaving a TODO for this. Up to you.

Dominik Röttsches

TODO added.

Line 204, Patchset 22: let _ = read_state.event_reader.add_entities(entities::HTML5_MAP.iter().copied());

Łukasz Anforowicz . resolved

I wonder if it would be a bit more readable if we used 1) `unwrap()` + a comment, or 2) `expect("Built-in entities shouldn't cause errors")`.

Dominik Röttsches

Shortened this with an iteration of the entity maps and added `.expect()`s.

Line 320, Patchset 22:pub mod ffi {

Łukasz Anforowicz . resolved

Is this `pub` needed?

Dominik Röttsches

Removed.

Line 329, Patchset 22: pub struct AttributeNameValue {

Łukasz Anforowicz . resolved

Is this `pub` needed?

Dominik Röttsches

Removed.

File third_party/blink/web_tests/NeverFixTests

Line 1856, Patchset 22:

Kent Tamura . resolved

nit: looks an unnecessary change

Dominik Röttsches

Done

Open in Gerrit

Related details

Attention is currently required from:

David Baron
Kent Tamura
Łukasz Anforowicz

Submit Requirements:

Code-Coverage
Code-Owners

Code-Review

No-Unresolved-Comments

Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Dominik Röttsches (Gerrit)

unread,

9:37 AM (7 hours ago) 9:37 AM

to Łukasz Anforowicz, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org

Attention needed from David Baron, Kent Tamura and Łukasz Anforowicz

Dominik Röttsches voted and added 1 comment

Votes added by Dominik Röttsches

Commit-Queue

+1

1 comment

File third_party/blink/renderer/core/xml/parser/xml_ffi.rs

Line 36, Patchset 19: let cursor = Cursor::new(Vec::new());

Łukasz Anforowicz . unresolved

This LGTM, but I wanted to share some notes/thoughts:
`XmlEventReader::new_with_config` ([docs here](https://docs.rs/xml/latest/xml/reader/struct.EventReader.html#method.new_with_config)) only needs `R: Read` (and doesn't need `R: Seek` provided by `Cursor`). So in theory after reading some bytes, the prefix of the vector can be discarded rather than kept in memory. OTOH, in my cursory search I didn't find anything that already works for this.
Hmm... I see that `VecDeque<u8>` implements `Read + Write`. So maybe we can use `VecDeque<u8>` instead of `Cursor<Vec<u8>>`? Fine to attempt this in a separate CL, because it may be a bit annoying to switch to `as_slices` instead of a single `as_slice`.

Dominik Röttsches

I think I originally used Cursor to be able to append to the buffer when reads had already started in an earlier version of the code: https://chromium-review.googlesource.com/c/chromium/src/+/6828846/21/third_party/blink/renderer/core/xml/parser/xml_ffi.rs#44 - but then modified the append logic to get to the underlying storage of `EventReader`, so indeed, Cursor is no longer needed.
I briefly looked into making this change in this change, but I think this would complicate the review. I'll try to incorporate this into a follow-up.

Dominik Röttsches

@luk...@chromium.org, I looked into this a bit more, but currently I am concerned the code for the `StartDocument` event would not work anymore, as there we do access the buffer at the event position, to find out whether there is actually an encoding in the source text (due to a shortcoming in the XML crate API).

If the read() calls on the `VecDeque` do consume the data - I don't think it would be possible to access that data anymore in this way.

satisfied_requirement

unsatisfied_requirement

open

diffy

Łukasz Anforowicz (Gerrit)

unread,

11:35 AM (5 hours ago) 11:35 AM

to Dominik Röttsches, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org

Attention needed from David Baron, Dominik Röttsches and Kent Tamura

Łukasz Anforowicz added 1 comment

File third_party/blink/renderer/core/xml/parser/xml_ffi.rs

Line 36, Patchset 19: let cursor = Cursor::new(Vec::new());

Łukasz Anforowicz . unresolved

This LGTM, but I wanted to share some notes/thoughts:
`XmlEventReader::new_with_config` ([docs here](https://docs.rs/xml/latest/xml/reader/struct.EventReader.html#method.new_with_config)) only needs `R: Read` (and doesn't need `R: Seek` provided by `Cursor`). So in theory after reading some bytes, the prefix of the vector can be discarded rather than kept in memory. OTOH, in my cursory search I didn't find anything that already works for this.
Hmm... I see that `VecDeque<u8>` implements `Read + Write`. So maybe we can use `VecDeque<u8>` instead of `Cursor<Vec<u8>>`? Fine to attempt this in a separate CL, because it may be a bit annoying to switch to `as_slices` instead of a single `as_slice`.

Dominik Röttsches

I think I originally used Cursor to be able to append to the buffer when reads had already started in an earlier version of the code: https://chromium-review.googlesource.com/c/chromium/src/+/6828846/21/third_party/blink/renderer/core/xml/parser/xml_ffi.rs#44 - but then modified the append logic to get to the underlying storage of `EventReader`, so indeed, Cursor is no longer needed.
I briefly looked into making this change in this change, but I think this would complicate the review. I'll try to incorporate this into a follow-up.

Dominik Röttsches

@luk...@chromium.org, I looked into this a bit more, but currently I am concerned the code for the `StartDocument` event would not work anymore, as there we do access the buffer at the event position, to find out whether there is actually an encoding in the source text (due to a shortcoming in the XML crate API).
If the read() calls on the `VecDeque` do consume the data - I don't think it would be possible to access that data anymore in this way.

Łukasz Anforowicz

Thanks for following up. Using `Vec` makes sense given your explanation. Maybe it's worth leaving a comment explaining why we can't use `VecDeque`?

Open in Gerrit

Related details

Attention is currently required from:

David Baron
Dominik Röttsches
Kent Tamura

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
No-Unresolved-Comments
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Dominik Röttsches (Gerrit)

unread,

11:48 AM (4 hours ago) 11:48 AM

to Łukasz Anforowicz, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org

Attention needed from David Baron, Kent Tamura and Łukasz Anforowicz

Dominik Röttsches added 1 comment

File third_party/blink/renderer/core/xml/parser/xml_ffi.rs

Line 36, Patchset 19: let cursor = Cursor::new(Vec::new());

Łukasz Anforowicz . resolved

This LGTM, but I wanted to share some notes/thoughts:
`XmlEventReader::new_with_config` ([docs here](https://docs.rs/xml/latest/xml/reader/struct.EventReader.html#method.new_with_config)) only needs `R: Read` (and doesn't need `R: Seek` provided by `Cursor`). So in theory after reading some bytes, the prefix of the vector can be discarded rather than kept in memory. OTOH, in my cursory search I didn't find anything that already works for this.
Hmm... I see that `VecDeque<u8>` implements `Read + Write`. So maybe we can use `VecDeque<u8>` instead of `Cursor<Vec<u8>>`? Fine to attempt this in a separate CL, because it may be a bit annoying to switch to `as_slices` instead of a single `as_slice`.

Dominik Röttsches

I think I originally used Cursor to be able to append to the buffer when reads had already started in an earlier version of the code: https://chromium-review.googlesource.com/c/chromium/src/+/6828846/21/third_party/blink/renderer/core/xml/parser/xml_ffi.rs#44 - but then modified the append logic to get to the underlying storage of `EventReader`, so indeed, Cursor is no longer needed.
I briefly looked into making this change in this change, but I think this would complicate the review. I'll try to incorporate this into a follow-up.

Dominik Röttsches

@luk...@chromium.org, I looked into this a bit more, but currently I am concerned the code for the `StartDocument` event would not work anymore, as there we do access the buffer at the event position, to find out whether there is actually an encoding in the source text (due to a shortcoming in the XML crate API).
If the read() calls on the `VecDeque` do consume the data - I don't think it would be possible to access that data anymore in this way.

Łukasz Anforowicz

Thanks for following up. Using `Vec` makes sense given your explanation. Maybe it's worth leaving a comment explaining why we can't use `VecDeque`?

Dominik Röttsches

Comment added, under the assumption you meant we keep `Vec` wrapped in `Cursor`.

Open in Gerrit

Related details

Attention is currently required from:

David Baron
Kent Tamura
Łukasz Anforowicz

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
No-Unresolved-Comments
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

David Baron (Gerrit)

unread,

11:57 AM (4 hours ago) 11:57 AM

to Dominik Röttsches, Łukasz Anforowicz, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org

Attention needed from Dominik Röttsches, Kent Tamura and Łukasz Anforowicz

David Baron added 12 comments

Patchset-level comments

File-level comment, Patchset 27 (Latest):

David Baron . resolved

OK, I think I've finally gotten through the whole thing. Here are my remaining comments.

File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.cc

Line 59, Patchset 19:

// TODO(https://crbug.com/441911594): Extracts name, public ID, and system ID
// from an XML/HTML DOCTYPE declaration.

David Baron . unresolved

This doesn't accept any doctype declarations that have an internal subset. That might be a problem. Maybe having a way of getting it out from the rust parser would be the best way to fix that.

Dominik Röttsches

Fully agree and that's what I intended to express with the comment in the next line. Rephrased comment to
```
// This is a workaround until we hopefully can add missing DOCTYPE
// parsing functionality in upstream.
```

David Baron

But maybe explicitly note that it doesn't handle internal subsets?

Line 323, Patchset 19: DummyExceptionStateForTesting exception_state;

David Baron . unresolved

This should probably use `ASSERT_NO_EXCEPTION`. The only reasons `Document::createProcessingInstruction` can throw are reasons that the rust XML parser *should* have already given errors on, I think. (I realize this is an issue in the existing code. If you don't want to make this change, at least preserve the existing code's not-quite-TODO Comment about it.)

Dominik Röttsches

This should probably use ASSERT_NO_EXCEPTION. The only reasons Document::createProcessingInstruction can throw are reasons that the rust XML parser should have already given errors on, I think.
Is that so? In my understanding the Rust XML parser would not validate a processing instruction inner prose, whereas we expect it to be well-formed in the sense of matching an attribute key-value list pattern, which is validated in `createProcessingInstruction` which then calls `parseAttributes` I think.
Do we even need to add the TODO here? The Rust XML parser will not support XSLT, so the only thing where this practically goes wrong in a relevant way is for an `<?xml-stylesheet>` processing instruction with a garbage attribute list. Do we want to throw for that?
Added comment:
```
  // The only way this fails in a relevant way would be a processing instruction
  // inner prose that does not match the style of an attribute list for a
  // xml-stylesheet processing instruction. The result of that would be the
  // style is not applied. We likely do not want to throw an exception for that.
```
What do you think?

David Baron

The only reason that `Document::createProcessingInstruction` should throw is if:

the `target` given is not a valid XML name
the `data` provided contains the string `?>`

I think the XML parser should have given an error for either of these.

Line 406, Patchset 26:

  bool is_first_element = !saw_first_element_;
  saw_first_element_ = true;

David Baron . unresolved

While it's fine given the current uses of this variable, maybe move this down to before `HandleNamespaceAttributes` like it is in the existing parser, just in case `saw_first_element_` ends up being used for something else outside this function?

Line 409, Patchset 26: AtomicString is;

David Baron . unresolved

Probably move this declaration down near the code that assigns it, like in the current parser?

Line 420, Patchset 26: // empty NS url, resolve it against the initially preserved namespace

David Baron . unresolved

So I think there's a problem here relative to the existing parser. The code in the existing parser distinguishes between a null string and an empty string, but this code doesn't.

That distinction is needed to maintain the difference between `<elem>` (which is a start tag for an element that is in the default namespace of its parent element) and `<elem xmlns="">` (which is the start tag for an element in no namespace that changes the default namespace to being no namespace). It seems like this code breaks that distinction in fragment parsing. (It wouldn't surprise me if there are no tests for this, but I think it is a real distinction.)

I'm not sure if it will be easier to fix by getting the null-versus-empty distinction out of the rust parser, or by setting up the namespaces inside of the rust parser for fragment parsing and dropping this fixup.

Line 459, Patchset 26: // TODO(https://crbug.com/441911594): Test coverage for this?

David Baron . unresolved

For what it's worth, I see a single test that I expect might exercise this code, and it's `third_party/blink/web_tests/external/wpt/custom-elements/parser/parser-uses-create-an-element-for-a-token-svg.svg`

(I didn't actually test this.)

Line 467, Patchset 26: QualifiedName q_name(

David Baron . unresolved

Two comments here, though I'm not sure whether either is a concrete problem:

1. In the existing parser there was some code to concatenate prefix and localname in certain cases and put it all back into the localname (but only if the adjusted_uri was empty and the prefix was not). I'm not sure why that code was there... but it's possible it's important?

2. I'm again a little worried about null-versus-empty distinctions here being different from the old parser and that distinction being important. Unlike the case above I don't have any specific examples in mind yet, though.

Line 493, Patchset 26:

  // TODO(https://crbug.com/441911594): Port custom element support from
  // XMLDocumentParser.

David Baron . unresolved

This might also be needed for the one SVG custom elements test mentioned above. (Though the code in the existing parser was also a little higher up in the function, which might matter.)

Line 588, Patchset 26: String cdata_string = RustStrToWtfString(data);

David Baron . unresolved

I think it's probably stylistically better here to *not* have the `cdata_string` variable but instead just write it in the function call (like you do in the `Comment` function just below).

This is because if `CDataSection::Create` had a `String&&` overload, it would be more efficient since it would use that one. (It doesn't right now, but it probably should!) Using `std::move(cdata_string)` would have the same effect, but with more verbosity.

(I write this partly because reading this code led me to check to see whether there was a `String&&` version, whereas if I'd seen what I'm suggesting I wouldn't have felt the need to check.)

Line 703, Patchset 26: Flush();

David Baron . unresolved

I can't figure out what `Flush()` this is calling. Where is it?

Line 847, Patchset 26: DCHECK(!IsDetached());

David Baron . unresolved

The old parser didn't do anything here when `parsing_fragment_` was true -- is there a need for that?

Open in Gerrit

Related details

Attention is currently required from:

Dominik Röttsches
Kent Tamura
Łukasz Anforowicz

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
No-Unresolved-Comments
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Reply all

Reply to author

Forward

Parse XML documents through Rust XML crate [chromium/src : main]

Dominik Röttsches (Gerrit)

New activity on the change

Related details

AI Code Reviewer (Gerrit)

AI Code Reviewer added 2 comments

Related details

Dominik Röttsches (Gerrit)

Dominik Röttsches added 2 comments

David Baron (Gerrit)

David Baron added 11 comments

Related details

Reilly Grant (Gerrit)

Reilly Grant added 1 comment

Related details

Kent Tamura (Gerrit)

Kent Tamura added 10 comments

Related details

Dominik Röttsches (Gerrit)

Dominik Röttsches added 22 comments

Related details

Dominik Röttsches (Gerrit)

Dominik Röttsches added 1 comment

Dominik Röttsches (Gerrit)

Dominik Röttsches added 1 comment

Related details

Łukasz Anforowicz (Gerrit)

Łukasz Anforowicz added 19 comments

Related details

David Baron (Gerrit)

David Baron added 26 comments

Dominik Röttsches (Gerrit)

Dominik Röttsches voted and added 45 comments

Votes added by Dominik Röttsches

45 comments

Related details

Łukasz Anforowicz (Gerrit)

Łukasz Anforowicz voted and added 9 comments

Votes added by Łukasz Anforowicz

9 comments

Related details

Kent Tamura (Gerrit)

Kent Tamura added 4 comments

Related details

Dominik Röttsches (Gerrit)

Dominik Röttsches added 8 comments

Related details

Dominik Röttsches (Gerrit)

Dominik Röttsches voted and added 1 comment

Votes added by Dominik Röttsches

1 comment

Łukasz Anforowicz (Gerrit)

Łukasz Anforowicz added 1 comment

Related details

Dominik Röttsches (Gerrit)

Dominik Röttsches added 1 comment

Related details

David Baron (Gerrit)

David Baron added 12 comments

Related details