Parse XML documents through Rust XML crate [chromium/src : main]

3 views
Skip to first unread message

Dominik Röttsches (Gerrit)

unread,
Oct 28, 2025, 12:57:48 PM (3 days ago) Oct 28
to David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org
Attention needed from David Baron

New activity on the change

Open in Gerrit

Related details

Attention is currently required from:
  • David Baron
Submit Requirements:
  • requirement satisfiedCode-Coverage
  • requirement is not satisfiedCode-Owners
  • requirement is not satisfiedCode-Review
  • requirement is not satisfiedReview-Enforcement
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
Gerrit-MessageType: comment
Gerrit-Project: chromium/src
Gerrit-Branch: main
Gerrit-Change-Id: If794513791d91cbfa06927dad38cbdf90b7935b1
Gerrit-Change-Number: 7088396
Gerrit-PatchSet: 15
Gerrit-Owner: Dominik Röttsches <dr...@chromium.org>
Gerrit-Reviewer: David Baron <dba...@chromium.org>
Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
Gerrit-CC: Daniel Cheng <dch...@chromium.org>
Gerrit-CC: Mason Freed <mas...@chromium.org>
Gerrit-Attention: David Baron <dba...@chromium.org>
Gerrit-Comment-Date: Tue, 28 Oct 2025 16:57:30 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: No
satisfied_requirement
unsatisfied_requirement
open
diffy

AI Code Reviewer (Gerrit)

unread,
Oct 28, 2025, 12:59:37 PM (3 days ago) Oct 28
to Dominik Röttsches, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org
Attention needed from David Baron

AI Code Reviewer added 2 comments

File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.h
Line 127, Patchset 15 (Latest): bool saw_error_ = false;
AI Code Reviewer . unresolved

nit: Blink Style Guide: Naming - Precede boolean values with words like “is” and “did”. Please consider renaming `saw_error_` to `did_see_error_`.

To keep this interaction as brief and non-intrusive as possible, please consider responding with one of following options:
**Done** | **Won't fix**: reason | **b/<bug_id>** | **Invalid:** reason

This comment was generated by [Experimental Blink C++ Code Review Agent](http://go/blink-c++-code-review-agent).AI reviews can sometimes be inaccurate; We appreciate your 🙏 feedback 🙏 to help us improve.[File a bug](http://go/blink-c++-code-review-agent-feedback) | [Provide feedback on chat](https://chat.google.com/room/AAQA0zhQHe0?cls=4) | [Opt-out](https://ganpati2.corp.google.com/group/peep-genai-blink-agent-optout.prod)_

Line 103, Patchset 15 (Latest): void end();
AI Code Reviewer . unresolved

Blink Style Guide: Naming - Use 'CamelCase' for all function names. All function names should use `CamelCase()`-style names, beginning with an uppercase letter. Please consider renaming `end` to `End`.

To keep this interaction as brief and non-intrusive as possible, please consider responding with one of following options:
**Done** | **Won't fix**: reason | **b/<bug_id>** | **Invalid:** reason

This comment was generated by [Experimental Blink C++ Code Review Agent](http://go/blink-c++-code-review-agent).AI reviews can sometimes be inaccurate; We appreciate your 🙏 feedback 🙏 to help us improve.[File a bug](http://go/blink-c++-code-review-agent-feedback) | [Provide feedback on chat](https://chat.google.com/room/AAQA0zhQHe0?cls=4) | [Opt-out](https://ganpati2.corp.google.com/group/peep-genai-blink-agent-optout.prod)_

Open in Gerrit

Related details

Attention is currently required from:
  • David Baron
Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Owners
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    • requirement is not satisfiedReview-Enforcement
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: If794513791d91cbfa06927dad38cbdf90b7935b1
    Gerrit-Change-Number: 7088396
    Gerrit-PatchSet: 15
    Gerrit-Owner: Dominik Röttsches <dr...@chromium.org>
    Gerrit-Reviewer: David Baron <dba...@chromium.org>
    Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
    Gerrit-CC: AI Code Reviewer <peep-gen...@system.gserviceaccount.com>
    Gerrit-CC: Daniel Cheng <dch...@chromium.org>
    Gerrit-CC: Mason Freed <mas...@chromium.org>
    Gerrit-Attention: David Baron <dba...@chromium.org>
    Gerrit-Comment-Date: Tue, 28 Oct 2025 16:59:33 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Dominik Röttsches (Gerrit)

    unread,
    Oct 28, 2025, 1:03:26 PM (3 days ago) Oct 28
    to AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org
    Attention needed from David Baron

    Dominik Röttsches added 2 comments

    Patchset-level comments
    File-level comment, Patchset 15 (Latest):
    Dominik Röttsches . resolved

    reillig@, PTAL at `omnibox_api_interactive_test.cc` & `web_navigation_apitest.cc`
    tkent@, PTAL at `audit_non_blink_usage.py`

    File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.h
    Line 127, Patchset 15 (Latest): bool saw_error_ = false;
    AI Code Reviewer . resolved

    nit: Blink Style Guide: Naming - Precede boolean values with words like “is” and “did”. Please consider renaming `saw_error_` to `did_see_error_`.

    To keep this interaction as brief and non-intrusive as possible, please consider responding with one of following options:
    **Done** | **Won't fix**: reason | **b/<bug_id>** | **Invalid:** reason

    This comment was generated by [Experimental Blink C++ Code Review Agent](http://go/blink-c++-code-review-agent).AI reviews can sometimes be inaccurate; We appreciate your 🙏 feedback 🙏 to help us improve.[File a bug](http://go/blink-c++-code-review-agent-feedback) | [Provide feedback on chat](https://chat.google.com/room/AAQA0zhQHe0?cls=4) | [Opt-out](https://ganpati2.corp.google.com/group/peep-genai-blink-agent-optout.prod)_

    Dominik Röttsches

    Won't fix: Keep consistent with `XMLDocumentParser`.

    Gerrit-Comment-Date: Tue, 28 Oct 2025 17:03:09 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: AI Code Reviewer <peep-gen...@system.gserviceaccount.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    David Baron (Gerrit)

    unread,
    Oct 28, 2025, 3:30:58 PM (3 days ago) Oct 28
    to Dominik Röttsches, Reilly Grant, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org
    Attention needed from David Baron, Dominik Röttsches, Kent Tamura and Reilly Grant

    David Baron added 11 comments

    Patchset-level comments
    David Baron . unresolved

    Here's an initial set of comments on a few parts of the CL. I haven't really gotten to the core of it yet, but at this point I have at least a rough idea of what's in this CL.

    I think this is going to be somewhat slow because reviewing this will involve learning a bunch of new things about Rust. I do have some familiarity with Rust (though somewhat limited), but I haven't worked on any FFI code before. I'm willing to try to get through it, but it's going to be a bit slow because this is pretty large. I also know enough to know that I probably shouldn't be reviewing any `unsafe` rust, so probably someone qualified to do that should be reviewing at least the `unsafe` parts.

    I also wouldn't be offended if you prefer to find a different reviewer given that.

    Commit Message
    Line 9, Patchset 15 (Latest):Introduce an XMLDocumentParserRs, which is instantiated in Document
    David Baron . resolved

    This feels a little bit like it's undoing https://chromium.googlesource.com/chromium/src/+/c99c39aed8f0f6007a7a5ce8565d91e353ed332b%5E%21/ except I don't think it is because that (and followup refactorings) were removing a permanent mechanism for having different XML libraries, whereas this is intended to be temporary so it makes sense for it to be structured differently (e.g., via flags).

    Line 32, Patchset 15 (Latest):Bug: chromium:435623334
    David Baron . unresolved

    Should this really be going to the XSLT bug, or is there a more appropriate bug? (Also, no need for `chromium:`.)

    File chrome/browser/extensions/api/omnibox/omnibox_api_interactive_test.cc
    David Baron . unresolved

    Is it worth considering holding off on these two changes in `chrome/`? It seems like you don't *need* to land them at this point given that the feature isn't enabled by default, and it might be easier to track the set of things that are different if you don't land them.

    (I'm OK either way, but figured it's worth mentioning. I think maybe it depends on whether you're likely to try to fix the differences along the way or just want to accept them.)

    File third_party/blink/renderer/core/BUILD.gn
    David Baron . unresolved

    If you consider yourself an expert on rust `BUILD.gn` stuff at this point, then this is fine... but if not I'd appreciate if you could ask someone who you do consider to be an expert in that to review this file.

    File third_party/blink/renderer/core/frame/document_loading_rendering_test.cc
    David Baron . unresolved

    I guess this is OK if you're reasonably confident we're going to be able to live with this behavior change, but otherwise I might defer making this sort of test change until you are or until you feel close to enabling the flag (or maybe you do now?).

    File third_party/blink/renderer/core/xml/parser/entities.rs
    David Baron . unresolved

    I didn't review the correctness of the contents of this file -- I'm assuming you got it from a reliable source.

    Line 5, Patchset 15 (Latest):// https://www.w3.org/MarkUp/DTD/xhtml-lat1.ent
    David Baron . unresolved

    I'm not sure whether this URL or https://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent is a better URL to reference. The files are formatted differently but hopefully have identical contents...

    (Same for the other 2 parts.)

    At least they're all the same other than comments and the order of the first few entries of the special map.

    Line 235, Patchset 15 (Latest):// https://www.w3.org/MarkUp/DTD/xhtml-special.ent
    David Baron . unresolved

    Note that this excludes `&lt` and `&amp` which are in that file, though not the other three entities in https://www.w3.org/TR/REC-xml/#sec-predefined-ent . Might be worth revising the comment but it's a little inconsistent with the comment right now.

    File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.h
    Line 31, Patchset 15 (Latest):class XMLDocumentParserRs final : public ScriptableDocumentParser,
    David Baron . unresolved

    Is it worth considering using `Rust` in the name rather than `Rs`? It seems like it might be a little clearer.

    File third_party/blink/web_tests/platform/linux/virtual/rust-xml/fast/parser/external-entities-expected.txt
    David Baron . unresolved

    I think it makes sense to move the three *text* expectations (but not the image expectation) to not have `platform/linux/` in the path.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • David Baron
    • Dominik Röttsches
    • Kent Tamura
    • Reilly Grant
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Owners
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    • requirement is not satisfiedReview-Enforcement
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: If794513791d91cbfa06927dad38cbdf90b7935b1
    Gerrit-Change-Number: 7088396
    Gerrit-PatchSet: 15
    Gerrit-Owner: Dominik Röttsches <dr...@chromium.org>
    Gerrit-Reviewer: David Baron <dba...@chromium.org>
    Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
    Gerrit-Reviewer: Kent Tamura <tk...@chromium.org>
    Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
    Gerrit-CC: AI Code Reviewer <peep-gen...@system.gserviceaccount.com>
    Gerrit-CC: Daniel Cheng <dch...@chromium.org>
    Gerrit-CC: Mason Freed <mas...@chromium.org>
    Gerrit-Attention: Dominik Röttsches <dr...@chromium.org>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: Kent Tamura <tk...@chromium.org>
    Gerrit-Attention: David Baron <dba...@chromium.org>
    Gerrit-Comment-Date: Tue, 28 Oct 2025 19:30:54 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Reilly Grant (Gerrit)

    unread,
    Oct 28, 2025, 4:56:22 PM (3 days ago) Oct 28
    to Dominik Röttsches, Reilly Grant, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org
    Attention needed from David Baron, Dominik Röttsches and Kent Tamura

    Reilly Grant added 1 comment

    File chrome/browser/extensions/api/omnibox/omnibox_api_interactive_test.cc
    David Baron . unresolved

    Is it worth considering holding off on these two changes in `chrome/`? It seems like you don't *need* to land them at this point given that the feature isn't enabled by default, and it might be easier to track the set of things that are different if you don't land them.

    (I'm OK either way, but figured it's worth mentioning. I think maybe it depends on whether you're likely to try to fix the differences along the way or just want to accept them.)

    Reilly Grant

    I'm okay landing these changes though I'm a little confused why parsing `<tag><match></match>` is throwing an "Unexpected closing tag" error when the problem is a _missing_ closing tag (or unexpected end of file).

    Open in Gerrit

    Related details

    Attention is currently required from:
    • David Baron
    • Dominik Röttsches
    • Kent Tamura
    Gerrit-Attention: Kent Tamura <tk...@chromium.org>
    Gerrit-Attention: David Baron <dba...@chromium.org>
    Gerrit-Comment-Date: Tue, 28 Oct 2025 20:56:12 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: David Baron <dba...@chromium.org>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Kent Tamura (Gerrit)

    unread,
    Oct 28, 2025, 10:46:35 PM (3 days ago) Oct 28
    to Dominik Röttsches, Kent Tamura, Reilly Grant, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org
    Attention needed from David Baron and Dominik Röttsches

    Kent Tamura added 10 comments

    File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.cc
    Line 34, Patchset 15 (Latest):static inline bool HasNoStyleInformation(Document* document) {
    Kent Tamura . unresolved

    `static` is unnecessary.

    Line 55, Patchset 15 (Latest):String rustStrToWtfString(rust::Str str) {
    Kent Tamura . unresolved

    The name should be `RustStrToWtfString`.

    Line 62, Patchset 15 (Latest):void extract_doctype_parts(const std::string& doctype_string,
    Kent Tamura . unresolved

    The name should be `ExtractDocTypeParts`.

    Line 241, Patchset 15 (Latest): GetDocument()->setXMLVersion(String(version), ASSERT_NO_EXCEPTION);
    Kent Tamura . unresolved

    Don't we use `rustStrToWtfString()`?

    Line 255, Patchset 15 (Latest):static inline bool HandleNamespaceAttributes(
    Kent Tamura . unresolved

    We prefer functions in anonymous namespace to `static` functions.

    Line 308, Patchset 15 (Latest):static inline bool CollectElementAttributes(
    Kent Tamura . unresolved

    Ditto.

    Line 335, Patchset 15 (Latest):static inline void SetAttributes(
    Kent Tamura . unresolved

    Ditto.

    File third_party/blink/renderer/core/xml/parser/xml_ffi_callbacks.h
    Line 17, Patchset 15 (Latest): kNoXMlDeclaration,
    Kent Tamura . unresolved

    `kNoXMl` -> `kNoXml`

    File third_party/blink/web_tests/NeverFixTests
    Line 1858, Patchset 15 (Latest):# XSLT tests disabled with Rust XML parser, no support for XSLT
    Kent Tamura . unresolved

    Does it mean we can't ship the Rust XML parser until we remove XSLT support completely?

    File third_party/blink/web_tests/VirtualTestSuites
    Line 5983, Patchset 15 (Latest): "bases": [
    Kent Tamura . unresolved

    This adds 1200+ tests, and it's too large for a single virtual test.
    Can we reduce the number of tests to less than 200, or set up a dedicated build bot?

    Open in Gerrit

    Related details

    Attention is currently required from:
    • David Baron
    • Dominik Röttsches
    Gerrit-Attention: David Baron <dba...@chromium.org>
    Gerrit-Comment-Date: Wed, 29 Oct 2025 02:46:10 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Dominik Röttsches (Gerrit)

    unread,
    Oct 29, 2025, 6:23:33 AM (2 days ago) Oct 29
    to Łukasz Anforowicz, Kent Tamura, Reilly Grant, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org
    Attention needed from David Baron, Kent Tamura, Reilly Grant and Łukasz Anforowicz

    Dominik Röttsches added 22 comments

    Patchset-level comments
    File-level comment, Patchset 15:
    David Baron . resolved

    Here's an initial set of comments on a few parts of the CL. I haven't really gotten to the core of it yet, but at this point I have at least a rough idea of what's in this CL.

    I think this is going to be somewhat slow because reviewing this will involve learning a bunch of new things about Rust. I do have some familiarity with Rust (though somewhat limited), but I haven't worked on any FFI code before. I'm willing to try to get through it, but it's going to be a bit slow because this is pretty large. I also know enough to know that I probably shouldn't be reviewing any `unsafe` rust, so probably someone qualified to do that should be reviewing at least the `unsafe` parts.

    I also wouldn't be offended if you prefer to find a different reviewer given that.

    Dominik Röttsches

    Thanks - I'll add Łukasz for the FFI parts.

    File-level comment, Patchset 17 (Latest):
    Dominik Röttsches . resolved

    @luk...@chromium.org Could you please take a look at the FFI parts towards the Rust XML library of this change including the `core/BUILD.gn` change?

    Others, thanks for the reviews so far. I addressed most comments, some feedback/clarifications below.

    Commit Message
    Line 32, Patchset 15:Bug: chromium:435623334
    David Baron . resolved

    Should this really be going to the XSLT bug, or is there a more appropriate bug? (Also, no need for `chromium:`.)

    Dominik Röttsches

    Yes, this should be 441911594.

    File chrome/browser/extensions/api/omnibox/omnibox_api_interactive_test.cc
    File-level comment, Patchset 15:
    David Baron . resolved

    Is it worth considering holding off on these two changes in `chrome/`? It seems like you don't *need* to land them at this point given that the feature isn't enabled by default, and it might be easier to track the set of things that are different if you don't land them.

    (I'm OK either way, but figured it's worth mentioning. I think maybe it depends on whether you're likely to try to fix the differences along the way or just want to accept them.)

    Reilly Grant

    I'm okay landing these changes though I'm a little confused why parsing `<tag><match></match>` is throwing an "Unexpected closing tag" error when the problem is a _missing_ closing tag (or unexpected end of file).

    Dominik Röttsches

    I'll split these off as the flag remains default off in this test and to meet @dbaron's expectations for keeping this CL a little smaller.

    The assertion failure without modifying the expectation is:
    ```
    [232535:232535:1029/115906.485593:INFO:CONSOLE:3] "[FAIL] invalidXml_NoClosingTag: error on line 1 at column 53: Unexpected closing tag: fragment != tag
    ```
    Which means that the `description:` argument

    ```
    chrome.omnibox.setDefaultSuggestion(
    {description: '<tag> <match>match</match> world'},
    ```

    is not the full XML, but there is a `</fragment>` closing tag in surrounding XML.

    Semantically libxml2's and the Rust XML parser's error message align here.

    File third_party/blink/renderer/core/BUILD.gn
    File-level comment, Patchset 15:
    David Baron . resolved

    If you consider yourself an expert on rust `BUILD.gn` stuff at this point, then this is fine... but if not I'd appreciate if you could ask someone who you do consider to be an expert in that to review this file.

    Dominik Röttsches

    Not sure if expert, but I have done a few of these by now - I think this is a pretty standard snippet for adding a Rust library with FFI. But see separate comment, I will add Łukasz for Rust/FFI parts of this change.

    File third_party/blink/renderer/core/frame/document_loading_rendering_test.cc
    File-level comment, Patchset 15:
    David Baron . resolved

    I guess this is OK if you're reasonably confident we're going to be able to live with this behavior change, but otherwise I might defer making this sort of test change until you are or until you feel close to enabling the flag (or maybe you do now?).

    Dominik Röttsches

    I'll split this off. For development purposes, and for running a separate trybot, it's useful to have the unit tests already pass with the Rust XML parser.

    File third_party/blink/renderer/core/xml/parser/entities.rs
    File-level comment, Patchset 15:
    David Baron . resolved

    I didn't review the correctness of the contents of this file -- I'm assuming you got it from a reliable source.

    Dominik Röttsches

    Ack, I generated these locally by transforming spec reference .ent files.

    I'm not sure whether this URL or https://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent is a better URL to reference. The files are formatted differently but hopefully have identical contents...

    (Same for the other 2 parts.)

    At least they're all the same other than comments and the order of the first few entries of the special map.

    Dominik Röttsches

    Changed link to the one you're suggesting - here and below.

    Note that this excludes `&lt` and `&amp` which are in that file, though not the other three entities in https://www.w3.org/TR/REC-xml/#sec-predefined-ent . Might be worth revising the comment but it's a little inconsistent with the comment right now.

    Dominik Röttsches

    Removed gt, apos, quot and clarified comment.

    File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.h
    AI Code Reviewer . resolved

    Blink Style Guide: Naming - Use 'CamelCase' for all function names. All function names should use `CamelCase()`-style names, beginning with an uppercase letter. Please consider renaming `end` to `End`.

    To keep this interaction as brief and non-intrusive as possible, please consider responding with one of following options:
    **Done** | **Won't fix**: reason | **b/<bug_id>** | **Invalid:** reason

    This comment was generated by [Experimental Blink C++ Code Review Agent](http://go/blink-c++-code-review-agent).AI reviews can sometimes be inaccurate; We appreciate your 🙏 feedback 🙏 to help us improve.[File a bug](http://go/blink-c++-code-review-agent-feedback) | [Provide feedback on chat](https://chat.google.com/room/AAQA0zhQHe0?cls=4) | [Opt-out](https://ganpati2.corp.google.com/group/peep-genai-blink-agent-optout.prod)_

    Dominik Röttsches

    Invalid: This was a leftover and the function does no longer exist in the impl file.

    Line 31, Patchset 15:class XMLDocumentParserRs final : public ScriptableDocumentParser,
    David Baron . resolved

    Is it worth considering using `Rust` in the name rather than `Rs`? It seems like it might be a little clearer.

    Dominik Röttsches

    Rs seems to me to be a pretty standard suffix, compare many crate names ending on -rs https://crates.io/search?q=-rs and the Rust source file extension being .rs - but I am not insisting on that if you'd prefer it to be changed.

    File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.cc
    Line 34, Patchset 15:static inline bool HasNoStyleInformation(Document* document) {
    Kent Tamura . resolved

    `static` is unnecessary.

    Dominik Röttsches

    Done

    Line 55, Patchset 15:String rustStrToWtfString(rust::Str str) {
    Kent Tamura . resolved

    The name should be `RustStrToWtfString`.

    Dominik Röttsches

    Done

    Line 62, Patchset 15:void extract_doctype_parts(const std::string& doctype_string,
    Kent Tamura . resolved

    The name should be `ExtractDocTypeParts`.

    Dominik Röttsches

    Done

    Line 241, Patchset 15: GetDocument()->setXMLVersion(String(version), ASSERT_NO_EXCEPTION);
    Kent Tamura . resolved

    Don't we use `rustStrToWtfString()`?

    Dominik Röttsches

    Good catch, thanks. Done.

    Line 255, Patchset 15:static inline bool HandleNamespaceAttributes(
    Kent Tamura . resolved

    We prefer functions in anonymous namespace to `static` functions.

    Dominik Röttsches

    Done

    Line 308, Patchset 15:static inline bool CollectElementAttributes(
    Kent Tamura . resolved

    Ditto.

    Dominik Röttsches

    Done

    Line 335, Patchset 15:static inline void SetAttributes(
    Kent Tamura . resolved

    Ditto.

    Dominik Röttsches

    Done

    File third_party/blink/renderer/core/xml/parser/xml_ffi_callbacks.h
    Line 17, Patchset 15: kNoXMlDeclaration,
    Kent Tamura . resolved

    `kNoXMl` -> `kNoXml`

    Dominik Röttsches

    Done

    File third_party/blink/web_tests/NeverFixTests
    Line 1858, Patchset 15:# XSLT tests disabled with Rust XML parser, no support for XSLT
    Kent Tamura . resolved

    Does it mean we can't ship the Rust XML parser until we remove XSLT support completely?

    Dominik Röttsches

    I am investigating transition options: We could use it right away for SVG files or all non-XSLT cases if we are willing to run a bit of a scan up front. We could also potentially run it as an internal dogfood right away.

    File third_party/blink/web_tests/VirtualTestSuites
    Line 5983, Patchset 15: "bases": [
    Kent Tamura . resolved

    This adds 1200+ tests, and it's too large for a single virtual test.
    Can we reduce the number of tests to less than 200, or set up a dedicated build bot?

    Dominik Röttsches

    Changed to a smaller set of 177 files as a basic sanity check. Will look into a separate build bot. Let me know if you have any pointers.

    File third_party/blink/web_tests/platform/linux/virtual/rust-xml/fast/parser/external-entities-expected.txt
    File-level comment, Patchset 15:
    David Baron . resolved

    I think it makes sense to move the three *text* expectations (but not the image expectation) to not have `platform/linux/` in the path.

    Dominik Röttsches

    Removed these as @tkent asked me to reduce set of virtual tests.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • David Baron
    • Kent Tamura
    • Reilly Grant
    • Łukasz Anforowicz
    Submit Requirements:
      • requirement satisfiedCode-Coverage
      • requirement is not satisfiedCode-Owners
      • requirement is not satisfiedCode-Review
      • requirement is not satisfiedReview-Enforcement
      Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
      Gerrit-MessageType: comment
      Gerrit-Project: chromium/src
      Gerrit-Branch: main
      Gerrit-Change-Id: If794513791d91cbfa06927dad38cbdf90b7935b1
      Gerrit-Change-Number: 7088396
      Gerrit-PatchSet: 17
      Gerrit-Owner: Dominik Röttsches <dr...@chromium.org>
      Gerrit-Reviewer: David Baron <dba...@chromium.org>
      Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
      Gerrit-Reviewer: Kent Tamura <tk...@chromium.org>
      Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
      Gerrit-Reviewer: Łukasz Anforowicz <luk...@chromium.org>
      Gerrit-CC: AI Code Reviewer <peep-gen...@system.gserviceaccount.com>
      Gerrit-CC: Daniel Cheng <dch...@chromium.org>
      Gerrit-CC: Mason Freed <mas...@chromium.org>
      Gerrit-Attention: Łukasz Anforowicz <luk...@chromium.org>
      Gerrit-Attention: Reilly Grant <rei...@chromium.org>
      Gerrit-Attention: Kent Tamura <tk...@chromium.org>
      Gerrit-Attention: David Baron <dba...@chromium.org>
      Gerrit-Comment-Date: Wed, 29 Oct 2025 10:23:14 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: No
      Comment-In-Reply-To: AI Code Reviewer <peep-gen...@system.gserviceaccount.com>
      Comment-In-Reply-To: Reilly Grant <rei...@chromium.org>
      Comment-In-Reply-To: David Baron <dba...@chromium.org>
      Comment-In-Reply-To: Kent Tamura <tk...@chromium.org>
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy

      Dominik Röttsches (Gerrit)

      unread,
      Oct 29, 2025, 6:29:11 AM (2 days ago) Oct 29
      to Łukasz Anforowicz, Kent Tamura, Reilly Grant, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org
      Attention needed from David Baron, Kent Tamura, Reilly Grant and Łukasz Anforowicz

      Dominik Röttsches added 1 comment

      Patchset-level comments
      David Baron . resolved

      Here's an initial set of comments on a few parts of the CL. I haven't really gotten to the core of it yet, but at this point I have at least a rough idea of what's in this CL.

      I think this is going to be somewhat slow because reviewing this will involve learning a bunch of new things about Rust. I do have some familiarity with Rust (though somewhat limited), but I haven't worked on any FFI code before. I'm willing to try to get through it, but it's going to be a bit slow because this is pretty large. I also know enough to know that I probably shouldn't be reviewing any `unsafe` rust, so probably someone qualified to do that should be reviewing at least the `unsafe` parts.

      I also wouldn't be offended if you prefer to find a different reviewer given that.

      Dominik Röttsches

      Thanks - I'll add Łukasz for the FFI parts.

      Dominik Röttsches

      PS: Perhaps to reiterate the context, as mentioned in the commit message, most of the behavior of the new `XMLDocumentParserRs` is ported from `XMLDocumentParser` which does a couple of interesting gymnastics to comply with state transition expectations from `Document` and `DocumentParser`.

      Gerrit-Comment-Date: Wed, 29 Oct 2025 10:28:49 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: No
      Comment-In-Reply-To: Dominik Röttsches <dr...@chromium.org>
      Comment-In-Reply-To: David Baron <dba...@chromium.org>
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy

      Dominik Röttsches (Gerrit)

      unread,
      Oct 29, 2025, 9:45:22 AM (2 days ago) Oct 29
      to Łukasz Anforowicz, Kent Tamura, Reilly Grant, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org
      Attention needed from David Baron, Kent Tamura, Reilly Grant and Łukasz Anforowicz

      Dominik Röttsches added 1 comment

      File third_party/blink/web_tests/VirtualTestSuites
      Kent Tamura . resolved

      This adds 1200+ tests, and it's too large for a single virtual test.
      Can we reduce the number of tests to less than 200, or set up a dedicated build bot?

      Dominik Röttsches

      Changed to a smaller set of 177 files as a basic sanity check. Will look into a separate build bot. Let me know if you have any pointers.

      Dominik Röttsches

      I found information on how to set up a separate trybot, but I would also like to explore starting to use the Rust XML parser for parts of real world workloads - which would make it part of mainline testing. I'll get back on that.

      Open in Gerrit

      Related details

      Attention is currently required from:
      • David Baron
      • Kent Tamura
      • Reilly Grant
      • Łukasz Anforowicz
      Submit Requirements:
      • requirement satisfiedCode-Coverage
      • requirement is not satisfiedCode-Owners
      • requirement is not satisfiedCode-Review
      • requirement is not satisfiedReview-Enforcement
      Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
      Gerrit-MessageType: comment
      Gerrit-Project: chromium/src
      Gerrit-Branch: main
      Gerrit-Change-Id: If794513791d91cbfa06927dad38cbdf90b7935b1
      Gerrit-Change-Number: 7088396
      Gerrit-PatchSet: 19
      Gerrit-Owner: Dominik Röttsches <dr...@chromium.org>
      Gerrit-Reviewer: David Baron <dba...@chromium.org>
      Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
      Gerrit-Reviewer: Kent Tamura <tk...@chromium.org>
      Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
      Gerrit-Reviewer: Łukasz Anforowicz <luk...@chromium.org>
      Gerrit-CC: AI Code Reviewer <peep-gen...@system.gserviceaccount.com>
      Gerrit-CC: Daniel Cheng <dch...@chromium.org>
      Gerrit-CC: Mason Freed <mas...@chromium.org>
      Gerrit-Attention: Łukasz Anforowicz <luk...@chromium.org>
      Gerrit-Attention: Reilly Grant <rei...@chromium.org>
      Gerrit-Attention: Kent Tamura <tk...@chromium.org>
      Gerrit-Attention: David Baron <dba...@chromium.org>
      Gerrit-Comment-Date: Wed, 29 Oct 2025 13:45:02 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: No
      Comment-In-Reply-To: Dominik Röttsches <dr...@chromium.org>
      Comment-In-Reply-To: Kent Tamura <tk...@chromium.org>
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy

      Łukasz Anforowicz (Gerrit)

      unread,
      Oct 29, 2025, 1:15:27 PM (2 days ago) Oct 29
      to Dominik Röttsches, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org
      Attention needed from David Baron, Dominik Röttsches and Kent Tamura

      Łukasz Anforowicz added 19 comments

      Commit Message
      Line 17, Patchset 19 (Latest):This by now fairly complete in the sense of reliably passing a wide set
      Łukasz Anforowicz . unresolved

      Missing word?: "This is by now fairly complete"?

      File third_party/blink/renderer/core/BUILD.gn
      Line 255, Patchset 19 (Latest): build_native_rust_unit_tests = true
      Łukasz Anforowicz . unresolved

      Is `build_native_rust_unit_tests` needed and intentional here? The official [guidance](https://google.github.io/comprehensive-rust/chromium/testing.html#:~:text=Native%20Rust%20tests%20(i.e.%20%23%5Btest%5D).%20Discouraged%20outside%20of%20//third_party/rust.) that the team landed on is that "Native Rust tests [are] discouraged outside of //third_party/rust.".

      PS. Maybe we should copy this guidance to `//docs/rust/...` or somewhere else?

      Line 256, Patchset 19 (Latest):}
      Łukasz Anforowicz . unresolved

      I am not a GN expert, but I was a bit surprised that some new `.h` / `.cc` files from this CL do not have corresponding `BUILD.gn` changes. Examples:

      • `third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.cc`
      • `third_party/blink/renderer/core/xml/parser/xml_ffi_callbacks.h`
      File third_party/blink/renderer/core/xml/parser/xml_ffi.rs
      Line 25, Patchset 19 (Latest):#[allow(unused)]
      Łukasz Anforowicz . unresolved

      Can you add a comment why `#[allow(unused)]` is needed + if this is expected to stay here forever or be fixed in the future?

      PS. In case it helps, let me share that some guidance about unused-code-warnings have recently been added to https://chromium.googlesource.com/chromium/src/+/main/docs/rust/dev_experience_tips_and_tricks.md#suppressing-unused-code-warnings

      Line 28, Patchset 19 (Latest): saw_error: bool,
      Łukasz Anforowicz . unresolved

      Is `self.saw_error` the same as `self.error_details.is_some()`?

      Line 36, Patchset 19 (Latest): let cursor = Cursor::new(Vec::new());
      Łukasz Anforowicz . unresolved

      This LGTM, but I wanted to share some notes/thoughts:

      `XmlEventReader::new_with_config` ([docs here](https://docs.rs/xml/latest/xml/reader/struct.EventReader.html#method.new_with_config)) only needs `R: Read` (and doesn't need `R: Seek` provided by `Cursor`). So in theory after reading some bytes, the prefix of the vector can be discarded rather than kept in memory. OTOH, in my cursory search I didn't find anything that already works for this.

      Hmm... I see that `VecDeque<u8>` implements `Read + Write`. So maybe we can use `VecDeque<u8>` instead of `Cursor<Vec<u8>>`? Fine to attempt this in a separate CL, because it may be a bit annoying to switch to `as_slices` instead of a single `as_slice`.

      Line 105, Patchset 19 (Latest): let standalone = standalone
      .map(|is_standalone| if is_standalone { 1 } else { 0 })
      .unwrap_or_else(|| {
      if xml_declaration_view.is_some() {
      return -2; // <?xml instruction present, but
      // standalone not specified
      }
      -1 // No XML declaration at all
      });
      Łukasz Anforowicz . unresolved

      Would it be worth adding a comment saying where those constants come from? I assume they correspond to some constants on the C++ side?

      Line 151, Patchset 19 (Latest): &mut attributes,
      &mut new_namespaces,
      Łukasz Anforowicz . unresolved

      It seems that `&mut` here is unneeded, because the values will be discarded at the end of the scope. So I thought that I'd ask if this can be a TODO to change C++ callback declaration to take a const-ref instead?

      Line 191, Patchset 19 (Latest):fn get_last_event_position(read_state: &XmlReadState, row: &mut u64, col: &mut u64) -> bool {
      Łukasz Anforowicz . unresolved

      nit: Maybe name this `try_get_last_event_position` to indicate it might fail?

      Line 207, Patchset 19 (Latest):fn get_error_details(
      Łukasz Anforowicz . unresolved

      nit: `try_get_error_details`?

      Line 219, Patchset 19 (Latest): .map_or(error_string.clone(), |pos| error_string[pos + 1..].to_string());
      Łukasz Anforowicz . unresolved

      nitty nit: IIUC this `clone` will execute unconditionally, even when the other branch/lamda will be taken. Can this be avoided?

      Line 221, Patchset 19 (Latest): *col = error_details.position().column as u64;
      Łukasz Anforowicz . unresolved

      Why is `as u64` needed here? I assume that this is [`TextPosition::column`](https://docs.rs/xml/latest/xml/common/struct.TextPosition.html#method.column) which already returns a `u64`?

      Line 294, Patchset 19 (Latest): Ok(StartElement { name: _, attributes, namespace: _ }) => {
      Łukasz Anforowicz . unresolved
      ```suggestion
      Ok(StartElement { attributes, .. }) => {
      ```
      Line 296, Patchset 19 (Latest): let mut ret_attributes: Vec<AttributeNameValue> = Vec::new();
      Łukasz Anforowicz . unresolved
      ```suggestion
      let mut ret_attributes: Vec<AttributeNameValue> = Vec::with_capacity(attributes.len());
      ```
      Line 304, Patchset 19 (Latest): AttributeNameValue { q_name: q_name, value: attribute.value.clone() };
      Łukasz Anforowicz . unresolved
      ```suggestion
      AttributeNameValue { q_name, value: attribute.value.clone() };
      ```
      Line 319, Patchset 19 (Latest):#[allow(unused_unsafe)]
      Łukasz Anforowicz . unresolved

      Is this needed? Does that indicate a bug in `cxx`?

      Line 371, Patchset 19 (Latest): fn saw_error(read_state: &XmlReadState) -> bool;
      Łukasz Anforowicz . unresolved

      LGTM, but I wonder if `has_error` would also be a good name? Sorry for bike-shedding :-/.

      File third_party/blink/renderer/core/xml/parser/xml_ffi_callbacks.h
      Line 15, Patchset 19 (Latest):enum StandaloneInfo {
      kStandaloneUnspecified = -2,
      kNoXmlDeclaration,
      kStandaloneNo,
      kStandaloneYes
      };
      Łukasz Anforowicz . unresolved

      Can that be defined as [a shared enum](https://cxx.rs/shared.html#shared-structs-and-enums) in `#[cxx::bridge]`? This would avoid having to duplicate the numbers in `.rs` and in `.h` here.

      File third_party/blink/web_tests/virtual/rust-xml/README.md
      Line 1, Patchset 19 (Latest):Contains test files for the rust-xml virtual test suite which runs XML parsing through the Rust XML crate.
      Łukasz Anforowicz . unresolved

      Please wrap to 80 columns?

      Can you consider mentioning something like?:

      ```
      For more details see go/chrome_x_mitigation and/or https://crbug.com/441911594.
      ```

      Open in Gerrit

      Related details

      Attention is currently required from:
      • David Baron
      • Dominik Röttsches
      • Kent Tamura
      Submit Requirements:
        • requirement satisfiedCode-Coverage
        • requirement is not satisfiedCode-Owners
        • requirement is not satisfiedCode-Review
        • requirement is not satisfiedNo-Unresolved-Comments
        • requirement is not satisfiedReview-Enforcement
        Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
        Gerrit-MessageType: comment
        Gerrit-Project: chromium/src
        Gerrit-Branch: main
        Gerrit-Change-Id: If794513791d91cbfa06927dad38cbdf90b7935b1
        Gerrit-Change-Number: 7088396
        Gerrit-PatchSet: 19
        Gerrit-Owner: Dominik Röttsches <dr...@chromium.org>
        Gerrit-Reviewer: David Baron <dba...@chromium.org>
        Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
        Gerrit-Reviewer: Kent Tamura <tk...@chromium.org>
        Gerrit-Reviewer: Łukasz Anforowicz <luk...@chromium.org>
        Gerrit-CC: AI Code Reviewer <peep-gen...@system.gserviceaccount.com>
        Gerrit-CC: Daniel Cheng <dch...@chromium.org>
        Gerrit-CC: Mason Freed <mas...@chromium.org>
        Gerrit-Attention: Dominik Röttsches <dr...@chromium.org>
        Gerrit-Attention: Kent Tamura <tk...@chromium.org>
        Gerrit-Attention: David Baron <dba...@chromium.org>
        Gerrit-Comment-Date: Wed, 29 Oct 2025 17:15:21 +0000
        Gerrit-HasComments: Yes
        Gerrit-Has-Labels: No
        satisfied_requirement
        unsatisfied_requirement
        open
        diffy

        David Baron (Gerrit)

        unread,
        Oct 29, 2025, 3:49:16 PM (2 days ago) Oct 29
        to Dominik Röttsches, Łukasz Anforowicz, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org
        Attention needed from David Baron, Dominik Röttsches and Kent Tamura

        David Baron added 26 comments

        Patchset-level comments
        File-level comment, Patchset 19 (Latest):
        David Baron . resolved

        Here's another round of partial comments. (This is likely it for today, but I'll resume tomorrow.)

        File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.h
        Line 143, Patchset 19 (Latest): bool is_xhtml_document_ = false;
        David Baron . unresolved

        currently unused, but maybe should be used?

        Line 139, Patchset 19 (Latest): bool parsing_fragment_;
        David Baron . unresolved
        ```suggestion
        const bool parsing_fragment_;
        ```
        Line 118, Patchset 19 (Latest): void DoWrite(const String&);
        void DoEnd();
        David Baron . unresolved

        Both of these are also unimplemented.

        Line 106, Patchset 19 (Latest): bool AppendFragmentSource(const String&);
        David Baron . unresolved

        Also unimplemented.

        Line 60, Patchset 19 (Latest): void SetScriptStartPosition(TextPosition);
        David Baron . unresolved

        Also unimplemented.

        Line 58, Patchset 19 (Latest): static bool SupportsXMLVersion(const String&);
        David Baron . unresolved

        Also unimplemented.

        Line 47, Patchset 19 (Latest): static bool ParseDocumentFragment(const String&,
        DocumentFragment*,
        Element* parent,
        ParserContentPolicy,
        ExceptionState&);
        David Baron . unresolved

        Also unimplemented.

        Line 45, Patchset 19 (Latest): bool IsCurrentlyParsing8BitChunk();
        David Baron . unresolved

        Also unimplemented.

        Line 42, Patchset 19 (Latest): void SetIsXHTMLDocument(bool is_xhtml);
        bool IsXHTMLDocument() const;
        David Baron . unresolved

        There's no implementation of these methods and they appear unused. (Why not inline like the current parser?) But maybe there should be?

        Line 1, Patchset 19 (Latest):// Copyright 2025 The Chromium Authors
        // Use of this source code is governed by a BSD-style license that can be
        // found in the LICENSE file.
        David Baron . unresolved

        If the origin of this file is a copy of `xml_document_parser.h` (I'm not sure whether that's the case), I think it should probably preserve the original license header. Same for the `.cc`.

        File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.cc
        File-level comment, Patchset 19 (Latest):
        David Baron . unresolved

        One other thought:

        Since you need to test and branch based on the flag at multiple points, it seems like it might be good to:

        • add a bunch of `CHECK(RuntimeEnabledFeatures::XMLParsingRustEnabled())` in `XMLDocumentParserRs`, either at the public entry points or the places where it calls into the rust-based XML parser
        • add a bunch of `CHECK(!RuntimeEnabledFeatures::XMLParsingRustEnabled())` in `XMLDocumentParser`, either at the public entry points or the places where it calls into the libxml2 parser

        just to make sure that you've gotten all of the entry points (and that we continue to do so until the flag is removed).

        (I don't feel strongly about the exact approach (e.g., probably OK with `DCHECK()` instead of `CHECK()` if you feel it's justified) -- but we've certainly had cases in the past where we've found that a flag didn't enable all of what it was supposed to, and then fully enabling it took longer than expected -- so I think it would be good to be a bit more defensive here to ensure we don't do that here.)

        Line 56, Patchset 19 (Latest): return String::FromUTF8(static_cast<std::string>(str));
        David Baron . unresolved
        Per the bullet points in https://google.github.io/styleguide/cppguide.html#Casting I think the preference is
        ```suggestion
        return String::FromUTF8(std::string(str));
        ```
        assuming it works. Though I'm wondering if there's a simpler or cheaper way. At least maybe if you can go through `std::string_view` rather than `std::string` you can save a copy? So maybe instead:
        ```suggestion
        return String::FromUTF8(base::RustStrToStringView(str));
        ```
        Line 59, Patchset 19 (Latest):// TODO(https://crbug.com/441911594): Extracts name, public ID, and system ID
        // from an XML/HTML DOCTYPE declaration.
        David Baron . unresolved

        This doesn't accept any doctype declarations that have an internal subset. That might be a problem. Maybe having a way of getting it out from the rust parser would be the best way to fix that.

        Line 73, Patchset 19 (Latest): "^<\\s*!DOCTYPE\\s+(\\w+)\\s+PUBLIC\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"\\s*>"
        David Baron . unresolved

        Whitespace isn't allowed within `<!DOCTYPE`

        Line 71, Patchset 19 (Latest): // 1. Regex for PUBLIC-style statement: Captures name, public id, system id.
        const char* kPublicRe =
        "^<\\s*!DOCTYPE\\s+(\\w+)\\s+PUBLIC\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"\\s*>"
        "$";

        // 2. Regex for SYSTEM or simple statement, captures name and system id.
        const char* kSystemRe =
        "^<\\s*!DOCTYPE\\s+(\\w+)(?:\\s+SYSTEM\\s+\"([^\"]*)\")?\\s*(?:\\[[^\\]]*"
        "\\])?\\s*>$";
        David Baron . unresolved

        These strings would be more readable with C++ raw strings.

        Line 121, Patchset 19 (Latest): {g_xmlns_with_colon, AtomicString(RustStrToWtfString(prefix))}));
        David Baron . unresolved
        Atomizing a string you're about to concatenate seems unnecessary.
        ```suggestion
        {g_xmlns_with_colon, RustStrToWtfString(prefix)}));
        ```

        (I realize that's a problem in the current parser too.)

        Line 130, Patchset 19 (Latest): Attribute(*parsed_name, AtomicString(RustStrToWtfString(uri))));
        David Baron . unresolved
        ```suggestion
        Attribute(std::move(*parsed_name), AtomicString(RustStrToWtfString(uri))));
        ```
        (I realize that's from the existing code, which does it in `CollectElementAttributes` but not `HandleNamespaceAttributes`.)
        Line 173, Patchset 19 (Latest):static const unsigned kMaxXMLTreeDepth = 5000;
        David Baron . unresolved

        Probably preserve the existing `// FIXME: HTMLConstructionSite has a limit of 512, should these match?`

        Line 186, Patchset 19 (Latest): parsing_fragment_(false) {}
        David Baron . unresolved

        preserve the code for the `WebFeature::kXMLDocument` use counter?

        Line 193, Patchset 19 (Latest): script_runner_(nullptr),
        David Baron . unresolved

        Preserve the `// Don't execute scripts for document fragments.` comment?

        Line 251, Patchset 19 (Latest): xml_ffi::append_to_source(*read_state_,
        David Baron . unresolved

        Do you know if it's ok that it does this even when `parser_paused_` is true? (I don't understand all the implications of the current code here...)

        Line 302, Patchset 19 (Latest): GetDocument()->setXMLVersion(RustStrToWtfString(version),
        David Baron . unresolved

        Do you need to check SupportsXMLVersion here? (Though the existing code has a comment that disagrees with the existing code!)

        In particular, it seems like if you don't the `ASSERT_NO_EXCEPTION` might assert, unless the rust parser's internals have *already* checked the XML version.

        Line 323, Patchset 19 (Latest): DummyExceptionStateForTesting exception_state;
        David Baron . unresolved

        This should probably use `ASSERT_NO_EXCEPTION`. The only reasons `Document::createProcessingInstruction` can throw are reasons that the rust XML parser *should* have already given errors on, I think. (I realize this is an issue in the existing code. If you don't want to make this change, at least preserve the existing code's not-quite-TODO Comment about it.)

        Line 336, Patchset 19 (Latest): if (pi->IsCSS()) {
        saw_css_ = true;
        } else {
        return;
        }

        CheckIfBlockingStyleSheetAdded();
        David Baron . unresolved
        ```suggestion
        if (pi->IsCSS()) {
        saw_css_ = true;
        CheckIfBlockingStyleSheetAdded();
        }
        ```
        Line 356, Patchset 19 (Latest): // TODO(https://crbug.com/441911594): Handle is attribute.
        // for (const auto& attr : prefixed_attributes) {
        // if (attr.GetName() == html_names::kIsAttr) {
        // is = attr.Value();
        // break;
        // }
        // }
        David Baron . unresolved

        Any reason this wouldn't just work if you move it down to after `CollectElementAttributes` like it is in the existing code?

        Gerrit-Comment-Date: Wed, 29 Oct 2025 19:49:12 +0000
        Gerrit-HasComments: Yes
        Gerrit-Has-Labels: No
        satisfied_requirement
        unsatisfied_requirement
        open
        diffy

        Dominik Röttsches (Gerrit)

        unread,
        Oct 30, 2025, 9:50:03 AM (yesterday) Oct 30
        to Łukasz Anforowicz, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org
        Attention needed from David Baron, Kent Tamura and Łukasz Anforowicz

        Dominik Röttsches voted and added 45 comments

        Votes added by Dominik Röttsches

        Commit-Queue+1

        45 comments

        Patchset-level comments
        File-level comment, Patchset 21 (Latest):
        Dominik Röttsches . resolved

        Thank you both for your thorough reviews and helpful feedback. IMO the CL has improved in multiple ways thanks to that.

        Most comments addressed, a couple of items left open for discussion in the below.

        Commit Message
        Line 17, Patchset 19:This by now fairly complete in the sense of reliably passing a wide set
        Łukasz Anforowicz . resolved

        Missing word?: "This is by now fairly complete"?

        Dominik Röttsches

        Done

        File third_party/blink/renderer/core/BUILD.gn
        Line 255, Patchset 19: build_native_rust_unit_tests = true
        Łukasz Anforowicz . resolved

        Is `build_native_rust_unit_tests` needed and intentional here? The official [guidance](https://google.github.io/comprehensive-rust/chromium/testing.html#:~:text=Native%20Rust%20tests%20(i.e.%20%23%5Btest%5D).%20Discouraged%20outside%20of%20//third_party/rust.) that the team landed on is that "Native Rust tests [are] discouraged outside of //third_party/rust.".

        PS. Maybe we should copy this guidance to `//docs/rust/...` or somewhere else?

        Dominik Röttsches

        Removed, I think I had had one test there earlier that I later removed.

        Line 256, Patchset 19:}
        Łukasz Anforowicz . resolved

        I am not a GN expert, but I was a bit surprised that some new `.h` / `.cc` files from this CL do not have corresponding `BUILD.gn` changes. Examples:

        • `third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.cc`
        • `third_party/blink/renderer/core/xml/parser/xml_ffi_callbacks.h`
        Dominik Röttsches

        See `third_party/blink/renderer/core/xml/build.gni`.

        File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.h
        Line 143, Patchset 19: bool is_xhtml_document_ = false;
        David Baron . resolved

        currently unused, but maybe should be used?

        Dominik Röttsches

        See above, controls entity support, state kept in Rust parser. Removed.

        Line 139, Patchset 19: bool parsing_fragment_;
        David Baron . resolved
        ```suggestion
        const bool parsing_fragment_;
        ```
        Dominik Röttsches

        Done

        Line 118, Patchset 19: void DoWrite(const String&);
        void DoEnd();
        David Baron . resolved

        Both of these are also unimplemented.

        Dominik Röttsches

        Removed.

        Line 106, Patchset 19: bool AppendFragmentSource(const String&);
        David Baron . resolved

        Also unimplemented.

        Dominik Röttsches

        Removed.

        Line 60, Patchset 19: void SetScriptStartPosition(TextPosition);
        David Baron . resolved

        Also unimplemented.

        Dominik Röttsches

        Removed.

        Line 58, Patchset 19: static bool SupportsXMLVersion(const String&);
        David Baron . resolved

        Also unimplemented.

        Dominik Röttsches

        Removed.

        Line 47, Patchset 19: static bool ParseDocumentFragment(const String&,

        DocumentFragment*,
        Element* parent,
        ParserContentPolicy,
        ExceptionState&);
        David Baron . resolved

        Also unimplemented.

        Dominik Röttsches

        Curiously the fact that we miss this implementation here seem to not have been hit by any tests so far. Either `DocumentFragment::ParseXML` is not exercised by any tests or might be unused.

        Added a comment with a TODO to investigate that.

        Line 45, Patchset 19: bool IsCurrentlyParsing8BitChunk();
        David Baron . resolved

        Also unimplemented.

        Dominik Röttsches

        Sorry it fell on you to spot these. Removed.

        Line 42, Patchset 19: void SetIsXHTMLDocument(bool is_xhtml);
        bool IsXHTMLDocument() const;
        David Baron . resolved

        There's no implementation of these methods and they appear unused. (Why not inline like the current parser?) But maybe there should be?

        Dominik Röttsches

        Removed, to my knowledge this boolean only controlled whether extended XHTML entities should be supported. We decide this at the DocumentType stage and inform the Rust XML parser to include the extended entities. So the state of whether they should be supported is kept in the Rust side parser state.

        Line 1, Patchset 19:// Copyright 2025 The Chromium Authors

        // Use of this source code is governed by a BSD-style license that can be
        // found in the LICENSE file.
        David Baron . unresolved

        If the origin of this file is a copy of `xml_document_parser.h` (I'm not sure whether that's the case), I think it should probably preserve the original license header. Same for the `.cc`.

        Dominik Röttsches

        I was considering what to do and did not arrive at a clear conclusion. I guess it's arguable that there are still enough traces of the general skeleton of logic which would mean the original license header should be kept. On the other hand, on the implementation side, not much any original meaningful logic of the parser library integration is left.

        Changed to the original header plus adding a 2025 The Chromium Authors line.

        File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.cc
        David Baron . unresolved

        One other thought:

        Since you need to test and branch based on the flag at multiple points, it seems like it might be good to:

        • add a bunch of `CHECK(RuntimeEnabledFeatures::XMLParsingRustEnabled())` in `XMLDocumentParserRs`, either at the public entry points or the places where it calls into the rust-based XML parser
        • add a bunch of `CHECK(!RuntimeEnabledFeatures::XMLParsingRustEnabled())` in `XMLDocumentParser`, either at the public entry points or the places where it calls into the libxml2 parser

        just to make sure that you've gotten all of the entry points (and that we continue to do so until the flag is removed).

        (I don't feel strongly about the exact approach (e.g., probably OK with `DCHECK()` instead of `CHECK()` if you feel it's justified) -- but we've certainly had cases in the past where we've found that a flag didn't enable all of what it was supposed to, and then fully enabling it took longer than expected -- so I think it would be good to be a bit more defensive here to ensure we don't do that here.)

        Dominik Röttsches

        This helped to find an issue with fragment parsing entry points, which I suggest to resolve in follow-up changes.

        Added to public non-fragment constructor entry points, `parseAttributes(Rust)` and tried to add to `ParseDocumentFragment` and found that this still needs a Rust side impl.

        IMO it makes sense to add this, also for further documenting how the usage / switching between the parsers is intended. However, adding the CHECKs itself is not necessarily a guarantee but shifts the problem to identifying the correct entry points.

        There's the weird usage of `parseAttributes` for parsing the attributes of a XSLT PI.

        Line 56, Patchset 19: return String::FromUTF8(static_cast<std::string>(str));
        David Baron . resolved
        Per the bullet points in https://google.github.io/styleguide/cppguide.html#Casting I think the preference is
        ```suggestion
        return String::FromUTF8(std::string(str));
        ```
        assuming it works. Though I'm wondering if there's a simpler or cheaper way. At least maybe if you can go through `std::string_view` rather than `std::string` you can save a copy? So maybe instead:
        ```suggestion
        return String::FromUTF8(base::RustStrToStringView(str));
        ```
        Dominik Röttsches

        Nice, good idea, thanks.

        Line 59, Patchset 19:// TODO(https://crbug.com/441911594): Extracts name, public ID, and system ID

        // from an XML/HTML DOCTYPE declaration.
        David Baron . unresolved

        This doesn't accept any doctype declarations that have an internal subset. That might be a problem. Maybe having a way of getting it out from the rust parser would be the best way to fix that.

        Dominik Röttsches

        Fully agree and that's what I intended to express with the comment in the next line. Rephrased comment to

        ```
        // This is a workaround until we hopefully can add missing DOCTYPE
        // parsing functionality in upstream.
        ```

        Line 73, Patchset 19: "^<\\s*!DOCTYPE\\s+(\\w+)\\s+PUBLIC\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"\\s*>"
        David Baron . resolved

        Whitespace isn't allowed within `<!DOCTYPE`

        Dominik Röttsches

        Regex adjusted.

        Line 71, Patchset 19: // 1. Regex for PUBLIC-style statement: Captures name, public id, system id.

        const char* kPublicRe =
        "^<\\s*!DOCTYPE\\s+(\\w+)\\s+PUBLIC\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"\\s*>"
        "$";

        // 2. Regex for SYSTEM or simple statement, captures name and system id.
        const char* kSystemRe =
        "^<\\s*!DOCTYPE\\s+(\\w+)(?:\\s+SYSTEM\\s+\"([^\"]*)\")?\\s*(?:\\[[^\\]]*"
        "\\])?\\s*>$";
        David Baron . resolved

        These strings would be more readable with C++ raw strings.

        Dominik Röttsches

        Good idea, transformed to raw strings with "regex" delimiter sequence.

        Line 121, Patchset 19: {g_xmlns_with_colon, AtomicString(RustStrToWtfString(prefix))}));
        David Baron . resolved
        Atomizing a string you're about to concatenate seems unnecessary.
        ```suggestion
        {g_xmlns_with_colon, RustStrToWtfString(prefix)}));
        ```

        (I realize that's a problem in the current parser too.)

        Dominik Röttsches

        Removed inner `AtomicString` construction.

        Line 130, Patchset 19: Attribute(*parsed_name, AtomicString(RustStrToWtfString(uri))));
        David Baron . resolved
        ```suggestion
        Attribute(std::move(*parsed_name), AtomicString(RustStrToWtfString(uri))));
        ```
        (I realize that's from the existing code, which does it in `CollectElementAttributes` but not `HandleNamespaceAttributes`.)
        Dominik Röttsches

        Done and harmonized in analog `CollectElementAttributes` function.

        Line 173, Patchset 19:static const unsigned kMaxXMLTreeDepth = 5000;
        David Baron . resolved

        Probably preserve the existing `// FIXME: HTMLConstructionSite has a limit of 512, should these match?`

        Dominik Röttsches

        Done

        Line 186, Patchset 19: parsing_fragment_(false) {}
        David Baron . resolved

        preserve the code for the `WebFeature::kXMLDocument` use counter?

        Dominik Röttsches

        Yes, counter added.

        Line 193, Patchset 19: script_runner_(nullptr),
        David Baron . resolved

        Preserve the `// Don't execute scripts for document fragments.` comment?

        Dominik Röttsches

        Ok.

        Line 251, Patchset 19: xml_ffi::append_to_source(*read_state_,
        David Baron . resolved

        Do you know if it's ok that it does this even when `parser_paused_` is true? (I don't understand all the implications of the current code here...)

        Dominik Röttsches

        Yes, from what I know, this is ok. The architecture of the XML Rust parser is different. It is a pull parser where as `libxml2` seems to be a push model parser. Appending to the buffer here is an operation that does not drive parsing. Only `process_next_event` will trigger reads and consume the buffer.

        Contrary to that, in the existing code there is a lot of complexity in keeping the parser from progressing when adding to the buffer. All the closure for `PendingCallbacks` need to be added there to complete received events later.

        Line 302, Patchset 19: GetDocument()->setXMLVersion(RustStrToWtfString(version),
        David Baron . resolved

        Do you need to check SupportsXMLVersion here? (Though the existing code has a comment that disagrees with the existing code!)

        In particular, it seems like if you don't the `ASSERT_NO_EXCEPTION` might assert, unless the rust parser's internals have *already* checked the XML version.

        Dominik Röttsches

        Carried over logic from `XMLDocumentParser` to set a version only if it's 1.0 and ignore it otherwise. Copied the odd comment for posterity as well and added link to explaining Rust XML parser behavior.

        Line 323, Patchset 19: DummyExceptionStateForTesting exception_state;
        David Baron . unresolved

        This should probably use `ASSERT_NO_EXCEPTION`. The only reasons `Document::createProcessingInstruction` can throw are reasons that the rust XML parser *should* have already given errors on, I think. (I realize this is an issue in the existing code. If you don't want to make this change, at least preserve the existing code's not-quite-TODO Comment about it.)

        Dominik Röttsches

        This should probably use ASSERT_NO_EXCEPTION. The only reasons Document::createProcessingInstruction can throw are reasons that the rust XML parser should have already given errors on, I think.

        Is that so? In my understanding the Rust XML parser would not validate a processing instruction inner prose, whereas we expect it to be well-formed in the sense of matching an attribute key-value list pattern, which is validated in `createProcessingInstruction` which then calls `parseAttributes` I think.

        Do we even need to add the TODO here? The Rust XML parser will not support XSLT, so the only thing where this practically goes wrong in a relevant way is for an `<?xml-stylesheet>` processing instruction with a garbage attribute list. Do we want to throw for that?

        Added comment:
        ```
        // The only way this fails in a relevant way would be a processing instruction
        // inner prose that does not match the style of an attribute list for a
        // xml-stylesheet processing instruction. The result of that would be the
        // style is not applied. We likely do not want to throw an exception for that.
        ```

        What do you think?

        Line 336, Patchset 19: if (pi->IsCSS()) {

        saw_css_ = true;
        } else {
        return;
        }

        CheckIfBlockingStyleSheetAdded();
        David Baron . resolved
        ```suggestion
        if (pi->IsCSS()) {
        saw_css_ = true;
        CheckIfBlockingStyleSheetAdded();
        }
        ```
        Dominik Röttsches

        Certainly.


        // for (const auto& attr : prefixed_attributes) {
        // if (attr.GetName() == html_names::kIsAttr) {
        // is = attr.Value();
        // break;
        // }
        // }
        David Baron . resolved

        Any reason this wouldn't just work if you move it down to after `CollectElementAttributes` like it is in the existing code?

        Dominik Röttsches

        Activated this code, but I did not observe any test coverage so far. Modified the TODO to try and fix coverage.

        File third_party/blink/renderer/core/xml/parser/xml_ffi.rs
        Line 25, Patchset 19:#[allow(unused)]
        Łukasz Anforowicz . resolved

        Can you add a comment why `#[allow(unused)]` is needed + if this is expected to stay here forever or be fixed in the future?

        PS. In case it helps, let me share that some guidance about unused-code-warnings have recently been added to https://chromium.googlesource.com/chromium/src/+/main/docs/rust/dev_experience_tips_and_tricks.md#suppressing-unused-code-warnings

        Dominik Röttsches

        Removed, I think this was a leftover from when I was defining the FFI but not yet using it in the C++ side.

        Line 28, Patchset 19: saw_error: bool,
        Łukasz Anforowicz . resolved

        Is `self.saw_error` the same as `self.error_details.is_some()`?

        Dominik Röttsches

        Good point, using your suggestion.

        Line 36, Patchset 19: let cursor = Cursor::new(Vec::new());
        Łukasz Anforowicz . resolved

        This LGTM, but I wanted to share some notes/thoughts:

        `XmlEventReader::new_with_config` ([docs here](https://docs.rs/xml/latest/xml/reader/struct.EventReader.html#method.new_with_config)) only needs `R: Read` (and doesn't need `R: Seek` provided by `Cursor`). So in theory after reading some bytes, the prefix of the vector can be discarded rather than kept in memory. OTOH, in my cursory search I didn't find anything that already works for this.

        Hmm... I see that `VecDeque<u8>` implements `Read + Write`. So maybe we can use `VecDeque<u8>` instead of `Cursor<Vec<u8>>`? Fine to attempt this in a separate CL, because it may be a bit annoying to switch to `as_slices` instead of a single `as_slice`.

        Dominik Röttsches

        I think I originally used Cursor to be able to append to the buffer when reads had already started in an earlier version of the code: https://chromium-review.googlesource.com/c/chromium/src/+/6828846/21/third_party/blink/renderer/core/xml/parser/xml_ffi.rs#44 - but then modified the append logic to get to the underlying storage of `EventReader`, so indeed, Cursor is no longer needed.

        I briefly looked into making this change in this change, but I think this would complicate the review. I'll try to incorporate this into a follow-up.

        Line 105, Patchset 19: let standalone = standalone

        .map(|is_standalone| if is_standalone { 1 } else { 0 })
        .unwrap_or_else(|| {
        if xml_declaration_view.is_some() {
        return -2; // <?xml instruction present, but
        // standalone not specified
        }
        -1 // No XML declaration at all
        });
        Łukasz Anforowicz . resolved

        Would it be worth adding a comment saying where those constants come from? I assume they correspond to some constants on the C++ side?

        Dominik Röttsches

        Fixed with migration to shared enum.

        Line 151, Patchset 19: &mut attributes,
        &mut new_namespaces,
        Łukasz Anforowicz . unresolved

        It seems that `&mut` here is unneeded, because the values will be discarded at the end of the scope. So I thought that I'd ask if this can be a TODO to change C++ callback declaration to take a const-ref instead?

        Dominik Röttsches
        I might be missing something here, but doesn't the `AttributesIterator`, which is
        ```
        pub struct AttributesIterator<'a> {
        attributes: std::slice::Iter<'a, OwnedAttribute>,
        }
        ```
        have a mutable inner state? And fn `attributes_next<'a>(attributes: &mut AttributesIterator<'a>,` needs the argument to be mutable to be able to start iterating over them?
        Line 191, Patchset 19:fn get_last_event_position(read_state: &XmlReadState, row: &mut u64, col: &mut u64) -> bool {
        Łukasz Anforowicz . resolved

        nit: Maybe name this `try_get_last_event_position` to indicate it might fail?

        Dominik Röttsches

        Renamed.

        Line 207, Patchset 19:fn get_error_details(
        Łukasz Anforowicz . resolved

        nit: `try_get_error_details`?

        Dominik Röttsches

        Yes, renamed.

        Line 219, Patchset 19: .map_or(error_string.clone(), |pos| error_string[pos + 1..].to_string());
        Łukasz Anforowicz . resolved

        nitty nit: IIUC this `clone` will execute unconditionally, even when the other branch/lamda will be taken. Can this be avoided?

        Dominik Röttsches

        Moved to `map_or_else`.

        Line 221, Patchset 19: *col = error_details.position().column as u64;
        Łukasz Anforowicz . resolved

        Why is `as u64` needed here? I assume that this is [`TextPosition::column`](https://docs.rs/xml/latest/xml/common/struct.TextPosition.html#method.column) which already returns a `u64`?

        Dominik Röttsches

        Removed, not sure why I had left that in there. I might have changed the FFI type at some point.

        Line 294, Patchset 19: Ok(StartElement { name: _, attributes, namespace: _ }) => {
        Łukasz Anforowicz . resolved
        ```suggestion
        Ok(StartElement { attributes, .. }) => {
        ```
        Dominik Röttsches

        Done

        Line 296, Patchset 19: let mut ret_attributes: Vec<AttributeNameValue> = Vec::new();
        Łukasz Anforowicz . resolved
        ```suggestion
        let mut ret_attributes: Vec<AttributeNameValue> = Vec::with_capacity(attributes.len());
        ```
        Dominik Röttsches

        Good idea, thanks. Done.

        Line 304, Patchset 19: AttributeNameValue { q_name: q_name, value: attribute.value.clone() };
        Łukasz Anforowicz . resolved
        ```suggestion
        AttributeNameValue { q_name, value: attribute.value.clone() };
        ```
        Dominik Röttsches

        Done

        Line 319, Patchset 19:#[allow(unused_unsafe)]
        Łukasz Anforowicz . resolved

        Is this needed? Does that indicate a bug in `cxx`?

        Dominik Röttsches

        Removed, see above, likely a leftover.

        Line 371, Patchset 19: fn saw_error(read_state: &XmlReadState) -> bool;
        Łukasz Anforowicz . unresolved

        LGTM, but I wonder if `has_error` would also be a good name? Sorry for bike-shedding :-/.

        Dominik Röttsches

        The name is consistent with the `saw_error_` that was ported from the original `XmlDocumentParser` code. Not that I am strongly attached to that, but IMHO it would look worse if we had both, `saw_error` and `has_error`.

        File third_party/blink/renderer/core/xml/parser/xml_ffi_callbacks.h
        Line 15, Patchset 19:enum StandaloneInfo {

        kStandaloneUnspecified = -2,
        kNoXmlDeclaration,
        kStandaloneNo,
        kStandaloneYes
        };
        Łukasz Anforowicz . resolved

        Can that be defined as [a shared enum](https://cxx.rs/shared.html#shared-structs-and-enums) in `#[cxx::bridge]`? This would avoid having to duplicate the numbers in `.rs` and in `.h` here.

        Dominik Röttsches

        Good suggestion, I wasn't aware this existing in CXX. Moved to a shared enum. I had one challenge in the callbacks header file, but looks like the enum can be forward declared there.

        File third_party/blink/web_tests/virtual/rust-xml/README.md
        Line 1, Patchset 19:Contains test files for the rust-xml virtual test suite which runs XML parsing through the Rust XML crate.
        Łukasz Anforowicz . resolved

        Please wrap to 80 columns?

        Can you consider mentioning something like?:

        ```
        For more details see go/chrome_x_mitigation and/or https://crbug.com/441911594.
        ```

        Dominik Röttsches

        Wrapped and references added.

        Open in Gerrit

        Related details

        Attention is currently required from:
        • David Baron
        • Kent Tamura
        • Łukasz Anforowicz
        Submit Requirements:
        • requirement satisfiedCode-Coverage
        • requirement is not satisfiedCode-Owners
        • requirement is not satisfiedCode-Review
        • requirement is not satisfiedNo-Unresolved-Comments
        • requirement is not satisfiedReview-Enforcement
        Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
        Gerrit-MessageType: comment
        Gerrit-Project: chromium/src
        Gerrit-Branch: main
        Gerrit-Change-Id: If794513791d91cbfa06927dad38cbdf90b7935b1
        Gerrit-Change-Number: 7088396
        Gerrit-PatchSet: 21
        Gerrit-Owner: Dominik Röttsches <dr...@chromium.org>
        Gerrit-Reviewer: David Baron <dba...@chromium.org>
        Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
        Gerrit-Reviewer: Kent Tamura <tk...@chromium.org>
        Gerrit-Reviewer: Łukasz Anforowicz <luk...@chromium.org>
        Gerrit-CC: AI Code Reviewer <peep-gen...@system.gserviceaccount.com>
        Gerrit-CC: Daniel Cheng <dch...@chromium.org>
        Gerrit-CC: Mason Freed <mas...@chromium.org>
        Gerrit-Attention: Łukasz Anforowicz <luk...@chromium.org>
        Gerrit-Attention: Kent Tamura <tk...@chromium.org>
        Gerrit-Attention: David Baron <dba...@chromium.org>
        Gerrit-Comment-Date: Thu, 30 Oct 2025 13:49:45 +0000
        Gerrit-HasComments: Yes
        Gerrit-Has-Labels: Yes
        Comment-In-Reply-To: Łukasz Anforowicz <luk...@chromium.org>
        Comment-In-Reply-To: David Baron <dba...@chromium.org>
        satisfied_requirement
        unsatisfied_requirement
        open
        diffy

        Łukasz Anforowicz (Gerrit)

        unread,
        Oct 30, 2025, 11:57:59 AM (yesterday) Oct 30
        to Dominik Röttsches, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org
        Attention needed from David Baron, Dominik Röttsches and Kent Tamura

        Łukasz Anforowicz voted and added 9 comments

        Votes added by Łukasz Anforowicz

        Code-Review+1

        9 comments

        Patchset-level comments
        File-level comment, Patchset 22 (Latest):
        Łukasz Anforowicz . resolved

        `third_party/blink/renderer/core/xml/parser/xml_ffi.rs` and `third_party/blink/renderer/core/xml/parser/entities.rs` LGTM (I shared a few additional minor feedback items below - PTAL).

        File third_party/blink/renderer/core/xml/parser/xml_ffi.rs
        Line 25, Patchset 22 (Latest):pub struct XmlReadState<'a> {
        Łukasz Anforowicz . unresolved

        I think this `pub` may be removed - a child `ffi` module should be able to see this `struct` without `pub` (same for `AttributesIterator` and `NamespacesIterator`).

        Line 83, Patchset 22 (Latest): let version: &str = match version {
        XmlVersion::Version10 => "1.0",
        XmlVersion::Version11 => "1.1",
        };
        Łukasz Anforowicz . unresolved

        Maybe in the future this can be replaced with `version.as_str()` - see https://github.com/kornelski/xml-rs/pull/54

        Not sure if it's worth leaving a TODO for this. Up to you.

        Line 151, Patchset 19: &mut attributes,
        &mut new_namespaces,
        Łukasz Anforowicz . resolved

        It seems that `&mut` here is unneeded, because the values will be discarded at the end of the scope. So I thought that I'd ask if this can be a TODO to change C++ callback declaration to take a const-ref instead?

        Dominik Röttsches
        I might be missing something here, but doesn't the `AttributesIterator`, which is
        ```
        pub struct AttributesIterator<'a> {
        attributes: std::slice::Iter<'a, OwnedAttribute>,
        }
        ```
        have a mutable inner state? And fn `attributes_next<'a>(attributes: &mut AttributesIterator<'a>,` needs the argument to be mutable to be able to start iterating over them?
        Łukasz Anforowicz

        You're right - I missed this. Thank you for explaining.

        Line 204, Patchset 22 (Latest): let _ = read_state.event_reader.add_entities(entities::HTML5_MAP.iter().copied());
        Łukasz Anforowicz . unresolved

        I wonder if it would be a bit more readable if we used 1) `unwrap()` + a comment, or 2) `expect("Built-in entities shouldn't cause errors")`.

        Line 320, Patchset 22 (Latest):pub mod ffi {
        Łukasz Anforowicz . unresolved

        Is this `pub` needed?

        Line 329, Patchset 22 (Latest): pub struct AttributeNameValue {
        Łukasz Anforowicz . unresolved

        Is this `pub` needed?

        Line 371, Patchset 19: fn saw_error(read_state: &XmlReadState) -> bool;
        Łukasz Anforowicz . resolved

        LGTM, but I wonder if `has_error` would also be a good name? Sorry for bike-shedding :-/.

        Dominik Röttsches

        The name is consistent with the `saw_error_` that was ported from the original `XmlDocumentParser` code. Not that I am strongly attached to that, but IMHO it would look worse if we had both, `saw_error` and `has_error`.

        Łukasz Anforowicz

        Matching existing C++ names SGTM.

        File third_party/blink/renderer/core/xml/parser/xml_ffi_callbacks.h
        Line 15, Patchset 19:enum StandaloneInfo {
        kStandaloneUnspecified = -2,
        kNoXmlDeclaration,
        kStandaloneNo,
        kStandaloneYes
        };
        Łukasz Anforowicz . resolved

        Can that be defined as [a shared enum](https://cxx.rs/shared.html#shared-structs-and-enums) in `#[cxx::bridge]`? This would avoid having to duplicate the numbers in `.rs` and in `.h` here.

        Dominik Röttsches

        Good suggestion, I wasn't aware this existing in CXX. Moved to a shared enum. I had one challenge in the callbacks header file, but looks like the enum can be forward declared there.

        Łukasz Anforowicz

        Ugh - yeah, the risk of circular dependencies around C++/Rust FFI is quite annoying at times. Although I guess that sometimes it forces a cleaner layering. I am glad you've been able to find a way to make this work.

        Open in Gerrit

        Related details

        Attention is currently required from:
        • David Baron
        • Dominik Röttsches
        • Kent Tamura
        Submit Requirements:
          • requirement satisfiedCode-Coverage
          • requirement is not satisfiedCode-Owners
          • requirement satisfiedCode-Review
          • requirement is not satisfiedNo-Unresolved-Comments
          • requirement satisfiedReview-Enforcement
          Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
          Gerrit-MessageType: comment
          Gerrit-Project: chromium/src
          Gerrit-Branch: main
          Gerrit-Change-Id: If794513791d91cbfa06927dad38cbdf90b7935b1
          Gerrit-Change-Number: 7088396
          Gerrit-PatchSet: 22
          Gerrit-Owner: Dominik Röttsches <dr...@chromium.org>
          Gerrit-Reviewer: David Baron <dba...@chromium.org>
          Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
          Gerrit-Reviewer: Kent Tamura <tk...@chromium.org>
          Gerrit-Reviewer: Łukasz Anforowicz <luk...@chromium.org>
          Gerrit-CC: AI Code Reviewer <peep-gen...@system.gserviceaccount.com>
          Gerrit-CC: Daniel Cheng <dch...@chromium.org>
          Gerrit-CC: Mason Freed <mas...@chromium.org>
          Gerrit-Attention: Dominik Röttsches <dr...@chromium.org>
          Gerrit-Attention: Kent Tamura <tk...@chromium.org>
          Gerrit-Attention: David Baron <dba...@chromium.org>
          Gerrit-Comment-Date: Thu, 30 Oct 2025 15:57:48 +0000
          Gerrit-HasComments: Yes
          Gerrit-Has-Labels: Yes
          Comment-In-Reply-To: Dominik Röttsches <dr...@chromium.org>
          Comment-In-Reply-To: Łukasz Anforowicz <luk...@chromium.org>
          satisfied_requirement
          unsatisfied_requirement
          open
          diffy

          Kent Tamura (Gerrit)

          unread,
          Oct 30, 2025, 8:16:49 PM (20 hours ago) Oct 30
          to Dominik Röttsches, Łukasz Anforowicz, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org
          Attention needed from David Baron and Dominik Röttsches

          Kent Tamura added 4 comments

          File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.cc
          Line 169, Patchset 22 (Latest): prefix.empty() ? AtomicString(RustStrToWtfString(local_name))
          Kent Tamura . unresolved

          nit: We might want to add `RustStrToAtomicString()`.

          File third_party/blink/web_tests/NeverFixTests
          Line 1856, Patchset 22 (Latest):
          Kent Tamura . unresolved

          nit: looks an unnecessary change

          Line 1858, Patchset 15:# XSLT tests disabled with Rust XML parser, no support for XSLT
          Kent Tamura . resolved

          Does it mean we can't ship the Rust XML parser until we remove XSLT support completely?

          Dominik Röttsches

          I am investigating transition options: We could use it right away for SVG files or all non-XSLT cases if we are willing to run a bit of a scan up front. We could also potentially run it as an internal dogfood right away.

          Kent Tamura

          Expanding coverage incrementally sounds a very good approach.

          File third_party/blink/web_tests/VirtualTestSuites
          Kent Tamura . resolved

          This adds 1200+ tests, and it's too large for a single virtual test.
          Can we reduce the number of tests to less than 200, or set up a dedicated build bot?

          Dominik Röttsches

          Changed to a smaller set of 177 files as a basic sanity check. Will look into a separate build bot. Let me know if you have any pointers.

          Dominik Röttsches

          I found information on how to set up a separate trybot, but I would also like to explore starting to use the Rust XML parser for parts of real world workloads - which would make it part of mainline testing. I'll get back on that.

          Kent Tamura

          Ack.

          Open in Gerrit

          Related details

          Attention is currently required from:
          • David Baron
          • Dominik Röttsches
          Gerrit-Attention: David Baron <dba...@chromium.org>
          Gerrit-Comment-Date: Fri, 31 Oct 2025 00:16:09 +0000
          Gerrit-HasComments: Yes
          Gerrit-Has-Labels: No
          Comment-In-Reply-To: Dominik Röttsches <dr...@chromium.org>
          Comment-In-Reply-To: Kent Tamura <tk...@chromium.org>
          satisfied_requirement
          unsatisfied_requirement
          open
          diffy

          Dominik Röttsches (Gerrit)

          unread,
          8:22 AM (8 hours ago) 8:22 AM
          to Łukasz Anforowicz, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org
          Attention needed from David Baron, Kent Tamura and Łukasz Anforowicz

          Dominik Röttsches added 8 comments

          Patchset-level comments
          File-level comment, Patchset 26 (Latest):
          Dominik Röttsches . resolved

          @luk...@chromium.org Could you renew your +1?

          File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.cc
          Line 169, Patchset 22: prefix.empty() ? AtomicString(RustStrToWtfString(local_name))
          Kent Tamura . resolved

          nit: We might want to add `RustStrToAtomicString()`.

          Dominik Röttsches

          Added.

          File third_party/blink/renderer/core/xml/parser/xml_ffi.rs
          Line 25, Patchset 22:pub struct XmlReadState<'a> {
          Łukasz Anforowicz . resolved

          I think this `pub` may be removed - a child `ffi` module should be able to see this `struct` without `pub` (same for `AttributesIterator` and `NamespacesIterator`).

          Dominik Röttsches

          Done

          Line 83, Patchset 22: let version: &str = match version {

          XmlVersion::Version10 => "1.0",
          XmlVersion::Version11 => "1.1",
          };
          Łukasz Anforowicz . resolved

          Maybe in the future this can be replaced with `version.as_str()` - see https://github.com/kornelski/xml-rs/pull/54

          Not sure if it's worth leaving a TODO for this. Up to you.

          Dominik Röttsches

          TODO added.

          Line 204, Patchset 22: let _ = read_state.event_reader.add_entities(entities::HTML5_MAP.iter().copied());
          Łukasz Anforowicz . resolved

          I wonder if it would be a bit more readable if we used 1) `unwrap()` + a comment, or 2) `expect("Built-in entities shouldn't cause errors")`.

          Dominik Röttsches

          Shortened this with an iteration of the entity maps and added `.expect()`s.

          Line 320, Patchset 22:pub mod ffi {
          Łukasz Anforowicz . resolved

          Is this `pub` needed?

          Dominik Röttsches

          Removed.

          Line 329, Patchset 22: pub struct AttributeNameValue {
          Łukasz Anforowicz . resolved

          Is this `pub` needed?

          Dominik Röttsches

          Removed.

          File third_party/blink/web_tests/NeverFixTests
          Line 1856, Patchset 22:
          Kent Tamura . resolved

          nit: looks an unnecessary change

          Dominik Röttsches

          Done

          Open in Gerrit

          Related details

          Attention is currently required from:
          • David Baron
          • Kent Tamura
          • Łukasz Anforowicz
          Submit Requirements:
            • requirement satisfiedCode-Coverage
            • requirement is not satisfiedCode-Owners
            • requirement is not satisfiedCode-Review
            • requirement is not satisfiedNo-Unresolved-Comments
            • requirement is not satisfiedReview-Enforcement
            Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
            Gerrit-MessageType: comment
            Gerrit-Project: chromium/src
            Gerrit-Branch: main
            Gerrit-Change-Id: If794513791d91cbfa06927dad38cbdf90b7935b1
            Gerrit-Change-Number: 7088396
            Gerrit-PatchSet: 26
            Gerrit-Owner: Dominik Röttsches <dr...@chromium.org>
            Gerrit-Reviewer: David Baron <dba...@chromium.org>
            Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
            Gerrit-Reviewer: Kent Tamura <tk...@chromium.org>
            Gerrit-Reviewer: Łukasz Anforowicz <luk...@chromium.org>
            Gerrit-CC: AI Code Reviewer <peep-gen...@system.gserviceaccount.com>
            Gerrit-CC: Daniel Cheng <dch...@chromium.org>
            Gerrit-CC: Mason Freed <mas...@chromium.org>
            Gerrit-Attention: Łukasz Anforowicz <luk...@chromium.org>
            Gerrit-Attention: Kent Tamura <tk...@chromium.org>
            Gerrit-Attention: David Baron <dba...@chromium.org>
            Gerrit-Comment-Date: Fri, 31 Oct 2025 12:21:56 +0000
            Gerrit-HasComments: Yes
            Gerrit-Has-Labels: No
            Comment-In-Reply-To: Łukasz Anforowicz <luk...@chromium.org>
            Comment-In-Reply-To: Kent Tamura <tk...@chromium.org>
            satisfied_requirement
            unsatisfied_requirement
            open
            diffy

            Dominik Röttsches (Gerrit)

            unread,
            9:37 AM (7 hours ago) 9:37 AM
            to Łukasz Anforowicz, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org
            Attention needed from David Baron, Kent Tamura and Łukasz Anforowicz

            Dominik Röttsches voted and added 1 comment

            Votes added by Dominik Röttsches

            Commit-Queue+1

            1 comment

            File third_party/blink/renderer/core/xml/parser/xml_ffi.rs
            Line 36, Patchset 19: let cursor = Cursor::new(Vec::new());
            Łukasz Anforowicz . unresolved

            This LGTM, but I wanted to share some notes/thoughts:

            `XmlEventReader::new_with_config` ([docs here](https://docs.rs/xml/latest/xml/reader/struct.EventReader.html#method.new_with_config)) only needs `R: Read` (and doesn't need `R: Seek` provided by `Cursor`). So in theory after reading some bytes, the prefix of the vector can be discarded rather than kept in memory. OTOH, in my cursory search I didn't find anything that already works for this.

            Hmm... I see that `VecDeque<u8>` implements `Read + Write`. So maybe we can use `VecDeque<u8>` instead of `Cursor<Vec<u8>>`? Fine to attempt this in a separate CL, because it may be a bit annoying to switch to `as_slices` instead of a single `as_slice`.

            Dominik Röttsches

            I think I originally used Cursor to be able to append to the buffer when reads had already started in an earlier version of the code: https://chromium-review.googlesource.com/c/chromium/src/+/6828846/21/third_party/blink/renderer/core/xml/parser/xml_ffi.rs#44 - but then modified the append logic to get to the underlying storage of `EventReader`, so indeed, Cursor is no longer needed.

            I briefly looked into making this change in this change, but I think this would complicate the review. I'll try to incorporate this into a follow-up.

            Dominik Röttsches

            @luk...@chromium.org, I looked into this a bit more, but currently I am concerned the code for the `StartDocument` event would not work anymore, as there we do access the buffer at the event position, to find out whether there is actually an encoding in the source text (due to a shortcoming in the XML crate API).

            If the read() calls on the `VecDeque` do consume the data - I don't think it would be possible to access that data anymore in this way.

            Gerrit-Comment-Date: Fri, 31 Oct 2025 13:36:54 +0000
            Gerrit-HasComments: Yes
            Gerrit-Has-Labels: Yes
            Comment-In-Reply-To: Dominik Röttsches <dr...@chromium.org>
            Comment-In-Reply-To: Łukasz Anforowicz <luk...@chromium.org>
            satisfied_requirement
            unsatisfied_requirement
            open
            diffy

            Łukasz Anforowicz (Gerrit)

            unread,
            11:35 AM (5 hours ago) 11:35 AM
            to Dominik Röttsches, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org
            Attention needed from David Baron, Dominik Röttsches and Kent Tamura

            Łukasz Anforowicz added 1 comment

            File third_party/blink/renderer/core/xml/parser/xml_ffi.rs
            Line 36, Patchset 19: let cursor = Cursor::new(Vec::new());
            Łukasz Anforowicz . unresolved

            This LGTM, but I wanted to share some notes/thoughts:

            `XmlEventReader::new_with_config` ([docs here](https://docs.rs/xml/latest/xml/reader/struct.EventReader.html#method.new_with_config)) only needs `R: Read` (and doesn't need `R: Seek` provided by `Cursor`). So in theory after reading some bytes, the prefix of the vector can be discarded rather than kept in memory. OTOH, in my cursory search I didn't find anything that already works for this.

            Hmm... I see that `VecDeque<u8>` implements `Read + Write`. So maybe we can use `VecDeque<u8>` instead of `Cursor<Vec<u8>>`? Fine to attempt this in a separate CL, because it may be a bit annoying to switch to `as_slices` instead of a single `as_slice`.

            Dominik Röttsches

            I think I originally used Cursor to be able to append to the buffer when reads had already started in an earlier version of the code: https://chromium-review.googlesource.com/c/chromium/src/+/6828846/21/third_party/blink/renderer/core/xml/parser/xml_ffi.rs#44 - but then modified the append logic to get to the underlying storage of `EventReader`, so indeed, Cursor is no longer needed.

            I briefly looked into making this change in this change, but I think this would complicate the review. I'll try to incorporate this into a follow-up.

            Dominik Röttsches

            @luk...@chromium.org, I looked into this a bit more, but currently I am concerned the code for the `StartDocument` event would not work anymore, as there we do access the buffer at the event position, to find out whether there is actually an encoding in the source text (due to a shortcoming in the XML crate API).

            If the read() calls on the `VecDeque` do consume the data - I don't think it would be possible to access that data anymore in this way.

            Łukasz Anforowicz

            Thanks for following up. Using `Vec` makes sense given your explanation. Maybe it's worth leaving a comment explaining why we can't use `VecDeque`?

            Open in Gerrit

            Related details

            Attention is currently required from:
            • David Baron
            • Dominik Röttsches
            • Kent Tamura
            Submit Requirements:
            • requirement satisfiedCode-Coverage
            • requirement is not satisfiedCode-Owners
            • requirement is not satisfiedCode-Review
            • requirement is not satisfiedNo-Unresolved-Comments
            • requirement is not satisfiedReview-Enforcement
            Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
            Gerrit-MessageType: comment
            Gerrit-Project: chromium/src
            Gerrit-Branch: main
            Gerrit-Change-Id: If794513791d91cbfa06927dad38cbdf90b7935b1
            Gerrit-Change-Number: 7088396
            Gerrit-PatchSet: 26
            Gerrit-Owner: Dominik Röttsches <dr...@chromium.org>
            Gerrit-Reviewer: David Baron <dba...@chromium.org>
            Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
            Gerrit-Reviewer: Kent Tamura <tk...@chromium.org>
            Gerrit-Reviewer: Łukasz Anforowicz <luk...@chromium.org>
            Gerrit-CC: AI Code Reviewer <peep-gen...@system.gserviceaccount.com>
            Gerrit-CC: Daniel Cheng <dch...@chromium.org>
            Gerrit-CC: Mason Freed <mas...@chromium.org>
            Gerrit-Attention: Dominik Röttsches <dr...@chromium.org>
            Gerrit-Attention: Kent Tamura <tk...@chromium.org>
            Gerrit-Attention: David Baron <dba...@chromium.org>
            Gerrit-Comment-Date: Fri, 31 Oct 2025 15:35:44 +0000
            Gerrit-HasComments: Yes
            Gerrit-Has-Labels: No
            satisfied_requirement
            unsatisfied_requirement
            open
            diffy

            Dominik Röttsches (Gerrit)

            unread,
            11:48 AM (4 hours ago) 11:48 AM
            to Łukasz Anforowicz, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org
            Attention needed from David Baron, Kent Tamura and Łukasz Anforowicz

            Dominik Röttsches added 1 comment

            File third_party/blink/renderer/core/xml/parser/xml_ffi.rs
            Line 36, Patchset 19: let cursor = Cursor::new(Vec::new());
            Łukasz Anforowicz . resolved

            This LGTM, but I wanted to share some notes/thoughts:

            `XmlEventReader::new_with_config` ([docs here](https://docs.rs/xml/latest/xml/reader/struct.EventReader.html#method.new_with_config)) only needs `R: Read` (and doesn't need `R: Seek` provided by `Cursor`). So in theory after reading some bytes, the prefix of the vector can be discarded rather than kept in memory. OTOH, in my cursory search I didn't find anything that already works for this.

            Hmm... I see that `VecDeque<u8>` implements `Read + Write`. So maybe we can use `VecDeque<u8>` instead of `Cursor<Vec<u8>>`? Fine to attempt this in a separate CL, because it may be a bit annoying to switch to `as_slices` instead of a single `as_slice`.

            Dominik Röttsches

            I think I originally used Cursor to be able to append to the buffer when reads had already started in an earlier version of the code: https://chromium-review.googlesource.com/c/chromium/src/+/6828846/21/third_party/blink/renderer/core/xml/parser/xml_ffi.rs#44 - but then modified the append logic to get to the underlying storage of `EventReader`, so indeed, Cursor is no longer needed.

            I briefly looked into making this change in this change, but I think this would complicate the review. I'll try to incorporate this into a follow-up.

            Dominik Röttsches

            @luk...@chromium.org, I looked into this a bit more, but currently I am concerned the code for the `StartDocument` event would not work anymore, as there we do access the buffer at the event position, to find out whether there is actually an encoding in the source text (due to a shortcoming in the XML crate API).

            If the read() calls on the `VecDeque` do consume the data - I don't think it would be possible to access that data anymore in this way.

            Łukasz Anforowicz

            Thanks for following up. Using `Vec` makes sense given your explanation. Maybe it's worth leaving a comment explaining why we can't use `VecDeque`?

            Dominik Röttsches

            Comment added, under the assumption you meant we keep `Vec` wrapped in `Cursor`.

            Open in Gerrit

            Related details

            Attention is currently required from:
            • David Baron
            • Kent Tamura
            • Łukasz Anforowicz
            Submit Requirements:
            • requirement satisfiedCode-Coverage
            • requirement is not satisfiedCode-Owners
            • requirement is not satisfiedCode-Review
            • requirement is not satisfiedNo-Unresolved-Comments
            • requirement is not satisfiedReview-Enforcement
            Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
            Gerrit-MessageType: comment
            Gerrit-Project: chromium/src
            Gerrit-Branch: main
            Gerrit-Change-Id: If794513791d91cbfa06927dad38cbdf90b7935b1
            Gerrit-Change-Number: 7088396
            Gerrit-PatchSet: 26
            Gerrit-Owner: Dominik Röttsches <dr...@chromium.org>
            Gerrit-Reviewer: David Baron <dba...@chromium.org>
            Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
            Gerrit-Reviewer: Kent Tamura <tk...@chromium.org>
            Gerrit-Reviewer: Łukasz Anforowicz <luk...@chromium.org>
            Gerrit-CC: AI Code Reviewer <peep-gen...@system.gserviceaccount.com>
            Gerrit-CC: Daniel Cheng <dch...@chromium.org>
            Gerrit-CC: Mason Freed <mas...@chromium.org>
            Gerrit-Attention: Łukasz Anforowicz <luk...@chromium.org>
            Gerrit-Attention: Kent Tamura <tk...@chromium.org>
            Gerrit-Attention: David Baron <dba...@chromium.org>
            Gerrit-Comment-Date: Fri, 31 Oct 2025 15:48:10 +0000
            satisfied_requirement
            unsatisfied_requirement
            open
            diffy

            David Baron (Gerrit)

            unread,
            11:57 AM (4 hours ago) 11:57 AM
            to Dominik Röttsches, Łukasz Anforowicz, Kent Tamura, AI Code Reviewer, David Baron, Mason Freed, Daniel Cheng, Chromium LUCI CQ, chromium...@chromium.org, blink-re...@chromium.org, blink-revi...@chromium.org, blink-...@chromium.org, chromium-a...@chromium.org, dom+...@chromium.org, dominicc+...@chromium.org, extension...@chromium.org, hiroshig...@chromium.org, jmedle...@chromium.org, kinuko...@chromium.org, kouhei...@chromium.org
            Attention needed from Dominik Röttsches, Kent Tamura and Łukasz Anforowicz

            David Baron added 12 comments

            Patchset-level comments
            File-level comment, Patchset 27 (Latest):
            David Baron . resolved

            OK, I think I've finally gotten through the whole thing. Here are my remaining comments.

            File third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.cc
            Line 59, Patchset 19:// TODO(https://crbug.com/441911594): Extracts name, public ID, and system ID
            // from an XML/HTML DOCTYPE declaration.
            David Baron . unresolved

            This doesn't accept any doctype declarations that have an internal subset. That might be a problem. Maybe having a way of getting it out from the rust parser would be the best way to fix that.

            Dominik Röttsches

            Fully agree and that's what I intended to express with the comment in the next line. Rephrased comment to

            ```
            // This is a workaround until we hopefully can add missing DOCTYPE
            // parsing functionality in upstream.
            ```

            David Baron

            But maybe explicitly note that it doesn't handle internal subsets?

            Line 323, Patchset 19: DummyExceptionStateForTesting exception_state;
            David Baron . unresolved

            This should probably use `ASSERT_NO_EXCEPTION`. The only reasons `Document::createProcessingInstruction` can throw are reasons that the rust XML parser *should* have already given errors on, I think. (I realize this is an issue in the existing code. If you don't want to make this change, at least preserve the existing code's not-quite-TODO Comment about it.)

            Dominik Röttsches

            This should probably use ASSERT_NO_EXCEPTION. The only reasons Document::createProcessingInstruction can throw are reasons that the rust XML parser should have already given errors on, I think.

            Is that so? In my understanding the Rust XML parser would not validate a processing instruction inner prose, whereas we expect it to be well-formed in the sense of matching an attribute key-value list pattern, which is validated in `createProcessingInstruction` which then calls `parseAttributes` I think.

            Do we even need to add the TODO here? The Rust XML parser will not support XSLT, so the only thing where this practically goes wrong in a relevant way is for an `<?xml-stylesheet>` processing instruction with a garbage attribute list. Do we want to throw for that?

            Added comment:
            ```
            // The only way this fails in a relevant way would be a processing instruction
            // inner prose that does not match the style of an attribute list for a
            // xml-stylesheet processing instruction. The result of that would be the
            // style is not applied. We likely do not want to throw an exception for that.
            ```

            What do you think?

            David Baron

            The only reason that `Document::createProcessingInstruction` should throw is if:

            • the `target` given is not a valid XML name
            • the `data` provided contains the string `?>`

            I think the XML parser should have given an error for either of these.

            Line 406, Patchset 26: bool is_first_element = !saw_first_element_;
            saw_first_element_ = true;
            David Baron . unresolved

            While it's fine given the current uses of this variable, maybe move this down to before `HandleNamespaceAttributes` like it is in the existing parser, just in case `saw_first_element_` ends up being used for something else outside this function?

            Line 409, Patchset 26: AtomicString is;
            David Baron . unresolved

            Probably move this declaration down near the code that assigns it, like in the current parser?

            Line 420, Patchset 26: // empty NS url, resolve it against the initially preserved namespace
            David Baron . unresolved

            So I think there's a problem here relative to the existing parser. The code in the existing parser distinguishes between a null string and an empty string, but this code doesn't.

            That distinction is needed to maintain the difference between `<elem>` (which is a start tag for an element that is in the default namespace of its parent element) and `<elem xmlns="">` (which is the start tag for an element in no namespace that changes the default namespace to being no namespace). It seems like this code breaks that distinction in fragment parsing. (It wouldn't surprise me if there are no tests for this, but I think it is a real distinction.)

            I'm not sure if it will be easier to fix by getting the null-versus-empty distinction out of the rust parser, or by setting up the namespaces inside of the rust parser for fragment parsing and dropping this fixup.

            Line 459, Patchset 26: // TODO(https://crbug.com/441911594): Test coverage for this?
            David Baron . unresolved

            For what it's worth, I see a single test that I expect might exercise this code, and it's `third_party/blink/web_tests/external/wpt/custom-elements/parser/parser-uses-create-an-element-for-a-token-svg.svg`

            (I didn't actually test this.)

            Line 467, Patchset 26: QualifiedName q_name(
            David Baron . unresolved

            Two comments here, though I'm not sure whether either is a concrete problem:

            1. In the existing parser there was some code to concatenate prefix and localname in certain cases and put it all back into the localname (but only if the adjusted_uri was empty and the prefix was not). I'm not sure why that code was there... but it's possible it's important?

            2. I'm again a little worried about null-versus-empty distinctions here being different from the old parser and that distinction being important. Unlike the case above I don't have any specific examples in mind yet, though.

            Line 493, Patchset 26: // TODO(https://crbug.com/441911594): Port custom element support from
            // XMLDocumentParser.
            David Baron . unresolved

            This might also be needed for the one SVG custom elements test mentioned above. (Though the code in the existing parser was also a little higher up in the function, which might matter.)

            Line 588, Patchset 26: String cdata_string = RustStrToWtfString(data);
            David Baron . unresolved

            I think it's probably stylistically better here to *not* have the `cdata_string` variable but instead just write it in the function call (like you do in the `Comment` function just below).

            This is because if `CDataSection::Create` had a `String&&` overload, it would be more efficient since it would use that one. (It doesn't right now, but it probably should!) Using `std::move(cdata_string)` would have the same effect, but with more verbosity.

            (I write this partly because reading this code led me to check to see whether there was a `String&&` version, whereas if I'd seen what I'm suggesting I wouldn't have felt the need to check.)

            Line 703, Patchset 26: Flush();
            David Baron . unresolved

            I can't figure out what `Flush()` this is calling. Where is it?

            Line 847, Patchset 26: DCHECK(!IsDetached());
            David Baron . unresolved

            The old parser didn't do anything here when `parsing_fragment_` was true -- is there a need for that?

            Open in Gerrit

            Related details

            Attention is currently required from:
            • Dominik Röttsches
            • Kent Tamura
            • Łukasz Anforowicz
            Submit Requirements:
            • requirement satisfiedCode-Coverage
            • requirement is not satisfiedCode-Owners
            • requirement is not satisfiedCode-Review
            • requirement is not satisfiedNo-Unresolved-Comments
            • requirement is not satisfiedReview-Enforcement
            Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
            Gerrit-MessageType: comment
            Gerrit-Project: chromium/src
            Gerrit-Branch: main
            Gerrit-Change-Id: If794513791d91cbfa06927dad38cbdf90b7935b1
            Gerrit-Change-Number: 7088396
            Gerrit-PatchSet: 27
            Gerrit-Owner: Dominik Röttsches <dr...@chromium.org>
            Gerrit-Reviewer: David Baron <dba...@chromium.org>
            Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
            Gerrit-Reviewer: Kent Tamura <tk...@chromium.org>
            Gerrit-Reviewer: Łukasz Anforowicz <luk...@chromium.org>
            Gerrit-CC: AI Code Reviewer <peep-gen...@system.gserviceaccount.com>
            Gerrit-CC: Daniel Cheng <dch...@chromium.org>
            Gerrit-CC: Mason Freed <mas...@chromium.org>
            Gerrit-Attention: Dominik Röttsches <dr...@chromium.org>
            Gerrit-Attention: Łukasz Anforowicz <luk...@chromium.org>
            Gerrit-Attention: Kent Tamura <tk...@chromium.org>
            Gerrit-Comment-Date: Fri, 31 Oct 2025 15:57:51 +0000
            Gerrit-HasComments: Yes
            Gerrit-Has-Labels: No
            Comment-In-Reply-To: Dominik Röttsches <dr...@chromium.org>
            Comment-In-Reply-To: David Baron <dba...@chromium.org>
            satisfied_requirement
            unsatisfied_requirement
            open
            diffy
            Reply all
            Reply to author
            Forward
            0 new messages