| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. | 
  bool saw_error_ = false;nit: Blink Style Guide: Naming - Precede boolean values with words like “is” and “did”. Please consider renaming `saw_error_` to `did_see_error_`.
To keep this interaction as brief and non-intrusive as possible, please consider responding with one of following options:
**Done** | **Won't fix**: reason | **b/<bug_id>** | **Invalid:** reason
This comment was generated by [Experimental Blink C++ Code Review Agent](http://go/blink-c++-code-review-agent).AI reviews can sometimes be inaccurate; We appreciate your 🙏 feedback 🙏 to help us improve.[File a bug](http://go/blink-c++-code-review-agent-feedback) | [Provide feedback on chat](https://chat.google.com/room/AAQA0zhQHe0?cls=4) | [Opt-out](https://ganpati2.corp.google.com/group/peep-genai-blink-agent-optout.prod)_
  void end();Blink Style Guide: Naming - Use 'CamelCase' for all function names. All function names should use `CamelCase()`-style names, beginning with an uppercase letter. Please consider renaming `end` to `End`.
To keep this interaction as brief and non-intrusive as possible, please consider responding with one of following options:
**Done** | **Won't fix**: reason | **b/<bug_id>** | **Invalid:** reason
This comment was generated by [Experimental Blink C++ Code Review Agent](http://go/blink-c++-code-review-agent).AI reviews can sometimes be inaccurate; We appreciate your 🙏 feedback 🙏 to help us improve.[File a bug](http://go/blink-c++-code-review-agent-feedback) | [Provide feedback on chat](https://chat.google.com/room/AAQA0zhQHe0?cls=4) | [Opt-out](https://ganpati2.corp.google.com/group/peep-genai-blink-agent-optout.prod)_
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. | 
reillig@, PTAL at `omnibox_api_interactive_test.cc` & `web_navigation_apitest.cc`
tkent@, PTAL at `audit_non_blink_usage.py`
  bool saw_error_ = false;nit: Blink Style Guide: Naming - Precede boolean values with words like “is” and “did”. Please consider renaming `saw_error_` to `did_see_error_`.
To keep this interaction as brief and non-intrusive as possible, please consider responding with one of following options:
**Done** | **Won't fix**: reason | **b/<bug_id>** | **Invalid:** reasonThis comment was generated by [Experimental Blink C++ Code Review Agent](http://go/blink-c++-code-review-agent).AI reviews can sometimes be inaccurate; We appreciate your 🙏 feedback 🙏 to help us improve.[File a bug](http://go/blink-c++-code-review-agent-feedback) | [Provide feedback on chat](https://chat.google.com/room/AAQA0zhQHe0?cls=4) | [Opt-out](https://ganpati2.corp.google.com/group/peep-genai-blink-agent-optout.prod)_
Won't fix: Keep consistent with `XMLDocumentParser`.
Here's an initial set of comments on a few parts of the CL. I haven't really gotten to the core of it yet, but at this point I have at least a rough idea of what's in this CL.
I think this is going to be somewhat slow because reviewing this will involve learning a bunch of new things about Rust. I do have some familiarity with Rust (though somewhat limited), but I haven't worked on any FFI code before. I'm willing to try to get through it, but it's going to be a bit slow because this is pretty large. I also know enough to know that I probably shouldn't be reviewing any `unsafe` rust, so probably someone qualified to do that should be reviewing at least the `unsafe` parts.
I also wouldn't be offended if you prefer to find a different reviewer given that.
Introduce an XMLDocumentParserRs, which is instantiated in DocumentThis feels a little bit like it's undoing https://chromium.googlesource.com/chromium/src/+/c99c39aed8f0f6007a7a5ce8565d91e353ed332b%5E%21/ except I don't think it is because that (and followup refactorings) were removing a permanent mechanism for having different XML libraries, whereas this is intended to be temporary so it makes sense for it to be structured differently (e.g., via flags).
Bug: chromium:435623334Should this really be going to the XSLT bug, or is there a more appropriate bug? (Also, no need for `chromium:`.)
Is it worth considering holding off on these two changes in `chrome/`? It seems like you don't *need* to land them at this point given that the feature isn't enabled by default, and it might be easier to track the set of things that are different if you don't land them.
(I'm OK either way, but figured it's worth mentioning. I think maybe it depends on whether you're likely to try to fix the differences along the way or just want to accept them.)
If you consider yourself an expert on rust `BUILD.gn` stuff at this point, then this is fine... but if not I'd appreciate if you could ask someone who you do consider to be an expert in that to review this file.
I guess this is OK if you're reasonably confident we're going to be able to live with this behavior change, but otherwise I might defer making this sort of test change until you are or until you feel close to enabling the flag (or maybe you do now?).
I didn't review the correctness of the contents of this file -- I'm assuming you got it from a reliable source.
// https://www.w3.org/MarkUp/DTD/xhtml-lat1.entI'm not sure whether this URL or https://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent is a better URL to reference. The files are formatted differently but hopefully have identical contents...
(Same for the other 2 parts.)
At least they're all the same other than comments and the order of the first few entries of the special map.
// https://www.w3.org/MarkUp/DTD/xhtml-special.entNote that this excludes `<` and `&` which are in that file, though not the other three entities in https://www.w3.org/TR/REC-xml/#sec-predefined-ent . Might be worth revising the comment but it's a little inconsistent with the comment right now.
class XMLDocumentParserRs final : public ScriptableDocumentParser,Is it worth considering using `Rust` in the name rather than `Rs`? It seems like it might be a little clearer.
I think it makes sense to move the three *text* expectations (but not the image expectation) to not have `platform/linux/` in the path.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. | 
Is it worth considering holding off on these two changes in `chrome/`? It seems like you don't *need* to land them at this point given that the feature isn't enabled by default, and it might be easier to track the set of things that are different if you don't land them.
(I'm OK either way, but figured it's worth mentioning. I think maybe it depends on whether you're likely to try to fix the differences along the way or just want to accept them.)
I'm okay landing these changes though I'm a little confused why parsing `<tag><match></match>` is throwing an "Unexpected closing tag" error when the problem is a _missing_ closing tag (or unexpected end of file).
static inline bool HasNoStyleInformation(Document* document) {`static` is unnecessary.
String rustStrToWtfString(rust::Str str) {The name should be `RustStrToWtfString`.
void extract_doctype_parts(const std::string& doctype_string,The name should be `ExtractDocTypeParts`.
  GetDocument()->setXMLVersion(String(version), ASSERT_NO_EXCEPTION);Don't we use `rustStrToWtfString()`?
static inline bool HandleNamespaceAttributes(We prefer functions in anonymous namespace to `static` functions.
static inline bool CollectElementAttributes(Ditto.
# XSLT tests disabled with Rust XML parser, no support for XSLTDoes it mean we can't ship the Rust XML parser until we remove XSLT support completely?
    "bases": [This adds 1200+ tests, and it's too large for a single virtual test.
Can we reduce the number of tests to less than 200, or set up a dedicated build bot?
Here's an initial set of comments on a few parts of the CL. I haven't really gotten to the core of it yet, but at this point I have at least a rough idea of what's in this CL.
I think this is going to be somewhat slow because reviewing this will involve learning a bunch of new things about Rust. I do have some familiarity with Rust (though somewhat limited), but I haven't worked on any FFI code before. I'm willing to try to get through it, but it's going to be a bit slow because this is pretty large. I also know enough to know that I probably shouldn't be reviewing any `unsafe` rust, so probably someone qualified to do that should be reviewing at least the `unsafe` parts.
I also wouldn't be offended if you prefer to find a different reviewer given that.
Thanks - I'll add Łukasz for the FFI parts.
@luk...@chromium.org Could you please take a look at the FFI parts towards the Rust XML library of this change including the `core/BUILD.gn` change?
Others, thanks for the reviews so far. I addressed most comments, some feedback/clarifications below.
Should this really be going to the XSLT bug, or is there a more appropriate bug? (Also, no need for `chromium:`.)
Yes, this should be 441911594.
Reilly GrantIs it worth considering holding off on these two changes in `chrome/`? It seems like you don't *need* to land them at this point given that the feature isn't enabled by default, and it might be easier to track the set of things that are different if you don't land them.
(I'm OK either way, but figured it's worth mentioning. I think maybe it depends on whether you're likely to try to fix the differences along the way or just want to accept them.)
I'm okay landing these changes though I'm a little confused why parsing `<tag><match></match>` is throwing an "Unexpected closing tag" error when the problem is a _missing_ closing tag (or unexpected end of file).
I'll split these off as the flag remains default off in this test and to meet @dbaron's expectations for keeping this CL a little smaller.
The assertion failure without modifying the expectation is:
```
[232535:232535:1029/115906.485593:INFO:CONSOLE:3] "[FAIL] invalidXml_NoClosingTag: error on line 1 at column 53: Unexpected closing tag: fragment != tag
```
Which means that the `description:` argument 
```
chrome.omnibox.setDefaultSuggestion(
{description: '<tag> <match>match</match> world'},
```
is not the full XML, but there is a `</fragment>` closing tag in surrounding XML.
Semantically libxml2's and the Rust XML parser's error message align here.
If you consider yourself an expert on rust `BUILD.gn` stuff at this point, then this is fine... but if not I'd appreciate if you could ask someone who you do consider to be an expert in that to review this file.
Not sure if expert, but I have done a few of these by now - I think this is a pretty standard snippet for adding a Rust library with FFI. But see separate comment, I will add Łukasz for Rust/FFI parts of this change.
I guess this is OK if you're reasonably confident we're going to be able to live with this behavior change, but otherwise I might defer making this sort of test change until you are or until you feel close to enabling the flag (or maybe you do now?).
I'll split this off. For development purposes, and for running a separate trybot, it's useful to have the unit tests already pass with the Rust XML parser.
I didn't review the correctness of the contents of this file -- I'm assuming you got it from a reliable source.
Ack, I generated these locally by transforming spec reference .ent files.
I'm not sure whether this URL or https://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent is a better URL to reference. The files are formatted differently but hopefully have identical contents...
(Same for the other 2 parts.)
At least they're all the same other than comments and the order of the first few entries of the special map.
Changed link to the one you're suggesting - here and below.
Note that this excludes `<` and `&` which are in that file, though not the other three entities in https://www.w3.org/TR/REC-xml/#sec-predefined-ent . Might be worth revising the comment but it's a little inconsistent with the comment right now.
Removed gt, apos, quot and clarified comment.
  void end();Blink Style Guide: Naming - Use 'CamelCase' for all function names. All function names should use `CamelCase()`-style names, beginning with an uppercase letter. Please consider renaming `end` to `End`.
To keep this interaction as brief and non-intrusive as possible, please consider responding with one of following options:
**Done** | **Won't fix**: reason | **b/<bug_id>** | **Invalid:** reasonThis comment was generated by [Experimental Blink C++ Code Review Agent](http://go/blink-c++-code-review-agent).AI reviews can sometimes be inaccurate; We appreciate your 🙏 feedback 🙏 to help us improve.[File a bug](http://go/blink-c++-code-review-agent-feedback) | [Provide feedback on chat](https://chat.google.com/room/AAQA0zhQHe0?cls=4) | [Opt-out](https://ganpati2.corp.google.com/group/peep-genai-blink-agent-optout.prod)_
Invalid: This was a leftover and the function does no longer exist in the impl file.
class XMLDocumentParserRs final : public ScriptableDocumentParser,Is it worth considering using `Rust` in the name rather than `Rs`? It seems like it might be a little clearer.
Rs seems to me to be a pretty standard suffix, compare many crate names ending on -rs https://crates.io/search?q=-rs and the Rust source file extension being .rs - but I am not insisting on that if you'd prefer it to be changed.
static inline bool HasNoStyleInformation(Document* document) {Dominik Röttsches`static` is unnecessary.
Done
The name should be `RustStrToWtfString`.
Done
void extract_doctype_parts(const std::string& doctype_string,The name should be `ExtractDocTypeParts`.
Done
  GetDocument()->setXMLVersion(String(version), ASSERT_NO_EXCEPTION);Don't we use `rustStrToWtfString()`?
Good catch, thanks. Done.
We prefer functions in anonymous namespace to `static` functions.
Done
static inline bool CollectElementAttributes(Dominik RöttschesDitto.
Done
static inline void SetAttributes(Dominik RöttschesDitto.
Done
  kNoXMlDeclaration,Dominik Röttsches`kNoXMl` -> `kNoXml`
Done
# XSLT tests disabled with Rust XML parser, no support for XSLTDoes it mean we can't ship the Rust XML parser until we remove XSLT support completely?
I am investigating transition options: We could use it right away for SVG files or all non-XSLT cases if we are willing to run a bit of a scan up front. We could also potentially run it as an internal dogfood right away.
This adds 1200+ tests, and it's too large for a single virtual test.
Can we reduce the number of tests to less than 200, or set up a dedicated build bot?
Changed to a smaller set of 177 files as a basic sanity check. Will look into a separate build bot. Let me know if you have any pointers.
I think it makes sense to move the three *text* expectations (but not the image expectation) to not have `platform/linux/` in the path.
Removed these as @tkent asked me to reduce set of virtual tests.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. | 
Dominik RöttschesHere's an initial set of comments on a few parts of the CL. I haven't really gotten to the core of it yet, but at this point I have at least a rough idea of what's in this CL.
I think this is going to be somewhat slow because reviewing this will involve learning a bunch of new things about Rust. I do have some familiarity with Rust (though somewhat limited), but I haven't worked on any FFI code before. I'm willing to try to get through it, but it's going to be a bit slow because this is pretty large. I also know enough to know that I probably shouldn't be reviewing any `unsafe` rust, so probably someone qualified to do that should be reviewing at least the `unsafe` parts.
I also wouldn't be offended if you prefer to find a different reviewer given that.
Thanks - I'll add Łukasz for the FFI parts.
PS: Perhaps to reiterate the context, as mentioned in the commit message, most of the behavior of the new `XMLDocumentParserRs` is ported from `XMLDocumentParser` which does a couple of interesting gymnastics to comply with state transition expectations from `Document` and `DocumentParser`.
    "bases": [Dominik RöttschesThis adds 1200+ tests, and it's too large for a single virtual test.
Can we reduce the number of tests to less than 200, or set up a dedicated build bot?
Changed to a smaller set of 177 files as a basic sanity check. Will look into a separate build bot. Let me know if you have any pointers.
I found information on how to set up a separate trybot, but I would also like to explore starting to use the Rust XML parser for parts of real world workloads - which would make it part of mainline testing. I'll get back on that.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. | 
This by now fairly complete in the sense of reliably passing a wide setMissing word?: "This is by now fairly complete"?
  build_native_rust_unit_tests = trueIs `build_native_rust_unit_tests` needed and intentional here? The official [guidance](https://google.github.io/comprehensive-rust/chromium/testing.html#:~:text=Native%20Rust%20tests%20(i.e.%20%23%5Btest%5D).%20Discouraged%20outside%20of%20//third_party/rust.) that the team landed on is that "Native Rust tests [are] discouraged outside of //third_party/rust.".
PS. Maybe we should copy this guidance to `//docs/rust/...` or somewhere else?
}I am not a GN expert, but I was a bit surprised that some new `.h` / `.cc` files from this CL do not have corresponding `BUILD.gn` changes. Examples:
#[allow(unused)]Can you add a comment why `#[allow(unused)]` is needed + if this is expected to stay here forever or be fixed in the future?
PS. In case it helps, let me share that some guidance about unused-code-warnings have recently been added to https://chromium.googlesource.com/chromium/src/+/main/docs/rust/dev_experience_tips_and_tricks.md#suppressing-unused-code-warnings
    saw_error: bool,Is `self.saw_error` the same as `self.error_details.is_some()`?
    let cursor = Cursor::new(Vec::new());This LGTM, but I wanted to share some notes/thoughts:
`XmlEventReader::new_with_config` ([docs here](https://docs.rs/xml/latest/xml/reader/struct.EventReader.html#method.new_with_config)) only needs `R: Read` (and doesn't need `R: Seek` provided by `Cursor`). So in theory after reading some bytes, the prefix of the vector can be discarded rather than kept in memory. OTOH, in my cursory search I didn't find anything that already works for this.
Hmm... I see that `VecDeque<u8>` implements `Read + Write`. So maybe we can use `VecDeque<u8>` instead of `Cursor<Vec<u8>>`? Fine to attempt this in a separate CL, because it may be a bit annoying to switch to `as_slices` instead of a single `as_slice`.
                    let standalone = standalone
                        .map(|is_standalone| if is_standalone { 1 } else { 0 })
                        .unwrap_or_else(|| {
                            if xml_declaration_view.is_some() {
                                return -2; // <?xml instruction present, but
                                           // standalone not specified
                            }
                            -1 // No XML declaration at all
                        });Would it be worth adding a comment saying where those constants come from? I assume they correspond to some constants on the C++ side?
                        &mut attributes,
                        &mut new_namespaces,It seems that `&mut` here is unneeded, because the values will be discarded at the end of the scope. So I thought that I'd ask if this can be a TODO to change C++ callback declaration to take a const-ref instead?
fn get_last_event_position(read_state: &XmlReadState, row: &mut u64, col: &mut u64) -> bool {nit: Maybe name this `try_get_last_event_position` to indicate it might fail?
fn get_error_details(nit: `try_get_error_details`?
            .map_or(error_string.clone(), |pos| error_string[pos + 1..].to_string());nitty nit: IIUC this `clone` will execute unconditionally, even when the other branch/lamda will be taken. Can this be avoided?
        *col = error_details.position().column as u64;Why is `as u64` needed here? I assume that this is [`TextPosition::column`](https://docs.rs/xml/latest/xml/common/struct.TextPosition.html#method.column) which already returns a `u64`?
            Ok(StartElement { name: _, attributes, namespace: _ }) => {```suggestion
Ok(StartElement { attributes, .. }) => {
```
                let mut ret_attributes: Vec<AttributeNameValue> = Vec::new();```suggestion
let mut ret_attributes: Vec<AttributeNameValue> = Vec::with_capacity(attributes.len());
```
                        AttributeNameValue { q_name: q_name, value: attribute.value.clone() };```suggestion
AttributeNameValue { q_name, value: attribute.value.clone() };
```
#[allow(unused_unsafe)]Is this needed? Does that indicate a bug in `cxx`?
        fn saw_error(read_state: &XmlReadState) -> bool;LGTM, but I wonder if `has_error` would also be a good name? Sorry for bike-shedding :-/.
enum StandaloneInfo {
  kStandaloneUnspecified = -2,
  kNoXmlDeclaration,
  kStandaloneNo,
  kStandaloneYes
};Can that be defined as [a shared enum](https://cxx.rs/shared.html#shared-structs-and-enums) in `#[cxx::bridge]`? This would avoid having to duplicate the numbers in `.rs` and in `.h` here.
Contains test files for the rust-xml virtual test suite which runs XML parsing through the Rust XML crate.Please wrap to 80 columns?
Can you consider mentioning something like?:
```
For more details see go/chrome_x_mitigation and/or https://crbug.com/441911594.
```
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. | 
Here's another round of partial comments. (This is likely it for today, but I'll resume tomorrow.)
  bool is_xhtml_document_ = false;currently unused, but maybe should be used?
  bool parsing_fragment_;```suggestion
const bool parsing_fragment_;
```
  void DoWrite(const String&);
  void DoEnd();Both of these are also unimplemented.
  bool AppendFragmentSource(const String&);Also unimplemented.
  void SetScriptStartPosition(TextPosition);Also unimplemented.
  static bool SupportsXMLVersion(const String&);Also unimplemented.
  static bool ParseDocumentFragment(const String&,
                                    DocumentFragment*,
                                    Element* parent,
                                    ParserContentPolicy,
                                    ExceptionState&);Also unimplemented.
  bool IsCurrentlyParsing8BitChunk();Also unimplemented.
  void SetIsXHTMLDocument(bool is_xhtml);
  bool IsXHTMLDocument() const;There's no implementation of these methods and they appear unused. (Why not inline like the current parser?) But maybe there should be?
// Copyright 2025 The Chromium Authors
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.If the origin of this file is a copy of `xml_document_parser.h` (I'm not sure whether that's the case), I think it should probably preserve the original license header. Same for the `.cc`.
One other thought:
Since you need to test and branch based on the flag at multiple points, it seems like it might be good to:
just to make sure that you've gotten all of the entry points (and that we continue to do so until the flag is removed).
(I don't feel strongly about the exact approach (e.g., probably OK with `DCHECK()` instead of `CHECK()` if you feel it's justified) -- but we've certainly had cases in the past where we've found that a flag didn't enable all of what it was supposed to, and then fully enabling it took longer than expected -- so I think it would be good to be a bit more defensive here to ensure we don't do that here.)
  return String::FromUTF8(static_cast<std::string>(str));Per the bullet points in https://google.github.io/styleguide/cppguide.html#Casting I think the preference is
```suggestion
return String::FromUTF8(std::string(str));
```
assuming it works. Though I'm wondering if there's a simpler or cheaper way. At least maybe if you can go through `std::string_view` rather than `std::string` you can save a copy? So maybe instead:
```suggestion
return String::FromUTF8(base::RustStrToStringView(str));
```
// TODO(https://crbug.com/441911594): Extracts name, public ID, and system ID
// from an XML/HTML DOCTYPE declaration.This doesn't accept any doctype declarations that have an internal subset. That might be a problem. Maybe having a way of getting it out from the rust parser would be the best way to fix that.
      "^<\\s*!DOCTYPE\\s+(\\w+)\\s+PUBLIC\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"\\s*>"Whitespace isn't allowed within `<!DOCTYPE`
  // 1. Regex for PUBLIC-style statement: Captures name, public id, system id.
  const char* kPublicRe =
      "^<\\s*!DOCTYPE\\s+(\\w+)\\s+PUBLIC\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"\\s*>"
      "$";
  // 2. Regex for SYSTEM or simple statement, captures name and system id.
  const char* kSystemRe =
      "^<\\s*!DOCTYPE\\s+(\\w+)(?:\\s+SYSTEM\\s+\"([^\"]*)\")?\\s*(?:\\[[^\\]]*"
      "\\])?\\s*>$";These strings would be more readable with C++ raw strings.
          {g_xmlns_with_colon, AtomicString(RustStrToWtfString(prefix))}));Atomizing a string you're about to concatenate seems unnecessary.
```suggestion
{g_xmlns_with_colon, RustStrToWtfString(prefix)}));
```
(I realize that's a problem in the current parser too.)
        Attribute(*parsed_name, AtomicString(RustStrToWtfString(uri))));```suggestion
Attribute(std::move(*parsed_name), AtomicString(RustStrToWtfString(uri))));
```
(I realize that's from the existing code, which does it in `CollectElementAttributes` but not `HandleNamespaceAttributes`.)
static const unsigned kMaxXMLTreeDepth = 5000;Probably preserve the existing `// FIXME: HTMLConstructionSite has a limit of 512, should these match?`
      parsing_fragment_(false) {}preserve the code for the `WebFeature::kXMLDocument` use counter?
      script_runner_(nullptr),Preserve the `// Don't execute scripts for document fragments.` comment?
  xml_ffi::append_to_source(*read_state_,Do you know if it's ok that it does this even when `parser_paused_` is true? (I don't understand all the implications of the current code here...)
  GetDocument()->setXMLVersion(RustStrToWtfString(version),Do you need to check SupportsXMLVersion here? (Though the existing code has a comment that disagrees with the existing code!)
In particular, it seems like if you don't the `ASSERT_NO_EXCEPTION` might assert, unless the rust parser's internals have *already* checked the XML version.
  DummyExceptionStateForTesting exception_state;This should probably use `ASSERT_NO_EXCEPTION`. The only reasons `Document::createProcessingInstruction` can throw are reasons that the rust XML parser *should* have already given errors on, I think. (I realize this is an issue in the existing code. If you don't want to make this change, at least preserve the existing code's not-quite-TODO Comment about it.)
  if (pi->IsCSS()) {
    saw_css_ = true;
  } else {
    return;
  }
  CheckIfBlockingStyleSheetAdded();```suggestion
if (pi->IsCSS()) {
saw_css_ = true;
CheckIfBlockingStyleSheetAdded();
}
```
  // TODO(https://crbug.com/441911594): Handle is attribute.
  // for (const auto& attr : prefixed_attributes) {
  //   if (attr.GetName() == html_names::kIsAttr) {
  //     is = attr.Value();
  //     break;
  //   }
  // }Any reason this wouldn't just work if you move it down to after `CollectElementAttributes` like it is in the existing code?
| Commit-Queue | +1 | 
Thank you both for your thorough reviews and helpful feedback. IMO the CL has improved in multiple ways thanks to that.
Most comments addressed, a couple of items left open for discussion in the below.
This by now fairly complete in the sense of reliably passing a wide setMissing word?: "This is by now fairly complete"?
Done
Is `build_native_rust_unit_tests` needed and intentional here? The official [guidance](https://google.github.io/comprehensive-rust/chromium/testing.html#:~:text=Native%20Rust%20tests%20(i.e.%20%23%5Btest%5D).%20Discouraged%20outside%20of%20//third_party/rust.) that the team landed on is that "Native Rust tests [are] discouraged outside of //third_party/rust.".
PS. Maybe we should copy this guidance to `//docs/rust/...` or somewhere else?
Removed, I think I had had one test there earlier that I later removed.
I am not a GN expert, but I was a bit surprised that some new `.h` / `.cc` files from this CL do not have corresponding `BUILD.gn` changes. Examples:
- `third_party/blink/renderer/core/xml/parser/xml_document_parser_rs.cc`
- `third_party/blink/renderer/core/xml/parser/xml_ffi_callbacks.h`
See `third_party/blink/renderer/core/xml/build.gni`.
currently unused, but maybe should be used?
See above, controls entity support, state kept in Rust parser. Removed.
  bool parsing_fragment_;Dominik Röttsches```suggestion
const bool parsing_fragment_;
```
Done
Both of these are also unimplemented.
Removed.
  bool AppendFragmentSource(const String&);Dominik RöttschesAlso unimplemented.
Removed.
  void SetScriptStartPosition(TextPosition);Dominik RöttschesAlso unimplemented.
Removed.
  static bool SupportsXMLVersion(const String&);Dominik RöttschesAlso unimplemented.
Removed.
  static bool ParseDocumentFragment(const String&,
                                    DocumentFragment*,
                                    Element* parent,
                                    ParserContentPolicy,
                                    ExceptionState&);Dominik RöttschesAlso unimplemented.
Curiously the fact that we miss this implementation here seem to not have been hit by any tests so far. Either `DocumentFragment::ParseXML` is not exercised by any tests or might be unused.
Added a comment with a TODO to investigate that.
  bool IsCurrentlyParsing8BitChunk();Dominik RöttschesAlso unimplemented.
Sorry it fell on you to spot these. Removed.
  void SetIsXHTMLDocument(bool is_xhtml);
  bool IsXHTMLDocument() const;There's no implementation of these methods and they appear unused. (Why not inline like the current parser?) But maybe there should be?
Removed, to my knowledge this boolean only controlled whether extended XHTML entities should be supported. We decide this at the DocumentType stage and inform the Rust XML parser to include the extended entities. So the state of whether they should be supported is kept in the Rust side parser state.
// Copyright 2025 The Chromium Authors
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.If the origin of this file is a copy of `xml_document_parser.h` (I'm not sure whether that's the case), I think it should probably preserve the original license header. Same for the `.cc`.
I was considering what to do and did not arrive at a clear conclusion. I guess it's arguable that there are still enough traces of the general skeleton of logic which would mean the original license header should be kept. On the other hand, on the implementation side, not much any original meaningful logic of the parser library integration is left.
Changed to the original header plus adding a 2025 The Chromium Authors line.
One other thought:
Since you need to test and branch based on the flag at multiple points, it seems like it might be good to:
- add a bunch of `CHECK(RuntimeEnabledFeatures::XMLParsingRustEnabled())` in `XMLDocumentParserRs`, either at the public entry points or the places where it calls into the rust-based XML parser
- add a bunch of `CHECK(!RuntimeEnabledFeatures::XMLParsingRustEnabled())` in `XMLDocumentParser`, either at the public entry points or the places where it calls into the libxml2 parser
just to make sure that you've gotten all of the entry points (and that we continue to do so until the flag is removed).
(I don't feel strongly about the exact approach (e.g., probably OK with `DCHECK()` instead of `CHECK()` if you feel it's justified) -- but we've certainly had cases in the past where we've found that a flag didn't enable all of what it was supposed to, and then fully enabling it took longer than expected -- so I think it would be good to be a bit more defensive here to ensure we don't do that here.)
This helped to find an issue with fragment parsing entry points, which I suggest to resolve in follow-up changes.
Added to public non-fragment constructor entry points, `parseAttributes(Rust)` and tried to add to `ParseDocumentFragment` and found that this still needs a Rust side impl.
IMO it makes sense to add this, also for further documenting how the usage / switching between the parsers is intended. However, adding the CHECKs itself is not necessarily a guarantee but shifts the problem to identifying the correct entry points.
There's the weird usage of `parseAttributes` for parsing the attributes of a XSLT PI.
Per the bullet points in https://google.github.io/styleguide/cppguide.html#Casting I think the preference is
```suggestion
return String::FromUTF8(std::string(str));
```
assuming it works. Though I'm wondering if there's a simpler or cheaper way. At least maybe if you can go through `std::string_view` rather than `std::string` you can save a copy? So maybe instead:
```suggestion
return String::FromUTF8(base::RustStrToStringView(str));
```
Nice, good idea, thanks.
// TODO(https://crbug.com/441911594): Extracts name, public ID, and system ID
// from an XML/HTML DOCTYPE declaration.This doesn't accept any doctype declarations that have an internal subset. That might be a problem. Maybe having a way of getting it out from the rust parser would be the best way to fix that.
Fully agree and that's what I intended to express with the comment in the next line. Rephrased comment to
```
// This is a workaround until we hopefully can add missing DOCTYPE 
// parsing functionality in upstream.
```
      "^<\\s*!DOCTYPE\\s+(\\w+)\\s+PUBLIC\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"\\s*>"Whitespace isn't allowed within `<!DOCTYPE`
Regex adjusted.
  // 1. Regex for PUBLIC-style statement: Captures name, public id, system id.
  const char* kPublicRe =
      "^<\\s*!DOCTYPE\\s+(\\w+)\\s+PUBLIC\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"\\s*>"
      "$";
  // 2. Regex for SYSTEM or simple statement, captures name and system id.
  const char* kSystemRe =
      "^<\\s*!DOCTYPE\\s+(\\w+)(?:\\s+SYSTEM\\s+\"([^\"]*)\")?\\s*(?:\\[[^\\]]*"
      "\\])?\\s*>$";These strings would be more readable with C++ raw strings.
Good idea, transformed to raw strings with "regex" delimiter sequence.
          {g_xmlns_with_colon, AtomicString(RustStrToWtfString(prefix))}));Atomizing a string you're about to concatenate seems unnecessary.
```suggestion
{g_xmlns_with_colon, RustStrToWtfString(prefix)}));
```(I realize that's a problem in the current parser too.)
Removed inner `AtomicString` construction.
        Attribute(*parsed_name, AtomicString(RustStrToWtfString(uri))));```suggestion
Attribute(std::move(*parsed_name), AtomicString(RustStrToWtfString(uri))));
```
(I realize that's from the existing code, which does it in `CollectElementAttributes` but not `HandleNamespaceAttributes`.)
Done and harmonized in analog `CollectElementAttributes` function.
Probably preserve the existing `// FIXME: HTMLConstructionSite has a limit of 512, should these match?`
Done
preserve the code for the `WebFeature::kXMLDocument` use counter?
Yes, counter added.
Preserve the `// Don't execute scripts for document fragments.` comment?
Ok.
Do you know if it's ok that it does this even when `parser_paused_` is true? (I don't understand all the implications of the current code here...)
Yes, from what I know, this is ok. The architecture of the XML Rust parser is different. It is a pull parser where as `libxml2` seems to be a push model parser. Appending to the buffer here is an operation that does not drive parsing. Only `process_next_event` will trigger reads and consume the buffer.
Contrary to that, in the existing code there is a lot of complexity in keeping the parser from progressing when adding to the buffer. All the closure for `PendingCallbacks` need to be added there to complete received events later.
  GetDocument()->setXMLVersion(RustStrToWtfString(version),Do you need to check SupportsXMLVersion here? (Though the existing code has a comment that disagrees with the existing code!)
In particular, it seems like if you don't the `ASSERT_NO_EXCEPTION` might assert, unless the rust parser's internals have *already* checked the XML version.
Carried over logic from `XMLDocumentParser` to set a version only if it's 1.0 and ignore it otherwise. Copied the odd comment for posterity as well and added link to explaining Rust XML parser behavior.
  DummyExceptionStateForTesting exception_state;This should probably use `ASSERT_NO_EXCEPTION`. The only reasons `Document::createProcessingInstruction` can throw are reasons that the rust XML parser *should* have already given errors on, I think. (I realize this is an issue in the existing code. If you don't want to make this change, at least preserve the existing code's not-quite-TODO Comment about it.)
This should probably use ASSERT_NO_EXCEPTION. The only reasons Document::createProcessingInstruction can throw are reasons that the rust XML parser should have already given errors on, I think.
Is that so? In my understanding the Rust XML parser would not validate a processing instruction inner prose, whereas we expect it to be well-formed in the sense of matching an attribute key-value list pattern, which is validated in `createProcessingInstruction` which then calls `parseAttributes` I think.
Do we even need to add the TODO here? The Rust XML parser will not support XSLT, so the only thing where this practically goes wrong in a relevant way is for an `<?xml-stylesheet>` processing instruction with a garbage attribute list. Do we want to throw for that?
Added comment:
```
// The only way this fails in a relevant way would be a processing instruction
// inner prose that does not match the style of an attribute list for a
// xml-stylesheet processing instruction. The result of that would be the
// style is not applied. We likely do not want to throw an exception for that.
```
What do you think?
  if (pi->IsCSS()) {
    saw_css_ = true;
  } else {
    return;
  }
  CheckIfBlockingStyleSheetAdded();```suggestion
if (pi->IsCSS()) {
saw_css_ = true;
CheckIfBlockingStyleSheetAdded();
}
```
Certainly.
  // TODO(https://crbug.com/441911594): Handle is attribute.
  // for (const auto& attr : prefixed_attributes) {
  //   if (attr.GetName() == html_names::kIsAttr) {
  //     is = attr.Value();
  //     break;
  //   }
  // }Any reason this wouldn't just work if you move it down to after `CollectElementAttributes` like it is in the existing code?
Activated this code, but I did not observe any test coverage so far. Modified the TODO to try and fix coverage.
Can you add a comment why `#[allow(unused)]` is needed + if this is expected to stay here forever or be fixed in the future?
PS. In case it helps, let me share that some guidance about unused-code-warnings have recently been added to https://chromium.googlesource.com/chromium/src/+/main/docs/rust/dev_experience_tips_and_tricks.md#suppressing-unused-code-warnings
Removed, I think this was a leftover from when I was defining the FFI but not yet using it in the C++ side.
Is `self.saw_error` the same as `self.error_details.is_some()`?
Good point, using your suggestion.
This LGTM, but I wanted to share some notes/thoughts:
`XmlEventReader::new_with_config` ([docs here](https://docs.rs/xml/latest/xml/reader/struct.EventReader.html#method.new_with_config)) only needs `R: Read` (and doesn't need `R: Seek` provided by `Cursor`). So in theory after reading some bytes, the prefix of the vector can be discarded rather than kept in memory. OTOH, in my cursory search I didn't find anything that already works for this.
Hmm... I see that `VecDeque<u8>` implements `Read + Write`. So maybe we can use `VecDeque<u8>` instead of `Cursor<Vec<u8>>`? Fine to attempt this in a separate CL, because it may be a bit annoying to switch to `as_slices` instead of a single `as_slice`.
I think I originally used Cursor to be able to append to the buffer when reads had already started in an earlier version of the code: https://chromium-review.googlesource.com/c/chromium/src/+/6828846/21/third_party/blink/renderer/core/xml/parser/xml_ffi.rs#44 - but then modified the append logic to get to the underlying storage of `EventReader`, so indeed, Cursor is no longer needed.
I briefly looked into making this change in this change, but I think this would complicate the review. I'll try to incorporate this into a follow-up.
                    let standalone = standalone
                        .map(|is_standalone| if is_standalone { 1 } else { 0 })
                        .unwrap_or_else(|| {
                            if xml_declaration_view.is_some() {
                                return -2; // <?xml instruction present, but
                                           // standalone not specified
                            }
                            -1 // No XML declaration at all
                        });Would it be worth adding a comment saying where those constants come from? I assume they correspond to some constants on the C++ side?
Fixed with migration to shared enum.
                        &mut attributes,
                        &mut new_namespaces,It seems that `&mut` here is unneeded, because the values will be discarded at the end of the scope. So I thought that I'd ask if this can be a TODO to change C++ callback declaration to take a const-ref instead?
I might be missing something here, but doesn't the `AttributesIterator`, which is
```
pub struct AttributesIterator<'a> {
attributes: std::slice::Iter<'a, OwnedAttribute>,
}
```
have a mutable inner state? And fn `attributes_next<'a>(attributes: &mut AttributesIterator<'a>,` needs the argument to be mutable to be able to start iterating over them?
fn get_last_event_position(read_state: &XmlReadState, row: &mut u64, col: &mut u64) -> bool {nit: Maybe name this `try_get_last_event_position` to indicate it might fail?
Renamed.
fn get_error_details(Dominik Röttschesnit: `try_get_error_details`?
Yes, renamed.
            .map_or(error_string.clone(), |pos| error_string[pos + 1..].to_string());nitty nit: IIUC this `clone` will execute unconditionally, even when the other branch/lamda will be taken. Can this be avoided?
Moved to `map_or_else`.
Why is `as u64` needed here? I assume that this is [`TextPosition::column`](https://docs.rs/xml/latest/xml/common/struct.TextPosition.html#method.column) which already returns a `u64`?
Removed, not sure why I had left that in there. I might have changed the FFI type at some point.
            Ok(StartElement { name: _, attributes, namespace: _ }) => {```suggestion
Ok(StartElement { attributes, .. }) => {
```
Done
                let mut ret_attributes: Vec<AttributeNameValue> = Vec::new();```suggestion
let mut ret_attributes: Vec<AttributeNameValue> = Vec::with_capacity(attributes.len());
```
Good idea, thanks. Done.
                        AttributeNameValue { q_name: q_name, value: attribute.value.clone() };```suggestion
AttributeNameValue { q_name, value: attribute.value.clone() };
```
Done
Is this needed? Does that indicate a bug in `cxx`?
Removed, see above, likely a leftover.
        fn saw_error(read_state: &XmlReadState) -> bool;LGTM, but I wonder if `has_error` would also be a good name? Sorry for bike-shedding :-/.
The name is consistent with the `saw_error_` that was ported from the original `XmlDocumentParser` code. Not that I am strongly attached to that, but IMHO it would look worse if we had both, `saw_error` and `has_error`.
enum StandaloneInfo {
  kStandaloneUnspecified = -2,
  kNoXmlDeclaration,
  kStandaloneNo,
  kStandaloneYes
};Can that be defined as [a shared enum](https://cxx.rs/shared.html#shared-structs-and-enums) in `#[cxx::bridge]`? This would avoid having to duplicate the numbers in `.rs` and in `.h` here.
Good suggestion, I wasn't aware this existing in CXX. Moved to a shared enum. I had one challenge in the callbacks header file, but looks like the enum can be forward declared there.
Contains test files for the rust-xml virtual test suite which runs XML parsing through the Rust XML crate.Please wrap to 80 columns?
Can you consider mentioning something like?:
```
For more details see go/chrome_x_mitigation and/or https://crbug.com/441911594.
```
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. | 
| Code-Review | +1 | 
`third_party/blink/renderer/core/xml/parser/xml_ffi.rs` and `third_party/blink/renderer/core/xml/parser/entities.rs` LGTM (I shared a few additional minor feedback items below - PTAL).
pub struct XmlReadState<'a> {I think this `pub` may be removed - a child `ffi` module should be able to see this `struct` without `pub` (same for `AttributesIterator` and `NamespacesIterator`).
                    let version: &str = match version {
                        XmlVersion::Version10 => "1.0",
                        XmlVersion::Version11 => "1.1",
                    };Maybe in the future this can be replaced with `version.as_str()` - see https://github.com/kornelski/xml-rs/pull/54
Not sure if it's worth leaving a TODO for this. Up to you.
                        &mut attributes,
                        &mut new_namespaces,Dominik RöttschesIt seems that `&mut` here is unneeded, because the values will be discarded at the end of the scope. So I thought that I'd ask if this can be a TODO to change C++ callback declaration to take a const-ref instead?
I might be missing something here, but doesn't the `AttributesIterator`, which is
```
pub struct AttributesIterator<'a> {
attributes: std::slice::Iter<'a, OwnedAttribute>,
}
```
have a mutable inner state? And fn `attributes_next<'a>(attributes: &mut AttributesIterator<'a>,` needs the argument to be mutable to be able to start iterating over them?
You're right - I missed this. Thank you for explaining.
    let _ = read_state.event_reader.add_entities(entities::HTML5_MAP.iter().copied());I wonder if it would be a bit more readable if we used 1) `unwrap()` + a comment, or 2) `expect("Built-in entities shouldn't cause errors")`.
    pub struct AttributeNameValue {Is this `pub` needed?
        fn saw_error(read_state: &XmlReadState) -> bool;Dominik RöttschesLGTM, but I wonder if `has_error` would also be a good name? Sorry for bike-shedding :-/.
The name is consistent with the `saw_error_` that was ported from the original `XmlDocumentParser` code. Not that I am strongly attached to that, but IMHO it would look worse if we had both, `saw_error` and `has_error`.
Matching existing C++ names SGTM.
enum StandaloneInfo {
  kStandaloneUnspecified = -2,
  kNoXmlDeclaration,
  kStandaloneNo,
  kStandaloneYes
};Dominik RöttschesCan that be defined as [a shared enum](https://cxx.rs/shared.html#shared-structs-and-enums) in `#[cxx::bridge]`? This would avoid having to duplicate the numbers in `.rs` and in `.h` here.
Good suggestion, I wasn't aware this existing in CXX. Moved to a shared enum. I had one challenge in the callbacks header file, but looks like the enum can be forward declared there.
Ugh - yeah, the risk of circular dependencies around C++/Rust FFI is quite annoying at times. Although I guess that sometimes it forces a cleaner layering. I am glad you've been able to find a way to make this work.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. | 
        prefix.empty() ? AtomicString(RustStrToWtfString(local_name))nit: We might want to add `RustStrToAtomicString()`.
# XSLT tests disabled with Rust XML parser, no support for XSLTDominik RöttschesDoes it mean we can't ship the Rust XML parser until we remove XSLT support completely?
I am investigating transition options: We could use it right away for SVG files or all non-XSLT cases if we are willing to run a bit of a scan up front. We could also potentially run it as an internal dogfood right away.
Expanding coverage incrementally sounds a very good approach.
    "bases": [Dominik RöttschesThis adds 1200+ tests, and it's too large for a single virtual test.
Can we reduce the number of tests to less than 200, or set up a dedicated build bot?
Dominik RöttschesChanged to a smaller set of 177 files as a basic sanity check. Will look into a separate build bot. Let me know if you have any pointers.
I found information on how to set up a separate trybot, but I would also like to explore starting to use the Rust XML parser for parts of real world workloads - which would make it part of mainline testing. I'll get back on that.
Ack.
        prefix.empty() ? AtomicString(RustStrToWtfString(local_name))nit: We might want to add `RustStrToAtomicString()`.
Added.
I think this `pub` may be removed - a child `ffi` module should be able to see this `struct` without `pub` (same for `AttributesIterator` and `NamespacesIterator`).
Done
                    let version: &str = match version {
                        XmlVersion::Version10 => "1.0",
                        XmlVersion::Version11 => "1.1",
                    };Maybe in the future this can be replaced with `version.as_str()` - see https://github.com/kornelski/xml-rs/pull/54
Not sure if it's worth leaving a TODO for this. Up to you.
TODO added.
    let _ = read_state.event_reader.add_entities(entities::HTML5_MAP.iter().copied());I wonder if it would be a bit more readable if we used 1) `unwrap()` + a comment, or 2) `expect("Built-in entities shouldn't cause errors")`.
Shortened this with an iteration of the entity maps and added `.expect()`s.
pub mod ffi {Dominik RöttschesIs this `pub` needed?
Removed.
    pub struct AttributeNameValue {Dominik RöttschesIs this `pub` needed?
Removed.
nit: looks an unnecessary change
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. | 
| Commit-Queue | +1 | 
    let cursor = Cursor::new(Vec::new());Dominik RöttschesThis LGTM, but I wanted to share some notes/thoughts:
`XmlEventReader::new_with_config` ([docs here](https://docs.rs/xml/latest/xml/reader/struct.EventReader.html#method.new_with_config)) only needs `R: Read` (and doesn't need `R: Seek` provided by `Cursor`). So in theory after reading some bytes, the prefix of the vector can be discarded rather than kept in memory. OTOH, in my cursory search I didn't find anything that already works for this.
Hmm... I see that `VecDeque<u8>` implements `Read + Write`. So maybe we can use `VecDeque<u8>` instead of `Cursor<Vec<u8>>`? Fine to attempt this in a separate CL, because it may be a bit annoying to switch to `as_slices` instead of a single `as_slice`.
I think I originally used Cursor to be able to append to the buffer when reads had already started in an earlier version of the code: https://chromium-review.googlesource.com/c/chromium/src/+/6828846/21/third_party/blink/renderer/core/xml/parser/xml_ffi.rs#44 - but then modified the append logic to get to the underlying storage of `EventReader`, so indeed, Cursor is no longer needed.
I briefly looked into making this change in this change, but I think this would complicate the review. I'll try to incorporate this into a follow-up.
@luk...@chromium.org, I looked into this a bit more, but currently I am concerned the code for the `StartDocument` event would not work anymore, as there we do access the buffer at the event position, to find out whether there is actually an encoding in the source text (due to a shortcoming in the XML crate API).
If the read() calls on the `VecDeque` do consume the data - I don't think it would be possible to access that data anymore in this way.
    let cursor = Cursor::new(Vec::new());Dominik RöttschesThis LGTM, but I wanted to share some notes/thoughts:
`XmlEventReader::new_with_config` ([docs here](https://docs.rs/xml/latest/xml/reader/struct.EventReader.html#method.new_with_config)) only needs `R: Read` (and doesn't need `R: Seek` provided by `Cursor`). So in theory after reading some bytes, the prefix of the vector can be discarded rather than kept in memory. OTOH, in my cursory search I didn't find anything that already works for this.
Hmm... I see that `VecDeque<u8>` implements `Read + Write`. So maybe we can use `VecDeque<u8>` instead of `Cursor<Vec<u8>>`? Fine to attempt this in a separate CL, because it may be a bit annoying to switch to `as_slices` instead of a single `as_slice`.
Dominik RöttschesI think I originally used Cursor to be able to append to the buffer when reads had already started in an earlier version of the code: https://chromium-review.googlesource.com/c/chromium/src/+/6828846/21/third_party/blink/renderer/core/xml/parser/xml_ffi.rs#44 - but then modified the append logic to get to the underlying storage of `EventReader`, so indeed, Cursor is no longer needed.
I briefly looked into making this change in this change, but I think this would complicate the review. I'll try to incorporate this into a follow-up.
@luk...@chromium.org, I looked into this a bit more, but currently I am concerned the code for the `StartDocument` event would not work anymore, as there we do access the buffer at the event position, to find out whether there is actually an encoding in the source text (due to a shortcoming in the XML crate API).
If the read() calls on the `VecDeque` do consume the data - I don't think it would be possible to access that data anymore in this way.
Thanks for following up. Using `Vec` makes sense given your explanation. Maybe it's worth leaving a comment explaining why we can't use `VecDeque`?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. | 
    let cursor = Cursor::new(Vec::new());Dominik RöttschesThis LGTM, but I wanted to share some notes/thoughts:
`XmlEventReader::new_with_config` ([docs here](https://docs.rs/xml/latest/xml/reader/struct.EventReader.html#method.new_with_config)) only needs `R: Read` (and doesn't need `R: Seek` provided by `Cursor`). So in theory after reading some bytes, the prefix of the vector can be discarded rather than kept in memory. OTOH, in my cursory search I didn't find anything that already works for this.
Hmm... I see that `VecDeque<u8>` implements `Read + Write`. So maybe we can use `VecDeque<u8>` instead of `Cursor<Vec<u8>>`? Fine to attempt this in a separate CL, because it may be a bit annoying to switch to `as_slices` instead of a single `as_slice`.
Dominik RöttschesI think I originally used Cursor to be able to append to the buffer when reads had already started in an earlier version of the code: https://chromium-review.googlesource.com/c/chromium/src/+/6828846/21/third_party/blink/renderer/core/xml/parser/xml_ffi.rs#44 - but then modified the append logic to get to the underlying storage of `EventReader`, so indeed, Cursor is no longer needed.
I briefly looked into making this change in this change, but I think this would complicate the review. I'll try to incorporate this into a follow-up.
Łukasz Anforowicz@luk...@chromium.org, I looked into this a bit more, but currently I am concerned the code for the `StartDocument` event would not work anymore, as there we do access the buffer at the event position, to find out whether there is actually an encoding in the source text (due to a shortcoming in the XML crate API).
If the read() calls on the `VecDeque` do consume the data - I don't think it would be possible to access that data anymore in this way.
Thanks for following up. Using `Vec` makes sense given your explanation. Maybe it's worth leaving a comment explaining why we can't use `VecDeque`?
Comment added, under the assumption you meant we keep `Vec` wrapped in `Cursor`.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. | 
OK, I think I've finally gotten through the whole thing. Here are my remaining comments.
// TODO(https://crbug.com/441911594): Extracts name, public ID, and system ID
// from an XML/HTML DOCTYPE declaration.Dominik RöttschesThis doesn't accept any doctype declarations that have an internal subset. That might be a problem. Maybe having a way of getting it out from the rust parser would be the best way to fix that.
Fully agree and that's what I intended to express with the comment in the next line. Rephrased comment to
```
// This is a workaround until we hopefully can add missing DOCTYPE
// parsing functionality in upstream.
```
But maybe explicitly note that it doesn't handle internal subsets?
  DummyExceptionStateForTesting exception_state;Dominik RöttschesThis should probably use `ASSERT_NO_EXCEPTION`. The only reasons `Document::createProcessingInstruction` can throw are reasons that the rust XML parser *should* have already given errors on, I think. (I realize this is an issue in the existing code. If you don't want to make this change, at least preserve the existing code's not-quite-TODO Comment about it.)
This should probably use ASSERT_NO_EXCEPTION. The only reasons Document::createProcessingInstruction can throw are reasons that the rust XML parser should have already given errors on, I think.
Is that so? In my understanding the Rust XML parser would not validate a processing instruction inner prose, whereas we expect it to be well-formed in the sense of matching an attribute key-value list pattern, which is validated in `createProcessingInstruction` which then calls `parseAttributes` I think.
Do we even need to add the TODO here? The Rust XML parser will not support XSLT, so the only thing where this practically goes wrong in a relevant way is for an `<?xml-stylesheet>` processing instruction with a garbage attribute list. Do we want to throw for that?
Added comment:
```
// The only way this fails in a relevant way would be a processing instruction
// inner prose that does not match the style of an attribute list for a
// xml-stylesheet processing instruction. The result of that would be the
// style is not applied. We likely do not want to throw an exception for that.
```What do you think?
The only reason that `Document::createProcessingInstruction` should throw is if:
I think the XML parser should have given an error for either of these.
  bool is_first_element = !saw_first_element_;
  saw_first_element_ = true;While it's fine given the current uses of this variable, maybe move this down to before `HandleNamespaceAttributes` like it is in the existing parser, just in case `saw_first_element_` ends up being used for something else outside this function?
  AtomicString is;Probably move this declaration down near the code that assigns it, like in the current parser?
  // empty NS url, resolve it against the initially preserved namespaceSo I think there's a problem here relative to the existing parser. The code in the existing parser distinguishes between a null string and an empty string, but this code doesn't.
That distinction is needed to maintain the difference between `<elem>` (which is a start tag for an element that is in the default namespace of its parent element) and `<elem xmlns="">` (which is the start tag for an element in no namespace that changes the default namespace to being no namespace). It seems like this code breaks that distinction in fragment parsing. (It wouldn't surprise me if there are no tests for this, but I think it is a real distinction.)
I'm not sure if it will be easier to fix by getting the null-versus-empty distinction out of the rust parser, or by setting up the namespaces inside of the rust parser for fragment parsing and dropping this fixup.
  // TODO(https://crbug.com/441911594): Test coverage for this?For what it's worth, I see a single test that I expect might exercise this code, and it's `third_party/blink/web_tests/external/wpt/custom-elements/parser/parser-uses-create-an-element-for-a-token-svg.svg`
(I didn't actually test this.)
  QualifiedName q_name(Two comments here, though I'm not sure whether either is a concrete problem:
1. In the existing parser there was some code to concatenate prefix and localname in certain cases and put it all back into the localname (but only if the adjusted_uri was empty and the prefix was not). I'm not sure why that code was there... but it's possible it's important?
2. I'm again a little worried about null-versus-empty distinctions here being different from the old parser and that distinction being important. Unlike the case above I don't have any specific examples in mind yet, though.
  // TODO(https://crbug.com/441911594): Port custom element support from
  // XMLDocumentParser.
This might also be needed for the one SVG custom elements test mentioned above. (Though the code in the existing parser was also a little higher up in the function, which might matter.)
  String cdata_string = RustStrToWtfString(data);I think it's probably stylistically better here to *not* have the `cdata_string` variable but instead just write it in the function call (like you do in the `Comment` function just below).
This is because if `CDataSection::Create` had a `String&&` overload, it would be more efficient since it would use that one. (It doesn't right now, but it probably should!) Using `std::move(cdata_string)` would have the same effect, but with more verbosity.
(I write this partly because reading this code led me to check to see whether there was a `String&&` version, whereas if I'd seen what I'm suggesting I wouldn't have felt the need to check.)
  Flush();I can't figure out what `Flush()` this is calling. Where is it?
  DCHECK(!IsDetached());The old parser didn't do anything here when `parsing_fragment_` was true -- is there a need for that?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |