Apologies, I wasn't aware the PSA guidance had changed. I've created
https://chromestatus.com/feature/5168429241204736 to track the change.
As for an intent, I believed the breakage risk to be small, so I thought a PSA was sufficient. My reasoning is that we can already perform this kind of merge (since
https://crrev.com/2fdffd306d488; granted, the case that CL merges is very unlikely to occur). Additionally, if this case does get hit, the resulting data would still be consistent; applications are likely to concatenate the CDATA sections anyway.
But I didn't have hard data so I did some digging today:
Via the XMLDocument UseCounter, 0.18% of page loads include an XMLDocument (this is the *upper bound* on potentially affected page loads).
I went looking through HTTPArchive data using the `requests` and `response_bodies` tables from 2023_08_01_desktop. I filtered requests to:
Content-Type header with "application/xhtml+xml"
request_type "Document"
I believe these are the cases where an XMLDocument is used.
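For anyone who wants to repeat the filter above on exported rows, here is a minimal Python sketch. The field names `content_type` and `request_type` are hypothetical stand-ins for however the request rows are exported; they are not the actual HTTPArchive column names.

```python
from typing import Mapping


def is_xhtml_document(request: Mapping[str, str]) -> bool:
    """Return True for a top-level document served as application/xhtml+xml.

    `content_type` and `request_type` are assumed (hypothetical) keys.
    """
    # Match as a substring so parameters such as "; charset=utf-8" are tolerated.
    content_type = request.get("content_type", "").lower()
    return (
        "application/xhtml+xml" in content_type
        and request.get("request_type") == "Document"
    )
```

The Content-Type check is case-insensitive, while the request type is compared exactly.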
I then searched the response bodies of those requests for adjacent CDATA sections, i.e. instances of the string "]]><![CDATA[", and found none. (Note that any whitespace, comment, or other text between the sections would insert a node between them and thus prevent the merge.)
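The check itself is just a literal substring search. A rough Python equivalent is below; the function name is mine, and it assumes the response body is already available as a decoded string.

```python
# One CDATA section ending immediately followed by another one starting.
ADJACENT_CDATA = "]]><![CDATA["


def count_adjacent_cdata(body: str) -> int:
    """Count places where two CDATA sections touch with nothing in between."""
    return body.count(ADJACENT_CDATA)
```

A body such as `<![CDATA[a]]><![CDATA[b]]>` counts as one hit, while `<![CDATA[a]]> <![CDATA[b]]>` (with a space in between) counts as zero, which matches the note above about intervening nodes preventing the merge.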
This is certainly not definitive, but I expect this to be encountered very rarely, and even when it is encountered, it's unlikely to cause an issue unless the page is using some kind of reflection (e.g. looking at `node.innerHTML` and then relying on there being separate CDATA sections for some reason).
Of all the cases I've seen in HTTPArchive, CDATA is used simply to ensure that <script> and <style> content is parsed correctly when loaded as XHTML.
Given all this, I think breakage is very unlikely (and we have a flag guard just in case), so I think a PSA is enough. If anyone thinks a full intent is justified, please let me know.
Thanks,
David