Add atomic string cache to speed up attribute value parsing [chromium/src : main]

0 views
Skip to first unread message

Bin Liao (Gerrit)

unread,
Mar 8, 2023, 6:45:33 PMMar 8
to blink-re...@chromium.org, blink-rev...@chromium.org, blink-...@chromium.org, kinuko...@chromium.org, loading-rev...@chromium.org, Chromium LUCI CQ, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org

View Change

1 comment:

To view, visit change 4310894. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: chromium/src
Gerrit-Branch: main
Gerrit-Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
Gerrit-Change-Number: 4310894
Gerrit-PatchSet: 4
Gerrit-Owner: Bin Liao <bin....@intel.com>
Gerrit-Reviewer: Bin Liao <bin....@intel.com>
Gerrit-Comment-Date: Wed, 08 Mar 2023 23:45:15 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Gerrit-MessageType: comment

Bin Liao (Gerrit)

unread,
Mar 8, 2023, 6:46:06 PMMar 8
to blink-re...@chromium.org, blink-rev...@chromium.org, blink-...@chromium.org, kinuko...@chromium.org, loading-rev...@chromium.org

Bin Liao uploaded patch set #5 to this change.

View Change

Add atomic string cache to speed up attribute value parsing

The idea of this patch comes from WebCore
https://github.com/WebKit/WebKit/commit/86c93355e0e01ca5f382b1b6b125eeb9c6d34ac5 and https://github.com/WebKit/WebKit/commit/153929d8a8d479703a52fc8a86a8303b0ad64995
It caches attribute value instead of constructing AtomicString and implements a simple hash algorithm, it gets 0.7% progression report by pinpoint test on Speedometer2.0

Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
---
M third_party/blink/renderer/core/dom/document.cc
M third_party/blink/renderer/core/html/build.gni
M third_party/blink/renderer/core/html/parser/atomic_html_token.h
A third_party/blink/renderer/core/html/parser/atomic_string_cache.h
M third_party/blink/renderer/core/html/parser/html_document_parser_fastpath.cc
M third_party/blink/renderer/core/html/parser/html_token.h
6 files changed, 114 insertions(+), 8 deletions(-)

To view, visit change 4310894. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: chromium/src
Gerrit-Branch: main
Gerrit-Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
Gerrit-Change-Number: 4310894
Gerrit-PatchSet: 5
Gerrit-MessageType: newpatchset

Bin Liao (Gerrit)

unread,
Mar 9, 2023, 2:19:57 AMMar 9
to blink-re...@chromium.org, blink-rev...@chromium.org, blink-...@chromium.org, kinuko...@chromium.org, loading-rev...@chromium.org, Chromium LUCI CQ, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org

View Change

1 comment:

To view, visit change 4310894. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: chromium/src
Gerrit-Branch: main
Gerrit-Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
Gerrit-Change-Number: 4310894
Gerrit-PatchSet: 5
Gerrit-Owner: Bin Liao <bin....@intel.com>
Gerrit-Reviewer: Bin Liao <bin....@intel.com>
Gerrit-Comment-Date: Thu, 09 Mar 2023 07:19:41 +0000

Bin Liao (Gerrit)

unread,
Mar 9, 2023, 2:20:19 AMMar 9
to Mason Freed, blink-re...@chromium.org, blink-rev...@chromium.org, blink-...@chromium.org, kinuko...@chromium.org, loading-rev...@chromium.org

Attention is currently required from: Mason Freed.

Bin Liao would like Mason Freed to review this change.

View Change

Add atomic string cache to speed up attribute value parsing

The idea of this patch comes from WebCore
https://github.com/WebKit/WebKit/commit/86c93355e0e01ca5f382b1b6b125eeb9c6d34ac5 and https://github.com/WebKit/WebKit/commit/153929d8a8d479703a52fc8a86a8303b0ad64995
It caches attribute value instead of constructing AtomicString and implements a simple hash algorithm, it gets 0.7% progression report by pinpoint test on Speedometer2.0

Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
---
M third_party/blink/renderer/core/dom/document.cc
M third_party/blink/renderer/core/html/build.gni
M third_party/blink/renderer/core/html/parser/atomic_html_token.h
A third_party/blink/renderer/core/html/parser/atomic_string_cache.h
M third_party/blink/renderer/core/html/parser/html_document_parser_fastpath.cc
M third_party/blink/renderer/core/html/parser/html_token.h
6 files changed, 114 insertions(+), 8 deletions(-)


To view, visit change 4310894. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: chromium/src
Gerrit-Branch: main
Gerrit-Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
Gerrit-Change-Number: 4310894
Gerrit-PatchSet: 5
Gerrit-Owner: Bin Liao <bin....@intel.com>
Gerrit-Reviewer: Bin Liao <bin....@intel.com>
Gerrit-Reviewer: Mason Freed <mas...@chromium.org>
Gerrit-Attention: Mason Freed <mas...@chromium.org>
Gerrit-MessageType: newchange

Mason Freed (Gerrit)

unread,
Mar 13, 2023, 1:31:24 PMMar 13
to Bin Liao, blink-re...@chromium.org, blink-rev...@chromium.org, blink-...@chromium.org, kinuko...@chromium.org, loading-rev...@chromium.org, Chromium LUCI CQ, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org

Attention is currently required from: Bin Liao.

View Change

1 comment:

  • Patchset:

    • Patch Set #5:

      Sorry for the delay reviewing this one. Will get to it soon, sorry!

To view, visit change 4310894. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: chromium/src
Gerrit-Branch: main
Gerrit-Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
Gerrit-Change-Number: 4310894
Gerrit-PatchSet: 5
Gerrit-Owner: Bin Liao <bin....@intel.com>
Gerrit-Reviewer: Bin Liao <bin....@intel.com>
Gerrit-Reviewer: Mason Freed <mas...@chromium.org>
Gerrit-Attention: Bin Liao <bin....@intel.com>
Gerrit-Comment-Date: Mon, 13 Mar 2023 17:31:11 +0000

Mason Freed (Gerrit)

unread,
Mar 13, 2023, 1:59:54 PMMar 13
to Scott Violet, blink-re...@chromium.org, blink-rev...@chromium.org, blink-...@chromium.org, kinuko...@chromium.org, loading-rev...@chromium.org, Bin Liao

Attention is currently required from: Bin Liao, Scott Violet.

Mason Freed would like Scott Violet to review this change authored by Bin Liao.

View Change

Add atomic string cache to speed up attribute value parsing

The idea of this patch comes from WebCore
https://github.com/WebKit/WebKit/commit/86c93355e0e01ca5f382b1b6b125eeb9c6d34ac5 and https://github.com/WebKit/WebKit/commit/153929d8a8d479703a52fc8a86a8303b0ad64995
It caches attribute value instead of constructing AtomicString and implements a simple hash algorithm, it gets 0.7% progression report by pinpoint test on Speedometer2.0

Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
---
M third_party/blink/renderer/core/dom/document.cc
M third_party/blink/renderer/core/html/build.gni
M third_party/blink/renderer/core/html/parser/atomic_html_token.h
A third_party/blink/renderer/core/html/parser/atomic_string_cache.h
M third_party/blink/renderer/core/html/parser/html_document_parser_fastpath.cc
M third_party/blink/renderer/core/html/parser/html_token.h
6 files changed, 114 insertions(+), 8 deletions(-)


To view, visit change 4310894. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: chromium/src
Gerrit-Branch: main
Gerrit-Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
Gerrit-Change-Number: 4310894
Gerrit-PatchSet: 5
Gerrit-Owner: Bin Liao <bin....@intel.com>
Gerrit-Reviewer: Bin Liao <bin....@intel.com>
Gerrit-Reviewer: Mason Freed <mas...@chromium.org>
Gerrit-Reviewer: Scott Violet <s...@chromium.org>
Gerrit-Attention: Scott Violet <s...@chromium.org>
Gerrit-MessageType: newchange

Mason Freed (Gerrit)

unread,
Mar 13, 2023, 2:00:01 PMMar 13
to Bin Liao, blink-re...@chromium.org, blink-rev...@chromium.org, blink-...@chromium.org, kinuko...@chromium.org, loading-rev...@chromium.org, Scott Violet, Chromium LUCI CQ, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org

Attention is currently required from: Bin Liao, Scott Violet.

View Change

13 comments:

  • Commit Message:

    • Patch Set #5, Line 11: It caches attribute value instead of constructing AtomicString and implements a simple hash algorithm, it gets 0.7% progression report by pinpoint test on Speedometer2.0

      Is the 0.7% progression something you measured locally? Or is that from the WebKit patch?

      I think we'd want to run some of our own testing of this number. You could try running some pinpoints on speedometer, it should be able to see a 0.7% improvement, I'd hope.

  • Patchset:

    • Patch Set #5:

      Thanks for the patch! I added some comments. Also +sky@ - it'd be nice to get a second look.

  • File third_party/blink/renderer/core/html/parser/atomic_html_token.h:

    • Patch Set #5, Line 352:

          // The string pointer in |value| is null for attributes with no values, but
      // the null atom is used to represent absence of attributes; attributes with
      // no values have the value set to an empty atom instead.

      I think this comment now belongs in AtomicStringCache instead of here.

  • File third_party/blink/renderer/core/html/parser/atomic_string_cache.h:

    • Patch Set #5, Line 9: HTMLAtomicStringCache

      This needs a comment explaining what it does, what it's used for, and most importantly, who owns this.

    • Patch Set #5, Line 29:

      kMaxStringLengthForCache = 36;
      static constexpr auto kCapacity = 512;

      Can you comment (and add a comment above them) on how you chose these two values? 36 seems like a very specific number.

    • Patch Set #5, Line 31: kCapacity

      Are we worried about memory? This will keep 512 strings alive for the life of the document, in the worst case. So I guess 18kB per document. Assuming ~10 frames, that's 180kB order of magnitude. Maybe ok, just asking to make sure.

    • Patch Set #5, Line 34: MakeAtomString

      Maybe `MakeCachedAtomicString()` instead? Or at least s/Atom/Atomic.

    • Patch Set #5, Line 50: AtomicString result(string.data(), static_cast<unsigned>(length));

      So this just keeps the latest attribute only, in the case of a cache collision, right? That'll be a perf regression in the case that two attributes that collide are used heavily on a page. Probably this is a very corner case.

    • Patch Set #5, Line 52: return result;

      You could eliminate this return, since it'll get returned below.

    • Patch Set #5, Line 34:

       ALWAYS_INLINE static AtomicString MakeAtomString(
      base::span<const CharacterType> string) {
      if (string.empty()) {
      return g_empty_atom;
      }

      auto length = string.size();
      if (length > kMaxStringLengthForCache) {
      return AtomicString(string.data(), static_cast<unsigned>(length));
      }

      auto first_character = string[0];
      auto last_character = string[length - 1];
      auto& slot = AtomicStringCacheSlot(first_character, last_character, length);

      if (!Equal(slot.Impl(), string.data(), static_cast<unsigned>(length))) {
      AtomicString result(string.data(), static_cast<unsigned>(length));
      slot = result;
      return result;
      }

      return slot;
      }


      It sucks to have two essentially-copy-paste versions of this. Any cool way (via more template magic) to combine all three versions into one?

    • Patch Set #5, Line 92:

       unsigned hash =
      (first_character << 6) ^ ((last_character << 14) ^ first_character);

      Can you comment (here and as a comment) how you chose this particular hash function?

  • File third_party/blink/renderer/core/html/parser/html_document_parser_fastpath.cc:

    • Patch Set #5, Line 890:

          // The string pointer in |value| is null for attributes with no values, but
      // the null atom is used to represent absence of attributes; attributes with
      // no values have the value set to an empty atom instead.

      I think this comment, plus the code at 899 below, can be removed. that's now handled by MakeAttributeValue.

    • Patch Set #5, Line 899:

       if (value.IsNull()) {
      value = g_empty_atom;
      }

      You could convert this to

      `DCHECK(!value.IsNull());`

To view, visit change 4310894. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: chromium/src
Gerrit-Branch: main
Gerrit-Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
Gerrit-Change-Number: 4310894
Gerrit-PatchSet: 5
Gerrit-Owner: Bin Liao <bin....@intel.com>
Gerrit-Reviewer: Bin Liao <bin....@intel.com>
Gerrit-Reviewer: Mason Freed <mas...@chromium.org>
Gerrit-Reviewer: Scott Violet <s...@chromium.org>
Gerrit-Attention: Bin Liao <bin....@intel.com>
Gerrit-Attention: Scott Violet <s...@chromium.org>
Gerrit-Comment-Date: Mon, 13 Mar 2023 17:59:51 +0000

Bin Liao (Gerrit)

unread,
Mar 14, 2023, 4:27:25 AMMar 14
to blink-re...@chromium.org, blink-rev...@chromium.org, blink-...@chromium.org, kinuko...@chromium.org, loading-rev...@chromium.org, Scott Violet, Mason Freed, Chromium LUCI CQ, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org

Attention is currently required from: Mason Freed, Scott Violet.

View Change

13 comments:

  • Commit Message:

    • Patch Set #5, Line 11: It caches attribute value instead of constructing AtomicString and implements a simple hash algorithm, it gets 0.7% progression report by pinpoint test on Speedometer2.0

    • Is the 0.7% progression something you measured locally? Or is that from the WebKit patch? […]

      This 0.7% improvement is reported by google pinpoint test of this patch.
      https://pinpoint-dot-chromeperf.appspot.com/job/15fa0fc0060000
      I also test it locally on intel platform, it also gets 0.6% ~ 0.8% progression on different devices.

  • Patchset:

  • File third_party/blink/renderer/core/html/parser/atomic_html_token.h:

    • Patch Set #5, Line 352:

          // The string pointer in |value| is null for attributes with no values, but
      // the null atom is used to represent absence of attributes; attributes with
      // no values have the value set to an empty atom instead.

      I think this comment now belongs in AtomicStringCache instead of here.

    • Done

  • File third_party/blink/renderer/core/html/parser/atomic_string_cache.h:

    • This needs a comment explaining what it does, what it's used for, and most importantly, who owns thi […]

      Done

    • Can you comment (and add a comment above them) on how you chose these two values? 36 seems like a ve […]

      This value I used the same as WebCore, maybe this is a tuned by WebCore to get a good performance

    • Are we worried about memory? This will keep 512 strings alive for the life of the document, in the w […]

      Yeah, this compacts the memory, do we have any memory benchmark? Maybe a memory benchmark is needed here

    • Done

    • So this just keeps the latest attribute only, in the case of a cache collision, right? That'll be a […]

      Actually this keeps latest kCapacity(512) attributes here

    • Done

    • Patch Set #5, Line 34:

       ALWAYS_INLINE static AtomicString MakeAtomString(
      base::span<const CharacterType> string) {
      if (string.empty()) {
      return g_empty_atom;
      }

      auto length = string.size();
      if (length > kMaxStringLengthForCache) {
      return AtomicString(string.data(), static_cast<unsigned>(length));
      }

      auto first_character = string[0];
      auto last_character = string[length - 1];
      auto& slot = AtomicStringCacheSlot(first_character, last_character, length);

      if (!Equal(slot.Impl(), string.data(), static_cast<unsigned>(length))) {
      AtomicString result(string.data(), static_cast<unsigned>(length));
      slot = result;
      return result;
      }

      return slot;
      }


    • It sucks to have two essentially-copy-paste versions of this. […]

      Yes, this copy-paste code is really bad, I have tried, but it's really hard to user template to simplify these two functions, however, I will optimize the LChar and UChar vector to user literal buffer in fast path parser after this CL landed, so this function can be simplified.

    • Patch Set #5, Line 92:

       unsigned hash =
      (first_character << 6) ^ ((last_character << 14) ^ first_character);

      Can you comment (here and as a comment) how you chose this particular hash function?

    • This file is totally porting from WebCore, it claims that the default string hashing algorithm only barely outperforms
      this simple hash function on Speedometer (i.e., a cache hit rate of 99.24% using the default hash algorithm vs.
      99.15% using the "first/last character and length" hash)

  • File third_party/blink/renderer/core/html/parser/html_document_parser_fastpath.cc:

    • Patch Set #5, Line 890:

          // The string pointer in |value| is null for attributes with no values, but
      // the null atom is used to represent absence of attributes; attributes with
      // no values have the value set to an empty atom instead.

    • I think this comment, plus the code at 899 below, can be removed. […]

      Done

    • You could convert this to […]

      Done

To view, visit change 4310894. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: chromium/src
Gerrit-Branch: main
Gerrit-Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
Gerrit-Change-Number: 4310894
Gerrit-PatchSet: 6
Gerrit-Owner: Bin Liao <bin....@intel.com>
Gerrit-Reviewer: Bin Liao <bin....@intel.com>
Gerrit-Reviewer: Mason Freed <mas...@chromium.org>
Gerrit-Reviewer: Scott Violet <s...@chromium.org>
Gerrit-Attention: Scott Violet <s...@chromium.org>
Gerrit-Attention: Mason Freed <mas...@chromium.org>
Gerrit-Comment-Date: Tue, 14 Mar 2023 08:27:14 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Comment-In-Reply-To: Mason Freed <mas...@chromium.org>
Gerrit-MessageType: comment

Scott Violet (Gerrit)

unread,
Mar 14, 2023, 12:43:59 PMMar 14
to Bin Liao, blink-re...@chromium.org, blink-rev...@chromium.org, blink-...@chromium.org, kinuko...@chromium.org, loading-rev...@chromium.org, Scott Violet, Mason Freed, Chromium LUCI CQ, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org

Attention is currently required from: Bin Liao, Mason Freed.

View Change

1 comment:

  • Patchset:

    • Patch Set #6:

      Bin, thanks for the patch. One comment on gaining confidence a patch improves things. Internal analysis has shown the m1 bots produce the most reliable data. I encourage you to use it for measuring impact of changes. I just started a pinpoint run for this.

To view, visit change 4310894. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: chromium/src
Gerrit-Branch: main
Gerrit-Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
Gerrit-Change-Number: 4310894
Gerrit-PatchSet: 6
Gerrit-Owner: Bin Liao <bin....@intel.com>
Gerrit-Reviewer: Bin Liao <bin....@intel.com>
Gerrit-Reviewer: Mason Freed <mas...@chromium.org>
Gerrit-Reviewer: Scott Violet <s...@chromium.org>
Gerrit-Attention: Bin Liao <bin....@intel.com>
Gerrit-Attention: Mason Freed <mas...@chromium.org>
Gerrit-Comment-Date: Tue, 14 Mar 2023 16:43:47 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Gerrit-MessageType: comment

Scott Violet (Gerrit)

unread,
Mar 14, 2023, 12:56:08 PMMar 14
to Bin Liao, blink-re...@chromium.org, blink-rev...@chromium.org, blink-...@chromium.org, kinuko...@chromium.org, loading-rev...@chromium.org, Scott Violet, Mason Freed, Chromium LUCI CQ, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org

Attention is currently required from: Bin Liao, Mason Freed.

View Change

3 comments:

  • File third_party/blink/renderer/core/html/parser/atomic_string_cache.h:

    • Patch Set #6, Line 13:

       While parsing html attributes, it spents a nontrivial amount of time mapping
      // from UChar/LChar to AtomicString, and it's mainly because the AtomicString
      // constructor.
      // This class offers an AtomicString cache during attribute parsing, and
      // implements a simple & fast hash algorithm to distinguishable strings from
      // others.

      To be clear, all of this is still done. If this helps, it would be because attribute values are repeated, and the hash you chose is good for the common case.

    • Patch Set #6, Line 39:

      tic constexpr auto kMaxStringLengthForCache = 36;


    • static constexpr auto kCapacity = 512

    • o first_character = string[0];
      auto last_character = string[length - 1];

    • It would be good to document why we think this is a good hash choice.

To view, visit change 4310894. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: chromium/src
Gerrit-Branch: main
Gerrit-Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
Gerrit-Change-Number: 4310894
Gerrit-PatchSet: 6
Gerrit-Owner: Bin Liao <bin....@intel.com>
Gerrit-Reviewer: Bin Liao <bin....@intel.com>
Gerrit-Reviewer: Mason Freed <mas...@chromium.org>
Gerrit-Reviewer: Scott Violet <s...@chromium.org>
Gerrit-Attention: Bin Liao <bin....@intel.com>
Gerrit-Attention: Mason Freed <mas...@chromium.org>
Gerrit-Comment-Date: Tue, 14 Mar 2023 16:55:52 +0000

chromeperf@appspot.gserviceaccount.com (Gerrit)

unread,
Mar 14, 2023, 2:19:44 PMMar 14
to Bin Liao, blink-re...@chromium.org, blink-rev...@chromium.org, blink-...@chromium.org, kinuko...@chromium.org, loading-rev...@chromium.org, Scott Violet, Mason Freed, Chromium LUCI CQ, chromium...@chromium.org

Attention is currently required from: Bin Liao, Mason Freed.

📍 Job complete.

See results at: https://pinpoint-dot-chromeperf.appspot.com/job/14ebc88e060000

View Change

    To view, visit change 4310894. To unsubscribe, or for help writing mail filters, visit settings.

    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
    Gerrit-Change-Number: 4310894
    Gerrit-PatchSet: 6
    Gerrit-Owner: Bin Liao <bin....@intel.com>
    Gerrit-Reviewer: Bin Liao <bin....@intel.com>
    Gerrit-Reviewer: Mason Freed <mas...@chromium.org>
    Gerrit-Reviewer: Scott Violet <s...@chromium.org>
    Gerrit-Attention: Bin Liao <bin....@intel.com>
    Gerrit-Attention: Mason Freed <mas...@chromium.org>
    Gerrit-Comment-Date: Tue, 14 Mar 2023 18:19:31 +0000
    Gerrit-HasComments: No
    Gerrit-Has-Labels: No
    Gerrit-MessageType: comment

    Bin Liao (Gerrit)

    unread,
    Mar 14, 2023, 7:49:37 PMMar 14
    to blink-re...@chromium.org, blink-rev...@chromium.org, blink-...@chromium.org, kinuko...@chromium.org, loading-rev...@chromium.org, Scott Violet, Mason Freed, Chromium LUCI CQ, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org

    Attention is currently required from: Mason Freed, Scott Violet.

    View Change

    4 comments:

    • Patchset:

      • Patch Set #6:

        Thanks very much for Scott's review, the m1 bot's pinpoint result also shows a 0.6% progression, it almost aligned with the winbot.

    • File third_party/blink/renderer/core/html/parser/atomic_string_cache.h:

      • Patch Set #6, Line 13:

         While parsing html attributes, it spents a nontrivial amount of time mapping
        // from UChar/LChar to AtomicString, and it's mainly because the AtomicString
        // constructor.
        // This class offers an AtomicString cache during attribute parsing, and
        // implements a simple & fast hash algorithm to distinguishable strings from
        // others.

      • To be clear, all of this is still done. […]

        Yes, you description is much more precise, I will modify this in next patchset.

      • Patch Set #6, Line 39:

        tic constexpr auto kMaxStringLengthForCache = 36;
        static constexpr auto kCapacity = 512

        I'm curious how you arrived at these numbers?

      • Yeah, this file mainly comes from the WebCore, so I just use these numbers they used.

      • Patch Set #6, Line 57:

        o first_character = string[0];
        auto last_character = string[length - 1];

        It would be good to document why we think this is a good hash choice.

      • In WebCore patch, they claims that in terms
        of the cache hit rate in this AtomString cache, the default string hashing algorithm only barely outperforms


      • this simple hash function on Speedometer (i.e., a cache hit rate of 99.24% using the default hash algorithm vs.

    To view, visit change 4310894. To unsubscribe, or for help writing mail filters, visit settings.

    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
    Gerrit-Change-Number: 4310894
    Gerrit-PatchSet: 6
    Gerrit-Owner: Bin Liao <bin....@intel.com>
    Gerrit-Reviewer: Bin Liao <bin....@intel.com>
    Gerrit-Reviewer: Mason Freed <mas...@chromium.org>
    Gerrit-Reviewer: Scott Violet <s...@chromium.org>
    Gerrit-Attention: Scott Violet <s...@chromium.org>
    Gerrit-Attention: Mason Freed <mas...@chromium.org>
    Gerrit-Comment-Date: Tue, 14 Mar 2023 23:49:26 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Scott Violet <s...@chromium.org>
    Gerrit-MessageType: comment

    Scott Violet (Gerrit)

    unread,
    Mar 15, 2023, 11:13:21 AMMar 15
    to Bin Liao, blink-re...@chromium.org, blink-rev...@chromium.org, blink-...@chromium.org, kinuko...@chromium.org, loading-rev...@chromium.org, Scott Violet, Mason Freed, Chromium LUCI CQ, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org

    Attention is currently required from: Bin Liao, Mason Freed.

    View Change

    4 comments:

    • Patchset:

      • Patch Set #6:

        I'm ok with this, will lg once changes have been made.

    • File third_party/blink/renderer/core/dom/document.cc:

      • Patch Set #6, Line 2938:

        if (GetFrame()->IsMainFrame()) {
        HTMLAtomicStringCache::Clear();
        }

        Please document why this is done. This call really just releases the strings, and doesn't really shrink the size of the cache.

    • File third_party/blink/renderer/core/html/parser/atomic_html_token.h:

    • File third_party/blink/renderer/core/html/parser/html_document_parser_fastpath.cc:

    To view, visit change 4310894. To unsubscribe, or for help writing mail filters, visit settings.

    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
    Gerrit-Change-Number: 4310894
    Gerrit-PatchSet: 6
    Gerrit-Owner: Bin Liao <bin....@intel.com>
    Gerrit-Reviewer: Bin Liao <bin....@intel.com>
    Gerrit-Reviewer: Mason Freed <mas...@chromium.org>
    Gerrit-Reviewer: Scott Violet <s...@chromium.org>
    Gerrit-Attention: Bin Liao <bin....@intel.com>
    Gerrit-Attention: Mason Freed <mas...@chromium.org>
    Gerrit-Comment-Date: Wed, 15 Mar 2023 15:13:13 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Gerrit-MessageType: comment

    Bin Liao (Gerrit)

    unread,
    Mar 15, 2023, 11:52:07 PMMar 15
    to blink-re...@chromium.org, blink-rev...@chromium.org, blink-...@chromium.org, kinuko...@chromium.org, loading-rev...@chromium.org, Scott Violet, Mason Freed, Chromium LUCI CQ, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org

    Attention is currently required from: Mason Freed, Scott Violet.

    View Change

    6 comments:

    • Patchset:

    • File third_party/blink/renderer/core/dom/document.cc:

      • Please document why this is done. […]

        Done

    • File third_party/blink/renderer/core/html/parser/atomic_html_token.h:

      • Done

    • File third_party/blink/renderer/core/html/parser/atomic_string_cache.h:

      • Patch Set #6, Line 13:

         While parsing html attributes, it spents a nontrivial amount of time mapping
        // from UChar/LChar to AtomicString, and it's mainly because the AtomicString
        // constructor.
        // This class offers an AtomicString cache during attribute parsing, and
        // implements a simple & fast hash algorithm to distinguishable strings from
        // others.

      • Yes, you description is much more precise, I will modify this in next patchset.

        Done

      • In WebCore patch, they claims that in terms […]

        Done

    • File third_party/blink/renderer/core/html/parser/html_document_parser_fastpath.cc:

      • Done

    To view, visit change 4310894. To unsubscribe, or for help writing mail filters, visit settings.

    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
    Gerrit-Change-Number: 4310894
    Gerrit-PatchSet: 7
    Gerrit-Owner: Bin Liao <bin....@intel.com>
    Gerrit-Reviewer: Bin Liao <bin....@intel.com>
    Gerrit-Reviewer: Mason Freed <mas...@chromium.org>
    Gerrit-Reviewer: Scott Violet <s...@chromium.org>
    Gerrit-Attention: Scott Violet <s...@chromium.org>
    Gerrit-Attention: Mason Freed <mas...@chromium.org>
    Gerrit-Comment-Date: Thu, 16 Mar 2023 03:52:00 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Bin Liao <bin....@intel.com>

    Scott Violet (Gerrit)

    unread,
    Mar 16, 2023, 1:08:54 PMMar 16
    to Bin Liao, blink-re...@chromium.org, blink-rev...@chromium.org, blink-...@chromium.org, kinuko...@chromium.org, loading-rev...@chromium.org, Scott Violet, Tricium, Mason Freed, Chromium LUCI CQ, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org

    Attention is currently required from: Bin Liao, Mason Freed.

    Patch set 8:Code-Review +1

    View Change

    5 comments:

    • File third_party/blink/renderer/core/dom/document.cc:

      • Done

        I tend to think the reason to do this is more of:

        "The cache of string values is very likely specific to a particular document. Empty it now to release any strings that may not be used by the next document."

        But a renderer may have multiple documents, in which case this is a bit wrong.

        I could go either way on whether this is worth it, and would likely not bother. Perhaps the right signal to use is low memory. I'm curious what Mason thinks here.

    • File third_party/blink/renderer/core/dom/document.cc:

    • File third_party/blink/renderer/core/html/parser/atomic_html_token.h:

      • Patch Set #8, Line 354:

        ttribute with no value will be set to a empty atom in
        // MakeAttributeValue.

        I think you mean "MakeAttributeValue() should never return a null value.".

    • File third_party/blink/renderer/core/html/parser/atomic_string_cache.h:

      • Patch Set #6, Line 13:

         While parsing html attributes, it spents a nontrivial amount of time mapping
        // from UChar/LChar to AtomicString, and it's mainly because the AtomicString
        // constructor.
        // This class offers an AtomicString cache during attribute parsing, and
        // implements a simple & fast hash algorithm to distinguishable strings from
        // others.

      • Done

        How about something like:

        HTMLAtomicStringCache provides a fixed size cache of strings that is used during parsing, and specifically for attribute values. The cache lookup is cheap (much cheaper than AtomicString). This benefits parsing when the same attribute values are repeated.

    • File third_party/blink/renderer/core/html/parser/html_document_parser_fastpath.cc:

      • Patch Set #8, Line 896:

        Attribute with no value will be set to a empty atom in
        // MakeAttributeValue.

        Similar suggestion to other place you copied this from.

    To view, visit change 4310894. To unsubscribe, or for help writing mail filters, visit settings.

    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
    Gerrit-Change-Number: 4310894
    Gerrit-PatchSet: 8
    Gerrit-Owner: Bin Liao <bin....@intel.com>
    Gerrit-Reviewer: Bin Liao <bin....@intel.com>
    Gerrit-Reviewer: Mason Freed <mas...@chromium.org>
    Gerrit-Reviewer: Scott Violet <s...@chromium.org>
    Gerrit-Attention: Bin Liao <bin....@intel.com>
    Gerrit-Attention: Mason Freed <mas...@chromium.org>
    Gerrit-Comment-Date: Thu, 16 Mar 2023 17:08:44 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: Yes

    Mason Freed (Gerrit)

    unread,
    Mar 16, 2023, 7:55:47 PMMar 16
    to Bin Liao, blink-re...@chromium.org, blink-rev...@chromium.org, blink-...@chromium.org, kinuko...@chromium.org, loading-rev...@chromium.org, Scott Violet, Tricium, Chromium LUCI CQ, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org

    Attention is currently required from: Bin Liao.

    View Change

    1 comment:

    • Patchset:

      • Patch Set #8:

        Sorry for the delay, I've been swamped with other things. I'll definitely get a chance to review this tomorrow.

    To view, visit change 4310894. To unsubscribe, or for help writing mail filters, visit settings.

    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
    Gerrit-Change-Number: 4310894
    Gerrit-PatchSet: 8
    Gerrit-Owner: Bin Liao <bin....@intel.com>
    Gerrit-Reviewer: Bin Liao <bin....@intel.com>
    Gerrit-Reviewer: Mason Freed <mas...@chromium.org>
    Gerrit-Reviewer: Scott Violet <s...@chromium.org>
    Gerrit-Attention: Bin Liao <bin....@intel.com>
    Gerrit-Comment-Date: Thu, 16 Mar 2023 23:55:37 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Gerrit-MessageType: comment

    Bin Liao (Gerrit)

    unread,
    Mar 16, 2023, 9:57:40 PMMar 16
    to blink-re...@chromium.org, blink-rev...@chromium.org, blink-...@chromium.org, kinuko...@chromium.org, loading-rev...@chromium.org, Scott Violet, Tricium, Mason Freed, Chromium LUCI CQ, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org

    Attention is currently required from: Mason Freed.

    View Change

    7 comments:

    • Patchset:

      • Patch Set #8:

        Sorry for the delay, I've been swamped with other things. […]

        NP, also thanks very much for your response.

    • Patchset:

      • Patch Set #9:

        Thanks for Scott and Mason, for the Clear() function in Document::Shutdown, I think we can delete it. How about your ideas?

    • File third_party/blink/renderer/core/dom/document.cc:

      • I tend to think the reason to do this is more of: […]

        Actually, calling Clear function here is added by myself, the WebCore just call the clear function upon receiving a low memory warning, and upon top-level navigation. Thanks for Scott's careful review, I also think here is not needed.
        Even document A's string values are different from document B's values, the cache can also be updated.

    • File third_party/blink/renderer/core/dom/document.cc:

      • Done

    • File third_party/blink/renderer/core/html/parser/atomic_html_token.h:

      • Patch Set #8, Line 354:

        ttribute with no value will be set to a empty atom in
        // MakeAttributeValue.

        I think you mean "MakeAttributeValue() should never return a null value.".

      • Yes, this DCHECK is added by the suggestion of Mason, now I thinks is not needed here. Since I have document inner MakeAttributeValue.

    • File third_party/blink/renderer/core/html/parser/atomic_string_cache.h:

      • Patch Set #6, Line 13:

         While parsing html attributes, it spents a nontrivial amount of time mapping
        // from UChar/LChar to AtomicString, and it's mainly because the AtomicString
        // constructor.
        // This class offers an AtomicString cache during attribute parsing, and
        // implements a simple & fast hash algorithm to distinguishable strings from
        // others.

      • How about something like: […]

        Thanks for this, this seems much more clear.

    • File third_party/blink/renderer/core/html/parser/html_document_parser_fastpath.cc:

      • Patch Set #8, Line 896:

        Attribute with no value will be set to a empty atom in
        // MakeAttributeValue.

        Similar suggestion to other place you copied this from.

      • Done

    To view, visit change 4310894. To unsubscribe, or for help writing mail filters, visit settings.

    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
    Gerrit-Change-Number: 4310894
    Gerrit-PatchSet: 9
    Gerrit-Owner: Bin Liao <bin....@intel.com>
    Gerrit-Reviewer: Bin Liao <bin....@intel.com>
    Gerrit-Reviewer: Mason Freed <mas...@chromium.org>
    Gerrit-Reviewer: Scott Violet <s...@chromium.org>
    Gerrit-Attention: Mason Freed <mas...@chromium.org>
    Gerrit-Comment-Date: Fri, 17 Mar 2023 01:57:24 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Bin Liao <bin....@intel.com>
    Comment-In-Reply-To: Scott Violet <s...@chromium.org>

    Mason Freed (Gerrit)

    unread,
    Mar 17, 2023, 2:20:03 PMMar 17
    to Bin Liao, blink-re...@chromium.org, blink-rev...@chromium.org, blink-...@chromium.org, kinuko...@chromium.org, loading-rev...@chromium.org, Scott Violet, Tricium, Chromium LUCI CQ, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org

    Attention is currently required from: Bin Liao, Scott Violet.

    View Change

    11 comments:

    • Patchset:

      • Patch Set #9:

        This looks very close. Just a few more comments and questions. Thanks for all the changes, and for waiting for my very slow reviews. 😊

    • File third_party/blink/renderer/core/dom/document.cc:

      • Actually, calling Clear function here is added by myself, the WebCore just call the clear function u […]

        I think the logic for performance is likely right - better to not clear the cache here.

        I'm a little worried about some kind of information leakage side-channel attack, where you can see that some strings are faster in the new page, due to their being cached from the last page visited. But I actually think that isn't a concern since I'm sure (?) we throw away the document when we're doing any kind of origin change, right Scott?

        If the above isn't a concern, I'd say remove this `Clear()` call completely.

    • File third_party/blink/renderer/core/html/parser/atomic_html_token.h:

      • Done

        Shouldn't this DCHECK still be here, just with a comment? I think the reason is the same as this comment from below:
        ```
        // Attribute with no value will be set to a empty atom in
        // MakeAttributeValue.
        ```
        Perhaps since that comment is there, you could just use:
        ```
        DCHECK(!value.IsNull()) << "Attribute value should never be null";
        ```
        here and below?
    • File third_party/blink/renderer/core/html/parser/atomic_string_cache.h:

      • Done

        Perfect, thanks.

      • kMaxStringLengthForCache = 36;
        static constexpr auto kCapacity = 512;

      • This value I used the same as WebCore, maybe this is a tuned by WebCore to get a good performance

      • Ok. In that case, at the very least, can you please add a comment above them that says something like that ("values taken from WebCore") and link to the corresponding code from WebCore? At least then someone coming later can go dig to see whether these are worth changing.

      • Yeah, this compacts the memory, do we have any memory benchmark? Maybe a memory benchmark is needed […]

        We have some memory benchmarks, but they're very noisy. I suspect you wouldn't see this with those benchmarks. Let's just land this and see if you get a memory performance regression bug detected.

      • Actually this keeps latest kCapacity(512) attributes here

        Sorry, what I mean is `<div v1="abcde" v2="azyxe">` will have a cache collision on those two values, because they have the same first/last letter and length. The `slot` will be the same, and the second one will evict the first one. If that happens often on a particular page, then this caching code will likely be slower than the previous, uncached code.

        Again, likely not worth doing anything about, so ok.

      •  ALWAYS_INLINE static AtomicString MakeAtomString(
        base::span<const CharacterType> string) {
        if (string.empty()) {
        return g_empty_atom;
        }

        auto length = string.size();
        if (length > kMaxStringLengthForCache) {
        return AtomicString(string.data(), static_cast<unsigned>(length));
        }

      •     auto first_character = string[0];


      • auto last_character = string[length - 1];

      •     auto& slot = AtomicStringCacheSlot(first_character, last_character, length);

        if (!Equal(slot.Impl(), string.data(), static_cast<unsigned>(length))) {
        AtomicString result(string.data(), static_cast<unsigned>(length));
        slot = result;
        return result;
        }

        return slot;
        }


      • Yes, this copy-paste code is really bad, I have tried, but it's really hard to user template to simp […]

        Ok. Thanks for trying, I figured that's what you'd say.

      • This file is totally porting from WebCore, it claims that the default string hashing algorithm only […]

        That's fine, but like above, could you please add a comment to that effect?

    • File third_party/blink/renderer/core/html/parser/html_document_parser_fastpath.cc:

      •  if (value.IsNull()) {
        value = g_empty_atom;
        }

      • Done

        I think maybe you forgot the `DCHECK`, right?

    • File third_party/blink/renderer/core/html/parser/html_document_parser_fastpath.cc:

      • Done

        Ditto above.

    To view, visit change 4310894. To unsubscribe, or for help writing mail filters, visit settings.

    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I3275fbe1135c256b011c925ff81a0f258a0d0815
    Gerrit-Change-Number: 4310894
    Gerrit-PatchSet: 9
    Gerrit-Owner: Bin Liao <bin....@intel.com>
    Gerrit-Reviewer: Bin Liao <bin....@intel.com>
    Gerrit-Reviewer: Mason Freed <mas...@chromium.org>
    Gerrit-Reviewer: Scott Violet <s...@chromium.org>
    Gerrit-Attention: Bin Liao <bin....@intel.com>
    Gerrit-Attention: Scott Violet <s...@chromium.org>
    Gerrit-Comment-Date: Fri, 17 Mar 2023 18:19:54 +0000

    Scott Violet (Gerrit)

    unread,
    Mar 17, 2023, 5:21:45 PMMar 17