Speed up ScriptRunIterator. [chromium/src : main]

0 views
Skip to first unread message

chromeperf@appspot.gserviceaccount.com (Gerrit)

unread,
Jun 12, 2026, 10:14:46 AMJun 12
to Steinar H Gunderson, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org
Attention needed from Steinar H Gunderson

Message from chrom...@appspot.gserviceaccount.com

📍 Job mac-m1_mini_2020-perf/speedometer3 complete.

See results at: https://pinpoint-dot-chromeperf.appspot.com/job/1595e2f0c90000

Open in Gerrit

Related details

Attention is currently required from:
  • Steinar H Gunderson
Submit Requirements:
  • requirement satisfiedCode-Coverage
  • requirement is not satisfiedCode-Owners
  • requirement is not satisfiedCode-Review
  • requirement is not satisfiedReview-Enforcement
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
Gerrit-MessageType: comment
Gerrit-Project: chromium/src
Gerrit-Branch: main
Gerrit-Change-Id: I30bb98f717a2b8ab97f2300e533c355af10bf882
Gerrit-Change-Number: 7917954
Gerrit-PatchSet: 14
Gerrit-Owner: Steinar H Gunderson <se...@chromium.org>
Gerrit-CC: Dirk Schulze <dsch...@chromium.org>
Gerrit-CC: Stephen Chenney <sche...@chromium.org>
Gerrit-Attention: Steinar H Gunderson <se...@chromium.org>
Gerrit-Comment-Date: Fri, 12 Jun 2026 14:14:30 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: No
satisfied_requirement
unsatisfied_requirement
open
diffy

chromeperf@appspot.gserviceaccount.com (Gerrit)

unread,
Jun 12, 2026, 12:17:53 PMJun 12
to Steinar H Gunderson, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org
Attention needed from Steinar H Gunderson

Message from chrom...@appspot.gserviceaccount.com

📍 Job mac-m1_mini_2020-perf/speedometer3 complete.

See results at: https://pinpoint-dot-chromeperf.appspot.com/job/10904294c90000

Open in Gerrit

Related details

Attention is currently required from:
  • Steinar H Gunderson
Submit Requirements:
  • requirement satisfiedCode-Coverage
  • requirement is not satisfiedCode-Owners
  • requirement is not satisfiedCode-Review
  • requirement is not satisfiedReview-Enforcement
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
Gerrit-MessageType: comment
Gerrit-Project: chromium/src
Gerrit-Branch: main
Gerrit-Change-Id: I30bb98f717a2b8ab97f2300e533c355af10bf882
Gerrit-Change-Number: 7917954
Gerrit-PatchSet: 15
Gerrit-Owner: Steinar H Gunderson <se...@chromium.org>
Gerrit-CC: Dirk Schulze <dsch...@chromium.org>
Gerrit-CC: Stephen Chenney <sche...@chromium.org>
Gerrit-Attention: Steinar H Gunderson <se...@chromium.org>
Gerrit-Comment-Date: Fri, 12 Jun 2026 16:17:41 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: No
satisfied_requirement
unsatisfied_requirement
open
diffy

chromeperf@appspot.gserviceaccount.com (Gerrit)

unread,
Jun 12, 2026, 2:37:26 PMJun 12
to Steinar H Gunderson, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org
Attention needed from Steinar H Gunderson

Message from chrom...@appspot.gserviceaccount.com

📍 Job mac-m1_mini_2020-perf/speedometer3 complete.

See results at: https://pinpoint-dot-chromeperf.appspot.com/job/103a3748c90000

Open in Gerrit

Related details

Attention is currently required from:
  • Steinar H Gunderson
Submit Requirements:
  • requirement satisfiedCode-Coverage
  • requirement is not satisfiedCode-Owners
  • requirement is not satisfiedCode-Review
  • requirement is not satisfiedReview-Enforcement
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
Gerrit-MessageType: comment
Gerrit-Project: chromium/src
Gerrit-Branch: main
Gerrit-Change-Id: I30bb98f717a2b8ab97f2300e533c355af10bf882
Gerrit-Change-Number: 7917954
Gerrit-PatchSet: 19
Gerrit-Owner: Steinar H Gunderson <se...@chromium.org>
Gerrit-CC: Dirk Schulze <dsch...@chromium.org>
Gerrit-CC: Stephen Chenney <sche...@chromium.org>
Gerrit-Attention: Steinar H Gunderson <se...@chromium.org>
Gerrit-Comment-Date: Fri, 12 Jun 2026 18:37:14 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: No
satisfied_requirement
unsatisfied_requirement
open
diffy

Dominik Röttsches (Gerrit)

unread,
Jun 15, 2026, 6:26:37 AMJun 15
to Steinar H Gunderson, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org
Attention needed from Steinar H Gunderson

Dominik Röttsches added 3 comments

Commit Message
Line 17, Patchset 23 (Latest):again. Each character incurs overhead from calling ICU, resizing
vectors, intersecting and so on.
Dominik Röttsches . unresolved

Resizing vectors in this case are the vectors of possible scripts per character? In a separate change, or in this one, can we just keep these vectors pre-allocated or use a fixed maximum size std::array for them?

Line 20, Patchset 23 (Latest):Add a fast path so that once we're down at a singleton script,
we consult a bitmap (lazily built, kept per-process) of which
code points are allowed in that script. This allows us to move
Dominik Röttsches . unresolved

Can we potentially build this at compile time? I don't fully understand the definition of this map/bitmap yet. Can you define in Unicode terms what is in there? Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?

Can you define "Skippable" in Unicode terms?

File third_party/blink/renderer/platform/fonts/script_run_iterator.cc
Line 232, Patchset 23 (Latest):std::pair<const ScriptData::UnicodeBitSet*, const ScriptData::UnicodeBitSet*>
Dominik Röttsches . unresolved

I would prefer if we could move this to `character.h`/`character.cc` or into something suitable in `wtf/text` and describe what it returns in Unicode terms, then use it from here. Reading the code below, it seems computable at compile time? Can the bitset be initialized with a constant?

When moving this to character/text - as the API, can we separate items returned in the pair? Is the second return value, `inherited_not_common` actually dependent on the input script?

Open in Gerrit

Related details

Attention is currently required from:
  • Steinar H Gunderson
Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Owners
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    • requirement is not satisfiedReview-Enforcement
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I30bb98f717a2b8ab97f2300e533c355af10bf882
    Gerrit-Change-Number: 7917954
    Gerrit-PatchSet: 23
    Gerrit-Owner: Steinar H Gunderson <se...@chromium.org>
    Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
    Gerrit-CC: Dirk Schulze <dsch...@chromium.org>
    Gerrit-CC: Stephen Chenney <sche...@chromium.org>
    Gerrit-Attention: Steinar H Gunderson <se...@chromium.org>
    Gerrit-Comment-Date: Mon, 15 Jun 2026 10:26:18 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Steinar H Gunderson (Gerrit)

    unread,
    Jun 15, 2026, 7:14:12 AMJun 15
    to Dominik Röttsches, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org
    Attention needed from Dominik Röttsches

    Steinar H Gunderson added 2 comments

    Commit Message
    Line 17, Patchset 23 (Latest):again. Each character incurs overhead from calling ICU, resizing
    vectors, intersecting and so on.
    Dominik Röttsches . unresolved

    Resizing vectors in this case are the vectors of possible scripts per character? In a separate change, or in this one, can we just keep these vectors pre-allocated or use a fixed maximum size std::array for them?

    Steinar H Gunderson

    They are already pre-allocated to the best of our ability; we don't call malloc. But they cannot be fixed-size as long as we want to do stuff like push_front().

    Line 20, Patchset 23 (Latest):Add a fast path so that once we're down at a singleton script,
    we consult a bitmap (lazily built, kept per-process) of which
    code points are allowed in that script. This allows us to move
    Dominik Röttsches . unresolved

    Can we potentially build this at compile time? I don't fully understand the definition of this map/bitmap yet. Can you define in Unicode terms what is in there? Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?

    Can you define "Skippable" in Unicode terms?

    Steinar H Gunderson

    We can build it at compile time, assuming you're fine with shipping 0xD800 * 250 bits = ~1.7 MiB of extra data in the binary. There are a lot of scripts.

    Can you say what you are looking for wrt. “Unicode terms”? I don't know what terms you are looking for beyond scripts and script extensions. :-) It is mostly defined in terms of “does GetScripts() contain the script”, which is already poorly documented, but you could say something like “is the given script the primary script for the character, or in the character's list of script extensions”. Except that common and inherited have their own twists on top of that, which I have to emulate. The functional explanation is “if we know the previous run is in a given script, do we know for sure that this character can extend that run (and is not a bracket)”.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Dominik Röttsches
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Owners
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    • requirement is not satisfiedReview-Enforcement
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I30bb98f717a2b8ab97f2300e533c355af10bf882
    Gerrit-Change-Number: 7917954
    Gerrit-PatchSet: 23
    Gerrit-Owner: Steinar H Gunderson <se...@chromium.org>
    Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
    Gerrit-CC: Dirk Schulze <dsch...@chromium.org>
    Gerrit-CC: Stephen Chenney <sche...@chromium.org>
    Gerrit-Attention: Dominik Röttsches <dr...@chromium.org>
    Gerrit-Comment-Date: Mon, 15 Jun 2026 11:13:38 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Dominik Röttsches <dr...@chromium.org>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Dominik Röttsches (Gerrit)

    unread,
    Jun 15, 2026, 8:50:16 AMJun 15
    to Steinar H Gunderson, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org
    Attention needed from Steinar H Gunderson

    Dominik Röttsches added 7 comments

    Commit Message
    Line 20, Patchset 23 (Latest):Add a fast path so that once we're down at a singleton script,
    we consult a bitmap (lazily built, kept per-process) of which
    code points are allowed in that script. This allows us to move
    Dominik Röttsches . unresolved

    Can we potentially build this at compile time? I don't fully understand the definition of this map/bitmap yet. Can you define in Unicode terms what is in there? Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?

    Can you define "Skippable" in Unicode terms?

    Steinar H Gunderson

    We can build it at compile time, assuming you're fine with shipping 0xD800 * 250 bits = ~1.7 MiB of extra data in the binary. There are a lot of scripts.

    Can you say what you are looking for wrt. “Unicode terms”? I don't know what terms you are looking for beyond scripts and script extensions. :-) It is mostly defined in terms of “does GetScripts() contain the script”, which is already poorly documented, but you could say something like “is the given script the primary script for the character, or in the character's list of script extensions”. Except that common and inherited have their own twists on top of that, which I have to emulate. The functional explanation is “if we know the previous run is in a given script, do we know for sure that this character can extend that run (and is not a bracket)”.

    Dominik Röttsches

    We can build it at compile time, assuming you're fine with shipping 0xD800 * 250 bits = ~1.7 MiB of extra data in the binary. There are a lot of scripts.

    Perhaps compile time compute for the most used scripts? Arab, Cyrl, Hani, Jpan, Kore, Latn (in alphabetic order)? But that can be a follow up.

    Can you say what you are looking for wrt. “Unicode terms”?

    Yes, in my feedback, I tried to draft such a description: "Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?" - And it sounds like that description is similar to what you're saying “is the given script the primary script for the character, or in the character's list of script extensions”.

    My intention was to generally improve naming and clarity her and work towards a name more descriptive that "GetSkippable", by finding a more Unicode-oriented way to describe the function return values.

    Line 33, Patchset 23 (Latest): Editor-TipTap [ -5.8%, -5.5%]
    Dominik Röttsches . unresolved

    Does this mostly affect the Latin parts of Editor-TipTap? Or does this optimization hit the CJK part of Editor-TipTap, too? Or is Editor TipTap only the French Guttenberg text and the long programming code by now? I remember it had a CJK text, or did it?

    File third_party/blink/renderer/platform/fonts/script_run_iterator.h
    Line 136, Patchset 23 (Latest): virtual std::pair<const ScriptData::UnicodeBitSet*,
    const ScriptData::UnicodeBitSet*>
    Dominik Röttsches . unresolved

    The return values are only discenible by reading the comment. I suggest to make this a struct, or preferred also compute `inherited_not_common_chars_` if it is really independent of `UScriptCode script` input argument.

    File third_party/blink/renderer/platform/fonts/script_run_iterator.cc
    Line 232, Patchset 23 (Latest):std::pair<const ScriptData::UnicodeBitSet*, const ScriptData::UnicodeBitSet*>
    Dominik Röttsches . unresolved

    I would prefer if we could move this to `character.h`/`character.cc` or into something suitable in `wtf/text` and describe what it returns in Unicode terms, then use it from here. Reading the code below, it seems computable at compile time? Can the bitset be initialized with a constant?

    When moving this to character/text - as the API, can we separate items returned in the pair? Is the second return value, `inherited_not_common` actually dependent on the input script?

    Dominik Röttsches

    Thinking about this more, as the behavior is quite tightly coupled with this implementation, I'd say moving this to `wft/text` is optional. On the other hand, if we generate parts of the bitsets at compile time and would need a generator executable (like other character properties) then we better do that in the same platform/text/ where we also have `character_property_data_generator.cc`.

    Line 319, Patchset 23 (Latest):ALWAYS_INLINE static bool IsSet(
    Dominik Röttsches . unresolved

    Suggest to improve name, perhaps `MatchesScript`, `DoesNotIntroduceScriptBreak`, or something like that?

    Line 358, Patchset 23 (Latest): // If we're down to a singleton script, anything that's allowed in that
    Dominik Röttsches . unresolved

    Can this be moved into a helper function that is labelled as the optimized forward scan?

    Line 379, Patchset 23 (Latest): // SAFETY: We already tested that ahead_pos_ <= length_ above, and
    // ahead_pos_ is unsigned, so we know that ptr is within text_[0..length]
    // at all times, and within text_[0..length) when we dereference it due to
    // the check (ptr != end) in the for loop.
    Dominik Röttsches . unresolved

    I think this SAFETY comment should discuss that the `++ptr` increment may lead to hitting a surrogate and that `IsSet` handles that?

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Steinar H Gunderson
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Owners
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    • requirement is not satisfiedReview-Enforcement
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I30bb98f717a2b8ab97f2300e533c355af10bf882
    Gerrit-Change-Number: 7917954
    Gerrit-PatchSet: 23
    Gerrit-Owner: Steinar H Gunderson <se...@chromium.org>
    Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
    Gerrit-CC: Dirk Schulze <dsch...@chromium.org>
    Gerrit-CC: Stephen Chenney <sche...@chromium.org>
    Gerrit-Attention: Steinar H Gunderson <se...@chromium.org>
    Gerrit-Comment-Date: Mon, 15 Jun 2026 12:49:55 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Dominik Röttsches <dr...@chromium.org>
    Comment-In-Reply-To: Steinar H Gunderson <se...@chromium.org>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Steinar H Gunderson (Gerrit)

    unread,
    Jun 15, 2026, 9:08:40 AMJun 15
    to Dominik Röttsches, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org
    Attention needed from Dominik Röttsches

    Steinar H Gunderson added 6 comments

    Commit Message
    Line 20, Patchset 23 (Latest):Add a fast path so that once we're down at a singleton script,
    we consult a bitmap (lazily built, kept per-process) of which
    code points are allowed in that script. This allows us to move
    Dominik Röttsches . unresolved

    Can we potentially build this at compile time? I don't fully understand the definition of this map/bitmap yet. Can you define in Unicode terms what is in there? Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?

    Can you define "Skippable" in Unicode terms?

    Steinar H Gunderson

    We can build it at compile time, assuming you're fine with shipping 0xD800 * 250 bits = ~1.7 MiB of extra data in the binary. There are a lot of scripts.

    Can you say what you are looking for wrt. “Unicode terms”? I don't know what terms you are looking for beyond scripts and script extensions. :-) It is mostly defined in terms of “does GetScripts() contain the script”, which is already poorly documented, but you could say something like “is the given script the primary script for the character, or in the character's list of script extensions”. Except that common and inherited have their own twists on top of that, which I have to emulate. The functional explanation is “if we know the previous run is in a given script, do we know for sure that this character can extend that run (and is not a bracket)”.

    Dominik Röttsches

    We can build it at compile time, assuming you're fine with shipping 0xD800 * 250 bits = ~1.7 MiB of extra data in the binary. There are a lot of scripts.

    Perhaps compile time compute for the most used scripts? Arab, Cyrl, Hani, Jpan, Kore, Latn (in alphabetic order)? But that can be a follow up.

    Can you say what you are looking for wrt. “Unicode terms”?

    Yes, in my feedback, I tried to draft such a description: "Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?" - And it sounds like that description is similar to what you're saying “is the given script the primary script for the character, or in the character's list of script extensions”.

    My intention was to generally improve naming and clarity her and work towards a name more descriptive that "GetSkippable", by finding a more Unicode-oriented way to describe the function return values.

    Steinar H Gunderson

    Perhaps compile time compute for the most used scripts? Arab, Cyrl, Hani, Jpan, Kore, Latn (in alphabetic order)? But that can be a follow up.

    To be clear, is this because you expect speed improvements from it? I can test that if you wish, although I'm not optimistic it will matter a lot.

    Yes, in my feedback, I tried to draft such a description: "Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?"

    The problem is that it's not actually the truth; GetScripts() does a lot of fiddling with e.g. inherited characters, and we also have brackets to worry about. “skippable” is meant to be functional (what can the external caller use this for, not how is it built), since it's hard to make a short and accurate name.

    I mean, we can't call it GetRunesThatAreNotSurrogatesAndMatchOurPrimaryScriptOrExtensionScript(OrOtherTypesOfHana)AndAlsoCommonOrInheritedCharactersWithoutExtensionScripts() :-)

    What about something like GetSafeToExtendExistingRun(UCodeScript script)?

    Line 33, Patchset 23 (Latest): Editor-TipTap [ -5.8%, -5.5%]
    Dominik Röttsches . unresolved

    Does this mostly affect the Latin parts of Editor-TipTap? Or does this optimization hit the CJK part of Editor-TipTap, too? Or is Editor TipTap only the French Guttenberg text and the long programming code by now? I remember it had a CJK text, or did it?

    Steinar H Gunderson

    I don't know what goes into Editor-TipTap. On screen, it only shows Latin text, and I can only find that in resources/editors/longtext.html, but I can't guarantee that it doesn't try to shape CJK somewhere off-screen.

    I believe the optimization should work well for Chinese (which mainly consists of long strings of Han characters, AFAIK) but less so for Japanese (which frequently mixes kanji and hana), and I don't know enough about hangul in Unicode to say anything about Korean.

    File third_party/blink/renderer/platform/fonts/script_run_iterator.h
    Line 136, Patchset 23 (Latest): virtual std::pair<const ScriptData::UnicodeBitSet*,
    const ScriptData::UnicodeBitSet*>
    Dominik Röttsches . unresolved

    The return values are only discenible by reading the comment. I suggest to make this a struct, or preferred also compute `inherited_not_common_chars_` if it is really independent of `UScriptCode script` input argument.

    Steinar H Gunderson

    It's independent, but it's made on-the-fly because we don't need it earlier.

    The problem of not returning is; how would you ever do that? As long as ScriptData is a pure virtual, we need these kind of gymnastics to return anything. We could have two calls, of course, but I'm not sure if that's an improvement, since every caller ever would call both.

    File third_party/blink/renderer/platform/fonts/script_run_iterator.cc
    Line 319, Patchset 23 (Latest):ALWAYS_INLINE static bool IsSet(
    Dominik Röttsches . unresolved

    Suggest to improve name, perhaps `MatchesScript`, `DoesNotIntroduceScriptBreak`, or something like that?

    Steinar H Gunderson

    No, that's wrong. Note that it's used for testing inherited_not_common_chars, too.

    Line 358, Patchset 23 (Latest): // If we're down to a singleton script, anything that's allowed in that
    Dominik Röttsches . unresolved

    Can this be moved into a helper function that is labelled as the optimized forward scan?

    Steinar H Gunderson

    I'm not sure if that makes sense; it's intimately tied to local variables in the function. What would the signature look like?

    Line 379, Patchset 23 (Latest): // SAFETY: We already tested that ahead_pos_ <= length_ above, and
    // ahead_pos_ is unsigned, so we know that ptr is within text_[0..length]
    // at all times, and within text_[0..length) when we dereference it due to
    // the check (ptr != end) in the for loop.
    Dominik Röttsches . unresolved

    I think this SAFETY comment should discuss that the `++ptr` increment may lead to hitting a surrogate and that `IsSet` handles that?

    Steinar H Gunderson

    No, the point of SAFETY comments is to justify that the UNSAFE_BUFFERS() does not go out of bounds, nothing else should be there. I can add the part of hitting surrogates if you wish, but it does not belong in the SAFETY comment (it comes from the IsSet() test as you say).

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Dominik Röttsches
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Owners
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    • requirement is not satisfiedReview-Enforcement
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I30bb98f717a2b8ab97f2300e533c355af10bf882
    Gerrit-Change-Number: 7917954
    Gerrit-PatchSet: 23
    Gerrit-Owner: Steinar H Gunderson <se...@chromium.org>
    Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
    Gerrit-CC: Dirk Schulze <dsch...@chromium.org>
    Gerrit-CC: Stephen Chenney <sche...@chromium.org>
    Gerrit-Attention: Dominik Röttsches <dr...@chromium.org>
    Gerrit-Comment-Date: Mon, 15 Jun 2026 13:08:02 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Dominik Röttsches (Gerrit)

    unread,
    Jun 15, 2026, 9:27:01 AMJun 15
    to Steinar H Gunderson, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org
    Attention needed from Steinar H Gunderson

    Dominik Röttsches added 3 comments

    Commit Message
    Line 20, Patchset 23 (Latest):Add a fast path so that once we're down at a singleton script,
    we consult a bitmap (lazily built, kept per-process) of which
    code points are allowed in that script. This allows us to move
    Dominik Röttsches . resolved

    Can we potentially build this at compile time? I don't fully understand the definition of this map/bitmap yet. Can you define in Unicode terms what is in there? Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?

    Can you define "Skippable" in Unicode terms?

    Steinar H Gunderson

    We can build it at compile time, assuming you're fine with shipping 0xD800 * 250 bits = ~1.7 MiB of extra data in the binary. There are a lot of scripts.

    Can you say what you are looking for wrt. “Unicode terms”? I don't know what terms you are looking for beyond scripts and script extensions. :-) It is mostly defined in terms of “does GetScripts() contain the script”, which is already poorly documented, but you could say something like “is the given script the primary script for the character, or in the character's list of script extensions”. Except that common and inherited have their own twists on top of that, which I have to emulate. The functional explanation is “if we know the previous run is in a given script, do we know for sure that this character can extend that run (and is not a bracket)”.

    Dominik Röttsches

    We can build it at compile time, assuming you're fine with shipping 0xD800 * 250 bits = ~1.7 MiB of extra data in the binary. There are a lot of scripts.

    Perhaps compile time compute for the most used scripts? Arab, Cyrl, Hani, Jpan, Kore, Latn (in alphabetic order)? But that can be a follow up.

    Can you say what you are looking for wrt. “Unicode terms”?

    Yes, in my feedback, I tried to draft such a description: "Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?" - And it sounds like that description is similar to what you're saying “is the given script the primary script for the character, or in the character's list of script extensions”.

    My intention was to generally improve naming and clarity her and work towards a name more descriptive that "GetSkippable", by finding a more Unicode-oriented way to describe the function return values.

    Steinar H Gunderson

    Perhaps compile time compute for the most used scripts? Arab, Cyrl, Hani, Jpan, Kore, Latn (in alphabetic order)? But that can be a follow up.

    To be clear, is this because you expect speed improvements from it? I can test that if you wish, although I'm not optimistic it will matter a lot.

    Yes, in my feedback, I tried to draft such a description: "Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?"

    The problem is that it's not actually the truth; GetScripts() does a lot of fiddling with e.g. inherited characters, and we also have brackets to worry about. “skippable” is meant to be functional (what can the external caller use this for, not how is it built), since it's hard to make a short and accurate name.

    I mean, we can't call it GetRunesThatAreNotSurrogatesAndMatchOurPrimaryScriptOrExtensionScript(OrOtherTypesOfHana)AndAlsoCommonOrInheritedCharactersWithoutExtensionScripts() :-)

    What about something like GetSafeToExtendExistingRun(UCodeScript script)?

    Dominik Röttsches

    To be clear, is this because you expect speed improvements from it? I can test that if you wish, although I'm not optimistic it will matter a lot.

    I think it's less wasteful if it's reasonably small and compile time doable. We do that for a set of other per-Character properties to move tham to a more digestible form for consumption, when ICU doesn't have the most efficient interface to access them at runime.

    GetSafeToExtendExistingRun(UCodeScript script)?

    WFM, thanks.

    File third_party/blink/renderer/platform/fonts/script_run_iterator.cc
    Line 319, Patchset 23 (Latest):ALWAYS_INLINE static bool IsSet(
    Dominik Röttsches . resolved

    Suggest to improve name, perhaps `MatchesScript`, `DoesNotIntroduceScriptBreak`, or something like that?

    Steinar H Gunderson

    No, that's wrong. Note that it's used for testing inherited_not_common_chars, too.

    Dominik Röttsches

    Acknowledged

    Line 379, Patchset 23 (Latest): // SAFETY: We already tested that ahead_pos_ <= length_ above, and
    // ahead_pos_ is unsigned, so we know that ptr is within text_[0..length]
    // at all times, and within text_[0..length) when we dereference it due to
    // the check (ptr != end) in the for loop.
    Dominik Röttsches . unresolved

    I think this SAFETY comment should discuss that the `++ptr` increment may lead to hitting a surrogate and that `IsSet` handles that?

    Steinar H Gunderson

    No, the point of SAFETY comments is to justify that the UNSAFE_BUFFERS() does not go out of bounds, nothing else should be there. I can add the part of hitting surrogates if you wish, but it does not belong in the SAFETY comment (it comes from the IsSet() test as you say).

    Dominik Röttsches

    Ok, let's add it outside of `SAFETY:`.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Steinar H Gunderson
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Owners
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    • requirement is not satisfiedReview-Enforcement
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I30bb98f717a2b8ab97f2300e533c355af10bf882
    Gerrit-Change-Number: 7917954
    Gerrit-PatchSet: 23
    Gerrit-Owner: Steinar H Gunderson <se...@chromium.org>
    Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
    Gerrit-CC: Dirk Schulze <dsch...@chromium.org>
    Gerrit-CC: Stephen Chenney <sche...@chromium.org>
    Gerrit-Attention: Steinar H Gunderson <se...@chromium.org>
    Gerrit-Comment-Date: Mon, 15 Jun 2026 13:26:43 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Steinar H Gunderson (Gerrit)

    unread,
    Jun 16, 2026, 3:15:03 AMJun 16
    to Dominik Röttsches, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org
    Attention needed from Dominik Röttsches

    Steinar H Gunderson added 2 comments

    File third_party/blink/renderer/platform/fonts/script_run_iterator.cc
    Line 232, Patchset 23:std::pair<const ScriptData::UnicodeBitSet*, const ScriptData::UnicodeBitSet*>
    Dominik Röttsches . unresolved

    I would prefer if we could move this to `character.h`/`character.cc` or into something suitable in `wtf/text` and describe what it returns in Unicode terms, then use it from here. Reading the code below, it seems computable at compile time? Can the bitset be initialized with a constant?

    When moving this to character/text - as the API, can we separate items returned in the pair? Is the second return value, `inherited_not_common` actually dependent on the input script?

    Dominik Röttsches

    Thinking about this more, as the behavior is quite tightly coupled with this implementation, I'd say moving this to `wft/text` is optional. On the other hand, if we generate parts of the bitsets at compile time and would need a generator executable (like other character properties) then we better do that in the same platform/text/ where we also have `character_property_data_generator.cc`.

    Steinar H Gunderson

    I ran a test where I precompute the bit sets that we use in Speedometer:

      https://pinpoint-dot-chromeperf.appspot.com/job/16cec652c90000

    There is no win over making it on-the-fly as we do here.
    Line 379, Patchset 23: // SAFETY: We already tested that ahead_pos_ <= length_ above, and

    // ahead_pos_ is unsigned, so we know that ptr is within text_[0..length]
    // at all times, and within text_[0..length) when we dereference it due to
    // the check (ptr != end) in the for loop.
    Dominik Röttsches . resolved

    I think this SAFETY comment should discuss that the `++ptr` increment may lead to hitting a surrogate and that `IsSet` handles that?

    Steinar H Gunderson

    No, the point of SAFETY comments is to justify that the UNSAFE_BUFFERS() does not go out of bounds, nothing else should be there. I can add the part of hitting surrogates if you wish, but it does not belong in the SAFETY comment (it comes from the IsSet() test as you say).

    Dominik Röttsches

    Ok, let's add it outside of `SAFETY:`.

    Steinar H Gunderson

    Done

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Dominik Röttsches
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Owners
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    • requirement is not satisfiedReview-Enforcement
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I30bb98f717a2b8ab97f2300e533c355af10bf882
    Gerrit-Change-Number: 7917954
    Gerrit-PatchSet: 24
    Gerrit-Owner: Steinar H Gunderson <se...@chromium.org>
    Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
    Gerrit-CC: Dirk Schulze <dsch...@chromium.org>
    Gerrit-CC: Stephen Chenney <sche...@chromium.org>
    Gerrit-Attention: Dominik Röttsches <dr...@chromium.org>
    Gerrit-Comment-Date: Tue, 16 Jun 2026 07:14:41 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Dominik Röttsches (Gerrit)

    unread,
    Jun 16, 2026, 5:14:44 AMJun 16
    to Steinar H Gunderson, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org
    Attention needed from Steinar H Gunderson

    Dominik Röttsches added 3 comments

    File third_party/blink/renderer/platform/fonts/script_run_iterator.h
    Line 136, Patchset 23: virtual std::pair<const ScriptData::UnicodeBitSet*,
    const ScriptData::UnicodeBitSet*>
    Dominik Röttsches . unresolved

    The return values are only discenible by reading the comment. I suggest to make this a struct, or preferred also compute `inherited_not_common_chars_` if it is really independent of `UScriptCode script` input argument.

    Steinar H Gunderson

    It's independent, but it's made on-the-fly because we don't need it earlier.

    The problem of not returning is; how would you ever do that? As long as ScriptData is a pure virtual, we need these kind of gymnastics to return anything. We could have two calls, of course, but I'm not sure if that's an improvement, since every caller ever would call both.

    Dominik Röttsches

    I suggest at least a custom struct return to give them names. `std::pair` is used here as "two independent things", which is not descriptive in code.

    How about defining a type
    `struct RunExtensionLookups {
    const ScriptData::UnicodeBitSet can_remain_in_script_;
    const ScriptData::UnicodeBitSet inherited_not_common_;
    }` and using this as the return type.

    File third_party/blink/renderer/platform/fonts/script_run_iterator.cc
    Line 232, Patchset 23:std::pair<const ScriptData::UnicodeBitSet*, const ScriptData::UnicodeBitSet*>
    Dominik Röttsches . resolved

    I would prefer if we could move this to `character.h`/`character.cc` or into something suitable in `wtf/text` and describe what it returns in Unicode terms, then use it from here. Reading the code below, it seems computable at compile time? Can the bitset be initialized with a constant?

    When moving this to character/text - as the API, can we separate items returned in the pair? Is the second return value, `inherited_not_common` actually dependent on the input script?

    Dominik Röttsches

    Thinking about this more, as the behavior is quite tightly coupled with this implementation, I'd say moving this to `wft/text` is optional. On the other hand, if we generate parts of the bitsets at compile time and would need a generator executable (like other character properties) then we better do that in the same platform/text/ where we also have `character_property_data_generator.cc`.

    Steinar H Gunderson

    I ran a test where I precompute the bit sets that we use in Speedometer:

      https://pinpoint-dot-chromeperf.appspot.com/job/16cec652c90000

    There is no win over making it on-the-fly as we do here.
    Dominik Röttsches

    Acknowledged

    File third_party/blink/renderer/platform/fonts/script_run_iterator_test.cc
    Line 114, Patchset 24 (Latest): std::pair<const ScriptData::UnicodeBitSet*, const ScriptData::UnicodeBitSet*>
    Dominik Röttsches . unresolved

    Does this mean the optimization is switched off for testing? I believe we do need correctness coverage for the optimization as well. I am concerned otherwise it's too difficult to maintain correctness and performance for future developers treading here. Having two code paths increases the risk of diverging.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Steinar H Gunderson
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Owners
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    • requirement is not satisfiedReview-Enforcement
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I30bb98f717a2b8ab97f2300e533c355af10bf882
    Gerrit-Change-Number: 7917954
    Gerrit-PatchSet: 24
    Gerrit-Owner: Steinar H Gunderson <se...@chromium.org>
    Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
    Gerrit-CC: Dirk Schulze <dsch...@chromium.org>
    Gerrit-CC: Stephen Chenney <sche...@chromium.org>
    Gerrit-Attention: Steinar H Gunderson <se...@chromium.org>
    Gerrit-Comment-Date: Tue, 16 Jun 2026 09:14:28 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Steinar H Gunderson (Gerrit)

    unread,
    Jun 16, 2026, 5:33:34 AMJun 16
    to Dominik Röttsches, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org
    Attention needed from Dominik Röttsches

    Steinar H Gunderson added 2 comments

    File third_party/blink/renderer/platform/fonts/script_run_iterator.h
    Line 136, Patchset 23: virtual std::pair<const ScriptData::UnicodeBitSet*,
    const ScriptData::UnicodeBitSet*>
    Dominik Röttsches . resolved

    The return values are only discenible by reading the comment. I suggest to make this a struct, or preferred also compute `inherited_not_common_chars_` if it is really independent of `UScriptCode script` input argument.

    Steinar H Gunderson

    It's independent, but it's made on-the-fly because we don't need it earlier.

    The problem of not returning is; how would you ever do that? As long as ScriptData is a pure virtual, we need these kind of gymnastics to return anything. We could have two calls, of course, but I'm not sure if that's an improvement, since every caller ever would call both.

    Dominik Röttsches

    I suggest at least a custom struct return to give them names. `std::pair` is used here as "two independent things", which is not descriptive in code.

    How about defining a type
    `struct RunExtensionLookups {
    const ScriptData::UnicodeBitSet can_remain_in_script_;
    const ScriptData::UnicodeBitSet inherited_not_common_;
    }` and using this as the return type.

    Steinar H Gunderson

    Done

    File third_party/blink/renderer/platform/fonts/script_run_iterator_test.cc
    Line 114, Patchset 24: std::pair<const ScriptData::UnicodeBitSet*, const ScriptData::UnicodeBitSet*>
    Dominik Röttsches . unresolved

    Does this mean the optimization is switched off for testing? I believe we do need correctness coverage for the optimization as well. I am concerned otherwise it's too difficult to maintain correctness and performance for future developers treading here. Having two code paths increases the risk of diverging.

    Steinar H Gunderson

    It's switched off for MockScriptIterator. It is used for all other tests (both unit tests and WPT tests). Most tests don't use MockScriptIterator.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Dominik Röttsches
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Owners
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    • requirement is not satisfiedReview-Enforcement
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I30bb98f717a2b8ab97f2300e533c355af10bf882
    Gerrit-Change-Number: 7917954
    Gerrit-PatchSet: 25
    Gerrit-Owner: Steinar H Gunderson <se...@chromium.org>
    Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
    Gerrit-CC: Dirk Schulze <dsch...@chromium.org>
    Gerrit-CC: Stephen Chenney <sche...@chromium.org>
    Gerrit-Attention: Dominik Röttsches <dr...@chromium.org>
    Gerrit-Comment-Date: Tue, 16 Jun 2026 09:33:16 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Dominik Röttsches (Gerrit)

    unread,
    Jun 16, 2026, 5:36:14 AMJun 16
    to Steinar H Gunderson, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org
    Attention needed from Steinar H Gunderson

    Dominik Röttsches voted and added 2 comments

    Votes added by Dominik Röttsches

    Code-Review+1

    2 comments

    Patchset-level comments
    File-level comment, Patchset 25 (Latest):
    Dominik Röttsches . resolved

    LGTM.

    File third_party/blink/renderer/platform/fonts/script_run_iterator_test.cc
    Line 114, Patchset 24: std::pair<const ScriptData::UnicodeBitSet*, const ScriptData::UnicodeBitSet*>
    Dominik Röttsches . resolved

    Does this mean the optimization is switched off for testing? I believe we do need correctness coverage for the optimization as well. I am concerned otherwise it's too difficult to maintain correctness and performance for future developers treading here. Having two code paths increases the risk of diverging.

    Steinar H Gunderson

    It's switched off for MockScriptIterator. It is used for all other tests (both unit tests and WPT tests). Most tests don't use MockScriptIterator.

    Dominik Röttsches

    Ok.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Steinar H Gunderson
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement satisfiedCode-Owners
    • requirement satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    • requirement satisfiedReview-Enforcement
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I30bb98f717a2b8ab97f2300e533c355af10bf882
    Gerrit-Change-Number: 7917954
    Gerrit-PatchSet: 25
    Gerrit-Owner: Steinar H Gunderson <se...@chromium.org>
    Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
    Gerrit-CC: Dirk Schulze <dsch...@chromium.org>
    Gerrit-CC: Stephen Chenney <sche...@chromium.org>
    Gerrit-Attention: Steinar H Gunderson <se...@chromium.org>
    Gerrit-Comment-Date: Tue, 16 Jun 2026 09:35:57 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: Yes
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Steinar H Gunderson (Gerrit)

    unread,
    Jun 16, 2026, 5:41:49 AMJun 16
    to Dominik Röttsches, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org

    Steinar H Gunderson voted and added 4 comments

    Votes added by Steinar H Gunderson

    Commit-Queue+2

    4 comments

    Patchset-level comments
    Steinar H Gunderson . resolved

    Closing remaining comments to submit.

    Commit Message
    Line 17, Patchset 23:again. Each character incurs overhead from calling ICU, resizing

    vectors, intersecting and so on.
    Dominik Röttsches . resolved

    Resizing vectors in this case are the vectors of possible scripts per character? In a separate change, or in this one, can we just keep these vectors pre-allocated or use a fixed maximum size std::array for them?

    Steinar H Gunderson

    They are already pre-allocated to the best of our ability; we don't call malloc. But they cannot be fixed-size as long as we want to do stuff like push_front().

    Steinar H Gunderson

    Acknowledged

    Line 33, Patchset 23: Editor-TipTap [ -5.8%, -5.5%]
    Dominik Röttsches . resolved

    Does this mostly affect the Latin parts of Editor-TipTap? Or does this optimization hit the CJK part of Editor-TipTap, too? Or is Editor TipTap only the French Guttenberg text and the long programming code by now? I remember it had a CJK text, or did it?

    Steinar H Gunderson

    I don't know what goes into Editor-TipTap. On screen, it only shows Latin text, and I can only find that in resources/editors/longtext.html, but I can't guarantee that it doesn't try to shape CJK somewhere off-screen.

    I believe the optimization should work well for Chinese (which mainly consists of long strings of Han characters, AFAIK) but less so for Japanese (which frequently mixes kanji and hana), and I don't know enough about hangul in Unicode to say anything about Korean.

    Steinar H Gunderson

    Acknowledged

    File third_party/blink/renderer/platform/fonts/script_run_iterator.cc
    Line 358, Patchset 23: // If we're down to a singleton script, anything that's allowed in that
    Dominik Röttsches . resolved

    Can this be moved into a helper function that is labelled as the optimized forward scan?

    Steinar H Gunderson

    I'm not sure if that makes sense; it's intimately tied to local variables in the function. What would the signature look like?

    Steinar H Gunderson

    Acknowledged

    Open in Gerrit

    Related details

    Attention set is empty
    Submit Requirements:
      • requirement satisfiedCode-Coverage
      • requirement satisfiedCode-Owners
      • requirement satisfiedCode-Review
      • requirement satisfiedReview-Enforcement
      Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
      Gerrit-MessageType: comment
      Gerrit-Project: chromium/src
      Gerrit-Branch: main
      Gerrit-Change-Id: I30bb98f717a2b8ab97f2300e533c355af10bf882
      Gerrit-Change-Number: 7917954
      Gerrit-PatchSet: 25
      Gerrit-Owner: Steinar H Gunderson <se...@chromium.org>
      Gerrit-Reviewer: Dominik Röttsches <dr...@chromium.org>
      Gerrit-Reviewer: Steinar H Gunderson <se...@chromium.org>
      Gerrit-Comment-Date: Tue, 16 Jun 2026 09:41:28 +0000
      satisfied_requirement
      open
      diffy

      Chromium LUCI CQ (Gerrit)

      unread,
      Jun 16, 2026, 6:42:05 AMJun 16
      to Steinar H Gunderson, Dominik Röttsches, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org

      Chromium LUCI CQ submitted the change

      Change information

      Commit message:
      Speed up ScriptRunIterator.

      ScriptRunIterator works by keeping a set of potential scripts
      for a given run, intersecting it with per-character sets as it goes
      until it's down to EOF or zero candidates (at which point it
      chooses a single candidate from the previous set, and ends
      the run). However, in the vast majority of cases, it will very
      quickly end up with a single candidate (e.g. “this text is
      definitely in the Latin script”) and just ends up checking
      that this script is valid for the next character, over and over

      again. Each character incurs overhead from calling ICU, resizing
      vectors, intersecting and so on.

      Add a fast path so that once we're down at a singleton script,
      we consult a bitmap (lazily built, kept per-process) of which
      code points are allowed in that script. This allows us to move
      through the text in a tight loop (only a single bitmap lookup
      per character, plus some bounds checking and loop overhead)
      at long as we don't see any non-compatible characters (that would
      have us end the run), brackets (which require special handling)
      or surrogates.

      Speedometer3 (M1 Pinpoint, LTO but no PGO, significant results
      at 99% CI only, lower is better except for Score):

      TodoMVC-Lit-Complex-DOM [ -0.5%, -0.1%]
      Editor-TipTap [ -5.8%, -5.5%]

      Score [ +0.2%, +0.5%]
      Change-Id: I30bb98f717a2b8ab97f2300e533c355af10bf882
      Commit-Queue: Steinar H Gunderson <se...@chromium.org>
      Reviewed-by: Dominik Röttsches <dr...@chromium.org>
      Cr-Commit-Position: refs/heads/main@{#1647446}
      Files:
      • M third_party/blink/renderer/platform/fonts/script_run_iterator.cc
      • M third_party/blink/renderer/platform/fonts/script_run_iterator.h
      • M third_party/blink/renderer/platform/fonts/script_run_iterator_test.cc
      Change size: M
      Delta: 3 files changed, 218 insertions(+), 2 deletions(-)
      Branch: refs/heads/main
      Submit Requirements:
      • requirement satisfiedCode-Review: +1 by Dominik Röttsches
      Open in Gerrit
      Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
      Gerrit-MessageType: merged
      Gerrit-Project: chromium/src
      Gerrit-Branch: main
      Gerrit-Change-Id: I30bb98f717a2b8ab97f2300e533c355af10bf882
      Gerrit-Change-Number: 7917954
      Gerrit-PatchSet: 26
      Gerrit-Owner: Steinar H Gunderson <se...@chromium.org>
      Gerrit-Reviewer: Chromium LUCI CQ <chromiu...@luci-project-accounts.iam.gserviceaccount.com>
      open
      diffy
      satisfied_requirement

      luci-bisection@appspot.gserviceaccount.com (Gerrit)

      unread,
      Jun 16, 2026, 11:28:19 AMJun 16
      to Chromium LUCI CQ, Steinar H Gunderson, Dominik Röttsches, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org

      Related details

      Attention set is empty
      Submit Requirements:
      • requirement satisfiedCode-Coverage
      • requirement satisfiedCode-Owners
      • requirement satisfiedCode-Review
      • requirement satisfiedReview-Enforcement
      Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
      Gerrit-MessageType: revert
      satisfied_requirement
      open
      diffy
      Reply all
      Reply to author
      Forward
      0 new messages