chromeperf@appspot.gserviceaccount.com (Gerrit)

unread,

Jun 12, 2026, 10:14:46 AMJun 12

to Steinar H Gunderson, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org

Attention needed from Steinar H Gunderson

Message from chrom...@appspot.gserviceaccount.com

📍 Job mac-m1_mini_2020-perf/speedometer3 complete.

See results at: https://pinpoint-dot-chromeperf.appspot.com/job/1595e2f0c90000

Open in Gerrit

Related details

Attention is currently required from:

Steinar H Gunderson

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

chromeperf@appspot.gserviceaccount.com (Gerrit)

unread,

Jun 12, 2026, 12:17:53 PMJun 12

to Steinar H Gunderson, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org

Attention needed from Steinar H Gunderson

Message from chrom...@appspot.gserviceaccount.com

📍 Job mac-m1_mini_2020-perf/speedometer3 complete.

See results at: https://pinpoint-dot-chromeperf.appspot.com/job/10904294c90000

Open in Gerrit

Related details

Attention is currently required from:

Steinar H Gunderson

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

chromeperf@appspot.gserviceaccount.com (Gerrit)

unread,

Jun 12, 2026, 2:37:26 PMJun 12

to Steinar H Gunderson, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org

Attention needed from Steinar H Gunderson

Message from chrom...@appspot.gserviceaccount.com

📍 Job mac-m1_mini_2020-perf/speedometer3 complete.

See results at: https://pinpoint-dot-chromeperf.appspot.com/job/103a3748c90000

Open in Gerrit

Related details

Attention is currently required from:

Steinar H Gunderson

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Dominik Röttsches (Gerrit)

unread,

Jun 15, 2026, 6:26:37 AMJun 15

to Steinar H Gunderson, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org

Attention needed from Steinar H Gunderson

Dominik Röttsches added 3 comments

Commit Message

Line 17, Patchset 23 (Latest):

again. Each character incurs overhead from calling ICU, resizing
vectors, intersecting and so on.

Dominik Röttsches . unresolved

Resizing vectors in this case are the vectors of possible scripts per character? In a separate change, or in this one, can we just keep these vectors pre-allocated or use a fixed maximum size std::array for them?

Line 20, Patchset 23 (Latest):

Add a fast path so that once we're down at a singleton script,
we consult a bitmap (lazily built, kept per-process) of which
code points are allowed in that script. This allows us to move

Dominik Röttsches . unresolved

Can we potentially build this at compile time? I don't fully understand the definition of this map/bitmap yet. Can you define in Unicode terms what is in there? Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?

Can you define "Skippable" in Unicode terms?

File third_party/blink/renderer/platform/fonts/script_run_iterator.cc

Line 232, Patchset 23 (Latest):std::pair<const ScriptData::UnicodeBitSet*, const ScriptData::UnicodeBitSet*>

Dominik Röttsches . unresolved

I would prefer if we could move this to `character.h`/`character.cc` or into something suitable in `wtf/text` and describe what it returns in Unicode terms, then use it from here. Reading the code below, it seems computable at compile time? Can the bitset be initialized with a constant?

When moving this to character/text - as the API, can we separate items returned in the pair? Is the second return value, `inherited_not_common` actually dependent on the input script?

Open in Gerrit

Related details

Attention is currently required from:

Steinar H Gunderson

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review

No-Unresolved-Comments

Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Steinar H Gunderson (Gerrit)

unread,

Jun 15, 2026, 7:14:12 AMJun 15

to Dominik Röttsches, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org

Attention needed from Dominik Röttsches

Steinar H Gunderson added 2 comments

Commit Message

Line 17, Patchset 23 (Latest):

again. Each character incurs overhead from calling ICU, resizing
vectors, intersecting and so on.

Dominik Röttsches . unresolved

Resizing vectors in this case are the vectors of possible scripts per character? In a separate change, or in this one, can we just keep these vectors pre-allocated or use a fixed maximum size std::array for them?

Steinar H Gunderson

They are already pre-allocated to the best of our ability; we don't call malloc. But they cannot be fixed-size as long as we want to do stuff like push_front().

Line 20, Patchset 23 (Latest):

Add a fast path so that once we're down at a singleton script,
we consult a bitmap (lazily built, kept per-process) of which
code points are allowed in that script. This allows us to move

Dominik Röttsches . unresolved

Can we potentially build this at compile time? I don't fully understand the definition of this map/bitmap yet. Can you define in Unicode terms what is in there? Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?
Can you define "Skippable" in Unicode terms?

Steinar H Gunderson

We can build it at compile time, assuming you're fine with shipping 0xD800 * 250 bits = ~1.7 MiB of extra data in the binary. There are a lot of scripts.

Can you say what you are looking for wrt. “Unicode terms”? I don't know what terms you are looking for beyond scripts and script extensions. :-) It is mostly defined in terms of “does GetScripts() contain the script”, which is already poorly documented, but you could say something like “is the given script the primary script for the character, or in the character's list of script extensions”. Except that common and inherited have their own twists on top of that, which I have to emulate. The functional explanation is “if we know the previous run is in a given script, do we know for sure that this character can extend that run (and is not a bracket)”.

Open in Gerrit

Related details

Attention is currently required from:

Dominik Röttsches

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
No-Unresolved-Comments
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Dominik Röttsches (Gerrit)

unread,

Jun 15, 2026, 8:50:16 AMJun 15

to Steinar H Gunderson, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org

Attention needed from Steinar H Gunderson

Dominik Röttsches added 7 comments

Commit Message

Line 20, Patchset 23 (Latest):

Add a fast path so that once we're down at a singleton script,
we consult a bitmap (lazily built, kept per-process) of which
code points are allowed in that script. This allows us to move

Dominik Röttsches . unresolved

Can we potentially build this at compile time? I don't fully understand the definition of this map/bitmap yet. Can you define in Unicode terms what is in there? Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?
Can you define "Skippable" in Unicode terms?

Steinar H Gunderson

We can build it at compile time, assuming you're fine with shipping 0xD800 * 250 bits = ~1.7 MiB of extra data in the binary. There are a lot of scripts.
Can you say what you are looking for wrt. “Unicode terms”? I don't know what terms you are looking for beyond scripts and script extensions. :-) It is mostly defined in terms of “does GetScripts() contain the script”, which is already poorly documented, but you could say something like “is the given script the primary script for the character, or in the character's list of script extensions”. Except that common and inherited have their own twists on top of that, which I have to emulate. The functional explanation is “if we know the previous run is in a given script, do we know for sure that this character can extend that run (and is not a bracket)”.

Dominik Röttsches

We can build it at compile time, assuming you're fine with shipping 0xD800 * 250 bits = ~1.7 MiB of extra data in the binary. There are a lot of scripts.

Perhaps compile time compute for the most used scripts? Arab, Cyrl, Hani, Jpan, Kore, Latn (in alphabetic order)? But that can be a follow up.

Can you say what you are looking for wrt. “Unicode terms”?

Yes, in my feedback, I tried to draft such a description: "Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?" - And it sounds like that description is similar to what you're saying “is the given script the primary script for the character, or in the character's list of script extensions”.

My intention was to generally improve naming and clarity her and work towards a name more descriptive that "GetSkippable", by finding a more Unicode-oriented way to describe the function return values.

Line 33, Patchset 23 (Latest): Editor-TipTap [ -5.8%, -5.5%]

Dominik Röttsches . unresolved

Does this mostly affect the Latin parts of Editor-TipTap? Or does this optimization hit the CJK part of Editor-TipTap, too? Or is Editor TipTap only the French Guttenberg text and the long programming code by now? I remember it had a CJK text, or did it?

File third_party/blink/renderer/platform/fonts/script_run_iterator.h

Line 136, Patchset 23 (Latest):

  virtual std::pair<const ScriptData::UnicodeBitSet*,
                    const ScriptData::UnicodeBitSet*>

Dominik Röttsches . unresolved

The return values are only discenible by reading the comment. I suggest to make this a struct, or preferred also compute `inherited_not_common_chars_` if it is really independent of `UScriptCode script` input argument.

File third_party/blink/renderer/platform/fonts/script_run_iterator.cc

Line 232, Patchset 23 (Latest):std::pair<const ScriptData::UnicodeBitSet*, const ScriptData::UnicodeBitSet*>

Dominik Röttsches . unresolved

I would prefer if we could move this to `character.h`/`character.cc` or into something suitable in `wtf/text` and describe what it returns in Unicode terms, then use it from here. Reading the code below, it seems computable at compile time? Can the bitset be initialized with a constant?
When moving this to character/text - as the API, can we separate items returned in the pair? Is the second return value, `inherited_not_common` actually dependent on the input script?

Dominik Röttsches

Thinking about this more, as the behavior is quite tightly coupled with this implementation, I'd say moving this to `wft/text` is optional. On the other hand, if we generate parts of the bitsets at compile time and would need a generator executable (like other character properties) then we better do that in the same platform/text/ where we also have `character_property_data_generator.cc`.

Line 319, Patchset 23 (Latest):ALWAYS_INLINE static bool IsSet(

Dominik Röttsches . unresolved

Suggest to improve name, perhaps `MatchesScript`, `DoesNotIntroduceScriptBreak`, or something like that?

Line 358, Patchset 23 (Latest): // If we're down to a singleton script, anything that's allowed in that

Dominik Röttsches . unresolved

Can this be moved into a helper function that is labelled as the optimized forward scan?

Line 379, Patchset 23 (Latest):

      // SAFETY: We already tested that ahead_pos_ <= length_ above, and
      // ahead_pos_ is unsigned, so we know that ptr is within text_[0..length]
      // at all times, and within text_[0..length) when we dereference it due to
      // the check (ptr != end) in the for loop.

Dominik Röttsches . unresolved

I think this SAFETY comment should discuss that the `++ptr` increment may lead to hitting a surrogate and that `IsSet` handles that?

Open in Gerrit

Related details

Attention is currently required from:

Steinar H Gunderson

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
No-Unresolved-Comments
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Steinar H Gunderson (Gerrit)

unread,

Jun 15, 2026, 9:08:40 AMJun 15

to Dominik Röttsches, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org

Attention needed from Dominik Röttsches

Steinar H Gunderson added 6 comments

Commit Message

Line 20, Patchset 23 (Latest):

Add a fast path so that once we're down at a singleton script,
we consult a bitmap (lazily built, kept per-process) of which
code points are allowed in that script. This allows us to move

Dominik Röttsches . unresolved

Can we potentially build this at compile time? I don't fully understand the definition of this map/bitmap yet. Can you define in Unicode terms what is in there? Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?
Can you define "Skippable" in Unicode terms?

Steinar H Gunderson

We can build it at compile time, assuming you're fine with shipping 0xD800 * 250 bits = ~1.7 MiB of extra data in the binary. There are a lot of scripts.
Can you say what you are looking for wrt. “Unicode terms”? I don't know what terms you are looking for beyond scripts and script extensions. :-) It is mostly defined in terms of “does GetScripts() contain the script”, which is already poorly documented, but you could say something like “is the given script the primary script for the character, or in the character's list of script extensions”. Except that common and inherited have their own twists on top of that, which I have to emulate. The functional explanation is “if we know the previous run is in a given script, do we know for sure that this character can extend that run (and is not a bracket)”.

Dominik Röttsches

We can build it at compile time, assuming you're fine with shipping 0xD800 * 250 bits = ~1.7 MiB of extra data in the binary. There are a lot of scripts.
Perhaps compile time compute for the most used scripts? Arab, Cyrl, Hani, Jpan, Kore, Latn (in alphabetic order)? But that can be a follow up.
Can you say what you are looking for wrt. “Unicode terms”?
Yes, in my feedback, I tried to draft such a description: "Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?" - And it sounds like that description is similar to what you're saying “is the given script the primary script for the character, or in the character's list of script extensions”.
My intention was to generally improve naming and clarity her and work towards a name more descriptive that "GetSkippable", by finding a more Unicode-oriented way to describe the function return values.

Steinar H Gunderson

Perhaps compile time compute for the most used scripts? Arab, Cyrl, Hani, Jpan, Kore, Latn (in alphabetic order)? But that can be a follow up.

To be clear, is this because you expect speed improvements from it? I can test that if you wish, although I'm not optimistic it will matter a lot.

Yes, in my feedback, I tried to draft such a description: "Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?"

The problem is that it's not actually the truth; GetScripts() does a lot of fiddling with e.g. inherited characters, and we also have brackets to worry about. “skippable” is meant to be functional (what can the external caller use this for, not how is it built), since it's hard to make a short and accurate name.

I mean, we can't call it GetRunesThatAreNotSurrogatesAndMatchOurPrimaryScriptOrExtensionScript(OrOtherTypesOfHana)AndAlsoCommonOrInheritedCharactersWithoutExtensionScripts() :-)

What about something like GetSafeToExtendExistingRun(UCodeScript script)?

Line 33, Patchset 23 (Latest): Editor-TipTap [ -5.8%, -5.5%]

Dominik Röttsches . unresolved

Does this mostly affect the Latin parts of Editor-TipTap? Or does this optimization hit the CJK part of Editor-TipTap, too? Or is Editor TipTap only the French Guttenberg text and the long programming code by now? I remember it had a CJK text, or did it?

Steinar H Gunderson

I don't know what goes into Editor-TipTap. On screen, it only shows Latin text, and I can only find that in resources/editors/longtext.html, but I can't guarantee that it doesn't try to shape CJK somewhere off-screen.

I believe the optimization should work well for Chinese (which mainly consists of long strings of Han characters, AFAIK) but less so for Japanese (which frequently mixes kanji and hana), and I don't know enough about hangul in Unicode to say anything about Korean.

File third_party/blink/renderer/platform/fonts/script_run_iterator.h

Line 136, Patchset 23 (Latest):

  virtual std::pair<const ScriptData::UnicodeBitSet*,
                    const ScriptData::UnicodeBitSet*>

Dominik Röttsches . unresolved

The return values are only discenible by reading the comment. I suggest to make this a struct, or preferred also compute `inherited_not_common_chars_` if it is really independent of `UScriptCode script` input argument.

Steinar H Gunderson

It's independent, but it's made on-the-fly because we don't need it earlier.

The problem of not returning is; how would you ever do that? As long as ScriptData is a pure virtual, we need these kind of gymnastics to return anything. We could have two calls, of course, but I'm not sure if that's an improvement, since every caller ever would call both.

File third_party/blink/renderer/platform/fonts/script_run_iterator.cc

Line 319, Patchset 23 (Latest):ALWAYS_INLINE static bool IsSet(

Dominik Röttsches . unresolved

Suggest to improve name, perhaps `MatchesScript`, `DoesNotIntroduceScriptBreak`, or something like that?

Steinar H Gunderson

No, that's wrong. Note that it's used for testing inherited_not_common_chars, too.

Line 358, Patchset 23 (Latest): // If we're down to a singleton script, anything that's allowed in that

Dominik Röttsches . unresolved

Can this be moved into a helper function that is labelled as the optimized forward scan?

Steinar H Gunderson

I'm not sure if that makes sense; it's intimately tied to local variables in the function. What would the signature look like?

Line 379, Patchset 23 (Latest):

      // SAFETY: We already tested that ahead_pos_ <= length_ above, and
      // ahead_pos_ is unsigned, so we know that ptr is within text_[0..length]
      // at all times, and within text_[0..length) when we dereference it due to
      // the check (ptr != end) in the for loop.

Dominik Röttsches . unresolved

I think this SAFETY comment should discuss that the `++ptr` increment may lead to hitting a surrogate and that `IsSet` handles that?

Steinar H Gunderson

No, the point of SAFETY comments is to justify that the UNSAFE_BUFFERS() does not go out of bounds, nothing else should be there. I can add the part of hitting surrogates if you wish, but it does not belong in the SAFETY comment (it comes from the IsSet() test as you say).

Open in Gerrit

Related details

Attention is currently required from:

Dominik Röttsches

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
No-Unresolved-Comments
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Dominik Röttsches (Gerrit)

unread,

Jun 15, 2026, 9:27:01 AMJun 15

to Steinar H Gunderson, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org

Attention needed from Steinar H Gunderson

Dominik Röttsches added 3 comments

Commit Message

Line 20, Patchset 23 (Latest):

Add a fast path so that once we're down at a singleton script,
we consult a bitmap (lazily built, kept per-process) of which
code points are allowed in that script. This allows us to move

Dominik Röttsches . resolved

Can we potentially build this at compile time? I don't fully understand the definition of this map/bitmap yet. Can you define in Unicode terms what is in there? Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?
Can you define "Skippable" in Unicode terms?

Steinar H Gunderson

We can build it at compile time, assuming you're fine with shipping 0xD800 * 250 bits = ~1.7 MiB of extra data in the binary. There are a lot of scripts.
Can you say what you are looking for wrt. “Unicode terms”? I don't know what terms you are looking for beyond scripts and script extensions. :-) It is mostly defined in terms of “does GetScripts() contain the script”, which is already poorly documented, but you could say something like “is the given script the primary script for the character, or in the character's list of script extensions”. Except that common and inherited have their own twists on top of that, which I have to emulate. The functional explanation is “if we know the previous run is in a given script, do we know for sure that this character can extend that run (and is not a bracket)”.

Dominik Röttsches

We can build it at compile time, assuming you're fine with shipping 0xD800 * 250 bits = ~1.7 MiB of extra data in the binary. There are a lot of scripts.
Perhaps compile time compute for the most used scripts? Arab, Cyrl, Hani, Jpan, Kore, Latn (in alphabetic order)? But that can be a follow up.
Can you say what you are looking for wrt. “Unicode terms”?
Yes, in my feedback, I tried to draft such a description: "Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?" - And it sounds like that description is similar to what you're saying “is the given script the primary script for the character, or in the character's list of script extensions”.
My intention was to generally improve naming and clarity her and work towards a name more descriptive that "GetSkippable", by finding a more Unicode-oriented way to describe the function return values.

Steinar H Gunderson

Perhaps compile time compute for the most used scripts? Arab, Cyrl, Hani, Jpan, Kore, Latn (in alphabetic order)? But that can be a follow up.
To be clear, is this because you expect speed improvements from it? I can test that if you wish, although I'm not optimistic it will matter a lot.
Yes, in my feedback, I tried to draft such a description: "Does it describe the intersection of input script argument with the union of primary script and script extensions per codepoint in the bitset?"
The problem is that it's not actually the truth; GetScripts() does a lot of fiddling with e.g. inherited characters, and we also have brackets to worry about. “skippable” is meant to be functional (what can the external caller use this for, not how is it built), since it's hard to make a short and accurate name.
I mean, we can't call it GetRunesThatAreNotSurrogatesAndMatchOurPrimaryScriptOrExtensionScript(OrOtherTypesOfHana)AndAlsoCommonOrInheritedCharactersWithoutExtensionScripts() :-)
What about something like GetSafeToExtendExistingRun(UCodeScript script)?

Dominik Röttsches

To be clear, is this because you expect speed improvements from it? I can test that if you wish, although I'm not optimistic it will matter a lot.

I think it's less wasteful if it's reasonably small and compile time doable. We do that for a set of other per-Character properties to move tham to a more digestible form for consumption, when ICU doesn't have the most efficient interface to access them at runime.

GetSafeToExtendExistingRun(UCodeScript script)?

WFM, thanks.

File third_party/blink/renderer/platform/fonts/script_run_iterator.cc

Line 319, Patchset 23 (Latest):ALWAYS_INLINE static bool IsSet(

Dominik Röttsches . resolved

Suggest to improve name, perhaps `MatchesScript`, `DoesNotIntroduceScriptBreak`, or something like that?

Steinar H Gunderson

No, that's wrong. Note that it's used for testing inherited_not_common_chars, too.

Dominik Röttsches

Acknowledged

Line 379, Patchset 23 (Latest):

      // SAFETY: We already tested that ahead_pos_ <= length_ above, and
      // ahead_pos_ is unsigned, so we know that ptr is within text_[0..length]
      // at all times, and within text_[0..length) when we dereference it due to
      // the check (ptr != end) in the for loop.

Dominik Röttsches . unresolved

I think this SAFETY comment should discuss that the `++ptr` increment may lead to hitting a surrogate and that `IsSet` handles that?

Steinar H Gunderson

No, the point of SAFETY comments is to justify that the UNSAFE_BUFFERS() does not go out of bounds, nothing else should be there. I can add the part of hitting surrogates if you wish, but it does not belong in the SAFETY comment (it comes from the IsSet() test as you say).

Dominik Röttsches

Ok, let's add it outside of `SAFETY:`.

Open in Gerrit

Related details

Attention is currently required from:

Steinar H Gunderson

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
No-Unresolved-Comments
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Steinar H Gunderson (Gerrit)

unread,

Jun 16, 2026, 3:15:03 AMJun 16

to Dominik Röttsches, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org

Attention needed from Dominik Röttsches

Steinar H Gunderson added 2 comments

File third_party/blink/renderer/platform/fonts/script_run_iterator.cc

Line 232, Patchset 23:std::pair<const ScriptData::UnicodeBitSet*, const ScriptData::UnicodeBitSet*>

Dominik Röttsches . unresolved

I would prefer if we could move this to `character.h`/`character.cc` or into something suitable in `wtf/text` and describe what it returns in Unicode terms, then use it from here. Reading the code below, it seems computable at compile time? Can the bitset be initialized with a constant?

When moving this to character/text - as the API, can we separate items returned in the pair? Is the second return value, `inherited_not_common` actually dependent on the input script?

Dominik Röttsches

Thinking about this more, as the behavior is quite tightly coupled with this implementation, I'd say moving this to `wft/text` is optional. On the other hand, if we generate parts of the bitsets at compile time and would need a generator executable (like other character properties) then we better do that in the same platform/text/ where we also have `character_property_data_generator.cc`.

Steinar H Gunderson

I ran a test where I precompute the bit sets that we use in Speedometer:

  https://pinpoint-dot-chromeperf.appspot.com/job/16cec652c90000
  
There is no win over making it on-the-fly as we do here.

Line 379, Patchset 23: // SAFETY: We already tested that ahead_pos_ <= length_ above, and


      // ahead_pos_ is unsigned, so we know that ptr is within text_[0..length]
      // at all times, and within text_[0..length) when we dereference it due to
      // the check (ptr != end) in the for loop.

Dominik Röttsches . resolved

I think this SAFETY comment should discuss that the `++ptr` increment may lead to hitting a surrogate and that `IsSet` handles that?

Steinar H Gunderson

No, the point of SAFETY comments is to justify that the UNSAFE_BUFFERS() does not go out of bounds, nothing else should be there. I can add the part of hitting surrogates if you wish, but it does not belong in the SAFETY comment (it comes from the IsSet() test as you say).

Dominik Röttsches

Ok, let's add it outside of `SAFETY:`.

Steinar H Gunderson

Done

Open in Gerrit

Related details

Attention is currently required from:

Dominik Röttsches

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
No-Unresolved-Comments
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Dominik Röttsches (Gerrit)

unread,

Jun 16, 2026, 5:14:44 AMJun 16

to Steinar H Gunderson, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org

Attention needed from Steinar H Gunderson

Dominik Röttsches added 3 comments

File third_party/blink/renderer/platform/fonts/script_run_iterator.h

Line 136, Patchset 23:

  virtual std::pair<const ScriptData::UnicodeBitSet*,
                    const ScriptData::UnicodeBitSet*>

Dominik Röttsches . unresolved

The return values are only discenible by reading the comment. I suggest to make this a struct, or preferred also compute `inherited_not_common_chars_` if it is really independent of `UScriptCode script` input argument.

Steinar H Gunderson

It's independent, but it's made on-the-fly because we don't need it earlier.
The problem of not returning is; how would you ever do that? As long as ScriptData is a pure virtual, we need these kind of gymnastics to return anything. We could have two calls, of course, but I'm not sure if that's an improvement, since every caller ever would call both.

Dominik Röttsches

I suggest at least a custom struct return to give them names. `std::pair` is used here as "two independent things", which is not descriptive in code.

How about defining a type
`struct RunExtensionLookups {
const ScriptData::UnicodeBitSet can_remain_in_script_;
const ScriptData::UnicodeBitSet inherited_not_common_;
}` and using this as the return type.

File third_party/blink/renderer/platform/fonts/script_run_iterator.cc

Line 232, Patchset 23:std::pair<const ScriptData::UnicodeBitSet*, const ScriptData::UnicodeBitSet*>

Dominik Röttsches . resolved

I would prefer if we could move this to `character.h`/`character.cc` or into something suitable in `wtf/text` and describe what it returns in Unicode terms, then use it from here. Reading the code below, it seems computable at compile time? Can the bitset be initialized with a constant?
When moving this to character/text - as the API, can we separate items returned in the pair? Is the second return value, `inherited_not_common` actually dependent on the input script?

Dominik Röttsches

Thinking about this more, as the behavior is quite tightly coupled with this implementation, I'd say moving this to `wft/text` is optional. On the other hand, if we generate parts of the bitsets at compile time and would need a generator executable (like other character properties) then we better do that in the same platform/text/ where we also have `character_property_data_generator.cc`.

Steinar H Gunderson

I ran a test where I precompute the bit sets that we use in Speedometer:
  https://pinpoint-dot-chromeperf.appspot.com/job/16cec652c90000
  
There is no win over making it on-the-fly as we do here.

Dominik Röttsches

Acknowledged

File third_party/blink/renderer/platform/fonts/script_run_iterator_test.cc

Line 114, Patchset 24 (Latest): std::pair<const ScriptData::UnicodeBitSet*, const ScriptData::UnicodeBitSet*>

Dominik Röttsches . unresolved

Does this mean the optimization is switched off for testing? I believe we do need correctness coverage for the optimization as well. I am concerned otherwise it's too difficult to maintain correctness and performance for future developers treading here. Having two code paths increases the risk of diverging.

Open in Gerrit

Related details

Attention is currently required from:

Steinar H Gunderson

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
No-Unresolved-Comments
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Steinar H Gunderson (Gerrit)

unread,

Jun 16, 2026, 5:33:34 AMJun 16

to Dominik Röttsches, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org

Attention needed from Dominik Röttsches

Steinar H Gunderson added 2 comments

File third_party/blink/renderer/platform/fonts/script_run_iterator.h

Line 136, Patchset 23:

  virtual std::pair<const ScriptData::UnicodeBitSet*,
                    const ScriptData::UnicodeBitSet*>

Dominik Röttsches . resolved

The return values are only discenible by reading the comment. I suggest to make this a struct, or preferred also compute `inherited_not_common_chars_` if it is really independent of `UScriptCode script` input argument.

Steinar H Gunderson

It's independent, but it's made on-the-fly because we don't need it earlier.
The problem of not returning is; how would you ever do that? As long as ScriptData is a pure virtual, we need these kind of gymnastics to return anything. We could have two calls, of course, but I'm not sure if that's an improvement, since every caller ever would call both.

Dominik Röttsches

I suggest at least a custom struct return to give them names. `std::pair` is used here as "two independent things", which is not descriptive in code.
How about defining a type
`struct RunExtensionLookups {
const ScriptData::UnicodeBitSet can_remain_in_script_;
const ScriptData::UnicodeBitSet inherited_not_common_;
}` and using this as the return type.

Steinar H Gunderson

Done

File third_party/blink/renderer/platform/fonts/script_run_iterator_test.cc

Line 114, Patchset 24: std::pair<const ScriptData::UnicodeBitSet*, const ScriptData::UnicodeBitSet*>

Dominik Röttsches . unresolved

Does this mean the optimization is switched off for testing? I believe we do need correctness coverage for the optimization as well. I am concerned otherwise it's too difficult to maintain correctness and performance for future developers treading here. Having two code paths increases the risk of diverging.

Steinar H Gunderson

It's switched off for MockScriptIterator. It is used for all other tests (both unit tests and WPT tests). Most tests don't use MockScriptIterator.

Open in Gerrit

Related details

Attention is currently required from:

Dominik Röttsches

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
No-Unresolved-Comments
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Dominik Röttsches (Gerrit)

unread,

Jun 16, 2026, 5:36:14 AMJun 16

to Steinar H Gunderson, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org

Attention needed from Steinar H Gunderson

Dominik Röttsches voted and added 2 comments

Votes added by Dominik Röttsches

Code-Review

+1

2 comments

Patchset-level comments

File-level comment, Patchset 25 (Latest):

Dominik Röttsches . resolved

LGTM.

File third_party/blink/renderer/platform/fonts/script_run_iterator_test.cc

Line 114, Patchset 24: std::pair<const ScriptData::UnicodeBitSet*, const ScriptData::UnicodeBitSet*>

Dominik Röttsches . resolved

Does this mean the optimization is switched off for testing? I believe we do need correctness coverage for the optimization as well. I am concerned otherwise it's too difficult to maintain correctness and performance for future developers treading here. Having two code paths increases the risk of diverging.

Steinar H Gunderson

It's switched off for MockScriptIterator. It is used for all other tests (both unit tests and WPT tests). Most tests don't use MockScriptIterator.

Dominik Röttsches

Ok.

Open in Gerrit

Related details

Attention is currently required from:

Steinar H Gunderson

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review

No-Unresolved-Comments

Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Steinar H Gunderson (Gerrit)

unread,

Jun 16, 2026, 5:41:49 AMJun 16

to Dominik Röttsches, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org

Steinar H Gunderson voted and added 4 comments

Votes added by Steinar H Gunderson

Commit-Queue

+2

4 comments

Patchset-level comments

File-level comment, Patchset 25 (Latest):

Steinar H Gunderson . resolved

Closing remaining comments to submit.

Commit Message

Line 17, Patchset 23:again. Each character incurs overhead from calling ICU, resizing


vectors, intersecting and so on.

Dominik Röttsches . resolved

Resizing vectors in this case are the vectors of possible scripts per character? In a separate change, or in this one, can we just keep these vectors pre-allocated or use a fixed maximum size std::array for them?

Steinar H Gunderson

They are already pre-allocated to the best of our ability; we don't call malloc. But they cannot be fixed-size as long as we want to do stuff like push_front().

Steinar H Gunderson

Acknowledged

Line 33, Patchset 23: Editor-TipTap [ -5.8%, -5.5%]

Dominik Röttsches . resolved

Does this mostly affect the Latin parts of Editor-TipTap? Or does this optimization hit the CJK part of Editor-TipTap, too? Or is Editor TipTap only the French Guttenberg text and the long programming code by now? I remember it had a CJK text, or did it?

Steinar H Gunderson

I don't know what goes into Editor-TipTap. On screen, it only shows Latin text, and I can only find that in resources/editors/longtext.html, but I can't guarantee that it doesn't try to shape CJK somewhere off-screen.
I believe the optimization should work well for Chinese (which mainly consists of long strings of Han characters, AFAIK) but less so for Japanese (which frequently mixes kanji and hana), and I don't know enough about hangul in Unicode to say anything about Korean.

Steinar H Gunderson

Acknowledged

File third_party/blink/renderer/platform/fonts/script_run_iterator.cc

Line 358, Patchset 23: // If we're down to a singleton script, anything that's allowed in that

Dominik Röttsches . resolved

Can this be moved into a helper function that is labelled as the optimized forward scan?

Steinar H Gunderson

I'm not sure if that makes sense; it's intimately tied to local variables in the function. What would the signature look like?

Steinar H Gunderson

Acknowledged

Open in Gerrit

Related details

Attention set is empty

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review

Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

open

diffy

Chromium LUCI CQ (Gerrit)

unread,

Jun 16, 2026, 6:42:05 AMJun 16

to Steinar H Gunderson, Dominik Röttsches, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org

Chromium LUCI CQ submitted the change

Change information

Commit message:

Speed up ScriptRunIterator.

ScriptRunIterator works by keeping a set of potential scripts
for a given run, intersecting it with per-character sets as it goes
until it's down to EOF or zero candidates (at which point it
chooses a single candidate from the previous set, and ends
the run). However, in the vast majority of cases, it will very
quickly end up with a single candidate (e.g. “this text is
definitely in the Latin script”) and just ends up checking
that this script is valid for the next character, over and over


again. Each character incurs overhead from calling ICU, resizing
vectors, intersecting and so on.

Add a fast path so that once we're down at a singleton script,
we consult a bitmap (lazily built, kept per-process) of which
code points are allowed in that script. This allows us to move

through the text in a tight loop (only a single bitmap lookup
per character, plus some bounds checking and loop overhead)
at long as we don't see any non-compatible characters (that would
have us end the run), brackets (which require special handling)
or surrogates.

Speedometer3 (M1 Pinpoint, LTO but no PGO, significant results
at 99% CI only, lower is better except for Score):

  TodoMVC-Lit-Complex-DOM                     [ -0.5%,  -0.1%]
  Editor-TipTap                               [ -5.8%,  -5.5%]

  Score                                       [ +0.2%,  +0.5%]

Change-Id: I30bb98f717a2b8ab97f2300e533c355af10bf882

Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7917954

Commit-Queue: Steinar H Gunderson <se...@chromium.org>

Reviewed-by: Dominik Röttsches <dr...@chromium.org>

Cr-Commit-Position: refs/heads/main@{#1647446}

Files:

M third_party/blink/renderer/platform/fonts/script_run_iterator.cc
M third_party/blink/renderer/platform/fonts/script_run_iterator.h
M third_party/blink/renderer/platform/fonts/script_run_iterator_test.cc

Change size: M

Delta: 3 files changed, 218 insertions(+), 2 deletions(-)

Branch: refs/heads/main

Submit Requirements:

Code-Review: +1 by Dominik Röttsches

Open in Gerrit

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

open

diffy

satisfied_requirement

luci-bisection@appspot.gserviceaccount.com (Gerrit)

unread,

Jun 16, 2026, 11:28:19 AMJun 16

to Chromium LUCI CQ, Steinar H Gunderson, Dominik Röttsches, chrom...@appspot.gserviceaccount.com, chromium...@chromium.org, Dirk Schulze, Stephen Chenney, blink-reviews-p...@chromium.org, blink-...@chromium.org, drott+bl...@chromium.org, fmalit...@chromium.org, fserb...@chromium.org, kinuko...@chromium.org

luci-bi...@appspot.gserviceaccount.com has created a revert of this change

Open original change

Related details

Attention set is empty

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

open

diffy

Reply all

Reply to author

Forward

Speed up ScriptRunIterator. [chromium/src : main]

chromeperf@appspot.gserviceaccount.com (Gerrit)

Message from chrom...@appspot.gserviceaccount.com

Related details

chromeperf@appspot.gserviceaccount.com (Gerrit)

Message from chrom...@appspot.gserviceaccount.com

Related details

chromeperf@appspot.gserviceaccount.com (Gerrit)

Message from chrom...@appspot.gserviceaccount.com

Related details

Dominik Röttsches (Gerrit)

Dominik Röttsches added 3 comments

Related details

Steinar H Gunderson (Gerrit)

Steinar H Gunderson added 2 comments

Related details

Dominik Röttsches (Gerrit)

Dominik Röttsches added 7 comments

Related details

Steinar H Gunderson (Gerrit)

Steinar H Gunderson added 6 comments

Related details

Dominik Röttsches (Gerrit)

Dominik Röttsches added 3 comments

Related details

Steinar H Gunderson (Gerrit)

Steinar H Gunderson added 2 comments

Related details

Dominik Röttsches (Gerrit)

Dominik Röttsches added 3 comments

Related details

Steinar H Gunderson (Gerrit)

Steinar H Gunderson added 2 comments

Related details

Dominik Röttsches (Gerrit)

Dominik Röttsches voted and added 2 comments

Votes added by Dominik Röttsches

2 comments

Related details

Steinar H Gunderson (Gerrit)

Steinar H Gunderson voted and added 4 comments

Votes added by Steinar H Gunderson

4 comments

Related details

Chromium LUCI CQ (Gerrit)

Chromium LUCI CQ submitted the change

Change information

luci-bisection@appspot.gserviceaccount.com (Gerrit)

luci-bi...@appspot.gserviceaccount.com has created a revert of this change

Related details