Intent to Ship: Japanese Phrase Line Breaking


Koji Ishii

Sep 1, 2023, 2:00:12 AM
to blink-dev, Ian Kilpatrick

Contact emails

ko...@chromium.org

Explainer

None

Specification

https://drafts.csswg.org/css-text-4/#valdef-word-break-auto-phrase

Design docs


https://docs.google.com/document/d/1QyPza8XS4aaYD-yA1MHYx56Hy7DZuEm9cAH-A6lTu8c/edit?usp=sharing

Summary

Changes the line breaking rules for Japanese to keep natural phrases (of multiple words) together. In Japanese, such a phrase unit is called "Bunsetu". Japanese doesn't use spaces to delimit words and normally allows line breaks between any characters, but short paragraphs such as headlines read better when broken at natural phrase boundaries. In CSS, this feature adds a new value to the `word-break` property: `auto-phrase`. The implementation uses a C++ port of BudouX <https://github.com/google/budoux>, which uses an AdaBoost-based ML model to determine the natural phrase boundaries.
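
As a minimal sketch of how a page might opt in, assuming the text is marked as Japanese (for example with `lang="ja"` on an ancestor element); the `h1` selector is only an illustrative choice for headline-like text:

```css
/* Illustrative selector: short Japanese text such as a headline. */
h1:lang(ja) {
  /* Fallback for engines that do not recognize the new value. */
  word-break: normal;
  /* Prefer breaking at natural phrase boundaries where supported. */
  word-break: auto-phrase;
}
```

Engines that don't support `auto-phrase` should drop the second declaration as invalid and keep `word-break: normal`.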



Blink component

Blink>Layout>Inline

TAG review

None

TAG review status

Not applicable

Risks



Interoperability and Compatibility



Gecko: No signal

WebKit: In development (https://bugs.webkit.org/show_bug.cgi?id=258668, https://github.com/w3c/csswg-drafts/issues/7193#issuecomment-1586696215)

Web developers: Positive (https://github.com/google/budoux). The original JS/Python implementation has 970 stars and is already used by several sites <https://github.com/google/budoux>. A demo tweet <https://twitter.com/kojiishi/status/1687688315896733696> and its retweets <https://twitter.com/tushuhei/status/1693544644167266403> have 100 likes.

Other signals:

WebView application risks

Does this intent deprecate or change behavior of existing APIs, such that it has potentially high risk for Android WebView-based applications?

No.



Debuggability



Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, Chrome OS, Android, and Android WebView)?

Yes

Is this feature fully tested by web-platform-tests?

Yes

Flag name on chrome://flags



Finch feature name



Non-finch justification

None

Requires code in //chrome?

False

Tracking bug

https://bugs.chromium.org/p/chromium/issues/detail?id=1443291

Sample links


https://github.com/google/budoux
https://google.github.io/budoux
https://twitter.com/kojiishi/status/1687688315896733696

Estimated milestones

Shipping on desktop: 119
Shipping on Android: 119
Shipping on WebView: 119


Anticipated spec changes

Open questions about a feature may be a source of future web compat or interop issues. Please list open issues (e.g. links to known github issues in the project for the feature specification) whose resolution may introduce web compat/interop risk (e.g., changing to naming or structure of the API in a non-backward-compatible way).



Link to entry on the Chrome Platform Status

https://chromestatus.com/feature/5133892532568064

Links to previous Intent discussions

Intent to prototype: https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAHe_1dJBouY10zVrouYbpGnokj65Jz4Qjuh3UMcS477u2Q9uqw%40mail.gmail.com

This intent message was generated by Chrome Platform Status.

Mike Taylor

Sep 1, 2023, 10:53:45 AM
to Koji Ishii, blink-dev, Ian Kilpatrick

On 9/1/23 12:59 AM, Koji Ishii wrote:

TAG review status

Not applicable

Any reason to not request a TAG review?


Gecko: No signal

Can we request one?
--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blink-dev+...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAHe_1dKEQhh-Fa7WG_RWed8-ST74Oy_6KvwLnkKpwyau54fRAQ%40mail.gmail.com.

Mike Taylor

Sep 1, 2023, 10:56:50 AM
to Koji Ishii, blink-dev, Ian Kilpatrick

On 9/1/23 9:53 AM, Mike Taylor wrote:

On 9/1/23 12:59 AM, Koji Ishii wrote:

TAG review status

Not applicable

Any reason to not request a TAG review?

Koji Ishii

Sep 1, 2023, 11:29:19 AM
to Mike Taylor, blink-dev, Ian Kilpatrick
On Fri, Sep 1, 2023 at 11:53 PM Mike Taylor <mike...@chromium.org> wrote:
Gecko: No signal
Can we request one?

Philip Jägenstedt

Sep 6, 2023, 12:01:03 PM
to Koji Ishii, Mike Taylor, blink-dev, Ian Kilpatrick
Hi Koji,

It looks like the tests for this are here:

Since the implementation uses a heuristic and the spec doesn't define the precise rules, can you say something about the approach taken in the tests? Did you pick examples that are very easy to get right, so that we can expect all implementations to pass these tests?

There are still four tests failing; do you expect to fix those before shipping?

Best regards,
Philip

Koji Ishii

Sep 6, 2023, 1:46:59 PM
to Philip Jägenstedt, Mike Taylor, blink-dev, Ian Kilpatrick
Good point, thanks for asking.

Technically speaking, we can't write any tests because the language-specific content analysis is UA defined. Tests use common and easy words that most engines would analyze the same way, but you're right that we may need to modify tests if any engine analyzes them differently.

Tests contain that in comments, like this or this. I'm also talking to WebKit about tests, since there are currently two known engines: one in ICU and one in macOS/iOS. We'll work together to ensure words in tests are common and easy enough for both engines.

Four tests failing is strange; thanks for pointing them out too. They all pass on Chromium bots. I'll check them and make sure they're all green before shipping.

Koji Ishii

Sep 7, 2023, 5:14:08 AM
to Philip Jägenstedt, Mike Taylor, blink-dev, Ian Kilpatrick
I've got two updates on the questions:

On Thu, Sep 7, 2023 at 2:46 AM Koji Ishii <ko...@chromium.org> wrote:
Good point, thanks for asking.

Technically speaking, we can't write any tests because the language-specific content analysis is UA defined. Tests use common and easy words that most engines would analyze the same way, but you're right that we may need to modify tests if any engine analyzes them differently.

Tests contain that in comments, like this or this. I'm also talking to WebKit about tests, since there are currently two known engines: one in ICU and one in macOS/iOS. We'll work together to ensure words in tests are common and easy enough for both engines.

In case this helps, the situation is the same as with hyphenation tests. In the past, wpt tests needed updates when one implementation hyphenated differently from other browsers. I remember this happening at least once before.

Four tests failing is strange; thanks for pointing them out too. They all pass on Chromium bots. I'll check them and make sure they're all green before shipping.

I've figured them out and all fixes have landed; I'll watch for wpt.fyi to be updated. One was caused by a recent spec change I wasn't aware of; the other three were due to differences between the wpt bots and the Chromium bots (#41851).

Philip Jägenstedt

Sep 7, 2023, 10:01:39 AM
to Koji Ishii, Mike Taylor, blink-dev, Ian Kilpatrick
Thanks for investigating and fixing the failures, Koji!

On the UA-defined rules, if other vendors are happy with the examples used, then that's what matters in practice. If you do get pushback on specific examples, I hope other examples can be found that serve as common ground.

I think everything looks good here, but since the TAG review and Mozilla issue were filed recently, I'd like to give those a bit more time.

Philip Jägenstedt

Sep 13, 2023, 11:22:33 AM
to Koji Ishii, Mike Taylor, blink-dev, Ian Kilpatrick
LGTM1

If there is feedback on the TAG review or Mozilla issue while this feature is on its way to stable, can you loop back to this thread?

Daniel Bratell

Sep 13, 2023, 11:46:27 AM
to Philip Jägenstedt, Koji Ishii, Mike Taylor, blink-dev, Ian Kilpatrick

LGTM2

/Daniel

Mike Taylor

Sep 13, 2023, 7:04:49 PM
to Daniel Bratell, Philip Jägenstedt, Koji Ishii, blink-dev, Ian Kilpatrick

LGTM3

Koji Ishii

Sep 14, 2023, 1:50:22 AM
to Philip Jägenstedt, Mike Taylor, blink-dev, Ian Kilpatrick
On Thu, Sep 14, 2023 at 12:14 AM Philip Jägenstedt <foo...@chromium.org> wrote:
If there is feedback on the TAG review or Mozilla issue while this feature is on its way to stable, can you loop back to this thread?

Yes, I will. Thank you all.