Intent to Prototype: Japanese Phrase Line Breaking


Koji Ishii

Jul 3, 2023, 1:52:55 AM
to blink-dev, Myles C. Maxfield, Florian Rivoal, fantasai

Contact emails

ko...@chromium.org

Explainer

None

Specification

https://drafts.csswg.org/css-text-4/#word-boundaries

Design docs


https://docs.google.com/document/d/1QyPza8XS4aaYD-yA1MHYx56Hy7DZuEm9cAH-A6lTu8c/edit?usp=sharing

Summary

Changes the line breaking rules for Japanese to keep natural phrases (of multiple words) together, using an AdaBoost-based machine-learning model to determine the natural phrase boundaries. In Japanese, this boundary is called "bunsetsu" (文節). In CSS, this feature adds a new value to the `word-break` property [1]:

```
word-break: auto;
```

[1] https://github.com/w3c/csswg-drafts/issues/7193#issuecomment-1611772475
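A minimal Python sketch of the idea, under loudly stated assumptions: this is not Chromium's or BudouX's code, and the feature names (`prev:`, `next:`, `pair:`), the toy weights, and the helpers `split_phrases` and `wrap_phrases` are all hypothetical. It shows the general AdaBoost pattern, where each feature observed around a candidate boundary acts as a weak classifier with a learned weight and a boundary is placed when the weighted vote is positive. It then shows how phrase boundaries, rather than every character, can serve as line break opportunities. It assumes every character is one unit wide, and the handling of a phrase longer than the line is an illustrative fallback only, since such edge cases are not yet defined by the spec.

```python
from typing import Dict, List

# Hypothetical toy weights, NOT taken from BudouX or any shipped model.
# Each key is a binary feature around a candidate break point; in AdaBoost,
# each feature is a weak classifier whose weight votes for (+) or against (-)
# a phrase boundary.
TOY_WEIGHTS: Dict[str, int] = {
    "prev:は": 500,    # topic particle tends to end a phrase
    "prev:を": 500,    # object particle tends to end a phrase
    "prev:が": 500,    # subject particle tends to end a phrase
    "next:。": -1000,  # a full stop should not start a phrase
}


def features(text: str, i: int) -> List[str]:
    """Features for the candidate boundary between text[i-1] and text[i]."""
    return [
        "prev:" + text[i - 1],
        "next:" + text[i],
        "pair:" + text[i - 1 : i + 1],
    ]


def split_phrases(text: str, weights: Dict[str, int] = TOY_WEIGHTS,
                  bias: int = 0) -> List[str]:
    """Split text into phrases wherever the weighted vote is positive."""
    if not text:
        return []
    phrases = [text[0]]
    for i in range(1, len(text)):
        score = bias + sum(weights.get(f, 0) for f in features(text, i))
        if score > 0:
            phrases.append(text[i])  # boundary: start a new phrase
        else:
            phrases[-1] += text[i]   # no boundary: extend the current phrase
    return phrases


def wrap_phrases(phrases: List[str], width: int) -> List[str]:
    """Greedy line filling that breaks only between phrases."""
    if width < 1:
        raise ValueError("width must be at least 1")
    lines: List[str] = []
    current = ""
    for phrase in phrases:
        if len(current) + len(phrase) <= width:
            current += phrase
            continue
        if current:
            lines.append(current)
        # Hypothetical fallback: a phrase wider than the line is broken
        # between characters.
        while len(phrase) > width:
            lines.append(phrase[:width])
            phrase = phrase[width:]
        current = phrase
    if current:
        lines.append(current)
    return lines


print(wrap_phrases(split_phrases("今日は天気です。"), 6))  # ['今日は', '天気です。']
```

With a width of 6, the phrases 今日は and 天気です。 stay intact on separate lines, whereas allowing a break between any two characters would produce 今日は天気で / す。, splitting a phrase across lines.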



Blink component

Blink>Layout>Inline

Motivation

None



Initial public proposal

None

TAG review

None

TAG review status

Not applicable

Risks



Interoperability and Compatibility



Gecko: No signal

WebKit: Positive (https://github.com/w3c/csswg-drafts/issues/7193#issuecomment-1586696215)

Web developers: Positive (https://github.com/google/budoux). The original JS/Python implementation has 940 stars and is used by several sites, and a search for "BudouX" returns about 8k results as of June 2023. It has also been ported to Rust, Go, C++, and other languages, and the C++ port is integrated into ICU (https://unicode-org.atlassian.net/browse/ICU-22100).

Other signals:

WebView application risks

Does this intent deprecate or change behavior of existing APIs, such that it has potentially high risk for Android WebView-based applications?



Debuggability



Is this feature fully tested by web-platform-tests?

No. The exact boundaries are UA-dependent and therefore not testable, but CSS parsing and some obvious cases are.

Flag name on chrome://flags



Finch feature name

None

Non-finch justification

None

Requires code in //chrome?

False

Tracking bug

https://bugs.chromium.org/p/chromium/issues/detail?id=1443291

Estimated milestones

No milestones specified



Anticipated spec changes

Open questions about a feature may be a source of future web compat or interop issues. Please list open issues (e.g. links to known github issues in the project for the feature specification) whose resolution may introduce web compat/interop risk (e.g., changes to the naming or structure of the API in a non-backward-compatible way).

https://github.com/w3c/csswg-drafts/issues/7193
https://github.com/w3c/csswg-drafts/pull/8974

Link to entry on the Chrome Platform Status

https://chromestatus.com/feature/5133892532568064

Links to previous Intent discussions



This intent message was generated by Chrome Platform Status.

fantasai

Jul 5, 2023, 8:20:03 PM
to blin...@chromium.org
Prototyping seems fine, but be aware there's a very strong chance the syntax
for this will change, and that various edge cases and fallback behaviors will
be more strongly defined.

~fantasai


Koji Ishii

Jul 6, 2023, 3:41:40 AM
to fantasai, blin...@chromium.org, Myles C. Maxfield, Florian Rivoal
Thank you for the feedback.

Myles (WebKit) and I have started implementing this, but yes, we are both aware that the resolution:
> RESOLVED: remove auto () from word-boundary-detection, add keyword to word-break for this functionality

doesn't specify the keyword name, and that the spec update is expected within a week:
> florian: so even though this is on my back burner, I will be able to within the week

Thank you for your support. We're happy to try to match the spec when it's updated.

There are some implementation challenges as well, but if things go well, I hope to ship this in Q3 or Q4 this year.
