Dear ICU team & users,
Since ICU 2.8, the implementation of line breaking in ICU has reported the status UBRK_LINE_HARD (100) at hard line breaks and UBRK_LINE_SOFT (0) at other break opportunities.
The definition of hard line breaks used by ICU matches rules LB4 and LB5 of the Unicode line breaking algorithm.
However, there has been a discrepancy with rule LB3: ICU only reports the end of text as a hard line break if there is a line terminator at the end of text.
The ICU-TC resolved on 2026-05-14 to accept ICU-23401 to fix this issue: starting in ICU 79, line breaking will always have status UBRK_LINE_HARD at the end of text.
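For readers who have not used this feature, here is a minimal sketch of how a caller might interpret a line break status. ICU's ubrk.h defines the statuses as ranges: soft values lie in [UBRK_LINE_SOFT, UBRK_LINE_SOFT_LIMIT) and hard values in [UBRK_LINE_HARD, UBRK_LINE_HARD_LIMIT). So that the sketch compiles without ICU headers, the four constants are copied locally under hypothetical names, and isHardLineBreak and isSoftLineBreak are illustrative helpers, not ICU functions.

```cpp
#include <cstdint>

// Local copies of the ULineBreakTag values from ICU's unicode/ubrk.h,
// duplicated here only so that this sketch compiles without ICU headers.
constexpr std::int32_t kLineSoft = 0;         // UBRK_LINE_SOFT
constexpr std::int32_t kLineSoftLimit = 100;  // UBRK_LINE_SOFT_LIMIT
constexpr std::int32_t kLineHard = 100;       // UBRK_LINE_HARD
constexpr std::int32_t kLineHardLimit = 200;  // UBRK_LINE_HARD_LIMIT

// Hypothetical helper (not part of ICU): true if the status lies in the
// hard line break range [UBRK_LINE_HARD, UBRK_LINE_HARD_LIMIT).
constexpr bool isHardLineBreak(std::int32_t status) {
    return status >= kLineHard && status < kLineHardLimit;
}

// Hypothetical helper (not part of ICU): true if the status lies in the
// soft line break range [UBRK_LINE_SOFT, UBRK_LINE_SOFT_LIMIT).
constexpr bool isSoftLineBreak(std::int32_t status) {
    return status >= kLineSoft && status < kLineSoftLimit;
}

// Compile-time checks of the range boundaries.
static_assert(isHardLineBreak(kLineHard), "100 is a hard line break status");
static_assert(isSoftLineBreak(kLineSoft), "0 is a soft line break status");
static_assert(!isHardLineBreak(kLineSoft), "0 is not a hard line break status");
```

With ICU available, the argument would be the value returned by getRuleStatus() on a line break iterator positioned at a boundary (ubrk_getRuleStatus() in C). Under the change described above, from ICU 79 on that value at the end of text would always fall in the hard range.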
This puts ICU in conformance with the standard and paves the way for UTC to publish the state tables ICU uses as part of the data for UAX #14, making it easier for everyone to implement the line breaking algorithm. It also makes the state table a little simpler (the end of text behaves just like a hard line break, so some states merge).
We expect this to have no impact on users, as the status of line breaks is a little-known and little-used feature, as evidenced by a recently-discovered long-standing bug in that feature (ICU-23404: Thai text with hard line breaks would report a hard line break at every word).
Best regards,
Robin Leroy