Re: silly ParagraphLayout question

Fredrik Roubert

Apr 29, 2025, 8:19:56 AM
to Martin Fietkiewicz, Steven Loomis, icu-support
On Mon, Apr 28, 2025 at 2:58 PM <martin.fi...@gmail.com> wrote:

> Hello Fredrik, sorry to bother you, but I've gone through all the
> ParagraphLayout.cpp code over and over again for work, and I can't
> figure out why this function would need or want to have Thai hardcoded
> inside. Or perhaps I don't quite understand the specific Thai case here.

I don't really know anything about the layout module and I have no
clue about why there's a BreakIterator specifically for Thai created
there, but I can see from the revision log that this has been there
from the very first version of this code, which was added for this
ticket:

https://unicode-org.atlassian.net/browse/ICU-2243

Unfortunately there are no comments there whatsoever about why it was
done in this way, but at least it says that the code reviewer was
"srl" which is Steven Loomis who's still around (CC'ed) and who might
possibly have some insight about this.

--
Fredrik Roubert
rou...@google.com

Steven R. Loomis

Apr 29, 2025, 10:08:37 AM
to martin.fi...@gmail.com, Fredrik Roubert, icu-support
Hi,
The Thai language is written without spaces between words, so you need a special break iterator that uses a dictionary (word list) to know where lines can be split.


Note that there are minority languages using the same script that need different handling.
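
To make the dictionary idea above concrete, here is a minimal sketch of greedy longest-match segmentation. It is an illustration only, not ICU's actual algorithm or API: the function name `segmentByDictionary` and the word-list container are hypothetical, and a real break iterator handles far more (combining marks, ambiguity, unknown words).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not part of ICU): splits text written without spaces
// into pieces by repeatedly taking the longest dictionary word that starts
// at the current position.
std::vector<std::u32string> segmentByDictionary(
    const std::u32string& text,
    const std::unordered_set<std::u32string>& dictionary) {
    // The longest dictionary entry bounds how far ahead we need to look.
    std::size_t maxLen = 1;
    for (const auto& word : dictionary) {
        maxLen = std::max(maxLen, word.size());
    }

    std::vector<std::u32string> pieces;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t len = std::min(maxLen, text.size() - pos);
        // Try the longest candidate first and shrink until a word matches.
        while (len > 1 && dictionary.count(text.substr(pos, len)) == 0) {
            --len;
        }
        // If nothing matched, len is 1 and a single code point is emitted.
        pieces.push_back(text.substr(pos, len));
        pos += len;
    }
    return pieces;
}
```

With a toy dictionary containing ภาษา and ไทย, calling `segmentByDictionary` on ภาษาไทย yields those two words, and each boundary between pieces is a candidate line-break position. Text whose words are missing from the dictionary falls through to the single-code-point fallback, which suggests why running a Thai-language word list over another language in the same script can produce poor breaks.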

Hope this helps. If you need more detail, please keep the list copied here.

Steven 

On Tue, Apr 29, 2025 at 9:00 AM, martin.fi...@gmail.com <martin.fi...@gmail.com> wrote:
Thank you so much. Hello, Steven, nice hat. 😊 Any insight you can provide would be fab. 🙏

Thank you,

Martin

Steven Loomis

Apr 29, 2025, 11:01:58 AM
to martin.fi...@gmail.com, Fredrik Roubert, icu-support
The Lao, Khmer, and Burmese iterators may not have existed when that was written.

See https://unicode-org.atlassian.net/browse/ICU-13219 for some background. There’s some text in sss-Thai (So language) that has spaces, but the text gets broken badly if it’s treated as if it were the Thai language (the word อะลู่วาง gets split into three pieces).

--
Steven R. Loomis
Code Hive Tx, LLC



On Apr 29, 2025, at 9:28 AM, martin.fi...@gmail.com wrote:

Thanks Steven! Is the algorithm basically "treat all other spaceless languages using Thai spacelessness" (because Thai is a particular but arbitrary example of such)?

I ask because I don't see any of the others (Lao, Khmer, Burmese, etc.) receive a special break iterator.

Martin

P.S.

The icu-s...@unicode.org addy bounces for me.

M.

Fredrik Roubert

Apr 29, 2025, 11:17:45 AM
to Martin Fietkiewicz, Steven R. Loomis, icu-support
On Tue, Apr 29, 2025 at 4:29 PM martin.fi...@gmail.com
<martin.fi...@gmail.com> wrote:

> The icu-s...@unicode.org addy bounces for me.

The current settings for that group require you to join the group
first (anyone can join) before you can post to it.

--
Fredrik Roubert
rou...@google.com