Bidi behavior in number and date formatting


Rich Gillam

Feb 5, 2025, 7:51:24 PM
to design-wg (CLDR), icu-design
Hi everybody—

I’ve run into a number of bidi-related problems lately, and I need some guidance. I’m thinking about filing tickets, but thought it might be good to discuss by email before filing, so I have a better idea of what to say in the tickets.

I’ve run into a couple of situations lately where the bidi behavior of some element might be different depending on which digits we’re using to format a number, and there doesn’t seem to be any facility in CLDR to deal with this.

We offer the ability for users to choose their numbering system independent of their locale, so if your system language is Arabic or Urdu, you can use either native digits or Latin digits. But consider the degree sign: If you’re using Latin digits, you want it to stick to the right-hand side of the number, but if you’re using native digits, you want it to stick to the left-hand side of the number. We get that behavior in Arabic “for free” due to the characters’ bidi properties. But we don’t get that behavior in Urdu or Persian. And I can’t just deal with this by changing our copy of CLDR to put an RLM in front of the degree sign, because that would move it to the left-hand side regardless of the numbering system. I end up having to include clumsy special-case code.
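
The difference between Arabic on the one hand and Persian and Urdu on the other can be seen in the characters' Bidi_Class values. The following is a minimal sketch using Python's standard `unicodedata` module; the sample code points and the helper name `bidi_classes` are chosen here for illustration and are not taken from any CLDR or ICU code.

```python
import unicodedata

def bidi_classes(text):
    """Return the Unicode Bidi_Class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]

# Illustrative sample characters (an assumption of this sketch).
samples = {
    "Latin digit 3": "\u0033",
    "Arabic-Indic digit 3 (Arabic)": "\u0663",
    "Extended Arabic-Indic digit 3 (Persian, Urdu)": "\u06F3",
    "degree sign": "\u00B0",
    "right-to-left mark (RLM)": "\u200F",
}

for label, ch in samples.items():
    print(f"{label}: U+{ord(ch):04X} -> {unicodedata.bidirectional(ch)}")
```

The Arabic-Indic digits are class AN (Arabic number), while the Extended Arabic-Indic digits used for Persian and Urdu are class EN (European number), the same as Latin digits. The degree sign is class ET (European terminator), which the bidi algorithm attaches to an adjacent EN but not to an adjacent AN. That would be consistent with the degree sign changing sides with the digit choice in Arabic but not in Urdu or Persian.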

I’ve run into other variations on this: if I format a time in 12-hour format, which side “AM” or “PM” appears on might depend on the digits I’m using for the time, but I can’t control that, either (I also can’t peg it to one side or the other by changing the “AM” and “PM” strings— the bidi mark has to go on the other side of the time). I’ve also run into problems with currency formats where I’m operating in a RTL language but a particular currency symbol is all LTR characters.

So what’s the preferred solution to these kinds of problems? Right now I can’t think of anything other than special-case code.

For these and many other reasons, it seems like we should be getting away from embedding bidi controls in our CLDR data and moving to a code-based solution based on the bidi isolate characters (and even then, I’m not quite sure how to solve the above problems). What would it take to make that move?

—Rich Gillam

Mark Davis Ⓤ

Feb 6, 2025, 10:46:36 AM
to Annemarie Apple, Rich Gillam, design-wg (CLDR), icu-design
Peter should weigh in on this, but one thought: if using bidi isolates in the patterns would solve the problem (or ameliorate it), we could include a notice in the v47 migration notes that we may use them in v48.

On Wed, Feb 5, 2025, 20:21 Annemarie Apple <anne...@unicode.org> wrote:
+1 I would like to see this solved systematically. I feel like I've been staring at the same set of bugs, and I don't remember some of these being an issue in the past, but I haven't had time to dig into them deeply yet (and maybe I'm just misremembering).

I'd be reluctant to add additional data items the way we have vetters add number formats for each numbering system when more than one is common (e.g. the decimal separator differs between Latin-digit and native-digit formatting in Khmer), since this could also be an issue with the short and narrow versions of units, which could trigger a lot of additional data items.

Best,

Annemarie


--
You received this message because you are subscribed to the Google Groups "design-wg (CLDR)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to design-wg+...@unicode.org.
To view this discussion visit https://groups.google.com/a/unicode.org/d/msgid/design-wg/72389DA2-8574-4BF6-85AE-783C1A9D2D78%40apple.com.


Gilad Almosnino

Feb 7, 2025, 1:26:12 PM
to design-wg (CLDR), icu-design, Rich Gillam, icu-d...@lists.sourceforge.net
Hi Rich,
Can you provide more information on the environment you are running on?
Is the UI mirrored for Arabic, Urdu, and Persian?
Can you share some sample strings?


As far as I know, users rarely mix and match locale elements. Andrew Glass may have some data around this from Windows.

Gilad 
Standard institute of Israel Hebrew Support Committee Chair
EX MSFT BIDI i18n and BIDI PM



Rich Gillam

Jul 30, 2025, 6:58:36 PM
to Gilad Almosnino, design-wg (CLDR), icu-design, icu-d...@lists.sourceforge.net
(Another discussion thread that I dropped back in February but didn’t want to let totally fall on the floor…)

Can you provide more information on the environment you are running on?
Is the UI mirrored for Arabic, Urdu, and Persian?
Can you share some sample strings?

Yes, the UI is mirrored for the languages in question, or at least it’s supposed to be.  I’m not sure what you mean by “sample strings”.  The issues I remember would happen even if the formatted value is isolated in a UI element with no surrounding text.  And yes, I think we do have use cases where the UI language is mixed, but I don’t remember the details now.

We definitely DO allow a user in most bidi languages to choose whether to use Latin or native digits, which is what prompts my concern.  If the surrounding text is in, say, Urdu regardless, it needs to look right (whatever that means in practice) with both Latin and native digits, and the app shouldn’t have to do anything special to account for that.

Peter should weigh in on this, but one thought: if using bidi isolates in the patterns would solve the problem (or ameliorate it), we could include a notice in the v47 migration notes that we may use them in v48.

I think it’s high time to do this, even though I know my own employer has been one of the ones with backward-compatibility issues.  I’d probably have to ask around to make sure everybody here can handle bidi isolates, but they’d solve a lot of problems.  I suspect that making formatted strings work right with both Latin and native digits might require TWO layers of isolates: one around the numeral and another one around the whole string— I’m not sure there’s another way to get an AM/PM string or a unit name to display in the right place regardless of which digits are being used for the number.
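
The two-layer idea can be sketched roughly as follows. This is only an illustration of the nesting, not CLDR's or ICU's approach: the helper names `isolate` and `format_with_isolates`, the `{0}` placeholder convention, and the choice of LRI for the inner layer and RLI for the outer layer are all assumptions of this sketch.

```python
# Bidi isolate control characters (Unicode 6.3 and later).
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

def isolate(text, initiator=FSI):
    """Wrap text in a directional isolate closed by PDI."""
    return f"{initiator}{text}{PDI}"

def format_with_isolates(number, pattern, outer=RLI):
    """Hypothetical helper: substitute an isolated number into a pattern.

    The inner isolate wraps only the digits; the outer isolate wraps the
    whole formatted string so that surrounding text cannot reorder it.
    """
    inner = isolate(number, LRI)  # assumption: digit runs are laid out LTR
    return isolate(pattern.replace("{0}", inner), outer)

# Same pattern, Latin digits versus Extended Arabic-Indic digits.
latin = format_with_isolates("25", "{0}\u00B0")
native = format_with_isolates("\u06F2\u06F5", "{0}\u00B0")
print([f"U+{ord(c):04X}" for c in native])
```

Whether this nesting actually puts a unit or AM/PM string on the desired side for both digit sets is the open question here; the sketch shows only where the isolate characters would go.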

—Rich


Rich Gillam

Jul 30, 2025, 8:07:03 PM
to 'Rich Gillam' via icu-design
Forwarding to correct (I hope) icu-design list...

Begin forwarded message: