Fix for Chinese locales under macOS (PR #23556)

VZ

May 18, 2023, 7:57:22 PM
to wx-...@googlegroups.com, Subscribed

@utelle I'd appreciate your thoughts about this, notably if we should change FromTag() to deal with strings of the form zh_Hant_TW directly and/or if you see any better way of handling this at NSLocale level.


You can view, comment on, or merge this pull request online at:

  https://github.com/wxWidgets/wxWidgets/pull/23556

Commit Summary

  • d2101ca Show result of parsing wxLocaleIdent in wxUILocale pseudo test
  • 6d849f8 Fix creating Chinese locales under macOS

File Changes

(2 files)


Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.Message ID: <wxWidgets/wxWidgets/pull/23556@github.com>

Ulrich Telle

May 19, 2023, 8:44:32 AM
to wx-...@googlegroups.com, Subscribed

> @utelle I'd appreciate your thoughts about this, notably if we should change FromTag() to deal with strings of the form zh_Hant_TW directly

Well, I'm a bit confused. According to the macOS documentation, locale identifiers that contain a script part should use a hyphen, not an underscore, to separate the script code from the language code. Method FromTag already handles tags of the form zh-Hant_TW correctly.

So, where does the tag with 2 underscore delimiters come from? User input? If yes, the syntax is simply wrong. From the system? In that case I'd say it is a bug in the system.

> and/or if you see any better way of handling this at NSLocale level.

I will have to take a closer look whether there exists a better handling option.


VZ

May 19, 2023, 10:03:27 AM
to wx-...@googlegroups.com, Subscribed

> So, where does the tag with 2 underscore delimiters come from?

From [NSLocale availableLocaleIdentifiers], which is what this code works with. You could say that it's a bug, or you could say that the documentation is incorrect, but we can't really change it, so we have to deal with it, and the only possibilities I see are what I did, which is less disruptive, or changing FromTag() to deal with this format (which would involve at least preventing it from "recognizing" Hant_TW as a Windows sort order...).

> and/or if you see any better way of handling this at NSLocale level.

> I will have to take a closer look whether there exists a better handling option.

TIA!


Ulrich Telle

May 19, 2023, 1:26:58 PM
to wx-...@googlegroups.com, Subscribed

> So, where does the tag with 2 underscore delimiters come from?

> From [NSLocale availableLocaleIdentifiers], which is what this code works with. You could say that it's a bug, or you could say that the documentation is incorrect,

Throughout all Apple documentation I haven't seen locale identifiers with 2 underscore delimiters. I tested this myself in my macOS Ventura VM, and I can confirm that 2 underscores are indeed used as delimiters.

With the command locale -a issued from a terminal, I don't see a single locale with a script part. Strange.

availableLocaleIdentifiers returns a very long list of locale tags with quite a number of exotic locales. The list is completely unsorted. I haven't checked all entries, but regarding Chinese I found entries which will make it hard to determine the right entry: zh, zh_Hans, zh_Hant, zh_Hans_CN, zh_Hant_CN ...

IMHO the script part should be checked, too, if the tag to be checked contains a script part. I wonder which entry macOS selects if no script part is given.
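To illustrate the suggestion above, here is a minimal standard C++ sketch of matching a requested language/script/region against identifiers in the 2-underscore format, comparing the script only when one was requested. It is not wxWidgets or Apple code: `ParseAppleId()`, `FindMatches()` and the rule that a 4-letter second component is a script are assumptions for illustration, and the sample list consists only of the Chinese identifiers mentioned in this thread.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Components of an identifier such as "zh_Hant_TW".
struct LocaleParts
{
    std::string language, script, region;
};

// Hypothetical parser: splits on '_' and treats a 4-letter second component
// as a script code and anything else as a region.
LocaleParts ParseAppleId(const std::string& id)
{
    std::vector<std::string> parts;
    std::istringstream stream(id);
    for (std::string part; std::getline(stream, part, '_'); )
        parts.push_back(part);

    LocaleParts result;
    if (!parts.empty())
        result.language = parts[0];
    if (parts.size() == 2)
    {
        if (parts[1].size() == 4)
            result.script = parts[1];
        else
            result.region = parts[1];
    }
    else if (parts.size() >= 3)
    {
        result.script = parts[1];
        result.region = parts[2];
    }
    return result;
}

// Returns every identifier whose language and region match; the script is
// compared only if the caller asked for one.
std::vector<std::string> FindMatches(const std::vector<std::string>& available,
                                     const std::string& language,
                                     const std::string& script,
                                     const std::string& region)
{
    std::vector<std::string> matches;
    for (const std::string& id : available)
    {
        const LocaleParts p = ParseAppleId(id);
        if (p.language != language || p.region != region)
            continue;
        if (!script.empty() && p.script != script)
            continue;
        matches.push_back(id);
    }
    return matches;
}
```

With the list zh, zh_Hans, zh_Hant, zh_Hans_CN, zh_Hant_CN, zh_Hant_TW, asking for zh/Hant/CN returns only zh_Hant_CN, while asking for zh/CN without a script returns both zh_Hans_CN and zh_Hant_CN, which is exactly the ambiguity described above.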

> but we can't really change it, so we have to deal with it, and the only possibilities I see are what I did, which is less disruptive, or changing FromTag() to deal with this format (which would involve at least preventing it from "recognizing" Hant_TW as a Windows sort order...).

Yes, it is certainly not optimal to interpret Hant_TW as Windows sort order.

What we could do, if we find more than one underscore delimiter, is to replace the underscores by hyphens. Then interpreting the tag would work as expected. As far as I can tell, underscores are not valid characters within a tag part.

In your code you do just that. It seems to be logical to integrate that approach in method FromTag.
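A minimal sketch of the rule proposed here, written with std::string instead of wxString so that it stands alone. `NormalizeMultiUnderscoreTag()` is a hypothetical name; this is not the code from 6d849f8 or from FromTag().

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: if an identifier contains more than one underscore,
// e.g. "zh_Hant_TW" as returned by availableLocaleIdentifiers, turn all of
// them into hyphens ("zh-Hant-TW"). Tags with at most one underscore are
// returned unchanged.
std::string NormalizeMultiUnderscoreTag(std::string tag)
{
    if (std::count(tag.begin(), tag.end(), '_') > 1)
        std::replace(tag.begin(), tag.end(), '_', '-');
    return tag;
}
```

Tags with a single underscore, such as zh_TW or a Windows-style de-DE_phoneb, are left alone, so the existing handling of those forms is not affected.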


VZ

May 19, 2023, 1:45:49 PM
to wx-...@googlegroups.com, Subscribed

> Throughout all Apple documentation I haven't seen locale identifiers with 2 underscore delimiters.

Yes, I know, which is why I think it could also be a bug in their documentation. But we have to live with what we have...

> availableLocaleIdentifiers returns a very long list of locale tags with quite a number of exotic locales. The list is completely unsorted. I haven't checked all entries, but regarding Chinese I found entries which will make it hard to determine the right entry: zh, zh_Hans, zh_Hant, zh_Hans_CN, zh_Hant_CN ...

> IMHO the script part should be checked, too, if the tag to be checked contains a script part. I wonder which entry macOS selects if no script part is given.

I couldn't determine this, but I'm also worried by it, see the commit message. The trouble is that we can't really do much without more information than we currently have: if all we've got is zh_CN, how are we supposed to know that its default script should be Hans? Other than checking for "Simplified" in the description I don't think we can get this information from the current language database contents...

> What we could do, if we find more than one underscore delimiter, is to replace the underscores by hyphens. Then interpreting the tag would work as expected. As far as I can tell, underscores are not valid characters within a tag part.

> In your code you do just that. It seems to be logical to integrate that approach in method FromTag.

I thought about this too but wasn't completely sure if it was safe to do it in general, which is why I preferred to do it in Mac-specific code only. If you think it's ok, I can change this or I can merge this PR as is (because I'd also like to backport it to 3.2) and then you could update it if you'd like. Please let me know, TIA!


Ulrich Telle

May 19, 2023, 3:49:17 PM
to wx-...@googlegroups.com, Subscribed

> > Throughout all Apple documentation I haven't seen locale identifiers with 2 underscore delimiters.
> Yes, I know, which is why I think it could also be a bug in their documentation. But we have to live with what we have...

Yes, there is no other choice. Fortunately, macOS is the only platform using this locale tag representation with 2 underscores. Under Windows and Linux I haven't seen any tags with more than one underscore.

> The trouble is that we can't really do much without more information than we currently have: if all we've got is zh_CN, how are we supposed to know that its default script should be Hans? Other than checking for "Simplified" in the description I don't think we can get this information from the current language database contents...

If the user does not specify a script, we need to rely on localeWithLocaleIdentifier to return the most appropriate locale for the given language and country codes. And most likely that is the case.

I don't think it is worth the effort to extend the language database by the default script of the locales.

> What we could do, if we find more than one underscore delimiter, is to replace the underscores by hyphens. [...]


> In your code you do just that. It seems to be logical to integrate that approach in method FromTag.

> I thought about this too but wasn't completely sure if it was safe to do it in general, which is why I preferred to do it in Mac-specific code only. If you think it's ok, I can change this or I can merge this PR as is (because I'd also like to backport it to 3.2) and then you could update it if you'd like. Please let me know, TIA!

IMHO it is safe to handle tags with 2 underscores within FromTag.

The change to FromTag should be rather simple. Only the part handling a Windows sort order needs to be modified. The current code is the following:

    // 1d. Check for sort order in Windows tag
    //
    // Make sure we don't extract the region identifier erroneously as a sortorder identifier
    {
        wxString tagTemp = tagMain.BeforeFirst('_', &tagRest);
        if (tagRest.length() > 4 && locId.m_modifier.empty() && locId.m_charset.empty())
        {
            // Windows sortorder found
            locId.SortOrder(tagRest);
            tagMain = tagTemp;
        }
    }

Changing the line determining tagTemp as follows should do the trick:

        wxString tagTemp = tagMain.BeforeLast('_', &tagRest);

If the part that potentially represents a sort order is not long enough, tagMain remains unchanged, and in the next step all underscores are replaced by hyphens.
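Why this one-word change matters can be seen by emulating the two wxString methods with std::string. The helpers below are hypothetical stand-ins that follow the documented wxString::BeforeFirst()/BeforeLast() behaviour, including what each returns when the character is absent; they are not the wxWidgets implementation.

```cpp
#include <string>

// Like wxString::BeforeFirst(ch, &rest): if ch is missing, returns the whole
// string and clears rest.
std::string BeforeFirst(const std::string& s, char ch, std::string* rest)
{
    const std::string::size_type pos = s.find(ch);
    if (pos == std::string::npos)
    {
        rest->clear();
        return s;
    }
    *rest = s.substr(pos + 1);
    return s.substr(0, pos);
}

// Like wxString::BeforeLast(ch, &rest): if ch is missing, returns an empty
// string and stores the whole string in rest.
std::string BeforeLast(const std::string& s, char ch, std::string* rest)
{
    const std::string::size_type pos = s.rfind(ch);
    if (pos == std::string::npos)
    {
        *rest = s;
        return std::string();
    }
    *rest = s.substr(pos + 1);
    return s.substr(0, pos);
}
```

For zh_Hant_TW, BeforeFirst('_') leaves "Hant_TW" (7 characters) as the candidate sort order, which passes the length > 4 test, whereas BeforeLast('_') leaves only "TW". For a Windows tag such as de-DE_phoneb both calls give the same split.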


Ulrich Telle

May 25, 2023, 9:21:04 AM
to wx-...@googlegroups.com, Subscribed

> IMHO it is safe to handle tags with 2 underscores within FromTag.

This definitely works ... after adjusting the code for detecting a Windows sort order attribute.

> The change to FromTag should be rather simple. Only the part handling a Windows sort order needs to be modified. The current code is the following:
>
>     // 1d. Check for sort order in Windows tag
>     //
>     // Make sure we don't extract the region identifier erroneously as a sortorder identifier
>     {
>         wxString tagTemp = tagMain.BeforeFirst('_', &tagRest);
>         if (tagRest.length() > 4 && locId.m_modifier.empty() && locId.m_charset.empty())
>         {
>             // Windows sortorder found
>             locId.SortOrder(tagRest);
>             tagMain = tagTemp;
>         }
>     }
>
> Changing the line determining tagTemp as follows should do the trick:
>
>         wxString tagTemp = tagMain.BeforeLast('_', &tagRest);

Using wxString::BeforeLast() works if one takes into account that tagTemp will be empty if the search character (here: underscore) is not found at all. Therefore, it is necessary to verify that tagTemp is not empty before assuming a sort order attribute:

        if (tagTemp.length() > 0 && tagRest.length() > 4 &&
            locId.m_modifier.empty() && locId.m_charset.empty())

PR #23562 includes this adjustment.
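A minimal sketch of step 1d with both adjustments (BeforeLast() semantics plus the non-empty check), in standard C++ and assuming, as the original condition also requires, that modifier and charset are empty. `SplitWindowsSortOrder()` and `SortOrderSplit` are hypothetical names, not wxWidgets API, and this is not the code merged in PR #23562.

```cpp
#include <string>

// Result of the hypothetical sort-order check.
struct SortOrderSplit
{
    std::string main;       // tag without the sort order part
    std::string sortOrder;  // empty if no sort order was detected
};

// Step 1d, assuming locId.m_modifier and locId.m_charset are empty.
SortOrderSplit SplitWindowsSortOrder(const std::string& tagMain)
{
    // Same results as wxString::BeforeLast('_', &tagRest): if there is no
    // underscore, tagTemp stays empty and tagRest holds the whole tag.
    std::string tagTemp, tagRest;
    const std::string::size_type pos = tagMain.rfind('_');
    if (pos == std::string::npos)
    {
        tagRest = tagMain;
    }
    else
    {
        tagTemp = tagMain.substr(0, pos);
        tagRest = tagMain.substr(pos + 1);
    }

    // Only a non-empty prefix followed by a part longer than 4 characters is
    // taken as a Windows sort order.
    if (!tagTemp.empty() && tagRest.length() > 4)
        return { tagTemp, tagRest };

    return { tagMain, std::string() };
}
```

The zh-Hant case shows why the extra check is needed: with no underscore, the "rest" is the whole 7-character tag and would otherwise be taken as a sort order. zh_Hant_TW is also left intact, because its last part, "TW", is too short.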


VZ

May 28, 2023, 5:03:59 PM
to wx-...@googlegroups.com, Push

@vadz pushed 6 commits.

  • 0c4e97b Use better suited locale for standard C locale under macOS
  • 5a80bfb Improve handling of standard C locale
  • 9ada6cd Fix parsing Windows sort order in wxLocaleIdent::FromTag()
  • 94060e8 Add wxUILocale methods for getting month and day names
  • d8fbbb8 Match the chosen script when creating wxUILocale under macOS
  • 6f18c33 Use correct calendar names form in wxGenericCalendarCtrl


View it on GitHub or unsubscribe.
You are receiving this because you are subscribed to this thread.Message ID: <wxWidgets/wxWidgets/pull/23556/push/13788822681@github.com>

Ulrich Telle

May 30, 2023, 12:02:02 PM
to wx-...@googlegroups.com, Subscribed

@utelle requested changes on this pull request.

Unfortunately, my original PR had 2 small glitches: when falling back to English names, the methods must be called with the parameter form (not flags). This only affects systems that don't have HAVE_LANGINFO_H defined.


In src/unix/uilocale.cpp:

> +          ABMON_10, ABMON_11, ABMON_12 },
+        { ABMON_1,  ABMON_2,  ABMON_3,
+          ABMON_4,  ABMON_5,  ABMON_6,
+          ABMON_7,  ABMON_8,  ABMON_9,
+          ABMON_10, ABMON_11, ABMON_12 }
+    };
+
+    int idx = ArrayIndexFromFlag(form.GetFlags());
+    if (idx == -1)
+        return wxString();
+
+    return wxString(GetLangInfo(monthNameIndex[idx][month]), wxCSConv(GetCodeSet()));
+#endif //  __LINUX__ && __GLIBC__ / !__LINUX__ || !__GLIBC__
+#else // !HAVE_LANGINFO_H
+    // If HAVE_LANGINFO_H is not available, fall back to English names.
+    return wxDateTime::GetEnglishMonthName(month, flags);

This call must use form (not flags):

return wxDateTime::GetEnglishMonthName(month, form);

Sorry, this was wrong in my original PR.


In src/unix/uilocale.cpp:

> +          DAY_4, DAY_5, DAY_6, DAY_7 },
+        { ABDAY_1, ABDAY_2, ABDAY_3,
+          ABDAY_4, ABDAY_5, ABDAY_6, ABDAY_7 },
+        { ABDAY_1, ABDAY_2, ABDAY_3,
+          ABDAY_4, ABDAY_5, ABDAY_6, ABDAY_7 }
+    };
+
+    const int idx = ArrayIndexFromFlag(form.GetFlags());
+    if (idx == -1)
+        return wxString();
+
+    return wxString(GetLangInfo(weekdayNameIndex[idx][weekday]), wxCSConv(GetCodeSet()));
+#endif //  __LINUX__ && __GLIBC__ / !__LINUX__ || !__GLIBC__
+#else // !HAVE_LANGINFO_H
+    // If HAVE_LANGINFO_H is not available, fall back to English names.
+    return wxDateTime::GetEnglishWeekDayName(weekday, flags);

This call must use form (not flags), too:

return wxDateTime::GetEnglishWeekDayName(weekday, form);

Sorry, this was wrong in my original PR.
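The lookup pattern in the two functions reviewed above can be reduced to a standalone sketch: a table of nl_langinfo() items indexed by name form and month, with an English fallback when <langinfo.h> is unavailable, where both branches use the same form argument. `MonthForm`, `MonthName()` and `SKETCH_HAVE_LANGINFO` are hypothetical; the actual wxWidgets code uses wxDateTime::NameForm and ArrayIndexFromFlag() and distinguishes more name forms (the table in the diff has more rows).

```cpp
#include <string>

#if defined(__has_include)
#  if __has_include(<langinfo.h>)
#    include <langinfo.h>
#    define SKETCH_HAVE_LANGINFO 1
#  endif
#endif

// Hypothetical, reduced name form: only full vs. abbreviated names.
enum class MonthForm { Full = 0, Abbr = 1 };

// Returns the name of a 0-based month in the current C library locale, or
// English names when <langinfo.h> is not available at compile time.
std::string MonthName(int month, MonthForm form)
{
    if (month < 0 || month > 11)
        return std::string();

    const int idx = static_cast<int>(form);

#ifdef SKETCH_HAVE_LANGINFO
    // Table indexed by [form][month], in the style of the reviewed code.
    static const nl_item monthNameIndex[2][12] = {
        { MON_1, MON_2, MON_3, MON_4,  MON_5,  MON_6,
          MON_7, MON_8, MON_9, MON_10, MON_11, MON_12 },
        { ABMON_1, ABMON_2, ABMON_3, ABMON_4,  ABMON_5,  ABMON_6,
          ABMON_7, ABMON_8, ABMON_9, ABMON_10, ABMON_11, ABMON_12 }
    };
    return nl_langinfo(monthNameIndex[idx][month]);
#else
    // Fallback branch: it must honour the same form as the branch above.
    static const char* const englishNames[2][12] = {
        { "January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December" },
        { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" }
    };
    return englishNames[idx][month];
#endif
}
```

In the default "C" locale, MonthName(0, MonthForm::Full) gives "January" on either branch.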


VZ

May 31, 2023, 12:54:08 PM
to wx-...@googlegroups.com, Push

@vadz pushed 3 commits.

  • 5b424ea Add wxUILocale methods for getting month and day names
  • 9deb7ea Match the chosen script when creating wxUILocale under macOS
  • 36f06bf Use correct calendar names form in wxGenericCalendarCtrl


VZ

May 31, 2023, 12:54:25 PM
to wx-...@googlegroups.com, Subscribed

@vadz commented on this pull request.


In src/unix/uilocale.cpp:

> +    return wxDateTime::GetEnglishMonthName(month, flags);

Thanks for noticing this, fixed and force-pushed now.


VZ

Jun 2, 2023, 1:07:39 PM
to wx-...@googlegroups.com, Subscribed

@utelle Just to be sure: this is ok to merge now, right?


Ulrich Telle

Jun 2, 2023, 1:33:26 PM
to wx-...@googlegroups.com, Subscribed

> @utelle Just to be sure: this is ok to merge now, right?

Yes, AFAICT all related issues are addressed and all checks are successful. So, it should be ok to merge.


VZ

Jun 3, 2023, 3:09:39 PM
to wx-...@googlegroups.com, Subscribed

Merged #23556 into master.


Mark Roszko

Jun 8, 2023, 12:59:07 PM
to wx-...@googlegroups.com, Subscribed

Curious: are the changes to locale matching in uilocale.mm going to be backported to 3.2?

We have issues in KiCad where users cannot switch their app language to zh_TW or zh_CN, and I wonder if it's because the OS returns the longer names, i.e. zh_Hant_TW.


VZ

Jun 8, 2023, 4:21:29 PM
to wx-...@googlegroups.com, Subscribed

@marekr The particular problem you're speaking of is #23209 which is a (strict) subset of the changes in this PR, so I won't backport all of the changes here, but I will indeed backport 6d849f8 and a couple of related commits, thanks for the reminder.

I'll also backport the fix for #23557, i.e. PR #23603.

