Case modification fails for Unicode characters

Case modification fails for Unicode characters DennisW 7/12/12 11:19 AM
s=łódź; echo "${s^^} ${s~~}"
łóDź ŁÓDŹ

The to-upper and the undocumented toggle operators should produce
identical output in this situation, but only the toggle works
correctly.

This is in en_US.UTF-8, but it has also been reported in pl_PL.utf-8.
It occurs in Bash 4.2.24 and Bash 4.0.33.


Re: Case modification fails for Unicode characters DJ Mills 7/12/12 11:57 AM
I get the same result with:
» echo "$s" | tr '[:lower:]' '[:upper:]'
łóDź

» locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=


This is a locale issue, and has nothing to do with bash itself...

Re: Case modification fails for Unicode characters DennisW 7/12/12 5:46 PM
That's partly true, except that ~~ works.

Re: Case modification fails for Unicode characters Pierre Gaston 7/12/12 10:56 PM
Also, many (all?) versions of tr don't know about locales, e.g. here:
$ echo ź | tr ź a
aa
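
The "aa" above is what byte-wise processing would be expected to produce: in
UTF-8, ź is two bytes, so a tr that works on bytes sees a two-byte SET1 and
maps each byte to "a". The same byte-wise view also reproduces the łóDź result
from earlier in the thread, since "d" is the only ASCII lowercase byte in the
string. A minimal sketch, assuming a POSIX shell with printf, wc and tr; the
variable names are made up for illustration, and LC_ALL=C is set here only to
force byte semantics:

```shell
# ź is U+017A, which UTF-8 encodes as the two bytes 0xC5 0xBA (octal 305 272).
z=$(printf '\305\272')

# Count the bytes; strip padding that some wc implementations add.
byte_count=$(printf '%s' "$z" | wc -c | tr -d ' ')
echo "bytes in z: $byte_count"

# In the C locale tr works on bytes: SET1 has two bytes, SET2 ("a") is
# shorter, so both bytes of ź end up mapped to "a".
translated=$(printf '%s' "$z" | LC_ALL=C tr "$z" a)
echo "$translated"

# łódź as UTF-8 bytes: ł = C5 82, ó = C3 B3, d = 64, ź = C5 BA.
s=$(printf '\305\202\303\263d\305\272')

# Byte-wise upper-casing only touches the ASCII letter "d".
upper=$(printf '%s' "$s" | LC_ALL=C tr '[:lower:]' '[:upper:]')
echo "$upper"
```

This only illustrates the byte-versus-character distinction; it says nothing
about which tr builds are multibyte-aware.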

Re: Case modification fails for Unicode characters Andreas Schwab 7/13/12 12:39 AM
It _is_ a bash bug.

diff --git a/lib/sh/casemod.c b/lib/sh/casemod.c
index 3127d8c..d58b216 100644
--- a/lib/sh/casemod.c
+++ b/lib/sh/casemod.c
@@ -227,8 +227,8 @@ sh_modcase (string, pat, flags)
           {
           default:
           case CASE_NOOP:  nwc = wc; break;
-          case CASE_UPPER:  nwc = TOUPPER (wc); break;
-          case CASE_LOWER:  nwc = TOLOWER (wc); break;
+          case CASE_UPPER:  nwc = _to_wupper (wc); break;
+          case CASE_LOWER:  nwc = _to_wlower (wc); break;
           case CASE_TOGGLEALL:
           case CASE_TOGGLE: nwc = TOGGLE (wc); break;
           }

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

Re: Case modification fails for Unicode characters Chet Ramey 7/13/12 6:15 AM
On 7/13/12 3:39 AM, Andreas Schwab wrote:
> It _is_ a bash bug.

Thanks for the fix.

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    ch...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: Case modification fails for Unicode characters Chet Ramey 7/13/12 7:31 AM
On 7/12/12 2:19 PM, Dennis Williamson wrote:
> s=łódź; echo "${s^^} ${s~~}"
> łóDź ŁÓDŹ
>
> The to-upper and the undocumented toggle operators should produce
> identical output in this situation, but only the toggle works
> correctly.

Thanks for the report.  Andreas's fix is correct.  That will be in the
next bash release.