
Case modification fails for Unicode characters

Dennis Williamson

Jul 12, 2012, 2:19:05 PM
to bug-...@gnu.org
s=łódź; echo "${s^^} ${s~~}"
łóDź ŁÓDŹ

The to-upper and the undocumented toggle operators should produce
identical output in this situation, but only the toggle works
correctly.

This is in en_US.UTF-8, but also reported in pl_PL.utf-8. In Bash
4.2.24 and Bash 4.0.33.

--
Visit serverfault.com to get your system administration questions answered.
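
A minimal sketch of the three expansions discussed in the report, using ASCII input so the result does not depend on the locale. It assumes Bash 4.0 or later, and the function name show_case_forms is made up for illustration. For an all-lowercase string, ${s^^} and ${s~~} should agree, which is the expectation the report rests on.

```bash
#!/usr/bin/env bash
# Sketch only: needs Bash 4.0+ for the ^^, ,, and ~~ expansions.
# show_case_forms is a hypothetical helper, not part of Bash.

# Print the upper-cased, lower-cased, and case-toggled forms of $1.
show_case_forms() {
    local s=$1
    printf '%s %s %s\n' "${s^^}" "${s,,}" "${s~~}"
}

show_case_forms 'lodz'   # prints: LODZ lodz LODZ
show_case_forms 'LoDz'   # prints: LODZ lodz lOdZ
```

With the non-ASCII input from the report, the affected versions printed łóDź for ${s^^} instead of ŁÓDŹ.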

DJ Mills

Jul 12, 2012, 2:57:40 PM
to Dennis Williamson, bug-...@gnu.org
I get the same result with:
» echo "$s" | tr '[:lower:]' '[:upper:]'
łóDź

» locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=


This is a locale issue, and has nothing to do with bash itself...

Dennis Williamson

Jul 12, 2012, 8:46:35 PM
to DJ Mills, bug-...@gnu.org
That's partly true except that ~~ works.

Pierre Gaston

Jul 13, 2012, 1:56:28 AM
to Dennis Williamson, DJ Mills, bug-...@gnu.org
Also, many (all?) versions of tr are not locale-aware, e.g. here:
$ echo ź | tr ź a
aa
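
The "aa" result is what one would expect if tr works on bytes rather than characters: in UTF-8, ź (U+017A) is encoded as the two bytes 0xC5 0xBA, and each byte is replaced separately. A rough sketch of that reading, assuming a byte-oriented tr (GNU coreutils tr behaves this way); the helper names byte_count and bytewise_replace are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: byte_count and bytewise_replace are hypothetical helpers.

z=$'\xc5\xba'   # the two bytes that encode ź (U+017A) in UTF-8

# Count the bytes in $1, independent of the current locale.
byte_count() {
    printf '%s' "$1" | LC_ALL=C wc -c | tr -d ' '
}

# Replace each of the two bytes with 'a', as a byte-oriented tr would.
bytewise_replace() {
    printf '%s' "$1" | LC_ALL=C tr '\305\272' 'aa'
}

byte_count "$z"                # prints: 2
bytewise_replace "$z"; echo    # prints: aa
```

The octal escapes \305 and \272 name the two bytes explicitly, so the sketch does not depend on how the script file itself is encoded.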

Andreas Schwab

Jul 13, 2012, 3:39:57 AM
to DJ Mills, Dennis Williamson, bug-...@gnu.org
It _is_ a bash bug.

diff --git a/lib/sh/casemod.c b/lib/sh/casemod.c
index 3127d8c..d58b216 100644
--- a/lib/sh/casemod.c
+++ b/lib/sh/casemod.c
@@ -227,8 +227,8 @@ sh_modcase (string, pat, flags)
     {
     default:
     case CASE_NOOP: nwc = wc; break;
-    case CASE_UPPER: nwc = TOUPPER (wc); break;
-    case CASE_LOWER: nwc = TOLOWER (wc); break;
+    case CASE_UPPER: nwc = _to_wupper (wc); break;
+    case CASE_LOWER: nwc = _to_wlower (wc); break;
     case CASE_TOGGLEALL:
     case CASE_TOGGLE: nwc = TOGGLE (wc); break;
     }

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

Chet Ramey

Jul 13, 2012, 9:15:41 AM
to Andreas Schwab, DJ Mills, chet....@case.edu, Dennis Williamson, bug-...@gnu.org
On 7/13/12 3:39 AM, Andreas Schwab wrote:
> It _is_ a bash bug.

Thanks for the fix.

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU ch...@case.edu http://cnswww.cns.cwru.edu/~chet/



Chet Ramey

Jul 13, 2012, 10:31:06 AM
to Dennis Williamson, bug-...@gnu.org, chet....@case.edu
On 7/12/12 2:19 PM, Dennis Williamson wrote:
> s=łódź; echo "${s^^} ${s~~}"
> łóDź ŁÓDŹ
>
> The to-upper and the undocumented toggle operators should produce
> identical output in this situation, but only the toggle works
> correctly.

Thanks for the report. Andreas's fix is correct. That will be in the
next bash release.
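
One way to check a given Bash build for this behaviour is to compare the two expansions directly. This is only a sketch, assuming Bash 4.0 or later; the helper name upper_matches_toggle is made up for illustration and is not part of Bash.

```bash
#!/usr/bin/env bash
# Sketch only: upper_matches_toggle is a hypothetical helper.

# Succeed if upper-casing $1 and toggling the case of $1 give the same string.
# For all-lowercase input the two results are expected to be identical.
upper_matches_toggle() {
    local s=$1
    [[ ${s^^} == "${s~~}" ]]
}

if upper_matches_toggle 'lodz'; then
    echo 'match'
else
    echo 'mismatch'
fi
```

Calling it with łódź under a UTF-8 locale would exercise the case from the report; going by the output quoted in the thread, the check would be expected to fail on the affected versions.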