
Bug#868654: Combining Unicode Mark-Nonspacing are classified as [:punct:]


Santiago R.R.

Jul 17, 2017, 5:30:03 AM
Source: glibc
Version: 2.24-12
Severity: minor
Control: block 662629 by -1

Hi,

There is an issue with how glibc classifies characters in the Unicode
Mark, Nonspacing (Mn) category: it puts them in [[:punct:]], whereas
they should perhaps be in [[:alpha:]]. This was identified through a
bug reported against grep:
https://bugs.debian.org/662629

You can test it using U+0301 COMBINING ACUTE ACCENT (the "á" below must
be typed as "a" followed by U+0301, not as the precomposed U+00E1):

$ echo árbol | grep -o '[[:alpha:]]*'
a
rbol
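To see why grep prints two matches, here is a minimal sketch in C. It is not grep's actual code, and the helper name `count_alpha_runs` is hypothetical. It decodes a string in the current locale and counts maximal runs of characters for which `iswalpha()` is true, which is roughly how many non-empty matches `grep -o '[[:alpha:]]*'` prints. With the classification described above, the input `"a\xcc\x81rbol"` ("a", U+0301 encoded in UTF-8, "rbol") gives 2 runs in a UTF-8 locale, mirroring the `a` / `rbol` output; it would give 1 if U+0301 were classified as alphabetic.

```c
#include <locale.h>  /* callers must call setlocale() before using the helper */
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count maximal runs of iswalpha() characters in a
   multibyte string decoded according to the current LC_CTYPE locale.
   Returns (size_t)-1 on an invalid or incomplete multibyte sequence. */
size_t count_alpha_runs(const char *s)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    size_t len = strlen(s), runs = 0;
    int in_run = 0;

    while (len > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, len, &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;          /* invalid or truncated sequence */
        /* n == 0 would mean L'\0', which cannot occur before strlen(s). */
        if (iswalpha((wint_t)wc)) {
            if (!in_run)
                runs++;                 /* a new alphabetic run begins */
            in_run = 1;
        } else {
            in_run = 0;                 /* e.g. U+0301 on an affected glibc */
        }
        s += n;
        len -= n;
    }
    return runs;
}
```

For example, after `setlocale(LC_ALL, "en_US.UTF-8")` succeeds, `count_alpha_runs("a\xcc\x81rbol")` reports how the installed glibc classifies U+0301 (2 on an affected version).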

grep's upstream maintainers share this view:

"Surely this is a glibc bug, not a grep bug. Grep is just following the
character classification of glibc. I can reproduce the problem by
compiling and running the attached program, which uses only glibc (not
grep). This program exits with status 1, whereas you want it to exit
with status 0. So I suggest filing a glibc bug report."

combining.c is attached to this mail.
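The attachment is not preserved in this archive. As a stand-in, and not a reconstruction of combining.c, the hypothetical function `check_combining_mark` below performs a comparable test using only glibc's `<wctype.h>` functions. It returns 0 if U+0301 is alphabetic and not punctuation in the given locale, 1 otherwise (the behaviour reported here), and 2 if the locale is unavailable. It assumes `wchar_t` values are Unicode code points, as on glibc.

```c
#include <locale.h>
#include <wctype.h>

/* U+0301 COMBINING ACUTE ACCENT, general category Mn (Mark, Nonspacing).
   Assumes wchar_t holds Unicode code points, as on glibc. */
#define COMBINING_ACUTE_ACCENT ((wint_t)0x0301)

/* Hypothetical helper, not the attached combining.c.
   Returns 0 if U+0301 is [[:alpha:]] and not [[:punct:]] in the given
   locale, 1 otherwise, and 2 if setlocale() cannot load the locale. */
int check_combining_mark(const char *locale_name)
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return 2;                       /* locale not installed */

    int alpha = iswalpha(COMBINING_ACUTE_ACCENT) != 0;
    int punct = iswpunct(COMBINING_ACUTE_ACCENT) != 0;

    return (alpha && !punct) ? 0 : 1;   /* 1 reproduces the reported issue */
}
```

Called from `main` as `return check_combining_mark("en_US.UTF-8");`, it exits with status 1 on an affected glibc, provided that locale is installed.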

Cheers,

-- Santiago

Vincent Lefevre

Dec 12, 2023, 11:10:06 AM
Control: forwarded -1 https://sourceware.org/bugzilla/show_bug.cgi?id=31149

I've just reported the bug upstream, as nothing has been done for
more than 6 years!

--
Vincent Lefèvre <vin...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)