grep is screwed on Debian, Ubuntu and others ...

Kaz Kylheku

May 3, 2012, 9:25:05 PM
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=655293

I ran into this doing a simple grep job that needed to match upper
case characters, and so I started Googling. This was only reported in January.

But the Red Hat people knew about what looks like the same bug two years ago.
Oops, they didn't share!

https://bugzilla.redhat.com/show_bug.cgi?id=583011

(So much for the spirit of collaboration in open source. My distro, my
patches, screw you!)

Watch this:

$ echo a | grep '[A-B]'
a
$ echo b | grep '[A-B]'
$ echo b | grep '[:upper:]'
$ echo B | grep '[:upper:]'
$ echo E | grep '[:upper:]'
$ echo e | grep '[:upper:]'
e

Ooops! Someone doesn't have a regression test suite, or at least not one
that is worth a damn.

Lew Pitcher

May 3, 2012, 10:05:54 PM
On Thursday 03 May 2012 21:25, in comp.unix.shell, k...@kylheku.com wrote:

> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=655293
>
> I ran into this doing a simple grep job that needed to match upper
> case characters, and so I started Googling. This was only reported in
> January.
>
> But the Red Hat people knew about what looks like the same bug two years
> ago. Oops, they didn't share!
>
> https://bugzilla.redhat.com/show_bug.cgi?id=583011
>
> (So much for the spirit of collaboration in open source. My distro, my
> patches, screw you!)
>
> Watch this:
>
> $ echo a | grep '[A-B]'
> a
> $ echo b | grep '[A-B]'
> $ echo b | grep '[:upper:]'
> $ echo B | grep '[:upper:]'
> $ echo E | grep '[:upper:]'
> $ echo e | grep '[:upper:]'
> e

Hmmm... A couple of observations:

First, IIRC, the grep character classes (such as [:upper:]) syntactically
substitute for the "list of characters" that is enclosed by the square
brackets.

Consequently, the alternate form of '[A-B]' is not '[:upper:]', but instead
is '[[:upper:]]'. That is, the '[:upper:]' is enclosed within a set of
square brackets, just like 'A-B' is.

Thus, your examples that use
grep '[:upper:]'
should match only lines containing one of the characters ':', 'u', 'p', 'e',
or 'r', something that your final example /does/ show.
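
A minimal sketch of that difference, assuming a POSIX shell and grep. The
helper name `matches` is made up for illustration, and LC_ALL=C is there only
to pin the locale. Some GNU grep versions reject a bare '[:upper:]' with a
diagnostic, so the sketch spells out the equivalent plain character set as
'[uper:]' instead.

```shell
#!/bin/sh
# Hypothetical helper: prints "yes" if string $1 matches pattern $2, else "no".
# LC_ALL=C pins the locale so the result does not depend on the environment.
matches() {
    if printf '%s\n' "$1" | LC_ALL=C grep -q -- "$2"; then
        echo yes
    else
        echo no
    fi
}

# '[[:upper:]]' is a bracket expression that contains the class [:upper:].
matches B '[[:upper:]]'
matches e '[[:upper:]]'

# '[uper:]' is the plain character set that a bare '[:upper:]' amounts to:
# the characters ':', 'u', 'p', 'e' and 'r'.
matches e '[uper:]'
matches B '[uper:]'
```

The first and third calls should report a match; the second and fourth should
not.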

Second, I guess that your aberrant grep behaviour wrt 'a' is version
dependent. Under GNU grep 2.5.3 (32bit Slackware Linux 12.2), I don't see
the same results. In fact, I see the results you'd properly expect from
grep.

~ $ grep -V
GNU grep 2.5.3

Copyright (C) 1988, 1992-2002, 2004, 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.

~ $ for element in a b A B e ;
> do
> echo "Using $element into grep [A-B]"
> echo $element | grep '[A-B]'
> echo "Using $element into grep [:upper:]"
> echo $element | grep '[:upper:]'
> echo "Using $element into grep [[:upper:]]"
> echo $element | grep '[[:upper:]]'
> done
Using a into grep [A-B]
Using a into grep [:upper:]
Using a into grep [[:upper:]]
Using b into grep [A-B]
Using b into grep [:upper:]
Using b into grep [[:upper:]]
Using A into grep [A-B]
A
Using A into grep [:upper:]
Using A into grep [[:upper:]]
A
Using B into grep [A-B]
B
Using B into grep [:upper:]
Using B into grep [[:upper:]]
B
Using e into grep [A-B]
Using e into grep [:upper:]
e
Using e into grep [[:upper:]]

HTH
--
Lew Pitcher

Richard Kettlewell

May 4, 2012, 4:12:29 AM
Lew Pitcher <lpit...@teksavvy.com> writes:
> k...@kylheku.com wrote:

>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=655293
>>
>> I ran into this doing a simple grep job that needed to match upper
>> case characters, and so I started Googling. This was only reported in
>> January.
>>
>> But the Red Hat people knew about what looks like the same bug two years
>> ago. Oops, they didn't share!
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=583011
>>
>> (So much for the spirit of collaboration in open source. My distro, my
>> patches, screw you!)

The fix linked from the RH bug is to upstream grep, so it's not clear in
what sense they "didn't share".

> Second, I guess that your aberrant grep behaviour wrt 'a' is version
> dependent. Under GNU grep 2.5.3 (32bit Slackware Linux 12.2), I don't see
> the same results. In fact, I see the results you'd properly expect from
> grep.

It looks like it was dependent on both version and locale, as well as on
competence with regexp syntax.
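
One way to take the locale out of the picture when testing a range such as
'[A-B]' is to force LC_ALL=C for the grep invocation. A small sketch, assuming
a POSIX shell and grep; the helper name `count_c` is made up for illustration.
In the C locale a range follows byte order, so a lowercase 'a' cannot fall
inside A-B.

```shell
#!/bin/sh
# Hypothetical helper: prints the number of lines (0 or 1 here) in which
# string $1 matches pattern $2, with the locale forced to C.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_c() {
    printf '%s\n' "$1" | LC_ALL=C grep -c -- "$2" || true
}

echo "a against [A-B]: $(count_c a '[A-B]')"
echo "A against [A-B]: $(count_c A '[A-B]')"
```

If the same test gives a different answer without LC_ALL=C, the locale
settings are the first thing to check.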

--
http://www.greenend.org.uk/rjk/

Wolfram Gloger

May 4, 2012, 6:56:04 AM
Kaz Kylheku <k...@kylheku.com> writes:

> Watch this:
>
> $ echo a | grep '[A-B]'
> a

I can't reproduce this on Debian. Neither in lenny nor in squeeze,
not even in etch. (C and de_DE.UTF-8 locales tested)

What is your 'Debian'?

> $ echo b | grep '[A-B]'
> $ echo b | grep '[:upper:]'
> $ echo B | grep '[:upper:]'
> $ echo E | grep '[:upper:]'
> $ echo e | grep '[:upper:]'
> e

I can reproduce these; all are expected, as explained in the followup.

Regards,
Wolfram.

Kaz Kylheku

May 4, 2012, 1:47:15 PM
On 2012-05-04, Wolfram Gloger <wm...@dent.med.uni-muenchen.de> wrote:
> Kaz Kylheku <k...@kylheku.com> writes:
>
>> Watch this:
>>
>> $ echo a | grep '[A-B]'
>> a
>
> I can't reproduce this on Debian. Neither in lenny nor in squeeze,
> not even in etch. (C and de_DE.UTF-8 locales tested)

Me neither; I was mistaken about that. Sorry!