
Bug#730472: grep: with -P, invalid UTF-8 byte sequence in input (regression)


Vincent Lefevre

Nov 25, 2013, 6:40:02 AM
Package: grep
Version: 2.15-1
Severity: important

The -P option no longer works: I get

"invalid UTF-8 byte sequence in input"

errors with it.

$ grep -r blah .
$ grep -r -P blah .
grep: invalid UTF-8 byte sequence in input

-- System Information:
Debian Release: jessie/sid
APT prefers unstable
APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.11-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=POSIX, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages grep depends on:
ii dpkg 1.17.1
ii install-info 5.2.0.dfsg.1-1
ii libc6 2.17-96
ii libpcre3 1:8.31-2

grep recommends no packages.

grep suggests no packages.

-- no debconf information



Santiago

Nov 25, 2013, 11:20:02 AM
Control: tags 730472 confirmed

On Mon, Nov 25, 2013 at 12:27:23PM +0100, Vincent Lefevre wrote:
> Package: grep
> Version: 2.15-1
> Severity: important
>
> The -P option no longer works: I get
>
> "invalid UTF-8 byte sequence in input"
>
> errors with it.
>
> $ grep -r blah .
> $ grep -r -P blah .
> grep: invalid UTF-8 byte sequence in input
>

Thanks for your report.

Indeed, grep -P in a UTF-8 locale does not cope with input that is not valid UTF-8.

This works:

$ printf '�' | LC_ALL=fr_FR.UTF-8 grep -P '�'


This reports an error:

$ echo '�' > /tmp/test
$ LC_ALL=fr_FR.UTF-8 grep -P -r '�' /tmp/
/tmp/test:�
grep: invalid UTF-8 byte sequence in input

But it works if I don't use a UTF-8 locale:

$ LC_ALL=C grep -P -r '�' /tmp/
/tmp/test:�
Binary file /tmp/tmp54ca5e73.tmp matches
...
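
To see which files in a tree are likely to trigger the error, one option is to test each file for valid UTF-8 before running grep -P on it. The following is only a minimal sketch, assuming iconv is installed. The function names is_valid_utf8 and list_invalid_utf8 are hypothetical helpers made up for illustration and are not part of grep. Binary files will normally be listed too, since they are rarely valid UTF-8.

```shell
# Hypothetical helper (not part of grep): succeeds only if the file
# decodes cleanly as UTF-8. Assumes iconv exits non-zero on invalid input.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Hypothetical helper: print every regular file under directory $1
# that is not valid UTF-8 (file names containing newlines are not handled).
list_invalid_utf8() {
    find "$1" -type f | while IFS= read -r f; do
        is_valid_utf8 "$f" || printf '%s\n' "$f"
    done
}
```

For example, `list_invalid_utf8 /tmp` would list the candidates, which could then be searched separately under LC_ALL=C as shown above.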

I'll work on it as soon as possible.

Best regards,

Santiago