Hello,
I've downloaded ~100 files encoded in Windows-1251.
https://en.wikipedia.org/wiki/Windows-1251
> Windows-1251 (a.k.a. code page CP1251) is a popular 8-bit character
> encoding, designed to cover languages that use the Cyrillic script.
(CP1251 matches ASCII for values less than 128.)
I am trying to use grep to find lines containing "non-ASCII" characters,
i.e. values 128-255.
According to the following discussion, I should be able to use PCRE (-P)
in GNU grep, as in
grep -P "[\x80-\xFF]" file
but this does not work for me :-(
https://stackoverflow.com/questions/3001177/how-do-i-grep-for-all-non-ascii-characters-in-unix
$ hexdump.exe -C test.txt
00000000  54 45 53 54 0a 4e 61 6d  65 3a 20 cf f3 f1 f2 ee  |TEST.Name: .....|
00000010  3b 0a                                             |;.|
00000012
$ grep -P "[\x80-\xFF]" test.txt
$ echo $?
1
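
To make this easy to reproduce, the test file can be recreated from the
bytes in the hexdump. This is only a minimal sketch: it assumes a
POSIX-style shell, and it uses octal escapes because \xHH escapes in
printf are not guaranteed everywhere.

```shell
# Recreate the 18-byte test file shown in the hexdump.
# \317\363\361\362\356 are the bytes cf f3 f1 f2 ee written in octal.
printf 'TEST\nName: \317\363\361\362\356;\n' > test.txt

# Print the file as hex bytes to compare against the hexdump output.
od -An -tx1 test.txt
```
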
There clearly are values >= 128 in the file. What am I doing wrong?
(I should note that I am using Cygwin, not a "real" env.)
Perhaps there are other tools, better suited for this task?
(awk might be useful, but I've never used it for serious work.)
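
One possible explanation, which is an assumption and has not been
checked on Cygwin: in a UTF-8 locale, grep -P may read \x80-\xFF as
code points rather than raw bytes, and CP1251 bytes are not valid UTF-8.
If that is the cause, forcing the C locale might help. A sketch, with
-a added so that grep does not report the file as binary:

```shell
# Recreate the test file (same bytes as in the hexdump, in octal).
printf 'TEST\nName: \317\363\361\362\356;\n' > test.txt

# Force the C locale so that the pattern and the input are handled as
# single bytes; -a treats the file as text, -n prints line numbers.
LC_ALL=C grep -a -n -P '[\x80-\xFF]' test.txt
```

With the test file above, this should report only the "Name:" line.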
Regards.