
Using a locale other than English in Windows gawk


chano

Nov 21, 2016, 12:52:30 AM
I'm trying to figure out how to use non-ASCII characters in gawk on Windows.
I googled with no luck so far. Some suggest using "set LC_ALL=UTF-8", but that
didn't work. I also tried "set LC_ALL=utf8", which didn't work either.

How do I use a locale other than English in Windows gawk?

Marc de Bourget

Nov 21, 2016, 1:27:04 AM

Kaz Kylheku

Nov 21, 2016, 10:53:59 AM
This must be referring to the MinGW-based Gawk port.

Of course Microsoft's WWII era C library isn't fucking
going to parse "LC_ALL=en_US.UTF-8" and behave accordingly.

Everything works for me in the Cygnal-based port:

C:\winawk>gawk -f celine.awk
7

|Céline |

C:\winawk>SET LC_ALL=en_US.UTF-8

C:\winawk>gawk -f celine.awk
6
é
|Céline |

Displays fine in the CMD.EXE console on Windows 7, from which I copied
and pasted the above.
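
For reference, the difference between the two runs (7 versus 6) is consistent with length() counting bytes in the default locale and characters in a UTF-8 locale. celine.awk itself was never posted, so what follows is only a sketch of that effect, not the original script. It assumes a POSIX shell with gawk on the PATH, uses LC_ALL=C as a stand-in for the non-UTF-8 case, and assumes the en_US.UTF-8 locale is installed; the variable names are made up for illustration.

```shell
# Sketch only: this is NOT the celine.awk from the post, which was not shown.
# Build "Céline" with an explicit two-byte UTF-8 encoding of "é" (octal 303 251).
name=$(printf 'C\303\251line')

# In the C locale gawk treats every byte as one character,
# so the two-byte "é" makes the string 7 units long.
bytes=$(LC_ALL=C gawk -v s="$name" 'BEGIN { print length(s) }')
echo "C locale: $bytes"

# In a UTF-8 locale (assumed to be installed) gawk counts characters,
# so the same string should come out one shorter.
chars=$(LC_ALL=en_US.UTF-8 gawk -v s="$name" 'BEGIN { print length(s) }')
echo "UTF-8 locale: $chars"
```

If the named UTF-8 locale is missing, gawk falls back to byte semantics and both numbers will be the same.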

Your problems are self-inflicted, caused by the inability to recognize
that MinGW is a half-broken pile of crap that needs to be discarded.

All these "X doesn't work on Windows Awk" reports are duplicates of the
same issue.

Manuel Collado

Nov 21, 2016, 12:48:35 PM
Short answer: Cygwin
Simple answer: don't use locales; use the native Windows charsets
(CP1252 and the like) instead. They support some non-ASCII characters.

Need more help? Please post short sample input data, awk code and
expected output.

--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado

Ed Morton

Nov 21, 2016, 1:20:58 PM
On 11/21/2016 11:50 AM, Manuel Collado wrote:
> On 21/11/2016 6:52, chano wrote:
>> I'm trying to figure out how to use non ascii characters in gawk in windows
>> I googled no luck so far. some suggests use "set LC_ALL=UTF-8" but didn't
>> work. I also tried "set LC_ALL=utf8" which didn't work either.
>>
>> How do I use different locale than English in windows gawk?
>
> Short answer: Cygwin

Do locales work in Cygwin now? It used to be a well-known issue...

Ed.

Manuel Collado

Nov 21, 2016, 3:48:43 PM
On 21/11/2016 19:20, Ed Morton wrote:
> On 11/21/2016 11:50 AM, Manuel Collado wrote:
>> On 21/11/2016 6:52, chano wrote:
>>> I'm trying to figure out how to use non ascii characters in gawk in
>>> windows
>>> I googled no luck so far. some suggests use "set LC_ALL=UTF-8" but
>>> didn't
>>> work. I also tried "set LC_ALL=utf8" which didn't work either.
>>>
>>> How do I use different locale than English in windows gawk?
>>
>> Short answer: Cygwin
>
> Do locales work in cygwin now? Used to be a well known issue...
>

Well, somewhat.

$ echo $LANG
es_ES.UTF-8
$ gawk --version > gawk.txt
$ LANG=es_ES gawk --version > gawk2.txt
$ file gawk*
gawk.txt: UTF-8 Unicode text
gawk2.txt: ISO-8859 text

>
>> Simple answer: don't use locales, but the native Windows charsets
>> (CP1252 and
>> the like). They support some non-ascii characters.
>>
>> Need more help? Please post short sample input data, awk code and
>> expected output.

Marc de Bourget

Nov 21, 2016, 4:57:09 PM
There is nothing self-inflicted. I haven't written the GAWK MinGW port.
I wasn't aware that your GAWK version was able to solve the UTF-8 issue.

Cygnal GAWK native Windows port:
http://www.kylheku.com/cygnal/winawk.tar.gz

How did you manage to get Unix locales working properly on Windows?
Anyway, I have done some tests and it looks really good. Great work so far!
We should test your version thoroughly, and if no one finds an issue we may
use your version as the best GAWK Windows port.

Kaz Kylheku

Nov 21, 2016, 8:18:31 PM
Very simply: by (hopefully) not breaking any locale stuff in Cygnal,
relative to upstream Cygwin.

The run-time support libraries from Cygwin have the locale support,
according to POSIX. If you patch some things in Cygwin to make it
more native-Windows-like here and there, and don't break the locale
stuff, then the locale stuff continues to work.

> Anyway, I have done some tests and it looks really good. Great work so far!

Great almost-no-work really; some of what you're testing might not
even be different between Cygwin and Cygnal.

> We should test your version thoroughly and if no one finds an issue we may
> use your version as best GAWK Windows port.

If you want to validate a Cygnal-based awk for yourself, the focus
should probably be on all the things that are different with regard
to that same awk executable running on Cygwin.

Mainly that would be in the area of path handling and also running
external processes.

Windows paths should work, and also the concept of a current working
directory per drive letter. If you're currently in C:\Users
but your D: drive is in D:\whatever, and you pass a path like
D:foo.txt to the Cygnal gawk, it should open D:\whatever\foo.txt,
and not D:\foo.txt.

Commands run via system("..."), the pipe syntax and
whatnot should be using the CMD.EXE command interpreter under
Cygnal. Under Cygwin, they look for a /bin/sh shell.
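
One way to check this yourself is to run a single awk program that uses both mechanisms. The following is only a sketch, assuming a POSIX shell to launch it and gawk on the PATH; the strings "from-pipe" and "from-system" are arbitrary. The command run inside awk is a plain echo, which both CMD.EXE and /bin/sh understand, so the awk program itself should not care which interpreter ends up running it.

```shell
# Sketch: exercise both ways gawk can run an external command.
out=$(gawk 'BEGIN {
    # Pipe syntax: read the first line of the command output into "line".
    cmd = "echo from-pipe"
    cmd | getline line
    close(cmd)
    print line

    # system(): run a command and capture its exit status.
    status = system("echo from-system")
    print "exit status: " status
}')
printf '%s\n' "$out"
```

To see which interpreter is actually used, replace the echo with a command that only one of the two shells knows.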

Cygnal isn't likely to break anything internal to Gawk.

Because Gawk doesn't use stdio streams, it doesn't benefit from
Cygnal having text mode streams in Windows mode (CR-LF) as default.
This is the downside. When you do printf("foo\n") in gawk, it
puts out a Unix newline.

One useful feature in Gawk is that the RT variable is set to the piece
of text which matches the RS record separator. So with that, if you
have a record separator regex that matches either LF or CR-LF, RT
holds the actual separator text which occurred. If you
explicitly use RT, you can write code that preserves the line
termination style.
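
A minimal sketch of the RT idea, assuming a POSIX shell, gawk (RT is a gawk extension) and od for inspecting the bytes. The sample input and the upper-casing are made up for illustration: one record ends in LF, the other in CR-LF, and each is written back with whatever terminator it arrived with.

```shell
# Sketch: RS is a regex matching LF or CR-LF; RT holds what actually matched.
# ORS is emptied so each record is written back with its own terminator.
result=$(printf 'unix\nwindows\r\n' |
    gawk 'BEGIN { RS = "\r?\n"; ORS = "" } { print toupper($0) RT }' |
    od -An -c | tr '\n' ' ' | tr -s ' ')
echo "$result"
```

In the od dump, the first record should still end in \n alone and the second in \r \n, which shows that the original line termination style survived the transformation.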

Marc de Bourget

Nov 22, 2016, 7:03:59 AM
One hint: The Cygnal GAWK version prints \n instead of \r\n with celine.awk
(the print command). I would have preferred CR-LF, but this is no big issue.