AWK with different decimal separator on different Linux distros?

cvh@LE

Jul 17, 2008, 5:41:09 AM
Hi all,
I noticed an oddity which took me some time to debug. It seems that
AWK on Fedora Core uses a different decimal separator (comma ,) than on
other distros (period .), which leads to enormous differences in
calculation results across distros.

On Debian-based systems:

$ cat /etc/*release*
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=8.04
DISTRIB_CODENAME=hardy
DISTRIB_DESCRIPTION="Ubuntu 8.04.1"
$ echo "12.345" | awk '{print $1+2}'
14.345
$ echo "12,345" | awk '{print $1+2}'
14
$


On Fedora Core:

$ cat /etc/*release*
Fedora Core release 6 (Zod)
Fedora Core release 6 (Zod)
$ echo "12.345" | awk '{print $1+2}'
14
$ echo "12,345" | awk '{print $1+2}'
14,345
$

BSD seems to behave like Debian-based systems:
# uname
OpenBSD
# echo "12.345" | awk '{print $1+2}'
14.345
# echo "12,345" | awk '{print $1+2}'
14
#

Is this intended? Can someone please verify this?
TIA,

CVH

Juergen Kahrs

Jul 17, 2008, 5:59:12 AM
cvh@LE wrote:

> I noticed an oddity which took me some time to debug. It seems that
> AWK on Fedora Core uses a different decimal separator (comma ,) than on
> other distros (period .), which leads to enormous differences in
> calculation results across distros.

Which locale did you use on each of these platforms?
Which AWK did you use on each of these platforms?

Try, for example:

locale
gawk --version

cvh@LE

Jul 17, 2008, 10:17:29 AM
On Jul 17, 11:59 am, Juergen Kahrs <Juergen.KahrsDELETET...@vr-web.de>
wrote:

It's the same locale on both servers: de_DE.UTF-8

Jürgen Kahrs

Jul 17, 2008, 1:36:12 PM
cvh@LE wrote:

>> Try for example
>>
>> locale
>> gawk --version
>
> It's the same locale on both servers: de_DE.UTF-8

OK, and is it the same AWK implementation?
Some AWK implementations ignore the locale.
This could explain the different results that you got.
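
A quick way to see which implementation a bare "awk" resolves to is to follow its symlinks. This is only a sketch: it assumes a readlink that supports -f (GNU coreutils and recent BSDs do), and the variable names are purely illustrative.

```shell
# Locate the awk found first on PATH (may be a symlink, e.g. via alternatives).
awk_path=$(command -v awk)
# Follow symlinks to the real binary; fall back to the unresolved path
# if this readlink has no -f option.
awk_real=$(readlink -f "$awk_path" 2>/dev/null || echo "$awk_path")
echo "awk resolves to: $awk_real"
```

If the two machines resolve to different binaries (say, mawk on one and gawk on the other), that alone may account for the different results.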

Hermann Peifer

Jul 17, 2008, 1:40:19 PM

IIRC, my Ubuntu 7.10 release came only with mawk, out of the box.

peifer@LAPTOP7664:~$ locale
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=

peifer@LAPTOP7664:~$ echo "12,345" | /usr/bin/mawk '{print $1+2}'
14

peifer@LAPTOP7664:~$ echo "12,345" | /usr/bin/gawk '{print $1+2}'
14,345

Hermann Peifer

Jul 17, 2008, 3:37:25 PM

While experimenting a bit, I ended up with the following results:

peifer@LAPTOP7664:~$ echo "12,345" | gawk-3.1.3/gawk '{print $1+2}'
14,345

peifer@LAPTOP7664:~$ echo "12,345" | gawk-3.1.4/gawk '{print $1+2}'
14,345

peifer@LAPTOP7664:~$ echo "12,345" | gawk-3.1.5/gawk '{print $1+2}'
14,345

peifer@LAPTOP7664:~$ echo "12,345" | gawk-3.1.6/gawk '{print $1+2}'
14

This looks like a bug in gawk 3.1.6. Can somebody confirm before I send it to bug-...@gnu.org?

Hermann

Hermann Peifer

Jul 18, 2008, 8:13:10 AM
Hermann Peifer wrote:
>
> While experimenting a bit, I ended up with the following results:
>
> peifer@LAPTOP7664:~$ echo "12,345" | gawk-3.1.3/gawk '{print $1+2}'
> 14,345
>
> peifer@LAPTOP7664:~$ echo "12,345" | gawk-3.1.4/gawk '{print $1+2}'
> 14,345
>
> peifer@LAPTOP7664:~$ echo "12,345" | gawk-3.1.5/gawk '{print $1+2}'
> 14,345
>
> peifer@LAPTOP7664:~$ echo "12,345" | gawk-3.1.6/gawk '{print $1+2}'
> 14
>
> This looks like a bug in gawk 3.1.6. Can somebody confirm before I send
> it to bug-...@gnu.org?
>

This is a feature, not a bug. Arnold pointed me to the NEWS file for 3.1.6:

2. Too many people the world over have complained about gawk's use of the
locale's decimal point for parsing input data instead of the traditional
period. So, even though gawk was being nicely standards-compliant, in
a Triumph For The Users, gawk now only uses the locale's decimal point
if --posix is supplied or if POSIXLY_CORRECT is set. It is the sincere
hope that this change will eliminate this FAQ from being asked.

20. A new option, --use-lc-numeric, forces use of the locale's decimal
point without the rest of the draconian restrictions imposed by
--posix. This softens somewhat the stance taken in item #2.
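
For scripts that must give the same result on every machine, one common workaround (not taken from the NEWS file) is to pin the C locale for the awk call and, where the data itself uses a comma, convert it explicitly. A minimal sketch, which should behave the same under gawk, mawk and BSD awk; the variable names are made up for illustration:

```shell
# Period-decimal input: pin the C locale so "." is always the decimal point.
with_period=$(echo "12.345" | LC_ALL=C awk '{ print $1 + 2 }')

# Comma-decimal input: rewrite the comma to a period before doing arithmetic.
with_comma=$(echo "12,345" | LC_ALL=C awk '{ sub(/,/, ".", $1); print $1 + 2 }')

echo "$with_period $with_comma"
```

Both calls yield 14.345 here. If the locale's decimal point is actually wanted, the --use-lc-numeric option from item 20 above is the gawk-specific alternative.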
