On 8/2/2018 7:33 AM, Geoff Clare wrote:
<snip>
> Therefore I think Ed has a point - the gawk manual and behaviour
> do not match POSIX. However, this seems like a defect in POSIX
> to me, as other awk implementations behave the same way.
>
Agreed. Here's another related example of what POSIX says vs. what actual awks
do, and of a contradiction within the POSIX standard itself.
POSIX says under "Variables and Special Variables":
---
Uninitialized variables, including scalar variables, array elements, and field
variables, shall have an uninitialized value. An uninitialized value shall have
both a numeric value of zero and a string value of the empty string.
---
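So according to POSIX an uninitialized variable and an uninitialized field are
treated identically. OK, got it. But then look at how gawk (and other awks such
as BSD/OSX awk) actually behave. typeof() is a gawk extension, so that part of
the output is gawk-only; the comparison results are the point: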
Uninitialized variable:
$ awk 'BEGIN{print typeof(x), x, (x=="" ? "==" : "!="), typeof(""), ""}'
untyped == string
$ awk 'BEGIN{print typeof(x), x, (x==0 ? "==" : "!="), typeof(0), 0}'
untyped == number 0
Uninitialized field:
$ echo 'a b ' | awk '{print typeof($3), $3, ($3=="" ? "==" : "!="), typeof(""), ""}'
unassigned == string
$ echo 'a b ' | awk '{print typeof($3), $3, ($3==0 ? "==" : "!="), typeof(0), 0}'
unassigned != number 0
Note the difference between the two in the comparison against the number 0: the
uninitialized variable compares equal to 0, the uninitialized field does not.
Now, how awk actually behaves makes sense wrt this other part of the standard
under "Expressions in awk":
---
Syntax | Name | Type of Result | Associativity
$expr | Field reference | String | N/A
...
A string value shall be considered a numeric string if it comes from one of the
following:
Field variables
and an implementation-dependent condition corresponding to either case (a) or
(b) below is met.
---
where case (a) is a call to strtod() and case (b) is checking whether the field
value parses as a NUMERIC token.
So according to this part of the POSIX spec the type of a field, $expr, is
String and it can only become Numeric String if it satisfies the criteria in the
rest of that section (strtod() result or is a NUMERIC token). Therefore, since
"" is not a number by any of the stated criteria, an uninitialized field should
be a String with value "" and then the awk behavior above makes perfect sense.
If only that didn't contradict the required behavior given the quote at the
start of this post from the other part of the POSIX standard.
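The numeric-string rule quoted above is easier to see with fields that actually
hold text. A minimal sketch, where the input line '0 x' and the function name
numeric_string_demo are made up for illustration:

```shell
# Compare two fields against the number 0.
# $1 is "0", which looks numeric, so it is compared with 0 as a number.
# $2 is "x", which does not, so it is compared with "0" as a string.
numeric_string_demo() {
    echo '0 x' | awk '{
        print "$1 == 0:", ($1 == 0 ? "yes" : "no")
        print "$2 == 0:", ($2 == 0 ? "yes" : "no")
    }'
}
numeric_string_demo
```

This should print "yes" for $1 and "no" for $2. An empty field fails the
"looks numeric" test the same way "x" does, which is why it ends up in the
string comparison.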
So now what? Can/should we get the POSIX spec changed to match actual awk
behavior? Maybe there are some awks out there that actually behave as POSIX says
they should in that first quote; I don't know. Do we need gawk to behave
differently when --posix is enabled?
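For anyone who wants to check other awks, here is a rough probe that avoids
typeof() so it should run on any awk. The function name probe and its optional
argument (the awk binary to test) are hypothetical, just for illustration:

```shell
# Probe how an awk compares an uninitialized variable and a nonexistent
# field against 0 and "". No gawk extensions are used.
probe() {
    # $1: awk binary to test (hypothetical parameter; defaults to "awk" on PATH)
    awk_bin=${1:-awk}

    # x is never assigned, so it holds the uninitialized value.
    "$awk_bin" 'BEGIN {
        print "variable == 0:", (x == 0 ? "yes" : "no")
        print "variable == \"\":", (x == "" ? "yes" : "no")
    }'

    # The input has only two fields, so $3 is a field that was never set.
    echo 'a b' | "$awk_bin" '{
        print "field == 0:", ($3 == 0 ? "yes" : "no")
        print "field == \"\":", ($3 == "" ? "yes" : "no")
    }'
}
probe
```

Going by the gawk results above, the "field == 0" line should say "no" there,
whereas an awk that follows the first POSIX quote to the letter would say "yes".
Calling it as `probe /path/to/other/awk` points it at a different
implementation.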
Regards,
Ed.