When is ARGC incremented in ARGV[ARGC++]="foo"?

Ed Morton

Jun 10, 2020, 1:11:52 PM

When adding a file name "foo" to the end of ARGV I used to use:

BEGIN {
    ARGV[ARGC++]="foo"
}

and then a long time ago (e.g. maybe 15-20 years ago!) someone persuaded
me (with standards invocations IIRC) that some awks might evaluate
ARGC++ before doing the assignment so the above could be interpreted as
if I had written:

BEGIN {
    ARGC++
    ARGV[ARGC]="foo"
}

so I started writing this explicitly instead:

BEGIN {
    ARGV[ARGC]="foo"
    ARGC++
}

Now I'm wondering whether I'm misremembering, or whether that previous
advice was wrong or out of date, and whether in my original code block
above the ARGC increment in a POSIX awk today is actually guaranteed to
always happen after the assignment of "foo". Does anyone know **for sure**?
An applicable standards quote would be appreciated, if possible, for my
own peace of mind.

Ed.

Janis Papanagnou

Jun 10, 2020, 2:29:07 PM

On 10.06.2020 19:11, Ed Morton wrote:
> When adding a file name "foo" to the end of ARGV I used to use:
>
> BEGIN {
> ARGV[ARGC++]="foo"
> }
>
> and then a long time ago (e.g. maybe 15-20 years ago!) someone persuaded me
> (with standards invocations IIRC) that some awks might evaluate ARGC++ before
> doing the assignment so the above could be interpreted as if I had written:

I'm not quite sure what you mean above by "standards invocations" in the
context of "some awks". (Please clarify, if necessary, but see below.)

> BEGIN {
> ARGC++
> ARGV[ARGC]="foo"
>
> }
>
> so I started writing this explicitly instead:
>
> BEGIN {
> ARGV[ARGC]="foo"
> ARGC++
> }
>
> Now I'm wondering if I'm mis-remembering or if that previous advice was wrong
> or out of date and in my original code block above the ARGC increment in a
> POSIX awk today actually is guaranteed to always happen after the assignment
> to "foo". Anyone know **for sure**? An applicable standards quote would be
> appreciated if possible for my own peace of mind.

I would suggest consulting the Awk standard and, in addition, the documentation
of the original awk (or rather the "new" version from 1985). You have access to
the standard. And the latter (from W., K., and A.'s book) already documented
the well-known (and expected) behaviour:

"The prefix form ++n increments n before delivering its value;
the postfix form n++ increments n after delivering its value."

and I would be surprised if any awk would deviate from that behaviour. (But
if so then I'd ignore such an IMO broken awk.)
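
The quoted rule is easy to check against whatever awk is installed. The
following is only a quick sketch, not a standards proof; the variable names
(postfix, prefix, n, v) are made up for illustration.

```sh
# Print the value delivered by each form, followed by n afterwards.
postfix=$(awk 'BEGIN { n = 5; v = n++; print v, n }')
prefix=$(awk 'BEGIN { n = 5; v = ++n; print v, n }')

echo "postfix: $postfix"   # prints "postfix: 5 6"
echo "prefix:  $prefix"    # prints "prefix:  6 6"
```

In both cases n ends up as 6; only the value handed to the surrounding
expression differs.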

HTH.

Janis

>
> Ed.

Kaz Kylheku

Jun 10, 2020, 2:38:32 PM

On 2020-06-10, Ed Morton <morto...@gmail.com> wrote:
> When adding a file name "foo" to the end of ARGV I used to use:
>
> BEGIN {
> ARGV[ARGC++]="foo"
> }
>
> and then a long time ago (e.g. maybe 15-20 years ago!) someone persuaded
> me (with standards invocations IIRC) that some awks might evaluate
> ARGC++ before doing the assignment so the above could be interpreted as
> if I had written:
>
> BEGIN {
> ARGC++
> ARGV[ARGC]="foo"
>
> }

If so, that is broken. ARGC++ returns the previous value, the end.

According to POSIX, Awk expressions fall under a section called "1.1.2
Concepts Derived from the ISO C Standard" which basically says that all
utilities that implement C expressions use the ISO C semantics for that.

This means that all the idiocy from C, like i = i++ being undefined,
is inherited by POSIX scripting languages.

In C, the expression ARGV[ARGC++] = "foo" is well-defined (if all
input operands are valid in every relevant way, of course).

Where you run into a problem is if you do things like this:

# duplicate last arg

ARGV[ARGC++] = ARGV[ARGC]

We don't know whether the access to ARGC on the right samples the new
value or the old value (or even some in-between partially updated
garbage value).
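
One way to sidestep the question is to read ARGC in one statement and modify
it in another. This is a minimal sketch assuming the intent is to append a
copy of the final operand, which lives at ARGV[ARGC - 1]; the operands a and
b and the variable names are hypothetical.

```sh
# Only a BEGIN block is present, so awk never tries to open "a" or "b".
dup=$(awk 'BEGIN {
    last = ARGV[ARGC - 1]   # read the final operand while ARGC is untouched
    ARGV[ARGC] = last       # store the copy in the next free slot
    ARGC++                  # bump the count in a statement of its own
    for (i = 1; i < ARGC; i++)
        printf "%s%s", ARGV[i], (i < ARGC - 1 ? " " : "\n")
}' a b)

echo "$dup"   # prints "a b b"
```

Each statement touches ARGC at most once, so the result does not depend on
when an increment takes effect inside a single expression.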

> increment in a POSIX awk today actually is guaranteed to always happen
> after the assignment to "foo". Anyone know **for sure**? An applicable
> standards quote would be appreciated if possible for my own peace of mind.

We don't know when the update of ARGC happens, but the overall
assignment expression has no dependency on that; it evaluates ARGC++,
which is required to yield the prior value, and doesn't do anything
else with ARGC to interfere with that.
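
As a sanity check of the safe form, the sketch below appends one file name
and lets awk read it. The scratch file from mktemp and the variable names are
assumptions made for the example, not anything from the original question.

```sh
# Hypothetical scratch file that stands in for "foo".
tmp=$(mktemp)
printf 'line from appended file\n' > "$tmp"

# No file operands are given, so ARGC starts at 1.
# ARGC++ yields 1 as the subscript and leaves ARGC at 2.
appended=$(awk -v f="$tmp" '
    BEGIN { ARGV[ARGC++] = f }
    { print ARGC ": " $0 }
' </dev/null)

echo "$appended"   # prints "2: line from appended file"
rm -f "$tmp"
```

awk reads the scratch file rather than standard input, which shows that the
name landed in ARGV[1] and that ARGC covers it.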

The assignment into the ARGV[] array itself doesn't do anything with
ARGC. So that is to say, if ARGC is 1 and we do

ARGV[99] = "foo"

then ARGC does not automagically jump to 100.
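
This one is quick to confirm; a small sketch (the variable name argc_after is
just for illustration):

```sh
# With no operands ARGC is 1; storing into ARGV[99] leaves it alone.
argc_after=$(awk 'BEGIN { ARGV[99] = "foo"; print ARGC }')

echo "$argc_after"   # prints "1"
```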

Assignments to nonexistent fields change NF, so with fields we can
create trouble of the following sort, and its ilk:

# Suppose NF is initially 1: there is one field.

$(++NF + 1) = "foo"

What fields now exist, with what values, and what is NF?

++NF yields 2 and so $(1 + 2), namely $3, ends up with "foo".

But we have two competing modifications of NF in the same expression:
The assignment to $(3) wants to implicitly replace NF with 3.
The ++NF increment wants to replace the old value 1 with 2.
This cannot be well-defined.
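
If the intended result is "bump NF, then assign the field after it", splitting
the work into two statements removes the competition. This is a sketch under
that assumption, with a one-field input record "a"; the variable name fields
is made up.

```sh
fields=$(echo a | awk '{
    NF++                 # statement 1: NF goes from 1 to 2
    $(NF + 1) = "foo"    # statement 2: assigns $3, which sets NF to 3
    print NF ": " $0
}')

echo "$fields"
```

With a POSIX-conforming awk this should report NF as 3, with an empty second
field between "a" and "foo".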

Ed Morton

Jun 10, 2020, 3:00:10 PM

OK, I'm convinced that `ARGV[ARGC++]="foo"` is safe and I think the
unsafe case Kaz mentioned of `ARGV[ARGC++]=ARGV[ARGC]` is what I was
actually thinking of.

Thanks for the responses.

Ed.

Manuel Collado

Jun 11, 2020, 4:32:01 AM

El 10/06/2020 a las 20:38, Kaz Kylheku escribió:
> [...]

> Assignments to nonexistent fields change NF, so with fields we can
> create trouble of the following sort, and its ilk:
>
> # Suppose NF is initially 1: there is one field.
>
> $(++NF + 1) = "foo"
>
> What fields now exist, with what values, and what is NF?
>
> ++NF yields 2 and so $(1 + 2), namely $3, ends up with "foo".
>
> But we have two competing modifications of NF in the same expression:
> The assignment to $(3) wants to implicitly replace NF with 3.
> The ++NF increment wants to replace the old value 1 with 2.
> This cannot be well-defined.

Why not? In imperative programming a basic rule is that function
arguments are evaluated before calling the function. And operators are
in fact functions, with a more eye-catching syntax. Rewriting the given
sentence in a functional style we have:

assign(field(add(incr(NF),1)),"foo")

And the strict evaluation order is

++ + $ =

So the assignment is certainly evaluated after the ++NF increment.

Am I missing something?
--
Manuel Collado - http://mcollado.z15.es

Kaz Kylheku

Jun 11, 2020, 11:53:32 AM

On 2020-06-11, Manuel Collado <m-co...@users.sourceforge.net> wrote:
> El 10/06/2020 a las 20:38, Kaz Kylheku escribió:
>> [...]
>> Assignments to nonexistent fields change NF, so with fields we can
>> create trouble of the following sort, and its ilk:
>>
>> # Suppose NF is initially 1: there is one field.
>>
>> $(++NF + 1) = "foo"
>>
>> What fields now exist, with what values, and what is NF?
>>
>> ++NF yields 2 and so $(1 + 2), namely $3, ends up with "foo".
>>
>> But we have two competing modifications of NF in the same expression:
>> The assignment to $(3) wants to implicitly replace NF with 3.
>> The ++NF increment wants to replace the old value 1 with 2.
>> This cannot be well-defined.
>
> Why not? In imperative programming a basic rule is that function
> arguments are evaluated before calling the function.

I don't see a function there, just special syntax. We don't know when
the ++NF side effect completes in relation to the $(...) assignment
updating NF.

> And operators are
> in fact functions, with a more eye-catching syntax.

We do not have this sort of assurance in Awk and C.

C certainly doesn't say anything such as that the + operator is actually
a function with syntactic sugar; if that were the case, a sequence point
would have to occur before the call to the function.

In C++, a use of + is a function if it resolves to one via overloading,
otherwise it isn't.

Not even Lisp gives that assurance: it has special operators, but they
have no eye-catching syntax at all, so you can't always tell them apart
from functions by that alone.

> Rewriting the given
> sentence in a functional style we have:
>
> assign(field(add(incr(NF),1)),"foo")

I hope you don't mean to insinuate that incr(NF) is a function just
because you've made it resemble one using f(x) notation. It has to
operate on the identifier NF, rather than its value!

But, yes, if we think of this as C, *if* assign is an actual function
and not a macro expanding into code, then there is a sequence point
before the function is invoked, whereupon the ++NF side effect is
settled. Thus the function sees the new value of NF reliably and updates
that.

> And the strict evaluation order is

> ++ + $ =

The + operation certainly cannot proceed until it has both its operands,
one of which is the value from the ++ operator. So the ++ operation has
to be carried to sufficient completion to yield that value. However,
the side effect of the ++ can occur anywhere between the previous and
next sequence point.

If add is really a function, then the same reasoning as above applies; a
sequence point occurs after the evaluation of the arguments just before
the function call. + doesn't correspond to an add function though. It
isn't specified that way in C, and the Awk documentation in POSIX defers
to ISO C in these matters.

In Lisp terminology, the C + is a "special operator". (Or, rather,
a special token in the read syntax which denotes an internal special
operator). In a "Lispified" C, we wouldn't see a visual difference.
Whether a function or special operator, the invocation of + would look
like (+ a b). Our best bet would be to check the reference manual.
There would be other clues, like there not being a function binding
for +, such that attempts to indirect upon + like (apply + 1 2) would
fail. Sneakily, apply could be an operator which makes it work, but
unlikely for (let ((fun +)) (apply fun 1 2)).

--
TXR Programming Language: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1