{print $0 ++s}

kk

Sep 28, 2017, 5:34:21 PM

Hi, can someone explain the behavior of {print $0 ++s} which seems to
update $0? I was expecting it to behave as {print $0 (++s)}, but this
seems not to be the case.
Thanks in advance

Janis Papanagnou

Sep 28, 2017, 5:53:15 PM

On 28.09.2017 23:34, kk wrote:
> Hi, can someone explain the behavior of {print $0 ++s} which seems to update
> $0? I was expecting it to behave as {print $0 (++s)}, but this seems not to be
> the case.

You have three tokens: $0, ++, and s. When the expression is parsed from
left to right, the parser associates the increment operator with $0. Be
aware that omitting the space between ++ and s does not change the
operator precedence or associativity. Use parentheses (as you've done
above) to force a different grouping.
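
A minimal sketch of the difference, assuming an awk that behaves like the GNU awk discussed in this thread. The one-line input "5" and the shell variable names are made up for illustration. The second print in each program shows whether $0 itself was modified.

```shell
# Hypothetical input record "5"; variable names are illustrative only.

# Default parse: ++ attaches to $0 as a post-increment, then the
# (empty) s is concatenated. $0 is changed as a side effect.
default_parse=$(echo 5 | awk '{ print $0 ++s; print $0 }')

# Parenthesized: $0 is left alone and the pre-incremented s is appended.
grouped_parse=$(echo 5 | awk '{ print $0 (++s); print $0 }')

printf 'default:\n%s\ngrouped:\n%s\n' "$default_parse" "$grouped_parse"
```

With the default parse, the first line printed is the old value of $0 and the second line shows it incremented; with the parentheses, s is incremented and $0 stays as read.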

Janis

> Thanks in advance

Janis Papanagnou

Sep 28, 2017, 5:56:17 PM

Since you do string concatenation in print's argument you can alternatively
also write {print $0 "" ++s} to achieve what you want.
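
As a rough check, assuming the same kind of awk as in the thread and a made-up input record "5": the empty-string form and the parenthesized form should produce the same output, because a string constant is not an lvalue and so the ++ can only start a new expression.

```shell
# Hypothetical input record "5"; variable names are illustrative only.

# An empty string constant between $0 and ++ forces ++ to bind to s.
with_empty=$(echo 5 | awk '{ print $0 "" ++s }')

# The explicitly parenthesized form, for comparison.
with_parens=$(echo 5 | awk '{ print $0 (++s) }')

printf '%s\n%s\n' "$with_empty" "$with_parens"
```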

kk

Sep 28, 2017, 6:23:15 PM

Thanks Janis. I had a look at
http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html which
does state "Because the concatenation operation is represented by
adjacent expressions rather than an explicit operator, it is often
necessary to use parentheses to enforce the proper evaluation
precedence." However it also states that pre-increment is of higher
precedence than post-increment, which seems at odds with the left to
right parsing of a++b that results in ++ being treated as
post-increment, at least when a is an lvalue.

Janis Papanagnou

Sep 28, 2017, 6:33:25 PM

Ah, now I see where you're coming from. I hadn't inspected the POSIX specs.
The classical definition did not distinguish precedence levels of pre- and
post-increment operators. So POSIX went beyond that. With POSIX in mind,
one would expect GNU awk to behave POSIX-like when the corresponding
option -P is given, but (as I just tried) it doesn't. (A bug?)

All that said, I can only suggest staying with the conservative definition
and using parentheses or the empty string to be on the safe side and
maintain compatibility where necessary.

Janis

Janis Papanagnou

Sep 28, 2017, 6:50:55 PM

Wait. One additional note. We also have to consider that there's the invisible
string concatenation operator in your expression $0 ++ s , so we cannot just
assume that there's no concatenation when determining the operator precedence.
We could see it either as $0 <cat> ++ s or as $0 ++ <cat> s .
When the parser reads the ++ it doesn't assume that there's a <cat> to be
assumed before it. Maybe it's no bug but yet another bad effect that stems
from the "invisible" concatenation operator, and how the rules in the parser
are defined when to assume an existing concatenation in such expressions.

Janis

Kaz Kylheku

Sep 28, 2017, 10:03:18 PM

Given the syntax a ++ b, it's not a simple ambiguity between two
operators.

There is an ambiguity about where we perceive the invisible catenation
operator to be, and based on that interpretation, ++ changes between two
different categories: prefix and postfix.

Interpretation 1: a <cat> ++ b --> ++ b prefix increment.

Interpretation 2: a ++ <cat> b --> a ++ postfix increment.

Where <cat> is an invisible operator not represented by a token,
but emerges via grammar rules which deal with the juxtaposition
of expressions directly.
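
The two interpretations can be made explicit with parentheses and compared against the bare form. This is only a sketch: the variables a, b, out and their starting values are invented for illustration, and the result for the bare form is what an awk like the GNU awk discussed here is expected to give.

```shell
# Hypothetical values a = 1, b = 5; each program prints: out a b

# Interpretation 1, forced: a <cat> (++b), so b is incremented.
interp1=$(awk 'BEGIN { a = 1; b = 5; out = a (++b); print out, a, b }')

# Interpretation 2, forced: (a++) <cat> b, so a is incremented.
interp2=$(awk 'BEGIN { a = 1; b = 5; out = (a++) b; print out, a, b }')

# No parentheses: the parser picks one of the two for us.
bare=$(awk 'BEGIN { a = 1; b = 5; out = a ++b; print out, a, b }')

printf 'interp1: %s\ninterp2: %s\nbare:    %s\n' "$interp1" "$interp2" "$bare"
```

The bare form matching interpretation 2 is the behavior the original question ran into.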

This is not something where the disambiguating resolution can be
understood in terms of the simplifying concept of operator precedence,
since between the two interpretations, the ++ token jumps onto different
operands and becomes a different operator, and since the fictional <cat>
operator cannot be assigned a precedence at all.

The simplest treatment in a LALR(1) parser is that whenever the
parser has scanned an expression (it has a complete expression in
its stack that could be reduced) and the next token is ++, it should
just "shift" that ++ and keep scanning, rather than "reduce" and
try to deal with catenation.

The latter arrangement could be made, but I think it would complicate
the grammar quite severely. Basically, the postfix operators like ++
would have to be moved to a different expression "level" above
catenation or something like that.

I think Janis' left-to-right parsing remark reflects this intuition
that the natural parse is to shift these postfix operators (and any
other token), so that catenation is recognized only when some token
appears that cannot continue the current expression and *can* begin a
new expression.
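
An analogous case, not raised in this thread but following the same rule, is the minus sign: it can continue the current expression as a binary operator, so no catenation is recognized unless something forces a new expression to start. A small sketch (variable names are illustrative):

```shell
# "-" can continue the expression "1", so this is the subtraction 1 - 1.
minus_binary=$(awk 'BEGIN { print 1 -1 }')

# "(" cannot continue "1" but can begin a new expression, so the
# parenthesized -1 is catenated onto 1.
minus_cat=$(awk 'BEGIN { print 1 (-1) }')

printf '%s\n%s\n' "$minus_binary" "$minus_cat"
```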

--
TXR Programming Language: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1

Kaz Kylheku

Sep 28, 2017, 10:09:58 PM

Giving pre-increment and post-increment different precedence levels is
firstly pointless, because an expression like ++a++ is useless.

Secondly, it appears backwards. All unary operators should have a lower
precedence than all postfix operators.

Experiments with GNU Awk show that expressions like -x++ and ++a[42]
follow this principle: they are treated as -(x++) and ++(a[42]).
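
A sketch of such an experiment, with made-up starting values. If the expressions were grouped the other way, as (-x)++ or (++a)[42], they would not be valid at all, so getting a result is itself evidence of the grouping.

```shell
# Hypothetical starting values; each program prints: result, then the variable.

# -x++ is -(x++): the result is the negated old value, and x is incremented.
neg_post=$(awk 'BEGIN { x = 5; r = -x++; print r, x }')

# ++a[42] is ++(a[42]): the array element is incremented, then returned.
pre_index=$(awk 'BEGIN { a[42] = 1; r = ++a[42]; print r, a[42] }')

printf '%s\n%s\n' "$neg_post" "$pre_index"
```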

So for --a++, nonsense though it may be, to be treated as (--a)++
would go against this general pattern of unary lower than postfix.
FWIW, gawk reports this as "syntax error". If the combination is a
syntax error, then the relative precedence of postfix and unary ++ is
rather moot; it only informs us about the detail of the error: whether
the ++ or -- is being semantically mis-applied.

The a ++ b situation cannot be resolved by the relative precedence of
postfix and unary ++, because they do not both occur simultaneously;
we have to resolve the ambiguity first, and only then do we know whether
the ++ is unary or postfix.

Kaz Kylheku

Sep 28, 2017, 10:22:13 PM

On 2017-09-28, Janis Papanagnou <janis_pa...@hotmail.com> wrote:

A LALR(1) type parser like what is used in Awks (One True Awk and GNU
Awk have Yacc files) will simply not have the <cat> token as an explicit
concept at all. So indeed "when the parser reads the ++ it doesn't
assume that there's a <cat> to be assumed before it". Not in the
straightforward way of setting up the grammar.

Now we might be able to effectively obtain that behavior, but it would
complicate the grammar substantially. Basically we have to remove the
postfix ++ operator from certain kinds of expressions. So then when the
parser has seen the $0, and the next token is ++, it knows that the ++
cannot belong to the kind of expression it has just scanned --- and at
the same time, the ++ can legally start another expression of that same
type which is allowed to clump sequentially. Then in this grammar, we
re-introduce postfix operators on some higher expression level.

I foresee numerous difficulties in this, given that there is only one
token of lookahead.

We might have to resort to a trick whereby a ++ which is not followed
by anything (end of file, or a closing parenthesis or statement
semicolon or whatever) is treated as a special syntactic category,
say "dangling_postfix". And then we have a rule that an expression can
be followed by a "dangling_postfix", which generates a postfix
expression. Thus:

a ++ b ++ c ++

gets parsed as

a ++ b ++ c ++
-  catexpr

catexpr ++ b ++ c ++
        ----  unary_expr

catexpr unary_expr ++ c ++
------------------  catexpr

catexpr ++ c ++
------------  catexpr (by unary_expr again)

catexpr ++
        --  dangling_postfix

catexpr dangling_postfix
------------------------  postfix_expr

postfix_expr

Something like that. It's quite a bit of complication compared to
the straightforward "++ just continues the expression we have now".

Geoff Clare

Sep 29, 2017, 8:41:05 AM

Janis Papanagnou wrote:

> On 29.09.2017 00:23, kk wrote:
>> Thanks Janis. I had a look at
>> http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html which does
>> state "Because the concatenation operation is represented by adjacent
>> expressions rather than an explicit operator, it is often necessary to use
>> parentheses to enforce the proper evaluation precedence." However it also
>> states that pre-increment is of higher precedence than post-increment, which
>> seems at odds with the left to right parsing of a++b that results in ++ being
>> treated as post-increment, at least when a is an lvalue.
>
> Ah, now I see where you're coming from. I hadn't inspected the POSIX specs.
> The classical definition did not distinguish precedence levels of pre- and
> post-increment operators. So POSIX went beyond that.

No, it didn't, at least not in SUSv3/POSIX.1-2001 which is what the URL
above relates to. In the actual (PDF) standard, pre- and post-increment
have equal precedence, but the HTML translation does not format the table
correctly (putting horizontal lines between all rows instead of only where
the precedence changes).

However, in SUSv4/POSIX.1-2008 (follow the link in "A newer edition of
this document exists _here_" at the top of the page), the table now shows
groupings *but* it has a bogus extra line between the pre- and post-
operators that I'm sure shouldn't be there! I will need to track down
where that came from so that I can report it as a defect.

--
Geoff Clare <net...@gclare.org.uk>

Geoff Clare

Sep 29, 2017, 9:41:04 AM

Okay, I found where it came from, and I now see that something I missed
before is that the order was switched so that the post- operators have
higher precedence than the pre- operators. Thus the point of the change
was to specify that a++b is handled as (a++)b whereas previously it was
ambiguous.

--
Geoff Clare <net...@gclare.org.uk>