Difficulty to use sensible line breaks in expressions

Janis Papanagnou

Oct 12, 2022, 10:14:02 AM
About the difficulty to use sensible line breaks in expressions,
without adding syntactically spurious escape characters.
(Note 1: The need for line breaks arises with longer expressions.)
(Note 2: Yes, we can use/add line-continuation/escape characters.)

 1
 2  function f (a,b) { }
 3
 4  {
 5      # okay
 6      if (f(a,b) < c + d) print a, b, c, d
 7
 8      # okay
 9      if (f(a,b) < c + d) print a, b,
10          c, d
11
12      # okay
13      if (f(a,
14          b) < c + d) print a, b, c, d
15
16      # error
17      if (f(a, b) <
18          c + d) print a, b, c, d
19
20      # error
21      if (f(a,b) < c +
22          d) print a, b, c, d
23
24      # error
25      if (f(a,b) < c + d
26          ) print a, b, c, d
27
28      # okay
29      if (f(a,b) < c &&
30          d) print a, b, c, d
31
32      # okay
33      if (f(a,b) < (c &&
34          d)) print a, b, c, d
35
36      # error
37      if (f(a,b) < (c +
38          d)) print a, b, c, d
39  }

awk: awk-breaks:18: if (f(a, b) <
awk: awk-breaks:18: ^ unexpected newline or end of string
awk: awk-breaks:18: c + d) print a, b, c, d
awk: awk-breaks:18: ^ syntax error
awk: awk-breaks:22: if (f(a,b) < c +
awk: awk-breaks:22: ^ unexpected newline or end of string
awk: awk-breaks:26: if (f(a,b) < c + d
awk: awk-breaks:26: ^ unexpected newline or end of string
awk: awk-breaks:38: if (f(a,b) < (c +
awk: awk-breaks:38: ^ unexpected newline or end of string
awk: awk-breaks:38: d)) print a, b, c, d
awk: awk-breaks:38: ^ syntax error
awk: awk-breaks:38: d)) print a, b, c, d
awk: awk-breaks:38: ^ syntax error
awk: awk-breaks:39: d)) print a, b, c, d
awk: awk-breaks:39: ^ unexpected newline or end of string


Is throwing (some/any of) these syntax errors mandated by POSIX? If
not, I suppose Awk variants could decide to implement semantically
sensible [valid] interpretations and remove the existing inconsistencies?

Janis

Kaz Kylheku

Oct 12, 2022, 12:56:21 PM
Newlines are significant in Awk, and appear as a token (the NEWLINE
token in the POSIX grammar).

Not all parts of the grammar accept newline tokens, so a newline in
those places causes a syntax error.

I think allowing such line breaks would require that, for instance, the
phrase structure for E + E admit zero or more newline tokens on either
side of the +, which are then ignored.

Or else, we have the parser communicate with the lexer, so that the
lexer makes newlines disappear and reappear in a syntax-directed way.

I suspect that this wouldn't be upstreamed into gawk.

I have a fork of gawk called egawk (enhanced gnu awk) where this
approach could be tried.

At certain points in the parser, we call
some function in the lexer which says "eat newlines; do not feed me
NEWLINE tokens", and at other points we re-enable newlines.

The lexer could do it itself; for instance if a '(' token is processed,
it may be okay to enable newline-eating until the matching ')',
which just requires a counter. So then line breaks would be allowed
in anything parenthesized, without disturbing their syntactic role
as alternative semicolon terminators.
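
The counter idea can be sketched as a toy tokenizer. This is only an illustration under stated assumptions, not gawk's lexer: the token kinds, `struct lexer`, `next_token` and `count_newline_tokens` are all hypothetical names, and every character other than blanks, newlines and parentheses is treated as a one-character token.

```c
/* Hypothetical token kinds for a toy lexer; not gawk's real token set. */
enum token { TOK_EOF, TOK_NEWLINE, TOK_LPAREN, TOK_RPAREN, TOK_OTHER };

struct lexer {
    const char *src;    /* remaining input */
    int paren_depth;    /* number of currently open '(' */
};

/* Return the next token; a '\n' inside parentheses is swallowed. */
static enum token next_token(struct lexer *lx)
{
    for (;;) {
        char c = *lx->src;

        if (c == '\0')
            return TOK_EOF;
        lx->src++;

        switch (c) {
        case ' ':
        case '\t':
            continue;               /* skip blanks */
        case '\n':
            if (lx->paren_depth > 0)
                continue;           /* inside (...): eat the newline */
            return TOK_NEWLINE;     /* outside: statement terminator */
        case '(':
            lx->paren_depth++;
            return TOK_LPAREN;
        case ')':
            if (lx->paren_depth > 0)    /* ignore an unbalanced ')' */
                lx->paren_depth--;
            return TOK_RPAREN;
        default:
            return TOK_OTHER;       /* any other character: one token */
        }
    }
}

/* Count the NEWLINE tokens a parser would receive for this source text. */
static int count_newline_tokens(const char *src)
{
    struct lexer lx = { src, 0 };
    enum token t;
    int n = 0;

    while ((t = next_token(&lx)) != TOK_EOF)
        if (t == TOK_NEWLINE)
            n++;
    return n;
}
```

For the input "if (x +\n x == 0) print\n", count_newline_tokens() returns 1: the break after the + sits inside the parentheses and is swallowed, while the newline ending the statement is still delivered as a terminator.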

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

Kaz Kylheku

Oct 12, 2022, 3:15:32 PM
On 2022-10-12, Kaz Kylheku <864-11...@kylheku.com> wrote:
> I have a fork of gawk called egawk (enhanced gnu awk) where this
> approach could be tried.

I got it working very easily, at the proof of concept stage,
not having validated test cases and such:

Patched:

~/gawk$ ./gawk 'BEGIN {
if (x +
x == 0) { print "blah" } }'
blah

Stock distro gawk:

~/gawk$ gawk 'BEGIN {
if (x +
x == 0) { print "blah" } }'
gawk: cmd. line:3: if (x +
gawk: cmd. line:3: ^ unexpected newline or end of string
gawk: cmd. line:3: x == 0) { print "blah" } }
gawk: cmd. line:3: ^ syntax error


Patched, in --posix mode:

~/gawk$ ./gawk --posix 'BEGIN {
if (x +
x == 0) { print "blah" } }'
gawk: cmd. line:3: if (x +
gawk: cmd. line:3: ^ unexpected newline or end of string
gawk: cmd. line:3: x == 0) { print "blah" } }
gawk: cmd. line:3: ^ syntax error


Patch:

~/gawk$ git diff awkgram.y
diff --git a/awkgram.y b/awkgram.y
index fc35100d..c24e35c5 100644
--- a/awkgram.y
+++ b/awkgram.y
@@ -3911,6 +3911,13 @@ yylex(void)

        case '\n':
                sourceline++;
+               /*
+                * If not in POSIX mode, allow free-form newline in bracketed
+                * and parenthesized expressions, by swallowing '\n' rather than
+                * turning it into a NEWLINE token.
+                */
+               if (! do_posix && in_parens)
+                       goto retry;
                return lasttok = NEWLINE;

        case '#':               /* it's a comment */

Very easy; the lexer already counts parentheses, so there is nothing
else to do.

All of the above said and patched, note that you can use backslash
continuations, which are a bit ugly:

~/gawk$ gawk 'BEGIN {
if (x + \
x == 0) { print "blah" } }'
blah

So before trying to upstream this, you need a convincing argument why
standard-conforming backslash-newline continuations aren't good enough.

Janis Papanagnou

Oct 12, 2022, 5:24:36 PM
On 12.10.2022 21:15, Kaz Kylheku wrote:
>
> So before trying to upstream this, you need a convincing argument why
> standard-conforming backslash-newline continuations aren't good enough.

I acknowledged line-continuation/escapes in my OP:
>> About the difficulty to use sensible line breaks in expressions,
>> without adding syntactically spurious escape characters.
...
>> (Note 2: Yes, we can use/add line-continuation/escape characters.)

It may be just me, but I consider line continuation a hack of the
last century, or even of the 1960s (cf. the '+' symbol in column 1 of
punch cards, where THAT continuation does NOT have the issues of
invisible whitespace characters after the '\' that we have had at least
since the UNIX epoch). In the Awk language, because of its design, we
have to keep certain things together on a line because the semantics
would otherwise change; e.g. pattern { action } cannot be split before
the braces. In other places (see my OP examples) it's syntactically and
semantically unnecessary. There are also inconsistencies (see the
examples again) in expressions (with + vs. && to name just one).

But as you pointed out in your first post, the syntax is in POSIX, so
at least in POSIX mode it should behave in a standard-conforming way.
(If the POSIX syntax is only "informational", that assessment may
change, though.)

In cases where fatal (syntax) errors are [unnecessarily] produced,
though, I think that a more graceful/accommodating behavior would
not only add to readability, safety, and consistency, it might also
increase the attractiveness for new users and the acceptance by users
(in case anyone is concerned about such considerations).

That's all. I don't think that anything will change here. And I will
continue to write lengthy lines in Awk (where its syntax requires it)
and hope not to need to look into it again some time later, or to check
(when tracking down a bug) whether every continuation has a NL
immediately after it. And in 10 years, when I have forgotten my post,
I'll probably ask that question again.
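
That continuation check can be automated. A minimal sketch, assuming we only want the first line where a backslash is followed by nothing but blanks before the line ends; the function name `first_broken_continuation` is hypothetical, and the scan knows nothing about Awk strings, regexps, or comments, so it can report false positives there.

```c
/*
 * Return the 1-based number of the first line in `src` whose last
 * backslash is followed only by blanks (space, tab, or '\r') before
 * the line ends, or 0 if there is no such line.
 */
static int first_broken_continuation(const char *src)
{
    const char *p = src;
    int line = 1;

    while (*p != '\0') {
        int seen_bs = 0;        /* a backslash occurred on this line */
        int blanks_after = 0;   /* blanks since the last backslash */
        int other_after = 0;    /* non-blank since the last backslash */

        for (; *p != '\0' && *p != '\n'; p++) {
            if (*p == '\\') {
                seen_bs = 1;
                blanks_after = 0;
                other_after = 0;
            } else if (*p == ' ' || *p == '\t' || *p == '\r') {
                blanks_after++;
            } else {
                other_after = 1;
            }
        }

        /* Backslash, then at least one blank, then end of line. */
        if (seen_bs && blanks_after > 0 && !other_after)
            return line;

        if (*p == '\n') {
            p++;
            line++;
        }
    }
    return 0;
}
```

A clean continuation such as "if (x + \\\n x)\n" (as a C string) yields 0, while the same text with a space between the backslash and the newline yields the number of the offending line.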

Janis

PS: Thanks for your proof of concept and tests.
