GAWK: When you really, really need to exit() now...

Kenny McCormack
Nov 29, 2014, 11:13:46 AM

As we all know, when you execute the "exit" command in AWK, it doesn't
really exit your program. Instead, it transfers control to your END
block(s) and those get executed. Essentially, "exit" means "act as if
you've read all the lines of all the input files". This is sort of a
mis-feature, but of course, it is the way AWK is and certainly can't be
changed.

However, sometimes, you really need to exit NOW - and skip any END block
processing. Usually, this is when some error condition is detected, that
requires an immediate abort after printing an error message. As it turns
out, TAWK has this covered, as they have an "abort()" function that does
exactly this. AFAIK (& CMIIW), there is no equivalent functionality in
GAWK. Needless to say, I think there should be. Consider this post to be
a feature request. Thank you.

Note, BTW, that this problem is particularly acute if one is writing
library code, since the library code doesn't know if the user has an END
clause or not. See the example below for how this is handled by the
GAWK-supplied library function "assert".

Currently available workarounds (in GAWK) include:

1) Set a flag and check that in a specially supplied END block (that
you assume/hope will be called before any user-written END block)

Here is "assert.awk" from the GAWK distribution:

--- Code ---
# assert --- assert that a condition is true. Otherwise exit.

#
# Arnold Robbins, arn...@skeeve.com, Public Domain
# May, 1993

function assert(condition, string)
{
    if (! condition) {
        printf("%s:%d: assertion failed: %s\n",
            FILENAME, FNR, string) > "/dev/stderr"
        _assert_exit = 1
        exit 1
    }
}

END {
    if (_assert_exit)
        exit 1
}
--- Code ---

2) Call a function you know doesn't exist. This is a hack that I've
been using in GAWK for decades now. It works and it makes it very
clear that something has gone horribly wrong. Conveniently, you
can use abort(), which exists in TAWK, but not in any other AWK (to
my knowledge). It gets the job done in any event. Observe:

% gawk4 'BEGIN { abort() }';echo $status
gawk4: cmd. line:1: fatal: function `abort' not defined
2
%

3) Use the "call_any" library to call the system "exit" function. This
also gets the job done. Observe:

% gawk4 'BEGIN { print "Start";exit 1;print "Done"}END {print "in END block"}' ; echo $status
Start
in END block
1

% gawk4 -l call_any 'BEGIN { print "Start";call_any("ii","exit",1);print "Done"}END {print "in END block"}' ; echo $status
Start
1
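
A minimal sketch of why the ordering caveat in workaround 1 matters, assuming a POSIX awk reachable as `awk`. The `_abort` flag and the "totals" output are made up for illustration and are not part of assert.awk. The same three rules are run twice, once with the guard END rule listed first and once with it listed last:

```shell
# Guard END rule listed before the "user" END rule: the exit in the guard
# ends the program, so the second END rule never runs.
status_first=0
guard_first=$(awk '
END   { if (_abort) exit 1 }          # guard: must come first
BEGIN { _abort = 1; exit 1 }          # simulated error
END   { print "totals" }              # stands in for user code
' </dev/null) || status_first=$?

# Same rules, but the user END rule is listed first: it runs before
# the guard gets a chance to stop it.
status_last=0
guard_last=$(awk '
END   { print "totals" }
BEGIN { _abort = 1; exit 1 }
END   { if (_abort) exit 1 }
' </dev/null) || status_last=$?

printf 'guard first: output=[%s] status=%s\n' "$guard_first" "$status_first"
printf 'guard last:  output=[%s] status=%s\n' "$guard_last" "$status_last"
```

Both runs exit with status 1, but only the first one suppresses the "totals" line, which is the situation library code cannot control.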

--
There are two kinds of Republicans: Billionaires and suckers.
Republicans: Please check your bank account and decide which one is you.

Ed Morton
Nov 29, 2014, 2:48:13 PM

On 11/29/2014 10:13 AM, Kenny McCormack wrote:
> As we all know, when you execute the "exit" command in AWK, it doesn't
> really exit your program. Instead, it transfers control to your END
> block(s) and those get executed. Essentially, "exit" means "act as if
> you've read all the lines of all the input files". This is sort of a
> mis-feature, but of course, it is the way AWK is and certainly can't be
> changed.
>
> However, sometimes, you really need to exit NOW - and skip any END block
> processing. Usually, this is when some error condition is detected, that
> requires an immediate abort after printing an error message. As it turns
> out, TAWK has this covered, as they have an "abort()" function that does
> exactly this. AFAIK (& CMIIW), there is no equivalent functionality in
> GAWK. Needless to say, I think there should be. Consider this post to be
> a feature request. Thank you.

This inversion of control would be about as good an idea as writing C++ code to
force skipping the class destructor. Having said that, if you REALLY want to get
out of your script without executing the END section, just add a line that
divides by zero or calls a function that you didn't define or produces some other
fatal error:

$ cat file
a
b
c
d

$ awk '{x=$0; print} $0=="e"{print "ERROR: the sky is falling"; abort()}
END{print "x="x}' file
a
b
c
d
x=d

$ awk '{x=$0; print} $0=="b"{print "ERROR: the sky is falling"; abort()}
END{print "x=",x}' file
a
b
ERROR: the sky is falling
awk: cmd. line:1: (FILENAME=file FNR=2) fatal: function `abort' not defined

Ed.

Kenny McCormack
Nov 30, 2014, 8:41:17 AM

In article <m5d7tb$ift$1...@dont-email.me>,
Ed Morton <morto...@gmail.com> wrote:
...
>This inversion of control would be about as good an idea as writing C++
>code to force skipping the class destructor. Having said that, if you
>REALLY want to get out of your script without executing the END section,
>just add a line that divides by zero or calls a function that you didn't
>define or produces some other fatal error:

Reading your post, I suspect that whatever newsreader you are using
truncated some (most?) of the OP, since a lot of your post duplicates
content already present in the OP. You might want to go back and re-read
it, since I had already suggested using abort().

That said, I like your idea of dividing by zero. Nice!
Cleaner than calling an undefined function.

Also, regarding the implication that not executing the END block is somehow
"unclean" and/or a bad idea in general: the problem here is that
traditionally, the END block is used to print out totals and averages -
that is, a summary of the data processed in the pattern/action space.
Well, obviously, if something goes wrong (say, the wrong number of
arguments is passed [i.e., ARGC != whatitshouldbe] and this is detected in
the BEGIN block), then you don't want any totals or averages displayed.

Now, having said that, note that TAWK again has this issue covered. In
TAWK, there are two additional block types - INIT, which is a generalized
version of BEGIN - and TERM, which is a generalized version of END. As it
happens, I've never used either of these (except for trivial
experimentation), but they sound good from what I've read. In particular,
the idea of TERM is that it gets executed regardless of how the program
ends [*]. In particular, if you call the abort() function, TERM is still
executed. Even if you exit via hitting the Interrupt key (e.g., ^C), TERM
is executed. So, if you had some cleanup that absolutely had to get done,
regardless of how the program exits, you would put it in TERM. In fact,
the documentation suggests that you should actually be using TERM most of
the time, and that END is only for "specialized" situations [**].

[*] Well, modulo some extreme cases, I suppose, like, say, having the power
removed.

[**] No doubt because of the weirdness associated with the way END works
(as documented in this thread).

--
b w r w g y b r y b

Kaz Kylheku
Nov 30, 2014, 11:07:45 AM

On 2014-11-29, Ed Morton <morto...@gmail.com> wrote:
> On 11/29/2014 10:13 AM, Kenny McCormack wrote:
>> However, sometimes, you really need to exit NOW - and skip any END block
>> processing. Usually, this is when some error condition is detected, that
>> requires an immediate abort after printing an error message. As it turns
>> out, TAWK has this covered, as they have an "abort()" function that does
>> exactly this. AFAIK (& CMIIW), there is no equivalent functionality in
>> GAWK. Needless to say, I think there should be. Consider this post to be
>> a feature request. Thank you.
>
> This inversion of control would be about as good an idea as writing C++ code
> to force skipping the class destructor.

Sure, if we overlook that the C exit function is available in C++.

Joe User
Nov 30, 2014, 4:57:38 PM

On Sat, 29 Nov 2014 16:13:44 +0000, Kenny McCormack wrote:

> As we all know, when you execute the "exit" command in AWK, it doesn't
> really exit your program. Instead, it transfers control to your END
> block(s) and those get executed. Essentially, "exit" means "act as if
> you've read all the lines of all the input files". This is sort of a
> mis-feature, but of course, it is the way AWK is and certainly can't be
> changed.

I think you hit upon the right answer, by coding an exit statement into
the FIRST END block.

Maybe that trick should be added to the gawk man page.

At the top of your code, before any END blocks, include these untested
lines:

END {if (EXITnow) exit;}
function exit_now() { EXITnow=1; exit }

There's a problem if END blocks are in included files. Then, you have to
make sure that the above code is in the FIRST included awk file.

It's pretty straightforward, even if it could be better documented in the
man page.
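
A runnable variant of the lines above, offered as a sketch rather than a definitive version. It assumes a POSIX awk reachable as `awk`. The `code` parameter and the `_exit_now` / `_exit_code` names are additions for illustration, so that the exit status survives the second exit in the guard rule:

```shell
status=0
out=$(awk '
# Guard END rule: listed before every other END rule.
END { if (_exit_now) exit _exit_code }

# exit_now(code): record the status, then start END processing.
function exit_now(code) { _exit_now = 1; _exit_code = code; exit code }

BEGIN { print "Start"; exit_now(3); print "Done" }
END   { print "in END block" }
' </dev/null) || status=$?

printf 'output=[%s] status=%s\n' "$out" "$status"
```

Only "Start" is printed and the status is 3. The later END rule is skipped because an exit executed inside an END rule ends the program at once.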

--
We tend to scoff at the beliefs of the
ancients. But we can't scoff at them personally,
to their faces, and this is what annoys me.

-- Jack Handey

Ed Morton
Nov 30, 2014, 5:43:30 PM

On 11/30/2014 7:41 AM, Kenny McCormack wrote:
> In article <m5d7tb$ift$1...@dont-email.me>,
> Ed Morton <morto...@gmail.com> wrote:
> ...
>> This inversion of control would be about as good an idea as writing C++
>> code to force skipping the class destructor. Having said that, if you
>> REALLY want to get out of your script without executing the END section,
>> just add a line that divides by zero or calls a function that you didn't
>> define or produces some other fatal error:
>
> Reading your post, I suspect that whatever newsreader you are using
> truncated some (most?) of the OP, since a lot of your post duplicates
> content already present in the OP. You might want to go back and re-read
> it, since I had already suggested using abort().

You're right, but it wasn't my newsreader; I tend to ignore most of what you
post, and in this case I missed something relevant. My apologies.

> That said, I like your idea of dividing by zero. Nice!
> Cleaner than calling an undefined function.
>
> Also, regarding the implication that not executing the END block is somehow
> "unclean" and/or a bad idea in general: the problem here is that
> traditionally, the END block is used to print out totals and averages -
> that is, a summary of the data processed in the pattern/action space.
> Well, obviously, if something goes wrong (say, the wrong number of
> arguments is passed [i.e., ARGC != whatitshouldbe] and this is detected in
> the BEGIN block), then you don't want any totals or averages displayed.

If some condition is severe enough to make you want to deviate completely from
the normal processing of your script, then it is important enough to deserve a
variable and a couple of lines of code in the END section to handle it. That
way, anyone reading the END section (e.g. debugging a large script someone else
wrote) doesn't have to wonder whether some other part of the code is sneakily
short-circuiting the normal control flow and throwing them out of the script
early. And if you need to, for example, print debugging info every time a
script ends, you don't have to go looking through the whole script for places
to add that print.
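
One way this could look, as a sketch under stated assumptions: a POSIX awk reachable as `awk`, a hypothetical shell wrapper named `summarize`, and a made-up input format of two fields per line. The error is recorded in a variable (`err`, also hypothetical) and handled in the single END rule, which every exit path passes through:

```shell
# summarize: hypothetical helper wrapping the awk program.
summarize() {
    awk '
    NF != 2 { err = "line " NR ": expected 2 fields"; exit }
    { sum += $2 }
    END {
        # Every exit path passes through here, so one debug line covers all.
        print "debug: records read=" NR > "/dev/stderr"
        if (err != "") {
            print "ERROR: " err > "/dev/stderr"
            exit 1
        }
        print "total=" sum
    }'
}

good_status=0
good=$(printf 'a 1\nb 2\n' | summarize 2>/dev/null) || good_status=$?
bad_status=0
bad=$(printf 'a 1\noops\nc 3\n' | summarize 2>/dev/null) || bad_status=$?

printf 'good: output=[%s] status=%s\n' "$good" "$good_status"
printf 'bad:  output=[%s] status=%s\n' "$bad" "$bad_status"
```

The totals appear only on the good run, and the debug line is written in one place for both runs.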

Ed.

Ed Morton
Nov 30, 2014, 5:45:40 PM

I didn't say you couldn't do it, I said you shouldn't do it.