GAWK: A fix for "missing file is a fatal error"

Kenny McCormack

Aug 20, 2008, 8:48:06 AM
In one of my scripts, I found that in GAWK, if a file is missing (can't
be opened), GAWK bombs with a fatal error. I have a vague memory of there
being some discussion of this issue in this group at some point in the
distant past, and the consensus was that there wasn't anything you could
do about it.

Note to standards jockeys: No, this isn't a bug in your precious GAWK in
the usual "standards" sense. So, don't even bother.

The fact is that, under certain conditions, it *is* a mis-feature, and it
would be nice to at least have the option of continuing. I note in
passing that TAWK handles this rather better - you get a warning about a
missing file, but the script continues. Ideal, of course, would be a
settable option, so you can select the behavior that you want.

Note: I am talking about files read in the "automatic input loop", not
via "getline".

Obviously one solution would be to hack (fix) the GAWK source code and
recompile, but that is inconvenient for me (due to some reasons beyond
the scope of this document). So, I elected to fix it via an "interposer".
See below.

This solution works for me under Linux - you may need to adjust
accordingly for your environment.

$ cat open_fix.c
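/* A lib to fix the GAWK missing files problem */
/* Usage: export LD_PRELOAD=/path/to/this/lib */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

/* The real open64(), looked up lazily further down the library chain. */
static int (*myopen64) (const char *, int, ...);

int open64(const char *path, int flags, ...) {
    int ret;
    mode_t mode = 0;

    /* open() only receives a third (mode) argument when creating a file. */
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t) va_arg(ap, int);
        va_end(ap);
    }

    if (!myopen64)
        myopen64 = (int (*)(const char *, int, ...)) dlsym(RTLD_NEXT, "open64");
    ret = myopen64(path, flags, mode);
    /* If the real open fails for any reason, hand back /dev/null instead. */
    return ret != -1 ? ret : myopen64("/dev/null", flags, mode);
}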
$ gcc -fPIC -W -Wall -Werror -c open_fix.c
$ gcc -shared -Wl,-soname,libopen_fix.so.1 -o libopen_fix.so open_fix.o -ldl
$ LD_PRELOAD=./libopen_fix.so gawk '{print FILENAME,$0}' goodfile badfile goodfile1
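The heart of the interposer is a single step: try the real open, and if it fails, open /dev/null instead. As a minimal sketch without LD_PRELOAD (`open_or_devnull` is a hypothetical name, and unlike the interposer it always opens read-only rather than passing the caller's flags through), that logic looks like this in plain C:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open PATH read-only; if that fails for any
 * reason, open /dev/null instead so the caller just sees empty input.
 * Returns -1 only if /dev/null itself cannot be opened. The caller
 * reads with read(2) and releases the descriptor with close(2). */
int open_or_devnull(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        fd = open("/dev/null", O_RDONLY);
    return fd;
}
```

Reading from the descriptor returned for a missing file hits end-of-file at once, which is why, under the interposer, gawk would treat a vanished file as an empty one.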

Have fun!

pk

Aug 20, 2008, 10:01:15 AM
On Wednesday 20 August 2008 14:48, Kenny McCormack wrote:

> Obviously one solution would be to hack (fix) the GAWK source code and
> recompile, but that is inconvenient for me (due to some reasons beyond
> the scope of this document). So, I elected to fix it via an "interposer".

What's wrong with checking if the file exists like this:

awk '{print FILENAME,$0}' "$( [ -f file ]&&echo file||echo /dev/null )" etc.

(except the fact that one usually doesn't post that to a newsgroup)
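For comparison, pk's shell test can be written in C as a check-then-substitute helper. This is only a sketch: `checked_name` is a hypothetical name, and `stat()` plus `S_ISREG` is assumed here as the closest equivalent of `[ -f file ]`.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Rough C analogue of  [ -f file ] && echo file || echo /dev/null
 * Returns PATH if it names an existing regular file (symlinks are
 * followed, as with -f), otherwise "/dev/null". The answer may
 * already be stale by the time anything opens the file. */
const char *checked_name(const char *path)
{
    struct stat st;

    if (stat(path, &st) == 0 && S_ISREG(st.st_mode))
        return path;
    return "/dev/null";
}
```

Because the check and the eventual open happen at different times, a file that disappears in between still slips through; that window is the race condition raised later in the thread.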

Janis Papanagnou

Aug 20, 2008, 10:15:47 AM
pk wrote:
> On Wednesday 20 August 2008 14:48, Kenny McCormack wrote:
>
>
>>Obviously one solution would be to hack (fix) the GAWK source code and
>>recompile, but that is inconvenient for me (due to some reasons beyond
>>the scope of this document). So, I elected to fix it via an "interposer".
>
>
> What's wrong with checking if the file exists like this:
>
> awk '{print FILENAME,$0}' "$( [ -f file ]&&echo file||echo /dev/null )" etc.

Maybe because that would be impractical if you're processing many file
arguments...?

awk '{print FILENAME,$0}' prefix*.ext


Janis

Janis Papanagnou

Aug 20, 2008, 10:16:59 AM
Janis Papanagnou wrote:
> pk wrote:
>
>> On Wednesday 20 August 2008 14:48, Kenny McCormack wrote:
>>
>>
>>> Obviously one solution would be to hack (fix) the GAWK source code and
>>> recompile, but that is inconvenient for me (due to some reasons beyond
>>> the scope of this document). So, I elected to fix it via an
>>> "interposer".
>>
>>
>>
>> What's wrong with checking if the file exists like this:
>>
>> awk '{print FILENAME,$0}' "$( [ -f file ]&&echo file||echo /dev/null
>> )" etc.
>
>
> Maybe because that would be impractical if you're processing many file
> arguments...?
>
> awk '{print FILENAME,$0}' prefix*.ext

Oops, makes not much sense with wildcards.

Kenny McCormack

Aug 20, 2008, 10:17:21 AM
>(except the fact that one usually doesn't post that to a newsgroup)
>

1) The above is ugly.
2) The above is ugly.
3) It doesn't scale. My issue is that I have many, many input files and
I'd hate to have to code that kludge in a loop.
4) The issue is that the files can appear or disappear at any time - so
there is a race condition going on - you can't really rely on the above
to work.
5) I was going to point out that the above is shell, so OT, but then again,
I suppose C is OT as well. But not quite so much OT as shell is.

Anyway, it works for me, and that's the important thing.
I have always thought that it is better to enhance (fix) the language,
than to kludge around it.

pk

Aug 20, 2008, 10:33:26 AM
On Wednesday 20 August 2008 16:17, Kenny McCormack wrote:

> In article <g8h7u1$4pq$1...@aioe.org>, pk <p...@pk.invalid> wrote:
>>On Wednesday 20 August 2008 14:48, Kenny McCormack wrote:
>>
>>> Obviously one solution would be to hack (fix) the GAWK source code and
>>> recompile, but that is inconvenient for me (due to some reasons beyond
>>> the scope of this document). So, I elected to fix it via an
>>> "interposer".
>>
>>What's wrong with checking if the file exists like this:
>>
>>awk '{print FILENAME,$0}' "$( [ -f file ]&&echo file||echo /dev/null )"
>>etc.
>>
>>(except the fact that one usually doesn't post that to a newsgroup)
>>
>
> 1) The above is ugly.
> 2) The above is ugly.

Very good technical reasons.

> 3) It doesn't scale. My issue is that I have many, many input files and
> I'd hate to have to code that kludge in a loop.

You still have to code *your* kludge.

> 4) The issue is that the files can appear or disappear at any time - so
> there is a race condition going on - you can't really rely on the
> above to work.

This is a good reason (which you didn't mention before).

> 5) I was going to point out that the above is shell, so OT, but then
> again, I suppose C is OT as well. But not quite so much OT as shell is.

Yes, of course you are the one who decides that, I had forgotten.

pk

Aug 20, 2008, 10:41:27 AM
On Wednesday 20 August 2008 16:16, Janis Papanagnou wrote:

>>> What's wrong with checking if the file exists like this:
>>>
>>> awk '{print FILENAME,$0}' "$( [ -f file ]&&echo file||echo /dev/null
>>> )" etc.
>>
>>
>> Maybe because that would be impractical if you're processing many file
>> arguments...?
>>
>> awk '{print FILENAME,$0}' prefix*.ext
>
> Oops, makes not much sense with wildcards.

You're right, you need even more ugly kludges in that case, while the
interposer works fine because it only sees the filenames as expanded by the
shell.

pk

Aug 20, 2008, 10:43:16 AM
On Wednesday 20 August 2008 16:16, Janis Papanagnou wrote:

>> Maybe because that would be impractical if you're processing many file
>> arguments...?
>>
>> awk '{print FILENAME,$0}' prefix*.ext
>
> Oops, makes not much sense with wildcards.

In that case the shell does all the work and the resulting file list
contains only files that actually exist. If we want to be picky, there's
still the race condition problem between the moment the shell expands the
list and awk tries to open each file.

Kenny McCormack

Aug 20, 2008, 11:35:26 AM
In article <g8h9qd$df1$1...@aioe.org>, pk <p...@pk.invalid> wrote:
...

>> 1) The above is ugly.
>> 2) The above is ugly.
>
>Very good technical reasons.

Indeed.

>> 3) It doesn't scale. My issue is that I have many, many input files and
>> I'd hate to have to code that kludge in a loop.
>
>You still have to code *your* kludge.

One man's kludge is another man's thing of beauty.

>> 4) The issue is that the files can appear or disappear at any time - so
>> there is a race condition going on - you can't really rely on the
>> above to work.
>
>This is a good reason (which you didn't mention before).

Yes. In fact, that's the real problem - the race condition between when
the shell expands the filenames and when AWK gets around to reading them.

By the way, my input file specification is: /proc/*/cmdline
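Because processes come and go, entries matched by /proc/*/cmdline can vanish between expansion and reading. A minimal sketch, assuming POSIX glob(3), of expanding the pattern in-process and treating a file that has since disappeared (ENOENT) as something to skip rather than a fatal error; `count_readable` is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <glob.h>
#include <stdio.h>

/* Expand PATTERN, then try to open each match. Matches that have
 * disappeared since the expansion (ENOENT) are skipped silently;
 * other open failures are reported on stderr and also skipped.
 * Returns the number of files opened, or -1 if nothing matched. */
int count_readable(const char *pattern)
{
    glob_t g;
    int opened = 0;

    if (glob(pattern, 0, NULL, &g) != 0) {
        globfree(&g);   /* POSIX keeps g's fields set even on error */
        return -1;
    }
    for (size_t i = 0; i < g.gl_pathc; i++) {
        FILE *fp = fopen(g.gl_pathv[i], "r");

        if (fp == NULL) {
            if (errno != ENOENT)
                perror(g.gl_pathv[i]);
            continue;
        }
        opened++;   /* records would be read from fp here */
        fclose(fp);
    }
    globfree(&g);
    return opened;
}
```

Here the expansion and the opens happen in the same process, but the window between them still exists; the point is only that losing it is no longer fatal.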

>> 5) I was going to point out that the above is shell, so OT, but then
>> again, I suppose C is OT as well. But not quite so much OT as shell is.
>
>Yes, of course you are the one who decides that, I had forgotten.

Yes. I am the boss here. And don't nobody be forgettin' it!

Janis Papanagnou

Aug 20, 2008, 1:57:24 PM
pk wrote:
> On Wednesday 20 August 2008 16:16, Janis Papanagnou wrote:
>
>>>Maybe because that would be impractical if you're processing many file
>>>arguments...?
>>>
>>> awk '{print FILENAME,$0}' prefix*.ext
>>
>>Oops, makes not much sense with wildcards.
>
> In that case the shell does all the work and the resulting file list
> contains only files that actually exist.

Yes, that's why I cancelled my original message and added this comment.
I still think it's a Good Thing to let an "invisible" layer handle that,
instead of using explicit workarounds for each of the given files, and to
avoid "non-scalable" (as Kenny called it) shell constructs. Showing the
problem that arises with many file arguments was the intention behind
introducing my wildcard example in the first place.

Janis

Kenny McCormack

Aug 20, 2008, 2:06:11 PM
In article <g8hlu5$hvb$1...@online.de>,

Good post. Thanks.

I like your concept of an "invisible layer".

Ed Morton

Aug 20, 2008, 6:08:16 PM
On 8/20/2008 7:48 AM, Kenny McCormack wrote:
> In one of my scripts, I found that in GAWK, if a file is missing (can't
> be opened), GAWK bombs with a fatal error. I have a vague memory of there
> being some discussion of this issue in this group at some point in the
> distant past, and the consensus was that there wasn't anything you could
> do about it.
>
> Note to standards jockeys: No, this isn't a bug in your precious GAWK in
> the usual "standards" sense. So, don't even bother.
>
> The fact is that, under certain conditions, it *is* a mis-feature, and it
> would be nice to at least have the option of continuing.

I agree. A fatal error in this situation stinks. You could work around it, of
course, with an up-front getline test:

$ ls f1 f2 f3
ls: cannot access f2: No such file or directory
f1 f3

$ cat f1 f3
f1, line 1
f3, line 1

$ awk '1' f1 f2 f3
f1, line 1
awk: (FILENAME=f1 FNR=1) fatal: cannot open file `f2' for reading (No such file or directory)

$ awk 'BEGIN{if ((getline<ARGV[2])<0) ARGV[2]="/dev/null"; else
close(ARGV[2])}1' f1 f2 f3
f1, line 1
f3, line 1

$ echo "f2, line 1" > f2

$ awk 'BEGIN{if ((getline<ARGV[2])<0) ARGV[2]="/dev/null"; else
close(ARGV[2])}1' f1 f2 f3
f1, line 1
f2, line 1
f3, line 1

You can't tell from the getline whether the file's missing or just can't be opened,
but you probably wouldn't care.
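Ed's test probes only ARGV[2]. A sketch of the same probe applied to every file argument, written in C against an argv-style array (`probe_args` is a hypothetical name; it mirrors the awk one-liner by opening each file, closing it on success, and substituting /dev/null on failure):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Probe argv[1..argc-1] by opening each one. Unopenable entries are
 * replaced by "/dev/null" (a string literal, so callers must not
 * modify it); openable ones are closed again, like close(ARGV[2]).
 * Returns how many entries were replaced. */
int probe_args(int argc, char *argv[])
{
    int replaced = 0;

    for (int i = 1; i < argc; i++) {
        FILE *fp = fopen(argv[i], "r");

        if (fp == NULL) {
            argv[i] = "/dev/null";
            replaced++;
        } else {
            fclose(fp);
        }
    }
    return replaced;
}
```

As with the getline version, a failed fopen() does not by itself say why it failed; errno would, if the caller cared to look.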

Ed.

Andrew Schorr

Aug 21, 2008, 5:17:46 PM
On Aug 20, 6:08 pm, Ed Morton <mor...@lsupcaemnt.com> wrote:
> $ awk 'BEGIN{if ((getline<ARGV[2])<0) ARGV[2]="/dev/null"; else
> close(ARGV[2])}1' f1 f2 f3
> f1, line 1
> f3, line 1

This approach seems sensible to me. And rather than use LD_PRELOAD
to solve the problem, why not use an xgawk include file? If
you stick the following file in /usr/share/xgawk/fixopen.awk

BEGIN {
    # Probe each command-line argument; drop any that can't be read.
    for (i = 1; i < ARGC; i++) {
        if ((getline < ARGV[i]) < 0)
            delete ARGV[i]
        else
            close(ARGV[i])
    }
}

then you can say:

bash-3.1$ xgawk '1; END {print "DONE"}' /tmp/does_not_exist
xgawk: cmd. line:1: fatal: cannot open file `/tmp/does_not_exist' for reading (No such file or directory)

vs.

bash-3.1$ xgawk -i fixopen '1; END {print "DONE"}' /tmp/does_not_exist
DONE

That seems perhaps easier to maintain than using LD_PRELOAD. Plus
it has the advantage of deleting the file from the argument list,
so there's no need to open /dev/null in its place.

Regards,
Andy

P.S. I recognize that this has a problem in the case where the
user is passing other types of information (besides filenames)
on the command line. For some people that may be a problem;
I tend not to pass non-filename arguments very often...
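Andrew's include file deletes unreadable entries instead of substituting /dev/null, and his P.S. points at the catch: not every operand is a file name. A sketch in C that compacts an argv-style array the same way while leaving `-` (standard input) and `name=value` assignments alone. The assignment test follows the POSIX awk rule for operands (a letter or underscore, then letters, digits or underscores, then `=`); the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <stdio.h>

/* Nonzero if ARG is an awk command-line assignment (NAME=value). */
static int is_assignment(const char *arg)
{
    const char *p = arg;

    if (!isalpha((unsigned char)*p) && *p != '_')
        return 0;
    while (isalnum((unsigned char)*p) || *p == '_')
        p++;
    return *p == '=';
}

/* Remove file operands that cannot be opened, keeping "-" and
 * assignments, by compacting argv[1..argc-1] in place.
 * Returns the new argc. */
int drop_unopenable(int argc, char *argv[])
{
    int out = 1;

    for (int i = 1; i < argc; i++) {
        const char *a = argv[i];
        int keep = 1;

        if (!is_assignment(a) && !(a[0] == '-' && a[1] == '\0')) {
            FILE *fp = fopen(a, "r");

            if (fp == NULL)
                keep = 0;
            else
                fclose(fp);
        }
        if (keep)
            argv[out++] = argv[i];
    }
    return out;
}
```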


Kenny McCormack

Aug 21, 2008, 5:26:57 PM
In article <8535b7bd-e673-4184...@e39g2000hsf.googlegroups.com>,
Andrew Schorr <asc...@telemetry-investments.com> wrote:
>On Aug 20, 6:08 pm, Ed Morton <mor...@lsupcaemnt.com> wrote:
>> $ awk 'BEGIN{if ((getline<ARGV[2])<0) ARGV[2]="/dev/null"; else
>> close(ARGV[2])}1' f1 f2 f3
>> f1, line 1
>> f3, line 1
>
>This approach seems sensible to me. And rather than use LD_PRELOAD
>to solve the problem, why not use an xgawk include file? If
>you stick the following file in /usr/share/xgawk/fixopen.awk

Read the rest of the thread and you will understand why this is a
non-starter.

Aharon Robbins

Aug 22, 2008, 7:34:46 AM
In article <48AC95D0...@lsupcaemnt.com>,
Ed Morton <mor...@lsupcaemnt.com> wrote:
>On 8/20/2008 7:48 AM, Kenny McCormack wrote:
>> Note to standards jockeys: No, this isn't a bug in your precious GAWK in
>> the usual "standards" sense. So, don't even bother.
>>
>> The fact is that, under certain conditions, it *is* a mis-feature, and it
>> would be nice to at least have the option of continuing.
>
>I agree. A fatal error in this situation stinks.

It's historical practice. Unix awk has worked this way since forever. IF
you don't need the filenames, you could always use

cat /proc/*/cmdline 2>/dev/null | awk 'program text'

In any case, it would not be a good idea to change gawk's default
behavior in this case.

Kenny: You are, of course, welcome to fork the gawk code base and create
a language that works to your specifications. You have my blessings.
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL

pk

Aug 22, 2008, 8:08:48 AM
On Friday 22 August 2008 13:34, Aharon Robbins wrote:

> Kenny: You are, of course, welcome to fork the gawk code base and create
> a language that works to your specifications. You have my blessings.

In this particular case, I think a command line switch to enable the
behavior could be enough.

Ed Morton

Aug 22, 2008, 8:27:24 AM

Yeah, but I think it'd make sense to see the default behavior changed and just
do the abort if a new switch or the existing "--compat/traditional" switch was
being used.

On the other hand, I've never actually encountered this problem in real use so
it's just an opinion...

Ed.

pk

Aug 22, 2008, 8:39:54 AM
On Friday 22 August 2008 14:27, Ed Morton wrote:

>> In this particular case, I think a command line switch to enable the
>> behavior could be enough.
>>
>
> Yeah, but I think it'd make sense to see the default behavior changed and
> just do the abort if a new switch or the existing "--compat/traditional"
> switch was being used.

That would break the scripts that relay on the traditional behavior, but
again, I never wrote one that does, and I don't know how many of those are
out there (if any).



> On the other hand, I've never actually encountered this problem in real
> use so it's just an opinion...

Mine too.

pk

Aug 22, 2008, 8:41:03 AM
On Friday 22 August 2008 14:39, pk wrote:

> That would break the scripts that relay

Should be "rely", of course.

Andrew Schorr

Aug 22, 2008, 8:50:39 AM
On Aug 21, 5:26 pm, gaze...@shell.xmission.com (Kenny McCormack) wrote:

> Read the rest of the thread and you will understand why this is a
> non-starter.

That's a rather cryptic response. I have read the thread. I have
always found LD_PRELOAD solutions to be hacks that are difficult to
maintain.

The point is that xgawk already exists as a testbed for new gawk
features. It currently contains some features that could help with
your problem. And it would also be a good place to add a patch to
address this particular issue, if you feel the existing xgawk
facilities aren't rich enough.

Regards,
Andy

Kenny McCormack

Aug 22, 2008, 9:01:14 AM
In article <48AEB0A...@lsupcaemnt.com>,
Ed Morton <mor...@lsupcaemnt.com> wrote:
>On 8/22/2008 7:08 AM, pk wrote:
>> On Friday 22 August 2008 13:34, Aharon Robbins wrote:
>>
>>
>>>Kenny: You are, of course, welcome to fork the gawk code base and create
>>>a language that works to your specifications. You have my blessings.
>>
>>
>> In this particular case, I think a command line switch to enable the
>> behavior could be enough.
>>
>
>Yeah, but I think it'd make sense to see the default behavior changed and just
>do the abort if a new switch or the existing "--compat/traditional" switch was
>being used.

I agree with 'pk' on this one. A switch to invoke the "non-traditional"
behavior is the way to go. While I *admire* the TAWK way, I tend to
agree that the "traditional" Unix/GAWK way is what most users expect.

>On the other hand, I've never actually encountered this problem in real use so
>it's just an opinion...

True. And that's what makes this whole thread rather, shall we say,
unique. It is hard to imagine a real world instance of this _other than_
when dealing with /proc...

Still, I think that the LD_PRELOAD method is good - obviously this
syntax functions as a "switch" - if I want this functionality, I use
LD_PRELOAD; if I don't, I don't. As I said, if I were really serious
about making this a permanent change, I'd fix it in the source, but it's
just not feasible for me to do that at the moment.

Ed Morton

Aug 22, 2008, 9:36:58 AM
On 8/22/2008 8:01 AM, Kenny McCormack wrote:
> In article <48AEB0A...@lsupcaemnt.com>,
> Ed Morton <mor...@lsupcaemnt.com> wrote:
>
>>On 8/22/2008 7:08 AM, pk wrote:
>>
>>>On Friday 22 August 2008 13:34, Aharon Robbins wrote:
>>>
>>>
>>>
>>>>Kenny: You are, of course, welcome to fork the gawk code base and create
>>>>a language that works to your specifications. You have my blessings.
>>>
>>>
>>>In this particular case, I think a command line switch to enable the
>>>behavior could be enough.
>>>
>>
>>Yeah, but I think it'd make sense to see the default behavior changed and just
>>do the abort if a new switch or the existing "--compat/traditional" switch was
>>being used.
>
>
> I agree with 'pk' on this one. A switch to invoke the "non-traditional"
> behavior is the way to go. While I *admire* the TAWK way, I tend to
> agree that the "traditional" Unix/GAWK way is what most users expect.

While I usually would agree with that, in this case we're talking about
something that almost never happens, so I doubt anyone would add that switch
every time they invoke awk just in case it does. If we have a switch to
invoke the "new" behavior, it'll probably never get used, so those who would
fall over this problem still will; and since there's an alternative workaround
using getline IF you need to deal with it, it's pointless to add a switch to
turn ON the new behavior.

On the other hand, making the new behavior the default would almost certainly
not cause anyone any problems, and if it does they can add the new switch.

Ed.


John DuBois

Aug 22, 2008, 2:55:09 PM
In article <g8m88m$8bl$1...@news4.netvision.net.il>,
Aharon Robbins <arn...@skeeve.com> wrote:
>
>It's historical practice. Unix awk has worked this way since forever. IF
>you don't need the filenames, you could always use
>
> cat /proc/*/cmdline 2>/dev/null | awk 'program text'
>
>In any case, it would not be a good idea to change gawk's default
>behavior in this case.

I agree quite strongly! Having existing awk programs - including all those
relied upon for normal system functioning - suddenly have the potential to fail
silently rather than verbosely in the case of a missing file would be a very
bad idea.

I have no objection to an option to enable alternate behavior, though I'm among
those who would have little use for it.

John
--
John DuBois spc...@armory.com KC6QKZ/AE http://www.armory.com/~spcecdt/

Ed Morton

Aug 23, 2008, 8:04:26 AM
On 8/22/2008 1:55 PM, John DuBois wrote:
> In article <g8m88m$8bl$1...@news4.netvision.net.il>,
> Aharon Robbins <arn...@skeeve.com> wrote:
>
>>It's historical practice. Unix awk has worked this way since forever. IF
>>you don't need the filenames, you could always use
>>
>> cat /proc/*/cmdline 2>/dev/null | awk 'program text'
>>
>>In any case, it would not be a good idea to change gawk's default
>>behavior in this case.
>
>
> I agree quite strongly! Having existing awk programs - including all those
> relied upon for normal system functioning - suddenly have the potential to fail
> silently rather than verbosely in the case of a missing file would be a very
> bad idea.

It doesn't have to be silent, there's no reason for it to be a catastrophic
failure like today, there's no real reason an application should want a
significant difference between trying to open a missing file vs trying to open
an unreadable file like today, and a missing file is handled inconsistently
today between being opened by getline vs being opened in the normal work loop so
the handling of missing files could seriously be considered broken right now, and
this proposal is a fix.

> I have no objection to an option to enable alternate behavior, though I'm among
> those who would have little use for it.

Right, but then no-one would actually use it as I mentioned elsethread.

Ed.


Andrew Schorr

Aug 23, 2008, 10:54:52 AM
FYI, it looks to me as if Arnold has already committed a patch
to the Savannah CVS tree that changes the fatal error to
a warning if the WHINY_USERS environment variable is set:

+++ ./io.c 2008-08-22 10:30:05.534799000 -0400
@@ -316,6 +316,11 @@ nextfile(int skipping)
             if (isdir && do_traditional)
                 continue;
 #endif
+            if (whiny_users) {
+                warning(_("cannot open file `%s' for reading (%s)"),
+                    fname, strerror(errno));
+                continue;
+            }
             goto give_up;
         }
         curfile->flag |= IOP_NOFREE_OBJ;
I imagine that should satisfy the various constituencies.
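Stripped of gawk's internals, the patched loop amounts to the following. This is a sketch, not gawk's io.c: `process_files` and its `lenient` flag are hypothetical stand-ins for nextfile() and whiny_users, and only the message format is taken from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Walk the input files in order. If one cannot be opened, either
 * warn and move on (lenient, like the patch's `continue`) or stop,
 * roughly where the patch falls through to `goto give_up`.
 * Returns the number of files read, or -1 for the "fatal" case. */
int process_files(int n, char *names[], int lenient)
{
    int done = 0;

    for (int i = 0; i < n; i++) {
        FILE *fp = fopen(names[i], "r");

        if (fp == NULL) {
            fprintf(stderr, "%s: cannot open file `%s' for reading (%s)\n",
                    lenient ? "warning" : "fatal", names[i], strerror(errno));
            if (lenient)
                continue;
            return -1;
        }
        /* ... records would be read here ... */
        fclose(fp);
        done++;
    }
    return done;
}
```

A caller could pass `getenv("WHINY_USERS") != NULL` as `lenient`, assuming (as the variable name suggests) that gawk's whiny_users flag simply reflects whether that environment variable is set.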

Regards,
Andy

Kenny McCormack

Aug 23, 2008, 11:02:53 AM
In article <58a97cc4-43e0-4863...@b1g2000hsg.googlegroups.com>,
Andrew Schorr <asc...@telemetry-investments.com> wrote:
>FYI, it looks to me as if Arnold has already committed a patch
>to the Savannah CVS tree that changes the fatal error to
>a warning if the WHINY_USERS environment variable is set:

Interesting. Looks like we may have to frequently mention here on the
newsgroup, for the benefit of the various newbies, the need to set
WHINY_USERS in order to get proper functionality of GAWK.

Note that I am sort-of, semi, half-kidding. I do strongly believe that
array sorting is just natural and should always be on (unless your
arrays are really, really huge, or your machine was made during the Stone
Age, I can't see what it could cost). However, as my posts here have made
clear, I'm not all that certain that this "file not found" issue is in
need of an over-arching solution. I.e., I could see turning
WHINY_USERS on for the array sorting, but not necessarily
wanting/needing this other feature turned on.

I suppose I should search the current sources to see what, if any, other
effects may have been tied to WHINY_USERS.

John DuBois

Aug 23, 2008, 12:42:25 PM
In article <48AFFCC...@lsupcaemnt.com>,
Ed Morton <mor...@lsupcaemnt.com> wrote:
>On 8/22/2008 1:55 PM, John DuBois wrote:
>> In article <g8m88m$8bl$1...@news4.netvision.net.il>,
>> Aharon Robbins <arn...@skeeve.com> wrote:
>>
>>>It's historical practice. Unix awk has worked this way since forever. IF
>>>you don't need the filenames, you could always use
>>>
>>> cat /proc/*/cmdline 2>/dev/null | awk 'program text'
>>>
>>>In any case, it would not be a good idea to change gawk's default
>>>behavior in this case.
>>
>>
>> I agree quite strongly! Having existing awk programs - including all those
>> relied upon for normal system functioning - suddenly have the potential to fail
>> silently rather than verbosely in the case of a missing file would be a very
>> bad idea.
>
>It doesn't have to be silent, there's no reason for it to be a catastrophic
>failure like today,

This need is set by all of the existing awk code out there, most of which is
not run interactively, and approximately none of which does any sort of error
checking on availability of input files. I do *not* want that code to continue
to produce output, exit successfully, etc. if input files are not available.

>there's no real reason an application should want a
>significant difference between trying to open a missing file vs trying to open
>an unreadable file like today

What significant difference?

> and a missing file is handled inconsistently
>today between being opened by getline vs being opened in the normal work loop

This is exactly the difference that *should* exist. In a getline loop, there
is a failure indication intrinsically available to the code. If a file is
simply skipped, there isn't.

In fact, let me put it this way: If I was designing the language today, I would
make it behave (almost) exactly as it does. A file that couldn't be opened for
any reason would, by default, be a fatal error. What I might do differently:
a) provide a command-line option to make it a non-fatal error; and b) provide a
failure block which, if used, would make it an otherwise-silent non-event:
something like OPENFAIL { }.
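John's OPENFAIL { } block could be modeled as an optional callback. A sketch under that assumption (OPENFAIL does not exist in any awk discussed here; `read_inputs`, `openfail_hook` and `count_failure` are hypothetical): with no hook installed, an open failure stays fatal; with one installed, the hook runs and the file is skipped.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical OPENFAIL hook type: receives the failing file name. */
typedef void (*openfail_hook)(const char *name);

/* Open each input in turn. Without a hook, an open failure is fatal
 * (returns -1); with a hook, the hook runs and the file is skipped.
 * Returns the number of files opened. */
int read_inputs(int n, char *names[], openfail_hook on_fail)
{
    int done = 0;

    for (int i = 0; i < n; i++) {
        FILE *fp = fopen(names[i], "r");

        if (fp == NULL) {
            if (on_fail == NULL)
                return -1;
            on_fail(names[i]);
            continue;
        }
        /* ... records would be read here ... */
        fclose(fp);
        done++;
    }
    return done;
}

/* Example hook: just count failures, e.g. for a check at the end. */
static int open_failures;

static void count_failure(const char *name)
{
    (void)name;
    open_failures++;
}
```

A hook that only counts failures also covers the idea, raised later in the thread, of a variable an END section could test.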

Grant

Aug 23, 2008, 1:41:47 PM

Oh yes, and add SIGNAL { } too. Having to wrap a gawk script in a shell wrapper
to catch signals -- well, it can be done, like the shell wrapper for open failures.

Grant.
--
http://bugsplatter.id.au/

Ed Morton

Aug 23, 2008, 8:13:31 PM
On 8/23/2008 11:42 AM, John DuBois wrote:
> In article <48AFFCC...@lsupcaemnt.com>,
> Ed Morton <mor...@lsupcaemnt.com> wrote:
>
>>On 8/22/2008 1:55 PM, John DuBois wrote:
>>
>>>In article <g8m88m$8bl$1...@news4.netvision.net.il>,
>>>Aharon Robbins <arn...@skeeve.com> wrote:
>>>
>>>
>>>>It's historical practice. Unix awk has worked this way since forever. IF
>>>>you don't need the filenames, you could always use
>>>>
>>>> cat /proc/*/cmdline 2>/dev/null | awk 'program text'
>>>>
>>>>In any case, it would not be a good idea to change gawk's default
>>>>behavior in this case.
>>>
>>>
>>>I agree quite strongly! Having existing awk programs - including all those
>>>relied upon for normal system functioning - suddenly have the potential to fail
>>>silently rather than verbosely in the case of a missing file would be a very
>>>bad idea.
>>
>>It doesn't have to be silent, there's no reason for it to be a catastrophic
>>failure like today,
>
>
> This need is set by all of the existing awk code out there, most of which is
> not run interactively, and approximately none of which does any sort of error
> checking on availability of input files. I do *not* want that code to continue
> to produce output, exit successfully, etc. if input files are not available.

It does that today if the input file is empty and *I* don't care if it's empty
or can't be opened or doesn't exist.

>
>>there's no real reason an application should want a
>>significant difference between trying to open a missing file vs trying to open
>>an unreadable file like today
>
>
> What significant difference?

There isn't one in general. I was looking at a difference that only exists on
cygwin:

$ ls -l f?
-rw-r--r-- 1 morton mkgroup-l-d 11 Aug 20 16:58 f1
---------- 1 morton mkgroup-l-d 0 Aug 23 18:49 f2
-rw-r--r-- 1 morton mkgroup-l-d 11 Aug 20 16:59 f3

$ gawk '1' f1 f2 f3
f1, line 1
f3, line 1

$ rm -f f2

$ gawk '1' f1 f2 f3
f1, line 1
gawk: (FILENAME=f1 FNR=1) fatal: cannot open file `f2' for reading (No such file or directory)

It looked like it was quietly skipping the unreadable file, but when I added
content to that file:

$ ls -l f?
-rw-r--r-- 1 morton mkgroup-l-d 11 Aug 20 16:58 f1
---------- 1 morton mkgroup-l-d 14 Aug 23 18:51 f2
-rw-r--r-- 1 morton mkgroup-l-d 11 Aug 20 16:59 f3

$ gawk '1' f1 f2 f3
f1, line 1
file2, line 1
f3, line 1

I see it's just that cygwin is ignoring the unreadable permission of f2.

>
>>and a missing file is handled inconsistently
>>today between being opened by getline vs being opened in the normal work loop
>
>
> This is exactly the difference that *should* exist. In a getline loop, there
> is a failure indication intrinsically available to the code. If a file is
> simply skipped, there isn't.

That's just a design choice. You could choose to set some standard variable and
have it available for anyone who cared to test, probably in the END section.

> In fact, let me put it this way: If I was designing the language today, I would
> make it behave (almost) exactly as it does. A file that couldn't be opened for
> any reason would, by default, be a fatal error.

Why? If you're going to do that, why not make an empty file a fatal error too?
If you care about it, why not test all the files up front and then not open any
of them rather than producing partial output? Those are rhetorical questions - I
don't really care what the answers are as what to do is just a matter of
opinion, BUT a fatal error tears the rug out from under you in terms of handling
various input conditions.

> What I might do differently: a) provide a command-line option to make it a
> non-fatal error; and b) provide a failure block which, if used, would make
> it an otherwise-silent non-event: something like OPENFAIL { }.

I agree with both of those, though obviously I'd switch the default behavior.

Ed.

Kenny McCormack

Aug 24, 2008, 4:37:32 AM
In article <48B0A7AB...@lsupcaemnt.com>,
Ed Morton <mor...@lsupcaemnt.com> wrote (>) in response to someone else (>>):
...

>> What I might do differently: a) provide a command-line option to make
>> it a non-fatal error; and b) provide a failure block which, if used,
>> would make it an otherwise-silent non-event: something like OPENFAIL
>> { }.
>
>I agree with both of those, though obviously I'd switch the default behavior.
>
> Ed.
>

I think what most people are arguing is that you *can't* change the
default behavior, however much we wish it had been done right (IOHO) in
the beginning, because it *might* break existing code. The situation is
much the same as that which has Solaris keeping two very broken programs
around (and makes them the default on the default PATH). I am
referring, of course, to their keeping /bin/awk (very broken) and
/bin/sh (original sh, warts and all, even as the world is moving towards
the so-called "POSIX" shell).

P.S. IOHO: In our humble opinion

Andrew Schorr

Aug 24, 2008, 12:13:34 PM
On Aug 23, 11:02 am, gaze...@shell.xmission.com (Kenny McCormack) wrote:

> I suppose I should search the current sources to see what, if any, other
> effects may have been tied to WHINY_USERS.

In the current savannah CVS source, I see only 3 places where
WHINY_USERS matters:

1. Array index sorting (using qsort) in "for" loops.
2. The new patch to turn an open failure into a warning instead of a
fatal error
3. In the profiling code, there is a place in pp_string_fp where
it changes how characters are printed (octal vs. %c). I'm not
exactly sure how that manifests itself.

Perhaps WHINY_USERS should be a bitmask to selectively enable one or
more of these features.
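One way the bitmask idea might look. The bit values below are hypothetical (gawk defines no such constants), and the fallback for a non-numeric value is a design choice made for this sketch, not existing behavior.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* Hypothetical bit assignments for the three WHINY_USERS effects. */
enum {
    WHINY_SORT_ARRAYS = 1 << 0,  /* sorted "for (i in a)" traversal */
    WHINY_OPEN_WARN   = 1 << 1,  /* open failure is only a warning */
    WHINY_PP_CHARS    = 1 << 2   /* profiler's character printing */
};

/* Turn the variable's value into a feature mask. Unset enables
 * nothing; a numeric value is taken as the mask itself; anything
 * else (including the empty string) enables every feature. */
unsigned whiny_mask(const char *value)
{
    char *end;
    unsigned long bits;

    if (value == NULL)
        return 0;
    bits = strtoul(value, &end, 0);
    if (end == value || *end != '\0')
        return WHINY_SORT_ARRAYS | WHINY_OPEN_WARN | WHINY_PP_CHARS;
    return (unsigned)bits;
}
```

Note the trade-off: under this scheme `WHINY_USERS=1`, a natural way to set the variable, would enable only the first feature rather than all three.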

Regards,
Andy

Andrew Schorr

Aug 24, 2008, 2:15:26 PM
On Aug 24, 4:37 am, gaze...@shell.xmission.com (Kenny McCormack) wrote:

> I think what most people are arguing is that you *can't* change the
> default behavior, however much we wish it had been done right (IOHO) in
> the beginning, because it *might* break existing code. The situation is
> much the same as that which has Solaris keeping two very broken programs
> around (and makes them the default on the default PATH). I am
> referring, of course, to their keeping /bin/awk (very broken) and
> /bin/sh (original sh, warts and all, even as the world is moving towards
> the so-called "POSIX" shell).

This whole discussion reminds me of the BEGINFILE/ENDFILE proposal
that was discussed in this group in 2006. If that extension were
implemented, then it would be much easier to handle missing files
(e.g. the BEGINFILE rule could test whether the file is readable and
skip it if not). There has always been an issue of whether a gawk
program might be interested in knowing whether a zero-length file
was supplied as an argument. Currently, there's no simple way to
detect that situation. But with BEGINFILE/ENDFILE, that situation
could be detected, and I think we could also find a way to handle
unreadable files. Perhaps this is worth implementing as an
xgawk extension?
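BEGINFILE/ENDFILE would be awk-level hooks, but the information they would expose can be illustrated in C. The sketch below (scan_file() and enum file_status are hypothetical names) opens one input file and reports whether it could not be opened, was empty, or held records, counting newline-separated records the way awk does with the default RS.

```c
#include <assert.h>
#include <stdio.h>

/* What a per-file hook could learn about one input file. */
enum file_status { FILE_UNREADABLE, FILE_EMPTY, FILE_HAS_RECORDS };

/* Open PATH and count newline-separated records, treating a final line
 * without a trailing newline as a record too.  The count is stored in
 * *NRECORDS (0 when the file cannot be opened). */
static enum file_status scan_file(const char *path, long *nrecords)
{
    FILE *fp;
    int c, prev = '\n';
    long n = 0;

    assert(path != NULL && nrecords != NULL);
    *nrecords = 0;
    if ((fp = fopen(path, "r")) == NULL)
        return FILE_UNREADABLE;        /* errno still says why */
    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            n++;
        prev = c;
    }
    if (prev != '\n')
        n++;                           /* unterminated last record */
    fclose(fp);
    *nrecords = n;
    return n == 0 ? FILE_EMPTY : FILE_HAS_RECORDS;
}
```

A per-file hook built on this could skip FILE_UNREADABLE entries instead of aborting, and treat FILE_EMPTY as the zero-length case that is currently hard to detect. On failure, errno still holds the reason reported by fopen(), roughly the role ERRNO would play in awk.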

Regards,
Andy

Jürgen Kahrs

Aug 24, 2008, 3:21:31 PM
Andrew Schorr wrote:

> There has always been an issue of whether a gawk
> program might be interested in knowing whether a zero-length file
> was supplied as an argument. Currently, there's no simple way to
> detect that situation. But with BEGINFILE/ENDFILE, that situation
> could be detected, and I think we could also find a way to handle
> unreadable files. Perhaps this is worth implementing as an
> xgawk extension?

You probably remember that Peter Saveliev has already
implemented this and supplied a documented patch
against xgawk:

http://lml.ls.fi.upm.es/~mcollado/xmlgawk/b-e-g-summary.html
http://xgawk.radlinux.org/Articles/patch-fileworks/show

Aharon Robbins

Aug 25, 2008, 5:49:08 AM
In article <ea499eca-ec16-4ae2...@p25g2000hsf.googlegroups.com>,

Andrew Schorr <asc...@telemetry-investments.com> wrote:
>3. In the profiling code, there is a place in pp_string_fp where
> it changes how characters are printed (octal vs. %c). I'm not
> exactly sure how that manifests itself.

This was motivated by a different whiny user, so that characters with
the high bit set (e.g. Chinese) come out in the output in the same way
they went in.
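A rough C illustration of the difference described here, assuming a simple quoting scheme: by default, non-printable and high-bit bytes are written as three-digit octal escapes; with the flag set, bytes >= 0x80 (for example, the bytes of an encoded Chinese character) pass through unchanged. The function quote_string() is hypothetical and is not gawk's pp_string_fp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Quote S into OUT (OUTSZ bytes) roughly the way a pretty-printer might
 * write a string literal: printable ASCII is copied (with double quote
 * and backslash escaped), anything else becomes a 3-digit octal escape.
 * With RAW_HIGH_BIT nonzero, bytes >= 0x80 are copied through unchanged
 * instead.  Returns the result length, or -1 if OUT is too small. */
static int quote_string(char *out, size_t outsz, const char *s, int raw_high_bit)
{
    const unsigned char *p;
    size_t len = 0;

    assert(out != NULL && s != NULL && outsz > 0);
    for (p = (const unsigned char *) s; *p != '\0'; p++) {
        char piece[8];
        int n;

        if (*p >= 0x80 && raw_high_bit)
            n = snprintf(piece, sizeof piece, "%c", *p);      /* as-is */
        else if (*p == '"' || *p == '\\')
            n = snprintf(piece, sizeof piece, "\\%c", *p);
        else if (*p >= 0x20 && *p < 0x7f)
            n = snprintf(piece, sizeof piece, "%c", *p);
        else
            n = snprintf(piece, sizeof piece, "\\%03o", (unsigned) *p);
        if (len + (size_t) n >= outsz)
            return -1;              /* no room for this piece plus the NUL */
        for (int i = 0; i < n; i++)
            out[len++] = piece[i];
    }
    out[len] = '\0';
    return (int) len;
}
```

With the flag off, "a\tb\xe4" becomes `a\011b\344`; with it on, the final byte 0xe4 is emitted as-is, so multibyte text reads back the way it was written.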

>Perhaps WHINY_USERS should be a bitmask to selectively enable one or
>more of these features.

Nah.

Arnold

Aharon Robbins

Aug 25, 2008, 5:57:00 AM
In article <48B0A7AB...@lsupcaemnt.com>,

I seem to recall that the gawk manual shows you exactly how to do this
with a library file if it's important to you. I think it was less than 20
lines of code, and all it takes is adding a -f xxx.awk (I forget the name)
to the command line.

We can argue back and forth forever as to what the "right" design decision
is, but as is the case with many things in awk, we (I) am constrained by
both historical practice and standards. Since there's an easy workaround
for those who want it, using standard awk features, I don't see this as
a major issue.

Andrew Schorr

Aug 25, 2008, 9:18:47 AM
On Aug 25, 5:57 am, arn...@skeeve.com (Aharon Robbins) wrote:
> I seem to recall that the gawk manual shows you exactly how to do this
> with a library file if it's important to you. I think it was less than 20
> lines of code, and all it takes is adding a -f xxx.awk (I forget the name)
> to the command line.

Yes, the gawk ARGIND extension gives the ability to
detect zero-length files. It is discussed here:

http://www.gnu.org/manual/gawk/html_node/Empty-Files.html

I think many of the issues discussed here can be addressed by
including awk code libraries, such as that one. Note that xgawk makes
it a bit easier to do this (by adding an @include directive). However,
it seems to me that BEGINFILE would still be needed if there's a desire
to be able to change parsing mode based on the filename. Suppose, for
example, that you wanted to change the value of RS based on the
filename (or, in the case of xgawk, you want to switch to XML parsing
mode for a file that ends in .xml). This needs to be done before the
1st record is read. Is there any way to do this without having a
BEGINFILE hook? In practice, what I do is use getline to take control
of this process instead of passing the filename on the command line.
That solves the problem for me, but I know that some people really
detest getline and prefer to use command-line file arguments for all
processing...
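The key constraint is that the parsing mode has to be fixed before the first record is read. When a program opens its inputs itself, as with the getline approach, that simply means choosing the separator before the read loop starts. The C sketch below assumes a hypothetical rule (names ending in cmdline are NUL-separated, like Linux /proc/<pid>/cmdline; everything else is newline-separated) and uses POSIX getdelim(); pick_separator() and count_records() are illustrative names only.

```c
#define _POSIX_C_SOURCE 200809L   /* for getdelim() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical rule standing in for "set RS based on the filename":
 * names ending in "cmdline" (like /proc/<pid>/cmdline) hold
 * NUL-separated fields; everything else is newline-separated. */
static int pick_separator(const char *path)
{
    static const char suffix[] = "cmdline";
    size_t plen, slen = sizeof suffix - 1;

    assert(path != NULL);
    plen = strlen(path);
    if (plen >= slen && strcmp(path + plen - slen, suffix) == 0)
        return '\0';
    return '\n';
}

/* Open PATH, decide the record separator before reading anything,
 * then count records.  Returns -1 if the file cannot be opened. */
static long count_records(const char *path)
{
    FILE *fp = fopen(path, "r");
    char *rec = NULL;
    size_t cap = 0;
    long n = 0;
    int sep;

    if (fp == NULL)
        return -1;
    sep = pick_separator(path);
    while (getdelim(&rec, &cap, sep, fp) != -1)
        n++;               /* a final record without a separator still counts */
    free(rec);
    fclose(fp);
    return n;
}
```

In awk terms this corresponds to setting RS per file, which is the step a BEGINFILE hook would provide for files named on the command line.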

Regards,
Andy

Kenny McCormack

Sep 1, 2008, 11:36:35 AM
In article <g8m88m$8bl$1...@news4.netvision.net.il>,
Aharon Robbins <arn...@skeeve.com> wrote:
>In article <48AC95D0...@lsupcaemnt.com>,
>Ed Morton <mor...@lsupcaemnt.com> wrote:
>>On 8/20/2008 7:48 AM, Kenny McCormack wrote:
>>> Note to standards jockeys: No, this isn't a bug in your precious GAWK in
>>> the usual "standards" sense. So, don't even bother.
>>>
>>> The fact is that, under certain conditions, it *is* a mis-feature, and it
>>> would be nice to at least have the option of continuing.
>>
>>I agree. A fatal error in this situation stinks.
>
>It's historical practice. Unix awk has worked this way since forever. IF
>you don't need the filenames, you could always use
>
> cat /proc/*/cmdline 2>/dev/null | awk 'program text'

I do need the filenames. Still, this is an interesting "yet another
shell/script kludge" solution to the problem. I imagine if one really
wanted to go down this route, one could kludge something up with
something like "pr" that does preserve the filenames - probably piping
the output of "pr" through AWK to do the cleanup, etc, etc.

>In any case, it would not be a good idea to change gawk's default
>behavior in this case.

Agreed - although it now seems that you have, in fact, put together an
"official" patch for this.

>Kenny: You are, of course, welcome to fork the gawk code base and create
>a language that works to your specifications. You have my blessings.

Well, thank you for that. But as you know, I'm not really interested in
changing the "official" sources. Rather, I explicitly went for a
solution that doesn't require recompiling GAWK itself (but still has the
benefit of being a "system" (C-code level) solution).

Here is my final (?) code for this. It functions as a drop-in
replacement for "gawk" (can be used in the #! line). Compile as
indicated in the comments. Yes, this is unabashedly Linux-specific.

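/* A lib to fix the GAWK missing files problem */
/* Usage: run this file in place of gawk (it also works in a #! line),
 *        or export LD_PRELOAD=/path/to/this/lib and run gawk as usual. */
/* Compile via:
 * gcc -s -W -Wall -Werror -fpic -pie -rdynamic -o libopen_fix.so open_fix.c -ldl
 */

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <time.h>

/* Interposed open64(): try the real one first; if it fails, hand back a
 * descriptor for /dev/null so gawk sees an empty file instead of dying.
 * Set OPENFIX_WARN in the environment to log each substitution. */
int open64(const char *path, int flags, ...) {
    static int (*real_open64)(const char *, int, ...);
    static char *warn;
    mode_t mode = 0;
    int ret;

    if (!real_open64) {
        real_open64 = (int (*)(const char *, int, ...)) dlsym(RTLD_NEXT, "open64");
        if ((warn = getenv("OPENFIX_WARN"))) {
            time_t t = time(NULL);          /* ctime() wants a time_t, not a long */
            fprintf(stderr, "openfix: Initializing at: %s", ctime(&t));
        }
    }
    /* The mode argument is only passed when a file may be created.
     * Forward it; otherwise the real open64() reads an indeterminate mode. */
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t) va_arg(ap, int);
        va_end(ap);
    }
    if ((ret = real_open64(path, flags, mode)) != -1)
        return ret;
    if (warn)
        fprintf(stderr, "openfix: open failed for '%s' (flags = %d)\n", path, flags);
    return real_open64("/dev/null", flags, mode);
}

/* Wrapper mode: point LD_PRELOAD at this very file, then exec the real gawk.
 * LD_PRELOAD is inherited, so commands gawk itself runs (system(), pipes)
 * get the same open64() behavior. */
int main(int argc, char **argv) {
    static char buff[4096];
    ssize_t ret;

    (void) argc;                    /* unused; keeps -W -Werror happy */
    /* readlink() does not NUL-terminate; leave room for the terminator. */
    if ((ret = readlink("/proc/self/exe", buff, sizeof buff - 1)) == -1)
        perror("readlink"), exit(1);
    buff[ret] = 0;
    setenv("LD_PRELOAD", buff, 1);
    /* Use stderr: buffered stdout may be lost when exec replaces the process. */
    if (getenv("OPENFIX_WARN"))
        fprintf(stderr, "LD_PRELOAD = '%s'\n", getenv("LD_PRELOAD"));
    execvp("gawk", argv);
    perror("gawk");                 /* only reached if the exec failed */
    exit(1);
}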

jh

Sep 3, 2008, 7:43:48 AM
Aharon Robbins wrote:
> In article <48B0A7AB...@lsupcaemnt.com>,
> Ed Morton <mor...@lsupcaemnt.com> wrote:
>> Why? If you're going to do that, why not make an empty file a fatal error too?
>> If you care about it, why not test all the files up front and then not open any
>> of them rather than producing partial output? Those are rhetorical questions - I
>> don't really care what the answers are as what to do is just a matter of
>> opinion, BUT a fatal error tears the rug out from under you in terms of handling
>> various input conditions.
>
> I seem to recall that the gawk manual shows you exactly how to do this
> with a library file if it's important to you. I think it was less than 20
> lines of code, and all it takes is adding a -f xxx.awk (I forget the name)
> to the command line.

Yes, it does: "Checking for Readable Data Files" and "Checking for
Zero-length Files", both on p. 195 of the Oct. 2007 version of the
manual, GAWK: Effective AWK Programming.
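One way to check file arguments up front, in the spirit of the manual sections mentioned above, is to drop unreadable operands before the main input loop ever starts. A C analogue, done in a wrapper before exec'ing gawk, might look like the sketch below; keep_readable() and is_assignment() are hypothetical names, and the rules (pass through name=value assignments and "-") are assumptions modeled on how awk treats its operands.

```c
#include <assert.h>
#include <ctype.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Nonzero if ARG has the form name=value, which awk treats as a
 * variable assignment rather than a file name. */
static int is_assignment(const char *arg)
{
    const unsigned char *p = (const unsigned char *) arg;

    if (!(isalpha(*p) || *p == '_'))
        return 0;
    p++;
    while (isalnum(*p) || *p == '_')
        p++;
    return *p == '=';
}

/* Drop unreadable file operands from argv[1..argc-1] in place, keeping
 * assignments and "-" (standard input).  Returns the new argc.
 * argv must have room for argc + 1 entries (a real argv does). */
static int keep_readable(int argc, char **argv)
{
    int out = 1;

    assert(argc >= 1 && argv != NULL);
    for (int in = 1; in < argc; in++) {
        if (strcmp(argv[in], "-") != 0 && !is_assignment(argv[in])) {
            /* O_NONBLOCK so that a FIFO operand does not block the check */
            int fd = open(argv[in], O_RDONLY | O_NONBLOCK);

            if (fd == -1)
                continue;          /* unreadable right now: drop it */
            close(fd);
        }
        argv[out++] = argv[in];
    }
    argv[out] = NULL;
    return out;
}
```

Because the check and gawk's later open happen at different times, a file can vanish in between; files under /proc are a typical example, so any check-then-open scheme like this one is inherently racy.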

Kenny McCormack

Sep 3, 2008, 8:05:55 AM
In article <3pqdnd1l773o5SPV...@neonova.net>,

Keep in mind that none of these "script kludges" are effective in the
cases where it matters (e.g., when processing files in /proc).

Andrew Schorr

Sep 3, 2008, 10:58:52 AM
On Sep 3, 8:05 am, gaze...@shell.xmission.com (Kenny McCormack) wrote:
> Keep in mind that none of these "script kludges" are effective in the
> cases where it matters (e.g., when processing files in /proc).

That's true. I suspect the only way to handle such a situation is to
have a BEGINFILE rule that is called after the open has been
attempted. If the file open failed, then BEGINFILE might be called
with ERRNO set to a non-empty string. Inside the BEGINFILE rule, one
could call nextfile to skip on error. If nextfile is not called from
BEGINFILE, then this would be a fatal error (after BEGINFILE
processing has completed). That approach would give the same behavior
as now in the default case (where there is no BEGINFILE rule), but it
would provide a hook for recovering if the file open fails (by calling
nextfile from BEGINFILE if ERRNO is non-empty).
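The protocol proposed here can be sketched as a driver loop in C: attempt the open first, hand the hook the filename and an ERRNO-like string, honor a nextfile request, and otherwise treat an open failure as fatal. The types and names (enum hook_action, beginfile_hook, run_files(), skip_unreadable()) are hypothetical; this models the proposed semantics, not any existing patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* What a BEGINFILE-style hook can ask the driver to do. */
enum hook_action { HOOK_CONTINUE, HOOK_NEXTFILE };

/* Hook gets FILENAME plus an ERRNO-like string, empty if the open worked. */
typedef enum hook_action (*beginfile_hook)(const char *filename, const char *err);

/* Example hook, roughly the proposed BEGINFILE { if (ERRNO != "") nextfile } */
static enum hook_action skip_unreadable(const char *filename, const char *err)
{
    (void) filename;
    return err[0] != '\0' ? HOOK_NEXTFILE : HOOK_CONTINUE;
}

/* Process each file in turn.  The open is attempted before the hook runs,
 * so the hook can see why it failed.  If the open failed and the hook did
 * not ask to skip (or there is no hook), that is fatal, matching today's
 * default.  Returns the number of files read, or -1 on a fatal error. */
static int run_files(int nfiles, char **files, beginfile_hook hook)
{
    int processed = 0;

    assert(nfiles >= 0 && (nfiles == 0 || files != NULL));
    for (int i = 0; i < nfiles; i++) {
        FILE *fp = fopen(files[i], "r");
        char err[128] = "";                 /* ERRNO analogue */
        enum hook_action act;

        if (fp == NULL)
            snprintf(err, sizeof err, "%s", strerror(errno));
        act = hook ? hook(files[i], err) : HOOK_CONTINUE;
        if (act == HOOK_NEXTFILE) {
            if (fp != NULL)
                fclose(fp);
            continue;
        }
        if (fp == NULL) {
            fprintf(stderr, "fatal: cannot open `%s' (%s)\n", files[i], err);
            return -1;
        }
        /* ... the normal record loop (and ENDFILE) would run here ... */
        fclose(fp);
        processed++;
    }
    return processed;
}
```

With no hook, run_files() stops at the first missing file, mirroring the current default; passing skip_unreadable lets it carry on with the remaining files.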

Unfortunately, I don't think either of the 2 BEGINFILE/ENDFILE patches
currently floating around gives this behavior. But perhaps it could
be achieved?

Regards,
Andy

Kenny McCormack

Sep 3, 2008, 11:59:24 AM
In article <fdce0196-6529-4e0f...@c58g2000hsc.googlegroups.com>,
Andrew Schorr <asc...@telemetry-investments.com> wrote:
...

>Unfortunately, I don't think either of the 2 BEGINFILE/ENDFILE patches
>currently floating around gives this behavior. But perhaps it could
>be achieved?

I don't understand why you guys are still beating this horse.

The problem has been solved.

Time to move on - and slay other dragons.

Jürgen Kahrs

Sep 3, 2008, 12:39:30 PM
Kenny McCormack wrote:

> I don't understand why you guys are still beating this horse.
>
> The problem has been solved.

This is probably because of other ongoing discussions
that are not visible here in comp.lang.awk.
