GAWK fatal error if input file is a directory

Kenny McCormack

Jun 9, 2009, 4:20:01 PM
Those of you with good memories will remember a thread I started up here
about a year ago that concerned the problem that GAWK will blow up with
a fatal, uncatchable error if it cannot open an input file. Various
shell workarounds, etc, were proposed as well as the usual "Why do you
want to do this?" type queries. The upshot of that thread was:

1) That the reason I need this is because I am processing /proc/*,
which can (obviously) change at any moment.
2) I put together a shared (LD_PRELOAD) library/executable that
solves the problem quite nicely for me.

Well, now I've found a new wrinkle in this. If an input file is a
directory, you also get a fatal (uncatchable) error, and my
library/executable doesn't help that. I assume I could do another such
hack that would, but not sure I want to invest the time at the moment.

Who here has not done something like: gawk -f program *
and had it blow up because there was a directory in amongst the files in *?

I ask you... And so I ask again, wouldn't it be a good thing if GAWK
could be told to silently ignore these sorts of problems?

Grant

Jun 9, 2009, 5:17:50 PM

Sorry, my first reaction is that if you care, put a shell wrapper
around the problem. I tend to use the available tools I'm familiar
with as best I can in combination -- not modify a tool if another
in the 'toolbox' will do.

Like a chisel can carve wood on its own, but needs a tap (or belt)
from a hammer to do the heavy work :)

Grant.
--
http://bugsplatter.id.au

Aharon Robbins

Jun 10, 2009, 1:08:11 AM
As a result of the previous thread, the input file is a directory error
has been fixed in CVS. I suggest you start using the CVS code instead of
the released code. (I have started a release spiral with my maintainers;
I'll announce a public beta when it's ready.) You could also add this
to your code:

BEGIN {
    for (i = 1; i < ARGC; i++)
        if (system("test -d '" ARGV[i] "'") == 0)  # exit status 0: ARGV[i] is a directory
            delete ARGV[i]
}

A five line addition to your script solves the command-line issue.
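
One caveat with the snippet above: it wraps ARGV[i] in single quotes to build the `test -d` command, so a file name that itself contains a single quote would break that command. What follows is a minimal sketch of a quote-safe variant, packaged as a shell function. The names `skip_dirs_awk`, `shell_quote`, and `awk_skip_dirs` are made up for illustration, `AWK` is an assumed override variable, and the warning on stderr is an addition that was not part of the original suggestion.

```shell
#!/bin/sh
# Sketch only: skip_dirs_awk, shell_quote and awk_skip_dirs are hypothetical names.

# awk source that drops directory operands from ARGV before the main loop.
# \047 is a single quote written in octal, so the here-document stays quote-free.
skip_dirs_awk=$(cat <<'EOF'
# Wrap s in single quotes for the shell; each embedded single quote
# is closed, backslash-escaped, and reopened.
function shell_quote(s,    n, i, parts, out) {
    n = split(s, parts, "\047")
    out = parts[1]
    for (i = 2; i <= n; i++)
        out = out "\047\\\047\047" parts[i]
    return "\047" out "\047"
}
BEGIN {
    for (i = 1; i < ARGC; i++)
        if (system("test -d " shell_quote(ARGV[i])) == 0) {
            # Portable way to write to stderr from awk.
            print "warning: skipping directory " ARGV[i] | "cat 1>&2"
            delete ARGV[i]
        }
    close("cat 1>&2")
}
EOF
)

# usage: awk_skip_dirs 'program text' file...
# Prepends the BEGIN rule above to the caller's program text.
awk_skip_dirs() {
    prog=$1
    shift
    "${AWK:-awk}" "$skip_dirs_awk
$prog" "$@"
}
```

For example, `awk_skip_dirs '{ print FILENAME ": " $0 }' *` would process the regular files matched by `*` and warn about each directory. If every operand turns out to be a directory, awk falls back to reading standard input, as it does whenever no file operands remain.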

Or a shell wrapper.

I will note that of five awks tested, only the MKS awk silently ignores
a directory on the command line. All others (nawk, mawk, gawk, busybox
awk) treat it as a fatal error.

Arnold

In article <h0mg5h$kh8$1...@news.xmission.com>,


--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL

Kenny McCormack

Jun 10, 2009, 7:45:51 AM
In article <h0nf3r$8e$1...@news.bytemine.net>,

Aharon Robbins <arn...@skeeve.com> wrote:
>As a result of the previous thread, the input file is a directory error
>has been fixed in CVS. I suggest you start using the CVS code instead of
>the released code. (I have started a release spiral with my maintainers;
>I'll announce a public beta when it's ready.) You could also add this
>to your code:

Kewl. I'll look into that. Thanks.

Anton Treuenfels

Jun 10, 2009, 11:01:28 PM
Just out of curiosity, why treat trying to open a directory as a fatal
error? As opposed to, say, issuing a warning that "<Name> cannot be opened"
or "<Name> is a directory" and simply moving on to the next ARGV[i]?

Of course I'm making the assumption that processing ARGV[i] is independent
of processing ARGV[i-1], so overall the program still functions properly.
But isn't that exactly the same assumption that deleting directories from
ARGV as an initial processing step is making?

- Anton Treuenfels

"Aharon Robbins" <arn...@skeeve.com> wrote in message
news:h0nf3r$8e$1...@news.bytemine.net...

Aharon Robbins

Jun 11, 2009, 12:58:46 PM
In article <Q6Sdnb8FVsIF7q3X...@earthlink.com>,

Anton Treuenfels <teamt...@yahoo.com> wrote:
>Just out of curiosity, why treat trying to open a directory as a fatal
>error?

Both historical compatibility and POSIX compliance, IIRC.

>As opposed to, say, issuing a warning that "<Name> cannot be opened"
>or "<Name> is a directory" and simply moving on to the next ARGV[i]?

I should double check the standard - if it's supposed to be a fatal error,
maybe I'll move that to --posix for the next release.

>Of course I'm making the assumption that processing ARGV[i] is independent
>of processing ARGV[i-1], so overall the program still functions properly.
>But isn't that exactly the same assumption that deleting directories from
>ARGV as an initial processing step is making?

Not sure what you're saying here; once gawk finds that a command line
file is a directory it dies. It doesn't notice that an argument names
a directory until it tries to open it during the main loop.

Arnold

Anton Treuenfels

Jun 12, 2009, 12:37:38 AM

"Aharon Robbins" <arn...@skeeve.com> wrote in message
news:h0rd46$nh6$1...@news.bytemine.net...

> In article <Q6Sdnb8FVsIF7q3X...@earthlink.com>,
> Anton Treuenfels <teamt...@yahoo.com> wrote:
>>Just out of curiosity, why treat trying to open a directory as a fatal
>>error?
>
> Both historical compatibility and POSIX compliance, IIRC.
>
>>As opposed to, say, issuing a warning that "<Name> cannot be opened"
>>or "<Name> is a directory" and simply moving on to the next ARGV[i]?
>
> I should double check the standard - if it's supposed to be a fatal error,
> maybe I'll move that to --posix for the next release.
>
>>Of course I'm making the assumption that processing ARGV[i] is independent
>>of processing ARGV[i-1], so overall the program still functions properly.
>>But isn't that exactly the same assumption that deleting directories from
>>ARGV as an initial processing step is making?
>
> Not sure what you're saying here; once gawk finds that a command line
> file is a directory it dies. It doesn't notice that an argument names
> a directory until it tries to open it during the main loop.

Sorry if I wasn't clear. I was referring to your suggested fix in reply to
the OP. It appears to delete directories from ARGV[] in a BEGIN section
prior to entering the main loop. From an execution perspective this is
indistinguishable from "silently skipping" directories, and only slightly
distinguishable from issuing a warning as well as skipping.

Actually I think I might have been a little hasty in claiming that for this
to always work the processing of any ARGV[i] had to be independent of the
processing of any previous ARGV[1..i-1]. Since directories are never
processed in any circumstance, no subsequent file could possibly depend on
any information they might contain. So they can be skipped with confidence
in all cases, at least as far as file dependencies go.

My point is simply that since this is so, why not codify it as a built-in
behavior? The BEGIN {} suggestion works perfectly well for its intended
purpose, but why should it be necessary at all?

But. I don't know if this is the POSIX standard, but at www.opengroup.org
there is an AWK standard. It does say in one section that input files
"shall" be text files, and in another that "if any file operand is specified
and the named file cannot be accessed" awk "shall" roll over and die.

To finesse that and stay within the standard one might argue that a
directory file is not a text file, hence it is not a valid input file, hence
suicide is not necessarily required since only for input files is this
demanded.

Another approach might be to argue that the key word is "accessed", and if a
directory can be accessed (although not necessarily processed), suicide is
again avoided in favor of skipping. An implementation might do something
along the lines of (1) OS call: does file exist? No -> cannot be accessed,
die; (2) OS call: is file type text? No -> skip (possible warning); (3) OS
call: open file (continue normally)
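
The three-step check described above can be approximated outside awk with the shell's file tests. This is a minimal sketch under stated assumptions: `awk_checked` is a hypothetical name, `AWK` is an assumed override variable, and because no OS call reports whether a file is "text", step (2) is replaced by the weaker question "is it a regular file?".

```shell
#!/bin/sh
# Sketch only: awk_checked is a hypothetical wrapper, not part of any awk.
# usage: awk_checked 'program text' file...
awk_checked() {
    prog=$1
    shift
    # Rebuild "$@" in place: each pass drops the operand being examined
    # and re-appends it only if it passes both checks.
    for f in "$@"; do
        shift
        if [ ! -e "$f" ]; then
            # Step 1: the operand cannot be accessed, so give up (fatal).
            printf '%s\n' "fatal: $f cannot be accessed" >&2
            return 2
        elif [ ! -f "$f" ]; then
            # Step 2 (approximation): exists but is not a regular file, so skip it.
            printf '%s\n' "warning: skipping $f (not a regular file)" >&2
        else
            set -- "$@" "$f"
        fi
    done
    # With no operands left, awk would read standard input; return instead.
    [ "$#" -gt 0 ] || return 0
    # Step 3: open and process the surviving operands normally.
    "${AWK:-awk}" "$prog" "$@"
}
```

Because the checks run before awk opens anything, a file that disappears in between (the /proc case from the start of the thread) would still produce awk's own fatal error.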

- Anton Treuenfels

Aharon Robbins

Jun 12, 2009, 4:19:51 AM
Hi.

In article <YrGdneXbTMIIRqzX...@earthlink.com>,


Anton Treuenfels <teamt...@yahoo.com> wrote:
>> Not sure what you're saying here; once gawk finds that a command line
>> file is a directory it dies. It doesn't notice that an argument names
>> a directory until it tries to open it during the main loop.
>
>Sorry if I wasn't clear. I was referring to your suggested fix in reply to
>the OP. It appears to delete directories from ARGV[] in a BEGIN section
>prior to entering the main loop. From an execution perspective this is
>indistinguishable from "silently skipping" directories, and only slightly
>distinguishable from issuing a warning as well as skipping.

True.

>Actually I think I might have been a little hasty in claiming that for this
>to always work the processing of any ARGV[i] had to be independent of the
>processing of any previous ARGV[1..i-1]. Since directories are never
>processed in any circumstance, no subsequent file could possibly depend on
>any information they might contain. So they can be skipped with confidence
>in all cases, at least as far as file dependencies go.

Also true.

>My point is simply that since this is so, why not codify it as a built-in
>behavior? The BEGIN {} suggestion works perfectly well for its intended
>purpose, but why should it be necessary at all?

As I said, for both historical compatibility and POSIX compliance.

>But. I don't know if this is the POSIX standard, but at www.opengroup.org
>there is an AWK standard.

That's the one.

>It does say in one section that input files
>"shall" be text files, and in another that "if any file operand is specified
>and the named file cannot be accessed" awk "shall" roll over and die.

That's it. That means gawk is currently compliant.

>To finesse that and stay within the standard one might argue that a
>directory file is not a text file, hence it is not a valid input file, hence
>suicide is not necessarily required since only for input files is this
>demanded.

Exactly 180 degrees wrong. Since it's not a valid input file, gawk should
roll over and die, as it does.

I think the correct solution here is to move the rolling over and dying into
--posix mode and skip directories with a warning as the default. I will try to
get this into the development version.

I do think we've beaten this topic pretty much to death, now.

Thanks,

Manuel Collado

Jun 15, 2009, 3:00:51 AM
Aharon Robbins wrote:
> ...

> I think the correct solution here is to move the rolling over and dying into
> --posix mode and skip directories with a warning as the default. I will try to
> get this into the development version.
>
> I do think we've beaten this topic pretty much to death, now.

Yes. But please don't forget the suggestion made sometime ago about a
possible gawk extension w.r.t. directories as input "files": instead of
just skipping them, they could be processed as a list of filenames, one
name per record.
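
Until such an extension exists, the idea can be emulated from the shell. The sketch below is only an approximation under stated assumptions: `awk_with_dirs` is a hypothetical name, `AWK` is an assumed override variable, and all input reaches awk through a pipe rather than through gawk itself.

```shell
#!/bin/sh
# Sketch only: awk_with_dirs is a hypothetical wrapper, not a gawk feature.
# usage: awk_with_dirs 'program text' file-or-directory...
awk_with_dirs() {
    prog=$1
    shift
    for f in "$@"; do
        if [ -d "$f" ]; then
            # A directory contributes its entry names, one per line.
            ls -1A -- "$f"
        else
            # Anything else contributes its contents unchanged.
            cat -- "$f"
        fi
    done | "${AWK:-awk}" "$prog"
}
```

Since awk reads everything from standard input here, FILENAME is not meaningful and FNR does not reset between operands, and an entry name containing a newline would be split across records.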

Regards.
--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado

Aharon Robbins

Jun 15, 2009, 3:24:49 PM
Aharon Robbins wrote:

>> I think the correct solution here is to move the rolling over and dying into
>> --posix mode and skip directories with a warning as the default. I will try to
>> get this into the development version.
>>
>> I do think we've beaten this topic pretty much to death, now.

In article <h14rmm$741$1...@heraldo.rediris.es>,


Manuel Collado <m.co...@invalid.domain> wrote:
>Yes. But please don't forget the suggestion made sometime ago about a
>possible gawk extension w.r.t. directories as input "files": instead of
>just skipping them, they could be processed as a list of filenames, one
>name per record.

I'd prefer to see this as a loadable built-in that took advantage of the
open-hook mechanisms.

Anyone interested in doing this is welcome to submit a patch. It
shouldn't be all that hard.
