1) That the reason I need this is because I am processing /proc/*,
which can (obviously) change at any moment.
2) I put together a shared (LD_PRELOAD) library/executable that
solves the problem quite nicely for me.
Well, now I've found a new wrinkle in this. If an input file is a
directory, you also get a fatal (uncatchable) error, and my
library/executable doesn't help that. I assume I could do another such
hack that would, but not sure I want to invest the time at the moment.
Who here has not done something like: gawk -f program *
and had it blow up because there was a directory in amongst the files in *?
I ask you... And so I ask again, wouldn't it be a good thing if GAWK
could be told to silently ignore these sorts of problems?
Sorry, my first reaction is that if you care, put a shell wrapper
around the problem. I tend to use the available tools I'm familiar
with as best I can in combination -- not modify a tool if another
in the 'toolbox' will do.
Like a chisel can carve wood on its own, but needs a tap (or belt)
from a hammer to do the heavy work :)
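
For what it's worth, the wrapper idea above could look something like the sketch below. The function name awk_nodirs is made up for illustration, and it assumes the awk program lives in a file passed as the first argument. It drops directories from the argument list and hands the rest to awk unchanged.

```sh
#!/bin/sh
# awk_nodirs: hypothetical wrapper name, not part of any awk distribution.
# Usage: awk_nodirs program.awk file...
awk_nodirs() {
    prog=$1
    shift
    # Rebuild "$@" without the directories; each argument is shifted off
    # the front and, unless it is a directory, appended again at the back.
    for f in "$@"; do
        shift
        [ -d "$f" ] || set -- "$@" "$f"
    done
    # If nothing is left, read /dev/null instead of blocking on stdin.
    [ "$#" -gt 0 ] || set -- /dev/null
    awk -f "$prog" "$@"
}
```

Called as: awk_nodirs program *
Arguments that are not directories (including var=value assignments) pass through untouched.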
Grant.
--
http://bugsplatter.id.au
BEGIN {
    for (i = 1; i < ARGC; i++)
        if (system("test -d '" ARGV[i] "'") == 0)   # exit status 0: a directory
            delete ARGV[i]
}
A five line addition to your script solves the command-line issue.
Or a shell wrapper.
I will note that of five awks tested, only the MKS awk silently ignores
a directory on the command line. All others (nawk, mawk, gawk, busybox
awk) treat it as a fatal error.
Arnold
In article <h0mg5h$kh8$1...@news.xmission.com>,
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL
Kewl. I'll look into that. Thanks.
Of course I'm making the assumption that processing ARGV[i] is independent
of processing ARGV[i-1], so overall the program still functions properly.
But isn't that exactly the same assumption that deleting directories from
ARGV as an initial processing step is making?
- Anton Treuenfels
"Aharon Robbins" <arn...@skeeve.com> wrote in message
news:h0nf3r$8e$1...@news.bytemine.net...
Both historical compatibility and POSIX compliance, IIRC.
>As opposed to, say, issuing a warning that "<Name> cannot be opened"
>or "<Name> is a directory" and simply moving on to the next ARGV[i]?
I should double check the standard - if it's supposed to be a fatal error,
maybe I'll move that to --posix for the next release.
>Of course I'm making the assumption that processing ARGV[i] is independent
>of processing ARGV[i-1], so overall the program still functions properly.
>But isn't that exactly the same assumption that deleting directories from
>ARGV as an initial processing step is making?
Not sure what you're saying here; once gawk finds that a command line
file is a directory it dies. It doesn't notice that an argument names
a directory until it tries to open it during the main loop.
Arnold
Sorry if I wasn't clear. I was referring to your suggested fix in reply to
the OP. It appears to delete directories from ARGV[] in a BEGIN section
prior to entering the main loop. From an execution perspective this is
indistinguishable from "silently skipping" directories, and only slightly
distinguishable from issuing a warning as well as skipping.
Actually I think I might have been a little hasty in claiming that for this
to always work the processing of any ARGV[i] had to be independent of the
processing of any previous ARGV[1..i-1]. Since directories are never
processed in any circumstance, no subsequent file could possibly depend on
any information they might contain. So they can be skipped with confidence
in all cases, at least as far as file dependencies go.
My point is simply that since this is so, why not codify it as a built-in
behavior? The BEGIN {} suggestion works perfectly well for its intended
purpose, but why should it be necessary at all?
But. I don't know if this is the POSIX standard, but at www.opengroup.org
there is an AWK standard. It does say in one section that input files
"shall" be text files, and in another that "if any file operand is specified
and the named file cannot be accessed" awk "shall" roll over and die.
To finesse that and stay within the standard one might argue that a
directory file is not a text file, hence it is not a valid input file, hence
suicide is not necessarily required since only for input files is this
demanded.
Another approach might be to argue that the key word is "accessed", and if a
directory can be accessed (although not necessarily processed), suicide is
again avoided in favor of skipping. An implementation might do something
along the lines of (1) OS call: does file exist? No -> cannot be accessed,
die; (2) OS call: is file type text? No -> skip (possible warning); (3) OS
call: open file (continue normally)
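
A rough shell sketch of that three-step check, purely as illustration. The function name classify_operand and its return codes are invented here. Since there is no OS call that answers "is this a text file?", step (2) is swapped for the narrower test "is this a directory?". It also assumes every operand is a file name, not a var=value assignment.

```sh
# classify_operand: hypothetical helper, not part of any awk.
# Returns 0 = open it normally, 1 = skip with a warning, 2 = fatal.
classify_operand() {
    if [ ! -e "$1" ]; then
        # Step 1: the file does not exist, so it cannot be accessed.
        echo "fatal: $1 cannot be accessed" >&2
        return 2
    elif [ -d "$1" ]; then
        # Step 2 (narrowed): a directory is accessible but not usable input.
        echo "warning: $1 is a directory, skipped" >&2
        return 1
    fi
    # Step 3: anything else is left for the normal open.
    return 0
}
```

A wrapper could call this once per operand, drop the operands that return 1, and give up on the first one that returns 2.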
- Anton Treuenfels
In article <YrGdneXbTMIIRqzX...@earthlink.com>,
Anton Treuenfels <teamt...@yahoo.com> wrote:
>> Not sure what you're saying here; once gawk finds that a command line
>> file is a directory it dies. It doesn't notice that an argument names
>> a directory until it tries to open it during the main loop.
>
>Sorry if I wasn't clear. I was referring to your suggested fix in reply to
>the OP. It appears to delete directories from ARGV[] in a BEGIN section
>prior to entering the main loop. From an execution perspective this is
>indistinguishable from "silently skipping" directories, and only slightly
>distinguishable from issuing a warning as well as skipping.
True.
>Actually I think I might have been a little hasty in claiming that for this
>to always work the processing of any ARGV[i] had to be independent of the
>processing of any previous ARGV[1..i-1]. Since directories are never
>processed in any circumstance, no subsequent file could possibly depend on
>any information they might contain. So they can be skipped with confidence
>in all cases, at least as far as file dependencies go.
Also true.
>My point is simply that since this is so, why not codify it as a built-in
>behavior? The BEGIN {} suggestion works perfectly well for its intended
>purpose, but why should it be necessary at all?
As I said, for both historical compatibility and POSIX compliance.
>But. I don't know if this is the POSIX standard, but at www.opengroup.org
>there is an AWK standard.
That's the one.
>It does say in one section that input files
>"shall" be text files, and in another that "if any file operand is specified
>and the named file cannot be accessed" awk "shall" roll over and die.
That's it. That means gawk is currently compliant.
>To finesse that and stay within the standard one might argue that a
>directory file is not a text file, hence it is not a valid input file, hence
>suicide is not necessarily required since only for input files is this
>demanded.
Exactly 180 degrees wrong. Since it's not a valid input file, gawk should
roll over and die, as it does.
I think the correct solution here is to move the rolling over and dying into
--posix mode and skip directories with a warning as the default. I will try to
get this into the development version.
I do think we've beaten this topic pretty much to death, now.
Thanks,
Yes. But please don't forget the suggestion made some time ago about a
possible gawk extension w.r.t. directories as input "files": instead of
just skipping them, they could be processed as a list of filenames, one
name per record.
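
Until something like that exists, the effect can be approximated from the shell. This is only a sketch: cat_with_dirs is a made-up name, awk sees everything on standard input (so FILENAME is lost), and file names containing newlines will be split into several records.

```sh
# cat_with_dirs: hypothetical helper, not a gawk feature.
# Regular files are copied through as-is; for a directory, its entry
# names are written instead, one name per line.
cat_with_dirs() {
    for f in "$@"; do
        if [ -d "$f" ]; then
            ls -1A -- "$f"
        else
            cat -- "$f"
        fi
    done
}
```

Used as: cat_with_dirs * | gawk -f program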
Regards.
--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado
In article <h14rmm$741$1...@heraldo.rediris.es>,
Manuel Collado <m.co...@invalid.domain> wrote:
>Yes. But please don't forget the suggestion made some time ago about a
>possible gawk extension w.r.t. directories as input "files": instead of
>just skipping them, they could be processed as a list of filenames, one
>name per record.
I'd prefer to see this as a loadable built-in that took advantage of the
open-hook mechanisms.
Anyone interested in doing this is welcome to submit a patch. It
shouldn't be all that hard.