
dynamic input file names


igor2

Jan 18, 2010, 5:14:15 AM
Hi all,

I have an awk script that I call as "awk -f config.awk -f scriptname.awk
input1 input2"; it processes the two input files as expected. File
config.awk contains a single BEGIN {} block with all the configuration
stored in awk variables. Lately I found it would be better to have the
input file names defined in config.awk. I could define them somewhere
else and wrap the whole thing in a shell script that substitutes the
file names into the awk command line, but I would like to avoid that
for specific reasons. Nor do I want to use a lot of getline calls.

Is there a way to get awk to open new input files (or all input files)
whose names I generate in BEGIN, for example by modifying ARGV[]? If
there is a solution for this, is it portable? Or is there any other
creative idea that achieves something similar without getline or a
wrapper around the script?

TIA,

Tibor Palinkas

Grant

Jan 18, 2010, 4:06:29 PM

This is a code fragment that inserts or appends file names into gawk's
argument list. I don't know if it is portable, but perhaps there's a clue here:
BEGIN {
    ...
    # now massage the command line argument list
    i = 1
    while (i < ARGC && ARGV[i] ~ /=/)   # skip past var=val list
        ++i
    if (i < ARGC) {
        for (j = ARGC; j >= i; j--)     # make room for ip2c + services
            ARGV[j + room_needed] = ARGV[j]
        if (got_ip2c) {                 # insert ip2c-database
            ARGV[i++] = ip2c_index; ++ARGC
            ARGV[i++] = ip2c_names; ++ARGC
        }
        ARGV[i] = services; ++ARGC      # insert services
    }
    else {
        if (got_ip2c) {                 # append ip2c-database
            ARGV[ARGC++] = ip2c_index
            ARGV[ARGC++] = ip2c_names
        }
        ARGV[ARGC++] = services         # append services
        ARGV[ARGC++] = "-"
    }
...
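The fragment above depends on names defined in the elided parts (room_needed, got_ip2c, services and so on), so it cannot run as shown. What follows is a minimal, self-contained sketch of just its core idea, assuming a POSIX sh and a POSIX awk: skip leading var=value operands, shift the remaining ARGV entries up one slot, and drop an extra file into the gap. The file names and the variable tag are hypothetical, and the pattern is written /[=]/, which matches the same strings as /=/.

```shell
#!/bin/sh
# Sketch only: put one extra file in front of the user's file operands,
# after any leading var=value assignments, as the fragment above does.
# "extra.txt", "user.txt" and the variable "tag" are hypothetical names.
dir=$(mktemp -d) || exit 1
printf 'extra line\n' > "$dir/extra.txt"
printf 'user line\n' > "$dir/user.txt"

out=$(awk -v extra="$dir/extra.txt" '
BEGIN {
    i = 1
    while (i < ARGC && ARGV[i] ~ /[=]/)   # skip var=value operands
        i++
    for (j = ARGC - 1; j >= i; j--)       # shift the rest up one slot
        ARGV[j + 1] = ARGV[j]
    ARGV[i] = extra                       # the freed slot gets the new file
    ARGC++
}
{ print tag ": " $0 }
' tag=T "$dir/user.txt")

printf '%s\n' "$out"
rm -rf "$dir"
```

Run as-is it should print "T: extra line" followed by "T: user line": the assignment operand is still processed first, then the inserted file, then the user's file.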

This snippet from another script pushes the next file to read onto the
ARGV list and then switches to reading the new file:
...
# read .conf to get datafile location
/^$|^#|^junkview/ { next }
FNR == NR && /^datapath/ {
    ARGV[ARGC++] = $2 "/ip2c-names"
    datafile = $2 "/ip2c-data"
    nextfile
}
# [process ip2c-names]
FNR != NR {
    ...
}

Is 'nextfile' a gawk extension?
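Since nextfile may not be available in every awk version, here is a hedged variant of the second snippet that does without it: the config lines are simply consumed with next while FNR == NR. It is a sketch under stated assumptions: the config format (a "datapath DIR" line), the file name names and the temporary directory are hypothetical, and DIR must not contain blanks because it is taken from $2.

```shell
#!/bin/sh
# Sketch only: the first operand is a small config file; its "datapath DIR"
# line makes the script push DIR "/names" onto ARGV, and awk reads that file
# after the config. File names are hypothetical; DIR must not contain blanks.
dir=$(mktemp -d) || exit 1
printf '# comment\ndatapath %s\n' "$dir" > "$dir/app.conf"
printf 'alpha\nbeta\n' > "$dir/names"

out=$(awk '
/^$|^#/ { next }                 # skip blank and comment lines
FNR == NR {                      # still inside the config file
    if ($1 == "datapath")
        ARGV[ARGC++] = $2 "/names"
    next                         # config lines are not printed; no nextfile needed
}
{ print FILENAME ": " $0 }       # lines of the pushed file
' "$dir/app.conf")

printf '%s\n' "$out"
rm -rf "$dir"
```

The FNR == NR test only distinguishes the first file if that file is non-empty, which holds for any config that contains the datapath line.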

Grant.
--
http://bugs.id.au

Janis Papanagnou

Jan 18, 2010, 4:18:12 PM
igor2 wrote:
> [...]
> Is there a way to get awk to open new input files (or all input files) in
> a way that I generate the input file names in BEGIN? Like if I would
> modify ARGV[] or something...

So just modify ARGV[], then. For example (if I understand your demands
correctly)...


$ more *

::::::::::::::
aaa
::::::::::::::
A line 1
A line 2
A line 3

::::::::::::::
bbb
::::::::::::::
B line 1
B line 2
B line 3
B line 4
B line 5

::::::::::::::
config.awk
::::::::::::::
BEGIN {
    n = split("aaa;bbb",files,";")   ### [*]
    for (i=1; i<=n; i++)
        ARGV[ARGC++] = files[i]
}

::::::::::::::
run.awk
::::::::::::::
{ print NR, FNR, $0 }

$ awk -f config.awk -f run.awk
1 1 A line 1
2 2 A line 2
3 3 A line 3
4 1 B line 1
5 2 B line 2
6 3 B line 3
7 4 B line 4
8 5 B line 5


[*] You can, of course, pass the string of file names as variable(s), too.
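A minimal sketch of the [*] note, assuming the list arrives through -v; the variable name filelist is hypothetical, and config.awk and run.awk are folded into one program so the example is self-contained.

```shell
#!/bin/sh
# Sketch of the [*] note: hand the file list to the BEGIN block with -v
# instead of hard-coding it. The variable name "filelist" is hypothetical,
# and config.awk and run.awk are folded into one program for brevity.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
printf 'A line 1\n' > aaa
printf 'B line 1\n' > bbb

out=$(awk -v filelist="aaa;bbb" '
BEGIN {
    n = split(filelist, files, ";")
    for (i = 1; i <= n; i++)
        ARGV[ARGC++] = files[i]
}
{ print NR, FNR, $0 }
')

printf '%s\n' "$out"
cd / && rm -rf "$dir"
```

It should print "1 1 A line 1" and then "2 1 B line 1". Because -v assignments happen before BEGIN runs, the list is already available when ARGV is extended.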

Janis

igor2

Jan 19, 2010, 1:12:20 AM
On Mon, 18 Jan 2010, Janis Papanagnou wrote:

>igor2 wrote:
>> [...]
>
>So just modify ARGV[], then. For example (if I understand your demands
>correctly)...
>

Tried, and it worked. Both with gawk and mawk. However, it still feels a
bit wicked. Is it portable? Would most other awk implementations do the
same?

I tried it with mawk 1.3.3 and gawk 3.1.6. If anyone here has different
implementations or much older/newer versions, I would be glad to know whether
it works there. Here is a minimal test script with the full command line:

awk '
BEGIN { ARGV[ARGC++] = "infile2" }
{ print "From " FILENAME ": " $0 }
' infile1

This prints all lines of infile1 first, then all lines of infile2.
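The original question also asked about defining all input files in BEGIN, in which case any operands given on the command line would have to be ignored. POSIX says an ARGV element set to the null string is not treated as an input file, so a sketch (file names hypothetical) can blank out the existing operands before appending its own. Note that this also discards any var=value operands.

```shell
#!/bin/sh
# Sketch only: ignore whatever file operands were given on the command line
# and read only the files named in BEGIN. Empty ARGV elements are skipped,
# so blanking the old entries is enough. File names are hypothetical.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
printf 'one\n' > infile1
printf 'two\n' > infile2
printf 'three\n' > ignored

out=$(awk '
BEGIN {
    for (i = 1; i < ARGC; i++)   # blank out the original operands
        ARGV[i] = ""
    ARGV[ARGC++] = "infile1"
    ARGV[ARGC++] = "infile2"
}
{ print "From " FILENAME ": " $0 }
' ignored)

printf '%s\n' "$out"
cd / && rm -rf "$dir"
```

Expected output is "From infile1: one" followed by "From infile2: two"; the file named ignored is never opened.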

TIA

Tibor Palinkas

Janis Papanagnou

Jan 19, 2010, 1:32:04 AM
igor2 wrote:
> On Mon, 18 Jan 2010, Janis Papanagnou wrote:
>
>> igor2 wrote:
>>> [...]
>> So just modify ARGV[], then. For example (if I understand your demands
>> correctly)...
>>
>
> Tried, and it worked.

Glad to hear.

> Both with gawk and mawk. However, it still feels a
> bit wicked.

For wicked feelings consult a psychologist not this newsgroup. :-)

> Is it portable?

From the SUS standard (excerpt):

"Input files to the awk program from any of the following
sources shall be text files:
Any file operands or their equivalents, achieved by
modifying the awk variables ARGV and ARGC
[...]"
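The ARGV description in the same standard adds that an element matching the var=value form of an assignment operand is treated as an assignment rather than a file name, which is also why the first fragment in this thread skips over such entries. A small sketch of that behavior; the names section, first and second are hypothetical.

```shell
#!/bin/sh
# Sketch only: ARGV entries of the form var=value added in BEGIN act like
# command-line assignment operands and take effect when awk reaches them.
# "section", "first" and "second" are hypothetical names.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
printf 'x\n' > first
printf 'y\n' > second

out=$(awk '
BEGIN {
    ARGV[ARGC++] = "section=1"
    ARGV[ARGC++] = "first"
    ARGV[ARGC++] = "section=2"
    ARGV[ARGC++] = "second"
}
{ print section, FILENAME, $0 }
')

printf '%s\n' "$out"
cd / && rm -rf "$dir"
```

Expected output: "1 first x" and then "2 second y", because each assignment takes effect just before awk opens the file that follows it.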

> Would most other awk implementations do the
> same?

See the note about SUS (or tell us exactly which versions it should run on).

Janis

igor2

Jan 19, 2010, 10:30:57 AM
On Tue, 19 Jan 2010, Janis Papanagnou wrote:

<snip>


>> Is it portable?
>
> From the SUS standard (excerpt):
>
> "Input files to the awk program from any of the following
> sources shall be text files:
> Any file operands or their equivalents, achieved by
> modifying the awk variables ARGV and ARGC
> [...]"

Cool, I think that's enough for me. Meanwhile I installed original-awk and
tried it, and it worked with that too.

>> Would most other awk implementations do the
>> same?
>
>See note about SUS (or tell us on what versions exactly it should run on).

I have no idea what exactly it will run on; that's why I'm trying
to make it run on anything :)

Thank you for the support.

Regards,

Tibor Palinkas

Kenny McCormack

Jan 19, 2010, 11:14:37 AM
In article <Pine.LNX.4.21.10011...@catv-50624409.bp04catv.broadband.hu>,
igor2 <ig...@inno.bme.hu> wrote:
...

>>See note about SUS (or tell us on what versions exactly it should run on).
>
>I have no idea on what exactly it would run on, this why I'd try
>to make it run on anything :)

The bottom line is that it will work on any remotely normal (Unix or
Unix-like) system in use today, EXCEPT for Solaris, where (by default,
unpatched, etc, etc weasel words) "awk" will get you the old, broken awk
(in /usr/bin). So, on Solaris, you have to be careful.

If you are writing a shell script, you could do something like
(pseudo-code, not bothering to make this completely correct shell code):

if exists /usr/xpg4/bin/awk (or whatever) then use that, else just "awk".
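A runnable version of that pseudo-code, as a sketch for a POSIX sh. /usr/xpg4/bin/awk is the only path taken from the post; other platforms may need other candidates, and, as the next reply points out, even that awk has bugs of its own.

```shell
#!/bin/sh
# Sketch only: use the XPG4 awk where it exists (as on Solaris), else the
# first "awk" in PATH. /usr/xpg4/bin/awk is the path named above; other
# platforms may need other candidates.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
else
    AWK=awk
fi

# Quick check of the ARGV technique with the chosen awk: push "-"
# (standard input) onto ARGV from BEGIN.
out=$(printf 'hello\n' | "$AWK" 'BEGIN { ARGV[ARGC++] = "-" } { print "got " $0 }')
printf '%s: %s\n' "$AWK" "$out"
```

The rest of the script then calls "$AWK" instead of a bare awk.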

Aleksey Cheusov

Jan 20, 2010, 4:45:52 AM

>>>See note about SUS (or tell us on what versions exactly it should run on).
>>
>>I have no idea on what exactly it would run on, this why I'd try
>>to make it run on anything :)

> The bottom line is that it will work on any remotely normal (Unix or
> Unix-like) system in use today, EXCEPT for Solaris, where (by default,
> unpatched, etc, etc weasel words) "awk" will get you the old, broken awk
> (in /usr/bin). So, on Solaris, you have to be careful.

Actually *ALL* awk implementations shipped with Solaris are broken :-/

cheusov@solaris>
0 cheusov>/usr/xpg4/bin/awk '$0 ~ /=/'
/usr/xpg4/bin/awk: syntax error Context is:
>>> $0 ~ /= <<<

1 cheusov>/usr/bin/nawk '$0 ~ /=/'
/usr/bin/nawk: syntax error at source line 1
context is
$0 ~ >>> /= <<<
/usr/bin/nawk: bailing out at source line 1

2 cheusov>/usr/bin/awk '$0 ~ /=/'
awk: syntax error near line 1
awk: bailing out near line 1

2 cheusov>uname -a
SunOS solaris 5.10 Generic_125100-08 sun4u sparc SUNW,Ultra-5_10

0 cheusov>

This is just one bug, there are others.
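Since the errors above all point at the characters "/=", one workaround worth trying is to keep those two characters from appearing next to each other, e.g. /[=]/, or to avoid regular expressions entirely with index(). This is a sketch and has not been verified on the Solaris awks shown; note that the same /=/ pattern appears in the first code fragment of this thread (ARGV[i] ~ /=/).

```shell
#!/bin/sh
# Sketch only: put "=" inside a bracket expression so the program text
# never contains "/=" back to back, which is where the Solaris parsers
# above report the error. Not verified on Solaris itself.
out=$(printf 'a=b\nplain\nc=d\n' | awk '$0 ~ /[=]/')
printf '%s\n' "$out"

# index() avoids regular expressions altogether.
out2=$(printf 'a=b\nplain\nc=d\n' | awk 'index($0, "=") > 0')
printf '%s\n' "$out2"
```

Both commands should print the lines "a=b" and "c=d".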

P.S.
"The One True AWK" by Brian Kernighan also has lots of problems.
Many of them were successfully fixed in NetBSD.

- fixed: serious bug with regular expression, PR/33392
- fixed: support for multibyte charsets in tolower/toupper functions,
PR/36394
- fixed: there is a hardcoded limit on a number of open files, PR/37205
- fixed: incorrect handling of \ at the end of line in awk script, PR/37212
- fixed: incorrect matching of [:cntrl:], PR/38737
- fixed: warning about non-portable escape sequences, PR/39002
- fixed: free(): warning: junk pointer, too low to make sense, PR/39132
- fixed: -Ft is broken, PR/39133
- fixed: segfaults when "nextfile" is in BEGIN {...}, bin/39134
- fixed: nawk doesn't handle RS as a RE but as a single character, PR/30294
- fixed: awk(1) crash with RE and ^ anchor, PR/40689

Others are not fixed:
- Bizarre behavior in awk with invalid numeric constants, PR/42463
- LC_NUMERIC in awk is not POSIX compliant, PR/42320
- NetBSD awk/nawk concatenation op. is slower than that of GNU awk, PR/39759
- /usr/bin/awk: formatting issues in printf, PR/39135
- regexps should treat { and } as {n,m}, not as regular
characters, PR/38127

I sent all this to Brian, but he ignored all these issues and fixes.

So, if somebody needs a BSD-licensed awk, I'd recommend the awk from NetBSD.
http://mova.org/~cheusov/pub/netbsd-tools/

I'm packaging it in PkgSrc
http://pkgsrc.se/wip/netbsd-awk

> If you are writing a shell script, you could do something like
> (pseudo-code, not bothering to make this completely correct shell code):

> if exists /usr/xpg4/bin/awk (or whatever) then use that, else just "awk".

Can anybody explain what is the reason for Sun to ship a non-POSIX
environment by default?

--
Best regards, Aleksey Cheusov.

Kenny McCormack

Jan 20, 2010, 6:58:16 AM
In article <s934omh...@chel.imb.invention.com>,
Aleksey Cheusov <v...@gmx.net> wrote:
...

>Actually *ALL* awk implementations shipped with Solaris are broken :-/

Interesting. I had no idea. Note that I never use anything other than
TAWK or GAWK (*), so I have no direct knowledge of the Solaris breakage.

(*) Also, mawk under one particular set of circumstances, although this
is now rendered moot by the "WHINY_USERS" feature in GAWK.

...


>Can anybody explain what is the reason for Sun to ship a non-POSIX
>environment by default?

The usual reason - not wanting to break existing code. Particularly,
the startup scripts. There is, reportedly, one single area (which I
can't remember at the moment) of non-backwards compatibility between
"old" and "new" AWK. And it looks like the decision was made at Sun to
not risk any breakage in the startup scripts by changing AWK (even
though I am pretty sure that that one specific bit of non-backwards
compatibility was probably never exercised/exposed by the scripts).

Anyway, the usual thing - refusing to "upgrade". It always comes back
to bite you, eventually. The question is just "When is 'eventually'?"
