How can I use wildcards as arguments for AWK-Scripts?

mohsen...@gmail.com

Nov 25, 2014, 5:05:42 AM
Hi everyone

I've written an AWK script which takes a logfile as its argument.
There are 36 logfiles, named "LOG_01.txt" to "LOG_36.txt".

The content of the first two logfiles looks like the listing below. I have to count how often each name occurs in each logfile.

LOG_01.txt     LOG_02.txt     ...
-----------    -----------    ...
EA000002000    EA000002036    ...
EA000002000    EA000002036    ...
EA000002001    EA000002037    ...
EA000002001    EA000002037    ...
EA000002001    EA000002039    ...
EA000002002    EA000002039    ...
EA000002003    EA000002039    ...
EA000002003    EA000002039    ...
EA000002004    EA000002040    ...
EA000002004    EA000002040    ...
EA000002005    EA000002041    ...
EA000002005    EA000002041    ...
EA000002007    EA000002042    ...
EA000002007    EA000002042    ...

When I run my awk script with one argument, it works well and generates an output file called LOG_01_Nr.txt:

script.awk LOG_01.txt
The generated file is ==> LOG_01_Nr.txt

But when I use the wildcard "??" to cover all 36 logfiles, as below, it reads the content of all logfiles, processes them together and puts the result into a single file, again called "LOG_01_Nr.txt".

script.awk LOG_??.txt
The generated file is ==> LOG_01_Nr.txt

And because it processes all the files in one run, the counters are never reset to zero and the number of occurrences goes higher and higher.

How can I tell the script to process each file on its own?


Best regards
Mohsen

Kenny McCormack

Nov 25, 2014, 5:54:33 AM
In article <100528ef-d918-4b14...@googlegroups.com>,
<mohsen...@gmail.com> wrote:
>Hi everyone
>
>I've written an AWK script which takes as an argument a Logfile.
>There are 36 logfiles called from "LOG_01.txt" to "LOG_36.txt".
...
>But when I use wildcards "??" for all 36 logfiles as below it takes the content
>of all Logfiles and processes them and put the result only in one file called
>again "LOG_01_Nr.txt".

It would help if:

1) You told us which OS you were using.

2) You provided the source code of your script.

It would be nice if there was some way for the posting software to require
these items to be provided before the post was accepted by the Usenet
system, but, alas, things have not advanced to that point yet. Maybe in
Usenet 3.0...

Anyway, I'm going to assume Unix (for 1) above) and, as it turns out, I'm
going to assume that your script is correct.

So, in Unix (i.e., in shell script language), the most straightforward way
to solve your problem is to use a shell loop, like this:

#!/bin/bash
for i in LOG_??.txt; do
    awk -f myscript.awk "$i"
done

Now, having said that, note that it probably is possible to do it all
inside the AWK script itself, but on Unix, the above is the easiest. On
other OSes (i.e., those with broken command interpreters), you might
actually prefer to do it in the AWK script rather than in the (broken)
command interpreter.

P.S. I am assuming that your script does contain the necessary logic to
construct the output filename - i.e., to convert "LOG_01.txt" to "LOG_01_Nr.txt".
But note that it might actually be slicker to have the script just write to
the standard output and use the shell to construct the output filename
(current versions of bash have pretty sophisticated capabilities in that
regard).
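
The idea in the P.S. can be sketched as follows. This is only an illustration under stated assumptions: the sample data and the inline counting program are stand-ins (not the poster's script), and the "_Nr.txt" naming follows the original question. The awk program writes to standard output, and the shell derives the output name with the ${f%.txt} parameter expansion.

```shell
#!/bin/sh
# Sketch only: sample data and the inline awk program are invented
# stand-ins for the real logfiles and the real script.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'EA000002000\nEA000002000\nEA000002001\n' > LOG_01.txt
printf 'EA000002036\nEA000002037\nEA000002037\n' > LOG_02.txt

for f in LOG_??.txt; do
    out="${f%.txt}_Nr.txt"      # LOG_01.txt -> LOG_01_Nr.txt
    # One awk process per file, so every counter starts from zero.
    awk '{ cnt[$1]++ } END { for (k in cnt) print k, cnt[k] }' "$f" |
        sort > "$out"
done

cat LOG_01_Nr.txt
```

Because awk is started once per file, no counter can carry over from one logfile to the next.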


Luuk

Nov 25, 2014, 6:39:16 AM
On 25-11-2014 11:54, Kenny McCormack wrote:
> In article <100528ef-d918-4b14...@googlegroups.com>,
> <mohsen...@gmail.com> wrote:
>> Hi everyone
>>
>> I've written an AWK script which takes as an argument a Logfile.
>> There are 36 logfiles called from "LOG_01.txt" to "LOG_36.txt".
> ...
>> But when I use wildcards "??" for all 36 logfiles as below it takes the content
>> of all Logfiles and processes them and put the result only in one file called
>> again "LOG_01_Nr.txt".
>
> It would help if:
>
> 1) You told us which OS you were using.
>
> 2) If you provided the source code of your script.
>
> It would be nice if there was some way for the posting software to require
> these items to be provided before the post was accepted by the Usenet
> system, but, alas, things have not advanced to that point yet. Maybe in
> Usenet 3.0...
>
> Anyway, I'm going to assume Unix (for 1) above) and, as it turns out, I'm
> going to assume that your script is correct.
>

Assumption is.... ;)

What happens if you do:
awk -f myscript.awk LOG_02.txt

will it create "LOG_02_Nr.txt"?

@OP: if so, please follow the instructions given

mohsen...@gmail.com

Nov 25, 2014, 7:37:29 AM
-----------------------------------------------------------------------
Dear Kenny

Sorry, I forgot to mention the OS I work on.
I'm using Linux (bash) under CYGWIN.

I wanted to attach my script and the input files to my question, but I couldn't find a button for attachments.

Nevertheless, below is my script, which also generates the output filename.
=======================================
#! /usr/bin/awk -f
#======================================
# Author: Mohsen Owzar
# Date: 25.11.2014
# Filename: RecognizeSlotNr.awk
#======================================
# Description:
# e.g.
# RecognizeSlotNr.awk <Filename>
# RecognizeSlotNr.awk LOG_16.txt
#======================================
# BEGIN Part
#======================================
BEGIN {
    FILE = ARGV[1]
    FILE1 = FILE
    FirstEncoderNr = 2000
    sub(/\.txt/, "", FILE1)
    OUT = FILE1 "_Nr.txt"
    print "Processing ==> " FILE > OUT
    print "******************************" > OUT
    NAM_Before = ""
}
#======================================
# MAIN Part
#======================================
{
    N++
    NAM = $1
    sub(/EA/, "", NAM)
    NAM = NAM + 0
    Diff = NAM - FirstEncoderNr
    SlotNr = int(Diff / 6) + 1
    CNT[SlotNr]++

    if (NAM != NAM_Before) {
        NAM_CNT++
        ENC_NAM[++M] = NAM
    }
    NAM_Before = NAM
    # printf("%-8s%-8s%-8s\n", NAM, Diff, SlotNr)
}

##======================================
## END Part
##======================================
END {
    print "The number of Encoders is: " NAM_CNT "\n" > OUT
    printf("%-8s%-4s\n", "ENC", "Nr") > OUT
    print("***********") > OUT
    for (i = 1; i <= M; i++) {
        Diff = ENC_NAM[i] - FirstEncoderNr
        Nr = int(Diff / 6) + 1
        printf "%-8s%-4s\n", ENC_NAM[i], Nr > OUT
    }
    print "****************" > OUT
    for (i in CNT) {
        printf "Slot %-4s ==> %-4s\n", i, CNT[i] > OUT
    }
    print "\nThe generated File is ==> " OUT
}

==============================================
As you suggested in your answer, I tried to write a shell script to handle this (GiveNutzenNr.sh). The script is as follows:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
#! /usr/bin/bash
# Filename: GiveNutzenNr.sh
# It concatenates all single logfiles into one big logfile.
# It replaces the date and time into the name of the logfile.

IN_File=$*
RecognizeSlotNr.awk $IN_File

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

When I tried to run this script as follows:

$ GiveNutzenNr.sh LOG_??.txt
./GiveNutzenNr.sh: line 5: $'\r': command not found
« cannot be opened for reading (No such file or directory)

it produced the error messages above, even though all the logfiles are there.

I have used this kind of wrapper shell script before and it worked well. I don't know what the problem is now.

I thought it would not be a problem to gather all the files with wildcards, e.g. "LOG_??.txt", inside the awk program itself.

In that case you would not need a wrapper shell script to use the AWK script.

best regards
Mohsen

mohsen...@gmail.com

Nov 25, 2014, 7:48:02 AM
Hi Luuk

Yes, it puts all the calculated data into the file "LOG_02_Nr.txt".

Regards
Mohsen

Kenny McCormack

Nov 25, 2014, 7:58:00 AM
In article <e1ba7320-12ec-43e7...@googlegroups.com>,
<mohsen...@gmail.com> wrote:
...
>I thought that inside the awk program is not a problem to gather all the files
>with wildcards e.g. "LOG_??.txt".
>
>In this case you don't need a wrapper shell script to use the AWK script.

Yes. The all-in-AWK way to do it is like this:

--- Code ---
# Assumes GAWK - because if you're not using GAWK [*], you should be.
# "_Nr&" inserts _Nr in front of the matched ".txt": LOG_01.txt -> LOG_01_Nr.txt
BEGINFILE { ofn = gensub(/\.txt$/, "_Nr&", 1, FILENAME) }
{
    # do stuff - build up counters/arrays
}
ENDFILE {
    # do stuff - dump counters/arrays to "ofn"
    # reset counters/arrays
    close(ofn)
}
--- Code ---

Note: If it really is this simple, you could eliminate the BEGINFILE and
calculate ofn in ENDFILE, but I thought it might be useful to have ofn
already there for the "build up counters/arrays" part.

[*] Or, of course, TAWK.

mohsen...@gmail.com

Nov 25, 2014, 8:51:39 AM
Hi again dear Kenny

Yes, I'm using GAWK. It means that I do not need the part "ofn" you have written in your answer.

I didn't understand what you mean by your all-in-AWK answer.
Do you mean that I can use wildcards in AWK only in combination with a wrapper shell script?
Is there no other way to manage this in the AWK script?

Regards
Mohsen

Kenny McCormack

Nov 25, 2014, 8:57:59 AM
In article <13cf1134-eb24-4740...@googlegroups.com>,
<mohsen...@gmail.com> wrote:
...
>Yes, I'm using GAWK. It means that I do not need the part "ofn" you have written
>in your answer.

No, the point is that BEGINFILE, ENDFILE, and gensub() are not found in
standard (vendor-supplied) AWKs. But as you are using GAWK, the code
should "just work".

>I didn't understand what you mean with your all-in-AWK way answer.

Then more study is needed.

>You mean that I can use wildcards in AWK only in combination with a
>wrapper shell script?

Not sure what you mean, but the point is that the code should work as
written.

>There is no other way to manage this in the AWK script?

It will work.
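
For an awk that lacks BEGINFILE and ENDFILE, roughly the same per-file behaviour can be had with FNR == 1, which is true on the first record of every input file. A minimal sketch, assuming a simple per-name count stands in for the real processing; the sample data and the flush_counts helper are made up for illustration:

```shell
#!/bin/sh
# Sketch only: sample data and helper name are invented for the demo.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'EA000002000\nEA000002000\nEA000002001\n' > LOG_01.txt
printf 'EA000002036\nEA000002037\nEA000002037\n' > LOG_02.txt

awk '
# Hypothetical helper: write the counts of the current file, then reset.
function flush_counts(    k) {
    for (k in cnt) print k, cnt[k] > out
    close(out)
    split("", cnt)              # portable way to empty an array
}
FNR == 1 {                      # first record of each input file
    if (out != "") flush_counts()
    out = FILENAME
    sub(/\.txt$/, "_Nr.txt", out)
}
{ cnt[$1]++ }
END { if (out != "") flush_counts() }
' LOG_??.txt

sort LOG_02_Nr.txt
```

One limitation of this sketch: an empty input file never triggers FNR == 1, so it gets no output file.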


mohsen...@gmail.com

Nov 25, 2014, 9:51:27 AM
> >You mean that I can use wildcards in AWK only in combination with a
> >wrapper shell script?
>
> Not sure what you mean, but the point is that the code should work as
> written.

I meant that I have to use the shell script (below) to collect all the filenames and then pass them to the AWK script.

============================
#! /usr/bin/bash
# Filename: GiveNutzenNr.sh

IN_File=$*
RecognizeSlotNr.awk $IN_File
============================

Many thanks for your help.

Best regards
Mohsen

Manuel Collado

Nov 25, 2014, 10:12:40 AM
On 25/11/2014 13:37, mohsen...@gmail.com wrote:
> On Tuesday, 25 November 2014 11:54:33 UTC+1, Kenny McCormack wrote:
>> In article <100528ef-d918-4b14...@googlegroups.com>,
>> <mohsen...@gmail.com> wrote:
>>> Hi everyone
>>>
>>> I've written an AWK script which takes as an argument a Logfile.
>>> There are 36 logfiles called from "LOG_01.txt" to "LOG_36.txt".
>> ...
>>> But when I use wildcards "??" for all 36 logfiles as below it takes the content
>>> of all Logfiles and processes them and put the result only in one file called
>>> again "LOG_01_Nr.txt".

Of course. Your code composes the output file name only once, in the BEGIN
clause, from the name of the first file argument. And it writes all the
output just once, in the END clause.

> #======================================
> BEGIN {
> FILE = ARGV[1]
> FILE1 = FILE
> ...
> OUT = FILE1 "_Nr.txt"
> ...
> }
>....................................
> ##======================================
> END {
> print "The number of Encoders is: " NAM_CNT "\n" > OUT
> ...
> }

What you probably want is to process each file separately. As a hint,
start by changing the BEGIN pattern to BEGINFILE, and the END
pattern to ENDFILE (assuming you have a recent version of gawk).

BEGINFILE {
FILE1 = FILENAME
...
OUT = FILE1 "_Nr.txt"
...
}

ENDFILE {
print "The number of Encoders is: " NAM_CNT "\n" > OUT
...
}

After looking at the results, you will probably be able to modify
the rest of the code to suit your needs.
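
Filled in a little further, the hint might look like the sketch below. It assumes gawk 4.0 or later (where BEGINFILE and ENDFILE are available) and replaces the encoder/slot arithmetic with a plain per-name count; the sample data is invented for the demonstration.

```shell
#!/bin/sh
# Sketch only: requires gawk; sample data is invented for the demo.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'EA000002000\nEA000002000\nEA000002001\n' > LOG_01.txt
printf 'EA000002036\nEA000002037\nEA000002037\n' > LOG_02.txt

gawk '
BEGINFILE {
    # Runs before the first record of every input file.
    OUT = FILENAME
    sub(/\.txt$/, "_Nr.txt", OUT)
    print "Processing ==> " FILENAME > OUT
}
{ CNT[$1]++; N++ }
ENDFILE {
    # Runs after the last record of every input file.
    print "Lines read: " N > OUT
    for (name in CNT) print name, CNT[name] > OUT
    close(OUT)
    delete CNT                  # reset state for the next file
    N = 0
}
' LOG_??.txt

cat LOG_02_Nr.txt
```

The reset at the end of ENDFILE is the part the original script was missing: without it, the counts from LOG_01.txt would leak into LOG_02_Nr.txt.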

Hope this helps.
--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado

mohsen...@gmail.com

Nov 25, 2014, 10:19:22 AM
Thanks a lot Manuel for your hints.

Regards
Mohsen

Luuk

Nov 25, 2014, 10:48:12 AM
On 25-11-2014 13:37, mohsen...@gmail.com wrote:
> $ GiveNutzenNr.sh LOG_??.txt
> ./GiveNutzenNr.sh: line 5: $'\r': command not found
> « cannot be opened for reading (No such file or directory)


add 'set -x' to your script:
#!/bin/bash
set -x
IN_File=$*
RecognizeSlotNr.awk $IN_File



luuk@opensuse:~/tmp> ./GiveNutzenNr.sh LOG_??.txt
+ IN_File='LOG_01.txt LOG_02.txt'
+ RecognizeSlotNr.awk LOG_01.txt LOG_02.txt
./GiveNutzenNr.sh: line 4: RecognizeSlotNr.awk: command not found
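
The $'\r' in the quoted error message is the typical symptom of a script saved with Windows (CRLF) line endings, which happens easily under Cygwin: the carriage return on an otherwise empty line is taken as a command name, and one at the end of the argument line becomes part of the last filename. That reading is an assumption, since the file itself was not posted. A minimal sketch of how to check for and remove the carriage returns, using an invented two-line script:

```shell
#!/bin/sh
# Sketch only: crlf.sh is an invented stand-in for the affected script.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Simulate a script that was saved with CRLF line endings.
printf '#!/bin/sh\r\necho "hello"\r\n\r\n' > crlf.sh

cr=$(printf '\r')
# Count the lines that end in a carriage return.
crlf_lines=$(grep -c "$cr\$" crlf.sh)
echo "lines ending in CR: $crlf_lines"

# Strip every carriage return, then run the cleaned copy.
tr -d '\r' < crlf.sh > fixed.sh
sh fixed.sh
```

Where it is installed, the dos2unix utility does the same conversion in place.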

mohsen...@gmail.com

Nov 26, 2014, 3:58:24 AM
> add 'set -x' to your script:
> #!/bin/bash
> set -x
> IN_File=$*
> RecognizeSlotNr.awk $IN_File
>
> luuk@opensuse:~/tmp> ./GiveNutzenNr.sh LOG_??.txt
> + IN_File='LOG_01.txt LOG_02.txt'
> + RecognizeSlotNr.awk LOG_01.txt LOG_02.txt
> ./GiveNutzenNr.sh: line 4: RecognizeSlotNr.awk: command not found

Hi Luuk
I've added the set command to my script and used BEGINFILE and ENDFILE as Manuel suggested.
At first I see all the input files that were picked up, and then I get 36 generated files. That is what I wanted.

Thanks to all for your help
Mohsen

Bruce Barnett

Nov 27, 2014, 9:55:04 AM
Several have already answered your question, but generally there are two ways to pass arbitrary variables into an awk script. The first way is to pass a script into awk from the shell and make sure the shell variable is expanded before awk sees it, e.g.
#!/bin/sh
COLUMN=3
#print the specified column, in this case - same as print $3
awk '{print ' "$column" '}' <file

The second way is to pass it as a variable using the form
awk scriptname variable=value, e.g.
#!/bin/sh
awk 'END {print VAR}' VAR=2 </dev/null

In your case, you can also get the filename using the special variable FILENAME
as Kenny McCormack suggested.

Kenny McCormack

Nov 27, 2014, 10:12:08 AM
In article <08e128ef-4311-4283...@googlegroups.com>,
Bruce Barnett <grym...@gmail.com> wrote:
>Several have already answered your question, but generally there are two
>ways to pass arbitrary variables into an awk script. The first way is to
>pass a script into awk from the shell and make sure the script's variable
>is evaluated before awk sees it, i.e.
>
>#!/bin/sh
>COLUMN=3
>#print the specified column, in this case - same as print $3
>awk '{print ' "$column" '}' <file

Except that in most Unix shells, including any "standard" version of
/bin/sh, variable names are case-sensitive, so the above won't work.

Also note that this method has to be used carefully, and only when you
really need it (i.e., when you really know what you're doing - and why).

>The second way is to pass it as a variable using the form
>awk scriptname variable=value, i.e
>#!/bin/sh
>awk 'END {print VAR}' VAR=2 </dev/null

The more normal way to do this is:

awk -v VAR=2 'BEGIN {print VAR}'

The point is that in practice, the "variable assignments in place of
filenames" method is rarely used. You almost always want to use the "-v"
option, which ensures that the variable is set before the BEGIN clause is
executed.
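
The timing difference can be seen in a small sketch (the variable names VAR, col and column are arbitrary, and the input lines are invented):

```shell
#!/bin/sh
# With -v the assignment happens before BEGIN runs.
with_v=$(awk -v VAR=2 'BEGIN { print "[" VAR "]" }')

# A VAR=2 operand is only processed when awk reaches it in the argument
# list, which is after BEGIN, so VAR is still empty here.
as_operand=$(awk 'BEGIN { print "[" VAR "]" }' VAR=2 </dev/null)

echo "$with_v"        # prints [2]
echo "$as_operand"    # prints []

# The same option is a safe way to pass a column number chosen in the shell.
column=3
fields=$(printf 'a b c\nd e f\n' | awk -v col="$column" '{ print $col }')
echo "$fields"        # prints c and f on separate lines
```

Passing the column with -v keeps the awk program in a single quoted string, so no shell quoting has to be spliced into the middle of it.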

>In your case, you can also get the filename using the special variable FILENAME
>as Kenny McCormack suggested.

Yes.


Bruce Barnett

Nov 27, 2014, 12:37:17 PM
On Thursday, November 27, 2014 10:12:08 AM UTC-5, Kenny McCormack wrote:
> In article <08e128ef-4311-4283...@googlegroups.com>,
> Bruce Barnett wrote:

> >#!/bin/sh
> >COLUMN=3
> >#print the specified column, in this case - same as print $3
> >awk '{print ' "$column" '}' <file
>
> Except that in most Unix shells, including any "standard" version of
> /bin/sh, variable names are case-sensitive, so the above won't work.


Oops. Right. typo.

Mike Sanders

Dec 14, 2014, 4:56:01 AM
mohsen...@gmail.com wrote:

> How can I tell to script that it has to use each file alone for itself.?

A Windows solution for wildcards (just thinking aloud)...

Example 1: process only files of type .txt


@echo off
for %%x in (*.txt) do awk -f myscript.awk "%%x" > "%%x.output"


Example 2: process files of type .txt, .csv, .foo


@echo off
for %%x in (*.txt *.csv *.foo) do awk -f myscript.awk "%%x" > "%%x.output"


--
Mike Sanders
www: http://freebsd.hypermart.net
gpg: 0xD94D4C13