
Changing $1


Steve Graham

Oct 16, 2014, 7:39:12 PM
I have a file full of file names, such as PSOP1.DAT. In another
directory there are corresponding file names, but they end in .m,
instead of .DAT.

I want to create an AWK script which will do a diff against each one of
these pairs of files.

Just not sure how, once I've gotten the name in one directory (e.g.
PSOP1.DAT), I can change that into the name in the corresponding
directory (e.g. PSOP1.m).

This will print a list of the original filenames, with a blank line
between them. Just not sure how to get the modified filename in there.

awk '{rtn1=$1; cmd="echo diff " rtn1 "; echo \n"; system(cmd)}' dir.txt

Any ideas?

Thanks, Steve

Kenny McCormack

Oct 16, 2014, 7:51:08 PM
In article <m1pkuu$gun$1...@speranza.aioe.org>,
Steve Graham <jsgra...@yahoo.com> wrote:
>I have a file full of file names, such as PSOP1.DAT. In another
>directory there are corresponding file names, but they end in .m,
>instead of .DAT.

First of all, the best way to change $1 is 4 quarters.

>I want to create an AWK script which will do a diff against each one of
>these pairs of files.

I would do it like this:

gawk '{ print "diff",$0,"OtherDir/"gensub(/.DAT$/,".m",1) }' | sh

Do it without the "| sh" until you are sure it is working right.

Also, you will need to quote the filenames if there are any spaces or other
weird characters in the filenames. This is left as an exercise for the
reader.
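One possible answer to that exercise is sketched below. It assumes a POSIX sh and any POSIX awk, and it uses sub() instead of gawk's gensub(), with the dot escaped and the pattern anchored. The names gen_diffs and shq, and the LIST/DIR arguments, are made up for illustration. Each file name is wrapped in single quotes, and every embedded single quote is emitted as '\'' (close the quote, add an escaped quote, reopen the quote). As a result, spaces, $, backquotes and double quotes all reach diff unchanged. The quote character is passed to awk with -v q="'", so the awk program itself never has to contain one.

```shell
# gen_diffs LIST DIR (hypothetical name): print, but do not run, an
# "echo NAME / diff NAME DIR/STEM.m / echo" group for each NAME.DAT in LIST.
# Names come out as single-quoted shell words; an embedded ' becomes '\''.
gen_diffs() {
    awk -v dir="$2" -v q="'" '
    # shq(s): return s as one single-quoted shell word; each embedded
    # quote character becomes quote, backslash, quote, quote
    function shq(s,    out, i, c) {
        out = q
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            out = out (c == q ? q "\\" q q : c)
        }
        return out q
    }
    {
        other = $0
        sub(/\.DAT$/, ".m", other)   # dot escaped, anchored at the end
        print "echo " shq($0)
        print "diff " shq($0) " " shq(dir "/" other)
        print "echo"
    }' "$1"
}
```

Usage: "gen_diffs dir.txt ../v28" prints the commands for inspection, and "gen_diffs dir.txt ../v28 | sh > compare.txt" runs them. The generated echo may still alter names containing backslashes in some shells.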

--
There are two kinds of Republicans: Billionaires and suckers.
Republicans: Please check your bank account and decide which one is you.

Steve Graham

Oct 16, 2014, 8:54:55 PM
gaz...@shell.xmission.com (Kenny McCormack) wrote:
> In article <m1pkuu$gun$1...@speranza.aioe.org>,
> Steve Graham <jsgra...@yahoo.com> wrote:
>> I have a file full of file names, such as PSOP1.DAT. In another
>> directory there are corresponding file names, but they end in .m,
>> instead of .DAT.
>
> First of all, the best way to change $1 is 4 quarters.
>
>> I want to create an AWK script which will do a diff against each one of
>> these pairs of files.
>
> I would do it like this:
>
> gawk '{ print "diff",$0,"OtherDir/"gensub(/.DAT$/,".m",1) }' | sh
>
> Do it without the "| sh" until you are sure it is working right.
>
> Also, you will need to quote the filenames if there are any spaces or other
> weird characters in the filenames. This is left as an exercise for the
> reader.
>

Thanks so much.

Here is what I ended up with:

gawk ' { print "echo " $0 "\ndiff",$0,"../v28/"gensub(/.DAT$/,".m",1) "\necho"
}' dir.txt | sh > compare.txt

By the way, why do you use .DAT$ instead of .DAT?

Thanks, Steve Graham

Janis Papanagnou

Oct 17, 2014, 1:52:27 AM
The regexp pattern
/.DAT/ will match a name _containing_ any character followed by "DAT"
/.DAT$/ will match a name _ending_ with any character followed by "DAT"
/\.DAT$/ will match a name ending with ".DAT"
You need the last of the three patterns.
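The three patterns can be checked quickly with the sketch below. The function name match_table and the sample names are invented for illustration. For each name, it prints 1 or 0 for /.DAT/, /.DAT$/ and /\.DAT$/, in that order:

```shell
# match_table: for each name on stdin, print 1/0 for /.DAT/, /.DAT$/, /\.DAT$/
match_table() {
    awk '{ printf "%-12s %d %d %d\n", $0, ($0 ~ /.DAT/), ($0 ~ /.DAT$/), ($0 ~ /\.DAT$/) }'
}

printf '%s\n' PSOP1.DAT foobliDAT foo.DAT.bak PSOP1.m | match_table
# PSOP1.DAT    1 1 1
# foobliDAT    1 1 0
# foo.DAT.bak  1 0 0
# PSOP1.m      0 0 0
```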

Note that all that your awk is effectively contributing to the solution is
performing the loop over the files and the substitution. The rest is shell
code. So you could also do it completely in shell. E.g. something like

while IFS= read -r f ; do echo "$f" ; diff "$f" "../v28/${f%.DAT}.m" ; echo ;
done < dir.txt > compare.txt

(untested)
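For comparison, here is a tested variant of that loop wrapped in a function, assuming a POSIX sh. The name compare_lists and the second (directory) argument are illustrative. Two changes are additions, not part of the post. First, printf replaces echo for the name, because echo can alter names that start with - or contain backslashes. Second, || [ -n "$f" ] also processes a final line that has no trailing newline.

```shell
# compare_lists LIST DIR (hypothetical name): for each NAME.DAT in LIST,
# print the name, diff it against DIR/NAME.m, then print a blank line.
compare_lists() {
    while IFS= read -r f || [ -n "$f" ]; do
        printf '%s\n' "$f"
        diff -- "$f" "$2/${f%.DAT}.m"   # ${f%.DAT}: strip a literal .DAT suffix
        echo
    done < "$1"
}
```

Usage: compare_lists dir.txt ../v28 > compare.txt. In ${f%.DAT}, the dot is literal because % takes a shell pattern, not a regular expression, so the .DAT vs \.DAT issue from the awk versions does not arise here.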

Janis


Steve Graham

Oct 17, 2014, 2:11:12 AM
gaz...@shell.xmission.com (Kenny McCormack) wrote:
> First of all, the best way to change $1 is 4 quarters.

I just saw your 4 quarters for a dollar quip. Lol

Steve

Steve Graham

Oct 17, 2014, 2:13:52 AM
Janis Papanagnou wrote:

Thanks, Janis

Kenny McCormack

Oct 17, 2014, 2:58:06 AM
In article <m1ppcs$p1g$1...@speranza.aioe.org>,
Steve Graham <jsgra...@yahoo.com> wrote:
...
>Thanks so much.
>
>Here is what I ended up with:
>
>gawk ' { print "echo " $0 "\ndiff",$0,"../v28/"gensub(/.DAT$/,".m",1) "\necho"
>}' dir.txt | sh > compare.txt
>
>By the way, why do you use .DAT$ instead of .DAT?

It should actually have been \.DAT$ (I typo'd and left out the \).
The $ makes it match only at the end of the string (just in case you have a
filename like foo.DAT.andmore.DAT) and the backslash means to match a dot
explicitly. Otherwise, it would match a filename like foobliDAT.
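The sketch below shows both cases side by side. It uses POSIX awk, and the function name show_sub and the sample names are made up. It calls sub(), which, like gensub(..., 1) in the command above, replaces only the leftmost match:

```shell
# show_sub: for each name on stdin, show the result of three substitutions
show_sub() {
    awk '{
        a = $0; sub(/\.DAT/,  ".m", a)   # first literal ".DAT", anywhere
        b = $0; sub(/.DAT$/,  ".m", b)   # any character + "DAT", at the end
        c = $0; sub(/\.DAT$/, ".m", c)   # literal ".DAT", at the end
        print $0 ": " a " | " b " | " c
    }'
}

printf '%s\n' foo.DAT.andmore.DAT foobliDAT | show_sub
# foo.DAT.andmore.DAT: foo.m.andmore.DAT | foo.DAT.andmore.m | foo.DAT.andmore.m
# foobliDAT: foobliDAT | foobl.m | foobliDAT
```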

Also note that as written, the matching is case-sensitive, so I assume your
files really do end in .DAT (all caps). Sometimes people actually want
filename matching to be case-insensitive (if working in a case-insensitive
filesystem environment, such as Windows or Mac). As far as I can tell,
this doesn't apply to you - you seem to be working in Unix or Linux - but I
thought I'd mention it in passing.
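If case-insensitive matching were needed, one portable sketch (any POSIX awk; the function name to_m_nocase is hypothetical) tests tolower($0) but keeps the original stem with substr(), so only the extension changes. Lines that do not end in .dat, in any letter case, are skipped. In gawk specifically, setting IGNORECASE = 1 makes all regexp matching, gensub() included, case-insensitive instead.

```shell
# to_m_nocase: map NAME.DAT, NAME.dat, NAME.Dat, ... to NAME.m
to_m_nocase() {
    awk 'tolower($0) ~ /\.dat$/ { print substr($0, 1, length($0) - 4) ".m" }'
}
```

For example, printf '%s\n' PSOP1.DAT psop2.dat notes.txt | to_m_nocase prints PSOP1.m and psop2.m.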

P.S. Another poster has given a shell-only solution, but I think that's
ugly and limited in scope (and off-topic here). I like the idiom of using
'gawk' to generate shell commands and then piping the output to 'sh'. It's
quite useful. Note that this method is also more efficient than using
system() internally to the script, since the shell only gets spawned once
(rather than once per command executed).
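The one-shell-versus-many difference can be observed directly. In this sketch, every generated command prints $$, the PID of the shell that runs it. The function names are made up, and the sketch assumes the system does not reuse a PID across three back-to-back processes, which is the normal case.

```shell
# Count distinct shell PIDs when three commands go through one sh ...
pids_via_pipe() {
    printf '%s\n' a b c | awk '{ print "echo $$" }' | sh | sort -u | awk 'END { print NR }'
}
# ... and when awk runs each one with system(), which starts a new sh per call
pids_via_system() {
    printf '%s\n' a b c | awk '{ system("echo $$") }' | sort -u | awk 'END { print NR }'
}

echo "piped into one sh: $(pids_via_pipe) distinct PID(s)"    # expect 1
echo "system() per line: $(pids_via_system) distinct PID(s)"  # expect 3
```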

--
"Insisting on perfect safety is for people who don't have the balls to live
in the real world."

- Mary Shafer, NASA Ames Dryden -

Janis Papanagnou

Oct 17, 2014, 5:27:07 AM
On 17.10.2014 08:58, Kenny McCormack wrote:
> [...]
>
> P.S. Another poster has given a shell-only solution, but I think that's
> ugly and limited in scope (and off-topic here).

I deliberately omitted the [OT] flag since all solutions that primarily use
"system()" or "| sh" depend on specific shell features and syntax;
the key functionality (modulo the awk wrapper glue) in all posted solutions
is OT here.

WRT "limited in scope" I'm not sure what you mean to say. The OP's requested
function is shell (and [non-awk-]tool) functionality. You don't gain much if
using awk just for the implicit loop that shells also support. WRT "ugliness"
I think that unnecessarily spreading functionality between tools is ugly.
YMMV.

Moreover, the awk solutions are also inferior because they are not correct
in the general case; you need to generate quotes around your shell entities.
This quoting will need escaping, and all this will make the _awk_ solution
even uglier:

print "echo \"" $0 "\"\ndiff",$0,"\"../v28/"gensub(/\.DAT$/,".m\"",1) "\necho"

(Hope I have located the stuff in the right place in that quoting mess.[*])

> I like the idiom of using
> 'gawk' to generate shell commands and then piping the output to 'sh'.

I like it too; I use it wherever appropriate. (In the given case I don't
think you gain anything, rather make a simple function less legible. YMMV.)

> It's
> quite useful. Note that this method is also more efficient than using
> system() internally to the script, since the shell only gets spawned once
> (rather than once per command executed).

Yes.

Janis

[*] Note to the OP: Using printf will typically result in more readable
code, IMO, in cases where you have many arguments. Here the unquoted form...

printf "\necho %s\ndiff %s %s\necho \n",
$0, $0, "../v28/"gensub(/\.DAT$/,".m",1)

(Though, for readability, I'd prefer using multiple printfs, one for
each shell command to generate...

printf "echo %s\n", ...
printf "diff %s %s\n", ...
)

Or quoted...

printf "\necho %s\ndiff \"%s\" \"%s\"\n echo\n",
$0, $0, "../v28/"gensub(/\.DAT$/,".m",1)


Kenny McCormack

Oct 17, 2014, 5:44:41 AM
In article <m1qnda$hhe$1...@news.m-online.net>,
Janis Papanagnou <janis_pa...@hotmail.com> wrote:
...
>Moreover, the awk solutions are also inferior because they are not correct
>in the general case; you need to generate quotes around your shell entities.
>This quoting will need escaping, and all this will make the _awk_ solution
>even uglier:
>
>print "echo \"" $0 "\"\ndiff",$0,"\"../v28/"gensub(/\.DAT$/,".m\"",1) "\necho"
>
>(Hope I have located the stuff in the right place in that quoting mess.[*])

You could just do "set -x" at the top [*], then you wouldn't need the
"echo"s. A lot (most) of the ugliness above is for the "echo"s.
I've actually done this trick (using gawk to create shell code that starts
with "set -x").

[*] I.e.,

BEGIN { print "set -x" }
...

(and piped into 'sh')
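A sketch of that variant follows. It assumes a POSIX sh, whose trace goes to stderr with the PS4 prefix ("+ " by default). The function name gen_traced is made up. The double quotes around the names follow the simple form used in this thread: they protect spaces, but not names containing a double quote, $ or a backquote.

```shell
# gen_traced LIST DIR (hypothetical name): print "set -x" followed by one
# bare diff per name; sh then writes each command to stderr before running it.
gen_traced() {
    awk -v dir="$2" '
    BEGIN { print "set -x" }
    {
        other = $0
        sub(/\.DAT$/, ".m", other)
        printf "diff \"%s\" \"%s\"\n", $0, dir "/" other
    }' "$1"
}
```

Usage: gen_traced dir.txt ../v28 | sh > compare.txt 2>&1 puts the trace lines and the diff output into one file.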

Anyway, I just don't really like shell as a general purpose programming
language (despite having done a lot of it). (OT) I actually kind of like
(at least the theory behind if not the actual implementation [*] of) MS's
move away from COMMAND/CMD towards something more programmatic (VB and/or
Powershell).

[*] Which has been pretty start/stop/start/stop/kludgey-as-usual-for-them.

P.S. In passing, note that Perl can be seen as an attempt to produce a
shell-replacement - something like the shell but more "programmatic". But
I hate Perl...

--
"The smart way to keep people passive and obedient is to strictly limit the
spectrum of acceptable opinion, but allow very lively debate within that
spectrum...."

- Noam Chomsky, The Common Good -

Janis Papanagnou

Oct 17, 2014, 6:11:07 AM
On 17.10.2014 11:44, Kenny McCormack wrote:
> In article <m1qnda$hhe$1...@news.m-online.net>,
> Janis Papanagnou <janis_pa...@hotmail.com> wrote:
> ...
>> Moreover, the awk solutions are also inferior because they are not correct
>> in the general case; you need to generate quotes around your shell entities.
>> This quoting will need escaping, and all this will make the _awk_ solution
>> even uglier:
>>
>> print "echo \"" $0 "\"\ndiff",$0,"\"../v28/"gensub(/\.DAT$/,".m\"",1) "\necho"
>>
>> (Hope I have located the stuff in the right place in that quoting mess.[*])
>
> You could just do "set -x" at the top [*], then you wouldn't need the
> "echo"s.

Yeah, but that will display all executed commands rather than the terse
file name information one usually wants, and there's still the need
for the additional \" escaped quoting around the diff arguments (if
one wants to be on the safe side).

> A lot (most) of the ugliness above is for the "echo"s.
> [...]
>

(what follows until end of post is OT)

> Anyway, I just don't really like shell as a general purpose programming
> language (despite having done a lot of it).

I don't like it much either. But there are still a lot of functions that
are better expressed in shell idioms. I consider neither shell nor awk
to be a general-purpose programming language; both are limited in their
own way.

> (OT) I actually kind of like
> (at least the theory behind if not the actual implementation [*] of) MS's
> move away from COMMAND/CMD towards something more programmatic (VB and/or
> Powershell).

Those MS ad hoc command interpreters are really bad, so moving away
towards Powershell is in principle a Good Thing. - But you need the
whole .NET framework for Powershell to run, don't you?

>
> [*] Which has been pretty start/stop/start/stop/kludgey-as-usual-for-them.
>
> P.S. In passing, note that Perl can be seen as an attempt to produce a
> shell-replacement

And with a more uniform interface to the OS functions than the OS itself
has.

> - something like the shell but more "programmatic".

One significant property is IMO the data structures, missing in both
shell and awk. (Note: new ksh versions support types, but with syntax
from [what looks like] the 1970s of course.)

> But I hate Perl...

I don't like it either, but it has its "raison d'être".

Janis

Jonathan Hankins

Jan 24, 2015, 2:00:37 AM
On Friday, October 17, 2014 at 1:58:06 AM UTC-5, Kenny McCormack wrote:
> In article <m1ppcs$p1g$1...@speranza.aioe.org>,
> Steve Graham <jsgra...@yahoo.com> wrote:
> ...
> I like the idiom of using
> 'gawk' to generate shell commands and then piping the output to 'sh'. It's
> quite useful. Note that this method is also more efficient than using
> system() internally to the script, since the shell only gets spawned once
> (rather than once per command executed).

In gawk, you could also:

print cmd | "sh"
#...
rc = close("sh")

if you want to generate commands, pipe them to sh-or-whatever, then do something based on rc, etc. You could also use sh-or-whatever as a co-process with |&.
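A sketch of that pattern follows. It is written for plain awk so that it also runs without gawk. In gawk 4.2 or later, close() on an output pipe returns the command's exit status. Other awks may return an encoded wait status, but 0 still means success. Because sh exits with the status of the last command it ran, the generated script records any failing diff in a shell variable and ends with exit $status. The names run_diffs and status are invented for this example. diff exits 0 for identical files, 1 when they differ and greater than 1 on trouble, so any nonzero status counts as a failure. The |& coprocess form is gawk-only and is not shown.

```shell
# run_diffs LIST DIR (hypothetical name): pipe generated diff commands into
# a single sh from inside awk, then act on the value returned by close().
run_diffs() {
    awk -v dir="$2" '
    BEGIN { sh = "sh"; print "status=0" | sh }
    {
        other = $0
        sub(/\.DAT$/, ".m", other)
        # remember any failing diff; sh alone would report only the last one
        printf("diff \"%s\" \"%s\" || status=1\n", $0, dir "/" other) | sh
    }
    END {
        print "exit $status" | sh
        rc = close(sh)      # waits for sh to finish and collects its status
        print (rc == 0 ? "all pairs identical" : "some pairs differ")
        exit (rc != 0)
    }' "$1"
}
```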

-Jonathan Hankins