awk vs shell script

Ed Morton

Sep 9, 2015, 7:54:04 PM
Given these 3 files:

$ cat file1
BEGIN{ print "Hello World!" }

$ cat file2
awk 'BEGIN{ print "Hello World!" }'

$ cat file3
#!/bin/awk -f
BEGIN{ print "Hello World!" }

"file1" is clearly an awk script and would be executed as "awk -f file1", while
"file2" is clearly a shell script and would be executed as "./file2".

Is "file3" an awk script or a shell script? Why?
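
For anyone who wants to try the three files, here is a minimal sketch. It assumes awk is on PATH and looks up its real location instead of hardcoding /bin/awk, which does not exist on every system; the scratch directory and variable names are only for illustration.

```shell
# Sketch: recreate the three files in a scratch directory and run each one.
# Assumption: awk is installed and found on PATH.
dir=$(mktemp -d)
awk_path=$(command -v awk)

cat > "$dir/file1" <<'EOF'
BEGIN{ print "Hello World!" }
EOF

cat > "$dir/file2" <<'EOF'
awk 'BEGIN{ print "Hello World!" }'
EOF

# Unquoted delimiter so that $awk_path is expanded into the shebang line.
cat > "$dir/file3" <<EOF
#!$awk_path -f
BEGIN{ print "Hello World!" }
EOF
chmod +x "$dir/file3"

out1=$(awk -f "$dir/file1")   # awk reads the program from file1
out2=$(sh "$dir/file2")       # sh reads file2 and starts awk itself
out3=$("$dir/file3")          # awk is started through the #! line
printf '%s\n' "$out1" "$out2" "$out3"
```

All three print the same text; what differs is which program reads the file first.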

You could say "file3" is a shell script since it's called directly from the
shell as "./file3" (just like "./file2") and the first line of it (the shebang)
is not parsed by the awk interpreter.

You could say "file3" is an awk script since after the first line it must be
entirely awk.

I'm looking for a definitive answer, not just an opinion. Any pointers appreciated.

Regards,

Ed.

Barry Margolin

Sep 9, 2015, 8:08:12 PM
In article <msqgn8$e75$1...@dont-email.me>,
Ed Morton <morto...@gmail.com> wrote:

> Given these 3 files:
>
> $ cat file1
> BEGIN{ print "Hello World!" }
>
> $ cat file2
> awk 'BEGIN{ print "Hello World!" }'
>
> $ cat file3
> #!/bin/awk -f
> BEGIN{ print "Hello World!" }
>
> "file1" is clearly an awk script and would be executed as "awk -f file1",
> while "file2" is clearly a shell script and would be executed as "./file2".

file2 should be executed as "bash file2". Or you should add a shebang
line "#!/bin/sh".

If you don't have a shebang line, the script is executed by the current
shell, which would likely be wrong if the user is running csh and the
script is for sh (the above one-liner happens to be compatible with
both, but any more complex script is unlikely to be).

>
> Is "file3" an awk script or a shell script? Why?

It's an awk script, because the shebang line tells the OS to run it
using /bin/awk, not the shell.

> You could say "file3" is a shell script since it's called directly from the
> shell as "./file3" (just like "./file2") and the first line of it (the
> shebang) is not parsed by the awk interpreter.

You can also compile a C program and run the resulting executable from
the shell using "./a.out", but that doesn't make it a shell script.

You don't have to run it from the shell. You could run it from a C
program using exec(). The shebang line is not parsed by the shell; it's
parsed by the kernel.

>
> You could say "file3" is an awk script since after the first line it must be
> entirely awk.

Exactly.

This is no different from beginning a perl script with #!/usr/bin/perl.

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

Kaz Kylheku

Sep 9, 2015, 9:57:06 PM
On 2015-09-09, Ed Morton <morto...@gmail.com> wrote:
> Is "file3" an awk script or a shell script? Why?

file3 is a "hash bang interpreter script". The shell is not involved
in its processing, therefore it isn't a shell script.

The name or full path of file3 can be passed directly to the exec family of
system calls.

The operating system kernel recognizes #!, pulls out the name of the
interpreter and feeds that entire file to that interpreter.
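
One way to see what the interpreter is actually handed is to name a program that just prints its arguments. A small sketch, assuming /bin/echo exists (true on most Linux systems and on macOS); the file name "demo" and the word "interpreter-arg" are made up for the example:

```shell
# Sketch: use /bin/echo as the "interpreter" to reveal the command line
# that gets built from the #! line. Assumption: /bin/echo exists.
dir=$(mktemp -d)
printf '%s\n' '#!/bin/echo interpreter-arg' 'this line is never executed' > "$dir/demo"
chmod +x "$dir/demo"

# What runs is: /bin/echo interpreter-arg <path-to-demo>
shown=$("$dir/demo")
printf '%s\n' "$shown"
```

The interpreter receives the path of the script as an argument and then opens and reads the whole file itself, first line included.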

> You could say "file3" is an awk script since after the first line it must be
> entirely awk.

The first line must also be awk; the whole thing is fed to awk. So lucky
for awk that # is a comment character! (The correspondence between #!
and the # comment character in Unix-environment languages is no coincidence.)
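
This is easy to check without the kernel being involved at all. A minimal sketch, assuming only that awk is on PATH:

```shell
# Sketch: the shebang line is also read by awk, which skips it as a comment.
dir=$(mktemp -d)
cat > "$dir/file3" <<'EOF'
#!/bin/awk -f
BEGIN{ print "Hello World!" }
EOF

# No execute bit and no #! processing: awk reads line 1 itself.
with_shebang=$(awk -f "$dir/file3")
printf '%s\n' "$with_shebang"
```

The path in the #! line does not even have to exist here, since awk never interprets it.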

Languages that don't use # as a comment character have to provide
a syntactic exception or other trick to pass through hash bang.

Janis Papanagnou

Sep 9, 2015, 10:25:21 PM
What sort of answer or reference would qualify as a "definite answer"
for you?

Here's one take[*]:

$ file file[123]
file1: awk or perl script, ASCII text
file2: ASCII text
file3: awk script, ASCII text executable

And mind the "or" with file1! (And also the abstraction level, ASCII
vs. language.)

But note that slightly reformatting file2 may lead to something else:

file2: awk or perl script, ASCII text

My opinion (sic!) is that this question makes no sense. (Mind: "Is this
program a C program or C++ program?") - It depends on how you call it,
and which programs can work with it (interpret/compile/process/... it).

Janis

[*] As seen by Cygwin's file(1) command.

>
> Regards,
>
> Ed.

Stephane Chazelas

Sep 10, 2015, 5:05:14 AM
2015-09-09 20:08:08 -0400, Barry Margolin:
> In article <msqgn8$e75$1...@dont-email.me>,
> Ed Morton <morto...@gmail.com> wrote:
>
> > Given these 3 files:
> >
> > $ cat file1
> > BEGIN{ print "Hello World!" }
> >
> > $ cat file2
> > awk 'BEGIN{ print "Hello World!" }'
> >
> > $ cat file3
> > #!/bin/awk -f
> > BEGIN{ print "Hello World!" }
> >
> > "file1" is clearly an awk script and would be executed as "awk -f file1",
> > while "file2" is clearly a shell script and would be executed as "./file2".
>
> file2 should be executed as "bash file2". Or you should add a shebang
> line "#!/bin/sh".
[...]

No, without a she-bang and if executable, on a POSIX system and
in a "POSIX environment", upon executing using exec*(3) (not
execve(2)), or "env" or ":!" in ex/vi, or find -exec, or "sh"
(including as popen()/system())..., the script will be
interpreted by a POSIX shell. You have no guarantee of that when
using "#! /bin/sh -" for which the behaviour is unspecified per
POSIX.

In practice though, while omitting the she-bang will make it
more likely to be interpreted by a POSIX shell on Solaris 10 and
before (one of the last systems where /bin/sh was not a POSIX
shell), it wasn't foolproof either: some commands (like awk),
even when in a POSIX environment, would still run /bin/sh to
interpret the script IIRC, and on Solaris you're not in a POSIX
environment by default. So hardcoding the she-bang to
"#! /usr/xpg4/bin/sh -" or "#! /usr/bin/ksh" was still a safer option.

--
Stephane

Barry Margolin

Sep 10, 2015, 10:47:10 AM
In article <20150909...@kylheku.com>,
That's why it's also no coincidence that most scripting languages use #
as their comment character.

Barry Margolin

Sep 10, 2015, 10:51:03 AM
In article <2015091009...@chaz.gmail.com>,
But if you're executing the script from C Shell, are you in a "POSIX
environment"? How do you know it will simply call exec() on the script?

But I just tried it in csh on OS X, and it worked the way you said, so I
was wrong.

Stephane Chazelas

Sep 10, 2015, 11:15:13 AM
2015-09-10 10:51:00 -0400, Barry Margolin:
[...]
> > No, without a she-bang and if executable, on a POSIX system and
> > in a "POSIX environment", upon executing using exec*(3) (not
> > execve(2)), or "env" or ":!" in ex/vi, or find -exec, or "sh"
> > (including as popen()/system())..., the script will be
> > interpreted by a POSIX shell. You have no guarantee of that when
> > using "#! /bin/sh -" for which the behaviour is unspecified per
> > POSIX.
>
> But if you're executing the script from C Shell, are you in a "POSIX
> environment"? How do you know it will simply call exec() on the script?
>
> But I just tried it in csh on OS X, and it worked the way you said, so I
> was wrong.
[...]

I agree that's fragile as it relies on the cooperation of all
the applications that execute commands (either they use the
system's API, or if they use execve() directly, they need to
reproduce the same behaviour).

And in practice the behaviour varies between shells, for instance:
when execve returns ENOEXEC, some will do some sanity
check on the file to see if it looks like a potential shell
script and others won't, some will interpret the script in a
child of them (in POSIX mode, sometimes not even forking if
they think they can get away with it), some will reexec
themselves in a child, some will invoke what they believe is the
right sh (which may not be the same as the one picked by
execvp(3)).

So yes, leaving off a she-bang means you have a shell script,
but I can't say I would recommend it.
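
The fallback can be observed from a POSIX-style shell with a sketch like the following. It assumes the calling shell is sh, dash, bash or similar; the file name "noshebang" is made up, and which shell ends up interpreting the file is exactly the part that varies.

```shell
# Sketch: an executable file with no #! line, started from a POSIX-style shell.
dir=$(mktemp -d)
printf '%s\n' 'echo "ran without a shebang"' > "$dir/noshebang"
chmod +x "$dir/noshebang"

# execve() fails with ENOEXEC here; the calling shell then falls back to
# interpreting the file as a shell script (how it does so varies by shell).
fallback=$("$dir/noshebang")
printf '%s\n' "$fallback"
```

Programs that call execve() directly and do not implement the fallback themselves would simply get the error, which is the fragility described above.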

--
Stephane

Ed Morton

Sep 10, 2015, 12:54:08 PM
On 9/9/2015 8:56 PM, Kaz Kylheku wrote:
> On 2015-09-09, Ed Morton <morto...@gmail.com> wrote:
>> Is "file3" an awk script or a shell script? Why?
>
> file3 is a "hash bang interpreter script". The shell is not involved
> in its processing, therefore it isn't a shell script.

I was about to jump in with both feet and buy into it being a "hash bang
interpreter script" instead of either an "awk script" or a "shell script". It
being something different from both fits with Barry's point about a C program
compiled to an "a.out" not being a shell script (but nor is the "a.out" a C
program) even though the file can be executed from the shell. So I was starting
to liken the compiler converting a C program to an executable to the shebang
converting a "hash bang interpreter script" to an awk script, but then I
realised that if I saw this in a file:

#!/bin/bash
printf "Hello World!\n"

I wouldn't consider it to be anything other than simply a shell script.

We've had lots of good answers but I was hoping someone would say "according to
standard XYZ that is a <fill in the blank> file" and provide a reference.

Ed.

Barry Margolin

Sep 10, 2015, 12:58:23 PM
In article <2015091015...@chaz.gmail.com>,
Stephane Chazelas <stephane...@gmail.com> wrote:

> 2015-09-10 10:51:00 -0400, Barry Margolin:
> [...]
> > > No, without a she-bang and if executable, on a POSIX system and
> > > in a "POSIX environment", upon executing using exec*(3) (not
> > > execve(2)), or "env" or ":!" in ex/vi, or find -exec, or "sh"
> > > (including as popen()/system())..., the script will be
> > > interpreted by a POSIX shell. You have no guarantee of that when
> > > using "#! /bin/sh -" for which the behaviour is unspecified per
> > > POSIX.
> >
> > But if you're executing the script from C Shell, are you in a "POSIX
> > environment"? How do you know it will simply call exec() on the script?
> >
> > But I just tried it in csh on OS X, and it worked the way you said, so I
> > was wrong.
> [...]
>
> I agree that's fragile as it relies on the cooperation of all
> the applications that execute commands (either they use the
> system's API, or if they use execve() directly, they need to
> reproduce the same behaviour).

It turns out it's more complicated than either of us described.

If the first line doesn't begin with #!, the kernel doesn't execute the
script, it returns ENOEXEC like in the days before shebang support was
added to the kernel. Then it's up to the shell to do something
reasonable. C Shell's behavior is:

* If the first line begins with a C-shell comment (starting with # in
  position 1) then this will be interpreted by the C-shell and must use
  C-shell syntax.

* Otherwise, the file will be considered a Bourne shell script.

>
> And in practice the behaviour varies between shells, for instance:
> when execve returns ENOEXEC, some will do some sanity
> check on the file to see if it looks like a potential shell
> script and others won't, some will interpret the script in a
> child of them (in POSIX mode, sometimes not even forking if
> they think they can get away with it), some will reexec
> themselves in a child, some will invoke what they believe is the
> right sh (which may not be the same as the one picked by
> execvp(3)).
>
> So yes, leaving off a she-bang means you have a shell script,
> but I can't say I would recommend it.

IIRC, scripts were not originally implemented in the kernel, that came
much later. I think the evolution was something like this: (I'm sure
someone will correct my details)

If the Bourne Shell received an ENOEXEC error from the kernel, it would
fork a subshell to execute it.

When C Shell came along, it performed the check described above --
starting a script with # marked it as a Csh script, otherwise it ran it
using Bourne shell for compatibility. At this point in history, # wasn't
a comment character in Bourne Shell -- comments were written using the :
no-op command -- so there was no possibility of ambiguity.

In Edition 8, Dennis Ritchie added support for the shebang line to the
kernel, which generalized this idea. Shortly after, # was added as
the comment character in Bourne Shell.

Barry Margolin

Sep 10, 2015, 1:03:21 PM
In article <msscfs$gk3$1...@dont-email.me>,
Ed Morton <morto...@gmail.com> wrote:

> We've had lots of good answers but I was hoping someone would say "according
> to
> standard XYZ that is a <fill in the blank> file" and provide a reference.

Someone did that. The relevant standard (POSIX) explicitly states that
the interpretation of files that begin with #! is unspecified.

But we're talking about terminology, not execution. So even if the
standard did say how these things are processed, it wouldn't answer the
question of how we should refer to them.

Kaz Kylheku

Sep 10, 2015, 1:23:51 PM
On 2015-09-10, Ed Morton <morto...@gmail.com> wrote:
> On 9/9/2015 8:56 PM, Kaz Kylheku wrote:
>> On 2015-09-09, Ed Morton <morto...@gmail.com> wrote:
>>> Is "file3" an awk script or a shell script? Why?
>>
>> file3 is a "hash bang interpreter script". The shell is not involved
>> in its processing, therefore it isn't a shell script.
>
> I was about to jump in with both feet and buy into it being a "hash bang
> interpreter script" instead of either an "awk script" or a "shell script" since
> it being something different from both fits with Barry's point about a C program
> that was compiled to an "a.out" not being a shell script (but nor is the "a.out"
> a C program) even though the file can be executed from the shell so I was
> starting to liken the compiler converting a C program to an executable to the
> shebang converting a "hash bang interpreter script" to an awk script but then I
> realised that if I saw this in a file:
>
> #!/bin/bash
> printf "Hello World!\n"
>
> I wouldn't consider it to be anything other than simply a shell script.

class/subclass/instance quibble

The class of shell scripts (rather, those which have a #! line) is a subclass
of the class of hash bang interpreter scripts.

Ed Morton

Sep 10, 2015, 1:51:09 PM
On 9/10/2015 12:03 PM, Barry Margolin wrote:
> In article <msscfs$gk3$1...@dont-email.me>,
> Ed Morton <morto...@gmail.com> wrote:
>
>> We've had lots of good answers but I was hoping someone would say "according
>> to
>> standard XYZ that is a <fill in the blank> file" and provide a reference.
>
> Someone did that. The relevant standard (POSIX) explicitly states that
> the interpretation of files that begin with #! is unspecified.
>
> But we're talking about terminology, not execution. So even if the
> standard did say how these things are processed, it wouldn't answer the
> question of how we should refer to them.
>

That would seem to be a significant problem. Without names for things it's hard
to have a discussion about them and it's easy for confusion and
misinterpretation to occur.

The context for all of this was someone asking how to print "the script name"
from within awk given a file named "file4.awk" with contents:

#!/bin/awk -f
BEGIN{ print <the script name> }

It got me thinking - what is "the script name"? If the file content was:

awk 'BEGIN{ print <the script name> }'

then would "the script name" be the name of the file containing the shell script
that calls awk? Or would it be "-" or similar, since the awk script has no name?
By comparison, if we had:

printIt.awk:
BEGIN{ print <the script name> }

file4.awk:
awk -f printIt.awk

then "the script name" would either be "file4.awk" if the shell script name was
intended to be printed, or "printIt.awk" if the awk script name was intended to
be printed.
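
For the case where the shell script's name is what is wanted, one common approach is to let the wrapper pass its own $0 to awk. A minimal sketch, where the file name "wrapper" and the awk variable name "script" are hypothetical choices for this example, and /bin/sh is assumed to exist:

```shell
# Sketch: a shell wrapper hands its own name to awk through a variable.
# "script" is a hypothetical variable name chosen for this example.
dir=$(mktemp -d)
cat > "$dir/wrapper" <<'EOF'
#!/bin/sh
# -v assigns before BEGIN runs; awk processes backslash escapes in the value.
awk -v script="$0" 'BEGIN{ print script }'
EOF
chmod +x "$dir/wrapper"

name=$("$dir/wrapper")
printf '%s\n' "$name"
```

This prints the path the wrapper was invoked with, so it only answers the "shell script name" reading of the question; it says nothing about a name for the awk program text itself.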

So then I moved on to why the file is named "file4.awk" anyway. Is it even an awk
script? In either case, in general we're supposed to name software based on what
it does, not how it's implemented. In the context of a larger software suite, if
I wanted to improve performance, for example, I should be able to replace the
entire contents of "file4.awk" with, say, an executable compiled from C that has
identical functionality but runs faster, without changing any other software in
my system (i.e. abide by tight cohesion and loose coupling). But I can't replace
"file4.awk" with a C executable named "file4.awk": while that would mean I don't
have to change any calling scripts, it would obfuscate the code.

So, that's why I'm trying to get a definitive answer to "what is the correct
term for that file" as it'd help me unravel my own confusion on the subject in
general and find the best solution for the specific problem at hand.

Ed.

Janis Papanagnou

Sep 10, 2015, 4:19:24 PM
Am 10.09.2015 um 20:50 schrieb Ed Morton:
> On 9/10/2015 12:03 PM, Barry Margolin wrote:
>>
>> [...] The relevant standard (POSIX) explicitly states that
>> the interpretation of files that begin with #! is unspecified.
>>
>> But we're talking about terminology, not execution. So even if the
>> standard did say how these things are processed, it wouldn't answer the
>> question of how we should refer to them.
>>
>
> That would seem to be a significant problem. Without names for things
> it's hard to have a discussion about them and it's easy for confusion
> and misinterpretation to occur.

tl;dr

If you can run your script successfully as 'awk -f yourscript'
it's an awk program, if you can run it successfully as, say,
'sh yourscript' it's a shell program. In this respect the #!
is meaningless for terminology; it's at best an informal hint.

>
> [...]
>
> So, that's why I'm trying to get a definitive answer to "what is the
> correct term for that file" as it'd help me unravel my own confusion on
> the subject in general and find the best solution for the specific
> problem at hand.

I already said something about the command interpreting the file,
which constitutes what a file "is", beyond its inherent property
of being text. To try making that point more obvious try answering
the question what the files containing the subsequent code are:

#! /bin/awk -f should not be used, use /usr/local/bin/perl instead !#
BEGIN { print "Hello World!" }

#! /bin/awk -f should not be used, use /usr/local/bin/perl instead !#
print "Hello World!\n" unless (0);

a) if executed in non-Unix (say DOS) environment
b) if executed with/without - careful: meta information! - chmod +x on Unix
c) without execution flag and with explicit interpreter awk
d) without execution flag and with explicit interpreter perl

You should note that #! is an informal comment in the first place.
Whether it's an interpreter script depends on the OS and the file
system's meta-information (execute flag), but that is an operational
property, not a file property, and not a script language property either.
Whatever the #! defines, you will not be able to run the second
program with awk, and you will be able to run the first program with
awk and with perl. As opposed to your "Hello World" sample, you usually
have enough code in a file to identify the language. So it boils
down to identifying the underlying language and the necessary interpreter,
independent of OS, execute flags, and the #! mechanism! If you can
run such code explicitly with awk it's an "awk program". The shebang
comment is irrelevant for identifying the "file type" WRT terminology;
it may just be used in addition to explain that in certain contexts
with certain preconditions it helps to run the program with an OS
supported implicit interpreter mechanism. If you can run your script
successfully as 'awk -f yourscript' it's an awk program, if you can
run it successfully as, say, 'sh yourscript' it's a shell program.
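
The awk half of that test can be tried on the two sample files with a sketch like this. It assumes awk is on PATH; the perl half is left out since perl may not be installed, and the file names "first" and "second" are made up for the example.

```shell
# Sketch: apply the "can awk run it?" test to the two sample files.
dir=$(mktemp -d)
cat > "$dir/first" <<'EOF'
#! /bin/awk -f should not be used, use /usr/local/bin/perl instead !#
BEGIN { print "Hello World!" }
EOF
cat > "$dir/second" <<'EOF'
#! /bin/awk -f should not be used, use /usr/local/bin/perl instead !#
print "Hello World!\n" unless (0);
EOF

# The first file is valid awk; its #! line is just a comment to awk.
first_out=$(awk -f "$dir/first")

# The second file is not valid awk (a bare statement outside any action),
# so awk reports a syntax error and exits with a non-zero status.
if awk -f "$dir/second" </dev/null >/dev/null 2>&1; then
    second_ok=yes
else
    second_ok=no
fi
printf '%s\n' "$first_out" "awk accepts second: $second_ok"
```

Both files carry the identical first line, so the different outcomes come entirely from the code that follows it.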

As already said upthread: "It depends on how you call it, and which
programs can work with it (interpret/compile/process/... it)."

Janis

>
> Ed.

Ed Morton

Sep 10, 2015, 6:29:36 PM
Thanks to all who replied, my take-away is that the script is whatever the
shebang line says it is, so if a file starts with "#!/bin/awk" it's an awk
script and so it'd be acceptable to name the file "file3.awk".

Ed.