Years ago I read a document that discussed the reason for the #! at
the beginning of a script (something about tricking old kernels into
not treating the file as a text file?) but I cannot locate any
information on #!. I've searched through the comp.unix.shell, bash,
and shell intro FAQs with no luck.
Does anyone know where I can find this information?
Thanks in advance.
Maybe http://www.opengroup.org has something about it.
Roger
>Unix reads the first 2 bytes of executable files and if they are #! then the
>rest of the first line is treated as a shell that this file is given to. If
>the first 2 characters aren't #! then the file is just executed. I think that
>I was told that the #! has something to do with the magic number, but there's
>a bunch of cobwebs around that one.
>
Every binary executable has a so-called magic number. Binary files in ELF
format have the byte 0x7f followed by 'ELF' as their first 4 bytes, and
COFF binaries have other magic byte values at the beginning of the file,
depending on the exact type of the binary. MS-DOS EXE files have the
magic number "MZ". That is how you can run old a.out and new ELF Linux
binaries on the same Linux system, and, with iBCS installed, SCO COFF
binaries as well.
If the kernel finds the sequence "#!" at the beginning of an executable,
it takes the rest of the line as the path of a program that will
interpret the file as a script.
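A minimal C sketch of the check described above: look at the leading bytes and decide what kind of file it is. The enum and classify() are hypothetical names chosen for illustration; a real kernel validates much more of the header (and the full ELF identification) before running anything.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical categories for this sketch only. */
enum file_kind { KIND_UNKNOWN, KIND_SCRIPT, KIND_ELF, KIND_MZ };

/* Classify a file by its first bytes, roughly as described in the post. */
enum file_kind classify(const unsigned char *buf, size_t len)
{
    /* "#!": the rest of the line names an interpreter */
    if (len >= 2 && buf[0] == '#' && buf[1] == '!')
        return KIND_SCRIPT;
    /* ELF: byte 0x7f followed by 'E' 'L' 'F'. The literal is split
     * because 'E' would otherwise continue the \x7f hex escape. */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return KIND_ELF;
    /* MS-DOS EXE: 'M' 'Z' */
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return KIND_MZ;
    return KIND_UNKNOWN;
}
```

For example, a buffer beginning "#!/bin/sh" yields KIND_SCRIPT, and one beginning with the bytes 0x7f 'E' 'L' 'F' yields KIND_ELF.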
Villy
> Binary files in ELF format has the first 4 bytes '\177ELF' [...]
> If the kernel finds the sequence "#!" at the beginning of an executable
> [...]
IMHO, this "4" might be the origin of a myth:
According to the GNU Autoconf Tutorial [1], "4.2BSD based systems
(such as Sequent DYNIX)" interpret the magic number as a long, so that
a blank after the "!", i.e. "#! /", would be required. This is cited
frequently. But it turns out that this is actually not true for 4.2BSD [2]
(and it is extremely unlikely that any _derived_ Unix flavour
changed this, breaking the omnipresent existing scripts).
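To make the alleged difference concrete, here is a sketch contrasting a bytewise "#!" test (what 4.2BSD actually did, according to this post) with a hypothetical test that compares the first four bytes, loaded as one 32-bit word, against "#! /" (the behaviour the Autoconf note describes). Both function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bytewise test: only the two characters "#!" matter, so a blank
 * after the "!" is optional. */
bool shebang_bytewise(const char *line)
{
    return line[0] == '#' && line[1] == '!';
}

/* Hypothetical "magic as a long" test: load the first four bytes as a
 * single 32-bit word and compare it with the word for "#! /". Both
 * sides are loaded the same way, so the result does not depend on
 * byte order. The caller must supply at least four bytes. */
bool shebang_as_long(const char *line)
{
    uint32_t word, want;

    memcpy(&word, line, sizeof word);
    memcpy(&want, "#! /", sizeof want);
    return word == want;
}
```

With the word comparison, "#!/bin/sh" is rejected and only "#! /bin/sh" passes, which is exactly why the blank would have become mandatory; with the bytewise test both forms work.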
Does anybody know more about the origin of this myth?
[1] <http://www.gnu.org/manual/autoconf/html_node/Portable-Shell.html>
[2] sys/kern_exec.c, mainly lines 123 ff. It is not available on tuhs.org,
but it is on, e.g., McKusick's CDs.
Sven
> According to the GNU Autoconf Tutorial [1], "4.2BSD based systems
> (such as Sequent DYNIX)", interpret the magic as a long, so that
> a blank, i.e. "#! /" would be required. This is cited frequently.
Take what I say with a grain of salt, but let me grab some old
manuals and dredge my memory.
Hmm. My a.out(5) man page from the 4th BSD distribution (VAX-11) says:
------------------
struct exec {
        long    a_magic;        /* magic number */
        ...
};
#define OMAGIC 0407 /* old impure format */
#define NMAGIC 0410 /* read-only text */
#define ZMAGIC 0413 /* demand load format */
-------------------
I think these values correspond to PDP opcodes. I wasn't a PDP
programmer; I programmed the Data General Nova, which was similar
to the PDP-8. There, octal 0401 was a no-op: a jump to .+1, the next
location. It was a 16-bit instruction on the Nova.
And the exec(2) man page (dated 4/1/81, 4th BSD distribution) says:
-------------------
"if the first two characters are "#!" then exec attempts to read a
pathname from the executable file and use that program as the command
file interpretor. "
-------------------
It also says that the space (or a tab) after "#!" is *mandatory*.
Hmm. Strange. I rarely used a space. But I was using this on Eunice
under VAX/VMS, a 32-bit machine.
> Does anybody know more about the origin of this myth?
Just guesses. But if the CPU was a 16-bit machine, and the magic
number was 16 bits long, then it's easy to distinguish between a
script and a binary, because the magic numbers above are non-ASCII. And
(guess) if the machine was an 18-bit architecture (PDP-9?), then you
need more than 16 bits to make sure the first 18 bits are not a MAGIC
number. Therefore you need three 8-bit characters on an 18-bit machine.
Hmm. Since you have to worry about big-endian/little-endian, this
might matter for 32-bit machines. It's a guess.
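The 16-bit part of this guess can be checked with a little arithmetic. This sketch assembles a 16-bit word from the first two bytes in either byte order and compares it with the a.out magic numbers quoted from the 4BSD a.out(5) page above; word16() and is_aout_magic() are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* a.out magic numbers, as quoted from the 4BSD a.out(5) page */
#define OMAGIC 0407   /* old impure format */
#define NMAGIC 0410   /* read-only text */
#define ZMAGIC 0413   /* demand load format */

/* Assemble a 16-bit word from the first two bytes of a file. */
uint16_t word16(const unsigned char *b, bool little_endian)
{
    return little_endian ? (uint16_t)(b[0] | (b[1] << 8))
                         : (uint16_t)((b[0] << 8) | b[1]);
}

/* True if the word is one of the three a.out magic numbers. */
bool is_aout_magic(uint16_t w)
{
    return w == OMAGIC || w == NMAGIC || w == ZMAGIC;
}
```

"#!" is the byte pair 0x23 0x21, giving 0x2123 when read little-endian (as on the PDP-11 or VAX) and 0x2321 when read big-endian; neither equals 0407, 0410 or 0413. Conversely, OMAGIC stored little-endian starts with the control bytes 0x07 0x01, which a text file would not begin with.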
As for the #!, I think the story goes like this.
At first, the kernel could only execute binary files. The shell, when
asked to execute a file, would stat the file looking for the "x" bit, and
then look at the first few bytes to determine whether it was ASCII or
binary. If they were a MAGIC number, it could exec() the file. But if the
first few bytes were ASCII, then the shell knew it was a shell script. It
would fork itself and interpret the file.
So the Bourne shell would use /bin/sh to execute a shell script, and the C
shell would use /bin/csh to execute the script.
But you can see the problem with this. As long as the C shell only ever
executed C shell scripts, and so on, there would be no problem.
What happens if you mix scripts on the same machine?
Sh executes csh scripts and vice versa. Disaster.
The early /bin/sh did NOT use "#" for comments. Instead, it used ":",
the null command.
There was great rivalry between AT&T and Berkeley. It seems that every
time Berkeley added an improvement, AT&T said "we can do better" and
made an incompatible but better improvement.
So someone (Berkeley?) created the rule:
if the first character is ":", use sh; if it is "#", use csh.
And they modified sh and csh to use this convention.
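The first-character rule fits in a few lines. This is only a sketch of the convention as described in this post, not the actual csh code (which, according to Sven's reply, was controlled by the OTHERSH macro); the names are hypothetical, and handing an unmarked file to the current shell is an assumption.

```c
/* Hypothetical result type for this sketch. */
enum shell_choice { USE_BOURNE_SH, USE_C_SHELL, USE_CURRENT_SHELL };

/* Pick a shell for a script the kernel could not exec, based only on
 * the first character of the file's text. */
enum shell_choice pick_shell(const char *text)
{
    switch (text[0]) {
    case ':':
        return USE_BOURNE_SH;     /* ":" null-command comment: /bin/sh */
    case '#':
        return USE_C_SHELL;       /* "#" comment: /bin/csh */
    default:
        return USE_CURRENT_SHELL; /* assumption: no marker, run it here */
    }
}
```

A line starting with "#!" also begins with '#', so under this rule such a file would go to csh whenever the kernel itself did not handle it.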
But then other UNIX distributions didn't have this same rule. Csh
would see a file starting with ":" and use csh to execute it. Now
scripts ported from one machine to another would not work.
Clearly this was a problem. In addition, a binary program could not
exec() a script, so you could not replace a program with a shell wrapper.
So Berkeley added functionality to the KERNEL to determine which
interpreter to use, based on the magic number. The system read in
the first 16 (18? 32? little-endian? big-endian?) bits to see whether they
were magic or non-magic (i.e. ASCII), and if they were "#! ", it would
execute the named file as the interpreter.
They used the new "#!" as the convention, so that current C shell scripts
would work. And because this was a new combination, old rules still
worked unchanged.
> [ exec(2) manpage (4/1/81 - 4th bsd distribution) ]
> It also says the space (or a tab) is *mandatory.*
This "mandatory" seems to be exactly what I was searching for.
I had another look at McKusick's CSRG Archive CDs:
"4.1.snap/usr/man/man2/exec.2"
<http://www.uni-ulm.de/~s_smasch/various/shebang/exec.2.html>
So I finally found it, with your help.
This "4.1.snap" snapshot of 4.1BSD is the only version I found that
contains this (it is in neither the other 4.1.x versions nor 4.2).
Can you tell exactly which version your BSD is?
However, this is only documentation. The comparison itself was done
completely _bytewise_ (see below), and to the best of my knowledge no
blank was _required_.
> [...] So Berkeley added the functionality to the KERNEL to determine
> what interpretor to use, based on the magic number.
Actually, this mechanism doesn't originate from Berkeley but from
Research Unix, between Version 7 and Version 8 (though we don't know
the latter):
4.0BSD, /usr/src/sys/newsys/sys1.c, contains an email from
Dennis Ritchie, which reads:
# >From dmr Thu Jan 10 04:25:49 1980 remote from research
# The system has been changed so that if a file being executed
# begins with the magic characters #! , the rest of the line is
# understood to be the name of an interpreter for the executed file.
# [...]
# Blanks after ! are OK. Use a complete pathname (no search is done).
See full text and some code at
<http://www.uni-ulm.de/~s_smasch/various/shebang/sys1.c.html>
This was then incorporated into BSD, but not actually activated
until 4.2BSD.
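The rules in dmr's note translate into a small parser: skip the "#!", skip any blanks, and insist on a complete (absolute) path, because no search is done. This is a sketch under assumptions, not the code from sys1.c: parse_shebang() and INTERP_MAX are hypothetical, the 32-byte limit is arbitrary, and anything after the path (such as an optional argument) is simply ignored.

```c
#include <stddef.h>
#include <string.h>

#define INTERP_MAX 32   /* arbitrary buffer size for this sketch */

/* Copy the interpreter path from a "#!" line into out.
 * Returns 0 on success, -1 if the line is not usable. */
int parse_shebang(const char *line, char out[INTERP_MAX])
{
    size_t i = 2, len;

    if (line[0] != '#' || line[1] != '!')
        return -1;                      /* not a "#!" line */
    i += strspn(line + i, " \t");       /* "Blanks after ! are OK." */
    if (line[i] != '/')
        return -1;                      /* complete pathname required */
    len = strcspn(line + i, " \t\n");   /* path ends at blank or newline */
    if (len >= INTERP_MAX)
        return -1;                      /* too long for this sketch */
    memcpy(out, line + i, len);
    out[len] = '\0';
    return 0;
}
```

For example, both "#!/bin/sh" and "#! /bin/csh -f" parse successfully (yielding "/bin/sh" and "/bin/csh"), while "#!sh" is rejected because it is not a complete pathname.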
> As for the #!, I think the story goes......
> The shell, when asked to execute a file, could stat the file looking
> for the "x" bit, and then would look at the first few bytes to
> determine if it was ASCII or binary.
It's the other way round: the shell exec()s the file if it is executable.
If this fails, it tries to source it itself. BTW, libc also plays a part
here.
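The order described here (try exec() first, and only if the kernel refuses the file with ENOEXEC hand it to a shell) looks roughly like this in C. POSIX specifies the same fallback for execvp() and execlp(), which is presumably where libc comes in. exec_or_sh() and sh_fallback_argv() are hypothetical names, /bin/sh is assumed to exist, and the 64-entry argument limit is arbitrary.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

#define SH_ARGV_MAX 64  /* arbitrary limit for this sketch */

/* Build { "sh", path, argv[1], ..., NULL } into out[], which has room
 * for cap pointers. Returns 0 on success, -1 if it does not fit. */
int sh_fallback_argv(const char *path, char *const argv[],
                     const char *out[], size_t cap)
{
    size_t n = 0;

    if (cap < 3)
        return -1;
    out[n++] = "sh";
    out[n++] = path;
    if (argv[0] != NULL) {
        for (size_t i = 1; argv[i] != NULL; i++) {
            if (n + 1 >= cap)
                return -1;          /* keep room for the NULL */
            out[n++] = argv[i];
        }
    }
    out[n] = NULL;
    return 0;
}

/* Try the kernel first; if it does not recognize the file format
 * (ENOEXEC), let /bin/sh interpret the file instead.
 * Returns only on failure. */
int exec_or_sh(const char *path, char *const argv[])
{
    const char *sh_argv[SH_ARGV_MAX];

    execv(path, argv);
    if (errno != ENOEXEC)
        return -1;                  /* e.g. ENOENT or EACCES */
    if (sh_fallback_argv(path, argv, sh_argv, SH_ARGV_MAX) != 0) {
        errno = E2BIG;
        return -1;
    }
    execv("/bin/sh", (char *const *)sh_argv);
    return -1;
}
```

The argument vector is built by a separate function so it can be checked without actually replacing the running process.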
> So someone (Berkeley?) created the rule:
> If the first character is ":", then use sh, if "#", then use csh.
> But then other UNIX distributions didn't have this same rule.
> Csh would see a file starting with ":" and use csh to execute it.
It's the macro "OTHERSH" in the csh source that activates this hack.
However, it is set by default. sh(1) would have had to be patched on
such systems as well, though, and in that direction problems might
have occurred on non-BSDs.
Sven
> This "4.1.snap" snapshot of 4.1BSD is the only version that i found,
> which contains this (but neither in other 4.1.x versions nor in 4.2).
> Can you tell, what version your BSD is exactly?
The cover says
UNIX Programmer's Manual
Seventh Edition
Virtual Vax-11 Version
June 1981
On the bottom it says Berkeley.
Because it says "4th" and not 4.1 or 4.2, I suspect it was 4.0.
Yes. Inside it says 4th distribution, October 1980.
The prefaces are, in order,
4th Berkeley Edition
3rd Berkeley Edition
Preface to the UNIX/32V Edition
Preface to the 7th Edition
So the manual page says the space was mandatory. But I don't
remember this being the case with Eunice (which is where I got this
manual).
Perhaps it was required in the 3rd edition and later changed,
or else the Eunice version made the space optional. Or I'm misremembering.
Thanks for the corrections, Sven.
>>> [ exec(2) manpage 4/1/81 ]
>>
>> Can you tell, what version your BSD is exactly?
>
> The cover says [...] June 1981
> [...] Inside it says 4th distribution, October 1980.
> [...] Because it says "4th" and not 4.1 or 4.2, I suspect it was 4.0.
No, 4.0 had actually been released around 10/80, but your man page is
newer. In particular, it is dated exactly like the "4.1.snap" snapshot
on the CSRG CDs.
> Perhaps it was either required in the 3rd edition, and later changed,
(No, because at that time it didn't exist yet, and the versions
immediately before and after 4.1.snap don't require it at all. But
perhaps they intended to change it, or to keep the option of doing so
open, and then dropped the idea when 4.2BSD came out.)
Sven