I installed a brand new linux Mandrake and I noticed a strange thing when
using the shebang in a python script.
$ which python
$ python
$ /usr/bin/python
$ chmod +x myscript.py
$ ./myscript.py
error : interpreter not found. (!!!)
The shebang works for #!/bin/sh (and others)
Did I miss something ?
Thanks in advance for any hint.
There's a very subtle bug (feature?) in bash (and maybe other shells?)
that will generate this error if the line is terminated with a CR/LF
pair instead of just a linefeed.
Your script came from a DOS/Windows system and the CR's weren't
stripped. If you copied using FTP make sure to use "text" mode. If you
extracted a zip archive using the "unzip" command use "-a" to convert
text files.
Ensure your very first line _is_ exactly the one you _see_ , i.e.
`#!/usr/bin/python\n', without any additional invisible character;
trailing `\x00\n', for instance, would make you believe your script
runs fine but does nothing.
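For anyone who wants to check rather than retype, here is a minimal sketch that looks at the raw bytes of the first line. The names `shebang_problems` and `check_script` are made up for illustration, and the checks only cover the cases mentioned in this thread (CR LF endings and stray control bytes such as NUL).

```python
def shebang_problems(first_line: bytes) -> list:
    """Return human-readable problems found in a script's first line (raw bytes)."""
    problems = []
    if not first_line.startswith(b"#!"):
        problems.append("does not start with #!")
    if first_line.endswith(b"\r\n"):
        problems.append("ends with CR LF instead of LF")
    # Strip the line terminator, then look for invisible control bytes (tab is allowed).
    body = first_line.rstrip(b"\r\n")
    stray = sorted({b for b in body if b < 0x20 and b != 0x09})
    if stray:
        problems.append("control bytes: " + ", ".join(hex(b) for b in stray))
    return problems


def check_script(path: str) -> list:
    """Read the first line in binary mode so no newline translation hides the problem."""
    with open(path, "rb") as f:
        return shebang_problems(f.readline())
```

Calling `check_script("myscript.py")` on a file saved with DOS line endings should report the CR LF ending; an empty list means the first line looks clean.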
I bet bash nerves shall be soothed if you delete then quietly rewrite
this line.
This script is a copy from a FAT32 (Windows) partition. Emacs silently
carried on saving the file with DOS-style line endings.
A "dos2unix" fixed this.
Not silently. You'll be seeing a "(DOS)" in your emacs mode line if you edit
a file in DOS mode.
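If dos2unix isn't installed, the same conversion can be sketched in a few lines of Python. This is only an approximation of what the tool does (the function names are hypothetical, and it blindly replaces every CR LF pair, so don't run it on binary files).

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CR LF pair with a bare LF."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> bool:
    """Rewrite the file in place; return True if anything changed."""
    with open(path, "rb") as f:      # binary mode: see the real bytes
        original = f.read()
    converted = crlf_to_lf(original)
    if converted != original:
        with open(path, "wb") as f:  # rewriting in place keeps the executable bit
            f.write(converted)
    return converted != original
```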
Yes, it's common to other shells. It's not a feature, but it's not a
"bug" per se -- using CR LF terminated text files on Unix is operator
error.
Well guess what, it happens. It even happens when the "operator" is
aware of the problem. So let's not call it a bug, and instead call it
poor programming because the error message is not only incorrect, but
will also waste a fair amount of time of someone trying to debug the
problem because it points them in the wrong direction.
It's understandable once you realize that the shell
thinks the '\r' is part of the filename. Just like
os.execv ('/usr/bin/python\r', ('myfile.py',))
Regards. Mel.
It may be the kernel rather than the shell...
IMHO, it's "buggy" :
* most unix/windows/macos modern tools accept any end of line mark, like
Python does.
* the error message points the user in the wrong direction.
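To illustrate the "like Python does" point: a small sketch, assuming a modern Python 3 where text streams use universal newlines by default (`newline=None`). The helper name `read_text_universal` is just for this example.

```python
import io


def read_text_universal(raw: bytes) -> str:
    """Decode bytes as a text stream with universal newlines enabled."""
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None)
    return stream.read()


# Three lines, each ending with a different platform's convention.
mixed = b"unix\ndos\r\nold mac\r"
print(repr(read_text_universal(mixed)))  # every ending comes back as '\n'
```

Note that this is the text layer being tolerant when reading the script; the shebang line is interpreted before Python ever sees the file.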
So fix it. Something like this should do the trick:
$ su -
Password: 1234
% for d in /bin /usr/bin /usr/local/bin; do cd "$d"; for i in *; do \
    ln -s "$i" "$(printf '%s\r' "$i")"; done; done
% exit
But how many people use \r at the end of filenames? Or are even aware
that they can?
Even if it isn't a bug, it's a feature that causes more harm than good.
You seem to be under the impression that this is some "process the \r at
the end of the filename" feature. It isn't. The kernel will treat
everything from the shebang to the linefeed as the command-line to be
used; there's no special "feature" specifically spotting a rogue
character and tripping the foolish user up.
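A rough model of that behaviour, for illustration only: the function below is a simplified guess at the parsing, not the kernel's actual code, and real kernels differ in details such as length limits and argument handling. The point is only that the CR survives into the interpreter path.

```python
import os


def parse_shebang(first_line: bytes):
    """Simplified model: return (interpreter, argument) or None if there is no #!."""
    if not first_line.startswith(b"#!"):
        return None
    # Everything from the shebang up to the linefeed is the command line.
    command = first_line[2:].split(b"\n", 1)[0]
    # Assumption: only spaces and tabs are trimmed, so a trailing CR stays put.
    command = command.strip(b" \t")
    interpreter, _, argument = command.partition(b" ")
    return interpreter, argument.strip(b" \t")


interpreter, _ = parse_shebang(b"#!/usr/bin/python\r\n")
print(repr(interpreter))            # the path now ends in \r
print(os.path.exists(interpreter))  # so the lookup for that name fails
```

Seen this way, "interpreter not found" is literally true: there is no file whose name ends in a carriage return.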
This is simple, known, documented behaviour. If other systems place
foreign characters in the shebang line, it's up to the *user* to know
that; the kernel does what it's told. I certainly don't want the kernel
having special-case, workaround code for line-ending confusions that are
nothing to do with it.
When writing shell scripts, there are many things to learn; line endings
is but one of them. When moving files between operating systems, there
are many things to learn; the differences in line endings is but one of
them.
It's not the job of the kernel to protect the user from herself. That's
the job of userspace programs -- or meatspace processes :-)
It's simply an end-of-line issue. The "bug" here is that DOS chose to
use CR LF as the end-of-line terminator. (Mac gets even fewer points,
since it deliberately chose to do something even more different.)
This has nothing to do with Unix, it's an inherent difference between
platforms. The platforms are not the same; if you pretend like they are
then you'll continually run into problems.
From another angle, I've been using the program ws-ftp to
move files between M$ systems and *nix systems, and it fixes
all the line-end problems for me. I only get bitten now if I
move files another way.
Regards. Mel.
All FTP programs have considerations for encoding translations,
from simple ones such as ASCII and BINARY/IMAGE, to others
more obscure such as TENEX, EBCDIC, etc.
Any FTP program will work, if you transfer them in ASCII mode.
That's debatable. At least the Mac still only uses *one*
character for each end-of-line...
At the time the Mac was created, line endings of LF (Unix) and CR LF
(CP/M, DOS) were common. The only reason you'd choose CR is to do
something different. Their implementation of data forks vs. resource
forks are another example of Apple doing something different simply for
its own sake, which introduces massive interplatform compatibility
problems.
That's all fine, but the Mac does exhibit one problem I can't fathom:
If you have two or more Python installations, the first one in
your path gets invoked no matter what the shebang line says.
If the first line of a script is #!/usr/local/bin/python, I expect the
interpreter located in /usr/local/bin to execute the script, not the
one in /usr/bin, or the one in /sw/bin, but that is what you get if
you run the script as an executable.
The process list shows why - python is called without a path, e.g. as
"python". The same behavior occurs if the shell is bash or tcsh.
As far as I know, OS X is the only "modern" Unix to behave this way.
This behavior caused me a large amount of angst until I figured out
what the problem was. I had to resort to writing a script that
renames the pythons in /usr/bin/ and /sw/bin when I don't want them
and renames them back if and when I do.
I mostly use Python 2.3 from CVS, but I haven't been able to get
mod_python working with a Framework build yet, so I need to switch
to Fink's python once in a while.
>At the time the Mac was created, line endings of LF (Unix) and CR LF
>(CP/M, DOS) were common. The only reason you'd choose CR is to do
>something different. Their implementation of data forks vs. resource
>forks are another example of Apple doing something different simply for
>its own sake, which introduces massive interplatform compatibility
>problems.
I'm not sure it's entirely fair to jump to the conclusion that Apple chose CR
as line end only to be different.
My interpretation is that going from CRLF to a single character signalled
a switch from hardware-control semantics to symbolic semantics. I.e., CR and LF
originally literally referred to printing hardware with a physical "carriage" like
a typewriter or -- keeping paper handling stationary -- a moving print head,
and line feeding actually fed paper a line at a time. They're still used when
dealing with devices and emulated/simulated devices, of course.
The CR is what you get when you hit the Enter key, so Apple did the most direct thing
in using that key code as an EOL symbol. Perhaps they thought that was "cleaner" and
that they would lead the way to a cleaner standard way of doing things when they
achieved market dominance ;-)
Of course, LF has (IMO) a better semantic relationship to the EOL meaning, so translating
the Enter key to LF seems better than CR. Either way, ISTM yet another symptom of what happens
when the hardware-oriented evolves towards the abstract. You wind up with vestigial hardware
semantics in abstract contexts where they don't really belong, e.g., as in one of my pet peeves:
drive letters in file paths.
Bengt Richter
Tru64 (5.1) also shows this behavior (which recently bit me too), but
it's arguably a bug in Python rather than in the OS. If you look
carefully, I think you'll find that the correct binary (e.g.,
/usr/local/bin/python) is in fact being invoked, but that that binary
then uses the libraries associated with the first python in your PATH.
The reason this is happening is that python determines where all of
its libraries live by examining argv[0], if a more suitable method is
not available. If this gives the full path, everything is fine, but
if only the basename is given ("python"), then the startup code walks
the PATH to guess. As you've noticed, in some cases, this guess is
wrong.
I never use Tru64, but my company does, so I'm glad you identified
the same problem on that platform. argv[0] should contain the full
path to the interpreter and it does not, which makes me believe
this is an OS error, not a Python error, except you could argue that
relying on argv is not a platform independent way to find the
correct path.
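The guessing described above can be sketched roughly as follows. This is not Python's actual startup code, just a minimal illustration of "use argv[0] if it has a path, otherwise walk PATH"; the name `locate_executable` is hypothetical.

```python
import os


def locate_executable(argv0: str, path: str) -> str:
    """Guess the full path of the running program from argv[0] and PATH."""
    if os.sep in argv0:
        # argv[0] already carries a directory: no guessing needed.
        return os.path.abspath(argv0)
    # Bare name such as "python": the first match on PATH wins, even if a
    # different binary was actually executed.
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory or os.curdir, argv0)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return os.path.abspath(candidate)
    return ""
```

With /usr/bin ahead of /usr/local/bin in PATH, `locate_executable("python", os.environ["PATH"])` would pick /usr/bin/python, which matches the symptom of one binary running with another installation's libraries.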
If I could get the Panther install CD to boot on my PowerBook, I
could see if this is still going to be a problem in the future.
I think the terminology is not taken from typewriters, but from some
old printers where you needed both characters to start a new line.
CR moved the print head to the beginning of the line and LF moved the
paper one line. It can't be compared with a typewriter, where the
[Enter] key did both operations. The Microsoft (other operating
systems also had similar EOL) way is actually the "correct" way, since
the "cursor" needs to move down one line and start at the beginning.
The Unix way is of course more elegant, because you have a digital
computer and not some mechanical device. It doesn't matter if it's CR
or LF, because each character only does half of the operation. Apple
should have chosen LF to preserve compatibility.
CR stands for "carriage return". If you're talking about a print head
moving across the paper, you're no longer talking about a carriage
"returning", so the terminology obviously didn't come from electric
Carriage Return is a direct reference to the paper carriage on a manual
typewriter. These predate electric printing machines, and thus the
terminology was borrowed when teletypes needed control codes to control
their print head.
On such typewriters, the "line feed" function was also separate; once
the carriage was returned to the start of the line, one could cause
the paper to feed up a line at a time to introduce more vertical space;
this didn't affect the position of the paper carriage, so was
conceptually a separate operation.
So, it was teletypes that needlessly preserved the CR and LF as separate
control operations, due to the typewriter-based thinking of their
designers. If they'd been combined into the one operation, we would
have all the same functionality but none of the confusion over line
ending controls.
What [Enter] key? In a *proper* typewriter the act of ending one line and
starting another was effected by using the "carriage return lever", which
physically moved the platen back to the left margin, and incidentally also
fed the paper through the platen. Most typewriters could be set to feed one,
one-and-a-half or two lines. Remember, in these devices the printing
position was fixed (no print head) and the paper had to be moved along each
time a character was printed.
For that matter it might just as well have been ESC or any other arbitrary
character value - clearly a single character will suffice to delimit a line.
I fail to see why Apple should have chosen LF to preserve compatibility with
Unix if Microsoft are "correct". But then I'm just a crotchety old fartbot,
and you're just a mad surfer.
mess-with-old-farts-at-your-peril-ly y'rs - steve
let's see: the first typewriters arrived in 1873 or so, and the baudot tele-
printer was patented in 1874. sounds like parallel development to me.
the first electrical teletypes were, as far as I can tell, modified typewriters.
> So, it was teletypes that needlessly preserved the CR and LF as separate
> control operations, due to the typewriter-based thinking of their designers.
I think you're seriously underestimating the effort it took to build an
electromechanical telegraph machine at the very beginning of the 20th
century.
Although in actual fact the KSR33 teletype did need a fifth of a second to
guarantee that the print head would have returned to the left margin from
column 72 haracters was a "feature". Sometimes you would (all right, *I*
would) depress the two keys in the wrong order, and the result was that you
would see a single character printed in the middle of the new line during
the "flyback" period.
mobile-mine-of-useless-information-ly y'rs - steve
Further highlighting the foolishness of keeping them as separate
operations. If they interacted in this non-intuitive and damaging way,
the "go to the start of the next line" should have been a single
transaction for the user, with the implementation deciding how to carry
it out.
Having CR/LF terminate a line makes reasonable sense for a teletype,
or for a simple printer. Effects such as underline, strikethrough, and
double-print (aka boldface) are accomplished by outputting new
characters on the same line, which can be accomplished by outputting a
CR without a LF.
Similarly, this places double-spacing (two LFs per CR) in the hands of
the controlling machine, rather than the (presumably dumb) slave
One might argue that special control characters would be more
appropriate for these uncommon cases. Then the more common case would
take only a single character, and the special cases two. Most of the
manual typewriters I've used employ a variation on this scheme: a
single "global variable" mechanical switch controls double or single
spacing, and you can overtype only by executing a CR/LF and then
unrolling the LF.
The CR/LF model does have a simpler implementation, though: the
problem of managing linefeeds is passed up to the application. And, as
we know, worse is better (http://www.jwz.org/doc/worse-is-better.html).
What I am trying to remember is how the Friden flexowriter that was connected
to the LGP-30 I once coded for worked re CR/LF. It definitely had a moving carriage
and moved like a typewriter, but ISTR that one key would do the CRLF. But there was
more than one model, and I suspect there was one that had separate CR/LF codes/functions.
Maybe one we used with a PDP-8 later ;-)
Bengt Richter
It wasn't needless. A CR with no LF was often used for overstriking,
as a way of extending the rather limited character set. 'O'
overstruck with '-' would make a passable \Theta, for instance.
Overstriking is amply catered for with the BS (BackSpace) control code.
For the "theta" example: emit a 'O', emit a BS, emit a '-'. In fact,
this method continues today in some *roff outputs, for bold or digraph
characters. The CR is completely redundant for this purpose.
I maintain that the CR and LF were needlessly preserved as separate
operations, with no benefit.
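To make the CR-versus-BS argument concrete, here is a toy model of a printing device that obeys CR, LF and BS as separate operations. It is only a sketch (the function `strike` and its page representation are invented for this example), but it shows that both routes overstrike the same cell, and what a bare LF does on such a device.

```python
def strike(text: str) -> list:
    """Return the page as rows of cells; each cell holds every glyph struck there."""
    page = [[]]
    row = col = 0
    for ch in text:
        if ch == "\r":            # carriage return: back to column 0, same line
            col = 0
        elif ch == "\n":          # line feed: next line, column unchanged
            row += 1
            if row == len(page):
                page.append([])
        elif ch == "\b":          # backspace: one column to the left
            col = max(col - 1, 0)
        else:                     # printable: strike the glyph and advance
            line = page[row]
            while len(line) <= col:
                line.append("")
            line[col] += ch
            col += 1
    return page


print(strike("O\b-"))     # overstrike via backspace
print(strike("O\r-"))     # overstrike via carriage return
print(strike("ab\ncd"))   # LF alone: the second line starts at column 2
```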
Yes, I was thinking about teletypewriter, but couldn't remember the
I said that using just one character is more elegant, and I think
Apple was right when they chose a single character. I have no real
knowledge of the different line-endings that were used at the time,
but I assume that LF (alone) was more frequently used than CR (alone)
or other characters. If I'm right, the Apple's choice was less
Doesn't really matter - I already told you I'm a crotchety old fartbot :-)
So, FWIW, I agree with you that a single character makes hugely more sense
than CRLF. Apple are well-known for making self-limiting decisions :-)
Although the first is correct (using BS to get the effect), it implies
that the printer/typewriter you want to produce this effect on is
able to backspace _precisely_, whereas the CR method can be made simple
and easy with a mechanical(?) stopper that ensures that you always
start at the same point.
> I maintain that the CR and LF were needlessly preserved as separate
> operations, with no benefit.
I disagree. IIRC there were plenty of printers where the carriage
return operation took a disproportionately long time, so therefore
backspacing was the preferred method for overstriking, but at the same
time that meant line feeds were a big performance win for smart
I still have some saved "graphics" output from my NEC Spinwriter
generated by printing a dot "." combined with micro-linefeeds. Ah, the
joys of doing "screenshots" of UCSD Pascal turtlegraphics on an Apple
II... hmmm, think I'm dating myself?
Fair enough then; the operations were kept separate for a reason
justifiable at the time.
Sadly, now we have to live with the legacy of the resulting confusion,
long after those benefits are obsolete.
