What are readers' views on shell scripts calling other programs by specifying an absolute pathname?
Some alternative choices I am aware of, to call commands such as:
/bin/ls
/opt/myapps/mybin/myprog
are:
1. Put all directories in the PATH
PATH=/bin:/usr/bin:/opt/myapps/mybin
ls /a/b/c/d
myprog -a aaa -b bbb
I prefer this method. It makes for less cluttered-looking code than spelling out the full pathname, and it is less cryptic than the many variables, like $LS, used in choice 2.
In Perl, running in taint-checking mode (perl -T), the PATH must be nailed down in this way, not inherited from the environment. Of course, I could still nail down the PATH, but also call programs using choice 2 or 3.
2. Define all invoked commands at the top
LS="/bin/ls"
MYPROG="/opt/myapps/mybin/myprog"
$LS /a/b/c/d
$MYPROG -a aaa -b bbb
I think this is widely practiced. An advantage is that you can test for the existence of executables before trying to call them:
[[ -x $MYPROG ]] || { print -u2 "$MYPROG: No such executable"; exit 1; }
(Admittedly, I practice this choice in Perl, for safe maintenance. But far less often in shell scripts, since shells have far fewer built-ins, and peppering code with the likes of $LS and $CAT looks ugly.)
3. Spell out the full pathname
/bin/ls /a/b/c/d
/opt/myapps/mybin/myprog -a aaa -b bbb
This avoids the ugliness of variables like $LS, but having to spell out the (right) pathname for every command (and know which ones are built-ins, so have no pathname) seems plain awkward.
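For what it's worth, choice 1 can be combined with the up-front check from choice 2. What follows is only a minimal sketch, assuming a POSIX shell. The helper name require_commands is made up for illustration, and only the system directories are pinned so that the sketch runs anywhere; a real script would add its own directory, such as /opt/myapps/mybin.

```shell
# Choice 1: pin the search path instead of inheriting it.
# Assumption: only system directories are listed here.
PATH=/bin:/usr/bin
export PATH

# Hypothetical helper: report every named command that the current shell
# cannot resolve, and return non-zero if any is missing.
# Note that "command -v" also accepts built-ins and functions.
require_commands() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd: No such executable" >&2
            missing=1
        fi
    done
    return "$missing"
}

if require_commands ls cat; then
    tools_ok=yes
else
    tools_ok=no    # a real script would probably exit 1 here
fi
```

The code below the check can then call ls and cat by bare name, without $LS-style variables.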
A separate issue is whether production scripts should inherit the PATH from their (possibly uncontrolled) environment, or nail it down, either explicitly like in choice 1, or implicitly as in:
. /opt/myapps/mybin/myenv
(which presumably defines PATH for any caller).
Clyde Ingram wrote:
> What are readers' views on shell scripts calling other programs by
> specifying an absolute pathname?
> Some alternative choices I am aware of, to call commands such as:
> /bin/ls
> /opt/myapps/mybin/myprog
> are:
> 1. Put all directories in the PATH
> PATH=/bin:/usr/bin:/opt/myapps/mybin
> ls /a/b/c/d
> myprog -a aaa -b bbb
I usually do that for scripts that only I use since I can always easily debug problems in my own scripts and this is a quick and easy way to get at where the tools should be.
> I think this is widely practiced.
> An advantage is that you can test for the existence of executables,
> before trying to call them:
Exactly. I do this in scripts that many people are going to use since they need to be more robust and provide good error detection and error messages.
The naming convention that most people use is to reserve all-upper-case names for exported variables, so your variable names should really be something like:
ls="/bin/ls"
myprog="/opt/myapps/mybin/myprog"
$ls /a/b/c/d
$myprog -a aaa -b bbb
but that introduces a potential problem because if you forget to put the "$" in when using those variables, e.g.:
ls="/bin/ls"
myprog="/opt/myapps/mybin/myprog"
ls /a/b/c/d
myprog -a aaa -b bbb
then you find yourself either calling the wrong versions of "ls" and "myprog" or finding them inexplicably missing. Even worse, you may initially be calling the correct versions, and it's 3 years down the road before the wrong version appears earlier in your PATH. Then you have to figure out what suddenly went wrong in a previously working script that you didn't change!
To get around this, you need to adopt a different naming convention for your non-exported variables. I always prefix mine with underscore, e.g.:
_ls="/bin/ls"
_myprog="/opt/myapps/mybin/myprog"
$_ls /a/b/c/d
$_myprog -a aaa -b bbb
but you could come up with something else if you don't like that.
<snip>
> 3. Spell out the full pathname
> /bin/ls /a/b/c/d
> /opt/myapps/mybin/myprog -a aaa -b bbb
> This avoids the ugliness of variables like $LS, but having to spell
> out the (right) pathname for every command (and know which ones are
> built-ins, so have no pathname) seems plain awkward.
Don't do that since there will be times when you want to call the same command in multiple places in one script so then you'd be duplicating the path information and making the script harder to maintain.
Another option you didn't mention is to create a variable for the path to each tool, e.g.:
lsBin="/bin"
myprogBin="/opt/myapps/mybin"
${lsBin}/ls /a/b/c/d
${myprogBin}/myprog -a aaa -b bbb
I wouldn't do that myself since it can also result in some duplication and it'd be easy to forget the bin prefix and end up calling the wrong tool.
A final option would be to create a function for each external tool, e.g.
function ls { /bin/ls "$@"; }
function myprog { /opt/myapps/mybin/myprog "$@"; }
ls /a/b/c/d
myprog -a aaa -b bbb
and then your only potential problem would be if you forgot to create a function for the tool, then you'd just be calling whichever version is first in your PATH.
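A forgotten wrapper could be caught at the top of the script. This is only a sketch: require_wrappers is a hypothetical name, and it relies on the wording that "command -V" prints for functions in bash, ksh, dash and zsh, which other shells may phrase differently.

```shell
# Wrapper in the style shown above, written with POSIX function syntax.
ls() { /bin/ls "$@"; }

# Hypothetical guard: succeed only if every named tool is a shell function.
# Assumption: "command -V" reports "NAME is a function" or
# "NAME is a shell function" for functions.
require_wrappers() {
    for name in "$@"; do
        case $(command -V "$name" 2>/dev/null) in
            "$name is a function"*|"$name is a shell function"*) ;;
            *) echo "$name: no wrapper function defined" >&2
               return 1 ;;
        esac
    done
}
```

Calling something like "require_wrappers ls myprog || exit 1" before the main body would then turn a missing wrapper into an immediate error instead of a silent PATH lookup.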
Probably the best solution overall is to define a meaningful variable for the path to the bin of each cluster of tools you use and then use that in the function definitions so that if, say, all of your printer management tools moved to a different bin, you'd just change the variable, e.g.:
function prt { ${prtBin}/prt "$@"; }
function prtstat { ${prtBin}/prtstat "$@"; }
function prtcancel { ${prtBin}/prtcancel "$@"; }
function dump { ${dataBin}/dump "$@"; }
function delete { ${dataBin}/delete "$@"; }
function agenda { ${calendarBin}/agenda "$@"; }
> A separate issue is whether production scripts should inherit the PATH
> from their (possibly uncontrolled) environment, or nail it down,
> either explicitly like in choice 1, or implicitly as in:
> . /opt/myapps/mybin/myenv
> (which presumably defines PATH for any caller).
I tend to just inherit the caller's PATH so I can take advantage of the system administrators having set some default tool dirs. I could see not wanting to do that for some applications, though.
> What are readers' views on shell scripts calling other programs by
> specifying an absolute pathname?
> Some alternative choices I am aware of, to call commands such as:
> /bin/ls
> /opt/myapps/mybin/myprog
> are:
> 1. Put all directories in the PATH
> PATH=/bin:/usr/bin:/opt/myapps/mybin
> ls /a/b/c/d
> myprog -a aaa -b bbb
> I prefer this method. It makes for less cluttered looking code than
> if full pathname is specified, and less cryptic than seeing lots of
> variables, like $LS, used in choice 2.
> In Perl, running in taint-checking mode (perl -T), the PATH must be
> nailed down in this way, not inherited from the environment.
> Of course, I could still nail down the PATH, but also call programs
> using choice 2 or 3.
I prefer this, too. In fact it is most readable and most portable.
You are right. There must be bad shell courses out there...
> An advantage is that you can test for the existence of executables,
> before trying to call them:
> [[ -x $MYPROG ]] || { print -u2 "$MYPROG: No such executable";
> exit 1; }
The existence of executable tools like ls, awk, and sed can often be taken for granted. On the other hand, most differences are in the tools' behavior, arguments, and capabilities.
> (Admittedly, I practice this choice in Perl, for safe maintenance.
> But far less often in shell scripts, since shells have far fewer
> built-ins, and peppering code with the likes of $LS and $CAT looks
> ugly.)
After reading a book about quality, you might want to go for "safe maintenance". After a while you will find that your code gets more and more ugly, and lacks quality.
> 3. Spell out the full pathname
> /bin/ls /a/b/c/d
> /opt/myapps/mybin/myprog -a aaa -b bbb
> This avoids the ugliness of variables like $LS, but having to spell
> out the (right) pathname for every command (and know which ones are
> built-ins, so have no pathname) seems plain awkward.
... so is ugly as well. In addition: inflexible.
> A separate issue is whether production scripts should inherit the PATH
> from their (possibly uncontrolled) environment, or nail it down,
> either explicitly like in choice 1, or implicitly as in:
> . /opt/myapps/mybin/myenv
if myenv is common to many scripts.
> (which presumably defines PATH for any caller).
If security matters, nail it down (PATH and IFS). The normal method is to prepend it:
PATH=/bin:/usr/bin:/opt/myapps/mybin:$PATH
If you *want* the users to take influence, append it:
PATH=${PATH}:/bin:/usr/bin:/opt/myapps/mybin
-- Michael Tosch IT Specialist HP Managed Services Germany Phone +49 2407 575 313
>> [[ -x $MYPROG ]] || { print -u2 "$MYPROG: No such executable";
>> exit 1; }
> The existence of executable tools like ls, awk, and sed can often be taken
> for granted.
> On the other hand, most differences are in the tools' behavior, arguments,
> and capabilities.
On SUSv3 conformant systems, in a SUSv3 conformant script interpreted by a SUSv3 conformant shell,
> If security matters, nail it down (PATH and IFS).
> The normal method is to prepend it:
> PATH=/bin:/usr/bin:/opt/myapps/mybin:$PATH
> If you *want* the users to take influence, append it:
> PATH=${PATH}:/bin:/usr/bin:/opt/myapps/mybin
IFS is not a problem. Depending on the shell and the script, there may be problems with ENV, BASH_ENV, FIGNORE, SHELLOPTS, ARGV0, HOME, ZDOTDIR, FPATH, LANG, LC_*, TMOUT (funny with bash and ksh93), LD_PRELOAD, SHLIB_PATH, LD_LIBRARY_PATH, all sorts of other dynamic linker variables, STTY, TMPPREFIX... some of which you can't do anything against (as it's too late when the script is started).
If a user wants to break a script, he'll always be able to do so: he can edit the script and put garbage in it. I think it's enough to fix only what the user might reasonably have changed; for the rest, the user is to blame if the script fails because of an unexpected value for a variable.
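For the variables that are read before the script's first line runs, one option is to start the script from a small launcher that clears the environment. A minimal sketch, assuming a POSIX env utility; run_clean is a hypothetical name and the allow-listed values are placeholders:

```shell
# Hypothetical launcher: run a command with an empty environment plus a
# short allow-list. The values are assumptions for illustration only.
run_clean() {
    env -i PATH=/bin:/usr/bin LC_ALL=C "$@"
}

# Demonstration: variables exported by the caller do not reach the child.
seen=$(
    CDPATH=/tmp
    ENV=/nonexistent
    export CDPATH ENV
    run_clean sh -c 'echo "${CDPATH-unset}:${ENV-unset}"'
)
```

This only helps when the launcher itself is trusted, and it also drops variables the script may legitimately want, so the allow-list has to be chosen per application.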
> IFS is not a problem. Depending on the shell/the script there
> may be problems with ENV, BASH_ENV, FIGNORE, SHELLOPTS, ARGV0, HOME,
> ZDOTDIR, FPATH, LANG, LC_*, TMOUT (funny with bash and ksh93),
> LD_PRELOAD, SHLIB_PATH, LD_LIBRARY_PATH, all sorts of other
> dynamic linker variables, STTY, TMPPREFIX... some of which you
> can't do anything against (as it's too late when the script is
> started).
[...]
Some clarifications:
IFS:
  affects: very early Bourne shells (others ignore the IFS variable found in their environment on startup)
  effect: on those shells, syntax parsing, word splitting...
  example:
    $ IFS=i sh -c exit
    runs "ex" on the "t" file.

ENV:
  affects: pdksh, ksh88, zsh in sh or ksh emulation, some shells based on ash.
  effect: sources the given script; command substitution is expanded in the ENV value. If the value expands to the path of a fifo, the shell is blocked.
  example:
    $ ENV='$(echo foo >&2)' ksh -c :
    foo

HOME:
  affects: zsh
  effect: it's the place where the ".zshenv" file is found if $ZDOTDIR is not set.
  example:
    $ echo echo foo > /tmp/.zshenv
    $ HOME=/tmp zsh -c :
    foo

ZDOTDIR:
  affects: zsh
  effect: see above

FPATH:
  affects: zsh, ksh
  effect: same as PATH except that it's for library functions
  example:
    $ echo echo foo > /tmp/zmv
    $ FPATH=/tmp zsh -c 'autoload zmv; zmv a b'
    foo

LANG, LC_...:
  affects: most modern shells, ksh93 badly
  effect: changes the sort order, the charset/language used for messages, the displayed time format, the "ls -l" output format, the numeric format (breaks ksh93 scripts that use floating point arithmetic)...
  example:
    $ date +%B
    January
    $ LC_TIME=fr sh -c '[ "$(date +%B)" = January ] || echo We are not in January'
    We are not in January
    $ LC_NUMERIC=fr_FR ksh93 -c 'echo $((3.14159))'
    ksh93: line 1: 3.14159: arithmetic syntax error

TMOUT:
  affects: bash, ksh93
  effect: "read" fails if it takes more than $TMOUT to perform
  example:
    $ TMOUT=1 ksh93 -c '(sleep 2; echo a) | (read a; echo b$a)'
    b

TMOUT, PPID, HISTCMD, MAILCHECK, LINENO, OPTIND, RANDOM, SECONDS...:
  affects: ksh93, pdksh for some
  effect: ksh93 returns immediately with an error if the value is not a valid arithmetic expression.
  example:
    $ RANDOM=++ ksh93 -c 'echo foo'
    ksh93: ++: more tokens expected

LD_PRELOAD, LD_LIBRARY_PATH...:
  affects: every non statically linked shell
  effect: shells rely on functions from libc or other libraries; they can be replaced by other ones this way. Other side effects, with other variables, depending on the system.
  example:
    $ LD_TRACE_LOADED_OBJECTS=1 sh -c :
    libdl.so.2 => /lib/libdl.so.2 (0x40021000)
    libc.so.6 => /lib/libc.so.6 (0x40024000)
    /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

STTY:
  affects: zsh
  effect: changes the terminal settings (runs stty before each command)
  example:
    $ STTY=-g zsh -c :
    500:5:bf:8a3b:3:1c:8:15:4:0:1:0:11:13:1a:0:12:f:17:16:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0

TMPPREFIX:
  affects: zsh
  effect: changes the path where temporary files are created (for here documents/strings and =(...))
  example:
    $ TMPPREFIX=/ zsh -c 'cat <<< foo'
    zsh: permission denied

TMPDIR:
  affects: pdksh
  effect: same as above, except that pdksh reverts to the system default tmp dir if it is unable to create a tmpfile in $TMPDIR (but it may still fail for file paths too long for instance).

OPTIND:
  affects: zsh (a bug)
  effect: getopts fails
  example:
    $ OPTIND=4 zsh -c 'getopts a var -a; echo "<$var>"'
    <>

PS4:
  affects: most shells
  effect: changes the display for xtracing, and command substitution is performed.
  example:
    $ PS4='$(()' pdksh -cx :
    pdksh: no closing quote

EXECSHELL:
  affects: pdksh
  effect: command used to run commands that return ENOEXEC (valid scripts without a shebang)
  example:
    $ echo echo b > a; chmod +x a
    $ EXECSHELL=echo pdksh -c ./a
    ./a

POSIXLY_CORRECT:
  affects: pdksh,

CDPATH:
  affects: most recent shells
  effect: cd'ing to a directory may no longer fail, cd may output unexpected strings.
  example:
    $ mkdir /tmp/A /tmp/B
    $ cd /tmp/B
    $ CDPATH=/tmp bash -c 'cd A && pwd'
    /tmp/A
    /tmp/A
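The CDPATH case is one of those that can still be handled from inside the script. A small sketch of two common defences, assuming a POSIX shell and a writable temporary directory; the directory layout mirrors the /tmp/A and /tmp/B example, and the variable names are made up:

```shell
demo=$(mktemp -d) || exit 1
mkdir "$demo/A" "$demo/B"
cd "$demo/B" || exit 1

# Simulate an inherited CDPATH that lists the parent directory.
CDPATH=$demo

# A bare "cd A" succeeds through CDPATH even though B contains no A.
bare=$(cd A >/dev/null 2>&1 && echo reached || echo failed)

# Defence 1: an explicit ./ prefix skips the CDPATH search,
# so the cd fails as it should.
dotted=$(cd ./A >/dev/null 2>&1 && echo reached || echo failed)

# Defence 2: drop the inherited value, which fixes bare names too.
unset CDPATH
cleared=$(cd A >/dev/null 2>&1 && echo reached || echo failed)

# Clean up the scratch directory.
cd / && rm -rf "$demo"
```

Here only the first cd is expected to succeed; whether to use "unset CDPATH" or the ./ prefix is a matter of taste.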
will only work in locales where the decimal_point is ".".
So, you have to fix LC_NUMERIC/LC_ALL first in your script. In other languages, you have to say when you want to use localisation; in shells, it's the opposite. It only makes sense to use localisation in the shell for user interaction, so it's up to the programmer to decide when to use it.
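One possible reading of "fix LC_ALL first" is sketched below, assuming a POSIX shell. All the names (user_lc_all, user_lc_all_set, with_user_locale) are hypothetical and this is not necessarily how the original script did it: the caller's LC_ALL is remembered, the script then runs in the C locale, and the caller's setting is put back only around user-facing commands.

```shell
# Remember the caller's LC_ALL (if any) before overriding it.
if [ "${LC_ALL+set}" = set ]; then
    user_lc_all=$LC_ALL
    user_lc_all_set=yes
else
    user_lc_all_set=no
fi

# Everything the script computes from here on runs in the C locale.
LC_ALL=C
export LC_ALL

# Hypothetical helper: run one user-facing command under the caller's
# original locale settings.
with_user_locale() {
    if [ "$user_lc_all_set" = yes ]; then
        LC_ALL=$user_lc_all "$@"
    else
        ( unset LC_ALL; "$@" )
    fi
}

# Formatted with "." as the decimal point, whatever the caller's locale.
pi=$(printf '%.2f' 3.14159)
```

A call such as "with_user_locale date" would then show the date in the user's language while the rest of the script keeps predictable parsing.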
Stephane CHAZELAS wrote:
> So, you have to fix LC_NUMERIC/LC_ALL first in your script.
I differ a bit...
> In other languages, you have to tell it when you want to use
> localisation, in shells, that's the contrary. It only makes sense
> to use localization in the shell for user interaction, so, that's
> up to the programmer to decide when to use it.
LANG is the problem (not only) here. In many cases, LC_CTYPE would be completely sufficient. LC_MESSAGES might serve almost all of the remaining people. LANG in turn includes LC_COLLATE and LC_NUMERIC (which is problematic).
> if [[ ${LC_ALL+set} ]]; then
> set_user_locale="LC_ALL='$LC_ALL'"
And you _never_ should have LC_ALL being set...
I guess a script (usually) really shouldn't try to fix wrong locale settings...
> I guess a script (usually) really shouldn't try to fix
> wrong locale settings...
When I have LANG, LC_NUMERIC set to fr_FR@euro, I tell applications that, from my point of view as a user, decimal numbers have to be displayed to me, or read from me, as 3,14. That's not an incorrect setting. What is incorrect is to use the value of LC_NUMERIC in a program for anything other than user interaction (AFAIUI).
Stephane CHAZELAS wrote:
> When I have LANG, LC_NUMERIC set to fr_FR@euro, I tell the
> applications, that, from my user point of view, decimal number
> have to be displayed or read from me as 3,14.
I thought about actively setting LC_NUMERIC only for the applications where it's actually needed. But no, that doesn't make sense at all.