Janis Papanagnou <janis_pa...@hotmail.com>:
>>> Shell is a high level language abstraction layer from the Unix system
>>> calls. It opens a lot of pitfalls by design and by herself that do
>>> not stem from the Unix system interface.
>>
>> Yes, that's absolutely correct. I didn't mean to bash Unix.
>
> (I didn't mean that you meant to bash Unix. - Intended pun, BTW?)
Yes, but an innocent one, neither Bash nor me bashing Unix.
> Some shell books also describe the basic Unix concepts to a degree
> necessary to understand the related shell concepts.
If there are such books available, then why aren't they read and
understood (see below)?
[…]
Shell script programmers often need to compute a command line and
save it in a shell variable for later (re-)use. But that's not
easy. One can't tell the shell to store a command line in a shell
variable rather than parse it: when a shell reads a command line,
it parses and executes it.
Of course one can store an arbitrary character string into a shell
variable, for example the variable “commandline”. Then, if one
wants the shell to execute that string as a command line, one simply
writes the command
“eval "$commandline"”.
For example, the simple command
(0)
“printf '%s\n' \
'Saving the shell'\''s positional parameters in a' \
'variable is the big problem; expanding them is easy.'
”,
when invoked, would give the following output:
“Saving the shell's positional parameters in a
variable is the big problem; expanding them is easy.
”
Now, what would one have to write into a command line, that would
store this command (0) into the shell variable “commandline” in
order to execute it by
“eval "$commandline"”?
Simply copying and pasting
“commandline=printf '%s\n' \
'Saving the shell'\''s positional parameters in a' \
'variable is the big problem; expanding them is easy.'
”
won't work: The shell's parser sees “commandline=printf”, which
is a variable assignment that applies only to the environment of
the simple command following it, “'%s\n' 'Saving the shell'\''s
positional ...”. That command is not the intended “printf”
command; the shell looks for a utility named “%s\n” instead.
To get the command line stored into the variable, one would have
to write
“commandline='printf '\''%s\n'\'' \
'\''Saving the shell'\''\'\'''\''s positional parameters in a'\'' \
'\''variable is the big problem; expanding them is easy.'\'''
”.
This is painful to write, to say the least.
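One way to sidestep the extra level of quoting, offered here only as a sketch and not taken from the discussion above, is to read the command line verbatim from a here-document with a quoted delimiter. It assumes a POSIX shell and a stored text without significant trailing newlines, since command substitution strips them.

```sh
# Sketch: the quoted delimiter 'EOF' keeps the shell from expanding or
# unquoting anything inside the here-document.
commandline=$(cat <<'EOF'
printf '%s\n' \
'Saving the shell'\''s positional parameters in a' \
'variable is the big problem; expanding them is easy.'
EOF
)

# The variable now holds command line (0) character for character,
# so eval parses it exactly as if it had been typed.
eval "$commandline"
```

The variable still holds a command line, not an invocation arguments list, so it remains something only “eval” can consume.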
Note: There are many different command lines that will execute
exactly the same “printf” command when given to “eval”. What they
have in common, however, is the invocation arguments list,
which is computed by the shell after parsing the command line and
consists of the following elements: “printf”, “%s\n”,
“Saving the shell's positional parameters in a”, and
“variable is the big problem; expanding them is easy.”.
And there is good news: This argument list can be stored
in some variables and reused for later execution without the need
of an extra level of quoting, for example:
Storing an argument list into the variables “arg0”, “arg1”,
... “arg<n-1>”, unsetting the variable “arg<n>” and storing the
number of variables (i.e.: <n>) into the variable “argc”:
“
argc=0 &&
for value in printf '%s\n' \
'Saving the shell'\''s positional parameters in a' \
'variable is the big problem; expanding them is easy.'
do
eval 'arg'"${argc}"'="$value"'
argc=$((argc+1))
done
eval 'unset arg'"${argc}"
”
Retrieving:
“
set ''; shift; n=0
while eval 'test -n "${arg'"$n"'+defined}"'
do
eval 'arg="$arg'"$n"'"'
set '' "$@" "$arg"; shift
n=$((n+1))
done
”
Executing:
“"$@"”
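The storing and retrieving steps above can be sketched as one small script. The function name “save_args” and the sample arguments are illustrative assumptions; the sketch uses “set --” instead of “set ''; shift” and loops on “argc” instead of testing for an unset variable, which assumes a reasonably modern POSIX shell.

```sh
#!/bin/sh
# save_args (hypothetical name): store each argument in arg0, arg1, ...
# and record how many there are in argc.
save_args() {
    argc=0
    for value do
        eval "arg${argc}=\$value"      # e.g. arg0=$value (no word splitting)
        argc=$((argc + 1))
    done
}

save_args printf '%s\n' 'first  line, two blanks' "it's; still one argument"

# The positional parameters are now free for other uses.
set --

# Later: rebuild the positional parameters from arg0 .. arg<argc-1>.
n=0
while [ "$n" -lt "$argc" ]; do
    eval "set -- \"\$@\" \"\$arg${n}\""   # e.g. set -- "$@" "$arg0"
    n=$((n + 1))
done

# Execute the restored invocation arguments list.
"$@"
```

Each value is only ever expanded inside double quotes, so blanks, apostrophes and semicolons inside an argument never reach the parser.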
> Gathering arguments is not the big problem; usually expanding
> them is.
Yes, indeed: Gathering arguments into the shell's positional
parameters (“"$@"”) is not a big problem.
But because it's tedious, many shell script programmers prefer
simpler but error-prone techniques, and therefore I don't
agree completely.
Problems arise if one wants to store the positional parameters in
a single shell variable, rather than in one variable per
parameter, in order to free the positional parameters for another
purpose and to invoke the stored values later.
(1)
If one wants to save them into the variable “parameters”, one
could try the variable assignment
“parameters="$*"”.
Then the variable expansion “"$parameters"” will have the
following value (all in one line with single blanks in between):
“printf %s\n Saving the shell's positional parameters in a variable is the big problem; expanding them is easy.”
In this variable value, the shell cannot tell the difference
between, for example, the space character before the word
“Saving” and the one after it, because the information about
which white space characters separate parameters and which of
them belong to a parameter is already lost. To get it right, the
former should separate the third from the second invocation
argument, while the latter should be retained as part of the
third invocation argument. So none of the following expansion
variants will work:
(1.1)
The command line
“${parameters}”
will invoke a command equivalent to the command line
“printf %s\\n Saving the shell\'s positional parameters in a \
variable is the big problem\; expanding them is easy.”:
the shell splits the variable value at the “"$IFS"” characters,
so “printf” outputs each of the words on a line of its own:
“Saving
the
shell's
positional
parameters
in
a
variable
is
the
big
problem;
expanding
them
is
easy.
”
(1.2)
The command line
“"${parameters}"”
will invoke a command equivalent to the following command line
(all in one line with single blanks in between):
“'printf %s\n Saving the shell'\''s positional parameters in a variable is the big problem; expanding them is easy.'”,
which will fail, because there is no utility named
“printf %s\n Saving...”.
(1.3)
With the command line
“eval "${parameters}"”,
the shell will try to parse the variable value as the following
command line (all in one line with single blanks in between):
“printf %s\n Saving the shell's positional parameters in a variable is the big problem; expanding them is easy.”
and will fail at the apostrophe in “shell's”, which opens a
quoted string that is never closed.
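The three failures (1.1) to (1.3) can be made visible by counting the arguments that actually arrive. The helper “count_args” is a hypothetical name introduced only for this sketch, and the default “IFS” is assumed.

```sh
#!/bin/sh
# count_args (hypothetical helper): print how many arguments were received.
count_args() { echo "$#"; }

set -- printf '%s\n' \
    'Saving the shell'\''s positional parameters in a' \
    'variable is the big problem; expanding them is easy.'
count_args "$@"             # the intended list: 4 arguments

parameters="$*"             # (1): glue everything together with blanks
count_args $parameters      # (1.1): split at every blank, 18 arguments
count_args "$parameters"    # (1.2): one single argument

# (1.3): run in a subshell, because a syntax error inside eval may
# terminate a non-interactive shell.
( eval "$parameters" ) 2>/dev/null || echo 'eval failed'
```

Only the first call sees the argument boundaries the author of the command intended; after the assignment in (1) they cannot be recovered.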
> In any way it's a shell [programming] issue (no Unix issue).
Yes, it's all about translating (or behaving as if translating)
the shell command line into an (“execve()”-like) invocation
arguments list, which makes parsing the variable's value
necessary, see
<http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap01.html#tag_17_06>:
However, all of the standard utilities, including the regular
built-ins […], but not the special built-ins described
in Special Built-In Utilities, shall be implemented in a manner
so that they can be accessed via the exec family of functions as
defined in the System Interfaces volume of POSIX.1-2008 and can
be invoked directly by those standard utilities that require it
(env, find, nice, nohup, time, xargs).
And, indeed, for many standard utilities, the only way for
them to be accessed is via the exec family of functions.
That is, the shell must construct an invocation arguments list
from the command line.
And of course, shell scripts that gather a command line into one
variable by themselves must do it in a way that allows the shell
to construct an invocation arguments list out of the variable
value afterwards.
Apparently most shell script programmers never learn this, and I
guess that is because they don't even know about the invocation
arguments list, let alone how it differs from the command line.
(That was the problem with Debian's “su” about a decade ago: It
merely glued the given arguments together with spaces in between,
as in (1), to form the command line, which was then parsed by the
invoked shell, as in example (1.3) above.)
[Array variables in modern shells]
> Arrays help to keep groups of arguments semantically together,
> but they do not solve the expansion problem,
In the following example, the expansion problem is solved using an
array with “bash”:
(2)
The invocation arguments list could be gathered using an array by
doing
“declare -a parameters
parameters=(printf '%s\n' \
'Saving the shell'\''s positional parameters in a' \
'variable is the big problem; expanding them is easy.')
”.
The array variable contains an invocation arguments list rather
than a command line. It can be invoked by the command line
“"${parameters[@]}"”,
which will neither break the parameters at internal white space
nor confuse the shell's parser at the apostrophe or the semicolon,
because the parser won't look at the contents of the array
elements. The shell will simply pass them unmodified as the
invocation arguments list to the “execve()” system call (if it's
an external utility, or behave, as if it did) or process it by
itself (if it's a built-in).
(3)
With “bash”, there is another way to get it right without using
arrays: The elements of the invocation arguments list could be
glued together with spaces in between, but properly quoted by
means of the formatting directive “%q” of the “bash”-built-in
“printf” and assigned to the shell variable “commandline”:
“commandline="$(printf '%q ' printf '%s\n' \
'Saving the shell'\''s positional parameters in a' \
'variable is the big problem; expanding them is easy.')"
”
The variable “commandline” will then have the following contents
(all in one line with single spaces in between):
“printf %s\\n Saving\ the\ shell\'s\ positional\ parameters\ in\ a variable\ is\ the\ big\ problem\;\ expanding\ them\ is\ easy. ”
It could be given to the shell's “eval” command:
“eval "''${commandline}"”.
The “eval” command will see the following command line (all in one line with
single spaces in between):
“''printf %s\\n Saving\ the\ shell\'s\ positional\ parameters\ in\ a variable\ is\ the\ big\ problem\;\ expanding\ them\ is\ easy. ”,
which is equivalent (that is, it produces the same invocation
arguments list, when parsed and evaluated) though not equal to the
original command line (0).
(4)
And finally, there is a way (though not a built-in one), to get it
right with a POSIX system, as well:
As the POSIX “printf” doesn't have the “%q” formatting directive,
a function, say, “quote_words”, has to be written, which takes each
of its positional parameters, replaces each apostrophe in it by
the sequence “'\''”, then encloses the positional parameter in
apostrophes, glues all these translated positional parameters into
one variable with spaces in between, and finally writes that
variable to standard output.
It can then be used as a replacement for “printf '%q '” in (3)
above.
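The description of “quote_words” could be realised roughly as follows. This is a sketch under assumptions: it uses “sed” for the apostrophe replacement, and the variable names “result” and “escaped” are global, because POSIX sh has no local variables.

```sh
#!/bin/sh
# quote_words: print all arguments as one space-separated string that
# "eval" will parse back into exactly the same arguments.
quote_words() {
    result=
    for word do
        # Replace each ' by '\''. The trailing "x" protects trailing
        # newlines in $word from being stripped by $(...).
        escaped=$(printf '%sx\n' "$word" | sed "s/'/'\\\\''/g")
        escaped=${escaped%x}
        # Enclose in apostrophes; separate from earlier words by a blank.
        result="${result}${result:+ }'${escaped}'"
    done
    printf '%s\n' "$result"
}

commandline=$(quote_words printf '%s\n' \
    'Saving the shell'\''s positional parameters in a' \
    'variable is the big problem; expanding them is easy.')

eval "$commandline"
```

Unlike “%q”, this quotes every word, including harmless ones such as “printf”, so the result is longer but needs no shell-specific features.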
> specifically not the buffer limit that you may reach with long
> argument lists if you use an external command.
This is another problem: the limited length of the invocation
arguments list (see below).
> IFS and quoting is basic, crucial shell knowledge. You learn
> all that from learning the shell
As there are many shell scripts and applications using the shell
in the wild that do it wrong, like solutions (1.1) or (1.3)
above, are you going to say that their developers lacked
basic, crucial shell knowledge? You may be right.
I can think of
* the options loop in “/usr/bin/ps2pdfwr” (like 1.1):
OPTIONS="-P- -dSAFER"
while true
do
case "$1" in
-?*) OPTIONS="$OPTIONS $1" ;;
*) break ;;
esac
shift
done
[…]
exec "$GS_EXECUTABLE" $OPTIONS ...
The arguments are glued together in a partial command line
without proper quoting and later split using “"$IFS"”.
* the “/etc/init.d/dovecot” shell script (like 1.1):
DAEMON_ARGS=""
[…]
PIDBASE=${PIDBASE:-`
sed -r "s/^[ \t]*base_dir[ \t]*=[ \t]*([^ \t]*)/\1/;t;d" \
${CONF}`}
PIDFILE=${PIDBASE:-/var/run/dovecot}/master.pid
[…]
start-stop-daemon --nicelevel 19 --start --quiet \
--pidfile $PIDFILE --exec $DAEMON -- \
-c ${CONF} $DAEMON_ARGS \
|| return 2
* the corrupted Debian “su” about a decade ago (like 1.3).
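For the options loop in the first example, the difference between gluing options into a string and keeping them as a list can be sketched like this. “show_args” is a hypothetical stand-in for the real program, and the sketch assumes the wrapper only needs to put its fixed options in front of the caller's arguments.

```sh
#!/bin/sh
# show_args (hypothetical stand-in): print one line per argument received.
show_args() { printf '<%s>\n' "$@"; }

# Pretend the script was called with an option that contains a blank.
set -- '-sOutputFile=my file.pdf' input.ps

# Glued, as in the excerpt: the blank inside the option becomes a
# separator, so <-sOutputFile=my> and <file.pdf> arrive separately.
OPTIONS="-P- -dSAFER"
OPTIONS="$OPTIONS $1"
show_args $OPTIONS

# Kept as a list: prepend the fixed options to the positional
# parameters; the option with the blank stays in one piece.
set -- -P- -dSAFER "$@"
show_args "$@"
```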
> Again, a shell issue. - Though an advanced issue; I hope a beginner
> would not root-su (probably even to other machines) without knowing
> a lot of Unix.
The cause of the error was a failure to understand the difference
between the invocation arguments (here: of the shell), as defined
in the System Interfaces volume of POSIX.1-2008, and the shell's
command line. That is basic and crucial knowledge.
The problem with “su” was not on the “su” user's side. It was in
the “su” program itself: Neither a shell beginner nor a shell
expert could use it reliably.
My impression of the shell knowledge of many shell script
programmers is “Invocation arguments? Never heard of them.
Constructing a command line? Sorry, I neither know how to do
this, nor do I understand the explanation in my shell book.”
Are there books on shell programming that address the solutions
(0) and (4), and, with “bash” and similar modern shells, (2) and
(3)?
[The size limit of the invocation arguments list]
> (not from the unrelated system calls, also not from the exec()
> family, since only the shell can tell you that it has a limit,
> otherwise you could only guess).
I guess the shell's limit (as long as there is virtual memory)
will be at least as large as the “execve()” system call's limit.
Therefore, I think, the shell won't help with this problem. If
large invocation argument lists are a problem, there is GNU xargs,
which, by means of the option “--null”, is capable of processing
arbitrary invocation arguments it reads from standard input (for
example, arguments containing white space, backslashes,
apostrophes or quotation marks).
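A small sketch of the “xargs” approach, using the short form “-0” of “--null”. The sample arguments are made up for illustration; each one is terminated by a NUL byte, so nothing inside it can be mistaken for a separator.

```sh
#!/bin/sh
# Write four awkward arguments, each followed by a NUL byte, and let
# xargs turn them into the invocation arguments list of printf.
printf '%s\0' 'two words' "it's" 'back\slash' '"quoted"' |
    xargs -0 printf '<%s>\n'
```

Each argument comes out on a line of its own, enclosed in angle brackets, with its blanks, apostrophe, backslash and quotation marks intact. If the list is too long for a single invocation, “xargs” runs the utility several times.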
> (Note that the shell has also own abstractions of concepts that you
> find on OS-level, and you'd get certainly confused if you read the
> corresponding system call and system data definitions instead of
> the shell description.
… not instead of the shell description, but as well as the shell
description. As the quotation above from the POSIX standard
shows, understanding invocation arguments lists is crucial, when
programming shell scripts.
The “su” developers wouldn't have done it wrong, had they read the
system call and system data definitions as well as the shell's
manual page or
<http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html>,
because “su” uses the “exec” family of functions to invoke the
shell.