Finding the last arg

CSSE LAN Test Account

Dec 26, 1990, 11:19:03 AM

Well, it should have been easy, but everything I've tried has failed, so
I'll go to the experts... (;-)

What I've been trying to do is write a script whose semantics are sort of
like the cp/mv/ln commands, in that the last argument is special. If it
is a directory, I want to do something different that amounts to appending
character strings to its name to give file name[s] within the directory;
if it is an ordinary file, I just want to use its name.

The problem is that I can't figure out any Bourne-shell expression that
gives the last argument. In C-shell, it's easy (argv[$#]). But I need
to write Bourne shell scripts. In fact, they need to be BSD Bourne shell
scripts rather than AT&T Bourne shell scripts. The difference is probably
significant here, because of the differences in the way these two Bourne
shells handle variables. Actually, it'd be really nice to find a solution
that is portable, but that may be just a dream.

The obvious thing to try is some sort of expression combining $# with ${},
e.g. ${$#}. This gets a syntax error. The simpler $$# is valid, but it
just gives the process id followed by a '#', not even close. I've also
tried assigning $# to a variable and using that, but all I've managed to
get is the number of the last arg, not its value.

At the moment I've kludged the job by writing a program lastfld.c that just
prints argv[argc-1]. But it seems incredibly stupid to have to pass all
the args to a subprocess which throws away all but the last one. I mean,
the shell knows which arg is which; why go to the expense of starting up
a subprocess to learn something that the parent already knows?

The pseudo-code I'd like to use is:
    if [ -d last-command-line-arg ]
    then
        ...
    fi
Any ideas?

Rouben Rostamian

Dec 26, 1990, 1:50:19 PM

In article <18...@shlump.nac.dec.com> lan_...@netrix.nac.dec.com (CSSE LAN Test Account) writes:
>What I've been trying to do is write a script whose semantics are sort of
>like the cp/mv/ln commands, in that the last argument is special. If it
>is a directory, I want to do something different that amounts to appending
>character strings to its name to give file name[s] within the directory;
>if it is an ordinary file, I just want to use its name.
>
>The problem is that I can't figure out any Bourne-shell expression that
>gives the last argument. In C-shell, it's easy (argv[$#]). But I need
>to write Bourne shell scripts.

Here's what you need:
#!/bin/sh
eval lastarg=\${$#}
...etc...
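
A minimal sketch of the suggestion above, assuming a shell that accepts braces around multi-digit positional parameters (POSIX-style sh, ksh, bash); later replies in this thread report that older Bourne shells stop at nine. The function name last_arg_eval is hypothetical, and shell functions themselves postdate the V7 shell.

```shell
#!/bin/sh
# Hypothetical helper: print the last of its arguments.
last_arg_eval() {
    [ $# -gt 0 ] || return 1
    # After the first round of expansion, eval sees e.g.:  lastarg=${12}
    eval "lastarg=\${$#}"
    printf '%s\n' "$lastarg"
}

last_arg_eval a b c d e f g h i j k "l m"    # prints: l m
```

Because the value is assigned rather than echoed inside the eval, embedded white space in the last argument survives.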

--
Rouben Rostamian Telephone: (301) 455-2458
Department of Mathematics and Statistics e-mail:
University of Maryland Baltimore County bitnet: rostamian@umbc
Baltimore, MD 21228, U.S.A. internet: rost...@umbc3.umbc.edu

Bob McGowen x4312 dept208

Dec 26, 1990, 5:44:52 PM

In article <18...@shlump.nac.dec.com> lan_...@netrix.nac.dec.com (CSSE LAN Test Account) writes:

>Well, it should have been easy, but everything I've tried has failed, so
>I'll go to the experts... (;-)
>
>What I've been trying to do is write a script whose semantics are sort of
>like the cp/mv/ln commands, in that the last argument is special. If it
>is a directory, I want to do something different that amounts to appending
>character strings to its name to give file name[s] within the directory;
>if it is an ordinary file, I just want to use its name.
>
>The problem is that I can't figure out any Bourne-shell expression that
>gives the last argument. In C-shell, it's easy (argv[$#]). But I need
>to write Bourne shell scripts. In fact, they need to be BSD Bourne shell
>scripts rather than AT&T Bourne shell scripts. The difference is probably
>significant here, because of the differences in the way these two Bourne

As an aside, could you explain this comment? I have had minimal contact
with BSD, but my experience does not seem to support this statement.

>shells handle variables. Actually, it'd be really nice to find a solution
>that is portable, but that may be just a dream.

---deleted discussion---
>Any ideas?

Three solutions came to mind: 1) using eval; 2) using shift; 3) using a
combined while loop/if statement with the shift.

1) eval echo this is the last argument \$$#

This works because the shell first expands the right-hand $ together with
the # to give the number of the last argument (not the argument itself).
eval then re-reads the line with that number following the first $, which
it now sees as an unprotected character, and substitutes the matching
arg. The catch: this only works for a list of args that is not bigger
than nine. The tenth arg would result in output like:

this is the last argument a0

where the a is arg 1 ($10 is the concatenation of $1 and 0).
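
The ten-argument pitfall can be reproduced in a few lines. This is only a sketch: it assumes a shell with `$( )` command substitution and braced multi-digit parameters, neither of which the older Bourne shells discussed here provide, and the variable names short and braced are made up for illustration.

```shell
#!/bin/sh
# Ten arguments, so $# expands to 10.
set -- a b c d e f g h i j

# eval re-reads "echo $10", i.e. $1 followed by a literal 0.
short=$(eval "echo \$$#")

# With braces eval re-reads "echo ${10}", the real tenth argument.
braced=$(eval "echo \${$#}")

echo "unbraced: $short"    # prints: unbraced: a0
echo "braced:   $braced"   # prints: braced:   j
```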

2) shift `expr $# - 1`
   echo the last arg is $1

The Bourne shells on AT&T's SysV/386 and on SunOS support a numeric
argument to shift; that many args are shifted out of the list. So if
there are ten args, expr gives nine for the shift and the last arg
becomes $1. The problem here is that all the preceding args are thrown
away.
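
One way around losing the earlier args, sketched here under the assumption of a shell with functions and a numeric shift: do the shift on a copy of the argument list. The helper name last_by_shift is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: the shift happens on the function's own copy of
# the positional parameters (and, below, inside a command substitution),
# so the caller's list is left alone.
last_by_shift() {
    [ $# -gt 0 ] || return 1
    shift `expr $# - 1`
    printf '%s\n' "$1"
}

set -- one two three four five six seven eight nine ten eleven
last=`last_by_shift "$@"`
count=$#
echo "last=$last, still $count arguments"
# prints: last=eleven, still 11 arguments
```

The price is the expr subprocess plus the subshell for the backquotes, which is what several later replies try to avoid.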

3) while [ ! -z "$1" ]
   do
       if [ $# -eq 1 ]
       then
           last=$1
           shift
       else
           rest="$rest $1"
           shift
       fi
   done

Each argument is collected as encountered on the command line
(except for the case where it contains white space -- "a b";
this is a little hard to handle with this type of structure
because the "arguments" are parsed three times: once as $1,
again in the if to construct the $rest and later whenever $rest
is used). All except the last are collected into the variable
rest, the last is in the variable last. You can now manipulate
the parts as needed. For instance, to create a path name
with the last arg as the first component:

    for item in $rest
    do
        echo $last/$item
    done

I hope this slightly lengthy discussion helps.

Bob McGowan (standard disclaimer, these are my own ...)
Product Support, Wyse Technology, San Jose, CA
..!uunet!wyse!bob
b...@wyse.com

Jeff Beadles

Dec 27, 1990, 1:09:03 AM

In <18...@shlump.nac.dec.com> lan_...@netrix.nac.dec.com writes:

...

>The problem is that I can't figure out any Bourne-shell expression that
>gives the last argument.

...
>Any ideas?

Well, there's nothing that I am aware of in /bin/sh that will ALWAYS allow
this to work. The problem is that if you have more than 9 arguments, you
have to shift them off to get to what you want. Here's one try that might
be of use:

#!/bin/sh

if [ $# -gt 0 -a $# -lt 10 ] ; then
    LAST=`eval echo \\$$#`      ## 1-9 == Great!
elif [ $# -le 0 ] ; then
    LAST="NO_ARGS"              ## 0 == Oops! None!
else
    LAST="TOO_MANY_ARGS"        ## >9 == Oops! Too many!
fi

echo "LAST= >$LAST<"
exit 0

However, this is not all that swift. "foo a b c d e f g h i j k" won't work
with this.

The only other hope is to save off the arguments and loop until the end. This
has the possibility of screwing up any special characters and/or white space
though, and is generally not-so-hot.

-Jeff
--
Jeff Beadles je...@onion.pdx.com

Conor P. Cahill

Dec 27, 1990, 10:49:17 AM

In article <18...@shlump.nac.dec.com> lan_...@netrix.nac.dec.com (CSSE LAN Test Account) writes:

>The obvious thing to try is some sort of expression combining $# with ${},
>e.g. ${$#}. This gets a syntax error. The simpler $$# is valid, but it

Try:
eval \$$#

--
Conor P. Cahill (703)430-9247 Virtual Technologies, Inc.,
uunet!virtech!cpcahil 46030 Manekin Plaza, Suite 160
Sterling, VA 22170

Maarten Litmaath

Dec 27, 1990, 5:15:52 PM

# How to get the last argument of a shell script reliably.
# If you are sure the number of arguments is in [1-9], you can use this:
#
# eval last=\$$#
#
# The general (backquote) solution works for ANY number of arguments.
# In POSIX-compatible shells this works for any $#:
#
# eval last=\${$#}

set a b c d e f g h i j k l m -n # For example.

case $# in
0)
    # So the user tried to be funny; let's give him a little surprise.
    exec kill -SYS $$
    ;;
[1-9])
    eval last=\$$#
    ;;
*)
    last=`
        n=$#
        set '' ${1+"$@"}
        shift $n
        echo -n "$1"
    `
esac

echo "last=|$last|"
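
The dummy-argument trick in the backquoted branch can be packaged on its own. A sketch, assuming a shell with functions; the name last_by_dummy is hypothetical, and printf stands in for `echo -n` because the two common echo implementations disagree about `-n`.

```shell
#!/bin/sh
# Hypothetical helper built on the dummy-argument trick.
last_by_dummy() {
    n=$#
    [ "$n" -gt 0 ] || return 1
    # Prepend one placeholder: the list now has n+1 entries, so
    # shifting by n leaves the original last argument in $1.
    set '' ${1+"$@"}
    shift $n
    printf '%s\n' "$1"
}

last_by_dummy a b c d e f g h i j k l m -n    # prints: -n
```

No expr is needed, since the placeholder makes the shift count equal to the original `$#`.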
--
_ d _\ _\ 2
ih -- Y = (c A . p + m c B) Y
dt 0

Maarten Litmaath

Dec 28, 1990, 7:51:26 PM

In article <30...@wyse.wyse.com>,
b...@wyse.wyse.com (Bob McGowen x4312 dept208) writes:
)...
) 3) while [ ! -z "$1" ]
) do
) if [ $# -eq 1 ]
) then
) last=$1
) shift
) else
) rest="$rest $1"
) shift
) fi
) done

How about the following alternative? It works for `funny' arguments too.

--------------------cut here--------------------
#!/bin/sh
# find the last argument AND reset all arguments but the one found,
# using only BUILTIN constructs

case $# in
0)
    echo 'No arguments specified.' >&2
    exit 1
esac

argv=
last=

while :
do
    case $# in
    1)
        last=$1
        break
    esac
    eval i$#=\$1
    argv="$argv \"\$i$#\""
    shift
done

eval set x $argv
shift

echo "last=$last"
echo 'new arguments:'
for i
do
    echo " $i"
done
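
Why the script stores quoted references instead of the values themselves can be seen by running both bookkeeping styles side by side. This is a sketch with made-up sample arguments; the counter variable names are hypothetical.

```shell
#!/bin/sh
set -- "a  b" '$HOME' "the last"

rest=   # naive: the values are pasted into one string
argv=   # quoted references ("$i3" "$i2") to per-argument variables
while [ $# -gt 1 ]
do
    rest="$rest $1"
    eval i$#=\$1
    argv="$argv \"\$i$#\""
    shift
done
last=$1

# The pasted string is split again on white space: three words.
set x $rest
shift
naive_count=$#

# eval expands each reference inside double quotes: two words.
eval set x $argv
shift
kept_count=$#
kept_first=$1

echo "naive: $naive_count words, references: $kept_count words"
# prints: naive: 3 words, references: 2 words
```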
--
n...@berlin.mt.cs.cmu.edu: "I heard that an Awk was spotted in Sherwood forest."
r...@frog.UUCP (Rick Kelly): "It was seen running in the background."

Wm E. Davidsen Jr

Dec 30, 1990, 11:16:07 PM

In article <1990Dec27.1...@virtech.uucp> cpc...@virtech.UUCP (Conor P. Cahill) writes:
| In article <18...@shlump.nac.dec.com> lan_...@netrix.nac.dec.com (CSSE LAN Test Account) writes:
| >The obvious thing to try is some sort of expression combining $# with ${},
| >e.g. ${$#}. This gets a syntax error. The simpler $$# is valid, but it
|
| Try:
| eval \$$#

With sh there is no valid way to do this, it breaks when $# > 9. With
ksh the following is true:
    eval echo $# \$$#       # uses only one digit of $#
    eval echo $# \${$#}     # works for all values

Hope that clarifies it. I tried several shells, all flavors of sh seem
to stop at $#>9 and need evaluation via shift.
--
bill davidsen - davi...@sixhub.uucp (uunet!crdgw1!sixhub!davidsen)
sysop *IX BBS and Public Access UNIX
moderator of comp.binaries.ibm.pc and 80386 mailing list
"Stupidity, like virtue, is its own reward" -me

Lawrence V. Cipriani

Jan 1, 1991, 10:57:14 AM

In article <18...@shlump.nac.dec.com>, lan_...@netrix.nac.dec.com (CSSE LAN Test Account) writes:

> The pseudo-code I'd like to use is:
> if [ -d last-command-line-arg ]
> then
> ...
> fi

If you don't need to have the last command line argument in a variable you
can do this:

if eval [ -d "\${${#}}" ]
then
...
fi
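
A variant of the same test that also copes with white space in the last argument, sketched under the assumption of a shell with functions and braced multi-digit parameters. The helper name last_is_dir is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the last argument names a directory.
last_is_dir() {
    [ $# -gt 0 ] || return 1
    # eval sees e.g.:  [ -d "${3}" ]
    # The inner quotes survive until eval time, so the value is not split.
    eval "[ -d \"\${$#}\" ]"
}

if last_is_dir file1 file2 /
then echo "last argument is a directory"
else echo "last argument is not a directory"
fi
```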
--
Larry Cipriani, att!cbvox!lvc or l...@cbvox.att.com
"I just love the smell of gunpowder!" - Bugs Bunny

Dan Mercer

Jan 1, 1991, 6:42:51 PM

In article <1990Dec27.1...@virtech.uucp> cpc...@virtech.UUCP (Conor P. Cahill) writes:

Obviously this fails for $# > 9. So you need some form of shift,
preferably forking and execing as few processes as possible (nice to
avoid expr). You'd also like to preserve the current args. So try
this:

    args="$@"               # get arglist
    argno=$#                # and number of current args
    export args argno
    last=`set -- spacer $args   # set as args - add spacer to eliminate expr
          shift $argno          # shift over
          echo $1`

This works if args contains no args with embedded whitespace - if that
is a possibility, then expr must be used.

    args="$@"
    export args argno
    last=`set -- spacer $args
          shift \`expr $# - 1\`
          echo $1`
    echo $last

But this also fails if last arg may contain whitespace. Oh well!

--
Dan Mercer
NCR Network Products Division - Network Integration Services
Reply-To: mer...@npdiss1.StPaul.NCR.COM (Dan Mercer)
"MAN - the only one word oxymoron in the English Language"

Martin Weitzel

Jan 2, 1991, 5:37:29 AM

In article <1990Dec27....@onion.pdx.com> je...@onion.pdx.com (Jeff Beadles) writes:

>In <18...@shlump.nac.dec.com> lan_...@netrix.nac.dec.com writes:
>>The problem is that I can't figure out any Bourne-shell expression that
>>gives the last argument.
>...
>>Any ideas?
>
>Well, there's nothing that I am aware of in /bin/sh that will ALWAYS allow
>this to work. The problem is that if you have more than 9 arguments, you
>have to shift them off to get to what you want.

Not really. I have waited some time, but as Chris Torek seems to be on
vacation I'll bite the bullet this time :-) [%].

What ALWAYS works in the Bourne-Shell is this:

    for last do :; done

Explanation: The for loop loops over all arguments of the current procedure,
if you leave out the `in'-clause. Every argument is put into the variable
you name after `for'. This leaves the last argument in last when the loop is
through.
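
A short sketch of the loop at work on more than nine arguments, two of which contain blanks. The sample arguments and the variable name count are made up for illustration.

```shell
#!/bin/sh
set -- a b c d e f g h i j "k l" "m  n"

# No `in' clause: the loop walks the positional parameters, and the
# body does nothing, so `last' simply ends up holding the final one.
for last do :; done

count=$#
echo "last=|$last| (of $count)"    # prints: last=|m  n| (of 12)
```

The argument list itself is untouched afterwards, and no subprocess is started.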

You can also access the second last (third last, etc) argument if you
extend this trick a little:

    for i
    do
        last3=$last2
        last2=$last
        last=$i
    done

And please DON'T use `for i in $*' instead of leaving the `in'-clause out!
It's not the same, despite what some UNIX books claim!! (For the nit-pickers:
`for i' is the same as `for i in ${1+"$@"}', i.e. you will never run into
problems if there are no arguments or some arguments have embedded blanks.)
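
The difference shows up as soon as one argument contains a blank. A minimal sketch, assuming the default IFS; the counter names n_star and n_plain are made up.

```shell
#!/bin/sh
set -- "one two" three

# $* is split again on white space: three iterations.
n_star=0
for i in $*
do
    n_star=`expr $n_star + 1`
done

# Without `in', each argument is taken as it stands: two iterations.
n_plain=0
for i
do
    n_plain=`expr $n_plain + 1`
done

echo "for i in \$*: $n_star words; for i: $n_plain words"
# prints: for i in $*: 3 words; for i: 2 words
```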

%: Shouldn't this be in the FAQ-list or is it? (I've no recent copy available)
--
Martin Weitzel, email: mar...@mwtech.UUCP, voice: 49-(0)6151-6 56 83

Chet Ramey

Jan 2, 1991, 12:41:57 PM

>>In fact, they need to be BSD Bourne shell
>>scripts rather than AT&T Bourne shell scripts. The difference is probably
>>significant here, because of the differences in the way these two Bourne
>
> As an aside, could you explain this comment? I have had minimal contact
> with BSD, but my experience does not seem to support this statement.

The BSD /bin/sh is the one from v7, with minimal changes for the 4.2 BSD
signal semantics (restarted system calls, etc.) and # as a comment
character. The AT&T /bin/sh changed drastically beginning with System V.2,
and further changes appeared in System V.3.

Here's a short list of what was added to the v7 sh for the System V.2 sh:

shell functions and the `return' builtin
redirection of input/output for builtins (e.g. `read x < /dev/tty')
command hashing, `set -h', and the hash builtin
test/[, echo, pwd, ulimit builtins
exit status of a pipeline was defined to be the exit status of the last command
set -a, set -f
# as the comment character (this was put in at Berkeley and adopted by AT&T)
colon form of parameter substitution to test for nullness of a variable
CDPATH
MAILCHECK, MAILPATH
the use of `!' to negate a [] pattern for filename generation
the `<<-' form of here-document redirection (strip leading tabs)
set --
numeric parameter to the shift builtin
the unset builtin (!)
restricted shell mode

There were a few changes between the System V.2 sh and the V.3 sh, in
addition to bug fixes:

the shell is now 8-bit clean (that is, the shell no longer implements quoting
  by turning on the 8th bit of characters)
function arguments no longer overwrite the global set of positional parameters
the getopts builtin

The BSD /bin/sh, which is still pretty much the V7 /bin/sh, includes
`login' as a builtin and allows `chdir' as a synonym for `cd'. The
System V shell includes `newgrp' as a builtin. As of 4.3 BSD, the BSD sh
accepts # as a comment only when non-interactive.

Chet
--
Chet Ramey ``I die, Horatio''
Network Services Group, Case Western Reserve University
ch...@ins.CWRU.Edu
My opinions are just those, and mine alone.

Bob McGowen x4312 dept208

Jan 2, 1991, 3:12:05 PM

In article <1991Jan2.1...@usenet.ins.cwru.edu> ch...@po.CWRU.Edu writes:
>>>In fact, they need to be BSD Bourne shell
>>>scripts rather than AT&T Bourne shell scripts. The difference is probably
>>>significant here, because of the differences in the way these two Bourne
>>
>> As an aside, could you explain this comment? I have had minimal contact
>> with BSD, but my experience does not seem to support this statement.
>
>The BSD /bin/sh is the one from v7, with minimal changes for the 4.2 BSD
>signal semantics (restarted system calls, etc.) and # as a comment
>character. The AT&T /bin/sh changed drastically beginning with System V.2,
>and further changes appeared in System V.3.
>
>Here's a short list of what was added to the v7 sh for the System V.2 sh:

---description of differences deleted---

>accepts # as a comment only when non-interactive.
>
>Chet
>--

Thanks for the list of differences. It will be useful, I'm sure!-)

However, the original poster's statement was that there were differences
in how the versions of the shell handle "variables". My experience
suggests the handling is the same or very nearly so, since
all of the scripts I have written work with both BSD and AT&T sh's.
So, even though the above list is very useful, the specific point of
possible differences in how the shells handle parameters (command
line arguments and variables) is still open.

Any information on this particular point would be helpful.

Thanks,

Hans Mulder

Jan 3, 1991, 4:47:45 PM

In article <1991Jan2.1...@usenet.ins.cwru.edu> ch...@po.CWRU.Edu writes:

>Here's a short list of what was added to the v7 sh for the System V.2 sh:

[ stuff deleted ]

>exit status of a pipeline was defined to be the exit status of the last command

Nit-pick:
Both the v7 and the SVR2 sh manuals mention that
"the exit status of a pipeline is the exit status of the last command".

The problem is that the word "last" is ambiguous.

The v7 sh takes it to mean "last to terminate", SVR2 interprets it as
"rightmost".

Thus the pipeline "sleep 10 | false" would be considered successful by
the v7 shell (since the sleep terminates last, and successfully), while
the SVR2 sh considers it unsuccessful.
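
A sketch of that pipeline, with the sleep shortened to one second. It assumes a shell that follows the "rightmost" reading, which is what POSIX-style shells do by default; the variable name verdict is made up.

```shell
#!/bin/sh
# The left side terminates last and succeeds; the right side fails at once.
if sleep 1 | false
then verdict=successful
else verdict=unsuccessful
fi

echo "pipeline considered $verdict"
# prints, under the "rightmost" rule: pipeline considered unsuccessful
```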

I think the latter behaviour is more useful. For one thing, it makes
"if foo | grep -s bar" work right 100% of the time.

Happy New Year,

Hans Mulder ha...@cs.kun.nl

Larry Wall

Jan 3, 1991, 6:55:21 PM

In article <25...@wn1.sci.kun.nl> ha...@cs.kun.nl (Hans Mulder) writes:
: I think the latter behaviour is more useful. For one thing, it makes

: "if foo | grep -s bar" work right 100% of the time.

Presuming, of course, that your grep actually returns a reasonable
exit status. Some don't, alas.

And the moral of the story? It's easier to port a shell than a shell script.

Larry Wall
lw...@jpl-devvax.jpl.nasa.gov

Darin McGrew

Jan 3, 1991, 7:36:15 PM

In article <25...@wn1.sci.kun.nl> ha...@cs.kun.nl (Hans Mulder) writes:

>Both the v7 and the SVR2 sh manuals mention that
>"the exit status of a pipeline is the exit status of the last command".
>
>The problem is that the word "last" is ambiguous.
>
>The v7 sh takes it to mean "last to terminate", SVR2 interprets it as
>"rightmost".

I ran into an interesting glitch with an sh from a BSD port
(under SunOS 4.1 the script works as expected). The script

(sleep 5; false)&
if (sleep 10)
then echo true
else echo false
fi

would echo "false" because the exit status from the backgrounded
subshell would clobber the exit status of the other subshell. This
caused all manner of havoc in a script that forked several
background shells, and figuring out what was going wrong was a
lot of fun.
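
For comparison, a shortened sketch of the same experiment on a shell that keeps the two exit statuses apart; the sleeps are cut to one and two seconds, and the variable name result is made up.

```shell
#!/bin/sh
# Background subshell that fails after one second.
(sleep 1; false) &

# Foreground subshell that succeeds after two seconds; only its own
# exit status should decide the branch.
if (sleep 2)
then result=true
else result=false
fi

wait    # reap the background subshell before going on
echo "$result"
```

On a shell with the glitch described above, the else branch would be taken instead.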

Darin McGrew mcg...@Eng.Sun.COM
Affiliation stated for identification purposes only.

Brandon S. Allbery KB8JRR

Jan 4, 1991, 10:05:43 PM

As quoted from <1991Jan2.1...@usenet.ins.cwru.edu> by ch...@odin.INS.CWRU.Edu (Chet Ramey):
+---------------

| Here's a short list of what was added to the v7 sh for the System V.2 sh:
|

| colon form of parameter substitution to test for nullness of a variable

| set --
| numeric parameter to the shift builtin

| restricted shell mode
+---------------

Those four were in the System III /bin/sh.

+---------------

| System V shell includes `newgrp' as a builtin. As of 4.3 BSD, the BSD sh
| accepts # as a comment only when non-interactive.

+---------------

Silliness. I disliked this in csh, I devoutly hope AT&T didn't pick it up for
V.4 /bin/sh. Why should # be a comment only when noninteractive? There are
valid reasons to use comments during interactive sessions --- *especially*
under BSD, which has a working (read: pty-based) "script" command....

++Brandon
--
Me: Brandon S. Allbery VHF/UHF: KB8JRR on 220, 2m, 440
Internet: all...@NCoast.ORG Packet: KB8JRR @ WA8BXN
America OnLine: KB8JRR AMPR: KB8JRR.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery Delphi: ALLBERY

Stuart Poulin

Jan 4, 1991, 8:28:23 PM

Of course if you know the last arg will never have white space and
the command line won't have args like "=", you can use expr:
    Last=`expr "$*" : '.* \(.*\)' \| "$*"`

-or-
Use awk, it can handle a "=" by itself but still no white space.
    Last=`echo "$*" | awk '{ print $NF }'`

-or-
Something with sed or grep.
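
Both one-liners can be tried on a harmless argument list. A sketch, assuming no argument contains white space or looks like an expr operator; the variable names last_expr and last_awk are made up.

```shell
#!/bin/sh
set -- alpha beta gamma

# expr: capture everything after the final space; if the pattern does
# not match (a single argument), fall back to the whole string.
last_expr=`expr "$*" : '.* \(.*\)' \| "$*"`

# awk: the last white-space-separated field of the joined arguments.
last_awk=`echo "$*" | awk '{ print $NF }'`

echo "expr: $last_expr  awk: $last_awk"
# prints: expr: gamma  awk: gamma
```

Both start a subprocess, and both lose track of argument boundaries once the arguments are joined into one string.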

I like the "for last do :; done" method - it's very clever and always works.

Stuart Poulin DNS: stu...@amc.com
Applications Engineer/System Administrator
Applied Microsystems Corporation UUCP: amc-gw!stuart
Redmond, Washington 98073 Dial: 800-ASK-4AMC,206-882-2000

John Chambers

Jan 5, 1991, 9:35:27 AM

> >The problem is that I can't figure out any Bourne-shell expression that
> >gives the last argument. In C-shell, it's easy (argv[$#]). But I need
> >to write Bourne shell scripts. In fact, they need to be BSD Bourne shell
> >scripts rather than AT&T Bourne shell scripts. The difference is probably
> >significant here, because of the differences in the way these two Bourne
>
> As an aside, could you explain this comment? I have had minimal contact
> with BSD, but my experience does not seem to support this statement.

Well, one difference that has bitten me is
${x-foo}
versus:
${x:-foo}

Often only one of these is accepted by any particular /bin/sh, and the
other is a syntax error. Sometimes both are accepted (with different
meanings). BSD /bin/sh usually only understands the first format, and
the script dies if you use the second format.
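
For a shell that accepts both forms, the difference in meaning only shows when the variable is set but empty. A minimal sketch, assuming the colon form and the unset builtin are available (neither is in the older BSD sh); the result variable names are made up.

```shell
#!/bin/sh
unset x
unset_plain=${x-foo}     # x unset: default used
unset_colon=${x:-foo}    # x unset: default used

x=
empty_plain=${x-foo}     # x set but empty: stays empty
empty_colon=${x:-foo}    # x set but empty: default used

echo "unset: [$unset_plain] [$unset_colon]  empty: [$empty_plain] [$empty_colon]"
# prints: unset: [foo] [foo]  empty: [] [foo]
```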

(Isn't it wonderful to have such a nice tool for portable scripts? ;-)
--
Zippy-Says: Imagine ... a world without clothing folds, chiaroscuro, or marital difficulties ...
Home: 1-617-484-6393 Work: 1-508-952-3274
Uucp: ...!{harvard.edu,ima.com,eddie.mit.edu,ora.com}!minya!jc (John Chambers)
Uucp-map: minya adelie(DEAD)

John Chambers

Jan 5, 1991, 10:56:45 AM

> What ALWAYS works in the Bourne-Shell is this:
>
> for last do :; done

Wow! A one-liner that works for more than 9 args! Of course, there's
the question as to whether this loop is actually faster than starting
a subprocess that just does puts(argv[argc-1]), but at least there's
a way to do it that is portable.

That comment isn't worth wasting the bandwidth, of course; my motive
for this followup is a bit of bizarreness that I discovered while
testing this command. The usual format of a for loop is 3 lines:
    for last
    do :
    done
Usually when I want to collapse such vertical code into a horizontal
format, I follow the rule "Replace the newlines with semicolons", and
it works. For instance,
    if [ <test> ]
    then <stuff>
    else <stuff>
    fi
reduces to
    if [ <test> ];then <stuff>;else <stuff>;fi
which I can do in vi via a series of "Jr;" commands. With the above
for-loop, this gives
    for last;do :;done
which doesn't work. The shell gives a syntax error, complaining about
an unexpected ';' in the line. Myself, I found this to be a somewhat
unexpected error message. It appears my simple-minded algorithm for
condensing code doesn't work in this case.

So what's going on here? What the @#$^&#( is the shell's syntax that
makes the semicolon not only unneeded, but illegal in this case?

One of the real hassles I keep finding with /bin/sh (and /bin/csh is
even worse ;-) is that the actual syntax regarding things like white
space, newlines, and semicolons seems to be a secret. It often takes
a lot of experimenting to find a way to get these syntax characters
right. Is there any actual documentation on sh's syntax? Is it truly
as ad-hoc as the above example implies? Is there perhaps some logical
structure underlying it all that would explain why
    for last do :; done
and
    for last
    do :
    done
both work but
    for last;do :;done
doesn't?

Dan_Ja...@att.com

Jan 6, 1991, 12:41:57 PM

>>>>> On 5 Jan 91 14:35:27 GMT, j...@minya.UUCP (John Chambers) said:

John> ${x-foo}
John> versus:
John> ${x:-foo}

John> Often only one of these is accepted by any particular /bin/sh, and the
John> other is a syntax error.

My understanding is that the first should always work, and the second
should work in newer /bin/sh's with this added bonus feature [BSD 4.3
/bin/sh doesn't accept it, like you mentioned].
--
Dan_Ja...@ATT.COM Naperville IL USA +1 708-979-6364

Dan_Ja...@att.com

Jan 6, 1991, 1:06:19 PM

>>>>> On 5 Jan 91 15:56:45 GMT, j...@minya.UUCP (John Chambers) said
among many other things that I was too lazy to go downstairs to get the
book to look up [it's cold down there]:

John> Is there any actual documentation on sh's syntax?

In the 4.3 BSD document set you'll find a syntax description in the sh
articles in [I think] the User's Supplementary Documents volume.

Martin Weitzel

Jan 7, 1991, 3:11:20 PM

In article <4...@minya.UUCP> j...@minya.UUCP (John Chambers) writes:
>> What ALWAYS works in the Bourne-Shell is this:
>>
>> for last do :; done
>
>Wow! A one-liner that works for more than 9 args! Of course, there's
>the question as to whether this loop is actually faster than starting
>a subprocess that just does puts(argv[argc-1]), but at least there's
>a way to do it that is portable.

I have compared the alternatives here on my 386 box and, as you might guess,
the difference in speed depends on the length of the argument list.

For ~25 arguments the for-loop is the fastest; above that, up to ~100
arguments, there's little difference, but the for loop uses more usr-time and
the sub-process more sys-time. There seem to be minor differences depending
on what is called as the sub-process, i.e. a specialized C program (as the
poster suggested) or another shell-script (as Maarten Litmaath posted earlier
in this thread).

For the rather untypical size of 250 arguments there still isn't much
difference, but sometimes the sub-process is faster (the results vary over
some range and I didn't go to the effort of calculating the average).
My general experience with the 386 is that it starts sub-processes really
fast, so I think the for-do method will win even for more than 250
arguments on a lot of systems.

(BTW: I've learned from my experiments that the shell internally limits
the number of arguments that can be passed to a sub-process to 254.
I always thought the only limit was the space supplied by the OS
to pass the stuff to the sub-process, which is typically several KByte
for the *contents* of arguments + environment. I never noticed the limit
on the *number* of arguments before.)

>That comment isn't worth wasting the bandwidth, of course; my motive
>for this followup is a bit of bizarreness that I discovered while
>testing this command. The usual format of a for loop is 3 lines:
> for last
> do :
> done
>Usually when I want to collapse such vertical code into a horizontal
>format, I follow the rule "Replace the newlines with semicolons", and
>it works. For instance,
> if [ <test> ]
> then <stuff>
> else <stuff>
> fi
>reduces to
> if [ <test> ];then <stuff>;else <stuff>;fi
>which I can do in vi via a series of "Jr;" commands. With the above
>for-loop, this gives
> for last;do :;done
>which doesn't work. The shell gives a syntax error, complaining about
>an unexpected ';' in the line. Myself, I found this to be a somewhat
>unexpected error message. It appears my simple-minded algorithm for
>condensing code doesn't work in this case.
>
>So what's going on here? What the @#$^&#( is the shell's syntax that
>makes the semicolon not only unneeded, but illegal in this case?

Funny, I stumbled over the same thing when I "invented" my for-do method
for accessing the last argument some years ago. The explanation is a bit
longer, so all who aren't interested in the details should leave at this
point.

The syntax for the "for" statement is more or less the following (I stick
to the "yacc" style here, but put keywords in single quotes even if
they are longer than one character, which is not allowed with "yacc"):

    for_stmt  : 'for' NAME 'in' word_list SEP 'do' cmd_list 'done'
              | 'for' NAME 'do' cmd_list 'done'
              ;

    word_list : WORD
              | word_list WORD
              ;

    cmd_list  : cmd arg_list SEP
              | cmd_list cmd arg_list SEP
              ;

    arg_list  : /*empty*/
              | arg_list WORD
              ;

    SEP       : ';'
              | '\n'
              ;

(The meaning of NAME and WORD should be obvious - I don't want to go into
the syntactic details too far. I have further left out an undocumented
shell feature, that allows you to replace "do" and "done" with "{" and "}";
note that the latter is only true for for-do-done, not for while-do-done
and until-do-done!)

Note that white space is allowed everywhere in between the tokens
and nonterminals. But SEP is a mandatory separator (which can be
a newline or a semicolon). The reason for requiring a separator in
some cases is simple: There is the possibility that some keywords of
the shell might also be used as regular arguments to commands or within
a word_list - we'll come back to this in a moment.

The shell detects the two forms of the "for" statement simply by looking
at what follows the loop-variable. If it is an "in" then a word_list must
follow, which in turn must be terminated by a mandatory separator, as
explained above. If a "do" follows, there is no word_list. A semicolon
directly after the loop-variable is against the syntax (this is what
puzzled the poster).

Of course, Mr. Bourne could have made the syntax to allow for it by
changing the RHS of the rule for the "for" statement without "in" into

'for' NAME SEP 'do' cmd_list 'done'

but IMHO the difficulties of the poster (and many more, me included)
have some other reason, which has to do with the differences
between
- mandatory command separators or terminators,
- optional white space before commands and keywords, and
- spaces as separators of command and argument list,
given that
- the semicolon is allowed only in the first case,
- the newline is allowed in the first and second cases, and
- space characters are allowed in the second and third.

In a simple command, i.e. a program name that is followed by some arguments,
there's not much of a problem, as it seems "natural" for most users to type
spaces to separate the arguments and newlines to terminate commands, and it
seems obvious that the two cannot be used interchangeably, as this either
would terminate the argument list prematurely (if you try to separate
arguments with a newline) or wouldn't properly end your command (if you
don't type a newline).

Now let's consider the more complex shell statements. Some very stupid
users might in fact expect that the shell can read their mind, but all the
others will understand that the shell must either treat ALL keywords (and
maybe even all the commands) specially, not allowing them as regular arguments,
or needs some other separator than the one used between arguments, if
a keyword is to follow a command (or there are to be two commands) on
the same line. The same logic applies to most keywords, regardless of whether
they introduce some complex command or mark the beginning of the
next part of the command (like "then" or "else" in an "if" statement).

More puzzling is that the shell also ALLOWS newlines in place of spaces
where it's clear that a complex command isn't complete%. One place where
this occurs is when you start a "for" statement and have not yet supplied
the matching "done". For example

for var in foo bar
<some newlines here (1)>
do <some newlines here (2)>
cmd
<some newlines here (3)>
done

is all allowed, though seldom used, except for exactly one newline in
the place marked (2). Note that the newlines before and after "cmd" here
can not simply be seen as "empty commands", because if they could, the
following would be legal:

for var in foo bar
do
done

which IS NOT, since there is at least ONE command necessary between "do"
and "done" (please refer to the syntax given above). Note further that a
semicolon by itself is NOT an empty command, as

for var in foo bar
do ;
done

does not work - you need at least the colon here:

for var in foo bar
do :
done
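
The three loop bodies above can be checked mechanically. This is only a
sketch: the helper name "try" is made up for illustration, and "sh" is
assumed to be some Bourne-compatible shell on the PATH. A current bash or
dash would be expected to reject the first two loops and accept the third,
but other shells may differ.

```shell
# Ask a child shell to parse and run a loop, and report whether it was
# accepted. "try" is a hypothetical helper, not a standard command.
try() {
    if sh -c "$1" >/dev/null 2>&1; then
        echo accepted
    else
        echo rejected
    fi
}

# Nothing at all between "do" and "done".
empty_body=$(try 'for var in foo bar
do
done')

# A lone semicolon is not a command either.
semicolon_body=$(try 'for var in foo bar
do ;
done')

# The colon builtin is a real (do-nothing) command.
colon_body=$(try 'for var in foo bar
do :
done')

echo "empty: $empty_body  semicolon: $semicolon_body  colon: $colon_body"
```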

------
%: More puzzling is that the shell allows it only in some places.
E.g. "for <newline>" is a syntax error while "for i <newline>"
patiently waits for the "in" or "do".
------

>One of the real hassles I keep finding with /bin/sh (and /bin/csh is
>even worse ;-) is that the actual syntax regarding things like white
>space, newlines, and semicolons seems to be a secret. It often takes
>a lot of experimenting to find a way to get these syntax characters
>right. Is there any actual documentation on sh's syntax? Is it truly
>as ad-hoc as the above example implies?

For all I know the C-shell is more or less "ad-hoc", but for the Bourne
shell (which, until now and for the rest of this article, I always mean
when I speak of "the shell") you can find a formal syntax already in a
very ancient document, the "Bell System Technical Journal" (BSTJ for short)
from July/August 1978, ISSN 0005-8580. The grammar starts on page 1987 as
Appendix A of an article written by S.R. Bourne himself. Though it fails
to mention some of the finer points (like the space/newline problems just
discussed) it may serve as a start for you, and I found that it could even
be fed to yacc without much trouble (I never tried to fill in the actions
to make it work as a "real" shell ...)

>Is there perhaps some logical
>structure underlying it all that would explain why
> for last do :; done
>and
> for last
> do :
> done
>both work but
> for last;do :;done
>doesn't?

Well, "logic" is not so much an absolute value as many of us think, as it
often depends on what you expect. This is so because we may think we
have recognized something as a "rule" and tend to see all contradicting
observations as "illogical", when in fact the examples we studied were too
limited to show that we had only seen a special case (in this generality
that may also be true for the things we consider to be the "universal
laws" or "laws of nature" - but this takes us away from the topic.)

Now, what you observed was that newline and semicolon are interchangeable
in all the examples you had looked at and tried before you came to that
"for" statement. (Remember I told you in the beginning that I had the same
problem with this - so it cannot be said that your expectations were
without reason.) A bit more experimentation could also have shown that in
general the two are not really interchangeable. E.g. if you type a single
newline nothing happens (except that the shell prompts again), if you type
two newlines still nothing happens, but if you type a semicolon + a newline
this is a syntax error. Hence semicolon and newline are not as
interchangeable as it seemed at first glance.
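
The experiment just described can be repeated non-interactively. A rough
sketch, assuming "sh" is a Bourne-compatible shell on the PATH; the helper
name "parses" is invented for this illustration:

```shell
# Report whether a child shell accepts the given text as a script.
# "parses" is a hypothetical helper name.
parses() {
    if sh -c "$1" >/dev/null 2>&1; then
        echo yes
    else
        echo no
    fi
}

# A variable holding exactly one newline character.
nl='
'

one_newline=$(parses "$nl")           # an empty line
two_newlines=$(parses "$nl$nl")       # two empty lines
semicolon_newline=$(parses ";$nl")    # a lone semicolon, then newline

echo "newline: $one_newline  two newlines: $two_newlines  semicolon: $semicolon_newline"
```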

Now, having a little more experience we can come up with some other
explanation:

- commands can not be empty (they consist at least of
an external or builtin command; the ":" is the builtin
command which does nothing but evaluate its arguments)
- a semicolon or a newline% terminates a command
- a command list is a non-empty sequence of commands, all
of which must be properly terminated
- a semicolon or a newline terminates the word list of
the "in" part of the "for" statement
- space characters and newlines are allowed before commands
- nearly all the keywords of the shell are only recognized if
they are found in the position of a command, i.e. if there is
a previous command or a word list of a "for" statement there
MUST be a separator and there CAN be some space characters or
newlines
- the most important exceptions to the above are "in" (for the
"for" statement as well as for the "case" statement) and
"do". But as the word list in the "in" part of a "for" statement
(or the command list after the "while" or "until" in such a
statement) must be properly terminated, a "do" NOT in command
position can only occur in an "in"-less "for" statement.

-----
%: There are other valid command separators/terminators that are recognized
together with the semicolon, but this doesn't matter here.
-----
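
Applied to the question that started this thread, the "in"-less "for"
statement gives the last argument without a subprocess. A minimal sketch:
the "set --" line only supplies stand-in arguments for the demonstration
(a real script would use whatever the caller passed), and the variable
name "kind" is made up here.

```shell
# Stand-in argument list for the demonstration.
set -- first second "last one"

# No "in" part: the loop walks the positional parameters, so after the
# loop "last" holds the final one. The newline after "for last" is the
# form reported to work in the old Bourne shells as well.
for last
do :
done

# The test the original poster wanted to write.
if [ -d "$last" ]
then
    kind=directory
else
    kind=other
fi

echo "last argument: $last ($kind)"
```

With no arguments at all the loop body never runs, so "last" keeps whatever
value it had before; a careful script would check $# first.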

In some sense, these are the "laws of nature" as derived from observing
the shell's behaviour. As the shell is not really nature but the outcome
of the thoughts of some human being, we could of course complain now
that this is "illogical" (compared to our sense of logic!) or that there
are "too many exceptions" and that it could be simplified with fewer,
but more general rules.

But when thinking about how to smooth things out by using fewer rules, we
often do not recognize all the consequences that this would have.
Assume for a moment we treated both newline and semicolon as
statement terminators. Have you really considered what this would mean?
Typing a newline (at your terminal or as an empty line in a shell script)
would be a syntax error (sic!), as a single semicolon is. Quite simple,
I hear you say, then we allow an empty statement to be really empty,
which would allow for single newlines as well as single semicolons. But
be careful! We then must think about the exit status of such a statement.
Should it always be true, like the colon command? But then you must be very
careful inserting empty lines into a script, because the following two
would have different semantics

if          |       if cmd
cmd         |
then        |       then

and you must never separate command execution and accessing $? by a
newline, since the empty command "newline" destroys the value of any
previous command's exit status. Again I hear you say, we make the
empty statement special - it shall leave the status of the "real" command
that was executed last. But now the following will become dangerous

while
do
<do something until exit or break>
done

as it depends on the last command BEFORE the loop when the loop is
entered the first time, and after that on the last command executed
WITHIN the loop. So, step by step we may introduce more special casing
for something that looked like a trivial change in the first place!
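
To see why the exit status matters here, compare what the shell actually
does with an empty line against what the colon command does. This is just
an illustration run in child shells ("sh" is assumed to be a
Bourne-compatible shell on the PATH; the variable names are invented):

```shell
# $? read on the very next line: still the status of "false".
status_direct=$(sh -c 'false
echo $?')

# An empty line in between: the real shell does not treat it as a
# command, so the status of "false" survives.
status_blank=$(sh -c 'false

echo $?')

# A colon in between: ":" is a command that succeeds, so the status
# of "false" is lost. This is how an "always true" empty statement
# would behave.
status_colon=$(sh -c 'false
:
echo $?')

echo "direct: $status_direct  blank line: $status_blank  colon: $status_colon"
```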

I hope you have gained a little more understanding for the syntax of the
shell now. It isn't really as strange as it might seem on first glance,
though I admit a few things are not so obvious and it's easy to come to
some wrong conclusions if you have insufficient experience. (If this
article hadn't become that long I could write a little more on it - maybe
some other time.)

Brandon S. Allbery KB8JRR

Jan 7, 1991, 8:52:06 PM

As quoted from <4...@minya.UUCP> by j...@minya.UUCP (John Chambers):
+---------------

+---------------

I suspect a /bin/sh bug. I just tried it on an ancient ksh and an almost-as-
old [ ;-) ] Xenix; Xenix sh barfed, but ksh accepted it. I'll try it under
System V 3.2 /bin/sh tomorrow.

Chet Ramey

Jan 7, 1991, 2:12:18 PM

In article <DANJ1.91J...@cbnewse.ATT.COM> da...@ihlpa.att.com writes:
>John> Is there any actual documentation on sh's syntax?

>In the 4.3 BSD document set you'll find a syntax description in the sh
>articles in [I think] the User's Supplementary Documents volume.

The article is ``An Introduction to the Unix Shell'', USD:3, by S. R. Bourne.
There is a grammar in the appendix.

Take it with a grain of salt, though; as described by that grammar, sh does
not accept ``who | wc'' as a legal command.

Chet
--
Chet Ramey ``There's just no surf in
Network Services Group Cleveland, U.S.A. ...''

Eric Gisin

Jan 8, 1991, 3:29:45 PM

The shell's interpretation of newline is context sensitive.
It is usually equivalent to ";", but in a few cases it
is equivalent to white-space (space or tab). The latter cases
include after "|", "&&", "||", "for NAME", and "case WORD".

So all the following are valid:
$ ls |
> wc
$ true &&
> false ||
> maybe
$ for x
> in a b c
> do :
> done
$ case x
> in x) echo x! # ;; optional here
> esac

Arnold Robbins

Jan 8, 1991, 1:17:38 PM

In article <1991Jan2.1...@usenet.ins.cwru.edu> ch...@po.CWRU.Edu writes:

>The BSD /bin/sh is the one from v7, with minimal changes for the 4.2 BSD
>signal semantics (restarted system calls, etc.) and # as a comment
>character. The AT&T /bin/sh changed drastically beginning with System V.2,
>and further changes appeared in System V.3.

Actually, many of the features noted below appeared as early as the
System III shell. My manual is at my parents' home, so most of this is
from memory:

I am pretty sure that all of these were in the System III sh.

redirection of input/output for builtins (e.g. `read x < /dev/tty')

test/[, echo, pwd, ulimit builtins

colon form of parameter substitution to test for nullness of a variable
CDPATH

# as the comment character

the `<<-' form of here-document redirection (strip leading tabs)

I'm not so sure about these, but believe they also date from System III:

set --
numeric parameter to the shift builtin

MAILCHECK, MAILPATH
I think adding a numeric parameter to break and continue happened at S3 too.

This is all pretty accurate:

>Here's a short list of what was added to the v7 sh for the System V.2 sh:
shell functions and the `return' builtin

command hashing, `set -h', and the hash builtin

exit status of a pipeline was defined to be the exit status of
the last command
set -a, set -f

the use of `!' to negate a [] pattern for filename generation

I'm not sure when this showed up...
the unset builtin (!)

>restricted shell mode
This was actually in the V7 sh (or at least in the 4.2 version) but I
don't think it was documented, and it may never have worked very well.

Something else that's been in the Bourne shell since forever is redirection
with <>, which opens a file for reading and writing. It is documented in
the Korn shell, since David says he only got it working right recently (i.e.
within the past few years).
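
For readers who have not met the <> operator, a small sketch of what it
does in a present-day shell. The use of mktemp for a scratch file is an
assumption of this example (it is widespread but not part of the old
Bourne shell), and the variable names are made up.

```shell
# Scratch file for the demonstration; mktemp is assumed to be available.
scratch=$(mktemp) || exit 1
printf 'hello\n' > "$scratch"

# "<>" opens the file for both reading and writing, here on standard
# input of the read builtin. Unlike ">", it does not truncate the file,
# so the line written above is still there to be read.
read first_line <> "$scratch"

rm -f "$scratch"
echo "$first_line"
```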

I can check the System III manual if folks are really interested.
--
Arnold Robbins AudioFAX, Inc. | Laundry increases
2000 Powers Ferry Road, #200 / Marietta, GA. 30067 | exponentially in the
INTERNET: arn...@audiofax.com Phone: +1 404 933 7612 | number of children.
UUCP: emory!audfax!arnold Fax-box: +1 404 618 4581 | -- Miriam Robbins

Wm E. Davidsen Jr

Jan 8, 1991, 8:52:41 PM

In article <4...@minya.UUCP> j...@minya.UUCP (John Chambers) writes:

| Often only one of these is accepted by any particular /bin/sh, and the
| other is a syntax error. Sometimes both are accepted (with different
| meanings).

I don't dispute what you say, but I have never seen a sh which didn't
buy ${a-b} notation. Of course the meaning is different when both are
available; the whole : set of expansions was added about SysIII (my
SysIII system is down or I'd check).

${a-b} if $a is undefined use b instead
${a:-b} if $a is undefined *or null* use b instead

and so on for all the other operators.
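
The difference between the two forms only shows up when the variable is
set but empty. A short sketch for a shell that supports both forms (the
result variable names are made up for the illustration):

```shell
# Case 1: a is not set at all. Both forms fall back to "b".
unset a
unset_plain=${a-b}
unset_colon=${a:-b}

# Case 2: a is set but null. Only the colon form falls back.
a=
null_plain=${a-b}
null_colon=${a:-b}

# Case 3: a has a value. Neither form falls back.
a=x
set_plain=${a-b}
set_colon=${a:-b}

echo "unset: '$unset_plain' '$unset_colon'"
echo "null:  '$null_plain' '$null_colon'"
echo "set:   '$set_plain' '$set_colon'"
```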

Chet Ramey

Jan 9, 1991, 4:58:29 PM

In article <3...@audfax.audiofax.com> arn...@audiofax.com (Arnold Robbins) writes:

>I am pretty sure that all of these were in the System III sh.
> redirection of input/output for builtins (e.g. `read x < /dev/tty')

This did not appear until V.2. The V.2 sh manual has a sentence to the
effect that `now you can do redirection with builtins'.

Another thing new with the V.2 shell that I forgot to mention is the source
conversion from `Bournegol' to C.

Chet
--
Chet Ramey ``There's just no surf in
Network Services Group Cleveland, U.S.A. ...''

Arnold Robbins

Jan 10, 1991, 12:41:09 PM

In article <1991Jan9.2...@usenet.ins.cwru.edu> ch...@po.CWRU.Edu writes:
>Another thing new with the V.2 shell that I forgot to mention is the source
>conversion from `Bournegol' to C.

And, boy, did it make a difference! A quantum leap in readability and
maintainability (and therefore modifiability) of the shell. It seems to me
that at about V.2 AT&T got serious, and went through *everything*, formatting
the C code, regularizing argument parsing via getopt, and so on.

I think it's pretty fair to say that AT&T concentrated on the user-level
stuff through V.2, while UCB concentrated on the kernel level stuff
through 4.3.

Michael Meissner

Jan 10, 1991, 6:22:03 PM

In article <3...@audfax.audiofax.com> arn...@audiofax.com (Arnold Robbins) writes:

| In article <1991Jan9.2...@usenet.ins.cwru.edu> ch...@po.CWRU.Edu writes:
| >Another thing new with the V.2 shell that I forgot to mention is the source
| >conversion from `Bournegol' to C.
|
| And, boy, did it make a difference! A quantum leap in readability and
| maintainability (and therefore modifiability) of the shell. It seems to me
| that at about V.2 AT&T got serious, and went through *everything*, formatting
| the C code, regularizing argument parsing via getopt, and so on.

I seem to remember that people were saying that the C version of the
shell was much faster in doing shell scripts than the Bournegol
version, though I suspect the real win was caused by not exec-ing test
and echo. My .profile seems to do a zillion if's and such, and it was
MUCH faster using a modern shell (System V.2 /bin/sh, ksh, or bash)
than the tired old V7 shell (which Ultrix shipped as /bin/sh).
--
Michael Meissner email: meis...@osf.org phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Considering the flames and intolerance, shouldn't USENET be spelled ABUSENET?

Brandon S. Allbery KB8JRR

Jan 10, 1991, 10:54:16 PM

As quoted from <10...@mwtech.UUCP> by mar...@mwtech.UUCP (Martin Weitzel):
+---------------

| In some sense, these are the "laws of nature" as derived from observing
| the shell's behaviour. As the shell is not really nature but the outcome
| of the thoughts of some human being, we could of course complain now
| that this is "illogical" (compared to our sense of logic!) or that there
| are "too many exceptions" and that it could be simplified with fewer,
| but more general rules.
|
| But when thinking how to smoothen things out by using fewer rules, we
| often do not recognize all the consequences that this would have.

+---------------

There is one other problem. I daresay it would be possible to make Bourne
shell syntax a bit more "regular" by using a yacc grammar. THIS WON'T WORK!
At least, not without making the shell much less useful --- yacc (or other
parser generators) grammars are not designed for interaction. In order to
do interaction *well*, the shell needs to be able to have at least some idea
of what is going on *without* having read an entire complex command (read
"if/while/for/case/etc."). I've tried writing a yacc grammar that does this
kind of thing in a graceful manner; I ended up using context-sensitive hacks,
which I dislike in otherwise simple parsers. This is also why csh is not
actually like C --- C can depend on the parser collecting statements for it,
but csh is primarily designed for interactive use and therefore must be able
to keep track of what's going on incrementally.

Boyd Roberts

Jan 11, 1991, 12:17:57 AM

In article <1991Jan9.2...@usenet.ins.cwru.edu> ch...@po.CWRU.Edu writes:
>

>Another thing new with the V.2 shell that I forgot to mention is the source
>conversion from `Bournegol' to C.
>

You mean `shellgol'.

Boyd Roberts bo...@necisa.ho.necisa.oz.au

``When the going gets wierd, the weird turn pro...''

Chet Ramey

Jan 11, 1991, 9:45:26 AM

In article <19...@necisa.ho.necisa.oz.au> bo...@necisa.ho.necisa.oz.au (Boyd Roberts) writes:

>You mean `shellgol'.

Well, adb is written in it, too...

Ronald S H Khoo

Jan 11, 1991, 8:22:25 PM

all...@ncoast.ORG (Brandon S. Allbery KB8JRR) writes:

> There is one other problem. I daresay it would be possible to make Bourne
> shell syntax a bit more "regular" by using a yacc grammar. THIS WON'T WORK!
> At least, not without making the shell much less useful

Well, some of the chaps at research seem to be quite happy with "rc"
and that's got a yacc grammar... Apparently it was too painful to
port /bin/sh to Plan 9 so Duff wrote "rc". (He presented a paper on it
to the UKUUG Summer Conference last year)

rc has exactly what you describe -- a regularised /bin/sh syntax.

And of course, since they use Gnots running Pike's windowing stuff, no
command line history/editing or anything like that in rc, it's just a shell,
and looks quite nice too. Pity it's not available.
--
ron...@robobar.co.uk +44 81 991 1142 (O) +44 71 229 7741 (H)

Brandon S. Allbery KB8JRR

Jan 12, 1991, 5:25:07 PM

As quoted from <1991Jan12....@robobar.co.uk> by ron...@robobar.co.uk (Ronald S H Khoo):
+---------------

| all...@ncoast.ORG (Brandon S. Allbery KB8JRR) writes:
| > There is one other problem. I daresay it would be possible to make Bourne
| > shell syntax a bit more "regular" by using a yacc grammar. THIS WON'T WORK!
| > At least, not without making the shell much less useful
|
| Well, some of the chaps at research seem to be quite happy with "rc"
| and that's got a yacc grammar... Apparently it was too painful to
| port /bin/sh to Plan 9 so Duff wrote "rc". (He presented a paper on it
| to the UKUUG Summer Conference last year)

+---------------

I wondered if anyone would comment about that after I read the "rc" stuff.
However, "rc" follows the general Plan 9 form (which, many ages ago, was the
general Unix form) of moving stuff into separate programs. "rc" is, in many
ways, nowhere near as complex as even the V7 shell, much less the System V
shell; it can get away with simple means of handling interactiveness in
complex control structures. I was able to handle interactive use simply in a
certain yacc grammar up to a certain point, then I had to start using context
flags all over the place to make interactive use behave in an intuitive way.
I don't recall what point it was, except that the program I was working on was
gradually turning into a shell, which is why I eventually scrapped it in favor
of using the existing shell.

Martin Weitzel

Jan 13, 1991, 2:00:07 PM

In article <1991Jan11.0...@NCoast.ORG> all...@ncoast.ORG (Brandon S. Allbery KB8JRR) writes:
>As quoted from <10...@mwtech.UUCP> by mar...@mwtech.UUCP (Martin Weitzel):
>+---------------

>| But when thinking how to smoothen [the shell syntax by using] fewer rules,

>| we often do not recognize all the consequences that this would have.
>+---------------
>
>There is one other problem. I daresay it would be possible to make Bourne
>shell syntax a bit more "regular" by using a yacc grammar. THIS WON'T WORK!
>At least, not without making the shell much less useful --- yacc (or other
>parser generators) grammars are not designed for interaction.

My observations differ a little here. It is true that using a parser
generator like yacc sometimes makes one less conscious of the actual parsing
algorithm, which may have to look at the next token to decide which rule
should be reduced (and hence which action should be executed).

But you can also write yacc-able grammars that can be parsed without
lookahead! (Actions are generally a bit more complex then - in most cases you
have to build the parse tree explicitly as a data structure rather than
simply depend on yyparse's value stack.)

But the conclusion that parser generator grammars are not designed for
interaction is similar to the `goto-considered-harmful' discussion: You
cannot say that C programs are generally less structured just because
the language contains a `goto'-statement. It depends much on the typical
usage of the `goto' throughout a program whether the program looks
structured or more like spaghetti-code. Of course, if C had no `goto'
at all, even those old-time BASIC-hackers would be forced to look for other
ways to do control-flow. To that extent I see some truth in Brandon's
statement: Parser generators make it easy to write grammars which do not
fit well into an interactive environment.

>In order to
>do interaction *well*, the shell needs to be able to have at least some idea
>of what is going on *without* having read an entire complex command (read
>"if/while/for/case/etc."). I've tried writing a yacc grammar that does this
>kind of thing in a graceful manner; I ended up using context-sensitive hacks,
>which I dislike in otherwise simple parsers.

Again, `context-sensitive hacks' are not a bad thing a priori (maybe they
are if they are real `hacks', but I think Brandon meant that he fed
back some information from the syntax analysis to the lexer). There are
two different situations: In the first, you plan a completely new syntax for
a new language. In this case I would not recommend the coupling between
parser and scanner, because such a syntax becomes more difficult to learn
for a user of this new language (things have different meanings in different
contexts).

On the other hand, if you need to parse a given language that the user
already knows (e.g. some natural language or a sub-language thereof),
feedback from syntax analysis to lexical analysis will help much, as long
as it duplicates what the user already expects.

Finding a yacc-able syntax for the Bourne-Shell is a mixed case: A
long-time shell-user would expect all the things in it that a newcomer
might consider to be irregularities. (I don't dare to decide which are
really irregularities as I belong rather to the former group, but at
least I know that most of the irregularities - e.g. implied double
quotes around the word after an `=' in an assignment and between
`case-in' - help to save some key-strokes, though they really are very
non-intuitive for newcomers.)

>This is also why csh is not
>actually like C --- C can depend on the parser collecting statements for it,
>but csh is primarily designed for interactive use and therefore must be able
>to keep track of what's going on incrementally.

Here I can second Brandon's statement and will even work it out a bit more:
One of the major problems comes up if the syntax allows an if-statement with
an optional else-part, as is the case in C (but not in the Bourne
Shell, as it has the closing `fi'). The user expects (of course) that
the if-part should be executed as soon as it is completely written down.
But the parsing algorithm may want to look whether an `else' follows.
This is because the user "knows" what he or she will do next but the
Shell cannot read the user's mind. That sort of thing must be taken
care of during the design of an interactive language. Simply adopting
the syntax of a non-interactive language for an interactive language is
bound to fail here.

To summarize: IMHO it is not the parser generators which complicate
things, but inappropriate design of an interactive language.
(Esp. to Brandon: Do your experiences stem from trying to derive a
yacc-able grammar for the Bourne-Shell or rather for the C-Shell?)

BTW: I've redirected followups to comp.lang.misc, since the topic tends
to turn away from the focus of comp.unix.shell.

Chuck.Phillips

Jan 14, 1991, 10:43:04 AM

>>>>> On 5 Jan 91 15:56:45 GMT, j...@minya.UUCP (John Chambers) said:
John> With the above for-loop, this gives
John> for last;do :;done
John> which doesn't work. The shell gives a syntax error, complaining
John> about an unexpected ';' in the line. Myself, I found this to be a
John> somewhat unexpected error message. It appears my simple-minded
John> algorithm for condensing code doesn't work in this case.

John> So what's going on here? What the @#$^&#( is the shell's syntax that
John> makes the semicolon not only unneeded, but illegal in this case?

A little sh history: sh comments originally went from a ':' to the end of a
line. ('#'-style comments were added later.) The funny thing about ':'
comments is they get evaluated and have a return value, unlike '#' comments
which are merely whitespace.

For example:

: Arbitrary text here > foop

This "comment" wipes out the file foop, if it exists, then creates a null
file called "foop". (Wonder why they added '#' comments? ;^) The shell
does not attempt to execute a command called "Arbitrary", and the comment
has a return value of zero (a.k.a. success).

':' statements have other uses besides a filler with a return value of '0'
and an obscure way to create null files.

: ${VARIABLE=value}

Sets variable VARIABLE to "value" unless VARIABLE is already set. Lastly,
beginning a sh script with ":" instead of "#! /bin/sh" saves an unnecessary
fork and exec on some versions of UNIX. (Note to bash users: In bash,
scripts beginning with ":" are interpreted by bash while "#! /bin/sh"
scripts are, of course, executed by "/bin/sh".)
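
The two uses of ":" described above can be tried safely on a scratch file.
A minimal sketch: mktemp is assumed to be available, and the names
"scratch" and GREETING are invented for the example.

```shell
# Scratch file with some contents; mktemp is assumed to be available.
scratch=$(mktemp) || exit 1
printf 'old contents\n' > "$scratch"

# The words after ":" are ignored, but the redirection is still
# performed, so the file is truncated to zero length.
: Arbitrary text here > "$scratch"
if [ -s "$scratch" ]; then emptied=no; else emptied=yes; fi

# ":" evaluates its arguments, so the assignment inside ${...=...}
# happens, but only when the variable is not already set.
unset GREETING
: ${GREETING=hello}
first=$GREETING

GREETING=changed
: ${GREETING=hello}
second=$GREETING

rm -f "$scratch"
echo "emptied: $emptied  first: $first  second: $second"
```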

To answer your original question:

for last;do :;done

The ";done" part of the statement is a comment, which yields a syntax
error. (sh error messages are often misleading, IMHO.)

John> One of the real hassles I keep finding with /bin/sh (and /bin/csh is
John> even worse ;-) is that the actual syntax regarding things like white
John> space, newlines, and semicolons seems to be a secret.

Oh, that reminds me. You're not to repeat anything I've said. :-)
I agree that csh is worse, mostly due to its (mis)handling of newlines. I
still use csh interactively, but program in sh after being burned a few
times by csh's unexpected syntax exceptions and variations. (Just try
piping the result of one loop into another loop in csh without using a
temporary file, FIFO or multiple invocations of csh.)

#include <std/disclaimer.h>

Hope this helps,
--
Chuck Phillips MS440
NCR Microelectronics chuck.phillips%ftcollins.ncr.com
2001 Danfield Ct.
Ft. Collins, CO. 80525 ...uunet!ncrlnk!ncr-mpd!bach!chuckp

Arnold Robbins

Jan 14, 1991, 1:40:29 PM

In article <MEISSNER.91...@curley.osf.org> meis...@osf.org (Michael Meissner) writes:
>I seem to remember that people were saying that the C version of the
>shell was much faster in doing shell scripts than the Bournegol
>version, though I suspect the real win was caused by not exec-ing test
>and echo.

Undoubtedly the case. In my previous life as a system administrator,
/bin/test on a 4BSD vax was the number one command executed, according
to the accounting data. As much as it goes against the conceptual
decomposition of "one job, one program" that is part of the Unix philosophy,
having test built in to the shell is an obvious win. I'm less convinced that
this is the case with echo and pwd, but I don't remember the statistics
beyond that test was number one.
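
Whether a given shell has these commands built in can be checked with the
"type" command, where the shell provides it. This is only a sketch; the
exact wording of the output differs between shells, and a V7-style shell
without these builtins would report a path such as /bin/test instead.

```shell
# Ask the running shell how it would execute each command.
for name in test echo pwd
do
    type "$name"
done

# Keep the answer for "test" so a script could act on it.
test_kind=$(type test)
case "$test_kind" in
*builtin*) echo "test runs inside the shell, no fork or exec needed" ;;
*)         echo "test is an external program here" ;;
esac
```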

Maarten Litmaath

Jan 17, 1991, 2:42:03 PM

In article <CHUCK.PHILLIPS...@halley.FtCollins.NCR.COM>,
Chuck.P...@FtCollins.NCR.COM (Chuck.Phillips) writes:
)[...]
)for last;do :;done
)
)The ";done" part of the statement is a comment, which yields a syntax
)error. (sh error messages are often misleading, IMHO.)

Wrong. It's the FIRST semicolon that's the syntax error, and the `;done'
part is NOT a comment. How could it ever be a comment? You said yourself
that the words following a `:' command are parsed, that is, the unquoted
semicolon is an operator ending the command.
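
A quick way to convince yourself, sketched with a made-up variable name
"marker": if everything after ":" were a comment, the assignment below
would never run.

```shell
marker=untouched

# The unquoted semicolon ends the ":" command; what follows it is an
# ordinary command and is executed.
: these words are only arguments to the colon command ; marker=changed

echo "$marker"
```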
--
kinne...@kuhub.cc.ukans.edu (Bill Kinnersley):
"Do phonograph turntables turn the other way in Australia?"
g...@krebs.acc.Virginia.EDU (Galen J. Hekhuis)
"How do you think satanic messages were discovered on records?"

Joe Smith

Jan 18, 1991, 2:41:44 AM

In article <1991Jan2.1...@usenet.ins.cwru.edu> ch...@po.CWRU.Edu writes:
>The BSD /bin/sh is the one from v7, with minimal changes for the 4.2 BSD
>signal semantics (restarted system calls, etc.) and # as a comment
>character. The AT&T /bin/sh changed drastically beginning with System V.2,
>and further changes appeared in System V.3.

[Lots of good details deleted.]

From the description, it sounds like the sh that Sun ships with SunOS-3.5
and later is much closer to the ATT version than the BSD version.
Is this a correct interpretation?
--
Joe Smith (408)922-6220 | SMTP: j...@tardis.tymnet.com or j...@gemini.tymnet.com
BT Tymnet Tech Services | UUCP: ...!{ames,pyramid}!oliveb!tymix!tardis!jms
PO Box 49019, MS-C51 | BIX: smithjoe | CA license plate: "POPJ P," (PDP-10)
San Jose, CA 95161-9019 | humorous dislaimer: "My Amiga 3000 speaks for me."

Chet Ramey

Jan 18, 1991, 9:31:01 AM

In article <14...@tardis.Tymnet.COM> j...@tardis.Tymnet.COM (Joe Smith) writes:
>From the description, it sounds like the sh that Sun ships with SunOS-3.5
>and later is much closer to the ATT version than the BSD version.
>Is this a correct interpretation?

Yes. The version of sh shipped in SunOS 3.x (x >= 0, I think -- Guy Harris
will catch me if I'm wrong) is based on the V.2 /bin/sh, with mods from the
BRL version of that shell to fix up its baroque memory management a bit.

Starting with SunOS 4.0, the Sun sh is derived from the V.3 sh.

Chet