For many years, one of the best expositions of the Unix "getopt-style"
command line option argument paradigm was the one in appendix 5 of
_Unix for Superusers_ by Eric Foxley. It is a clear
precursor of what can be found in the XBD section of the Single Unix
Specification. It does, however, predate (by roughly 5 years) the
introduction of the notion of "long" (i.e. multiple-letter) options
beginning with double '-' characters.
AKA> What opinions do people have about standards for commandline options?
AKA> In particular, I'm thinking about format, delimiting, and always-present
AKA> options. I would also love to see resource sites about this.
I can describe what I do.
I've published, amongst other things, two major tool suites. Complete
and detailed explanations of their command tail handling are too long to
post here, but can be found in the documentation of either. Here is a
précis that might seem long, but that is in fact shorter than what can
be found in the documentation:
All of the tools in a suite follow a single, consistent, parsing model
when it comes to splitting a command tail into arguments. Quotation
marks can be used to prevent words from being split at whitespace and
backslashes can be used within quoted strings to include literal
quotation marks. Each suite has a set of "standard options" (such as,
for example, /?, /S, /LOCALDATE, and /U) that mean the same thing to
more than one command in the suite. All commands (apart from ATTRIB,
for obvious syntactic reasons) support both '-' and '/' as the option
argument delimiter character (selecting between them using the simple
heuristic of using the first one encountered as the option delimiter
from that point onwards until the end of the command tail). An argument
comprising exactly two option delimiter characters disables all further
option argument recognition until the end of the command tail, allowing
non-option arguments that happen to begin with the option delimiter
character to be specified. The /? standard option must be the first
argument in the command tail and causes the remainder of the command
tail to be ignored. Only single-letter options that themselves take
no arguments can be combined behind a single delimiter character.
Multiple-letter options or options that require arguments cannot be
combined. Option arguments that themselves take arguments (such as /S
and /A) can be immediately followed by their argument or be separated
from that argument by ':' or '='. Option arguments that control boolean
flags (such as, for example, /E and /OLDDESTDIR) may optionally be
followed by '+' or '-' to force the flag to the true or false state,
and, without either, simply toggle the state of the flag.
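
As a rough illustration of the option rules in the précis above, here is a
minimal Python sketch. It is not the code of either tool suite. The function
name parse_tail, the flags and valued tables, and the result layout are all
hypothetical. It also assumes that option names are matched case-sensitively,
that the longest matching name wins when an option takes an argument, and that
a trailing '+' or '-' applies to every letter of a combined group; the précis
does not settle any of those points.

```python
def parse_tail(args, flags, valued):
    """Parse already-split arguments (a sketch, not the suites' real parser).

    flags  -- dict of boolean option name -> initial state (hypothetical names)
    valued -- set of option names that take an argument of their own
    """
    state, values, positionals = dict(flags), {}, []
    result = {"help": False, "flags": state, "values": values, "args": positionals}
    # The help option is honoured only as the first argument; the rest is ignored.
    if args and args[0] in ("/?", "-?"):
        result["help"] = True
        return result
    delim = None       # fixed by the first '-' or '/' that starts an argument
    options_on = True  # cleared by an argument of exactly two delimiters
    for arg in args:
        if options_on and delim is None and arg[:1] in ("-", "/"):
            delim = arg[0]
        if not (options_on and delim and len(arg) > 1 and arg.startswith(delim)):
            positionals.append(arg)
            continue
        if arg == delim * 2:
            options_on = False
            continue
        body = arg[1:]
        # A trailing '+' or '-' forces a boolean flag; None means "toggle".
        forced = {"+": True, "-": False}.get(body[-1]) if len(body) > 1 else None
        core = body[:-1] if forced is not None else body
        # Longest option-with-argument name that prefixes the body (assumption).
        name = max((n for n in valued if body.startswith(n)), key=len, default=None)
        if core in state:
            names = [core]                 # one flag, single- or multiple-letter
        elif name is not None:
            rest = body[len(name):]
            if rest[:1] in (":", "="):     # optional separator before the value
                rest = rest[1:]
            values[name] = rest
            continue
        elif all(ch in state for ch in core):
            names = list(core)             # combined single-letter flags
        else:
            raise ValueError(f'Unrecognised argument "{arg}".')
        for flag in names:
            state[flag] = (not state[flag]) if forced is None else forced
    return result
```

With flags E, Q, U and OLDDESTDIR and options S and A taking arguments, the
tail /EQ /OLDDESTDIR- /S:drash -x // /file would toggle E and Q, force
OLDDESTDIR off, give S the value drash, and leave -x and /file as ordinary
arguments, because '/' was seen first and '//' ended option recognition.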
AFAIK, there is no simple way to describe a complete DOSISH command
line parser. E.g., what should you do so that the only argument the
command gets is the string (starts with a, ends with i):
a \b \\c\"\d e\\""f g\\\"""\\""\"h"i
Even EMX docs do not describe the exact quoting rules (what the user
needs to know are un-quoting rules: what should be the command line to
get the required argc/argv).
Ilya
P.S. Let me experiment... With EMX
perl -e "print shift" "a \b \\c\\\"\d e\\\\\"\"f g\\\\\\\"\"\"\\\\\"\"\\\"h\"i"
works. So the rule for EMX is:
for every " (possibly preceded by zero or more backslashes) one
needs to backwhack all these backslashes, and the ".
Not *that* complicated (but I cheated: several years ago I read the
relevant part of the source for EMX). The only non-obvious part is
that one does not need to backwhack backslashes which are not followed
by ".
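
The rule stated above can be sketched in a few lines of Python. This is only
an illustration of the rule as worded in this post, not code taken from EMX,
and the name quote_for_emx is made up. It also says nothing special about
backslashes at the very end of the argument, which would end up directly
before the closing quotation mark; the post does not cover that case, so the
sketch leaves them alone.

```python
import re

def quote_for_emx(arg):
    """Quote one argument following the rule described in the post."""
    # Each run of zero or more backslashes that is immediately followed by a
    # double quote gets every backslash doubled and the quote escaped.
    # Backslashes not followed by a double quote are left untouched.
    escaped = re.sub(r'(\\*)"', lambda m: m.group(1) * 2 + '\\"', arg)
    # The whole argument is then wrapped in double quotes.
    return '"' + escaped + '"'

wanted = r'a \b \\c\"\d e\\""f g\\\"""\\""\"h"i'
print(quote_for_emx(wanted))
```

Applied to the test string from earlier in the post, this reproduces the
quoted argument used in the perl one-liner above.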
I wondered when you would show up. <g>
Jonathan de Boyne Pollard wrote:
>> Surprisingly, I can find few out there.
>
> One of the best expositions of the Unix "getopt-style" command line
> ... appendix 5 of _Unix for Superusers_ by Eric Foxley. It is a clear
> precursor of what can be found in the XBD section of the Single Unix
> Specification. It does, however, predate (by roughly 5 years) the
> introduction of the notion of "long" (i.e. multiple-letter) options
> beginning with double '-' characters.
I've checked getopt implementations and the IEEE POSIX standards, but have heard
nothing about Foxley, XBD, or the "Single Unix Specification", so those are all
handy. The lack of long options is possibly an advantage - it will be much clearer,
and getting people to even use single-character options consistently would be a
major step forward.
>
>> What opinions do people have about standards for commandline options?
> All of the tools in a suite follow a single, consistent, parsing model
> when it comes to splitting a command tail into arguments. Quotation
> marks can be used to prevent words from being split at whitespace and
> backslashes can be used within quoted strings to include literal
> quotation marks.
I was wondering about the string escapes that should be used; looks like this is
a Windows standard for some people too (the WScript.Arguments collection uses
it, for example).
> Each suite has a set of "standard options" (such as,
> for examples, /?, /S, /LOCALDATE, and /U) that mean the same thing to
> more than one command in the suite.
Good.
> ... The /? standard option
> must be the first argument in the command tail and causes the
> remainder of the command
> tail to be ignored.
Questions on this, on one topic you didn't mention (handling of unrecognized
arguments), and a peripheral one about long options.
(1) On help options, GNU seems to suggest that a helpish switch should always
call up help and not execute anything. Unfortunately, I saw that in their
2-paragraph "standard options" docs which covered long args ("--help" was the
ref), so it doesn't necessarily imply that they think short options should act
that way. Thoughts?
(2) Unrecognizable arguments - what do you do with those?
This normally (and appropriately I think) causes immediate termination in apps;
usually I see one of the following:
+ A note about the help switch.
+ Often, the help info and an echo of the unrecognized option.
+ Sometimes, the help itself is displayed; this may or may not mention what
argument triggered it.
My uninformed opinion right now is that the following probably "should" all be
done in this case:
+ Echo back the failing option FIRST for easy parsing
+ include a canonical form of the help switch for more information.
+ Set errorlevel to something in particular.
+ I don't know about the help echo; if the full help is very long, I would
definitely not want to display it in full form to avoid obscuring the error
result.
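
To make the first three points concrete, here is a small Python sketch. The
function name report_unrecognised, the message wording, and the exit status of
1 are assumptions chosen for illustration, not an established convention.

```python
import io
import sys

def report_unrecognised(arg, program, help_switch="/?", stream=None):
    """Write a short diagnostic and return the exit status to use."""
    stream = sys.stderr if stream is None else stream
    # The failing argument comes first so that it is easy to spot or parse.
    stream.write(f'Unrecognised argument "{arg}".\n')
    # Point at the help switch instead of printing the full help text.
    stream.write(f"Type {program} {help_switch} for more information.\n")
    return 1  # assumed failure status; no particular value is agreed here

# Example use, writing into a buffer instead of the real standard error.
buffer = io.StringIO()
status = report_unrecognised("3", "DNSQRY", stream=buffer)
```

A real tool would pass the returned value to sys.exit so that the errorlevel
is set for the caller.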
(3) Long Options
How desirable are these in general?
I can see the merit in them for tools with lots of options. In terms of
creating a general, basic library for people to use in writing CLI tools,
however, what is lost other than mnemonic value by having a very general, simple
library that ONLY supports short-form options?
> followed by '+' or '-' to force the flag to the true or false state,
> and without either simply toggle the state of the flag.
That's a handy one - and I haven't seen it outlined before as such.
AKA> (1) On help options, GNU seems to suggest that a helpish switch
AKA> should always call up help and not execute anything; Unfortunately,
AKA> I saw that in their 2-paragraph "standard options" docs which
AKA> covered long args ("--help" was the ref), so it doesn't
AKA> necessarily imply that they think short options should act that
AKA> way. Thoughts?
The simple answer is that there isn't a generally accepted single
character option on Unices and on Linux for obtaining on-line help. For
one thing, it's not obvious what that character should be. "-?"
wouldn't work. "-h" already has historical meanings that are
different.
This is why such behaviour isn't mentioned in relation to single
character options.
AKA> (2) Unrecognizable arguments - what do you do with those?
I don't do anything. But my programs display an error message and the
synopsis from the on-line help (but not the complete list of arguments
and options), and exit with a failure status. Two examples:
[C:\]SET PROMPT=Errorlevel is $R [$P]
Errorlevel is 0 [C:\]dnsqry 1 2 3
IUZ0002: Unrecognised argument "3".
Usage: DNSQRY [/?] [/SERVERIP:a.b.c.d] [/SERVERPORT:n] [/CLIENTIP:a.b.c.d] [/CLIENTPORT:n] [/RECURSIVE] type name
Errorlevel is 1 [C:\]comp 1 2 3
CLU0003: Unrecognised argument "3".
Usage: COMP [/?] [/A[[+|-]drash]] [/S[[+|-]drash]] [/M[n]] [/E/Q/U[+|-]] Filespec1 Filespec2
Errorlevel is 1 [C:\]
AKA> + I don't know about the help echo; if the full help is very
AKA> long, I would definitely not want to display it in full form
AKA> to avoid obscuring the error result.
Precisely.
[C:\]grep /? | wc
   lines    words  letters    chars    bytes (name)
-------- -------- -------- -------- --------
      56      421     1509     2201     2313
-------- -------- -------- -------- --------
      56      421     1509     2201     2313 (Total)
[C:\]grep
CLU0018: A pattern to match against is required.
Usage: GREP [/?] [/A[[+|-]drash]] [/S[[+|-]drash]] [/O[num[,num]]] [/R/E/L/I/C/Q/B/U/V/Z[+|-]] Pattern [Filespecs ...]
[C:\]
That's not because the rules for parsing command tails are complicated. They
usually aren't. It's because more than one set of rules exists. For example:
Programs compiled with different C or C++ compilers may well employ different
rules. For another example: Programs not written in C or in C++ may not
provide a whitespace quoting mechanism at all.
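
The effect of having more than one set of rules can be shown with a short
Python sketch. The three splitters below are stand-ins chosen for
illustration: none of them is the rule set of any particular C or C++
compiler, and the sample command tail is made up.

```python
import shlex

tail = 'copy "my file.txt" dest'

# No quoting mechanism at all: split on whitespace, quotes are just characters.
no_quoting = tail.split()

# Quotes group words but are kept as part of the argument (non-POSIX shlex).
quotes_kept = shlex.split(tail, posix=False)

# Quotes group words and are removed from the argument (POSIX-mode shlex).
quotes_removed = shlex.split(tail)

for argv in (no_quoting, quotes_kept, quotes_removed):
    print(len(argv), argv)
```

The same command tail yields four arguments under the first rule set and
three under the other two, and the second and third still disagree about
whether the quotation marks belong to the argument.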