
python 2.7.12 on Linux behaving differently than on Windows

DFS

Dec 4, 2016, 3:26:32 PM

$python program.py column1=2174 and column2='R'


Windows (correct)
$print sys.argv[3]
column2='R'

Linux (incorrect)
$print sys.argv[3]
column2=R

It drops the apostrophes, and the subsequent db call throws an error:
sqlite3.OperationalError: no such column: R

The way it's used in code is:
import sys
argcnt = len(sys.argv)
querystr = ' '.join(sys.argv[1:argcnt])


I tried using dbl-quotes in the command line, and with the join()
statement, and neither worked.


Edit: I got it to work this way:
column2="'R'"

but that's bogus, and I don't want users to have to do that.


Ideas?

Thanks

BartC

Dec 4, 2016, 5:20:11 PM

You can put double quotes around the whole thing:

"column2='R'"

otherwise I don't know what the solution is, other than for a program to be
aware of the possibility and allow for either input, if there are no
conflicts (for example if both R and 'R' are valid inputs and mean
different things).

Command parameters /do/ behave differently between Windows and Linux,
for example try writing *.* as that third parameter.

In Windows, it will print *.*.

In Linux, if you have 273 files in the current directory, it will print
the name of the first, and there will be /272 further command
parameters/, each the name of a file. (I couldn't believe this when I
found out; one of my directories recently had 3.4 million files in it, I
don't really want *.* expanded to 3.4m arguments. Here, the fix is again
to use double quotes: "*.*". But what if the user doesn't do that?)

--
Bartc


Steve D'Aprano

Dec 4, 2016, 5:52:48 PM

On Mon, 5 Dec 2016 07:26 am, DFS wrote:

> $python program.py column1=2174 and column2='R'

Here is a simple script demonstrating the issue:


# --- program.py ---
import sys
print "argv:", sys.argv
print ' '.join(sys.argv[1:])



I haven't tested it on Windows, but on Linux it behaves as you describe (and
as Linux users will expect):

[steve@ando ~]$ python program.py column1=2174 and column2='R'
argv: ['program.py', 'column1=2174', 'and', 'column2=R']
column1=2174 and column2=R



This is *absolutely normal behaviour* for the most common shell on Linux,
bash. I would expect that the other common shells will do the same thing.
Quotation marks need to be escaped if you want the shell to pass them
through to the program, either like this:

column2="'R'"


or like this:

column2=\'R\'
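To see what bash does to the arguments before Python ever sees them, you can approximate its word splitting and quote removal with the standard library's `shlex.split`, which follows POSIX shell quoting rules in its default mode. This is only a sketch: `shlex` does not expand variables or wildcards, so it models the quoting part of the shell and nothing else.

```python
import shlex

# shlex.split() (POSIX mode by default) splits on whitespace and strips
# quotes much as bash does. It does no variable or wildcard expansion.
command_lines = [
    "column1=2174 and column2='R'",      # as typed in the original post
    "column2=\"'R'\"",                   # single quotes protected by double quotes
    "column2=\\'R\\'",                   # single quotes escaped with backslashes
    "\"column1=2174 and column2='R'\"",  # whole expression as one argument
]

for line in command_lines:
    print("%-36s -> %r" % (line, shlex.split(line)))
```

The first line becomes `['column1=2174', 'and', 'column2=R']`, which is exactly what `sys.argv[1:]` showed on Linux; the other three all keep the single quotes.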



> It drops the apostrophes, and the subsequent db call throws an error:
> sqlite3.OperationalError: no such column: R

I'm not sure how to interpret this error, so I'm guessing. Please correct me
if I'm wrong, but doesn't this mean that your column is called:

single quote R single quote

that is, literally 'R', which means that if you were using it in Python
code, you would have to write the column name as this?

"'R'"

If so, perhaps the best solution is to get rid of the quotes in the column
names, so that all your users, Windows and Linux, just write this:

program.py column1=2174 and column2=R


(P.S. what happens if they write

program.py column1 = 2174 and column2 = R

instead?)



> The way it's used in code is:
> argcnt = len(sys.argv)
> querystr = ' '.join(sys.argv[1:argcnt])

You can simplify that to:

querystr = ' '.join(sys.argv[1:])




> I tried using dbl-quotes in the command line, and with the join()
> statement, and neither worked.

By the time you join the arguments, it's too late. You have to quote them
first.


> Edit: I got it to work this way:
> column2="'R'"
>
> but that's bogus, and I don't want users to have to do that.

(1) It's not bogus.

(2) Linux users will expect that you have to escape quotation marks if you
want to pass them through the shell.

(3) If my interpretation is correct, what *is* bogus is that your
column names include quotes in them. Get rid of the quotes, and your
problem goes away.

If that's not the case, then you'll either have to get your users to escape
the quotes, or you'll have to add them in yourself.

(In case this is not obvious by now, this is not a Python issue. This is
entirely due to the behaviour of the shell.)




--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

Chris Angelico

Dec 4, 2016, 5:58:40 PM

On Mon, Dec 5, 2016 at 9:19 AM, BartC <b...@freeuk.com> wrote:
> Command parameters /do/ behave differently between Windows and Linux, for
> example try writing *.* as that third parameter.
>
> In Windows, it will print *.*.
>
> In Linux, if you have 273 files in the current directory, it will print the
> name of the first, and there will be /272 further command parameters/, each
> the name of a file. (I couldn't believe this when I found out; one of my
> directories recently had 3.4 million files in it, I don't really want *.*
> expanded to 3.4m arguments. Here, the fix is again to use double quotes:
> "*.*". But what if the user doesn't do that?)

Technically this isn't a Win/Lin difference, but a shell difference.
The behaviours you're referring to are common to many Unix shells
(including bash, the most commonly used shell on typical Linux
systems). If you want a glob to be processed by the application,
rather than the shell, you have to escape it with quotes or
backslashes. Most of the time, it's easiest to write your app to
accept multiple file names, and let the shell expand them - there's a
lot more flexibility than just * and ?, and every program behaves the
same way, because it's the same shell parsing them all.

ChrisA
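Accepting multiple file names and letting the shell expand them leaves Windows users without expansion, since cmd.exe passes a pattern like `*.txt` through untouched. A common workaround, sketched below for Python 3 with a hypothetical helper name, is for the program to expand wildcards itself only on Windows using the standard `glob` module, keeping a pattern that matches nothing as a literal argument (bash's default behaviour too).

```python
import glob
import os
import sys


def expand_wildcards(args, expand=(os.name == "nt")):
    """Return args with wildcard patterns replaced by matching paths.

    Hypothetical helper. By default it only expands on Windows, where the
    shell passes patterns through unchanged; on Unix the shell has already
    expanded them, so the arguments are returned as they are.
    """
    if not expand:
        return list(args)
    result = []
    for arg in args:
        if any(ch in arg for ch in "*?["):
            # glob.glob() returns matches in arbitrary order; sort them
            # so the result resembles a shell's expansion.
            matches = sorted(glob.glob(arg))
            # Like bash without nullglob, keep a pattern that matches nothing.
            result.extend(matches if matches else [arg])
        else:
            result.append(arg)
    return result


if __name__ == "__main__":
    print(expand_wildcards(sys.argv[1:]))
```

On Linux, `python files.py *.txt` and `python files.py "*.txt"` then differ exactly as described above; on Windows both end up expanded by the program.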

Gregory Ewing

Dec 4, 2016, 6:06:54 PM

Steve D'Aprano wrote:
> On Mon, 5 Dec 2016 07:26 am, DFS wrote:
>
>> no such column: R
>
> doesn't this mean that your column is called:
>
> single quote R single quote

I think he intends it to be an SQL string literal (which uses
single quotes), but since the quotes disappeared, SQL is trying
to interpret it as a column name.

>>Edit: I got it to work this way:
>>column2="'R'"
>>
>>but that's bogus, and I don't want users to have to do that.

DFS: Is the argument always going to be a literal, or could
the user sometimes want to pass a column name?

If it's always a literal, your program could add the quotes
before passing it to SQL.

If it's not always a literal, you'll just have to deal with
the fact that the unix shell is a programming language that
has its own interpretation of quotes. Either the user will
have to escape the quotes, or you'll have to provide another
way of distinguishing literals from column names.

--
Greg
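Greg's first option, having the program add the quotes when the value is always a literal, might look like the sketch below. `sql_literal` and `build_condition` are hypothetical helpers; embedded single quotes are doubled, which is how standard SQL (and SQLite) escapes them inside a string literal. A value the user already quoted is passed through unchanged, so it is not re-escaped. Binding values with `?` placeholders in the `sqlite3` module is generally safer than building literals by hand.

```python
def sql_literal(value):
    """Wrap value in single quotes, doubling embedded quotes (SQL style).

    Hypothetical helper: a value that already starts and ends with a
    single quote, such as 'R', is returned as is, so R and 'R' both work.
    """
    if len(value) >= 2 and value[0] == value[-1] == "'":
        return value
    return "'" + value.replace("'", "''") + "'"


def build_condition(arg):
    """Turn 'column2=R' into "column2='R'" (column names are not checked)."""
    column, _, value = arg.partition("=")
    return "%s=%s" % (column, sql_literal(value))


print(build_condition("column2=R"))     # column2='R'
print(build_condition("column2='R'"))   # column2='R'
print(build_condition("name=O'Brien"))  # name='O''Brien'
```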

Steve D'Aprano

Dec 4, 2016, 6:26:14 PM

On Mon, 5 Dec 2016 09:19 am, BartC wrote:

> Command parameters /do/ behave differently between Windows and Linux,
> for example try writing *.* as that third parameter.
>
> In Windows, it will print *.*.
>
> In Linux, if you have 273 files in the current directory, it will print
> the name of the first, and there will be /272 further command
> parameters/, each the name of a file. (I couldn't believe this when I
> found out; one of my directories recently had 3.4 million files in it, I
> don't really want *.* expanded to 3.4m arguments. Here, the fix is again
> to use double quotes: "*.*". But what if the user doesn't do that?)

If the user doesn't escape the wildcard, then the shell will expand it,
exactly as the user would expect.

I'm not sure why you were surprised by that. * is a shell wildcard. By using
a * you are explicitly telling the shell to expand it to any file that
matches. Did you think it was a regular character like 'a' and 'z'?

I think it boils down to what the user expects. Linux and Unix users tend to
be technically-minded folks who use the command line a lot and demand
powerful tools, and they expect that wildcards like * should be expanded.
Windows treats the command line as an afterthought, and until a few years
ago you were limited to a DOS shell. Today, your options are not as
limited: there's Powershell, and bash for Windows.

Chris Angelico

Dec 4, 2016, 6:27:14 PM

On Mon, Dec 5, 2016 at 9:52 AM, Steve D'Aprano
<steve+...@pearwood.info> wrote:
> I'm not sure how to interpret this error, so I'm guessing. Please correct me
> if I'm wrong, but doesn't this mean that your column is called:
>
> single quote R single quote
>
> that is, literally 'R', which means that if you were using it in Python
> code, you would have to write the column name as this?
>
> "'R'"
>

AIUI this is meant to be a string literal, which in SQL is surrounded
by single quotes. This also means that anyone who's using this script
needs to be comfortable with writing raw SQL; plus, there's no
protection against SQL injection, so anyone using the script has to
have full database permission. The best solution might well be to
change the protocol somewhat: instead of taking raw SQL on the command
line, take "column=value", parse that in Python, and provide the value
as a string (or maybe as "int if all digits else string").

ChrisA
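The protocol suggested above (take `column=value`, parse it in Python, pass the value as a parameter) could be sketched like this with the standard `sqlite3` module. The table `records`, its columns and the `ALLOWED_COLUMNS` whitelist are hypothetical. Column names cannot be bound as `?` parameters, so they are checked against the whitelist instead; the values are bound, which avoids both the quoting problem and SQL injection.

```python
import sqlite3

# Hypothetical schema: only these column names may appear in a condition.
ALLOWED_COLUMNS = {"column1", "column2"}


def parse_conditions(args):
    """Turn ['column1=2174', 'column2=R'] into a WHERE clause and parameters."""
    clauses, params = [], []
    for arg in args:
        column, sep, value = arg.partition("=")
        column = column.strip()
        if not sep or column not in ALLOWED_COLUMNS:
            raise ValueError("expected column=value, got %r" % arg)
        clauses.append("%s = ?" % column)
        value = value.strip()
        # "int if all digits else string"
        params.append(int(value) if value.isdigit() else value)
    return " AND ".join(clauses), params


def demo():
    """Run a parsed condition against a small in-memory example table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (column1 INTEGER, column2 TEXT)")
    conn.executemany("INSERT INTO records VALUES (?, ?)",
                     [(2174, "R"), (2174, "S"), (1000, "R")])
    where, params = parse_conditions(["column1=2174", "column2=R"])
    # Safe to concatenate: column names come only from ALLOWED_COLUMNS.
    rows = conn.execute("SELECT * FROM records WHERE " + where, params).fetchall()
    conn.close()
    return rows


print(demo())  # [(2174, 'R')]
```

Users would then type `python program.py column1=2174 column2=R` with no quotes on either platform, because the program, not the shell, decides that `R` is a string. This sketch does not handle `column1 = 2174` typed with spaces, which the shell splits into three arguments.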

MRAB

Dec 4, 2016, 6:48:53 PM

On 2016-12-04 22:52, Steve D'Aprano wrote:
> On Mon, 5 Dec 2016 07:26 am, DFS wrote:
>
>> $python program.py column1=2174 and column2='R'
>
> Here is a simple script demonstrating the issue:
>
>
> # --- program.py ---
> import sys
> print "argv:", sys.argv
> print ' '.join(sys.argv[1:])
>
>
>
> I haven't tested it on Windows, but on Linux it behaves as you describe (and
> as Linux users will expect):
>
> [steve@ando ~]$ python program.py column1=2174 and column2='R'
> argv: ['program.py', 'column1=2174', 'and', 'column2=R']
> column1=2174 and column2=R
>
[snip]

On Windows:

py.exe program.py column1=2174 and column2='R'

gives:

argv: ['program.py', 'column1=2174', 'and', "column2='R'"]
column1=2174 and column2='R'

and:

py.exe program.py column1=2174 and column2="R"

gives:

eryk sun

Dec 4, 2016, 7:10:03 PM

On Sun, Dec 4, 2016 at 10:19 PM, BartC <b...@freeuk.com> wrote:
>
> Command parameters /do/ behave differently between Windows and Linux, for
> example try writing *.* as that third parameter.
>
> In Windows, it will print *.*.

In Windows each program parses its own command line. Most C/C++
programs use the CRT's default [w]argv parsing. The CRT defaults to
disabling wildcard expansion. Link with [w]setargv.obj to enable this
feature:

https://msdn.microsoft.com/en-us/library/8bch7bkk

The CRT [w]argv parsing doesn't care about single quotes, but literal
double quotes need to be escaped.
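Python's `subprocess` module contains a helper, `subprocess.list2cmdline`, that converts an argument list into a Windows command line following the Microsoft C runtime rules described above; `Popen` uses it internally on Windows. It is not formally part of the documented public API, so treat this as an illustration rather than something to depend on. It runs on any platform, which makes it a convenient way to see which characters the CRT treats specially.

```python
from subprocess import list2cmdline

# Single quotes are ordinary characters to the CRT: no escaping is added.
print(list2cmdline(["program.py", "column1=2174", "and", "column2='R'"]))
# -> program.py column1=2174 and column2='R'

# Arguments containing spaces are wrapped in double quotes, and literal
# double quotes inside an argument are escaped with a backslash.
print(list2cmdline(["program.py", 'column2="R"', "column1=2174 and column2='R'"]))
# -> program.py column2=\"R\" "column1=2174 and column2='R'"
```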

DFS

Dec 5, 2016, 12:52:53 AM

On 12/04/2016 06:06 PM, Gregory Ewing wrote:
> Steve D'Aprano wrote:
>> On Mon, 5 Dec 2016 07:26 am, DFS wrote:
>>
>>> no such column: R
>>
>> doesn't this mean that your column is called:
>>
>> single quote R single quote
>
> I think he intends it to be an SQL string literal (which uses
> single quotes), but since the quotes disappeared, SQL is trying
> to interpret it as a column name.
>
>>> Edit: I got it to work this way:
>>> column2="'R'"
>>>
>>> but that's bogus, and I don't want users to have to do that.
>
> DFS: Is the argument always going to be a literal, or could
> the user sometimes want to pass a column name?


Not following you.

The list of args is turned into a WHERE clause.



> If it's always a literal, your program could add the quotes
> before passing it to SQL.
>
> If it's not always a literal, you'll just have to deal with
> the fact that the unix shell is a programming language that
> has its own interpretation of quotes. Either the user will
> have to escape the quotes, or you'll have to provide another
> way of distinguishing literals from column names.


BartC's suggestion of quoting the whole string works fine.

$python program.py "column1=2174 and column2='R'"

or I could do as you suggest and add them using code, if the app is
running on Linux.


Thanks




DFS

Dec 5, 2016, 1:20:31 AM

On 12/04/2016 05:52 PM, Steve D'Aprano wrote:
> On Mon, 5 Dec 2016 07:26 am, DFS wrote:
>
>> $python program.py column1=2174 and column2='R'
>
> Here is a simple script demonstrating the issue:
>
>
> # --- program.py ---
> import sys
> print "argv:", sys.argv
> print ' '.join(sys.argv[1:])
>
>
>
> I haven't tested it on Windows, but on Linux it behaves as you describe (and
> as Linux users will expect):
>
> [steve@ando ~]$ python program.py column1=2174 and column2='R'
> argv: ['program.py', 'column1=2174', 'and', 'column2=R']
> column1=2174 and column2=R


'print sys.argv' is a handy feature (python is full of 'em).

I'll definitely get some use out of that.



>> It drops the apostrophes, and the subsequent db call throws an error:
>> sqlite3.OperationalError: no such column: R
>
> I'm not sure how to interpret this error, so I'm guessing. Please correct me
> if I'm wrong, but doesn't this mean that your column is called:
>
> single quote R single quote
>
> that is, literally 'R', which means that if you were using it in Python
> code, you would have to write the column name as this?
>
> "'R'"
>
> If so, perhaps the best solution is to get rid of the quotes in the column
> names, so that all your users, Windows and Linux, just write this:
>
> program.py column1=2174 and column2=R
>
>
> (P.S. what happens if they write
>
> program.py column1 = 2174 and column2 = R
>
> instead?)


On Linux:
sqlite3.OperationalError: no such column: R


But if I dbl-quote the whole thing:

"column1 = 2174 and column2 = 'R'"

it works fine, with the spaces.




>> Edit: I got it to work this way:
>> column2="'R'"
>>
>> but that's bogus, and I don't want users to have to do that.
>
> (1) It's not bogus.


It's extremely bogus. It's discarding part of my input.

The fix isn't so bad, though.



> (2) Linux users will expect that you have to escape quotation marks if you
> want to pass them through the shell.
>
> (3) If my interpretation is correct, what *is* bogus is that your
> column names include quotes in them.


Of course not. I'm not sure I've ever seen quotes or apostrophes in a
column name.

And that's a pretty crazy interpretation that nothing in my post would
lead you to.


Steven D'Aprano

Dec 5, 2016, 3:33:21 AM

On Monday 05 December 2016 17:20, DFS wrote:

>>> Edit: I got it to work this way:
>>> column2="'R'"
>>>
>>> but that's bogus, and I don't want users to have to do that.
>>
>> (1) It's not bogus.
>
>
> It's extremely bogus. It's discarding part of my input.

When you type

name = 'Fred'

do you expect that the first character of the variable name is a quotation
mark? No? Then why is it okay for Python to discard the quotation mark but not
bash or some other shell?

Shells don't just repeat the characters you type, they interpret them.
Characters like $ & * ? [ ] { } and others have special meaning to the shell,
as do quotation marks.


[...]
>> (3) If my interpretation is correct, what *is* bogus is that your
>> column names include quotes in them.
>
>
> Of course not. I'm not sure I've ever seen quotes or apostrophes in a
> column name.

*shrug* It would be a pretty crazy thing to do, and I'm very glad to hear that
my wild guess as to what was going on was wrong.

> And that's a pretty crazy interpretation that nothing in my post would
> lead you to.

That's the curse of knowledge speaking: you're familiar enough with SQL that
you can no longer remember what it was like to be unfamiliar with it.

I'm not familiar with SQL syntax, and the fact that SQL was complaining when
the quotation marks were missing from the column name suggested to me that
the quotes were part of the name. The analogy that came to my mind was the
similar-looking error in the shell:


steve@runes:~$ ls "'foo'"
'foo'
steve@runes:~$ ls 'foo'
ls: cannot access foo: No such file or directory

The quotation marks are part of the filename, and so they need to be protected
from the shell or else you get an error quite similar to the one you got:

no such column: R


but there is (or so it appeared) a column 'R' (hence it looks like the quotes
are part of the column name).

Anyway, I'm glad you have a solution that satisfies you.



--
Steven
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." - Jon Ronson

BartC

Dec 5, 2016, 6:42:22 AM

On 04/12/2016 23:25, Steve D'Aprano wrote:
> On Mon, 5 Dec 2016 09:19 am, BartC wrote:
>
>> Command parameters /do/ behave differently between Windows and Linux,
>> for example try writing *.* as that third parameter.
>>
>> In Windows, it will print *.*.
>>
>> In Linux, if you have 273 files in the current directory, it will print
>> the name of the first, and there will be /272 further command
>> parameters/, each the name of a file. (I couldn't believe this when I
>> found out; one of my directories recently had 3.4 million files in it, I
>> don't really want *.* expanded to 3.4m arguments. Here, the fix is again
>> to use double quotes: "*.*". But what if the user doesn't do that?)
>
> If the user doesn't escape the wildcard, then the shell will expand it,
> exactly as the user would expect.
>
> I'm not sure why you were surprised by that. * is a shell wildcard. By using
> a * you are explicitly telling the shell to expand it to any file that
> matches.

I don't know what a shell is. To me, it's some sort of user interface to
the OS. So if someone types:

> X A B C

You would expect X to be launched, and be given arguments A, B and C.
You wouldn't expect any of them to be expanded to some unknown number of
arguments.

In the same way that in code, you don't expect X(A,B,C) to be expanded
to X(A,B0,B1,B2,B3,B4,B5,....., C) if B happens to specify a slice.


> Did you think it was a regular character like 'a' and 'z'?

If one of the parameters was a regular expression, would you expect it
to be expanded to the entire possible set of inputs that match the
expression?

> I think it boils down to what the user expects. Linux and Unix users tend to
> be technically-minded folks who use the command line a lot and demand
> powerful tools, and they expect that wildcards like * should be expanded.

Which is dumb. How does the shell know exactly what I want to do with
*.* or f*.c or whatever? Perhaps I don't want the expanded list right
now (all the 3.4 million elements); perhaps I want to apply the same
pattern to several directories; perhaps I'm passing it on to another
program; perhaps I'm going to be writing it as another script; perhaps I
just want to print out the parameter list; perhaps I want to transform
or validate the pattern first; maybe I need to verify an earlier
parameter before applying the pattern or the way it's applied depends on
earlier arguments...

The input is a PATTERN; I want to process it, and apply it, myself.

And it doesn't work anyway; suppose I write:

>X A *.* C D

How does the program know when the expanded filenames of *.* end, and
the two extra parameters start? Remember it doesn't know there were four
parameters, all it sees is one linear stream of arguments. Any
structure the input may have had is lost.

What happens here:

>X *.a *.b *.c

It'll just get one big long list of files, not three separate sets.

As I said, it's dumb. And expecting users to stick quotes everywhere is
not a solution, because that's not going to happen.
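As BartC says, with purely positional arguments the program cannot tell where one expanded pattern ends and the next begins. One common convention that keeps the groups apart is to introduce each group with an option, so the boundaries survive however many names the shell produces. A sketch with the standard `argparse` module (written for Python 3); the options `-a`, `-b` and `-c` are hypothetical, and file names beginning with `-` would still be ambiguous.

```python
import argparse

parser = argparse.ArgumentParser(description="Process three groups of files.")
# Each option collects the following arguments up to the next option,
# so the groups stay separate even after the shell expands *.a, *.b, *.c.
parser.add_argument("-a", nargs="*", default=[], metavar="FILE")
parser.add_argument("-b", nargs="*", default=[], metavar="FILE")
parser.add_argument("-c", nargs="*", default=[], metavar="FILE")

# Simulates what the program receives for:  X -a *.a -b *.b -c *.c
args = parser.parse_args(["-a", "x.a", "y.a", "-b", "z.b", "-c", "w.c"])
print(args.a, args.b, args.c)  # ['x.a', 'y.a'] ['z.b'] ['w.c']
```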

> Windows treats the command line as an afterthought, and until a few years
> ago you were limited to a DOS shell. Today, your options are not as
> limited: there's Powershell, and bash for Windows.

At least Windows does it properly. It doesn't even chop the command line
into different parameters, making it considerably more flexible. (Unless
you have a program based on a C-style main(nargs,args) entry point where
the C runtime will do this for you.)

--
Bartc

Chris Angelico

Dec 5, 2016, 7:23:42 AM

On Mon, Dec 5, 2016 at 10:42 PM, BartC <b...@freeuk.com> wrote:
> At least Windows does it properly. It doesn't even chop the command line
> into different parameters, making it considerably more flexible. (Unless you
> have a program based on a C-style main(nargs,args) entry point where the C
> runtime will do this for you.)

Yes, because there's no way that you can ever get security problems
from improperly parsing command-line arguments. That's why the
recommended way to create a subprocess is os.system(), not the Popen
calls that take a list of already-separated parameters. Right?

ChrisA

BartC

Dec 5, 2016, 9:11:34 AM

On 05/12/2016 12:23, Chris Angelico wrote:
> On Mon, Dec 5, 2016 at 10:42 PM, BartC <b...@freeuk.com> wrote:
>> At least Windows does it properly. It doesn't even chop the command line
>> into different parameters, making it considerably more flexible. (Unless you
>> have a program based on a C-style main(nargs,args) entry point where the C
>> runtime will do this for you.)
>
> Yes, because there's no way that you can ever get security problems
> from improperly parsing command-line arguments.

And you will never get any problems if a program expects 3 parameters
but instead gets some arbitrary number of arguments, perhaps thousands,
if one happens to be *, including some that could coincide with some
actual meaningful input that the program recognises.

> That's why the
> recommended way to create a subprocess is os.system(), not the Popen
> calls that take a list of already-separated parameters. Right?

And nothing will ever go wrong with incorrectly calling Popen that
takes, if I counted them correctly, up to 14 different parameters?

BTW what does Popen() do when one argument is '*.*'? Will that get
expanded to multiple extra arguments, and at what point will it be
expanded?

(I tried to test it, but:

import subprocess
subprocess.Popen("python")

didn't work under Linux: 'No such file or directory'. It works under
Windows but I wanted to see what it did with a parameter *.

Another difference.)

--
Bartc

Chris Angelico

Dec 5, 2016, 10:06:05 AM

On Tue, Dec 6, 2016 at 1:11 AM, BartC <b...@freeuk.com> wrote:
>
> BTW what does Popen() do when one argument is '*.*'? Will that get expanded
> to multiple extra arguments, and at what point will it be expanded?

Nope. Popen is not a shell.

It sounds as if you want a nerfed shell. Go ahead! I'm sure one
exists. It'll frustrate you no end once you get used to a better
shell, though - always does when I find myself on Windows...

ChrisA

Paul Moore

Dec 5, 2016, 10:11:51 AM

On Monday, 5 December 2016 14:11:34 UTC, BartC wrote:
> On 05/12/2016 12:23, Chris Angelico wrote:
> > On Mon, Dec 5, 2016 at 10:42 PM, BartC <b...@freeuk.com> wrote:
> >> At least Windows does it properly. It doesn't even chop the command line
> >> into different parameters, making it considerably more flexible. (Unless you
> >> have a program based on a C-style main(nargs,args) entry point where the C
> >> runtime will do this for you.)
> >
> > Yes, because there's no way that you can ever get security problems
> > from improperly parsing command-line arguments.
>
> And you will never get any problems if a program expects 3 parameters
> but instead gets some arbitrary number of arguments, perhaps thousands,
> if one happens to be *, including some that could coincide with some
> actual meaningful input that the program recognises.

Windows and Linux are different. Neither is unambiguously "right" nor is either unambiguously "wrong". In both cases you need to understand what happens when you type a command, or you *will* get caught out by corner cases.

Calling either approach "dumb" is neither helpful nor productive.

This specific example, a program that takes a fragment of SQL as its command line, is very hard to handle cleanly in a cross-platform way, because you actually don't want the shell, or the application, to interpret your arguments for you. The "best" approach is often to accept the SQL command as a single argument (argv[1]) and rely on your users quoting the argument appropriately. Admittedly, that simply pushes the problem onto your users, who may well also be uncomfortable with the subtleties of command line parsing, but at least they are using their choice of shell, so they have a chance of knowing.

The alternative, if you *really* don't want to force your users to understand shell parsing, is to prompt the user for the SQL - either as a simple console input, or (for users who are really uncomfortable with the command line) via a GUI program and a dialog box.

But criticising the (entirely valid, simply different) choices of another OS vendor as "dumb" isn't acceptable, nor is it a way to get to a solution to your issue.

Paul
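The two suggestions above, taking the whole SQL fragment as a single `argv[1]` and otherwise prompting for it, can be combined in a few lines. The function name is hypothetical, and `input` is the Python 3 spelling (`raw_input` in Python 2.7).

```python
import sys


def get_where_clause(argv, prompt=input):
    """Return the WHERE-clause text from argv, or ask the user for it.

    Hypothetical helper. Expects the whole clause as one argument, e.g.
        python program.py "column1=2174 and column2='R'"
    With no argument it prompts instead, so no shell quoting is involved.
    """
    if len(argv) == 2:
        return argv[1]
    if len(argv) > 2:
        # On Unix the shell has already removed any unprotected quotes,
        # so joining these pieces could silently reproduce the original bug.
        sys.exit("Put the whole condition in double quotes, e.g. "
                 "\"column1=2174 and column2='R'\"")
    return prompt("WHERE clause: ")
```

Rejecting several unquoted arguments, rather than joining them, is the design choice here: it turns a confusing `no such column: R` into an explicit instruction.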

Paul Moore

Dec 5, 2016, 10:17:59 AM

For a non-nerfed (but *radically* different to bash) Windows shell, try Powershell. You'll probably hate it, but not because it's limited in capabilities :-)

Paul

PS Apparently, powershell is now open source and available for Linux and OSX. See https://github.com/PowerShell/PowerShell - although I don't know if all the features available on Windows exist on other platforms (yet).

Chris Angelico

Dec 5, 2016, 10:31:05 AM

On Tue, Dec 6, 2016 at 2:17 AM, Paul Moore <p.f....@gmail.com> wrote:
> For a non-nerfed (but *radically* different to bash) Windows shell, try Powershell. You'll probably hate it, but not because it's limited in capabilities :-)
>

Radically different from every shell I've ever called a shell. It
looks and feels more like a scripting language than a shell. Here's a
Windows batch script I put together that uses PS to do some of its
work:

https://github.com/Rosuav/Gypsum/blob/master/get_gypsum.bat

Granted, that was put together simply by searching Stack Overflow for
"how do I do *this* in PowerShell", so there might be other ways to do
it; but it feels to me like a VBScript interpreter, not a shell.

But hey. It exists by default on recent-enough Windowses, has the
features I want, and can be invoked by someone double-clicking on a
batch file. If you call it "flurgle", it won't make any difference to
me or to my work.

ChrisA

BartC

Dec 5, 2016, 10:41:59 AM

That's not the point I was making.

Say you have this program a.py:

import sys
print (sys.argv)

And let's say there are just 3 files in the current directory: a.py,
b.py and c.py.

If run from a Linux shell:

python a.py *

The output is: ['a.py', 'b.py', 'c.py'] or something along those lines
(there might be two copies of a.py).

Are you saying that if someone executes:

subprocess.Popen(["python","a.py", "*"])

the output will be: ['a.py','*']?

In that case forget Windows vs. Linux, you now have a program that will
get command parameters processed differently depending on whether it was
invoked from a shell or not.

Or a program that sometimes will see "*" as an argument, and sometimes a
big list of files that merges into all the other arguments.

--
bartc

Chris Angelico

Dec 5, 2016, 10:54:15 AM

On Tue, Dec 6, 2016 at 2:41 AM, BartC <b...@freeuk.com> wrote:
>
> Are you saying that if someone executes:
>
> subprocess.Popen(["python","a.py", "*"])
>
> the output will be: ['a.py','*']?
>
> In that case forget Windows vs. Linux, you now have a program that will get
> command parameters processed differently depending on whether it was invoked
> from a shell or not.

Yes, that is correct. *SHELLS DO STUFF*. If you can't comprehend this,
you should get to know your shell. Try this:

python a.py %PATH%

subprocess.Popen(["python", "a.py", "%PATH%"])

Would you expect those to do the same? If you do, prepare for Windows
to surprise you. If you don't, why do you keep expecting other shells
to do nothing?

ChrisA

Paul Moore

Dec 5, 2016, 11:04:10 AM

Python programs, when started, get a list of arguments via sys.argv.

1. On Linux, the OS primitive for executing a program takes a list of arguments and passes them directly. The user's shell is responsible for splitting a command line into "arguments".
2. On Windows, the OS primitive takes a command line. The application is responsible for splitting it into arguments, if it wants to. Most do, for compatibility with the normal argv convention inherited via C from Unix. Many programs let the C runtime do that splitting for them - I don't recall if Python does, or if it implements its own splitting these days.
3. The Popen class directly passes the supplied argv to the called program. (Technically, it has to do some nasty internal juggling to preserve the argv, but you don't need to care about that).

The program always gets a list of arguments. What *provides* that list to it (the Unix shell, the C/Python runtime, or the caller of Popen) varies. And you need (in more complex cases) to understand how the calling environment constructs that list of arguments if you want to reason about behaviour.

Paul
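The point that the program always receives a list, and only the provider of that list varies, can be checked directly on Linux or another POSIX system. The sketch below (Python 3, for `shlex.quote` and `tempfile.TemporaryDirectory`) runs a tiny child Python that prints its `sys.argv[1:]`: once with an argument list, where no shell is involved and `*` arrives unchanged, and once through `/bin/sh` with `shell=True`, which globs `*` before the child starts. The temporary directory and file names exist only for the demonstration.

```python
import os
import shlex
import subprocess
import sys
import tempfile

CHILD = "import sys; print(sys.argv[1:])"


def child_argv(shell, cwd):
    """Run the child in cwd and return what it printed for sys.argv[1:]."""
    if shell:
        # One command string, parsed (and globbed) by /bin/sh.
        cmd = " ".join([shlex.quote(sys.executable), "-c", shlex.quote(CHILD), "*"])
    else:
        # An argument list handed straight to the new process.
        cmd = [sys.executable, "-c", CHILD, "*"]
    return subprocess.check_output(cmd, shell=shell, cwd=cwd,
                                   universal_newlines=True).strip()


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.py", "b.py", "c.py"):
        open(os.path.join(tmp, name), "w").close()
    print(child_argv(shell=False, cwd=tmp))  # ['*']
    print(child_argv(shell=True, cwd=tmp))   # ['a.py', 'b.py', 'c.py']
```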

DFS

Dec 5, 2016, 11:14:27 AM

conemu

BartC

Dec 5, 2016, 11:23:52 AM

You still don't get the point. I write a program P, a native executable. It
takes command line parameters but exactly what it gets depends on
whether it's started from a 'shell' or from inside another program? I
don't want to worry about that stuff or exactly how it is invoked!

> subprocess.Popen(["python", "a.py", "%PATH%"])

Yes, %...% is an escape sequence. Those % signs are supposed to stand
out and have been chosen not to clash with typical input.

And the end result of the transformation is, here, also a SINGLE thing;
any parameter following will still be the second parameter, not the 14771st!

Are you saying that the * in ABC*EF also makes the whole thing some
escape pattern? And one that could massively expand the number of
parameters, pushing all the following ones out of the way, and making it
impossible to discover where these expanded parameters end and the
normal ones recommence.

If someone had thought this up now, it would rapidly be dismissed as
being unworkable. But because it's been in Unix/Linux/whatever so long,
no one can see anything wrong with it!

--
Bartc

DFS

Dec 5, 2016, 11:38:26 AM

It has aliases you can use to make Linux "people" feel right at home.


https://blogs.technet.microsoft.com/heyscriptingguy/2015/06/11/table-of-basic-powershell-commands/

Chris Angelico

Dec 5, 2016, 11:39:58 AM

On Tue, Dec 6, 2016 at 3:23 AM, BartC <b...@freeuk.com> wrote:
> You still don't get the point. I write a program P, a native executable. It
> takes command line parameters but exactly what it gets depends on whether
> it's started from a 'shell' or from inside another program? I don't want to
> worry about that stuff or exactly how it is invoked!
>
>> subprocess.Popen(["python", "a.py", "%PATH%"])
>
> Yes, %...% is an escape sequence. Those % signs are supposed to stand out
> and have been chosen not to clash with typical input.
>
> And the end result of the transformation is, here, also a SINGLE thing; any
> parameter following will still be the second parameter, not the 14771st!
>
> Are you saying that the * in ABC*EF also makes the whole thing some escape
> pattern? And one that could massively expand the number of parameters,
> pushing all the following ones out of the way, and making it impossible to
> discover where these expanded parameters end and the normal ones recommence.

Yes. That is exactly what I am saying. Also, try this:

set ENV_VAR=test1 test2
python a.py arg1 %ENV_VAR% arg2

How many args do you get? Is the end result really a single thing? Is
that single-thing-ness consistent across applications?

Windows's cmd.exe is not as simple as you think it is. Linux's bash is
not as insane as you think it is. Both of them take user input,
*MANIPULATE IT*, and pass it along. Keep on making these assumptions
and we will keep on proving them false.

ChrisA

Lew Pitcher

Dec 5, 2016, 11:41:39 AM
On Monday December 5 2016 10:41, in comp.lang.python, "BartC" <b...@freeuk.com>
wrote:

> On 05/12/2016 15:05, Chris Angelico wrote:
>> On Tue, Dec 6, 2016 at 1:11 AM, BartC <b...@freeuk.com> wrote:
>>>
>>> BTW what does Popen() do when one argument is '*.*'? Will that get
>>> expanded to multiple extra arguments, and at what point will it be
>>> expanded?
>>
>> Nope. Popen is not a shell.
>>
>> It sounds as if you want a nerfed shell. Go ahead! I'm sure one
>> exists. It'll frustrate you no end once you get used to a better
>> shell, though - always does when I find myself on Windows...
>
> That's not the point I was making.
>
> Say you have this program a.py:
>
> import sys
> print (sys.argv)
>
> And let's say there are just 3 files in the current directory: a.py,
> b.py and c.py.
>
> If run from a Linux shell:
>
> python a.py *
>
> The output is: ['a.py', 'b.py', 'c.py'] or something along those lines
> (there might be two copies of a.py).

And that's because, before the program is even started, the SHELL has globbed
that '*' argument into the three filenames and substituted those names where
the '*' was.

If you don't use a shell, then (on Unix), you have to perform the globbing
yourself before invoking Popen.
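A minimal sketch of doing the globbing yourself, in Python 3.7+ (for capture_output and text), assuming a throwaway directory holding a.py, b.py and c.py. A child interpreter stands in for a.py and simply prints the arguments it receives:

```python
import glob
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    for name in ("a.py", "b.py", "c.py"):
        open(os.path.join(workdir, name), "w").close()

    # Do the shell's job: expand the pattern ourselves (sorted, as bash sorts matches).
    pattern = os.path.join(glob.escape(workdir), "*")
    names = sorted(os.path.basename(p) for p in glob.glob(pattern))

    # A child that prints the arguments it received, like a.py in the thread.
    child = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]
    literal_out = subprocess.run(child + ["*"], cwd=workdir,
                                 capture_output=True, text=True).stdout.strip()
    expanded_out = subprocess.run(child + names, cwd=workdir,
                                  capture_output=True, text=True).stdout.strip()

print(literal_out)   # ['*']  (no shell involved, so nothing expanded it)
print(expanded_out)  # ['a.py', 'b.py', 'c.py']
```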

>
> Are you saying that if someone executes:
>
> subprocess.Popen(["python","a.py", "*"])
>
> the output will be: ['a.py','*']?
>
> In that case forget Windows vs. Linux, you now have a program that will
> get command parameters processed differently depending on whether it was
> invoked from a shell or not.

Yes.

> Or a program that sometimes will see "*" as an argument, and sometimes a
> big list of files that merges into all the other arguments.
>

Yes.

--
Lew Pitcher
"In Skills, We Trust"
PGP public key available upon request

Steve D'Aprano

Dec 5, 2016, 11:49:33 AM
On Mon, 5 Dec 2016 10:42 pm, BartC wrote:

> I don't know what a shell is. To me, it's some sort of user interface to
> the OS.

https://en.wikipedia.org/wiki/Unix_shell

You've never used command.com or cmd.exe? "The DOS prompt"? That's
(effectively) a shell.

Pedants may wish to explain exactly why the DOS prompt isn't a shell, but to
a first approximation I think it's close enough.

And yes, that's exactly what it is: it's a text-based user interface to the
OS. And as with any user interface, designers can choose different
designs, which will vary in power and convenience. And in the Unix/Linux
world, the command shell is not just a text interface, it's a powerful
command interpreter and programming language.



> So if someone types:
>
> > X A B C
>
> You would expect X to be launched, and be given arguments A, B and C.

Would I? I don't think so.

Even the DOS prompt supports some level of globbing. It's been a while since
I've used the DOS prompt in anger, but I seem to recall being able to do
things like:

dir a*

to get a listing of all the files starting with "a". So *something* is
treating the * as a special character. In Linux and Unix, that's invariably
the shell, before the dir command even sees what you typed.

In DOS, it might be the dir command itself. The disadvantage of the DOS way
of doing this is that *every single command and application* has to
re-implement its own globbing, very possibly inconsistently. That's a lot
of duplicated work and re-inventing the wheel, and the user will never know
what

some_program a*

will do. Will it operate on all the files in the current directory that
start with "a"? Or will it try to operate on a single file called
literally "a*"? Which of course won't exist because * is a reserved
character on DOS/Windows file systems. You can't know ahead of time unless
you study the manual to see what metacharacters this specific command
supports.

The Unix way is far more consistent: applications typically don't have to
care about globbing, because the shell handles glob expansion, environment
variables, etc.

[Aside: if you take the big picture, the Unix/Linux way is probably LESS
consistent, because you can have any number of shells (sh, ash, bash, csh,
tcsh, dash, hush, zsh, and I expect many more). But in practice, there's
one lowest-common-denominator standard (sh) and one major de facto standard
(bash), and most of the shells are supersets of the original sh, so simple
things like wildcards behave in pretty similar ways.]

The downside of this is that if you don't want metacharacters expanded, you
have to tell the shell to ignore them. The easiest way is to escape them with
a backslash, or quote the string. But of course, this being Unix, it's
completely configurable (using an obscure syntax that's different for every
shell):

http://stackoverflow.com/questions/11456403/stop-shell-wildcard-character-expansion
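From Python, the standard library's shlex.quote does this escaping for POSIX sh-compatible shells (it is not meant for cmd.exe). A short sketch using the argument from the original question:

```python
import shlex

arg = "column2='R'"

# Quote the argument so a POSIX shell passes it through untouched.
quoted = shlex.quote(arg)
print(quoted)  # 'column2='"'"'R'"'"''

# shlex.split applies POSIX-shell-like rules, so it shows what the program would get.
words = shlex.split("python program.py " + quoted)
print(words)  # ['python', 'program.py', "column2='R'"]

print(shlex.quote("*.*"))  # '*.*'  (quoted, so the shell will not glob it)
```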



> You wouldn't expect any of them to be expanded to some unknown number of
> arguments.

Actually yes I would. If they could be interpreted as file names with
globbing or environment variables, that's exactly what I would expect.

Even at the DOS prompt.

And I'd be completely gobsmacked if (say) the dir command understood the ?
metacharacter but the cp command didn't.


> In the same way that in code, you don't expect X(A,B,C) to be expanded
> to X(A,B0,B1,B2,B3,B4,B5,....., C) if B happens to specify a slice.

In Python? No, I wouldn't expect that. Python's not a shell, and the design
is different. In Python, you have to use the metacharacter * to expand a
single argument into multiple arguments.
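A short illustration of that difference in Python 3.5+ (the show function exists only for this demonstration):

```python
def show(*args):
    """Return the positional arguments exactly as received."""
    return args

b = ["B0", "B1", "B2"]        # stands in for a slice or a list of matches
as_one = show("A", b, "C")    # still three arguments
as_many = show("A", *b, "C")  # expanded only because the caller wrote *

print(as_one)   # ('A', ['B0', 'B1', 'B2'], 'C')
print(as_many)  # ('A', 'B0', 'B1', 'B2', 'C')
```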



>> Did you think it was a regular character like 'a' and 'z'?
>
> If one of the parameters was a regular expression, would you expect it
> to be expanded to the entire possible set of inputs that match the
> expression?

No, because Unix shells use globs, not regexes. Just like the DOS prompt.
Globs are simpler and require less typing, something system administrators
appreciate because (unlike program code) interactive commands are written
far more often than they are read, so brevity matters.
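A small comparison in Python, using fnmatch (the module glob is built on) against an equivalent hand-written regular expression; the file names are made up:

```python
import fnmatch
import re

names = ["apple.txt", "a*", "banana.txt", "avocado.py"]

# Glob syntax: "*" means "any run of characters", so "a*" means "starts with a".
glob_hits = fnmatch.filter(names, "a*")

# The same test as a regular expression: more to type for the same result.
regex = re.compile(r"a.*\Z", re.DOTALL)
regex_hits = [n for n in names if regex.match(n)]

print(glob_hits)   # ['apple.txt', 'a*', 'avocado.py']
print(regex_hits)  # ['apple.txt', 'a*', 'avocado.py']
```

Note that a file literally named "a*" (legal on Unix) is just one more match for the pattern.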

(That doesn't excuse the umount command though. Really, you couldn't include
the "n"?)

So yes, I would expect that if I said

dir a*

I would get a listing of all the files starting with "a", not just the
single file called literally "a*".



>> I think it boils down to what the user expects. Linux and Unix users tend
>> to be technically-minded folks who use the command line a lot and demand
>> powerful tools, and they expect that wildcards like * should be expanded.
>
> Which is dumb. How does the shell know exactly what I want to do with
> *.* or f*.c or whatever? Perhaps I don't want the expanded list right
> now (all the 3.4 million elements);

Sure, no problem. Just escape the metacharacters so they aren't expanded,
just as in Python if you want the literal string backslash n you can
write "\\n" to escape the backslash metacharacter.


> perhaps I want to apply the same pattern to several directories;

ls {foo,bar,baz/foobar}/*.*


is equivalent to:

ls foo/*.* bar/*.* baz/foobar/*.*


And if I want a file that is *literally* called star dot star from any of
those directories:

ls {foo,bar,baz/foobar}/"*.*"



> perhaps I'm passing it on to another
> program; perhaps I'm going to be writing it as another script; perhaps I
> just want to print out the parameter list; perhaps I want to transform
> or validate the pattern first; maybe I need to verify an earlier
> parameter before applying the pattern or the way it's applied depends on
> earlier arguments...

Fine. Just escape the damn thing and do whatever you like to it.



> The input is a PATTERN; I want to process it, and apply it, myself.

When you double-click on a .doc file, and Windows launches Word and opens
the file for editing, do you rant and yell that you didn't want Word to
open the file, you wanted to process the file name yourself?

Probably not. You probably consider it perfectly normal and unobjectionable
that when you double click a file, Windows:

- locates the file association for that file;
- launches that application (if it's not already running);
- instructs the application to open the file;

(and whatever else needs to be done). That's what double clicking is *for*,
and if you don't like it, you shouldn't be using Windows.

Well, in the Unix world, the shells are designed for the benefit of system
administrators who are *mostly* dealing with file names. Regardless of
what you think of it, they *want* this behaviour. For forty plus years,
Unix system administrators have been extremely productive using this model,
and Microsoft has even brought the Linux bash shell to Windows.

The bottom line is this:

In your application, if you receive an argument *.*, what do you do with it?
You probably expand it yourself. How does the application know when not to
expand the wild cards? You need to support some sort of command-line switch
to turn it off, but that will affect the entire command line. So you need
some sort of escaping mechanism so that you can pass

myprogram *.* *.*

and have the first *.* expanded but not the second. (For example.)

Congratulations, you've just re-invented your own mini-shell.
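A sketch of that mini-shell, assuming a made-up convention in which a leading backslash marks an argument as literal. expand_args is hypothetical; unmatched patterns are kept unchanged, which mirrors bash's default behaviour:

```python
import glob
import os
import tempfile

def expand_args(args):
    """Hypothetical app-side expansion: glob each argument unless it starts
    with a backslash, which marks it as literal (the backslash is dropped)."""
    result = []
    for arg in args:
        if arg.startswith("\\"):
            result.append(arg[1:])  # escaped: pass through literally
            continue
        matches = sorted(glob.glob(arg))
        result.extend(matches if matches else [arg])  # no match: keep as-is
    return result

with tempfile.TemporaryDirectory() as workdir:
    for name in ("x.txt", "y.txt"):
        open(os.path.join(workdir, name), "w").close()
    old_cwd = os.getcwd()
    os.chdir(workdir)
    try:
        # "myprogram *.* \*.*": first pattern expanded, second passed literally.
        result = expand_args(["*.*", "\\*.*"])
    finally:
        os.chdir(old_cwd)

print(result)  # ['x.txt', 'y.txt', '*.*']
```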

Marko Rauhamaa

Dec 5, 2016, 11:54:04 AM
Chris Angelico <ros...@gmail.com>:

> On Tue, Dec 6, 2016 at 2:17 AM, Paul Moore <p.f....@gmail.com> wrote:
>> For a non-nerfed (but *radically* different to bash) Windows shell,
>> try Powershell. You'll probably hate it, but not because it's limited
>> in capabilities :-)
>
> Radically different from every shell I've ever called a shell. It
> looks and feels more like a scripting language than a shell.

In recent years, I've been disillusioned with bash and started using
Python more and more where I would previously have used bash. Python's
explicit syntax does automatically give you a level of security, but I
must say the subprocess.Popen.communicate syntax is painful as hell.
Your traditional one-liners turn into five-liners, and a casual observer
will have a slightly hard time understanding what's going on.
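For comparison, a sketch of both spellings for capturing a child's output. subprocess.run with capture_output and text needs Python 3.7+, so it postdates the 2.7 used in this thread; the child here is just another Python interpreter:

```python
import subprocess
import sys

cmd = [sys.executable, "-c", "print('hello')"]

# The Popen/communicate spelling, available in both Python 2 and 3.
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()
if proc.returncode != 0:
    raise subprocess.CalledProcessError(proc.returncode, cmd)
out_text = out.decode().strip()

# Python 3.7+: subprocess.run covers the same ground in one call.
result = subprocess.run(cmd, capture_output=True, text=True, check=True)

print(out_text)               # hello
print(result.stdout.strip())  # hello
```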


Marko

Chris Angelico

Dec 5, 2016, 12:06:34 PM
On Tue, Dec 6, 2016 at 3:53 AM, Marko Rauhamaa <ma...@pacujo.net> wrote:
> In recent years, I've been disillusioned with bash and started using
> Python more and more where I would previously have used bash. Python's
> explicit syntax does automatically give you a level of security, but I
> must say the subprocess.Popen.communicate syntax is painful as hell.
> Your traditional one-liners turn into five-liners, and a casual observer
> will have a slightly hard time understanding what's going on.

Congratulations. You've just discovered why bash is useful.

ChrisA

Steve D'Aprano

Dec 5, 2016, 12:12:19 PM
On Tue, 6 Dec 2016 02:41 am, BartC wrote:

> In that case forget Windows vs. Linux, you now have a program that will
> get command parameters processed differently depending on whether it was
> invoked from a shell or not.

Er, yeah? You say that as if it were a bad thing.

Look at it this way. Suppose you were a really cunning system administrator
who spent 8 or 10 hours a day typing in commands and arguments that were
frequently file names. So you built yourself a macro system where you type
a command:

foo bar baz

and the macro would walk through the command line, automatically expanding
environment variables and file name globs. And because you used this 95% of
the time, and it was intended as a convenience for interactive use, you
linked the macro to the Enter key, so that only a single key press was
needed to do this processing.

But of course you're not an idiot, you know full well that there are
occasions where you want to avoid the expansion. So you include commands to
disable and enable that macro, and to give you even more fine-grained
control of what is expanded where, you build in escaping mechanisms so that
you can expand part of the command line and not other parts.

And then I come along, full of righteous indignation, and start yelling

"Wait a minute, now your application gets its command parameters processed
differently depending on whether you called it from C or used your macro!"

And you would answer:

"Of course it does, that's the whole bloody point of the macro!"

(By the way, I think that you will find that when you call Popen, if you set
shell=True it will be invoked in a subshell, which means you'll get the
full shell experience including command expansion. For good and evil.
That's a security risk, if you're getting the arguments from an untrusted
source, so don't pass shell=True unless you know what you're doing.)
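A sketch of the difference, assuming a POSIX system where shell=True runs /bin/sh (on Windows it runs cmd.exe and shlex.quote does not apply). The "untrusted" string is invented for the demonstration:

```python
import shlex
import subprocess
import sys

untrusted = "*; echo INJECTED"  # pretend this came from a user

child = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]

# List form, no shell: the child receives the text verbatim as one argument.
safe = subprocess.run(child + [untrusted], capture_output=True, text=True).stdout.strip()

# If a shell really is needed, quote every piece; the ";" and "*" then stay inert.
line = " ".join(shlex.quote(a) for a in child + [untrusted])
quoted = subprocess.run(line, shell=True, capture_output=True, text=True).stdout.strip()

print(safe)    # ['*; echo INJECTED']
print(quoted)  # ['*; echo INJECTED']
```

Left unquoted under shell=True, the same string would be globbed and would run a second command.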

Skip Montanaro

Dec 5, 2016, 12:14:27 PM
On Mon, Dec 5, 2016 at 10:49 AM, Steve D'Aprano
<steve+...@pearwood.info> wrote:
>
> In DOS, it might be the dir command itself. The disadvantage of the DOS way
> of doing this is that *every single command and application* has to
> re-implement its own globbing, very possibly inconsistently. That's a lot
> of duplicated work and re-inventing the wheel, and the user will never know
> what
>
> some_program a*
>
> will do.

ISTR that the way DOS/Windows operate at the text prompt level was
modeled on VMS. As you indicated, each command was responsible for its
own "globbing". I've never programmed in DOS or Windows, and its been
decades since I programmed in VMS, but I imagine that both
environments probably provide some standard sort of globbing library.

On an only peripherally related note, I was reminded this morning of
how some/many GUI environments try to protect people from themselves.
I am just now coming up to speed in a new job which provides me with a
Windows window onto an otherwise Linux development environment. I
tried starting the X server this morning (something called Xming), and
it complained about not being able to write its log file (I suspect
Xming was alread. I tried to navigate to that location through the
Computer doohickey (Explorer?) but couldn't get there. The program (or
more likely the program's programmers) had decided that I had no
business "exploring" into my AppData folder. To get there, I had to
drop into a command prompt.

So, another vote for a text/shell interface. It gives you enough rope
to hang yourself, but assumes you won't, because, "we're all adults
here." I do understand why Windows hides stuff from users in the GUI
though. As a webm...@python.org monitor, I can attest to the
relatively large number of questions received there asking about
removing Python "because I don't use it for anything." :-) This
started happening about the time the long defunct Compaq started to
write admin tools for Windows in Python.

Skip

Marko Rauhamaa

Dec 5, 2016, 12:23:45 PM
Chris Angelico <ros...@gmail.com>:
Bash is nice, too nice. It makes it easy to write code that's riddled
with security holes. The glorious Unix tradition is to ignore the
pitfalls and forge ahead come what may.


Marko

Chris Angelico

Dec 5, 2016, 12:35:29 PM
Bash assumes that the person typing commands has the full power to
execute commands. I'm not sure what you mean by "security holes",
unless it's passing text through bash that came from people who aren't
allowed to type commands.

ChrisA

DFS

Dec 5, 2016, 12:56:20 PM
On 12/05/2016 03:33 AM, Steven D'Aprano wrote:
> On Monday 05 December 2016 17:20, DFS wrote:
>
>>>> Edit: I got it to work this way:
>>>> column2="'R'"
>>>>
>>>> but that's bogus, and I don't want users to have to do that.
>>>
>>> (1) It's not bogus.
>>
>>
>> It's extremely bogus. It's discarding part of my input.
>
> When you type
>
> name = 'Fred'
>
> do you expect that the first character of the variable name is a quotation
> mark? No?

eh? In that example the first character of the var name is n.


> Then why is it okay for Python to discard the quotation mark but not
> bash or some other shell?


$python
>>> name = 'Fred'
>>> print name
Fred
>>> name = "Fred"
>>> print name
Fred
>>> name = """Fred"""
>>> print name
Fred


Point taken. In that situation, python discards the quotes, just as
bash does. The difference is bash is handing off the 'interpreted' data
to another program, while python isn't.





> Shells don't just repeat the characters you type, they interpret them.

Yes, I see that now. I still don't think bash/shell should alter the
input going to another program.



> Characters like $ & * ? [ ] { } and others have special meaning to the shell,
> as do quotation marks.
>
>
> [...]
>>> (3) If my interpretation is correct, what *is* bogus is that that your
>>> column names include quotes in them.
>>
>>
>> Of course not. I'm not sure I've ever seen quotes or apostrophes in a
>> column name.
>
> *shrug* It would be a pretty crazy thing to do, and I'm very glad to hear that
> my wild guess as to what was going on was wrong.
>
>> And that's a pretty crazy interpretation that nothing in my post would
>> lead you to.
>
> That's the curse of knowledge speaking: you're familiar enough with SQL that
> you can no longer remember what it was like to be unfamiliar with it.
>
> I'm not familiar with SQL syntax, and to me, the fact that SQL was complaining
> when the quotation marks were missing from the column name


I think you're still not getting it.

There were no quotes missing from the column name, either before or
after the shell changed ("interpreted") my input.

You know enough about SQL to understand a simple query, right?

SELECT [rows and columns]
FROM [table]
WHERE [conditions]

In real life, in my program, the command line input becomes part of the
WHERE clause.

$python pyprogram.py column1=2174 and column2='R'

SELECT [rows and columns]
FROM [table]
WHERE column1=2174 and column2='R'

It means retrieve the rows and columns of data where the value stored in
column1 = 2174 and the value stored in column2 = 'R'.

The quotes ('R') signify to the database that the data stored in the
column is text data.

You can think of it like a spreadsheet:
Row  ColumnA  ColumnB
1    2171     A
2    2172     A
3    2174     A
4    2174     R
5    2174     R
6    2174     Z
7    2175     A

The intersection of "columnA=2174 and columnB='R'" is rows 4-5.

The actual column names are different of course. Here's what it looks
like in real life.

http://imgur.com/a/CNnJ4


By dropping the apostrophes, bash has changed the SQL statement and
sqlite interprets the changed statement as if I'm looking for a match
between the values in column2 and the values in a column named R. Since
column R doesn't exist it errors out.

If I /did/ have a column named R, the incorrect SQL statement would've
executed and returned data I wasn't looking for.
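This can be reproduced with Python's sqlite3 module and an in-memory copy of the spreadsheet above. The column names columnA and columnB come from that example rather than the real schema, and select_where is a hypothetical stand-in for the program's query code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, columnA INTEGER, columnB TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 2171, "A"), (2, 2172, "A"), (3, 2174, "A"), (4, 2174, "R"),
     (5, 2174, "R"), (6, 2174, "Z"), (7, 2175, "A")],
)

def select_where(clause):
    # Pastes raw text into the SQL, as the program in the thread does.
    return conn.execute("SELECT id FROM t WHERE " + clause + " ORDER BY id").fetchall()

good = select_where("columnA=2174 and columnB='R'")
print(good)  # [(4,), (5,)]

try:
    select_where("columnA=2174 and columnB=R")  # what arrives once bash strips the quotes
    error_text = None
except sqlite3.OperationalError as exc:
    error_text = str(exc)
print(error_text)  # no such column: R
```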




> suggested to me that
> the quotes were part of the name. To me, the analogy that came to my mind was
> the similar-looking error in the shell:
>
>
> steve@runes:~$ ls "'foo'"
> 'foo'
> steve@runes:~$ ls 'foo'
> ls: cannot access foo: No such file or directory
>
> The quotation marks are part of the filename, and so they need to be protected
> from the shell or else you get an error quite similar to the one you got:
>
> no such column: R
>
>
> but there is (or so it appeared) a column 'R' (hence it looks like the quotes
> are part of the column name).
>
> Anyway, I'm glad you have a solution that satisfies you.


Yes. Surrounding the whole criteria in dbl-quotes works fine and isn't
hard to explain.
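Python's shlex.split follows POSIX-shell-like word splitting and quote removal (it does not glob or expand variables), so it gives a rough idea of what bash hands the program for each way of typing the command:

```python
import shlex

bare = shlex.split("python program.py column1=2174 and column2='R'")
whole = shlex.split("python program.py \"column1=2174 and column2='R'\"")

print(bare[2:])   # ['column1=2174', 'and', 'column2=R']
print(whole[2:])  # ["column1=2174 and column2='R'"]

# The program's ' '.join(sys.argv[1:]) then produces:
print(" ".join(bare[2:]))   # column1=2174 and column2=R    (quotes removed)
print(" ".join(whole[2:]))  # column1=2174 and column2='R'  (quotes kept)
```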

And I want to say thanks to you and all you guys who spend hours
providing detailed, knowledgeable answers to questions here on clp.
Most people don't think about the time and effort you put into it, but
15-20 years ago I used to answer a lot of posts on
comp.databases.ms-access, so I do.


BartC

Dec 5, 2016, 1:02:29 PM
On 05/12/2016 16:49, Steve D'Aprano wrote:
> On Mon, 5 Dec 2016 10:42 pm, BartC wrote:

>> So if someone types:
>>
>> > X A B C
>>
>> You would expect X to be launched, and be given arguments A, B and C.
>
> Would I? I don't think so.
>
> Even the DOS prompt supports some level of globbing. Its been a while since
> I've used the DOS prompt in anger, but I seem to recall being able to do
> things like:
>
> dir a*
>
> to get a listing of all the files starting with "a". So *something* is
> treating the * as a special character. In Linux and Unix, that's invariably
> the shell, before the dir command even sees what you typed.
>
> In DOS, it might be the dir command itself.

Yes, it has to be, as there is probably no space to first construct an
in-memory list of all the files.

> The disadvantage of the DOS way
> of doing this is that *every single command and application* has to
> re-implement its own globbing, very possibly inconsistently. That's a lot
> of duplicated work and re-inventing the wheel,

Which will need to be done anyway. Expansion of filespecs with wildcards
may need to be done from inside a program.

On Windows that involves calling FindFirstFile/FindNextFile (which takes
care of wildcards for you), and on Linux it might be opendir/readdir
(which doesn't; you have to call fnmatch to accept/reject each file).

(I had to port such code recently across OSes for my language; on both
systems, dirlist(filespec) returns a list of files matching the wildcard
specification provided. No shell expansion is needed!)
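In Python the opendir/readdir/fnmatch approach looks roughly like the sketch below; glob.glob already does this job in the standard library. dirlist here is a hypothetical Python analogue of the function described above, not its actual implementation:

```python
import fnmatch
import os
import tempfile

def dirlist(filespec):
    """Hypothetical analogue of dirlist(filespec): read the directory,
    then accept or reject each name with fnmatch."""
    directory, pattern = os.path.split(filespec)
    with os.scandir(directory or ".") as entries:
        return sorted(e.name for e in entries
                      if e.is_file() and fnmatch.fnmatch(e.name, pattern))

with tempfile.TemporaryDirectory() as workdir:
    for name in ("a.c", "b.c", "notes.txt"):
        open(os.path.join(workdir, name), "w").close()
    found = dirlist(os.path.join(workdir, "*.c"))

print(found)  # ['a.c', 'b.c']
```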

> The Unix way is far more consistent: applications typically don't have to
> care about globbing, because the shell handles glob expansion, environment
> variables, etc.

It's not consistent because program P will sometimes see * and sometimes
a list of files. On Windows P will /never/ see a list of files if the
start point is *. Not without a lot of intervention.

And I've already posted a long list of reasons why the Linux shell's
behaviour is undesirable: you want the literal argument, or you want
to do something with the filespec that is different from what the shell
will do, or you want to do it later (perhaps the files in question DON'T
EXIST until halfway through the program).

OK I'm just thinking up more reasons why I don't like Linux shell's
behaviour.

>> You wouldn't expect any of them to be expanded to some unknown number of
>> arguments.
>
> Actually yes I would. If they could be interpreted as file names with
> globbing or environment variables, that's exactly what I would expect.

If the syntax is:

program filespec

or:

program filespec file

how do you tell whether the last file in an argument list is the
optional 'file', or the last file of the expansion of 'filespec'?
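When the trailing file is mandatory, as with cp and mv, the split is still recoverable: the last argument is the file and everything before it came from the filespec. A sketch with Python's argparse (parser and argument names are illustrative). If the trailing file is optional, though, the expanded list alone cannot tell the two cases apart:

```python
import argparse

parser = argparse.ArgumentParser(prog="program")
parser.add_argument("sources", nargs="+")  # the (possibly shell-expanded) filespec
parser.add_argument("dest")                # one mandatory trailing file

args = parser.parse_args(["a.c", "b.c", "c.c", "out.txt"])
print(args.sources, args.dest)  # ['a.c', 'b.c', 'c.c'] out.txt
```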

> So yes, I would expect that if I said
>
> dir a*
>
> I would get a listing of all the files starting with "a", not just the
> single file called literally "a*".

So what does 'dir' do then; just print?

Since it sounds like:

echo *.*

would do the job just as well! (If 'echo' is a program that just lists
its input.)

> Fine. Just escape the damn thing and do whatever you like to it.

I'm not the one running the program. Perhaps you don't know how stupid
users can be.

>> The input is a PATTERN; I want to process it, and apply it, myself.
>
> When you double-click on a .doc file,

Are we still talking about a console or terminal here? Any filenames
displayed as part of the dialogue (result of ls or dir for example) are
usually not clickable.

> and Windows launches Word and opens
> the file for editing, do you rant and yell that you didn't want Word to
> open the file, you wanted to process the file name yourself?

GUIs are different.

--
Bartc

Chris Angelico

Dec 5, 2016, 1:21:57 PM
On Tue, Dec 6, 2016 at 5:02 AM, BartC <b...@freeuk.com> wrote:
> If the syntax is:
>
> program filespec
>
> or:
>
> program filespec file
>
> how do you tell whether the last file in an argument list is the optional
> 'file', or the last file of the expansion of 'filespec'?

Why should you care? I have used shell globbing to pass precisely two
parameters to a program. More often, I use this syntax, which Windows
simply doesn't support:

ffmpeg -i some-really-long-file-name.{avi,mkv}

to convert a file from one format to another. And if the .mkv file
already exists, I can just use ".*" at the end, although that does
depend on them being in alphabetical order. The shell is predictable
and therefore useful. This trick is guaranteed to work, no matter what
program I'm using. On Windows, I have to hope that the program expands
these notations correctly.

ChrisA

DFS

Dec 5, 2016, 1:23:17 PM
On 12/04/2016 06:26 PM, Chris Angelico wrote:
> On Mon, Dec 5, 2016 at 9:52 AM, Steve D'Aprano
> <steve+...@pearwood.info> wrote:
>> I'm not sure how to interpret this error, so I'm guessing. Please correct me
>> if I'm wrong, but doesn't this mean that your column is called:
>>
>> single quote R single quote
>>
>> that is, literally 'R', which means that if you were using it in Python
>> code, you would have to write the column name as this?
>>
>> "'R'"
>>
>
> AIUI this is meant to be a string literal, which in SQL is surrounded
> by single quotes. This also means that anyone who's using this script
> needs to be comfortable with writing raw SQL; plus, there's no
> protection against SQL injection

Got an example of a SQL injection attack that would let a non-privileged
user do anything he's not supposed to?


> , so anyone using the script has to
> have full database permission.


Not sure what you mean.

This is a sqlite hack system for a few users, but if it were distributed
I would use something like Oracle and create ROLEs and GRANTs.

CREATE ROLE ROLE_READONLY;
GRANT SELECT ON SCHEMA.TABLE TO ROLE_READONLY;
GRANT ROLE_READONLY TO ANGELICC;
GRANT ROLE_READONLY TO DAPRANOS;



> The best solution might well be to
> change the protocol somewhat: instead of taking raw SQL on the command
> line, take "column=value", parse that in Python, and provide the value
> as a string (or maybe as "int if all digits else string").

?

"column=value" is SQL.



eryk sun

Dec 5, 2016, 1:35:57 PM
On Mon, Dec 5, 2016 at 3:41 PM, BartC <b...@freeuk.com> wrote:
>
> Are you saying that if someone executes:
>
> subprocess.Popen(["python","a.py", "*"])
>
> the output will be: ['a.py','*']?
>
> In that case forget Windows vs. Linux, you now have a program that will get
> command parameters processed differently depending on whether it was invoked
> from a shell or not.
>
> Or a program that sometimes will see "*" as an argument, and sometimes a big
> list of files that merges into all the other arguments.

If it sees "*", it will try to open a file named "*". That's a valid
filename in Unix, but it should be avoided. We wouldn't want someone
to accidentally run `rm *` instead of `rm \*`.

In Windows, it's invalid for filenames to include wildcard characters
(i.e. '*' and '?' as well as the MS-DOS compatibility wildcards '<',
'>', and '"' -- DOS_STAR, DOS_QM, and DOS_DOT). Since there's no
ambiguity of intent, if you've linked an executable with
[w]setargv.obj, the C runtime will happily expand "*" in the argv
list.

I don't understand your concern regarding Unix shell scripting syntax.
On Windows, are you as troubled that environment variables such as
%USERNAME% get expanded by cmd.exe, but not by CreateProcess? Does it
bother you that cmd consumes its "^" escape character if it's not
escaped as "^^"? For example:

C:\>python.exe -i -c "" remove: ^ keep: ^^
>>> import win32api
>>> win32api.GetCommandLine()
'python.exe -i -c "" remove: keep: ^'

Every command-line shell that I've ever used is also a quirky
scripting language. Shell literacy requires learning at least the
syntax and operators of the language.

BartC

Dec 5, 2016, 1:50:44 PM
On 05/12/2016 17:46, Dennis Lee Bieber wrote:
> On Mon, 5 Dec 2016 11:42:08 +0000, BartC <b...@freeuk.com> declaimed the
> following:
>
>
>> And it doesn't work anyway; suppose I write:
>>
>> >X A *.* C D
>>
>> How does the program know when the expanded filenames of *.* end, and
>> the two extra parameters start? Remember it doesn't know there were four
>> parameters, all it sees is one linear stream of arguments. Any
>> structure the input may have had is lost.
>>
> And just what ARE A, C, and D?

It doesn't matter, and is not the concern of the shell. It should
restrict itself to the basic parsing that may be necessary when
parameters are separated by white-space and commas, if a parameter can
contain white-space or commas. That usually involves surrounding the
parameter with quotes.

One would be very annoyed if, reading a CSV file, where each of N values
on a line correspond to a field of record, if one entry of "?LBC"
expanded itself to a dozen entries, screwing everything up.

Shell command line processing shouldn't be attempting anything more than
that.

> If they are command options, on a Linux shell options appear as
>
> X -A *.* -C -D
>
> Even Windows command processor separates optional/command stuff via
>
> X /A *.* /C /D
>
> ... or requires the parameter defining the file(s) to process as the last
> arguments

Nonsense. If I pass that to a Python program that prints sys.argv, I get:

['a.py', '/A', '*.*', '/C', '/D']

Presumably a Windows programmer is grown up enough to make their own
decisions as to what to do with that input.

All that's needed is one little standard library function to process
sys.argv, with unexpanded parameters, into a list of expanded arguments,
if that's really what someone wants (and then needing to trawl through
it all looking for the options).

Then you get the best of both worlds.


> X A C D *.*

So how do I do:

gcc *.c -lm

The -lm needs to go at the end.

Presumably it now needs to check each parameter seeing if it resembles a
file more than it does an option? And are options automatically taken
care of, or is that something that, unlike the easier wildcards, need to
be processed explicitly?

--
Bartc

Gene Heskett

Dec 5, 2016, 1:51:47 PM
I like bash, use it a lot. But the ultimate language (damn the security
considerations real or imagined) is still ARexx. The *nixes have not, 20
years later, a do it all language that can play on the same field with
ARexx.

Cheers, Gene Heskett
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page <http://geneslinuxbox.net:6309/gene>

eryk sun

Dec 5, 2016, 1:55:25 PM
On Mon, Dec 5, 2016 at 4:03 PM, Paul Moore <p.f....@gmail.com> wrote:
> 2. On Windows, the OS primitive takes a command line. The application is
> responsible for splitting it into arguments, if it wants to. Most do, for
> compatibility with the normal argv convention inherited via C from Unix. Many
> programs let the C runtime do that splitting for them - I don't recall if Python
> does, or if it implements its own splitting these days.

Windows Python uses the CRT's parsed argument list. python.exe simply
uses the wmain entry point:

int
wmain(int argc, wchar_t **argv)
{
return Py_Main(argc, argv);
}

pythonw.exe uses the wWinMain entry point and the CRT's __argc and
__wargv variables:

int WINAPI wWinMain(
HINSTANCE hInstance, /* handle to current instance */
HINSTANCE hPrevInstance, /* handle to previous instance */
LPWSTR lpCmdLine, /* pointer to command line */
int nCmdShow /* show state of window */
)
{
return Py_Main(__argc, __wargv);
}

python[w].exe doesn't link in wsetargv.obj, so it doesn't support
wildcard expansion.
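The reverse direction can be seen from Python: on Windows, Popen turns an argument list into a single command line using subprocess.list2cmdline, an undocumented helper that follows the Microsoft C runtime's quoting rules. Only whitespace, double quotes and backslashes before a double quote are special under those rules, which is why column2='R' reached sys.argv intact on Windows in the original question:

```python
import subprocess

# list2cmdline quotes only arguments that contain whitespace (or are empty);
# single quotes are ordinary characters to the C runtime's splitter.
line = subprocess.list2cmdline(["python", "program.py", "column1=2174", "and", "column2='R'"])
print(line)  # python program.py column1=2174 and column2='R'

spaced = subprocess.list2cmdline(["python", "a.py", "has space", ""])
print(spaced)  # python a.py "has space" ""
```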

Chris Angelico

Dec 5, 2016, 2:01:53 PM
On Tue, Dec 6, 2016 at 5:50 AM, BartC <b...@freeuk.com> wrote:
> So how do I do:
>
> gcc *.c -lm
>
> The -lm needs to go at the end.
>
> Presumably it now needs to check each parameter seeing if it resembles a
> file more than it does an option? And are options automatically taken care
> of, or is that something that, unlike the easier wildcards, need to be
> processed explicitly?

Actually, gcc processes a series of "things", where each thing could
be a file, a library definition, etc. So it recognizes "-lm" as a
library designation regardless of where it exists. It does NOT have to
be at the end specifically - you can have more file names after it.

ChrisA

Lew Pitcher

Dec 5, 2016, 2:03:32 PM
On Monday December 5 2016 11:23, in comp.lang.python, "BartC" <b...@freeuk.com>
wrote:

> On 05/12/2016 15:53, Chris Angelico wrote:
>> On Tue, Dec 6, 2016 at 2:41 AM, BartC <b...@freeuk.com> wrote:
>>>
>>> Are you saying that if someone executes:
>>>
>>> subprocess.Popen(["python","a.py", "*"])
>>>
>>> the output will be: ['a.py','*']?
>>>
>>> In that case forget Windows vs. Linux, you now have a program that will
>>> get command parameters processed differently depending on whether it was
>>> invoked from a shell or not.
>>
>> Yes, that is correct. *SHELLS DO STUFF*. If you can't comprehend this,
>> you should get to know your shell. Try this:
>>
>> python a.py %PATH%
>>
>> subprocess.Popen(["python", "a.py", "%PATH%"])
>>
>> Would you expect those to do the same? If you do, prepare for Windows
>> to surprise you. If you don't, why do you keep expecting other shells
>> to do nothing?
>
> You still don't get the point. I write a program P, a native executable. It
> takes command line parameters but exactly what it gets depends on
> whether it's started from a 'shell' or from inside another program?

In Unix, it always has.


> I don't want to worry about that stuff or exactly how it is invoked!

Then, I guess that you have a problem, don't you?

> > subprocess.Popen(["python", "a.py", "%PATH%"])
>
> Yes, %...% is an escape sequence. Those % signs are supposed to stand
> out and have been chosen not to clash with typical input.
>
> And the end result of the transformation is, here, also a SINGLE thing;
> any parameter following will still be the second parameter, not the 14771st!
>
> Are you saying that the * in ABC*EF also makes the whole thing some
> escape pattern?

If you ask the shell to parse the arguments, then, "YES, the * in ABC*EF makes
the argument a candidate for globbing".

> And one that could massively expand the number of parameters,

Yes

> pushing all the following ones out of the way, and making it
> impossible to discover where these expanded parameters end and the
> normal ones recommence.

The best way to think about it is that all parameters are parameters, whether
derived from a glob input to a shell, or explicitly specified in the
invocation.

If you have a need for positional parameters, then either specify that your
code only accepts them in specific places, or find a way to disable globbing
(it can be done) and handle the expansion yourself, in your own code.

> If someone had thought this up now, it would rapidly be dismissed as
> being unworkable. But because it's been in Unix/Linux/whatever so long,
> no one can see anything wrong with it!

Primarily because there /IS/ nothing wrong with it.
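The behaviour being debated here is easy to check from Python. This sketch assumes a POSIX system (with `shell=True`, subprocess runs `/bin/sh`; on Windows it would run cmd.exe, which does not glob) and Python 3 for `tempfile.TemporaryDirectory` and `shlex.quote`. The helper names are made up for illustration. The same child program sees a literal `*` when it is given an argument list, and the expanded file names when the command goes through the shell:

```python
import os
import shlex
import subprocess
import sys
import tempfile

# A tiny child program that prints exactly the arguments it received.
CHILD = "import sys; print(sys.argv[1:])"


def argv_without_shell(extra_args, cwd):
    """Pass an explicit argument list: no shell runs, so nothing is globbed."""
    out = subprocess.check_output(
        [sys.executable, "-c", CHILD] + list(extra_args), cwd=cwd)
    return out.decode().strip()


def argv_with_shell(command_tail, cwd):
    """Pass one command string through /bin/sh (shell=True on POSIX)."""
    command = "%s -c %s %s" % (shlex.quote(sys.executable),
                               shlex.quote(CHILD), command_tail)
    out = subprocess.check_output(command, shell=True, cwd=cwd)
    return out.decode().strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.txt", "b.txt"):
            open(os.path.join(tmp, name), "w").close()
        print(argv_without_shell(["*"], tmp))  # ['*']
        print(argv_with_shell("*", tmp))       # ['a.txt', 'b.txt']
```

Quoting the pattern on the shell side, as in `argv_with_shell("'*'", tmp)`, gives the literal `*` back. That is the same kind of fix DFS found for `column2='R'`.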

Michael Torrie

Dec 5, 2016, 2:08:40 PM
On 12/05/2016 11:21 AM, Chris Angelico wrote:
> On Tue, Dec 6, 2016 at 5:02 AM, BartC <b...@freeuk.com> wrote:
>> If the syntax is:
>>
>> program filespec
>>
>> or:
>>
>> program filespec file
>>
>> how do you tell whether the last file in an argument list is the optional
>> 'file', or the last file of the expansion of 'filespec'?
>
> Why should you care? I have used shell globbing to pass precisely two
> parameters to a program. More often, I use this syntax, which Windows
> simply doesn't support:
>
> ffmpeg -i some-really-long-file-name.{avi,mkv}
>
> to convert a file from one format to another. And if the .mkv file
> already exists, I can just use ".*" at the end, although that does
> depend on them being in alphabetical order. The shell is predictable
> and therefore useful. This trick is guaranteed to work, no matter what
> program I'm using. On Windows, I have to hope that the program expands
> these notations correctly.

Agreed. I do this sort of trick all the time, even when I want to pass
just a single file to a program. I often use expansion for paths as well:

somecommand /path/to/somelongname*withendpart/myepisode*2x03*mp4

I could use tab expansion as I go, but tab expansion might reveal
several options to pick from, requiring a few additional keystrokes to
arrive at the path I want. Globs save that typing. And shells are
smart enough to expand the expression even after several wildcards have
been used. It's a win win. And the program doesn't have to know
anything about it to work.

Now I usually can use the same expressions in cmd.exe. But I find other
parts of Windows' command-line parsing to be really strange, particularly
when it comes to spaces in the filenames. I'm not sure if this is
cmd.exe's fault or just the win32 api.

In the Unix world, there are times when you don't want shell expansion.
For example, when dealing with ssh or rsync, you most often want the
wildcards and other expression characters to be passed through to the
remote process. By default zsh makes you escape them all so they won't
attempt expansion locally. If zsh can't expand an expression locally,
it gives you an error. This can be controlled with a config option.

Bash on the other hand, passes through any expressions it can't expand
as is. So under certain circumstances, "python a*.py" would indeed pass
"a*.py" to the program. But 99% of the time this does not matter.
Whether the glob expands to zero files or the program attempts to open
"a*.py" as a literal file, the effect is usually the same. Most unix
shells default to this behavior, which is useful when using rsync.

I'd far rather leave it up to the shell to do expansion than to do it
myself in a program. It allows consistency within the shell experience.
If people want the zsh idea of erring out on an expansion, they can have
that. Or they can use a shell that behaves like bash. Either way there
is a consistency there that's just not there on Windows.
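A script that has to run under cmd.exe, where python.exe performs no wildcard expansion, must expand patterns itself. Here is a minimal sketch using the standard `glob` module. The `expand_args` name and its `on_nomatch` policies are illustrative assumptions that mirror the bash and zsh behaviours described above; they are not an existing API:

```python
import glob

WILDCARDS = "*?["


def expand_args(args, on_nomatch="pass"):
    """Expand wildcard arguments roughly the way a Unix shell does.

    on_nomatch picks the behaviour when a pattern matches nothing:
      "pass"  - keep the pattern unchanged (bash's default)
      "drop"  - remove it (like bash with `shopt -s nullglob`)
      "error" - raise ValueError (similar to zsh's default)
    """
    if on_nomatch not in ("pass", "drop", "error"):
        raise ValueError("unknown on_nomatch policy: %r" % on_nomatch)
    expanded = []
    for arg in args:
        if not any(ch in arg for ch in WILDCARDS):
            expanded.append(arg)          # plain argument: keep as-is
            continue
        matches = sorted(glob.glob(arg))  # glob's own order is arbitrary
        if matches:
            expanded.extend(matches)
        elif on_nomatch == "pass":
            expanded.append(arg)
        elif on_nomatch == "error":
            raise ValueError("no matches found: %s" % arg)
        # "drop": append nothing
    return expanded
```

A script might call `expand_args(sys.argv[1:])` only when `os.name == "nt"`. On Unix the shell has already expanded (or deliberately not expanded) the arguments, and a second pass would reinterpret file names that happen to contain `*`, `?` or `[`. Note also that `sorted` orders by code point, not by the locale collation that bash uses.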

BartC

Dec 5, 2016, 2:24:34 PM
On 05/12/2016 18:34, eryk sun wrote:
> On Mon, Dec 5, 2016 at 3:41 PM, BartC <b...@freeuk.com> wrote:
>>
>> Are you saying that if someone executes:
>>
>> subprocess.Popen(["python","a.py", "*"])
>>
>> the output will be: ['a.py','*']?
>>
>> In that case forget Windows vs. Linux, you now have a program that will get
>> command parameters processed differently depending on whether it was invoked
>> from a shell or not.
>>
>> Or a program that sometimes will see "*" as an argument, and sometimes a big
>> list of files that merges into all the other arguments.
>
> If it sees "*", it will try to open a file named "*".

And people still say that the way Windows works is crazy!

> That's a valid filename in Unix, but it should be avoided.

No, it should be prohibited, if the file API and/or shell assign special
significance to it.

> I don't understand your concern regarding Unix shell scripting syntax.
> On Windows, are you as troubled that environment variables such as
> %USERNAME% get expanded by cmd.exe, but not by CreateProcess?

No, because I never use %USERNAME%, and it is obviously something
associated with Windows' Batch language which I hardly ever use.

Not like file wildcards, which I use all the time, and don't associate
them exclusively with a shell.

I also never expect any parameter that contains ? and * to be expanded
then and there to whatever list of files there might happen to be. Maybe
the files haven't been created yet, maybe the parameter has got NOTHING
TO DO with filenames.

(For example, in Windows:

>ren *.c *.d

Rename all .c files to .d files. None of the .d files exist (or, because
of the point below, some isolated .d file exists). I wouldn't be able to
emulate such a command under Linux, not without writing:

rename *.c "*.d"

or even:

rename "*.c" "*.d"

since, if the user forgets the second parameter, and there were two .c
files, it would think I wanted to rename one .c file to another.)

But that leads to another weird behaviour I've just observed: if a
parameter contains a filespec with wildcards, and there is at least one
matching file, then you get a list of those files.

However, if there are no matching files, you get the filespec unchanged.

Now, what is a program supposed to do with that? It seems it has to deal
with wildcards after all. But also some behaviours can become erratic.
This program asks for the name of a DAB radio station:

>station ?LBC

Fine, the input is ?LBC (which is an actual station name where I live).
But then at some point a file ALBC comes into existence; no connection.
Now the same line gives the input ALBC, to perplex the user!

> Every command-line shell that I've ever used is also a quirky
> scripting language.

It seems shell language authors have nothing better to do than adding
extra quirky features that sooner or later are going to bite somebody
on the arse. Mainly I need a shell to help launch a program and give it
some basic input; that's all.

(The people who think up obscure hot-key combinations belong in the same
camp. I keep hitting key combinations that do weird things, then have to
find a way out. One combination turned the display upside-down! Then I
had to turn the monitor upside down while I investigated solutions.)

--
Bartc

Chris Angelico

Dec 5, 2016, 2:26:20 PM
On Tue, Dec 6, 2016 at 6:08 AM, Michael Torrie <tor...@gmail.com> wrote:
> Agreed. I do this sort of trick all the time, even when I want to pass
> just a single file to a program. I often use expansion for paths as well:
>
> somecommand /path/to/somelongname*withendpart/myepisode*2x03*mp4
>

"somecommand" is "vlc", isn't it :D

And I agree. I do this too. If you tab complete it, you can't as
easily use the up arrow. Compare:

vlc /video/Music/ASongToSingO/203*
vlc /video/Music/ASongToSingO/203\ Bugs\ Bunny\ at\ the\ Gaiety.ogg

The latter came from "203<tab>". Now, you want to play track 216
"Trial by Ruddy Patience" next. With the first one, it's up, left,
backspace two, patch in the new number, hit enter. With the second,
you have to backspace the whole title, then put in 216, and
tab-complete it again. Now, VLC on Windows is probably capable of
expanding globs, but why should it bother? Bash gives the exact same
result for both of those commands.

Plus, the Windows method depends on the globbing characters being
INVALID in file names. You absolutely cannot create a file called
"What's up, doc?.mkv" on Windows. And I'm not sure what other
characters are valid - are you allowed quotes? If so, how do you quote
the quote character? And what happens with network mounts?

With the Unix model, where the shell does the work, it's the shell's
job to provide an escaping mechanism. The shell doesn't have to be
externally consistent (although it helps a lot), but it can be
perfectly internally consistent. The *system* simply passes everything
along as-is, as an array of arguments.

Windows gives you less freedom than Unix does.

ChrisA

Michael Torrie

Dec 5, 2016, 2:29:22 PM
On 12/05/2016 11:50 AM, BartC wrote:
> It doesn't matter, and is not the concern of the shell. It should
> restrict itself to the basic parsing that may be necessary when
> parameters are separated by white-space and commas, if a parameter can
> contain white-space or commas. That usually involves surrounding the
> parameter with quotes.

That is just a difference of opinion and what you're used to. A shell's
primary concern on my computer is providing a great environment for me
to work in, and for that, shell expansion is very important and useful
to me. Also, in the Unix world (can't speak for Windows), we assemble
lots of little bash scripts into individual executable units, and string
them together. In other words, many "commands" we use in Unix are
actually little scripts themselves. So having each and every little
script have to do its own glob expansion would be very tedious and
redundant. Moving that up to the shell makes an incredible amount of
sense given this structure.

> One would be very annoyed if, reading a CSV file, where each of N values
> on a line correspond to a field of record, if one entry of "?LBC"
> expanded itself to a dozen entries, screwing everything up.

I have no idea why you even put this in here. What does reading a CSV
file have to do with glob expansion? Sorry Bart, but you're grasping at
straws with that one.

> Shell command line processing shouldn't be attempting anything more than
> that.

In your opinion. Clearly there's a whole world out there that thinks
differently, and has their own good reasons.

>
>> If they are command options, on a Linux shell options appear as
>>
>> X -A *.* -C -D
>>
>> Even Windows command processor separates optional/command stuff via
>>
>> X /A *.* /C /D
>>
>> ... or requires the parameter defining the file(s) to process as the last
>> arguments
>
> Nonsense. If I pass that to a Python program that prints sys.argv, I get:
>
> ['a.py', '/A', '*.*', '/C', '/D']
>
> Presumably a Windows programmer is grown up enough to make their own
> decisions as to what to do with that input.
>
> All that's needed is one little standard library function to process
> sys.argv, with unexpanded parameters, into a list of expanded arguments,
> if that's really what someone wants (and then needing to trawl through
> it all looking for the options).

Sure I understand that, and I can understand the appeal. I just
disagree that it's this big issue you seem to think it is.

> Then you get the best of both worlds.
>
>
>> X A C D *.*
>
> So how do I do:
>
> gcc *.c -lm
>
> The -lm needs to go at the end.
>
> Presumably it now needs to check each parameter seeing if it resembles a
> file more than it does an option? And are options automatically taken
> care of, or is that something that, unlike the easier wildcards, need to
> be processed explicitly?

To add to what Chris said, all the different parameters have to be
looked at anyway, so I don't understand why it's such a burden to scan
them one by one and if they are a filename, add them to a list, if they
are a command-line argument, do something different. And in the case of
gcc, one or more globs could appear, in any order. So in Windows you'd
have to look at each argument anyway, and do n number of manual
expansions for non-argument parameters. So this particular argument of
yours is rather weak.

Not having glob expansion be in individual programs makes things much
more explicit, from the program's point of view. It provides an
interface that takes filename(s) and you provide them, either explicitly
(via popen with no shell), or you can do it implicitly but in an
interactive way via the shell using expansion. Personal preference but
I believe it's a much better way because it's explicit from the
program's point of view as there's no question about the program's behavior.

Different strokes for different folks as they say.

Chris Angelico

Dec 5, 2016, 2:40:10 PM
On Tue, Dec 6, 2016 at 6:24 AM, BartC <b...@freeuk.com> wrote:
>> If it sees "*", it will try to open a file named "*".
>
>
> And people still say that the way Windows works is crazy!
>
>> That's a valid filename in Unix, but it should be avoided.
>
>
> No, it should be prohibited, if the file API and/or shell assign special
> significance to it.
>

That implies that the file system, file API, and shell are all bound
together. You have to forbid in file names *any* character that has
*any* significance to *any* shell. That means no quotes, no asterisks
or question marks, no square brackets, no braces, etc, etc, etc.....
or, alternatively, restrict your shells to all perform the exact same
expansion - or no expansion at all, and then restrict all your
_applications_ to perform the same expansion.

Good luck.

ChrisA

Michael Torrie

Dec 5, 2016, 2:48:29 PM
Bored today since it's -20 and I don't want to work outside.

On 12/05/2016 12:24 PM, BartC wrote:
>> If it sees "*", it will try to open a file named "*".
>
> And people still say that the way Windows works is crazy!
>
>> That's a valid filename in Unix, but it should be avoided.
>
> No, it should be prohibited, if the file API and/or shell assign special
> significance to it.

Why? What does it matter to the file system what characters are in the
filename?

>
>> I don't understand your concern regarding Unix shell scripting syntax.
>> On Windows, are you as troubled that environment variables such as
>> %USERNAME% get expanded by cmd.exe, but not by CreateProcess?
>
> No, because I never use %USERNAME%, and it is obviously something
> associated with Windows' Batch language which I hardly ever use.

cmd.exe *IS* the "Windows' Batch" language. So if you use cmd.exe,
you use this "Batch" language all the time.

> Not like file wildcards, which I use all the time, and don't associate
> them exclusively with a shell.
>
>
> I also never expect any parameter that contains ? and * to be expanded
> then and there to whatever list of files there might happen to be. Maybe
> the files haven't been created yet, maybe the parameter has got NOTHING
> TO DO with filenames.
>
> (For example, in Windows:
>
> >ren *.c *.d
>
> Rename all .c files to .d files. None of the .d files exist (or, because
> of the point below, some isolated .d file exists). I wouldn't be able to
> emulate such a command under Linux, not without writing:
>
> rename *.c "*.d"
>
> or even:
>
> rename "*.c" "*.d"

Wow. Does that actually work? And work consistently? How would it
handle globs like this:

rename a*b??c*.c *.d

I can't think of any way to do that in a consistent way. I can see it
working for the simple cases. Must have quite a bit of logic in rename
to handle all the corner cases.

Such a syntax, even with quoting, would be highly unusual to see in the
Unix world. I think I'd rather just use a simple loop and have the
rename be explicit so I know it's working (and yes I do this all the
time) as I expect it to.

> since, if the user forgets the second parameter, and there were two .c
> files, it would think I wanted to rename one .c file to another.)
>
> But that leads to another weird behaviour I've just observed: if a
> parameter contains a filespec with wildcards, and there is at least one
> matching file, then you get a list of those files.
>
> However, if there are no matching files, you get the filespec unchanged.

That's true for many shells (but not zsh by default which errors out).

> Now, what is a program supposed to do with that? It seems it has to deal
> with wildcards after all. But also some behaviours can become erratic.
> This program asks for the name of a DAB radio station:

Again, it doesn't really matter. If the program was looking for
filenames, the program will try to open the file as given ("a*.txt") and
if it succeeds it succeeds, if it fails it fails. The program need not
care. Why should it?

>
> >station ?LBC
>
> Fine, the input is ?LBC (which is an actual station name where I live).
> But then at some point a file ALBC comes into existence; no connection.
> Now the same line gives the input ALBC, to perplex the user!

Why would a file come into existence? I thought you said it refers to a
station identifier. No wonder you have such confused users if you're
creating files when you should be opening a URL or something.

I would expect in this case that the "station" program would return an
error, something like:

Error! Cannot find station "ALBC"

If station ids were supposed to start with a wildcard character, I'd
probably have to make a note in the help file that the station
identifiers have to be escaped or placed in quotes. Or change my logic
to not require this ? character (if it was part of all identifiers we
can drop it safely).

>
>> Every command-line shell that I've ever used is also a quirky
>> scripting language.
>
> It seems shell language authors have nothing better to do than adding
> extra quirky features that sooner or later are going to bite somebody
> on the arse. Mainly I need a shell to help launch a program and give it
> some basic input; that's all.

I think you'd quickly find that such a shell would be very non-useful.

John Gordon

Dec 5, 2016, 2:54:12 PM
In <o249l7$knc$1...@dont-email.me> DFS <nos...@dfs.com> writes:

> > Shells don't just repeat the characters you type, they interpret them.

> Yes, I see that now. I still don't think bash/shell should alter the
> input going to another program.

But that's one of the reasons *to* use a shell!

ls *.c                   ERROR No file named "*.c"
ls > output              ERROR No file named ">"
rm 'a file name'         ERROR No file named "'a file name'"
cd ~/foo/bar             ERROR No file named "~/foo/bar"
cat $HOME/file.txt       ERROR No file named '$HOME/file.txt'
vi $(grep -l foo *.txt)  ERROR No file named '$(grep -l foo *.txt)'

None of these commands would work if bash didn't "alter the input going to
another program".

--
John Gordon A is for Amy, who fell down the stairs
gor...@panix.com B is for Basil, assaulted by bears
-- Edward Gorey, "The Gashlycrumb Tinies"
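DFS's original command line (`python program.py column1=2174 and column2='R'`) can be tokenized from Python to show where the quotes go. `shlex.split` applies POSIX-shell-style word splitting and quote removal. Unlike bash, it does no globbing or variable expansion, so this is only an approximation of what the shell does. The sketch is written for Python 3, because `shlex.quote` needs 3.3 or later:

```python
import shlex

# The arguments as typed in bash: the single quotes are shell syntax,
# so they are removed before the program ever sees them.
original = "column1=2174 and column2='R'"
print(shlex.split(original))
# ['column1=2174', 'and', 'column2=R']

# Inside double quotes, single quotes are ordinary characters and survive.
fixed = "column1=2174 and \"column2='R'\""
print(shlex.split(fixed))
# ['column1=2174', 'and', "column2='R'"]

# shlex.quote produces a shell-safe spelling of any single argument.
arg = "column2='R'"
quoted = shlex.quote(arg)
assert shlex.split(quoted) == [arg]
```

This is the POSIX counterpart of the Windows rules: with bash, the quoting that reaches the program is whatever is left after the shell has removed its own quotes.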

DFS

Dec 5, 2016, 3:07:19 PM
On 12/05/2016 02:54 PM, John Gordon wrote:
> In <o249l7$knc$1...@dont-email.me> DFS <nos...@dfs.com> writes:
>
>>> Shells don't just repeat the characters you type, they interpret them.
>
>> Yes, I see that now. I still don't think bash/shell should alter the
>> input going to another program.
>
> But that's one of the reasons *to* use a shell!
>
> ls *.c                   ERROR No file named "*.c"
> ls > output              ERROR No file named ">"
> rm 'a file name'         ERROR No file named "'a file name'"
> cd ~/foo/bar             ERROR No file named "~/foo/bar"
> cat $HOME/file.txt       ERROR No file named '$HOME/file.txt'
> vi $(grep -l foo *.txt)  ERROR No file named '$(grep -l foo *.txt)'
>
> None of these commands would work if bash didn't "alter the input going to
> another program".


That's why bash is known as a 'command interpreter'.

bash should make a special exception for python programs.

Do what I said, not what you think I said!


Random832

Dec 5, 2016, 3:10:01 PM
On Mon, Dec 5, 2016, at 14:48, Michael Torrie wrote:
> Wow. Does that actually work? And work consistently? How would it
> handle globs like this:

The rules are simpler than you're probably thinking of. There's actually
no relationship between globs on the left and on the right. Globs on the
left simply select the files to rename as normal, the glob pattern
doesn't inform the renaming operation.

A question mark on the right takes the character from *that character
position* in the original filename. That is, if you have "abc", "rename
ab? ?de" renames it to "ade", not "cde".

A star on the right takes all remaining characters. If you include a
dot, the "name" (everything before the last dot) and "extension" of the
file are considered separate components [so in addition to rename *.c
*.d, renaming a.* b.* where you have a.py, a.pyc, a.pyo will work].

But if you don't include a dot, for example "rename abcde.fgh xyz*", it
will rename to xyzde.fgh .
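Random832's description of the target-mask rules can be turned into a short Python sketch. This is a simplified model built only from the rules stated in that post, not cmd.exe's actual algorithm, which has further special cases (for example, for characters that follow `*`). The function names are hypothetical:

```python
def _apply_component(source, mask):
    """Apply one dot-free mask component to one source component."""
    out = []
    pos = 0  # index in `source` that the current mask character lines up with
    for ch in mask:
        if ch == "*":
            # '*' copies everything that is left of the source component.
            out.append(source[pos:])
            pos = len(source)
        elif ch == "?":
            # '?' copies the source character at this position, if any.
            if pos < len(source):
                out.append(source[pos])
            pos += 1
        else:
            # A literal character replaces the character at this position.
            out.append(ch)
            pos += 1
    return "".join(out)


def apply_ren_mask(source, mask):
    """Compute the new name for `source` under a ren-style target mask.

    Simplified model: if the mask contains a dot, the name part (before the
    last dot) and the extension are handled separately; otherwise the whole
    filename is treated as one string.
    """
    if "." not in mask:
        return _apply_component(source, mask)
    mask_name, mask_ext = mask.rsplit(".", 1)
    if "." in source:
        src_name, src_ext = source.rsplit(".", 1)
    else:
        src_name, src_ext = source, ""
    name = _apply_component(src_name, mask_name)
    ext = _apply_component(src_ext, mask_ext)
    return name + "." + ext if ext else name


print(apply_ren_mask("abc", "?de"))         # ade
print(apply_ren_mask("a.c", "*.d"))         # a.d
print(apply_ren_mask("a.pyc", "b.*"))       # b.pyc
print(apply_ren_mask("abcde.fgh", "xyz*"))  # xyzde.fgh
```

As Random832 says, the source pattern only selects files. A rename loop would take each name produced by globbing the left-hand pattern and compute its new name with the mask.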

Marko Rauhamaa

Dec 5, 2016, 3:14:29 PM
Let's mention a few traps:

* Classic shell programming is built on the concept of string lists. A
  list of three file names could be stored in a variable as follows:

      files="a.txt b.txt c.txt"

  or maybe:

      files=$*

  Trouble is, whitespace is not a safe separator because it is a valid
  character in filenames.

  The only safe way out is to use bash's arrays, which are a deviation
  from the classic foundation. That's why most (?) shell scripts just
  ignore the possibility of whitespace in filenames.

* A common, classic pattern to pipeline data is something like this:

      find_files |
      while read filename; do
          ... do a thing or two with "$filename" ...
      done

  Unfortunately, the pattern suffers from several problems:

   1. A filename can legally contain a newline.

   2. A filename can legally begin and end with whitespace, which is
      stripped by the "read" builtin.

   3. The "read" builtin treats the backslash as an escape character
      unless the "-r" option is specified. The backslash is a valid
      character in a filename.

* The classic "test" builtin (or "[ ... ]") has ambiguous syntax:

      if [ -e "$filename" -o -e "$filename.bak" ]; then
          ...
      fi

  results in a syntax error if "$filename" should be "=".

* This fails gracefully:

      local x
      x=$(false) || exit

  This doesn't fail at all:

      local x=$(false) || exit


Marko
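Marko's first two traps come from using whitespace or newlines as separators. A common workaround is to ask `find` for NUL-terminated output (`-print0`, supported by GNU and BSD find) and split on NUL, the one byte that cannot appear in a Unix path, then pass the names on as a list rather than as a joined string. The Python sketch below illustrates that idea as an assumption; it is not something Marko proposed, and the helper names are hypothetical:

```python
import subprocess


def split_nul(data):
    """Split NUL-terminated output (e.g. from `find -print0`) into names.

    Unlike splitting on newlines or whitespace, this keeps names containing
    spaces, newlines, leading/trailing blanks and backslashes intact.
    """
    # Every entry is terminated by NUL, so the final piece is empty; drop it.
    return [name for name in data.split(b"\0") if name]


def find_files(root):
    """List regular files under `root` using find's NUL-terminated output."""
    out = subprocess.check_output(["find", root, "-type", "f", "-print0"])
    return split_nul(out)
```

The names come back as bytes, so nothing is lost to decoding. Each one can be handed to another program as its own element of an argument list, with no shell re-splitting in between.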

Michael Torrie

Dec 5, 2016, 3:34:25 PM
On 12/05/2016 01:09 PM, Random832 wrote:
> The rules are simpler than you're probably thinking of. There's actually
> no relationship between globs on the left and on the right. Globs on the
> left simply select the files to rename as normal, the glob pattern
> doesn't inform the renaming operation.
>
> A question mark on the right takes the character from *that character
> position* in the original filename. That is, if you have "abc", "rename
> ab? ?de" renames it to "ade", not "cde".
>
> A star on the right takes all remaining characters. If you include a
> dot, the "name" (everything before the last dot) and "extension" of the
> file are considered separate components [so in addition to rename *.c
> *.d, renaming a.* b.* where you have a.py, a.pyc, a.pyo will work].
>
> But if you don't include a dot, for example "rename abcde.fgh xyz*", it
> will rename to xyzde.fgh .

Ahh. Good explanation. Thank you. So in the case of rename, the
second argument isn't actually a glob at all (even though it looks like
one). A useful simplification to be sure, though not one I'd want to
emulate (it relies on Windows' idea of file extensions to really work
correctly).

BartC

Dec 5, 2016, 3:35:25 PM
On 05/12/2016 19:48, Michael Torrie wrote:
> Bored today since it's -20 and I don't want to work outside.
>
> On 12/05/2016 12:24 PM, BartC wrote:

>> (For example, in Windows:
>>
>> >ren *.c *.d
>>
>> Rename all .c files to .d files. None of the .d files exist (or, because
>> of the point below, some isolated .d file exists). I wouldn't be able to
>> emulate such a command under Linux, not without writing:
>>
>> rename *.c "*.d"
>>
>> or even:
>>
>> rename "*.c" "*.d"
>
> Wow. Does that actually work? And work consistently? How would it
> handle globs like this:
>
> rename a*b??c*.c *.d

That's not really the point, which is that a parameter may use * and ?
which, even if they are related to file names, may not refer to actual
input files in this directory. Here, they are output files. But this is
just one of dozens of reasons why automatic expansion of such patterns
is not desirable.

(And yes sometimes it works, for simple cases, and sometimes it screws
up. But it is up to the program how it defines its input and what it
will do with it. It shouldn't be up to the shell to expand such names in
ALL circumstances whether that is meaningful to the app or not.)

>> But that leads to another weird behaviour I've just observed: if a
>> parameter contains a filespec with wildcards, and there is at least one
>> matching file, then you get a list of those files.
>>
>> However, if there are no matching files, you get the filespec unchanged.
>
> That's true for many shells (but not zsh by default which errors out).
>
>> Now, what is a program supposed to do with that? It seems it has to deal
>> with wildcards after all. But also some behaviours can become erratic.
>> This program asks for the name of a DAB radio station:
>
> Again, it doesn't really matter. If the program was looking for
> filenames, the program will try to open the file as given ("a*.txt") and
> if it succeeds it succeeds, if it fails it fails. The program need not
> care. Why should it?

Really, the program should just fail? It seems to me that the program is
then obliged to deal with wildcards after all, even if just to detect
that wildcards have been used and that is reason for the error. But my
example was how that behaviour can stop programs working in random ways.

>> Fine, the input is ?LBC (which is an actual station name where I live).
>> But then at some point a file ALBC comes into existence; no connection.
>> Now the same line gives the input ALBC, to perplex the user!
>
> Why would a file come into existence? I thought you said it refers to a
> station identifier. No wonder you have such confused users if you're
> creating files when you should be opening a URL or something.

The file is irrelevant. It's just a bit of junk, or something that is
used for some other, unrelated purpose.

The important thing is that creation of that file FOR ANY REASON could
screw up a program which is asking for input that is unrelated to any files.

> I would expect in this case that the "station" program would return an
> error, something like:
>
> Error! Cannot find station "ALBC"

And that's the error! But how is the user supposed to get round it? It's
an error due entirely to unconditional shell expansion of any parameter
that looks like it might be a filename with wildcards.

> If station ids were supposed to start with a wildcard character, I'd
> probably have to make a note in the help file that the station
> identifiers have to be escaped or placed in quotes.

I've no idea why that station starts with "?" (it appears on the station
list, but have never tried to listen to it). But this is just an example
of input that might use ? or * that has nothing to do with files, yet can be
affected by the existence of some arbitrary file.

And somebody said doing it this way is more secure than how Windows does
things!

>> It seems shell language authors have nothing better to do than adding
>> extra quirky features that sooner or later are going to bite somebody
>> on the arse. Mainly I need a shell to help launch a program and give it
>> some basic input; that's all.
>
> I think you'd quickly find that such a shell would be very non-useful.

For about five minutes, until someone produces an add-on with the
missing features.

--
Bartc

BartC

Dec 5, 2016, 3:55:55 PM
On 05/12/2016 19:29, Michael Torrie wrote:
> On 12/05/2016 11:50 AM, BartC wrote:

>> So how do I do:
>>
>> gcc *.c -lm
>>
>> The -lm needs to go at the end.
>>
>> Presumably it now needs to check each parameter seeing if it resembles a
>> file more than it does an option? And are options automatically taken
>> care of, or is that something that, unlike the easier wildcards, need to
>> be processed explicitly?

This was a response to someone saying the wildcard param needs to be at
the end. There need be no such restriction if handled properly (ie. no
auto-expansion).

But a similar example, suppose a syntax is:

appl *.* [options]

but one of the files in the list is called "-lm", or some other option
that can do things the user didn't want (with gcc, -lm is harmless).

Without expansion, input is easy to parse: filespec, followed by
optional options. But with expansion, now you have to decide if a
particular argument is an option, or a filename.

And if there's an error in an option, you may have to abort, which means
throwing away that list of files which, in some cases, can run into
millions.

> Not having glob expansion be in individual programs makes things much
> more explicit, from the program's point of view. It provides an
> interface that takes filename(s) and you provide them, either explicitly
> (via popen with no shell), or you can do it implicitly but in an
> interactive way via the shell using expansion. Personal preference but
> I believe it's a much better way because it's explicit from the
> program's point of view as there's no question about the program's behavior.
>
> Different strokes for different folks as they say.

I must have given ten or twenty scenarios where such auto-expansion is
problematical. And yet people are still in denial. It's been in Unix for
decades therefore there's nothing wrong with it!

But maybe people simply avoid all the situations that cause problems.
Interfaces are specified in a certain manner, given that input can be
any long stream of filenames and/or options with no order and no
positional significance. Non-file parameters that use ? or * are
prohibited. You can't do the equivalent of:

>DIR *.b *.c

And get a list of *.b files, with a heading and summary lines, then a
list of *.c files with its own heading and summary. It would be one
monolithic list.
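On a Unix shell the usual way to get that per-pattern output is to quote the patterns so they reach the program unexpanded, and have the program glob each one itself. A minimal Python sketch; `list_by_pattern` and `print_listing` are hypothetical names, not from any library:

```python
import glob
import os
import sys


def list_by_pattern(patterns, directory="."):
    """Glob each pattern separately so the results stay grouped."""
    groups = {}
    for pattern in patterns:
        # Escape the directory so only the pattern itself is treated as a glob.
        found = glob.glob(os.path.join(glob.escape(directory), pattern))
        groups[pattern] = sorted(os.path.basename(p) for p in found)
    return groups


def print_listing(groups):
    """Print a heading, the names and a summary line for each pattern."""
    for pattern, names in groups.items():
        print(f"Files matching {pattern}:")
        for name in names:
            print(f"    {name}")
        print(f"    {len(names)} file(s)")
        print()


if __name__ == "__main__":
    # e.g.  python listing.py '*.b' '*.c'   (quoted so the shell leaves them alone)
    print_listing(list_by_pattern(sys.argv[1:]))
```

Under cmd.exe, which does not expand wildcards, the quotes are unnecessary. Either way the grouping only works because the program, not the shell, sees the patterns.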

So everyone is working around the restrictions and the problems. Which
is exactly what I would have to do.

That doesn't change the fact that the Windows approach is considerably
more flexible and allows more possibilities.

--
Bartc


Michael Torrie

unread,
Dec 5, 2016, 5:38:43 PM12/5/16
to
On 12/05/2016 01:35 PM, BartC wrote:
>>> It seems shell language authors have nothing better to do than adding
>>> extra quirky features that sooner or later are going to bite somebody
>>> on the arse. Mainly I need a shell to help launch a program and give it
>>> some basic input; that's all.
>>
>> I think you'd quickly find that such a shell would be very non-useful.
>
> For about five minutes, until someone produces an add-on with the
> missing features.

Which will re-implement everything you stripped out. :)

As I've gotten older I've learned the truth of this quotation:
"Those who do not understand UNIX are condemned to reinvent it, poorly."
-- Henry Spencer


eryk sun

unread,
Dec 5, 2016, 6:10:50 PM12/5/16
to
On Mon, Dec 5, 2016 at 4:49 PM, Steve D'Aprano
<steve+...@pearwood.info> wrote:
>
> You've never used cmd.com or command.exe? "The DOS prompt"?

The default Windows shell is "cmd.exe", and it's informally called the
"Command Prompt", not "DOS Prompt". In Windows 9x it was accurate to
say DOS prompt, since the shell was COMMAND.COM, which used DOS system
calls. But that branch of Windows has been dead for over a decade.

> Even the DOS prompt supports some level of globbing. It's been a while since
> I've used the DOS prompt in anger, but I seem to recall being able to do
> things like:
>
> dir a*

"dir" is a built-in command. It calls FindFirstFileExW to list each
directory that it walks over.

FindFirstFileExW converts the glob to a form supported by the system
call NtQueryDirectoryFile. In this case that's simply "a*". In other
cases it tries to match the behavior of MS-DOS globbing, which
requires rewriting the pattern to use DOS_STAR ('<'), DOS_QM ('>'),
and DOS_DOT ('"'). To simplify the implementation, the five wildcard
characters are reserved. They're not allowed in filenames.

The I/O manager leaves the implementation up to the filesystem driver.
NtQueryDirectoryFile gets dispatched to the driver as an I/O request
packet with major function IRP_MJ_DIRECTORY_CONTROL and minor function
IRP_MN_QUERY_DIRECTORY. The driver does most of the work, including
filtering the directory listing by the FileName argument. But a driver
writer doesn't have to reinvent the wheel here; the filesystem runtime
library has string comparison functions that support wildcards, such
as FsRtlIsNameInExpression.

> *every single command and application* has to re-implement its own globbing,
> very possibly inconsistently.

C/C++ programs can link with wsetargv.obj to support command-line globbing.

For Python, this dependency can be added in PCBuild/python.vcxproj in
the linker configuration:

<Link>
  <SubSystem>Console</SubSystem>
  <StackReserveSize>2000000</StackReserveSize>
  <BaseAddress>0x1d000000</BaseAddress>
  <AdditionalDependencies>wsetargv.obj</AdditionalDependencies>
</Link>

For example:

C:\Temp\test>dir /b
a.txt
b.dat
c.bin

C:\Temp\test>python -c "import sys;print(sys.argv)" *.txt *.dat
['-c', 'a.txt', 'b.dat']

Marko Rauhamaa

unread,
Dec 5, 2016, 6:38:33 PM12/5/16
to
Michael Torrie <tor...@gmail.com>:
> As I've gotten older I've learned the truth of this quotation:
> "Those who do not understand UNIX are condemned to reinvent it, poorly."
> -- Henry Spencer

I thought you kinda liked systemd...


Marko

Michael Torrie

unread,
Dec 5, 2016, 7:08:45 PM12/5/16
to
Yup I do.

Nathan Ernst

unread,
Dec 5, 2016, 9:42:38 PM12/5/16
to
Rather than argue about what is/should be allowed by a filesystem, this
defines what is allowed on NTFS (default for modern Windows systems):
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx

One can complain about whether or not something should be allowed, but
you'd have to take that up with Microsoft (and I doubt you'll make a
convincing enough argument for them to change it).

What is allowed on linux may be defined by linux itself, and it may be
restricted by the filesystem type itself (I don't know).

Regards,
Nathan

On Mon, Dec 5, 2016 at 8:25 PM, Dennis Lee Bieber <wlf...@ix.netcom.com>
wrote:

> On Mon, 5 Dec 2016 20:55:41 +0000, BartC <b...@freeuk.com> declaimed the
> following:
>
> >This was a response to someone saying the wildcard param needs to be at
> >the end. There need be no such restriction if handled properly (ie. no
> >auto-expansion).
> >
> That only applies if there is no prefix indicating a command option
> from a file name.
>
> >but one of the files in the list is called "-lm", or some other option
>
> -lm is not a valid file name on the OS's that use - as an option
> prefix.
>
> >Without expansion, input is easy to parse: filespec, followed by
> >optional options. But with expansion, now you have to decide if a
> >particular argument is an option, or a filename.
> >
> And you do that using a delimiter character that is not valid in
> filenames. On Windows, file names can not have a /, and that is the
> character used by the command line interpreter to indicate an option
> follows. On UNIX, - is used as the delimiter of an option.
>
> --
> Wulfraed Dennis Lee Bieber AF6VN
> wlf...@ix.netcom.com HTTP://wlfraed.home.netcom.com/
>

Steve D'Aprano

unread,
Dec 5, 2016, 9:44:23 PM12/5/16
to
On Tue, 6 Dec 2016 10:09 am, eryk sun wrote:

> On Mon, Dec 5, 2016 at 4:49 PM, Steve D'Aprano
> <steve+...@pearwood.info> wrote:
>>
>> You've never used cmd.com or command.exe? "The DOS prompt"?
>
> The default Windows shell is "cmd.exe", and it's informally called the
> "Command Prompt",

Thanks for the correction, I always mix up cmd/command . exe/com. I fear
this won't be the last time either -- I wish there was a good mnemonic for
which is which.



--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

Nathan Ernst

unread,
Dec 5, 2016, 9:51:14 PM12/5/16
to
If you're running on Windows 10, at least, you can soon purge that memory.
command.com doesn't exist (may never have existed on Win2k, XP, Vista, 7,
8, 8.1 or 10). If I try and run either "command" or "command.com" from
Win10, both say command cannot be found.

IIRC, command.com was a relic of Win9x running on top of DOS and was a
16-bit executable, so inherently crippled (and probably never supported by
the NT kernel), whereas cmd.exe coexisted but ran in a 32-bit context.

Michael Torrie

unread,
Dec 5, 2016, 10:58:15 PM12/5/16
to
On 12/05/2016 07:25 PM, Dennis Lee Bieber wrote:
> On Mon, 5 Dec 2016 20:55:41 +0000, BartC <b...@freeuk.com> declaimed
> the following:
>
>> This was a response to someone saying the wildcard param needs to
>> be at the end. There need be no such restriction if handled
>> properly (ie. no auto-expansion).
>>
> That only applies if there is no prefix indicating a command option
> from a file name.
>
>> but one of the files in the list is called "-lm", or some other
>> option
>
> -lm is not a valid file name on the OS's that use - as an option
> prefix.

"-" is perfectly valid in a filename on Linux. Getting apps to recognize
it as a filename and not an argument is another story. Convention is to
allow an argument "--" that tells the arg parser that everything
following that is not an argument but a parameter which may or may not
be a file--BartC seems stuck on this point, but parameters could be
anything from urls to string messages to numbers. They don't have to be
files and they in fact could begin with "/" if the program allowed it.

I argue that having conventions used by programs and conventions used by
shells, instead of some kind of arbitrary OS-enforced scheme, is
inherently more flexible and has its own consistency (everything is
explicit from any program's point of view).

>> Without expansion, input is easy to parse: filespec, followed by
>> optional options. But with expansion, now you have to decide if a
>> particular argument is an option, or a filename.
>>
> And you do that using a delimiter character that is not valid in
> filenames. On Windows, file names can not have a /, and that is the
> character used by the command line interpreter to indicate an option
> follows. On UNIX, - is used as the delimiter of an option.

What is valid in a filename is not relevant here. No one says
command-line parameters are file names if they aren't an option or
argument. What a parameter means is up to the application (convention).
Windows disallows / in a filename. Linux happens to also. But Linux
does allow '-' in a filename. Any character can be used in a parameter
to a program except \0, from the program's point of view.
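The claim that a program receives its parameters as plain strings, with any meaning assigned by the program, can be checked from Python by launching a process with an argument list and no shell. A minimal sketch; the argument values are examples taken from this thread:

```python
import ast
import subprocess
import sys

# Arguments that a Unix shell would rewrite: quotes, a glob, a leading dash.
args = ["-lm", "*.*", "column2='R'"]

# With a list of arguments and no shell, nothing is globbed or unquoted:
# the child sees exactly these strings in sys.argv[1:].
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1:])", *args],
    capture_output=True, text=True, check=True,
)
received = ast.literal_eval(result.stdout.strip())
print(received)  # ['-lm', '*.*', "column2='R'"]

# The one character that cannot appear in an argument is NUL.
try:
    subprocess.run([sys.executable, "-c", "pass", "a\0b"])
except ValueError as exc:
    print("rejected:", exc)
```

Because no shell is involved, `column2='R'` keeps its quotes, which is the behaviour the original poster wanted.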

Back to shell expansion, if I had a file named "-atest.txt" and I passed
an argument to a command like this:

command -a*

The shell would in fact expand that to "-atest.txt" much to BartC's
consternation.

Steven D'Aprano

unread,
Dec 6, 2016, 12:17:45 AM12/6/16
to
On Tuesday 06 December 2016 14:57, Michael Torrie wrote:

> "-" is perfectly valid in a filename on Linux. Getting apps to recognize
> it as a filename and not an argument is another story. Convention is to
> allow an argument "--" that tells the arg parser that everything
> following that is not an argument but a parameter which may or may not
> be a file

Another useful trick is to use a relative or absolute path to refer to a
filename that begins with a dash:

./-foo

is unambiguously the file called "-foo" in the current directory, not the -f
option.

This doesn't help when it comes to arguments which don't refer to filenames,
hence the -- tradition.
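Python's `argparse` follows both conventions. A minimal sketch with a hypothetical `appl` command that has a `-f` flag:

```python
import argparse

# Hypothetical command "appl" with a -f flag and any number of files.
parser = argparse.ArgumentParser(prog="appl")
parser.add_argument("-f", action="store_true", help="an example flag")
parser.add_argument("files", nargs="*")

# parser.parse_args(["-foo"]) would read "-foo" as options and exit with
# an error. "--" ends option parsing, so what follows is positional.
ns = parser.parse_args(["--", "-foo"])
print(ns.files, ns.f)  # ['-foo'] False

# A path prefix works too, because "./-foo" does not start with "-".
ns = parser.parse_args(["./-foo"])
print(ns.files, ns.f)  # ['./-foo'] False
```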


> --BartC seems stuck on this point, but parameters could be
> anything from urls to string messages to numbers. They don't have to be
> files and they in fact could begin with "/" if the program allowed it.

Indeed.

The Unix shells are optimized for a particular use-case: system administration.
For that, the globbing conventions etc are pretty close to optimal, but of
course they're not the only conventions possible. Unix shells recognise this
and allow you to escape metacharacters, and even turn off globbing altogether.
Another alternative would be to eschew the use of a single command line and
build your own scripting mini-language with its own conventions. That's
especially useful if your primary use-case is to invoke your program many
times, rather than just once.
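When a Python program does have to go through a POSIX shell, for example when building a command string, the standard library's `shlex.quote` escapes metacharacters so the shell passes each argument through literally. A minimal sketch; `program.py` is just a placeholder name, and `shlex.quote` is meant for POSIX shells, not cmd.exe:

```python
import shlex

# Arguments a POSIX shell would otherwise unquote, glob or split.
raw_args = ["column2='R'", "*.*", "two words"]

# shlex.quote makes each one a single literal word for the shell.
command = "python program.py " + " ".join(shlex.quote(a) for a in raw_args)
print(command)

# shlex.split applies the same quoting rules (it never globs), so it
# recovers the original arguments exactly.
assert shlex.split(command)[2:] == raw_args
```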

E.g. the Postgresql interactive interpreter allows you to run multiple queries
interactively without worrying about the shell:

https://www.postgresql.org/docs/current/static/tutorial-accessdb.html


There's no doubt that Bart has a legitimate use-case: he wants his input to be
fed directly to his program with no pre-processing. (At least, no globbing: if
he has given his opinion on environment variable expansion, I missed it.) Fair
enough. It's easy to escape wildcards, but perhaps there should be an easy way
to disable *all* pre-processing for a single command.

The fact that there is no easy, well-known way to do so indicates just how
unusual Bart's use-case is. Linux and Unix sys admins are really good at
scratching their own itches, and believe me, if there was widespread wish to
disable pre-processing for a single command, after 40-odd years of Unix the
majority of shells would support it.



--
Steven
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." - Jon Ronson

Gregory Ewing

unread,
Dec 6, 2016, 1:29:05 AM12/6/16
to

Larry Hudson

unread,
Dec 6, 2016, 2:01:58 AM12/6/16
to
On 12/05/2016 06:51 PM, Nathan Ernst wrote:
> IIRC, command.com was a relic of Win9x running on top of DOS and was a
> 16-bit executable, so inherently crippled (and probably never supported by
> the NT kernel), whereas cmd.exe coexisted but ran in a 32-bit context.

I know my 79-year-old memory is definitely subject to "senior moments" and not too reliable, but
IIRC it was Windows prior to 9x (Win 3 and earlier) that were 16 bit and ran on top of DOS.
Win95 was the first 32 bit version and was independent from DOS.

--
-=- Larry -=-

Gregory Ewing

unread,
Dec 6, 2016, 2:08:45 AM12/6/16
to
BartC wrote:
> But a similar example, suppose a syntax is:
>
> appl *.* [options]

I would be disappointed by such a syntax. What if I want
to operate on two or more files with unrelated names? With
that syntax, I can't list them explicitly in the one command.

To make that possible, the syntax would have to be

appl filespec... [options]

i.e. allow an arbitrary number of filespecs followed by
options -- requiring the command line to be scanned looking
for options anyway.
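Scanning for options after a variable-length file list is what Python's `argparse` does anyway. A minimal sketch of the hypothetical `appl filespec... [options]` interface:

```python
import argparse

# Hypothetical "appl filespec... [options]" interface.
parser = argparse.ArgumentParser(prog="appl")
parser.add_argument("files", nargs="+", help="one or more file names")
parser.add_argument("-v", "--verbose", action="store_true")
parser.add_argument("-o", "--output", default="out.txt")

# Options may follow (or precede) the file list; argparse scans for them.
ns = parser.parse_args(["a.txt", "b.txt", "notes.md", "-v", "-o", "result.txt"])
print(ns.files)    # ['a.txt', 'b.txt', 'notes.md']
print(ns.verbose)  # True
print(ns.output)   # result.txt
```

If file names and options may be interleaved, `ArgumentParser.parse_intermixed_args` (Python 3.7+) handles that case.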

> And if there's an error in an option, you may have to abort, which means
> throwing away that list of files which, in some cases, can run into
> millions.

This "millions of files" thing seems to be an imaginary
monster you've invented to try to scare people. I claim
that, to a very good approximation, it doesn't exist.
I've never encountered a directory containing a million
files, and if such a thing existed, it would be pathological
in enough other ways to make it a really bad idea.

> But maybe people simply avoid all the situations that cause problems.
> Interfaces are specified in a certain manner, given that input can be
> any long stream of filenames and/or options with no order and no
> positional significance.

Exactly. The programs are designed with knowledge of the
way shells behave and are typically used.

> You can't do the equivalent of:
>
> >DIR *.b *.c
>
> And get a list of *.b files, with a heading and summary lines, then a
> list of *.c files with its own heading and summary.

Not with *that particular syntax*. You would need to
design the interface of a program to do that some other
way.

> That doesn't change the fact that the Windows approach is considerably
> more flexible and allowing more possibilities.

At the expense of having shells with less powerful
facilities, and more inconsistent behaviour between
different programs.

One isn't better than the other. There are tradeoffs.

--
Greg

Gregory Ewing

unread,
Dec 6, 2016, 2:12:52 AM12/6/16
to
Dennis Lee Bieber wrote:
> -lm is not a valid file name on the OS's that use - as an option
> prefix.

It's not invalid -- you can create a file called -lm
on a unix system if you want, you just have to be a bit
sneaky about how you refer to it:

% echo foo > ./-lm
% ls
-lm
% cat ./-lm
foo

Sane people normally refrain from using such file names,
however, because of the hassle of dealing with them.

--
Greg

Larry Hudson

unread,
Dec 6, 2016, 2:37:27 AM12/6/16
to
On 12/05/2016 10:50 AM, BartC wrote:

>> And just what ARE A, C, and D?
>
> It doesn't matter, and is not the concern of the shell. It should restrict itself to the basic
> parsing that may be necessary when parameters are separated by white-space and commas, if a
> parameter can contain white-space or commas. That usually involves surrounding the parameter
> with quotes.
>
> One would be very annoyed if, reading a CSV file, where each of N values on a line correspond to
> a field of record, if one entry of "?LBC" expanded itself to a dozen entries, screwing
> everything up.
>
Now you're suggesting the _shell_ is going to read and process a CSV file???

> Shell command line processing shouldn't be attempting anything more than that.
>
I get awfully tired of your constant pontificating about your opinions. I know they're _VERY_
strongly held beliefs on your part, but... they are ONE person's opinions and are no more than
opinions and they ain't gonna change nothin', no how, no way, not ever.

[Sorry, I'm in a bad mood today and just had to let off a little steam...]

--
-=- Larry -=-

Paul Moore

unread,
Dec 6, 2016, 5:52:12 AM12/6/16
to
On Monday, 5 December 2016 18:21:57 UTC, Chris Angelico wrote:
> On Tue, Dec 6, 2016 at 5:02 AM, BartC <b...@freeuk.com> wrote:
> >
> > how do you tell whether the last file in an argument list is the optional
> > 'file', or the last file of the expansion of 'filespec'?
>
> Why should you care? I have used shell globbing to pass precisely two
> parameters to a program. More often, I use this syntax, which Windows
> simply doesn't support:

You might care. I occasionally teach Unix to beginners, and a common gotcha is the fact that

cp a/* .

copies everything from a to the current directory. But if you miss typing the ".", the behaviour is *very* different. And the cp command has no way to know or warn you that you might have mistyped.

The find command is another, more "advanced" example.

find . -name foo*

works as "expected" as long as there's no file that matches the glob in the current directory (because in that case the shell passes the pattern through unchanged). But users who get used to this behaviour get a nasty surprise when they hit a case where it doesn't apply and they *need* to quote.
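The no-match rule Paul describes (bash's default; zsh instead reports an error by default) can be mimicked in a few lines of Python, which makes the surprise easy to reproduce. A simplified sketch; `shell_style_expand` is a hypothetical helper and ignores quoting, brace expansion and the shell's locale-dependent sort order:

```python
import glob
import os
import tempfile


def shell_style_expand(word, directory):
    """Roughly mimic how bash treats one unquoted word by default.

    Simplified: no quoting, brace expansion or locale-aware sorting.
    If the pattern matches nothing, the word is passed through unchanged.
    """
    matches = sorted(glob.glob(os.path.join(glob.escape(directory), word)))
    if not matches:
        return [word]
    return [os.path.relpath(m, directory) for m in matches]


with tempfile.TemporaryDirectory() as d:
    # No match: find receives the literal pattern, so it "works".
    print(shell_style_expand("foo*", d))  # ['foo*']
    open(os.path.join(d, "foobar.py"), "w").close()
    # One match: find now receives "-name foobar.py" instead.
    print(shell_style_expand("foo*", d))  # ['foobar.py']
```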

It's a trade-off. Unix makes shells do globbing, so programs don't have to, but as a consequence they have no means of seeing whether globbing occurred, or switching it off for particular argument positions. Windows chooses to make the trade-off a different way.

Paul

Paul Moore

unread,
Dec 6, 2016, 5:57:49 AM12/6/16
to
On Monday, 5 December 2016 17:14:27 UTC, Skip Montanaro wrote:
> ISTR that the way DOS/Windows operate at the text prompt level was
> modeled on VMS. As you indicated, each command was responsible for its
> own "globbing". I've never programmed in DOS or Windows, and it's been
> decades since I programmed in VMS, but I imagine that both
> environments probably provide some standard sort of globbing library.

Technically, the primitive "launch an executable" operation in Windows takes a *command line* to pass to the new process, rather than a list of arguments. The argv convention comes to Windows via C, which is derived from Unix.

So the C runtime historically provided argv splitting to match C semantics, and added globbing as a convenience "because people were used to it from Unix" (even though in Unix it was a shell feature, not a C/application feature). There's also an OS API to do cmdline->argv conversion, for programs that don't want to rely on the C runtime capability.

The result is the same, though - in Windows, applications (or the language runtime) handle globbing, but in Unix the shell does.

Paul

BartC

unread,
Dec 6, 2016, 6:21:43 AM12/6/16
to
On 06/12/2016 07:37, Larry Hudson wrote:

> Now you're suggesting the _shell_ is going to read and process a CSV
> file???

What follows a shell command is a set of values a,b,c,d,e. What is
encountered in a CSV is a set of values a,b,c,d,e. You really can't see
the similarity?

Suppose instead of:

command a b c d e

The interface was changed to be more interactive:

command
Input: a b c d e

So parameters are not entered on the command line, but are prompted for.
Do you think entering a b c d e here should give exactly the same
results as doing so on the command line?

As far as any user is concerned, they should. But they can't because in
the in-line example, parameters could be expanded. And there seems to be
no way of turning that off, without changing the input (eg. quotes
around parameters).
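For a program that prompts for its parameters, Python's `shlex.split` applies Unix-shell-style quoting and word splitting to the typed line without any glob expansion, so the interactive form can match the command-line form apart from globbing. A minimal sketch; in a real program the string would come from `input("Input: ")`:

```python
import shlex

# Parameters typed at an "Input:" prompt instead of on the command line.
print(shlex.split("a b c d e"))             # ['a', 'b', 'c', 'd', 'e']
print(shlex.split("?LBC *.* 'two words'"))  # ['?LBC', '*.*', 'two words']

# Like a Unix shell, shlex removes the quotes themselves:
print(shlex.split("column2='R'"))           # ['column2=R']
```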

>> Shell command line processing shouldn't be attempting anything more
>> than that.
>>
> I get awfully tired of your constant pontificating about your opinions.

Opinions based on facts: I've given a dozen examples where the shell's
auto-expansion can screw things up. And I can easily come up with more,
as have others in the thread.

People's attitude seems to be 'So don't do that'. Or, 'So what?'.

Which suggests there is an actual problem that is being brushed under
the carpet.

> I know they're _VERY_ strongly held beliefs on your part, but... they
> are ONE person's opinions and are no more than opinions and they ain't
> gonna change nothin', no how, no way, not ever.

No they're not. But the auto-expansion of parameters by default is still
wrong.

> [Sorry, I'm in a bad mood today and just had to let off a little steam...]

People spout off about Windows ALL the time. 'So what?'

--
Bartc


Chris Angelico

unread,
Dec 6, 2016, 6:42:39 AM12/6/16
to
On Tue, Dec 6, 2016 at 10:21 PM, BartC <b...@freeuk.com> wrote:
> What follows a shell command is a set of values a,b,c,d,e. What is
> encountered in a CSV is a set of values a,b,c,d,e. You really can't see the
> similarity?
>
> Suppose instead of:
>
> command a b c d e
>
> The interface was changed to be more interactive:
>
> command
> Input: a b c d e
>
> So parameters are not entered on the command line, but are prompted for. Do
> you think entering a b c d e here should give exactly the same results as
> doing so on the command line?
>
> As far as any user is concerned, they should.

Here are two Python statements:

x = 1, 2, 3, 4, 5
f(x)

f(1, 2, 3, 4, 5)

As far as any user is concerned, these should do the exact same thing.
After all, they both have five numbers separated by commas. There's
absolutely no reason for them to do anything different. And it should
be exactly the same if you write it like this:

f(input())

and type "1, 2, 3, 4, 5" at the prompt. Right?
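Chris's analogy is easy to check in Python. A minimal sketch with a hypothetical `f` that only counts its positional arguments:

```python
def f(*args):
    """Report how many positional arguments f actually received."""
    return len(args)


x = 1, 2, 3, 4, 5
print(f(x))                # 1: one argument, a 5-tuple
print(f(1, 2, 3, 4, 5))    # 5: five separate arguments
print(f("1, 2, 3, 4, 5"))  # 1: what f(input()) would get, one string
print(f(*x))               # 5: explicit unpacking, the analogue of expansion
```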

Why do you continue to claim that shells should do no parsing, yet
expect parsing to happen elsewhere? Why are shells somehow different?

ChrisA

BartC

unread,
Dec 6, 2016, 6:44:08 AM12/6/16
to
On 06/12/2016 02:21, Dennis Lee Bieber wrote:
> On Mon, 5 Dec 2016 18:50:30 +0000, BartC <b...@freeuk.com> declaimed the
> following:
>
>> It doesn't matter, and is not the concern of the shell. It should
>> restrict itself to the basic parsing that may be necessary when
>
> Another name for "shell" is "command line interpreter" -- emphasis on
> "interpreter"...
>
> They are languages in their own right, with their own rules.

I distinguish between proper languages and those that just process
commands like those you type in to applications that use command line input:

> kill dwarf with axe

You don't really expect input like:

>HOW ARE YOU?
VERY WELL THANKS!

[An actual demo of a board computer in the 1970s] to be transformed into:

HOW ARE YOUA YOUB YOUC

> The Windows command prompt being one of the weakest -- it doesn't
> support arithmetic and local variables, nor (to my knowledge) looping
> constructs.

It does some of that, but very crudely. If you want scripting, then use
a scripting language. There are plenty about. A typical Windows user who
uses the command prompt will have no idea they are doing coding. And
they're not. /I/ don't expect that on Unix either.

>> One would be very annoyed if, reading a CSV file, where each of N values
>> on a line correspond to a field of record, if one entry of "?LBC"
>> expanded itself to a dozen entries, screwing everything up.
>
> Meaningless argument... You are, here, /reading a data file/, not
> interpreting the contents as command lines.

They're both data.

> Read the Python documentation for argparse

I just tried it, but it was too complex for me to set it up so as to
discover what it did with * arguments.

> Again, start with argparse... Any command line argument that is left
> after it has parsed the line can likely be considered a "filename".

Only at the end?

> And to handle the difference between Windows and UNIX you'd likely need
> something like:
>
>     for aParm in remainingArguments:
>         for aFile in glob.glob(aParm):
>             do something with the file

Suppose any argument contains * or ?, isn't a filename, but happens to
match some files in the current directory. AFAICS it will still screw up.
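One way to address that objection in Python is to glob only the arguments the program declares as file names, and to accept non-file values such as `?LBC` through an option. A minimal sketch; `expand_file_args` and the `--match` option are hypothetical, and this only controls what the program does itself: on Unix an unquoted `?LBC` is still expanded by the shell before the program starts.

```python
import argparse
import glob
import os


def expand_file_args(patterns, do_glob=(os.name == "nt")):
    """Expand wildcards in file-name arguments only.

    On Unix the shell has normally expanded these already, so by default
    this only globs on Windows. A pattern that matches nothing is kept
    unchanged, like bash's default.
    """
    if not do_glob:
        return list(patterns)
    files = []
    for pattern in patterns:
        files.extend(sorted(glob.glob(pattern)) or [pattern])
    return files


parser = argparse.ArgumentParser(prog="appl")
# Non-file values go through an option, so they are never globbed here.
parser.add_argument("--match", help="text pattern, not a file name")
parser.add_argument("files", nargs="*", help="file names or wildcards")

ns = parser.parse_args(["--match", "?LBC", "data.csv"])
print(ns.match, expand_file_args(ns.files))  # ?LBC ['data.csv']
```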

--
Bartc

BartC

unread,
Dec 6, 2016, 6:56:39 AM12/6/16
to
On 06/12/2016 07:08, Gregory Ewing wrote:
> BartC wrote:

>> And if there's an error in an option, you may have to abort, which
>> means throwing away that list of files which, in some cases, can run
>> into millions.
>
> This "millions of files" thing seems to be an imaginary
> monster you've invented to try to scare people. I claim
> that, to a very good approximation, it doesn't exist.
> I've never encountered a directory containing a million
> files, and if such a thing existed, it would be pathological
> in enough other ways to make it a really bad idea.

Many of my examples are based on actual experience.

One of my machines /did/ have 3.4 million files in one directory.

(The result of Firefox file caching having run amok for the best part of
a year. Once discovered, it took 15 hours to delete them all.)

In that directory (which was on Windows but accessible via a virtual
Linux), typing any Linux command followed by * would have required all
3.4 million directory entries to be accessed in order to build a 3.4
million-element argv list. I've no idea how long that would have taken.

>> >DIR *.b *.c

> Not with *that particular syntax*. You would need to
> design the interface of a program to do that some other
> way.

EXACTLY. It's restrictive.

--
Bartc


Chris Angelico

unread,
Dec 6, 2016, 7:27:02 AM12/6/16
to
On Tue, Dec 6, 2016 at 10:56 PM, BartC <b...@freeuk.com> wrote:
> In that directory (which was on Windows but accessible via a virtual Linux),
> typing any Linux command followed by * would have required all 3.4 million
> directory entries to be accessed in order to build a 3.4 million-element
> argv list. I've no idea how long that would have taken.

I just asked Python to build me a 4-million-element list, and it took
no visible time - a small fraction of a second. Don't be afraid of
large argument lists. We're not writing 8088 Assembly Language
programs in 64KB of working memory here.

ChrisA

Gregory Ewing

unread,
Dec 6, 2016, 7:40:18 AM12/6/16
to
BartC wrote:
> I've given a dozen examples where the shell's
> auto-expansion can screw things up.

Only because you're taking Windows conventions and trying
to apply them to Unix.

That's like somebody from the USA visiting Britain and
thinking "OMG! These people are all going to get themselves
killed, they're driving on the wrong side of the road!"

--
Greg

eryk sun

unread,
Dec 6, 2016, 7:40:26 AM12/6/16
to
On Tue, Dec 6, 2016 at 2:51 AM, Nathan Ernst <nathan...@gmail.com> wrote:
> On Mon, Dec 5, 2016 at 8:44 PM, Steve D'Aprano <steve+...@pearwood.info>
> wrote:
>> On Tue, 6 Dec 2016 10:09 am, eryk sun wrote:
>>
>> > The default Windows shell is "cmd.exe", and it's informally called the
>> > "Command Prompt",
>>
>> Thanks for the correction, I always mix up cmd/command . exe/com. I fear
>> this won't be the last time either -- I wish there was a good mnemonic for
>> which is which.

There are a few executables that end in .com: chcp.com, format.com,
mode.com, more.com, and tree.com. These are 32-bit / 64-bit PE/COFF
binaries, the same as any other Windows executable.

> If you're running on Windows 10, at least, you can soon purge that memory.
> command.com doesn't exist (may never have existed on Win2k, XP, Vista, 7,
> 8, 8.1 or 10). If I try and run either "command" or "command.com" from
> Win10, both say command cannot be found.

Only 32-bit versions of Windows include the NT Virtual DOS Machine
(ntvdm.exe) for running 16-bit DOS programs. Such programs expect an
8086 real-mode execution environment, in which the DOS kernel hooks
interrupt 0x21 for its system-call interface. To provide this
environment, NTVDM uses a virtual 8086 mode monitor that's built into
the 32-bit kernel.

x86-64 long mode doesn't allow switching the CPU to v86 mode, so NTVDM
isn't available in 64-bit Windows. In this case the kernel's entry
point for VDM control is hard coded to return STATUS_NOT_IMPLEMENTED
(0xC0000002), as the following disassembly shows:

lkd> u nt!NtVdmControl
nt!NtVdmControl:
fffff801`ffff4710 b8020000c0 mov eax,0C0000002h
fffff801`ffff4715 c3 ret

> IIRC, command.com was a relic of Win9x running on top of DOS and was a
> 16-bit executable, so inherently crippled (and probably never supported by
> the NT kernel).

COMMAND.COM is a 16-bit DOS program, which was the "DOS prompt" in
Windows 3.x and 9x. The versions of Windows that ran on DOS had a
complicated design (in 386 Enhanced Mode) that used a virtual 8086
monitor that ran in 32-bit protected mode. As far back as Windows 3.1,
Microsoft started replacing some DOS system services with
functionality implemented in 32-bit VxDs. Some among us may recall the
big performance improvement that 32-bit disk access provided in
Windows for Workgroups 3.11.

In Windows 9x most DOS system calls were implemented in 32-bit
protected mode VxDs; they even reflected calls in v86 mode up to the
32-bit implementation. Thus much of the implementation underlying the
Win32 API used 32-bit code in protected mode. That said, initially in
Windows 95 there were still a lot of Win32 API calls that ended up
executing real-mode 16-bit DOS calls in the system VM. The book
"Unauthorized Windows 95" analyzes this in detail.

> Whereas cmd.exe coexisted but ran in a 32-bit context.

cmd.exe (command prompt) is a Windows application -- for the most
part, though it does go beyond the Windows API to peek at the NT
process environment block (PEB) of child processes. It was ported to
but never distributed with Windows 9x. On Windows 9x you instead had
an actual DOS prompt that ran COMMAND.COM in virtual 8086 mode.

Jon Ribbens

unread,
Dec 6, 2016, 8:07:12 AM12/6/16
to
To be fair, literally just now I couldn't run the command I wanted to:

sed -i -e 's/foo/bar/g' /dir/ectory/*.txt

because there were 300,000 files matching the glob, and 'sed' can't
cope with that many command-line arguments. I had to do this instead:

find /dir/ectory -name '*.txt' -exec sed -i -e 's/foo/bar/g' {} \;

(Yes, I could also have done something with 'xargs' instead, which
would've been slower to write and quicker to run.)
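The limit Jon hit is usually the kernel's cap on the total size of the arguments and environment passed to a new program (`ARG_MAX`, reported by `getconf ARG_MAX`), rather than anything in sed. A minimal Python sketch of the batching that `xargs` does; `batches` and `run_in_batches` are hypothetical helpers, and the size estimate is approximate, hence the headroom:

```python
import os
import subprocess


def batches(args, limit):
    """Yield lists of args whose approximate encoded size stays within limit."""
    chunk, size = [], 0
    for arg in args:
        cost = len(os.fsencode(arg)) + 1  # encoded bytes plus the trailing NUL
        if chunk and size + cost > limit:
            yield chunk
            chunk, size = [], 0
        chunk.append(arg)
        size += cost
    if chunk:
        yield chunk


def run_in_batches(cmd, files, limit=None):
    """Run cmd once per batch of files, like a simple xargs."""
    if limit is None:
        # POSIX only. Halve it to leave headroom for the environment, which
        # shares the same limit, and for per-argument overhead.
        limit = os.sysconf("SC_ARG_MAX") // 2
    fixed = sum(len(os.fsencode(a)) + 1 for a in cmd)
    for chunk in batches(files, limit - fixed):
        subprocess.run([*cmd, *chunk], check=True)


# Example (not run here), equivalent to Jon's sed command:
# import glob
# run_in_batches(["sed", "-i", "-e", "s/foo/bar/g"],
#                sorted(glob.glob("/dir/ectory/*.txt")))
```

`find` can do similar batching itself with `-exec sed -i -e 's/foo/bar/g' {} +`, which passes many file names per sed invocation instead of one.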

However please don't take that to mean I agree with BartC - he's
clearly just reacting with instinctive hostility to the unknown.

eryk sun

unread,
Dec 6, 2016, 8:09:30 AM12/6/16
to
On Tue, Dec 6, 2016 at 12:26 PM, Chris Angelico <ros...@gmail.com> wrote:
> On Tue, Dec 6, 2016 at 10:56 PM, BartC <b...@freeuk.com> wrote:
>> In that directory (which was on Windows but accessible via a virtual Linux),
>> typing any Linux command followed by * would have required all 3.4 million
>> directory entries to be accessed in order to build a 3.4 million-element
>> argv list. I've no idea how long that would have taken.
>
> I just asked Python to build me a 4-million-element list, and it took
> no visible time - a small fraction of a second. Don't be afraid of
> large argument lists. We're not writing 8088 Assembly Language
> programs in 64KB of working memory here.

The problem isn't building an arbitrary list with millions of
elements. The problem is the time it would take to read millions of
filenames from a directory. It depends on the performance of the disk
and filesystem. Be careful with globbing. Think about the consequences
before running a command, especially if you're in the habit of
creating directories with hundreds of thousands, or millions, of
files. It's not a problem that I've ever had to deal with.
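In Python terms, the contrast is between building the whole list up front (`glob.glob` returns a list) and walking the directory incrementally. The following is a minimal sketch that assumes Python 3.6+ for `os.scandir` as a context manager, and `iter_matching` and `first_matches` are hypothetical names. The directory still has to be read, but a program that only needs a few matches can stop early instead of waiting for every entry.

```python
import fnmatch
import itertools
import os


def iter_matching(directory, pattern):
    """Yield entry names in `directory` that match the wildcard `pattern`.

    Unlike a shell, fnmatch lets '*' match names that start with '.'.
    """
    # os.scandir fetches entries from the OS incrementally, so the full
    # listing is never forced into memory before the first match is used.
    with os.scandir(directory) as entries:
        for entry in entries:
            if fnmatch.fnmatchcase(entry.name, pattern):
                yield entry.name


def first_matches(directory, pattern, limit=10):
    """Return at most `limit` matches, abandoning the scan once we have them."""
    return list(itertools.islice(iter_matching(directory, pattern), limit))
```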

BartC

Dec 6, 2016, 8:25:40 AM
On 06/12/2016 12:40, Gregory Ewing wrote:
> BartC wrote:
>> I've given a dozen examples where the shell's auto-expansion can screw
>> things up.
>
> Only because you're taking Windows conventions and trying
> to apply them to Unix.

What, the convention of NOT assuming that any command parameter that
uses * or ? MUST refer to whatever set of filenames happen to match in
the current directory? And to then insert N arbitrary filenames in the
parameter list.

That's a pretty good convention, yes?!

(Or should that be yesx! yesy!)

> That's like somebody from the USA visiting Britain and
> thinking "OMG! These people are all going to get themselves
> killed, they're driving on the wrong side of the road!"

Or someone from Britain visiting the USA and saying OMG, everyone's got
a gun! Suppose someone runs amok and shoots everybody!

--
Bartc

BartC

Dec 6, 2016, 8:28:56 AM
Haven't people been saying that Unix has been doing this for 40 years?

Some systems /did/ have little memory, and while there won't have been
the capacity for that many files, some file devices were slow.

--
Bartc

Chris Angelico

Dec 6, 2016, 8:34:53 AM
On Wed, Dec 7, 2016 at 12:25 AM, BartC <b...@freeuk.com> wrote:
> What, the convention of NOT assuming that any command parameter that uses *
> or ? MUST refer to whatever set of filenames happen to match in the current
> directory? And to then insert N arbitrary filenames in the parameter list.
>
> That's a pretty good convention, yes?!

Right! And while you're at it, stop assuming that percent signs have
meaning, that quotes have meaning, etc, etc, etc. Right? And why
should the enter key be significant - what if you want to send more
than one line of command line arguments?

ChrisA

BartC

Dec 6, 2016, 8:52:33 AM
But those would be silly.

Some special syntax is known about: | < and > for example. % less so
(I've never, ever used it in live input AFAIK).

This auto-expansion causes so many problems, places so many restrictions
on what can be done, that it doesn't make sense. Auto-expanding
parameters is such a big deal, that it should not be the default
behaviour; it needs something to tell the command processor to expand.

Then you don't get utterly ridiculous and dangerous behaviour such as
the cp example Paul Moore came up with (that trumps most of mine actually):

Start with a directory containing two files c.c and d.c. You want to
copy all .c files elsewhere, but accidentally type this:

> cp *.c

which ends up doing:

> cp c.c d.c

cp (or any program wanting to do something like this) expects two arguments
to be entered. But with auto-expansion, it cannot tell how many the user actually typed.

And the justification? Well, %ENVIRONMENTVARIABLE% gets converted in
Windows, so why not?!
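To see why cp cannot tell the difference, here is a rough simulation of what a POSIX shell hands to the program. It is a sketch, not bash's actual algorithm. `shell_expand` is a hypothetical helper, `listing` stands in for the current directory, and quoting, brace expansion and locale-dependent sorting are ignored. A pattern that matches nothing is passed through literally, which is bash's default behaviour.

```python
import fnmatch

GLOB_CHARS = set("*?[")


def shell_expand(words, listing):
    """Roughly mimic how a POSIX shell expands unquoted words against `listing`."""
    argv = []
    for word in words:
        if not GLOB_CHARS & set(word):
            argv.append(word)  # no wildcard characters: passed through as typed
            continue
        matches = sorted(
            name
            for name in listing
            if fnmatch.fnmatchcase(name, word)
            # like the shell, wildcards don't implicitly match a leading '.'
            and (not name.startswith(".") or word.startswith("."))
        )
        # bash's default: a pattern that matches nothing is passed on literally
        argv.extend(matches or [word])
    return argv


print(shell_expand(["cp", "*.c"], ["c.c", "d.c", "notes.txt"]))  # ['cp', 'c.c', 'd.c']
print(shell_expand(["calculate", "3*5"], ["345"]))  # ['calculate', '345']
print(shell_expand(["calculate", "3*5"], []))  # ['calculate', '3*5']
```

By the time cp runs, its argument list for `cp *.c` is identical to the one for `cp c.c d.c`.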

--
Bartc

Paul Moore

Dec 6, 2016, 9:25:51 AM
On Tuesday, 6 December 2016 13:25:40 UTC, BartC wrote:
> On 06/12/2016 12:40, Gregory Ewing wrote:
> > BartC wrote:
> >> I've given a dozen examples where the shell's auto-expansion can screw
> >> things up.
> >
> > Only because you're taking Windows conventions and trying
> > to apply them to Unix.
>
> What, the convention of NOT assuming that any command parameter that
> uses * or ? MUST refer to whatever set of filenames happen to match in
> the current directory? And to then insert N arbitrary filenames in the
> parameter list.

Correct - with the exception that it's not that it MUST - there's ways to prevent that happening, just ones that aren't the default.

> That's a pretty good convention, yes?!

Yes. It has its benefits.

> (Or should that be yesx! yesy!)

No, because the rules for text in an email are different from those for text in a shell. But you seem to be arguing that the rules should be the same everywhere, so maybe in your world, yes it should be. Most other people understand the concept of context, though.

Paul

Random832

Dec 6, 2016, 9:50:10 AM
On Mon, Dec 5, 2016, at 21:21, Dennis Lee Bieber wrote:
> They are languages in their own right, with their own rules.
>
> The Windows command prompt being one of the weakest -- it doesn't
> support arithmetic and local variables, nor (to my knowledge) looping
> constructs. BAT files are limited to something like 9 parameters (which
> may be the only argument for not expanding wildcards at the command line
> level).

There are only nine that you can name explicitly, but there's no
obstacle to handling more with shift or %*. Also, there is a 'for' loop,
though the syntax is somewhat arcane (and you can always loop with
if/goto).

It can do arithmetic with 'set /a', and there is a 'setlocal' command
for local variable scopes.

Random832

Dec 6, 2016, 10:06:11 AM
Yes, but there was no* 32-bit Windows command interpreter - it ran DOS in
a virtual machine inside it. Windows 3 did the same, actually - the real
architecture of Windows/386 was a 32-bit protected mode host kernel
called VMM32.VXD that ran all of Windows in one virtual machine and each
DOS window in another one, leading to the odd consequence of there being
cooperative multitasking between Windows apps but pre-emptive
multitasking between DOS apps [and between them and Windows].

The fact that Windows was launched at boot by running "win.com" (either
in autoexec.bat or manually at the command line) created a *perception*
that Windows ran "on top of DOS", but running it really *replaced* DOS
in memory, putting the CPU into protected mode and everything. The
ability to boot into (or exit to) DOS was because people still did real
work (and games) in DOS and the virtual environment of DOS-on-Windows
didn't perform well enough to be sufficient.

*Well, I vaguely remember a version of cmd.exe that would run on Windows
98 floating around back in the day, but it certainly didn't come with
the OS. It might have been a pirated Windows NT component.

Michael Torrie

Dec 6, 2016, 10:45:14 AM
On 12/06/2016 04:43 AM, BartC wrote:
>> Read the Python documentation for argparse
>
> I just tried it, but it was too complex for me to set it up so as to
> discover what it did with * arguments.
>
>> Again, start with argparse... Any command line argument that is left
>> after it has parsed the line can likely be considered a "filename".
>
> Only at the end?

No, that's not what he said. After options have been parsed out and
dealt with, whatever is left can be retrieved as the parameters (whether
those are filenames or URLs or something else): all remaining parameters,
wherever they appeared. Some argument parsers do require all options
to be first on the command line. argparse is not one of them. BSD
tools typically do want options first. And actually a lot of Windows
applications are extremely picky about where the options come vs. the
"filespec" parameters.

>> And to handle the difference between Windows and UNIX you'd likely need
>> something like:
>>
>> for aParm in remainingArguments:
>>     for aFile in glob.glob(aParm):
>>         do something with the file
>
> Suppose any argument contains * or ?, isn't a filename, but happens to
> match some files in the current directory. AFAICS it will still screw up.

Precisely! And you can bet there is probably more than one Windows
program out there that incorrectly makes this assumption and does the
wrong thing. Or the opposite is true and there are programs that expect
no globs and can do nothing with them. And for such a buggy program
there's not a darn thing the user can do to escape the glob or otherwise
tell the program it's not a glob.
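The program-side approach from the quoted pseudocode can be made somewhat safer by globbing only on Windows (where the shell leaves wildcards alone) and by keeping an argument literally when it matches nothing. Here is a sketch with a hypothetical `expand_args` helper. It narrows the problem but does not remove it: an argument that is not a filename but happens to match files will still be expanded, and the user still has no way to say "this is not a glob".

```python
import glob
import sys


def expand_args(args, platform=sys.platform):
    """Glob arguments ourselves only where the shell hasn't already done it."""
    if platform != "win32":
        # POSIX shells expand (or, if quoted, deliberately don't expand)
        # wildcards before the program starts, so leave argv alone.
        return list(args)
    expanded = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        # Keep the literal text when nothing matches, so an argument such as
        # 3*5 survives (unless a file that matches it happens to exist).
        expanded.extend(matches if matches else [arg])
    return expanded
```

A script would typically call `expand_args(sys.argv[1:])` once at startup.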

That's why I would far rather place globbing in control of the shell
where a user can properly deal with it, escape it, or otherwise disable
it when necessary.

Yes, shell expansion has its gotchas. But those can all be learned,
whereas it's much harder to learn and remember all the gotchas and bugs
of many individual applications' unique ways of dealing with globs. I'd
rather deal with shells.
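argparse does accept options before, between or after positionals. One catch is worth knowing, though. With plain `parse_args()`, a `nargs='*'` positional meant to collect "whatever is left" only takes the first contiguous run of positionals, so in `a.txt -v b.txt` the second file is reported as unrecognized. Python 3.7 added `parse_intermixed_args()` for this case, so it is not available on the 2.7 used in this thread. The following sketch uses made-up option and argument names:

```python
import argparse

parser = argparse.ArgumentParser(prog="program")
parser.add_argument("-v", "--verbose", action="store_true")
parser.add_argument("files", nargs="*")  # "whatever is left"

# Options and filenames interleaved, the way a user might type them.
argv = ["a.txt", "-v", "b.txt"]

# parser.parse_args(argv) would bind files=['a.txt'] and then reject
# 'b.txt' as unrecognized; parse_intermixed_args() collects both.
args = parser.parse_intermixed_args(argv)
print(args.files, args.verbose)  # ['a.txt', 'b.txt'] True
```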

Michael Torrie

Dec 6, 2016, 11:34:10 AM
On 12/06/2016 06:52 AM, BartC wrote:
> But those would be silly.

But why?

> Some special syntax is known about: | < and > for example. % less so
> (I've never, ever used it in live input AFAIK).

Yup. And why would you think the ? and * special syntax is not known
about, or shouldn't be? Very strange that you would treat them so
differently.

By the way, I use %VAR% environment variable expansion all the time in
Windows. In fact I mostly use it in file open dialog boxes or in the Run
dialog. If I want to see my home folder in a hurry, I'll type Win-R and
then put "%userprofile%" in the box and hit enter. Very convenient. For
me it's faster than browsing through Explorer and picking folders.
Also it should work regardless of locale, and even if folder names are
localized.
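A Python program can apply the same %NAME% syntax itself. `ntpath`, the Windows implementation behind `os.path`, can be imported on any platform, and its `expandvars` understands %VAR% references. Here is a small sketch. `DEMO_PROFILE` is a made-up variable, set only so the example is self-contained.

```python
import ntpath
import os

# Made-up variable, set here only so the example is self-contained.
os.environ["DEMO_PROFILE"] = r"C:\Users\demo"

print(ntpath.expandvars(r"%DEMO_PROFILE%\Documents"))  # C:\Users\demo\Documents

# Unknown names are left exactly as typed rather than replaced with "".
print(ntpath.expandvars(r"%NO_SUCH_VAR_FOR_DEMO%\x"))
```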

> This auto-expansion causes so many problems, places so many restrictions
> on what can be done, that it doesn't make sense. Auto-expanding
> parameters is such a big deal, that it should not be the default
> behaviour; it needs something to tell the command processor to expand.

Yet you seem to be unable to see that applications doing their own
expansion can also cause problems and restrictions and even worse,
there's not a darn thing you as a user can do about it.

>
> Then you don't get utterly ridiculous and dangerous behaviour such as
> the cp example Paul Moore came up with (that trumps most of mine actually):

It's potentially dangerous, agreed. So are lots of commands like rm -rf
/ (which GNU rm, at least, refuses to run by default). If you understand a few basic
rules of expansion, you can understand easily what happened or would
happen. I'm not sure but I think many distros by default in the shell
profiles alias cp="cp -i" and rm="rm -i" to help newbies. I know the
root account has those aliases by default. I'm pretty sure I've disabled
those aliases for my personal user account because they get in the way
of a lot of my common operations.

Again, I point out that these behaviors can be changed and altered by
the user if he so desires. Right at the shell level. Instead of having
to alter applications themselves that aren't too smart about things.



BartC

Dec 6, 2016, 11:41:59 AM
On 06/12/2016 15:44, Michael Torrie wrote:
> On 12/06/2016 04:43 AM, BartC wrote:

> Yes, shell expansion has its gotchas. But those can all be learned,

Yes, learn to avoid wildcards in command parameters at all costs. But we
both know that is not satisfactory.

And it's not much help when someone types in:

program *

and the program has to try and clean up the mess, if it can see it is a
mess. But remember:

cp *.c

There might be some irate users out there if it can't detect a simple
user error like that.

> whereas it's much harder to learn and remember all the gotchas and bugs
> of many individual applications' unique ways of dealing with globs. I'd
> rather deal with shells.
>

OK, I understand. Having a program launcher indiscriminately transform
input A into A' is /much/ better than having it done by the program,
even if the program doesn't want it and it is not meaningful.

The fact that you also lose format information (is it three parameters,
or one parameter transformed into three) is an extra bonus.

This is clearly much better than any other scheme because:

(1) It's Unix not Windows; everything in Unix is always better and
always makes sense.

(2) There have been 40 years of it working this way and there have never
been any problems. (That is, 40 years of writing programs with
stultified command line interfaces to make sure that is always the case.)
[As for 'problems' such as the 'cp *.c' one, that's a feature!]

--
Bartc

MRAB

Dec 6, 2016, 11:53:53 AM
On 2016-12-06 13:08, eryk sun wrote:
> On Tue, Dec 6, 2016 at 12:26 PM, Chris Angelico <ros...@gmail.com> wrote:
>> On Tue, Dec 6, 2016 at 10:56 PM, BartC <b...@freeuk.com> wrote:
>>> In that directory (which was on Windows but accessible via a virtual Linux),
>>> typing any Linux command followed by * would have required all 3.4 million
>>> directory entries to be accessed in order to build a 3.4 million-element
>>> argv list. I've no idea how long that would have taken.
>>
>> I just asked Python to build me a 4-million-element list, and it took
>> no visible time - a small fraction of a second. Don't be afraid of
>> large argument lists. We're not writing 8088 Assembly Language
>> programs in 64KB of working memory here.
>
> The problem isn't building an arbitrary list with millions of
> elements. The problem is the time it would take to read millions of
> filenames from a directory. It depends on the performance of the disk
> and filesystem. Be careful with globbing. Think about the consequences
> before running a command, especially if you're in the habit of
> creating directories with hundreds of thousands, or millions, of
> files. It's not a problem that I've ever had to deal with.
>
Many years ago I was working with a database application running on
MSDOS that stored predefined queries in files, one query per file. There
were many queries (though fewer than a thousand), resulting in many
small files in a single directory. Fetching one of those predefined
queries was surprisingly slow.

I found that it was a lot faster to put them into a single file and then
call an external program to extract the one wanted. It also took up a
lot less disk space!

MRAB

Dec 6, 2016, 12:00:38 PM
On 2016-12-06 13:52, BartC wrote:
> On 06/12/2016 13:34, Chris Angelico wrote:
>> On Wed, Dec 7, 2016 at 12:25 AM, BartC <b...@freeuk.com> wrote:
>>> What, the convention of NOT assuming that any command parameter that uses *
>>> or ? MUST refer to whatever set of filenames happen to match in the current
>>> directory? And to then insert N arbitrary filenames in the parameter list.
>>>
>>> That's a pretty good convention, yes?!
>>
>> Right! And while you're at it, stop assuming that percent signs have
>> meaning, that quotes have meaning, etc, etc, etc. Right? And why
>> should the enter key be significant - what if you want to send more
>> than one line of command line arguments?
>
> But those would be silly.
>
> Some special syntax is known about: | < and > for example. % less so
> (I've never, ever used it in live input AFAIK).
>
[snip]

You've never used ImageMagick?

If you want to shrink an image to half its size:

convert dragon.gif -resize 50% half_dragon.gif

BartC

Dec 6, 2016, 12:36:38 PM
No. But that '50%' is seen by apps (eg. with Python's 'print sys.argv')
on both Windows and Linux.

However, I can imagine that a calculator app might have trouble with
some expressions:

calculate 3*5

This expression might be seen as 345 if there happens to be a file called
'345' lying around.

--
Bartc

Lew Pitcher

Dec 6, 2016, 1:00:52 PM
On Tuesday December 6 2016 12:36, in comp.lang.python, "BartC" <b...@freeuk.com>
wrote:

> On 06/12/2016 17:00, MRAB wrote:
>> On 2016-12-06 13:52, BartC wrote:
>
>>> Some special syntax is known about: | < and > for example. % less so
>>> (I've never, ever used it in live input AFAIK).
>>>
>> [snip]
>>
>> You've never used ImageMagick?
>>
>> If you want to shrink an image to half its size:
>>
>> convert dragon.gif -resize 50% half_dragon.gif
>
> No. But that '50%' is seen by apps (eg. with Python's 'print sys.argv')
> on both Windows and Linux.
>
> However, I can imagine that a calculator app might have trouble with
> some expressions:
>
> calculate 3*5

I can't see this being an issue with the "calculator app", unless the
calculator app itself is written to perform file globbing.

It /might/ be an issue with the shell, if you permit it to glob
the "calculator app" arguments before invoking the "calculator app" binary.
But, then again, Unix shell filename globbing is something that the user can
disable temporarily or permanently if necessary.

For example:
calculate '3*5'
or
sh -o noglob -c "calculate 3*5"
or even
sh -o noglob
calculate 3*5

> This expression might be seen as 345 if there happens to be a file called
> '345' lying around.

Only if shell globbing is enabled, and you don't specifically bypass it.
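From Python, the simplest way to avoid the shell's globbing and quote removal entirely is not to use a shell at all. If you pass the arguments to `subprocess` as a list, each string reaches the child exactly as written, which also avoids the quote-stripping that started this thread. When a command string for a POSIX shell is unavoidable, `shlex.quote` (Python 3.3+) produces the quoted form. Here is a sketch:

```python
import shlex
import subprocess
import sys

# No shell is involved (shell=False is the default), so nothing is globbed
# and no quotes are removed: the child sees these strings verbatim.
out = subprocess.check_output(
    [sys.executable, "-c", "import sys; print(sys.argv[1:])", "3*5", "column2='R'"],
    universal_newlines=True,
)
print(out.strip())  # ['3*5', "column2='R'"]

# Building a string for a POSIX shell instead: quote each word.
print("calculate " + shlex.quote("3*5"))  # calculate '3*5'
```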


--
Lew Pitcher
"In Skills, We Trust"
PGP public key available upon request

Gregory Ewing

Dec 6, 2016, 4:28:17 PM
BartC wrote:
> What, the convention of NOT assuming that any command parameter that
> uses * or ? MUST refer to whatever set of filenames happen to match in
> the current directory?

Yes.

> That's a pretty good convention, yes?!

That's a matter of opinion. It precludes the shell from
performing various services that Unix users like their
shells to provide.

> Or someone from Britain visiting the USA and saying OMG, everyone's got
> a gun! Suppose someone runs amok and shoots everybody!

If you expect us to trust most Americans not to misuse
their guns, you will appreciate us asking you to trust
most Unix users not to misuse their wildcard patterns.

--
Greg

Gregory Ewing

Dec 6, 2016, 4:44:18 PM
BartC wrote:
> But those would be silly.
>
> Some special syntax is known about: | < and > for example. % less so

What you need to understand is that, to a Unix user,
* and ? are *just as well known* as |, < and >. Perhaps
even more so, because they're likely to be used much
sooner than piping and redirection. And when learning
about them, the fact that they're interpreted by the
shell is learned at the same time.

> And the justification? Well, %ENVIRONMENTVARIABLE% gets converted in
> Windows, so why not?!

No, the justification is that the Unix convention allows
the shell to provide certain useful functions that Unix
users value.

If you don't want those functions, you're free to write
your own shell that works however you want. Complaining
that everyone *else* should want the same things you
want is not reasonable.

--
Greg

Cameron Simpson

Dec 6, 2016, 5:03:46 PM
On 06Dec2016 16:41, BartC <b...@freeuk.com> wrote:
>On 06/12/2016 15:44, Michael Torrie wrote:
>>On 12/06/2016 04:43 AM, BartC wrote:
>>Yes, shell expansion has its gotchas. But those can all be learned,
>
>Yes, learn to avoid wildcards in command parameters at all costs. But
>we both know that is not satisfactory.

Sigh. I'm sure this has all been thrashed out in this huge thread, but: that is
what quoting is for (in the shell): to control whether wildcards are expanded
or not. You, the typist, get to decide.

>And it's not much help when someone types in:
>
> program *
>
>and the program has to try and clean up the mess, if it can see it is
>a mess.

Some invocations will be nonsense, and a program may catch those. But if that
was actually the typist's intent, and the program says "nah!"?

>But remember:
>
> cp *.c
>
>There might be some irate users out there if it can't detect a simple
>user error like that.

If there are _2_ .c files present, that will indeed misbehave. But often there
are several, and cp detects the "copy several things into one directory"
form, as in:

   cp a.c b.c d.c target-dir

so when *.c expands to "cp a.c b.c d.c" it complains that the final target
(e.g. "d.c") is not a directory. A degree of safety.
This is the circumstance where the request, as it is received, is nonsense and
detectably so. Not perfectly robust, but you can never be perfectly robust
against the typist making a mistaken request, globbing or not.
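The check described above can be sketched as a function that looks only at the argument list cp receives, which is all cp ever has to go on. `cp_usage_error` and its messages are hypothetical and do not reproduce GNU cp's wording. The two-file case shows why `cp *.c` with exactly two matches slips through.

```python
import os


def cp_usage_error(args, isdir=os.path.isdir):
    """Return an error message for a cp-style argument list, or None if it looks usable."""
    if len(args) < 2:
        return "missing destination operand"
    sources, target = args[:-1], args[-1]
    if len(sources) > 1 and not isdir(target):
        return "target %r is not a directory" % target
    # Also reached for 'cp c.c d.c', which may really have been 'cp *.c'.
    return None


print(cp_usage_error(["a.c", "b.c", "d.c"], isdir=lambda p: False))  # target 'd.c' is not a directory
print(cp_usage_error(["c.c", "d.c"], isdir=lambda p: False))  # None
```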

>>whereas it's much harder to learn and remember all the gotchas and bugs
>>of many individual applications' unique ways of dealing with globs. I'd
>>rather deal with shells.
>
>OK, I understand. Having a program launcher indiscriminately transform
>input A into A' is /much/ better than having it done by the program,
>even if the program doesn't want it and it is not meaningful.

It isn't indiscriminate; quoting lets the typist discriminate. Michael's point
is that at least you're always using the _same_ command line conventions about
what shall be expanded and what shall not, rather than a rule per app.

>The fact that you also lose format information (is it three parameters, or one
>parameter transformed into three) is an extra bonus.

Feh. If *.c matches 3 files and I didn't quote, it is 3 parameters in my mind.
If I wanted one parameter I'd have said '*.c' (<== quotes).
>
>This is clearly much better than any other scheme because:
>
>(1) It's Unix not Windows; everything in Unix is always better and
>always makes sense.

UNIX rules are often simpler and more consistent.

>(2) There have been 40 years of it working this way and there have
>never been any problems. (That is, 40 years of writing programs with
>stultified command line interfaces to make sure that is always the case.) [As
>for 'problems' such as the 'cp *.c' one, that's a feature!]

Nothing prevents you writing an extremely simple shell yourself you know. It
needn't expand anything. _Or_ you could have it adopt the inverse convention:
expand nothing unless asked. Eg:

cp G:*.c

to cause "*.c" to get expanded. Of course, because the various executables
don't feel any burden to implement globbing you may run into some impedance
mismatch.

Cheers,
Cameron Simpson <c...@zip.com.au>

Chris Angelico

Dec 6, 2016, 5:24:54 PM
On Wed, Dec 7, 2016 at 8:46 AM, Cameron Simpson <c...@zip.com.au> wrote:
> Nothing prevents you writing an extremely simple shell yourself you know. It
> needn't expand anything. _Or_ you could have it adopt the inverse
> convention: expand nothing unless asked. Eg:
>
> cp G:*.c
>
> to cause "*.c" to get expanded. Of course, because the various executables
> don't feel any burden to implement globbing you may run into some impedance
> mismatch.

I actually have something like this in one application's inbuilt
command executor - it does globbing ONLY if the parameter starts with
a question mark (eg "?*.c" will glob "*.c" in that directory). It's
deliberate, but frankly, it ends up annoying more often than not. I'm
considering switching to a more normal way of doing it.

ChrisA
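An opt-in scheme like the one described above (glob only words that start with '?') is easy to sketch. `expand_opt_in` is hypothetical. Passing an unmatched pattern through without its marker is an assumption, and `listing` stands in for the directory contents.

```python
import fnmatch


def expand_opt_in(words, listing, marker="?"):
    """Glob only words that start with `marker`; pass everything else through."""
    argv = []
    for word in words:
        if not word.startswith(marker):
            argv.append(word)  # literal, even if it contains * or ?
            continue
        pattern = word[len(marker):]
        matches = sorted(name for name in listing if fnmatch.fnmatchcase(name, pattern))
        # Assumption: an unmatched pattern is passed on without its marker.
        argv.extend(matches or [pattern])
    return argv


print(expand_opt_in(["cp", "*.c", "?*.c"], ["a.c", "b.c"]))  # ['cp', '*.c', 'a.c', 'b.c']
```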