Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

quotearg.c's shell_quoting_style and MinGW

31 views
Skip to first unread message

Eli Zaretskii

unread,
May 5, 2012, 12:54:07 PM5/5/12
to bug-g...@gnu.org, bug-gn...@gnu.org
When quoting a shell command argument (using shell_quoting_style or
shell_always_quoting_style), quotearg.c uses the '..' style of
quoting. This doesn't work for MinGW, which supports only the ".."
style. This causes, e.g., diff3 to fail for file names with embedded
whitespace or other special characters, when compiled for MinGW.

The changes to quotearg.c to support this are quite minor. The
following somewhat kludgey change demonstrates what needs to be done.
Is there a more elegant way?

2012-05-05 Eli Zaretskii <el...@gnu.org>

* lib/quotearg.c (shell_quoting_char): New static variable.
[__MINGW32__]: Set shell_quoting_char to '"'.
(quotearg_buffer_restyled): Use shell_quoting_char instead of
literal '\''.

--- lib/quotearg.c~0 2011-05-18 21:31:30.000000000 +0300
+++ lib/quotearg.c 2012-05-05 19:40:41.904625000 +0300
@@ -91,6 +91,13 @@ enum quoting_style const quoting_style_v
/* The default quoting options. */
static struct quoting_options default_quoting_options;

+/* The (system-dependent) shell quoting character. */
+#ifdef __MINGW32__
+static int shell_quoting_char = '"';
+#else
+static int shell_quoting_char = '\'';
+#endif
+
/* Allocate a new set of quoting options, with contents initially identical
to O if O is not null, or to the default if O is null.
It is the caller's responsibility to free the result. */
@@ -286,8 +293,11 @@ quotearg_buffer_restyled (char *buffer,
/* Fall through. */
case shell_always_quoting_style:
if (!elide_outer_quotes)
- STORE ('\'');
- quote_string = "'";
+ STORE (shell_quoting_char);
+ if (shell_quoting_char == '\'')
+ quote_string = "'";
+ else
+ quote_string = "\"";
quote_string_len = 1;
break;

@@ -389,7 +399,10 @@ quotearg_buffer_restyled (char *buffer,
case '\r': esc = 'r'; goto c_and_shell_escape;
case '\t': esc = 't'; goto c_and_shell_escape;
case '\v': esc = 'v'; goto c_escape;
- case '\\': esc = c;
+ case '\\':
+ if (shell_quoting_char == '"')
+ break;
+ esc = c;
/* No need to escape the escape if we are trying to elide
outer quotes and nothing else is problematic. */
if (backslash_escapes && elide_outer_quotes && quote_string_len)
@@ -429,13 +442,22 @@ quotearg_buffer_restyled (char *buffer,
be the first bytes of multibyte characters, which means
we should check them with mbrtowc, but in practice this
doesn't happen so it's not worth worrying about. */
- if (quoting_style == shell_always_quoting_style
- && elide_outer_quotes)
- goto force_outer_quoting_style;
+ if (quoting_style == shell_always_quoting_style)
+ {
+ if (elide_outer_quotes)
+ goto force_outer_quoting_style;
+ if (c == '"' && c == shell_quoting_char)
+ {
+ STORE ('"');
+ STORE ('\\');
+ STORE ('"');
+ }
+ }
break;

case '\'':
- if (quoting_style == shell_always_quoting_style)
+ if (quoting_style == shell_always_quoting_style
+ && shell_quoting_char == '\'')
{
if (elide_outer_quotes)
goto force_outer_quoting_style;

Bruno Haible

unread,
May 5, 2012, 2:08:45 PM5/5/12
to bug-g...@gnu.org, Eli Zaretskii, bug-gn...@gnu.org
Eli Zaretskii wrote:
> When quoting a shell command argument (using shell_quoting_style or
> shell_always_quoting_style), quotearg.c uses the '..' style of
> quoting. This doesn't work for MinGW, which supports only the ".."
> style. This causes, e.g., diff3 to fail for file names with embedded
> whitespace or other special characters, when compiled for MinGW.

What exactly did not work? Does 'diff3' give an error message that
it cannot find the file? Does a unit test from the diffutils package
not work? Does 'diff3' produce an output that is incompatible with
other tools (such as 'patch' or 'emacs')?

In other words, please state a "How to reproduce" recipe.

> +/* The (system-dependent) shell quoting character. */
> +#ifdef __MINGW32__

This #ifdef is true for mingw but false for native Windows executables
built with MSVC. Is that what you intended?

> + if (quoting_style == shell_always_quoting_style)
> + {
> + if (elide_outer_quotes)
> + goto force_outer_quoting_style;
> + if (c == '"' && c == shell_quoting_char)
> + {
> + STORE ('"');
> + STORE ('\\');
> + STORE ('"');
> + }
> + }

No tabs in gnulib source code, please. See gnulib/README for a way
to avoid tabs.

Bruno


Eli Zaretskii

unread,
May 5, 2012, 3:14:04 PM5/5/12
to Bruno Haible, bug-gn...@gnu.org, bug-g...@gnu.org
> From: Bruno Haible <br...@clisp.org>
> Cc: bug-gn...@gnu.org
> Date: Sat, 05 May 2012 20:08:45 +0200
>
> Eli Zaretskii wrote:
> > When quoting a shell command argument (using shell_quoting_style or
> > shell_always_quoting_style), quotearg.c uses the '..' style of
> > quoting. This doesn't work for MinGW, which supports only the ".."
> > style. This causes, e.g., diff3 to fail for file names with embedded
> > whitespace or other special characters, when compiled for MinGW.
>
> What exactly did not work? Does 'diff3' give an error message that
> it cannot find the file? Does a unit test from the diffutils package
> not work? Does 'diff3' produce an output that is incompatible with
> other tools (such as 'patch' or 'emacs')?

Diff3 failed with an error message.

> In other words, please state a "How to reproduce" recipe.

Here:

D:\gnu\diffutils-3.2\src>.\diff3 ../../diffutils-2.8.7/src/system.h system.h "sys tem.$h$"
diff: extra operand `tem.$h$''
diff: Try `diff --help' for more information.
D:\gnu\diffutils-3.2\src/./diff3.exe: subsidiary program `diff' failed (exit status 2)

> > +/* The (system-dependent) shell quoting character. */
> > +#ifdef __MINGW32__
>
> This #ifdef is true for mingw but false for native Windows executables
> built with MSVC. Is that what you intended?

I only tested the code with MinGW. But I don't mind using

#if (defined _WIN32 || defined __WIN32__) && ! defined __CYGWIN__

instead, if you are okay with assuming it will work with MSVC.

> > + if (quoting_style == shell_always_quoting_style)
> > + {
> > + if (elide_outer_quotes)
> > + goto force_outer_quoting_style;
> > + if (c == '"' && c == shell_quoting_char)
> > + {
> > + STORE ('"');
> > + STORE ('\\');
> > + STORE ('"');
> > + }
> > + }
>
> No tabs in gnulib source code, please.

Sorry, I didn't know that. Do you need me to submit TAB-less diffs?

Thanks.

Paul Eggert

unread,
May 6, 2012, 2:22:13 AM5/6/12
to Eli Zaretskii, bug-gn...@gnu.org, bug-g...@gnu.org
On 05/05/2012 09:54 AM, Eli Zaretskii wrote:
> When quoting a shell command argument (using shell_quoting_style or
> shell_always_quoting_style), quotearg.c uses the '..' style of
> quoting. This doesn't work for MinGW, which supports only the ".."
> style. This causes, e.g., diff3 to fail for file names with embedded
> whitespace or other special characters, when compiled for MinGW.

Unfortunately quotearg is used not just to quote
strings for the shell, but also to quote strings POSIX-shell-style
for stdout. That is, in practice shell_quoting_style doesn't mean
"quote this safely for whatever shell the current system happens to
be using", it means "quote this safely for POSIX shell syntax".

I suggest instead putting a wrapper around popen and system,
so that they translate quotes from the POSIX syntax to the
syntax expected by mingw. That would fix the problem with diff,
and it would be a more-useful fix anyway, as it would catch
instances of this problem that can occur even when quotearg is
not used. This wrapper would be at the gnulib level.

Bruno Haible

unread,
May 6, 2012, 7:49:14 AM5/6/12
to Eli Zaretskii, Paul Eggert, bug-gn...@gnu.org, bug-g...@gnu.org
Eli Zaretskii wrote:
> D:\gnu\diffutils-3.2\src>.\diff3 ../../diffutils-2.8.7/src/system.h system.h "sys tem.$h$"
> diff: extra operand `tem.$h$''
> diff: Try `diff --help' for more information.
> D:\gnu\diffutils-3.2\src/./diff3.exe: subsidiary program `diff' failed (exit status 2)

Thanks. This piece of info is essential for understanding and analyzing
the problem.

diff3 receives the arguments as expected (otherwise it would have
complained about too many arguments).

diff3 invokes diff, in function read_diff(), file src/diff3.c.
Similar code also is in function edit(), file src/sdiff.c,
and in function begin_output(), file src/util.c.
In doing so, it prepares a command line, by using shell_quote() from gnulib,
and passes this command to popen() or system().

And shell_quote() happens to invoke quotearg with shell_quoting_style.

This works fine on Unix, where popen() and system() pass the command to
/bin/sh.

But on native Windows, there are two ways to execute commands:
a) through the Windows function CreateProcess(); this is what gnulib's
modules 'execute' and 'spawn-pipe' do.
b) through the cmd.exe program; this is what popen() and system() do.

Both need different quoting.
- For a), you find an implementation in gnulib/lib/w32spawn.h.
- For b), it will be even more complicated, because arguments have to
be passed through a first CreateProcess() (from diff3 to cmd.exe),
are then subject to parsing by cmd.exe, and are then passed through
a second CreateProcess() call (from cmd.exe to diff).

Modifying quotearg.c, like you propose, would have an effect
- on quotearg, but as Paul said, quotearg's behaviour assumes a POSIX sh,
- on sh-quote, but here too the comments in sh-quote.h indicate a POSIX
shell.
In particular, there is another use of shell_quote() in src/diff.c,
function option_list(). It is used here:

$ mkdir foo ; echo Hi > foo/text
$ mkdir gaz ; echo Hello > gaz/text
$ diff -r -x foo\ bar foo gaz
diff -r -x 'foo bar' foo/text gaz/text
1c1
< Hi
---
> Hello

Your proposed patch would lead to a different 'diff' output on native Windows.
Which is obviously undesirable.

I can see two possible actions:

1) Change the three locations in src/diff3.c, src/sdiff.c, src/util.c to
use the gnulib modules 'execute' and 'spawn-pipe'.

This will have the following advantages:
- It will fix the quoting problem at the right place.
- It will speed up the subprocess invocation, by bypassing cmd.exe.
- If you apply these changes also to the HAVE_WORKING_FORK code branch,
it will speed up the subprocess invocation also on Unix, by
bypassing /bin/sh.

A command interpreter like /bin/sh or cmd.exe is useful when a command
has redirections or may be invoking a script. But here you know that
the command is a simple program invocation without redirections and that
the program being invoked (diff.exe) is not a script. Therefore it is
possible to optimize it.

2) If you want to keep the unoptimized subprocess invocation through
popen() and system(), it would be possible to add two quoting styles
to quotearg: one for a), one for b). And add gnulib API, similar to
the 'sh-quote' module, that uses the quoting style b).

Paul, what is your preference? Mine would be 1).

> I only tested the code with MinGW. But I don't mind using
>
> #if (defined _WIN32 || defined __WIN32__) && ! defined __CYGWIN__
>
> instead, if you are okay with assuming it will work with MSVC.

Yes, this is the right #if for native Windows. Nothing in popen() or
system() is specific to mingw (as opposed to MSVC).

Bruno


Paul Eggert

unread,
May 6, 2012, 12:23:31 PM5/6/12
to Bruno Haible, bug-gn...@gnu.org, bug-g...@gnu.org
On 05/06/2012 04:49 AM, Bruno Haible wrote:

> 1) Change the three locations in src/diff3.c, src/sdiff.c, src/util.c to
> use the gnulib modules 'execute' and 'spawn-pipe'....

This would add (by my count) 72 source files to diff, and would
add some complexity to diffutils proper. I'd rather keep things
simple if we can (my most recent change to diff was to *remove* a
gnulib module, yay! ...), or at least substitute a simpler set of
gnulib modules that would require less fiddling with diffutils
proper.

> - If you apply these changes also to the HAVE_WORKING_FORK code branch,
> it will speed up the subprocess invocation also on Unix, by
> bypassing /bin/sh.

/bin/sh is already bypassed on Unix, since the code invokes execvp
or execl in that case, so I don't see how these changes would
benefit Unix performance.

The motivation for the existing code is to have something that is
simple and easy to understand in portable terms and works
efficiently on GNU and similar systems, and to do something
similar (though efficiency is less of a concern) on systems that
lack 'fork'. These other systems include not just Microsoft
Windows, but also other platforms such as OpenVMS.

What about the other idea, to add gnulib support for popen and for
system to have them quote properly for mingw? That would require
no changes to diffutils proper (other than a one-line addition to
bootstrap.conf) and would presumably need far fewer than 72 source
files.

Eli Zaretskii

unread,
May 6, 2012, 12:26:46 PM5/6/12
to Paul Eggert, bug-gn...@gnu.org, bug-g...@gnu.org
> Date: Sat, 05 May 2012 23:22:13 -0700
> From: Paul Eggert <egg...@cs.ucla.edu>
> CC: bug-g...@gnu.org, bug-gn...@gnu.org
>
> Unfortunately quotearg is used not just to quote
> strings for the shell, but also to quote strings POSIX-shell-style
> for stdout. That is, in practice shell_quoting_style doesn't mean
> "quote this safely for whatever shell the current system happens to
> be using", it means "quote this safely for POSIX shell syntax".

I don't understand. MinGW doesn't have a Posix shell, it is a native
Windows environment. So a MinGW program running on Windows does not
need to cater to Posix shells more than a program running on a Posix
platform should cater to a Windows shell. What use case did you have
in mind?

If there are some hypothetical use cases where such a mix makes sense,
they must be so rare that such a program should use the custom quoting
style, or simply not use gnulib for the quoting.

> I suggest instead putting a wrapper around popen and system,
> so that they translate quotes from the POSIX syntax to the
> syntax expected by mingw. That would fix the problem with diff,
> and it would be a more-useful fix anyway, as it would catch
> instances of this problem that can occur even when quotearg is
> not used. This wrapper would be at the gnulib level.

That doesn't sound right. popen and friends don't know enough about
the context to DTRT with quoting. How would they know which style of
quoting is expected from them?

Paul Eggert

unread,
May 6, 2012, 12:41:50 PM5/6/12
to Eli Zaretskii, bug-gn...@gnu.org, bug-g...@gnu.org
On 05/06/2012 09:26 AM, Eli Zaretskii wrote:
> a MinGW program running on Windows does not
> need to cater to Posix shells

It's not a question of catering to Posix shells, it's
a question of what the output from 'diff' looks like.
It's better to have 'diff' output be consistent on
all platforms.

> popen and friends don't know enough about
> the context to DTRT with quoting.

On mingw the popen implementation can assume that popen's
argument is quoted suitably for POSIX, since that's the popen API.
The idea is that the caller of popen can assume POSIXish behavior,
and that it's up to the popen implementation to arrange for that.
This is the standard way that gnulib works.

Eli Zaretskii

unread,
May 6, 2012, 12:41:54 PM5/6/12
to Bruno Haible, bug-gn...@gnu.org, egg...@cs.ucla.edu, bug-g...@gnu.org
> From: Bruno Haible <br...@clisp.org>
> Cc: bug-g...@gnu.org, bug-gn...@gnu.org
> Bcc: br...@haible.de
> Date: Sun, 06 May 2012 13:49:14 +0200
>
> diff3 receives the arguments as expected

Yes, of course: it receives them from the command line, correctly
quoted.
> But on native Windows, there are two ways to execute commands:
> a) through the Windows function CreateProcess(); this is what gnulib's
> modules 'execute' and 'spawn-pipe' do.
> b) through the cmd.exe program; this is what popen() and system() do.
>
> Both need different quoting.
> - For a), you find an implementation in gnulib/lib/w32spawn.h.
> - For b), it will be even more complicated, because arguments have to
> be passed through a first CreateProcess() (from diff3 to cmd.exe),
> are then subject to parsing by cmd.exe, and are then passed through
> a second CreateProcess() call (from cmd.exe to diff).

In practice, the processing by cmd.exe doesn't change anything,
because cmd.exe almost always leaves the quotes unaltered. The rare
situation where it doesn't (described by "cmd /?") will at most cause
removal of quotes around the 'diff' program itself, which is harmless,
because cmd will invoke it correctly, and we are always interested in
quoting the rest of arguments.

> Modifying quotearg.c, like you propose, would have an effect
> - on quotearg, but as Paul said, quotearg's behaviour assumes a POSIX sh,

I don't see why quotearg should assume a Posix shell when compiled
into a MinGW program. I'd like to hear what kinds of use cases
require that on MS-Windows.

> - on sh-quote, but here too the comments in sh-quote.h indicate a POSIX
> shell.

Comments should certainly be synchronized with code. I didn't do that
because I wanted to hear agreement with the idea and/or instructions
for additional changes.

> In particular, there is another use of shell_quote() in src/diff.c,
> function option_list(). It is used here:
>
> $ mkdir foo ; echo Hi > foo/text
> $ mkdir gaz ; echo Hello > gaz/text
> $ diff -r -x foo\ bar foo gaz
> diff -r -x 'foo bar' foo/text gaz/text
> 1c1
> < Hi
> ---
> > Hello
>
> Your proposed patch would lead to a different 'diff' output on native Windows.
> Which is obviously undesirable.

A command line such as

diff -r -x foo\ bar foo gaz

is impossible on MS-Windows, because whitespace cannot be
escape-protected with a backslash there, as you well know. The
equivalent of this for Windows is

diff -r -x "foo bar" foo gaz

which will produce the same result as on Posix platforms (unless I'm
missing something).

> I can see two possible actions:
>
> 1) Change the three locations in src/diff3.c, src/sdiff.c, src/util.c to
> use the gnulib modules 'execute' and 'spawn-pipe'.
>
> This will have the following advantages:
> - It will fix the quoting problem at the right place.
> - It will speed up the subprocess invocation, by bypassing cmd.exe.
> - If you apply these changes also to the HAVE_WORKING_FORK code branch,
> it will speed up the subprocess invocation also on Unix, by
> bypassing /bin/sh.

Unfortunately, I lack knowledge and tools to make this happen. I
don't know how to import a gnulib module into a program, and lack some
of the auto-tools to do that. But if you or someone else provide a
patch, I can surely try it.

Thanks.

Eli Zaretskii

unread,
May 6, 2012, 12:52:50 PM5/6/12
to Paul Eggert, bug-gn...@gnu.org, bug-g...@gnu.org
> Date: Sun, 06 May 2012 09:41:50 -0700
> From: Paul Eggert <egg...@cs.ucla.edu>
> CC: bug-g...@gnu.org, bug-gn...@gnu.org
>
> On 05/06/2012 09:26 AM, Eli Zaretskii wrote:
> > a MinGW program running on Windows does not
> > need to cater to Posix shells
>
> It's not a question of catering to Posix shells, it's
> a question of what the output from 'diff' looks like.
> It's better to have 'diff' output be consistent on
> all platforms.

But since 'diff' output can be fed to a shell or to another program,
it should be quoted for those other programs to be able to process it
correctly. I don't see how can these two contradicting requirements
be reconciled.

> > popen and friends don't know enough about
> > the context to DTRT with quoting.
>
> On mingw the popen implementation can assume that popen's
> argument is quoted suitably for POSIX, since that's the popen API.

There was never such an implementation of popen on any non-Posix
platform (nor can there be, at least on Windows, because of the need
to support native file names with backslashes). So such a requirement
is a pipe dream.

Paul Eggert

unread,
May 6, 2012, 1:13:13 PM5/6/12
to Eli Zaretskii, bug-gn...@gnu.org, bug-g...@gnu.org
On 05/06/2012 09:52 AM, Eli Zaretskii wrote:
> But since 'diff' output can be fed to a shell or to another program,

It's a judgment call, but in this particular case the output of
diff is intended more for the user than for other programs, and
there is a good argument for consistency in output format regardless
of platform.

> There was never such an implementation of popen on any non-Posix
> platform (nor can there be, at least on Windows, because of the need
> to support native file names with backslashes).

The idea here is not to reimplement popen from scratch in gnulib,
or to have gnulib popen support every feature that POSIX requires.
The idea is only to have a popen wrapper that works assuming that
its argument is quoted for the POSIX shell, using quotearg.
Such arguments have a simple format that can easily be converted
to mingw shell format. Once this simple conversion is done, the
wrapper call the real mingw popen.

This should work with file names that contain backslashes, just
as it'd work for any other character.

Bruno Haible

unread,
May 6, 2012, 1:29:22 PM5/6/12
to Paul Eggert, bug-gn...@gnu.org, bug-g...@gnu.org
Hi Paul,

> > use the gnulib modules 'execute' and 'spawn-pipe'....
>
> This would add (by my count) 72 source files to diff

So what? It is no maintenance burden, because "gnulib-tool --update"
does all a maintainer needs.

> > - If you apply these changes also to the HAVE_WORKING_FORK code branch,
> > it will speed up the subprocess invocation also on Unix, by
> > bypassing /bin/sh.
>
> /bin/sh is already bypassed on Unix, since the code invokes execvp
> or execl in that case, so I don't see how these changes would
> benefit Unix performance.

For src/util.c you are right. But in src/diff3.c and src/sdiff.c, popen()
is invoked, which invokes a shell, and that shell will immediately
fork() and exec() again to execute the 'diff' program.

> The motivation for the existing code is to have something that is
> simple and easy to understand in portable terms and works
> efficiently on GNU and similar systems, and to do something
> similar (though efficiency is less of a concern) on systems that
> lack 'fork'. These other systems include not just Microsoft
> Windows, but also other platforms such as OpenVMS.

OK, then the code structure you would like to have is

#if HAVE_WORKING_FORK
(code for Unix)
#else
(prepare arguments)
#if (defined _WIN32 || defined __WIN32__) && !defined __CYGWIN__
(quote the arguments the Windows way, case b)
#endif
(invoke popen or system)
#endif

Is that right?

> What about the other idea, to add gnulib support for popen and for
> system to have them quote properly for mingw? That would require
> no changes to diffutils proper (other than a one-line addition to
> bootstrap.conf) and would presumably need far fewer than 72 source
> files.

popen() takes a POSIX sh command as argument, possibly with function
definitions, loops, redirections, and what else. You want code that
rewrites that into a cmd.exe command? This cannot be done in less
than a man-year, and will likely also require more than 72 source files.

What is, more generally, on Unix, the advantage of using system()/popen()
rather than the lower-lever, more efficient exec*/spawn-pipe?
It is that you can easily add more complex commands, with redirections,
logging, reference to environment variables, and such. This is useful
for a program that is in rapid active development. But even in such a
case, the commands are in sh syntax. Complex commands, redirections,
logging, reference to environment variables, etc. all have a different
syntax with cmd.exe. So the maximum gnulib can reasonably offer is that
the author of the code writes two variants of the command, one in sh
syntax and one in cmd.exe syntax.

For this reason, I think it may be useful in gnulib to have a Windows-
specific argument quoting function. It has to be Windows specific, since
you mention OpenVMS and I certainly won't wrote VMS specific code again.

Bruno


Eli Zaretskii

unread,
May 6, 2012, 1:29:12 PM5/6/12
to Paul Eggert, bug-gn...@gnu.org, bug-g...@gnu.org
> Date: Sun, 06 May 2012 10:13:13 -0700
> From: Paul Eggert <egg...@cs.ucla.edu>
> CC: bug-g...@gnu.org, bug-gn...@gnu.org
>
> > There was never such an implementation of popen on any non-Posix
> > platform (nor can there be, at least on Windows, because of the need
> > to support native file names with backslashes).
>
> The idea here is not to reimplement popen from scratch in gnulib,
> or to have gnulib popen support every feature that POSIX requires.
> The idea is only to have a popen wrapper that works assuming that
> its argument is quoted for the POSIX shell, using quotearg.
> Such arguments have a simple format that can easily be converted
> to mingw shell format. Once this simple conversion is done, the
> wrapper call the real mingw popen.

You will break gobs of programs that use gnulib, if you implement
this.

Bruno Haible

unread,
May 6, 2012, 1:41:39 PM5/6/12
to Eli Zaretskii, bug-gn...@gnu.org, egg...@cs.ucla.edu, bug-g...@gnu.org
Eli Zaretskii wrote:
> The rare situation where it doesn't (described by "cmd /?")

Rare or not - the text that you refer to mentions that the following
characters need to be quoted as well when passing through cmd.exe:
&()[]{}^=;!'+,`~
And they forgot to mention at least % which also has an interpretation.

That's why I'm saying that when passing through cmd.exe you need a
more complex quoting.

> > In particular, there is another use of shell_quote() in src/diff.c,
> > function option_list(). It is used here:
> >
> > $ mkdir foo ; echo Hi > foo/text
> > $ mkdir gaz ; echo Hello > gaz/text
> > $ diff -r -x foo\ bar foo gaz
> > diff -r -x 'foo bar' foo/text gaz/text
> > 1c1
> > < Hi
> > ---
> > > Hello
> >
> > Your proposed patch would lead to a different 'diff' output on native Windows.
> > Which is obviously undesirable.
>
> A command line such as
>
> diff -r -x foo\ bar foo gaz
>
> is impossible on MS-Windows

I was assuming a Cygwin or MSYS shell with diff.exe being a native Windows
program.

But you are splitting hairs. The main points is that
1) We don't want to change the default output of 'diff' on Unix.
2) We don't want 'diff' to produce different output on Windows than on
Unix (modulo end-of-line convention),
3) Your patch would not only have had an effect on the popen() and
system() invocations but also on the 'diff' output.

Bruno


Eli Zaretskii

unread,
May 6, 2012, 1:54:48 PM5/6/12
to Bruno Haible, bug-gn...@gnu.org, egg...@cs.ucla.edu, bug-g...@gnu.org
> From: Bruno Haible <br...@clisp.org>
> Cc: egg...@cs.ucla.edu, bug-g...@gnu.org, bug-gn...@gnu.org
> Bcc: br...@haible.de
> Date: Sun, 06 May 2012 19:41:39 +0200
>
> Eli Zaretskii wrote:
> > The rare situation where it doesn't (described by "cmd /?")
>
> Rare or not - the text that you refer to mentions that the following
> characters need to be quoted as well when passing through cmd.exe:
> &()[]{}^=;!'+,`~

Aren't these all handled by quotearg, after the changes that I sent?
If not, which ones did I miss?

> And they forgot to mention at least % which also has an interpretation.

The % character only needs quoting if you have %FOO%, _and_ FOO
happens to be defined as an environment variable. Otherwise, it will
be preserved.

When % does need to be quoted, it cannot be in the way on which
quotearg's design is based, because "%FOO%" will still expand the
variable.

> That's why I'm saying that when passing through cmd.exe you need a
> more complex quoting.

With the exception of %, you don't.

> > A command line such as
> >
> > diff -r -x foo\ bar foo gaz
> >
> > is impossible on MS-Windows
>
> I was assuming a Cygwin or MSYS shell with diff.exe being a native Windows
> program.

I don't see a need to support such a mix; life is too short and too
hard. These Posix-like environments have their own versions of diff
and diff3, which use the original Posix code. People who mix them are
asking for trouble.

> But you are splitting hairs.

Are you sure it's me who's splitting hair? ;-)

> 1) We don't want to change the default output of 'diff' on Unix.

Of course. The change I suggested fits this requirement.

> 2) We don't want 'diff' to produce different output on Windows than on
> Unix (modulo end-of-line convention),

I think this is fundamentally wrong: the required compatibility will;
certainly break some uses of 'diff's output. That is a wrong way of
handling the kind of conflict between two requirements that we have
here. Correct functionality should come before identical form, since
the latter is nothing more than convenience.

Paul Eggert

unread,
May 6, 2012, 2:03:45 PM5/6/12
to Bruno Haible, bug-gn...@gnu.org, bug-g...@gnu.org
On 05/06/2012 10:29 AM, Bruno Haible wrote:

>> This would add (by my count) 72 source files to diff
>
> So what? It is no maintenance burden, because "gnulib-tool --update"
> does all a maintainer needs.

Yes and no. gnulib-tool imports the files automatically,
but the maintainer still needs to deal with the burden of
people sending in bug reports about the added files and so
forth. The extra files introduce other minor hassles such as
'grep -r' producing unwanted matches. None of these problems are
enormous, but they are burdens, and I'd rather avoid them
if possible.

Let's put it this way. diffutils proper has only 77 source
files total. This includes everything: C files,
documentation, makefiles, etc. Adding 72 files seems
a bit much just to solve a simple quoting problem.

>> I don't see how these changes would benefit Unix performance.
>
> For src/util.c you are right. But in src/diff3.c and src/sdiff.c, popen()
> is invoked,

Sorry, there still seems to be a misunderstanding here.
In src/diff3.c the pattern is this:

#if HAVE_WORKING_FORK
...
#else
...
fpipe = popen (command, "r");
...
#endif

which means that popen is not invoked on Unix. src/sdiff.c is similar.
The popen code is intended only to be executed on non-POSIXish
systems, so worries about efficiency on Unix seem to be misplaced.

I should mention that the popen code does have the nice
property that it can be easily *tested* on POSIXish systems,
simply by compiling with HAVE_WORKING_FORK set to 0, and
this is something I want to retain as much as possible.

> OK, then the code structure you would like to have is
>
> #if HAVE_WORKING_FORK
> (code for Unix)
> #else
> (prepare arguments)
> #if (defined _WIN32 || defined __WIN32__)&& !defined __CYGWIN__
> (quote the arguments the Windows way, case b)
> #endif
> (invoke popen or system)
> #endif
>
> Is that right?

No, the code structure I'd like in util.c, diff3.c and
sdiff.c is exactly what's there now. Zero changes there.

> popen() takes a POSIX sh command as argument, possibly with function
> definitions, loops, redirections, and what else. You want code that
> rewrites that into a cmd.exe command?

No, I don't want that. All we need is code that redoes POSIX
shell quoting into quoting suitable for mingw.

We don't need a complete popen implementation -- all we need is
a popen implementation that's good enough to satisfy our
porting needs. This is a common situation in gnulib.


Bruno Haible

unread,
May 6, 2012, 2:16:18 PM5/6/12
to Paul Eggert, bug-gn...@gnu.org, bug-g...@gnu.org
Paul Eggert wrote:
> Sorry, there still seems to be a misunderstanding here.
> In src/diff3.c the pattern is this:
>
> #if HAVE_WORKING_FORK
> ...
> #else
> ...
> fpipe = popen (command, "r");
> ...
> #endif
>
> which means that popen is not invoked on Unix.

You're right. I was looking at the wrong spot twice in a row.
Apologies.

My argument about "spawn-pipe would be more efficient on Unix" is therefore
void.

Btuno


Bruno Haible

unread,
May 6, 2012, 2:33:05 PM5/6/12
to bug-g...@gnu.org, Paul Eggert, bug-gn...@gnu.org
Paul Eggert wrote:
> The idea here is not to reimplement popen from scratch in gnulib,
> or to have gnulib popen support every feature that POSIX requires.
> The idea is only to have a popen wrapper that works assuming that
> its argument is quoted for the POSIX shell, using quotearg.
> Such arguments have a simple format that can easily be converted
> to mingw shell format. Once this simple conversion is done, the
> wrapper call the real mingw popen.

Whereas my proposal is to provide a windows_cmd_quote function similar
to the shell_quote function, for use with popen() or system() on
native Windows.

Your proposal has the following drawbacks:

* Like Eli said, a popen() or system() override is going to break
programs that have already constructed the command line for Windows.

(Similar to what we experience with the socket functions override
that take an 'int' file descriptor instead of a 'HANDLE': It broke
parts of GNU clisp, because clisp uses some native Windows API in
some corners and gnulib code in other places.)

* A program like 'diff3' would construct a POSIX sh command to just
immediately afterwards parse it and decompose it. It is an inefficient
way of passing arguments. A simple array argv[] of unquoted strings
is way more efficient.

* The popen() and system() functions would only understand a limited
subset of the set of possible sh commands. No specific error code
exists to signal to the caller such a syntax violation. If a developer
adds " 2>&1 | tee /tmp/log" to such a command and it doesn't work,
how will he be notified of his mistake?

Whereas in a function that takes an array of strings as arguments,
the limits are evident from the API.

Each of these drawbacks is not severe. But can we avoid them nevertheless?

Whereas for my proposal I see the following drawbacks:

* The programmer has to explicitly write code with #if for native Windows.

Bruno


Bruno Haible

unread,
May 6, 2012, 2:56:54 PM5/6/12
to Eli Zaretskii, bug-gn...@gnu.org, egg...@cs.ucla.edu, bug-g...@gnu.org
Eli Zaretskii wrote:
> When % does need to be quoted, it cannot be in the way on which
> quotearg's design is based, because "%FOO%" will still expand the
> variable.

How does % need to be quoted, then? Can you find out, please? This info
would be useful both for Paul's proposal and for mine.

Bruno


Paul Eggert

unread,
May 6, 2012, 3:33:38 PM5/6/12
to Eli Zaretskii, bug-gn...@gnu.org, bug-g...@gnu.org
On 05/06/2012 10:29 AM, Eli Zaretskii wrote:
> You will break gobs of programs that use gnulib

Not at all. Only programs that specifically ask
for the new module, such as diffutils, would get
the proposed requoting behavior.

John Cowan

unread,
May 6, 2012, 3:33:40 PM5/6/12
to Bruno Haible, bug-g...@gnu.org, egg...@cs.ucla.edu, bug-gn...@gnu.org
Bruno Haible scripsit:

> How does % need to be quoted, then? Can you find out, please? This info
> would be useful both for Paul's proposal and for mine.

Strings of the form %FOO% in cmd.exe commands are expanded iff FOO is
an existing environment variable; otherwise, the string is left alone.
All other uses of % are likewise left alone. This expansion is done
*before* quotation marks are interpreted, so there is no way to escape
it; you simply have to hope that you don't get hosed by it.

--
We are lost, lost. No name, no business, no Precious, nothing. Only empty.
Only hungry: yes, we are hungry. A few little fishes, nassty bony little
fishes, for a poor creature, and they say death. So wise they are; so just,
so very just. --Gollum co...@ccil.org http://ccil.org/~cowan

Bruno Haible

unread,
May 6, 2012, 3:34:43 PM5/6/12
to bug-g...@gnu.org, Paul Eggert, bug-gn...@gnu.org
Here are two more proposals, designed to avoid the drawbacks of the
previous ones:

3) A new API

/* Constructs a command line for the platform's command interpreter
that executes argv[0] with the arguments argv[1...] and invokes
popen() on it. */
FILE *popenv (const char * const *argv, const char *type);

/* Constructs a command line for the platform's command interpreter
that executes argv[0] with the arguments argv[1...] and invokes
system() on it. */
int systemv (const char * const *argv);

4) A new API

/* Constructs a command line for the platform's command interpreter
that executes argv[0] with the arguments argv[1...]. */
char *create_system_command (const char * const *argv);

These APIs would not break existing programs, would not do redundant
processing, would not be artificially limited, and would be usable without
any #ifs.

Bruno


Eli Zaretskii

unread,
May 6, 2012, 3:36:33 PM5/6/12
to Bruno Haible, bug-gn...@gnu.org, egg...@cs.ucla.edu, bug-g...@gnu.org
> Date: Sun, 06 May 2012 20:56:54 +0200
>
> Eli Zaretskii wrote:
> > When % does need to be quoted, it cannot be in the way on which
> > quotearg's design is based, because "%FOO%" will still expand the
> > variable.
>
> How does % need to be quoted, then? Can you find out, please?

Like this:

% => "%"

IOW, if you have % characters in a string, each such character should
be quoted separately. Thus, %Path% is quoted as "%"Path"%".

(Actually, it's enough to quote only the first %, if we can find a
simple enough algorithm to do that.)

Eli Zaretskii

unread,
May 6, 2012, 3:42:03 PM5/6/12
to Bruno Haible, bug-gn...@gnu.org, egg...@cs.ucla.edu, bug-g...@gnu.org
> From: Bruno Haible <br...@clisp.org>
> Cc: Paul Eggert <egg...@cs.ucla.edu>, Eli Zaretskii <el...@gnu.org>, bug-gn...@gnu.org
> Bcc: br...@haible.de
> Date: Sun, 06 May 2012 20:33:05 +0200
>
> Whereas for my proposal I see the following drawbacks:
>
> * The programmer has to explicitly write code with #if for native Windows.

This drawback could perhaps be avoided if, instead having an
"ms_shell" style in addition to the current "Posix shell" style, we
use a different set of styles: a "shell command-line argument" quoting
style and a "file-name" quoting style. The former would be used for
producing shell commands, and will use platform-dependent quoting,
similar to the patch I sent. The latter would be used for quoting
file names which are not meant to be passed to a shell, but rather
written as part of a program's output; that latter style will produce
identical results on all platforms.

Would this be okay?

Bruno Haible

unread,
May 6, 2012, 4:17:29 PM5/6/12
to Eli Zaretskii, bug-gn...@gnu.org, egg...@cs.ucla.edu, bug-g...@gnu.org
Eli Zaretskii wrote:
> if you have % characters in a string, each such character should
> be quoted separately. Thus, %Path% is quoted as "%"Path"%".

Perfect. Thanks.


Bruno Haible

unread,
May 6, 2012, 4:25:12 PM5/6/12
to Eli Zaretskii, bug-gn...@gnu.org, egg...@cs.ucla.edu, bug-g...@gnu.org
Eli Zaretskii wrote:
> use a different set of styles: a "shell command-line argument" quoting
> style and a "file-name" quoting style. The former would be used for
> producing shell commands, and will use platform-dependent quoting,
> similar to the patch I sent. The latter would be used for quoting
> file names which are not meant to be passed to a shell

Yes, this idea of thinking is good: Different purposes need different
ways of quoting.

But instead of putting it all into the 'quotearg' module - whose primary
purpose has historically been error messages and file names -, I would
assiociate the "file-name" quoting style with the 'quotearg' module and
the "command-line argument" style with the following new API, closely
modeled on sh-quote.h.

=============================== system-quote.h ===============================
/* Quoting for a system command.
Copyright (C) 2001-2012 Free Software Foundation, Inc.
Written by Bruno Haible <br...@clisp.org>, 2012.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>. */

#ifndef _SYSTEM_QUOTE_H
#define _SYSTEM_QUOTE_H

/* When passing a command the system's command interpreter, we must quote the
program name and arguments, since
- Unix shells interpret characters like " ", "'", "<", ">", "$" etc. in a
special way,
- Windows CreateProcess() interprets characters like ' ', '\t', '\\', '"'
etc. (but not '<' and '>') in a special way,
- Windows cmd.exe also interprets characters like '<', '>', '&', '%', etc.
in a special way. */

#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Identifier for the kind of interpreter of the command. */
enum system_command_interpreter
{
/* The interpreter used by the system() and popen() functions.
This is equivalent to SCI_POSIX_SH on Unix platforms and
SCI_WINDOWS_CMD on native Windows platforms. */
SCI_SYSTEM
/* The POSIX /bin/sh. */
, SCI_POSIX_SH
#if (defined _WIN32 || defined __WIN32__) && ! defined __CYGWIN__
/* The native Windows CreateProcess() function. */
, SCI_WINDOWS_CREATEPROCESS
/* The native Windows cmd.exe interpreter. */
, SCI_WINDOWS_CMD
#endif
};

/* Returns the number of bytes needed for the quoted string. */
extern size_t
system_quote_length (enum system_command_interpreter interpreter,
const char *string);

/* Copies the quoted string to p and returns the incremented p.
There must be room for shell_quote_length (string) + 1 bytes at p. */
extern char *
system_quote_copy (enum system_command_interpreter interpreter,
char *p, const char *string);

/* Returns the freshly allocated quoted string. */
extern char *
system_quote (enum system_command_interpreter interpreter,
const char *string);

/* Returns a freshly allocated string containing all argument strings, quoted,
separated through spaces. */
extern char *
system_quote_argv (enum system_command_interpreter interpreter,
const char * const *argv);

#ifdef __cplusplus
}
#endif

#endif /* _SYSTEM_QUOTE_H */
==============================================================================

The system_quote_argv function would be what I called create_system_command
earlier, with the added 'interpreter' argument.

For GNU diffutils, this means just replacing specific uses of
sh_quote with system_quote(SCI_SYSTEM, ...), right before the calls to
popen() and system().

How does that sound?

Bruno


Eli Zaretskii

unread,
May 6, 2012, 4:35:58 PM5/6/12
to Bruno Haible, bug-gn...@gnu.org, egg...@cs.ucla.edu, bug-g...@gnu.org
> From: Bruno Haible <br...@clisp.org>
> Cc: bug-g...@gnu.org, egg...@cs.ucla.edu, bug-gn...@gnu.org
> Bcc: br...@haible.de
> Date: Sun, 06 May 2012 22:25:12 +0200
Sounds good to me, thanks.

So on Posix hosts, this new API will probably simply call the
shell-style quoting, while on non-Posix hosts it will use separate
code, is that right?

Bruno Haible

unread,
May 6, 2012, 4:59:20 PM5/6/12
to Eli Zaretskii, bug-gn...@gnu.org, egg...@cs.ucla.edu, bug-g...@gnu.org
Eli Zaretskii wrote:
> So on Posix hosts, this new API will probably simply call the
> shell-style quoting, while on non-Posix hosts it will use separate
> code, is that right?

Right.

Bruno


Paul Eggert

unread,
May 6, 2012, 5:12:23 PM5/6/12
to Bruno Haible, bug-gn...@gnu.org, bug-g...@gnu.org
On 05/06/2012 12:34 PM, Bruno Haible wrote:
> Here are two more proposals, designed to avoid the drawbacks of the
> previous ones:

Thanks. Either of those would work. I prefer option 3
(popenv / systemv) as it offers more opportunity for
optimization on GNUish hosts, which can use vfork or the
equivalent to get better performance. Such optimizations
aren't that important for diffutils but might matter
more in other applications.

Bruno Haible

unread,
May 6, 2012, 6:13:50 PM5/6/12
to Paul Eggert, bug-gn...@gnu.org, bug-g...@gnu.org
Paul Eggert wrote:
> Thanks. Either of those would work.

Good. I'll concentrate on option 4 in the next few days. This part is
needed for option 3 as well.

> I prefer option 3
> (popenv / systemv) as it offers more opportunity for
> optimization on GNUish hosts, which can use vfork or the
> equivalent to get better performance. Such optimizations
> aren't that important for diffutils but might matter
> more in other applications.

OK. I think Eli's immediate needs will be fulfilled by option 4
(system-quote.h), but if you want to implement option 3 (popenv, systemv)
on top of that, feel free to.

For applications that want the optimized code - no use of /bin/sh and
use of vfork() if possible - we currently have the 'execute' and 'spawn-pipe'
modules; so the current situation is not that bad. (It would be even better
if we had a posix_spawn that works also on mingw. Has been on my TODO list
for ages...)

Bruno


0 new messages