On Tue, Apr 10, 2012 at 4:23 AM, Johannes Schindelin
<Johannes....@gmx.de> wrote:
> Hi Karsten,
>
> On Thu, 1 Mar 2012, karste...@dcon.de wrote:
>
>> As of "Win32: Thread-safe windows console output", git-log no longer
>> terminates when the pager process dies. [...]
>
> Sorry for the long delay. Happily, I did not forget to include this in
> time for 1.7.10.
>
> Thanks so much!
> Dscho
>
It seems some variation of issue is still kind-of present, but not in
the case I presented. However, this time it was already present before
Karsten's patch.
Exiting the pager on "git show" on a very big commit (typically the
initial commit on some big project imported from some other VCS) still
makes the process hang until git finishes outputting the diff.
The symptoms are the same; console_thread is stuck inside ReadFile,
which doesn't return until all the output has been written. However,
the "git show" code-path doesn't call maybe_flush_or_die.
Now, it seems that the big difference from performing the same test on
linux is that wait_for_pager_signal never gets triggered. I'm guessing
this is because the pager is supposed to issue a SIGHUP to it's parent
process, or that a write to a disconnected pipe should issue SIGPIPE.
This obviously doesn't work on Windows, but perhaps we can emulate
SIGPIPE it by wrapping read/write/fread/fwrite etc and calling
raise(). Yes, Karsten managed to get rid of some of these wrappings,
which was nice. It's a shame if we need to re-introduce them. But I
think it's better than hanging, which makes git feel somewhat slow for
some operations on huge repositories.
Now, in addition it turns out something else is spooky; running just a
simple "git show" triggers a SIGTRAP in winansi_exit:
Program received signal SIGTRAP, Trace/breakpoint trap.
0x7778f9d2 in ntdll!RtlUpdateClonedSRWLock ()
from D:\Windows\system32\ntdll.dll
(gdb) bt
#0 0x7778f9d2 in ntdll!RtlUpdateClonedSRWLock ()
from D:\Windows\system32\ntdll.dll
#1 0x7778f9d2 in ntdll!RtlUpdateClonedSRWLock ()
from D:\Windows\system32\ntdll.dll
#2 0x75c5b9f2 in KERNELBASE!CloseThreadpoolIo ()
from D:\Windows\syswow64\KernelBase.dll
#3 0x76901438 in KERNEL32!GetPrivateProfileStructA ()
from D:\Windows\syswow64\kernel32.dll
#4 0x00000074 in ?? ()
#5 0x00000001 in ?? ()
#6 0x0028fe40 in ?? ()
#7 0x0051d0a7 in winansi_exit () at compat/winansi.c:437
Backtrace stopped: frame did not save the PC
This is the "CloseHandle(hwrite1);"-line. If I continue, I get the
same SIGTRAP from the "CloseHandle(hwrite2);"-line as well.
I doubt this is related to the problem above, but it seems suspicious
nevertheless.
Some debugging reveals that CloseHandle(hwrite1) returns FALSE, and
GetLastError() returns ERROR_INVALID_HANDLE. So it seems somehow,
hwrite1 isn't valid at this point. Or perhaps CloseHandle isn't the
right way to close a duplicated handle. MSDN says:
"Normally the target process closes a duplicated handle when that
process is finished using the handle. To close a duplicated handle
from the source process, call DuplicateHandle with the following
parameters:
* Set hSourceProcessHandle to the target process from the
DuplicateHandle call that created the handle.
* Set hSourceHandle to the duplicated handle to close.
* Set lpTargetHandle to NULL.
* Set dwOptions to DUPLICATE_CLOSE_SOURCE."
I'm not entirely sure if this means that this is the way to close
handles duplicated in this way, but it might be. A quick test doesn't
look promising, though:
---8<--
diff --git a/compat/winansi.c b/compat/winansi.c
index 040ef5a..8fadea9 100644
--- a/compat/winansi.c
+++ b/compat/winansi.c
@@ -420,6 +420,23 @@ static DWORD WINAPI console_thread(LPVOID unused)
return 0;
}
+static void die_lasterr(const char *fmt, ...)
+{
+ va_list params;
+ va_start(params, fmt);
+ errno = err_win_to_posix(GetLastError());
+ die_errno(fmt, params);
+ va_end(params);
+}
+
+static void close_handle(HANDLE hnd)
+{
+ HANDLE hproc = GetCurrentProcess();
+ if (!DuplicateHandle(hproc, hnd, hproc, NULL, 0, FALSE,
+ DUPLICATE_CLOSE_SOURCE))
+ die_lasterr("DuplicateHandle(%li) failed", (long) hnd);
+}
+
static void winansi_exit(void)
{
/* flush all streams */
@@ -434,22 +451,13 @@ static void winansi_exit(void)
/* cleanup handles... */
if (hwrite1 != INVALID_HANDLE_VALUE)
- CloseHandle(hwrite1);
+ close_handle(hwrite1);
if (hwrite2 != INVALID_HANDLE_VALUE)
- CloseHandle(hwrite2);
+ close_handle(hwrite2);
CloseHandle(hwrite);
CloseHandle(hthread);
}
-static void die_lasterr(const char *fmt, ...)
-{
- va_list params;
- va_start(params, fmt);
- errno = err_win_to_posix(GetLastError());
- die_errno(fmt, params);
- va_end(params);
-}
-
static HANDLE duplicate_handle(HANDLE hnd)
{
HANDLE hresult, hproc = GetCurrentProcess();
---8<--
Not only does this not at all solve the problem, it makes it worse, by
printing the error a bunch of times. Turns out, it's not a very good
idea to print errors when we're in the process of taking down stderr
;)
Hmm, it turns out raise(SIGPIPE) doesn't work on MSVC, as SIGPIPE
(defined by us in compat/mingw.h) isn't a valid argument to MSVC's
raise(), although signal seems to accept it (the CRT doesn't complain,
at least).
So we need to emulate it.
Here's my quick stab at fixing the problem, and it seems to do the trick:
---
diff --git a/compat/mingw.c b/compat/mingw.c
index e6f331a..d18abb9 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -375,7 +375,19 @@ ssize_t mingw_write(int fd, const void *buf, size_t count)
* the net without changing the number of WriteFile() calls in
* the local case.
*/
- return write(fd, buf, min(count, 31 * 1024 * 1024));
+ int ret = write(fd, buf, min(count, 31 * 1024 * 1024));
+ if (ret < 0 && errno == EPIPE)
+ mingw_raise(SIGPIPE);
+ return ret;
+}
+
+#undef fwrite
+size_t mingw_fwrite(const void *ptr, size_t size, size_t nitems, FILE *fp)
+{
+ size_t ret = fwrite(ptr, size, nitems, fp);
+ if (ret < nitems && ferror(fp) == EPIPE)
+ mingw_raise(SIGPIPE);
+ return ret;
}
FILE *mingw_fopen (const char *filename, const char *otype)
@@ -1662,6 +1674,7 @@ static HANDLE timer_thread;
static int timer_interval;
static int one_shot;
static sig_handler_t timer_fn = SIG_DFL;
+static sig_handler_t pipe_fn = SIG_DFL;
/* The timer works like this:
* The thread, ticktack(), is a trivial routine that most of the time
@@ -1768,11 +1781,37 @@ int sigaction(int sig, struct sigaction *in,
struct sigaction *out)
#undef signal
sig_handler_t mingw_signal(int sig, sig_handler_t handler)
{
- sig_handler_t old = timer_fn;
- if (sig != SIGALRM)
+ sig_handler_t old;
+ switch (sig) {
+ case SIGALRM:
+ old = timer_fn;
+ timer_fn = handler;
+ return old;
+
+ case SIGPIPE:
+ old = pipe_fn;
+ pipe_fn = handler;
+ return old;
+
+ default:
return signal(sig, handler);
- timer_fn = handler;
- return old;
+ }
+}
+
+#undef raise
+int mingw_raise(int sig)
+{
+ if (sig == SIGPIPE) {
+ if (pipe_fn == SIG_DFL) {
+ fprintf(stderr, "Program terminated with signal SIGPIPE, Broken pipe.\n");
+ exit(SIGPIPE + 128);
+ }
+ if (pipe_fn != SIG_IGN)
+ pipe_fn(SIGPIPE);
+ return 0;
+ }
+
+ return raise(sig);
}
static const char *make_backslash_path(const char *path)
diff --git a/compat/mingw.h b/compat/mingw.h
index ad87177..c7b2cec 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -176,6 +176,9 @@ int mingw_open (const char *filename, int oflags, ...);
ssize_t mingw_write(int fd, const void *buf, size_t count);
#define write mingw_write
+size_t mingw_fwrite(const void *ptr, size_t size, size_t nitems, FILE *fp);
+#define fwrite mingw_fwrite
+
FILE *mingw_fopen (const char *filename, const char *otype);
#define fopen mingw_fopen
@@ -300,6 +303,9 @@ static inline unsigned int git_ntohl(unsigned int x)
sig_handler_t mingw_signal(int sig, sig_handler_t handler);
#define signal mingw_signal
+int mingw_raise(int sig);
+#define raise mingw_raise
+
/*
* ANSI emulation wrappers
*/
maybe_flush_or_die gets called after every commit (see
log-tree.c::log_tree_commit()), and showing a single commit should
_usually_ be fast enough that you don't notice any delay. Don't you think
that your imported ueber-commit is a bit of an esoteric corner-case?
ferror() just returns a flag, the actual error code is lost. The value of
the flag (_IOERR = 0x20) just happens to be the same as EPIPE, so that
'ferror(fp) == EPIPE' is true for any IO error.
To reliably detect EPIPE, you'd have to override _every_ stdio write
function and check errno (i.e. fwrite, [f]puts, [f]putc, [v][f]printf,
fflush...). That's quite a few overrides, probably with noticeable
performance impact.
If you really need that additional responsiveness, wouldn't it be simpler
to add maybe_flush_or_die in a few strategic places (e.g. if a commit
changes more than, say, 100 files, call maybe_flush_or_die() after every
diff instead of after every commit...)?
There's an enlightning discussion from 2007 in which Linus himself
explains the motivation behind maybe_flush_or_die [1] (that doesn't
consider that SIGPIPE is broken on Windows, though).
[1] http://thread.gmane.org/gmane.comp.version-control.git/50918
Cheers,
Karsten
Not even by a long shot in my real-world use-case. Showing some
commits that imports a *lot of code* makes git spend minutes trying to
finish.
> Don't you think
> that your imported ueber-commit is a bit of an esoteric corner-case?
Imported über-commit? This is code I wrote myself, not imported from
somewhere else.
But I'd prefer we just do the right thing when EPIPE occurs, which is
exiting the process, and flagging it in the return-code.
And no, I don't think this is an esoteric corner-case; I think it's
making the EPIPE signal work like git expects it to. This isn't a
theoretical case, it's a case I bump into quite often. For me, one of
the big reasons to use git is it's performance.
Thanks for pointing that out. Something like this seems slightly more
correct, even though the raising SIGPIPE on EINVAL (for the same
reason as maybe_flush_or_die treats EPIPE and EINVAL the same) feels
somewhat wrong. Ugh.
---8<---
diff --git a/compat/mingw.c b/compat/mingw.c
index d18abb9..309fa1d 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -385,7 +385,7 @@ ssize_t mingw_write(int fd, const void *buf, size_t count)
size_t mingw_fwrite(const void *ptr, size_t size, size_t nitems, FILE *fp)
{
size_t ret = fwrite(ptr, size, nitems, fp);
- if (ret < nitems && ferror(fp) == EPIPE)
+ if (ret < nitems && ferror(fp) && (errno == EPIPE || errno == EINVAL))
mingw_raise(SIGPIPE);
return ret;
}
---8<---
> To reliably detect EPIPE, you'd have to override _every_ stdio write
> function and check errno (i.e. fwrite, [f]puts, [f]putc, [v][f]printf,
> fflush...). That's quite a few overrides, probably with noticeable
> performance impact.
>
There's little point in jumping the gun on all variations; fwrite
alone is enough for my issue. The wrapping of write could probably be
left out until proven otherwise; this was just a quick stab.
> If you really need that additional responsiveness, wouldn't it be simpler
> to add maybe_flush_or_die in a few strategic places (e.g. if a commit
> changes more than, say, 100 files, call maybe_flush_or_die() after every
> diff instead of after every commit...)?
That might work in practice, with the exception of very large
text-files. This isn't something I have myself, so it'll work for me.
But I'd rather see the exit-on-SIGPIPE design that Git tries to
implement work on Windows. Doing so makes sure the sigchain gets
properly unwound.
But it proves to be a little bit tricky, at least without dirty hacks.
I'll see if I can come up with something better than the hack I posted
above, though. If not, calling maybe_flush_or_die every now and then
might be a good idea.
> There's an enlightning discussion from 2007 in which Linus himself
> explains the motivation behind maybe_flush_or_die [1] (that doesn't
> consider that SIGPIPE is broken on Windows, though).
>
> [1] http://thread.gmane.org/gmane.comp.version-control.git/50918
Thanks for the reference, it's an interesting read.
Hey, you brought up the 'imported from some other VCS' yourself, so I
(perhaps wrongly) assumed that this is a one-time thing. But I agree that
its always better to do the right thing (if it can be done with reasonable
effort, see below).
My tests showed that EINVAL originates from WriteFile returning
ERROR_NO_DATA after the pipe has been closed. So we can reliably check for
broken pipes by doing an empty write on the unterlying OS handle, like
this:
---8<---
int is_broken_pipe_error(int fd)
{
DWORD dummy, err;
int saved_errno = errno;
HANDLE hnd = (HANDLE) _get_osfhandle(fd);
errno = saved_errno;
if (hnd == INVALID_HANDLE_VALUE || GetFileType(hnd) !=
FILE_TYPE_PIPE)
return 0;
/* OS handle is a pipe, do an empty write to check the Win32 error
*/
if (WriteFile(hnd, (LPVOID) &dummy, 0, &dummy, NULL))
return 0;
err = GetLastError();
return err == ERROR_BROKEN_PIPE ||
err == ERROR_NO_DATA ||
err == ERROR_NOT_CONNECTED;
}
---8<---
I haven't encountered ERROR_NOT_CONNECTED in my tests, but it seems
reasonable to check that, too.
Unfortunately, MSVCRT's flush always clears the stream buffer, even if the
write fails, so all stream IO functions will happily succeed most of the
time (just filling the buffer) even if the pipe is broken. So its probably
better to just check ferror instead of return value + errno.
By passing the stdio function's return value through a helper function, it
is thus possible to override all the relevant functions used by git with
just a few defines (might be a bit counter-intuitive at first glance, but
works just fine):
---8<---
static inline int check_broken_pipe(FILE *file, int ioresult)
{
if (ferror(file) && is_broken_pipe_error(fileno(file)))
mingw_raise(SIGPIPE);
return ioresult;
}
#define fwrite(ptr, size, nitems, file) (size_t) check_broken_pipe(file, \
(int) fwrite(ptr, size, nitems, file))
#define fputc(c, file) check_broken_pipe(file, fputc(c, file))
#define putc(c, file) check_broken_pipe(file, putc(c, file))
#define putchar(c) check_broken_pipe(stdout, putchar(c))
#define fputs(s, file) check_broken_pipe(file, fputs(s, file))
#define puts(s) check_broken_pipe(stdout, puts(s))
#define fprintf(file, ...) check_broken_pipe(file, fprintf(file,
__VA_ARGS__))
#define printf(...) check_broken_pipe(stdout, printf(__VA_ARGS__))
#define vfprintf(file, format, args) check_broken_pipe(file, \
vfprintf(file, format, args))
#define vprintf(format, args) check_broken_pipe(stdout, \
vprintf(format, args))
---8<---
Ciao,
Karsten
I'm sorry, I misunderstood you. My bad.
Yes, it's an imported übercommit, indeed. But that's something I often
work with, because git-svn on Windows is dog slow. So I tend to only
import a partial history, and in such cases, viewing the root-commit
becomes painful.
Nice, thanks.
> I haven't encountered ERROR_NOT_CONNECTED in my tests, but it seems
> reasonable to check that, too.
>
> Unfortunately, MSVCRT's flush always clears the stream buffer, even if the
> write fails, so all stream IO functions will happily succeed most of the
> time (just filling the buffer) even if the pipe is broken. So its probably
> better to just check ferror instead of return value + errno.
>
Could we overload the flush-function to prevent that from happening?
I'm not saying it's the best approach, just trying to evaluate the
options... Checking the return-value (when/if we can) is probably less
costly than calling ferror.
Wow. I like this approach a lot. One minor nit is that we *might* want
to name the helper check_broken_write_pipe rather than
check_broken_pipe in case we need to add a version that checks if the
read-end of a pipe is broken. Perhaps.
But I wonder, does this work with MSVC? It doesn't implement C99, so
I'm not sure if we can use __VA_ARGS__...
[snip]
> >
> > Unfortunately, MSVCRT's flush always clears the stream buffer, even if
the
> > write fails, so all stream IO functions will happily succeed most of
the
> > time (just filling the buffer) even if the pipe is broken. So its
probably
> > better to just check ferror instead of return value + errno.
> >
>
> Could we overload the flush-function to prevent that from happening?
> I'm not saying it's the best approach, just trying to evaluate the
> options...
I meant the flush that gets called internally by all f* functions if the
buffer is full. So I'm afraid we cannot override this one.
> Checking the return-value (when/if we can) is probably less
> costly than calling ferror.
>
ferror is just a macro that checks a bit in the FILE structure, so
performace-wise it should be similar to checking the return value (two or
three machine instructions). Checking errno involves several library
calls, though.
I'd use feof to check for end of input. But feel free to tweak that code
however you like, its just a suggestion (you'd probably want to rename
is_broken_pipe_error as well).
> But I wonder, does this work with MSVC? It doesn't implement C99, so
> I'm not sure if we can use __VA_ARGS__...
I haven't tried it, but "Support for variadic macros was introduced in
Visual C++ 2005." [1]
[1] http://msdn.microsoft.com/en-us/library/ms177415%28v=vs.80%29.aspx
If you also set errno in the helper and add a define for fflush (and a
NULL check, as fflush(NULL) is valid), you could remove the EINVAL hack in
maybe_flush_or_die:
---8<---
static inline int check_broken_pipe(FILE *file, int ioresult)
{
if (file && ferror(file) && is_broken_pipe_error(fileno(file))) {
mingw_raise(SIGPIPE);
errno = EPIPE;
}
return ioresult;
}
#define fflush(file) check_broken_pipe(file, fflush(file))
...
---8<---
and:
--- a/write_or_die.c
+++ b/write_or_die.c
@@ -34,12 +34,7 @@ void maybe_flush_or_die(FILE *f, const char *desc)
return;
}
if (fflush(f)) {
- /*
- * On Windows, EPIPE is returned only by the first write()
- * after the reading end has closed its handle; subsequent
- * write()s return EINVAL.
- */
- if (errno == EPIPE || errno == EINVAL)
+ if (errno == EPIPE)
exit(0);
die_errno("write failure on '%s'", desc);
}
Oooh, that's neat! Thanks a lot for all the input, I'll roll a new
version incorporating the suggestions soonish :)
Ouch...the signal handler (pager.c::wait_for_pager()) also calls fflush,
and if fflush raises SIGPIPE, we get recursive. Shouldn't raise() disable
the signal handler while executing it?
if (pipe_fn != SIG_IGN) {
sig_handler_t tmp_fn = pipe_fn;
pipe_fn = SIG_IGN;
tmp_fn(SIGPIPE);
pipe_fn = tmp_fn;
}
Hm, not quite. A signal-handler can change the signal handler itself
and if we do this, we'll lose the new signal handler.
wait_for_pager_signal does this (through sigchain_pop), so we'll fail
to unwind the signal-chain.
I wonder a bit what is really supposed to happen in this case. fflush
isn't listed in opengroup's list of async-signal-safe functions. But
on the other hand write, which can also raise a signal, is listed. So
it seems signals-handlers are supposed to be allowed to call other
signal-generating functions.
Reading the POSIX spec on signals doesn't enlight me much... :(