Another look at the hang during push over git protocol

1814 views
Skip to first unread message

Jeff Preshing

unread,
May 1, 2012, 12:10:21 AM5/1/12
to msy...@googlegroups.com
Hi all, newcomer to this list and to msysGit. Feel free to point out any
newbie mistakes.

I spent some time studying Issue 457: "Push over git protocol hangs in
msysGit" (having hit it myself) and I'd like to share some observations
about it.
http://code.google.com/p/msysgit/issues/detail?id=457

I built and debugged using MSVC2008, since this is the debugger I am
most familiar with. To recap, here is the sequence of events leading to
deadlock during git push:

1. Main thread: cmd_push() connects to the server, ultimately calls
send_pack()
2. Main thread: send_pack() spawns an async thread to run sideband_demux()
3. Async thread: sideband_demux() begins eating all the responses from
the server, blocks inside read()
4. Main thread: send_pack() spawns a child process with stdout
redirected to the server, to run cmd_pack_objects()
5. Child process: cmd_pack_objects() ultimately attempts to write to
stdout; blocks inside write()
6. Main thread: send_pack() blocks waiting for child process to finish.

The deadlock is at a low level inside ntdll.dll.

Some platform notes (again, a recap for some): From Git's point of view,
everything is a POSIX file descriptor. Under MinGW, all file descriptors
are managed by the CRT, which translates them to Windows file handles.
Under the hood, MinGW also casts Winsock2 socket handles to Windows file
handles, and wraps them in a CRT file descriptor via mingw_socket().
It's this last part that's dubious. There are a lot of caveats when you
use a Winsock2 socket handle as a file handle, and that seems to be what
we're bumping up against here.

I managed to reproduce the problem in two small sample programs, which
I've attached as a .zip. One is a MinGW application, the other is native
Win32. Both samples perform a simple HTTP request to google.com while
attempting to read the response in a separate thread. Everything happens
in one process.

The MinGW sample works fine when ported to Ubuntu (just need to swap a
few headers), but deadlocks under MinGW (at least in the msysGit
environment on Windows 7 x64), in basically the same fashion as git push
(inside ntdll.dll during a simultaneous read/write). It's true that the
Git deadlock is spread across two processes and uses inherited handles,
but I don't think that makes any difference with regards to the
underlying problem, since the callstacks are pretty much the same at
deadlock time.

Here are a couple of notes about the MinGW sample I've provided. The
first note is that if you remove the dup() on line 6, the program will
still deadlock, but at a higher level: inside msvcr90.dll, while
attempting to lock the CRT file descriptor. So in effect, dup() serves
as a way of defeating the CRT lock. Indeed, "git push" calls dup() on
the socket file descriptor once in git_tcp_connect(), and a second time
in start_command() right before spawning the child process. (As many of
you are aware, dup() calls DuplicateHandle() under the hood. There is
some question as to whether it is valid to call DuplicateHandle() on a
Winsock2 socket handle. There are in fact potential issues related to
that, and lots of links talk about it, but I don't think those issues
are directly related to the deadlock problem.)

The second note is that if you move the usleep(50000) on line 37 to the
start of readSocket() in this sample, effectively delaying the reader
thread, the deadlock disappears, because the write() has time to start
and finish before read() begins. I think some people observed the same
thing about the "git push" deadlock as well.

I've also attached a native Win32 application which performs all the
same operations as the MinGW program, and deadlocks in exactly the same
way. Given that all we're trying to do is perform a simultaneous read &
write on a simple socket in two threads -- which should be possible
somehow, but obviously not using these particular APIs -- I hammered on
this sample until I found a combination of APIs which worked. In short,
the following modifications are required:

* Change the WSASocket() call to socket(), or pass the
WSA_FLAG_OVERLAPPED flag to WSASocket() -- both do the same thing.
* Change the read/write calls to send/recv, making sure to pass the
underlying Winsock2 socket handle, and not the wrapped file descriptor.
(eg. just use (SOCKET) _get_osfhandle(sockfd).)

After that, the sample works fine and you can read/write on different
threads to your heart's content! But you need both modifications: If you
simply add the WSA_FLAG_OVERLAPPED flag without changing to send/recv,
the read and write calls will both fail with error codes because the
underlying ReadFile/WriteFile calls fail, for reasons which are
explained here: http://comments.gmane.org/gmane.comp.lib.ev/1726 ... And
if you simply change to send/recv without adding WSA_FLAG_OVERLAPPED,
the program will deadlock inside ws2_32.dll.

Given all these observations, it seems there are three possible areas to
explore in terms of fixing Issue 457. It would be nice to for this
operation to finally work out-of-the-box, given that the git protocol is
a good no-fuss choice for private networks.

1) Modify Git so that it never performs a concurrent read & write on the
same socket. This could be a viable solution. Even the MinGW wiki
suggests that it's normal to modify POSIX applications to run under
MinGW. Many others have suggested this and it seems that's what
disabling sideband-64k accomplishes. But are there other operations in
Git which potentially perform such socket threading, too? What about
future changes to Git?

2) Modify MinGW to support concurrent recv/write on sockets using the
modifications I demonstrated in the Win32 sample. We'd need to wrap
read() as mingw_read() in the same way that write() is currently
wrapped, and somehow redirect socket calls to send/recv. This might also
be a chance to fix other questionable things for sockets, such as the
dup() and close(). At the same time, it would probably break other
things like buffered I/O on sockets via fread/fwrite (which would fall
back to the CRT implementations of read/write). With luck, Git doesn't
do such things, but I'm not familiar enough to say. Also, handle
inheritance would require some finesse. Whatever the modifications end
up being, I doubt anyone would want to push them back to the main MinGW.
But it just might be possible to support Git's use cases.

3) Another possible modification internal to MinGW: Simply avoid
blocking reads from sockets. In such places, detect the socket somehow,
and poll intermittently until data arrives. Network reads are
unpredictable anyway, so no serious harm is done this way. I think
somebody proposed a patch somewhere in this list before, but I can't
find the link now.

That's all I've got at this point. I hope the attached sample programs
will help others better understand the issue to some extent. Hopefully a
better understanding will help us choose the best solution.

In your opinion, which approaches makes the most sense?

BTW, thanks for Git for Windows & msysGit.

Regards,
Jeff Preshing

samples.zip

Johannes Sixt

unread,
May 2, 2012, 3:08:48 PM5/2/12
to Jeff Preshing, msy...@googlegroups.com
Thank you for your work on this issue. I haven't looked at your example
programs, but your description is sufficiently clear, and your
observations match mine as far as I remember.

In the face of the complexity needed to work around the underlying
problem, IMO it is the simplest to just disable the sideband support for
the git protocol on Windows. I've submitted a patch that disables
sideband support in send-pack on Windows unconditionally without looking
at the transport protocol. It could need some refinement to keep
sideband support for other transports, such as ssh.

-- Hannes

Johannes Schindelin

unread,
May 2, 2012, 7:05:32 PM5/2/12
to Johannes Sixt, Jeff Preshing, msy...@googlegroups.com
Hi Hannes,

On Wed, 2 May 2012, Johannes Sixt wrote:

> In the face of the complexity needed to work around the underlying
> problem, IMO it is the simplest to just disable the sideband support for
> the git protocol on Windows. I've submitted a patch that disables
> sideband support in send-pack on Windows unconditionally without looking
> at the transport protocol. It could need some refinement to keep
> sideband support for other transports, such as ssh.

I will apply the patch if you enhance it by Jeff's explanation (Jeff,
thank you so much, this was awesome work on all levels -- from identifying
the problem to discussing possible solutions to communicating it:
awesome!). Maybe just pasting Jeff's mail with a header "Technical
explanation of the issue by Jeff Preshing"?

Ciao,
Dscho

Jeff Preshing

unread,
May 3, 2012, 9:13:24 AM5/3/12
to msy...@googlegroups.com, Johannes Sixt, Jeff Preshing
On Wednesday, 2 May 2012 19:05:32 UTC-4, Johannes Schindelin wrote:
Maybe just pasting Jeff's mail with a header "Technical
explanation of the issue by Jeff Preshing"?

FWIW, here's a more concise summary of the problem: MinGW wraps Windows sockets in CRT file descriptors in order to mimic the functionality of POSIX sockets. This causes msvcrt.dll to treat sockets as Installable File System (IFS) handles, calling ReadFile, WriteFile, DuplicateHandle and CloseHandle on them. This approach works well in simple cases on recent versions of Windows, but does not support all usage patterns. In particular, using this approach, any attempt to read & write concurrently on the same socket (from one or more processes) will deadlock in a scenario where the read waits for a response from the server which is only invoked after the write. This is what send_pack currently attempts to do in the use_sideband codepath.

By the way, if we disable sideband support for the git protocol, does that mean there will be no more error messages from the server?

Jeff Preshing

unread,
May 5, 2012, 6:51:36 PM5/5/12
to Johannes Schindelin, Johannes Sixt, msy...@googlegroups.com
Hi guys,

Out of curiosity I wondered, "How is it that the SSH protocol works just
fine, but the git protocol doesn't?" Well, as you already know, the SSH
connection is handled by a subprocess. Git just spawns ssh.exe and
treats its stdin/stdout pipes as the network connection (so to speak).
Then later, when send_pack spawns a child git process, its stdout gets
plugged directly into the stdin of ssh.exe (which, I assume, does
perform concurrent read/writes on the same socket).

While tracing through this, I noticed that ssh.exe seems to emit error
messages such as "tty option detected in CYGWIN environment variable",
which leads me to believe that ssh.exe was built as a Cygwin executable
rather than a MinGW executable. (Dunno if I'm stating the obvious here;
I'm new to these environments.)

If that's the case, that alone might explain why the SSH protocol works,
but not the git protocol. It would also explain how the Cygwin binary of
git does not have this deadlock bug, while msysGit currently does. From
what I understand, Cygwin provides a more complete emulation of a POSIX
runtime, as compared to MinGW which is just a thin wrapper over
msvcrt.dll (and which works passably with sockets until you start doing
things like the sample I attached earlier). I skimmed through the Cygwin
source and indeed it contains an fhandler_socket C++ class, with all
sorts of carefully crafted logic to emulate POSIX sockets correctly on
Win32, including passing WSA_FLAG_OVERLAPPED to WSASocket, using WSARecv
to read, using WSADuplicateSocket to correctly deal with handle
inheritance, etc.

So my question is, why not build Git as a Cygwin executable, and
distribute that as "Git for Windows" instead? It would fix this issue
without requiring any code changes. And other Cygwin executables seem to
be bundled in the package already. (Let me apologize in advance by the
way, I have absolutely no idea if this suggestion ruffles anyone's
feathers, or goes against the goals of the project or something! I also
don't know what changes this would entail for the msysGit environment,
but I imagine you guys do.)

Johannes Schindelin

unread,
May 5, 2012, 9:44:34 PM5/5/12
to Jeff Preshing, Johannes Sixt, msy...@googlegroups.com
Hi,

On Sat, 5 May 2012, Jeff Preshing wrote:

> [... debugging SSH vs Git protocol ...]
>
> While tracing through this, I noticed that ssh.exe seems to emit error
> messages such as "tty option detected in CYGWIN environment variable",
> which leads me to believe that ssh.exe was built as a Cygwin executable
> rather than a MinGW executable. (Dunno if I'm stating the obvious here;
> I'm new to these environments.)

Yep, ssh.exe is an MSys executable (which is equivalent to a
stripped-down-Cygwin executable).

> So my question is, why not build Git as a Cygwin executable,

Two reasons: 1) it is dog slow, and 2) it incurs the dependency to Cygwin
which is not as friendly as MSys: you cannot simply have two different
versions of Cygwin running at the same time (and some of us require a
certain Cygwin version for their work setup).

Ciao,
Johannes

Andrew White

unread,
Oct 22, 2012, 6:02:51 AM10/22/12
to msy...@googlegroups.com, Johannes Sixt, Jeff Preshing

What is the status of this fix? I tried building git using the latest msysGit but I still experience this problem.
 

Johannes Schindelin

unread,
Oct 22, 2012, 9:00:11 AM10/22/12
to Andrew White, msy...@googlegroups.com, Johannes Sixt, Jeff Preshing
Hi,

On Mon, 22 Oct 2012, Andrew White wrote:

> On Thursday, 3 May 2012 08:35:32 UTC+9:30, Johannes Schindelin wrote:
> >
> > On Wed, 2 May 2012, Johannes Sixt wrote:
> >
> > > In the face of the complexity needed to work around the underlying
> > > problem, IMO it is the simplest to just disable the sideband support
> > > for the git protocol on Windows. I've submitted a patch that
> > > disables sideband support in send-pack on Windows unconditionally
> > > without looking at the transport protocol. It could need some
> > > refinement to keep sideband support for other transports, such as
> > > ssh.
> >
> > I will apply the patch if you enhance it by Jeff's explanation (Jeff,
> > thank you so much, this was awesome work on all levels -- from
> > identifying the problem to discussing possible solutions to
> > communicating it: awesome!). Maybe just pasting Jeff's mail with a
> > header "Technical explanation of the issue by Jeff Preshing"?
>
> What is the status of this fix?

I do not recall.

> I tried building git using the latest msysGit but I still experience
> this problem.

Apparently it is not in 'devel' then. Unfortunately, I am on a busy
business trip and cannot take care of the issue.

Ciao,
Johannes
Reply all
Reply to author
Forward
0 new messages