Hi all, newcomer to this list and to msysGit. Feel free to point out any
newbie mistakes.
I spent some time studying Issue 457: "Push over git protocol hangs in
msysGit" (having hit it myself) and I'd like to share some observations
about it.
http://code.google.com/p/msysgit/issues/detail?id=457
I built and debugged using MSVC2008, since this is the debugger I am
most familiar with. To recap, here is the sequence of events leading to
deadlock during git push:
1. Main thread: cmd_push() connects to the server, ultimately calls
send_pack()
2. Main thread: send_pack() spawns an async thread to run sideband_demux()
3. Async thread: sideband_demux() begins eating all the responses from
the server, blocks inside read()
4. Main thread: send_pack() spawns a child process with stdout
redirected to the server, to run cmd_pack_objects()
5. Child process: cmd_pack_objects() ultimately attempts to write to
stdout; blocks inside write()
6. Main thread: send_pack() blocks waiting for child process to finish.
The deadlock is at a low level inside ntdll.dll.
Some platform notes (again, a recap for some): From Git's point of view,
everything is a POSIX file descriptor. Under MinGW, all file descriptors
are managed by the CRT, which translates them to Windows file handles.
Under the hood, MinGW also casts Winsock2 socket handles to Windows file
handles, and wraps them in a CRT file descriptor via mingw_socket().
It's this last part that's dubious. There are a lot of caveats when you
use a Winsock2 socket handle as a file handle, and that seems to be what
we're bumping up against here.
I managed to reproduce the problem in two small sample programs, which
I've attached as a .zip. One is a MinGW application, the other is native
Win32. Both samples perform a simple HTTP request to
google.com while
attempting to read the response in a separate thread. Everything happens
in one process.
The MinGW sample works fine when ported to Ubuntu (just need to swap a
few headers), but deadlocks under MinGW (at least in the msysGit
environment on Windows 7 x64), in basically the same fashion as git push
(inside ntdll.dll during a simultaneous read/write). It's true that the
Git deadlock is spread across two processes and uses inherited handles,
but I don't think that makes any difference with regards to the
underlying problem, since the callstacks are pretty much the same at
deadlock time.
Here are a couple of notes about the MinGW sample I've provided. The
first note is that if you remove the dup() on line 6, the program will
still deadlock, but at a higher level: inside msvcr90.dll, while
attempting to lock the CRT file descriptor. So in effect, dup() serves
as a way of defeating the CRT lock. Indeed, "git push" calls dup() on
the socket file descriptor once in git_tcp_connect(), and a second time
in start_command() right before spawning the child process. (As many of
you are aware, dup() calls DuplicateHandle() under the hood. There is
some question as to whether it is valid to call DuplicateHandle() on a
Winsock2 socket handle. There are in fact potential issues related to
that, and lots of links talk about it, but I don't think those issues
are directly related to the deadlock problem.)
The second note is that if you move the usleep(50000) on line 37 to the
start of readSocket() in this sample, effectively delaying the reader
thread, the deadlock disappears, because the write() has time to start
and finish before read() begins. I think some people observed the same
thing about the "git push" deadlock as well.
I've also attached a native Win32 application which performs all the
same operations as the MinGW program, and deadlocks in exactly the same
way. Given that all we're trying to do is perform a simultaneous read &
write on a simple socket in two threads -- which should be possible
somehow, but obviously not using these particular APIs -- I hammered on
this sample until I found a combination of APIs which worked. In short,
the following modifications are required:
* Change the WSASocket() call to socket(), or pass the
WSA_FLAG_OVERLAPPED flag to WSASocket() -- both do the same thing.
* Change the read/write calls to send/recv, making sure to pass the
underlying Winsock2 socket handle, and not the wrapped file descriptor.
(eg. just use (SOCKET) _get_osfhandle(sockfd).)
After that, the sample works fine and you can read/write on different
threads to your heart's content! But you need both modifications: If you
simply add the WSA_FLAG_OVERLAPPED flag without changing to send/recv,
the read and write calls will both fail with error codes because the
underlying ReadFile/WriteFile calls fail, for reasons which are
explained here:
http://comments.gmane.org/gmane.comp.lib.ev/1726 ... And
if you simply change to send/recv without adding WSA_FLAG_OVERLAPPED,
the program will deadlock inside ws2_32.dll.
Given all these observations, it seems there are three possible areas to
explore in terms of fixing Issue 457. It would be nice to for this
operation to finally work out-of-the-box, given that the git protocol is
a good no-fuss choice for private networks.
1) Modify Git so that it never performs a concurrent read & write on the
same socket. This could be a viable solution. Even the MinGW wiki
suggests that it's normal to modify POSIX applications to run under
MinGW. Many others have suggested this and it seems that's what
disabling sideband-64k accomplishes. But are there other operations in
Git which potentially perform such socket threading, too? What about
future changes to Git?
2) Modify MinGW to support concurrent recv/write on sockets using the
modifications I demonstrated in the Win32 sample. We'd need to wrap
read() as mingw_read() in the same way that write() is currently
wrapped, and somehow redirect socket calls to send/recv. This might also
be a chance to fix other questionable things for sockets, such as the
dup() and close(). At the same time, it would probably break other
things like buffered I/O on sockets via fread/fwrite (which would fall
back to the CRT implementations of read/write). With luck, Git doesn't
do such things, but I'm not familiar enough to say. Also, handle
inheritance would require some finesse. Whatever the modifications end
up being, I doubt anyone would want to push them back to the main MinGW.
But it just might be possible to support Git's use cases.
3) Another possible modification internal to MinGW: Simply avoid
blocking reads from sockets. In such places, detect the socket somehow,
and poll intermittently until data arrives. Network reads are
unpredictable anyway, so no serious harm is done this way. I think
somebody proposed a patch somewhere in this list before, but I can't
find the link now.
That's all I've got at this point. I hope the attached sample programs
will help others better understand the issue to some extent. Hopefully a
better understanding will help us choose the best solution.
In your opinion, which approaches makes the most sense?
BTW, thanks for Git for Windows & msysGit.
Regards,
Jeff Preshing