on windows an unlink/open call can fail in case another application has
the file opened for writing. If this happens it is currently very
annoying for the user as git stops there and he ends up in a half
checked out/merged/... state.
I do not know whether I recall correctly that there has been some work
related repeating such an operation in case of an access denied failure
because I can not find anything related in the source.
As this still happens to my users in irregular intervals it seems to be
such a common case for windows user in their day to day work that I want
to do something about it.
My proposal in case we get into such a situation:
* In case we are in interactive mode (e.g. stdout is a tty) present a
question to the user (cancel/retry) and wait so he can do something
about the situation.
* Additionally wait for an increasing amount of time and then
automatically restart. On every retry the delay time is doubled.
* In case we are not in interactive mode just fail by default. Add a
config option to enable the above cases in this mode.
What do you think about this?
My final goal is to extent this to the gui tools. I would be glad for
any pointers to existing code that may help me to parse the
output/define a nice format.
cheers Heiko
Look at mingw_rename in compat/mingw.c.
Do your users have files open for writing for such a long time that the
timeouts in mingw_rename are too short?
BTW, in my private repository:
git://repo.or.cz/git/mingw/j6t.git master
contains an extension of mingw_rename that writes to stderr *if* it retried to
rename a file. You might be interested to see what it says with your users.
-- Hannes
Ah thanks. That was what I was searching ...
>
> Do your users have files open for writing for such a long time that the
> timeouts in mingw_rename are too short?
As far as I can tell. Its mostly IDE indexing and Virus Scanner
activitiy that causing this. Sometimes it seems to be triggered by
switching branches to quickly because the IDE start indexing the changed
files.
Is the retrying for rename only complete? I would think that unlink need
something like that as well?
> BTW, in my private repository:
>
> git://repo.or.cz/git/mingw/j6t.git master
>
> contains an extension of mingw_rename that writes to stderr *if* it retried to
> rename a file. You might be interested to see what it says with your users.
It could be interesting. Will probably extend it to write a log file and check
what it says once it has happened again. Its a very racy behavior.
cheers Heiko
Only rename() was adjusted.
> I would think that unlink need something like that as well?
I never encountered the problems with open() or unlink() (or at least I never
attributed other rare random failures to them). But extending the mechanism
to these two functions may be worth a test.
-- Hannes
On Mon, 18 Jan 2010, Heiko Voigt wrote:
> on windows an unlink/open call can fail in case another application has
> the file opened for writing. If this happens it is currently very
> annoying for the user as git stops there and he ends up in a half
> checked out/merged/... state.
>
> I do not know whether I recall correctly that there has been some work
> related repeating such an operation in case of an access denied failure
> because I can not find anything related in the source.
>
> As this still happens to my users in irregular intervals it seems to be
> such a common case for windows user in their day to day work that I want
> to do something about it.
Indeed.
As Hannes already mentioned, we do something like that for renames. I
think it might be a good thing to generalize this, so that you can --
eventually -- tell the user what is going on, and let them Retry in that
wonderful DOS fashion (how often have you seen the message "Retry, Abort
or Fail?"?)
However, I would try to go about it the same way as Hannes: automatic
first, and only then bolt on something that could even be handled by
something like a hook, or a config setting specifying the helper
application.
Ciao,
Dscho
Yes I agree on that. First automatic and then user interaction. I think
we should provide a default user interaction (via CLI) and make it a
config option to use an external helper. What do you think?
cheers Heiko
How about taking care of the automatic handling first?
Ciao,
Dscho
So here is a short test script showing the race condition. For me it did
fail for 7 out of 10 runs. Should we add it to the testsuite? If we do
how long should it run and how could we scale the time based on the
machine ? What are your experiences?
cheers Heiko
---8<--------
#!/bin/sh
touch /tmp/$0_test_running
create_repo() {
test ! -d repo &&
rm -f repo &&
mkdir repo &&
(cd repo &&
git init &&
echo "Point A" > file &&
git add file &&
git commit -m "Point A" &&
git checkout -b pointA &&
git checkout -b pointB &&
echo "Point B" > file &&
git add file &&
git commit -m "Point B" )
}
read_n_write_files() {
(cd repo &&
while test -f /tmp/$0_test_running
do
# give git a chance
sleep 0
echo "Lalala" > file
done)
}
switch_branches() {
(cd repo &&
branch=pointA
while test -f /tmp/$0_test_running
do
git checkout -f $branch ||
rm /tmp/$0_test_running
if [ "$branch" = "pointA" ]
then
branch=pointB
else
branch=pointA
fi
done )
}
create_repo
read_n_write_files &
switch_branches &
echo "Press enter to end the test"
read i
rm /tmp/$0_test_running
On Sun, 24 Jan 2010, Heiko Voigt wrote:
> So here is a short test script showing the race condition. For me it did
> fail for 7 out of 10 runs. Should we add it to the testsuite? If we do
> how long should it run and how could we scale the time based on the
> machine ?
As this is only reproducible on Windows, I'd rather have it as a part of
/share/msysGit/run-tests.sh...
Ciao,
Dscho
But as the implementation will be part of the compat layer for Windows
it might make sense to include upstream and only enabled for Windows.
I found out that my first script does not reproduce the correct failure
as it also fails on other platforms and is about a race condition where
both processes write. Attached is another test script that should
trigger the right behavior. We need to use windows open function
(CreateFile()) because the mingw open() does not seem to be exclusive
against writing if the file is opened for reading.
The patches that implement automatic retries for unlink follow.
cheers Heiko
---8<------
#!/bin/sh
touch /tmp/$0_test_running
create_repo() {
test ! -d repo &&
rm -f repo &&
mkdir repo &&
(cd repo &&
git init &&
echo "Point A" > file &&
git add file &&
git commit -m "Point A" &&
git checkout -b pointA &&
git checkout -b pointB &&
echo "Point B" > file &&
git add file &&
git commit -m "Point B" )
}
gcc -o block_read -x c - <<BLOCK_READ
#include <stdio.h>
#include <windows.h>
int main(int argc, const char *argv[])
{
HANDLE file;
DWORD dwNumRead;
BOOL ret;
char buffer[256];
if (argc != 3) {
printf("Usage: %s <block-time> <filename>", argv[0]);
exit(1);
}
file = CreateFile(argv[2], GENERIC_READ, FILE_SHARE_READ,
NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (file == INVALID_HANDLE_VALUE) {
fprintf(stderr, "failed to open: '%s'\n", argv[2]);
exit(1);
}
Sleep(atoi(argv[1]));
if (!CloseHandle(file)) {
fprintf(stderr, "failed to close: '%s'\n", argv[2]);
exit(1);
}
Sleep(atoi(argv[1]));
return 0;
}
BLOCK_READ
read_n_write_files() {
(cd repo &&
while test -f /tmp/$0_test_running
do
# give git a chance
./../block_read 50 file
done)
}
switch_branches() {
(cd repo &&
branch=pointA
while test -f /tmp/$0_test_running
do
git checkout -f $branch ||
rm /tmp/$0_test_running
if [ "$branch" = "pointA" ]
then
branch=pointB
else
branch=pointA
fi
done )
}
create_repo
read_n_write_files &
switch_branches
echo "Press enter to end the test"
read i
rm /tmp/$0_test_running
Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
compat/mingw.c | 8 ++++++++
compat/mingw.h | 11 +++--------
2 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index 10d6796..e5b2fae 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -147,6 +147,14 @@ int mingw_mkdir(const char *path, int mode)
return ret;
}
+#undef unlink
+int mingw_unlink(const char *pathname)
+{
+ /* read-only files cannot be removed */
+ chmod(pathname, 0666);
+ return unlink(pathname);
+}
+
#undef open
int mingw_open (const char *filename, int oflags, ...)
{
diff --git a/compat/mingw.h b/compat/mingw.h
index 1b528da..349d878 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -98,14 +98,6 @@ static inline int fcntl(int fd, int cmd, long arg)
* simple adaptors
*/
-static inline int mingw_unlink(const char *pathname)
-{
- /* read-only files cannot be removed */
- chmod(pathname, 0666);
- return unlink(pathname);
-}
-#define unlink mingw_unlink
-
static inline int waitpid(pid_t pid, int *status, unsigned options)
{
if (options == 0)
@@ -140,6 +132,9 @@ int readlink(const char *path, char *buf, size_t bufsiz);
int mingw_mkdir(const char *path, int mode);
#define mkdir mingw_mkdir
+int mingw_unlink(const char *pathname);
+#define unlink mingw_unlink
+
int mingw_open (const char *filename, int oflags, ...);
#define open mingw_open
--
1.6.6.rc1.35.g13067
Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
compat/mingw.c | 17 +++++++++++++++--
1 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index e5b2fae..0df3b21 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -7,6 +7,7 @@
extern int hide_dotfiles;
unsigned int _CRT_fmode = _O_BINARY;
+static const int delay[] = { 0, 1, 10, 20, 40 };
static int err_win_to_posix(DWORD winerr)
{
@@ -150,9 +151,22 @@ int mingw_mkdir(const char *path, int mode)
#undef unlink
int mingw_unlink(const char *pathname)
{
+ int ret, tries = 0;
+
/* read-only files cannot be removed */
chmod(pathname, 0666);
- return unlink(pathname);
+ while ((ret = unlink(pathname)) == -1 && tries < ARRAY_SIZE(delay)) {
+ /*
+ * We assume that some other process had the source or
+ * destination file open at the wrong moment and retry.
+ * In order to give the other process a higher chance to
+ * complete its operation, we give up our time slice now.
+ * If we have to retry again, we do sleep a bit.
+ */
+ Sleep(delay[tries]);
+ tries++;
+ }
+ return ret;
}
#undef open
@@ -1042,7 +1056,6 @@ int mingw_rename(const char *pold, const char *pnew)
{
DWORD attrs, gle;
int tries = 0;
- static const int delay[] = { 0, 1, 10, 20, 40 };
/*
* Try native rename() first to get errno right.
--
1.6.6.rc1.35.g13067
Just saw that the patches do not cleanly apply on Junios master.
They are based msysgit/master. A rebased version for testing can be
found on[1].
cheers Heiko
Careful! You cannot loop on every failure, disregarding the failure reason.
For example, if the failuere is due to ENOENT, you do not have to retry.
-- Hannes
No; this is not about a test of git, but of the C runtime. Git can expect that
this fundamental infrastructure works. If msysgit has to bend over backwards
to meet this expectation, then it is a problem of msysgit.
> read_n_write_files() {
> (cd repo &&
> while test -f /tmp/$0_test_running
> do
> # give git a chance
> ./../block_read 50 file
You should be able to do this with
(
exec < file
perl -e 'select(undef,undef,undef,0.05)'
)
without another external program.
-- Hannes
On Tue, 26 Jan 2010, Heiko Voigt wrote:
> On Tue, Jan 26, 2010 at 10:10:16PM +0100, Heiko Voigt wrote:
> > If a file is opened by another process (e.g. indexing of an IDE) for
> > reading it is not allowed to be deleted. So in case unlink fails retry
> > after waiting for some time. This extends the workaround from 6ac6f878.
>
> Just saw that the patches do not cleanly apply on Junios master.
> They are based msysgit/master. A rebased version for testing can be
> found on[1].
I think that is okay, as we have to test it in msysGit (and maybe to carry
in 4msysgit.git forever).
Care to push to a branch in 4msysgit.git?
Ciao,
Dscho
I considered this to be such an uncommon case that waiting for some ms
would not be to bad. But since you mentioned it and to be clean I will
incorporate a check for that.
cheers Heiko
Not sure here. According to [1]:
~~~~snip~~~~~
ERRORS
The unlink() function shall fail and shall not unlink the file if:
[...]
[EBUSY]
The file named by the path argument cannot be unlinked because it is
being used by the system or another process and the implementation
considers this an error.
~~~~snap~~~~~
So according to POSIX unlink may fail if the system thinks so. Even
though most POSIX conformant systems do not do that in practise this is
an argument that the infrastructure meets the expectation if this call fails
for an "file in use" case.
But I do not feel very strongly about this, we can leave the test in
msysgit if everyone else thinks it belongs there.
> > read_n_write_files() {
> > (cd repo &&
> > while test -f /tmp/$0_test_running
> > do
> > # give git a chance
> > ./../block_read 50 file
>
> You should be able to do this with
>
> (
> exec < file
> perl -e 'select(undef,undef,undef,0.05)'
> )
>
> without another external program.
Unfortunately this seems to trigger a different failure case. My patch
does not fix this (even if extended to mingw_open). Not sure why.
In my C implementation the pause is during the locked-for-reading-phase.
Whereas your implementation would read as fast as possible and then
pause.
cheers Heiko
[1] http://www.opengroup.org/onlinepubs/000095399/functions/unlink.html
Here you go. An updated version of the changes can be found on
hv/automatic_retry. I have based it on the devel branch.
Some feedback about your experiences with this change would be nice.
cheers Heiko
Do you intend to check just for ENOENT? Then I will not agree.
Look how we do it for rename(): We do not retry for every random error
condition, but only for that one error condition that happens when another
process has the file open -- ERROR_ACCESS_DENIED.
-- Hannes
No of course not! ;) Have a look here:
http://repo.or.cz/w/git/mingw/4msysgit.git/commitdiff/a655bb
Just as you hopefully expected it.
> Look how we do it for rename(): We do not retry for every random error
> condition, but only for that one error condition that happens when another
> process has the file open -- ERROR_ACCESS_DENIED.
See above.
cheers Heiko
No. 'exec <file' opens the file and keeps it open until the shell (or
sub-shell) exits. So, my example *does* keep the file open while it sleeps.
The difference might be that the fork of the sub-shell takes considerable
time, so you see more delays between the reads. perl startup may also be
slower and add to the delay.
-- Hannes
Good! You have this code:
+ while ((ret = unlink(pathname)) == -1 && tries < ARRAY_SIZE(delay)) {
+ if (err_win_to_posix(GetLastError()) != EACCES)
+ break;
1. You call GetLastError(). Are you sure that you get the error from the
correct system call?
2. When you look at an errno value. Why do you not inspect errno?
3. When you call GetLastError() anyway, why not check for the real Windows
error code that applies to the case of interest? (ERROR_ACCESS_DENIED again,
most likely). Checking the result of err_win_to_posix for EACCES means that
you check for many different Windows error codes that do not indicate that a
program had the file open.
-- Hannes
Probably yes. But you are right, we use unlink so we should be
consistent and use errno. I will send another patch series with that
corrected. The branch is already updated.
> 2. When you look at an errno value. Why do you not inspect errno?
See above.
> 3. When you call GetLastError() anyway, why not check for the real Windows
> error code that applies to the case of interest? (ERROR_ACCESS_DENIED again,
> most likely). Checking the result of err_win_to_posix for EACCES means that
> you check for many different Windows error codes that do not indicate that a
> program had the file open.
See above.
cheers Heiko
Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
compat/mingw.c | 8 ++++++++
compat/mingw.h | 11 +++--------
2 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index 33b6b80..3661553 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -157,6 +157,14 @@ int mingw_mkdir(const char *path, int mode)
return ret;
}
+#undef unlink
+int mingw_unlink(const char *pathname)
+{
+ /* read-only files cannot be removed */
+ chmod(pathname, 0666);
+ return unlink(pathname);
+}
+
#undef open
int mingw_open (const char *filename, int oflags, ...)
{
diff --git a/compat/mingw.h b/compat/mingw.h
index 364cc72..740aa59 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -111,14 +111,6 @@ static inline int fcntl(int fd, int cmd, long arg)
* simple adaptors
*/
-static inline int mingw_unlink(const char *pathname)
-{
- /* read-only files cannot be removed */
- chmod(pathname, 0666);
- return unlink(pathname);
-}
-#define unlink mingw_unlink
-
static inline pid_t waitpid(pid_t pid, int *status, unsigned options)
{
if (options == 0)
@@ -153,6 +145,9 @@ int readlink(const char *path, char *buf, size_t bufsiz);
int mingw_mkdir(const char *path, int mode);
#define mkdir mingw_mkdir
+int mingw_unlink(const char *pathname);
+#define unlink mingw_unlink
+
int mingw_open (const char *filename, int oflags, ...);
#define open mingw_open
--
1.7.0.rc0.27.ge8d3a
Signed-off-by: Heiko Voigt <hvo...@hvoigt.net>
---
Junio: How do you feel about including this upstream? We already have
the same logic for mingw_rename?
Johannes(both ;)): Do you think this would be ready in this state?
I have ran my following final testscript with the stated command 5 times
with and without the change. Without it it failed evertime. With it all
runs passed.
rm -rf repo block_read.exe && ./test_it.sh && echo Test OK
----8<--------- test_it.sh
#!/bin/sh
touch /tmp/$0_test_running
Sleep(atoi(argv[1]));
return 0;
}
BLOCK_READ
read_n_write_files() {
(cd repo &&
while test -f /tmp/$0_test_running
do
# give git a chance
./../block_read 50 file 2> /dev/null
done)
}
switch_branches() {
(cd repo &&
branch=pointA
while test -f /tmp/$0_test_running
do
git checkout -f $branch 2> /dev/null ||
rm /tmp/$0_test_running
if [ "$branch" = "pointA" ]
then
branch=pointB
else
branch=pointA
fi
done )
}
create_repo
read_n_write_files &
switch_branches &
echo "This test will run for 10 seconds"
sleep 10 &&
rm /tmp/$0_test_running
----8<-------------
compat/mingw.c | 19 +++++++++++++++++--
1 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/compat/mingw.c b/compat/mingw.c
index 3661553..7a833de 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -7,6 +7,7 @@
#include "../cache.h"
unsigned int _CRT_fmode = _O_BINARY;
+static const int delay[] = { 0, 1, 10, 20, 40 };
static int err_win_to_posix(DWORD winerr)
{
@@ -160,9 +161,24 @@ int mingw_mkdir(const char *path, int mode)
#undef unlink
int mingw_unlink(const char *pathname)
{
+ int ret, tries = 0;
+
/* read-only files cannot be removed */
chmod(pathname, 0666);
- return unlink(pathname);
+ while ((ret = unlink(pathname)) == -1 && tries < ARRAY_SIZE(delay)) {
+ if (errno != EACCES)
+ break;
+ /*
+ * We assume that some other process had the source or
+ * destination file open at the wrong moment and retry.
+ * In order to give the other process a higher chance to
+ * complete its operation, we give up our time slice now.
+ * If we have to retry again, we do sleep a bit.
+ */
+ Sleep(delay[tries]);
+ tries++;
+ }
+ return ret;
}
#undef open
@@ -1054,7 +1070,6 @@ int mingw_rename(const char *pold, const char *pnew)
{
DWORD attrs, gle;
int tries = 0;
- static const int delay[] = { 0, 1, 10, 20, 40 };
/*
* Try native rename() first to get errno right.
--
1.7.0.rc0.27.ge8d3a