Weird problem: Git does not see that file in working directory differs from HEAD

J66st

Feb 5, 2015, 9:50:26 AM
to msy...@googlegroups.com
Hi.

I found a strange problem in msysGit and am wondering whether it's a bug.
I already discussed it here:
https://groups.google.com/forum/#!msg/git-users/9K0ExTQpMF8/Z3BkgiA3HJUJ
It does not seem to happen in Git for Linux or OS X.
The problem occurs in msysGit 1.9.0 and 1.9.5 (running the x86 version on a Windows 7 x64 system).

Essentially, the problem seems to be that Git for Windows assumes that two files are the same when both the timestamp and the file size match. Apparently the file contents are not inspected, nor is the hash recalculated.

I attached a minimal demo package to this post to demonstrate the issue. Simply download the zip file, put it in a clean directory, and run the enclosed script from bash. Below is a transcript of what happens when I run the script on my machine.

The issue is a real show-stopper for my automated migration from Visual SourceSafe to Git (in a test transferring 20,000 source files, the problem caused the loss of roughly 0.1% of the files from the history), so I hope for a quick solution.


-------------------------------------------------------------------------------->8--------
Welcome to Git (version 1.9.5-preview20141217)

Run 'git help git' to display the help index.
Run 'git help <command>' to display help for specific commands.
joostadm@MUPC20 ~
$ cd /d/proj/try/git
joostadm@MUPC20 /d/proj/try/git
$ ls -l
total 15
-rw-r--r--    1 joostadm Administ    26968 Jan 23 18:43 DixiLink2.zip
-rwxr-xr-x    1 joostadm Administ     1848 Feb  5 14:05 weird-git-demo
joostadm@MUPC20 /d/proj/try/git
$ weird-git-demo
Note: No argument supplied, using DemoRepo by default.
+ mkdir DemoRepo
+ cd DemoRepo
+ unzip ../DixiLink2.zip
Archive:  ../DixiLink2.zip
   creating: .git/
 extracting: .git/COMMIT_EDITMSG
  inflating: .git/config
  inflating: .git/description
  inflating: .git/gitk.cache
 extracting: .git/HEAD
   creating: .git/hooks/
  inflating: .git/hooks/applypatch-msg.sample
  inflating: .git/hooks/commit-msg.sample
  inflating: .git/hooks/post-update.sample
  inflating: .git/hooks/pre-applypatch.sample
  inflating: .git/hooks/pre-commit.sample
  inflating: .git/hooks/pre-push.sample
  inflating: .git/hooks/pre-rebase.sample
  inflating: .git/hooks/prepare-commit-msg.sample
  inflating: .git/hooks/update.sample
  inflating: .git/index
   creating: .git/info/
  inflating: .git/info/exclude
   creating: .git/logs/
  inflating: .git/logs/HEAD
   creating: .git/logs/refs/
   creating: .git/logs/refs/heads/
  inflating: .git/logs/refs/heads/master
   creating: .git/objects/
   creating: .git/objects/29/
 extracting: .git/objects/29/c51bbb9ada43dbe98cbd5dbedbab56586b24c3
   creating: .git/objects/68/
 extracting: .git/objects/68/b1f129eb06b91fc6c9a3885fc0d24ad0cdaa50
   creating: .git/objects/7e/
 extracting: .git/objects/7e/8d94849a38370141b3ca2aab5c3cadc27934da
   creating: .git/objects/cf/
 extracting: .git/objects/cf/13dbb766d4aff04ad51d3cdac84fda67dc6f50
   creating: .git/objects/da/
 extracting: .git/objects/da/eca6b50a218a78e4cca5566965381e2d384b7f
   creating: .git/objects/de/
 extracting: .git/objects/de/3643596c73be4f6112d027616c8df31acd1b09
   creating: .git/objects/ec/
 extracting: .git/objects/ec/88fb155c5e9ebd8b3ead39345618517c0af6cf
   creating: .git/objects/info/
   creating: .git/objects/pack/
   creating: .git/refs/
   creating: .git/refs/heads/
 extracting: .git/refs/heads/master
   creating: .git/refs/tags/
 extracting: .git/refs/tags/2011-11-24
  inflating: dixilinkerr.h
+ echo 'This repository now contains version 1 and 2 of dixilinkerr.h:'
This repository now contains version 1 and 2 of dixilinkerr.h:
+ git log
commit cf13dbb766d4aff04ad51d3cdac84fda67dc6f50
Author: Joost <joost@localhost>
Date:   Tue Sep 11 09:57:32 2007 +0000
    @VSS 11-09-2007 11:08:35 [Edit] dixilinkerr.h
commit de3643596c73be4f6112d027616c8df31acd1b09
Author: Joost <joost@localhost>
Date:   Fri Jul 6 12:27:57 2007 +0000
    @VSS 01-06-2007 01:54:53 [Create] dixilinkerr.h
+ echo 'Version 3 is in our working directory:'
Version 3 is in our working directory:
+ head -n 12 dixilinkerr.h
// DixiLinkErr.h
#pragma once
#ifndef EXCPCAT_DIXILINK
#define EXCPCAT_DIXILINK 2000
#endif
#ifndef __DixiLinkErr_H_INCLUDED__
#define __DixiLinkErr_H_INCLUDED__
#ifndef IDL_ENUM
+ echo 'The file in our HEAD'
The file in our HEAD
+ git show HEAD:dixilinkerr.h
+ head -n 12
// DixiLinkErr.h
#pragma once
#ifndef CAT_DIXILINK_ERR
#define CAT_DIXILINK_ERR 2000
#endif
#ifndef __DixiLinkErr_H_INCLUDED__
#define __DixiLinkErr_H_INCLUDED__
#ifndef IDL_ENUM
+ echo 'You see? Working copy differs from HEAD.'
You see? Working copy differs from HEAD.
+ echo 'So the working directory is dirty, right? Ask Git:'
So the working directory is dirty, right? Ask Git:
+ git status
On branch master
nothing to commit, working directory clean
+ git diff
+ echo 'In my situation, Git sees no difference here, I THINK THIS IS WRONG!'
In my situation, Git sees no difference here, I THINK THIS IS WRONG!
+ echo 'So we are unable to add version 3 of our file. Let'\''s try it once again:'
So we are unable to add version 3 of our file. Let's try it once again:
+ git add dixilinkerr.h
+ git status
On branch master
nothing to commit, working directory clean
+ echo 'In my situation at this point there is nothing to add or commit.'
In my situation at this point there is nothing to add or commit.
+ echo 'End of demo'
End of demo
joostadm@MUPC20 /d/proj/try/git
$ pwd
/d/proj/try/git
joostadm@MUPC20 /d/proj/try/git
$ ls
DemoRepo  DixiLink2.zip  weird-git-demo
joostadm@MUPC20 /d/proj/try/git
$ cd DemoRepo
joostadm@MUPC20 /d/proj/try/git/DemoRepo (master)
$ git st
On branch master
nothing to commit, working directory clean
joostadm@MUPC20 /d/proj/try/git/DemoRepo (master)
$ touch
touch: file arguments missing
Try `touch --help' for more information.
joostadm@MUPC20 /d/proj/try/git/DemoRepo (master)
$ git st
On branch master
nothing to commit, working directory clean
joostadm@MUPC20 /d/proj/try/git/DemoRepo (master)
$
Demo.zip

Sebastian Schuberth

Feb 5, 2015, 11:21:12 AM
to msy...@googlegroups.com
This is now discussed at https://github.com/msysgit/git/issues/312. Please participate there instead of here.

Johannes Sixt

Feb 5, 2015, 1:14:52 PM
to J66st, msy...@googlegroups.com
On 05.02.2015 at 15:50, J66st wrote:
> I found a strange problem in msysGit, I am wondering if it's a bug.
> I already discussed it here:
> https://groups.google.com/forum/#!msg/git-users/9K0ExTQpMF8/Z3BkgiA3HJUJ
> It seems not to happen in Git for Linux or OS-X.
> The problem occurs in msysGit 1.9.0 and 1.9.5 (running x86 version on a
> Windows 7 x64 system).
>
> Essentially, the problem seems that Git for Windows assumes that two
> files are the same when both the timestamp and the file size match.
> Obviously the file contents is not inspected nor the hash recalculated.

That's correct. Git on Windows does not notice such a change. On POSIX,
we have the inode number as an additional indication to notice that the
file was updated, but on Windows we do not (the "inode" number is always
zero).

We could evaluate similar information that is present on Windows file
systems, but it just has not been implemented. So, if it is a real itch
for you, please, by all means, scratch it ;-)
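
For illustration, here is a minimal shell sketch of that failure mode. It is not Git's actual code: `stat_says_same` is a made-up helper that compares only size and modification time, and the file names are invented. It assumes a POSIX-like shell with `mktemp`, `cp -p`, `touch -r` and `find -newer`.

```shell
# Hypothetical helper: treats two files as "unchanged" when size and mtime
# match, the way a stat-only check would. The contents are never compared.
stat_says_same() {
    [ "$(wc -c < "$1" | tr -d ' ')" = "$(wc -c < "$2" | tr -d ' ')" ] &&
    [ -z "$(find "$1" -newer "$2")" ] &&
    [ -z "$(find "$2" -newer "$1")" ]
}

demo_dir=$(mktemp -d)
printf 'version 1\n' > "$demo_dir/file"
cp -p "$demo_dir/file" "$demo_dir/snapshot"      # snapshot keeps size and mtime
printf 'version 2\n' > "$demo_dir/file"          # same size, new contents
touch -r "$demo_dir/snapshot" "$demo_dir/file"   # restore the old timestamp

if stat_says_same "$demo_dir/file" "$demo_dir/snapshot"; then
    stat_verdict=same
else
    stat_verdict=changed
fi
if cmp -s "$demo_dir/file" "$demo_dir/snapshot"; then
    content_verdict=same
else
    content_verdict=changed
fi
echo "stat check: $stat_verdict, content check: $content_verdict"
rm -rf "$demo_dir"
```

On a typical POSIX system this should print `stat check: same, content check: changed`: the stat-only check is fooled, while a byte-for-byte comparison is not.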

-- Hannes

Sebastian Schuberth

Feb 5, 2015, 2:17:01 PM
to msy...@googlegroups.com, vdplas...@gmail.com
Wow, that was totally new to me and sort of destroys my world picture of Git ... I always thought Git is tracking file *contents*, and nothing else.

Hannes, could you elaborate on how tracking the inode helps to solve this issue on Windows? I always thought that inodes also do not change when a file is changed without changing its size. (Sorry, I don't really know much about inodes or Linux file systems in general.)

Regards,
Sebastian

Johannes Sixt

Feb 5, 2015, 2:50:56 PM
to Sebastian Schuberth, msy...@googlegroups.com, vdplas...@gmail.com
On 05.02.2015 at 20:17, Sebastian Schuberth wrote:
> On Thursday, February 5, 2015 at 7:14:52 PM UTC+1, Johannes Sixt wrote:
>
>>> I found a strange problem in msysGit, I am wondering if it's a bug.
>>> I already discussed it here:
>>> https://groups.google.com/forum/#!msg/git-users/9K0ExTQpMF8/Z3BkgiA3HJUJ
>>> It seems not to happen in Git for Linux or OS-X.
>>> The problem occurs in msysGit 1.9.0 and 1.9.5 (running x86 version on a
>>> Windows 7 x64 system).
>>>
>>> Essentially, the problem seems that Git for Windows assumes that two
>>> files are the same when both the timestamp and the file size match.
>>> Obviously the file contents is not inspected nor the hash recalculated.
>>
>>That's correct. Git on Windows does not notice such a change.

Don't panic! In practice, this doesn't seem to be a major problem during
interactive work.

However, if there are many changes in quick succession, induced by a
script, as in the case of the OP, chances are much higher that a change
is missed.

>> On POSIX,
>>we have the inode number as an additional indication to notice that the
>>file was updated, but on Windows we do not (the "inode" number is always
>>zero).
>>
>>We could evaluate similar information that is present on Windows file
>>systems, but it just has not been implemented. So, if it is a real itch
>>for you, please, by all means, scratch it ;-)
>
> Wow, that was totally new to me and sort of destroys my world picture of
> Git ... I always thought Git is tracking file *contents*, and nothing else.
>
> Hannes, could you elaborate how tracking the inode helps to solve this
> issue on Windows? I always thought the inodes also do not change when a
> file is changed without changing its size. (Sorry, I don't know much
> really about inodes or Linux file systems in general.)

The information comparable to inodes on Windows is held in the members
.nFileIndexLow and .nFileIndexHigh of struct BY_HANDLE_FILE_INFORMATION,
retrievable by GetFileInformationByHandle(). (Perhaps there is a
function that retrieves the same information given a path name instead
of a handle.)

BTW, notice this difference between upstream Git and Git for Windows
that exists because of the mentioned weakness:

diff --git a/t/t4130-apply-criss-cross-rename.sh b/t/t4130-apply-criss-cross-rename.sh
index d173acd..bf7049e 100755
--- a/t/t4130-apply-criss-cross-rename.sh
+++ b/t/t4130-apply-criss-cross-rename.sh
@@ -14,8 +14,8 @@ create_file() {
 
 test_expect_success 'setup' '
 	create_file file1 "File1 contents" &&
-	create_file file2 "File2 contents" &&
-	create_file file3 "File3 contents" &&
+	create_file file2 "File2 more contents" &&
+	create_file file3 "File3 even more contents" &&
 	git add file1 file2 file3 &&
 	git commit -m 1
 '

The test case exchanges two files such that only the inode number is
modified. On Windows, git would not detect that the index is outdated
because of the unusable inode value. With this patch, the file size
changes and it notices the outdated index.
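
One way a path can end up with the same size and timestamp but a different inode is a swap by renaming. The following shell sketch shows this on a POSIX file system. It is only an illustration and not what the test script itself does: `inode_of` is a hypothetical helper, and giving both files the same mtime with `touch -r` is an assumption made to keep the demo deterministic.

```shell
# Hypothetical helper: print the inode number of a path (POSIX `ls -i`).
inode_of() { ls -i "$1" | awk '{print $1}'; }

swap_dir=$(mktemp -d)
printf 'File2 contents\n' > "$swap_dir/file2"
printf 'File3 contents\n' > "$swap_dir/file3"     # same size as file2
touch -r "$swap_dir/file2" "$swap_dir/file3"      # same mtime as file2

inode_before=$(inode_of "$swap_dir/file2")
size_before=$(wc -c < "$swap_dir/file2" | tr -d ' ')

# Exchange the two files by renaming; each file keeps its own inode.
mv "$swap_dir/file2" "$swap_dir/tmp"
mv "$swap_dir/file3" "$swap_dir/file2"
mv "$swap_dir/tmp" "$swap_dir/file3"

inode_after=$(inode_of "$swap_dir/file2")
size_after=$(wc -c < "$swap_dir/file2" | tr -d ' ')

echo "size: $size_before -> $size_after, inode: $inode_before -> $inode_after"
rm -rf "$swap_dir"
```

The size reported for the path `file2` stays the same while its inode number changes, so a check that cannot see the inode has nothing left to go on except the contents.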

-- Hannes

J66st

Feb 5, 2015, 4:50:04 PM
to msy...@googlegroups.com, sschu...@gmail.com, vdplas...@gmail.com
>>> Obviously the file contents is not inspected nor the hash recalculated.
>>
>>That's correct. Git on Windows does not notice such a change.

> Don't panic! In practice, this doesn't seem to be a major problem during
> interactive work.
>
> However, if there are many changes in quick succession, induced by a
> script, as in the case of the OP, chances are much higher that a change
> is missed.

Indeed, no reason to panic. But I don't agree that we can just ignore the problem because it is unlikely to happen. It should be solved, and it can be.
The fog is clearing now: my vss2git migration program visits every file in the VSS repository and rewinds the deltas back to the origin. Then it replays the history and adds every version to the Git repo. It does so in rapid succession (more than one version per second), and some files (like icons) keep a fixed size between versions. On a file system with low timestamp resolution (like FAT), this will definitely cause frequent clashes if only timestamp and size are compared. In my opinion this is NOT acceptable. When timestamp and size match, I am willing to spend a few extra milliseconds to actually inspect the file contents to be sure. If anything is to be cached, it should be the SHA-1 hash. The basic idea of Git is that only contents matter, not timestamps. (A major annoyance when moving from VSS to Git is the loss of file timestamp information; for heaven's sake, Git, don't take them away first and then secretly start using them as a file ID!)

>> On POSIX,
>>we have the inode number as an additional indication to notice that the
>>file was updated, but on Windows we do not (the "inode" number is always
>>zero).  
>>
>>We could evaluate similar information that is present on Windows file
>>systems, but it just has not been implemented. So, if it is a real itch
>>for you, please, by all means, scratch it ;-)
 
OK, Git should know what kind of file system a file lives on. If it is not POSIX, the inode should be left out of the equation entirely. It could be replaced by something like the file index in NTFS. But I consider all these things tricks that bypass the real thing (file contents, reliably represented by the SHA hash) purely for performance reasons. They should in no way subvert reliability! So in case of any doubt, inspect the file contents or use its hash. This is not just an itch that should be scratched. A migration tool should be able to trust that every file it offers is accepted, and not be forced to insert a 2-second delay per file just to be sure...
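
A migration script can make this kind of content check itself, independently of the index. The following shell sketch compares the blob hash of the working-tree file (`git hash-object`) with the blob recorded in HEAD (`git rev-parse HEAD:<path>`). `differs_from_head` is a hypothetical helper name, and the sketch assumes it is run from the top level of the repository and that no unusual content filters are configured.

```shell
# Hypothetical helper. Exit status:
#   0  the working-tree file differs from HEAD, or the path is not in HEAD
#   1  the contents hash to the same blob as in HEAD
#   2  the file could not be hashed
# It hashes the contents and ignores the size/mtime cached in the index.
differs_from_head() {
    worktree_id=$(git hash-object -- "$1") || return 2
    head_id=$(git rev-parse --verify --quiet "HEAD:$1") || return 0
    [ "$worktree_id" != "$head_id" ]
}
```

With the demo repository above, `differs_from_head dixilinkerr.h` would be expected to report a difference even while `git status` says the working directory is clean, so a script could at least stop and report the file instead of silently dropping a version.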
 
>
> Wow, that was totally new to me and sort of destroys my world picture of
> Git ... I always thought Git is tracking file *contents*, and nothing else.
>
I fully agree, Sebastian.

Git is a well-designed version control system with first-class reliability. The fact that many do not consider Windows a well-designed OS does not mean that msysGit should become some sloppy derivative of the real Git. You msysGit developers/porters did a great job, and you can and should (within the limitations imposed by the OS) try to keep up with the high standards of the real thing.

I think the technical details of how to implement this properly under Windows are better discussed in the issue tracker at https://github.com/msysgit/git/issues/312.

Thanks for picking up this issue so quickly! That's one of the virtues of open source software.

Thomas Braun

Feb 12, 2015, 4:58:51 PM
to Johannes Sixt, Sebastian Schuberth, msy...@googlegroups.com, vdplas...@gmail.com
On 05.02.2015 at 20:50, Johannes Sixt wrote:
> The information comparable to inodes on Windows are the members
> .nFileIndexLow and .nFileIndexHigh of struct BY_HANDLE_FILE_INFORMATION
> retrievable by GetFileInformationByHandle(). (Perhaps there is a
> function that retrieves the same information given a path name instead
> of a handle.)
>
> BTW, notice this difference between upstream Git and Git for Windows
> that exists because of the mentioned weakness:
>
> diff --git a/t/t4130-apply-criss-cross-rename.sh
> b/t/t4130-apply-criss-cross-rename.sh
> index d173acd..bf7049e 100755
> --- a/t/t4130-apply-criss-cross-rename.sh
> +++ b/t/t4130-apply-criss-cross-rename.sh
> @@ -14,8 +14,8 @@ create_file() {
>
> test_expect_success 'setup' '
> create_file file1 "File1 contents" &&
> - create_file file2 "File2 contents" &&
> - create_file file3 "File3 contents" &&
> + create_file file2 "File2 more contents" &&
> + create_file file3 "File3 even more contents" &&
> git add file1 file2 file3 &&
> git commit -m 1
> '
>
> The test case exchanges two files such that only the inode number is
> modified. On Windows, git would not detect that the index is outdated
> because of the unusable inode value. With this patch, the file size
> changes and it notices the outdated index.

Just for fun, I reverted the commit that makes the contents of the
files differ (2fec9936).
And here t4130 still runs without errors.

Which I don't understand.

Johannes Sixt

Feb 12, 2015, 5:32:54 PM
to Thomas Braun, Sebastian Schuberth, msy...@googlegroups.com, vdplas...@gmail.com
On 12.02.2015 at 22:58, Thomas Braun wrote:
> On 05.02.2015 at 20:50, Johannes Sixt wrote:
>> The test case exchanges two files such that only the inode number is
>> modified. On Windows, git would not detect that the index is outdated
>> because of the unusable inode value. With this patch, the file size
>> changes and it notices the outdated index.
>
> I have just for fun reversed the commit which makes the file contents of
> the files different (2fec9936).
> And here t4130 still runs without errors.
>
> Which I now don't understand.

How often did you repeat the test? I guess that most of the time git
considers the files racily-clean and inspects the contents, but if the
timing is wrong, the change is not detected.

It is also possible that the machine must be slow to trigger the wrong
timing. I can't remember the details anymore.

-- Hannes

Thomas Braun

Feb 12, 2015, 6:05:19 PM
to Johannes Sixt, Sebastian Schuberth, msy...@googlegroups.com, vdplas...@gmail.com
Thanks for your hint!

I repeated it something like 10 times.
But now after your comment I issued

while true; do ./t4130-apply-criss-cross-rename.sh || break; done

which stopped after something like 50 iterations with an error in "apply".

My machine is reasonably fast and the repo is on an SSD.
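
To find out how many runs it takes before the race shows up, the loop can be given a counter and an upper bound. This is a minimal sketch; `repeat_until_failure` is a hypothetical helper and not part of Git's test suite.

```shell
# Hypothetical helper: run a command up to $1 times and report the first
# iteration at which it fails. Returns 0 if a failure was seen, 1 otherwise.
repeat_until_failure() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "failed on iteration $i"
            return 0
        fi
        i=$((i + 1))
    done
    echo "no failure in $max iterations"
    return 1
}
```

For example, `repeat_until_failure 200 ./t4130-apply-criss-cross-rename.sh` would stop at the first failing run or give up after 200 passes.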



