FWIW, I'm getting somewhat similar numbers:
- cold cache: 1m55.383s
- second run: 0m20.960s
- third run: 0m20.882s
But then my machine is not all that fast...
I work in a big tree, and my scan speeds are nowhere near as slow. I found a few references to git status speed on the ML and have tried to understand the problems of FindFirstFile and GetFileInformationByHandle vs. having inode data available.
I am also using the most recent version of msysGit. What are you using?
I'm also on Windows 7, and I found this in the issue tracker: http://code.google.com/p/msysgit/issues/detail?id=320. I'll investigate that next.
- cold cache: 1m06.08s
- second run: 0m06.64s
- third run: 0m06.63s
I run Windows 7 and do not have UAC on.
-Josh
On Mon, Dec 19, 2011 at 6:57 PM, Joshua Jensen <jje...@workspacewhiz.com> wrote:
I get the following on a Sony Core i7 laptop with new Seagate Momentus XT drive (the brand new one):
- cold cache: 1m06.08s
- second run: 0m06.64s
- third run: 0m06.63s
Hmm, well, failing an algorithmic improvement, that seems like a decent improvement. Perhaps I should just go HDD shopping.
Let me give you some additional things to think about:
My work computer is much faster than my laptop. Deleting all artifacts of our asset build takes over 4 minutes, and that is mostly an 'rm -rf' process. If I run the free MyDefrag Data Disk Monthly on the drive first, the same deletion takes right around **10 seconds**.
On my home Core i7 laptop, I cloned the WebKit code onto a defragmented partition of the Momentus XT (2 or 750 gb or whatever they're calling it). Theoretically, the layout is very compact with no fragmentation. Quickly eyeballing it in MyDefrag seems to confirm that. My 'git status' may have run over the close equivalent of a 'defragmented' drive.
A Lua mailing list posting the other day talked about GetFileInformationByHandleEx() [1]. I believe you made mention of something similar.
I do not have UAC on. Is yours off?
> Hopefully once I have some c++ wrappers, somebody who's interested can
> help with the integration work :)
Your USN for NTFS stuff looks quite interesting to me, thanks for your
efforts so far. Please note, however, that in order to get your changes
integrated into Git for Windows, they should be written in plain C, not
C++. This is because upstream Git is (mostly) written in plain C, and
we're aiming to get all of our changes contributed back upstream, with
only a few conditional compiles for Windows vs. Linux (or Mac).
--
Sebastian Schuberth
This could be worked around, but it'd mean that you'd need to cache the last
git status result somewhere, then use its mtime to determine whether you
should "trust" the USN journal.
--
Paul Betts <pa...@paulbetts.org>
History
The USN Journal, sometimes called the Change Journal, was introduced in Windows 2000, and as the name suggests its role is to record changes. It is part of the NTFS file system, and it records changes to files and folders on an NTFS partition. If an NTFS journal is activated, it is guaranteed to record every file and folder change on the drive.
Requirements
In order to make use of the journal, the thread accessing it must be running with administrative privileges, the drive must be NTFS, and the Windows and NTFS versions must be 5.0 (Windows 2000) or higher.
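A minimal sketch of what "accessing the journal" involves, assuming the standard Win32 calls (CreateFileA on the volume device, DeviceIoControl with FSCTL_QUERY_USN_JOURNAL). The volume path and error handling here are my assumptions, not taken from anyone's code in this thread; on a non-Windows build, or when run without elevation, the function simply reports failure.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#include <winioctl.h>
#endif

/* Query the USN journal on the given volume (e.g. "\\\\.\\C:").
 * Returns 0 on success, -1 on failure -- including when run without
 * administrative privileges or on a non-Windows system. */
static int query_journal(const char *volume)
{
#ifdef _WIN32
    HANDLE vol = CreateFileA(volume, GENERIC_READ,
                             FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                             OPEN_EXISTING, 0, NULL);
    if (vol == INVALID_HANDLE_VALUE)
        return -1;                 /* often: process is not elevated */

    USN_JOURNAL_DATA jd;
    DWORD bytes;
    if (!DeviceIoControl(vol, FSCTL_QUERY_USN_JOURNAL, NULL, 0,
                         &jd, sizeof(jd), &bytes, NULL)) {
        CloseHandle(vol);
        return -1;                 /* e.g. journal not active on volume */
    }
    printf("JournalID=%llu FirstUsn=%lld NextUsn=%lld\n",
           (unsigned long long)jd.UsnJournalID,
           (long long)jd.FirstUsn, (long long)jd.NextUsn);
    CloseHandle(vol);
    return 0;
#else
    (void)volume;
    return -1;                     /* USN journals are NTFS-only */
#endif
}
```

The JournalID, FirstUsn and NextUsn values returned here are exactly what the validity check discussed below needs to be persisted between runs.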
Required reading:
If you want to start doing anything with the journal, you really have to read [1] and [2]. The articles are from 1999, but as far as I can tell they are the only solid 'documentation' of how the journal works, and they contain everything you must do in order to get information out of the journal, so there isn't much point in me repeating what's already in these well-written articles. A good utility for taking a quick look at the USN journal is documented here [3].
Thoughts
This section is mostly going to be my ramblings and thoughts. You might find something of use in here.
In order to be sure that the journal you are reading is 'valid', you need to know the ID of the journal from the last time you read it; if the IDs match, you know it's the same journal. You then need to check that the last journal entry you processed is still in the journal: compare it against the lowest entry still present. If your last entry is at or above the lowest entry, you're in luck and the journal is valid; if the lowest entry is higher than your last entry, records you never saw have been purged and you have to fall back to a full rescan.
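The check above can be sketched as a small function. The parameter names mirror the USN_JOURNAL_DATA fields (UsnJournalID, FirstUsn) and the per-reader saved state, but the values are plain integers here so the logic can be shown without the Win32 headers; this is my illustration, not code from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if reading can resume from saved_next_usn,
 * 0 if a full rescan of the working tree is required. */
int journal_is_valid(uint64_t current_id, int64_t first_usn,
                     uint64_t saved_id, int64_t saved_next_usn)
{
    if (current_id != saved_id)
        return 0;   /* different journal: it was deleted and recreated */
    if (saved_next_usn < first_usn)
        return 0;   /* entries we never saw were purged from the journal */
    return 1;
}
```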
One of the biggest performance issues I had getting the C# code to perform well was the calls to the native file system and moving data between managed and unmanaged memory. With some careful coding I reduced the number of conversions to a bare minimum, to the point of it being a non-issue; it would not really be an issue at all when coding in pure C.
The next biggest performance issue is the need for a directory database. A USN record stores only the filename and the parent folder ID, so you need to get the folder path from a different API. This involves opening the folder by its ID and getting its path from the properties, which makes the process extremely IO bound. In C# I minimise this by using a Dictionary (hash table) to store the outcome of the query, so that if the same folder ID ever comes up again I can get its path without generating any IO. I need the path to know whether a file is in a git repository.
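In plain C, the Dictionary could be replaced by a small chained hash table keyed on the 64-bit folder ID (e.g. the ParentFileReferenceNumber from a USN record). A minimal sketch; the bucket count and the absent eviction/free logic are my assumptions, not details from the original C# code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DIR_CACHE_BUCKETS 4096

struct dir_entry {
    uint64_t id;              /* folder ID from the USN record */
    char *path;               /* resolved folder path */
    struct dir_entry *next;   /* chaining for bucket collisions */
};

static struct dir_entry *dir_cache[DIR_CACHE_BUCKETS];

/* Returns the cached path, or NULL on a miss (the caller must then
 * resolve the folder by ID through the file system and call put()). */
static const char *dir_cache_get(uint64_t id)
{
    struct dir_entry *e;
    for (e = dir_cache[id % DIR_CACHE_BUCKETS]; e; e = e->next)
        if (e->id == id)
            return e->path;
    return NULL;
}

static void dir_cache_put(uint64_t id, const char *path)
{
    struct dir_entry *e = malloc(sizeof(*e));
    e->id = id;
    e->path = malloc(strlen(path) + 1);
    strcpy(e->path, path);
    e->next = dir_cache[id % DIR_CACHE_BUCKETS];
    dir_cache[id % DIR_CACHE_BUCKETS] = e;
}
```

After a miss and a successful lookup through the file system, dir_cache_put stores the result so every later record under the same folder resolves without any IO.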
Thanks/Credits
Firstly I’d like to thank “StCroixSkipper”, who seems to be one of the few people on the internet who talks about and posts in forums about the USN Journal. His code forms the foundation of my code, and without it I suspect I would still be yelling at my computer in frustration.
I’d also like to thank everybody who works on msysGit (and git in general) as it’s an awesome SCM.
References
[1] http://www.microsoft.com/msj/0999/journal/journal.aspx