I've noticed that too. The last time I profiled git, most of the CPU
time was going to the Windows kernel (and most of that was in ACL
checks (can you read this file? Can you read this directory? Can you
run that program?)).
Windows is just plain slower than Linux at operating on many small files.
> We see this behaviour across all of our repositories; but taking a
> reasonably small one containing about 600 files with the .git/
> directory weighing in at about 2.4mb, yet most git operations are
> significantly slower than under Linux or OSX.
>
> For example a git status on this repository takes about 5 seconds,
Okay, that's not normal. I'm away from the office at the moment so I
can't get exact numbers, but 5 seconds is about right for a cold-cache
"git status" on over ten thousand files, with a 300+MB .git directory.
Hot-cache is under a second (Vista x64, 7 x64, and XP x32 on 2004
vintage Athlon64, Core2Quad, and VirtualBox respectively).
My ~600 files ~2.4MB repositories all "git status" in well under a
second even in the cold cache case. So it sounds like there's
something fishy going on.
> I've searched around this a lot, and have disabled all AV/firewalling,
> tried running under Administrator and have even toggled
> core.ignorestat in my git configuration, with none having a noticeable
> effect on my performance. I should add that the machines are largely
> just stock installs with little else installed or running.
I'll throw out a bunch of crazy ideas here, and see if any of them stick:
Are your ACLs unusually long?
Are you using AD? I know roughly nothing about AD, and whether or how
or even if it caches remote credentials for the purpose of ACL
lookups.
The git clone is on a local disk and not some SMB/CIFS/NFS mount, right?
Is git using CPU during that 5 seconds, or is it waiting for something
else to time-out? eg. Are there invalid paths (eg. disconnected
network shares) in your %PATH%? If git is using CPU, what does
CodeAnalyst[1] say?
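A rough way to check the %PATH% idea, as a sketch only: it assumes a Git Bash (MSYS) shell, where %PATH% shows up as a colon-separated $PATH. The function name list_missing_path_entries is made up for illustration. Note that testing an entry on a disconnected network share may itself stall, which is a useful hint about which entry is the slow one.

```shell
# Print every entry of a colon-separated search path that is not an
# existing directory. Hypothetical helper, not part of git.
list_missing_path_entries() {
    (
        # The subshell keeps the IFS and globbing changes local.
        IFS=:
        set -f
        for entry in $1; do
            # An empty entry means "current directory"; skip it.
            [ -n "$entry" ] || continue
            [ -d "$entry" ] || printf '%s\n' "$entry"
        done
    )
}

list_missing_path_entries "$PATH"
```

If this prints nothing, every directory on the search path exists and that theory can probably be ruled out.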
> Is there anything I can do to improve performance, particularly on the
> CI server?
Install Hudson and a mingw cross compiler on Linux on your CI server? ;-)
Peter Harris
[1] http://developer.amd.com/cpu/CodeAnalyst/Pages/default.aspx
> Are your ACLs unusually long?
As far as I know, no. I've not tinkered with any security settings on
either machine, and they only have 2 or 3 users on them (including
Administrator). The ACL on the directory holding the repo
has only 3 entries (SYSTEM, Administrators, and me).
> Are you using AD? I know roughly nothing about AD, and whether or how
> or even if it caches remote credentials for the purpose of ACL
> lookups.
No, I'm not using Active Directory, at least locally.
> The git clone is on a local disk and not some SMB/CIFS/NFS mount, right?
Yes, for the local machine there are no network shares in use. The
remote CI server is hosted on EC2 and is the data centre edition, so I
don't know if anything special is going on there, regarding their
filesystem... Either way, it doesn't explain the similar performance I
get locally.
> Is git using CPU during that 5 seconds, or is it waiting for something
> else to time-out? eg. Are there invalid paths (eg. disconnected
> network shares) in your %PATH%? If git is using CPU, what does
> CodeAnalyst[1] say?
>
> [1] http://developer.amd.com/cpu/CodeAnalyst/Pages/default.aspx
>
It is using about 25% of the CPU the whole time I'm waiting. All
paths are local.
I'll maybe take a look at trying Code Analyst tomorrow.
Thanks for the quick response,
R.
I'm at the office now. 1.5s cold cache, 1.1s hot cache for a 17546
file repository (1.7s to do a "find src | wc -l", so git is pretty
fast compared to other tools, if not compared to Linux). This is Vista
x64 on a Core2Quad, running git 1.7.2.3-preview20100911.
>> My ~600 files ~2.4MB repositories all "git status" in well under a
>> second even in the cold cache case.
A couple more data points: I just saw an anomalous 3.3s cold-cache
"git status" on one repository of about this size. The box must have
been busy doing something else. Hot cache was 0.1s. Another repository
of similar size was 0.54s cold and 0.08s hot.
>>> I've searched around this a lot, and have disabled all AV/firewalling,
>>> tried running under Administrator and have even toggled
>>> core.ignorestat in my git configuration, with none having a noticeable
>>> effect on my performance. I should add that the machines are largely
>>> just stock installs with little else installed or running.
>>
>> I'll throw out a bunch of crazy ideas here, and see if any of them stick:
One more crazy idea:
Have you run "git gc" recently?
Peter Harris
I just tried a "git status" on my largest private repo (~200 files,
~150MB .git dir), and it took 13.077 seconds.
On a work-repo, it took 12.208 seconds. That's with 15k files and a
1.5G git-file.
With warm-cache, both are less than 0.1 seconds.
So it seems to me that the repo-size does not matter much for the
performance of git status. Weird.
By the way, if you want to test cold-cache performance, I found this
tool: http://chadaustin.me/2009/04/flushing-disk-cache/ . My numbers
were not captured with that, but after using it I could
reproduce roughly the same numbers.
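For anyone who wants comparable numbers, here is a minimal sketch of timing two consecutive runs. It assumes GNU date (for %N, as shipped with msysgit and Linux), and elapsed_ms is a hypothetical helper name. The first run is only a cold-cache number if the disk cache was flushed beforehand, for example with the tool above.

```shell
# Print how many milliseconds a command takes to run.
# Hypothetical helper; relies on GNU date's %N (nanoseconds).
elapsed_ms() {
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( ($end - $start) / 1000000 ))
}

# Run from inside the repository being measured.
echo "first run:  $(elapsed_ms git status) ms"
echo "second run: $(elapsed_ms git status) ms"
```

The second run should reflect the hot-cache case, since the first one has already pulled the index and directory entries into memory.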
I'm using 1.7.4.rc1.3197.g51744 on NTFS.
Try the same git commands on a machine without an antivirus program,
to see if your current one is causing slowness.
I've managed to figure out a few things.
Firstly my problem as I see it on my desktop is due to git submodules.
Having used them extensively and never noticed a speed problem with
them on Linux I'd assumed they were pretty efficient; but on Windows
they can definitely cause significant slowdowns. Prior to a
submodule update git status is quick, after a submodule update things
slow down significantly... The repositories in question
all have submodules 3 levels deep, which leads to long wait times with the
following nested sizes:
- 2.4MB
- 8MB
- 50MB
This can lead to 21 second cold-cache statuses, and 5 second hot-cache
ones... Under Linux the same repo with the same submodules takes
about 0.7 seconds for a cold-cache git status (running in a VM on the
Windows host which has the problem).
I've aliased git status --ignore-submodules to something more
convenient to type on Windows, and this performs quickly. Most of the
time the submodules don't change so I'm happy with this workaround.
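For reference, the alias can be set up roughly like this. The name "st" is only an example, not necessarily the one I picked.

```shell
# Define a global alias so "git st" runs status without descending
# into submodules. The alias name "st" is an arbitrary example.
git config --global alias.st "status --ignore-submodules"
```

After that, "git st" gives the quick status, and plain "git status" still checks the submodules when that is actually needed.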
Secondly, the problem on the server is the same, but was compounded by
this issue here:
I first tried to use plink, but figured it was going to be too much
hassle to reconfigure my ssh keys, set up Pageant, etc., so I've
opted to launch Hudson from a bat file run at startup
instead of as a Windows Service. This simple workaround reduces the
time for the following sequence of git operations from about 8 minutes
to 20 seconds:
git fetch -t g...@github.com:git-repo.git +refs/heads/*:refs/remotes/origin/*
git ls-tree HEAD
cd submodule
git fetch -t g...@github.com:git-submodule-repo.git +refs/heads/*:refs/remotes/origin/*
cd ..
git tag -l master
git rev-parse origin/master
git checkout -f d62e360728c2c96edab6a5a47e1013bc31705e56
git submodule init
git submodule sync
git submodule update --init --recursive
git tag -a -f -m "Hudson Build #8" hudson-project-build-8
git whatchanged --no-abbrev -M --pretty=raw d62e360728c2c96edab6a5a47e1013bc31705e56..d62e360728c2c96edab6a5a47e1013bc31705e56
It'd be nice if submodules ran quicker like on Linux, but for my
current project performance is acceptable.
Thanks again to everyone who offered me help and suggestions.
R.