New issue 867 by yves.goe...@gmail.com: Add should not run a new Git
process for each file
http://code.google.com/p/tortoisegit/issues/detail?id=867
What steps will reproduce the problem?
1. Select a whole directory with many files to add
What is the expected output? What do you see instead?
git add accepts a list of files, so it should be done by calling the git
process a single time. Instead, a new git process is started for each
file. This takes forever and causes an incredible system load.
What version of the product are you using? On what operating system?
TortoiseGit 1.6.5.0 (1.7.2.0 does not work at all) on Windows XP.
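The difference can be made concrete with a minimal Python sketch. This is not TortoiseGit code, and the function names are hypothetical. It contrasts one `git add` per file with a single `git add` that receives every path. The runner is injectable, so you can count process launches without touching a real repository.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner launches one process for the given argument vector.
Runner = Callable[[List[str]], None]


def run_git(argv: List[str]) -> None:
    """Default runner: start a real git process and fail on a non-zero exit."""
    subprocess.run(argv, check=True)


def add_one_by_one(files: Sequence[str], run: Runner = run_git) -> int:
    """The slow pattern from the report: one git process per file."""
    for path in files:
        run(["git", "add", "--", path])
    return len(files)  # number of processes started


def add_in_one_call(files: Sequence[str], run: Runner = run_git) -> int:
    """git add accepts many pathspecs, so one process can handle them all."""
    if not files:
        return 0
    run(["git", "add", "--", *files])
    return 1  # number of processes started


if __name__ == "__main__":
    calls: List[List[str]] = []
    files = [f"file{i}.txt" for i in range(800)]
    print(add_one_by_one(files, calls.append))   # 800
    print(add_in_one_call(files, calls.append))  # 1
    print(len(calls))                            # 801
```

A single call can still run into the operating system's command-line length limit when the file list is very long.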
A similar issue exists when opening the Add dialogue for multiple files
from one context menu: select many files, right-click them, choose
TortoiseGit/Add, and you will see many git processes being started, which
takes a long time. This seems to be a common problem of TortoiseGit. Git is
fast, the tortoise is slow.
Same for commit of ~800 files (initial commit)... still waiting...
I have a nice proof of concept for this issue, to use until we have direct
access to the git index:
https://github.com/csware/TortoiseGit/tree/massive-git-task
or
git://github.com/csware/TortoiseGit.git massive-git-task
Frank, what do you think?
I reviewed your code. It can't resolve the problem 100%, but it will improve
things a little.
I suggest exporting an "add" function from gitdll, so TGit can call the git
function directly without launching git.exe.
Doing it directly would be the best way (and I hope libgit2 will be capable
of more tasks soon), but at the moment add is not the only method that is
called very often. I tested my code: it speeds things up a lot and can also
be used for many more git commands like revert, update-index, ...
I tested my code with "add" for ~1500 files: 3 sec vs. 30+ sec (aborted),
same for commit.
Issue 940 has been merged into this issue.
"git add" is implemented using libgit2 (revision
65d2888f8eaffe82eaa4825ef51f9fce4bb9ead2).
Comment #9 on issue 867 by sstrickr...@gmail.com:
This issue is closed for add. Other optimizations, like commit, are still on
our todo list, but libgit2 doesn't support all the required calls at the moment.
Has this been reverted in 1.7.6? I noticed that 1.7.6 starts a new process
for each file to add.
No, it has not been reverted. But you might have seen this in the commit
process (we cannot switch to libgit2 there). Also, in 1.7.6+ we use "git
add" when we know that libgit2 cannot work (separate git dir).
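The fallback described here amounts to a per-repository check before choosing how to add files. The following is a hypothetical illustration, not TortoiseGit's actual logic. It assumes that "separate git dir" means the working tree's `.git` is a plain text file beginning with `gitdir:`, which is what `git init --separate-git-dir` creates. The backend names are only labels.

```python
import os


def uses_separate_git_dir(worktree: str) -> bool:
    """True if <worktree>/.git is a "gitdir: <path>" file instead of a directory."""
    dot_git = os.path.join(worktree, ".git")
    if not os.path.isfile(dot_git):
        return False
    with open(dot_git, encoding="utf-8") as fh:
        return fh.readline().startswith("gitdir:")


def choose_add_backend(worktree: str) -> str:
    """Hypothetical policy: fall back to git.exe when libgit2 is assumed not to work."""
    return "git.exe" if uses_separate_git_dir(worktree) else "libgit2"
```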
Ok, I found the problem.
If I select a list of files and do
Right click -> TortoiseGit -> Add, it uses libgit2 to add them to the
index, and it's VERY fast.
If I do
Right click -> Commit -> Select list of files -> Right click -> Add, it
launches a git process for each file.
I've attached a test repo where you can compare the performance of Right
Click -> TortoiseGit -> Add vs Right Click -> Commit -> Right Click -> Add.
I also suggest you try this:
1. Extract the .git into a folder and navigate to it in Windows Explorer.
With Explorer still open, extract the *.txt files, and you'll see a massive
performance lockup.
Attachments:
libgit2addtest.7z 38.0 KB
Comment #13 on issue 867 by pro-lo...@optusnet.com.au:
(No comment was entered for this change.)
or at least add a registry flag to explicitly enable it, but disable it by
default.
You mean revert to using git.exe to add files? Given the number of issues
caused by libgit2, it's probably a good idea. (We should probably report all
these issues to libgit2; at the end of the day we want to use it, since it's
much faster and easier to work with than git.exe.)
My suggestion would be to pass many file names to each git process at a
time, as opposed to just one; this will at least speed it up a bit. A single
command can be at most 8191 characters in Windows XP[1].
E.g.: 'git.exe add "file1" "folder with space in it\file 2"' is 52
characters, so 8191 characters should fit quite a few files into one add command.
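The 52-character figure can be checked with a few lines of Python. This sketch uses the naive quoting from the example, which wraps each path in double quotes. It ignores Windows quoting corner cases such as paths that contain quotes or end in a backslash. `files_that_fit` is a hypothetical helper that gives a rough estimate, assuming every path has the same length.

```python
from typing import List


def naive_command_line(base: str, paths: List[str]) -> str:
    """Join the base command and each path wrapped in double quotes."""
    return " ".join([base] + ['"' + p + '"' for p in paths])


def files_that_fit(base: str, path_length: int, limit: int = 8191) -> int:
    """Rough estimate: each path costs a space, two quotes and path_length chars."""
    per_file = path_length + 3
    return max(0, (limit - len(base)) // per_file)


example = naive_command_line("git.exe add", ["file1", "folder with space in it\\file 2"])
print(len(example))                       # 52
print(files_that_fit("git.exe add", 40))  # 190
```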
My approach would be to keep an array of files to add and start building
the "add" command string, appending each file as a quoted ("") relative
POSIX path until the array of files to add is empty or the string would
exceed 8191 characters. If there are no more files to add to the string,
execute it; if adding the next file would exceed 8191 characters, leave that
file out, execute the add, and start building a new add string with it.
This way we save a lot of the time spent on the process start/terminate cycle.
pro-logic: I already had this idea: See my massive-git-task branch.
I like the idea of your massive-git-task :) I personally think it should be
the default for all operations that take a list of files, like add / revert /
remove etc. The CPU time spent formatting the string is insignificant
compared to the CPU time of starting up the git process.
The only issue I see with it (from a very quick read) is that you have
MAX_COMMANDLINE_LENGTH = 30000 where it has to be 8191.
My idea, which I think you have, is essentially:
string baseCommand = "git.exe add"
string command = baseCommand
foreach fileYouWantToAdd
    string arg = " \"" + fileYouWantToAdd.path + "\""
    // flush first if appending this file would exceed the 8191 limit
    if (command != baseCommand and command.length + arg.length > 8191)
        command.execute
        command = baseCommand
    command += arg
end foreach
// run the last batch, but never a bare "git.exe add" without files
if (command != baseCommand)
    command.execute
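A runnable Python version of the same greedy batching might look like the following sketch. `batch_commands` and `quote` are hypothetical names, and the quoting is the same naive double-quote wrapping as above. The limit is a parameter because the right value depends on how the process is started. A path that is too long on its own still gets its own command rather than being silently dropped.

```python
from typing import Iterator, Sequence

BASE_COMMAND = "git.exe add"


def quote(path: str) -> str:
    """Naive quoting as in the pseudocode: wrap the path in double quotes."""
    return '"' + path + '"'


def batch_commands(paths: Sequence[str], base: str = BASE_COMMAND,
                   limit: int = 8191) -> Iterator[str]:
    """Yield command lines that each stay within `limit` characters.

    Paths are appended greedily; when the next path would overflow, the
    current command is yielded and a new one is started with that path.
    A single path that is too long by itself still gets its own command,
    so no file is skipped (the caller has to handle that failure).
    """
    command = base
    for path in paths:
        arg = " " + quote(path)
        if command != base and len(command) + len(arg) > limit:
            yield command
            command = base
        command += arg
    if command != base:  # never run a bare "git.exe add" with no files
        yield command


if __name__ == "__main__":
    files = [f"dir/file{i:04d}.txt" for i in range(1500)]
    commands = list(batch_commands(files))
    print(len(commands))  # 4 git processes instead of 1500
```

With 16-character paths, each command holds 430 files, so 1500 files need four processes.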
I suppose 8191 is only the limit for command lines passed to cmd.exe, not a
general limitation.
"CreateProcess function": "The command line to be executed. The maximum
length of this string is 32,768 characters" (for W2K and newer)
Taken from:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms682425%28v=vs.85%29.aspx
Any plans to include this in 1.7.8? :) Happy to beta test ;)
It's already in 1.7.7.
Do you just use it for a specific list of operations?
I just tested adding 1200 files (using the repo from comment 12),
and it's FAAAST.
But selecting those same 1200 files and doing a right click "revert" seems
to start a few git processes instead of just one.
Revert isn't that easy (interaction with the Windows Recycle Bin and "checkout"
vs. deleting the file), so it doesn't use this approach at the moment.