Re: Staging large amount of files is very slow


David Aguilar

Apr 19, 2013, 10:03:03 PM
to hor...@gmail.com, git-cola
On Fri, Apr 19, 2013 at 7:44 AM, <hor...@gmail.com> wrote:
> Hi everyone,
>
> first of all: I think git cola is the best git UI for Linux :-)
>
> I think staging files is not as fast as it could be though:
> When I stage many small files (many thousands), I can see that git-cola
> calls git diff many times, always on the same file. The staging process
> takes more than 15 minutes for me, using git-cola 1.8.2.

Ah, interesting. Thanks for the report.
This should certainly be improved.

It shouldn't do that; it makes much more sense to do all the
additions at once and then refresh the diffs etc. once at the
very end.

I'm pretty sure this can be fixed. Is it literally thousands of files?

I do know that we had an issue once where the number of added files
was overrunning some "maximum command-line length" so the "add"
operation was modified so that it processes files in chunks of 40(?).
It's an arbitrary number and could be larger, but we should first
determine whether or not that's the source of this slowness. It seems
like there's a callback that's getting triggered multiple times when
really it should only be triggered once.
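
To make the chunking idea concrete, here is a minimal Python sketch. It is not git-cola's actual code: `chunked`, `stage_in_batches`, and the `refresh` callback are hypothetical names, and the batch size of 40 is just the number recalled above. The point is that `git add` runs once per batch while the refresh runs once, after the last batch.

```python
import subprocess
from typing import Callable, Iterator, List, Optional, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield successive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def stage_in_batches(
    paths: Sequence[str],
    batch_size: int = 40,  # assumption: the arbitrary chunk size mentioned above
    refresh: Optional[Callable[[], None]] = None,
) -> int:
    """Run `git add` once per batch, then refresh once at the very end.

    Returns the number of `git add` invocations.
    """
    batches = 0
    for batch in chunked(paths, batch_size):
        # "--" stops git from parsing unusual filenames as options.
        subprocess.run(["git", "add", "--"] + batch, check=True)
        batches += 1
    if refresh is not None:
        refresh()  # a single refresh, after all batches are staged
    return batches
```

With 6,000 paths and a batch size of 40 this would launch 150 `git add` processes, so a larger batch size would reduce process overhead as long as each command line stays under the system limit.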

This would be a good item to tackle before the next release.


> Here you can see what happens when I grep for the git diff process:
>
> csp@noodles ~ $ ps -elf | grep diff | grep -v grep
> 0 R 1000 12393 12169 0 80 0 - 8348 - 16:30 ? 00:00:00
> git diff --unified=2 --submodule --no-ext-diff -M --patience --no-color --
> TestProject/src/test/java/com/tests/v02/001_MainMenuBeforeInitialSync.out.screendump
> csp@noodles ~ $ ps -elf | grep diff | grep -v grep
> 0 R 1000 12662 12169 0 80 0 - 6771 - 16:30 ? 00:00:00
> git diff --unified=2 --submodule --no-ext-diff -M --patience --no-color --
> TestProject/src/test/java/com/tests/v02/001_MainMenuBeforeInitialSync.out.screendump
> csp@noodles ~ $ ps -elf | grep diff | grep -v grep
> csp@noodles ~ $ ps -elf | grep diff | grep -v grep
> csp@noodles ~ $ ps -elf | grep diff | grep -v grep
> 0 R 1000 13070 12169 0 80 0 - 6436 - 16:30 ? 00:00:00
> git diff --unified=2 --submodule --no-ext-diff -M --patience --no-color --
> TestProject/src/test/java/com/tests/v02/001_MainMenuBeforeInitialSync.out.screendump
> csp@noodles ~ $ ps -elf | grep diff | grep -v grep
> csp@noodles ~ $ ps -elf | grep diff | grep -v grep
> csp@noodles ~ $ ps -elf | grep diff | grep -v grep
> 0 R 1000 17525 12169 0 80 0 - 6436 - 16:36 ? 00:00:00
> git diff --unified=2 --submodule --no-ext-diff -M --patience --no-color --
> TestProject/src/test/java/com/tests/v02/001_MainMenuBeforeInitialSync.out.screendump
>
> You can see that the git process-id changes, but the filename is always the
> same. Maybe this is the UI refreshing itself every time a single file is
> staged? Could this be optimized so that the UI is only refreshed once the
> entire staging is done? I don't know how big the performance impact is, but I
> think it's something that slows down the staging to some degree.
>
> kind regards,
> Christian
>



--
David

David Aguilar

Apr 19, 2013, 11:39:53 PM
to hor...@gmail.com, git-cola
On Fri, Apr 19, 2013 at 7:03 PM, David Aguilar <dav...@gmail.com> wrote:
> On Fri, Apr 19, 2013 at 7:44 AM, <hor...@gmail.com> wrote:
>> Hi everyone,
>>
>> first of all: I think git cola is the best git UI for Linux :-)

Thanks

>>
>> I think staging files is not as fast as it could be though:
>> When I stage many small files (many thousands), I can see that git-cola
>> calls git diff many times, always on the same file. The staging process
>> takes more than 15 minutes for me, using git-cola 1.8.2.
>
> Ah, interesting. Thanks for the report.
> This should certainly be improved.

... and before I dig in a little more, how exactly are you staging the files?

Are you selecting them all at once (click first one, shift-click last
one) and then Ctrl+S to stage?

Are you using inotify?

That last question might be the answer, especially if the number of
files is in the thousands.

Assuming the answer is yes...

>> You can see that the git process-id changes, but the filename is always the
>> same. Maybe this is the UI refreshing itself every time a single file is
>> staged? Could this be optimized so that the UI is only refreshed once the
>> entire staging is done? I don't know how big the performance impact is, but I
>> think it's something that slows down the staging to some degree.

The GUI only updates itself at the end of the add operation,
but if inotify is enabled then I can imagine that it could see the sliced
add operation in progress and continually update itself in response.

I have some changes that I'll push to an "inotify" branch at
https://github.com/davvid/git-cola
They prevent the inotify stuff from kicking in while we're staging/unstaging.

Please try 'em out and let me know if they help.
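
For anyone following along, the general idea of holding back watcher-driven refreshes during a bulk operation can be sketched roughly as below. This is an illustration only, not the code on the branch: `RefreshGate`, `on_file_event`, and `paused` are hypothetical names, and the sketch assumes events are delivered on a single thread (a real watcher thread would need locking).

```python
from contextlib import contextmanager


class RefreshGate:
    """Holds back file-watcher refreshes while a bulk operation is running."""

    def __init__(self, refresh):
        self._refresh = refresh  # callable that redraws the status/diff views
        self._depth = 0          # greater than 0 while a bulk operation is active
        self._pending = False    # True if an event arrived while paused

    def on_file_event(self):
        """Called by the watcher (for example, inotify) for each change."""
        if self._depth > 0:
            self._pending = True  # remember the event, but do not refresh yet
        else:
            self._refresh()

    @contextmanager
    def paused(self):
        """Suppress refreshes for the duration of the `with` block."""
        self._depth += 1
        try:
            yield
        finally:
            self._depth -= 1
            if self._depth == 0 and self._pending:
                self._pending = False
                self._refresh()  # one catch-up refresh for all held events
```

Wrapping the sliced add in `with gate.paused():` would turn thousands of watcher events into a single refresh once staging finishes.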

Best,
--
David

horschi

Apr 20, 2013, 7:47:31 AM
to David Aguilar, git-...@googlegroups.com
Hi David,

thanks for your response.

> Are you selecting them all at once (click first one, shift-click last
> one) and then Ctrl+S to stage?

Yes, I select the first file, then I start paging through the list and check if the correct files have changed. At the end I shift-click the last file. Then I right-click on the selection and click stage. I do not use shortcuts (yet).
Sometimes, when the file list needs some reviewing, I do not stage the entire list at once. Instead I only stage files in batches of ~1000.

> Are you using inotify?

I did not explicitly enable anything, but I assume I am: git-cola definitely refreshes itself when I change files from my IDE.

> That last question might be the answer, especially if the number of
> files is in the thousands.

Yes, it's really thousands of files. I just checked one of my commits from yesterday; it was 6480 files.

> The GUI only updates itself at the end of the add operation,
> but if inotify is enabled then I can imagine that it could see the sliced
> add operation in progress and continually update itself in response.

That sounds very reasonable to me.

> I have some changes that I'll push to an "inotify" branch at
> https://github.com/davvid/git-cola
> They prevent the inotify stuff from kicking in while we're staging/unstaging.
>
> Please try 'em out and let me know if they help.

I will try my best. I will let you know how it performs.

Also: I sometimes got a "python-event-queue is full" error. Unfortunately I did not copy the exact error message. Is this maybe related?

Cheers,
Christian
 

horschi

Apr 30, 2013, 8:39:56 AM
to David Aguilar, git-...@googlegroups.com
Hi David,

I was working with your inotify branch this week. It worked fine so far. It seemed a bit faster, but I haven't had any of these very large commits so far.

Today, unfortunately, I cannot stage anything at all. It simply does not do anything, and it does not log any errors to the console. This error happens with both the inotify branch and the 1.8.3rc1 version. I also had this error some time ago with the 1.4.x git-cola that comes with Ubuntu.

I have restarted git-cola multiple times now and I still cannot stage. I have now switched to another GUI to perform my staging. There it works fine.

If you want to add any debugging info to the inotify branch, I would be glad to continue testing with it and report anything I find.

kind regards,
Christian



 

horschi

Apr 30, 2013, 8:42:11 AM
to David Aguilar, git-...@googlegroups.com
Sorry, I forgot to mention: This error currently occurs with only ~50 files changed. So it's not one of those large commits.

David Aguilar

Apr 30, 2013, 2:36:16 PM
to horschi, git-cola
Sorry about that. Do you think you could share some details?
Typically, how long are the filenames in your project?

What does `getconf ARG_MAX` report on your machine?
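
The same limit can be read from Python, which gives a rough way to check whether a batch of filenames fits on one command line. This is only a sketch under assumptions: `arg_max`, `argv_bytes`, and `fits` are hypothetical helpers, the 4096-byte headroom is an arbitrary safety margin, and the estimate ignores per-argument limits and pointer overhead that some kernels also apply.

```python
import os


def arg_max() -> int:
    """Return the system limit on argument + environment size, in bytes."""
    return os.sysconf("SC_ARG_MAX")


def argv_bytes(args) -> int:
    """Rough size of an argument list: each argument plus its NUL terminator."""
    return sum(len(os.fsencode(a)) + 1 for a in args)


def fits(args, headroom: int = 4096) -> bool:
    """True if `args` plus the current environment stay under the limit."""
    # Each environment entry is stored as "KEY=VALUE\0".
    env = sum(
        len(os.fsencode(k)) + len(os.fsencode(v)) + 2
        for k, v in os.environ.items()
    )
    return argv_bytes(args) + env + headroom <= arg_max()
```

If `fits(["git", "add", "--"] + paths)` were false for a batch, that would point toward the command-line length as the culprit; with only ~50 short filenames it should normally be true.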

It's strange that it would suddenly stop staging.


$ env GIT_COLA_TRACE=1 /path/to/bin/git-cola
and
$ env GIT_COLA_TRACE=full /path/to/bin/git-cola

may be able to help us understand what's going on.
--
David