What is causing the slowness of git's "Counting objects" step?


Sebastian Schuberth

Apr 3, 2014, 7:05:13 AM
to repo-d...@googlegroups.com
Hi,

both network latency and bandwidth to our Gerrit server (version 2.6.1 currently; going to upgrade to 2.8.3 soon) are fine, but git clients (any OS, any version) spend a lot of time in the "Counting objects" step when fetching. To narrow down the cause, is it a client or server issue if "Counting objects" is slow (i.e. *who* counts the objects, the client or the server)? If it's the server, what kind of hardware upgrade is most likely going to help? I guess more CPU power ... or faster hard disks?

Thanks for any insights,
Sebastian

Saša Živkov

Apr 3, 2014, 7:43:44 AM
to Sebastian Schuberth, repo-d...@googlegroups.com
I guess what you see is:
remote: Counting objects: ...

So the counting happens on the remote (Gerrit).
When did you last run "git gc" on that repository?



Sebastian Schuberth

Apr 3, 2014, 8:00:49 AM
to Saša Živkov, repo-d...@googlegroups.com
On Thu, Apr 3, 2014 at 1:43 PM, Saša Živkov <ziv...@gmail.com> wrote:

> I guess what you see is:
> remote: Counting objects: ...

Yes.

> So the counting happens on the remote (Gerrit).
> When did you last time "git gc" that repository?

On the server? Manually? Never. I'd expect Gerrit to do this in
reasonable time intervals. But the project is only 3 months old, so
I'd not expect so much garbage lingering around.

--
Sebastian Schuberth

Saša Živkov

Apr 3, 2014, 9:52:54 AM
to Sebastian Schuberth, repo-d...@googlegroups.com
On Thu, Apr 3, 2014 at 2:00 PM, Sebastian Schuberth <sschu...@gmail.com> wrote:

> On Thu, Apr 3, 2014 at 1:43 PM, Saša Živkov <ziv...@gmail.com> wrote:
>
>> I guess what you see is:
>> remote: Counting objects: ...
>
> Yes.
>
>> So the counting happens on the remote (Gerrit).
>> When did you last time "git gc" that repository?
>
> On the server? Manually?

On the server.

> Never. I'd expect Gerrit to do this in
> reasonable time intervals.

Wrong expectation ;-)

> But the project is only 3 months old, so
> I'd not expect so much garbage lingering around.

You can try "git gc" on that repository (on the server) and then try fetching again.

> --
> Sebastian Schuberth

Sebastian Schuberth

Apr 3, 2014, 2:51:02 PM
to Saša Živkov, repo-d...@googlegroups.com
On Thu, Apr 3, 2014 at 3:52 PM, Saša Živkov <ziv...@gmail.com> wrote:

>> But the project is only 3 months old, so
>> I'd not expect so much garbage lingering around.
>
> You can try "git gc" on that repository (on the server) and then try
> fetching again.

I'll give it a try, but I don't expect it to help much. I don't think
that phase is slow because of many objects, but because the counting
itself is slow. And my initial question was aimed at: What server
hardware upgrade would make counting faster?

--
Sebastian Schuberth

Shawn Pearce

Apr 3, 2014, 2:53:55 PM
to Sebastian Schuberth, Saša Živkov, repo-d...@googlegroups.com
CPU and RAM. It's mostly CPU bound and disk IO bound. RAM to make
sure data is cached in memory and is not running at disk seek speeds,
and CPU as it is computationally intensive.


But really, run `git gc`. Part of what makes it slow is the disk data
is not in an optimized format. `git gc` not only removes garbage, it
also re-optimizes the on-disk format to improve read performance. A
periodic `git gc` can improve counting performance without doing a
hardware upgrade.
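
As a rough way to see how un-optimized a repository is before and after `git gc`, the sketch below summarizes the output of `git count-objects -v` (loose object count and number of pack files). The function name `summarize_counts` is made up for illustration, and the thread does not say which numbers count as "too many", so no threshold is applied.

```shell
#!/bin/sh
# Summarize `git count-objects -v` output read from stdin.
# Prints "loose=<n> packs=<n>"; other fields are ignored.
# (summarize_counts is a hypothetical helper name, not a git command.)
summarize_counts() {
    awk -F': ' '
        $1 == "count" { loose = $2 }   # number of loose objects
        $1 == "packs" { packs = $2 }   # number of pack files
        END { printf "loose=%d packs=%d\n", loose, packs }
    '
}

# Usage (run inside the repository on the server):
#   git count-objects -v | summarize_counts
#   git gc
#   git count-objects -v | summarize_counts
```

After a successful `git gc` both numbers should normally drop, since loose objects and small packs are consolidated.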

Sebastian Schuberth

Apr 3, 2014, 2:58:27 PM
to Shawn Pearce, Saša Živkov, repo-d...@googlegroups.com
On Thu, Apr 3, 2014 at 8:53 PM, Shawn Pearce <s...@google.com> wrote:

>> I'll give it a try, but I don't expect it to help much. I don't think
>> that phase is slow because of many objects, but because the counting
>> itself is slow. And my initial question was aimed at: What server
>> hardware upgrade would make counting faster?
>
> CPU and RAM. It's mostly CPU bound and disk IO bound. RAM to make
> sure data is cached in memory and is not running at disk seek speeds,
> and CPU as it is computationally intensive.

Thanks.

> But really, run `git gc`. Part of what makes it slow is the disk data
> is not in an optimized format. `git gc` not only removes garbage, it
> also re-optimizes the on disk format to improve read performance. A
> periodic `git gc` can improve counting performance without doing a
> hardware upgrade.

I was already wondering how often Gerrit would run "git gc" on its
repos, but it seems to me you're saying that Gerrit is *never* running
it automatically. Is that correct? Is there not even a plugin for
this?

--
Sebastian Schuberth

Shawn Pearce

Apr 3, 2014, 3:23:36 PM
to Sebastian Schuberth, Saša Živkov, repo-d...@googlegroups.com
Correct, it is not done automatically, and we don't have a plugin to do it.

Alex Blewitt

Apr 3, 2014, 4:45:58 PM
to Shawn Pearce, Sebastian Schuberth, Saša Živkov, repo-d...@googlegroups.com

> On 3 Apr 2014, at 20:23, Shawn Pearce <s...@google.com> wrote:
>
> Correct, it is not done automatically, and we don't have a plugin to do it.

Isn't there an admin command that can run gc when invoked over ssh? `gerrit gc --all` used to run gc in the past, and because it uses JGit rather than C git, it generates the bitmaps too.

Alex

Shawn Pearce

Apr 3, 2014, 5:16:12 PM
to Alex Blewitt, Sebastian Schuberth, Saša Živkov, repo-d...@googlegroups.com
Yes, you can run `ssh -p 29418 localhost gerrit gc --all` but this
isn't automatic.

Dave Castagna (Motorola Mobility)

May 2, 2014, 9:39:49 AM
to repo-d...@googlegroups.com, Alex Blewitt, Sebastian Schuberth, Saša Živkov
We've actually given up on "gerrit gc --all" (sorry, Shawn) because it causes odd "ACK/NAK" errors when users try to clone, and it is also buggy in 2.6.1 (which is what we are using). We get numerous hangs or failures, and there is a bug that invalidates your entire project set if a gc fails. You then have to restart the GerritCodeReview process to clear it out so that more gits can be gc'd, but that isn't a good workaround and only works until the next gc failure.

I am hoping that when we upgrade to a newer version of Gerrit "gerrit gc --all" starts working in a more stable manner.


But to the point....

We have Jenkins jobs set up to run git gc on all of our relevant servers on a regular basis (usually once per week).

One word of caution on gc's:  Because they clean things out you should probably keep backups of your gits as they look before you gc them (just schedule the backups to occur before the gcs).  We just recently had a case where we had to rebuild some accidentally orphaned content in a repo.  If the gc had kicked in then we would have been pretty badly screwed.
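
A minimal sketch of the backup-then-gc routine described above, assuming the repositories are bare `*.git` directories under one root. The helper names and the paths `/srv/gerrit/git` and `/srv/backups` are hypothetical, not taken from any real setup; adapt them before use.

```shell
#!/bin/sh
# List bare repositories (directories named *.git) under a root directory.
# -prune stops find from descending into each repository once it is matched.
list_bare_repos() {
    find "$1" -type d -name '*.git' -prune | sort
}

# Back up one repository to a dated tarball, then run git gc on it.
# gc only runs if the backup succeeded.
# The tarball name is the repository path with "/" replaced by "_",
# so repositories with the same basename do not overwrite each other.
backup_then_gc() {
    repo=$1
    backup_dir=$2
    name=$(printf '%s' "$repo" | tr '/' '_')
    tar -czf "$backup_dir/$name.$(date +%Y%m%d).tar.gz" \
        -C "$(dirname "$repo")" "$(basename "$repo")" &&
        git --git-dir="$repo" gc
}

# Usage (for example from a weekly Jenkins job; paths are assumptions):
#   list_bare_repos /srv/gerrit/git | while IFS= read -r repo; do
#       backup_then_gc "$repo" /srv/backups || echo "failed: $repo" >&2
#   done
```

Chaining the two steps with `&&` keeps the ordering the post asks for: a repository is never gc'd unless its pre-gc state has been archived first.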

Doug Kelly

May 5, 2014, 10:42:39 AM
to repo-d...@googlegroups.com, Alex Blewitt, Sebastian Schuberth, Saša Živkov
I've been running "gerrit gc" for the past several months (since at least November), and it's been quite stable.  Though, I run it one-by-one on each repository.
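
Running `gerrit gc` one project at a time could look roughly like the sketch below. It assumes the SSH host and port shown earlier in the thread (`localhost`, 29418) and uses `gerrit ls-projects` to get the project list; the function name `gc_each_project` and the `GERRIT_SSH` variable are made up for illustration.

```shell
#!/bin/sh
# Command prefix used to reach Gerrit's SSH interface. Host and port come
# from the thread and may differ on your installation (assumption).
# -n keeps ssh from swallowing the project list on stdin inside the loop.
GERRIT_SSH=${GERRIT_SSH:-"ssh -n -p 29418 localhost"}

# Read project names from stdin and gc them one at a time.
# A failure is reported on stderr but does not abort the remaining projects.
gc_each_project() {
    while IFS= read -r project; do
        [ -n "$project" ] || continue   # skip blank lines
        $GERRIT_SSH gerrit gc "$project" || echo "gc failed: $project" >&2
    done
}

# Usage:
#   $GERRIT_SSH gerrit ls-projects | gc_each_project
```

Going project by project means one failing repository shows up as a single reported error instead of stopping the whole run.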

Indeed, Dave's warning does not come lightly.  Any change that either rewinds a branch or removes a branch that isn't fully merged is potentially destructive.  The TeamForge history protection plugin is one way to combat this, and another (perhaps simpler) way is severely limiting any operation such as branch deletion or force push in the project ACL.

--Doug