git push slow to mirror

lazygar...@gmail.com

Apr 21, 2014, 10:40:59 PM
to gito...@googlegroups.com
Hello

We are running v3.5.3.1-9-gfc5467c and we have a very large repo with 4 mirrors (two in different countries). We are finding our pushes are slow when the farthest repo is enabled. Turning off the repo is saving minutes off of everyone's push.

I would like to turn off the asynchronous push for that mirror and manually push. Is this possible?

Thanks
Shane

Sitaram Chamarty

Apr 22, 2014, 12:32:38 AM
to lazygar...@gmail.com, gito...@googlegroups.com
I'm confused. Asynch mirroring means that the pusher shouldn't have to
wait for anything but the immediate push (to master) to complete so why
are they seeing a slowdown at all, whether you enable the far-away slave
or not?

As for disabling the automatic mirror push while allowing a manual
mirror push, that's not possible right now. I'll have to look into it
and see what I can cook up.

sitaram

lazygar...@gmail.com

Apr 22, 2014, 1:20:24 AM
to gito...@googlegroups.com, lazygar...@gmail.com
On Monday, April 21, 2014 9:32:38 PM UTC-7, Sitaram Chamarty wrote:

> I'm confused. Asynch mirroring means that the pusher shouldn't have to
> wait for anything but the immediate push (to master) to complete so why
> are they seeing a slowdown at all, whether you enable the far-away slave
> or not?
>
> As for disabling the automatic mirror push while allowing a manual
> mirror push, that's not possible right now. I'll have to look into it
> and see what I can cook up.
>
> sitaram

Thanks for the reply Sitaram,

It's confusing to me too. I've done some debugging and I know it's not the post-receive hook; that is taking less than a second (debug date statements in our post-review script). And when we turned off mirroring to the one repo, everyone saw their push times get smaller.

If I'm reading the code right, at the bottom of gl-mirror-push is where the fork happens. I'm thinking of changing line 78 to
[ "$s" = "$hn" ] || [ "$s" = "farawayremote" ] && continue
and then running the mirror push in a separate dir with the original files.

Thanks
Shane

ravi jaladi

Apr 22, 2014, 1:30:50 AM
to lazygar...@gmail.com, gito...@googlegroups.com
Hello Shane,

How much delta is being pushed to this repository (are these large
binaries)? The user's session should complete right away once the
changes are updated on the master repository, and by the way, it should
not block other users pushing to different repositories either.

Thanks
RJ


Sitaram Chamarty

Apr 22, 2014, 2:50:54 AM
to lazygar...@gmail.com, gito...@googlegroups.com
I distinctly remember your first post saying you're using
v3.5.3.something.

gl-mirror-push is v2. V2 did not have asynch push at all.

Also, unless it's a security issue, it's no longer supported (or at
least not supported during the weekdays... on weekends I may not mind
spending a few minutes answering a question or two).

good luck, and next time you ask a question please make sure you have
the right version number or at least something close to it.

lazygar...@gmail.com

Apr 22, 2014, 10:41:32 AM
to gito...@googlegroups.com, lazygar...@gmail.com

> I distinctly remember your first post saying you're using
> v3.5.3.something.
>
> gl-mirror-push is v2. V2 did not have asynch push at all.
>
> Also, unless it's a security issue, it's no longer supported (or at
> least not supported during the weekdays... on weekends I may not mind
> spending a few minutes answering a question or two).
>
> good luck, and next time you ask a question please make sure you have
> the right version number or at least something close to it.

Our ~/bin/VERSION file says
v3.5.1-5-g412d9ab

It seems that someone previously updated git, but installed V3 over V2 or something. This is something I've inherited so I'm still trying to figure it out.
If I check out 412d9ab and install it to a different directory, then diff against the installed bin, I see extra files in the ~/bin dir (gl-mirror-push being one of them).

Konstantin Ryabitsev

Apr 22, 2014, 11:07:19 AM
to lazygar...@gmail.com, gito...@googlegroups.com
On 21/04/14 10:40 PM, lazygar...@gmail.com wrote:
> We are running v3.5.3.1-9-gfc5467c and we have a very large repo
> with 4 mirrors (two in different countries). We are finding our
> pushes are slow when the farthest repo is enabled. Turning off the
> repo is saving minutes off of everyone's push.

Alternatively, if your mirrors are read-only (in other words, pushes are
only done to the master, never to the mirrors), you can use grokmirror
for pull-mirroring instead of push-mirroring natively via gitolite.

This is how we replicate 1500 repositories to 4 worldwide locations,
including to Beijing and Singapore.

https://github.com/mricon/grokmirror

Best,
--
Konstantin Ryabitsev
Senior Systems Administrator
Linux Foundation Collab Projects
Montréal, Québec

Sitaram Chamarty

Apr 22, 2014, 11:44:24 AM
to Konstantin Ryabitsev, gito...@googlegroups.com
On 04/22/2014 08:37 PM, Konstantin Ryabitsev wrote:
> On 21/04/14 10:40 PM, lazygar...@gmail.com wrote:
>> We are running v3.5.3.1-9-gfc5467c and we have a very large repo
>> with 4 mirrors (two in different countries). We are finding our
>> pushes are slow when the farthest repo is enabled. Turning off the
>> repo is saving minutes off of everyone's push.
>
> Alternatively, if your mirrors are read-only (in other words, pushes are
> only done to the master, never to the mirrors), you can use grokmirror
> for pull-mirroring instead of push-mirroring natively via gitolite.
>
> This is how we replicate 1500 repositories to 4 worldwide locations,
> including to Beijing and Singapore.
>
> https://github.com/mricon/grokmirror

Interesting. The main difference from gitolite mirroring is that this
is pull based, which reduces the load on the master server. How often
do they poll?

How do you set up the alternates information? Do you save it when
"repositoryB" is actually *created*? Also, how do you ensure that
"repositoryA" doesn't get branches deleted etc (i.e., how are the
warnings in man git-clone taken care of).

One more thought: if using the git-daemon-export-ok method, don't use it
to *delete* repos from the manifest.

In any case, build in a safety of some kind -- if more than 10% of the
repos are suddenly marked "deleted", abort. Make that percentage
configurable (setting it to 100 disables the paranoia I guess).
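
The safety check suggested here can be sketched in a few lines of Python. This is only an illustration of the idea, not grokmirror's actual code: the function name `check_purge`, its parameters, and the `force` override are all hypothetical.

```python
def check_purge(known_repos, manifest_repos, max_purge_percent=10.0, force=False):
    """Return the set of repos that may be purged, or raise if too many.

    known_repos:       repos currently present on the mirror (hypothetical input)
    manifest_repos:    repos listed in the freshly fetched manifest
    max_purge_percent: abort when more than this share of known repos would go;
                       100 disables the check, since the share cannot exceed 100
    force:             explicit operator override of the check
    """
    known = set(known_repos)
    to_purge = known - set(manifest_repos)
    if not known or not to_purge:
        return to_purge

    percent = 100.0 * len(to_purge) / len(known)
    if percent > max_purge_percent and not force:
        raise RuntimeError(
            "refusing to purge %d of %d repos (%.1f%% > %.1f%%)"
            % (len(to_purge), len(known), percent, max_purge_percent)
        )
    return to_purge
```

The check compares against what the mirror already has, so it also catches a manifest that is perfectly valid but has unexpectedly shrunk.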

Sitaram Chamarty

Apr 22, 2014, 11:56:02 AM
to lazygar...@gmail.com, gito...@googlegroups.com
On 04/22/2014 08:11 PM, lazygar...@gmail.com wrote:

>> good luck, and next time you ask a question please make sure you have
>>
>> the right version number or at least something close to it.
>
> Our ~/bin/VERSION file says
> v3.5.1-5-g412d9ab
>
> It seems that someone previously updated git, but installed V3 over V2
> or something. This is something I've inherited so I'm still trying to
> figure it out. If I checkout 412d9ab and install to a different
> directory, then diff against the installed bin, I see extra files in
> the ~/bin dir (gl-mirror-push being one of them)

I can only sympathise, and repeat what I said earlier, but this time in
all caps: GOOD LUCK!!!

Jokes apart, that is seriously f-ed up. I cannot even imagine what that
must be doing or how it is working, because clearly someone left the old
hooks in!

Gitolite v2 and v3 mirroring *are* compatible (except for redirection);
my largest mirroring user is using it to mirror a new v3 server to the
old v2 setup. But that was more of an accident, and a sign that I did
not gratuitously change stuff when writing v3 :-)

Konstantin Ryabitsev

Apr 22, 2014, 12:10:12 PM
to Sitaram Chamarty, gito...@googlegroups.com
On 22/04/14 11:44 AM, Sitaram Chamarty wrote:
>> https://github.com/mricon/grokmirror
>
> Interesting. The main difference from gitolite mirroring is that this
> is pull based, which reduces the load on the master server. How often
> do they poll?

We poll every 15 seconds. The load on the server is minimal, as the
clients use HTTP's "if-newer-than" to check if the remote manifest is
newer than local manifest and will only download it when it's been
actually updated.
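
A cheap polling check of this kind can be sketched with Python's standard library. The sketch assumes that "if-newer-than" refers to the standard HTTP `If-Modified-Since` request header; the function names and the manifest URL passed in are hypothetical, and this is not grokmirror's own code.

```python
import email.utils
import urllib.error
import urllib.request


def build_manifest_request(url, local_mtime):
    """Build a GET request that asks for the manifest only if it changed.

    local_mtime is the modification time of the local manifest copy,
    in seconds since the epoch.
    """
    request = urllib.request.Request(url)
    # If-Modified-Since carries an HTTP-date, which is always given in GMT.
    request.add_header(
        "If-Modified-Since",
        email.utils.formatdate(local_mtime, usegmt=True),
    )
    return request


def fetch_if_newer(url, local_mtime, timeout=30):
    """Return the new manifest bytes, or None if the server answers 304."""
    request = build_manifest_request(url, local_mtime)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: the local copy is still current
            return None
        raise
```

When nothing has changed, the server replies with a bodiless 304, which is why frequent polling stays cheap.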

> How do you set up the alternates information? Do you save it when
> "repositoryB" is actually *created*? Also, how do you ensure that
> "repositoryA" doesn't get branches deleted etc (i.e., how are the
> warnings in man git-clone taken care of).

In a number of ways. First, we have a modified fork command that always
makes sure that we don't do alternates off a repo that itself sets
alternates -- we unravel alternates to the uppermost tree. Second, we
disallow force-pushes wherever possible (rebases are what most commonly
result in broken repos). Third, we don't auto-gc, just "repack -Adl"
routinely instead. This leaves around minor cruft that doesn't impact
normal repo operations but preserves stale objects in case they are
needed by other repos. Fourth, we run "git fsck" on a very routine basis
(grok-fsck). :)
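
One way to read "unravel alternates to the uppermost tree" is to follow each repository's `objects/info/alternates` file until reaching a repository that has none. The sketch below does that under stated assumptions: the function name is hypothetical, the repositories are bare (so the parent of `objects` is the repository itself), and only the first alternates entry is followed. It is not the modified fork command mentioned above.

```python
from pathlib import Path


def top_level_alternate(repo_git_dir):
    """Follow objects/info/alternates until reaching a repo that has none.

    Returns the git directory of the uppermost repository in the chain.
    """
    current = Path(repo_git_dir).resolve()
    seen = {current}
    while True:
        alternates_file = current / "objects" / "info" / "alternates"
        if not alternates_file.is_file():
            return current
        entries = [
            line.strip()
            for line in alternates_file.read_text().splitlines()
            if line.strip() and not line.startswith("#")
        ]
        if not entries:
            return current
        # Each entry names another object directory. Relative entries are
        # resolved against this repository's own objects directory, and an
        # absolute entry replaces the left-hand side when joined.
        objects_dir = (current / "objects" / entries[0]).resolve()
        parent = objects_dir.parent
        if parent in seen:
            raise RuntimeError("alternates loop detected at %s" % parent)
        seen.add(parent)
        current = parent
```

A fork command could call this on the repository being forked and point the new repository's alternates at the result, so that no repository borrows objects from one that itself borrows them.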

> One more thought: if using the git-daemon-export-ok method, don't use it
> to *delete* repos from the manifest.
>
> In any case, build in a safety of some kind -- if more than 10% of the
> repos are suddenly marked "deleted", abort. Make that percentage
> configurable (setting it to 100 disables the paranoia I guess).

This has proven to be more difficult than you'd think. We actually have
to delete large chunks of repos every now and again, so aborting at 10%
is not an option. For now, you have to actually tell your grok-pull
client to "--purge" before any repos are deleted off the mirror. Also, a
corrupted manifest would not parse as a valid json file, so we're safe
from manifest corruption that way.

For the final 0.4 we'll probably introduce the default behaviour of only
purging repositories when they are listed in the manifest as "deleted"
instead of just not there any more.

-K

Sitaram Chamarty

Apr 22, 2014, 1:17:43 PM
to Konstantin Ryabitsev, gito...@googlegroups.com
On 04/22/2014 09:40 PM, Konstantin Ryabitsev wrote:
> On 22/04/14 11:44 AM, Sitaram Chamarty wrote:
>>> https://github.com/mricon/grokmirror
>>
>> Interesting. The main difference from gitolite mirroring is that this
>> is pull based, which reduces the load on the master server. How often
>> do they poll?
>
> We poll every 15 seconds. The load on the server is minimal, as the
> clients use HTTP's "if-newer-than" to check if the remote manifest is
> newer than local manifest and will only download it when it's been
> actually updated.

Yeah, IMS (If-Modified-Since) is cheap; I know.

>> How do you set up the alternates information? Do you save it when
>> "repositoryB" is actually *created*? Also, how do you ensure that
>> "repositoryA" doesn't get branches deleted etc (i.e., how are the
>> warnings in man git-clone taken care of).
>
> In a number of ways. First, we have a modified fork command that always
> makes sure that we don't do alternates off a repo that itself sets
> alternates -- we unravel alternates to the uppermost tree. Second, we
> disallow force-pushes wherever possible (rebases are what most commonly
> result in broken repos). Third, we don't auto-gc, just "repack -Adl"
> routinely instead. This leaves around minor cruft that doesn't impact
> normal repo operations but preserves stale objects in case they are
> needed by other repos. Fourth, we run "git fsck" on a very routine basis
> (grok-fsck). :)

You're trading disk space for time (potentially, if all the repos
stopped using an object, it would eventually get gc'd, but since you
can't be sure...)

Nice...

>
>> One more thought: if using the git-daemon-export-ok method, don't use it
>> to *delete* repos from the manifest.
>>
>> In any case, build in a safety of some kind -- if more than 10% of the
>> repos are suddenly marked "deleted", abort. Make that percentage
>> configurable (setting it to 100 disables the paranoia I guess).
>
> This has proven to be more difficult than you'd think. We actually have
> to delete large chunks of repos every now and again, so aborting at 10%
> is not an option. For now, you have to actually tell your grok-pull
> client to "--purge" before any repos are deleted off the mirror. Also, a
> corrupted manifest would not parse as a valid json file, so we're safe
> from manifest corruption that way.
>
> For the final 0.4 we'll probably introduce the default behaviour of only
> purging repositories when they are listed in the manifest as "deleted"
> instead of just not there any more.

regardless, what I'm talking about is guarding against a potential
(future) bug in the code that determines a repo is gone -- whether it is
an explicit "deleted" or an implicit "remove repo from json". It would
still be valid JSON.

IIRC that's part of what happened to KDE a while ago; their equivalent of the
manifest file suddenly shrank and as it propagated, servers started
deleting repos!

Konstantin Ryabitsev

Apr 23, 2014, 4:37:49 PM
to Sitaram Chamarty, gito...@googlegroups.com
On 22/04/14 01:17 PM, Sitaram Chamarty wrote:
>> This has proven to be more difficult than you'd think. We actually have
>> to delete large chunks of repos every now and again, so aborting at 10%
>> is not an option. For now, you have to actually tell your grok-pull
>> client to "--purge" before any repos are deleted off the mirror. Also, a
>> corrupted manifest would not parse as a valid json file, so we're safe
>> from manifest corruption that way.
>>
>> For the final 0.4 we'll probably introduce the default behaviour of only
>> purging repositories when they are listed in the manifest as "deleted"
>> instead of just not there any more.
>
> regardless, what I'm talking about is guarding against a potential
> (future) bug in the code that determines a repo is gone -- whether it is
> an explicit "deleted" or an implicit "remove repo from json". It would
> still be valid JSON.

Upon your suggestion, I've added a configurable "purge protect" option.
Grok-pull will now exit with a critical error if more than 5% of
repositories are being deleted. It can still be forced using --force-purge.

Best,
Konstantin

Manuel Vacelet

Apr 23, 2014, 5:21:09 PM
to Konstantin Ryabitsev, lazygar...@gmail.com, gitolite
On Tue, Apr 22, 2014 at 5:07 PM, Konstantin Ryabitsev <konst...@linuxfoundation.org> wrote:
> On 21/04/14 10:40 PM, lazygar...@gmail.com wrote:
>> We are running v3.5.3.1-9-gfc5467c and we have a very large repo
>> with 4 mirrors (two in different countries). We are finding our
>> pushes are slow when the farthest repo is enabled. Turning off the
>> repo is saving minutes off of everyone's push.
>
> Alternatively, if your mirrors are read-only (in other words, pushes are
> only done to the master, never to the mirrors), you can use grokmirror
> for pull-mirroring instead of push-mirroring natively via gitolite.
>
> This is how we replicate 1500 repositories to 4 worldwide locations,
> including to Beijing and Singapore.
>
> https://github.com/mricon/grokmirror

Sounds great, thanks for sharing.

I see a contrib repository with a specific hook dedicated to gerrit.
I'm curious to know why you do not rely on built-in gerrit replication here?

This might not be the best place to ask the question, so just tell me if we need to find a better place to discuss.

Manuel

Konstantin Ryabitsev

Apr 24, 2014, 10:01:53 AM
to Manuel Vacelet, lazygar...@gmail.com, gitolite
On 23/04/14 05:21 PM, Manuel Vacelet wrote:
> Sounds great, thanks for sharing.
>
> I see a contrib repository with a specific hook dedicated to gerrit.
> I'm curious to know why you do not rely on built-in gerrit replication
> here?

Several reasons:

1. Push mirroring fails when a remote mirror is temporarily down, which
is why we prefer pull-mirroring.
2. I want to make it possible for others to mirror our gerrit
repositories without having to set up anything on my side. Grokmirror
makes this very easy.

Best,
Konstantin
