performance opportunity on git push --tags

Nipunn Koorapati

Dec 10, 2020, 10:54:58 PM
to gito...@googlegroups.com
Hi,

I noticed a fairly severe performance issue when running `git push --tags` in a large repo with 15K+ tags (we're merging repos). git itself handles this reasonably well, but with gitolite, each tag triggers a separate call to the update hook, and each call performs a rules check. Our rules file is also relatively large (a few thousand lines), so this adds up, even though my user's evaluation (verified using the `gitolite access` command) was only one check (I'm in @group-gitolite-admins):
  A        gitolite.conf:32         RW+CDM = @group-gitolite-admins @user-commit_queue @user-git

We were finding that each tag took around a half second, making the full push an estimated few hours.

One idea might be to do a single access check during the gitolite-shell phase-1 checks and, if the user has access to create all refs (e.g. no deny rules or vrefs apply), set GL_BYPASS_ACCESS_CHECKS=1 before passing the `git push` command through. The phase-2 checks would then fast-exit.

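Implementation-wise, one strategy might be to:

Add a new special ref type here (in addition to "any"; perhaps "all"?):
https://github.com/sitaramc/gitolite/blob/master/src/commands/access#L22

Then, here, check whether the user has access to create/modify/delete all refs. If so, set GL_BYPASS_ACCESS_CHECKS before passing the command through:
https://github.com/sitaramc/gitolite/blob/master/src/gitolite-shell#L131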

This optimization would only work for the user w/ full super powers, but it would make a huge difference for such a user. I was able to simulate this sort of thing as a proof-of-concept by (hax) manually editing the update hook.

W/ access checks (normally) - was seeing 1-2 tags / second pushed
W/ a small wrapper around the update hook - to set GL_BYPASS_ACCESS_CHECKS - was seeing 10-15 pushed tags/second
W/ a replacement update hook - that simply echoed the args + timestamp, bypassing gitolite's perl script entirely - was seeing 90-100 tags / second
[I'd imagine removing the hook entirely would be even faster - but I am not sure how to measure this, since I was using loglines+timestamps to measure]
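
To put the three measurements in perspective, here is a rough sketch that converts each measured rate into an estimated total push time. The tag count of 15,000 is an assumption (the post says "15K+"), and `push_minutes` is a helper name made up for this illustration.

```perl
use strict;
use warnings;

# Estimated wall-clock time, in minutes, to push $tags tags at $rate tags/second.
sub push_minutes {
    my ($tags, $rate) = @_;
    return $tags / $rate / 60;
}

# Measured ranges from the list above, in tags per second: [label, low, high].
my @scenarios = (
    [ 'normal access checks',        1,  2   ],
    [ 'GL_BYPASS_ACCESS_CHECKS set', 10, 15  ],
    [ 'trivial replacement hook',    90, 100 ],
);

my $tags = 15_000;    # assumption: "15K+ tags" rounded down

for my $s (@scenarios) {
    my ($label, $low, $high) = @$s;
    # The higher rate gives the shorter time, so print it first.
    printf "%-30s %6.1f to %6.1f minutes\n",
        $label, push_minutes($tags, $high), push_minutes($tags, $low);
}
```

At half a second per tag (2 tags/second) this works out to 125 minutes, which matches the "few hours" estimate above.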

There appears to be some big room for improvement here!

Thanks
--Nipunn


Sitaram Chamarty

Dec 12, 2020, 11:34:43 AM
to Nipunn Koorapati, gito...@googlegroups.com
On Thu, Dec 10, 2020 at 06:01:22PM +0000, 'Nipunn Koorapati' via gitolite wrote:
> Hi,
>
> I was noticing a fairly severe performance issue when issuing a `git push --tags` in a large repo with 15K+ tags (we're merging repos). git itself handles this reasonably well, but with gitolite, each tag ends up calling into the update hook - resulting in a rules check. Our rules file is also relatively large (few thousand lines) - so this adds up - even though my user's evaluation (verified using gitolite access command) - was only one check (I'm in the group-gitolite-admins):
>
> A        gitolite.conf:32         RW+CDM = @group-gitolite-admins @user-commit_queue @user-git
>
> We were finding that each tag took around a half second, making the full push an estimated few hours.
>
> One idea - may be to do a single access check during gitolite-shell phase-1 checks - and if the user has access to create all refs (eg no deny rules / vrefs apply) - then set GL_BYPASS_ACCESS_CHECKS=1 before passing the `git push` command through. Then the phase-2 checks would fast-exit.

There is no published mechanism to check if the user has "access
to all refs". The published mechanisms only check for any given
ref. Even if I were to use internals to determine if a
`refs/.*` rule existed, that would still require making sure
there were no deny rules *before* it, and so on.
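
The ordering problem can be illustrated with a small sketch. This is a deliberately simplified, hypothetical model of ordered rule matching, not gitolite's actual code: the `allowed` function, the rule layout, and the sample refexes are all assumptions made for the example.

```perl
use strict;
use warnings;

# Simplified model: each rule is [ permission, refex ], already filtered down
# to the rules that apply to one user. Rules are walked in order.
sub allowed {
    my ($rules, $ref, $want) = @_;
    for my $rule (@$rules) {
        my ($perm, $refex) = @$rule;
        next unless $ref =~ /^$refex/;          # rule does not cover this ref
        return 0 if $perm eq '-';               # a deny rule was reached first
        return 1 if index($perm, $want) >= 0;   # a grant rule was reached first
    }
    return 0;                                   # nothing matched: deny
}

my @rules = (
    [ '-',   'refs/tags/v[0-9]' ],   # deny rule listed before the catch-all
    [ 'RW+', 'refs/.*'          ],   # catch-all grant
);

for my $ref ('refs/heads/master', 'refs/tags/v1.0') {
    printf "%-20s %s\n", $ref, allowed(\@rules, $ref, 'W') ? 'allowed' : 'denied';
}
```

Even though a `refs/.*` grant exists, the second ref is denied because an earlier deny rule matches it first, so spotting a catch-all rule is not enough to conclude "access to all refs".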

> Implementation wise, one strategy might to
>
> Add a new special ref type here (in addition to any. Perhaps "all"?)
>
> https://github.com/sitaramc/gitolite/blob/master/src/commands/access#L22
>
> Then here - check if the user has access to create/modify/delete all refs. If so, set GL_BYPASS_ACCESS_CHECKS before passing command through
>
> https://github.com/sitaramc/gitolite/blob/master/src/gitolite-shell#L131

For any given repo, this should only happen the first time those
tags are pushed, not for normal use once that initial push is
done. I won't be making changes to "core" gitolite for that
kind of odd-ball case.

In any case, here's something that works, without having to make
such changes. Please test it out.

1. Open up your `~/.gitolite.rc` and look for `LOCAL_CODE`.
Pick one of the locations commented out, and uncomment it.

2. Let's say you picked the `$ENV{HOME}/local` for your local
code. Create a file called
"lib/Gitolite/Triggers/Bypass.pm" (creating the intermediate
directories of course), with the following content:

package Gitolite::Triggers::Bypass;

use Gitolite::Rc;
use Gitolite::Common;
use Gitolite::Easy;

use strict;
use warnings;

# ----------------------------------------------------------------------

sub pre_git {
    # Bypass the per-ref checks only when the repo has opted in (option
    # ENV.LOADING) and the pushing user is in the @superuser group.
    $ENV{GL_BYPASS_ACCESS_CHECKS} = 1 if $ENV{GL_OPTION_LOADING} and in_group('superuser');
}

1;

3. Again, in your ~/.gitolite.rc file, just *before* the
"ENABLE" line, insert this:

PRE_GIT => ['Bypass::pre_git'],

Don't miss the trailing comma!

4. In your gitolite.conf file, for people you are **sure**
should have this power, add the following line:

@superuser = alice bob carol

and for the repo itself, add

option ENV.LOADING = 1

somewhere in the repo's rule lines.

That should do it.

And needs no code changes to gitolite "core".
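
As a rough illustration of why both settings in step 4 are needed, the sketch below restates the trigger's condition as a stand-alone function. It does not use gitolite at all: `should_bypass`, the `$env` hash, and the hard-coded group lists (including `@staff`) are assumptions for the example. In a real setup gitolite resolves group membership from gitolite.conf.

```perl
use strict;
use warnings;

# Illustration only. $env stands in for %ENV; $groups is a hypothetical list
# of the groups the pushing user belongs to.
sub should_bypass {
    my ($env, $groups) = @_;
    return 0 unless $env->{GL_OPTION_LOADING};    # repo has not opted in
    my $hits = grep { $_ eq '@superuser' } @$groups;
    return $hits ? 1 : 0;
}

my @cases = (
    [ { GL_OPTION_LOADING => 1 }, ['@superuser'] ],   # opted-in repo, trusted user
    [ { GL_OPTION_LOADING => 1 }, ['@staff']     ],   # opted-in repo, ordinary user
    [ {},                         ['@superuser'] ],   # trusted user, repo not opted in
);

for my $c (@cases) {
    my ($env, $groups) = @$c;
    printf "LOADING=%s groups=%s bypass=%d\n",
        $env->{GL_OPTION_LOADING} // 0, join(',', @$groups),
        should_bypass($env, $groups);
}
```

Only the first case sets the bypass, so removing the `option ENV.LOADING = 1` line once the bulk push is done restores normal checks for everyone.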

I have smoke tested it to make sure the mechanism works, but I
didn't do a timing test with thousands of tags. Please test.

sitaram