RFC: pre-receive hook

398 views
Skip to first unread message

Pat Notz

unread,
Jul 27, 2010, 6:18:58 PM7/27/10
to Repo and Gerrit Discussion
In looking at the hooks that Gerrit supports, it seems that the one
that's really missing (compared to vanilla Git) is pre-receive. We
use this hook extensively to prevent a number of changes from ever
getting pushed. Notably, we prevent empty merge commits,
author/committer emails that aren't in a standard form, suspiciously
large commits, etc. It seems that this kind of hook would desirable
(for some teams). I think the other server side hooks that Git has
can be mapped to existing Gerrit hooks.

Does a pre-receive hook sound like a reasonable addition for Gerrit?

~ Pat

PS I promise to shut up for a while :-

Nasser Grainawi

unread,
Jul 27, 2010, 6:20:30 PM7/27/10
to Pat Notz, Repo and Gerrit Discussion

Yes please. I'll leave it to Shawn to describe how it should be done. :)

>
> ~ Pat
>
> PS I promise to shut up for a while :-
>


--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum

Shawn Pearce

unread,
Jul 27, 2010, 7:50:50 PM7/27/10
to Nasser Grainawi, Pat Notz, Repo and Gerrit Discussion
On Tue, Jul 27, 2010 at 15:20, Nasser Grainawi <nas...@codeaurora.org> wrote:
> On 07/27/2010 04:18 PM, Pat Notz wrote:
>>
>> Does a pre-receive hook sound like a reasonable addition for Gerrit?
>
> Yes please. I'll leave it to Shawn to describe how it should be done. :)

The ugly part of any hook is you need to fork the JVM. Forking a
Gerrit JVM is painful, its easily 4-8 GiB of dirty pages. :-(

But, that said... lets assume you want this anyway. The Right
Thing(TM) is to create an implementation of the PreReceiveHook
interface inside of JGit which speaks the standard git pre-receive
hook protocol. Donate that to the JGit project. Once its accepted
there, modify Gerrit Code Review's ReceiveCommits class to run that
implementation as part of its own onPreReceive method, before it does
anything else.

Since the pre-receive protocol redirects stdout and stderr into the
same pipe you can use ProcessBuilder and do redirectErrorStream(true)
to combine them together into the output stream, which avoids the need
for spawning an additional thread to read stderr. Although to be safe
we may need to spawn a thread anyway because we can't trust the hook
to actually fully consume the input before it starts output. :-)

We probably want the hook on two levels, one in
$site_path/hooks/pre-receive as a global hook that fires for every
repository, and maybe again inside of the actual project repository
just in case the site also wants a per-project check. Given that we
already do whole-site hooks we should do that for pre-receive. I'm on
the fence about whether or not we should also honor the per-project
hook inside of the project's own repository.

Rob Heittman

unread,
Jul 27, 2010, 8:44:30 PM7/27/10
to Shawn Pearce, Nasser Grainawi, Pat Notz, Repo and Gerrit Discussion
Never considered the fork cost, but now you mention it, ugh.  Is it worthwhile to use, if available, a dedicated fork daemon under TCP/IP remote control?  We do that on one of our apps that runs with a heavy memory load but needs to shell out a lot; it also helps insulate the main app's JVM from bad things happening in the forked code, and in our case, we even run the fork daemon under really nasty restrictions that could not be imposed on the main app.  I think Gerrit's uses would be pretty susceptible to this.

Shawn Pearce

unread,
Jul 27, 2010, 9:00:05 PM7/27/10
to Rob Heittman, Nasser Grainawi, Pat Notz, Repo and Gerrit Discussion
On Tue, Jul 27, 2010 at 17:44, Rob Heittman <rob.he...@solertium.com> wrote:
> Never considered the fork cost, but now you mention it, ugh.  Is it
> worthwhile to use, if available, a dedicated fork daemon under TCP/IP remote
> control?  We do that on one of our apps that runs with a heavy memory load
> but needs to shell out a lot; it also helps insulate the main app's JVM from
> bad things happening in the forked code, and in our case, we even run the
> fork daemon under really nasty restrictions that could not be imposed on the
> main app.  I think Gerrit's uses would be pretty susceptible to this.

Yea, it might be worth doing. Be nice if a lot of that was
contributed back to JGit too, we'd love to be able to run proper Git
hooks out of EGit for example without forking the nearly 2 GiB
workbench most users have. :-)

A fork daemon is a lot harder for the admin to setup. We could fork
once during JVM startup (where memory is a lot lower) to setup the
fork daemon, and then communicate to it over the loopback for all hook
usage. Or the admin could run it dedicated themselves, in which case
they need to give us a configuration variable with the port number its
listening on, and probably a magic auth cookie that we can use to
prove to the fork daemon we can tell it what to execute.

IIRC on Windows a fork daemon isn't required because there's no real
concept of fork on that platform. But on POSIX systems it can be very
useful if the JVM doesn't know how to use posix_spawn().

Rob Heittman

unread,
Jul 28, 2010, 9:14:24 AM7/28/10
to Shawn Pearce, Nasser Grainawi, Pat Notz, Repo and Gerrit Discussion
In retrospect, I think efficient hooks is what I want, not more flexible replication (http://groups.google.com/group/repo-discuss/browse_thread/thread/869c483c707a3672) ... I'm having good success with a variation on Dustin Salling's nice change-merged hook (http://gist.github.com/440904) but, I didn't consider the fork cost.  I watched it this morning while some submits happened, and had a little cry over it.

Let me see if I can't get our other project's fork daemon libre-ated.  It's pretty robust, and nine out of ten administrators can probably just say

    java -Xmx8M -jar fork-daemon.jar

to run it, with no other meaningful configuration.  An extra line in bin/gerrit.sh would be all that's needed for anyone using that script.  On the caller side, we have classes that look suspiciously like ProcessBuilder and Process; if the mechanism can find a fork daemon running that it's allowed to talk to, it remotes the command, otherwise it just actually uses ProcessBuilder and Process locally.  There's some real (PKI) security exercised between the client and daemon, but it only requires explicit configuration of keys and such if you're a true masochist.

- Rob

Shawn Pearce

unread,
Jul 28, 2010, 1:36:06 PM7/28/10
to Rob Heittman, Nasser Grainawi, Pat Notz, Repo and Gerrit Discussion
On Wed, Jul 28, 2010 at 06:14, Rob Heittman <rob.he...@solertium.com> wrote:
> In retrospect, I think efficient hooks is what I want, not more flexible
> replication
> (http://groups.google.com/group/repo-discuss/browse_thread/thread/869c483c707a3672)
> ... I'm having good success with a variation on Dustin Salling's nice
> change-merged hook (http://gist.github.com/440904) but, I didn't consider
> the fork cost.  I watched it this morning while some submits happened, and
> had a little cry over it.

:-(

> Let me see if I can't get our other project's fork daemon libre-ated.  It's
> pretty robust, and nine out of ten administrators can probably just say
>     java -Xmx8M -jar fork-daemon.jar
> to run it, with no other meaningful configuration.  An extra line in
> bin/gerrit.sh would be all that's needed for anyone using that script.  On
> the caller side, we have classes that look suspiciously like ProcessBuilder
> and Process; if the mechanism can find a fork daemon running that it's
> allowed to talk to, it remotes the command, otherwise it just actually uses
> ProcessBuilder and Process locally.

Sounds good.

>There's some real (PKI) security
> exercised between the client and daemon, but it only requires explicit
> configuration of keys and such if you're a true masochist.

Well, ideally the fork daemon and the gerrit daemon authenticate each
other so that the fork daemon doesn't just run anything its told to.
:-) We can afford to setup keys during our `init` program, its not
that big of a deal.

Reply all
Reply to author
Forward
0 new messages