Seeing sudden stop in hook processing

172 views
Skip to first unread message

Dave Castagna (Motorola Mobility)

unread,
Jul 18, 2013, 11:59:55 AM7/18/13
to repo-d...@googlegroups.com
We're experiencing a strange issue here where the hook-processing just stops.

We're running 2.4.4 currently.

We see this in the error_log at about that time:

[2013-07-18 08:32:30,738] WARN  com.google.gerrit.server.git.MultiProgressMonitor : MultiProgressMonitor worker killed after 120491ms
[2013-07-18 08:32:30,739] WARN  com.google.gerrit.server.git.AsyncReceiveCommits : Error in ReceiveCommits
java.util.concurrent.ExecutionException: java.util.concurrent.CancellationException
        at com.google.gerrit.server.git.MultiProgressMonitor.waitFor(MultiProgressMonitor.java:219)
        at com.google.gerrit.server.git.AsyncReceiveCommits.onPreReceive(AsyncReceiveCommits.java:163)
        at org.eclipse.jgit.transport.ReceivePack.executeCommands(ReceivePack.java:1215)
        at org.eclipse.jgit.transport.ReceivePack.service(ReceivePack.java:854)
        at org.eclipse.jgit.transport.ReceivePack.receive(ReceivePack.java:768)
        at com.google.gerrit.sshd.commands.Receive.runImpl(Receive.java:95)
        at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:103)
        at com.google.gerrit.sshd.AbstractGitCommand.access$000(AbstractGitCommand.java:34)
        at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:69)
        at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:403)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:333)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.util.concurrent.CancellationException
        at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at com.google.gerrit.server.git.WorkQueue$Task.get(WorkQueue.java:311)
        at com.google.gerrit.server.git.MultiProgressMonitor.waitFor(MultiProgressMonitor.java:215)
        ... 18 more

And then a whole flood of messages like this:

[2013-07-18 08:32:30,897] WARN  com.google.gerrit.server.git.ReceiveCommits : ...reponame... 116156,4 is missing

And then every time ref-updated runs it seems to be running many multiples of ref-updated.  Just after the above events we see this pattern over and over again:

[2013-07-18 08:32:35,778] INFO  com.google.gerrit.common.ChangeHookRunner : hook[ref-updated] output: ...reponame_1...
   ... repeats over a dozen times ...
[2013-07-18 08:32:35,778] INFO  com.google.gerrit.common.ChangeHookRunner : hook[ref-updated] output: ...reponame_2...
   ... repeats dozens of times ...
[2013-07-18 08:32:35,778] INFO  com.google.gerrit.common.ChangeHookRunner : hook[ref-updated] output: ...reponame_3...
   ... repeats 2 times ...
[2013-07-18 08:32:35,778] INFO  com.google.gerrit.common.ChangeHookRunner : hook[ref-updated] output: ...reponame_4...
   ... repeats bunches of times ...

And then another MultiProgressMonitor exception....

And by now things are a mess and the mysqld process is pegged and hooks are not being processed anymore.

The only way to clean it up is to restart Gerrit.

Any ideas guys?

And a suggestion:  If hook processing fails to operate properly the default behavior of the system should probably not be to just bypass the whole of hook processing.  I would consider that a fatal crash that should be dealt with immediately.  But because this fails silently with no notifications or anything, users just keep going and just keep working.  The business rules implemented in the hooks never run and the data in our various systems just gets more and more out of sync.  The system should just stop working altogether to prevent data loss/corruption.  Or we should have the option to cause the GerritCodeReview process to halt/exit/whatever.



Martin Fick

unread,
Jul 19, 2013, 12:36:58 PM7/19/13
to repo-d...@googlegroups.com, Dave Castagna (Motorola Mobility)
That doesn't sound hook related. I think you need to bump
your receive.timeout,

-Martin


On Thursday, July 18, 2013 09:59:55 am Dave Castagna
--
The Qualcomm Innovation Center, Inc. is a member of Code
Aurora Forum, hosted by The Linux Foundation

Dave Borowitz

unread,
Jul 19, 2013, 12:41:19 PM7/19/13
to Martin Fick, repo-discuss, Dave Castagna (Motorola Mobility)
However, it is possible there is a bug in the way we cancel ReceiveCommits due to timeout that interacts poorly with hooks running in the background.




--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



Martin Fick

unread,
Jul 19, 2013, 12:45:23 PM7/19/13
to Dave Borowitz, repo-discuss, Dave Castagna (Motorola Mobility)
I could be, but I was just testing pushing many refs to
Gerrit at the same time the other day and I was encountering
this (no hooks). That is when I mentioned on IRC that jgit
does not scale when adding refs since it adds each one as a
loose ref, and the loose ref adding code looks up all loose
refs. And after adding a bunch (in the same push), it just
slows down and takes minutes to complete easily timing out,

-Martin
> > > nkn own Source)
> > >
> > > at
> > >
> > > java.util.concurrent.FutureTask$Sync.innerRun(Unknown
> > > Source) at
> > > java.util.concurrent.FutureTask.run(Unknown Source)
> > > at
> > > java.util.concurrent.ScheduledThreadPoolExecutor$Sche
> > > dule dFutureTask.access$301(Unknown Source)
> > >
> > > at
> > >
> > > java.util.concurrent.ScheduledThreadPoolExecutor$Sche
> > > dule dFutureTask.run(Unknown Source)
> > >
> > > at
> > >
> > > com.google.gerrit.server.git.WorkQueue$Task.run(WorkQ
> > > ueue .java:333) at
> > > java.util.concurrent.ThreadPoolExecutor$Worker.runTas
> > > k(U nknown Source)
> > >
> > > at
> > >
> > > java.util.concurrent.ThreadPoolExecutor$Worker.run(Un
> > > kno wn Source)
> > >
> > > at java.lang.Thread.run(Unknown Source)
> > >
> > > Caused by: java.util.concurrent.CancellationException
> > >
> > > at
> > >
> > > java.util.concurrent.FutureTask$Sync.innerGet(Unknown
> > > Source) at
> > > java.util.concurrent.FutureTask.get(Unknown Source)
> > > at
> > > com.google.gerrit.server.git.WorkQueue$Task.get(WorkQ
> > > ueue .java:311) at
> > > com.google.gerrit.server.git.MultiProgressMonitor.wai
> > > tFor (MultiProgressMonitor.java:215) ... 18 more
Reply all
Reply to author
Forward
0 new messages