large number of tasks stuck in waiting in gerrit 2.8

Robin Bobbitt

May 9, 2016, 1:33:56 PM
to Repo and Gerrit Discussion
We have a custom GitReferenceUpdatedListener that adds tasks to the default work queue that post a payload to some external endpoints. This has been running without issue for a long time. Last week, one of our tasks got stuck in "running" indefinitely, and all of our other tasks backed up behind it. This morning, we killed most of the waiting tasks and then killed the running task, hoping that everything would become unstuck and get back to working. Now, all of our new tasks are queueing up (waiting) and nothing is running. Our plan is to restart the server. Is there any other approach you recommend, or anything we can do to debug the issue? We assume the queue is in memory only and will clear on restart. We are running gerrit 2.8.

Martin Fick

May 9, 2016, 1:49:02 PM
to repo-d...@googlegroups.com, Robin Bobbitt
On Monday, May 09, 2016 10:33:56 AM Robin Bobbitt wrote:
> We have a custom GitReferenceUpdatedListener that adds
> tasks to the default work queue that post a payload to
> some external endpoints. This has been running without
> issue for a long time. Last week, one of our tasks got
> stuck in "running" indefinitely, and all of our other
> tasks backed up behind it. This morning, we killed most
> of the waiting tasks and then killed the running task,
> hoping that everything would become unstuck and get back
> to working.

How did you kill them? Using OS tools such as kill, or in
the gerrit queue using gerrit ssh commands?

> Now, all of our new tasks are queueing up
> (waiting) and nothing is running. Our plan is to restart
> the server. Is there any other approach you recommend, or
> anything we can do to debug the issue? We assume the
> queue is in memory only and will clear on restart. We are
> running gerrit 2.8.

It will indeed clear the in-memory queue.

-Martin

--
The Qualcomm Innovation Center, Inc. is a member of Code
Aurora Forum, hosted by The Linux Foundation

Robin Bobbitt

May 9, 2016, 2:14:49 PM
to Martin Fick, repo-discuss
We used the gerrit kill command.

Martin Fick

May 9, 2016, 2:43:06 PM
to Robin Bobbitt, repo-discuss
Top posting makes it hard for others to join the
conversation... See my comments inline.

On Monday, May 09, 2016 02:14:46 PM Robin Bobbitt wrote:
> We used the gerrit kill command.
>
> On Mon, May 9, 2016 at 1:48 PM, Martin Fick <mf...@codeaurora.org> wrote:
> > On Monday, May 09, 2016 10:33:56 AM Robin Bobbitt wrote:
> > > We have a custom GitReferenceUpdatedListener that adds
> > > tasks to the default work queue that post a payload to
> > > some external endpoints. This has been running without
> > > issue for a long time. Last week, one of our tasks got
> > > stuck in "running" indefinitely, and all of our other
> > > tasks backed up behind it. This morning, we killed most
> > > of the waiting tasks and then killed the running task,
> > > hoping that everything would become unstuck and get back
> > > to working.
> >
> > How did you kill them? Using OS tools such as kill, or
> > in the gerrit queue using gerrit ssh commands?


Since you did not explicitly kill the process, did it
actually terminate? Could the "blockage" have been your
hook? Have you done any testing/simulation to see if new
tasks are blocking when you run them outside of Gerrit?

FYI, hook tasks need to terminate AND close STDOUT and
STDERR for them to unblock. If a descendant process of the
initial hook is still holding either STDOUT or STDERR from
the initial hook process open, Gerrit will not consider the
hook terminated, and it will wait for them to be released
before proceeding to the next hook.
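
The pipe behavior described above can be reproduced outside Gerrit. The following is a minimal sketch, assuming a POSIX `sh` is on the PATH; it is a generic illustration of a descendant process holding STDOUT open, not Gerrit's actual hook runner. The class and method names (`HeldPipeDemo`, `millisUntilEof`) are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class HeldPipeDemo {
    /**
     * Runs `sh -c script` and returns how many milliseconds pass before the
     * combined stdout/stderr pipe reaches end-of-file. EOF only arrives once
     * every process holding the write end of the pipe has closed it.
     */
    public static long millisUntilEof(String script) {
        try {
            Process p = new ProcessBuilder("sh", "-c", script)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            long start = System.nanoTime();
            try (InputStream out = p.getInputStream()) {
                byte[] buf = new byte[256];
                while (out.read(buf) != -1) {
                    // drain output until EOF
                }
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The background sleep inherits the pipe, so EOF waits for it to exit.
        System.out.println("inherited:  " + millisUntilEof("sleep 1 & echo started") + " ms");
        // Redirecting the child's output releases the pipe when the shell exits.
        System.out.println("redirected: "
                + millisUntilEof("sleep 1 >/dev/null 2>&1 & echo started") + " ms");
    }
}
```

In the first call the shell itself exits right after `echo`, yet the reader still waits for the background `sleep`. Redirecting the descendant's STDOUT and STDERR, as in the second call, is the usual way to let the parent be considered finished.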

Robin Bobbitt

May 9, 2016, 3:59:39 PM
to Repo and Gerrit Discussion, robin.y...@gmail.com
Thanks Martin. My response is inline.
We aren't using hooks. This is a listener running in a custom plugin, written in Java. I'm not sure how to know if the task actually terminated. I just know that it no longer appeared in show-queue once we killed it.
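
One way a task can disappear from a queue listing while its worker thread stays busy is plain `java.util.concurrent` behavior: cancelling a future marks it cancelled and interrupts the thread, but a task that ignores the interrupt keeps running. The sketch below shows this with a single-threaded executor standing in for the work queue. Whether the Gerrit 2.8 kill command and this particular plugin task behave this way is an assumption, not something confirmed in this thread, and the class name `CancelledButStillRunning` is hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class CancelledButStillRunning {
    /**
     * Returns true if a task queued behind a cancelled task that ignores
     * interrupts has still not started after waitMillis.
     */
    public static boolean queuedTaskIsStarved(long waitMillis) {
        // Single worker thread: a stand-in for a small work queue (assumption).
        ExecutorService queue = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            Runnable stuckTask = () -> {
                started.countDown();
                while (true) {
                    try {
                        release.await(); // simulates a blocking call that never returns
                        return;
                    } catch (InterruptedException e) {
                        // Swallowing the interrupt is what keeps the thread occupied.
                    }
                }
            };
            Future<?> stuck = queue.submit(stuckTask);
            started.await();

            boolean cancelled = stuck.cancel(true); // the future now reports "cancelled"

            CountDownLatch nextRan = new CountDownLatch(1);
            Runnable next = nextRan::countDown;
            queue.submit(next);
            boolean ran = nextRan.await(waitMillis, TimeUnit.MILLISECONDS);

            return cancelled && stuck.isCancelled() && !ran;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the stuck task finish so the demo can exit
            queue.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("queued task starved: " + queuedTaskIsStarved(500));
    }
}
```

If something like this is happening, the worker thread should still be visible in a JVM thread dump even though the task no longer appears in show-queue.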

Matthias Sohn

May 9, 2016, 4:59:00 PM
to Robin Bobbitt, Repo and Gerrit Discussion
Create a couple of thread dumps:

$ cd <gerrit site>/logs
$ jstack $(cat gerrit.pid) > threaddump.txt

If the task is still running, you should find a suspicious stack trace in the thread dump,
and by comparing multiple dumps you should get an idea of whether the task is still moving.

-Matthias
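
Comparing several dumps by eye gets tedious, so here is a rough sketch of automating the comparison for one thread. It assumes the usual `jstack` text layout, where each thread starts with a line beginning with its quoted name and ends at a blank line. The class name `ThreadDumpDiff` and the thread-name fragment passed in are hypothetical; use whatever name the suspicious thread has in your own dumps.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ThreadDumpDiff {
    /**
     * Returns the lines following the header of the first thread whose quoted
     * name contains nameFragment, or an empty list if no such thread exists.
     * The header itself is skipped because it may carry changing counters.
     */
    public static List<String> stackOf(String dump, String nameFragment) {
        List<String> stack = new ArrayList<>();
        boolean inThread = false;
        for (String line : dump.split("\\R")) {
            if (line.startsWith("\"")) {
                if (inThread) {
                    break; // the next thread's header ends the one we want
                }
                int end = line.indexOf('"', 1);
                inThread = end > 0 && line.substring(1, end).contains(nameFragment);
            } else if (inThread) {
                if (line.trim().isEmpty()) {
                    break; // a blank line ends the thread's block
                }
                stack.add(line.trim());
            }
        }
        return stack;
    }

    /** True if the thread appears in every dump with an identical stack. */
    public static boolean looksStuck(List<String> dumps, String nameFragment) {
        List<String> first = stackOf(dumps.get(0), nameFragment);
        if (first.isEmpty()) {
            return false;
        }
        for (String dump : dumps) {
            if (!stackOf(dump, nameFragment).equals(first)) {
                return false;
            }
        }
        return true;
    }

    // Usage: java ThreadDumpDiff <thread-name-fragment> dump1.txt dump2.txt ...
    public static void main(String[] args) throws IOException {
        List<String> dumps = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            dumps.add(new String(Files.readAllBytes(Paths.get(args[i])), StandardCharsets.UTF_8));
        }
        if (dumps.isEmpty()) {
            System.out.println("no dump files given");
            return;
        }
        System.out.println("identical stack in all dumps: " + looksStuck(dumps, args[0]));
    }
}
```

An identical stack across dumps taken several seconds apart suggests the thread is blocked at that frame; a changing stack suggests it is slow but still making progress.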

Robin Bobbitt

May 9, 2016, 10:08:29 PM
to Repo and Gerrit Discussion
Thanks! We'll give this a try next time we see the issue.