SSHD idle timeout kills an active Gerrit GC cycle :-O


lucamilanesio

May 11, 2015, 12:10:30 PM
to repo-d...@googlegroups.com
Hi all,
I have run into some very strange behaviour when triggering a Gerrit GC over SSH.

The net result is that a GC cycle on a very large repo (taking over 30 minutes) can never complete: the SSHD channel times out, the associated threads are cancelled, and the PackWriter threads are terminated with them.

I trigger the Gerrit GC with:
ssh -p 29418 us...@gerrit.myhost.com gerrit gc my-very-large-repo

And this is what I see in the gc_log:
org.eclipse.jgit.api.errors.JGitInternalException: Garbage collection failed.
        at org.eclipse.jgit.api.GarbageCollectCommand.call(GarbageCollectCommand.java:126)
        at com.google.gerrit.server.git.GarbageCollection.run(GarbageCollection.java:83)
        at com.google.gerrit.sshd.commands.GarbageCollectionCommand.runGC(GarbageCollectionCommand.java:103)
        at com.google.gerrit.sshd.commands.GarbageCollectionCommand.access$500(GarbageCollectionCommand.java:44)
        at com.google.gerrit.sshd.commands.GarbageCollectionCommand$1.run(GarbageCollectionCommand.java:73)
        at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:442)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:364)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Packing cancelled during objects writing
        at org.eclipse.jgit.internal.storage.pack.PackWriter.runTasks(PackWriter.java:1458)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.parallelDeltaSearch(PackWriter.java:1381)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.searchForDeltas(PackWriter.java:1333)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.searchForDeltas(PackWriter.java:1291)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writePack(PackWriter.java:1018)
        at org.eclipse.jgit.internal.storage.file.GC.writePack(GC.java:721)
        at org.eclipse.jgit.internal.storage.file.GC.repack(GC.java:547)
        at org.eclipse.jgit.internal.storage.file.GC.gc(GC.java:166)
        at org.eclipse.jgit.api.GarbageCollectCommand.call(GarbageCollectCommand.java:123)
        ... 13 more


I cannot really increase the SSHD idle timeout any further (that would affect the idle sessions of Git/SSH users), but at the same time I want to make sure that the GC completes.
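
For reference, a sketch of where this timeout would typically be set. It assumes the timeout in question is the sshd.idleTimeout option in the site's etc/gerrit.config (the post does not name the option), and it uses the 30 minute value mentioned later in this thread:

```
# etc/gerrit.config (assumed location of the SSHD settings)
[sshd]
        # idle SSH connections are closed after this interval
        idleTimeout = 30 min
```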

A workaround for the problem is to:
- Disable multi-threaded delta search (threads = 1 in the project's Git config)
- Generate some output on the SSH channel that is controlling the Gerrit GC (possibly with --show-progress)

... but this is a really dirty workaround :-( 
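
A minimal sketch of the two workaround steps above. It assumes the threads setting lives in the [pack] section of the bare repository's config file on the Gerrit server (the exact path depends on the site layout). USER is a placeholder for the real account name.

```
# config file of my-very-large-repo.git on the Gerrit server
[pack]
        # a single thread disables the parallel delta search
        threads = 1
```

```
ssh -p 29418 USER@gerrit.myhost.com gerrit gc --show-progress my-very-large-repo
```

With --show-progress the command writes progress information to the SSH channel while the GC runs, which is what keeps the channel from looking idle.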

Would it make sense to introduce a background mode for the gerrit gc command? GC is meant to be run from a crontab and writes its output to the gc_log, so keeping it in the foreground on the SSHD channel does not make much sense to me :-)

Feedback is more than welcome, as usual.

Luca.


Michał Sochoń

May 11, 2015, 3:16:24 PM
to repo-d...@googlegroups.com
What about the ServerAliveInterval and TCPKeepAlive settings in your ~/.ssh/config?
AFAIR ServerAliveInterval is set to 0 by default.

More in ssh_config(5).
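
A sketch of the client-side settings suggested here, assuming an OpenSSH client. The 60 second interval is an arbitrary illustrative value, and whether these keep-alive messages reset the server's idle timer is not established in this thread.

```
# ~/.ssh/config
Host gerrit.myhost.com
    Port 29418
    # send a keep-alive request after 60 seconds without data from the server
    ServerAliveInterval 60
    TCPKeepAlive yes
```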

Luca Milanesio

May 12, 2015, 1:50:02 AM
to Michał Sochoń, repo-d...@googlegroups.com
Hi Michal,
Thank you for answering, but the problem is not on the SSH client side: the connection is closed by the Apache Mina SSHD daemon. Closing the connection after 30 minutes of idle time is actually correct (I should even reduce it to just 5 minutes). The problem is that the "gerrit gc" command runs in the foreground, whereas it should just trigger the GC in the background and then exit.

Luca


Matthias Sohn

May 27, 2015, 8:50:29 AM
to lucamilanesio, Repo and Gerrit Discussion
I think you are right: it should be possible to start a potentially long-running gc over ssh and let it execute in the background. Another command would be needed to check the status of such background gc jobs.

-Matthias

Luca Milanesio

May 27, 2015, 6:33:33 PM
to Matthias Sohn, Repo and Gerrit Discussion
Agreed, the GC can be very long indeed and a background operation + check is needed.
In the shopping list for 2.12 ?

Luca.

Matthias Sohn

May 27, 2015, 7:06:45 PM
to Luca Milanesio, Repo and Gerrit Discussion
On Thu, May 28, 2015 at 12:33 AM, Luca Milanesio <luca.mi...@gmail.com> wrote:
> Agreed, the GC can be very long indeed and a background operation + check is needed.
> In the shopping list for 2.12 ?

Looks like "gerrit show-queue" [1] could be a way to check the status of a gc running in the background.
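
A sketch of such a status check, assuming the calling account is allowed to view the queue. USER is a placeholder for the real account name, and --wide only prevents long task names from being truncated.

```
ssh -p 29418 USER@gerrit.myhost.com gerrit show-queue --wide
```

The stack trace at the top of this thread shows the GC running as a WorkQueue task, so a running GC would be expected to appear in this listing. How a future background mode would report its tasks is not settled here.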


-Matthias