So, I came into work today to an interesting problem. I think I know the cause, but I wanted to get some other opinions, too. In this specific case, a Gerrit gc was started within seconds of the user pushing the object. From the database, I see the patch was created at 2015-03-24 21:02:32.427-05. In the GC log, I find the repack started at 2015-03-24 21:02:31,767. The user received an unpacker error ("Missing unknown" for the SHA-1 he was attempting to upload), and the database entries otherwise look normal.
This sounds like a race condition. As I recall, the gc runner script, written before my time, (perhaps foolishly) removes the reflog, which could cause such a race. (I can't find the source, but I believe I've heard this warning before in descriptions of git reflog expire --expire=now.) This is also the first time we've seen such a, erm... lucky user.
Some of the detailed messages:
"C:\Program Files (x86)\Git\bin\git.exe" push --recurse-submodules=check --progress "origin" master:refs/for/<branch>
Counting objects: 7, done.
Delta compression using up to 12 threads.
Total 4 (delta 3), reused 0 (delta 0)
fatal: Unpack error, check server log
error: unpack failed: error Missing unknown 0150f526448c6ed6113b17fcbf6cc47ab3f90125
To <server>:<project>
! [remote rejected] master -> refs/for/<branch> (n/a (unpacker error))
error: failed to push some refs to '<server>:<project>'
Done
And from gc_log on the server:
[2015-03-24 21:02:31,767] INFO : [<project>] gc config: gc.autopacklimit=4; gc.packrefs=true; gc.reflogexpire=never; gc.reflogexpireunreachable=never;
[2015-03-24 21:02:31,768] INFO : [<project>] pack config: maxDeltaDepth=50, deltaSearchWindowSize=10, deltaSearchMemoryLimit=0, deltaCacheSize=52428800, deltaCacheLimit=100, compressionLevel=9, indexVersion=2, bigFileThreshold=52428800, threads=8, reuseDeltas=true, reuseObjects=true, deltaCompress=true, buildBitmaps=true
[2015-03-24 21:02:32,065] INFO : [<project>] before: sizeOfPackedObjects=357748052, sizeOfLooseObjects=444265, numberOfPackedObjects=233644, numberOfPackFiles=53, numberOfPackedRefs=30706, numberOfLooseRefs=87, numberOfLooseObjects=141
[2015-03-24 21:03:02,217] INFO : [<project>] after: sizeOfPackedObjects=356300241, sizeOfLooseObjects=398, numberOfPackedObjects=233466, numberOfPackFiles=2, numberOfPackedRefs=30792, numberOfLooseRefs=2, numberOfLooseObjects=1
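To make the overlap concrete, the timestamps quoted above can be compared directly. This is only a rough sketch: it assumes the database and the gc log use the same local time zone (the "-05" offset is dropped) and that the two clocks agree, and it treats the "after:" log line as the end of the gc run.

```python
from datetime import datetime

# Timestamps copied from the post. The "-05" offset on the database value is
# dropped on the assumption that the gc log is written in the same local zone.
FMT_DB = "%Y-%m-%d %H:%M:%S.%f"
FMT_LOG = "%Y-%m-%d %H:%M:%S,%f"

change_created = datetime.strptime("2015-03-24 21:02:32.427", FMT_DB)
gc_started = datetime.strptime("2015-03-24 21:02:31,767", FMT_LOG)
# Time of the "after:" statistics line, used here as the end of the gc run.
gc_finished = datetime.strptime("2015-03-24 21:03:02,217", FMT_LOG)

lead = (change_created - gc_started).total_seconds()
gc_duration = (gc_finished - gc_started).total_seconds()
push_during_gc = gc_started <= change_created <= gc_finished

print(f"gc started {lead:.3f}s before the change was created")
print(f"gc ran for {gc_duration:.3f}s")
print(f"change created while gc was running: {push_during_gc}")
```

Under those assumptions the change record was created well under a second after the gc began and roughly half a minute before it finished, which is consistent with the push landing inside the repack window.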
My gut is telling me to use sane reflog expire values (maybe 1 day for unreachable, 7 days for reachable), drop the hack that removes the reflog entirely, and hope this issue doesn't occur again. If it does, at least we'll know the cause isn't the reflog being removed seconds before a gc!
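For reference, the reflog expiry values floated above might look something like this in the repository's config file. Treat it as a sketch rather than a tested setup: gc.reflogExpire and gc.reflogExpireUnreachable are the standard git keys (the same ones the gc log reports as "never"), but the exact values and whether Gerrit's own gc honors them the way command-line git does are assumptions worth verifying.

```ini
# Assumed values, taken from the suggestion above; adjust to taste.
[gc]
	# Reflog entries still reachable from the current tip
	reflogExpire = 7 days
	# Reflog entries no longer reachable from the current tip
	reflogExpireUnreachable = 1 day
```

With values like these in place, the separate step that expires the whole reflog right before gc should no longer be needed.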