Hi,
OpenStack just upgraded Gerrit to version 2.10 (patch level equivalent to
2.10.3), with jgit 3.7.2, and we see the following error shortly after
the system comes under significant load:
[2015-05-11 16:30:13,687] ERROR org.eclipse.jgit.internal.storage.file.ObjectDirectory : Pack file /home/gerrit2/review_site/git/openstack/nova.git/objects/pack/pack-93ad57004de887eb835b2bd4df2d7c3f6a5c394b.pack is corrupt, removing it from pack list
org.eclipse.jgit.errors.CorruptObjectException: Object at 87,706,216 in /home/gerrit2/review_site/git/openstack/nova.git/objects/pack/pack-93ad57004de887eb835b2bd4df2d7c3f6a5c394b.pack has bad zlib stream
at org.eclipse.jgit.internal.storage.file.PackFile.load(PackFile.java:840)
at org.eclipse.jgit.internal.storage.file.PackFile.get(PackFile.java:259)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject(ObjectDirectory.java:417)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate(ObjectDirectory.java:386)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:378)
at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:145)
at org.eclipse.jgit.diff.ContentSource$ObjectReaderSource.open(ContentSource.java:140)
at org.eclipse.jgit.diff.ContentSource$Pair.open(ContentSource.java:276)
at org.eclipse.jgit.diff.DiffFormatter.open(DiffFormatter.java:1033)
at org.eclipse.jgit.diff.DiffFormatter.createFormatResult(DiffFormatter.java:963)
at org.eclipse.jgit.diff.DiffFormatter.toFileHeader(DiffFormatter.java:928)
at com.google.gerrit.server.patch.PatchListLoader$2.call(PatchListLoader.java:203)
at com.google.gerrit.server.patch.PatchListLoader$2.call(PatchListLoader.java:200)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.zip.DataFormatException
at org.eclipse.jgit.internal.storage.file.WindowCursor.inflate(WindowCursor.java:323)
at org.eclipse.jgit.internal.storage.file.PackFile.decompress(PackFile.java:340)
at org.eclipse.jgit.internal.storage.file.PackFile.load(PackFile.java:813)
... 16 more
Since that pack file holds the bulk of the objects in the repository,
many operations on that repository fail from then on.
A restart of Gerrit clears the error for a while, but then it returns.
This affects more than one repository (we have seen at least 11
affected so far). In each affected repository, it is the largest pack
file that is reported as corrupt. This may simply be a matter of odds,
since that pack file holds the bulk of the objects for the repository.
We have seen it affect pack files ranging in size from 1 MB to 300 MB.
We have not seen the same offset reported twice; each time the error
returns, it is at a different position in the file.
git fsck reports no errors other than some dangling commits, and git
verify-pack reports no errors on the pack file. We used git show-index
to find the blob at the reported offset, and git show on that blob
works fine. We also installed jgit-cli, and it shows the same blob
without error.
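One further cross-check is to decode the entry at the reported offset with nothing but java.util.zip, outside the Gerrit process, and confirm that its zlib stream inflates to exactly the size declared in the entry header. The following is a minimal, hypothetical sketch (the class, method and field names are made up); it assumes the standard pack entry layout (3-bit type plus variable-length size, optional delta base reference, then a zlib stream) and reads the whole pack into memory, so it only handles packs under 2 GB.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/**
 * Hypothetical diagnostic (not part of jgit or Gerrit): parse the pack entry
 * header at an offset and inflate its zlib stream with java.util.zip,
 * checking that it yields exactly the size the header declares.
 */
public class PackEntryCheck {

    public static final class Result {
        public final int type;      // 1=commit 2=tree 3=blob 4=tag 6=OFS_DELTA 7=REF_DELTA
        public final long size;     // inflated size declared in the entry header
        public final long inflated; // bytes actually produced by the zlib stream
        public final boolean ok;    // stream finished and inflated == size

        Result(int type, long size, long inflated, boolean ok) {
            this.type = type;
            this.size = size;
            this.inflated = inflated;
            this.ok = ok;
        }
    }

    public static Result check(byte[] pack, int offset) throws DataFormatException {
        int p = offset;
        int c = pack[p++] & 0xff;
        int type = (c >> 4) & 0x07;
        long size = c & 0x0f;          // low 4 bits of the size
        int shift = 4;
        while ((c & 0x80) != 0) {      // MSB set: another size byte follows
            c = pack[p++] & 0xff;
            size |= (long) (c & 0x7f) << shift;
            shift += 7;
        }
        if (type == 6) {               // OFS_DELTA: skip the variable-length base offset
            while ((pack[p++] & 0x80) != 0) {
                // keep skipping until a byte without the continuation bit
            }
        } else if (type == 7) {        // REF_DELTA: skip the 20-byte base object id
            p += 20;
        }
        // One spare byte so a stream longer than declared is detected.
        byte[] out = new byte[(int) size + 1];
        Inflater inf = new Inflater(); // pack entries are zlib-wrapped deflate
        try {
            inf.setInput(pack, p, pack.length - p);
            int n = 0;
            while (!inf.finished() && n < out.length) {
                int k = inf.inflate(out, n, out.length - n);
                if (k == 0 && (inf.needsInput() || inf.needsDictionary())) {
                    break;             // truncated stream or unexpected preset dictionary
                }
                n += k;
            }
            return new Result(type, size, n, inf.finished() && n == size);
        } finally {
            inf.end();
        }
    }

    public static void main(String[] args) throws IOException, DataFormatException {
        // args[0]: path to the .pack file; args[1]: offset from the log (commas allowed)
        byte[] pack = Files.readAllBytes(Paths.get(args[0]));
        int offset = Math.toIntExact(Long.parseLong(args[1].replace(",", "")));
        Result r = check(pack, offset);
        System.out.printf("type=%d declared=%d inflated=%d ok=%b%n",
                r.type, r.size, r.inflated, r.ok);
    }
}
```

It would be run as `java PackEntryCheck <path-to-pack> 87,706,216`, printing the declared and actual inflated sizes for that entry. Because it uses a fresh Inflater and no window cache, a clean result here while Gerrit still reports the entry as corrupt would point away from the bytes on disk.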
The files are on a local ext4 filesystem. We repack repositories once a
week, and we believe we have also seen errors related to pack file names
being reused. However, we believe that to be a separate problem,
unrelated to the "bad zlib stream" error under discussion here. In
particular, we have encountered this error since the last external
repack completed and have restarted Gerrit multiple times since then, so
nothing aside from Gerrit should be altering the repositories on disk at
this point.
It's worth noting that this commit recently touched relevant parts of
jgit:
https://github.com/eclipse/jgit/commit/94c4d7eee85d5ffe19d04c5a6e60192430d4fe1e#diff-cd9200eacde17f31ca6b3f490d4a2a97R319
We have not found a logic error in the commit (which is in jgit 3.7).
However, with it in place, if the supplied buffer is not large enough to
receive the decompressed object, the pack file is marked as corrupt.
This suggests that, if the pack file is not actually corrupt (which we
believe to be the case), either Gerrit is not sizing the buffer
correctly or unexpected data are being produced in the decompression
process (whether via zlib or jgit's window routines).
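To illustrate that failure mode, here is a minimal sketch using only java.util.zip. It is not jgit's code, and the class and method names are hypothetical; it only assumes an inflate loop that, like the behaviour described above, treats "output buffer full but zlib stream not finished" as an error. Fed a perfectly valid zlib stream, such a loop succeeds with an exactly sized buffer and reports corruption when the buffer is one byte short.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/**
 * Hypothetical illustration (not jgit code): a strict inflate loop that treats
 * "output buffer full, but zlib stream not finished" as corruption.
 */
public class UndersizedBufferDemo {

    /** Inflates into a buffer of exactly bufSize bytes; returns the byte count. */
    static int inflateInto(byte[] compressed, int bufSize) throws DataFormatException {
        Inflater inf = new Inflater();
        try {
            inf.setInput(compressed);
            byte[] dst = new byte[bufSize];
            int off = 0;
            while (true) {
                int n = inf.inflate(dst, off, dst.length - off);
                off += n;
                if (inf.finished()) {
                    return off;                       // whole stream consumed
                }
                if (inf.needsInput()) {
                    throw new DataFormatException("truncated zlib stream");
                }
                if (n == 0) {
                    // No progress: the buffer is full but the stream has more data.
                    throw new DataFormatException("buffer too small for stream");
                }
            }
        } finally {
            inf.end();
        }
    }

    /** Compresses data as a zlib stream (the format used for pack entries). */
    static byte[] deflate(byte[] data) {
        Deflater d = new Deflater();
        try {
            d.setInput(data);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!d.finished()) {
                int n = d.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            d.end();
        }
    }

    public static void main(String[] args) {
        byte[] payload = "some blob contents\n".getBytes(StandardCharsets.UTF_8);
        byte[] z = deflate(payload);
        for (int size : new int[] {payload.length, payload.length - 1}) {
            try {
                System.out.println("buffer " + size + ": inflated " + inflateInto(z, size));
            } catch (DataFormatException e) {
                System.out.println("buffer " + size + ": \"corrupt\" (" + e.getMessage() + ")");
            }
        }
    }
}
```

The point of the sketch is that, under this kind of check, an undersized buffer and a genuinely damaged stream produce the same DataFormatException, so the exception alone cannot tell the two apart.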
We would appreciate any suggestions on how to diagnose this error
further.
Thanks,
Jim