As time pass, more Gits gets affected. After just two days we had 15 inaccessible gits.
Since we cannot recover from the issue, we think an IOException happens in
openPackedObject(WindowCursor curs, AnyObjectId objectId), and the packFile is removed from future access.
Interesting enough this happened on a Git with just an empty initial commit, and it had not been touched for months.
--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en
---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
My proposal is to modify JGit to at least re-throw the exception if the object couldn't be found in another packfile. When we want to load object X from any of our pack files and we hit an exception when trying to load X from packfile p(n) but we succeed to load X from packfile p(n+1) then the operation should succeed. Only if we hit the IOException when reading from p(n+1) and we can't read it from any other pack then we should rethrow the exception. If reading X from p(n) leads to an exception and also reading it from p(n+1) I think it's ok to rethrow that latest exception.
What I am not sure about: When reading from p(n) leading to an exception but reading from p(n+1) succeeds then we want to ignore p(n) from now on. But I think that should definitly go also into one of the logs of gerrit. We have a succesfull JGit API call which should lead to a new entry in the gerrit error log. Not sure how to do that best.
I personally made bad experiences with JGit dealing with packfiles stored on NFS shares and I had to introduce a new config param for that (see commit 0fc8b05 in JGit). Are youre repos on NFS?
Den måndagen den 22:e december 2014 kl. 16:30:06 UTC+1 skrev Christian Halstrick:My proposal is to modify JGit to at least re-throw the exception if the object couldn't be found in another packfile. When we want to load object X from any of our pack files and we hit an exception when trying to load X from packfile p(n) but we succeed to load X from packfile p(n+1) then the operation should succeed. Only if we hit the IOException when reading from p(n+1) and we can't read it from any other pack then we should rethrow the exception. If reading X from p(n) leads to an exception and also reading it from p(n+1) I think it's ok to rethrow that latest exception.I think this idea is excellent.
What I am not sure about: When reading from p(n) leading to an exception but reading from p(n+1) succeeds then we want to ignore p(n) from now on. But I think that should definitly go also into one of the logs of gerrit. We have a succesfull JGit API call which should lead to a new entry in the gerrit error log. Not sure how to do that best.I'm guessing you meant "un-successful". It would be nice if JGit could perculate some notice about the fact that a pack-file is corrupt, even if it also finds a working pack file.
do you think this patch would help to understand what's going wrong ?-Matthias
WARN org.eclipse.jgit.internal.storage.file.PackFile : Exception while opening packfile /git/platform/project-a.git/objects/pack/pack-c5bb0b85430943881a695eb80db84fd61ba084a9.packorg.eclipse.jgit.errors.PackMismatchException: Pack object count mismatch: pack 57cbea9c398eee170ba5dabc32cd5481274abfde index ed41394b323d11eb4a28f0696bafbfe7130ef0b7:
Thanks for your post and suggestions, Bassem.
While core.trustFolderStat may help cases where new .pack-files are not picked up at all, I’m not sure how it would relate to our case, where a correct .pack and .idx-file is marked corrupt/invalid due to mismatched checksums (when the checksums are indeed correct on disk). This issue also only happens on newer versions of JGit.
“This option is generally only useful if you are writing bitmaps with -b or pack.writebitmaps, as it ensures that the bitmapped packfile has the necessary objects.”
On this particular server we are currently avoiding using bitmaps for unrelated reasons (will change in the future though)
“We are currently considering relying on standalone JGit to run the garbage collection and generate bitmaps instead of native git, for the busy repositories this results in much better clone/fetch performance. From this thread [1], the resultant pack files and bitmaps using JGit are different than those generated using native Git.”
The invalid .pack- and .idx files are very much readable by JGit after a restart. I’m not worried about the .pack format in this case. The JGit vs Git bitmap/gc performance is also very interesting discussion, but I feel this should probably be a handled in a separate thread J
Best regards
Gustaf
At one point the pack files were named after the sha of the contents of the objects in the packfiles, and not after the packed contents of the file. This meant that 2 packfiles with the same objects, but packed differently, perhaps because of different repack settings, can end up with the same name. There was talk of changing this naming convention to use the sha of the contents to prevent this, but I am not sure if that has happened yet, and if your GC is doing that?
When a repack happens on a repo with no new objects and this naming convention, the repack script will end up recreating new pack and index files with the same names as the current ones. It will then replace the old ones with the new ones. It is possible for the old index file to be read, the update to happen, and then the new packfile to be read with an offset from the old index. This mismatch may confuse jgit into thinking there is corruption.
I am not sure this ever happens, but the possibility of it happening led me to disable this packfile replacing feature in our repacking script a long time ago. I wonder if you may be seeing this?
-Mattin
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
--
Thank you for the suggestion, Martin.
I think you may be right regarding filename collisions. I’m not sure why only newer versions of jGit fails to handle this, but now we have something to investigate closer.
I wrote a script that would try the following on all our gits:
1. Check if the git has exactly one .pack-file and one .idx-file.
2. Save the filenames and sha1sum of the two files.
3. Run “git gc”
4. If the filenames was kept but the sha1 of the files changed, I would output some information about the git and the .pack-files.
It took no more than 40 gits before we had our first collision. Look at this:
/path/git/project-a.git/objects/pack
------------------SHA1 MISMATCH!------------------
Before git gc (sha1sum *):
0a0fe026d2593fcbebfc270ea20f2eeb483d8efe pack-3af462a5a725aaa39159637800dce93dea0a4846.idx
4ecd410fdb2b1aecafd396910777af7a449b531d pack-3af462a5a725aaa39159637800dce93dea0a4846.pack
After git gc (sha1sum *):
609b6d21681efe8acca23bf85d51eb32fb04fd96 pack-3af462a5a725aaa39159637800dce93dea0a4846.idx
50415da72bdeb14c88f19e58dbc172ab3b82a7eb pack-3af462a5a725aaa39159637800dce93dea0a4846.pack
--------------------------------------------------
Suggestion how to patch jgit? Retry-mechanism on faulty checksum?
After a 10 minutes I have found 4-5 gits with the same type of naming collision.
We are running an older version of git on this server (1.8.2.2). I’ll try other versions, and see if it helps the situation.
Thanks again!
Best regards
Gustaf
Update! It seems we can get around the collision issue by just upgrading git to a later version.
This is the commit that fixes our naming collisions:
Author: Jeff King <peff@.net> 2013-12-05 21:28:07
Committer: Junio C Hamano <gitster@.com> 2013-12-06 00:40:11
Parent: ab1900a36ef7fcfe872ff9d0803f9442e356c2f0 (Appease Sun Studio by renaming "tmpfile")
Child: f06a5e607dde266884db4a99b70fbee09d5c5efc (Merge branch 'jk/sha1write-void')
Branches: remotes/origin/maint, remotes/origin/master, remotes/origin/next, remotes/origin/pu
Follows: v1.8.4.1
Precedes: v1.9-rc0
pack-objects: name pack files after trailer hash
Our current scheme for naming packfiles is to calculate the
sha1 hash of the sorted list of objects contained in the
packfile. This gives us a unique name, so we are reasonably
sure that two packs with the same name will contain the same
objects.
It does not, however, tell us that two such packs have the
exact same bytes. This makes things awkward if we repack the
same set of objects. Due to run-to-run variations, the bytes
may not be identical (e.g., changed zlib or git versions,
different source object reuse due to new packs in the
repository, or even different deltas due to races during a
multi-threaded delta search).
In theory, this could be helpful to a program that cares
that the packfile contains a certain set of objects, but
does not care about the particular representation. In
practice, no part of git makes use of that, and in many
cases it is potentially harmful. For example, if a dumb http
client fetches the .idx file, it must be sure to get the
exact .pack that matches it. Similarly, a partial transfer
of a .pack file cannot be safely resumed, as the actual
bytes may have changed. This could also affect a local
client which opened the .idx and .pack files, closes the
.pack file (due to memory or file descriptor limits), and
then re-opens a changed packfile.
In all of these cases, git can detect the problem, as we
have the sha1 of the bytes themselves in the pack trailer
(which we verify on transfer), and the .idx file references
the trailer from the matching packfile. But it would be
simpler and more efficient to actually get the correct
bytes, rather than noticing the problem and having to
restart the operation.
This patch simply uses the pack trailer sha1 as the pack
name. It should be similarly unique, but covers the exact
representation of the objects. Other parts of git should not
care, as the pack name is returned by pack-objects and is
essentially opaque.
One test needs to be updated, because it actually corrupts a
pack and expects that re-packing the corrupted bytes will
use the same name. It won't anymore, but we can easily just
use the name that pack-objects hands back.
Signed-off-by: Jeff King <peff@.net>
Signed-off-by: Junio C Hamano gitster@.com
Upgrading Git on the server essentially seems to remove all naming collisions when running the same test script described in the previous post. We will patch our Gerrit with jgit once again and verify that this indeed makes the issue with “corrupt” -.pack-files disappear.
Big thanks for all the help from the Gerrit & JGit community.
Best regards
Gustaf
Signed-off-by: Junio C Hamano git...@.com