Issue in JGit 3.5.1.201410131835-r (used in 2.10-rc1)

623 views
Skip to first unread message

Lundh, Gustaf

unread,
Dec 17, 2014, 5:28:16 AM12/17/14
to repo-discuss, Pursehouse, David (Sony Mobile), Selberg, Sven, Saša Živkov
We patched Gerrit 2.8.6.1 with the JGit version that can be found in v2.10-rc1 (3.5.1.201410131835-r) since we wanted a few fixed that was introduced since 3.2.x

Sadly we had to rollback our production system quite quickly due to a pretty bad JGit issue in this version (at least, that is our assumption).

Symptoms:

In certain circumstances Gerrit cannot load refs/meta/config, hence the projects become invisible since Gerrit cannot calculate the ACL for that Git.

The logs are _full_ of these:

[2014-12-15 05:35:14,276] WARN com.google.gerrit.server.project.ProjectCacheImpl : Cannot read project platform/git/project
java.util.concurrent.ExecutionException: org.eclipse.jgit.errors.MissingObjectException: Missing unknown 40bcb7aa806d2790bbf0f7d2ee6a42cd9ee2f03a
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2344)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2316)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3936)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4806)
at com.google.gerrit.server.project.ProjectCacheImpl.checkedGet(ProjectCacheImpl.java:122)
at com.google.gerrit.server.project.ProjectControl$GenericFactory.controlFor(ProjectControl.java:80)
at com.google.gerrit.server.project.ChangeControl$GenericFactory.controlFor(ChangeControl.java:74)
at com.google.gerrit.server.query.change.IsVisibleToPredicate.match(IsVisibleToPredicate.java:62)
at com.google.gerrit.server.query.change.IsVisibleToPredicate.match(IsVisibleToPredicate.java:27)
at com.google.gerrit.server.query.AndPredicate.match(AndPredicate.java:75)
at com.google.gerrit.server.query.change.AndSource.readImpl(AndSource.java:130)
at com.google.gerrit.server.query.change.AndSource.read(AndSource.java:94)
at com.google.gerrit.server.query.change.QueryProcessor.queryChanges(QueryProcessor.java:261)
at com.google.gerrit.server.query.change.QueryChanges.query0(QueryChanges.java:153)
at com.google.gerrit.server.query.change.QueryChanges.query(QueryChanges.java:141)
at com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:108)
at com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:41)
at com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:306)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:278)
at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:268)
at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:180)
at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:93)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
[Removed some of the stack trace here]
Caused by: org.eclipse.jgit.errors.MissingObjectException: Missing unknown 40bcb7aa806d2790bbf0f7d2ee6a42cd9ee2f03a
at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:148)
at org.eclipse.jgit.lib.ObjectReader.open(ObjectReader.java:229)
at org.eclipse.jgit.revwalk.RevWalk.parseAny(RevWalk.java:839)
at org.eclipse.jgit.revwalk.RevWalk.parseCommit(RevWalk.java:752)
at com.google.gerrit.server.git.VersionedMetaData.load(VersionedMetaData.java:122)
at com.google.gerrit.server.git.VersionedMetaData.load(VersionedMetaData.java:98)
at com.google.gerrit.server.project.ProjectCacheImpl$Loader.load(ProjectCacheImpl.java:274)
at com.google.gerrit.server.project.ProjectCacheImpl$Loader.load(ProjectCacheImpl.java:258)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315)
... 56 more

It is always a commit in the refs/meta/config that is missing (but not necessary the tip). The issue does not seem appear on refs/heads/*.

When the first "Missing unknown" happens in a Git, it will continue to happen every time you access the Git. No clearing of caches helps.

As time pass, more Gits gets affected. After just two days we had 15 inaccessible gits.

Since we cannot recover from the issue, we think an IOException happens in
openPackedObject(WindowCursor curs, AnyObjectId objectId), and the packFile is removed from future access.

Interesting enough this happened on a Git with just an empty initial commit, and it had not been touched for months. So it does _not_ seem to be a racing condition where we are trying to read a pack-file while writing to the git directory.

I cannot tie the initial MissingObjectException to a certain user interaction, it just suddenly happens when Gerrit is trying to read objects from refs/meta/config. I cannot reproduce it on our staging environment, which makes the issue very hard to debug. So I'm asking for help. This issue makes me very hesitant to update to 2.10.

Rolling back to 3.2.0.201312181205-r solved all issues.

/Gustaf Lundh


Bassem Rabil

unread,
Dec 17, 2014, 8:01:57 AM12/17/14
to repo-d...@googlegroups.com, David.Pu...@sonymobile.com, Sven.S...@sonymobile.com, ziv...@gmail.com
We experienced a similar behavior last month using a custom jgit based on jgit 3.4 with Gerrit 2.9.1, however this was coinciding with storage migration and we thought that this was the root cause of such behavior.
To fix this, we restored refs/meta/config from slaves and since then things are stable.

Regards
Bassem

Saša Živkov

unread,
Dec 17, 2014, 8:10:44 AM12/17/14
to Lundh, Gustaf, repo-discuss, Pursehouse, David (Sony Mobile), Selberg, Sven
Does a restart of Gerrit help? 


As time pass, more Gits gets affected. After just two days we had 15 inaccessible gits.

Since we cannot recover from the issue, we think an IOException happens in
        openPackedObject(WindowCursor curs, AnyObjectId objectId), and the packFile is removed from future access.

Interesting enough this happened on a Git with just an empty initial commit, and it had not been touched for months.

If a restart doesn't solve the issue then we would like to have that empty repository from your system and debug JGit on it. 

Gustaf Lundh

unread,
Dec 17, 2014, 8:55:44 AM12/17/14
to repo-d...@googlegroups.com, Gustaf...@sonymobile.com, David.Pu...@sonymobile.com, Sven.S...@sonymobile.com
> Does a restart of Gerrit help? 

A restart temporarily removes the issue until it starts happening again on more and more gits (consistent with pack files marked as bad).

We also saw the issue on this release:

version 3.4.1.201406201815-r.112-g94c4d7e

Also only in production. Cannot get it to show in our staging environment, it seems to be related to load/racing condition.

Best regards
Gustaf Lundh

Alex Blewitt

unread,
Dec 17, 2014, 9:07:19 AM12/17/14
to Bassem Rabil, repo-d...@googlegroups.com, David.Pu...@sonymobile.com, Sven.S...@sonymobile.com, ziv...@gmail.com
If you do a git fsck then it will tell you if the repository data has problems. If it doesn't you can do a git gc which will repack the files meaning jgit may be able to load them. 

Alex

Sent from my iPhat 6
--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Bassem Rabil

unread,
Dec 17, 2014, 9:12:36 AM12/17/14
to repo-d...@googlegroups.com, bassem.ra...@ericsson.com, David.Pu...@sonymobile.com, Sven.S...@sonymobile.com, ziv...@gmail.com
In our case running git gc, gerrit gc, nor flushing caches didn't help to restore the repository visibility, fetching refs/meta/config from slaves was the solution to restore the visibility of these repositories.

Regards
Bassem

Kevin Sage

unread,
Dec 17, 2014, 12:48:05 PM12/17/14
to repo-d...@googlegroups.com, bassem.ra...@ericsson.com, David.Pu...@sonymobile.com, Sven.S...@sonymobile.com, ziv...@gmail.com
We're seeing this same behavior - though we hadn't tracked it down to refs/meta/config - on Gerrit 2.9.1 (JGit 3.4.0.201405051725-m7).   In every case we've seen so far, running a "git gc --aggressive" fixes the problem immediately.  This is happening for repos of varying sizes and complexity.  In every case we've tried, "git fsck" does not report any errors.  C Git seems totally happy with the repos; it's just JGit/Gerrit that are making them invisible. 

Flushing caches does not help and "gerrit gc" refuses to run because Gerrit has made the repo invisible and therefore thinks it doesn't exist.

Anyway, just wanted to add a data point that it's happening on 2.9.1 and not only the 2.10 rc.

Thanks,
Kevin

Saša Živkov

unread,
Dec 18, 2014, 5:05:55 AM12/18/14
to Lundh, Gustaf, repo-discuss, Pursehouse, David (Sony Mobile), Selberg, Sven
On Wed, Dec 17, 2014 at 11:27 AM, Lundh, Gustaf <Gustaf...@sonymobile.com> wrote:
Can you find this IOException in your error_log and post it here?

Gustaf Lundh

unread,
Dec 18, 2014, 7:46:03 AM12/18/14
to repo-d...@googlegroups.com, Gustaf...@sonymobile.com, David.Pu...@sonymobile.com, Sven.S...@sonymobile.com
> Can you find this IOException in your error_log and post it here?

No. Annoyingly the IOException is silenced when the PackFile is marked as bad[1]. And a lot of stuff can trigger an IOException when JGit tries to read the object from the PackFile. Due to this, it is pretty much impossible for me to track down the issue any further.

I was thinking about writing a patch, to allow us to re-throw the Exception. But that would of course mean we would not continue looking in other PackFiles if we stumble upon a "broken" one. Which I guess we want.

[1]

ObjectLoader openPackedObject(WindowCursor curs, AnyObjectId objectId) {
PackList pList;
do {
SEARCH: for (;;) {
pList = packList.get();
for (PackFile p : pList.packs) {
try {
ObjectLoader ldr = p.get(curs, objectId);
if (ldr != null)
return ldr;
} catch (PackMismatchException e) {
// Pack was modified; refresh the entire pack list.
if (searchPacksAgain(pList))
continue SEARCH;
} catch (IOException e) {
// Assume the pack is corrupted.
removePack(p);
}
}
break SEARCH;
}
} while (searchPacksAgain(pList));
return null;

Sven Selberg

unread,
Dec 22, 2014, 3:52:03 AM12/22/14
to repo-d...@googlegroups.com, Gustaf...@sonymobile.com, David.Pu...@sonymobile.com, Sven.S...@sonymobile.com
The code is from org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/ObjectDirectory.java

The "facts":
1. If it happens once to a Git repoistory, it keeps on happening until Gerrit Server is restarted.
2. A restart solve the issue.

Points towards:

First occourence:
1. An IOException is thrown somewhere in the try-clause.
2. The pack file is assumed corrupted and is removed.
3. No more pack files => both for loops terminate.
4. Null is returned which results in the ObjectMissingException and the stacktrace above.
Following occourences:
1. pList.packs is now empty => both for loops  terminate
2. Null is returned which results in the ObjectMissingException and the stacktrace above.

Looking at the code executed in the try clause, there can be a number of causes behind the IOException that gets this vicious cycle going. And as Gustaf stated earlier (and which  is apparent by the snippet of code provided in his latest post. This IOException is simply swallowed.
Since the problem only seems to manifest in a high load production environment, we would have to break our production environment to reproduce it, if someone can't come up with brilliant idea what could be the root cause.

Has anyone been able to recreate this issue in a test/dev environment?
or
Is there anyone who might have an idea how you might be able to recreate it?

Will investigate this further during the holidays.

/Sven

Matthias Sohn

unread,
Dec 22, 2014, 9:27:17 AM12/22/14
to Gustaf Lundh, Repo and Gerrit Discussion, David Pursehouse, Sven.S...@sonymobile.com
you may build a custom jgit version using the attached patch to get the exception logged to System.err, 
apply it on top of jgit v3.4.2.201412180340-r, run "mvn clean install" to build it and replace it in your gerrit version.

--
Matthias
0001-Log-reason-for-ignoring-pack-when-IOException-occurr.patch

Christian Halstrick

unread,
Dec 22, 2014, 10:30:06 AM12/22/14
to repo-d...@googlegroups.com, Gustaf...@sonymobile.com, David.Pu...@sonymobile.com, Sven.S...@sonymobile.com
My proposal is to modify JGit to at least re-throw the exception if the object couldn't be found in another packfile. When we want to load object X from any of our pack files and we hit an exception when trying to load X from packfile p(n) but we succeed to load X from packfile p(n+1) then the operation should succeed. Only if we hit the IOException when reading from p(n+1) and we can't read it from any other pack then we should rethrow the exception. If reading X from p(n) leads to an exception and also reading it from p(n+1) I think it's ok to rethrow that latest exception.

What I am not sure about: When reading from p(n) leading to an exception but reading from p(n+1) succeeds then we want to ignore p(n) from now on. But I think that should definitly go also into one of the logs of gerrit. We have a succesfull JGit API call which should lead to a new entry in the gerrit error log. Not sure how to do that best.

I personally made bad experiences with JGit dealing with packfiles stored on NFS shares and I had to introduce a new config param for that (see commit 0fc8b05 in JGit). Are youre repos on NFS?


Sven Selberg

unread,
Dec 23, 2014, 5:05:19 AM12/23/14
to repo-d...@googlegroups.com, Gustaf...@sonymobile.com, David.Pu...@sonymobile.com, Sven.S...@sonymobile.com


Den måndagen den 22:e december 2014 kl. 16:30:06 UTC+1 skrev Christian Halstrick:
My proposal is to modify JGit to at least re-throw the exception if the object couldn't be found in another packfile. When we want to load object X from any of our pack files and we hit an exception when trying to load X from packfile p(n) but we succeed to load X from packfile p(n+1) then the operation should succeed. Only if we hit the IOException when reading from p(n+1) and we can't read it from any other pack then we should rethrow the exception. If reading X from p(n) leads to an exception and also reading it from p(n+1) I think it's ok to rethrow that latest exception.

I think this idea is excellent.
 
What I am not sure about: When reading from p(n) leading to an exception but reading from p(n+1) succeeds then we want to ignore p(n) from now on. But I think that should definitly go also into one of the logs of gerrit. We have a succesfull JGit API call which should lead to a new entry in the gerrit error log. Not sure how to do that best.

I'm guessing you meant  "un-successful". It would be nice if JGit could perculate some notice about the fact that a pack-file is corrupt, even if it also finds a working pack file.

I personally made bad experiences with JGit dealing with packfiles stored on NFS shares and I had to introduce a new config param for that (see commit 0fc8b05 in JGit). Are youre repos on NFS?

Our repos resides on SAN so no. SCSI I guess.

/Sven 

Christian Halstrick

unread,
Jan 21, 2015, 9:17:53 AM1/21/15
to repo-d...@googlegroups.com, Gustaf...@sonymobile.com, David.Pu...@sonymobile.com, Sven.S...@sonymobile.com


On Tuesday, December 23, 2014 at 11:05:19 AM UTC+1, Sven Selberg wrote:


Den måndagen den 22:e december 2014 kl. 16:30:06 UTC+1 skrev Christian Halstrick:
My proposal is to modify JGit to at least re-throw the exception if the object couldn't be found in another packfile. When we want to load object X from any of our pack files and we hit an exception when trying to load X from packfile p(n) but we succeed to load X from packfile p(n+1) then the operation should succeed. Only if we hit the IOException when reading from p(n+1) and we can't read it from any other pack then we should rethrow the exception. If reading X from p(n) leads to an exception and also reading it from p(n+1) I think it's ok to rethrow that latest exception.

I think this idea is excellent.

See https://git.eclipse.org/r/#/c/39685 . Something similar is in now.
  
What I am not sure about: When reading from p(n) leading to an exception but reading from p(n+1) succeeds then we want to ignore p(n) from now on. But I think that should definitly go also into one of the logs of gerrit. We have a succesfull JGit API call which should lead to a new entry in the gerrit error log. Not sure how to do that best.

I'm guessing you meant  "un-successful". It would be nice if JGit could perculate some notice about the fact that a pack-file is corrupt, even if it also finds a working pack file.

No, I meant "succesful". A request to gerrit (e.g. a push) can be succesful although during request processing we detected corrupt packfiles. Because in the end it came out that these corrupt packfiles where not required to process the request we want to return success and additionally log the existence of corrupt packfiles.

Sven Selberg

unread,
Feb 5, 2015, 8:26:08 AM2/5/15
to repo-d...@googlegroups.com, Gustaf...@sonymobile.com, David.Pu...@sonymobile.com, Sven.S...@sonymobile.com
Background:
* We run $ git gc on all repositories every night. (that is (C)Git gc not JGit gc) # I have a feeling that this is the crux...
* This happend after the first gc after deployment

We patched our gerrit v2.8.6.1 with jgitv3.6.2 and ran in production. The output created through [1] resulted in [2].
This only led us to org.eclipse.jgit.internal.storage.file.PackFile.java:599

private void doOpen() throws IOException {
try {
if (invalid)
throw new PackInvalidException(packFile);  // Right here!!!!
synchronized (readLock) {
fd = new RandomAccessFile(packFile, "r"); //$NON-NLS-1$
length = fd.length();
onOpenPack();
}
} catch (IOException ioe) {
openFail();
throw ioe;
} catch (RuntimeException re) {
openFail();
throw re;
} catch (Error re) {
openFail();
throw re;
}
}

So apparently something marked this PackFile as invalid... back to square one.

The invalid flag is set in three places.
1. org.eclipse.jgit.internal.storage.file.PackFile.idx() (line:179) [3]
2. org.eclipse.jgit.internal.storage.file.PackFile.setInvalid() (line:550)
3 org.eclipse.jgit.internal.storage.file.PackFile.openFail() (line: 620)

All of which feels more or less like Rome (All paths leads to...)

/Sven

[2] ERROR: Exception caught while accessing pack file /some/path/problem-repository.git/objects/pack/pack-84fc226d0fb0f0c0b97c4f4d3ab8a1d1c2553b93.pack, the pack file might be corrupt
org.eclipse.jgit.errors.PackInvalidException: Pack file invalid:  /some/path/problem-repository.git/objects/pack/pack-84fc226d0fb0f0c0b97c4f4d3ab8a1d1c2553b93.pack
at org.eclipse.jgit.internal.storage.file.PackFile.doOpen(PackFile.java:599)
at org.eclipse.jgit.internal.storage.file.PackFile.beginWindowCache(PackFile.java:583)
at org.eclipse.jgit.internal.storage.file.WindowCache.load(WindowCache.java:284)
at org.eclipse.jgit.internal.storage.file.WindowCache.getOrLoad(WindowCache.java:368)
at org.eclipse.jgit.internal.storage.file.WindowCache.get(WindowCache.java:179)
at org.eclipse.jgit.internal.storage.file.WindowCursor.pin(WindowCursor.java:354)
at org.eclipse.jgit.internal.storage.file.WindowCursor.copy(WindowCursor.java:226)
at org.eclipse.jgit.internal.storage.file.PackFile.readFully(PackFile.java:556)
at org.eclipse.jgit.internal.storage.file.PackFile.load(PackFile.java:714)
at org.eclipse.jgit.internal.storage.file.PackFile.get(PackFile.java:257)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject(ObjectDirectory.java:416)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate(ObjectDirectory.java:385)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:377)
at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:145)
at org.eclipse.jgit.lib.ObjectReader.open(ObjectReader.java:229)
at org.eclipse.jgit.revwalk.RevWalk.parseAny(RevWalk.java:840)
at org.eclipse.jgit.revwalk.RevWalk.parseCommit(RevWalk.java:753)
at com.google.gerrit.server.git.VersionedMetaData.load(VersionedMetaData.java:122)
at com.google.gerrit.server.git.VersionedMetaData.load(VersionedMetaData.java:98)
at com.google.gerrit.server.project.ProjectCacheImpl$Loader.load(ProjectCacheImpl.java:274)
at com.google.gerrit.server.project.ProjectCacheImpl$Loader.load(ProjectCacheImpl.java:258)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3936)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4806)
at com.google.gerrit.server.project.ProjectCacheImpl.checkedGet(ProjectCacheImpl.java:122)
at com.google.gerrit.server.project.ProjectControl$GenericFactory.controlFor(ProjectControl.java:80)
at com.google.gerrit.server.project.ChangeControl$GenericFactory.controlFor(ChangeControl.java:74)
at com.google.gerrit.server.query.change.IsVisibleToPredicate.match(IsVisibleToPredicate.java:62)
at com.google.gerrit.server.query.change.IsVisibleToPredicate.match(IsVisibleToPredicate.java:27)
at com.google.gerrit.server.query.AndPredicate.match(AndPredicate.java:75)
at com.google.gerrit.server.query.change.AndSource.readImpl(AndSource.java:111)
at com.google.gerrit.server.query.change.AndSource.read(AndSource.java:94)
at com.google.gerrit.server.query.change.QueryProcessor.queryChanges(QueryProcessor.java:261)
at com.google.gerrit.server.query.change.QueryChanges.query0(QueryChanges.java:153)
at com.google.gerrit.server.query.change.QueryChanges.query(QueryChanges.java:141)
at com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:108)
at com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:41)
at com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:306)

[3]  private synchronized PackIndex idx() throws IOException {
if (loadedIdx == null) {
if (invalid)
throw new PackInvalidException(packFile);

try {
final PackIndex idx = PackIndex.open(extFile(INDEX));

if (packChecksum == null)
packChecksum = idx.packChecksum;
else if (!Arrays.equals(packChecksum, idx.packChecksum))
throw new PackMismatchException(JGitText.get().packChecksumMismatch);

loadedIdx = idx;
} catch (IOException e) {
invalid = true;
throw e;
}
}
return loadedIdx;
}

Matthias Sohn

unread,
Feb 10, 2015, 12:29:09 PM2/10/15
to Sven Selberg, Repo and Gerrit Discussion, Gustaf...@sonymobile.com, David Pursehouse
do you think this patch would help to understand what's going wrong ?

-Matthias 

Sven Selberg

unread,
Feb 17, 2015, 9:13:34 AM2/17/15
to repo-d...@googlegroups.com, sven.s...@sonymobile.com, Gustaf...@sonymobile.com, David.Pu...@sonymobile.com
do you think this patch would help to understand what's going wrong ?

-Matthias 

We built our Gerrit with a jgit with your patch. And increased the logging in every point in PackFile.java where the PackFile is set to invalid.
And found this:
1. FileNotFoundException while trying to read the index file in a git => corresponding packfile is set invalid=true; [1]
#Four seconds delay and then an incoming git clone
2. For each object in git: [3]
    2.1 Try to read the packfile in ObjectToPack => PackInvalidException [2] (because PackFile.invalid == true)
    2.2 Try to find object in one of the packfiles => success.

The affected git has about ~3.5M objects and due to our extra logging we ended up with >2G of logs before we aborted and switched to a gerrit.war with a more friendly jgit.

We suspect that this isn't the root cause of our issue but it looks like a flaw. There's a whole lot of Exceptions thrown and swallowed.
One could perhaps check if the packfile is valid in some way, instead of throwing and catching millions of exceptions just because the git was gc:ed.

We made sure we wouldn't end up in the same log-bonanza and deployed again today. No luck yet...

/Sven

[1]
[2015-02-13 15:12:01,707] ERROR org.eclipse.jgit.internal.storage.file.ObjectDirectory : Pack file /gerrit/location/git/trouble-project.git/objects/pack/pack-1bbbc0e22d96ec6bb40ac4e69110e659e4676772.pack was deleted, removing it from pack list
java.io.FileNotFoundException: /gerrit/location/git/trouble-project.git/objects/pack/pack-1bbbc0e22d96ec6bb40ac4e69110e659e4676772.idx (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at org.eclipse.jgit.internal.storage.file.PackIndex.open(PackIndex.java:94)
at org.eclipse.jgit.internal.storage.file.PackFile.idx(PackFile.java:175)
at org.eclipse.jgit.internal.storage.file.PackFile.get(PackFile.java:266)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject(ObjectDirectory.java:417)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate(ObjectDirectory.java:386)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:378)
at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:145)
at org.eclipse.jgit.internal.storage.pack.PackWriter.writeWholeObjectDeflate(PackWriter.java:1563)
at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjectImpl(PackWriter.java:1549)
at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObject(PackWriter.java:1492)
at org.eclipse.jgit.internal.storage.pack.PackOutputStream.writeObject(PackOutputStream.java:164)
at org.eclipse.jgit.internal.storage.file.WindowCursor.writeObjects(WindowCursor.java:196)
at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjects(PackWriter.java:1480)
at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjects(PackWriter.java:1467)
at org.eclipse.jgit.internal.storage.pack.PackWriter.writePack(PackWriter.java:1036)
at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:1417)
at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:1271)
at org.eclipse.jgit.transport.UploadPack.service(UploadPack.java:717)
at org.eclipse.jgit.transport.UploadPack.upload(UploadPack.java:628)
at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:57)
at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:101)
at com.google.gerrit.sshd.AbstractGitCommand.access$000(AbstractGitCommand.java:32)
at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:70)
at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:442)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:364)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

[2]
[2015-02-13 15:12:04,723] WARN  org.eclipse.jgit.internal.storage.file.PackFile : Exception while opening packfile /gerrit/location/git/trouble-project.git/objects/pack/pack-1bbbc0e22d96ec6bb40ac4e69110e659e4676772.pack
org.eclipse.jgit.errors.PackInvalidException: Pack file invalid: /gerrit/location/git/trouble-project.git/objects/pack/pack-1bbbc0e22d96ec6bb40ac4e69110e659e4676772.pack
at org.eclipse.jgit.internal.storage.file.PackFile.doOpen(PackFile.java:612)
at org.eclipse.jgit.internal.storage.file.PackFile.beginCopyAsIs(PackFile.java:577)
at org.eclipse.jgit.internal.storage.file.PackFile.copyAsIs(PackFile.java:365)
at org.eclipse.jgit.internal.storage.file.WindowCursor.copyObjectAsIs(WindowCursor.java:190)
at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjectImpl(PackWriter.java:1515)
at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObject(PackWriter.java:1492)
at org.eclipse.jgit.internal.storage.pack.PackOutputStream.writeObject(PackOutputStream.java:164)
at org.eclipse.jgit.internal.storage.file.WindowCursor.writeObjects(WindowCursor.java:196)
at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjects(PackWriter.java:1480)
at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjects(PackWriter.java:1467)
at org.eclipse.jgit.internal.storage.pack.PackWriter.writePack(PackWriter.java:1036)
at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:1417)
at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:1271)
at org.eclipse.jgit.transport.UploadPack.service(UploadPack.java:717)
at org.eclipse.jgit.transport.UploadPack.upload(UploadPack.java:628)
at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:57)
at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:101)
at com.google.gerrit.sshd.AbstractGitCommand.access$000(AbstractGitCommand.java:32)
at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:70)
at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:442)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:364)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

[3]
org.eclipse.jgit.internal.storage.file.WindowCursor.java [line: 193]
public void writeObjects(PackOutputStream out, List<ObjectToPack> list)
throws IOException {
for (ObjectToPack otp : list)
out.writeObject(otp);
}

Gustaf Lundh

unread,
Feb 20, 2015, 5:03:36 AM2/20/15
to repo-d...@googlegroups.com, sven.s...@sonymobile.com, Gustaf...@sonymobile.com, David.Pu...@sonymobile.com
After temporarily deploying Gerrit 2.8.1 together with the logging enhanced JGit version (thanks again Matthias) in our production environment, we have successfully recreated the issue. And to me it all seems really weird. Hopefully some of you can make better sense of it. I will add the relevant stacktraces inline, so it will be quite a long post. But please bear with me.

The issue seems to be sporadically triggered when one of Gerrit’s gits is GC:ed using native “git gc” and very soon afterwards (or perhaps even during the GC) the git is accessed by Gerrit. We are using native “git gc” since it is both faster and uses much less memory compared to gerrit gc, especially when gc:ing really large repositories.

The chain of events seems to be the following:

1. A “git gc” on project-a has just been completed

2. Gerrit tries to read from project-a.git

3. The newly created .pack-file’s checksum does not match the .idx-file’s checksum:

[2015-02-17 16:26:08,524] WARN  org.eclipse.jgit.internal.storage.file.PackFile : Exception while opening packfile /git/platform/project-a.git/objects/pack/pack-c5bb0b85430943881a695eb80db84fd61ba084a9.pack
org.eclipse.jgit.errors.PackMismatchException: Pack object count mismatch: pack 57cbea9c398eee170ba5dabc32cd5481274abfde index ed41394b323d11eb4a28f0696bafbfe7130ef0b7: /git/platform/project-a.git/objects/pack/pack-c5bb0b85430943881a695eb80db84fd61ba084a9.pack
                             at org.eclipse.jgit.internal.storage.file.PackFile.onOpenPack(PackFile.java:715)
                             at org.eclipse.jgit.internal.storage.file.PackFile.doOpen(PackFile.java:616)
                             at org.eclipse.jgit.internal.storage.file.PackFile.beginWindowCache(PackFile.java:596)
                             at org.eclipse.jgit.internal.storage.file.WindowCache.load(WindowCache.java:284)
                             at org.eclipse.jgit.internal.storage.file.WindowCache.getOrLoad(WindowCache.java:368)
                             at org.eclipse.jgit.internal.storage.file.WindowCache.get(WindowCache.java:179)
                             at org.eclipse.jgit.internal.storage.file.WindowCursor.pin(WindowCursor.java:358)
                             at org.eclipse.jgit.internal.storage.file.WindowCursor.copy(WindowCursor.java:226)
                             at org.eclipse.jgit.internal.storage.file.PackFile.readFully(PackFile.java:569)
                             at org.eclipse.jgit.internal.storage.file.PackFile.load(PackFile.java:732)
                             at org.eclipse.jgit.internal.storage.file.PackFile.get(PackFile.java:267)
                             at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject(ObjectDirectory.java:417)
                             at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate(ObjectDirectory.java:386)
                             at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:378)
                             at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:145)
                             at org.eclipse.jgit.lib.ObjectReader.open(ObjectReader.java:229)
                             at org.eclipse.jgit.revwalk.RevWalk.parseAny(RevWalk.java:840)
                             at org.eclipse.jgit.revwalk.RevWalk.parseCommit(RevWalk.java:753)
                             at com.google.gerrit.server.git.VersionedMetaData.load(VersionedMetaData.java:122)
                             at com.google.gerrit.server.git.VersionedMetaData.load(VersionedMetaData.java:98)
                             at com.google.gerrit.server.project.ProjectCacheImpl$Loader.load(ProjectCacheImpl.java:274)
                             at com.google.gerrit.server.project.ProjectCacheImpl$Loader.load(ProjectCacheImpl.java:258)
                             at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522)
                             at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315)
                             at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
                             at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
                             at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
                             at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3936)
                             at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4806)
                             at com.google.gerrit.server.project.ProjectCacheImpl.checkedGet(ProjectCacheImpl.java:122)
                             at com.google.gerrit.server.project.ProjectCacheImpl.get(ProjectCacheImpl.java:109)
                             at com.google.gerrit.server.project.ListProjects.display(ListProjects.java:248)
                             at com.google.gerrit.sshd.commands.ListProjectsCommand$1.run(ListProjectsCommand.java:46)
                             at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:442)
                             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
                             at java.util.concurrent.FutureTask.run(FutureTask.java:262)
                             at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
                             at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
                             at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:364)
                             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
                             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
                             at java.lang.Thread.run(Thread.java:724)
[2015-02-17 16:26:08,525] WARN  com.google.gerrit.server.project.ProjectCacheImpl : Cannot read project platform/project-a
java.util.concurrent.ExecutionException: org.eclipse.jgit.errors.MissingObjectException: Missing unknown 05cb53957f3262d696b60fb441d7600480421d70
                             at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
                             at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
                             at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
                             at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
                             at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2344)
                             at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2316)
                             at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
                             at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
                             at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
                             at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3936)
                             at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4806)
                             at com.google.gerrit.server.project.ProjectCacheImpl.checkedGet(ProjectCacheImpl.java:122)
                             at com.google.gerrit.server.project.ProjectCacheImpl.get(ProjectCacheImpl.java:109)
                             at com.google.gerrit.server.project.ListProjects.display(ListProjects.java:248)
                             at com.google.gerrit.sshd.commands.ListProjectsCommand$1.run(ListProjectsCommand.java:46)
                             at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:442)
                             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
                             at java.util.concurrent.FutureTask.run(FutureTask.java:262)
                             at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
                             at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
                             at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:364)
                             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
                             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
                             at java.lang.Thread.run(Thread.java:724)
Caused by: org.eclipse.jgit.errors.MissingObjectException: Missing unknown 05cb53957f3262d696b60fb441d7600480421d70
                             at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:148)
                             at org.eclipse.jgit.lib.ObjectReader.open(ObjectReader.java:229)
                             at org.eclipse.jgit.revwalk.RevWalk.parseAny(RevWalk.java:840)
                             at org.eclipse.jgit.revwalk.RevWalk.parseCommit(RevWalk.java:753)
                             at com.google.gerrit.server.git.VersionedMetaData.load(VersionedMetaData.java:122)
                             at com.google.gerrit.server.git.VersionedMetaData.load(VersionedMetaData.java:98)
                             at com.google.gerrit.server.project.ProjectCacheImpl$Loader.load(ProjectCacheImpl.java:274)
                             at com.google.gerrit.server.project.ProjectCacheImpl$Loader.load(ProjectCacheImpl.java:258)
                             at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522)
                             at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315)
                             ... 18 more

4. The git is now marked invalid and will soon be removed from the list of existing pack files. This means that not a single commit will be reachable. Next time it will be accessed, (for instance when reading the refs/meta/config a bit later) it will be removed from the pack-file list:

[2015-02-17 16:26:26,602] ERROR org.eclipse.jgit.internal.storage.file.ObjectDirectory : Pack file /git/platform/project-a.git/objects/pack/pack-c5bb0b85430943881a695eb80db84fd61ba084a9.pack is corrupt, removing it from pack list
org.eclipse.jgit.errors.PackInvalidException: Pack file invalid: /git/platform/project-a.git/objects/pack/pack-c5bb0b85430943881a695eb80db84fd61ba084a9.pack
                             at org.eclipse.jgit.internal.storage.file.PackFile.doOpen(PackFile.java:612)
                             at org.eclipse.jgit.internal.storage.file.PackFile.beginWindowCache(PackFile.java:596)
                             at org.eclipse.jgit.internal.storage.file.WindowCache.load(WindowCache.java:284)
                             at org.eclipse.jgit.internal.storage.file.WindowCache.getOrLoad(WindowCache.java:368)
                             at org.eclipse.jgit.internal.storage.file.WindowCache.get(WindowCache.java:179)
                             at org.eclipse.jgit.internal.storage.file.WindowCursor.pin(WindowCursor.java:358)
                             at org.eclipse.jgit.internal.storage.file.WindowCursor.copy(WindowCursor.java:226)
                             at org.eclipse.jgit.internal.storage.file.PackFile.readFully(PackFile.java:569)
                             at org.eclipse.jgit.internal.storage.file.PackFile.load(PackFile.java:732)
                             at org.eclipse.jgit.internal.storage.file.PackFile.get(PackFile.java:267)
                             at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject(ObjectDirectory.java:417)
                             at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate(ObjectDirectory.java:386)
                             at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:378)
                             at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:145)
                             at org.eclipse.jgit.lib.ObjectReader.open(ObjectReader.java:229)
                             at org.eclipse.jgit.revwalk.RevWalk.parseAny(RevWalk.java:840)
                             at org.eclipse.jgit.revwalk.RevWalk.parseCommit(RevWalk.java:753)
                             at com.google.gerrit.server.git.VersionedMetaData.load(VersionedMetaData.java:122)
                             at com.google.gerrit.server.git.VersionedMetaData.load(VersionedMetaData.java:98)
                             at com.google.gerrit.server.project.ProjectCacheImpl$Loader.load(ProjectCacheImpl.java:274)
                             at com.google.gerrit.server.project.ProjectCacheImpl$Loader.load(ProjectCacheImpl.java:258)
                             at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522)
                             at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315)
                             at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
                             at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
                             at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
                             at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3936)
                             at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4806)
                             at com.google.gerrit.server.project.ProjectCacheImpl.checkedGet(ProjectCacheImpl.java:122)
                             at com.google.gerrit.server.project.ProjectCacheImpl.get(ProjectCacheImpl.java:109)
                             at com.google.gerrit.server.project.ListProjects.display(ListProjects.java:248)
                             at com.google.gerrit.sshd.commands.ListProjectsCommand$1.run(ListProjectsCommand.java:46)
                             at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:442)
                             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
                             at java.util.concurrent.FutureTask.run(FutureTask.java:262)
                             at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
                             at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
                             at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:364)
                             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
                             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
                             at java.lang.Thread.run(Thread.java:724)

5. The project will now never be readable until we restart. Since users cannot access the git, no one can push to the git and the git gc will not create new packfiles (stacktraces cut out):

java.util.concurrent.ExecutionException: org.eclipse.jgit.errors.MissingObjectException: Missing unknown 05cb53957f3262d696b60fb441d7600480421d70
Caused by: org.eclipse.jgit.errors.MissingObjectException: Missing unknown 05cb53957f3262d696b60fb441d7600480421d70
[2015-02-17 16:26:50,500] WARN  com.google.gerrit.server.project.ProjectCacheImpl : Cannot read project platform/project-a
java.util.concurrent.ExecutionException: org.eclipse.jgit.errors.MissingObjectException: Missing unknown 05cb53957f3262d696b60fb441d7600480421d70
Caused by: org.eclipse.jgit.errors.MissingObjectException: Missing unknown 05cb53957f3262d696b60fb441d7600480421d70
(etc. etc.)

And now, really weird part. As we can see, what triggers all this crazyness is the failing comparison between the checksum of the .pack-file (the last 20 bytes in the file) with the corresponding checksum of the .idx-file (byte 40-20 counting from the end of the file). But! Looking at the .pack-file and .idx-file:

tail -c 20 pack-c5bb0b85430943881a695eb80db84fd61ba084a9.pack| hexdump -v -e '/1 "%02X "'
57 CB EA 9C 39 8E EE 17 0B A5 DA BC 32 CD 54 81 27 4A BF DE

tail -c 40 pack-c5bb0b85430943881a695eb80db84fd61ba084a9.idx| hexdump -v -e '/1 "%02X "'
57 CB EA 9C 39 8E EE 17 0B A5 DA BC 32 CD 54 81 27 4A BF DE E0 80 6E 6E D6 EE 6F CA 88 3B 4F 16 9D D8 CE 8D 83 D1 AA AA

…we can see that the checksums are indeed correct! Let’s take a look at the initial log message again:

WARN  org.eclipse.jgit.internal.storage.file.PackFile : Exception while opening packfile /git/platform/project-a.git/objects/pack/pack-c5bb0b85430943881a695eb80db84fd61ba084a9.pack
org.eclipse.jgit.errors.PackMismatchException: Pack object count mismatch: pack 57cbea9c398eee170ba5dabc32cd5481274abfde index ed41394b323d11eb4a28f0696bafbfe7130ef0b7:

When jgit reads the checksum from the .idx it gets these bytes (in hex): ed41394b323d11eb4a28f0696bafbfe7130ef0b7

So you may think: “Ahh! Perhaps there something strange with the git gc, and somehow it renames the tmp-files to the correct and final filenames for the .pack- and .idx- files, before it has even finished writing the .idx-file!”. But no! Because the bytes jgit finds in the .idx is nowhere to be seen in the whole file! Searched the index file, and not even 6 bytes from this byte stream: ed41394b323d11eb4a28f0696bafbfe7130ef0b7 can be found in the .idx-file (or the .pack-file for that matter!)

How is this possible? I’m reading the JGit commit history and cannot seem to find any commit which could be the culprit of this bug. But it essentially hinders us from upgrading to newer versions of Gerrit.

Best regards
Gustaf

Bassem Rabil

unread,
Feb 20, 2015, 8:42:33 AM2/20/15
to repo-d...@googlegroups.com, sven.s...@sonymobile.com, Gustaf...@sonymobile.com, David.Pu...@sonymobile.com
In another thread regarding garbage collection and bitmap generation [1], Matthias suggested a fix in JGit [2] that helps to always fetch the newest pack file without relying on the modification time. Also we came across another parameter for repack which looks useful in case of concurrent fetches of the git repository [3], I am quoting below the documentation for this switch, i.e.
"
--pack-kept-objects
Include objects in .keep files when repacking. Note that we still do not delete .keep packs after pack-objects finishes. This means that we may duplicate objects, but this makes the option safe to use when there are concurrent pushes or fetches. This option is generally only useful if you are writing bitmaps with -b or pack.writebitmaps, as it ensures that the bitmapped packfile has the necessary objects.
"
Or instead you can just add the parameter to your git repository " repack.packKeptObjects = true ". We are currently considering relying on standalone JGit to run the garbage collection and generate bitmaps instead of native git, for the busy repositories this results in much better clone/fetch performance. From this thread [1], the resultant pack files and bitmaps using JGit are different than those generated using native Git.



Regards
Bassem

Lundh, Gustaf

unread,
Feb 20, 2015, 9:26:22 AM2/20/15
to Bassem Rabil, repo-d...@googlegroups.com, Selberg, Sven, Pursehouse, David (Sony Mobile)

Thanks for your post and suggestions, Bassem.

 

While core.trustFolderStat may help cases where new .pack-files are not picked up at all, I’m not sure how it would relate to our case, where a correct .pack and .idx-file is marked corrupt/invalid due to mismatched checksums (when the checksums are indeed correct on disk).  This issue also only happens on newer versions of JGit.

 

“This option is generally only useful if you are writing bitmaps with -b or pack.writebitmaps, as it ensures that the bitmapped packfile has the necessary objects.”

 

On this particular server we are currently avoiding using bitmaps for unrelated reasons (will change in the future though)

 

“We are currently considering relying on standalone JGit to run the garbage collection and generate bitmaps instead of native git, for the busy repositories this results in much better clone/fetch performance. From this thread [1], the resultant pack files and bitmaps using JGit are different than those generated using native Git.”

 

The invalid .pack- and .idx files are very much readable by JGit after a restart. I’m not worried about the .pack format in this case. The JGit vs Git bitmap/gc performance is also very interesting discussion, but I feel this should probably be a handled in a separate thread J

 

Best regards

Gustaf

Martin Fick

unread,
Feb 20, 2015, 9:58:28 AM2/20/15
to Lundh, Gustaf, Selberg, Sven, repo-d...@googlegroups.com, Bassem Rabil, Pursehouse, David (Sony Mobile)

At one point the pack files were named after the sha of the contents of the objects in the packfiles, and not after the packed contents of the file.  This meant that 2 packfiles with the same objects, but packed differently, perhaps because of different repack settings, can end up with the same name.  There was talk of changing this naming convention to use the sha of the contents to prevent this, but I am not sure if that has happened yet, and if your GC is doing that?

When a repack happens on a repo with no new objects and this naming convention, the repack script will end up recreating new pack and index files with the same names as the current ones.  It will then replace the old ones with the new ones. It is possible for the old index file to be read, the update to happen, and then the new packfile to be read with an offset from the old index.  This mismatch may confuse jgit into thinking there is corruption.

I am not sure this ever happens, but the possibility of it happening led me to disable this packfile replacing feature in our repacking script a long time ago.  I wonder if you may be seeing this?

-Mattin

Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project

--

Lundh, Gustaf

unread,
Feb 24, 2015, 3:36:16 AM2/24/15
to Martin Fick, Selberg, Sven, repo-d...@googlegroups.com, Pursehouse, David (Sony Mobile), Matthias Sohn

Thank you for the suggestion, Martin.

 

I think you may be right regarding filename collisions. I’m not sure why only newer versions of jGit fails to handle this, but now we have something to investigate closer.

 

I wrote a script that would try the following on all our gits:

 

1.       Check if the git has exactly one .pack-file and one .idx-file.

2.       Save the filenames and sha1sum of the two files.

3.       Run “git gc”

4.       If the filenames was kept but the sha1 of the files changed, I would output some information about the git and the .pack-files.

 

It took no more than 40 gits before we had our first collision. Look at this:

 

/path/git/project-a.git/objects/pack

------------------SHA1 MISMATCH!------------------

Before git gc (sha1sum *):

 

0a0fe026d2593fcbebfc270ea20f2eeb483d8efe  pack-3af462a5a725aaa39159637800dce93dea0a4846.idx

4ecd410fdb2b1aecafd396910777af7a449b531d  pack-3af462a5a725aaa39159637800dce93dea0a4846.pack

 

After git gc (sha1sum *):

609b6d21681efe8acca23bf85d51eb32fb04fd96  pack-3af462a5a725aaa39159637800dce93dea0a4846.idx

50415da72bdeb14c88f19e58dbc172ab3b82a7eb  pack-3af462a5a725aaa39159637800dce93dea0a4846.pack

--------------------------------------------------

 

Suggestion how to patch jgit? Retry-mechanism on faulty checksum?

 

After a 10 minutes I have found 4-5 gits with the same type of naming collision.

 

We are running an older version of git on this server (1.8.2.2). I’ll try other versions, and see if it helps the situation.

Thanks again!

 

Best regards

Gustaf

Lundh, Gustaf

unread,
Feb 24, 2015, 6:26:38 AM2/24/15
to Lundh, Gustaf, Martin Fick, Selberg, Sven, repo-d...@googlegroups.com, Pursehouse, David (Sony Mobile), Matthias Sohn

Update! It seems we can get around the collision issue by just upgrading git to a later version.

 

This is the commit that fixes our naming collisions:

 

Author: Jeff King <peff@.net>  2013-12-05 21:28:07

Committer: Junio C Hamano <gitster@.com>  2013-12-06 00:40:11

Parent: ab1900a36ef7fcfe872ff9d0803f9442e356c2f0 (Appease Sun Studio by renaming "tmpfile")

Child:  f06a5e607dde266884db4a99b70fbee09d5c5efc (Merge branch 'jk/sha1write-void')

Branches: remotes/origin/maint, remotes/origin/master, remotes/origin/next, remotes/origin/pu

Follows: v1.8.4.1

Precedes: v1.9-rc0

 

    pack-objects: name pack files after trailer hash

   

    Our current scheme for naming packfiles is to calculate the

   sha1 hash of the sorted list of objects contained in the

    packfile. This gives us a unique name, so we are reasonably

    sure that two packs with the same name will contain the same

    objects.

   

    It does not, however, tell us that two such packs have the

    exact same bytes. This makes things awkward if we repack the

    same set of objects. Due to run-to-run variations, the bytes

    may not be identical (e.g., changed zlib or git versions,

    different source object reuse due to new packs in the

    repository, or even different deltas due to races during a

    multi-threaded delta search).

   

    In theory, this could be helpful to a program that cares

    that the packfile contains a certain set of objects, but

    does not care about the particular representation. In

    practice, no part of git makes use of that, and in many

    cases it is potentially harmful. For example, if a dumb http

    client fetches the .idx file, it must be sure to get the

    exact .pack that matches it. Similarly, a partial transfer

    of a .pack file cannot be safely resumed, as the actual

    bytes may have changed.  This could also affect a local

    client which opened the .idx and .pack files, closes the

    .pack file (due to memory or file descriptor limits), and

    then re-opens a changed packfile.

   

    In all of these cases, git can detect the problem, as we

    have the sha1 of the bytes themselves in the pack trailer

    (which we verify on transfer), and the .idx file references

    the trailer from the matching packfile. But it would be

    simpler and more efficient to actually get the correct

    bytes, rather than noticing the problem and having to

    restart the operation.

   

    This patch simply uses the pack trailer sha1 as the pack

    name. It should be similarly unique, but covers the exact

    representation of the objects. Other parts of git should not

    care, as the pack name is returned by pack-objects and is

    essentially opaque.

   

    One test needs to be updated, because it actually corrupts a

    pack and expects that re-packing the corrupted bytes will

    use the same name. It won't anymore, but we can easily just

    use the name that pack-objects hands back.

   

    Signed-off-by: Jeff King <peff@.net>

    Signed-off-by: Junio C Hamano gitster@.com

 

Upgrading Git on the server essentially seems to remove all naming collisions when running the same test script described in the previous post. We will patch our Gerrit with jgit once again and verify that this indeed makes the issue with “corrupt” -.pack-files disappear.

 

Big thanks for all the help from the Gerrit & JGit community.

 

Best regards

Gustaf

Chunlin Zhang

unread,
Jun 24, 2015, 10:15:26 PM6/24/15
to repo-d...@googlegroups.com, matthi...@gmail.com, David.Pu...@sonymobile.com, Sven.S...@sonymobile.com, Gustaf...@sonymobile.com, mf...@codeaurora.org
I ran into this issue yesterday( gerrit 2.11.1 ), and use "git gc --aggressive" mentioned above to work around, it seems OK now.
Relevant error log:
'''
[2015-06-25 00:06:02,469] WARN  com.google.gerrit.server.project.ProjectCacheImpl : Cannot read project device/xxx/common_row
java.util.concurrent.ExecutionException: org.eclipse.jgit.errors.MissingObjectException: Missing unknown ca00b97790abc3bb9d87e6ce080f140882ed7045

'''
And I found this revision is /refs/meta/config

在 2015年2月24日星期二 UTC+8下午7:26:38,Gustaf Lundh写道:

    Signed-off-by: Junio C Hamano git...@.com

Ahmed Hosni

unread,
Mar 9, 2016, 3:48:35 AM3/9/16
to Repo and Gerrit Discussion, David.Pu...@sonymobile.com, Sven.S...@sonymobile.com, ziv...@gmail.com
We are frequently facing the same issue in STMicroelectronics production server (running Gerrit 2.8.5 / JGit 3.2.0). 
The workaround using C git garbage collector  (git version 1.9) works well.

-Ahmed
Reply all
Reply to author
Forward
0 new messages