"Cannot read project" after 2.10.4 upgrade

909 views
Skip to first unread message

Sven Selberg

unread,
May 20, 2015, 4:32:21 AM5/20/15
to repo-d...@googlegroups.com
Hi,

Yesterday at 23.00 we upgraded from 2.8.6.1 to 2.10.4.

After a couple of hours the error_log started to fill up with "Cannot read project".

I think I have found a pattern where [1]:

1. PatchListLoader times out for "Diff loader"
~200 ms after
2. Project pack file is deemed corrupt and removed
Almost instantaneously 
3. PatchListLoader : Error computing PatchListKey
Since jgit finds no valid pack file (my assumption) it can not get refs/meta/config and can't compute access so it ends up with
4. "Cannot read project" 
Until gc.

If you have any idea about what could be the root cause, all help would be much appretiated.

/Sven

[1] Excerpts from error_log

[2015-05-20 02:27:47,769] WARN  com.google.gerrit.server.patch.PatchListLoader : 5000 ms timeout reached for Diff loader in project some/random/project on commit f442da55817f08735a585b4311a6a6cdf08db262 on path some/random/path/binary-file.so comparing 40ffe83686c07cbd7b4c6fd573ad5760057f4093..c8ddcb18f8d5657f97fc5108ba2832e1f08b23fe
[2015-05-20 02:27:47,976] WARN  org.eclipse.jgit.internal.storage.file.ObjectDirectory : Pack file /gerrit/site/git/some/random/project.git/objects/pack/pack-1b91c978e60a69d59d4db0b23b888bf7cb3b9c7b.pack is corrupt, removing it from pack list
[2015-05-20 02:27:47,976] WARN  org.eclipse.jgit.internal.storage.file.ObjectDirectory : Pack file /gerrit/site/git/some/random/project.git/objects/pack/pack-1b91c978e60a69d59d4db0b23b888bf7cb3b9c7b.pack is corrupt, removing it from pack list
[2015-05-20 02:27:47,976] WARN  com.google.gerrit.server.patch.PatchListLoader : Error computing PatchListKey[some/random/project BASE..f442da55817f08735a585b4311a6a6cdf08db262 IGNORE_NONE]
[2015-05-20 02:27:52,967] WARN  com.google.gerrit.server.patch.PatchListLoader : Error computing PatchListKey[some/random/project BASE..f442da55817f08735a585b4311a6a6cdf08db262 IGNORE_NONE]
[2015-05-20 02:29:08,128] WARN  com.google.gerrit.server.project.ProjectCacheImpl : Cannot read project some/random/project

David Pursehouse

unread,
May 20, 2015, 4:40:17 AM5/20/15
to Sven Selberg, repo-d...@googlegroups.com
This stack trace seems to be similar to the one given by Will Saxon in
[1]. In his case increasing the diff timeout fixed it; let's see if
that works for us too...


[1] https://groups.google.com/forum/#!topic/repo-discuss/CYYoHfDxCfA
> --
> --
> To unsubscribe, email repo-discuss...@googlegroups.com
> More info at http://groups.google.com/group/repo-discuss?hl=en
>
> ---
> You received this message because you are subscribed to the Google
> Groups "Repo and Gerrit Discussion" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to repo-discuss...@googlegroups.com
> <mailto:repo-discuss...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.

Saša Živkov

unread,
May 20, 2015, 6:19:00 AM5/20/15
to Sven Selberg, repo-d...@googlegroups.com
On Wed, May 20, 2015 at 10:32 AM, Sven Selberg <sven.s...@sonymobile.com> wrote:
Hi,

Yesterday at 23.00 we upgraded from 2.8.6.1 to 2.10.4.

After a couple of hours the error_log started to fill up with "Cannot read project".

I think I have found a pattern where [1]:

1. PatchListLoader times out for "Diff loader"
~200 ms after
2. Project pack file is deemed corrupt and removed
Almost instantaneously 
3. PatchListLoader : Error computing PatchListKey
Since jgit finds no valid pack file (my assumption) it can not get refs/meta/config and can't compute access so it ends up with
4. "Cannot read project" 
Until gc.

If you have any idea about what could be the root cause, all help would be much appretiated.

Sven, I believe that you found the root cause!

Haven't yet looked at the source code but I am quite sure that what happens is the following:
* the diff computation is done in its own thread T1
* the "main" thread waits on the T1 with some predefined timeout
* when the timeout is reached the thread T1 will be canceled
* if, at the time when T1 was canceled, it was performing an IO on a pack file, an exception will be thrown by the JVM 
* the default exception handling at this place in JGit will declare the pack file as corrupt and remove it from its list

The problem, likely, has to be solved in the last two steps i.e. we have to differentiate an
exception thrown due to a real pack corruption and an exception thrown due to the thread being interrupted.



/Sven

[1] Excerpts from error_log

[2015-05-20 02:27:47,769] WARN  com.google.gerrit.server.patch.PatchListLoader : 5000 ms timeout reached for Diff loader in project some/random/project on commit f442da55817f08735a585b4311a6a6cdf08db262 on path some/random/path/binary-file.so comparing 40ffe83686c07cbd7b4c6fd573ad5760057f4093..c8ddcb18f8d5657f97fc5108ba2832e1f08b23fe
[2015-05-20 02:27:47,976] WARN  org.eclipse.jgit.internal.storage.file.ObjectDirectory : Pack file /gerrit/site/git/some/random/project.git/objects/pack/pack-1b91c978e60a69d59d4db0b23b888bf7cb3b9c7b.pack is corrupt, removing it from pack list
[2015-05-20 02:27:47,976] WARN  org.eclipse.jgit.internal.storage.file.ObjectDirectory : Pack file /gerrit/site/git/some/random/project.git/objects/pack/pack-1b91c978e60a69d59d4db0b23b888bf7cb3b9c7b.pack is corrupt, removing it from pack list
[2015-05-20 02:27:47,976] WARN  com.google.gerrit.server.patch.PatchListLoader : Error computing PatchListKey[some/random/project BASE..f442da55817f08735a585b4311a6a6cdf08db262 IGNORE_NONE]
[2015-05-20 02:27:52,967] WARN  com.google.gerrit.server.patch.PatchListLoader : Error computing PatchListKey[some/random/project BASE..f442da55817f08735a585b4311a6a6cdf08db262 IGNORE_NONE]
[2015-05-20 02:29:08,128] WARN  com.google.gerrit.server.project.ProjectCacheImpl : Cannot read project some/random/project

--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.

Sven Selberg

unread,
May 20, 2015, 7:50:53 AM5/20/15
to repo-d...@googlegroups.com, sven.s...@sonymobile.com


Den onsdag 20 maj 2015 kl. 12:19:00 UTC+2 skrev zivkov:


On Wed, May 20, 2015 at 10:32 AM, Sven Selberg <sven.s...@sonymobile.com> wrote:
Hi,

Yesterday at 23.00 we upgraded from 2.8.6.1 to 2.10.4.

After a couple of hours the error_log started to fill up with "Cannot read project".

I think I have found a pattern where [1]:

1. PatchListLoader times out for "Diff loader"
~200 ms after
2. Project pack file is deemed corrupt and removed
Almost instantaneously 
3. PatchListLoader : Error computing PatchListKey
Since jgit finds no valid pack file (my assumption) it can not get refs/meta/config and can't compute access so it ends up with
4. "Cannot read project" 
Until gc.

If you have any idea about what could be the root cause, all help would be much appretiated.

Sven, I believe that you found the root cause!

Haven't yet looked at the source code but I am quite sure that what happens is the following:
* the diff computation is done in its own thread T1
* the "main" thread waits on the T1 with some predefined timeout
* when the timeout is reached the thread T1 will be canceled
* if, at the time when T1 was canceled, it was performing an IO on a pack file, an exception will be thrown by the JVM 
* the default exception handling at this place in JGit will declare the pack file as corrupt and remove it from its list

The problem, likely, has to be solved in the last two steps i.e. we have to differentiate an
exception thrown due to a real pack corruption and an exception thrown due to the thread being interrupted.

 
Great!

That seems very plausible, I was thinking along the same lines although not nearly as specific :-).
Raising the timeout (as suggested in [1]) made the issue go away which could suggest that you are very right indeed.

Please add me if a fix is pushed.

/Sven

Bassem Rabil

unread,
May 20, 2015, 7:52:30 AM5/20/15
to repo-d...@googlegroups.com
We have experienced this behavior before when the pack file holding refs/meta/config is affected and we applied the following workaround to get the projects visible again without having to restart the instance, i.e.
1. Set the following parameter in the git repository config file:
[core]
                trustfolderstat = false
2. Run git repack for the repository using a command that looks like:
$ git repack -a -d --window-memory 50m --max-pack-size 100m

Setting this parameter in the git repository indicates that the pack files are always scanned and JGit does not rely on the file system modification time, and running the repack refreshes the pack files which helps to get the project back without restarting the instance.

Regards
Bassem

Sven Selberg

unread,
May 20, 2015, 9:35:25 AM5/20/15
to repo-d...@googlegroups.com

Thanks!

I don't remember the code exactly but AFAIR core.trustfolderstat = false makes jgit go into an endless loop if packfile really is corrupt or missing.
We decided that's not an option for us.

I believe the root problem is really how jgit handles exceptions and and perhaps that jgit callers must take care to end things gracefully as not to upset JGit.

/Sven

Saša Živkov

unread,
May 20, 2015, 11:08:40 AM5/20/15
to Sven Selberg, repo-d...@googlegroups.com

The fix is based on the above described assumption. I couldn't really reproduce the
issue from my debugging environment. It would be great if you could verify it in your
environment where the error was reproducible.

Doug Kelly

unread,
May 20, 2015, 11:42:29 PM5/20/15
to repo-d...@googlegroups.com, sven.s...@sonymobile.com
I can also try to test this: I don't necessarily have a means for directly reproducing it, but I *do* have a data set that I know encountered this issue at the same point on running a reindex.  I'll try a custom Gerrit build with this added to JGit and try walking through my upgrade path to see if it occurs again.

Sven Selberg

unread,
May 25, 2015, 3:53:02 AM5/25/15
to repo-d...@googlegroups.com, sven.s...@sonymobile.com
Thanks for the fix!

I'm in the middle of upgrade aftermath, can't test it before friday.

/Sven

Khai Do

unread,
May 28, 2015, 2:45:50 PM5/28/15
to repo-d...@googlegroups.com, sven.s...@sonymobile.com
We had this same problem after upgrading from 2.8 to 2.10.  I took a shot at verifying this fix and from my testing it seems like change 48288 did not fix the problem.  To test I built Gerrit from our 2.10 branch[1] replacing jgit with version 4.0.0.201505260635-rc2:

~$ unzip -l gerrit.war|grep jgit
     5643  2015-05-27 23:09   WEB-INF/lib/gerrit-patch-jgit-server.jar
  2042908  2015-05-27 23:18   WEB-INF/lib/org.eclipse.jgit-4.0.0.201505260635-rc2.jar
    16980  2015-05-27 23:18   WEB-INF/lib/org.eclipse.jgit.archive-4.0.0.201505260635-rc2.jar
    85364  2015-05-27 23:18   WEB-INF/lib/org.eclipse.jgit.http.server-4.0.0.201505260635-rc2.jar

Here's how I tested it:
1. deployed my custom gerrit build to server
2. pushed changes [2] with a few patch sets contains large diffs to 'gtest-org/test2' project.
3. set the following in gerrit.config:
  [cache "diff"]
       timeout = 10ms   
4. restart Gerrit
5. using the gerrit UI I attempted to view some changes containing small (one line) diffs in 'gtest-org/test2' project.  No problem here.
6. then I attempt to view the change uploaded in step 2.  Due to the diff size this caused a diff timeout which is understandable however the log [3] indicates 'Error computing PatchListKey'
7. now I attempt step 5 again (line 355 in log).  Changes in project 'gtest-org/test2' are no longer accessible.


Hope this helps, -Khai

Khai Do

unread,
May 28, 2015, 2:51:39 PM5/28/15
to repo-d...@googlegroups.com, sven.s...@sonymobile.com
Just to clarify, I cherry-picked change https://gerrit-review.googlesource.com/#/c/68149 onto our 2.10 fork to create my test build.

Khai Do

unread,
Jun 15, 2015, 12:38:03 PM6/15/15
to repo-d...@googlegroups.com
Hello Gerriters.  I was wondering if anybody else has tried to test the attempted fix[1] for this specific issue?  From my test results, I do not believe that the attempted fix actually worked.  I was wondering whether anybody else is continuing to looking into this?   I went ahead and entered a bug [2] for it.   

Doug Kelly

unread,
Jun 16, 2015, 2:34:04 PM6/16/15
to repo-d...@googlegroups.com
Hello Khai (and everyone else): I've tested the Gerrit 2.10.5 official release against my reproducible test case, and I did find that this issue was no longer reproducible.  Thanks!

-Doug

Zaro

unread,
Jun 16, 2015, 4:31:43 PM6/16/15
to Doug Kelly, repo-d...@googlegroups.com
Well, that's cool Doug.  Would you be able to update the bug with your test scenario and result?  Thanks. -Khai

--

Leandro Fonseca

unread,
Nov 3, 2016, 3:52:30 PM11/3/16
to Repo and Gerrit Discussion, doug...@gmail.com
Hi,

This exact same problem is happening to me on version 2.11.3:

After the diff times-out, Gerrit starts throwing "WARN  org.eclipse.jgit.internal.storage.file.ObjectDirectory" warning, and finally the project gets invisible with "WARN  com.google.gerrit.server.project.ProjectCacheImpl : Cannot read project".

The changes that triggers these errors are full of binaries (I know this should not be pushed as a change). I understand the specific change should get impacted, but the whole project is invisible, shouldn't that be fixed on version 2.10?

Thanks,
Leandro.

Khai Do

unread,
Nov 18, 2016, 1:45:51 AM11/18/16
to Repo and Gerrit Discussion, doug...@gmail.com
Yes, the fix for this was applied to 2.10 [1] but it was separately applied to 2.11 [2].  The git log[3] seems to show that it would have gotten released in the 2.10.7 and 2.11.4 releases. 

Reply all
Reply to author
Forward
0 new messages