Cannot see change when using New Screen, but can with Old Screen

159 views
Skip to first unread message

Joshua J. Kugler

unread,
May 30, 2014, 2:20:00 PM5/30/14
to repo-d...@googlegroups.com
So, I have a really odd one here.

Running 2.8. Also replicated the bug in 2.8.5.

We have a review that can only be viewed if users configure their account to
use the Old Screen. If they use the New Screen, they are given a message "Not
Found" "The page you requested was not found, or you do not have permission to
view this page."

There are no errors in the error_log. All other reviews are fine. This review
*was* viewable via both Screens yesterday. The only change to the review was
that it was merged.

The URL is http://review.whamcloud.com/#/c/10028/ If you have an OpenID
account, you can log in and view it; it is a public project.


Anyone have any ideas?

j

--
Joshua J. Kugler - Fairbanks, Alaska
Azariah Enterprises - Programming and Website Design
jos...@azariah.com - Jabber: peda...@gmail.com
PGP Key: http://pgp.mit.edu/ ID 0x73B13B6A

Joshua J. Kugler

unread,
May 30, 2014, 5:41:21 PM5/30/14
to repo-d...@googlegroups.com
So, more information. When using the new screen, it makes these requests,
which all 404 (it makes others, but it's 404'ing on these).

GET /changes/10028/revisions/8862fd2a3a34ad9bd97e1424252d37a740cbafd8/comments
GET /changes/10028/revisions/8862fd2a3a34ad9bd97e1424252d37a740cbafd8/drafts
GET /changes/10028/revisions/8862fd2a3a34ad9bd97e1424252d37a740cbafd8/files
GET /changes/10028/revisions/8862fd2a3a34ad9bd97e1424252d37a740cbafd8/files
GET /changes/10028/revisions/8862fd2a3a34ad9bd97e1424252d37a740cbafd8/commit

When using the old screen, the following requests are made, all are OK:

GET /
GET /gerrit_ui/gwt/chrome/1BEB1F98F392A12232A754A0784B49A8.cache.css
GET /static/intel_30x45.png
GET /accounts/self/capabilities
?q=createProject&q=createGroup&q=administrateServer
GET /config/server/top-menus
POST /gerrit_ui/rpc/ChangeDetailService
GET /projects/fs/lustre-release/config
GET /changes/10028/detail

So, I don't know why it's 404'ing. A

git show 8862fd2a3a34ad9bd97e1424252d37a740cbafd8

on the repo shows that merged commit.

I'd really appreciate ideas on how to debug this.

j

Doug Kelly

unread,
May 31, 2014, 9:19:33 AM5/31/14
to repo-d...@googlegroups.com
Sounds like you have an AllowEncodedSlashes problem (check the Apache/Nginx proxy installation documentation):

Good luck!

--Doug

Joshua J. Kugler

unread,
Jun 2, 2014, 4:14:00 PM6/2/14
to repo-d...@googlegroups.com, Doug Kelly
Thanks for the pointer. This is a working installation, with all the config
needed for AllowEncodedSlashes and nocanon. I can view all other changes. It
is *only* this particular change.

I can view: http://review.whamcloud.com/#/c/10027/
I cannot view: http://review.whamcloud.com/#/c/10028/
I can view: http://review.whamcloud.com/#/c/10029/

It is this one particular review that is failing and causing these errors in
the log.

j

Joshua J. Kugler

unread,
Jun 2, 2014, 8:21:38 PM6/2/14
to repo-d...@googlegroups.com
More investigation. Not sure this helps, but any data helps, eh.

So, the offending review had a hash of
8862fd2a3a34ad9bd97e1424252d37a740cbafd8

If I search for that hash in the search box, the I am redirected to
http://review.whamcloud.com/#/c/10028/ as I should be, but still am getting
that "not found" error. Note, this is a **Gerrit** not found error, not a web
server 404 error.

Another data point: this review was visible with both the old and new views
before it was merged. Now that it's merged, it's only visible with the old
view.

Is there some kind of "fsck" I can run against the database that will make
sure all information in the repository is in the database (see the GET calls
below that are 404'ing).

Any help would be appreciated. I'm getting a lot of pressure to fix this, and
I have no idea what is wrong.

j

Joshua J. Kugler

unread,
Jun 2, 2014, 8:40:19 PM6/2/14
to repo-d...@googlegroups.com
More investigation.

This particular change has has nine patch sets listed. Patch sets eight and
nine have the exact same:

- SHA1 hash
- Commiter time

To my knowledge, that isn't supposed to happen, correct? Is this pointing to
some kind of DB corruption or record insertion glitch? Goes back to my
question about doing a DB 'fsck.'

If I try the supplied command to check out the change:

git fetch ssh://review/fs/lustre-release refs/changes/28/10028/9 && git
checkout FETCH_HEAD

Git tells me:

fatal: Couldn't find remote ref refs/changes/28/10028/9

Which seems to indicate there aren't really 9 patchsets.

How can I go about fixing this?

j

Doug Kelly

unread,
Jun 3, 2014, 12:09:19 AM6/3/14
to repo-d...@googlegroups.com
Ok, that is certainly a bit more info (and sorry, I missed the part where you said it used to work on both screens)... Maybe look at the changes table and patch_sets and see if there's some inconsistency for the change ID.

--Doug

Joshua J. Kugler

unread,
Jun 3, 2014, 3:09:26 PM6/3/14
to repo-d...@googlegroups.com, Doug Kelly
So, digging in more I found this (sorry about the wrap):

mysql> select revision,created_on,change_id as cid,patch_set_id as p from
patch_sets where change_id = '10028';
+------------------------------------------+---------------------+-------+---+
| revision | created_on | cid | p |
+------------------------------------------+---------------------+-------+---+
| a6804e261a5e31ceaf11f3acafe3d678fc59cb7d | 2014-04-19 06:05:56 | 10028 | 1 |
| 612e1c0ebe6c4bfd350ea224450017ab47d1c83e | 2014-04-20 00:51:04 | 10028 | 2 |
| ec70364ec3fdf6f5c4994745bdd6d6b9d547b72c | 2014-04-23 04:52:49 | 10028 | 3 |
| 42324f804cc97fe707f8e77c1aabc9026b747059 | 2014-05-01 22:30:07 | 10028 | 4 |
| b66c262fe1862bd4632fba3be1bf3563c060843e | 2014-05-02 17:58:08 | 10028 | 5 |
| eaa08b0c47c780bed0bc02bb20c12ee0882e53c2 | 2014-05-08 17:29:53 | 10028 | 6 |
| 54bd297c12585dc8bc30a100c4bd293165265004 | 2014-05-26 17:49:23 | 10028 | 7 |
| 8862fd2a3a34ad9bd97e1424252d37a740cbafd8 | 2014-05-30 01:47:42 | 10028 | 8 |
| 8862fd2a3a34ad9bd97e1424252d37a740cbafd8 | 2014-05-30 01:47:42 | 10028 | 9 |
+------------------------------------------+---------------------+-------+---+
9 rows in set (0.00 sec)

Notice that patch_set_id's 8 and 9 have the exact same revision and created_on
time. That is certainly not something that should happen when a change is
merged.

Once I deleted that second row (patch_set_id 9) the page would load properly.

I also discovered that gerrit thought the current patch set id was 8, even
though it had created another record in the DB:

mysql> select change_id,current_patch_set_id from changes where change_id =
10028;
+-----------+----------------------+
| change_id | current_patch_set_id |
+-----------+----------------------+
| 10028 | 8 |
+-----------+----------------------+
1 row in set (0.00 sec)

So, certainly hit a bug somewhere, and have resolved it. Any ideas as to the
cause?

j

Bassem Rabil

unread,
Jun 6, 2014, 8:45:27 AM6/6/14
to repo-d...@googlegroups.com, doug...@gmail.com
Hi Joshua

We got a similar behavior of having duplicate entries in Gerrit DB in patchset_approvals table, this turned out to be a limitation in Gerrit to handle concurrent changes in DB. On the other side, there was an issue with Jenkins (multiple listeners for events) that results in concurrent commands from Jenkins towards Gerrit resulting in duplicate entries in DB. 

I hope this helps to find out the root cause for your issue.

Thanks and Regards
Bassem Guendy

Joshua J. Kugler

unread,
Jun 6, 2014, 1:34:40 PM6/6/14
to repo-d...@googlegroups.com, Bassem Rabil, doug...@gmail.com
On Friday, June 06, 2014 05:45:27 Bassem Rabil wrote:
> Hi Joshua
>
> We got a similar behavior of having duplicate entries in Gerrit DB in
> patchset_approvals table, this turned out to be a limitation in Gerrit to
> handle concurrent changes in DB. On the other side, there was an issue with
> Jenkins (multiple listeners for events) that results in concurrent commands
> from Jenkins towards Gerrit resulting in duplicate entries in DB.
>
> I hope this helps to find out the root cause for your issue.

Bassem -

Can you expound on the Jenkins issue? We have Gerrit Trigger listening for
events, then once the builds are done, it reports the build results (+1 or
-1). Our DB corruption was in patch_sets, not patchset_approvals. I guess I
could see how hitting Jenkins too many with review votes, but Jenkins results
shouldn't be modifying the patch_sets table, correct? As for duplicate changes
to patch_sets, it would have required two different people to click on
"submit" on said change at the very same time, no?

Bassem Rabil

unread,
Jun 6, 2014, 2:04:28 PM6/6/14
to Joshua J. Kugler, repo-d...@googlegroups.com, doug...@gmail.com
You are right the case is different from patch_set_approvals which is affected directly by Jenkins and patch_sets table, but could be resulting from the same limitation of concurrent DB handling in Gerrit. 


--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to a topic in the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/repo-discuss/kKrhagOOqSk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to repo-discuss...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

David Pursehouse

unread,
Jun 9, 2014, 4:27:20 AM6/9/14
to Joshua J. Kugler, repo-d...@googlegroups.com
On 06/03/2014 09:40 AM, Joshua J. Kugler wrote:
> More investigation.
>
> This particular change has has nine patch sets listed. Patch sets eight and
> nine have the exact same:
>
> - SHA1 hash
> - Commiter time
>
> To my knowledge, that isn't supposed to happen, correct? Is this pointing to
> some kind of DB corruption or record insertion glitch? Goes back to my
> question about doing a DB 'fsck.'
>

Seems to be the same issue as described in issue 2702 [1]


[1] https://code.google.com/p/gerrit/issues/detail?id=2702

David Ostrovsky

unread,
Jun 10, 2014, 3:10:59 AM6/10/14
to repo-d...@googlegroups.com, jos...@azariah.com

Am Montag, 9. Juni 2014 10:27:20 UTC+2 schrieb David Pursehouse:
On 06/03/2014 09:40 AM, Joshua J. Kugler wrote:
> More investigation.
>
> This particular change has has nine patch sets listed.  Patch sets eight and
> nine have the exact same:
>
>   - SHA1 hash
>   - Commiter time
>
> To my knowledge, that isn't supposed to happen, correct?  Is this pointing to
> some kind of DB corruption or record insertion glitch?  Goes back to my
> question about doing a DB 'fsck.'
>

Seems to be the same issue as described in issue 2702 [1]


[1] https://code.google.com/p/gerrit/issues/detail?id=2702


We are facing the same problem. I updated the issue [1] accordingly.
Can we reproduce it? All i know is that old change screen was used.

David Ostrovsky

unread,
Jun 11, 2014, 5:35:38 AM6/11/14
to repo-d...@googlegroups.com, jos...@azariah.com
I cannot reproduce it and was trying to blind fix it in this change [1] and was trying
to define unique index in this change [2] to prevent database corruption in the first
place.

The reason why new change screen is failing to load corrupted change with
identical revision rows in patch_sets table is probably this condition in Revisions.java [3]:

if (match.size() != 1) {
  throw new ResourceNotFoundException(id);
}

Unless we've found and fixed the reason for database corruption in merge queue code,
i think we should handle this case differently: issue a warning that the change is corrupt,
but return RevisionResource instance also for corrupted changes. Another question is if
new change screen can handle two identical revisions as is or if it is also has to be
adjusted to handle corrupt changes correctly.

Another approach to rectify the rendering of corrupted changes in new change screen
would be to remedy these changes by dropping second identical revision in ChangeJson.

This is more than annoying for the users that a bug in merge queue code prevents the
corrupted changes to be loaded in new change screen (and can be successful loaded
in old change screen). It doesn't really help to increase the acceptance of new change
screen.


Doug Kelly

unread,
Jun 11, 2014, 9:05:49 AM6/11/14
to repo-d...@googlegroups.com, jos...@azariah.com
Just a differing opinion from a user experience perspective: there's no sense in warning the
user about something they can't fix, however it does make sense to silently log to the
error_log (in hopes that an admin *would* see it).
 
Another approach to rectify the rendering of corrupted changes in new change screen
would be to remedy these changes by dropping second identical revision in ChangeJson.

This is more than annoying for the users that a bug in merge queue code prevents the
corrupted changes to be loaded in new change screen (and can be successful loaded
in old change screen). It doesn't really help to increase the acceptance of new change
screen.

I agree.  Whatever the root cause, it should be fixed.  Actually, as a blind question, can
we add a database constraint to prevent getting in this state (and probably help find the
root cause)?  Seems like a unique constraint on "revision" would be sufficient, since Gerrit
*should* guarantee that nobody ever uploads the same exact commit twice.

David Ostrovsky

unread,
Jun 11, 2014, 9:23:42 AM6/11/14
to repo-d...@googlegroups.com, jos...@azariah.com
That was i meant by issue warning:

  log.warn("Change {} is corrupt, ans must be fixed", change);
 
 
Another approach to rectify the rendering of corrupted changes in new change screen
would be to remedy these changes by dropping second identical revision in ChangeJson.

This is more than annoying for the users that a bug in merge queue code prevents the
corrupted changes to be loaded in new change screen (and can be successful loaded
in old change screen). It doesn't really help to increase the acceptance of new change
screen.

I agree.  Whatever the root cause, it should be fixed.  Actually, as a blind question, can
we add a database constraint to prevent getting in this state (and probably help find the
root cause)?  Seems like a unique constraint on "revision" would be sufficient, since Gerrit
*should* guarantee that nobody ever uploads the same exact commit twice.

Have you missed my remark above, where i mentioned that i've uploaded that
unique index in this change [2]?

"
I cannot reproduce it and was trying to blind fix it in this change [1] and was trying
to define unique index in this change [2] to prevent database corruption in the first
place.
"

Doug Kelly

unread,
Jun 11, 2014, 10:42:30 AM6/11/14
to repo-d...@googlegroups.com, jos...@azariah.com


On Wednesday, June 11, 2014 8:23:42 AM UTC-5, David Ostrovsky wrote:
Have you missed my remark above, where i mentioned that i've uploaded that
unique index in this change [2]?

"
I cannot reproduce it and was trying to blind fix it in this change [1] and was trying
to define unique index in this change [2] to prevent database corruption in the first
place.
"


 Ah yes.  I will admit to being lazy and not taking a look at the reviews yet. :) In that case, I fully support your change.

Joshua J. Kugler

unread,
Jun 11, 2014, 11:56:24 AM6/11/14
to David Ostrovsky, repo-d...@googlegroups.com
On Wednesday, June 11, 2014 06:23:42 David Ostrovsky wrote:
> "
> I cannot reproduce it and was trying to blind fix it in this change [1] and
> was trying
> to define unique index in this change [2] to prevent database corruption in
> the first
> place.
> "
>
> [2] https://gerrit-review.googlesource.com/57784

What will be the effect of this when the error is hit? An error in the UI?
Will the change set not merge? Will the user see it? Will the merge be
retried, and the error logged? If it prevents merging, then we're not much
better off, unless the bug is transient, and the merging could be retried by
the user.

David Ostrovsky

unread,
Jun 11, 2014, 4:53:47 PM6/11/14
to repo-d...@googlegroups.com

Am Mittwoch, 11. Juni 2014 17:56:24 UTC+2 schrieb Joshua Kugler:
On Wednesday, June 11, 2014 06:23:42 David Ostrovsky wrote:
> "
> I cannot reproduce it and was trying to blind fix it in this change [1] and
> was trying
> to define unique index in this change [2] to prevent database corruption in
> the first
> place.
> "
>
> [2] https://gerrit-review.googlesource.com/57784

What will be the effect of this when the error is hit? An error in the UI?
Will the change set not merge? Will the user see it? Will the merge be
retried, and the error logged?

The hope is that when the constraint violation is hit, that would clearly
prevent the database corruption and we would get the full stack trace
for further investigations. For user there shoudn't be any negative impacts.

So the plan is either to wait until this change [2] is ready/reviewed/merged,
and next Gerrit version with this change is released (next year?) or clean up
the database manually today and create unique index from [2] and wait for
the next corruption attempt.

Reply all
Reply to author
Forward
0 new messages