After investigation of sporadic database operation failures, that leaving changes
in undefined state with projects that have cherry-pick submit strategy i think i have
an idea what going on (on master):
The suspects for non synchronized database write operations are:
1/ Merges triggered interactively by user (UI, REST endpoint or SSH command)
2/ Merges scheduled by the background merge queue reload (ChangeMergeQueue.java)
3/ MergeabilityChecker jobs (Mergeable.java)
All three places above can currently run at the same time for the same change and trigger
in the end one or multiple insert/update statements on exact the same records, with the result:
* unique index constrain viloations
* multiple identical rows for patch_sets table, that is lacking unique index (fixed in [3])
* OrmConcurrencyException, later more on it
* Broken rendering (404-Not found) on new change screen
Race condition between /1 and /2 was (hopefully) fixed here: [1].
Race condition between /1, /2 and /3 can still happen. I tried to minimize the database operation in
MegeabilityChecker in this change [2] to only be triggered when status was changed,
but that's not enough. Consider a site with hundreds opened conflicting changes:
a) change "foo" was merged
b) mergeability checker reevaluates the whole branch and suppose that dozens changes need
to be updated. Suppose that change "bar" would be changed from non-mergeable to mergeable
(say change "foo" fixed conflicts). So in the end the following update statement is going to be executed:
update changes set mergeable = 'Y', row_version = 43 where change = "bar" and row_version = 42
c) parallel to event b) let say simultaneosly another user decides to merge change "bar":
per ssh command, it could be approved and submitted in one shot:
ssh host gerrit review --code-review=2 --submit "bar"
In this case the merge is triggered interactively (case /1 above) and CherryPick submit strategy is called,
and this call the CherryPick.writeCherryPickCommit() method and particularly new patch set is inserted and
change record is updated to reflect new patch set creation:
n.change().setCurrentPatchSet(patchSetInfoFactory.get(newCommit, ps.getId()));
args.db.changes().update(Collections.singletonList(n.change()));
And this would end with :
update changes set current_patch_set_id = 2, row_version = 43 where change = "bar" and row_version = 42
Now we have in b) and c) two non-synchronized actions that can be executed simultaneously and clearly
only one would succeed and another would end up with OrmConcurrencyException. Why?
Because where statement is stale, after first statement was successful executed,
there is no change any more with 'where change = "bar" and row_version = 42', and we would hit
zero rows update case in JdbcAccess.execute():
final int numberOfRowsUpdated = schema.getDialect().executeBatch(ps);
if (numberOfRowsUpdated != cnt) {
throw new OrmConcurrencyException();
}
In any event we have a change in undefined state. The better question is how to fix that?