I'd like to make the following changes to recipe autoroller:1. Keep track of the last trivial and nontrivial roll CL. Always run recipes.py autoroll locally, and land a trivial roll even when there was previously a nontrivial roll CL in flight. This will cover cases when e.g. a revert leads to a trivial roll. Currently this requires manually closing previous nontrivial roll CL, and with this change the roller would automatically handle it.
2. Keep track of spec being rolled. When it's the same as last landed CL, do not upload another one. This would address a race condition with recipe roller generating second roll identical to previous one shortly after it lands.
3. Keep track of when last nontrivial and trivial roll CLs were created. If there's a nontrivial CL still not reviewed+CQ for say 30 mins (exact timing TBD), fail the build.
On Tue, Sep 13, 2016 at 5:42 PM Paweł Hajdan, Jr. <phajd...@chromium.org> wrote:I'd like to make the following changes to recipe autoroller:1. Keep track of the last trivial and nontrivial roll CL. Always run recipes.py autoroll locally, and land a trivial roll even when there was previously a nontrivial roll CL in flight. This will cover cases when e.g. a revert leads to a trivial roll. Currently this requires manually closing previous nontrivial roll CL, and with this change the roller would automatically handle it.I'm fine with solving the problem you're trying to solve (revert lands which makes non-trivial CL obsolete, and makes it so that everything can be rolled with a trivial CL).I don't see why you need to track the last trivial and nontrivial roll CL. Shouldn't always running autoroll locally, and uploading if it's trivial and different from the last spec enough?
2. Keep track of spec being rolled. When it's the same as last landed CL, do not upload another one. This would address a race condition with recipe roller generating second roll identical to previous one shortly after it lands.I don't understand how the roller would cause this race condition to happen.... Do you have any examples of this happening I could see? I think removing race conditions are good, so in general for this change, but I would like to understand why it happens a bit more.If this is caused by google storage concretely being a bad datastore (we know it's not the best, but I think it hasn't bitten us so far), then I think we should figure out another state storage mechanism.Also, don't we already keep track of the last spec? I know we keep track of the spec somewhere, is that not the spec that's being rolled?
3. Keep track of when last nontrivial and trivial roll CLs were created. If there's a nontrivial CL still not reviewed+CQ for say 30 mins (exact timing TBD), fail the build.I think we should do something like this. I want to have more granularity than just "the autoroller is failing"; I want to be able to see which repo is behind, and by how much time, etc... I'm working on a recipe status app which I think should be able to solve some of these problems.I think for now failing the build is fine. We'll need more granularity later, and I don't think digging through buildbot logs is that great of a UI if you want to see system status.