Area updates
Monitoring:
Internal Google monitoring access is still a work in progress
Bot cycle time tracking is being added by stip@ this week
Flakiness:
1.7%, somehow still lower. If this really is a trend, then something has gotten better, but it's not clear whether this is just noise and luck from non-busy weeks; a quick noise check is sketched at the end of this section.
No new insights into what is causing flakiness, nor into what improvements have been made
Without-patch retries on the Android recipes did make an improvement
Bug in CQ
alancutter is still working on adding flakiness data to sheriff-o
sergey to keep looking into flakes
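A quick way to sanity-check whether a week-over-week flake-rate drop is within noise, using a normal-approximation confidence interval; the run counts below are made up for illustration, and we'd need the real number of CQ attempts behind each weekly rate:

    # Hypothetical sanity check: is a flake-rate drop within sampling noise?
    import math

    def rate_ci(flaky, total, z=1.96):
        # 95% normal-approximation confidence interval for a flake rate.
        p = flaky / total
        margin = z * math.sqrt(p * (1 - p) / total)
        return p - margin, p + margin

    # Made-up numbers: 170 flaky runs out of 10,000 CQ attempts this week.
    low, high = rate_ci(170, 10_000)
    print('1.7%% flake rate, 95%% CI %.2f%%..%.2f%%' % (low * 100, high * 100))
    # Prints roughly 1.45%..1.95%. A previous rate of, say, 2.2% would sit
    # outside that interval, suggesting a real drop rather than pure noise;
    # a rate of 1.9% would not.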
CQ Matches Waterfall:
Progress made on automating the coverage-data calculation using the ng bots (currently 35%); it doesn't exactly match what is running on the CQ
Adding this coverage check as a presubmit to ensure that ng trybots exist when new bots are added to the main waterfall; this will help with capacity planning as well (see the presubmit sketch after this section)
sergiy is making progress on reclaiming ~90 bots, so more bots can be converted soon
Capacity-estimation work: looking over the fleet to see what we can update and what is needed
Question: Are we still aiming for a 100% match between the main waterfall and the CQ? What about things like Mac 10.6? Concern: if we make one exception, people will want exceptions for their “special” bots too.
jam@: We were never aiming for exactly 100%; skipping official, clobber, etc. was expected.
pawel: What about having optional trybots for every config?
Solution: opt_coverage for everything we don’t have capacity to do full coverage for
pawel: Do we need to make a PSA about this ‘policy’? It is not obvious to everyone that they can’t just willy-nilly add bots to the main waterfall.
Maybe a PSA isn't needed, but this should be documented somewhere.
sergey: it would be good to have a sense of how much trouble the mismatch causes, i.e. when failures happen due to not having CQ coverage
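A minimal sketch of what the presubmit check proposed above could look like, assuming each builder config can be reduced to a set of builder names. The file paths, the one-name-per-line format, and the waterfall-to-ng name mapping are all assumptions for illustration, not the real Chromium layout:

    # Hypothetical presubmit: every main-waterfall bot must have an ng
    # trybot equivalent. Paths, file format, and the name-mapping rule
    # are assumptions, not real config files.

    def _load_builder_names(path):
        # Assumed format: one builder name per line, '#' starts a comment.
        with open(path) as f:
            return {line.strip() for line in f
                    if line.strip() and not line.startswith('#')}

    def _ng_equivalent(waterfall_bot):
        # Assumed mapping, e.g. "Win XP Tests" -> "win_xp_tests_ng".
        return waterfall_bot.lower().replace(' ', '_') + '_ng'

    def CheckChangeOnUpload(input_api, output_api):
        waterfall = _load_builder_names('waterfall_builders.txt')  # hypothetical
        trybots = _load_builder_names('ng_trybots.txt')            # hypothetical
        missing = sorted(b for b in waterfall
                         if _ng_equivalent(b) not in trybots)
        if missing:
            return [output_api.PresubmitError(
                'Main-waterfall bots without an ng trybot:\n  ' +
                '\n  '.join(missing))]
        return []

Running it at presubmit time (rather than as a cron job) means the CL that adds a waterfall bot is blocked until the matching ng trybot exists, which is also the moment capacity has to be planned.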
Tree Open:
builder_alerts is no longer running on ojan’s machine. agable@ now owns the backend; sean will take over the frontend
3 cases in the past couple of weeks where the tree was closed over the weekend for a long period because of compile/runhooks failures
Do we want to set up an auto-kick process that detects when things are red and no new changes are coming in? (a rough sketch of what this could look like follows below)
Is this really a problem? People do take care of it eventually when they care.
The point is, we could do something about this, but it is working OK right now with human intervention
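For the record, a minimal sketch of what such an auto-kick watcher could look like, assuming the tree-status endpoint returns JSON with a 'general_state' field. The staleness threshold, polling interval, and notify() body are made up, and checking the commit feed for "no new changes" is left out:

    # Hypothetical watcher: if the tree stays closed for hours, nudge someone.
    import json
    import time
    import urllib.request

    STATUS_URL = 'https://chromium-status.appspot.com/current?format=json'
    STALE_SECS = 4 * 3600  # closed this long with no reopen => nudge

    def notify(message):
        print('AUTO-KICK:', message)  # stand-in for an email/IRC ping

    def main():
        closed_since = None
        while True:
            with urllib.request.urlopen(STATUS_URL) as resp:
                status = json.load(resp)  # assumed: {'general_state': ...}
            if status.get('general_state') == 'closed':
                closed_since = closed_since or time.time()
                if time.time() - closed_since > STALE_SECS:
                    notify('tree closed for 4+ hours with no reopen')
                    closed_since = time.time()  # back off before re-pinging
            else:
                closed_since = None
            time.sleep(300)  # poll every five minutes

    if __name__ == '__main__':
        main()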
Fast Cycle Times
No update
Trybot capacity note: win_chromium_xp_rel_ng is probably worth asking for ~5 more XP VMs, otherwise it'll starve the CI a bit, especially since Jochen is working on adding more swarming tests, so the load will increase.