Summary
Code yellow work is still ongoing. Please continue to reach out to the leads if you'd like to be involved.
Area updates
Tree closers and CQ configs match (phajdan)
Fixed one blocking issue but found another, so the coverage % is unchanged.
Considering changing the metric to track when the CQ lands a change that then breaks a main waterfall config (since this is the symptom we really care about). Caveat: flakes on the main waterfall can give false answers here.
ng trybot conversion is in progress (issue: runtimes seem to be longer; investigating).
Working on increasing GPU builder capacity (moving to swarming), reusing the compile step, and adding more coverage.
Tree open > 80% (ojan)
agable will help keep builder_alerts running.
AI: Karen to ping Raman on src-internal bug and follow up on crbug.com/417405
ojan fixed a major bug; things are running better now, but it still stops running sometimes.
Mac bots on the main waterfall are finally on 10.9, and … the horrible bug is now gone! Tree open time over weekends has improved considerably. Next up: the try server.
jam@ notes that this specific bug is fixed, but what about similar issues that could keep the tree closed all weekend? Should we do something to reopen it automatically? ojan@ says yes, there are speculative things we could do, but he is not sure they are worth the investment.
Reduce CQ false rejection rate (sergeyberezin)
The false rejection rate this week was 7.4%, mostly due to an outage on Wednesday.
sheyang@ and I are looking into ways to automatically monitor the overall health of the tryserver and adjust CQ's behavior accordingly, so that it can survive such outages without requiring developers to re-click the button.
Lower Chromium bot cycle time (jam)
Looking more into GPU bots, working with Ken Russell on increasing utilization. GPU bots have long queues and spend a lot of time rebooting, which is affecting cycle times. Moving to swarming will improve all of this. This is a priority for the team, and we are working on it now.
Dashboards/monitoring (jparent)
https://trooper-o-matic.appspot.com/cq/chromium now shows flakiness rates. Click a point to dig into the results.
jam@ notes that the CQ patch time breakdown isn't a good metric: it includes time spent waiting for LGTM, tree closures, etc. We want this information somewhere (the total time graph covers it), but including it makes the single-run graph less actionable. [ AI(jparent): ask alancutter to tighten this metric ]
AI(jparent): circulate the trooper-o graphs so others can get their eyes on them
--
You received this message because you are subscribed to the Google Groups "Chromium Hackability Code Yellow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hackability-c...@chromium.org.
To post to this group, send email to hackabi...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/hackability-cy/CAPSmAASq5s2cBYaMQw7Ta3as3QP2yKkow30MpBSd6h6QZPXS1A%40mail.gmail.com.