The Code Yellow is OVER!


Julie Parent

Feb 18, 2015, 7:38:57 PM
to hackability-cy, Chromium-dev, blink-dev
After months of hard work by an army of volunteers, the Code Yellow is complete!

tl;dr: Builds are faster and more reliable, and it is easier than ever to contribute to the Chromium project. We've saved one PST engineer from sheriffing, every single day!

To highlight a few of the major changes:
  • Only one PST sheriff, due to improvements in overall infrastructure and tooling, including the introduction of Sheriff-o-matic
  • Changed the tree open policy to not close for test failures. Tree open time skyrocketed from ~50% to >80% for Chromium (90% for Blink), with dashboards to view this!
  • Median commit queue cycle times are now < 1 hour, and main waterfall bot cycle times < 30 mins.
  • CQ false rejections analyzed (and common causes fixed), dashboards built to give insight into rejections, and plans in place for future improvements
  • CQ matches the main waterfall more closely, with safeguards in place to keep them in sync
  • New monitoring subteam formed and staffed to own all the new dashboards and build comprehensive monitoring and alerting
I won't enumerate all the accomplishments; you can read more about them here.

Thank YOU!
A big thank you to everyone who participated, and in particular, the original 5 leads (cmp, eseidel, jam, jparent, ojan), the new leads (phajdan, sergeyberezin), and our TPM (kareng).

The full contributor list (apologies to anyone who was inadvertently missed here!)

Prevent Relapse: sullivan, alancutter, agable, eae, ellyjones, ericroman, navabi, shans, stip

CQ match waterfall: sergiyb, dba, friedman, johnw, pschmidt, agable, alexclarke, alph, cmumford, dba, dnj, dpranke, eisinger, engedy, falken, friedman, hjd, jabdelmalek, jbudorick, johnw, jschuh, junov, kaliamoorthi, kbr, kojii, kouhei, luqui, machenbach, maruel, mikecase, mkosiba, mnaganov, navabi, pfeldman, pschmidt, rmcilroy, rsesek, rtenneti, scottmg, sergeyberezin, sergiyb, sheyang, skobes, sky, skyostil, smut, tapted, vadimsh, zty

CQ Flakiness: alancutter, sheyang

Fast Bot Cycles: maruel, vadimsh, sky, bradnelson, dschuff, dtu, hjd, jbudorick, johnw, kbr, luqui, phajdan, pschmidt, scottmg, sergeyberezin, tonyg, ukai, yyanagisawa

Tree Always Open: michaelpg, leviw, eseidel, abarth, adamk, agable, asvitkine, bashi, cbiesinger, dba, dgrogan, dominicc, dsinclair, dstockwell, eae, eisinger, ellyjones, fgorski, friedman, groby, iannucci, jyasskin, kareng, kbr, kouhei, loislo, maruel, mathp, mek, mkwst, mlamouri, navabi, pdr, pgervais, rkaplow, rsesek, rtenneti, seanmccullough, jparent, sergiyb, shanestephens, stgao, stip, szager, teravest

PhistucK

Feb 19, 2015, 7:12:25 AM
to Julie Parent, hackability-cy, Chromium-dev, blink-dev
Thank you, all of you, for all of the efforts! See my comment below.

On Thu, Feb 19, 2015 at 2:38 AM, Julie Parent <jpa...@chromium.org> wrote:
Changed the tree open policy to not close for test failures

That seems weird... Is there some strict policy for not tolerating test failures for longer than X days?
If so, how is it enforced?
If not, what happens when test failures occur? A bug is (automatically?) filed and hopefully someone will look into it and fix it?



PhistucK

Peter Kasting

Feb 19, 2015, 7:23:38 AM
to PhistucK, Julie Parent, hackability-cy, Chromium-dev, blink-dev
On Thu, Feb 19, 2015 at 4:11 AM, PhistucK <phis...@gmail.com> wrote:
On Thu, Feb 19, 2015 at 2:38 AM, Julie Parent <jpa...@chromium.org> wrote:
Changed the tree open policy to not close for test failures

That seems weird... Is there some strict policy for not tolerating test failures for longer than X days?
If so, how is it enforced?
If not, what happens when test failures occur? A bug is (automatically?) filed and hopefully someone will look into it and fix it?

Test failures are still important and should be dealt with immediately; the tree simply isn't closed-by-default the instant one happens.  We assume test failures can be dealt with without needing to close the tree, and the sheriff can close if that turns out not to be true or things start to snowball.

So it's not as if test failures are just ignored until someone bothers to look.  Or at least that's not supposed to happen.

PK 

Mike Stipicevic

Feb 19, 2015, 2:42:28 PM
to Peter Kasting, PhistucK, Julie Parent, hackability-cy, Chromium-dev, blink-dev
Sheriffs rely on https://sheriff-o-matic.appspot.com/ to track and triage test failures without needing to close the tree. The tree is still closed for compile failures, which are major enough to require immediate action.