Weekly Meeting Notes


Julie Parent

Dec 3, 2014, 1:29:48 PM
to hackability-cy
We've been a bit lax about sending out meeting notes, but (sorry, google.com only) you can find the full archive here.

Wednesday, December 3

Area updates


Monitoring:

  • Internal google monitoring access still a work in progress

  • Bot cycle time being added by stip@ this week

Flakiness:

  • 1.7%, somehow still lower. If this REALLY is a trend, then something has gotten better, but it is not clear whether this is just noise and luck due to non-busy weeks.

  • No new insights into what is causing flakiness, nor improvements that have been made

    • "Without patch" retries on android recipes did make an improvement

  • Bug in CQ

  • alancutter is still working on adding flakiness data to sheriff-o

  • sergey to keep looking into flakes

CQ Matches Waterfall:

  • Progress made on automating the coverage calculation using ng bot coverage (currently 35%); this doesn’t exactly match what is running on the CQ

  • Adding this coverage check as a presubmit to ensure that ng trybots exist when new bots are added to the main waterfall. This will help with capacity planning as well

  • sergiy making progress on reclaiming ~90 bots, so more bots can be converted soon

  • Capacity estimate work, looking over the fleet and seeing what we can update and what is needed.

    • Question: Are we still aiming for 100% match of main waterfall and cq? What about things like Mac 10.6? Concern: if we make one exception, people will want exceptions for their “special” bots too.

      • jam@: We were never aiming for exactly 100%, skipping official, clobber, etc was expected.

      • pawel: What about having optional trybots for every config?

      • Solution: opt_coverage for everything we don’t have capacity to do full coverage for

      • pawel: Do we need to make a PSA on this ‘policy’? It is not obvious to everyone that they can’t just willy-nilly add bots to main waterfall.

      • Maybe not a PSA needed, but this should be documented somewhere.

      • sergey: it would be good to have a sense of how much trouble the mismatch causes, when do failures happen due to not having cq coverage
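
A rough sketch of what the coverage check discussed above could look like, not the actual presubmit. The builder names, the waterfall-to-trybot mapping, and the exemption set (official, clobber, etc.) are all hypothetical inputs; the real data would have to come from the bot configs.

```python
from typing import AbstractSet, Dict, Iterable, List, Tuple


def trybot_coverage(
    waterfall_builders: Iterable[str],
    trybot_for: Dict[str, str],
    exempt: AbstractSet[str] = frozenset(),
) -> Tuple[float, List[str]]:
    """Return (coverage fraction, sorted list of uncovered builders).

    waterfall_builders: names of main waterfall builders (hypothetical).
    trybot_for: maps a waterfall builder to its matching trybot (hypothetical).
    exempt: builders we never expected to cover (e.g. official, clobber).
    """
    required = [b for b in waterfall_builders if b not in exempt]
    missing = sorted(b for b in required if b not in trybot_for)
    if not required:
        # Nothing to cover counts as fully covered.
        return 1.0, missing
    covered = len(required) - len(missing)
    return covered / len(required), missing


def presubmit_errors(
    waterfall_builders: Iterable[str],
    trybot_for: Dict[str, str],
    exempt: AbstractSet[str] = frozenset(),
) -> List[str]:
    """One error message per non-exempt builder without a matching trybot."""
    _, missing = trybot_coverage(waterfall_builders, trybot_for, exempt)
    return [f"{name}: no matching trybot" for name in missing]
```

An empty list from presubmit_errors would mean every non-exempt waterfall builder has a trybot. Builders covered only by an optional trybot could be tracked with a second mapping if we want to report them separately.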

Tree Open:

  • builder_alerts is no longer running on ojan’s machine. agable@ is now owning the backend, sean will be taking over frontend

  • 3 cases in past couple weeks where tree is closed over weekend for long period of time because of compile failures/run hooks failures

    • Do we want to set up an auto-kick process that detects when things are red and no new changes are coming in?

    • Is this really a problem? People do take care of it eventually when they care.

    • Point is, we could do something about this, but it is working ok right now with human intervention
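
For reference, the detection half of the auto-kick idea could be as small as the sketch below. It only decides whether a kick is warranted; what the kick actually does is left open. The function name and both thresholds are assumptions for illustration, not anything agreed in the meeting.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_auto_kick(
    now: datetime,
    closed_since: Optional[datetime],
    last_change: datetime,
    min_closed: timedelta = timedelta(hours=2),  # assumed threshold
    min_idle: timedelta = timedelta(hours=1),    # assumed threshold
) -> bool:
    """True if the tree has been red for a while and no new changes are landing.

    closed_since: when the tree went red, or None if the tree is open.
    last_change: time of the most recent change that came in.
    """
    if closed_since is None:
        return False
    closed_long_enough = (now - closed_since) >= min_closed
    no_new_changes = (now - last_change) >= min_idle
    return closed_long_enough and no_new_changes
```

Requiring both conditions keeps this from firing while people are still actively landing fixes.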

Fast Cycle Times

  • No update

Dirk Pranke

Dec 3, 2014, 1:49:13 PM
to Julie Parent, hackability-cy
I just commented on this in a separate thread, but I will repeat it here: I agree with jam@; we never actually expected to get 100%. I do think we should set up optional try bots for every config if easily possible (some, like XP, may not be).

I think it does make sense to have some sort of review policy for bots that get added to the main waterfall, since every new configuration incurs a significant overhead on the team as a whole. 

I'm not sure what that policy should be: maybe approval by a top-level OWNER like jam@, brett@, or cpu@ *and* approval from someone in infra who can sign off on making sure that the CQ has been considered properly and that we have the capacity?

-- Dirk



Emil A Eklund

Dec 3, 2014, 2:11:27 PM
to Dirk Pranke, Julie Parent, hackability-cy
An (optional) XP try bot would be extremely useful given how hard it is
to test on XP and how long the cycle times are on the main waterfall.

John Abd-El-Malek

Dec 8, 2014, 6:58:47 PM
to Emil A Eklund, Dirk Pranke, Julie Parent, hackability-cy
btw this is now live (as well as Vista and 10.6)
 

Marc-Antoine Ruel

Dec 8, 2014, 7:12:24 PM
to John Abd-El-Malek, Emil A Eklund, Dirk Pranke, Julie Parent, hackability-cy
win_chromium_xp_rel_ng!

Probably worth asking for ~5 more XP VMs, otherwise it'll starve the CI a bit, especially since Jochen is working on adding more swarming tests, so the load will increase.

Thanks for doing this!

M-A

Emil A Eklund

Dec 8, 2014, 7:13:12 PM
to John Abd-El-Malek, Dirk Pranke, Julie Parent, hackability-cy
Awesome, thank you!

John Abd-El-Malek

Dec 8, 2014, 7:22:47 PM
to Marc-Antoine Ruel, Emil A Eklund, Dirk Pranke, Julie Parent, hackability-cy
On Mon, Dec 8, 2014 at 4:12 PM, Marc-Antoine Ruel <mar...@chromium.org> wrote:
> win_chromium_xp_rel_ng!
>
> Probably worth asking for ~5 more XP VMs otherwise it'll starve the CI a bit. Especially that Jochen is working on adding more swarming tests so the load will increase.

There are currently 13 XP and 15 Vista swarming bots. Should I ask for enough VMs to bring them up to 20 each?

Marc-Antoine Ruel

Dec 9, 2014, 9:02:20 AM
to John Abd-El-Malek, Emil A Eklund, Dirk Pranke, Julie Parent, hackability-cy
It would depend on the use. Frankly, I wouldn't expect these one-off try builders to be used more than ~once a day, if not less, so it's probably not worth wasting resources for occasional hiccups. Maybe revisit after a broader announcement and after looking at the actual utilization.

M-A