Update on alerts?


John Abd-El-Malek

Sep 4, 2014, 2:26:04 PM
to hackability-cy
Are there any updates on the alerts aspect of this code yellow?

I'm concerned because it seems that every day some random breakage causes one of the n trybot configurations to fail for hours until someone catches it manually.

Is this being worked on _right now_?

Julie Parent

Sep 4, 2014, 3:09:49 PM
to John Abd-El-Malek, sh...@chromium.org, mikel...@chromium.org, ero...@chromium.org, e...@chromium.org, Mike Stipicevic, pger...@chromium.org, hackability-cy
Yes. There are many parts in progress right now.
  1. Adding the framework to sheriff-o-matic to show trooper alerts: https://code.google.com/p/chromium/issues/detail?id=399732. shans@ has a WIP for this, but unfortunately is going on vacation starting tomorrow. He can correct me if I'm wrong, but I think he needs someone to take this over. Mike, is there someone else in Sydney who Shane can brain-dump this on before he leaves? This needs to be in place so we can start feeding alerts into it.
  2. Abnormality detection system to detect cases where we can't set up simple threshold/prober style alerts: https://code.google.com/p/chromium/issues/detail?id=403906. eroman@ is exploring data and running experiments now, and will send out a design proposal to this list soon.
  3. Robust, comprehensive data collection system that will enable generating alerts, dashboards, etc.: eae@ is working with stip@ and pgervais@ on a design doc, and will send it out to this list later today.
We could definitely still use more help, particularly once #1 is done and alerts can be fed into sheriff-o-matic.

Please contact me if you can help out!
Julie



John Abd-El-Malek

Sep 4, 2014, 5:16:08 PM
to Julie Parent, sh...@chromium.org, mikel...@chromium.org, Eric Roman, e...@chromium.org, Mike Stipicevic, Philippe Gervais, hackability-cy
On Thu, Sep 4, 2014 at 12:09 PM, Julie Parent <jpa...@chromium.org> wrote:
Yes.  There are many parts in progress right now.
  1. Adding the framework to sheriff-o-matic to show trooper alerts: https://code.google.com/p/chromium/issues/detail?id=399732. shans@ has a WIP for this, but unfortunately is going on vacation starting tomorrow. He can correct me if I'm wrong, but I think he needs someone to take this over. Mike, is there someone else in Sydney who Shane can brain-dump this on before he leaves? This needs to be in place so we can start feeding alerts into it.
  2. Abnormality detection system to detect cases where we can't set up simple threshold/prober style alerts: https://code.google.com/p/chromium/issues/detail?id=403906. eroman@ exploring data and running experiments now. Will send out design proposal to this list soon.

Ok, this is the part I'm mostly referring to. This is required for the trybots-don't-relapse aspect.

Is there an estimate of when the system will start working?

Shane Stephens

Sep 4, 2014, 5:46:06 PM
to Julie Parent, John Abd-El-Malek, Shane Stephens, mikel...@chromium.org, ero...@chromium.org, e...@chromium.org, Mike Stipicevic, pger...@chromium.org, hackability-cy
On Fri, Sep 5, 2014 at 5:09 AM, Julie Parent <jpa...@chromium.org> wrote:
Yes.  There are many parts in progress right now.
  1. Adding the framework to sheriff-o-matic to show trooper alerts: https://code.google.com/p/chromium/issues/detail?id=399732. shans@ has a WIP for this, but unfortunately is going on vacation starting tomorrow. He can correct me if I'm wrong, but I think he needs someone to take this over. Mike, is there someone else in Sydney who Shane can brain-dump this on before he leaves? This needs to be in place so we can start feeding alerts into it.
Not on vacation - I'll be travelling to Europe for a CSSWG meeting, then to MTV/SF to talk to a few people. I'll have some availability but it will be limited in time and (especially in Europe) bandwidth could be a killer.

I think the patch that implements this (https://codereview.chromium.org/476903004/) is pretty much done. It has been working and in review cycles for quite some time now.

Julie Parent

Sep 4, 2014, 5:59:57 PM
to Shane Stephens, John Abd-El-Malek, Shane Stephens, mikel...@chromium.org, ero...@chromium.org, e...@chromium.org, Mike Stipicevic, pger...@chromium.org, hackability-cy
Thanks for the clarification, Shane, that makes more sense. I was surprised you were going on vacation without any prior notice :)

If you can get that patch committed before you head off, that would be really helpful so we can start feeding other alert data into the stream. Also, let me know when you'll be in SF; it would be great to chat a bit in person.