
Re: dev-telemetry-alerts Digest, Vol 68, Issue 2


Andrew McCreight

Jul 8, 2020, 9:09:09 AM
to dev-teleme...@lists.mozilla.org
On Wed, Jul 8, 2020 at 5:00 AM <
dev-telemetry-...@lists.mozilla.org> wrote:

> Today's Topics:
>
> 1. Intent to Decommission: This List (dev-telemetry-alerts@)
> (Chris H-C)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 7 Jul 2020 09:49:24 -0400
> From: Chris H-C <chu...@mozilla.com>
> To: dev-teleme...@lists.mozilla.org
> Cc: Glean team <glean...@mozilla.com>
> Subject: Intent to Decommission: This List (dev-telemetry-alerts@)
> Message-ID:
> <CAMPhgK-C6vpNBcHqJQJyGBktfTkT8_KsVB_5cC5=
> Jpf5y...@mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Hello,
>
> Summary: I plan to decommission this list (no more emails to/from, wiping
> the subscriber list, etc) on or shortly after July 17, 2020. Expiry alert
> emails currently sent (by default) to this list will be sent instead to
> glean-team@.
>
> This list was originally built and maintained as a place to notify people
> about and discuss alerts detected by the Telemetry Regression Detector
> system (cerberus). It expanded in scope to also include automated expiry
> information, then cerberus was disabled as part of the Data Pipeline GCP
> Migration (because it wasn't worth porting given its poor performance on
> then-current data collections, and lack of maintainers).
>
> So now the only function of this list appears to be as the "default
> addressee" of the probe-scraper-driven expiry alert emails in the event the
> expiring probe has no notification_email value set. Well, that, and
> receiving low-level spam that escapes the filters.
>
> We don't need a full mailing list for receiving default alerts. This
> function can be supported by a mozilla-maintained distribution list (I
> propose glean-team@) which I then don't have to moderate and administer.
>
> If you've read this far and noticed that I've encoded a lot of
> assumptions into this email, good for you! There might be other uses this
> list is currently being put to that are just low-frequency enough that I
> haven't noticed, or there might be benefit to using an open-subscription
> Mailing List over a moz-only Distribution List as the default addressee for
> probe expiry emails. If any of these (or other) assumptions are wrong, or
> you have other opinions or ideas about this proposal, please do reply.
>
> Otherwise, in absence of opposition, on July 17 I'll file patches against
> probe-scraper to point emails to glean-team@ and find someone (I can't find
> the delete button myself) to remove dev-telemetry-alerts@ (preserving the
> archive if it isn't too much of a burden).
>

I had been using this list to keep an eye on regressions caught by
telemetry (emailing people or filing bugs as appropriate), but I guess that
went away at some point? I suppose I hadn't seen any in a while, but I
hadn't thought too much about it. Is the "glean-team" mailing list a
replacement for getting that kind of information, or will it no longer be
available? What is the "Glean Team"? Is there any other traffic
on that list? It would be nice if there were a passive way to get
information about telemetry regressions, as I found it very handy,
without needing to get emails about whatever internal telemetry engineering
discussions are occurring. (The expiry alert emails aren't particularly
useful to me, so I don't need those.)

Thanks,
Andrew


> Thank you for your attention,
>
> Your Friendly Neighbourhood Firefox Telemetry Team
> :brizental, :chutten, :dexter, :janerik, :mdboom, :tlong
>

Chris H-C

Jul 13, 2020, 9:09:42 AM
to Andrew McCreight, dev-teleme...@lists.mozilla.org
Sorry about that, the regression detector went rogue over the holiday break
in 2019 and was taken offline. Given how many months it was since we'd
decided cerberus' GCP migration plan was "shut it down", I think notifying
the users fell through the cracks.

The regression detection system has been offline for about 7 months. If you
would benefit from automated monitoring of your probes, please file a bug
with Data Engineering[1] and it will be triaged and routed. To my knowledge
there is not at this time any plan for building and maintaining a generic
regression alerting system like cerberus was. (Its cost-to-benefit was
pretty bad, especially when factoring in that it really had no idea what to
do with anything that wasn't a linear or exponential performance histogram
with tens of thousands of samples.)

"glean-team" is a closed mailing list including the Friendly Neighbourhood
Firefox Telemetry Team plus some others from Data Engineering and
management. It's less of a mailing list and more of a "If there's anyone
who'd be interested in default-addressed expiry alerts, it'd be someone on
this list" sort of thing. Over time there'll be fewer and fewer emails to
it (because there'll be fewer and fewer unowned probes with default email
addresses), so this is more of a stopgap until that glorious future is
reached.

Once again, I'm sorry for not notifying you and cerberus' other users of
this list when the regression detector went dark.

Your Friendly Neighbourhood Firefox Telemetry Team
:brizental, :chutten, :dexter, :janerik, :mdboom, :tlong

[1]:
https://bugzilla.mozilla.org/enter_bug.cgi?product=Data%20Platform%20and%20Tools&component=Monitoring%20%26%20Alerting


Andrew McCreight

Jul 13, 2020, 10:25:25 AM
to dev-teleme...@lists.mozilla.org
On Mon, Jul 13, 2020 at 6:09 AM Chris H-C <chu...@mozilla.com> wrote:

> Sorry about that, the regression detector went rogue over the holiday
> break in 2019 and was taken offline. Given how many months it was since
> we'd decided cerberus' GCP migration plan was "shut it down", I think
> notifying the users fell through the cracks.
>
> The regression detection system has been offline for about 7 months. If
> you would benefit from automated monitoring of your probes, please file a
> bug with Data Engineering[1] and it will be triaged and routed. To my
> knowledge there is not at this time any plan for building and maintaining a
> generic regression alerting system like cerberus was. (Its cost-to-benefit
> was pretty bad, especially when factoring in that it really had no idea
> what to do with anything that wasn't a linear or exponential performance
> histogram with tens of thousands of samples.)
>

I didn't realize from your initial email that the regression detection
system was down entirely. I thought it was just the notification of this
particular mailing list. Has that been communicated more widely and I just
missed it? I guess I maybe vaguely remember reading something about it. The
regression alert system did catch a number of GC-ish regressions, but of
course I have no idea what the cost side of running it was. And it did miss
out on some regressions I would have liked to know about. Anyways, I'll
have to think about the broader implications for me of how I weigh the use
of telemetry if there's no regression alerts any more.

> "glean-team" is a closed mailing list including the Friendly Neighbourhood
> Firefox Telemetry Team plus some others from Data Engineering and
> management. It's less of a mailing list and more of a "If there's anyone
> who'd be interested in default-addressed expiry alerts, it'd be someone on
> this list" sort of thing. Over time there'll be fewer and fewer emails to
> it (because there'll be fewer and fewer unowned probes with default email
> addresses), so this is more of a stopgap until that glorious future is
> reached.
>

Ok, thanks for the information!

Andrew

Chris H-C

Jul 14, 2020, 11:11:53 AM
to Andrew McCreight, dev-teleme...@lists.mozilla.org
Unfortunately, looking back, there was no communication. I can lay some
blame on its final demise happening over the holidays while I was out...
but its fate had been long since decided, and I failed to let anyone know.

The infra cost of running it was not too bad. The human cost of running it
was pretty draining (if I say so myself). The infra cost of running it
suddenly spiked when we had no way of porting it in its existing form when
the data pipeline moved from AWS to GCP. That, coupled with its high
false-positive rate and unknown false-negative rate, was the deciding
factor in choosing to not migrate it.

We did have some lovely research projects on how to Do Generic Regression
Detection Correctly (Agnieszka, an intern from 2018, did some stunning work
on a Machine Learning-based approach), but at least for existing Telemetry
it appeared as though Generic alerting is a tough (impossible?) nut to
crack, whereas specific alerting is much, much easier.

With the Glean SDK's metric types being much, much higher-level it'll be
easier to identify things that can be alerted on (timing distributions,
counters) and things that can't (strings, events, custom distributions), so
maybe we'll see a resurrection of a cerberus-shaped data tool. But the Data
Tools team is presently working on tools with broader impact (like GLAM
(think TMO's Measurement Dashboards, but for all the products that support
Glean, and for once actually counting things properly)), and I can't speak
to their future plans.

:chutten (speaking personally this time, not for the whole team)
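
Editorial sketch, not part of the original thread: the split described in the last message, between Glean metric types that could be alerted on and those that could not, might be expressed roughly as below. The snake_case type identifiers, the function names, and the example metric names are all assumptions made for illustration; nothing here is taken from cerberus, probe-scraper, or the Glean SDK itself.

```python
from typing import Dict, List

# Type names come from the message above, written as snake_case identifiers
# (an assumption about how they would be spelled in a metrics definition).
ALERTABLE_TYPES = {"timing_distribution", "counter"}
NON_ALERTABLE_TYPES = {"string", "event", "custom_distribution"}


def classify(metric_type: str) -> str:
    """Return 'alertable', 'not_alertable', or 'unknown' for a metric type."""
    if metric_type in ALERTABLE_TYPES:
        return "alertable"
    if metric_type in NON_ALERTABLE_TYPES:
        return "not_alertable"
    # Types the message does not mention are left undecided.
    return "unknown"


def partition(metrics: Dict[str, str]) -> Dict[str, List[str]]:
    """Group (hypothetical) metric names by classification, sorted by name."""
    groups: Dict[str, List[str]] = {
        "alertable": [],
        "not_alertable": [],
        "unknown": [],
    }
    for name in sorted(metrics):
        groups[classify(metrics[name])].append(name)
    return groups
```

Calling `partition({"gc.pause_time": "timing_distribution", "ui.clicks": "event"})` with these made-up names would place the first under "alertable" and the second under "not_alertable". Any type outside the five named in the thread falls into "unknown" rather than being guessed at.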