
Re: Improving Platform quality


Gabor Krizsanits

Mar 10, 2016, 5:07:21 AM
to dev-platform
While the other thread about a fuzzing-friendly Gecko is an interesting
option, I would like to go back to the original topic and start another
thread to collect other ideas that might help us get better on the
performance front. Here are some of my thoughts after spending some time
with the profiler and Talos tests over the past couple of weeks.

Most regressions probably happen where we don't detect them because of
the lack of perf test coverage. It should be easy and straightforward to
add a new Talos test (it isn't right now). I think there is ongoing work
on this, but I don't know where that work is being tracked. We clearly
need more tests. A lot more. Especially if we want to ship features with
huge impact like multi-process Firefox or removing XUL. I don't think we
have all the metrics we need yet to make the best decisions.

We do have some explanation of each Talos test at
https://wiki.mozilla.org/Buildbot/Talos/Tests, and I'm thankful for that,
but some of the tests need more explanation, and some of them do not have
any. We could further improve that; it would save a lot of engineering
time (this wiki rocks, by the way).

The great thing about Talos tests is that they can be profiled. What
would be even better is if I could compare two runs at the profiler level
as easily as I can compare results now in Treeherder's Compare view. It
would be simpler to assign perf bugs, and less time-consuming to fix
them, if I knew instantly where a test used to spend its time and where
it is spending it now. And, most importantly, where I have to tune my
module to make the biggest impact on performance even when there is no
regression. (Backing out every feature that causes a regression is not
always an option, nor the best way to win back performance.)
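To illustrate the kind of comparison I have in mind, here is a rough
sketch in Python, assuming each profile has already been reduced to a map
of function name to total sample time (the function names and numbers
below are made up for illustration, not real profile data):

```python
def diff_profiles(before, after, top=5):
    """Return the functions whose total time changed the most between
    two profile summaries (dicts mapping function name -> ms spent)."""
    names = set(before) | set(after)
    deltas = {n: after.get(n, 0) - before.get(n, 0) for n in names}
    # Sort by absolute change so both regressions and wins surface.
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]

# Hypothetical per-function totals (ms) from two Talos runs.
before = {"nsRefreshDriver::Tick": 120, "js::RunScript": 300, "Paint": 80}
after = {"nsRefreshDriver::Tick": 125, "js::RunScript": 290, "Paint": 160}
for name, delta in diff_profiles(before, after):
    print(f"{name}: {delta:+d} ms")
```

Something along these lines, surfaced automatically next to a Perfherder
comparison, is what would save me the most time.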

I don't think the goal is to optimize Gecko. We need to optimize our end
products. So performance tests that go through the entire browser and
reproduce common user stories are, I think, the most important kind (we
need more of those). Gecko and Firefox are often intertwined to the point
that a joint effort between Platform and Firefox team folks has the best
chance of tackling the more complex cases. I don't want to end up with a
super-fast engine and a slow browser because we put our focus on the
wrong goal.

Add-ons. The last number I heard is that 40% of our users use some
add-ons. We have access to these add-ons' code, yet we don't have any
performance tests using them. It should be our responsibility, if we
regress the user experience with some of the most popular add-ons, to at
least give a heads-up to the authors and help them address the problem. I
know resources are limited, but maybe there is some low-hanging fruit
here that would make a huge impact.

These are my two cents off the top of my head; I hope others have even
better ideas to share.

- Gabor

On Wed, Mar 9, 2016 at 12:09 AM, David Bryant <dbr...@mozilla.com> wrote:

> Platform Peeps,
>
> Improving release quality is one of the three fundamental goals Platform
> Engineering committed to this year. To this end, lmandel built a Bugzilla
> dashboard that allows us to track regressions found in any given release
> cycle. This dashboard is up on monitors in many of the offices and can also
> be found at: http://mozilla.github.io/releasehealth/
>
> While this metric might not be perfect, it does expose the number of
> newly-discovered regressions we would ship in a release. As of Monday*, we
> had *58* new regressions in Firefox 45 Beta -- this is the version that was
> released today. Of these bugs, 43 of them are unassigned**. Both of these
> things are unacceptable, and we will not continue to operate this way.
>
> Starting in release 46, we will *not* ship unless all new regressions are
> triaged and are either fixed or explicitly deferred by release management
> (working with Firefox engineering and Platform Engineering leads). We will
> hold a triage meeting every Monday at 2pm PT in the ReleaseCoordination
> Vidyo room, open to all of engineering, to stay on top of the overall
> regression list, and our first such meeting was yesterday. Bugs will be
> assigned by engineering managers and treated as release blockers.
>
> Engineering managers own the triage of their team's components. Please
> work with them and also let Johnny, Doug, or me know if you need help.
>
> All of us, together, are accountable for adopting this fundamental change
> in how we work. This is one of several changes that we’ll be making, so
> more to come.
>
>
> Thanks,
> David
>
>
> * Yes, I am aware that dougt, dveditz, and jst triaged on Monday, so this
> number may be slightly lower now. Still it isn’t zarro.
>
> ** Yes, I am aware that some of the teams don’t always assign bugs, but I
> am asking everyone to start doing this. Unowned bugs will signal to us that
> they need help. Basically, we want the assignee field to be someone that is
> directly responsible for moving the bug to the next state. That might be
> the engineering manager or it might be an engineer.
> --
> ------------------------------
> <http://www.mozilla.org> *David Bryant*
> Vice President of Platform Engineering and CTO
> Mozilla
> 331 E. Evelyn Avenue, Mountain View CA 94041
> Phone: +1 408 728 9311 | Email: <dbr...@mozilla.com>dbr...@mozilla.com
> ------------------------------
>

William Lachance

Mar 10, 2016, 5:54:05 PM
Hi Gabor! Thanks for starting this thread.

On 2016-03-10 5:07 AM, Gabor Krizsanits wrote:
> While the other thread about a fuzzing-friendly Gecko is an interesting
> option, I would like to go back to the original topic and start another
> thread to collect other ideas that might help us get better on the
> performance front. Here are some of my thoughts after spending some time
> with the profiler and Talos tests over the past couple of weeks.
>
> Most regressions probably happen where we don't detect them because of
> the lack of perf test coverage. It should be easy and straightforward to
> add a new Talos test (it isn't right now). I think there is ongoing work
> on this, but I don't know where that work is being tracked. We clearly
> need more tests. A lot more. Especially if we want to ship features with
> huge impact like multi-process Firefox or removing XUL. I don't think we
> have all the metrics we need yet to make the best decisions.

Yes, this is a lot easier now that we don't have to configure
Graphserver every time we add a test (Perfherder, as opposed to
Graphserver, is smart enough to handle basically anything new that
people care to submit to it). In fact, :mconley just added a new
test (tabpaint) last week, with no modifications necessary to Perfherder.

We (mainly meaning jmaher and myself, the Talos/Perfherder maintainers)
haven't really emphasized or encouraged adding new tests in the past, as
there was a feeling that we were having difficulty just staying on top
of the existing tests we had. Now that we have a better system for
sheriffing regressions (as well as a system to separate tests into
different "buckets", so the burden can be shared amongst a larger
group of people), it may well be time to consider adding new benchmarks.

Another thing to note is that new tests don't even need to be part of
Talos. We are now capable of accepting data from *any* job that is
ingested by Treeherder. For example, gbrown added a new Android-specific
memory test as part of the mochitest-browser-chrome suite a couple of
weeks back:

https://gbrownmozilla.wordpress.com/2016/01/27/test_awsy_lite/
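For anyone curious what that looks like in practice: as I understand it,
Perfherder picks up performance numbers from a job when its log emits a
`PERFHERDER_DATA:` line containing JSON. A minimal sketch (the framework,
suite, and subtest names below are made up for illustration):

```python
import json
import time

def emit_perfherder_data(suite_name, measurements):
    """Print a PERFHERDER_DATA log line for Perfherder to ingest.
    `measurements` maps subtest name -> measured value (e.g. ms)."""
    payload = {
        "framework": {"name": "mozharness_test"},  # illustrative name
        "suites": [{
            "name": suite_name,
            # Summarize the suite as the mean of its subtests.
            "value": sum(measurements.values()) / len(measurements),
            "subtests": [{"name": n, "value": v}
                         for n, v in measurements.items()],
        }],
    }
    print("PERFHERDER_DATA: " + json.dumps(payload))

# Hypothetical measurement: time a trivial operation in milliseconds.
start = time.perf_counter()
_ = [i * i for i in range(10000)]
elapsed_ms = (time.perf_counter() - start) * 1000
emit_perfherder_data("example_suite", {"list_build": elapsed_ms})
```

Any job that can print a line like that to its log can get a graph in
Perfherder, which is what makes the "any job" part above possible.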

> We do have some explanation of each Talos test at
> https://wiki.mozilla.org/Buildbot/Talos/Tests, and I'm thankful for that,
> but some of the tests need more explanation, and some of them do not have
> any. We could further improve that; it would save a lot of engineering
> time (this wiki rocks, by the way).

In the past, I've found needinfo'ing the test owner helpful for this
sort of thing.

> Add-ons. The last number I heard is that 40% of our users use some
> add-ons. We have access to these add-ons' code, yet we don't have any
> performance tests using them. It should be our responsibility, if we
> regress the user experience with some of the most popular add-ons, to at
> least give a heads-up to the authors and help them address the problem. I
> know resources are limited, but maybe there is some low-hanging fruit
> here that would make a huge impact.

If I remember correctly, there were some efforts to run the standard set
of Talos tests with add-ons enabled, but I don't think they were
particularly successful. In general, I worry that creating too many
variants of the existing tests will just lead to a firehose of
information that is difficult to manage.

I suspect that creating a microbenchmark measuring some of the common
internal operations an add-on might perform could be a better approach,
though I could be wrong.
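As a sketch of the shape such a microbenchmark could take (Python here
purely for illustration; a real add-on benchmark would exercise Gecko
APIs from JS, and the dict lookup below is just a stand-in for whatever
internal operation we decide to measure):

```python
import timeit

def microbench(op, number=1000, repeat=5):
    """Time callable `op` `number` times over `repeat` rounds and report
    the best round in microseconds per call (min is the least noisy)."""
    timer = timeit.Timer(op)
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e6  # microseconds per single call

# Stand-in for a "common internal operation an add-on might do",
# e.g. reading a preference value.
store = {"browser.startup.page": 1}
us_per_call = microbench(lambda: store.get("browser.startup.page"))
print(f"{us_per_call:.3f} us/call")
```

A handful of tight, well-understood measurements like this would be far
easier to sheriff than full Talos variants per add-on.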

Will
